Updated on 2025.11.21

Usage instructions: here

Manipulation

Publish Date Title Authors PDF Code
2025-11-20 Dexterity from Smart Lenses: Multi-Fingered Robot Manipulation with In-the-Wild Human Demonstrations Homanga Bharadhwaj Team 2511.16661 null
2025-11-20 InternData-A1: Pioneering High-Fidelity Synthetic Data for Pre-training Generalist Policy Jiangmiao Pang Team 2511.16651 null
2025-11-20 Toward Artificial Palpation: Representation Learning of Touch on Soft Bodies Aviv Tamar Team 2511.16596 null
2025-11-20 Green Resilience of Cyber-Physical Systems: Doctoral Dissertation Diaeddin Rimawi Team 2511.16593 null
2025-11-20 VLA-Pruner: Temporal-Aware Dual-Level Visual Token Pruning for Efficient Vision-Language-Action Inference Bo Zhao Team 2511.16449 null
2025-11-20 Graph Neural Networks for Surgical Scene Segmentation Danail Stoyanov Team 2511.16430 null
2025-11-20 LAOF: Robust Latent Action Learning with Optical Flow Constraints Wei Li Team 2511.16407 link
2025-11-20 Homogeneous Proportional-Integral-Derivative Controller in Mobile Robotic Manipulators Andrey Polyakov Team 2511.16406 null
2025-11-20 Robot Metacognition: Decision Making with Confidence for Tool Invention Pablo Lanillos Team 2511.16390 null
2025-11-20 Beyond Generative AI: World Models for Clinical Prediction, Counterfactuals, and Planning Mohammad Yaqub Team 2511.16333 null
2025-11-20 Safe and Optimal Variable Impedance Control via Certified Reinforcement Learning Ravi Prakash Team 2511.16330 null
2025-11-20 InEKFormer: A Hybrid State Estimator for Humanoid Robots Frank Kirchner Team 2511.16306 null
2025-11-20 DynaMimicGen: A Data Generation Framework for Robot Learning of Dynamic Tasks Anna Valente Team 2511.16223 null
2025-11-20 When Alignment Fails: Multimodal Adversarial Attacks on Vision-Language-Action Models Yaochu Jin Team 2511.16203 null
2025-11-20 Mantis: A Versatile Vision-Language-Action Model with Disentangled Visual Foresight Zhijie Deng Team 2511.16175 null
2025-11-20 EvoVLA: Self-Evolving Vision-Language-Action Model Hao Tang Team 2511.16166 null
2025-11-20 MagBotSim: Physics-Based Simulation and Reinforcement Learning Environments for Magnetic Robotics Klaus Neumann Team 2511.16158 null
2025-11-20 Real-Time 3D Object Detection with Inference-Aligned Learning Nan Xue Team 2511.16140 null
2025-11-20 Bi-AQUA: Bilateral Control-Based Imitation Learning for Underwater Robot Arms via Lighting-Aware Action Chunking with Transformers Yuki Uranishi Team 2511.16050 null
2025-11-20 PushingBots: Collaborative Pushing via Neural Accelerated Combinatorial Hybrid Optimization Meng Guo Team 2511.15995 null
2025-11-19 Optimus-Q: Utilizing Federated Learning in Adaptive Robots for Intelligent Nuclear Power Plant Operations through Quantum Cryptography Sajedul Talukder Team 2511.15614 null
2025-11-19 SRPO: Self-Referential Policy Optimization for Vision-Language-Action Models Xipeng Qiu Team 2511.15605 null
2025-11-19 Learning from Mistakes: Loss-Aware Memory Enhanced Continual Learning for LiDAR Place Recognition Tiantian Feng Team 2511.15597 null
2025-11-19 NMPC-based Motion Planning with Adaptive Weighting for Dynamic Object Interception Steven Liu Team 2511.15532 null
2025-11-19 Decentralized Gaussian Process Classification and an Application in Subsea Robotics James McMahon Team 2511.15529 null
2025-11-19 Theoretical Closed-loop Stability Bounds for Dynamical System Coupled with Diffusion Policies François Ferland Team 2511.15520 null
2025-11-19 IPR-1: Interactive Physical Reasoner Yong-Lu Li Team 2511.15407 null
2025-11-19 Platform-Agnostic Reinforcement Learning Framework for Safe Exploration of Cluttered Environments with Graph Attention George Nikolakopoulos Team 2511.15358 null
2025-11-19 Adversarial Attack on Black-Box Multi-Agent by Adaptive Perturbation Fanjiang Xu Team 2511.15292 null
2025-11-19 Path Planning through Multi-Agent Reinforcement Learning in Dynamic Environments Moharram Challenger Team 2511.15284 null
2025-11-19 Look, Zoom, Understand: The Robotic Eyeball for Embodied Perception Wenzhao Lian Team 2511.15279 null
2025-11-19 Behavior Trees vs Executable Ontologies: a Comparative Analysis of Robot Control Paradigms Alexander Boldachev Team 2511.15274 null
2025-11-19 Symmetry-Breaking in Multi-Agent Navigation: Winding Number-Aware MPC with a Learned Topological Strategy Tadashi Kozuno Team 2511.15239 null
2025-11-19 Efficient Transformer-Integrated Deep Neural Architectures for Robust EEG Decoding of Complex Visual Imagery Byoung-Hee Kwon Team 2511.15218 null
2025-11-19 VIRAL: Visual Sim-to-Real at Scale for Humanoid Loco-Manipulation Yuke Zhu Team 2511.15200 link
2025-11-19 Eq.Bot: Enhance Robotic Manipulation Learning via Group Equivariant Canonicalization Zhenzhou Shao Team 2511.15194 null
2025-11-19 Learning Depth from Past Selves: Self-Evolution Contrast for Robust Depth Estimation Yong Huang Team 2511.15167 null
2025-11-19 An Alignment-Based Approach to Learning Motions from Demonstrations Julie A Shah Team 2511.14988 null
2025-11-18 Automated laboratory x-ray diffractometer and fluorescence spectrometer for high-throughput materials characterization Todd C. Hufnagel Team 2511.14905 link
2025-11-19 $\pi^{*}_{0.6}$: a VLA That Learns From Experience Zhiyuan Zhou Team 2511.14759 null
2025-11-18 HMC: Learning Heterogeneous Meta-Control for Contact-Rich Loco-Manipulation Xiaolong Wang Team 2511.14756 null
2025-11-18 Masked IRL: LLM-Guided Reward Disambiguation from Demonstrations and Language Andreea Bobu Team 2511.14565 null
2025-11-18 A Neuro-Symbolic Framework for Reasoning under Perceptual Uncertainty: Bridging Continuous Perception and Discrete Symbolic Planning Shengwen Yu Team 2511.14533 null
2025-11-18 Achieving Safe Control Online through Integration of Harmonic Control Lyapunov-Barrier Functions with Unsafe Object-Centric Action Policies Matthias Scheutz Team 2511.14434 null
2025-11-18 Self-Supervised Multisensory Pretraining for Contact-Rich Robot Reinforcement Learning Georgia Chalvatzaki Team 2511.14427 null
2025-11-18 Continuous Vision-Language-Action Co-Learning with Semantic-Physical Alignment for Behavioral Cloning Hongpeng Wang Team 2511.14396 link
2025-11-18 MA-SLAM: Active SLAM in Large-Scale Unknown Environment using Map Aware Deep Reinforcement Learning Yi Jiang Team 2511.14330 null
2025-11-18 NeuralBoneReg: A Novel Self-Supervised Method for Robust and Accurate Multi-Modal Bone Surface Registration Philipp Fürnstahl Team 2511.14286 null
2025-11-18 Towards Deploying VLA without Fine-Tuning: Plug-and-Play Inference-Time VLA Policy Steering via Embodied Evolutionary Diffusion Fei Chen Team 2511.14178 null
2025-11-18 RoboTidy: A 3D Gaussian Splatting Household Tidying Benchmark for Embodied Navigation and Action Jiayu Chen Team 2511.14161 null
2025-11-18 AsyncVLA: Asynchronous Flow Matching for Vision-Language-Action Models Biqing Qi Team 2511.14148 null
2025-11-17 From Power to Precision: Learning Fine-grained Dexterity for Multi-fingered Robotic Hands Xiaolong Wang Team 2511.13710 link
2025-11-17 OpenRoboCare: A Multimodal Multi-Task Expert Demonstration Dataset for Robot Caregiving Tapomayukh Bhattacharjee Team 2511.13707 null
2025-11-17 PhysX-Anything: Simulation-Ready Physical 3D Assets from Single Image Ziwei Liu Team 2511.13648 link
2025-11-17 Contact-Safe Reinforcement Learning with ProMP Reparameterization and Energy Awareness Luis Figueredo Team 2511.13459 null
2025-11-17 ZeroDexGrasp: Zero-Shot Task-Oriented Dexterous Grasp Synthesis with Prompt-Based Multi-Stage Semantic Reasoning Ruizhen Hu Team 2511.13327 null
2025-11-17 EL3DD: Extended Latent 3D Diffusion for Language Conditioned Multitask Manipulation Sven Behnke Team 2511.13312 null
2025-11-17 Robust Control Design Using a Hybrid-Gain Finite-Time Sliding-Mode Controller Fernando A. C. C. Fontes Team 2511.13260 null
2025-11-17 Difficulty-Aware Label-Guided Denoising for Monocular 3D Object Detection Dongbo Min Team 2511.13195 null
2025-11-17 Orientation-Free Neural Network-Based Bias Estimation for Low-Cost Stationary Accelerometers Itzik Klein Team 2511.13071 null
2025-11-17 Learning Branching Policies for MILPs with Proximal Policy Optimization Amal El Fallah Seghrouchni Team 2511.12986 null
2025-11-17 ArtiWorld: LLM-Driven Articulation of 3D Objects in Scenes Feng Zheng Team 2511.12977 null
2025-11-17 DiffuDepGrasp: Diffusion-based Depth Noise Modeling Empowers Sim2Real Robotic Grasping Dongbin Zhao Team 2511.12912 null
2025-11-17 Uni-Hand: Universal Hand Motion Forecasting in Egocentric Views Hesheng Wang Team 2511.12878 null
2025-11-17 Structured Imitation Learning of Interactive Policies through Inverse Games Todd Murphey Team 2511.12848 link
2025-11-17 Mapping fNIRS Signals to Agent Performance: Toward Reinforcement Learning from Neural Feedback Jivko Sinapov Team 2511.12844 null
2025-11-16 Scalable Multi-Objective and Meta Reinforcement Learning via Gradient Estimation Hongyang R. Zhang Team 2511.12779 null
2025-11-16 Task-Aware Morphology Optimization of Planar Manipulators via Reinforcement Learning Sohom Chakrabarty Team 2511.12650 null
2025-11-16 Botany Meets Robotics in Alpine Scree Monitoring Manolo Garabini Team 2511.12526 null
2025-11-16 RoboAfford++: A Generative AI-Enhanced Dataset for Multimodal Affordance Learning in Robotic Manipulation and Navigation Long Chen Team 2511.12436 null
2025-11-16 VLA-R: Vision-Language Action Retrieval toward Open-World End-to-End Autonomous Driving David Hyunchul Shim Team 2511.12405 null
2025-11-14 Volumetric Ergodic Control Todd Murphey Team 2511.11533 null
2025-11-14 Terrain Costmap Generation via Scaled Preference Conditioning Joydeep Biswas Team 2511.11529 null
2025-11-14 Scalable Policy Evaluation with Video World Models Lin Yen-Chen Team 2511.11520 null
2025-11-14 Collaborative Representation Learning for Alignment of Tactile, Language, and Vision Modalities Jingyuan Chen Team 2511.11512 null
2025-11-14 Rethinking Progression of Memory State in Robotic Manipulation: An Object-Centric Perspective Ngan Le Team 2511.11478 null
2025-11-14 Simulating an Autonomous System in CARLA using ROS 2 Mohamed Al-Musleh Team 2511.11310 null
2025-11-14 Experiences from Benchmarking Vision-Language-Action Models for Robotic Manipulation Xi Zheng Team 2511.11298 null
2025-11-14 Sashimi-Bot: Autonomous Tri-manual Advanced Manipulation and Cutting of Deformable Objects Ekrem Misimi Team 2511.11223 null
2025-11-14 Humanoid Whole-Body Badminton via Multi-Stage Reinforcement Learning Xiaoyu Ren Team 2511.11218 null
2025-11-14 One-to-N Backdoor Attack in 3D Point Cloud via Spherical Trigger Chongxia Wang Team 2511.11210 null
2025-11-14 Viper-F1: Fast and Fine-Grained Multimodal Understanding with Cross-Modal State-Space Modulation Debesh Jha Team 2511.11177 null
2025-11-14 Phys-Liquid: A Physics-Informed Dataset for Estimating 3D Geometry and Volume of Transparent Deformable Liquids Tian Xia Team 2511.11077 link
2025-11-14 AdaptPNP: Integrating Prehensile and Non-Prehensile Skills for Adaptive Robotic Manipulation Lin Shao Team 2511.11052 null
2025-11-14 Autonomous Vehicle Path Planning by Searching With Differentiable Simulation Luc Van Gool Team 2511.11043 null
2025-11-14 Dexterous Manipulation Transfer via Progressive Kinematic-Dynamic Alignment Yi Sun Team 2511.10987 null
2025-11-14 Collaborative Multi-Robot Non-Prehensile Manipulation via Flow-Matching Co-Generation Jiaoyang Li Team 2511.10874 null
2025-11-14 WetExplorer: Automating Wetland Greenhouse-Gas Surveys with an Autonomous Mobile Robot Xuping Zhang Team 2511.10864 null
2025-11-13 SURFACEBENCH: Can Self-Evolving LLMs Find the Equations of 3D Scientific Surfaces? Chandan K. Reddy Team 2511.10833 null
2025-11-13 Expert Consensus-based Video-Based Assessment Tool for Workflow Analysis in Minimally Invasive Colorectal Surgery: Development and Validation of ColoWorkflow Nicolas Padoy Team 2511.10766 null
2025-11-13 Attentive Feature Aggregation or: How Policies Learn to Stop Worrying about Robustness and Attend to Task-Relevant Visual Cues Chris Xiaoxuan Lu Team 2511.10762 null
2025-11-13 Robot Crash Course: Learning Soft and Stylized Falling Moritz Bächer Team 2511.10635 null
2025-11-13 OmniVGGT: Omni-Modality Driven Visual Geometry Grounded Ziwei Liu Team 2511.10560 link
2025-11-13 SemanticVLA: Semantic-Aligned Sparsification and Enhancement for Efficient Robotic Manipulation Liqiang Nie Team 2511.10518 link
2025-11-13 RoboBenchMart: Benchmarking Robots in Retail Environment Vlad Shakhuro Team 2511.10276 null
2025-11-13 Learning a Thousand Tasks in a Day Edward Johns Team 2511.10110 link
2025-11-13 Opinion: Towards Unified Expressive Policy Optimization for Robust Robot Learning Xiaocong Li Team 2511.10087 null
2025-11-13 Physics-informed Machine Learning for Static Friction Modeling in Robotic Manipulators Based on Kolmogorov-Arnold Networks Yinghua Liu Team 2511.10079 null
2025-11-13 Efficient Verification and Falsification of ReLU Neural Barrier Certificates Bai Xue Team 2511.10015 null
2025-11-13 Audio-VLA: Adding Contact Audio Perception to Vision-Language-Action Model for Robotic Manipulation Changbo Wang Team 2511.09958 null
2025-11-13 A Study on Enhancing the Generalization Ability of Visuomotor Policies via Data Augmentation Hanwen Wang Team 2511.09932 null
2025-11-13 Harnessing Bounded-Support Evolution Strategies for Policy Refinement Fabio Ramos Team 2511.09923 null
2025-11-13 Evolving Rules: Imitation and Best Response Learning in Cournot Oligopoly Boyu Zhang Team 2511.09839 null
2025-11-13 Provably Safe Stein Variational Clarity-Aware Informative Planning Dimitra Panagou Team 2511.09836 link
2025-11-12 A Robust Task-Level Control Architecture for Learned Dynamical Systems Naira Hovakimyan Team 2511.09790 null
2025-11-12 Out-of-Distribution Generalization with a SPARC: Racing 100 Unseen Vehicles with a Single Policy Peter R. Wurman Team 2511.09737 link
2025-11-12 Baby Sophia: A Developmental Approach to Self-Exploration through Self-Touch and Hand Regard Katerina Pastra Team 2511.09727 null
2025-11-12 SEBA: Sample-Efficient Black-Box Attacks on Visual Reinforcement Learning Haibo Hu Team 2511.09681 null
2025-11-12 Statistically Consistent Approximate Model Predictive Control Melanie N. Zeilinger Team 2511.09661 null
2025-11-12 IFG: Internet-Scale Guidance for Functional Grasping Generation Deepak Pathak Team 2511.09558 link
2025-11-12 SpatialActor: Exploring Disentangled Spatial Representations for Robust Robotic Manipulation Gao Huang Team 2511.09555 link
2025-11-10 Lightning Grasp: High Performance Procedural Grasp Synthesis with Contact Fields Pieter Abbeel Team 2511.07418 link
2025-11-10 Robot Learning from a Physical World Model Yue Wang Team 2511.07416 link
2025-11-10 Enabling Off-Policy Imitation Learning with Deep Actor Critic Stabilization Shalabh Bhatnagar Team 2511.07288 null
2025-11-10 SlotVLA: Towards Modeling of Object-Relation Representations in Robotic Manipulation Ngan Le Team 2511.06754 null
2025-11-10 Physically-Grounded Goal Imagination: Physics-Informed Variational Autoencoder for Self-Supervised Reinforcement Learning Nam Pham Hai Team 2511.06745 null
2025-11-10 Rapidly Learning Soft Robot Control via Implicit Time-Stepping Dezhong Tong Team 2511.06667 link
2025-11-09 Real Garment Benchmark (RGBench): A Comprehensive Benchmark for Robotic Garment Manipulation featuring a High-Fidelity Scalable Simulator Ruigang Yang Team 2511.06434 null
2025-11-09 ExpReS-VLA: Specializing Vision-Language-Action Models Through Experience Replay and Retrieval Jeff Ichnowski Team 2511.06202 null
2025-11-08 Exploring Category-level Articulated Object Pose Tracking on SE(3) Manifolds Jun Liu Team 2511.05996 null
2025-11-08 Gentle Manipulation Policy Learning via Demonstrations from VLM Planned Atomic Skills Renjing Xu Team 2511.05855 null
2025-11-08 VLAD-Grasp: Zero-shot Grasp Detection via Vision-Language Models Aniket Bera Team 2511.05791 null
2025-11-07 VLM-driven Skill Selection for Robotic Assembly Tasks Chang-Hyun Kim Team 2511.05680 null
2025-11-07 EveryDayVLA: A Vision-Language-Action Model for Affordable Robotic Manipulation Samuel Dickerson Team 2511.05397 null
2025-11-07 ETHOS: A Robotic Encountered-Type Haptic Display for Social Interaction in Virtual Reality Matthew K. X. J. Pan Team 2511.05379 null
2025-11-07 Force-Safe Environment Maps and Real-Time Detection for Soft Robot Manipulators Andrew P. Sabelhaus Team 2511.05307 null
2025-11-07 Context-aware Learned Mesh-based Simulation via Trajectory-Level Meta-Learning Gerhard Neumann Team 2511.05234 null
2025-11-07 Let Me Show You: Learning by Retrieving from Egocentric Video for Robotic Manipulation Feifei Feng Team 2511.05199 null
2025-11-07 Follow-Me in Micro-Mobility with End-to-End Imitation Learning Jorge Peña Queralta Team 2511.05158 null
2025-11-07 TAPOM: Task-Space Topology-Guided Motion Planning for Manipulating Elongated Object in Cluttered Environments Yijiang Huang Team 2511.05052 null
2025-11-07 MoE-DP: An MoE-Enhanced Diffusion Policy for Robust Long-Horizon Robotic Manipulation with Skill Decomposition and Failure Recovery Huazhe Xu Team 2511.05007 null
2025-11-06 Isaac Lab: A GPU-Accelerated Simulation Framework for Multi-Modal Robot Learning Gavriel State Team 2511.04831 link
2025-11-06 Unified Multimodal Diffusion Forcing for Forceful Manipulation Dmitry Berenson Team 2511.04812 link
2025-11-06 ReGen: Generative Robot Simulation via Inverse Design Daniela Rus Team 2511.04769 null
2025-11-06 X-Diffusion: Training Diffusion Policies on Cross-Embodiment Human Demonstrations Kushal Kedia Team 2511.04671 null
2025-11-06 Real-to-Sim Robot Policy Evaluation with Gaussian Splatting Simulation of Soft-Body Interactions Yunzhu Li Team 2511.04665 link
2025-11-06 ForeRobo: Unlocking Infinite Simulation Data for 3D Goal-driven Robotic Manipulation Chunsheng Liu Team 2511.04381 null
2025-11-06 GraSP-VLA: Graph-based Symbolic Action Representation for Long-Horizon Planning with VLA Policies Cédric Buche Team 2511.04357 null
2025-11-06 Learning Vision-Driven Reactive Soccer Skills for Humanoid Robots Mingguo Zhao Team 2511.03996 link
2025-11-05 Investigating Robot Control Policy Learning for Autonomous X-ray-guided Spine Procedures Mathias Unberath Team 2511.03882 null
2025-11-05 Going Beyond Expert Performance via Deep Implicit Imitation Reinforcement Learning Georgios Chalkiadakis Team 2511.03616 null
2025-11-05 Imitation Learning in the Deep Learning Era: A Novel Taxonomy and Recent Advances Georgios Chalkiadakis Team 2511.03565 null
2025-11-05 Development of the Bioinspired Tendon-Driven DexHand 021 with Proprioceptive Compliance Control Sheng Yi Team 2511.03481 null
2025-11-05 Learning-based Cooperative Robotic Paper Wrapping: A Unified Control Policy with Residual Force Control Kensuke Harada Team 2511.03181 null
2025-11-05 Learning Natural and Robust Hexapod Locomotion over Complex Terrains via Motion Priors based on Deep Reinforcement Learning Feng Gao Team 2511.03167 null
2025-11-05 ISC-Perception: A Hybrid Computer Vision Dataset for Object Detection in Novel Steel Assembly Debra F. Laefer Team 2511.03098 null
2025-11-04 3D Cal: An Open-Source Software Library for Calibrating Tactile Sensors Gregory Reardon Team 2511.03078 null
2025-11-04 Audience Amplified: Virtual Audiences in Asynchronously Performed AR Theater Tobias Höllerer Team 2511.02807 null
2025-11-04 Dexterous Robotic Piano Playing at Scale Dieter Büchler Team 2511.02504 null
2025-11-04 LACY: A Vision-Language Model-based Language-Action Cycle for Self-Improving Robotic Manipulation Changhyun Choi Team 2511.02239 link
2025-10-31 A Step Toward World Models: A Survey on Robotic Manipulation Heng Tao Shen Team 2511.02097 null
2025-11-03 TRACE: Textual Reasoning for Affordance Coordinate Extraction Matthew S. Brown Team 2511.01999 null
2025-11-01 iFlyBot-VLA Technical Report Jia Pan Team 2511.01914 null
2025-11-03 SE(3)-PoseFlow: Estimating 6D Pose Distributions for Uncertainty-Aware Robotic Manipulation Georgia Chalvatzaki Team 2511.01501 null
2025-11-03 RobustVLA: Robustness-Aware Reinforcement Post-Training for Vision-Language-Action Models Donglin Wang Team 2511.01331 null
2025-11-03 Improving Needle Penetration via Precise Rotational Insertion Using Iterative Learning Control Tsu-Chin Tsao Team 2511.01256 null
2025-11-03 Embodiment Transfer Learning for Vision-Language-Action Models Yaxin Peng Team 2511.01224 null
2025-11-02 Deployable Vision-driven UAV River Navigation via Human-in-the-loop Preference Alignment Nina Mahmoudian Team 2511.01083 null
2025-11-02 GauDP: Reinventing Multi-Agent Collaboration through Gaussian-Image Synergy in Diffusion Policies Ruimao Zhang Team 2511.00998 link
2025-11-01 Improving Robustness to Out-of-Distribution States in Imitation Learning via Deep Koopman-Boosted Diffusion Policy Zhongliang Jiang Team 2511.00555 null
2025-10-31 EgoMI: Learning Active Vision and Whole-Body Manipulation from Egocentric Human Demonstrations Philipp Wu Team 2511.00153 null
2025-10-31 Whole-Body Proprioceptive Morphing: A Modular Soft Gripper for Robust Cross-Scale Grasping Xiaonan Huang Team 2510.27666 null
2025-10-31 Toward Accurate Long-Horizon Robotic Manipulation: Language-to-Action with Foundation Models via Scene Graphs Shinkyu Park Team 2510.27558 null
2025-10-31 When AI Trading Agents Compete: Adverse Selection of Meta-Orders by Reinforcement Learning-Based Market Making Nick Firoozye Team 2510.27334 null
2025-10-31 Learning Generalizable Visuomotor Policy through Dynamics-Alignment Jungwoo Lee Team 2510.27114 null
2025-10-30 Hybrid Consistency Policy: Decoupling Multi-Modal Diversity and Real-Time Efficiency in Robotic Manipulation Qiaojun Yu Team 2510.26670 null
2025-10-31 An Impulse Control Approach to Market Making in a Hawkes LOB Market Philip Treleaven Team 2510.26438 null
2025-10-30 Human-in-the-loop Online Rejection Sampling for Robotic Manipulation Yansong Tang Team 2510.26406 null
2025-10-30 Beyond Imitation: Constraint-Aware Trajectory Generation with Flow Matching For End-to-End Autonomous Driving Yandan Luo Team 2510.26292 null
2025-10-30 Learning to Manage Investment Portfolios beyond Simple Utility Functions J. Doyne Farmer Team 2510.26165 null
2025-10-28 A Humanoid Visual-Tactile-Action Dataset for Contact-Rich Manipulation Kyung-Joong Kim Team 2510.25725 null
2025-10-29 Sim-to-Real Gentle Manipulation of Deformable and Fragile Objects with Stress-Guided Reinforcement Learning Florian T. Pokorny Team 2510.25405 null
2025-10-29 SynHLMA: Synthesizing Hand Language Manipulation for Articulated Object with Discrete Human Object Interaction Representation Dan Guo Team 2510.25268 null
2025-10-29 Time-Optimal Transport of Loosely Placed Liquid Filled Cups along Prescribed Paths Andreas Mueller Team 2510.25255 null
2025-10-29 Hybrid Vision Servoing with Deep Alignment and GRU-Based Occlusion Recovery Jongseong Brad Choi Team 2510.25233 null
2025-10-29 Learning Spatial-Aware Manipulation Ordering Jian Pu Team 2510.25138 null
2025-10-29 NanoVLA: Routing Decoupled Vision-Language Understanding for Nano-sized Generalist Robotic Policies Jinghui Lu Team 2510.25122 null
2025-10-28 Fare: Failure Resilience in Learned Visual Navigation Control David Hsu Team 2510.24680 null
2025-10-28 Advancing site-specific disease and pest management in precision agriculture: From reasoning-driven foundation models to adaptive, feedback-based learning Arnold W. Schumann Team 2510.24650 null
2025-10-28 DynaRend: Learning 3D Dynamics via Masked Future Rendering for Robotic Manipulation Gang Hua Team 2510.24261 null
2025-10-28 Manipulate as Human: Learning Task-oriented Manipulation Skills by Adversarial Motion Priors Yue Gao Team 2510.24257 null
2025-10-28 Blindfolded Experts Generalize Better: Insights from Robotic Manipulation and Videogames Aviv Tamar Team 2510.24194 null
2025-10-28 PFEA: An LLM-based High-Level Natural Language Planning and Feedback Embodied Agent for Human-Centered AI Philip Dames Team 2510.24109 null
2025-10-28 ZTRS: Zero-Imitation End-to-end Autonomous Driving with Trajectory Scoring Jose M. Alvarez Team 2510.24108 null
2025-10-28 Learning Parameterized Skills from Demonstrations George Konidaris Team 2510.24095 null
2025-10-28 Language-Conditioned Representations and Mixture-of-Experts Policy for Robust Multi-Task Robotic Manipulation Jiashuo Bai Team 2510.24055 null
2025-10-27 Adaptive Keyframe Selection for Scalable 3D Scene Reconstruction in Dynamic Environments Giuseppe Loianno Team 2510.23928 null
2025-10-29 RoboOmni: Proactive Robot Manipulation in Omni-modal Context Xipeng Qiu Team 2510.23763 null
2025-10-27 RobotArena $\infty$: Scalable Robot Benchmarking via Real-to-Sim Translation Katerina Fragkiadaki Team 2510.23571 link
2025-10-27 Optimal Dimensioning of Elastic-Link Manipulators regarding Lifetime Estimation Andreas Mueller Team 2510.23234 null
2025-10-27 Workspace Registration and Collision Detection for Industrial Robotics Applications Andreas Mueller Team 2510.23227 null
2025-10-27 Finding 3D Scene Analogies with Multimodal Foundation Models Young Min Kim Team 2510.23184 null
2025-10-27 ManiDP: Manipulability-Aware Diffusion Policy for Posture-Dependent Bimanual Manipulation Fei Chen Team 2510.23016 null
2025-10-26 Learning Neural Observer-Predictor Models for Limb-level Sampling-based Locomotion Planning Guoquan Huang Team 2510.22789 null
2025-10-26 Edge Collaborative Gaussian Splatting with Integrated Rendering and Communication Chengzhong Xu Team 2510.22718 null
2025-10-26 FastVLM: Self-Speculative Decoding for Fast Vision-Language Model Inference Manjesh Kumar Hanawal Team 2510.22641 null
2025-10-25 A Novel Multi-Timescale Stability-Preserving Hierarchical Reinforcement Learning Controller Framework for Adaptive Control in High-Dimensional Dynamical Systems Benyamin Safizadeh Team 2510.22420 null
2025-10-25 ACG: Action Coherence Guidance for Flow-based VLA models Jaegul Choo Team 2510.22201 null
2025-10-25 RaycastGrasp: Eye-Gaze Interaction with Wearable Devices for Robotic Manipulation Yang Ye Team 2510.22113 null
2025-10-24 Two-Steps Diffusion Policy for Robotic Manipulation via Genetic Denoising Yinchuan Li Team 2510.21991 null
2025-10-27 On Uncertainty Calibration for Equivariant Functions Robin Walters Team 2510.21691 link
2025-10-24 Enhancing Tactile-based Reinforcement Learning for Robotic Control Sethu Vijayakumar Team 2510.21609 null
2025-10-24 Scalable Vision-Language-Action Model Pretraining for Robotic Manipulation with Real-Life Human Activity Videos Baining Guo Team 2510.21571 link
2025-10-24 Learning Neural Control Barrier Functions from Expert Demonstrations using Inverse Constraint Learning Hussein Sibai Team 2510.21560 null
2025-10-24 Generalizable Hierarchical Skill Learning via Object-Centric Representation Robert Platt Team 2510.21121 null
2025-10-23 BioDet: Boosting Industrial Object Detection with Image Preprocessing Strategies Benjamin Busam Team 2510.21000 null
2025-10-23 SutureBot: A Precision Framework & Benchmark For Autonomous End-to-End Suturing Axel Krieger Team 2510.20965 null
2025-10-23 GSWorld: Closed-Loop Photo-Realistic Simulation Suite for Robotic Manipulation Xiaolong Wang Team 2510.20813 null
2025-10-23 FieldGen: From Teleoperated Pre-Manipulation Trajectories to Field-Guided Data Generation Yao Mu Team 2510.20774 link
2025-10-23 A Parameter-Linear Formulation of the Optimal Path Following Problem for Robotic Manipulator Andreas Mueller Team 2510.20496 null
2025-10-23 Dual Control Reference Generation for Optimal Pick-and-Place Execution under Payload Uncertainty Tom Lefebvre Team 2510.20483 null
2025-10-23 PointMapPolicy: Structured Point Cloud Processing for Multi-Modal Imitation Learning Gerhard Neumann Team 2510.20406 null
2025-10-23 NeuralTouch: Neural Descriptors for Precise Sim-to-Real Tactile Robot Control Nathan F. Lepora Team 2510.20390 null
2025-10-23 MemER: Scaling Up Memory for Robot Control via Experience Retrieval Chelsea Finn Team 2510.20328 link
2025-10-22 Approximate Model Predictive Control for Microgrid Energy Management via Imitation Learning Bart De Schutter Team 2510.20040 null
2025-10-22 Seed3D 1.0: From Images to High-Fidelity Simulation-Ready 3D Assets Xuanmeng Zhang Team 2510.19944 link
2025-10-25 Using Non-Expert Data to Robustify Imitation Learning via Offline Reinforcement Learning Abhishek Gupta Team 2510.19495 null
2025-10-22 Seeing Across Views: Benchmarking Spatial Reasoning of Vision-Language Models in Robotic Scenes Baining Guo Team 2510.19400 link
2025-10-22 Using Temperature Sampling to Effectively Train Robot Learning Policies on Imbalanced Datasets Bernadette Bucher Team 2510.19373 null
2025-10-22 Imitation Learning Policy based on Multi-Step Consistent Integration Shortcut Model Jie Zhao Team 2510.19356 null
2025-10-22 Unified Reinforcement and Imitation Learning for Vision-Language Models Yueh-Hua Wu Team 2510.19307 link
2025-10-22 TARMAC: A Taxonomy for Robot Manipulation in Chemistry Jihong Zhu Team 2510.19289 null
2025-10-21 A Cross-Environment and Cross-Embodiment Path Planning Framework via a Conditional Diffusion Model Homayoun Najjaran Team 2510.19128 null
2025-10-21 Efficient Model-Based Reinforcement Learning for Robot Control via Online Learning Marco Hutter Team 2510.18518 null
2025-10-23 MoTVLA: A Vision-Language-Action Model with Unified Fast-Slow Reasoning Heng Yang Team 2510.18337 null
2025-10-21 MoMaGen: Generating Demonstrations under Soft and Hard Constraints for Multi-Step Bimanual Mobile Manipulation Li Fei-Fei Team 2510.18316 null
2025-10-20 Quality Over Quantity: Curating Contact-Based Robot Datasets Improves Learning Ian Abraham Team 2510.18137 null
2025-10-20 R2BC: Multi-Agent Imitation Learning from Single-Agent Demonstrations Daniel S. Brown Team 2510.18085 null
2025-10-20 SPACeR: Self-Play Anchoring with Centralized Reference Models Wei Zhan Team 2510.18060 link
2025-10-20 RESample: A Robust Data Augmentation Framework via Exploratory Sampling for Robotic Manipulation Ziwei Wang Team 2510.17640 null
2025-10-20 Learned Inertial Odometry for Cycling Based on Mixture of Experts Algorithm Xiaoji Niu Team 2510.17604 null
2025-10-20 Plasma Shape Control via Zero-shot Generative Reinforcement Learning Wulyu Zhong Team 2510.17531 null
2025-10-20 A Generalization of Input-Output Linearization via Dynamic Switching Between Melds of Output Functions Antonio Franchi Team 2510.17448 null
2025-10-22 OmniVIC: A Self-Improving Variable Impedance Controller with Vision-Language In-Context Learning for Safe Robotic Manipulation Arash Ajoudani Team 2510.17150 link
2025-10-20 Decentralized Real-Time Planning for Multi-UAV Cooperative Manipulation via Imitation Learning Sihao Sun Team 2510.17143 null
2025-10-20 Learning to Design Soft Hands using Reward Models Sha Yi Team 2510.17086 null
2025-10-19 End-to-end Listen, Look, Speak and Act Chao Zhang Team 2510.16756 null
2025-10-18 MoS-VLA: A Vision-Language-Action Model with One-Shot Skill Adaptation Ufuk Topcu Team 2510.16617 null
2025-10-18 Buzz, Choose, Forget: A Meta-Bandit Framework for Bee-Like Decision Making Jean-Michel Loubes Team 2510.16462 null
2025-10-18 Learning to Optimize Edge Robotics: A Fast Integrated Perception-Motion-Communication Approach Chengzhong Xu Team 2510.16424 null
2025-10-17 DeGrip: A Compact Cable-driven Robotic Gripper for Desktop Disassembly Minghui Zheng Team 2510.16231 null
2025-10-17 DexCanvas: Bridging Human Demonstrations and Robot Learning for Dexterous Manipulation Yiwen Lu Team 2510.15786 null
2025-10-22 VO-DP: Semantic-Geometric Adaptive Diffusion Policy for Vision-Only Robotic Manipulation Bin He Team 2510.15530 null
2025-10-17 Exploring Conditions for Diffusion models in Robotic Control Taekyung Kim Team 2510.15510 link
2025-10-17 Perfect Prediction or Plenty of Proposals? What Matters Most in Planning for Autonomous Driving Joschka Boedecker Team 2510.15505 null
2025-10-17 Learning to Answer from Correct Demonstrations Nathan Srebro Team 2510.15464 null
2025-10-17 GaussGym: An open-source real-to-sim framework for learning locomotion from pixels Pieter Abbeel Team 2510.15352 null
2025-10-16 RM-RL: Role-Model Reinforcement Learning for Precise Robot Manipulation Jianfei Yang Team 2510.15189 null
2025-10-18 VT-Refine: Learning Bimanual Assembly with Visuo-Tactile Feedback via Simulation Fine-Tuning Yunzhu Li Team 2510.14930 link
2025-10-16 SADCHER: Scheduling using Attention-based Dynamic Coalitions of Heterogeneous Robots in Real-Time Javier Alonso-Mora Team 2510.14851 link
2025-10-16 RL-100: Performant Robotic Manipulation with Real-World Reinforcement Learning Huazhe Xu Team 2510.14830 link
2025-10-16 Open TeleDex: A Hardware-Agnostic Teleoperation System for Imitation Learning based Dexterous Manipulation Shan An Team 2510.14771 null
2025-10-16 Accelerated Multi-Modal Motion Planning Using Context-Conditioned Diffusion Models Wilm Decré Team 2510.14615 null
2025-10-16 Restoring Noisy Demonstration for Imitation Learning With Diffusion Models Shao-Hua Sun Team 2510.14467 null
2025-10-16 Expertise need not monopolize: Action-Specialized Mixture of Experts for Vision-Language-Action Learning Yao Mu Team 2510.14300 null
2025-10-15 ViTacGen: Robotic Pushing with Vision-to-Touch Generation Shan Luo Team 2510.14117 null
2025-10-15 Optimistic Reinforcement Learning-Based Skill Insertions for Task and Motion Planning Bram Vanderborght Team 2510.14065 null
2025-10-17 CausalVerse: Benchmarking Causal Representation Learning with Configurable High-Fidelity Simulations Kun Zhang Team 2510.14049 null
2025-10-15 LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models Xipeng Qiu Team 2510.13626 null
2025-10-15 Efficient Force and Stiffness Prediction in Robotic Produce Handling with a Piezoresistive Pressure Sensor Xiaobo Tan Team 2510.13616 link
2025-10-15 Active Tactile Exploration for Rigid Body Pose and Shape Estimation Michael Posa Team 2510.13595 null
2025-10-15 Tactile-Conditioned Diffusion Policy for Force-Aware Robotic Manipulation Jan Peters Team 2510.13324 null
2025-10-15 Model-agnostic Adversarial Attack and Defense for Vision-Language-Action Models Jingfeng Zhang Team 2510.13237 null
2025-10-15 Beyond Static LLM Policies: Imitation-Enhanced Reinforcement Learning for Recommendation Sen Wang Team 2510.13229 null
2025-10-15 VLA-0: Building State-of-the-Art VLAs with Zero Modification Fabio Ramos Team 2510.13054 null
2025-10-14 Development of a Linear Guide-Rail Testbed for Physically Emulating ISAM Operations Christopher Petersen Team 2510.13005 null
2025-10-14 Actron3D: Learning Actionable Neural Functions from Videos for Transferable Robotic Manipulation Stefan Leutenegger Team 2510.12971 null
2025-10-14 Learning to Grasp Anything by Playing with Random Toys Roei Herzig Team 2510.12866 null
2025-10-14 CoIRL-AD: Collaborative-Competitive Imitation-Reinforcement Learning in Latent World Models for Autonomous Driving Jiangtao Gong Team 2510.12560 null
2025-10-14 Automated Behavior Planning for Fruit Tree Pruning via Redundant Robot Manipulators: Addressing the Behavior Planning Challenge Bram Vanderborght Team 2510.12509 null
2025-10-14 Fast Visuomotor Policy for Robotic Manipulation Wenqiang Zhang Team 2510.12483 null
2025-10-14 Robot Learning: A Tutorial Michel Aractingi Team 2510.12403 null
2025-10-14 Improving Generative Behavior Cloning via Self-Guidance and Adaptive Chunking Eunhyeok Park Team 2510.12392 null
2025-10-14 Learning Social Navigation from Positive and Negative Demonstrations and Rule-Based Specifications Sungjoon Choi Team 2510.12215 link
2025-10-13 Phys2Real: Fusing VLM Priors with Interactive Online Adaptation for Uncertainty-Aware Sim-to-Real Manipulation Mac Schwager Team 2510.11689 null
2025-10-14 ManiAgent: An Agentic Framework for General Robotic Manipulation Xudong Liu Team 2510.11660 null
2025-10-13 HiMaCon: Discovering Hierarchical Manipulation Concepts from Unlabeled Multi-Modal Data Yanchao Yang Team 2510.11321 null
2025-10-13 FOSSIL: Harnessing Feedback on Suboptimal Samples for Data-Efficient Generalisation with Imitation Learning for Embodied Vision-and-Language Tasks Alessandro Suglia Team 2510.11307 null
2025-10-13 DemoHLM: From One Demonstration to Generalizable Humanoid Loco-Manipulation Zongqing Lu Team 2510.11258 null
2025-10-13 Flow Matching-Based Autonomous Driving Planning with Advanced Interactive Behavior Modeling Jingjing Liu Team 2510.11083 null
2025-10-13 Towards a Unified Understanding of Robot Manipulation: A Comprehensive Survey Badong Chen Team 2510.10903 null
2025-10-12 High-Fidelity Simulated Data Generation for Real-World Zero-Shot Robotic Manipulation Learning with Gaussian Splatting Hua Zou Team 2510.10637 null
2025-10-12 Population-Coded Spiking Neural Networks for High-Dimensional Robotic Control Jeethu Sreenivas Amuthan Team 2510.10516 null
2025-10-12 Data-driven simulator of multi-animal behavior with unknown dynamics via offline and online reinforcement learning Yoshinobu Kawahara Team 2510.10451 null
2025-10-11 X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model Xianyuan Zhan Team 2510.10274 null
2025-10-11 A3RNN: Bi-directional Fusion of Bottom-up and Top-down Process for Developmental Visual Attention in Robots Tetsuya Ogata Team 2510.10221 null
2025-10-11 UF-RNN: Real-Time Adaptive Motion Generation Using Uncertainty-Driven Foresight Prediction Tetsuya Ogata Team 2510.10217 null
2025-10-15 Ctrl-World: A Controllable Generative World Model for Robot Manipulation Chelsea Finn Team 2510.10125 null
2025-10-10 VITA-VLA: Efficiently Teaching Vision-Language Models to Act via Action Expert Distillation Caifeng Shan Team 2510.09607 link
2025-10-13 Guiding Energy-Efficient Locomotion through Impact Mitigation Rewards Alireza Ramezani Team 2510.09543 null
2025-10-10 Autonomous Soft Robotic Guidewire Navigation via Imitation Learning Axel Krieger Team 2510.09497 null
2025-10-13 Near-Optimal Second-Order Guarantees for Model-Based Adversarial Imitation Learning Weitong Zhang Team 2510.09487 null
2025-10-13 Failure Prediction at Runtime for Generative Robot Policies Angela P. Schoellig Team 2510.09459 link
2025-10-10 Rate optimal learning of equilibria from data Giorgia Ramponi Team 2510.09325 null
2025-10-10 Glovity: Learning Dexterous Contact-Rich Manipulation via Spatial Wrench Feedback Teleoperation System Pai Zheng Team 2510.09229 null
2025-10-10 FM-IRL: Flow-Matching for Reward Modeling and Policy Regularization in Reinforcement Learning Ivor Tsang Team 2510.09222 null
2025-10-10 When a Robot is More Capable than a Human: Learning from Constrained Demonstrators Erdem Bıyık Team 2510.09096 null
2025-10-10 iMoWM: Taming Interactive Multi-Modal World Model for Robotic Manipulation Ziwei Wang Team 2510.09036 null
2025-10-09 Humanoid Everyday: A Comprehensive Robotic Dataset for Open-World Humanoid Manipulation Yue Wang Team 2510.08807 null
2025-10-09 Geometry-aware Policy Imitation Sylvain Calinon Team 2510.08787 null
2025-10-09 Point and Go: Intuitive Reference Frame Reallocation in Mode Switching for Assistive Robotics M. Jagersand Team 2510.08753 null
2025-10-09 Agent Learning via Early Experience Yifan Wu Team 2510.08558 null
2025-10-09 R2RGEN: Real-to-Real 3D Data Generation for Spatially Generalized Manipulation Jiwen Lu Team 2510.08547 link
2025-10-09 Unlocking 3D Affordance Segmentation with 2D Semantic Knowledge Wei Shen Team 2510.08316 null
2025-10-09 FastUMI-100K: Advancing Data-driven Robotic Manipulation with a Large-scale UMI-style Dataset Xuelong Li Team 2510.08022 null
2025-10-09 DM1: MeanFlow with Dispersive Regularization for 1-Step Robotic Manipulation Weibing Li Team 2510.07865 link
2025-10-09 Trajectory Conditioned Cross-embodiment Skill Transfer Bin Zhao Team 2510.07773 null
2025-10-11 Differentiable Particle Optimization for Fast Sequential Manipulation Zachary Kingston Team 2510.07674 null
2025-10-08 WristWorld: Generating Wrist-Views via 4D World Models for Robotic Manipulation Shanghang Zhang Team 2510.07313 null
2025-10-09 TIGeR: Tool-Integrated Geometric Reasoning in Vision-Language Models for Robotics Shanghang Zhang Team 2510.07181 null
2025-10-08 DecompGAIL: Learning Realistic Traffic Behaviors with Decomposed Multi-Agent Generative Adversarial Imitation Learning Chen Lv Team 2510.06913 null
2025-10-07 Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels Weiran Yao Team 2510.06499 null
2025-10-07 EmbodiedCoder: Parameterized Embodied Mobile Manipulation via Modern Coding Model Zhaoxiang Zhang Team 2510.06207 link
2025-10-07 Differentiable Model Predictive Control on the GPU Thomas Lew Team 2510.06179 null
2025-10-07 Towards Autonomous Tape Handling for Robotic Wound Redressing Michael Yip Team 2510.06127 null
2025-10-07 Learning to Crawl: Latent Model-Based Reinforcement Learning for Soft Robotic Adaptive Locomotion Robin Chhabra Team 2510.05957 null
2025-10-07 VCoT-Grasp: Grasp Foundation Models with Visual Chain-of-Thought Reasoning for Language-driven Grasp Generation Badong Chen Team 2510.05827 null
2025-10-07 DeLTa: Demonstration and Language-Guided Novel Transparent Object Manipulation Kuk-Jin Yoon Team 2510.05662 link
2025-10-07 Teaching Machines to Speak Using Articulatory Control Gopala Anumanchipalli Team 2510.05619 null
2025-10-07 Correlation-Aware Dual-View Pose and Velocity Estimation for Dynamic Robotic Manipulation Farrokh Janabi-Sharifi Team 2510.05536 null
2025-10-06 VER: Vision Expert Transformer for Robot Learning via Foundation Distillation and Dynamic Routing Masayoshi Tomizuka Team 2510.05213 null
2025-10-06 Curiosity-Driven Co-Development of Action and Language in Robots Through Self-Exploration Jun Tani Team 2510.05013 null
2025-10-06 Hands-Free Heritage: Automated 3D Scanning for Cultural Heritage Digitization Arianna Traviglia Team 2510.04781 null
2025-10-06 MobRT: A Digital Twin-Based Framework for Scalable Learning in Mobile Manipulation Wenjie Song Team 2510.04592 null
2025-10-05 Reliable and Scalable Robot Policy Evaluation with Imperfect Simulators Anirudha Majumdar Team 2510.04354 null
2025-10-05 RAP: 3D Rasterization Augmented End-to-End Planning Alexandre Alahi Team 2510.04333 null
2025-10-04 NoTVLA: Narrowing of Dense Action Trajectories for Generalizable Robot Manipulation Chunhua Shen Team 2510.03895 null
2025-10-04 EmbodiSwap for Zero-Shot Robot Imitation Learning Yiannis Aloimonos Team 2510.03706 link
2025-10-04 Dissecting Larval Zebrafish Hunting using Deep Reinforcement Learning Trained RNN Agents Kanaka Rajan Team 2510.03699 null
2025-10-04 Learning to Act Through Contact: A Unified View of Multi-Task Robot Learning Majid Khadiv Team 2510.03599 null
2025-10-03 Warm-Starting Optimization-Based Motion Planning for Robotic Manipulators via Point Cloud-Conditioned Flow Matching Xiao Liang Team 2510.03460 null
2025-10-03 Mask2IV: Interaction-Centric Video Generation via Mask Trajectories Laura Sevilla-Lara Team 2510.03135 link
2025-10-03 Learning Stability Certificate for Robotics in Real-World Environments Zhe Shen Team 2510.03123 null
2025-10-06 Distributional Inverse Reinforcement Learning Anqi Wu Team 2510.03013 null
2025-10-03 Action Deviation-Aware Inference for Low-Latency Wireless Robots Seong-Lyun Kim Team 2510.02851 null
2025-10-03 Flow with the Force Field: Learning 3D Compliant Flow Matching Policies from Force and Demonstration-Guided Simulation Data Nadia Figueroa Team 2510.02738 null
2025-10-02 A Recipe for Efficient Sim-to-Real Transfer in Manipulation with Online Imitation-Pretrained World Models Hao Su Team 2510.02538 null
2025-10-02 U-LAG: Uncertainty-Aware, Lag-Adaptive Goal Retargeting for Robotic Manipulation Anujith Muraleedharan Team 2510.02526 null
2025-10-02 Beyond Imitation: Recovering Dense Rewards from Demonstrations Gholamreza Haffari Team 2510.02493 null
2025-10-02 ARMADA: Autonomous Online Failure Detection and Human Shared Control Empower Scalable Real-world Deployment and Adaptation Cewu Lu Team 2510.02298 null
2025-10-02 Do You Know Where Your Camera Is? View-Invariant Policy Learning with Camera Conditioning Matthew R. Walter Team 2510.02268 null
2025-10-02 GRACE: A Language Model Framework for Explainable Inverse Reinforcement Learning Bogdan Mazoure Team 2510.02180 null
2025-10-02 Fine-Tuning Flow Matching via Maximum Likelihood Estimation of Reconstructions Shihua Li Team 2510.02081 null
2025-10-02 Contrastive Representation Regularization for Vision-Language-Action Models Jinwoo Shin Team 2510.01711 null
2025-10-02 Symskill: Symbol and Skill Co-Invention for Data-Efficient and Real-Time Long-Horizon Manipulation Nadia Figueroa Team 2510.01661 link
2025-10-02 FailSafe: Reasoning and Recovery from Failures in Vision-Language-Action Models Bihan Wen Team 2510.01642 link
2025-10-02 MIMIC: Integrating Diverse Personality Traits for Better Game Testing Using Large Language Model Lili Wei Team 2510.01635 null
2025-10-02 ActiveUMI: Robotic Manipulation with Active Perception from Robot-Free Human Demonstrations Yi Xu Team 2510.01607 link
2025-10-02 MiniBEE: A New Form Factor for Compact Bimanual Dexterity Matei Ciocarlie Team 2510.01603 null
2025-10-02 Predictive Preference Learning from Human Interventions Bolei Zhou Team 2510.01545 link
2025-10-02 Information Seeking for Robust Decision Making under Partial Observability Tsung-Wei Ke Team 2510.01531 link
2025-10-01 Online Hierarchical Policy Learning using Physics Priors for Robot Navigation in Unknown Environments Ahmed H. Qureshi Team 2510.01519 null
2025-10-01 Density-Ratio Weighted Behavioral Cloning: Learning Control Policies from Corrupted Datasets Ali Baheri Team 2510.01479 null
2025-10-01 AFFORD2ACT: Affordance-Guided Automatic Keypoint Selection for Generalizable and Lightweight Robotic Manipulation Pratap Tokekar Team 2510.01433 null
2025-10-01 How Well do Diffusion Policies Learn Kinematic Constraint Manifolds? Russ Tedrake Team 2510.01404 link
2025-10-01 Temporal Score Rescaling for Temperature Sampling in Diffusion and Flow Models Shubham Tulsiani Team 2510.01184 null
2025-10-01 Prometheus: Universal, Open-Source Mocap-Based Teleoperation System with Force Feedback for Dataset Collection in Robot Learning D. Tsetserukou Team 2510.01023 null
2025-10-01 On Discovering Algorithms for Adversarial Imitation Learning Pradeep Varakantham Team 2510.00922 null
2025-10-01 TubeDAgger: Reducing the Number of Expert Interventions with Stochastic Reach-Tubes Sophie A. Neubauer Team 2510.00906 null
2025-09-30 MLA: A Multisensory Language-Action Model for Multimodal Understanding and Forecasting in Robotic Manipulation Shanghang Zhang Team 2509.26642 null
2025-09-30 Learning from Hallucinating Critical Points for Navigation in Dynamic Environments Xuesu Xiao Team 2509.26513 null
2025-09-30 Anomaly detection for generic failure monitoring in robotic assembly, screwing and manipulation Kevin Haninger Team 2509.26308 null
2025-09-30 Noise-Guided Transport for Imitation Learning Alexandros Kalousis Team 2509.26294 null
2025-09-30 Reinforced Embodied Planning with Verifiable Reward for Real-World Robotic Manipulation Hao Chen Team 2509.25852 null
2025-10-01 Act to See, See to Act: Diffusion-Driven Perception-Action Interplay for Adaptive Policies Li Cheng Team 2509.25822 null
2025-09-30 Point-It-Out: Benchmarking Embodied Reasoning for Vision Language Models in Multi-Stage Visual Grounding Jiaojiao Fan Team 2509.25794 null
2025-09-30 SAC Flow: Sample-Efficient Reinforcement Learning of Flow-Based Policies via Velocity-Reparameterized Sequential Modeling Wenbo Ding Team 2509.25756 null
2025-09-30 Best of Sim and Real: Decoupled Visuomotor Manipulation via Learning Control in Simulation and Perception in Real Yang Gao Team 2509.25747 null
2025-09-29 Boolean Satisfiability via Imitation Learning Xiangyu Xu Team 2509.25411 null
2025-09-29 Parallel Heuristic Search as Inference for Actor-Critic Reinforcement Learning Models Maxim Likhachev Team 2509.25402 null
2025-09-29 SARM: Stage-Aware Reward Modeling for Long Horizon Robot Manipulation Philipp Wu Team 2509.25358 null
2025-09-29 SRMP: Search-Based Robot Motion Planning Library Maxim Likhachev Team 2509.25352 null
2025-10-01 Curriculum Imitation Learning of Distributed Multi-Robot Policies Eduardo Montijano Team 2509.25097 null
2025-09-29 Annotation-Free One-Shot Imitation Learning for Multi-Step Manipulation Tasks Ruchi Choudhary Team 2509.24972 null
2025-09-29 MSG: Multi-Stream Generative Policies for Sample-Efficient Robotic Manipulation Abhinav Valada Team 2509.24956 null
2025-09-29 World-Env: Leveraging World Model as a Virtual Environment for VLA Post-Training Qing Zhang Team 2509.24948 null
2025-09-29 From Code to Action: Hierarchical Learning of Diffusion-VLM Policies Daniel Dijkman Team 2509.24917 null
2025-09-29 Quantifying Generalisation in Imitation Learning Odinaldo Rodrigues Team 2509.24784 null
2025-09-29 IA-VLA: Input Augmentation for Vision-Language-Action models in settings with semantically complex tasks Ville Kyrki Team 2509.24768 null
2025-09-29 Stabilizing Humanoid Robot Trajectory Generation via Physics-Informed Learning and Control-Informed Steering Daniele Pucci Team 2509.24697 null
2025-09-29 CEDex: Cross-Embodiment Dexterous Grasp Generation at Scale from Human-like Contact Representations Shan Luo Team 2509.24661 null
2025-09-29 U-DiT Policy: U-shaped Diffusion Transformers for Robotic Manipulation Zhongxue Gan Team 2509.24579 null
2025-09-29 Unlocking the Potential of Soft Actor-Critic for Imitation Learning Frank Kirchner Team 2509.24539 null
2025-09-29 Learning to Sample: Reinforcement Learning-Guided Sampling for Autonomous Vehicle Motion Planning Johannes Betz Team 2509.24313 null
2025-09-29 FreeAction: Training-Free Techniques for Enhanced Fidelity of Trajectory-to-Video Generation Minsu Cho Team 2509.24241 null
2025-09-29 ViReSkill: Vision-Grounded Replanning with Skill Memory for LLM-Based Planning in Lifelong Robot Learning Yang You Team 2509.24219 null
2025-09-29 Preference-Based Long-Horizon Robotic Stacking with Multimodal Large Language Models Sethu Vijayakumar Team 2509.24163 null
2025-09-29 Memory Transfer Planning: LLM-driven Context-Aware Code Adaptation for Robot Manipulation Yang You Team 2509.24160 null
2025-09-28 Mash, Spread, Slice! Learning to Manipulate Object States via Visual Spatial Progress Kristen Grauman Team 2509.24129 null
2025-09-28 DexFlyWheel: A Scalable and Self-improving Data Generation Framework for Dexterous Manipulation Yuanpei Chen Team 2509.23829 null
2025-09-28 Control Your Robot: A Unified System for Robot Control and Policy Deployment Bingshan Hu Team 2509.23823 link
2025-09-30 Sequence Pathfinder for Multi-Agent Pickup and Delivery in the Warehouse Ying Wen Team 2509.23778 null
2025-09-26 Pixel Motion Diffusion is What We Need for Robot Control Michael S. Ryoo Team 2509.22652 null
2025-09-26 VLA-Reasoner: Empowering Vision-Language-Action Models with Reasoning via Online Monte Carlo Tree Search Ziwei Wang Team 2509.22643 null
2025-09-26 Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning Xing Sun Team 2509.22601 null
2025-09-26 EgoDemoGen: Novel Egocentric Demonstration Generation Enables Viewpoint-Robust Manipulation Liang Wang Team 2509.22578 null
2025-09-26 Learning to Ball: Composing Policies for Long-Horizon Basketball Moves C. Karen Liu Team 2509.22442 link
2025-09-26 EMMA: Generalizing Real-World Robot Manipulation via Generative Visual Transfer Guan Huang Team 2509.22407 null
2025-09-26 ReLAM: Learning Anticipation Model for Rewarding Visual Robotic Manipulation Yang Yu Team 2509.22402 null
2025-09-26 RoboView-Bias: Benchmarking Visual Bias in Embodied Agents for Robotic Manipulation Shuchao Pang Team 2509.22356 null
2025-09-26 DemoGrasp: Universal Dexterous Grasping from a Single Demonstration Zongqing Lu Team 2509.22149 null
2025-09-26 Action-aware Dynamic Pruning for Efficient Vision-Language-Action Manipulation Chang Xu Team 2509.22093 null
2025-09-26 Teaching Transformers to Solve Combinatorial Problems through Efficient Trial & Error Christos Tzamos Team 2509.22023 null
2025-09-26 WAVE: Worm Gear-based Adaptive Variable Elasticity for Decoupling Actuators from External Forces Kazutoshi Tanaka Team 2509.21878 null
2025-09-26 Learning Multi-Skill Legged Locomotion Using Conditional Adversarial Motion Priors Qinchuan Li Team 2509.21810 null
2025-09-26 The Turkish Ice Cream Robot: Examining Playful Deception in Social Human-Robot Interactions Matthew Pan Team 2509.21776 link
2025-09-25 Generating Stable Placements via Physics-guided Diffusion Models Jonathan Kelly Team 2509.21664 null
2025-09-25 Inverse Reinforcement Learning Using Just Classification and a Few Regressions Aurélien Bibaut Team 2509.21172 null
2025-09-25 ImaginationPolicy: Towards Generalizable, Precise and Reliable End-to-End Policy for Robotic Manipulation Kui Jia Team 2509.20841 link
2025-09-25 Joint Flow Trajectory Optimization For Feasible Robot Motion Generation from Video Demonstrations Weiming Zhi Team 2509.20703 null
2025-09-24 Large Pre-Trained Models for Bimanual Manipulation in 3D David Meger Team 2509.20579 null
2025-09-24 Selective Progress-Aware Querying for Human-in-the-Loop Reinforcement Learning Anamika J H Team 2509.20541 null
2025-09-26 mindmap: Spatial Memory in Deep Feature Maps for 3D Action Policies Shiwei Sheng Team 2509.20297 null
2025-09-24 Discrete Diffusion for Reflective Vision-Language-Action Models in Autonomous Driving Xianpeng Lang Team 2509.20109 null
2025-09-24 LLM Trainer: Automated Robotic Data Generating via Demonstration Augmentation using LLMs Amir Barati Farimani Team 2509.20070 null
2025-09-25 Generalist Robot Manipulation beyond Action Labeled Data Danda Pani Paudel Team 2509.19958 null
2025-09-24 SAGE: State-Aware Guided End-to-End Policy for Multi-Stage Sequential Tasks via Hidden Markov Decision Process JingYuan Wang Team 2509.19853 null
2025-09-24 TopoCut: Learning Multi-Step Cutting with Spectral Rewards and Discrete Diffusion Policies Animesh Garg Team 2509.19712 null
2025-09-24 RoboSSM: Scalable In-context Imitation Learning via State-Space Models Peter Stone Team 2509.19658 null
2025-09-23 EgoBridge: Domain Adaptation for Generalizable Imitation from Egocentric Human Data Danfei Xu Team 2509.19626 null
2025-09-23 From Space to Time: Enabling Adaptive Safety with Learned Value Functions via Disturbance Recasting Sylvia L. Herbert Team 2509.19597 null
2025-09-23 Agentic Scene Policies: Unifying Space, Semantics, and Affordances for Robot Action Liam Paull Team 2509.19571 link
2025-09-23 Score the Steps, Not Just the Goal: VLM-Based Subgoal Evaluation for Robotic Manipulation Chi-Guhn Lee Team 2509.19524 null
2025-09-23 Self-evolved Imitation Learning in Simulated World Zhihe Lu Team 2509.19460 null
2025-09-23 ROPA: Synthetic Robot Pose Generation for RGB-D Bimanual Data Augmentation Daniel Seita Team 2509.19454 null
2025-09-23 SOE: Sample-Efficient Robot Policy Self-Improvement via On-Manifold Exploration Cewu Lu Team 2509.19292 null
2025-09-23 Imitation-Guided Bimanual Planning for Stable Manipulation under Changing External Forces Arash Ajoudani Team 2509.19261 null
2025-09-23 FUNCanon: Learning Pose-Aware Action Primitives via Functional Object Canonicalization for Generalizable Robotic Manipulation Jianwei Zhang Team 2509.19102 link
2025-09-23 World4RL: Diffusion World Models for Policy Refinement with Reinforcement Learning for Robotic Manipulation Dongbin Zhao Team 2509.19080 null
2025-09-23 ManipForce: Force-Guided Policy Learning with Frequency-Aware Representation for Contact-Rich Manipulation Kyoobin Lee Team 2509.19047 null
2025-09-23 Eva-VLA: Evaluating Vision-Language-Action Models’ Robustness Under Real-World Physical Variations Wen Yao Team 2509.18953 null
2025-09-23 Bi-VLA: Bilateral Control-Based Imitation Learning via Vision-Language Fusion for Action Generation Thanpimon Buamanee Team 2509.18865 null
2025-09-23 DexSkin: High-Coverage Conformable Robotic Skin for Learning Contact-Rich Manipulation Jiajun Wu Team 2509.18830 null
2025-09-23 VGGT-DP: Generalizable Robot Control via Vision Foundation Models Zhi Wang Team 2509.18778 null
2025-09-23 MV-UMI: A Scalable Multi-View Interface for Cross-Embodiment Learning Fares Abu-Dakka Team 2509.18757 link
2025-09-23 Learning Obstacle Avoidance using Double DQN for Quadcopter Navigation Sanket Gujar Team 2509.18734 null
2025-09-23 3D Flow Diffusion Policy: Visuomotor Policy Learning via Generating Flow in 3D Space Kyoobin Lee Team 2509.18676 null
2025-09-24 Do You Need Proprioceptive States in Visuomotor Policies? Yang Gao Team 2509.18644 link
2025-09-23 Generalizable Domain Adaptation for Sim-and-Real Policy Co-Training Danfei Xu Team 2509.18631 null
2025-09-23 SINGER: An Onboard Generalist Vision-Language Navigation Policy for Drones Mac Schwager Team 2509.18610 null
2025-09-23 Growing with Your Embodied Agent: A Human-in-the-Loop Lifelong Code Generation Framework for Long-Horizon Manipulation Skills Alois Knoll Team 2509.18597 null
2025-09-23 A scaling law for large-deformation contact in soft materials Huajian Gao Team 2509.18581 null
2025-09-22 Robotic Skill Diversification via Active Mutation of Reward Functions in Reinforcement Learning During a Liquid Pouring Task Luka Peternel Team 2509.18463 null
2025-09-22 Learning Geometry-Aware Nonprehensile Pushing and Pulling with Dexterous Hands Daniel Seita Team 2509.18455 null
2025-09-22 PrioriTouch: Adapting to User Contact Preferences for Whole-Arm Physical Human-Robot Interaction Tapomayukh Bhattacharjee Team 2509.18447 null
2025-09-22 ByteWrist: A Parallel Robotic Wrist Enabling Flexible and Anthropomorphic Motion for Confined Spaces Zeyu Ren Team 2509.18084 link
2025-09-22 Prepare Before You Act: Learning From Humans to Rearrange Initial States Dylan P. Losey Team 2509.18043 null
2025-09-22 FinFlowRL: An Imitation-Reinforcement Learning Framework for Adaptive Stochastic Control in Finance Ruixun Zhang Team 2509.17964 null
2025-09-22 ComposableNav: Instruction-Following Navigation in Dynamic Environments via Composable Diffusion Joydeep Biswas Team 2509.17941 link
2025-09-22 DriveDPO: Policy Learning via Safety DPO For End-to-End Autonomous Driving Zhaoxiang Zhang Team 2509.17940 null
2025-09-23 RoboSeek: You Need to Interact with Your Objects Yatong Han Team 2509.17783 null
2025-09-22 MotionTrans: Human VR Data Enable Motion-Level Learning for Robotic Manipulation Policies Yang Gao Team 2509.17759 null
2025-09-22 EigenSafe: A Spectral Framework for Learning-Based Stochastic Safety Filtering H. Jin Kim Team 2509.17750 null
2025-09-22 DINOv3-Diffusion Policy: Self-Supervised Large Visual Model for Visuomotor Diffusion Policy Learning Zidong Chen Team 2509.17684 null
2025-09-22 Learning Dexterous Manipulation with Quantized Hand State Cewu Lu Team 2509.17450 null
2025-09-22 Fast Trajectory Planner with a Reinforcement Learning-based Controller for Robotic Manipulators Hamidreza Kasaei Team 2509.17381 link
2025-09-21 Scalable Multi Agent Diffusion Policies for Coverage Control Alejandro Ribeiro Team 2509.17244 null
2025-09-21 Ratatouille: Imitation Learning Ingredients for Real-world Social Robot Navigation Timothy D. Barfoot Team 2509.17204 null
2025-09-21 MAST: Multi-Agent Spatial Transformer for Learning to Collaborate Alejandro Ribeiro Team 2509.17195 null
2025-09-21 Imagine2Act: Leveraging Object-Action Motion Consistency from Imagined Goals for Robotic Manipulation Hao Dong Team 2509.17125 null
2025-09-21 RoboManipBaselines: A Unified Framework for Imitation Learning in Robotic Manipulation across Real and Simulated Environments Yukiyasu Domae Team 2509.17057 null
2025-09-21 FILIC: Dual-Loop Force-Guided Imitation Learning with Impedance Torque Control for Contact-Rich Manipulation Tasks Guyue Zhou Team 2509.17053 null
2025-09-21 Generalized Momenta-Based Koopman Formalism for Robust Control of Euler-Lagrangian Systems Jishnu Keshavan Team 2509.17010 null
2025-09-21 End2Race: Efficient End-to-End Imitation Learning for Real-Time F1Tenth Racing Henry X. Liu Team 2509.16894 null
2025-09-20 Robot Learning with Sparsity and Scarcity Jingxi Xu Team 2509.16834 null
2025-09-19 Efficient Detection of Objects Near a Robot Manipulator via Miniature Time-of-Flight Sensors Michael Gleicher Team 2509.16122 null
2025-09-19 I-FailSense: Towards General Robotic Failure Detection with Vision-Language Models Mohamed Chetouani Team 2509.16072 null
2025-09-19 Compose by Focus: Scene Graph-based Atomic Skills Heng Yang Team 2509.16053 null
2025-09-19 Learning Safety for Obstacle Avoidance via Control Barrier Functions Calin A. Belta Team 2509.16037 null
2025-09-19 Improving Robotic Manipulation with Efficient Geometry-Aware Vision Encoder Ian Reid Team 2509.15880 link
2025-09-19 All-Electric Heavy-Duty Robotic Manipulator: Actuator Configuration Optimization and Sensorless Control Jouni Mattila Team 2509.15778 null
2025-09-19 GP3: A 3D Geometry-Aware Policy with Multi-View Images for Robotic Manipulation Deli Zhao Team 2509.15733 null
2025-09-19 Imagination at Inference: Synthesizing In-Hand Views for Robust Visuomotor Policy Inference Yoshihiko Nakamura Team 2509.15717 null
2025-09-18 Implicit Kinodynamic Motion Retargeting for Human-to-humanoid Imitation Learning Haodong Zhang Team 2509.15443 null
2025-09-18 RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation Xin Li Team 2509.15212 link
2025-09-18 Self-Improving Embodied Foundation Models Igor Mordatch Team 2509.15155 null
2025-09-18 A Nonlinear Scaling-based Design of Control Lyapunov-barrier Function for Relative Degree 2 Case and its Application to Safe Feedback Linearization Gyunghoon Park Team 2509.15071 null
2025-09-18 Reinforcement Learning Agent for a 2D Shooter Game Hamza A. A. Gardi Team 2509.15042 null
2025-09-19 Affordance-Based Disambiguation of Surgical Instructions for Collaborative Robot-Assisted Surgery Yasuhisa Hasegawa Team 2509.14967 null
2025-09-18 Robot Control Stack: A Lean Ecosystem for Robot Learning at Scale Florian Walter Team 2509.14932 null
2025-09-18 exUMI: Extensible Robot Teaching System with Action-aware Task-agnostic Tactile Representation Yong-Lu Li Team 2509.14688 null
2025-09-18 SimCoachCorpus: A naturalistic dataset with language and trajectories for embodied teaching Guy Rosman Team 2509.14548 null
2025-09-18 Learning to Pick: A Visuomotor Policy for Clustered Strawberry Picking Chen Peng Team 2509.14530 null
2025-09-17 Learning Discrete Abstractions for Visual Rearrangement Tasks Using Vision-Guided Graph Coloring Constantinos Chamzas Team 2509.14460 null
2025-09-17 LeVR: A Modular VR Teleoperation Framework for Imitation Learning in Dexterous Manipulation Han Liu Team 2509.14349 null
2025-09-17 MIMIC-D: Multi-modal Imitation for MultI-agent Coordination with Decentralized Diffusion Policies Negar Mehr Team 2509.14159 null
2025-09-17 SeqVLA: Sequential Task Execution for Long-Horizon Manipulation with Completion-Aware Vision-Language-Action Model Yiming Feng Team 2509.14138 null
2025-09-17 PhysicalAgent: Towards General Cognitive Robotics with Foundation World Models Dzmitry Tsetserukou Team 2509.13903 null
2025-09-17 Dual-Actor Fine-Tuning of VLA Models: A Talk-and-Tweak Human-in-the-Loop Approach Yangwei You Team 2509.13774 null
2025-09-17 Motion Adaptation Across Users and Tasks for Exoskeletons via Meta-Learning Houcheng Li Team 2509.13736 null
2025-09-17 Reinforcement Learning for Robotic Insertion of Flexible Cables in Industrial Settings Changjoo Nam Team 2509.13731 null
2025-09-17 HGACNet: Hierarchical Graph Attention Network for Cross-Modal Point Cloud Completion I-Ming Chen Team 2509.13692 null
2025-09-16 TreeIRL: Safe Urban Driving with Tree Search and Inverse Reinforcement Learning Yunqing Hu Team 2509.13579 null
2025-09-18 StageACT: Stage-Conditioned Imitation for Robust Humanoid Door Opening Shayegan Omidshafiei Team 2509.13200 null
2025-09-16 A Design Co-Pilot for Task-Tailored Manipulators Matthias Althoff Team 2509.13077 null
2025-09-16 Deep Learning for Model-Free Prediction of Thermal States of Robot Joint Motors Eric Guiffo Kaigom Team 2509.12739 null
2025-09-16 Safety filtering of robotic manipulation under environment uncertainty: a computational approach Martin Servin Team 2509.12674 null
2025-09-16 ActiveVLN: Towards Active Exploration via Multi-Turn RL in Vision-and-Language Navigation Feng Zheng Team 2509.12618 null
2025-09-16 Robust Online Residual Refinement via Koopman-Guided Dynamics Modeling Donglin Wang Team 2509.12562 null
2025-09-16 Pre-trained Visual Representations Generalize Where it Matters in Model-Based Reinforcement Learning Sebastian W. Pattinson Team 2509.12531 null
2025-09-15 Geometric Red-Teaming for Robotic Manipulation Zackory Erickson Team 2509.12379 null
2025-09-15 Deceptive Risk Minimization: Out-of-Distribution Generalization by Deceiving Distribution Shift Detectors Anirudha Majumdar Team 2509.12081 null
2025-09-15 Imitation Learning as Return Distribution Matching Alberto Maria Metelli Team 2509.12026 null
2025-09-15 Gesture-Based Robot Control Integrating Mm-wave Radar and Behavior Trees Stephan Sigg Team 2509.12008 null
2025-09-15 Learning to Generate 4D LiDAR Sequences Wei Tsang Ooi Team 2509.11959 link
2025-09-15 Learning Representations in Video Game Agents with Supervised Contrastive Imitation Learning Tim Bradley Team 2509.11880 null
2025-09-15 Tenma: Robust Cross-Embodiment Robot Manipulation with Diffusion Transformer Luhui Hu Team 2509.11865 null
2025-09-17 TrajBooster: Boosting Humanoid Whole-Body Manipulation via Trajectory-Centric Learning Donglin Wang Team 2509.11839 null
2025-09-15 Inference-stage Adaptation-projection Strategy Adapts Diffusion Policy to Cross-manipulators Scenarios Alois Knoll Team 2509.11621 null
2025-09-15 RAPTOR: A Foundation Policy for Quadrotor Control Giuseppe Loianno Team 2509.11481 null
2025-09-17 Enhancing Generalization in Vision-Language-Action Models by Preserving Pretrained Representations Xuanlin Li Team 2509.11417 link
2025-09-14 ActivePose: Active 6D Object Pose Estimation and Tracking for Robotic Manipulation Yizhao Wang Team 2509.11364 null
2025-09-14 MEMBOT: Memory-Based Robot in Intermittent POMDP Eyan Noronha Team 2509.11225 null
2025-09-14 SAMP: Spatial Anchor-based Motion Policy for Collision-Aware Robotic Manipulators Jun Ma Team 2509.11185 null
2025-09-14 ManiVID-3D: Generalizable View-Invariant Reinforcement Learning for Robotic Manipulation via Disentangled 3D Representations Jun Ma Team 2509.11125 null
2025-09-16 FEWT: Improving Humanoid Robot Perception with Frequency-Enhanced Wavelet-based Transformers Zhigong Song Team 2509.11109 null
2025-09-14 End-to-End Visual Autonomous Parking via Control-Aided Attention Chen Feng Team 2509.11090 null
2025-09-14 FragmentGPT: A Unified GPT Model for Fragment Growing, Linking, and Merging in Molecular Design Rick Stevens Team 2509.11044 null
2025-09-13 ImMimic: Cross-Domain Imitation from Human Videos via Mapping and Interpolation Danfei Xu Team 2509.10952 null
2025-09-11 Self-Augmented Robot Trajectory: Efficient Imitation Learning via Safe Self-augmentation with Demonstrator-annotated Precision Yukiyasu Domae Team 2509.09893 null
2025-09-11 Off Policy Lyapunov Stability in Reinforcement Learning Daniela Constantinescu Team 2509.09863 null
2025-09-11 MimicDroid: In-Context Learning for Humanoid Robot Manipulation from Human Play Videos Yuke Zhu Team 2509.09769 null
2025-09-11 SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning Ning Ding Team 2509.09674 null
2025-09-11 Dexplore: Scalable Neural Control for Dexterous Manipulation from Reference-Scoped Exploration Wei Yang Team 2509.09671 null
2025-09-11 A Neuromorphic Incipient Slip Detection System using Papillae Morphology Benjamin Ward-Cherrier Team 2509.09546 null
2025-09-11 KoopMotion: Learning Almost Divergence Free Koopman Flow Fields for Motion Planning M. Ani Hsieh Team 2509.09074 null
2025-09-11 Joint Model-based Model-free Diffusion for Planning with Constraints Shreyas Kousik Team 2509.08775 null
2025-09-10 SocialNav-SUB: Benchmarking VLMs for Scene Understanding in Social Robot Navigation Peter Stone Team 2509.08757 link
2025-09-10 PegasusFlow: Parallel Rolling-Denoising Score Sampling for Robot Diffusion Planner Flow Matching Liang Ding Team 2509.08435 null
2025-09-10 Grasp Like Humans: Learning Generalizable Multi-Fingered Grasping from Human Proprioceptive Sensorimotor Integration Huimin Lu Team 2509.08354 null
2025-09-10 Input-gated Bilateral Teleoperation: An Easy-to-implement Force Feedback Teleoperation Method for Low-cost Hardware Tetsuya Ogata Team 2509.08226 null
2025-09-09 TA-VLA: Elucidating the Design Space of Torque-aware Vision-Language-Action Models Hao Zhao Team 2509.07962 link
2025-09-09 Graph-Fused Vision-Language-Action for Policy Reasoning in Multi-Arm Robotic Manipulation Yingbai Hu Team 2509.07957 null
2025-09-09 RaC: Robot Learning for Long-Horizon Tasks by Scaling Recovery and Correction Aviral Kumar Team 2509.07953 null
2025-09-09 Text2Touch: Tactile In-Hand Manipulation with LLM-Designed Reward Functions Nathan F. Lepora Team 2509.07445 null
2025-09-08 Quantum Machine Learning and Grover’s Algorithm for Quantum Optimization of Robotic Manipulators Howard Li Team 2509.07216 null
2025-09-08 Design of Input-Output Observers for a Population of Systems with Bounded Frequency-Domain Variation using $DK$-iteration James Richard Forbes Team 2509.07201 null
2025-09-08 First Plan Then Evaluate: Use a Vectorized Motion Planner for Grasping Tucker Hermans Team 2509.07162 null
2025-09-08 Deep Reactive Policy: Learning Reactive Manipulator Motion Planning for Dynamic Environments Deepak Pathak Team 2509.06953 null
2025-09-10 LLaDA-VLA: Vision Language Diffusion Action Models Xiaoyan Sun Team 2509.06932 null
2025-09-08 Cortex-Synth: Differentiable Topology-Aware 3D Skeleton Synthesis with Hierarchical Graph Attention Mohamed Zayaan S Team 2509.06705 null
2025-09-08 Group Effect Enhanced Generative Adversarial Imitation Learning for Individual Travel Behavior Modeling under Incentives Zhenliang Ma Team 2509.06656 null
2025-09-08 Musculoskeletal simulation of limb movement biomechanics in Drosophila melanogaster Pavan Ramdya Team 2509.06426 null
2025-09-07 O$^3$Afford: One-Shot 3D Object-to-Object Affordance Grounding for Generalizable Robotic Manipulation Yen-Ling Kuo Team 2509.06233 link
2025-09-07 Robotic Manipulation Framework Based on Semantic Keypoints for Packing Shoes of Different Sizes, Shapes, and Softness Zhendong Dai Team 2509.06048 link
2025-09-06 TeleopLab: Accessible and Intuitive Teleoperation of a Robotic Manipulator for Remote Labs John Liu Team 2509.05547 null
2025-09-05 OpenEgo: A Large-Scale Multimodal Egocentric Dataset for Dexterous Manipulation Yu Xiang Team 2509.05513 null
2025-09-04 Long-Horizon Visual Imitation Learning via Plan and Code Reflection Yunde Jia Team 2509.05368 null
2025-09-08 Sticker-TTS: Learn to Utilize Historical Experience with a Sticker-driven Test-Time Scaling Framework Ji-Rong Wen Team 2509.05007 null
2025-09-05 Imitation Learning Based on Disentangled Representation Learning of Behavioral Characteristics Toshiaki Tsuji Team 2509.04737 null
2025-09-04 Surformer v2: A Multimodal Classifier for Surface Understanding from Touch and Vision Noorbakhsh Amiri Golilarz Team 2509.04658 null
2025-09-04 Planning from Point Clouds over Continuous Actions for Multi-object Rearrangement David Held Team 2509.04645 link
2025-09-04 Action Chunking with Transformers for Image-Based Spacecraft Guidance and Control Richard Linares Team 2509.04628 null
2025-09-04 In-Context Policy Adaptation via Cross-Domain Skill Diffusion Honguk Woo Team 2509.04535 null
2025-09-04 EMMA: Scaling Mobile Manipulation via Egocentric Human Data Danfei Xu Team 2509.04443 null
2025-09-04 Balancing Signal and Variance: Adaptive Offline RL Post-Training for VLA Flow Models Donglin Wang Team 2509.04063 null
2025-09-04 FPC-VLA: A Vision-Language-Action Framework with a Supervisor for Failure Prediction and Correction Jingtai Liu Team 2509.04018 null
2025-09-04 Weakly-Supervised Learning of Dense Functional Correspondences Jiajun Wu Team 2509.03893 link
2025-09-05 Learning Multi-Stage Pick-and-Place with a Legged Mobile Manipulator Wei Xu Team 2509.03859 null
2025-09-03 The Role of Embodiment in Intuitive Whole-Body Teleoperation for Mobile Manipulation Georgia Chalvatzaki Team 2509.03222 null
2025-09-03 Autonomous Learning From Success and Failure: Goal-Conditioned Supervised Learning with Negative Feedback Daniel A. Braun Team 2509.03206 null
2025-09-03 Forbal: Force Balanced 2-5 Degree of Freedom Robot Manipulator Built from a Five Bar Linkage Matteo Bottin Team 2509.03119 null
2025-09-02 Generalizable Skill Learning for Construction Robots with Crowdsourced Natural Language Instructions, Composable Skills Standardization, and Large Language Model Carol C. Menassa Team 2509.02876 null
2025-09-02 Power Grid Control with Graph-Based Distributed Reinforcement Learning Marcello Restelli Team 2509.02861 null
2025-09-04 Plan Verification for LLM-Based Embodied Task Completion Agents Gokhan Tur Team 2509.02761 null
2025-09-02 Manipulation as in Simulation: Enabling Accurate Geometry Perception in Robots Bingyi Kang Team 2509.02530 link
2025-09-02 U-ARM: Ultra low-cost general teleoperation interface for robot manipulation Bo Zhao Team 2509.02437 null
2025-09-05 Align-Then-stEer: Adapting the Vision-Language Action Models through Unified Latent Guidance Xuelong Li Team 2509.02055 null
2025-09-01 ManiFlow: A General Robot Manipulation Policy via Consistency Flow Training Dieter Fox Team 2509.01819 null
2025-09-01 Non-conflicting Energy Minimization in Reinforcement Learning based Robot Control Stefan Lee Team 2509.01765 null
2025-09-01 Fail2Progress: Learning from Real-World Robot Failures with Stein Variational Inference Tucker Hermans Team 2509.01746 null
2025-09-01 Articulated Object Estimation in the Wild Abhinav Valada Team 2509.01708 null
2025-09-01 Data Retrieval with Importance Weights for Few-Shot Imitation Learning Joey Hejna Team 2509.01657 null
2025-09-01 Disentangled Multi-Context Meta-Learning: Unlocking robust and Generalized Task Learning Seongil Hong Team 2509.01297 null
2025-08-31 One-Step Model Predictive Path Integral for Manipulator Motion Planning Using Configuration Space Distance Fields Kenji Kawashima Team 2509.00836 null
2025-08-31 An Effective Trajectory Planning and an Optimized Path Planning for a 6-Degree-of-Freedom Robot Manipulator Masahiko Mikawa Team 2509.00828 null
2025-08-31 Inverse Kinematics for a 6-Degree-of-Freedom Robot Manipulator Using Comprehensive Gröbner Systems Masahiko Mikawa Team 2509.00823 null
2025-08-30 Learning Dolly-In Filming From Demonstration Using a Ground-Based Robot Wenbin Li Team 2509.00574 null
2025-08-30 NeuralSVCD for Efficient Swept Volume Collision Detection Beomjoon Kim Team 2509.00499 null
2025-08-29 Can a mobile robot learn from a pedestrian model to prevent the sidewalk salsa? David Abbink Team 2508.21690 null
2025-08-29 Robust Convex Model Predictive Control with collision avoidance guarantees for robot manipulators Thomas B. Schön Team 2508.21677 null
2025-08-29 Learning Agile Gate Traversal via Analytical Optimal Policy Gradient Lin Zhao Team 2508.21592 null
2025-08-29 Estimated Informed Anytime Search for Sampling-Based Planning via Adaptive Sampler Alois Knoll Team 2508.21549 null
2025-08-29 Few-Shot Neuro-Symbolic Imitation Learning for Long-Horizon Planning and Acting Matthias Scheutz Team 2508.21501 null
2025-08-29 RoboInspector: Unveiling the Unreliability of Policy Code for LLM-enabled Robotic Manipulation Yuanchao Shu Team 2508.21378 null
2025-08-29 Dynamics-Compliant Trajectory Diffusion for Super-Nominal Payload Manipulation Alessandro Roncone Team 2508.21375 null
2025-08-29 Learning to Assemble the Soma Cube with Legal-Action Masked DQN and Safe ZYZ Regrasp on a Doosan M0609 Sawoong Kim Team 2508.21272 null
2025-08-28 Learning on the Fly: Rapid Policy Adaptation via Differentiable Simulation Davide Scaramuzza Team 2508.21065 null
2025-08-28 Rapid Mismatch Estimation via Neural Network Informed Variational Inference Nadia Figueroa Team 2508.21007 link
2025-08-29 UltraTac: Integrated Ultrasound-Augmented Visuotactile Sensor for Enhanced Robotic Perception Wenbo Ding Team 2508.20982 null
2025-08-28 Deep Fuzzy Optimization for Batch-Size and Nearest Neighbors in Optimal Robot Motion Planning Alois Knoll Team 2508.20884 null
2025-08-28 Learning Primitive Embodied World Models: Towards Scalable Robotic Learning Qinying Gu Team 2508.20840 null
2025-08-28 Non-expert to Expert Motion Translation Using Generative Adversarial Networks Seiichiro Katsura Team 2508.20740 null
2025-08-28 SimShear: Sim-to-Real Shear-based Tactile Servoing Nathan F. Lepora Team 2508.20561 null
2025-08-31 HERMES: Human-to-Robot Embodied Learning from Multi-Source Motion Data for Mobile Dexterous Manipulation Huazhe Xu Team 2508.20085 null
2025-08-28 Long-VLA: Unleashing Long-Horizon Capability of Vision Language Action Model for Robot Manipulation Donglin Wang Team 2508.19958 link
2025-08-28 Ego-centric Predictive Model Conditioned on Hand Trajectories Mike Zheng Shou Team 2508.19852 null
2025-08-27 APT*: Asymptotically Optimal Motion Planning via Adaptively Prolated Elliptical R-Nearest Neighbors Alois Knoll Team 2508.19790 null
2025-08-27 Impedance Primitive-augmented Hierarchical Reinforcement Learning for Sequential Tasks Jens Kober Team 2508.19607 null
2025-08-26 Gentle Object Retraction in Dense Clutter Using Multimodal Force Sensing and Imitation Learning Mark Cutkosky Team 2508.19476 null
2025-08-26 LaVA-Man: Learning Visual Action Representations for Robot Manipulation Changjae Oh Team 2508.19391 null
2025-08-26 Inference of Human-derived Specifications of Object Placement via Demonstration Julie A Shah Team 2508.19367 null
2025-08-26 MemoryVLA: Perceptual-Cognitive Memory in Vision-Language-Action Models for Robotic Manipulation Gao Huang Team 2508.19236 link
2025-08-26 LSD-3D: Large-Scale 3D Driving Scene Generation with Geometry Grounding Felix Heide Team 2508.19204 link
2025-08-27 AutoRing: Imitation Learning–based Autonomous Intraocular Foreign Body Removal Manipulation with Eye Surgical Robot Jian Wu Team 2508.19191 null
2025-08-28 From Tabula Rasa to Emergent Abilities: Discovering Robot Skills via Real-World Unsupervised Quality-Diversity Antoine Cully Team 2508.19172 null
2025-08-26 Playstyle and Artificial Intelligence: An Initial Blueprint Through the Lens of Video Games Chiu-Chou Lin Team 2508.19152 null
2025-08-26 AS2FM: Enabling Statistical Model Checking of ROS 2 Systems for Robust Autonomy Matteo Morelli Team 2508.18820 null
2025-08-26 HyperTASR: Hypernetwork-Driven Task-Aware Scene Representations for Robust Manipulation Yanchao Yang Team 2508.18802 null
2025-08-26 Deep Sensorimotor Control by Imitating Predictive Models of Human Motion Antonio Loquercio Team 2508.18691 link
2025-08-26 Integration of Robot and Scene Kinematics for Sequential Mobile Manipulation Planning Song-Chun Zhu Team 2508.18627 null
2025-08-25 PneuGelSight: Soft Robotic Vision-Based Proprioception and Tactile Sensing Wenzhen Yuan Team 2508.18443 null
2025-08-25 Maintenance automation: methods for robotics manipulation planning and execution Alexander Verl Team 2508.18399 null
2025-08-26 FlowVLA: Thinking in Motion with a Visual Chain of Thought Haoang Li Team 2508.18269 null
2025-08-25 No Need to Look! Locating and Grasping Objects by a Robot Arm Covered with Sensitive Skin Matej Hoffmann Team 2508.17986 null
2025-08-25 SEBVS: Synthetic Event-based Visual Servoing for Robot Navigation and Manipulation Bharatesh Chakravarthi Team 2508.17643 null
2025-08-25 GWM: Towards Scalable Gaussian World Models for Robotic Manipulation Siyuan Huang Team 2508.17600 link
2025-08-24 LodeStar: Long-horizon Dexterity via Synthetic Data Augmentation from Human Demonstrations Hao Su Team 2508.17547 null
2025-08-24 Variational Shape Inference for Grasp Diffusion on SE(3) Aniket Bera Team 2508.17482 null
2025-08-24 ReviBranch: Deep Reinforcement Learning for Branch-and-Bound with Revived Trajectories Jiaping Xiao Team 2508.17452 null
2025-08-24 Robotic Manipulation via Imitation Learning: Taxonomy, Evolution, Benchmark, and Challenges Liming Chen Team 2508.17449 null
2025-08-24 OVITA: Open-Vocabulary Interpretable Trajectory Adaptations Ravi Prakash Team 2508.17260 link
2025-08-24 4D Visual Pre-training for Robot Learning Huazhe Xu Team 2508.17230 null
2025-08-21 UnPose: Uncertainty-Guided Diffusion Priors for Zero-Shot Pose Estimation Binbin Xu Team 2508.15972 link
2025-08-21 Spatial Policy: Guiding Visuomotor Robotic Manipulation with Spatial-Aware Modeling and Reasoning Wenwu Zhu Team 2508.15874 null
2025-08-21 Search-Based Credit Assignment for Offline Preference-Based Reinforcement Learning Houqiang Li Team 2508.15327 null
2025-08-20 A Vision-Based Shared-Control Teleoperation Scheme for Controlling the Robotic Arm of a Four-Legged Robot Marcelo Becker Team 2508.14994 null
2025-08-19 Learning to Drive Ethically: Embedding Moral Reasoning into Autonomous Driving Ostap Okhrin Team 2508.14926 null
2025-08-20 FBI: Learning Dexterous In-hand Manipulation with Dynamic Visuotactile Shortcut Policy Cewu Lu Team 2508.14441 null
2025-08-20 Offline Imitation Learning upon Arbitrary Demonstrations by Pre-Training Dynamics Representations Na Li Team 2508.14383 null
2025-08-20 Action-Constrained Imitation Learning Ping-Chun Hsieh Team 2508.14379 null
2025-08-20 Learning Point Cloud Representations with Pose Continuity for Depth-Based Category-Level 6D Object Pose Estimation Ioannis Stamos Team 2508.14358 null
2025-08-19 Train Once, Deploy Anywhere: Realize Data-Efficient Dynamic Object Manipulation Hengshuang Zhao Team 2508.14042 null
2025-08-19 Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation Jianye Hao Team 2508.13998 null
2025-08-19 Toward Deployable Multi-Robot Collaboration via a Symbolically-Guided Decision Transformer Paul Asunda Team 2508.13877 null
2025-08-18 Decoding Communications with Partial Information Peter McBurney Team 2508.13326 null
2025-08-18 Precise Action-to-Video Generation Through Visual Action Prompts Ruizhen Hu Team 2508.13104 link
2025-08-18 Grounding Actions in Camera Space: Observation-Centric Vision-Language-Action Policy Zhi Hou Team 2508.13103 null
2025-08-18 Large VLM-based Vision-Language-Action Models for Robotic Manipulation: A Survey Liqiang Nie Team 2508.13073 link
2025-08-18 PROD: Palpative Reconstruction of Deformable Objects through Elastostatic Signed Distance Functions Hamza El-Kebir Team 2508.12554 null
2025-08-17 EgoLoc: A Generalizable Solution for Temporal Interaction Localization in Egocentric Videos Hesheng Wang Team 2508.12349 null
2025-08-17 Bimanual Robot-Assisted Dressing: A Spherical Coordinate-Based Strategy for Tight-Fitting Garments Jihong Zhu Team 2508.12274 null
2025-08-17 Robot Trains Robot: Automatic Real-World Policy Adaptation and Learning for Humanoids Shuran Song Team 2508.12252 null
2025-08-16 Belief-Conditioned One-Step Diffusion: Real-Time Trajectory Planning with Just-Enough Sensing Melkior Ornik Team 2508.12166 null
2025-08-16 OASIS: Real-Time Opti-Acoustic Sensing for Intervention Systems in Unstructured Environments Richard Camilli Team 2508.12071 null
2025-08-16 Fully Spiking Actor-Critic Neural Network for Robotic Manipulation Guanghui Sun Team 2508.12038 null
2025-08-16 OmniD: Generalizable Robot Manipulation Policy via Image-Based BEV Representation Xiaozhu Ju Team 2508.11898 null
2025-08-15 Limitation Learning: Catching Adverse Dialog with GAIL Rahul Zalkikar Team 2508.11767 null
2025-08-15 MultiPark: Multimodal Parking Transformer with Next-Segment Prediction Tong Qin Team 2508.11537 null
2025-08-15 Learning Differentiable Reachability Maps for Optimization-based Humanoid Motion Generation Fumio Kanehiro Team 2508.11275 null
2025-08-15 Multi-Group Equivariant Augmentation for Reinforcement Learning in Robot Manipulation Kwok Wai Samuel Au Team 2508.11204 null
2025-08-15 Actor-Critic for Continuous Action Chunks: A Reinforcement Learning Framework for Long-Horizon Robotic Manipulation with Sparse Reward Yu-Gang Jiang Team 2508.11143 null
2025-08-14 Robot Policy Evaluation for Sim-to-Real Transfer: A Benchmarking Perspective Fabio Ramos Team 2508.11117 null
2025-08-14 GenFlowRL: Shaping Rewards with Generative Object-Centric Flow in Visual Reinforcement Learning Ruohan Gao Team 2508.11049 null
2025-08-14 3D FlowMatch Actor: Unified 3D Policy for Single- and Dual-Arm Manipulation Katerina Fragkiadaki Team 2508.11002 null
2025-08-15 KDPE: A Kernel Density Estimation Strategy for Diffusion Policy Trajectory Selection Lorenzo Natale Team 2508.10511 null
2025-08-14 Large Model Empowered Embodied AI: A Survey on Decision-Making and Embodied Learning Ping Kuang Team 2508.10399 null
2025-08-14 Leveraging OS-Level Primitives for Robotic Action Management Haibo Chen Team 2508.10259 null
2025-08-13 Masquerade: Learning from In-the-wild Human Videos using Data-Editing Jeannette Bohg Team 2508.09976 link
2025-08-13 Toward Human-Robot Teaming: Learning Handover Behaviors from 3D Scenes Changjae Oh Team 2508.09855 null
2025-08-13 Physical Autoregressive Model for Robotic Manipulation without Action Pretraining Guangrun Wang Team 2508.09822 null
2025-08-13 Immersive Teleoperation of Beyond-Human-Scale Robotic Manipulators: Challenges and Future Directions Jouni Mattila Team 2508.09700 null
2025-08-13 CaRoBio: 3D Cable Routing with a Bio-inspired Gripper Fingernail Fumin Zhang Team 2508.09558 null
2025-08-13 Reactive Model Predictive Contouring Control for Robot Manipulators Jaeheung Park Team 2508.09502 null
2025-08-13 DAgger Diffusion Navigation: DAgger Boosted Diffusion Policy for Vision-Language Navigation Liqiang Nie Team 2508.09444 null
2025-08-13 GeoVLA: Empowering 3D Representations in Vision-Language-Action Models Jiale Cao Team 2508.09071 link
2025-08-12 Unsupervised Skill Discovery as Exploration for Learning Agile Locomotion Sehoon Ha Team 2508.08982 null
2025-08-12 Reducing Cognitive Load in Multi-Agent Reinforcement Learning for Mathematical Problem Solving: Decoupling Reasoning and Code Generation Yang Li Team 2508.08882 null
2025-08-12 Visual Prompting for Robotic Manipulation with Annotation-Guided Pick-and-Place Using ACT Yukiyasu Domae Team 2508.08748 null
2025-08-12 Towards Safe Imitation Learning via Potential Field-Guided Flow Matching Yoshihiko Nakamura Team 2508.08707 null
2025-08-12 OmniVTLA: Vision-Tactile-Language-Action Model with Semantic-Aligned Tactile Sensing Hengdi Zhang Team 2508.08706 null
2025-08-11 ReconDreamer-RL: Enhancing Reinforcement Learning via Diffusion-based Scene Reconstruction Wenjun Mei Team 2508.08170 null
2025-08-11 AimBot: A Simple Auxiliary Visual Cue to Enhance Spatial Awareness of Visuomotor Policies Joyce Chai Team 2508.08113 null
2025-08-13 AgentWorld: An Interactive Simulation Platform for Scene Construction and Mobile Robotic Manipulation Lei Han Team 2508.07770 null
2025-08-11 GraphCoT-VLA: A 3D Spatial-Aware Reasoning Vision-Language-Action Model for Robotic Manipulation with Ambiguous Instructions Hong Zhang Team 2508.07650 null
2025-08-11 AR-VRM: Imitating Human Motions for Visual Robot Manipulation with Analogical Reasoning Yang Liu Team 2508.07626 null
2025-08-10 Collision-Free Trajectory Planning and control of Robotic Manipulator using Energy-Based Artificial Potential Field (E-APF) Manoranjan Sinha Team 2508.07323 null
2025-08-10 Multimodal Spiking Neural Network for Space Robotic Manipulation Guanghui Sun Team 2508.07287 null
2025-08-09 DexFruit: Dexterous Manipulation and Gaussian Splatting Inspection of Fruit Monroe Kennedy III Team 2508.07118 null
2025-08-09 From Imitation to Optimization: A Comparative Study of Offline Learning for Autonomous Driving Antonio Guillen-Perez Team 2508.07029 null
2025-08-09 Manipulator for people with limited abilities Arkady Yuschenko Team 2508.06969 null
2025-08-09 Learning a Vision-Based Footstep Planner for Hierarchical Walking Control Michael Posa Team 2508.06779 null
2025-08-08 Towards Balanced Behavior Cloning from Imbalanced Datasets Dylan P. Losey Team 2508.06319 null
2025-08-08 Surrogate-Enhanced Modeling and Adaptive Modular Control of All-Electric Heavy-Duty Robotic Manipulators Jouni Mattila Team 2508.06313 null
2025-08-08 ADPro: a Test-time Adaptive Diffusion Policy for Robot Manipulation via Manifold and Initial Noise Constraints Liming Chen Team 2508.06266 null
2025-08-08 Incremental Language Understanding for Online Motion Planning of Robot Manipulators Matthias Scheutz Team 2508.06095 null
2025-08-08 Society of Mind Meets Real-Time Strategy: A Hierarchical Multi-Agent Framework for Strategic Reasoning Jonghyun Choi Team 2508.06042 null
2025-08-08 PASG: A Closed-Loop Framework for Automated Geometric Primitive Extraction and Semantic Anchoring in Robotic Manipulation Yao Mu Team 2508.05976 null
2025-08-07 Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation Guanghui Ren Team 2508.05635 link
2025-08-07 Towards Generalizable Safety in Crowd Navigation via Conformal Uncertainty Handling Jiachen Li Team 2508.05634 link
2025-08-07 Robust adaptive fuzzy sliding mode control for trajectory tracking for of cylindrical manipulator Nga Nguyen Thi Team 2508.05584 null
2025-08-07 Do Robots Really Need Anthropomorphic Hands? Nicolás Navarro-Guerrero Team 2508.05415 null
2025-08-07 Real-Time Iteration Scheme for Diffusion Policy Danica Kragic Team 2508.05396 null
2025-08-07 ASkDAgger: Active Skill-level Data Aggregation for Interactive Imitation Learning Jens Kober Team 2508.05310 null
2025-08-07 Learning to See and Act: Task-Aware View Planning for Robotic Manipulation Liang Lin Team 2508.05186 link
2025-08-07 Cognitive Duality for Adaptive Web Agents Zheng Hu Team 2508.05081 null
2025-08-07 Analyzing the Impact of Multimodal Perception on Sample Complexity and Optimization Landscapes in Imitation Learning Temitope Lukman Adebanjo Team 2508.05077 null
2025-08-06 INTENTION: Inferring Tendencies of Humanoid Robot Motion Through Interactive Intuition and Grounded VLM Nikos Tsagarakis Team 2508.04931 link
2025-08-06 Optimization of sliding control parameters for a 3-dof robot arm using genetic algorithm (GA) Le Tieu Nien Team 2508.04009 null
2025-08-05 Constraint-Preserving Data Generation for Visuomotor Policy Learning Jeannette Bohg Team 2508.03944 link
2025-08-05 DiWA: Diffusion Policy Adaptation with World Models Abhinav Valada Team 2508.03645 null
2025-08-05 ActionSink: Toward Precise Robot Manipulation with Dynamic Integration of Action Flow Xiaodan Liang Team 2508.03218 null
2025-08-05 Safety-Aware Imitation Learning via MPC-Guided Disturbance Injection Somil Bansal Team 2508.03129 null
2025-08-07 Hand-Eye Autonomous Delivery: Learning Humanoid Navigation, Locomotion and Reaching C. Karen Liu Team 2508.03068 null
2025-08-05 Aerobatic maneuvers in insect-scale flapping-wing aerial robots via deep-learned robust tube model predictive control YuFeng Chen Team 2508.03043 null
2025-08-04 Learning User Interaction Forces using Vision for a Soft Finger Exosuit Thomas George Thuruthel Team 2508.02870 null
2025-08-04 Manip4Care: Robotic Manipulation of Human Limbs for Solving Assistive Tasks Ahmed H. Qureshi Team 2508.02649 null
2025-08-04 D2PPO: Diffusion Policy Policy Optimization with Dispersive Loss Haitao Wang Team 2508.02644 null
2025-08-01 On-Device Diffusion Transformer Policy for Efficient Robot Manipulation Dong Xu Team 2508.00697 null
2025-08-01 HannesImitation: Grasping with the Hannes Prosthetic Hand via Imitation Learning Lorenzo Natale Team 2508.00491 null
2025-08-01 Energy Efficient Trajectory Control and Resource Allocation in Multi-UAV-assisted MEC via Deep Reinforcement Learning Dusit Niyato Team 2508.00261 null
2025-07-31 RAGNet: Large-scale Reasoning-based Affordance Segmentation Benchmark towards General Grasping Jianbing Shen Team 2507.23734 link
2025-07-31 villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models Jiang Bian Team 2507.23682 link
2025-08-01 H-RDT: Human Manipulation Enhanced Bimanual Robotic Manipulation Jun Zhu Team 2507.23523 null
2025-07-31 Policy Learning from Large Vision-Language Model Feedback without Reward Modeling Chang D. Yoo Team 2507.23391 null
2025-07-30 In-between Motion Generation Based Multi-Style Quadruped Robot Locomotion Peng Lu Team 2507.23053 null
2025-07-30 Improving Generalization Ability of Robotic Imitation Learning by Resolving Causal Confusion in Observations Brendan Tidd Team 2507.22380 null
2025-07-29 RL from Teacher-Model Refinement: Gradual Imitation Learning for Machine Translation Pengcheng He Team 2507.22219 null
2025-07-29 A Nonlinear MPC Framework for Loco-Manipulation of Quadrupedal Robots with Non-Negligible Manipulator Dynamics Kaveh Akbari Hamed Team 2507.22042 null
2025-07-29 From Seeing to Experiencing: Scaling Navigation Foundation Models with Reinforcement Learning Bolei Zhou Team 2507.22028 null
2025-07-29 DISCOVERSE: Efficient Robot Simulation in Complex High-Fidelity Environments Guyue Zhou Team 2507.21981 null
2025-07-29 MoDeSuite: Robot Learning Task Suite for Benchmarking Mobile Manipulation with Deformable Objects Joni Pajarinen Team 2507.21796 null
2025-07-29 Pretraining a Unified PDDL Domain from Real-World Demonstrations for Generalizable Robot Task Planning Panpan Cai Team 2507.21545 null
2025-07-29 Model Predictive Adversarial Imitation Learning for Planning from Observation Byron Boots Team 2507.21533 null
2025-07-29 Retrieve-Augmented Generation for Speeding up Diffusion Policy without Additional Training Yutaka Matsuo Team 2507.21452 null
2025-07-28 Fluidically Innervated Lattices Make Versatile and Durable Tactile Sensors Daniela Rus Team 2507.21225 null
2025-07-28 FMimic: Foundation Models are Fine-grained Action Learners from Human Videos Yufeng Yue Team 2507.20622 null
2025-07-28 Learning Physical Interaction Skills from Human Demonstrations Kwonjoon Lee Team 2507.20445 null
2025-07-23 ERMV: Editing 4D Robotic Multi-view images to enhance embodied agents Hesheng Wang Team 2507.17462 null
2025-07-23 Ctx2TrajGen: Traffic Context-Aware Microscale Vehicle Trajectories using Generative Adversarial Imitation Learning Byeongjoon Noh Team 2507.17418 null
2025-07-23 Confounded Causal Imitation Learning with Instrumental Variables Zhi Geng Team 2507.17309 null
2025-07-23 Prolonging Tool Life: Learning Skillful Use of General-purpose Tools through Lifespan-guided Reinforcement Learning Takamitsu Matsubara Team 2507.17275 null
2025-07-23 Towards Human-level Intelligence via Human-like Whole-Body Manipulation Zhaohui An Team 2507.17141 null
2025-07-22 Evaluating Uncertainty and Quality of Visual Language Action-enabled Robots Aitor Arrieta Team 2507.17049 null
2025-07-19 Sensor-Space Based Robust Kinematic Control of Redundant Soft Manipulator by Learning Charlie C. L. Wang Team 2507.16842 null
2025-07-22 ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning Fu-En Yang Team 2507.16815 null
2025-07-22 Equivariant Goal Conditioned Contrastive Reinforcement Learning Robert Platt Team 2507.16139 null
2025-07-21 Look, Focus, Act: Efficient and Robust Robot Learning via Human Gaze and Foveated Vision Transformers Iman Soltani Team 2507.15833 null
2025-07-21 Strong, Accurate, and Low-Cost Robot Manipulator Donghyun Kim Team 2507.15693 null
2025-07-21 Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos Zongqing Lu Team 2507.15597 null
2025-07-22 GR-3 Technical Report Yichu Yang Team 2507.15493 null
2025-07-20 Learning-Based Modeling of a Magnetically Steerable Soft Suction Device for Endoscopic Endonasal Interventions Eric Diller Team 2507.15155 null
2025-07-20 Reinforcement Learning for Flow-Matching Policies Somayeh Sojoudi Team 2507.15073 null
2025-07-20 Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper Yunzhu Li Team 2507.15062 null
2025-07-20 LLM-Enhanced Multi-Agent Reinforcement Learning with Expert Workflow for Real-Time P2P Energy Trading Lu Zhang Team 2507.14995 null
2025-07-20 Heterogeneous object manipulation on nonlinear soft surface through linear controller Andres Faiña Team 2507.14967 null
2025-07-20 KGN-Pro: Keypoint-Based Grasp Prediction through Probabilistic 2D-3D Correspondence Learning Guangyao Zhai Team 2507.14820 null
2025-07-19 BT-TL-DMPs: A Novel Robot TAMP Framework Combining Behavior Tree, Temporal Logic and Dynamical Movement Primitives Yongchun Fang Team 2507.14582 null
2025-07-18 Improving Low-Cost Teleoperation: Augmenting GELLO with Force Kai Arulkumaran Team 2507.13602 null
2025-07-17 The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner Kai Chen Team 2507.13332 null
2025-07-17 ZipMPC: Compressed Context-Dependent MPC Cost via Imitation Learning Johannes A. Stork Team 2507.13088 null
2025-07-17 Generalist Bimanual Manipulation via Foundation Video Diffusion Models Jun Zhu Team 2507.12898 null
2025-07-17 Supervised Fine Tuning on Curated Data is Reinforcement Learning (and can be improved) Jost Tobias Springenberg Team 2507.12856 null
2025-07-17 DEMONSTRATE: Zero-shot Language to Robotic Control via Multi-task Demonstration Learning Melanie N. Zeilinger Team 2507.12855 null
2025-07-17 Learning to Predict Mobile Robot Stability in Off-Road Environments Parikshit Maini Team 2507.12731 null
2025-07-18 EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos Xiaolong Wang Team 2507.12440 null
2025-07-16 The Developments and Challenges towards Dexterous and Embodied Robotic Manipulation: A Survey Jiming Chen Team 2507.11840 null
2025-07-15 Let’s Think in Two Steps: Mitigating Agreement Bias in MLLMs with Self-Grounded Verification Zsolt Kira Team 2507.11662 null
2025-07-15 MPC-based Coarse-to-Fine Motion Planning for Robotic Object Transportation in Cluttered Environments Steven Liu Team 2507.11211 null
2025-07-15 A Robust Controller based on Gaussian Processes for Robotic Manipulators with Unknown Uncertainty Ruggero Carli Team 2507.11170 null
2025-07-15 Enhancing Autonomous Manipulator Control with Human-in-loop for Uncertain Assembly Environments Kazuya Yoshida Team 2507.11006 null
2025-07-15 Object-Centric Mobile Manipulation through SAM2-Guided Perception and Imitation Learning Jun Morimoto Team 2507.10899 null
2025-07-14 Versatile and Generalizable Manipulation via Goal-Conditioned Reinforcement Learning with Grounded Object Detection Colin Bellinger Team 2507.10814 null
2025-07-14 rt-RISeg: Real-Time Model-Free Robot Interactive Segmentation for Active Instance-Level Object Understanding Kaiyu Hang Team 2507.10776 null
2025-07-14 A New Dataset and Performance Benchmark for Real-time Spacecraft Segmentation in Onboard Flight Computers Arko Barman Team 2507.10775 null
2025-07-14 Vision Language Action Models in Robotic Manipulation: A Systematic Review Irfan Hussain Team 2507.10672 null
2025-07-16 GHPO: Adaptive Guidance for Stable and Efficient LLM Reinforcement Learning Dandan Tu Team 2507.10628 null
2025-07-14 MP1: Mean Flow Tames Policy Learning in 1-step for Robotic Manipulation Mengyuan Liu Team 2507.10543 null
2025-07-14 Prompt Informed Reinforcement Learning for Visual Coverage Path Planning Venkat Margapuri Team 2507.10284 null
2025-07-14 Should We Ever Prefer Decision Transformer for Offline Reinforcement Learning? Keith Ross Team 2507.10174 null
2025-07-16 MTF-Grasp: A Multi-tier Federated Learning Approach for Robotic Grasping Monowar Bhuyan Team 2507.10158 null
2025-07-13 Learning to Control Dynamical Agents via Spiking Neural Networks and Metropolis-Hastings Sampling Ali Al-Zawqari Team 2507.09540 null
2025-07-13 Self-supervised Pretraining for Integrated Prediction and Planning of Automated Vehicles Keqiang Li Team 2507.09537 null
2025-07-13 SegVec3D: A Method for Vector Embedding of 3D Objects Oriented Towards Robot manipulation Boyu Wang Team 2507.09459 null
2025-07-12 DAA*: Deep Angular A Star for Image-based Path Planning Zhiwei Xu Team 2507.09305 null
2025-07-15 Learning and Transferring Better with Depth Information in Visual Reinforcement Learning Jingdong Zhao Team 2507.09180 null
2025-07-12 PRAG: Procedural Action Generator Karla Stepanova Team 2507.09167 null
2025-07-12 Towards Human-level Dexterity via Robot Learning Gagan Khandate Team 2507.09117 null
2025-07-11 Imitation Learning in Continuous Action Spaces: Mitigating Compounding Error without Interaction Max Simchowitz Team 2507.09061 null
2025-07-11 Behavioral Exploration: Learning to Explore via In-Context Adaptation Sergey Levine Team 2507.09041 null
2025-07-11 Learning human-to-robot handovers through 3D scene reconstruction Changjae Oh Team 2507.08726 null
2025-07-11 Learning Robust Motion Skills via Critical Adversarial Attacks for Humanoid Robots Yue Gao Team 2507.08303 null
2025-07-11 CL3R: 3D Reconstruction and Contrastive Learning for Enhanced Robotic Manipulation Representations He Wang Team 2507.08262 null
2025-07-10 Imitation Learning for Obstacle Avoidance Using End-to-End CNN-Based Sensor Fusion Raafat E. Shalaby Team 2507.08112 null
2025-07-15 EXPO: Stable Reinforcement Learning with Expressive Policies Chelsea Finn Team 2507.07986 null
2025-07-15 Reinforcement Learning with Action Chunking Sergey Levine Team 2507.07969 null
2025-07-09 Self-Wearing Adaptive Garments via Soft Robotic Unfurling Allison M. Okamura Team 2507.07221 null
2025-07-09 Hierarchical Reinforcement Learning for Articulated Tool Manipulation with Multifingered Hand Xinjun Sheng Team 2507.06822 null
2025-07-09 Learning safe, constrained policies via imitation learning: Connection to Probabilistic Inference and a Naive Algorithm George A. Vouros Team 2507.06780 null
2025-07-13 Spatial-Temporal Aware Visuomotor Diffusion Policy Learning Yanwei Fu Team 2507.06710 null
2025-07-09 Value from Observations: Towards Large-Scale Imitation Learning via Self-Improvement Martin Riedmiller Team 2507.06701 null
2025-07-09 Goal-Oriented Skill Abstraction for Offline Multi-Task Reinforcement Learning Jian Cheng Team 2507.06628 null
2025-07-09 Q-STAC: Q-Guided Stein Variational Model Predictive Actor-Critic Fabio Ramos Team 2507.06625 null
2025-07-09 Token Bottleneck: One Token to Remember Dynamics Sangdoo Yun Team 2507.06543 null
2025-07-08 Learning to Evaluate Autonomous Behaviour in Human-Robot Interaction Alessio Del Bue Team 2507.06404 null
2025-07-08 EC-Flow: Enabling Versatile Robotic Manipulation from Action-Unlabeled Videos via Embodiment-Centric Flow Liang Wang Team 2507.06224 null
2025-07-08 Is Diversity All You Need for Scalable Robotic Manipulation? Hongyang Li Team 2507.06219 null
2025-07-08 Fast Bilateral Teleoperation and Imitation Learning Using Sensorless Force Control via Accurate Dynamics Model Toshiaki Tsuji Team 2507.06174 null
2025-07-08 Learning Agile Tensile Perching for Aerial Robots from Demonstrations Basaran Bahadir Kocer Team 2507.06172 null
2025-07-08 SCCRUB: Surface Cleaning Compliant Robot Utilizing Bristles Jeffrey Ian Lipton Team 2507.06053 null
2025-07-08 LeAD: The LLM Enhanced Planning System Converged with End-to-end Autonomous Driving Jian Sun Team 2507.05754 null
2025-07-08 Hybrid Diffusion Policies with Projective Geometric Algebra for Efficient Robot Manipulation Learning Daniel Rakita Team 2507.05695 null
2025-07-08 Integrating Diffusion-based Multi-task Learning with Online Reinforcement Learning for Robust Quadruped Robot Control Bin Liang Team 2507.05674 null
2025-07-08 Stable Tracking-in-the-Loop Control of Cable-Driven Surgical Manipulators under Erroneous Kinematic Chains Michael C. Yip Team 2507.05663 null
2025-07-08 DreamGrasp: Zero-Shot 3D Multi-Object Reconstruction from Partial-View Images for Robotic Manipulation Frank Chongwoo Park Team 2507.05627 null
2025-07-07 Gaussian Process-Based Active Exploration Strategies in Vision and Touch Nadia Figueroa Team 2507.05522 null
2025-07-07 A Careful Examination of Large Behavior Models for Multitask Dexterous Manipulation Russ Tedrake Team 2507.05331 null
2025-07-07 VOTE: Vision-Language-Action Optimization with Trajectory Ensemble Voting Yanzhi Wang Team 2507.05116 null
2025-07-07 When Imitation Learning Outperforms Reinforcement Learning in Surgical Action Planning Sebastien Ourselin Team 2507.05011 null
2025-07-07 Training-free Generation of Temporally Consistent Rewards from VLMs Jian Tang Team 2507.04789 null
2025-07-07 DRAE: Dynamic Retrieval-Augmented Expert Networks for Lifelong Learning and Task Adaptation in Robotics Mingsheng Shang Team 2507.04661 null
2025-07-07 PRISM: Pointcloud Reintegrated Inference via Segmentation and Cross-attention for Manipulation Chee-Meng Chew Team 2507.04633 null
2025-07-07 Learning Robust Stereo Matching in the Wild with Selective Mixture-of-Experts Junjie Hu Team 2507.04631 null
2025-07-06 VLM-TDP: VLM-guided Trajectory-conditioned Diffusion Policy for Robust Long-Horizon Manipulation Lei Han Team 2507.04524 null
2025-07-06 DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge Xin Jin Team 2507.04447 null
2025-07-06 Wavelet Policy: Lifting Scheme for Policy Learning in Long-Horizon Tasks Yi Fang Team 2507.04331 null
2025-07-05 Are Learning-Based Approaches Ready for Real-World Indoor Navigation? A Case for Imitation Learning Sebastian Houben Team 2507.04086 null
2025-07-05 Breaking Imitation Bottlenecks: Reinforced Diffusion Powers Diverse Trajectory Generation Yadan Luo Team 2507.04049 null
2025-07-08 RwoR: Generating Robot Demonstrations from Human Hand Collection for Policy Learning without Robot Hao Dong Team 2507.03930 null
2025-07-05 DK-RRT: Deep Koopman RRT for Collision-Aware Motion Planning of Space Manipulators in Dynamic Debris Environments Dezhi Yu Team 2507.03878 null
2025-07-04 Dexterous Teleoperation of 20-DoF ByteDexter Hand via Human Motion Retargeting Zeyu Ren Team 2507.03227 null
2025-07-02 cVLA: Towards Efficient Camera-Space VLAs Thomas Brox Team 2507.02190 null
2025-07-02 Towards Bio-Inspired Robotic Trajectory Planning via Self-Supervised RNN Matthias Kerzel Team 2507.02171 null
2025-07-02 TypeTele: Releasing Dexterity in Teleoperation by Dexterous Manipulation Types Wei-Shi Zheng Team 2507.01857 null
2025-07-02 S3D: A Spatial Steerable Surgical Drilling Framework for Robotic Spinal Fixation Procedures Farshid Alambeigi Team 2507.01779 null
2025-07-03 TriVLA: A Triple-System-Based Unified Vision-Language-Action Model for General Robot Control Yanwei Fu Team 2507.01424 null
2025-07-01 Search-Based Robot Motion Planning With Distance-Based Adaptive Motion Primitives Bakir Lacevic Team 2507.01198 null
2025-07-01 Imitation Learning for Satellite Attitude Control under Unknown Perturbations Xiaoli Bai Team 2507.01161 null
2025-07-01 SonoGym: High Performance Simulation for Challenging Surgical Tasks with Robotic Ultrasound Philipp Fürnstahl Team 2507.01152 null
2025-07-01 Geometry-aware 4D Video Generation for Robot Manipulation Shuran Song Team 2507.01099 null
2025-07-01 DexWrist: A Robotic Wrist for Constrained and Dynamic Manipulation Pulkit Agrawal Team 2507.01008 null
2025-07-04 Robotic Manipulation by Imitating Generated Videos Without Physical Demonstrations Yunzhu Li Team 2507.00990 null
2025-07-01 HumanoidGen: Data Generation for Bimanual Dexterous Manipulation via LLM Reasoning Chenjia Bai Team 2507.00833 null
2025-07-01 Learning Steerable Imitation Controllers from Unstructured Animal Motions Stelian Coros Team 2507.00677 null
2025-07-01 RoboEval: Where Robotic Manipulation Meets Structured and Scalable Evaluation Siddhartha Srinivasa Team 2507.00435 null
2025-07-01 Adapt Your Body: Mitigating Proprioception Shifts in Imitation Learning Yang Gao Team 2506.23944 null
2025-06-30 World4Omni: A Zero-Shot Framework from Image Generation World Model to Robotic Manipulation Lin Shao Team 2506.23919 null
2025-06-30 Advancing Learnable Multi-Agent Pathfinding Solvers with Active Fine-Tuning Alexey Skrynnik Team 2506.23793 null
2025-06-30 PAC Bench: Do Foundation Models Understand Prerequisites for Executing Manipulation Policies? Ransalu Senanayake Team 2506.23725 null
2025-07-04 ParticleFormer: A 3D Point Cloud World Model for Multi-Object, Multi-Material Robotic Manipulation Mac Schwager Team 2506.23126 null
2025-06-29 Learning Motion Skills with Adaptive Assistive Curriculum Force in Humanoid Robots Yue Gao Team 2506.23125 null
2025-06-28 Hierarchical Vision-Language Planning for Multi-Step Humanoid Manipulation Navid Azizan Team 2506.22827 null
2025-06-28 SPI-BoTER: Error Compensation for Industrial Robots via Sparse Attention Masking and Hybrid Loss with Spatial-Physical Information Yuqiang Wu Team 2506.22788 null
2025-06-28 Learning Efficient Robotic Garment Manipulation with Standardization Bin He Team 2506.22769 null
2025-06-28 RoboPearls: Editable Video Simulation for Robot Manipulation Xiaodan Liang Team 2506.22756 null
2025-06-27 Spherical Pendulum with Quad-Rotor Thrust Vectoring Actuation – A Novel Mechatronics and Control Benchmark Platform Tsu-Chin Tsao Team 2506.22410 null
2025-06-27 RoboEnvision: A Long-Horizon Video Generation Model for Multi-Task Robot Manipulation Abhinav Valada Team 2506.22007 null
2025-06-26 Experimental investigation of pose informed reinforcement learning for skid-steered visual navigation Venkat Krovi Team 2506.21732 null
2025-06-24 Ark: An Open-source Python-based Framework for Robot Learning Haitham Bou-Ammar Team 2506.21628 null
2025-06-24 FrankenBot: Brain-Morphic Modular Orchestration for Robotic Manipulation with Vision-Language Models Huiping Zhuang Team 2506.21627 null
2025-06-26 ACTLLM: Action Consistency Tuned Large Language Model Chenliang Xu Team 2506.21250 null
2025-07-02 World-aware Planning Narratives Enhance Large Vision-Language Model Planner Xipeng Qiu Team 2506.21230 null
2025-06-26 UAIbot: Beginner-friendly web-based simulator for interactive robotics learning and research Vinicius Mariano Gonçalves Team 2506.21178 null
2025-06-26 Knowledge-Driven Imitation Learning: Enabling Generalization Across Diverse Conditions Cewu Lu Team 2506.21057 null
2025-06-26 Parallels Between VLA Model Post-Training and Human Motor Learning: Progress, Challenges, and Trends Zeng-Guang Hou Team 2506.20966 null
2025-06-25 Learning-Based Distance Estimation for 360° Single-Sensor Setups Andreas Zell Team 2506.20586 null
2025-06-25 Learn to Position – A Novel Meta Method for Robotic Positioning Xiaoming Tao Team 2506.20445 null
2025-06-25 Beyond-Expert Performance with Limited Demonstrations: Efficient Imitation Learning with Double Exploration Quanquan Gu Team 2506.20307 null
2025-06-24 Unified Vision-Language-Action Model Zhaoxiang Zhang Team 2506.19850 null
2025-06-24 T-Rex: Task-Adaptive Spatial Representation Extraction for Robotic Manipulation with Vision-Language Models Qingyao Wu Team 2506.19498 null
2025-06-24 Is an object-centric representation beneficial for robotic manipulation? Liming Chen Team 2506.19408 null
2025-06-24 Robotic Perception with a Large Tactile-Vision-Language Model for Physical Property Inference Nutan Chen Team 2506.19303 null
2025-06-25 AnchorDP3: 3D Affordance Guided Sparse Diffusion Policy for Robotic Manipulation Hui Shen Team 2506.19269 null
2025-06-24 Robust Behavior Cloning Via Global Lipschitz Regularization Sean B. Andersson Team 2506.19250 null
2025-06-23 CUPID: Curating Data your Robot Loves with Influence Functions Jeannette Bohg Team 2506.19121 null
2025-06-23 Multimodal Anomaly Detection with a Mixture-of-Experts Dongheui Lee Team 2506.19077 null
2025-06-25 FORTE: Tactile Force and Slip Sensing on Compliant Fingers for Delicate Manipulation Lillian Chin Team 2506.18960 null
2025-06-23 RAG-6DPose: Retrieval-Augmented 6D Pose Estimation via Leveraging CAD as Knowledge Base Xiangyang Xue Team 2506.18856 null
2025-06-23 SViP: Sequencing Bimanual Visuomotor Policies with Object-Centric Motion Primitives Jia Pan Team 2506.18825 null
2025-06-23 Learning Point Correspondences In Radar 3D Point Clouds For Radar-Inertial Odometry Jan Steinbrener Team 2506.18580 null
2025-06-23 Robots and Children that Learn Together: Improving Knowledge Retention by Teaching Peer-Like Interactive Robots Alessandro Di Nuovo Team 2506.18365 null
2025-06-23 Robotic Manipulation of a Rotating Chain with Bottom End Fixed Quang-Cuong Pham Team 2506.18355 null
2025-06-23 Sharpening the Spear: Adaptive Expert-Guided Adversarial Attack Against DRL-based Autonomous Driving Policies Xiaolin Chang Team 2506.18304 null
2025-06-23 Learning Approach to Efficient Vision-based Active Tracking of a Flying Target by an Unmanned Aerial Vehicle Souma Chowdhury Team 2506.18264 null
2025-06-22 RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation Yao Mu Team 2506.18088 null
2025-06-21 RLRC: Reinforcement Learning-based Recovery for Compressed Vision-Language-Action Models Xiao Li Team 2506.17639 null
2025-06-21 Imitation Learning for Active Neck Motion Enabling Robot Manipulation beyond the Field of View Yasuo Kuniyoshi Team 2506.17624 null
2025-06-20 Kinematic Model Optimization via Differentiable Contact Manifold for In-Space Manipulation Satyandra K. Gupta Team 2506.17458 null
2025-06-20 Monocular One-Shot Metric-Depth Alignment for RGB-Based Robot Grasping Jingjin Yu Team 2506.17110 null
2025-06-24 Learning Accurate Whole-body Throwing with High-frequency Residual Policy and Pullback Tube Acceleration Marco Hutter Team 2506.16986 null
2025-06-20 Compliant Residual DAgger: Improving Real-World Contact-Rich Manipulation with Human Corrections Shuran Song Team 2506.16685 null
2025-06-19 CodeDiffuser: Attention-Enhanced Diffusion Policy via VLM-Generated Code for Instruction Ambiguity Yunzhu Li Team 2506.16652 null
2025-06-19 Reimagination with Test-time Observation Interventions: Distractor-Robust World Model Predictions for Visual Model Predictive Control Ran Tian Team 2506.16565 null
2025-06-19 An Optimization-Augmented Control Framework for Single and Coordinated Multi-Arm Robotic Manipulation Ozgur S. Oguz Team 2506.16555 null
2025-06-19 Human2LocoMan: Learning Versatile Quadrupedal Manipulation with Human Pretraining Ding Zhao Team 2506.16475 null
2025-06-19 GoalLadder: Incremental Goal Discovery with Vision-Language Models Shimon Whiteson Team 2506.16396 null
2025-06-19 CapsDT: Diffusion-Transformer for Capsule Robot Manipulation Hongliang Ren Team 2506.16263 null
2025-06-19 ControlVLA: Few-shot Object-centric Adaptation for Pre-trained Vision-Language-Action Models Siyuan Huang Team 2506.16211 null
2025-06-19 FlowRAM: Grounding Flow Matching Policy with Region-Aware Mamba Framework for Robotic Manipulation Wei Tang Team 2506.16201 null
2025-06-19 ViTacFormer: Learning Cross-Modal Representation for Visuo-Tactile Dexterous Manipulation Jitendra Malik Team 2506.15953 null
2025-06-18 Learning from Planned Data to Improve Robotic Pick-and-Place Planning Efficiency Kensuke Harada Team 2506.15920 null
2025-06-18 Improving Robotic Manipulation: Techniques for Object Pose Estimation, Accommodating Positional Uncertainty, and Disassembly Tasks from Examples Viral Rasik Galaiya Team 2506.15865 null
2025-06-18 Vision in Action: Learning Active Perception from Human Demonstrations Shuran Song Team 2506.15666 null
2025-06-18 Learning Task-Agnostic Skill Bases to Uncover Motor Primitives in Animal Behaviors Anqi Wu Team 2506.15190 null
2025-06-18 Robust Instant Policy: Leveraging Student’s t-Regression Model for Robust In-context Imitation Learning of Robot Manipulation Yukiyasu Domae Team 2506.15157 null
2025-06-18 TACT: Humanoid Whole-body Contact Manipulation through Deep Imitation Learning with Tactile Modality Eiichi Yoshida Team 2506.15146 null
2025-06-17 RobotSmith: Generative Robotic Tool Design for Acquisition of Complex Manipulation Skills Chuang Gan Team 2506.14763 null
2025-06-17 Tactile Beyond Pixels: Multisensory Touch Representations for Robot Manipulation Mustafa Mukadam Team 2506.14754 null
2025-06-17 SENIOR: Efficient Query Selection and Preference-Guided Exploration in Preference-based Reinforcement Learning Shuo Wang Team 2506.14648 null
2025-06-17 Latent Action Diffusion for Cross-Embodiment Manipulation Robert K. Katzschmann Team 2506.14608 null
2025-06-19 ClutterDexGrasp: A Sim-to-Real System for General Dexterous Grasping in Cluttered Scenes Hao Dong Team 2506.14317 null
2025-06-17 Steering Robots with Inference-Time Interactions Yanwei Wang Team 2506.14287 null
2025-06-17 AMPLIFY: Actionless Motion Priors for Robot Learning from Videos Animesh Garg Team 2506.14198 null
2025-06-17 Non-Overlap-Aware Egocentric Pose Estimation for Collaborative Perception in Connected Autonomy Peng Gao Team 2506.14180 null
2025-06-17 GAF: Gaussian Action Field as a Dynamic World Model for Robotic Manipulation Yebin Liu Team 2506.14135 null
2025-06-16 ATK: Automatic Task-driven Keypoint Selection for Robust Policy Learning Abhishek Gupta Team 2506.13867 null
2025-06-16 Touch begins where vision ends: Generalizable policies for contact-rich manipulation Raunaq Bhirangi Team 2506.13762 null
2025-06-16 Prompting with the Future: Open-World Model Predictive Control with Interactive Digital Twins Wei-Chiu Ma Team 2506.13761 null
2025-06-16 What Matters in Learning from Large-Scale Datasets for Robot Manipulation Danfei Xu Team 2506.13536 null
2025-06-16 A Survey on Imitation Learning for Contact-Rich Tasks in Robotics Arash Ajoudani Team 2506.13498 null
2025-06-16 Learning Swing-up Maneuvers for a Suspended Aerial Manipulation Platform in a Hierarchical Control Framework Christian Ott Team 2506.13478 null
2025-06-16 VLM-SFD: VLM-Assisted Siamese Flow Diffusion Framework for Dual-Arm Cooperative Manipulation Wei Pan Team 2506.13428 null
2025-06-15 SP-VLA: A Joint Model Scheduling and Token Pruning Approach for VLA Model Acceleration Wenwu Zhu Team 2506.12723 null
2025-06-15 Adapting by Analogy: OOD Generalization of Visuomotor Policies via Functional Correspondence Andrea Bajcsy Team 2506.12678 null
2025-06-15 Goal-based Self-Adaptive Generative Adversarial Imitation Learning (Goal-SAGAIL) for Multi-goal Robotic Manipulation Tasks George Vogiatzis Team 2506.12676 null
2025-06-14 AntiGrounding: Lifting Robotic Actions into VLM Representation Space for Decision Making Qingyao Wu Team 2506.12374 null
2025-06-13 Role of Uncertainty in Model Development and Control Design for a Manufacturing Process Francis Assadian Team 2506.12273 null
2025-06-13 SAIL: Faster-than-Demonstration Execution of Imitation Learning Policies Danfei Xu Team 2506.11948 null
2025-06-13 mimic-one: a Scalable Model Recipe for General Purpose Robot Dexterity Robert K. Katzschmann Team 2506.11916 null
2025-06-13 ExoStart: Efficient learning for dexterous manipulation with sensorized exoskeleton demonstrations Maria Bauza Villalonga Team 2506.11775 null
2025-06-13 Control Architecture and Design for a Multi-robotic Visual Servoing System in Automated Manufacturing Environment Rongfei Li Team 2506.11387 null
2025-06-12 Influence Functions for Data Attribution in Linear System Identification and LQR Control Dongmei Chen Team 2506.11293 null
2025-06-12 Gondola: Grounded Vision Language Planning for Generalizable Robotic Manipulation Cordelia Schmid Team 2506.11261 null
2025-06-12 Eye, Robot: Learning to Look to Act with a BC-RL Perception-Action Loop Angjoo Kanazawa Team 2506.10968 null
2025-06-12 GENMANIP: LLM-driven Simulation for Generalizable Instruction-Following Manipulation Jiangmiao Pang Team 2506.10966 null
2025-06-12 Human-Robot Navigation using Event-based Cameras and Reinforcement Learning Rodrigo Verschae Team 2506.10790 null
2025-06-12 Demonstrating Multi-Suction Item Picking at Scale via Multi-Modal Learning of Pick Success Kapil Katyal Team 2506.10359 null
2025-06-11 Innovative Adaptive Imaged Based Visual Servoing Control of 6 DoFs Industrial Robot Manipulators Francis Assadian Team 2506.10240 null
2025-06-11 One For All: LLM-based Heterogeneous Mission Planning in Precision Agriculture Stefano Carpin Team 2506.10106 null
2025-06-11 eFlesh: Highly customizable Magnetic Touch Sensing using Cut-Cell Microstructures Raunaq Bhirangi Team 2506.09994 null
2025-06-11 Chain-of-Action: Trajectory Autoregressive Modeling for Robotic Manipulation Xiao Ma Team 2506.09990 null
2025-06-11 From Intention to Execution: Probing the Generalization Boundaries of Vision-Language-Action Models Chen Feng Team 2506.09930 null
2025-06-11 Reinforced Refinement with Self-Aware Expansion for End-to-End Autonomous Driving Chen Lv Team 2506.09800 null
2025-06-11 CHIP: A multi-sensor dataset for 6D pose estimation of chairs in industrial settings Davide Boscaini Team 2506.09699 null
2025-06-11 Advances on Affordable Hardware Platforms for Human Demonstration Acquisition in Agricultural Applications Néstor García Team 2506.09494 null
2025-06-11 DCIRNet: Depth Completion with Iterative Refinement for Dexterous Grasping of Transparent and Reflective Objects Hong Liu Team 2506.09491 null
2025-06-11 Time-Unified Diffusion Policy with Action Discrimination for Robotic Manipulation Le Wang Team 2506.09422 null
2025-06-11 Analyzing Key Objectives in Human-to-Robot Retargeting for Dexterous Manipulation Xiang Li Team 2506.09384 null
2025-06-11 ContextBuddy: AI-Enhanced Contextual Insights for Security Alert Investigation (Applied to Intrusion Detection) Cecile Paris Team 2506.09365 null
2025-06-10 UAD: Unsupervised Affordance Distillation for Generalization in Robotic Manipulation Li Fei-Fei Team 2506.09284 null
2025-06-10 Robot-Gated Interactive Imitation Learning with Adaptive Intervention Mechanism Bolei Zhou Team 2506.09176 null
2025-06-10 FreqPolicy: Efficient Flow-based Visuomotor Policy via Frequency Consistency Jian Tang Team 2506.08822 null
2025-06-10 Towards Biosignals-Free Autonomous Prosthetic Hand Control via Imitation Learning Xianta Jiang Team 2506.08795 null
2025-06-10 Bayesian Inverse Physics for Neuro-Symbolic Robot Learning Frank Kirchner Team 2506.08756 null
2025-06-10 Deep Reinforcement Learning-Based Motion Planning and PDE Control for Flexible Manipulators Jouni Mattila Team 2506.08639 null
2025-06-10 RoboSwap: A GAN-driven Video Diffusion Framework For Unsupervised Robot Arm Swapping Gitta Kutyniok Team 2506.08632 null
2025-06-10 Periodic Bipedal Gait Learning Using Reward Composition Based on a Novel Gait Planner for Humanoid Robots Lijun Zhu Team 2506.08416 null
2025-06-11 HiBerNAC: Hierarchical Brain-emulated Robotic Neural Agent Collective for Disentangling Complex Manipulation Cong Wang Team 2506.08296 null
2025-06-09 ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving Xinggang Wang Team 2506.08052 null
2025-06-09 BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models Tieniu Tan Team 2506.07961 null
2025-06-09 BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation Xilin Chen Team 2506.07530 null
2025-06-09 Reinforcement Learning via Implicit Imitation Guidance Chelsea Finn Team 2506.07505 null
2025-06-09 RAPID Hand: A Robust, Affordable, Perception-Integrated, Dexterous Manipulation Platform for Generalist Robot Autonomy Hui Cheng Team 2506.07490 null
2025-06-08 CARoL: Context-aware Adaptation for Robot Learning Xuan Wang Team 2506.07006 null
2025-06-07 SpikePingpong: High-Frequency Spike Vision-based Robot Learning for Precise Striking in Table Tennis Game Shanghang Zhang Team 2506.06690 null
2025-06-07 RoboCerebra: A Large-scale Benchmark for Long-horizon Robotic Manipulation Evaluation Si Liu Team 2506.06677 null
2025-06-07 Self-Adapting Improvement Loops for Robotic Learning Chen Sun Team 2506.06658 null
2025-06-06 Enhancing Robot Safety via MLLM-Based Semantic Interpretation of Failure Data Somil Bansal Team 2506.06570 null
2025-06-06 NeSyPack: A Neuro-Symbolic Framework for Bimanual Logistics Packing Changliu Liu Team 2506.06567 null
2025-06-06 MapleGrasp: Mask-guided Feature Pooling for Language-driven Efficient Robotic Grasping Farshad Khorrami Team 2506.06535 null
2025-06-06 3DFlowAction: Learning Cross-Embodiment Manipulation from 3D Flow World Model Mingkui Tan Team 2506.06199 null
2025-06-06 Bridging Perception and Action: Spatially-Grounded Mid-Level Representations for Robot Generalization Tingnan Zhang Team 2506.06196 null
2025-06-10 BEAST: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning Rudolf Lioutikov Team 2506.06072 null
2025-06-06 Dynamic Mixture of Progressive Parameter-Efficient Expert Library for Lifelong Robot Learning Ping Luo Team 2506.05985 null
2025-06-06 Optimal Robotic Velcro Peeling with Force Feedback Volkan Isler Team 2506.05812 null
2025-06-06 Where Do We Look When We Teach? Analyzing Human Gaze Behavior Across Demonstration Devices in Robot Imitation Learning Hiroshi Bito Team 2506.05808 null
2025-06-06 FlowOE: Imitation Learning with Flow Policy from Ensemble RL Experts for Optimal Execution under Heston Volatility and Concave Market Impacts Zhi Chen Team 2506.05755 null
2025-06-06 You Only Estimate Once: Unified, One-stage, Real-Time Category-level Articulated Object 6D Pose Estimation for Robotic Grasping Xiangyang Xue Team 2506.05719 null
2025-06-05 A Smooth Sea Never Made a Skilled $\texttt{SAILOR}$: Robust Imitation via Learning to Search Gokul Swamy Team 2506.05294 null
2025-06-05 LiPo: A Lightweight Post-optimization Framework for Smoothing Action Chunks Generated by Learned Policies Suhan Park Team 2506.05165 null
2025-06-05 DemoSpeedup: Accelerating Visuomotor Policies via Entropy-Guided Demonstration Acceleration Huazhe Xu Team 2506.05064 null
2025-06-06 ArtVIP: Articulated Digital Assets of Visual Realism, Modular Interaction, and Physical Fidelity for Robot Learning Jian Tang Team 2506.04941 null
2025-06-05 Learning dissection trajectories from expert surgical videos via imitation learning with equivariant diffusion Qi Dou Team 2506.04716 null
2025-06-05 Advancing Tool-Augmented Large Language Models via Meta-Verification and Reflection Learning Wanxiang Che Team 2506.04625 null
2025-06-04 SGN-CIRL: Scene Graph-based Navigation with Curriculum, Imitation, and Reinforcement Learning Aleksandr Panov Team 2506.04505 null
2025-06-04 Object-centric 3D Motion Field for Robot Learning from Human Videos Pieter Abbeel Team 2506.04227 null
2025-06-04 Splatting Physical Scenes: End-to-End Real-to-Sim from Imperfect Robot Data Leonard Hasenclever Team 2506.04120 null
2025-06-04 STAR: Learning Diverse Robot Skill Abstractions through Rotation-Augmented Vector Quantization Liqiang Nie Team 2506.03863 link
2025-06-04 SwitchVLA: Execution-Aware Task Switching for Vision-Language-Action Models Jian Tang Team 2506.03574 null
2025-06-05 Confidence-Guided Human-AI Collaboration: Reinforcement Learning with Distributional Proxy Value Propagation for Autonomous Driving Hu Chuan Team 2506.03568 link
2025-06-03 ORV: 4D Occupancy-centric Robot Video Generation Hao Zhao Team 2506.03079 null
2025-06-03 Geometric Visual Servo Via Optimal Transport Ashutosh Tiwari Team 2506.02768 null
2025-06-03 Rodrigues Network for Learning Robot Actions Leonidas Guibas Team 2506.02618 null
2025-06-03 Reachability Weighted Offline Goal-conditioned Resampling Joni Pajarinen Team 2506.02577 null
2025-06-02 Fast-in-Slow: A Dual-System Foundation Model Unifying Fast Manipulation within Slow Reasoning Pheng-Ann Heng Team 2506.01953 null
2025-06-02 Feel the Force: Contact-Driven Learning from Humans Lerrel Pinto Team 2506.01944 null
2025-06-02 Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control Dahua Lin Team 2506.01943 null
2025-06-02 FreeTacMan: Robot-free Visuo-Tactile Data Collection System for Contact-rich Manipulation Hongyang Li Team 2506.01941 null
2025-06-02 Learning with pyCub: A New Simulation and Exercise Framework for Humanoid Robotics Matej Hoffmann Team 2506.01756 null
2025-06-02 Reasoning-Table: Exploring Reinforcement Learning for Table Reasoning Kang Liu Team 2506.01710 link
2025-06-02 WoMAP: World Models For Embodied Open-Vocabulary Object Localization Anirudha Majumdar Team 2506.01600 null
2025-06-02 FreqPolicy: Frequency Autoregressive Visuomotor Policy with Continuous Tokens Yuexin Ma Team 2506.01583 null
2025-06-02 Trajectory First: A Curriculum for Discovering Diverse Policies Marc Toussaint Team 2506.01568 null
2025-06-02 Variational Adaptive Noise and Dropout towards Stable Recurrent Neural Networks Shingo Murata Team 2506.01350 null
2025-06-01 OG-VLA: 3D-Aware Vision Language Action Model via Orthographic Image Generation Valts Blukis Team 2506.01196 null
2025-06-01 HoMeR: Learning In-the-Wild Mobile Manipulation via Hybrid Imitation and Whole-Body Control Jeannette Bohg Team 2506.01185 null
2025-06-01 Jailbreak-R1: Exploring the Jailbreak Capabilities of LLMs via Reinforcement Learning Jing Li Team 2506.00782 null
2025-05-31 XYZ-IBD: High-precision Bin-picking Dataset for Object 6D Pose Estimation Capturing Real-world Industrial Complexity Benjamin Busam Team 2506.00599 null
2025-05-31 Dyna-Think: Synergizing Reasoning, Acting, and World Model Simulation in AI Agents Zhou Yu Team 2506.00320 null
2025-05-30 3D Gaussian Splat Vulnerabilities Polo Chau Team 2506.00280 null
2025-05-30 Bi-Manual Joint Camera Calibration and Scene Representation Weiming Zhi Team 2505.24819 null
2025-05-30 MagicGripper: A Multimodal Sensor-Integrated Gripper for Contact-Rich Robotic Manipulation Dandan Zhang Team 2505.24382 null
2025-05-30 Imitation Learning-Based Path Generation for the Complex Assembly of Deformable Objects Christoffer Sloth Team 2505.24339 null
2025-05-30 SR3D: Unleashing Single-view 3D Reconstruction for Transparent and Specular Object Grasping Hao Dong Team 2505.24305 null
2025-05-30 Safety-Aware Robust Model Predictive Control for Robotic Arms in Dynamic Environments Suwoong Lee Team 2505.24209 null
2025-05-30 Learning Gentle Humanoid Locomotion and End-Effector Stabilization Control Guanya Shi Team 2505.24198 null
2025-05-29 Mobi-$π$: Mobilizing Your Robot Learning Policy Jeannette Bohg Team 2505.23692 null
2025-05-30 Normalizing Flows are Capable Models for RL Benjamin Eysenbach Team 2505.23527 null
2025-05-29 Optimization-based Posture Generation for Whole-body Contact Motion by Contact Point Search on the Body Surface Masayuki Inaba Team 2505.23501 null
2025-05-29 Agentic Robot: A Brain-Inspired Framework for Vision-Language-Action Models in Embodied Agents Lichao Sun Team 2505.23450 null
2025-05-29 Enhanced DACER Algorithm with High Diffusion Efficiency Shengbo Eben Li Team 2505.23426 null
2025-05-29 RoboTransfer: Geometry-Consistent Video Diffusion for Robotic Visual Policy Transfer Zhizhong Su Team 2505.23171 null
2025-05-28 SCIZOR: A Self-Supervised Approach to Data Curation for Large-Scale Imitation Learning Yuke Zhu Team 2505.22626 null
2025-05-28 Hybrid Learning for Cold-Start-Aware Microservice Scheduling in Dynamic Edge Environments Weijia Jia Team 2505.22424 link
2025-05-28 Efficient Precision-Scalable Hardware for Microscaling (MX) Processing in Robotics Learning Marian Verhelst Team 2505.22404 null
2025-05-28 State and Input Constrained Adaptive Tracking Control of Uncertain Euler-Lagrange Systems with Robustness and Feasibility Analysis Shubhendu Bhasin Team 2505.22352 null
2025-05-28 ForceVLA: Enhancing VLA Models with a Force-aware MoE for Contact-rich Manipulation Wenqiang Zhang Team 2505.22159 null
2025-05-28 Learning Compositional Behaviors from Demonstration and Language Jiajun Wu Team 2505.21981 null
2025-05-29 ChatVLA-2: Vision-Language-Action Model with Open-World Embodied Reasoning from Pretrained Knowledge Yi Xu Team 2505.21906 null
2025-05-28 Streaming Flow Policy: Simplifying diffusion/flow-matching policies by treating action trajectories as flow trajectories Siddharth Ancha Team 2505.21851 null
2025-05-27 PartInstruct: Part-level Instruction Following for Fine-grained Robot Manipulation Tianmin Shu Team 2505.21652 null
2025-05-30 Right Side Up? Disentangling Orientation Understanding in MLLMs with Fine-grained Multi-axis Perception Tasks Bryan A. Plummer Team 2505.21649 null
2025-05-27 CLAMP: Crowdsourcing a LArge-scale in-the-wild haptic dataset with an open-source device for Multimodal robot Perception Tapomayukh Bhattacharjee Team 2505.21495 null
2025-05-27 EquAct: An SE(3)-Equivariant Multi-Task Transformer for Open-Loop Robotic Manipulation Robert Platt Team 2505.21351 null
2025-05-27 EgoWalk: A Multimodal Dataset for Robot Navigation in the Wild Gonzalo Ferrer Team 2505.21282 null
2025-05-27 Learning What to Do and What Not To Do: Offline Imitation from Expert and Undesirable Demonstrations Tanvi Verma Team 2505.21182 null
2025-05-27 Object-Centric Action-Enhanced Representations for Robot Visuo-Motor Policy Learning George Retsinas Team 2505.20962 null
2025-05-27 Learning Unified Force and Position Control for Legged Loco-Manipulation Siyuan Huang Team 2505.20829 null
2025-05-27 Spatial RoboGrasp: Generalized Robotic Grasping Control Policy Luhui Hu Team 2505.20814 null
2025-05-27 Learning Generalizable Robot Policy with Human Demonstration Video as a Prompt Jianyu Chen Team 2505.20795 null
2025-05-28 ControlTac: Force- and Position-Controlled Tactile Data Augmentation with a Single Reference Image Ruohan Gao Team 2505.20498 null
2025-05-26 OSVI-WM: One-Shot Visual Imitation for Unseen Tasks using World-Model-Guided Trajectory Generation Farshad Khorrami Team 2505.20425 null
2025-05-26 Co-Design of Soft Gripper with Neural Physics Xiaolong Wang Team 2505.20404 null
2025-05-26 EgoZero: Robot Learning from Smart Glasses Lerrel Pinto Team 2505.20290 null
2025-05-26 URPlanner: A Universal Paradigm For Collision-Free Robotic Motion Planning Based on Deep Reinforcement Learning Marcelo H. Ang Jr Team 2505.20175 null
2025-05-27 MineAnyBuild: Benchmarking Spatial Planning for Open-world AI Agents Xiaodan Liang Team 2505.20148 link
2025-05-26 ReasonPlan: Unified Scene Prediction and Decision Reasoning for Closed-loop Autonomous Driving Dongbin Zhao Team 2505.20024 link
2025-05-26 Inverse Q-Learning Done Right: Offline Imitation Learning in $Q^π$-Realizable MDPs Luca Viano Team 2505.19946 null
2025-05-26 TeViR: Text-to-Video Reward with Diffusion Models for Efficient Reinforcement Learning Dongbin Zhao Team 2505.19769 null
2025-05-26 Extremum Flow Matching for Offline Goal Conditioned Reinforcement Learning Jean-Baptiste Mouret Team 2505.19717 null
2025-05-25 Structured Reinforcement Learning for Combinatorial Decision-Making Maximilian Schiffer Team 2505.19053 link
2025-05-25 WorldEval: World Model as Real-World Robot Policies Evaluator Yi Xu Team 2505.19017 null
2025-05-25 Online Knowledge Distillation with Reward Guidance Chen Jia Team 2505.18952 null
2025-05-24 Guided by Guardrails: Control Barrier Functions as Safety Instructors for Robotic Learning Giovanni Beltrame Team 2505.18858 null
2025-05-24 On the Dual-Use Dilemma in Physical Reasoning and Force Nikolaus Correll Team 2505.18792 null
2025-05-24 VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning Ziwei Wang Team 2505.18719 null
2025-05-24 MisoDICE: Multi-Agent Imitation from Unlabeled Mixed-Quality Demonstrations Hong Thanh Nguyen Team 2505.18595 null
2025-05-24 Grounding Bodily Awareness in Visual Representations for Efficient Policy Learning Zhiyun Lin Team 2505.18487 null
2025-05-24 Canonical Policy: Learning Canonical 3D Representation for Equivariant Policy Yu She Team 2505.18474 null
2025-05-24 ManiFeel: Benchmarking and Understanding Visuotactile Manipulation Policy Learning Yu She Team 2505.18472 null
2025-05-23 ProgRM: Build Better GUI Agents with Progress Rewards Kai Yu Team 2505.18121 null
2025-05-23 Classification of assembly tasks combining multiple primitive actions using Transformers and xLSTMs Pedro Neto Team 2505.18012 null
2025-05-23 Is Single-View Mesh Reconstruction Ready for Robotics? Ingmar Posner Team 2505.17966 null
2025-05-23 SynRES: Towards Referring Expression Segmentation in the Wild via Synthetic Data Donghyun Kim Team 2505.17695 null
2025-05-23 Learning Equilibria from Data: Provably Efficient Multi-Agent Imitation Learning Giorgia Ramponi Team 2505.17610 null
2025-05-23 Dynamic Manipulation of Deformable Objects in 3D: Simulation, Benchmark and Learning Strategy Bin Zhao Team 2505.17434 null
2025-05-23 Bootstrapping Imitation Learning for Long-horizon Manipulation via Hierarchical Data Collection Space Hui Cheng Team 2505.17389 null
2025-05-22 ScanBot: Towards Intelligent Surface Scanning in Embodied Robotic Systems Farhad Imani Team 2505.17295 null
2025-05-22 CoMo: Learning Continuous Latent Motion from Internet Videos for Scalable Robot Learning Limin Wang Team 2505.17006 null
2025-05-22 3D Equivariant Visuomotor Policy Learning via Spherical Projection Robin Walters Team 2505.16969 null
2025-05-22 Efficient Online RL Fine Tuning with Offline Pre-trained Policy Only Donglin Wang Team 2505.16856 null
2025-05-22 Find the Fruit: Designing a Zero-Shot Sim2Real Deep RL Planner for Occlusion Aware Plant Manipulation Soumik Sarkar Team 2505.16547 null
2025-05-24 ManipLVM-R1: Reinforcement Learning for Reasoning in Embodied Manipulation with Large Vision-Language Models Xiuying Chen Team 2505.16517 null
2025-05-22 Raw2Drive: Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2) Junchi Yan Team 2505.16394 null
2025-05-22 TacCompress: A Benchmark for Multi-Point Tactile Data Compression in Dexterous Manipulation Hengdi Zhang Team 2505.16289 null
2025-05-22 SEM: Enhancing Spatial Understanding for Robust Robot Manipulation Zhizhong Su Team 2505.16196 null
2025-05-22 Tactile-based Reinforcement Learning for Adaptive Grasping under Observation Uncertainties Yang Ye Team 2505.16167 null
2025-05-21 WaveTouch: Active Tactile Sensing Using Vibro-Feedback for Classification of Variable Stiffness and Infill Density Objects Bakhtiyar Orazbayev Team 2505.16062 null
2025-05-25 Proactive Hierarchical Control Barrier Function-Based Safety Prioritization in Close Human-Robot Interaction Scenarios Prashanth Krishnamurthy Team 2505.16055 null
2025-05-21 UAV-Flow Colosseo: A Real-World Benchmark for Flying-on-a-Word UAV Imitation Learning Si Liu Team 2505.15725 null
2025-05-21 Exploring the Limits of Vision-Language-Action Manipulations in Cross-task Generalization Junwei Liang Team 2505.15660 null
2025-05-21 FLARE: Robot Learning with Implicit World Modeling Linxi Fan Team 2505.15659 null
2025-05-21 Robo2VLM: Visual Question Answering from Large-Scale In-the-Wild Robot Manipulation Datasets Ken Goldberg Team 2505.15517 null
2025-05-21 Guided Policy Optimization under Partial Observability Zongqing Lu Team 2505.15418 link
2025-05-21 Saliency-Aware Quantized Imitation Learning for Efficient Robotic Control Jungwook Choi Team 2505.15304 null
2025-05-21 Learning-based Autonomous Oversteer Control and Collision Avoidance Seung-Hyun Kong Team 2505.15275 null
2025-05-21 Filtering Learning Histories Enhances In-Context Reinforcement Learning Santiago Paternain Team 2505.15143 null
2025-05-21 Object-Focus Actor for Data-efficient Robot Generalization Dexterous Manipulation Xiaodong He Team 2505.15098 null
2025-05-20 RoboCulture: A Robotics Platform for Automated Biological Experimentation Milica Radisic Team 2505.14941 null
2025-05-20 Imitation Learning via Focused Satisficing Brian Ziebart Team 2505.14820 null
2025-05-20 DORA: Object Affordance-Guided Reinforcement Learning for Dexterous Robotic Manipulation Jianwei Zhang Team 2505.14819 null
2025-05-20 Vid2World: Crafting Video Diffusion Models to Interactive World Models Mingsheng Long Team 2505.14357 null
2025-05-20 AutoBio: A Simulation and Benchmark for Robotic Automation in Digital Biology Laboratory Ping Luo Team 2505.14030 null
2025-05-20 RLVR-World: Training World Models with Reinforcement Learning Mingsheng Long Team 2505.13934 link
2025-05-20 Time Reversal Symmetry for Efficient Robotic Manipulations in Deep Reinforcement Learning Yutong Ban Team 2505.13925 null
2025-05-20 Learning to Insert for Constructive Neural Vehicle Routing Solver Qingfu Zhang Team 2505.13904 null
2025-05-20 Structured Agent Distillation for Large Language Model Yanzhi Wang Team 2505.13820 null
2025-05-21 Adaptive Diffusion Constrained Sampling for Bimanual Robot Manipulation Georgia Chalvatzaki Team 2505.13667 null
2025-05-19 TD-GRPC: Temporal Difference Learning with Group Relative Policy Constraint for Humanoid Locomotion Minh Nhat Vu Team 2505.13549 null
2025-05-19 GraspMolmo: Generalizable Task-Oriented Grasping via Large-Scale Synthetic Data Generation Rose Hendrix Team 2505.13441 null
2025-05-19 KinTwin: Imitation Learning with Torque and Muscle Driven Biomechanical Models Enables Precise Replication of Able-Bodied and Impaired Movement from Markerless Motion Capture R. James Cotton Team 2505.13436 null
2025-05-19 TeleOpBench: A Simulator-Centric Benchmark for Dual-Arm Dexterous Teleoperation Jiangmiao Pang Team 2505.12748 null
2025-05-19 Incentivizing Multimodal Reasoning in Large Models for Direct Robot Manipulation Chi-Wing Fu Team 2505.12744 null
2025-05-19 Option-aware Temporally Abstracted Value for Offline Goal-Conditioned Reinforcement Learning Taesup Moon Team 2505.12737 null
2025-05-19 DreamGen: Unlocking Generalization in Robot Learning through Neural Trajectories Linxi Fan Team 2505.12705 null
2025-05-19 Dribble Master: Learning Agile Humanoid Dribbling Through Legged Locomotion Qi Wu Team 2505.12679 null
2025-05-19 HIL: Hybrid Imitation Learning of Diverse Parkour Skills from Videos Xue Bin Peng Team 2505.12619 null
2025-05-18 MTIL: Encoding Full History with Mamba for Temporal Imitation Learning Zhouping Yin Team 2505.12410 link
2025-05-18 PartDexTOG: Generating Dexterous Task-Oriented Grasping via Language-driven Part Analysis Zhipong Cai Team 2505.12294 null
2025-05-20 RoboFAC: A Comprehensive Framework for Robotic Failure Analysis and Correction Bo Zhao Team 2505.12224 null
2025-05-20 Learning Impact-Rich Rotational Maneuvers via Centroidal Velocity Rewards and Sim-to-Real Techniques: A One-Leg Hopper Flip Case Study Hae-Won Park Team 2505.12222 null
2025-05-17 L2D2: Robot Learning from 2D Drawings Dylan P. Losey Team 2505.12072 null
2025-05-17 H2R: A Human-to-Robot Data Augmentation for Robot Pre-training from Videos Shanghang Zhang Team 2505.11920 null
2025-05-17 GLOVER++: Unleashing the Potential of Affordance Learning from Human Behaviors for Robotic Manipulation Junwei Liang Team 2505.11865 null
2025-05-17 Learning IMU Bias with Diffusion Model Guoquan Huang Team 2505.11763 null
2025-05-16 Zero-Shot Visual Generalization in Robot Manipulation Gaurav Sukhatme Team 2505.11719 null
2025-05-16 Employing Laban Shape for Generating Emotionally and Functionally Expressive Trajectories in Robotic Manipulators Alessandro Roncone Team 2505.11716 null
2025-05-16 EgoDex: Learning Dexterous Manipulation from Large-Scale Egocentric Video Jian Zhang Team 2505.11709 null
2025-05-16 Grounded Task Axes: Zero-Shot Semantic Skill Generalization via Task-Axis Controllers and Visual Foundation Models Oliver Kroemer Team 2505.11680 null
2025-05-16 SHIELD: Safety on Humanoids via CBFs In Expectation on Learned Dynamics Aaron D. Ames Team 2505.11494 null
2025-05-16 Exploiting Radiance Fields for Grasp Generation on Novel Synthetic Views Todor Stoyanov Team 2505.11467 null
2025-05-16 ReWiND: Language-Guided Rewards Teach Robot Policies without New Demonstrations Jesse Zhang Team 2505.10911 null
2025-05-16 Counterfactual Behavior Cloning: Offline Imitation Learning from Imperfect Human Demonstrations Dylan P. Losey Team 2505.10760 null
2025-05-15 Infinigen-Sim: Procedural Generation of Articulated Simulation Assets Jia Deng Team 2505.10755 null
2025-05-15 Knowledge capture, adaptation and composition (KCAC): A framework for cross-task curriculum learning in robotic manipulation Yan Jin Team 2505.10522 null
2025-05-15 IN-RIL: Interleaved Reinforcement and Imitation Learning for Policy Fine-Tuning Junshan Zhang Team 2505.10442 null
2025-05-15 NVSPolicy: Adaptive Novel-View Synthesis for Generalizable Language-Conditioned Policy Learning Chengyuan Chen Team 2505.10359 null
2025-05-15 SRT-H: A Hierarchical Framework for Autonomous Surgery via Language Conditioned Imitation Learning Axel Krieger Team 2505.10251 null
2025-05-15 Training People to Reward Robots Matthew Howard Team 2505.10151 null
2025-05-15 EmbodiedMAE: A Unified 3D Multi-Modal Representation for Robot Manipulation Jianye Hao Team 2505.10105 null
2025-05-15 FlowDreamer: A RGB-D World Model with Flow-based Motion Representations for Robot Manipulation Qing Li Team 2505.10075 null
2025-05-15 APEX: Action Priors Enable Efficient Exploration for Skill Imitation on Articulated Robots Guillaume Sartoretti Team 2505.10022 null
2025-05-15 ImagineBench: Evaluating Reinforcement Learning with Large Language Model Rollouts Yang Yu Team 2505.10010 link
2025-05-16 PointArena: Probing Multimodal Grounding Through Language-Guided Pointing Ranjay Krishna Team 2505.09990 null
2025-05-15 Learning Diverse Natural Behaviors for Enhancing the Agility of Quadrupedal Robots Chunlin Chen Team 2505.09979 null
2025-05-14 Learning Rock Pushability on Rough Planetary Terrain Cagri Kilic Team 2505.09833 null
2025-05-14 Trailblazer: Learning offroad costmaps for long range planning Srikanth Saripalli Team 2505.09739 null
2025-05-14 EnerVerse-AC: Envisioning Embodied Environments with Action Condition Guanghui Ren Team 2505.09723 null
2025-05-14 ManipBench: Benchmarking Vision-Language Models for Low-Level Robot Manipulation Daniel Seita Team 2505.09698 null
2025-05-14 DataMIL: Selecting Data for Robot Imitation Learning with Datamodels Roberto Martín-Martín Team 2505.09603 null
2025-05-14 Real2Render2Real: Scaling Robot Data Without Dynamics Simulation or Robot Hardware Ken Goldberg Team 2505.09601 null
2025-05-14 VTLA: Vision-Tactile-Language-Action Model with Preference Learning for Insertion Manipulation Shuo Wang Team 2505.09577 null
2025-05-14 Learning Long-Context Diffusion Policies via Past-Token Prediction Chelsea Finn Team 2505.09561 null
2025-05-14 Distilling Realizable Students from Unrealizable Teachers Sanjiban Choudhury Team 2505.09546 null
2025-05-14 Exploring Pose-Guided Imitation Learning for Robotic Precise Insertion Qixin Cao Team 2505.09424 null
2025-05-14 Neural Multivariate Regression: Qualitative Insights from the Unconstrained Feature Model Keith Ross Team 2505.09308 null
2025-05-14 Latent Theory of Mind: A Decentralized Diffusion Architecture for Cooperative Manipulation Guillaume Sartoretti Team 2505.09144 null
2025-05-14 FoldNet: Learning Generalizable Closed-Loop Policy for Garment Folding via Keypoint-Driven Asset and Demonstration Synthesis He Wang Team 2505.09109 null
2025-05-14 Imitation Learning for Adaptive Control of a Virtual Soft Exoglove Letizia Gionfrida Team 2505.09099 null
2025-05-13 ChicGrasp: Imitation-Learning based Customized Dual-Jaw Gripper Control for Delicate, Irregular Bio-products Manipulation Dongyi Wang Team 2505.08986 null
2025-05-13 Augmented Reality for RObots (ARRO): Pointing Visuomotor Policies Towards Visual Robustness Wolfram Burgard Team 2505.08627 null
2025-05-13 Beyond Predefined Actions: Integrating Behavior Trees and Dynamic Movement Primitives for Robot Learning from Demonstration Todor Stoyanov Team 2505.08625 null
2025-05-13 From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation Jianye Hao Team 2505.08548 null
2025-05-13 Parameter Estimation using Reinforcement Learning Causal Curiosity: Limits and Challenges Weisi Guo Team 2505.08453 null
2025-05-13 Adaptive Diffusion Policy Optimization for Robotic Manipulation Zhuang Yang Team 2505.08376 null
2025-05-13 Learning Like Humans: Advancing LLM Reasoning Capabilities via Adaptive Difficulty Curriculum Learning and Expert-Guided Self-Reformulation Qianchun Lu Team 2505.08364 null
2025-05-13 Modeling Unseen Environments with Language-guided Composable Causal Components in Reinforcement Learning Biwei Huang Team 2505.08361 null
2025-05-13 HandCept: A Visual-Inertial Fusion Framework for Accurate Proprioception in Dexterous Hands Yunhui Liu Team 2505.08213 null
2025-05-13 CLTP: Contrastive Language-Tactile Pre-training for 3D Contact Geometry Understanding Shuo Wang Team 2505.08194 null
2025-05-12 What Matters for Batch Online Reinforcement Learning in Robotics? Chelsea Finn Team 2505.08078 null
2025-05-12 H$^{\mathbf{3}}$DP: Triply-Hierarchical Diffusion Policy for Visuomotor Learning Huazhe Xu Team 2505.07819 null
2025-05-12 Imagine, Verify, Execute: Memory-Guided Agentic Exploration with Vision-Language Models Jia-Bin Huang Team 2505.07815 null
2025-05-12 Improving Trajectory Stitching with Flow Models Ioannis Havoutis Team 2505.07802 null
2025-05-12 Guiding Data Collection via Factored Scaling Curves Anirudha Majumdar Team 2505.07728 null
2025-05-12 GelFusion: Enhancing Robotic Manipulation under Visual Constraints via Visuotactile Fusion Peng Yin Team 2505.07455 null
2025-05-12 ReinboT: Amplifying Robot Visual-Language Manipulation with Reinforcement Learning Donglin Wang Team 2505.07395 null
2025-05-11 X-Sim: Cross-Embodiment Learning via Real-to-Sim-to-Real Sanjiban Choudhury Team 2505.07096 null
2025-05-11 YOPOv2-Tracker: An End-to-End Agile Tracking and Navigation Framework from Perception to Action Bailing Tian Team 2505.06923 null
2025-05-10 JaxRobotarium: Training and Deploying Multi-Robot Policies in 10 Minutes Harish Ravichandar Team 2505.06771 null
2025-05-10 Learned IMU Bias Prediction for Invariant Visual Inertial Odometry Nikolay Atanasov Team 2505.06748 null
2025-05-10 ACORN: Adaptive Contrastive Optimization for Safe and Robust Fine-Grained Robotic Manipulation Zixian Yue Team 2505.06628 null
2025-05-10 Video-Enhanced Offline Reinforcement Learning: A Model-Based Approach Xiaokang Yang Team 2505.06482 null
2025-05-09 Adaptive Wiping: Adaptive contact-rich manipulation through few-shot imitation learning with Force-Torque feedback and pre-trained object representations Gentiane Venture Team 2505.06451 null
2025-05-09 VIN-NBV: A View Introspection Network for Next-Best-View Selection for Resource-Efficient 3D Reconstruction Roni Sengupta Team 2505.06219 null
2025-05-09 Neuro-Symbolic Concepts Jiajun Wu Team 2505.06191 null
2025-05-07 Efficient Sensorimotor Learning for Open-world Robot Manipulation Yifeng Zhu Team 2505.06136 null
2025-05-09 Robot Learning Using Multi-Coordinate Elastic Maps Reza Azadeh Team 2505.06092 null
2025-05-09 TREND: Tri-teaching for Robust Preference-based Reinforcement Learning with Demonstrations Abhinav Shrivastava Team 2505.06079 null
2025-05-09 3D CAVLA: Leveraging Depth and 3D Context to Generalize Vision Language Action Models for Unseen Tasks Farshad Khorrami Team 2505.05800 null
2025-05-09 Demystifying Diffusion Policies: Action Memorization and Simple Lookup Table Alternatives Mac Schwager Team 2505.05787 null
2025-05-09 FlowHFT: Flow Policy Induced Optimal High-Frequency Trading under Diverse Market Conditions Steve Yang Team 2505.05784 null
2025-05-08 CLAM: Continuous Latent Action Models for Robot Learning from Unlabeled Demonstrations Stephen Tu Team 2505.04999 null
2025-05-08 CubeDAgger: Improved Robustness of Interactive Imitation Learning without Violation of Dynamic Stability Taisuke Kobayashi Team 2505.04897 null
2025-05-08 D-CODA: Diffusion for Coordinated Dual-Arm Data Augmentation Daniel Seita Team 2505.04860 null
2025-05-07 Steerable Scene Generation with Post Training and Inference-Time Search Russ Tedrake Team 2505.04831 null
2025-05-07 Primal-dual algorithm for contextual stochastic combinatorial optimization Axel Parmentier Team 2505.04757 null
2025-05-07 Merging and Disentangling Views in Visual Reinforcement Learning for Robotic Manipulation Henrik I. Christensen Team 2505.04619 null
2025-05-06 OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation Donglin Wang Team 2505.03912 null
2025-05-06 AMO: Adaptive Motion Optimization for Hyper-Dexterous Humanoid Whole-Body Control Xiaolong Wang Team 2505.03738 null
2025-05-06 Meta-Optimization and Program Search using Language Models for Task and Motion Planning Marc Toussaint Team 2505.03725 null
2025-05-06 Ergodic Generative Flows Yinchuan Li Team 2505.03561 null
2025-05-06 RIFT: Closed-Loop RL Fine-Tuning for Realistic and Controllable Traffic Simulation Sifa Zheng Team 2505.03344 null
2025-05-06 The Unreasonable Effectiveness of Discrete-Time Gaussian Process Mixtures for Robot Policy Learning Abhinav Valada Team 2505.03296 null
2025-05-05 Sim2Real Transfer for Vision-Based Grasp Verification Markus Vincze Team 2505.03046 link
2025-05-05 Zero-shot Sim2Real Transfer for Magnet-Based Tactile Sensor on Insertion Tasks Jia Deng Team 2505.02915 null
2025-05-05 Re-purposing a modular origami manipulator into an adaptive physical computer for machine learning and robotic perception Suyi Li Team 2505.02744 null
2025-05-05 Spatiotemporal Non-Uniformity-Aware Online Task Scheduling in Collaborative Edge Computing for Industrial Internet of Things Bo Lei Team 2505.02597 null
2025-05-05 Automated Hybrid Reward Scheduling via Large Language Models for Robotic Skill Learning Jianqiang Li Team 2505.02483 null
2025-05-05 MetaScenes: Towards Automated Replica Creation for Real-world 3D Scans Siyuan Huang Team 2505.02388 null
2025-05-04 Coupled Distributional Random Expert Distillation for World Model Online Imitation Learning Hao Su Team 2505.02228 null
2025-05-04 CrayonRobo: Object-Centric Prompt-Driven Vision-Language-Action Model for Robotic Manipulation Hao Dong Team 2505.02166 null
2025-05-04 Interleave-VLA: Enhancing Robot Manipulation with Interleaved Image-Text Instructions Mingyu Ding Team 2505.02152 null
2025-05-03 Act Natural! Extending Naturalistic Projection to Multimodal Behavior Scenarios David Fridovich-Keil Team 2505.01945 null
2025-05-07 RoBridge: A Hierarchical Architecture Bridging Cognition and Execution for General Robotic Manipulation Xiaodan Liang Team 2505.01709 null
2025-05-02 FalconWing: An Open-Source Platform for Ultra-Light Fixed-Wing Aircraft Research Sayan Mitra Team 2505.01383 null
2025-05-06 Robotic Visual Instruction Xianzheng Ma Team 2505.00693 null
2025-05-01 Towards Autonomous Micromobility through Scalable Urban Simulation Bolei Zhou Team 2505.00690 null
2025-05-01 DeCo: Task Decomposition and Skill Composition for Zero-Shot Generalization in Long-Horizon 3D Manipulation Yang Gao Team 2505.00527 null
2025-05-01 Optimal Interactive Learning on the Job via Facility Location Planning George Konidaris Team 2505.00490 null
2025-04-30 LLM-based Interactive Imitation Learning for Robotic Manipulation Stefan Wermter Team 2504.21769 null
2025-04-30 RoboGround: Robotic Manipulation with Grounded Vision-Language Priors Zhou Zhao Team 2504.21530 null
2025-04-30 Provably-Safe, Online System Identification Ram Vasudevan Team 2504.21486 null
2025-04-29 TesserAct: Learning 4D Embodied World Models Chuang Gan Team 2504.20995 null
2025-04-29 XPG-RL: Reinforcement Learning with Explainable Priority Guidance for Efficiency-Boosted Mechanical Search Elena Shrestha Team 2504.20969 null
2025-04-29 PRISM: Projection-based Reward Integration for Scene-Aware Real-to-Sim-to-Real Transfer with Few Demonstrations Xuguang Lan Team 2504.20520 null
2025-04-29 SPARK Hand: Scooping-Pinching Adaptive Robotic Hand with Kempe Mechanism for Vertical Passive Grasp in Environmental Constraints Wenzeng Zhang Team 2504.20506 null
2025-04-28 UTTG: A Universal Teleoperation Approach via Online Trajectory Generation Hesheng Wang Team 2504.19736 null
2025-04-28 GPA-RAM: Grasp-Pretraining Augmented Robotic Attention Mamba for Spatial Task Learning Mengyuan Liu Team 2504.19683 null
2025-04-27 PolyTouch: A Robust Multi-Modal Tactile Sensor for Contact-rich Manipulation Using Tactile-Diffusion Policies Edward Adelson Team 2504.19341 null
2025-04-29 Learned Perceptive Forward Dynamics Model for Safe and Platform-aware Robotic Navigation Marco Hutter Team 2504.19322 link
2025-04-27 Learning to Drive from a World Model Yassine Yousfi Team 2504.19077 null
2025-04-26 RoboVerse: Towards a Unified Platform, Dataset and Benchmark for Scalable and Generalizable Robot Learning Pieter Abbeel Team 2504.18904 null
2025-04-26 Imitation Learning for Autonomous Driving: Insights from Real-World Testing Tufan Kumbasar Team 2504.18847 null
2025-04-26 Hierarchical Reinforcement Learning in Multi-Goal Spatial Navigation with Autonomous Mobile Robots Alfredo Weitzenfeld Team 2504.18794 null
2025-04-26 STDArm: Transferring Visuomotor Policies From Static Data Training to Dynamic Robot Manipulation Yanyong Zhang Team 2504.18792 null
2025-04-25 Generalization Capability for Imitation Learning Yixiao Wang Team 2504.18538 null
2025-04-25 Instrumentation for Better Demonstrations: A Case Study Francis wyffels Team 2504.18481 null
2025-04-25 Action Flow Matching for Continual Robot Learning Lantao Liu Team 2504.18471 null
2025-04-25 Design and Evaluation of a UGV-Based Robotic Platform for Precision Soil Moisture Remote Sensing George Nikolakopoulos Team 2504.18284 null
2025-04-28 Implementation Analysis of Collaborative Robot Digital Twins in Physics Engines Hans D. Schotten Team 2504.18200 null
2025-04-25 Offline Learning of Controllable Diverse Behaviors Ludovic Denoyer Team 2504.18160 null
2025-04-24 CIVIL: Causal and Intuitive Visual Imitation Learning Dylan P. Losey Team 2504.17959 null
2025-04-24 Collaborating Action by Action: A Multi-agent LLM Framework for Embodied Reasoning Prithviraj Ammanabrolu Team 2504.17950 null
2025-04-24 Learning Attentive Neural Processes for Planning with Pushing Actions Nicholas Roy Team 2504.17924 null
2025-04-24 CaRL: Learning Scalable Planning Policies with Simple Rewards Andreas Geiger Team 2504.17838 null
2025-04-23 Learning Underwater Active Perception in Simulation Donald G. Dansereau Team 2504.17817 null
2025-04-24 Gripper Keypose and Object Pointflow as Interfaces for Bimanual Robotic Manipulation Jiangmiao Pang Team 2504.17784 null
2025-04-24 Integrating Learning-Based Manipulation and Physics-Based Locomotion for Whole-Body Badminton Robot Control Dong Xuan Team 2504.17771 null
2025-04-24 Robotic Grinding Skills Learning Based on Geodesic Length Dynamic Motion Primitives Han Ding Team 2504.17216 null
2025-04-23 Geometric Formulation of Unified Force-Impedance Control on SE(3) for Robotic Manipulators Roberto Horowitz Team 2504.17080 null
2025-04-23 A Systematic Approach to Design Real-World Human-in-the-Loop Deep Reinforcement Learning: Salient Features, Challenges and Trade-offs Younes Zerouali Team 2504.17006 null
2025-04-23 Latent Diffusion Planning for Imitation Learning Chelsea Finn Team 2504.16925 null
2025-04-23 MOSAIC: A Skill-Centric Algorithmic Framework for Long-Horizon Manipulation Planning Maxim Likhachev Team 2504.16738 null
2025-04-23 ManipDreamer: Boosting Robotic Manipulation World Model with Action Tree and Visual Guidance Shanghang Zhang Team 2504.16464 null
2025-04-22 Mass-Adaptive Admittance Control for Robotic Manipulators Logan E. Beaver Team 2504.16224 null
2025-04-22 $π_{0.5}$: a Vision-Language-Action Model with Open-World Generalization Ury Zhilinsky Team 2504.16054 null
2025-04-22 SPECI: Skill Prompts based Hierarchical Continual Imitation Learning for Robot Manipulation Xiangli Nie Team 2504.15561 null
2025-04-22 VibeCheck: Using Active Acoustic Tactile Sensing for Contact-Rich Manipulation Matei Ciocarlie Team 2504.15535 null
2025-04-22 Few-Shot Vision-Language Action-Incremental Policy Learning Weili Guan Team 2504.15517 null
2025-04-21 LAPP: Large Language Model Feedback for Preference-Driven Reinforcement Learning Boyuan Chen Team 2504.15472 null
2025-04-23 Advancing Embodied Intelligence in Robotic-Assisted Endovascular Procedures: A Systematic Review of AI Solutions Peng Qi Team 2504.15327 null
2025-04-21 Immersive Teleoperation Framework for Locomanipulation Tasks Dimitrios Kanoulas Team 2504.15229 null
2025-04-21 A Genetic Fuzzy-Enabled Framework on Robotic Manipulation for In-Space Servicing Kelly Cohen Team 2504.15226 null
2025-04-21 A General Infrastructure and Workflow for Quadrotor Deep Reinforcement Learning and Reality Deployment Huaping Liu Team 2504.15129 null
2025-04-21 SuFIA-BC: Generating High Quality Demonstration Data for Visuomotor Policy Learning in Surgical Subtasks Animesh Garg Team 2504.14857 null
2025-04-20 Exposing the Copycat Problem of Imitation-based Planner: A Novel Closed-Loop Simulator, Causal Benchmark and Joint IL-RL Baseline Hongsheng Li Team 2504.14709 null
2025-04-24 Latent Representations for Visual Proprioception in Inexpensive Robots Ladislau Bölöni Team 2504.14634 null
2025-04-18 DiffOG: Differentiable Policy Trajectory Optimization with Generalizability Yu She Team 2504.13807 null
2025-04-18 Imitation Learning with Precisely Labeled Human Demonstrations Yilong Song Team 2504.13803 null
2025-04-21 SLAM&Render: A Benchmark for the Intersection Between Neural Rendering, Gaussian Splatting and SLAM Javier Civera Team 2504.13713 link
2025-04-18 Self-Mixing Laser Interferometry: In Search of an Ambient Noise-Resilient Alternative to Acoustic Sensing Francis wyffels Team 2504.13711 null
2025-04-18 On the Importance of Tactile Sensing for Imitation Learning: A Case Study on Robotic Match Lighting Jan Peters Team 2504.13618 null
2025-04-18 A Model-Based Approach to Imitation Learning through Multi-Step Predictions Na Li Team 2504.13413 null
2025-04-17 RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins Ping Luo Team 2504.13059 null
2025-04-17 Adaptive Task Space Non-Singular Terminal Super-Twisting Sliding Mode Control of a 7-DOF Robotic Manipulator E. Witrant Team 2504.13056 null
2025-04-17 Krysalis Hand: A Lightweight, High-Payload, 18-DoF Anthropomorphic End-Effector for Robotic Learning and Dexterous Manipulation Iman Soltani Team 2504.12967 null
2025-04-17 TSGS: Improving Gaussian Splatting for Transparent Surface Reconstruction via Normal and De-lighting Priors Yi Yang Team 2504.12799 null
2025-04-17 Trajectory Adaptation using Large Language Models Ravi Prakash Team 2504.12755 null
2025-04-17 Embodied Neuromorphic Control Applied on a 7-DOF Robotic Manipulator Lei Wang Team 2504.12702 link
2025-04-21 A0: An Affordance-Aware Hierarchical Model for General Robotic Manipulation Xiaodan Liang Team 2504.12636 null
2025-04-17 Crossing the Human-Robot Embodiment Gap with Sim-to-Real RL using One Human Demonstration Jeannette Bohg Team 2504.12609 null
2025-04-16 Adapting a World Model for Trajectory Following in a 3D Game Raluca Georgescu Team 2504.12299 null
2025-04-16 Towards Forceful Robotic Foundation Models: a Literature Survey Nikolaus Correll Team 2504.11827 null
2025-04-17 Toward Aligning Human and Robot Actions via Multi-Modal Demonstration Learning Fei Liu Team 2504.11493 null
2025-04-15 Next-Future: Sample-Efficient Policy Learning for Robotic-Arm Tasks Suryansh Kumar Team 2504.11247 null
2025-04-17 CAP-Net: A Unified Network for 6D Pose and Size Estimation of Categorical Articulated Parts from a Single RGB-D Image Yi Zhu Team 2504.11230 null
2025-04-15 Superfast Configuration-Space Convex Set Computation on GPUs for Online Motion Planning Daniela Rus Team 2504.10783 link
2025-04-14 Improving In-Context Learning with Reasoning Distillation Xiang Gao Team 2504.10647 null
2025-04-14 Flying Hand: End-Effector-Centric Framework for Versatile Aerial Manipulation Teleoperation and Policy Learning Guanya Shi Team 2504.10334 null
2025-04-14 Look-to-Touch: A Vision-Enhanced Proximity and Tactile Sensor for Distance and Geometry Perception in Robotic Manipulation Guoying Gu Team 2504.10280 null
2025-04-14 Prior Does Matter: Visual Navigation via Denoising Diffusion Bridge Models Hui Cheng Team 2504.10041 link
2025-04-14 Efficient Task-specific Conditional Diffusion Policies: Shortcut Model Acceleration and SO(3) Optimization Wei Sui Team 2504.09927 null
2025-04-12 Compliant Explicit Reference Governor for Contact Friendly Robotic Manipulators Marco M. Nicotra Team 2504.09188 null
2025-04-11 BiFlex: A Passive Bimodal Stiffness Flexible Wrist for Manipulation in Unstructured Environments Roberto Martín-Martín Team 2504.08706 null
2025-04-11 Diffusion Models for Robotic Manipulation: A Survey Rania Rayyes Team 2504.08438 null
2025-04-10 Echo: An Open-Source, Low-Cost Teleoperation System with Force Feedback for Dataset Collection in Robot Learning Dzmitry Tsetserukou Team 2504.07939 null
2025-04-10 TOCALib: Optimal control library with interpolation for bimanual manipulation and obstacles avoidance Aleksandr Panov Team 2504.07708 null
2025-04-10 Novel Diffusion Models for Multimodal 3D Hand Trajectory Prediction Hesheng Wang Team 2504.07375 link
2025-04-09 Adaptive Vision-Guided Robotic Arm Control for Precision Pruning in Dynamic Orchard Environments Manoj Karkee Team 2504.07309 null
2025-04-09 AssistanceZero: Scalably Solving Assistance Games Anca Dragan Team 2504.07091 link
2025-04-09 Two by Two: Learning Multi-Task Pairwise Objects Assembly for Generalizable Robot Manipulation Huazhe Xu Team 2504.06961 null
2025-04-09 Developing Modular Grasping and Manipulation Pipeline Infrastructure to Streamline Performance Benchmarking Holly Yanco Team 2504.06819 null
2025-04-09 Interactive Expressive Motion Generation Using Dynamic Movement Primitives Kai O. Arras Team 2504.06735 null
2025-04-09 Overcoming Dynamic Environments: A Hybrid Approach to Motion Planning for Manipulators Gavin Paul Team 2504.06596 null
2025-04-09 CAFE-AD: Cross-Scenario Adaptive Feature Enhancement for Trajectory Planning in Autonomous Driving Yanyong Zhang Team 2504.06584 link
2025-04-09 OPAL: Encoding Causal Understanding of Physical Systems for Robot Learning Tyler Fenstermaker Team 2504.06538 null
2025-04-08 ViTaMIn: Learning Contact-Rich Tasks Through Robot-Free Visuo-Tactile Manipulation Interface Rui Chen Team 2504.06156 null
2025-04-08 MAPLE: Encoding Dexterous Robotic Manipulation Priors Learned From Egocentric Videos Marc Pollefeys Team 2504.06084 null
2025-04-08 Learning-enhanced electronic skin for tactile sensing on deformable surface based on electrical impedance tomography Yunjie Yang Team 2504.05987 null
2025-04-08 Stratified Expert Cloning with Adaptive Selection for User Retention in Large-Scale Recommender Systems Yongqi Liu Team 2504.05628 null
2025-04-08 TW-CRL: Time-Weighted Contrastive Reward Learning for Efficient Inverse Reinforcement Learning Stephen Xia Team 2504.05585 null
2025-04-07 SPARK-Remote: A Cost-Effective System for Remote Bimanual Robot Teleoperation Karthik Desingh Team 2504.05488 null
2025-04-07 RobustDexGrasp: Robust Dexterous Grasping of General Objects from Single-view Perception Jie Song Team 2504.05287 null
2025-04-07 Vision-Language Model Predictive Control for Manipulation Planning and Trajectory Generation Wei Zhang Team 2504.05225 link
2025-04-07 Wavelet Policy: Imitation Policy Learning in Frequency Domain with Wavelet Transforms Hongrui Zhu Team 2504.04991 null
2025-04-07 Embodied Perception for Test-time Grasping Detection Adaptation with Knowledge Infusion Fengyu Zhou Team 2504.04795 null
2025-04-06 Tool-as-Interface: Learning Robot Policies from Human Tool Usage through Imitation Learning Katherine Driggs-Campbell Team 2504.04612 null
2025-04-06 Diffusion-Based Approximate MPC: Fast and Consistent Imitation of Multi-Modal Action Distributions Katherine J. Kuchenbecker Team 2504.04603 null
2025-04-06 DexTOG: Learning Task-Oriented Dexterous Grasp with Language Cewu Lu Team 2504.04573 null
2025-04-06 DexSinGrasp: Learning a Unified Policy for Dexterous Object Singulation and Grasping in Cluttered Environments Lin Shao Team 2504.04516 null
2025-04-06 Human-Level Competitive Pokémon via Scalable Offline Reinforcement Learning with Transformers Yuke Zhu Team 2504.04395 null
2025-04-05 ORCA: An Open-Source, Reliable, Cost-Effective, Anthropomorphic Robotic Hand for Uninterrupted Dexterous Task Learning Robert K. Katzschmann Team 2504.04259 null
2025-04-09 Digital Gene: Learning about the Physical World through Analytic Concepts Cewu Lu Team 2504.04170 null
2025-04-04 Dexterous Manipulation through Imitation Learning: A Survey Hong Zhang Team 2504.03515 null
2025-04-04 GraphSeg: Segmented 3D Representations via Graph Edge Addition and Contraction Weiming Zhi Team 2504.03129 null
2025-04-03 Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets Abhishek Gupta Team 2504.02792 null
2025-04-03 Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision Shibiao Xu Team 2504.02477 null
2025-04-02 RoboAct-CLIP: Video-Driven Pre-training of Atomic Action Understanding for Robotics Qiang Nie Team 2504.02069 null
2025-04-02 Slot-Level Robotic Placement via Visual Imitation from Single Human Video Arsalan Mousavian Team 2504.01959 null
2025-04-02 Learning with Imperfect Models: When Multi-step Prediction Mitigates Compounding Error Nikolai Matni Team 2504.01766 null
2025-04-02 TransforMerger: Transformer-based Voice-Gesture Fusion for Robust Human-Robot Communication Karla Stepanova Team 2504.01708 null
2025-04-02 8-DoFs Cable Driven Parallel Robots for Bimanual Teleportation Josie Hughes Team 2504.01554 null
2025-04-02 Bi-LAT: Bilateral Control-Based Imitation Learning via Natural Language and Action Chunking with Transformers Yuki Uranishi Team 2504.01301 null
2025-04-02 The Social Life of Industrial Arms: How Arousal and Attention Shape Human-Robot Interaction Matthew K. X. J Pan Team 2504.01260 null
2025-04-01 Energy Weighted Learning Progress Guided Interleaved Multi-Task Learning Erhan Oztop Team 2504.00707 null
2025-04-01 Learning Bipedal Locomotion on Gear-Driven Humanoid Robot Using Foot-Mounted IMUs Masaya Kinoshita Team 2504.00614 null
2025-04-01 Think Small, Act Big: Primitive Prompt Learning for Lifelong Robot Manipulation Dong Wang Team 2504.00420 null
2025-03-31 CBIL: Collective Behavior Imitation Learning for Fish from Real Videos Taku Komura Team 2504.00234 null
2025-04-02 Sim-and-Real Co-Training: A Simple Recipe for Vision-Based Robotic Manipulation Yuke Zhu Team 2503.24361 null
2025-04-02 AutoEval: Autonomous Evaluation of Generalist Robot Manipulation Policies in the Real World Sergey Levine Team 2503.24278 link
2025-03-31 HACTS: a Human-As-Copilot Teleoperation System for Robot Learning Jian Tang Team 2503.24070 null
2025-03-31 Learning 3D-Gaussian Simulators from RGB Videos Georg Martius Team 2503.24009 null
2025-03-31 ZeroMimic: Distilling Robotic Manipulation Skills from Web Videos Dinesh Jayaraman Team 2503.23877 link
2025-03-31 Disambiguate Gripper State in Grasp-Based Tasks: Pseudo-Tactile as Feedback Enables Pure Simulation Learning Yue Wang Team 2503.23835 null
2025-03-30 Can Visuo-motor Policies Benefit from Random Exploration Data? A Case Study on Stacking Florian T. Pokorny Team 2503.23571 null
2025-08-26 BEHAVIOR Robot Suite: Streamlining Real-World Whole-Body Manipulation for Everyday Household Activities Li Fei-Fei Team 2503.05652 link
2024-12-17 TidyBot++: An Open-Source Holonomic Mobile Manipulator for Robot Learning Jeannette Bohg Team 2412.10447 link
2025-01-08 3D-ViTac: Learning Fine-Grained Manipulation with Visuo-Tactile Sensing Yunzhu Li Team 2410.24091 null
2024-10-24 SPIRE: Synergistic Planning, Imitation, and Reinforcement Learning for Long-Horizon Manipulation Ajay Mandlekar Team 2410.18065 null
2024-11-05 ManiWAV: Learning Robot Manipulation from In-the-Wild Audio-Visual Data Shuran Song Team 2406.19464 link
2023-10-31 Learning Robot Manipulation from Cross-Morphology Demonstration Gaurav Sukhatme Team 2304.03833 null
2022-11-17 ToolFlowNet: Robotic Manipulation with Tools via Predicting Tool Flow from Point Clouds David Held Team 2211.09006 link
2022-11-16 Learning and Retrieval from Prior Data for Skill-based Imitation Learning Yuke Zhu Team 2210.11435 null
2023-03-09 VIOLA: Imitation Learning for Vision-Based Manipulation with Object Proposal Priors Yuke Zhu Team 2210.11339 null
2022-10-12 Self-Supervised Learning of Multi-Object Keypoints for Robotic Manipulation Abhinav Valada Team 2205.08316 null
2022-11-21 R3M: A Universal Visual Representation for Robot Manipulation Abhinav Gupta Team 2203.12601 null
2022-02-07 BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning Chelsea Finn Team 2202.02005 null
2021-11-02 Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation Chelsea Finn Team 2109.01115 null
2021-06-11 Coarse-to-Fine Imitation Learning: Robot Manipulation from a Single Demonstration Edward Johns Team 2105.06411 link
2022-03-09 Interactive Imitation Learning in State-Space Jens Kober Team 2008.00524 null
2020-05-19 On-Policy Robot Imitation Learning from a Converging Supervisor Ken Goldberg Team 1907.03423 null
2018-11-08 RoboTurk: A Crowdsourcing Platform for Robotic Skill Learning through Imitation Li Fei-Fei Team 1811.02790 null
2018-10-09 Robustness via Retrying: Closed-Loop Robotic Manipulation with Self-Supervised Learning Chelsea Finn Team 1810.03043 null
2017-10-27 Learning Robotic Manipulation of Granular Media Sergey Levine Team 1709.02833 null

VLM

Publish Date Title Authors PDF Code  
2025-11-20 Learning to Think Fast and Slow for Visual Language Models Kaiyang Zhou Team 2511.16670 null  
2025-11-20 Video-as-Answer: Predict and Generate Next Video Event with Joint-GRPO Jing Liao Team 2511.16669 link  
2025-11-20 Cognitive Foundations for Reasoning and Their Manifestation in LLMs Yulia Tsvetkov Team 2511.16660 null  
2025-11-20 InternData-A1: Pioneering High-Fidelity Synthetic Data for Pre-training Generalist Policy Jiangmiao Pang Team 2511.16651 null  
2025-11-20 Bridging VLMs and Embodied Intelligence with Deliberate Practice Policy Optimization Xiaozhu Ju Team 2511.16602 null  
2025-11-20 TimeViper: A Hybrid Mamba-Transformer Vision-Language Model for Efficient Long Video Understanding Qin Jin Team 2511.16595 link  
2025-11-20 Contrastive vision-language learning with paraphrasing and negation Artur d’Avila Garcez Team 2511.16527 null  
2025-11-20 MiMo-Embodied: X-Embodied Foundation Model Technical Report Long Chen Team 2511.16518 link  
2025-11-20 Arctic-Extract Technical Report Wojciech Jaśkowski Team 2511.16470 null  
2025-11-20 LLaVA$^3$: Representing 3D Scenes like a Cubist Painter to Boost 3D Scene Understanding of VLMs Loïc Barthe Team 2511.16454 null  
2025-11-20 VLA-Pruner: Temporal-Aware Dual-Level Visual Token Pruning for Efficient Vision-Language-Action Inference Bo Zhao Team 2511.16449 null  
2025-11-20 Beyond Visual Cues: Leveraging General Semantics as Support for Few-Shot Segmentation Weifeng Liu Team 2511.16435 null  
2025-11-20 TOFA: Training-Free One-Shot Federated Adaptation for Vision-Language Models Chaochao Chen Team 2511.16423 null  
2025-11-20 The Shawshank Redemption of Embodied AI: Understanding and Benchmarking Indirect Environmental Jailbreaks Jianfeng Ma Team 2511.16347 null  
2025-11-20 FT-NCFM: An Influence-Aware Data Distillation Framework for Efficient VLA Models Mingsheng Shang Team 2511.16233 null  
2025-11-20 Can MLLMs Read the Room? A Multimodal Benchmark for Assessing Deception in Multi-Party Social Interactions Yoichi Sato Team 2511.16221 null  
2025-11-20 FlipVQA-Miner: Cross-Page Visual Question-Answer Mining from Textbooks Wentao Zhang Team 2511.16216 null  
2025-11-20 When Alignment Fails: Multimodal Adversarial Attacks on Vision-Language-Action Models Yaochu Jin Team 2511.16203 null  
2025-11-20 From Performance to Understanding: A Vision for Explainable Automated Algorithm Design Thomas Bäck Team 2511.16201 null  
2025-11-20 Mantis: A Versatile Vision-Language-Action Model with Disentangled Visual Foresight Zhijie Deng Team 2511.16175 null  
2025-11-19 Think Visually, Reason Textually: Vision-Language Synergy in ARC Jiaqi Wang Team 2511.15703 null  
2025-11-19 MoDES: Accelerating Mixture-of-Experts Multimodal Large Language Models via Dynamic Expert Skipping Jun Zhang Team 2511.15690 null  
2025-11-19 Walrus: A Cross-Domain Foundation Model for Continuum Dynamics Shirley Ho Team 2511.15684 null  
2025-11-19 VisPlay: Self-Evolving Vision-Language Models from Images Yonghui Yang Team 2511.15661 null  
2025-11-19 Hierarchical Semantic Tree Anchoring for CLIP-Based Class-Incremental Learning Da-Wei Zhou Team 2511.15633 null  
2025-11-19 The SA-FARI Dataset: Segment Anything in Footage of Animals for Recognition and Identification Didac Suris Team 2511.15622 null  
2025-11-19 When to Think and When to Look: Uncertainty-Guided Lookback Chenliang Xu Team 2511.15613 null  
2025-11-19 SRPO: Self-Referential Policy Optimization for Vision-Language-Action Models Xipeng Qiu Team 2511.15605 null  
2025-11-19 AVATAAR: Agentic Video Answering via Temporal Adaptive Alignment and Reasoning Chinmay Gondhalekar Team 2511.15578 null  
2025-11-19 Computer-Use Agents as Judges for Generative User Interface Mike Zheng Shou Team 2511.15567 link  
2025-11-19 Multimodal Evaluation of Russian-language Architectures Alena Fenogenova Team 2511.15552 null  
2025-11-19 Learning to Expand Images for Efficient Visual Autoregressive Modeling Tao Huang Team 2511.15499 null  
2025-11-19 SIGMMA: Hierarchical Graph-Based Multi-Scale Multi-modal Contrastive Alignment of Histopathology Image and Spatial Transcriptome Mohammad Lotfollahi Team 2511.15464 null  
2025-11-19 D4C: Data-free Quantization for Contrastive Language-Image Pre-training Models Kentaro Yoshioka Team 2511.15411 null  
2025-11-19 Breaking Expert Knowledge Limits: Self-Pruning for Large Language Models Hao Wang Team 2511.15390 null  
2025-11-19 Zero-Shot Open-Vocabulary Human Motion Grounding with Test-Time Training Jianfei Yang Team 2511.15379 null  
2025-11-19 C2F-Space: Coarse-to-Fine Space Grounding for Spatial Instructions using Vision-Language Models Daehyung Park Team 2511.15333 null  
2025-11-19 What Your Features Reveal: Data-Efficient Black-Box Feature Inversion Attack for Split DNNs Fan Li Team 2511.15316 null  
2025-11-19 Adapt-As-You-Walk Through the Clouds: Training-Free Online Test-Time Adaptation of 3D Vision-Language Foundation Models Morteza Saberi Team 2511.15311 null  
2025-11-19 Text2Loc++: Generalizing 3D Point Cloud Localization from Natural Language Daniel Cremers Team 2511.15308 null  
2025-11-18 ARC Is a Vision Problem! Kaiming He Team 2511.14761 link  
2025-11-18 UniGen-1.5: Enhancing Image Generation and Editing through Reward Unification in Reinforcement Learning Afshin Dehghan Team 2511.14760 null  
2025-11-18 $\pi^{*}_{0.6}$: a VLA That Learns From Experience Zhiyuan Zhou Team 2511.14759 null  
2025-11-18 Vision Large Language Models Are Good Noise Handlers in Engagement Analysis Xiaobai Li Team 2511.14749 null  
2025-11-18 Measuring AI Progress in Drug Discovery: A Reproducible Leaderboard for the Tox21 Challenge Günter Klambauer Team 2511.14744 null  
2025-11-18 Attention via Synaptic Plasticity is All You Need: A Biologically Inspired Spiking Neuromorphic Transformer Ankush Kumar Team 2511.14691 null  
2025-11-18 NORA-1.5: A Vision-Language-Action Model Trained using World Model- and Action-based Preference Rewards Soujanya Poria Team 2511.14659 link  
2025-11-18 Enhancing Agentic Autonomous Scientific Discovery with Vision-Language Model Capabilities Inigo Zubeldia Team 2511.14631 null  
2025-11-18 Is Your VLM for Autonomous Driving Safety-Ready? A Comprehensive Benchmark for Evaluating External and In-Cabin Risks Xiaoshuai Hao Team 2511.14592 null  
2025-11-18 OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models Huan Wang Team 2511.14582 link  
2025-11-18 Task Addition and Weight Disentanglement in Closed-Vocabulary Models Pascal Frossard Team 2511.14569 null  
2025-11-18 Enhancing End-to-End Autonomous Driving with Risk Semantic Distillation from VLM Siyuan Cheng Team 2511.14499 null  
2025-11-18 Agentic Video Intelligence: A Flexible Framework for Advanced Video Exploration and Understanding Min-Ling Zhang Team 2511.14446 null  
2025-11-18 Watchdogs and Oracles: Runtime Verification Meets Large Language Models for Autonomous Systems Angelo Ferrando Team 2511.14435 null  
2025-11-18 Enhancing LLM-based Autonomous Driving with Modular Traffic Light and Sign Recognition Abhinav Valada Team 2511.14391 null  
2025-11-18 O3SLM: Open Weight, Open Data, and Open Vocabulary Sketch-Language Model Anirban Chakraborty Team 2511.14368 null  
2025-11-18 ArchMap: Arch-Flattening and Knowledge-Guided Vision Language Model for Tooth Counting and Structured Dental Understanding Jionglong Su Team 2511.14336 null  
2025-11-18 When Words Change the Model: Sensitivity of LLMs for Constraint Programming Modelling Jacopo Mauro Team 2511.14334 null  
2025-11-18 Step by Step Network Gao Huang Team 2511.14329 null  
2025-11-18 Segmentwise Pruning in Audio-Language Models Jean-François Bonastre Team 2511.14293 null  
2025-11-17 Scaling Spatial Intelligence with Multimodal Foundation Models Lei Yang Team 2511.13719 link  
2025-11-17 TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models Ying-Cong Chen Team 2511.13704 link  
2025-11-17 Crossing Borders: A Multimodal Challenge for Indian Poetry Translation and Image Generation Joseph K J Team 2511.13689 null  
2025-11-17 Training-Free Multi-View Extension of IC-Light for Textual Position-Aware Scene Relighting Haoji Hu Team 2511.13684 null  
2025-11-17 Part-X-MLLM: Part-aware 3D Multimodal Large Language Model Chunchao Guo Team 2511.13647 null  
2025-11-17 CacheFlow: Compressive Streaming Memory for Efficient Long-Form Video Understanding Daivik Patel Team 2511.13644 null  
2025-11-17 CreBench: Human-Aligned Creativity Evaluation from Idea to Process to Product Jiayi Cen Team 2511.13626 null  
2025-11-17 FreeAskWorld: An Interactive and Closed-Loop Simulator for Human-Centric Embodied AI Jiangtao Gong Team 2511.13524 null  
2025-11-17 Language-Guided Invariance Probing of Vision-Language Models Jae Joong Lee Team 2511.13494 null  
2025-11-17 Semantic Document Derendering: SVG Reconstruction via Vision-Language Modeling Pascal Frossard Team 2511.13478 null  
2025-11-17 Trust in Vision-Language Models: Insights from a Participatory User Workshop Viola Schiaffonati Team 2511.13458 null  
2025-11-17 Unlocking the Forgery Detection Potential of Vanilla MLLMs: A Novel Training-Free Pipeline Ziqian Lu Team 2511.13442 null  
2025-11-17 VOPE: Revisiting Hallucination of Vision-Language Models in Voluntary Imagination Task Xilin Chen Team 2511.13420 null  
2025-11-17 Attention Grounded Enhancement for Visual Document Retrieval Keping Bi Team 2511.13415 null  
2025-11-17 Descriptor: Distance-Annotated Traffic Perception Question Answering (DTPQA) Ciaran Eising Team 2511.13397 null  
2025-11-17 Generalized Denoising Diffusion Codebook Models (gDDCM): Tokenizing images using a pre-trained diffusion model Fei Kong Team 2511.13387 null  
2025-11-17 Moving Pictures of Thought: Extracting Visual Knowledge in Charles S. Peirce’s Manuscripts with Vision-Language Models Dario Rodighiero Team 2511.13378 null  
2025-11-17 Tab-PET: Graph-Based Positional Encodings for Tabular Transformers Mehul Motani Team 2511.13338 null  
2025-11-17 TabFlash: Efficient Table Understanding with Progressive Question Conditioning and Token Focusing Hyunwoo J. Kim Team 2511.13283 null  
2025-11-17 Is your VLM Sky-Ready? A Comprehensive Spatial Intelligence Benchmark for UAV Navigation Wenbo Ding Team 2511.13269 null  
2025-11-14 DocLens: A Tool-Augmented Multi-Agent Framework for Long Visual Document Understanding Jinsung Yoon Team 2511.11552 null  
2025-11-14 Bridging Hidden States in Vision-Language Models Jacob Fein-Ashley Team 2511.11526 null  
2025-11-14 Collaborative Representation Learning for Alignment of Tactile, Language, and Vision Modalities Jingyuan Chen Team 2511.11512 null  
2025-11-14 PAS: Prelim Attention Score for Detecting Object Hallucinations in Large Vision–Language Models Manish Bhattarai Team 2511.11502 null  
2025-11-14 Rethinking Progression of Memory State in Robotic Manipulation: An Object-Centric Perspective Ngan Le Team 2511.11478 null  
2025-11-14 Benchmarking Visual LLMs Resilience to Unanswerable Questions on Visually Rich Documents Fabrizio Battiloro Team 2511.11468 null  
2025-11-14 VoxTell: Free-Text Promptable Universal 3D Medical Image Segmentation Klaus Maier-Hein Team 2511.11450 null  
2025-11-14 From Synthetic Scenes to Real Performance: Enhancing Spatial Reasoning in VLMs Giuseppe Riccardi Team 2511.11440 null  
2025-11-14 VP-Bench: A Comprehensive Benchmark for Visual Prompting in Multimodal Large Language Models Wenqiang Lei Team 2511.11438 null  
2025-11-14 Comprehension of Multilingual Expressions Referring to Target Objects in Visual Inputs Bruno Martins Team 2511.11427 null  
2025-11-14 BOFA: Bridge-Layer Orthogonal Low-Rank Fusion for CLIP-Based Class-Incremental Learning De-Chuan Zhan Team 2511.11421 null  
2025-11-14 Q-Doc: Benchmarking Document Image Quality Assessment Capabilities in Multi-modal Large Language Models Baoliang Chen Team 2511.11410 null  
2025-11-14 MicroVQA++: High-Quality Microscopy Reasoning Dataset with Weakly Supervised Graphs for Multimodal Large Language Model Bo Yan Team 2511.11407 null  
2025-11-14 DocSLM: A Small Vision-Language Model for Long Multimodal Document Understanding Sunando Sengupta Team 2511.11313 null  
2025-11-14 EcoAlign: An Economically Rational Framework for Efficient LVLM Alignment Hongyi Zhang Team 2511.11301 null  
2025-11-14 AUVIC: Adversarial Unlearning of Visual Concepts for Multi-modal Large Language Models Volker Tresp Team 2511.11299 link  
2025-11-14 Experiences from Benchmarking Vision-Language-Action Models for Robotic Manipulation Xi Zheng Team 2511.11298 null  
2025-11-14 GraphPilot: Grounded Scene Graph Conditioning for Language-Based Autonomous Driving Abhinav Valada Team 2511.11266 null  
2025-11-14 Discovering Meaningful Units with Visually Grounded Semantics from Image Captions James Henderson Team 2511.11262 null  
2025-11-14 CountSteer: Steering Attention for Object Counting in Diffusion Models Hyunsoo Cho Team 2511.11253 null  
2025-11-13 Enhancing the Outcome Reward-based RL Training of MLLMs with Self-Consistency Sampling Jinguo Zhu Team 2511.10648 null  
2025-11-13 Querying Labeled Time Series Data with Scenario Programs Sanjit A Seshia Team 2511.10627 null  
2025-11-13 Towards Blind and Low-Vision Accessibility of Lightweight VLMs and Custom LLM-Evals Pawan Goyal Team 2511.10615 null  
2025-11-13 Impact of Layer Norm on Memorization and Generalization in Transformers Jung-Eun Kim Team 2511.10566 null  
2025-11-13 OmniVGGT: Omni-Modality Driven Visual Geometry Grounded Ziwei Liu Team 2511.10560 link  
2025-11-13 SemanticVLA: Semantic-Aligned Sparsification and Enhancement for Efficient Robotic Manipulation Liqiang Nie Team 2511.10518 link  
2025-11-13 LLM-YOLOMS: Large Language Model-based Semantic Interpretation and Fault Diagnosis for Wind Turbine Components Jianbo Feng Team 2511.10394 null  
2025-11-13 MonkeyOCR v1.5 Technical Report: Unlocking Robust Document Parsing for Complex Patterns Xiang Bai Team 2511.10390 null  
2025-11-13 Rethinking Visual Information Processing in Multimodal LLMs Amit Kumar K C Team 2511.10301 null  
2025-11-13 Adaptive Residual-Update Steering for Low-Overhead Hallucination Mitigation in Large Vision Language Models Pekka Marttinen Team 2511.10292 null  
2025-11-13 PROPA: Toward Process-level Optimization in Visual Reasoning via Reinforcement Learning Jey Han Lau Team 2511.10279 null  
2025-11-13 Causal-HalBench: Uncovering LVLMs Object Hallucinations Through Causal Intervention Xiang Wang Team 2511.10268 null  
2025-11-13 Facial-R1: Aligning Reasoning and Recognition for Facial Emotion Analysis Min Cao Team 2511.10254 null  
2025-11-13 TubeRMC: Tube-conditioned Reconstruction with Mutual Constraints for Weakly-supervised Spatio-Temporal Video Grounding Beihao Xia Team 2511.10241 null  
2025-11-13 Intelligence Foundation Model: A New Perspective to Approach Artificial General Intelligence Yao Zhao Team 2511.10119 null  
2025-11-13 MTAttack: Multi-Target Backdoor Attacks against Large Vision-Language Models Xiao Bai Team 2511.10098 null  
2025-11-13 How does My Model Fail? Automatic Identification and Interpretation of Physical Plausibility Failure Modes with Matryoshka Transcoders Dianbo Liu Team 2511.10094 null  
2025-11-13 SUGAR: Learning Skeleton Representation with Visual-Motion Knowledge for Action Recognition Zitong Yu Team 2511.10091 null  
2025-11-13 GridPrune: From “Where to Look” to “What to Select” in Visual Token Pruning for MLLMs Pengwei Wang Team 2511.10081 null  
2025-11-13 VLF-MSC: Vision-Language Feature-Based Multimodal Semantic Communication System Joonhyuk Kang Team 2511.10074 null  
2025-11-10 Using Vision Language Models as Closed-Loop Symbolic Planners for Robotic Applications: A Control-Theoretic Perspective Somil Bansal Team 2511.07410 null  
2025-11-10 CAMP-VQA: Caption-Embedded Multimodal Perception for No-Reference Quality Assessment of Compressed Video David Bull Team 2511.07290 null  
2025-11-10 Leveraging Text-Driven Semantic Variation for Robust OOD Segmentation Jaekoo Lee Team 2511.07238 null  
2025-11-10 Federated Learning for Video Violence Detection: Complementary Roles of Lightweight CNNs and Vision-Language Models for Energy-Efficient Use Rachid Chelouah Team 2511.07171 null  
2025-11-10 ClusterMine: Robust Label-Free Visual Out-Of-Distribution Detection via Concept Mining from Text Corpora Markus Kollmann Team 2511.07068 link  
2025-11-10 CoLM: Collaborative Large Models via A Client-Server Paradigm Hongyuan Zhang Team 2511.06991 null  
2025-11-10 RPTS: Tree-Structured Reasoning Process Scoring for Faithful Multimodal Evaluation Yu Zhang Team 2511.06899 null  
2025-11-10 Flexible Concept Bottleneck Model Rui Zhang Team 2511.06678 null  
2025-11-10 HiMo-CLIP: Modeling Semantic Hierarchy and Monotonicity in Vision-Language Alignment Shiguo Lian Team 2511.06653 null  
2025-11-10 NOVO: Bridging LLaVA and SAM with Visual-only Prompts for Reasoning Segmentation Yeong-Jun Cho Team 2511.06651 null  
2025-11-10 How Do VLAs Effectively Inherit from VLMs? Jiang Bian Team 2511.06619 null  
2025-11-09 A Low-Rank Method for Vision Language Model Hallucination Mitigation in Autonomous Driving Xiaopeng Li Team 2511.06496 null  
2025-11-09 Zooming into Comics: Region-Aware RL Improves Fine-Grained Comic Understanding in Vision-Language Models Sabine Süsstrunk Team 2511.06490 null  
2025-11-09 GazeVLM: A Vision-Language Model for Multi-Task Gaze Understanding Riad Souissi Team 2511.06348 null  
2025-11-09 ALIGN: A Vision-Language Framework for High-Accuracy Accident Location Inference through Geo-Spatial Neural Reasoning Moazzem Hossain Team 2511.06316 null  
2025-11-09 TinyChemVL: Advancing Chemical Vision-Language Models via Efficient Visual Token Reduction and Complex Reaction Tasks Bo Xu Team 2511.06283 null  
2025-11-09 WebVIA: A Web-based Vision-Language Agentic Framework for Interactive and Verifiable UI-to-Code Generation Jie Tang Team 2511.06251 null  
2025-11-09 Affordance-Guided Coarse-to-Fine Exploration for Base Placement in Open-Vocabulary Mobile Manipulation Winston H. Hsu Team 2511.06240 null  
2025-11-09 MoRA: Missing Modality Low-Rank Adaptation for Visual Recognition Vijaykrishnan Narayanan Team 2511.06225 null  
2025-11-09 Scene-Aware Urban Design: A Human-AI Recommendation Framework Using Co-Occurrence Embeddings and Vision-Language Models Alexander Htet Kyaw Team 2511.06201 null  
2025-11-07 Visual Spatial Tuning Hengshuang Zhao Team 2511.05491 null  
2025-11-07 Turning Adversaries into Allies: Reversing Typographic Attacks for Multimodal E-Commerce Product Retrieval Hongda Shen Team 2511.05325 null  
2025-11-07 Towards Mitigating Hallucinations in Large Vision-Language Models by Refining Textual Embeddings Furong Huang Team 2511.05017 null  
2025-11-07 iFlyBot-VLM Technical Report Jia Pan Team 2511.04976 null  
2025-11-07 A benchmark multimodal oro-dental dataset for large vision-language models Muhammad Saqib Team 2511.04948 null  
2025-11-06 Conformalized Non-uniform Sampling Strategies for Accelerated Sampling-based Motion Planning Yiannis Kantaros Team 2511.04835 null  
2025-11-06 IndicVisionBench: Benchmarking Cultural and Multilingual Understanding in VLMs Shubham Agarwal Team 2511.04727 null  
2025-11-05 SWAP: Towards Copyright Auditing of Soft Prompts via Sequential Watermarking Dacheng Tao Team 2511.04711 null  
2025-11-06 SAFe-Copilot: Unified Shared Autonomy Framework Daniela Rus Team 2511.04664 null  
2025-11-06 Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Xipeng Qiu Team 2511.04570 null  
2025-11-06 Evo-1: Lightweight Vision-Language-Action Model with Preserved Semantic Alignment Bo Zhao Team 2511.04555 link  
2025-11-07 ThaiOCRBench: A Task-Diverse Benchmark for Vision-Language Understanding in Thai Kunat Pipatanakul Team 2511.04479 null  
2025-11-06 GUI-360: A Comprehensive Dataset and Benchmark for Computer-Using Agents Dongmei Zhang Team 2511.04307 null  
2025-11-07 On the Brittleness of CLIP Text Encoders Luca Rossetto Team 2511.04247 link  
2025-11-06 Text to Sketch Generation with Multi-Styles Lei Xu Team 2511.04123 null  
2025-11-05 Context informs pragmatic interpretation in vision-language models Michael C. Frank Team 2511.03908 null  
2025-11-05 Contamination Detection for VLMs using Multi-Modal Semantic Perturbation Yong Jae Lee Team 2511.03774 null  
2025-11-05 GUIDES: Guidance Using Instructor-Distilled Embeddings for Pre-trained Robot Policy Enhancement Jiachen Li Team 2511.03400 null  
2025-11-05 Decoupling Augmentation Bias in Prompt Learning for Vision-Language Models Seokju Lee Team 2511.03367 null  
2025-11-04 LEGO-Eval: Towards Fine-Grained Evaluation on Synthesizing 3D Embodied Environments with Tool Augmentation Jinyoung Yeo Team 2511.03001 null  
2025-11-04 SCALE-VLP: Soft-Weighted Contrastive Volumetric Vision-Language Pre-training with Spatial-Knowledge Semantics Leonid Sigal Team 2511.02996 null  
2025-11-04 XR-1: Towards Versatile Vision-Language-Action Models via Learning Unified Vision-Motion Representations Jian Tang Team 2511.02776 null  
2025-11-04 Adapting General-Purpose Foundation Models for X-ray Ptychography in Low-Data Regimes Yi Jiang Team 2511.02503 null  
2025-11-04 RxnCaption: Reformulating Reaction Diagram Parsing as Visual Prompt Guided Captioning Conghui He Team 2511.02384 null  
2025-11-04 The Pervasive Blind Spot: Benchmarking VLM Inference Risks on Everyday Personal Videos Hewu Li Team 2511.02367 null  
2025-11-04 CoCoVa: Chain of Continuous Vision-Language Thought for Latent Space Reasoning Han Yan Team 2511.02360 null  
2025-11-04 LACY: A Vision-Language Model-based Language-Action Cycle for Self-Improving Robotic Manipulation Changhyun Choi Team 2511.02239 link  
2025-11-04 Text to Robotic Assembly of Multi Component Objects using 3D Generative AI and Vision Language Models Randall Davis Team 2511.02162 null  
2025-11-03 Enhancing Multimodal Recommendations with Vision-Language Models and Information-Aware Fusion Dung D. Le Team 2511.02113 null  
2025-11-03 TRACE: Textual Reasoning for Affordance Coordinate Extraction Matthew S. Brown Team 2511.01999 null  
2025-11-03 Black-Box Membership Inference Attack for LVLMs via Prior Knowledge-Calibrated Memory Probing Tao Qi Team 2511.01952 null  
2025-11-04 Dynamic Routing Between Experts: A Data-Efficient Approach to Continual Learning in Vision-Language Models Mingwei Shen Team 2511.01831 null  
2025-11-03 SciTextures: Collecting and Connecting Visual Patterns, Models, and Code Across Science and Art Alona Strugatski Team 2511.01817 null  
2025-11-03 GenDexHand: Generative Simulation for Dexterous Hands Yi Ma Team 2511.01791 null  
2025-11-03 3EED: Ground Everything Everywhere in 3D Ziwei Liu Team 2511.01755 link  
2025-11-03 UniLumos: Fast and Unified Image and Video Relighting with Physics-Plausible Feedback Fan Wang Team 2511.01678 null  
2025-11-03 Vote-in-Context: Turning VLMs into Zero-Shot Rank Fusers Naeemullah Khan Team 2511.01617 null  
2025-11-03 Analyzing Sustainability Messaging in Large-Scale Corporate Social Media Marcel Worring Team 2511.01550 null  
2025-11-03 AERMANI-VLM: Structured Prompting and Reasoning for Aerial Manipulation with Vision Language Models Spandan Roy Team 2511.01472 null  
2025-11-03 HMVLM: Human Motion-Vision-Language Model via MoE LoRA Shihong Xia Team 2511.01463 null  
2025-11-03 When to Trust the Answer: Question-Aligned Semantic Nearest Neighbor Entropy for Safer Surgical VQA Mobarak I. Hoque Team 2511.01458 null  
2025-10-31 PETAR: Localized Findings Generation with Mask-Aware Vision-Language Modeling for PET Automated Reporting Tyler J. Bradshaw Team 2510.27680 null  
2025-10-31 Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning Jiaqi Wang Team 2510.27606 null  
2025-10-31 From Pixels to Paths: A Multi-Agent Framework for Editable Scientific Illustration Kaipeng Zhang Team 2510.27452 null  
2025-10-31 Modality Alignment across Trees on Heterogeneous Hyperbolic Manifolds Mehrtash Harandi Team 2510.27391 null  
2025-10-31 FOCUS: Efficient Keyframe Selection for Long Video Understanding Yang You Team 2510.27280 null  
2025-10-31 T3: Test-Time Model Merging in VLMs for Zero-Shot Medical Imaging Analysis Mohammad Yaqub Team 2510.27265 null  
2025-10-31 ECVL-ROUTER: Scenario-Aware Routing for Vision-Language Models Tengxiang Zhang Team 2510.27256 null  
2025-11-03 Enhancing Spatio-Temporal Zero-shot Action Recognition with Language-driven Description Attributes Seong-Whan Lee Team 2510.27255 null  
2025-10-31 Generating Accurate and Detailed Captions for High-Resolution Images Jiyoung Jung Team 2510.27164 null  
2025-10-30 MoME: Mixture of Visual Language Medical Experts for Medical Imaging Segmentation Xiaohui Xie Team 2510.26996 null  
2025-10-30 MM-OPERA: Benchmarking Open-ended Association Reasoning for Large Vision-Language Models Ziliang Chen Team 2510.26937 null  
2025-10-30 NaviTrace: Evaluating Embodied Navigation of Vision-Language Models Jonas Frey Team 2510.26909 null  
2025-10-30 Cognition Envelopes for Bounded AI Reasoning in Autonomous UAS Operations Jane Cleland-Huang Team 2510.26905 null  
2025-10-30 Do Vision-Language Models Measure Up? Benchmarking Visual Measurement Reading with MeasureBench Xi Yang Team 2510.26865 link  
2025-11-03 ChartAB: A Benchmark for Chart Grounding & Dense Alignment Tianyi Zhou Team 2510.26781 null  
2025-10-30 SteerVLM: Robust Model Control through Lightweight Activation Steering for Vision Language Models Chris Thomas Team 2510.26769 null  
2025-10-30 All You Need for Object Detection: From Pixels, Points, and Prompts to Next-Gen Fusion and Multimodal LLMs/VLMs in Autonomous Vehicles Abolfazl Razi Team 2510.26641 null  
2025-10-30 Counteracting Matthew Effect in Self-Improvement of LVLMs through Head-Tail Re-balancing Xuanjing Huang Team 2510.26474 null  
2025-11-03 Representation-Level Counterfactual Calibration for Debiased Zero-Shot Recognition ShengJun Huang Team 2510.26466 null  
2025-10-30 Towards Fine-Grained Vision-Language Alignment for Few-Shot Anomaly Detection Chengjie Wang Team 2510.26464 null  
2025-10-30 A-TPT: Angular Diversity Calibration Properties for Test-Time Prompt Tuning of Vision-Language Models Muhammad Haris Khan Team 2510.26441 null  
2025-10-30 MedSAE: Dissecting MedCLIP Representations with Sparse Autoencoders Marco Grangetto Team 2510.26411 null  
2025-10-30 Distilling Multilingual Vision-Language Models: When Smaller Models Stay Multilingual Peerat Limkonchotiwat Team 2510.26271 null  
2025-10-30 Which Way Does Time Flow? A Psychophysics-Grounded Evaluation for Vision-Language Models Shigeru Kitazawa Team 2510.26241 null  
2025-10-30 MV-MLM: Bridging Multi-View Mammography and Language for Breast Cancer Diagnosis and Risk Prediction Ali Diba Team 2510.26151 null  
2025-10-30 GUI Knowledge Bench: Revealing the Knowledge Gap Behind VLM Failures in GUI Tasks Qing Li Team 2510.26098 null  
2025-10-30 Dynamic VLM-Guided Negative Prompting for Diffusion Models Yoonseok Choi Team 2510.26052 null  
2025-10-29 CAVE: Detecting and Explaining Commonsense Anomalies in Visual Environments Antoine Bosselut Team 2510.26006 null  
2025-10-30 PairUni: Pairwise Training for Unified Multimodal Language Models Zhuochen Wang Team 2510.25682 null  
2025-10-29 ALDEN: Reinforcement Learning for Active Navigation and Evidence Gathering in Long Documents Bela Gipp Team 2510.25668 null  
2025-10-29 Don’t Blind Your VLA: Aligning Visual Representations for OOD Generalization Aleksandr I. Panov Team 2510.25616 null  
2025-10-29 Using VLM Reasoning to Constrain Task and Motion Planning Zachary Kingston Team 2510.25548 null  
2025-10-29 Seeing, Signing, and Saying: A Vision-Language Model-Assisted Pipeline for Sign Language Data Acquisition and Curation from Social Media Josef van Genabith Team 2510.25413 null  
2025-10-29 SoraNav: Adaptive UAV Task-Centric Navigation via Zeroshot VLM Reasoning Wei Pan Team 2510.25191 null  
2025-10-29 Agentic Moderation: Multi-Agent Design for Safer Vision-Language Models Usman Naseem Team 2510.25179 null  
2025-10-29 Learning Spatial-Aware Manipulation Ordering Jian Pu Team 2510.25138 null  
2025-10-29 NanoVLA: Routing Decoupled Vision-Language Understanding for Nano-sized Generalist Robotic Policies Jinghui Lu Team 2510.25122 null  
2025-10-29 Visual Diversity and Region-aware Prompt Learning for Zero-shot HOI Detection Hyunwoo J. Kim Team 2510.25094 null  
2025-10-29 DRIP: Dynamic patch Reduction via Interpretable Pooling Sachin Kumar Team 2510.25067 null  
2025-10-28 Efficient License Plate Recognition via Pseudo-Labeled Supervision with Grounding DINO and YOLOv8 Ching Yee Suen Team 2510.25032 null  
2025-10-28 SCOUT: A Lightweight Framework for Scenario Coverage Assessment in Autonomous Driving Mykel J. Kochenderfer Team 2510.24949 null  
2025-10-28 Finding Culture-Sensitive Neurons in Vision-Language Models Ivan Titov Team 2510.24942 null  
2025-10-28 Advancing site-specific disease and pest management in precision agriculture: From reasoning-driven foundation models to adaptive, feedback-based learning Arnold W. Schumann Team 2510.24650 null  
2025-10-28 OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows Lingpeng Kong Team 2510.24411 null  
2025-10-28 What do vision-language models see in the context? Investigating multimodal in-context learning Sandra Avila Team 2510.24331 null  
2025-10-28 Few-Shot Remote Sensing Image Scene Classification with CLIP and Prompt Learning Ivan Kitanovski Team 2510.24321 null  
2025-10-28 ViPER: Empowering the Self-Evolution of Visual Perception Abilities in Vision-Language Model Rui Yan Team 2510.24285 null  
2025-10-28 Enabling Near-realtime Remote Sensing via Satellite-Ground Collaboration of Large Vision-Language Models Yue Gao Team 2510.24242 null  
2025-10-28 V-SAT: Video Subtitle Annotation Tool Vishwanathan Raman Team 2510.24180 null  
2025-10-28 Enhancing Vision-Language Models for Autonomous Driving through Task-Specific Prompting and Spatial Reasoning Xubo Luo Team 2510.24152 null  
2025-10-28 Compositional Image Synthesis with Inference-Time Scaling Namhyuk Ahn Team 2510.24133 link  
2025-10-28 HistoLens: An Interactive XAI Toolkit for Verifying and Mitigating Flaws in Vision-Language Models for Histopathology Vandita Singh Team 2510.24115 null  
2025-10-28 PFEA: An LLM-based High-Level Natural Language Planning and Feedback Embodied Agent for Human-Centered AI Philip Dames Team 2510.24109 null  
2025-10-28 Enhancing CLIP Robustness via Cross-Modality Alignment Hanwang Zhang Team 2510.24038 null  
2025-10-28 Mars-Bench: A Benchmark for Evaluating Foundation Models for Mars Science Tasks Hannah Kerner Team 2510.24010 null  
2025-10-28 Reasoning Visual Language Model for Chest X-Ray Analysis Daguang Xu Team 2510.23968 null  
2025-10-27 Latent Chain-of-Thought for Visual Reasoning Zhiqiang Tao Team 2510.23925 null  
2025-10-27 Explainable Detection of AI-Generated Images with Artifact Localization Using Faster-Than-Lies and Vision-Language Models for Edge Devices Madesh Kuppusamy Team 2510.23775 null  
2025-10-27 RobotArena $\infty$: Scalable Robot Benchmarking via Real-to-Sim Translation Katerina Fragkiadaki Team 2510.23571 link  
2025-10-28 VOLD: Reasoning Transfer from LLMs to Vision-Language Models via On-Policy Distillation Cordelia Schmid Team 2510.23497 null  
2025-10-27 On the Faithfulness of Visual Thinking: Measurement and Enhancement Guisong Xia Team 2510.23482 null  
2025-10-27 A Video Is Not Worth a Thousand Words Michael Wray Team 2510.23253 null  
2025-10-27 Process Reward Models for Sentence-Level Verification of LVLM Radiology Reports Curtis P. Langlotz Team 2510.23217 null  
2025-10-27 DecoDINO: 3D Human-Scene Contact Prediction with Semantic Classification Angelo Broere Team 2510.23203 null  
2025-10-27 Evaluation of Vision-LLMs in Surveillance Video Jelte P. Mense Team 2510.23190 null  
2025-10-27 Finding 3D Scene Analogies with Multimodal Foundation Models Young Min Kim Team 2510.23184 null  
2025-10-27 Revisiting Multimodal Positional Encoding in Vision-Language Models Shuai Bai Team 2510.23095 null  
2025-10-27 Multi-Stage Field Extraction of Financial Documents with OCR and Compact Vision-Language Models Donald MacDonald Team 2510.23066 null  
2025-10-27 VoMP: Predicting Volumetric Mechanical Property Fields Maria Shugrina Team 2510.22975 link  
2025-10-28 HyPerNav: Hybrid Perception for Object-Oriented Navigation in Unknown Environment Zhen Li Team 2510.22917 null  
2025-10-26 Seeing the Unseen: Towards Zero-Shot Inspection for Wind Turbine Blades using Knowledge-Augmented Vision Language Models Jiong Tang Team 2510.22868 null  
2025-10-26 Semantic-Preserving Cross-Style Visual Reasoning for Robust Multi-Modal Understanding in Large Vision-Language Models Kaito Tanaka Team 2510.22838 null  
2025-10-26 VEHME: A Vision-Language Model For Evaluating Handwritten Mathematics Expressions Taehwan Kim Team 2510.22798 link  
2025-10-26 Self-Calibrated Consistency can Fight Back for Adversarial Robustness in Vision-Language Models Mingkun Xu Team 2510.22785 null  
2025-10-26 MMPersuade: A Dataset and Evaluation Framework for Multimodal Persuasion Chien-Sheng Wu Team 2510.22768 null  
2025-10-26 Jarvis: Towards Personalized AI Assistant via Personal KV-Cache Retrieval Wentao Zhang Team 2510.22765 null  
2025-10-26 S-Chain: Structured Visual Chain-of-Thought For Medicine Anh Totti Nguyen Team 2510.22728 null  
2025-10-26 Atlas Urban Index: A VLM-Based Approach for Spatially and Temporally Calibrated Urban Development Monitoring Prathamesh Mayekar Team 2510.22702 null  
2025-10-24 A Multimodal Benchmark for Framing of Oil & Gas Advertising and Potential Greenwashing Detection Peter Henderson Team 2510.21679 null  
2025-10-24 Modest-Align: Data-Efficient Alignment for Vision-Language Models Zuozhu Liu Team 2510.21606 null  
2025-10-24 Head Pursuit: Probing Attention Specialization in Multimodal Transformers Alberto Cazzaniga Team 2510.21518 null  
2025-10-24 MoniTor: Exploiting Large Language Models with Instruction for Online Video Anomaly Detection Jie Qin Team 2510.21449 null  
2025-10-24 Vision Language Models for Dynamic Human Activity Recognition in Healthcare Settings Fakhri Karray Team 2510.21424 null  
2025-10-24 Bridging the gap to real-world language-grounded visual concept learning Seunghoon Hong Team 2510.21412 null  
2025-10-24 VL-SAE: Interpreting and Enhancing Vision-Language Alignment with a Unified Concept Set Shuhui Wang Team 2510.21323 null  
2025-10-24 Memory-Free Continual Learning with Null Space Adaptation for Zero-Shot Vision-Language Models Taesup Kim Team 2510.21175 null  
2025-10-24 Generalizable Hierarchical Skill Learning via Object-Centric Representation Robert Platt Team 2510.21121 null  
2025-10-24 SafetyPairs: Isolating Safety Critical Image Features with Counterfactual Image Generation Joseph Yitan Cheng Team 2510.21120 null  
2025-10-24 MedAlign: A Synergistic Framework of Multimodal Preference Optimization and Federated Meta-Cognitive Reasoning Dong In Kim Team 2510.21093 null  
2025-10-24 Knowledge-Driven Vision-Language Model for Plexus Detection in Hirschsprung’s Disease Adrian D. C. Chan Team 2510.21083 null  
2025-10-24 ZING-3D: Zero-shot Incremental 3D Scene Graphs via Vision-Language Models Jimmy Chiun Team 2510.21069 null  
2025-10-23 3DReasonKnee: Advancing Grounded Reasoning in Medical Vision Language Models Pranav Rajpurkar Team 2510.20967 null  
2025-10-23 Small Drafts, Big Verdict: Information-Intensive Visual Reasoning via Speculation Shengjie Wang Team 2510.20812 null  
2025-10-23 Mixing Importance with Diversity: Joint Optimization for KV Cache Compression in Large Vision-Language Models Linfeng Zhang Team 2510.20707 link  
2025-10-23 Diagnosing Visual Reasoning: Challenges, Insights, and a Path Forward Chenliang Xu Team 2510.20696 null  
2025-10-23 Better Tokens for Better 3D: Advancing Vision-Language Modeling in 3D Medical Imaging Bjoern Menze Team 2510.20639 null  
2025-10-23 Bi-CoG: Bi-Consistency-Guided Self-Training for Vision-Language Models Lan-Zhe Guo Team 2510.20477 null  
2025-10-23 GhostEI-Bench: Do Mobile Agents Resilience to Environmental Injection in Dynamic On-Device Environments? Yingchun Wang Team 2510.20333 null  
2025-10-23 Breakdance Video classification in the age of Generative AI Michelle Munson Team 2510.20287 null  
2025-10-23 Empower Words: DualGround for Structured Phrase and Sentence-Level Temporal Grounding Sangyoun Lee Team 2510.20244 null  
2025-10-23 Why LVLMs Are More Prone to Hallucinations in Longer Responses: The Role of Context Sibei Yang Team 2510.20229 null  
2025-10-24 Surfer 2: The Next Generation of Cross-Platform Computer Use Agents Jevgenij Zubovskij Team 2510.19949 null  
2025-10-22 Semantic World Models Abhishek Gupta Team 2510.19818 null  
2025-10-22 olmOCR 2: Unit Test Rewards for Document OCR Kyle Lo Team 2510.19817 link  
2025-10-22 Class-Aware Prototype Learning with Negative Contrast for Test-Time Adaptation of Vision-Language Models Xuelong Li Team 2510.19802 null  
2025-10-22 MedReason-R1: Learning to Reason for CT Diagnosis with Reinforcement Learning and Local Zoom Shaohua Kevin Zhou Team 2510.19626 link  
2025-10-22 XBench: A Comprehensive Benchmark for Visual-Language Explanations in Chest Radiography Mauricio Reyes Team 2510.19599 null  
2025-10-22 Can You Trust What You See? Alpha Channel No-Box Attacks on Video Object Detection Qiben Yan Team 2510.19574 null  
2025-10-22 A Matter of Time: Revealing the Structure of Time in Vision-Language Models Matthias Zeppelzauer Team 2510.19559 null  
2025-10-22 [De Re]constructing VLMs’ Reasoning in Counting Giuseppe Riccardi Team 2510.19555 null  
2025-10-22 CARES: Context-Aware Resolution Selector for VLMs Eli Schwartz Team 2510.19496 null  
2025-10-22 Seeing Across Views: Benchmarking Spatial Reasoning of Vision-Language Models in Robotic Scenes Baining Guo Team 2510.19400 link  
2025-10-22 A Training-Free Framework for Open-Vocabulary Image Segmentation and Recognition with EfficientNet and CLIP Wei Yu Chen Team 2510.19333 null  
2025-10-22 Unified Reinforcement and Imitation Learning for Vision-Language Models Yueh-Hua Wu Team 2510.19307 link  
2025-10-22 Hierarchical DLO Routing with Reinforcement Learning and In-Context Vision-language Models Changhyun Choi Team 2510.19268 null  
2025-10-22 Preliminary Use of Vision Language Model Driven Extraction of Mouse Behavior Towards Understanding Fear Expression Evangelos E. Papalexakis Team 2510.19160 null  
2025-10-21 PoSh: Using Scene Graphs To Guide LLMs-as-a-Judge For Detailed Image Descriptions Kathleen McKeown Team 2510.19060 link  
2025-10-21 Robust Driving QA through Metadata-Grounded Context and Task-Specific Prompts Hyunjung Shim Team 2510.19001 null  
2025-10-21 DSI-Bench: A Benchmark for Dynamic Spatial Intelligence Zhou Zhao Team 2510.18873 null  
2025-10-21 FedDEAP: Adaptive Dual-Prompt Tuning for Multi-Domain Federated Learning Jagath C. Rajapakse Team 2510.18837 null  
2025-10-21 Seg the HAB: Language-Guided Geospatial Algae Bloom Reasoning and Segmentation Elvis Hsieh Team 2510.18751 null  
2025-10-21 Exploring a Unified Vision-Centric Contrastive Alternatives on Multi-Modal Web Documents Mike Zheng Shou Team 2510.18703 link  
2025-10-21 Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views Ruqi Huang Team 2510.18632 null  
2025-10-21 CUARewardBench: A Benchmark for Evaluating Reward Models on Computer-using Agent Xing Sun Team 2510.18596 null  
2025-10-21 CovMatch: Cross-Covariance Guided Multimodal Dataset Distillation with Trainable Text Encoder Hye Won Chung Team 2510.18583 null  
2025-10-21 Zero-Shot Vehicle Model Recognition via Text-Based Retrieval-Augmented Generation Yan-Ann Chen Team 2510.18502 null  
2025-10-21 StarBench: A Turn-Based RPG Benchmark for Agentic Multimodal Decision-Making and Information Seeking Donglin Yu Team 2510.18483 null  
2025-10-21 Grounding or Guessing? Visual Signals for Detecting Hallucinations in Sign Language Translation Cristina España-Bonet Team 2510.18439 null  
2025-10-21 ImageGem: In-the-wild Generative Image Interaction Dataset for Generative Model Personalization Hongyi Wen Team 2510.18433 null  
2025-10-21 Beyond Single Models: Mitigating Multimodal Hallucinations via Adaptive Token Ensemble Decoding Xian Wu Team 2510.18321 null  
2025-10-21 StreamingTOM: Streaming Token Compression for Efficient Video Understanding Huan Wang Team 2510.18269 null  
2025-10-21 UWBench: A Comprehensive Vision-Language Benchmark for Underwater Understanding Xuelong Li Team 2510.18262 null  
2025-10-21 RadDiagSeg-M: A Vision Language Model for Joint Diagnosis and Multi-Target Segmentation in Radiology Bjoern Menze Team 2510.18188 null  
2025-10-20 Online In-Context Distillation for Low-Resource Vision Language Models Karteek Alahari Team 2510.18117 null  
2025-10-20 HouseTour: A Virtual Real Estate A(I)gent Iro Armeni Team 2510.18054 null  
2025-10-20 SAVANT: Semantic Analysis with Vision-Augmented Anomaly deTection Johannes Betz Team 2510.18034 null  
2025-10-21 Glyph: Scaling Context Windows via Visual-Text Compression Minlie Huang Team 2510.17800 null  
2025-10-20 SparseVILA: Decoupling Visual Sparsity for Efficient VLM Inference Zhijian Liu Team 2510.17777 null  
2025-10-20 Seeing but Not Believing: Probing the Disconnect Between Visual Attention and Answer Correctness in VLMs Hanghang Tong Team 2510.17771 null  
2025-10-20 VERA-V: Variational Inference Framework for Jailbreaking Vision-Language Models Ruqi Zhang Team 2510.17759 null  
2025-10-20 Frugal Federated Learning for Violence Detection: A Comparison of LoRA-Tuned VLMs and Personalized CNNs Rachid Chelouah Team 2510.17651 null  
2025-10-20 SARSteer: Safeguarding Large Audio Language Models via Safe-Ablated Refusal Steering Li Liu Team 2510.17633 null  
2025-10-20 MIRAGE: Agentic Framework for Multimodal Misinformation Detection with Web-Grounded Reasoning Adiba Mahbub Proma Team 2510.17590 null  
2025-10-20 Towards Mixed-Modal Retrieval for Universal Retrieval-Augmented Generation Zhicheng Dou Team 2510.17354 null  
2025-10-20 Disentanglement Beyond Static vs. Dynamic: A Benchmark and Evaluation Framework for Multi-Factor Sequential Representations Omri Azencot Team 2510.17313 null  
2025-10-20 FineVision: Open Data Is All You Need Andrés Marafioti Team 2510.17269 null  
2025-10-20 ZSPAPrune: Zero-Shot Prompt-Aware Token Pruning for Vision-Language Models Guoming Tang Team 2510.17197 null  
2025-10-20 SimpleVSF: VLM-Scoring Fusion for Trajectory Prediction of End-to-End Autonomous Driving Shaohua Wu Team 2510.17191 null  
2025-10-20 OmniVIC: A Self-Improving Variable Impedance Controller with Vision-Language In-Context Learning for Safe Robotic Manipulation Arash Ajoudani Team 2510.17150 link  
2025-10-20 Efficient Vision-Language-Action Models for Embodied Manipulation: A Systematic Survey Jian Cheng Team 2510.17111 null  
2025-10-19 Where, Not What: Compelling Video LLMs to Learn Geometric Causality for 3D-Grounding Yutong Zhong Team 2510.17034 null  
2025-10-19 Does Visual Grounding Enhance the Understanding of Embodied Knowledge in Large Language Models? Renfen Hu Team 2510.16924 null  
2025-10-19 VAGEN: Reinforcing World Model Reasoning for Multi-Turn VLM Agents Manling Li Team 2510.16907 null  
2025-10-19 Uncovering Brain-Like Hierarchical Patterns in Vision-Language Models through fMRI-Based Neural Encoding Xiaowei He Team 2510.16870 null  
2025-10-19 Region in Context: Text-condition Image editing with Human-like semantic reasoning Phan Xuan Tan Team 2510.16772 null  
2025-10-19 See or Say Graphs: Agent-Driven Scalable Graph Understanding with Vision-Language Models Xike Xie Team 2510.16769 null  
2025-10-17 BiomedXPro: Prompt Optimization for Explainable Diagnosis with Biomedical Vision Language Models Damayanthi Herath Team 2510.15866 null  
2025-10-17 Neuro-Symbolic Spatial Reasoning in Segmentation Shaogang Gong Team 2510.15841 null  
2025-10-17 Learning to Detect Unknown Jailbreak Attacks in Large Vision-Language Models Xiting Wang Team 2510.15430 null  
2025-10-17 Fine-Tuning MedGemma for Clinical Captioning to Enhance Multimodal RAG over Malaysia CPGs Goh Man Fye Team 2510.15418 null  
2025-10-17 Infinity Parser: Layout Aware Reinforcement Learning for Scanned Document Parsing Yuan Qi Team 2510.15349 null  
2025-10-16 From Pixels to Words – Towards Native Vision-Language Primitives at Scale Ziwei Liu Team 2510.14979 null  
2025-10-16 Learning an Image Editing Model without Image Editing Pairs Xun Huang Team 2510.14978 link  
2025-10-16 RDD: Retrieval-Based Demonstration Decomposer for Planner Alignment in Long-Horizon Tasks Jiachen Li Team 2510.14968 null  
2025-10-16 RoboGPT-R1: Enhancing Robot Planning with Reinforcement Learning Haoran Li Team 2510.14828 null  
2025-10-16 CoT-PL: Visual Chain-of-Thought Reasoning Meets Pseudo-Labeling for Open-Vocabulary Object Detection Hyunjung Shim Team 2510.14792 null  
2025-10-16 Free-Grained Hierarchical Recognition Stella X. Yu Team 2510.14737 null  
2025-10-16 Efficient Video Sampling: Pruning Temporally Redundant Tokens for Faster VLM Inference Andrew Tao Team 2510.14624 null  
2025-10-16 Talking Points: Describing and Localizing Pixels Shai Avidan Team 2510.14583 null  
2025-10-16 Exploring Cross-Modal Flows for Few-Shot Learning Long Chen Team 2510.14543 null  
2025-10-17 PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model Yanjun Ma Team 2510.14528 link  
2025-10-16 Noise Projection: Closing the Prompt-Agnostic Gap Behind Text-to-Image Misalignment in Diffusion Models Ziyu Zhao Team 2510.14526 null  
2025-10-16 Hi-Agent: Hierarchical Vision-Language Agents for Mobile Device Control Yuanchun Shi Team 2510.14388 null  
2025-10-16 Watermarking for Factuality: Guiding Vision-Language Models Toward Truth via Tri-layer Contrastive Decoding Jinkyu Kim Team 2510.14304 link  
2025-10-15 Efficient Few-Shot Learning in Remote Sensing: Fusing Vision and Vision-Language Models Miguel Arana-Catania Team 2510.13993 null  
2025-10-15 VisCoP: Visual Probing for Video Domain Adaptation of Vision Language Models Srijan Das Team 2510.13808 null  
2025-10-15 Generative Universal Verifier as Multimodal Meta-Reasoner Yujiu Yang Team 2510.13804 null  
2025-10-15 Spatial-DISE: A Unified Benchmark for Evaluating Spatial Reasoning in Vision-Language Models Xiaowei Huang Team 2510.13394 null  
2025-10-15 DepthVLA: Enhancing Vision-Language-Action Models with Depth-Aware Spatial Reasoning Hang Zhao Team 2510.13375 null  
2025-10-15 Language as a Label: Zero-Shot Multimodal Classification of Everyday Postures under Data Scarcity Jubal Chandy Jacob Team 2510.13364 null  
2025-10-15 Improving Visual Recommendation on E-commerce Platforms Using Vision-Language Models Andre Rusli Team 2510.13359 null  
2025-10-15 Self-Augmented Visual Contrastive Decoding Vivek Gupta Team 2510.13315 null  
2025-10-15 MMLongCite: A Benchmark for Evaluating Fidelity of Long-Context Vision-Language Models Min Zhang Team 2510.13276 null  
2025-10-15 Map the Flow: Revealing Hidden Pathways of Information in VideoLLMs Bohyung Han Team 2510.13251 null  
2025-10-15 What “Not” to Detect: Negation-Aware VLMs via Structured Reasoning and Token Merging Hyunjung Shim Team 2510.13232 null  
2025-10-15 SHIELD: Classifier-Guided Prompting for Robust and Safer LVLMs Usman Naseem Team 2510.13190 null  
2025-10-15 DriveCritic: Towards Context-Aware, Human-Aligned Evaluation for Autonomous Driving with Vision-Language Models Jose M. Alvarez Team 2510.13108 null  
2025-10-15 VLA-0: Building State-of-the-Art VLAs with Zero Modification Fabio Ramos Team 2510.13054 null  
2025-10-14 SVAG-Bench: A Large-Scale Benchmark for Multi-Instance Spatio-temporal Video Action Grounding Thomas Seidl Team 2510.13016 null  
2025-10-14 UNCAP: Uncertainty-Guided Planning Using Natural Language Communication for Cooperative Autonomous Vehicles Ufuk Topcu Team 2510.12992 null  
2025-10-14 Scope: Selective Cross-modal Orchestration of Visual Perception Experts Perouz Taslakian Team 2510.12974 null  
2025-10-14 Epistemic-aware Vision-Language Foundation Model for Fetal Ultrasound Interpretation Bo Du Team 2510.12953 null  
2025-10-14 Unifying Vision-Language Latents for Zero-label Image Caption Enhancement Woo Seong Chung Team 2510.12931 null  
2025-10-14 UniFusion: Vision-Language Model as Unified Encoder in Image Generation Ajinkya Kale Team 2510.12789 link  
2025-10-15 SAIL-Embedding Technical Report: Omni-modal Embedding Foundation Model Chao Feng Team 2510.12709 null  
2025-10-14 ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning Tong Zhang Team 2510.12693 null  
2025-10-14 VISaGE: Understanding Visual Generics and Exceptions Emily Allaway Team 2510.12548 null  
2025-10-14 A Review of Longitudinal Radiology Report Generation: Dataset Composition, Methods, and Performance Evaluation Luping Zhou Team 2510.12444 null  
2025-10-14 Towards General Urban Monitoring with Vision-Language Models: A Review, Evaluation, and a Research Agenda Nuno F. Rodrigues Team 2510.12400 null  
2025-10-14 Vision Language Models Map Logos to Text via Semantic Entanglement in the Visual Projector Yiwei Wang Team 2510.12287 null  
2025-10-14 Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model Haoang Li Team 2510.12276 null  
2025-10-14 HoneyBee: Data Recipes for Vision-Language Reasoners Ramakanth Pasunuru Team 2510.12225 null  
2025-10-14 Hierarchical Reasoning with Vision-Language Models for Incident Reports from Dashcam Videos Yu Yamaguchi Team 2510.12190 null  
2025-10-14 ImageSentinel: Protecting Visual Datasets from Unauthorized Retrieval-Augmented Image Generation Renjie Wan Team 2510.12119 null  
2025-10-13 Embedding the Teacher: Distilling vLLM Preferences for Scalable Image Retrieval Vyas Raina Team 2510.12014 null  
2025-10-13 Learning Dynamics of VLM Finetuning Keze Wang Team 2510.11978 null  
2025-10-13 Evaluating Open-Source Vision-Language Models for Multimodal Sarcasm Detection Marcos Zampieri Team 2510.11852 link  
2025-10-13 Data or Language Supervision: What Makes CLIP Better than DINO? Serena Yeung-Levy Team 2510.11835 null  
2025-10-13 CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images Xihui Liu Team 2510.11718 null  
2025-10-13 Phys2Real: Fusing VLM Priors with Interactive Online Adaptation for Uncertainty-Aware Sim-to-Real Manipulation Mac Schwager Team 2510.11689 null  
2025-10-13 EvoCAD: Evolutionary CAD Code Generation with Vision Language Models Niki van Stein Team 2510.11631 null  
2025-10-13 mmWalk: Towards Multi-modal Multi-view Walking Assistance Rainer Stiefelhagen Team 2510.11520 link  
2025-10-13 Coupled Degradation Modeling and Fusion: A VLM-Guided Degradation-Coupled Network for Degradation-Aware Infrared and Visible Image Fusion Guangmang Cui Team 2510.11456 null  
2025-10-13 Template-Based Text-to-Image Alignment for Language Accessibility: A Study on Visualizing Text Simplifications Yingqiang Gao Team 2510.11314 null  
2025-10-13 When Does Supervised Training Pay Off? The Hidden Economics of Object Detection in the Era of Vision-Language Models Samer Al-Hamadani Team 2510.11302 null  
2025-10-13 $Δ\mathrm{Energy}$: Optimizing Energy Change During Vision-Language Alignment Improves both OOD Detection and OOD Generalization Nanyang Ye Team 2510.11296 null  
2025-10-13 Human Uncertainty-Aware Data Selection and Automatic Labeling in Visual Question Answering Thomas Seidl Team 2510.11295 null  
2025-10-13 Evaluating Reasoning Faithfulness in Medical Vision-Language Models using Multimodal Perturbations Keno K. Bressem Team 2510.11196 null  
2025-10-13 BLEnD-Vis: Benchmarking Multimodal Cultural Understanding in Vision Language Models Roy Ka-Wei Lee Team 2510.11178 null  
2025-10-13 Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning Zhi Hou Team 2510.11027 null  
2025-10-13 GeoVLMath: Enhancing Geometry Reasoning in Vision-Language Models via Cross-Modal Reward for Auxiliary Line Creation Jing Zhang Team 2510.11020 null  
2025-10-13 COCO-Tree: Compositional Hierarchical Concept Trees for Enhanced Reasoning in Vision Language Models Aidong Zhang Team 2510.11012 null  
2025-10-13 Catch-Only-One: Non-Transferable Examples for Model-Specific Authorization Guangdong Bai Team 2510.10982 null  
2025-10-13 Chart-RVR: Reinforcement Learning with Verifiable Rewards for Explainable Chart Reasoning Aidong Zhang Team 2510.10973 null  
2025-10-13 IUT-Plug: A Plug-in tool for Interleaved Image-Text Generation Jing Tang Team 2510.10969 null  
2025-10-13 MC#: Mixture Compressor for Mixture-of-Experts Large Models Xiaojuan Qi Team 2510.10962 null  
2025-10-13 FG-CLIP 2: A Bilingual Fine-grained Vision-Language Alignment Model Yuhui Yin Team 2510.10921 null  
2025-10-13 Topological Alignment of Shared Vision-Language Embedding Space Jae-Hun Jung Team 2510.10889 null  
2025-10-10 StreamingVLM: Real-Time Understanding for Infinite Video Streams Song Han Team 2510.09608 null  
2025-10-10 VITA-VLA: Efficiently Teaching Vision-Language Models to Act via Action Expert Distillation Caifeng Shan Team 2510.09607 link  
2025-10-10 Vision Language Models: A Survey of 26K Papers Fengming Lin Team 2510.09586 null  
2025-10-10 D-TPT: Dimensional Entropy Maximization for Calibrating Test-Time Prompt Tuning in Vision-Language Models Wonjun Hwang Team 2510.09473 null  
2025-10-10 Boosting Multi-modal Keyphrase Prediction with Dynamic Chain-of-Thought in Vision-Language Models Jiao Ran Team 2510.09358 link  
2025-10-10 Spotlight on Token Perception for Multimodal Reinforcement Learning Yu Cheng Team 2510.09285 link  
2025-10-10 Hallucination Filtering in Radiology Vision-Language Models Using Discrete Semantic Entropy Daniel Truhn Team 2510.09256 link  
2025-10-10 Zero-shot image privacy classification with Vision-Language Models Andrea Cavallaro Team 2510.09253 null  
2025-10-10 Clear Roads, Clear Vision: Advancements in Multi-Weather Restoration for Smart Transportation Subrahmanyam Murala Team 2510.09228 null  
2025-10-10 MCMC: Bridging Rendering, Optimization and Generative AI Wenzel Jakob Team 2510.09078 null  
2025-10-10 On Epistemic Uncertainty of Visual Tokens for Object Hallucinations in Large Vision-Language Models Se Young Chun Team 2510.09008 null  
2025-10-10 Unleashing Perception-Time Scaling to Multimodal Reasoning Models Minghui Qiu Team 2510.08964 null  
2025-10-10 PHyCLIP: $\ell_1$-Product of Hyperbolic Factors Unifies Hierarchy and Compositionality in Vision-Language Representation Learning Takashi Matsubara Team 2510.08919 null  
2025-10-09 CDE: Concept-Driven Exploration for Reinforcement Learning Joseph Campbell Team 2510.08851 null  
2025-10-09 FOLK: Fast Open-Vocabulary 3D Instance Segmentation via Label-guided Knowledge Distillation Zhihua Wei Team 2510.08849 null  
2025-10-09 D-CoDe: Scaling Image-Pretrained VLMs to Video via Dynamic Compression and Question Decomposition Yun Fu Team 2510.08818 null  
2025-10-09 Q-Router: Agentic Video Quality Assessment with Expert Model Routing and Artifact Localization Zhengzhong Tu Team 2510.08789 null  
2025-10-09 MATRIX: Multimodal Agent Tuning for Robust Tool-Use Reasoning Salman Khan Team 2510.08567 null  
2025-10-09 SpatialLadder: Progressive Training for Spatial Reasoning in Vision-Language Models Yueting Zhuang Team 2510.08531 link  
2025-10-09 To Sink or Not to Sink: Visual Information Pathways in Large Vision-Language Models Leonid Sigal Team 2510.08510 link  
2025-10-09 MoA-VR: A Mixture-of-Agents System Towards All-in-One Video Restoration Guangtao Zhai Team 2510.08508 null  
2025-10-09 The Visual Iconicity Challenge: Evaluating Vision-Language Models on Sign Language Form-Meaning Mapping Esam Ghaleb Team 2510.08482 null  
2025-10-09 Looking to Learn: Token-wise Dynamic Gating for Low-Resource Vision-Language Modelling Paula Buttery Team 2510.08470 null  
2025-10-09 VideoVerse: How Far is Your T2V Generator from a World Model? Lei Zhang Team 2510.08398 null  
2025-10-09 Evaluating Small Vision-Language Models on Distance-Dependent Traffic Perception Ciaran Eising Team 2510.08352 null  
2025-10-09 Chain-of-Trigger: An Agentic Backdoor that Paradoxically Enhances Agentic Robustness Hai Zhao Team 2510.08238 null  
2025-10-09 Approximate Domain Unlearning for Vision-Language Models Go Irie Team 2510.08132 null  
2025-10-09 CIR-CoT: Towards Interpretable Composed Image Retrieval via End-to-End Chain-of-Thought Reasoning Rongrong Ji Team 2510.08003 null  
2025-10-09 Executable Analytic Concepts as the Missing Link Between VLM Insight and Precise Manipulation Jianhua Sun Team 2510.07975 null  
2025-10-09 Effective and Stealthy One-Shot Jailbreaks on Deployed Mobile Vision-Language Agents Jun Zhu Team 2510.07809 null  
2025-10-09 GTR-Bench: Evaluating Geo-Temporal Reasoning in Vision-Language Models Long Zeng Team 2510.07791 null  
2025-10-09 IntentionVLA: Generalizable and Efficient Embodied Intention Reasoning for Human-Robot Interaction Liqiang Nie Team 2510.07778 null  
2025-10-09 Multimodal Safety Evaluation in Generative Agent Social Simulations Bernard Ghanem Team 2510.07709 null  
2025-10-09 Test-Time Matching: Unlocking Compositional Reasoning in Multimodal Models Fuzhi Tang Team 2510.07632 null  
2025-10-08 Cross-Modal Attention Guided Unlearning in Vision-Language Models Xintao Wu Team 2510.07567 null  
2025-10-08 Deploying Tiny LVLM Judges for Real-World Evaluation of Chart Models: Lessons Learned and Best Practices Jimmy Huang Team 2510.07545 null  
2025-10-09 TIGeR: Tool-Integrated Geometric Reasoning in Vision-Language Models for Robotics Shanghang Zhang Team 2510.07181 null  
2025-10-08 Few-Shot Adaptation Benchmark for Remote Sensing Vision-Language Models Benoit Macq Team 2510.07135 null  
2025-10-08 TALENT: Table VQA via Augmented Language-Enhanced Natural-text Transcription Haoyu Wang Team 2510.07098 null  
2025-10-08 Vision-Language-Action Models for Robotics: A Review Towards Real-World Applications Yuke Zhu Team 2510.07077 link  
2025-10-08 Unified Molecule Pre-training with Flexible 2D and 3D Modalities: Single and Paired Modality Integration Yuan Fang Team 2510.07035 null  
2025-10-08 Get RICH or Die Scaling: Profitably Trading Inference Compute for Robustness Brian Bartoldson Team 2510.06790 null  
2025-10-08 TTRV: Test-Time Reinforcement Learning for Vision Language Models M. Jehanzeb Mirza Team 2510.06783 null  
2025-10-08 ToolMem: Enhancing Multimodal Agents with Learnable Tool Capability Memory Zora Zhiruo Wang Team 2510.06664 null  
2025-10-08 VUGEN: Visual Understanding priors for GENeration Jakob Verbeek Team 2510.06529 null  
2025-10-07 ChainMPQ: Interleaved Text-Image Reasoning Chains for Mitigating Relation Hallucinations Yujun Cai Team 2510.06292 null  
2025-10-06 Surgeons Are Indian Males and Speech Therapists Are White Females: Auditing Biases in Vision-Language Models for Healthcare Professionals Beenish Moalla Chaudhry Team 2510.06280 null  
2025-10-07 Reasoning under Vision: Understanding Visual-Spatial Cognition in Vision-Language Models for CAPTCHA Junfeng Yang Team 2510.06067 null  
2025-10-07 Medical Vision Language Models as Policies for Robotic Surgery Martin Radfar Team 2510.06064 null  
2025-10-07 Data Factory with Minimal Human Effort Using VLMs Andrew Markham Team 2510.05722 null  
2025-10-07 Activation-Informed Pareto-Guided Low-Rank Compression for Efficient LLM/VLM Zheng Zhang Team 2510.05544 null  
2025-10-06 Guided Query Refinement: Multimodal Hybrid Retrieval with Test-Time Optimization Ariel Gera Team 2510.05038 null  
2025-10-06 Efficient Navigation in Unknown Indoor Environments with Vision-Language Models J. P. How Team 2510.04991 null  
2025-10-06 ViTs: Teaching Machines to See Time Series Anomalies Like Human Experts Dan Pei Team 2510.04710 null  
2025-10-06 Conditional Representation Learning for Customized Tasks Xi Peng Team 2510.04564 null  
2025-10-06 More Than Meets the Eye? Uncovering the Reasoning-Planning Disconnect in Training Vision-Language Driving Models Jun Luo Team 2510.04532 null  
2025-10-06 VaseVQA-3D: Benchmarking 3D VLMs on Ancient Greek Pottery Hao Tang Team 2510.04479 null  
2025-10-06 MedCLM: Learning to Localize and Reason via a CoT-Curriculum in Medical Vision-Language Models Gyeongyeon Hwang Team 2510.04477 null  
2025-10-06 A.I.R.: Enabling Adaptive, Iterative, and Reasoning-based Frame Selection For Video Question Answering Chen Chen Team 2510.04428 null  
2025-10-06 Your Vision-Language Model Can’t Even Count to 20: Exposing the Failures of VLMs in Compositional Counting Jiahao Zhang Team 2510.04401 null  
2025-10-05 AgentTypo: Adaptive Typographic Prompt Injection Attacks against Black-box Multimodal Agents Bin Xiao Team 2510.04257 null  
2025-10-05 ContextVLA: Vision-Language-Action Model with Amortized Multi-Frame Context Jinwoo Shin Team 2510.04246 link  
2025-10-05 Zoom-In to Sort AI-Generated Images Out Jianfu Zhang Team 2510.04225 null  
2025-10-05 Automating construction safety inspections using a multi-modal vision-language RAG framework Daniel Dias-da-Costa Team 2510.04145 null  
2025-10-07 AgriGPT-VL: Agricultural Vision-Language Understanding Suite Shijian Li Team 2510.04002 null  
2025-10-04 No Tokens Wasted: Leveraging Long Context in Biomedical Vision-Language Models Serena Yeung-Levy Team 2510.03978 null  
2025-10-04 Zero-Shot Fine-Grained Image Classification Using Large Vision-Language Models Chris Thomas Team 2510.03903 null  
2025-10-04 Bridge Thinking and Acting: Unleashing Physical Potential of VLM with Generalizable Action Expert Chunhua Shen Team 2510.03896 null  
2025-10-04 Mirage: Unveiling Hidden Artifacts in Synthetic Images with Large Vision-Language Models Durga Toshniwal Team 2510.03840 null  
2025-10-04 Person-Centric Annotations of LAION-400M: Auditing Bias and Its Transfer to Models Zeynep Akata Team 2510.03721 null  
2025-10-04 MonitorVLM: A Vision Language Framework for Safety Violation Detection in Mining Operations Jingliang Duan Team 2510.03666 null  
2025-10-03 Simulation to Rules: A Dual-VLM Framework for Formal Visual Planning Yang Zhang Team 2510.03182 null  
2025-10-03 SpineBench: A Clinically Salient, Level-Aware Benchmark Powered by the SpineMed-450k Corpus Caifeng Shan Team 2510.03160 null  
2025-10-03 Multimodal Carotid Risk Stratification with Large Vision-Language Models: Benchmarking, Fine-Tuning, and Clinical Insights Konstantina Nikita Team 2510.02922 null  
2025-10-03 Zero-Shot Robustness of Vision Language Models Via Confidence-Aware Weighting Mostafa Tavassolipour Team 2510.02913 null  
2025-10-03 Med-K2N: Flexible K-to-N Modality Translation for Medical Image Synthesis Xin Gao Team 2510.02815 null  
2025-10-03 MaskCD: Mitigating LVLM Hallucinations by Image Head Masked Contrastive Decoding Yujiu Yang Team 2510.02790 null  
2025-10-03 OTR: Synthesizing Overlay Text Dataset for Text Removal Kota Yamaguchi Team 2510.02787 link  
2025-10-03 Reasoning Riddles: How Explainability Reveals Cognitive Limits in Vision-Language Models Prahitha Movva Team 2510.02780 null  
2025-10-03 AdaRD-key: Adaptive Relevance-Diversity Keyframe Sampling for Long-form Video understanding Mohammed Bennamoun Team 2510.02778 null  
2025-10-03 Bayesian Test-time Adaptation for Object Recognition and Detection with Vision-language Models Zhen Lei Team 2510.02750 null  
2025-10-03 Team Xiaomi EV-AD VLA: Caption-Guided Retrieval System for Cross-Modal Drone Navigation – Technical Report for IROS 2025 RoboSense Challenge Track 4 Xiaoshuai Hao Team 2510.02728 null  
2025-10-03 ARMs: Adaptive Red-Teaming Agent against Multimodal Models with Plug-and-Play Attacks Bo Li Team 2510.02677 null  
2025-10-02 Exploring OCR-augmented Generation for Bilingual VQA Sunho Park Team 2510.02543 null  
2025-10-02 Multimodal Function Vectors for Spatial Relations Hongjing Lu Team 2510.02528 null  
2025-10-02 From Behavioral Performance to Internal Competence: Interpreting Vision-Language Models with VLM-Lens Freda Shi Team 2510.02292 link  
2025-10-02 microCLIP: Unsupervised CLIP Adaptation via Coarse-Fine Token Fusion for Fine-Grained Image Classification Muhammad Haris Khan Team 2510.02270 null  
2025-10-02 Say One Thing, Do Another? Diagnosing Reasoning-Execution Gaps in VLM-Powered Mobile-Use Agents Zhuosheng Zhang Team 2510.02204 null  
2025-10-02 GeoPurify: A Data-Efficient Geometric Distillation Framework for Open-Vocabulary 3D Segmentation Heng Tao Shen Team 2510.02186 null  
2025-10-02 Unlocking Vision-Language Models for Video Anomaly Detection via Fine-Grained Prompting Jing Zhang Team 2510.02155 null  
2025-10-02 Nav-EE: Navigation-Guided Early Exiting for Efficient Vision-Language Models in Autonomous Driving Chun Jason Xue Team 2510.01795 null  
2025-10-02 Accelerating Attention with Basis Decomposition Jialin Zhao Team 2510.01718 null  
2025-10-02 Contrastive Representation Regularization for Vision-Language-Action Models Jinwoo Shin Team 2510.01711 null  
2025-10-02 VaPR – Vision-language Preference alignment for Reasoning Nanyun Peng Team 2510.01700 null  
2025-10-02 Look Less, Reason More: Rollout-Guided Adaptive Pixel-Space Reasoning Wentao Zhang Team 2510.01681 null  
2025-10-02 Source-Free Cross-Domain Continual Learning Kutluyil Dogancay Team 2510.01649 null  
2025-10-02 FailSafe: Reasoning and Recovery from Failures in Vision-Language-Action Models Bihan Wen Team 2510.01642 link  
2025-10-02 ImageNet-Think-250K: A Large-Scale Synthetic Dataset for Multimodal Reasoning for Vision Language Models Murali Emani Team 2510.01582 null  
2025-10-03 Understanding Adversarial Transfer: Why Representation-Space Attacks Fail Where Data-Space Attacks Succeed Sanmi Koyejo Team 2510.01494 null  
2025-10-01 VL-KnG: Visual Scene Understanding for Navigation Goal Identification using Spatiotemporal Knowledge Graphs Gonzalo Ferrer Team 2510.01483 null  
2025-10-01 Data Selection for Fine-tuning Vision Language Models via Cross Modal Alignment Trajectories Baharan Mirzasoleiman Team 2510.01454 link  
2025-10-01 GeoSURGE: Geo-localization using Semantic Fusion with Hierarchy of Geographic Embeddings Rakesh Kumar Team 2510.01448 null  
2025-10-01 VENTURA: Adapting Image Diffusion Models for Unified Task Conditioned Navigation Amirreza Shaban Team 2510.01388 null  
2025-10-01 Agentic Jigsaw Interaction Learning for Enhancing Visual Perception and Reasoning in Vision-Language Models Feng Zhao Team 2510.01304 null  
2025-10-01 Dirichlet-Prior Shaping: Guiding Expert Specialization in Upcycled MoEs Paul Whatmough Team 2510.01185 null  
2025-09-30 MLA: A Multisensory Language-Action Model for Multimodal Understanding and Forecasting in Robotic Manipulation Shanghang Zhang Team 2509.26642 null  
2025-09-30 Query-Kontext: An Unified Multimodal Model for Image Generation and Editing Jingdong Wang Team 2509.26641 null  
2025-09-30 Clarification as Supervision: Reinforcement Learning for Vision-Language Interfaces Ivan Titov Team 2509.26594 null  
2025-09-30 The Invisible Mentor: Inferring User Actions from Screen Recordings to Recommend Better Workflows Emerson Murphy-Hill Team 2509.26557 null  
2025-09-30 Stable Cinemetrics: Structured Taxonomy and Evaluation for Professional Video Generation Varun Jampani Team 2509.26555 link  
2025-09-30 Zero-Shot Decentralized Federated Learning Giovanni Bellitto Team 2509.26462 link  
2025-09-30 SQUARE: Semantic Query-Augmented Fusion and Efficient Batch Reranking for Training-free Zero-Shot Composed Image Retrieval Huei-Fang Yang Team 2509.26330 null  
2025-09-30 ProfVLM: A Lightweight Video-Language Model for Multi-View Proficiency Estimation Antonio Liotta Team 2509.26278 null  
2025-09-30 Interpret, prune and distill Donut: towards lightweight VLMs for VQA on document David Naccache Team 2509.26235 null  
2025-09-30 TSalV360: A Method and Dataset for Text-driven Saliency Detection in 360-Degrees Videos Vasileios Mezaris Team 2509.26208 link  
2025-09-30 SGS: Segmentation-Guided Scoring for Global Scene Inconsistencies Xue Li Team 2509.26039 null  
2025-10-01 AgenticIQA: An Agentic Framework for Adaptive and Interpretable Image Quality Assessment Weisi Lin Team 2509.26006 null  
2025-09-30 Learning Egocentric In-Hand Object Segmentation through Weak Supervision from Human Narrations Antonino Furnari Team 2509.26004 null  
2025-09-30 Towards Unified Multimodal Misinformation Detection in Social Media: A Benchmark Dataset and Baseline Zhun Zhong Team 2509.25991 null  
2025-09-30 NuRisk: A Visual Question Answering Dataset for Agent-Level Risk Assessment in Autonomous Driving Johannes Betz Team 2509.25944 null  
2025-09-30 VLM-FO1: Bridging the Gap Between High-Level Reasoning and Fine-Grained Perception in VLMs Tiancheng Zhao Team 2509.25916 null  
2025-10-01 LLaVAShield: Safeguarding Multimodal Multi-Turn Dialogues in Vision-Language Models Yongjun Shen Team 2509.25896 null  
2025-09-30 DeepSketcher: Internalizing Visual Manipulation for Multimodal Reasoning Jing Zhang Team 2509.25866 null  
2025-09-30 MAPLE: Multi-scale Attribute-enhanced Prompt Learning for Few-shot Whole Slide Image Classification Daoqiang Zhang Team 2509.25863 null  
2025-09-30 Reinforced Embodied Planning with Verifiable Reward for Real-World Robotic Manipulation Hao Chen Team 2509.25852 null  
2025-09-29 TemMed-Bench: Evaluating Temporal Medical Image Reasoning in Vision-Language Models Nanyun Peng Team 2509.25143 null  
2025-09-29 Visual serial processing deficits explain divergences in human and VLM reasoning Thomas L. Griffiths Team 2509.25142 null  
2025-09-29 GeoVLM-R1: Reinforcement Fine-Tuning for Improved Remote Sensing Reasoning Salman Khan Team 2509.25026 link  
2025-09-29 World-Env: Leveraging World Model as a Virtual Environment for VLA Post-Training Qing Zhang Team 2509.24948 null  
2025-09-29 From Code to Action: Hierarchical Learning of Diffusion-VLM Policies Daniel Dijkman Team 2509.24917 null  
2025-09-29 Training-Free Token Pruning via Zeroth-Order Gradient Estimation in Vision-Language Models Sungeun Hong Team 2509.24837 null  
2025-09-29 IA-VLA: Input Augmentation for Vision-Language-Action models in settings with semantically complex tasks Ville Kyrki Team 2509.24768 null  
2025-09-29 IWR-Bench: Can LVLMs reconstruct interactive webpage from a user interaction video? Botian Shi Team 2509.24709 null  
2025-09-29 Can you SPLICE it together? A Human Curated Benchmark for Probing Visual Reasoning in VLMs Elia Bruni Team 2509.24640 null  
2025-09-30 Inducing Dyslexia in Vision Language Models Martin Schrimpf Team 2509.24597 null  
2025-09-29 TokenSwap: Backdoor Attack on the Compositional Understanding of Large Vision-Language Models Joey Tianyi Zhou Team 2509.24566 null  
2025-09-29 CORE-3D: Context-aware Open-vocabulary Retrieval by Embeddings in 3D Matin Mirzababaei Team 2509.24528 null  
2025-09-29 PhysiAgent: An Embodied Agent Framework in Physical World Xianyuan Zhan Team 2509.24524 null  
2025-09-29 GRPO-MA: Multi-Answer Generation in GRPO for Stable and Efficient Chain-of-Thought Training Hao Dong Team 2509.24494 null  
2025-09-29 Euclid’s Gift: Enhancing Spatial Perception and Reasoning in Vision-Language Models via Geometric Surrogate Tasks Kai Chen Team 2509.24473 null  
2025-09-29 AXIS: Explainable Time Series Anomaly Detection with Large Language Models Chen Zhang Team 2509.24378 null  
2025-09-29 SONAR: Semantic-Object Navigation with Aggregated Reasoning through a Cross-Modal Inference Paradigm Jiankun Wang Team 2509.24321 null  
2025-09-30 FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting Yu Cheng Team 2509.24304 null  
2025-09-29 ViReSkill: Vision-Grounded Replanning with Skill Memory for LLM-Based Planning in Lifelong Robot Learning Yang You Team 2509.24219 null  
2025-09-29 Talk in Pieces, See in Whole: Disentangling and Hierarchical Aggregating Representations for Language-based Object Detection Donghyun Kim Team 2509.24192 null  
2025-09-26 See, Point, Fly: A Learning-Free VLM Framework for Universal Unmanned Aerial Navigation Yu-Lun Liu Team 2509.22653 link  
2025-09-26 CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning Dahua Lin Team 2509.22647 link  
2025-09-26 Hierarchical Representation Matching for CLIP-based Class-Incremental Learning Da-Wei Zhou Team 2509.22645 null  
2025-09-26 WoW: Towards a World omniscient World model Through Embodied Interaction Jian Tang Team 2509.22642 null  
2025-09-26 SPARK: Synergistic Policy And Reward Co-Evolving Framework Jiaqi Wang Team 2509.22624 link  
2025-09-26 Color Names in Vision-Language Models Javier Vazquez-Corral Team 2509.22524 null  
2025-09-26 Guiding Evolution of Artificial Life Using Vision-Language Models Frederico Wieser Team 2509.22447 null  
2025-09-26 Chimera: Diagnosing Shortcut Learning in Visual-Language Understanding Mrinmaya Sachan Team 2509.22437 link  
2025-09-26 RAU: Reference-based Anatomical Understanding with Vision Language Models Shanhui Sun Team 2509.22404 null  
2025-09-26 Zero-Effort Image-to-Music Generation: An Interpretable RAG-based VLM Approach Zijing Zhou Team 2509.22378 null  
2025-09-26 Rule-Based Reinforcement Learning for Document Image Classification with Vision Language Models Andreas Fischer Team 2509.22283 link  
2025-09-26 Beyond Classification Accuracy: Neural-MedBench and the Need for Deeper Reasoning Benchmarks Shangyang Li Team 2509.22258 null  
2025-09-26 A Tale of Two Experts: Cooperative Learning for Source-Free Unsupervised Domain Adaptation Cheng Deng Team 2509.22229 null  
2025-09-26 Polysemous Language Gaussian Splatting via Matching-based Mask Lifting Ge Li Team 2509.22225 null  
2025-09-26 Towards Faithful Reasoning in Remote Sensing: A Perceptually-Grounded GeoSpatial Chain-of-Thought for Vision-Language Models Bo Yang Team 2509.22221 null  
2025-09-26 Actions as Language: Fine-Tuning VLMs into VLAs Without Catastrophic Forgetting Anirudha Majumdar Team 2509.22195 null  
2025-09-26 MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing Conghui He Team 2509.22186 link  
2025-09-26 Multilingual Vision-Language Models, A Survey Jindřich Libovický Team 2509.22123 null  
2025-09-26 Lightweight Structured Multimodal Reasoning for Clinical Scene Understanding in Robotics Stefan K. Ehrlich Team 2509.22014 null  
2025-09-26 CoFFT: Chain of Foresight-Focus Thought for Visual Language Models Mike Zheng Shou Team 2509.22010 null  
2025-09-25 Nova: Real-Time Agentic Vision-Language Model Serving with Adaptive Cross-Stage Parallelization Guihai Chen Team 2509.21301 null  
2025-09-25 DisCoCLIP: A Distributional Compositional Tensor Network Encoder for Vision-Language Understanding Mehrnoosh Sadrzadeh Team 2509.21287 null  
2025-09-25 Un-Doubling Diffusion: LLM-guided Disambiguation of Homonym Duplication Alexander Nagaev Team 2509.21262 null  
2025-09-25 Hallucination as an Upper Bound: A New Perspective on Text-to-Image Evaluation Mohammad Hossein Rohban Team 2509.21257 null  
2025-09-25 Learning to Look: Cognitive Attention Alignment with Vision-Language Models Nidhi Rastogi Team 2509.21247 null  
2025-09-25 TABLET: A Large-Scale Dataset for Robust Visual Table Understanding Mirella Lapata Team 2509.21205 null  
2025-09-25 Human-like Navigation in a World Built for Humans Shenlong Wang Team 2509.21189 link  
2025-09-25 Can Less Precise Be More Reliable? A Systematic Evaluation of Quantization’s Impact on CLIP Beyond Accuracy Chokri Mraidha Team 2509.21173 null  
2025-09-25 Teaching RL Agents to Act Better: VLM as Action Advisor for Online Reinforcement Learning Mingyu Hu Team 2509.21126 null  
2025-09-25 Cross-Modal Instructions for Robot Motion Generation Weiming Zhi Team 2509.21107 null  
2025-09-25 Mammo-CLIP Dissect: A Framework for Analysing Mammography Concepts in Vision-Language Models Robert Jenssen Team 2509.21102 null  
2025-09-25 SoM-1K: A Thousand-Problem Benchmark Dataset for Strength of Materials Lu Cheng Team 2509.21079 null  
2025-09-25 Unlocking Financial Insights: An advanced Multimodal Summarization with Multimodal Output Framework for Financial Advisory Videos Alka Maurya Team 2509.20961 null  
2025-09-25 Decoding the Surgical Scene: A Scoping Review of Scene Graphs in Surgery M. Ali Nasseri Team 2509.20941 null  
2025-09-25 MTRDrive: Memory-Tool Synergistic Reasoning for Robust Autonomous Driving in Corner Cases Diange Yang Team 2509.20843 null  
2025-09-25 DAC-LoRA: Dynamic Adversarial Curriculum for Efficient and Robust Few-Shot Adaptation Ved Umrajkar Team 2509.20792 null  
2025-09-25 Provenance Analysis of Archaeological Artifacts via Multimodal RAG Systems Ruiliang Liu Team 2509.20769 null  
2025-09-25 Seeing Through Words, Speaking Through Pixels: Deep Representational Alignment Between Vision and Language Models Meenakshi Khosla Team 2509.20751 null  
2025-09-25 Recov-Vision: Linking Street View Imagery and Vision-Language Models for Post-Disaster Recovery Ali Mostafavi Team 2509.20628 null  
2025-09-24 InstructVTON: Optimal Auto-Masking and Natural-Language-Guided Interactive Style Control for Inpainting-Based Virtual Try-On Karim Bouyarmane Team 2509.20524 null  
2025-09-24 A co-evolving agentic AI system for medical imaging analysis Zhi Huang Team 2509.20279 null  
2025-09-24 Universal Camouflage Attack on Vision-Language Models for Autonomous Driving Wenqi Ren Team 2509.20196 null  
2025-09-24 EchoBench: Benchmarking Sycophancy in Medical Large Vision-Language Models Dacheng Tao Team 2509.20146 null  
2025-09-24 A Simple Data Augmentation Strategy for Text-in-Image Scientific VQA Yova Kementchedjhieva Team 2509.20119 null  
2025-09-24 Discrete Diffusion for Reflective Vision-Language-Action Models in Autonomous Driving Xianpeng Lang Team 2509.20109 null  
2025-09-24 Queryable 3D Scene Representation: A Multi-Modal Framework for Semantic Reasoning and Robotic Task Planning Jiajun Liu Team 2509.20077 null  
2025-09-25 OmniScene: Attention-Augmented Multimodal 4D Scene Understanding for Autonomous Driving Jun Ma Team 2509.19973 null  
2025-09-24 Generalist Robot Manipulation beyond Action Labeled Data Danda Pani Paudel Team 2509.19958 null  
2025-09-24 Benchmarking Gaslighting Attacks Against Speech Large Language Models Pan Zhou Team 2509.19858 null  
2025-09-24 CHURRO: Making History Readable with an Open-Weight Large Vision-Language Model for High-Accuracy, Low-Cost Historical Text Recognition Monica S. Lam Team 2509.19768 null  
2025-09-24 Logics-Parsing Technical Report Minggang Wu Team 2509.19760 null  
2025-09-24 Formal Safety Verification and Refinement for Generative Motion Planners via Certified Local Stabilization Glen Chou Team 2509.19688 null  
2025-09-24 Bias in the Picture: Benchmarking VLMs with Social-Cue News Images and LLM-as-Judge Assessment Shaina Raza Team 2509.19659 null  
2025-09-23 Anatomy of a Feeling: Narrating Embodied Emotions via Large Vision-Language Models Tianyu Jiang Team 2509.19595 null  
2025-09-23 iFinder: Structured Zero-Shot Vision-Based LLM Grounding for Dash-Cam Video Reasoning Abhishek Aich Team 2509.19552 null  
2025-09-23 Score the Steps, Not Just the Goal: VLM-Based Subgoal Evaluation for Robotic Manipulation Chi-Guhn Lee Team 2509.19524 null  
2025-09-23 DRISHTIKON: A Multimodal Multilingual Benchmark for Testing Language Models’ Understanding on Indian Culture Sriparna Saha Team 2509.19274 null  
2025-09-23 Long Story Short: Disentangling Compositionality and Long-Caption Understanding in VLMs Yova Kementchedjhieva Team 2509.19207 null  
2025-09-23 Vision-Free Retrieval: Rethinking Multimodal Search with Textual Scene Descriptions Georgios Tzimiropoulos Team 2509.19203 null  
2025-09-23 Reading Images Like Texts: Sequential Image Understanding in Vision-Language Models Xiaojie Wang Team 2509.19191 null  
2025-09-23 FUNCanon: Learning Pose-Aware Action Primitives via Functional Object Canonicalization for Generalizable Robotic Manipulation Jianwei Zhang Team 2509.19102 link  
2025-09-23 ColorBlindnessEval: Can Vision-Language Models Pass Color Blindness Tests? Jiahao Cui Team 2509.19070 null  
2025-09-23 Pure Vision Language Action (VLA) Models: A Comprehensive Survey Qingguo Zhou Team 2509.19012 null  
2025-09-23 Unveiling Chain of Step Reasoning for Vision-Language Models with Fine-grained Rewards Xinlong Wang Team 2509.19003 null  
2025-09-23 No Labels Needed: Zero-Shot Image Classification with Collaborative Self-Learning Joel Luís Carbonera Team 2509.18938 null  
2025-09-23 How Far are VLMs from Visual Spatial Intelligence? A Benchmark-Driven Perspective Huchuan Lu Team 2509.18905 null  
2025-09-23 Benchmarking Vision-Language and Multimodal Large Language Models in Zero-shot and Few-shot Scenarios: A study on Christian Iconography Giovanni Colavizza Team 2509.18839 null  
2025-09-23 Bi-VLM: Pushing Ultra-Low Precision Post-Training Quantization Boundaries in Vision-Language Models Dinesh Manocha Team 2509.18763 null  
2025-09-23 Knowledge Transfer from Interaction Learning Shugong Xu Team 2509.18733 null  
2025-09-23 What Makes You Unique? Attribute Prompt Composition for Object Re-Identification Huchuan Lu Team 2509.18715 null  
2025-09-23 RSVG-ZeroOV: Exploring a Training-Free Framework for Zero-Shot Open-Vocabulary Visual Grounding in Remote Sensing Images Quan Wang Team 2509.18711 null  
2025-09-23 NaviSense: A Multimodal Assistive Mobile application for Object Retrieval by Persons with Visual Impairment Vijaykrishnan Narayanan Team 2509.18672 null  
2025-09-23 Learning neuroimaging models from health system-scale data Todd Hollon Team 2509.18638 null  
2025-09-23 SINGER: An Onboard Generalist Vision-Language Navigation Policy for Drones Mac Schwager Team 2509.18610 null  
2025-09-23 VLN-Zero: Rapid Exploration and Cache-Enabled Neurosymbolic Vision-Language Planning for Zero-Shot Transfer in Robot Navigation Ufuk Topcu Team 2509.18592 link  
2025-09-22 Losing the Plot: How VLM responses degrade on imperfect charts Mahantesh Halappanavar Team 2509.18425 null  
2025-09-22 NeuS-QA: Grounding Long-Form Video Understanding in Temporal Logic and Neuro-Symbolic Reasoning Sandeep Chinchali Team 2509.18041 null  
2025-09-22 Robust and Resilient Soft Robotic Object Insertion with Compliance-Enabled Contact Formation and Failure Recovery Yoshitaka Ushiku Team 2509.17666 null  
2025-09-22 SD-VLM: Spatial Measuring and Understanding with Depth-Encoded Vision-Language Models Jieping Ye Team 2509.17664 null  
2025-09-22 From Benchmarks to Reality: Advancing Visual Anomaly Detection by the VAND 3.0 Challenge Paula Ramos Team 2509.17615 null  
2025-09-22 COLA: Context-aware Language-driven Test-time Adaptation Zhihe Lu Team 2509.17598 null  
2025-09-22 Interpreting Attention Heads for Image-to-Text Information Flow in Large Vision-Language Models Seong Jae Hwang Team 2509.17588 null  
2025-09-23 Visual Instruction Pretraining for Domain-Specific Foundation Models Jian Yang Team 2509.17562 null  
2025-09-22 ChartHal: A Fine-grained Framework Evaluating Hallucination of Large Vision Language Models in Chart Understanding Xiaoyu Qin Team 2509.17481 null  
2025-09-22 Training-Free Label Space Alignment for Universal Domain Adaptation Donghyun Kim Team 2509.17452 null  
2025-09-23 Multi-scale Temporal Prediction via Incremental Generation and Multi-agent Collaboration Yueming Jin Team 2509.17429 null  
2025-09-22 Vision Language Models Are Not (Yet) Spelling Correctors Bojun Zhang Team 2509.17418 null  
2025-09-22 Mano Report Shuo Wang Team 2509.17336 null  
2025-09-22 UIPro: Unleashing Superior Interaction Capability For GUI Agents Zhaoxiang Zhang Team 2509.17328 null  
2025-09-22 OpenGVL - Benchmarking Visual Temporal Progress for Data Curation Krzysztof Walas Team 2509.17321 null  
2025-09-21 FlagEval Findings Report: A Preliminary Evaluation of Large Reasoning Models on Automatically Verifiable Textual and Visual Questions Zhongyuan Wang Team 2509.17177 null  
2025-09-21 MoCLIP-Lite: Efficient Video Recognition by Fusing CLIP with Motion Vectors Soumyabrata Dev Team 2509.17084 null  
2025-09-21 CardiacCLIP: Video-based CLIP Adaptation for LVEF Prediction in a Few-shot Manner Xiaomeng Li Team 2509.17065 null  
2025-09-21 AgriDoctor: A Multimodal Intelligent Assistant for Agriculture Liang Wang Team 2509.17044 null  
2025-09-21 Orchestrate, Generate, Reflect: A VLM-Based Multi-Agent Collaboration Framework for Automated Driving Policy Learning Jun Ma Team 2509.17042 null  
2025-09-21 When Color-Space Decoupling Meets Diffusion for Adverse-Weather Image Restoration Jun Li Team 2509.17024 null  
2025-09-19 Robust Vision-Language Models via Tensor Decomposition: A Defense Against Adversarial Attacks Evangelos E. Papalexakis Team 2509.16163 null  
2025-09-19 Randomized Smoothing Meets Vision-Language Models Chih-Hong Cheng Team 2509.16088 null  
2025-09-19 I-FailSense: Towards General Robotic Failure Detection with Vision-Language Models Mohamed Chetouani Team 2509.16072 null  
2025-09-19 Compose by Focus: Scene Graph-based Atomic Skills Heng Yang Team 2509.16053 null  
2025-09-19 CIDER: A Causal Cure for Brand-Obsessed Text-to-Image Models Wushao Wen Team 2509.15803 null  
2025-09-19 Vision-Language Models as Differentiable Semantic and Spatial Rewards for Text-to-3D Generation He Sun Team 2509.15772 null  
2025-09-19 GUI-ReWalk: Massive Data Generation for GUI Agent via Stochastic Exploration and Intent-Aware Reasoning Zhaojian Li Team 2509.15738 null  
2025-09-19 Training-Free Pyramid Token Pruning for Efficient Large Vision-Language Models via Region, Token, and Instruction-Guided Importance Xiangyang Xue Team 2509.15704 null  
2025-09-19 ORIC: Benchmarking Object Recognition in Incongruous Context for Large Vision-Language Models Hao Su Team 2509.15695 null  
2025-09-19 SightSound-R1: Cross-Modal Reasoning Distillation from Vision to Audio Language Models Nima Mesgarani Team 2509.15661 null  
2025-09-19 PRIMT: Preference-based Reinforcement Learning with Multimodal Feedback and Trajectory Synthesis from Foundation Models Byung-Cheol Min Team 2509.15607 null  
2025-09-18 SmolRGPT: Efficient Spatial Reasoning for Warehouse Environments with 600M Parameters Andy Couturier Team 2509.15490 null  
2025-09-18 Comparing Computational Pathology Foundation Models using Representational Similarity Analysis William Lotter Team 2509.15482 null  
2025-09-18 ORCA: Agentic Reasoning For Hallucination and Adversarial Robustness in Vision-Language Models Nathaniel D. Bastian Team 2509.15435 null  
2025-09-18 SERVAL: Surprisingly Effective Zero-Shot Visual Document Retrieval Powered by Large Vision and Language Models Andrew Yates Team 2509.15432 null  
2025-09-18 CoDoL: Conditional Domain Prompt Learning for Out-of-Distribution Generalization Xin Lin Team 2509.15330 null  
2025-09-18 Calibration-Aware Prompt Learning for Medical Vision-Language Models Muhammad Haris Khan Team 2509.15226 null  
2025-09-19 ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data Wenhai Wang Team 2509.15221 null  
2025-09-18 What’s the Best Way to Retrieve Slides? A Comparative Study of Multimodal, Caption-Based, and Hybrid Retrieval Techniques Grigorios Tsoumakas Team 2509.15211 null  
2025-09-18 MedFact-R1: Towards Factual Medical Reasoning via Pseudo-Label Augmentation Guodong Ding Team 2509.15154 null  
2025-09-18 Forecasting and Visualizing Air Quality from Sky Images with Vision-Language Models Yanqing Zhang Team 2509.15076 null  
2025-09-18 QuizRank: Picking Images by Quizzing VLMs Eytan Adar Team 2509.15059 null  
2025-09-18 PRISM: Product Retrieval In Shopping Carts using Hybrid Matching Jiajing Chen Team 2509.14985 null  
2025-09-18 EchoVLM: Dynamic Mixture-of-Experts Vision-Language Model for Universal Ultrasound Intelligence Qinghua Huang Team 2509.14977 null  
2025-09-19 Affordance-Based Disambiguation of Surgical Instructions for Collaborative Robot-Assisted Surgery Yasuhisa Hasegawa Team 2509.14967 null  
2025-09-18 MARIC: Multi-Agent Reasoning for Image Classification Seunghyun Lee Team 2509.14860 null  
2025-09-18 V-SEAM: Visual Semantic Editing and Attention Modulating for Causal Interpretability of Vision-Language Models Ming Jiang Team 2509.14837 null  
2025-09-18 Frame Sampling Strategies Matter: A Benchmark for small vision language models Mounim A. El Yacoubi Team 2509.14769 null  
2025-09-18 Do Vision-Language Models See Urban Scenes as People Do? An Urban Perception Benchmark Rashid Mushkani Team 2509.14574 null  
2025-09-18 VisMoDAl: Visual Analytics for Evaluating and Improving Corruption Robustness of Vision-Language Models Yuxin Ma Team 2509.14571 null  
2025-09-17 CRAFT: Coaching Reinforcement Learning Autonomously using Foundation Models for Multi-Robot Coordination Tasks Negar Mehr Team 2509.14380 null  
2025-09-17 Cinéaste: A Fine-grained Contextual Movie Question Answering Benchmark Vishal M. Patel Team 2509.14227 null  
2025-09-19 TGPO: Tree-Guided Preference Optimization for Robust Web Agent Reinforcement Learning Guitao Cao Team 2509.14172 null  
2025-09-17 VSE-MOT: Multi-Object Tracking in Low-Quality Video Scenes Guided by Visual Semantic Enhancement Fei Richard Yu Team 2509.14060 null  
2025-09-17 Can Current AI Models Count What We Mean, Not What They See? A Benchmark and Systematic Evaluation Minh Hoai Team 2509.13939 null  
2025-09-17 Towards Rationale-Answer Alignment of LVLMs via Self-Rationale Calibration Xiaoqiang Li Team 2509.13919 null  
2025-09-17 EDITS: Enhancing Dataset Distillation with Implicit Textual Semantics Jielei Wang Team 2509.13858 null  
2025-09-17 Diving into Mitigating Hallucinations from a Vision Perspective for Large Vision-Language Models Longwen Gao Team 2509.13836 null  
2025-09-17 Iterative Prompt Refinement for Safer Text-to-Image Generation Byung-Jun Lee Team 2509.13760 null  
2025-09-17 Reinforcement Learning for Robotic Insertion of Flexible Cables in Industrial Settings Changjoo Nam Team 2509.13731 null  
2025-09-17 DREAM: Domain-aware Reasoning for Efficient Autonomous Underwater Monitoring Xiaomin Lin Team 2509.13666 null  
2025-09-16 Intelligent Healthcare Imaging Platform: An VLM-Based Framework for Automated Medical Image Analysis and Clinical Report Generation Samer Al-Hamadani Team 2509.13590 null  
2025-09-16 Using Visual Language Models to Control Bionic Hands: Assessment of Object Perception and Grasp Inference Cedomir Stefanovic Team 2509.13572 null  
2025-09-16 EdiVal-Agent: An Object-Centric Framework for Automated, Scalable, Fine-Grained Evaluation of Multi-Turn Editing Mingyuan Zhou Team 2509.13399 null  
2025-09-16 3D Aware Region Prompted Vision Language Model Sifei Liu Team 2509.13317 link  
2025-09-16 Image Realness Assessment and Localization with Multimodal Features Somdyuti Paul Team 2509.13289 null  
2025-09-16 ChartGaze: Enhancing Chart Understanding in LVLMs with Eye-Tracking Guided Attention Refinement Giuseppe Carenini Team 2509.13282 null  
2025-09-16 RadGame: An AI-Powered Platform for Radiology Education Pranav Rajpurkar Team 2509.13270 null  
2025-09-16 HERO: Rethinking Visual Token Early Dropping in High-Resolution Large Vision-Language Models Xiangyang Xue Team 2509.13067 null  
2025-09-16 Perception Before Reasoning: Two-Stage Reinforcement Learning for Visual Reasoning in Vision-Language Models Jingdong Wang Team 2509.13031 null  
2025-09-16 Cross-Layer Vision Smoothing: Enhancing Visual Understanding via Sustained Focus on Key Objects in Large Vision-Language Models Chong Feng Team 2509.12897 null  
2025-09-16 Benchmarking and Improving LVLMs on Event Extraction from Multimedia Documents Haiyang Zhang Team 2509.12876 null  
2025-09-16 Defense-to-Attack: Bypassing Weak Defenses Enables Stronger Jailbreaks in Vision-Language Models Xingjun Ma Team 2509.12724 null  
2025-09-16 AsyMoE: Leveraging Modal Asymmetry for Enhanced Expert Specialization in Large Vision-Language Models Jin Huang Team 2509.12715 null  
2025-09-15 Evaluating Robustness of Vision-Language Models Under Noisy Conditions Alireza Team 2509.12492 null  
2025-09-15 An integrated process for design and control of lunar robotics using AI and simulation Martin Servin Team 2509.12367 null  
2025-09-15 Open-ended Hierarchical Streaming Video Understanding with Vision Language Models Seon Joo Kim Team 2509.12145 null  
2025-09-15 Look Again, Think Slowly: Enhancing Visual Reflection in Vision-Language Models Jiajun Zhang Team 2509.12132 null  
2025-09-16 Embodied Navigation Foundation Model He Wang Team 2509.12129 link  
2025-09-15 Lost in Embeddings: Information Loss in Vision-Language Models Anders Søgaard Team 2509.11986 null  
2025-09-15 Spec-LLaVA: Accelerating Vision-Language Models with Dynamic Tree-Based Speculative Decoding Yijun Chen Team 2509.11961 null  
2025-09-15 Bridging Vision Language Models and Symbolic Grounding for Video Question Answering Daisy Zhe Wang Team 2509.11862 null  
2025-09-15 Synthetic Captions for Open-Vocabulary Zero-Shot Segmentation Michael Louis Iuzzolino Team 2509.11840 null  
2025-09-15 SpecVLM: Fast Speculative Decoding in Vision-Language Models Emad Barsoum Team 2509.11815 null  
2025-09-15 Igniting VLMs toward the Embodied Space Zach Xu Team 2509.11766 null  
2025-09-15 EMeRALDS: Electronic Medical Record Driven Automated Lung Nodule Detection and Classification in Thoracic CT Images Syed Muhammad Anwar Team 2509.11714 null  
2025-09-15 How Auxiliary Reasoning Unleashes GUI Grounding in VLMs Manni Duan Team 2509.11548 null  
2025-09-15 LVLMs are Bad at Overhearing Human Referential Communication Susan E. Brennan Team 2509.11514 null  
2025-09-14 CEMTM: Contextual Embedding-based Multimodal Topic Modeling Giuseppe Carenini Team 2509.11465 null  
2025-09-14 Enhancing Generalization in Vision-Language-Action Models by Preserving Pretrained Representations Xuanlin Li Team 2509.11417 link  
2025-09-14 ActivePose: Active 6D Object Pose Estimation and Tracking for Robotic Manipulation Yizhao Wang Team 2509.11364 null  
2025-09-14 Mitigating Hallucinations in Large Vision-Language Models by Self-Injecting Hallucinations Weiming Hu Team 2509.11287 null  
2025-09-14 The System Description of CPS Team for Track on Driving with Language of CVPR 2024 Autonomous Grand Challenge Dehui Du Team 2509.11071 null  
2025-09-14 ViScratch: Using Large Language Models and Gameplay Videos for Automated Feedback in Scratch Jialu Zhang Team 2509.11065 null  
2025-09-13 Language-based Color ISP Tuning Jiro Takatori Team 2509.10765 null  
2025-09-12 TASC: Task-Aware Shared Control for Teleoperated Manipulation Renaud Detry Team 2509.10416 null  
2025-09-12 Towards Understanding Visual Grounding in Visual Language Models Eda B. Özyiğit Team 2509.10345 null  
2025-09-12 Detecting Text Manipulation in Images using Vision Language Models Sébastien Marcel Team 2509.10278 link  
2025-09-12 MagicMirror: A Large-Scale Dataset and Benchmark for Fine-Grained Artifacts Assessment in Text-to-Image Generation Xiaoming Wei Team 2509.10260 null  
2025-09-12 Towards Reliable and Interpretable Document Question Answering via VLMs Simone Marinai Team 2509.10129 null  
2025-09-12 VARCO-VISION-2.0 Technical Report Youngjune Kim Team 2509.10105 null  
2025-09-12 Multimodal Mathematical Reasoning Embedded in Aerial Vehicle Imagery: Benchmarking, Analysis, and Exploration Wayne Zhang Team 2509.10059 null  
2025-09-12 LaV-CoT: Language-Aware Visual CoT with Multi-Aspect Reward Optimization for Real-World Multilingual VQA Jianshu Li Team 2509.10026 null  
2025-09-11 How well can LLMs provide planning feedback in grounded environments? Victor Zhong Team 2509.09790 null  
2025-09-11 FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark Hongsheng Li Team 2509.09680 link  
2025-09-11 Compositional Concept Generalization with Variational Quantum Circuits Mehrnoosh Sadrzadeh Team 2509.09541 null  
2025-09-11 Decoupling Clinical and Class-Agnostic Features for Reliable Few-Shot Adaptation under Shift Dwarikanath Mahapatra Team 2509.09397 null  
2025-09-11 VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model Donglin Wang Team 2509.09372 null  
2025-09-11 Curriculum-Based Multi-Tier Semantic Exploration via Deep Reinforcement Learning Abderrezzak Debilou Team 2509.09356 null  
2025-09-11 Image Recognition with Vision and Language Embeddings of VLMs Jiri Matas Team 2509.09311 null  
2025-09-11 Visual Programmability: A Guide for Code-as-Thought in Chart Understanding Ya Zhang Team 2509.09286 null  
2025-09-11 Towards Better Dental AI: A Multimodal Benchmark and Instruction Dataset for Panoramic X-ray Analysis Kuo Feng Hung Team 2509.09254 null  
2025-09-11 Bridging the Gap Between Ideal and Real-world Evaluation: Benchmarking AI-Generated Image Detection in Challenging Scenarios Yao Zhu Team 2509.09172 null  
2025-09-11 Zero-shot Hierarchical Plant Segmentation via Foundation Segmentation Models and Text-to-image Attention Fumio Okura Team 2509.09116 null  
2025-09-10 COCO-Urdu: A Large-Scale Urdu Image-Caption Dataset with Multimodal Quality Estimation Umair Hassan Team 2509.09014 link  
2025-09-10 Can Vision-Language Models Solve Visual Math Equations? Mrinmaya Sachan Team 2509.09013 null  
2025-09-10 Generalized User-Oriented Image Semantic Coding Empowered by Large Vision-Language Model Vincent W. S. Wong Team 2509.08913 null  
2025-09-10 Recurrence Meets Transformers for Universal Multimodal Retrieval Rita Cucchiara Team 2509.08897 null  
2025-09-10 RewardDance: Reward Scaling in Visual Generation Weilin Huang Team 2509.08826 null  
2025-09-10 RoboChemist: Long-Horizon and Safety-Compliant Robotic Chemical Experimentation Hao Zhao Team 2509.08820 link  
2025-09-10 SocialNav-SUB: Benchmarking VLMs for Scene Understanding in Social Robot Navigation Peter Stone Team 2509.08757 link  
2025-09-10 TCPO: Thought-Centric Preference Optimization for Effective Embodied Decision-making Xiu Li Team 2509.08500 null  
2025-09-10 A Structured Review of Underwater Object Detection Challenges and Solutions: From Traditional to Large Vision Language Models Zhou Ni Team 2509.08490 null  
2025-09-11 Adapting Vision-Language Models for Neutrino Event Classification in High-Energy Physics Pierre Baldi Team 2509.08461 null  
2025-09-10 Retrieval-Augmented VLMs for Multimodal Melanoma Diagnosis Charmgil Hong Team 2509.08338 null  
2025-09-10 Interpretable Physics Reasoning and Performance Taxonomy in Vision-Language Models Monali Deshmukh Team 2509.08270 null  
2025-09-10 Examining Vision Language Models through Multi-dimensional Experiments with Vision and Text Features Donald E. Brown Team 2509.08266 null  
2025-09-10 Vector embedding of multi-modal texts: a tool for discovery? Sachith Withana Team 2509.08216 null  
2025-09-09 Privacy Preserving Semantic Communications Using Vision Language Models: A Segmentation and Generation Approach Qianqian Zhang Team 2509.08142 null  
2025-09-09 Visual-TableQA: Open-Domain Benchmark for Reasoning over Table Images Marc Haraoui Team 2509.07966 null  
2025-09-09 Data-Efficient Fine-Tuning of Vision-Language Models for Diagnosis of Alzheimer’s Disease Xiaochen Yang Team 2509.07613 null  
2025-09-09 Fine-Tuning Vision-Language Models for Visual Navigation Assistance Xi Wang Team 2509.07488 null  
2025-09-09 DepthVision: Robust Vision-Language Understanding through GAN-Based LiDAR-to-RGB Synthesis Alois C. Knoll Team 2509.07463 null  
2025-09-09 SpecifyUI: Supporting Iterative UI Design Intent Expression through Structured Specifications and Generative AI Liuqing Chen Team 2509.07334 null  
2025-09-10 LLaDA-VLA: Vision Language Diffusion Action Models Xiaoyan Sun Team 2509.06932 null  
2025-09-08 D-HUMOR: Dark Humor Understanding via Multimodal Open-ended Reasoning Nagendra Kumar Team 2509.06771 null  
2025-09-08 Embodied Hazard Mitigation using Vision-Language Models for Autonomous Mobile Robots Aliasghar Arab Team 2509.06768 null  
2025-09-08 Aligning Large Vision-Language Models by Deep Reinforcement Learning and Direct Preference Optimization Janis Dalins Team 2509.06759 null  
2025-09-08 Focusing by Contrastive Attention: Enhancing VLMs’ Visual Reasoning Xueqi Cheng Team 2509.06461 null  
2025-09-08 When Language Model Guides Vision: Grounding DINO for Cattle Muzzle Detection Muhammad Ashad Kabir Team 2509.06427 null  
2025-09-08 Index-Preserving Lightweight Token Pruning for Efficient Document Understanding in Vision-Language Models Inyong Yun Team 2509.06415 null  
2025-09-08 Teaching AI Stepwise Diagnostic Reasoning with Report-Guided Chain-of-Thought Learning Dong Liang Team 2509.06409 null  
2025-09-08 Multi View Slot Attention Using Paraphrased Texts For Face Anti-Spoofing Ha Young Kim Team 2509.06336 null  
2025-09-08 Spatial Reasoning with Vision-Language Models in Ego-Centric Multi-View Scenes Mohammad Akbari Team 2509.06266 null  
2025-09-07 PathoHR: Hierarchical Reasoning for Vision-Language Models in Pathology Hujun Yin Team 2509.06105 null  
2025-09-07 Analysis of Blood Report Images Using General Purpose Vision-Language Models Hamid Beigy Team 2509.06033 null  
2025-09-07 ZLATTE: A Geometry-Aware, Learning-Free Framework for Language-Driven Trajectory Reshaping in Human-Robot Interaction Luis Figueredo Team 2509.06031 null  
2025-09-07 Imagining Alternatives: Towards High-Resolution 3D Counterfactual Medical Image Generation via Language Guidance Tal Arbel Team 2509.05978 null  
2025-09-06 Towards an Automated Framework to Audit Youth Safety on TikTok Francesco Pierri Team 2509.05838 null  
2025-09-06 Do Vision-Language Models See Visualizations Like Humans? Alignment in Chart Categorization Torsten Möller Team 2509.05718 null  
2025-09-06 Knowledge-Augmented Vision Language Models for Underwater Bioacoustic Spectrogram Analysis Kazuhiro Nakadai Team 2509.05703 null  
2025-09-05 VLSM-Ensemble: Ensembling CLIP-based Vision-Language Models for Enhanced Medical Image Segmentation Noel E. O’Connor Team 2509.05154 null  
2025-09-05 GenAI-based test case generation and execution in SDV platform Alois Knoll Team 2509.05112 null  
2025-09-05 MM-DREX: Multimodal-Driven Dynamic Routing of LLM Experts for Financial Trading Fei Wu Team 2509.05080 null  
2025-09-05 Dual-Domain Perspective on Degradation-Aware Fusion: A VLM-Guided Robust Infrared and Visible Image Fusion Framework Guangmang Cui Team 2509.05000 null  
2025-09-05 SynGen-Vision: Synthetic Data Generation for training industrial vision models Nitish Bhardwaj Team 2509.04894 null  
2025-09-05 TemporalFlowViz: Parameter-Aware Visual Analytics for Interpreting Scramjet Combustion Evolution Guihua Shan Team 2509.04834 null  
2025-09-05 FloodVision: Urban Flood Depth Estimation Using Foundation Vision-Language Models and Domain Knowledge Graph John E. Taylor Team 2509.04772 null  
2025-09-05 Dynamic Group Detection using VLM-augmented Temporal Groupness Graph Norimichi Ukita Team 2509.04758 null  
2025-09-04 Guideline-Consistent Segmentation via Multi-Agent Refinement James Davis Team 2509.04687 null  
2025-09-04 TRUST-VL: An Explainable News Assistant for General Multimodal Misinformation Detection Mong Li Lee Team 2509.04448 link  
2025-09-05 GeoArena: An Open Platform for Benchmarking Large Vision-language Models on WorldWide Image Geolocalization Yixuan Li Team 2509.04334 null  
2025-09-04 Learning Active Perception via Self-Evolving Preference Optimization for GUI Grounding Juntao Li Team 2509.04243 null  
2025-09-04 An Automated, Scalable Machine Learning Model Inversion Assessment Pipeline Nathaniel D. Bastian Team 2509.04214 null  
2025-09-04 Real Time FPGA Based Transformers & VLMs for Vision Tasks: SOTA Designs and Optimizations Ashiyana Abdul Majeed Team 2509.04162 null  
2025-09-04 Multimodal Feature Fusion Network with Text Difference Enhancement for Remote Sensing Change Detection C. L. Philip Chen Team 2509.03961 null  
2025-09-04 Attn-Adapter: Attention Is All You Need for Online Few-shot Learner of Vision-Language Model Hyunseung Choo Team 2509.03895 null  
2025-09-04 Weakly-Supervised Learning of Dense Functional Correspondences Jiajun Wu Team 2509.03893 link  
2025-09-04 Expedition & Expansion: Leveraging Semantic Representations for Goal-Directed Exploration in Continuous Cellular Automata Cédric Colas Team 2509.03863 null  
2025-09-04 Measuring How (Not Just Whether) VLMs Build Common Ground Malihe Alikhani Team 2509.03805 null  
2025-09-04 Causality-guided Prompt Learning for Vision-language Models via Visual Granulation Qiulei Dong Team 2509.03803 null  
2025-09-04 MedVista3D: Vision-Language Modeling for Reducing Diagnostic Errors in 3D CT Disease Detection, Understanding and Reporting Xiaofeng Yang Team 2509.03800 null  
2025-09-03 Singular Value Few-shot Adaptation of Vision-Language Models Yiming Xiao Team 2509.03740 null  
2025-09-03 E-ARMOR: Edge case Assessment and Review of Multilingual Optical Character Recognition Anupam Purwar Team 2509.03615 null  
2025-09-05 Unveiling the Response of Large Vision-Language Models to Visually Absent Tokens Eunho Yang Team 2509.03025 null  
2025-09-03 KEPT: Knowledge-Enhanced Prediction of Trajectories from Consecutive Driving Frames with Vision-Language Models Hong Chen Team 2509.02966 null  
2025-09-02 A-SEA3L-QA: A Fully Automated Self-Evolving, Adversarial Workflow for Arabic Long-Context Question-Answer Generation Pedro J. Moreno Team 2509.02864 null  
2025-09-02 Challenges in Understanding Modality Conflict in Vision-Language Models David Jensen Team 2509.02805 null  
2025-09-02 2nd Place Solution for CVPR2024 E2E Challenge: End-to-End Autonomous Driving Using Vision Language Model Yi Yang Team 2509.02659 null  
2025-08-31 Radio Astronomy in the Era of Vision-Language Models: Prompt Sensitivity and Adaptation Slava Voloshynovskiy Team 2509.02615 null  
2025-09-02 Language-Guided Long Horizon Manipulation with LLM-based Planning and Visual Perception Bin He Team 2509.02324 null  
2025-09-02 RS-OOD: A Vision-Language Augmented Framework for Out-of-Distribution Detection in Remote Sensing Yao Zhu Team 2509.02273 null  
2025-09-02 E-THER: A PCT-Grounded Dataset for Benchmarking Empathic AI Syed Afaq Ali Shah Team 2509.02100 null  
2025-09-02 Structure-aware Contrastive Learning for Diagram Understanding of Multimodal Models Hiroshi Sasaki Team 2509.01959 null  
2025-09-02 RSCC: A Large-Scale Remote Sensing Change Caption Dataset for Disaster Events Feng Zhang Team 2509.01907 null  
2025-09-02 Automated Wildfire Damage Assessment from Multi view Ground level Imagery Via Vision Language Models Yiming Xiao Team 2509.01895 null  
2025-09-01 MoTo: A Zero-shot Plug-in Interaction-aware Navigation for General Mobile Manipulation Haibin Yan Team 2509.01658 link  
2025-09-01 Unified Supervision For Vision-Language Modeling in 3D Computed Tomography Xueyan Mei Team 2509.01554 null  
2025-09-01 Variation-aware Vision Token Dropping for Faster Large Vision-Language Models Honggang Chen Team 2509.01552 link  
2025-09-01 Error Notebook-Guided, Training-Free Part Retrieval in 3D CAD Assemblies via Vision-Language Models Zhiming Tan Team 2509.01350 null  
2025-09-03 Novel Category Discovery with X-Agent Attention for Open-Vocabulary Semantic Segmentation Yanyun Qu Team 2509.01275 null  
2025-09-01 ReCap: Event-Aware Image Captioning with Article Retrieval and Semantic Gaussian Normalization Trung-Nghia Le Team 2509.01259 null  
2025-09-01 POINTS-Reader: Distillation-Free Adaptation of Vision-Language Models for Document Conversion Jie Zhou Team 2509.01215 null  
2025-09-01 Measuring Image-Relation Alignment: Reference-Free Evaluation of VLMs and Synthetic Pre-training for Open-Vocabulary Scene Graph Generation Akihiro Sugimoto Team 2509.01209 null  
2025-08-29 VoCap: Video Object Captioning and Segmentation from Any Prompt Cordelia Schmid Team 2508.21809 null  
2025-08-29 CAD2DMD-SET: Synthetic Generation Tool of Digital Measurement Device CAD Model Datasets for fine-tuning Large Vision-Language Models Rodrigo Ventura Team 2508.21732 null  
2025-08-29 How Well Do Vision–Language Models Understand Cities? A Comparative Study on Spatial Reasoning from Street-View Images Yoonjin Yoon Team 2508.21565 null  
2025-08-29 HCCM: Hierarchical Cross-Granularity Contrastive and Matching Learning for Natural Language-Guided Drones Shaozi Li Team 2508.21539 null  
2025-08-28 OneReward: Unified Mask-Guided Image Generation via Multi-Task Human Preference Learning Xinglong Wu Team 2508.21066 link  
2025-08-28 CogVLA: Cognition-Aligned Vision-Language-Action Model via Instruction-Driven Routing & Sparsification Liqiang Nie Team 2508.21046 link  
2025-08-28 Learning Primitive Embodied World Models: Towards Scalable Robotic Learning Qinying Gu Team 2508.20840 null  
2025-08-28 Estimating 2D Keypoints of Surgical Tools Using Vision-Language Models with Low-Rank Adaptation Binod Bhattarai Team 2508.20830 null  
2025-08-28 Evaluating Compositional Generalisation in VLMs and Diffusion Models Martha Lewis Team 2508.20783 null  
2025-09-02 Occlusion Robustness of CLIP for Military Vehicle Classification Hugo J. Kuijf Team 2508.20760 null  
2025-08-28 “Humor, Art, or Misinformation?”: A Multimodal Dataset for Intent-Aware Synthetic Image Detection Panagiotis C. Petrantonakis Team 2508.20670 null  
2025-08-28 Towards Mechanistic Defenses Against Typographic Attacks in CLIP Wojciech Samek Team 2508.20570 null  
2025-08-28 MedGR$^2$: Breaking the Data Barrier for Medical Reasoning via Generative Reward Learning Shangyang Li Team 2508.20549 null  
2025-08-28 MedFoundationHub: A Lightweight and Secure Toolkit for Deploying Medical Vision Language Foundation Models Yuankai Huo Team 2508.20345 null  
2025-08-28 GUARD: Guideline Upholding Test through Adaptive Role-play and Jailbreak Diagnostics for LLMs Haohan Wang Team 2508.20325 null  
2025-08-27 A Novel Framework for Automated Explain Vision Model Using Vision-Language Models Truong Son Hy Team 2508.20227 null  
2025-08-27 Segmentation Assisted Incremental Test Time Adaptation in an Open World Soma Biswas Team 2508.20029 null  
2025-08-27 SWIRL: A Staged Workflow for Interleaved Reinforcement Learning in Mobile GUI Control Ping Luo Team 2508.20018 null  
2025-08-27 GLSim: Detecting Object Hallucinations in LVLMs via Global-Local Similarity Yixuan Li Team 2508.19972 null  
2025-08-27 Assessing the Geolocation Capabilities, Limitations and Societal Risks of Generative Vision-Language Models Shoaib Ehsan Team 2508.19967 null  
2025-08-27 KRETA: A Benchmark for Korean Reading and Reasoning in Text-Rich VQA Attuned to Diverse Visual Contexts Hyunjun Eun Team 2508.19944 null  
2025-08-28 NLKI: A lightweight Natural Language Knowledge Integration Framework for Improving Small VLMs in Commonsense VQA Tasks Somak Aditya Team 2508.19724 null  
2025-08-27 InquireMobile: Teaching VLM-based Mobile Agent to Request Human Assistance via Reinforcement Fine-Tuning Bo Zheng Team 2508.19679 null  
2025-08-27 Self-Rewarding Vision-Language Model via Reasoning Decomposition Dong Yu Team 2508.19652 null  
2025-08-27 FakeSV-VLM: Taming VLM for Detecting Fake Short-Video News via Progressive Mixture-Of-Experts Adapter Zhun Zhong Team 2508.19639 null  
2025-08-26 LaVA-Man: Learning Visual Action Representations for Robot Manipulation Changjae Oh Team 2508.19391 null  
2025-08-26 Fine-Tuning Vision-Language Models for Neutrino Event Analysis in High-Energy Physics Experiments Pierre Baldi Team 2508.19376 null  
2025-08-26 AT-CXR: Uncertainty-Aware Agentic Triage for Chest X-rays Yiyu Shi Team 2508.19322 null  
2025-10-01 Object Detection with Multimodal Large Vision-Language Models: An In-depth Review Manoj Karkee Team 2508.19294 null  
2025-08-26 Do LVLMs Know What They Know? A Systematic Study of Knowledge Boundary Perception in LVLMs Keping Bi Team 2508.19111 null  
2025-08-26 ProPy: Building Interactive Prompt Pyramids upon CLIP for Partially Relevant Video Retrieval Xiaoguang Zhao Team 2508.19024 null  
2025-08-26 Ask Me Again Differently: GRAS for Measuring Bias in Vision Language Models on Gender, Race, Age, and Skin Tone Amit Sheth Team 2508.18989 null  
2025-08-28 Enhancing Document VQA Models via Retrieval-Augmented Generation Ernest Valveny Team 2508.18984 null  
2025-08-26 Toward Robust Medical Fairness: Debiased Dual-Modal Alignment via Text-Guided Attribute-Disentangled Prompt Learning for Vision-Language Models Yong Xia Team 2508.18886 null  
2025-08-26 Hidden Tail: Adversarial Image Causing Stealthy Resource Consumption in Vision-Language Models Guowen Xu Team 2508.18805 null  
2025-08-26 Rethinking Human-Object Interaction Evaluation for both Vision-Language Models and HOI-Specific Methods Robby T. Tan Team 2508.18753 null  
2025-08-26 Knowing or Guessing? Robust Medical Visual Question Answering via Joint Consistency and Contrastive Learning Zuozhu Liu Team 2508.18687 null  
2025-08-26 PRISM: Robust VLM Alignment with Principled Reasoning for Integrated Safety in Multimodality Chaowei Xiao Team 2508.18649 null  
2025-08-25 CLARIFY: A Specialist-Generalist Framework for Accurate and Lightweight Dermatological Visual Question Answering Mohammad Ariful Haque Team 2508.18430 null  
2025-08-25 Language-Specific Layer Matters: Efficient Multilingual Enhancement for Large Vision-Language Models Jingbo Zhu Team 2508.18381 null  
2025-08-25 SafeBimanual: Diffusion-based Trajectory Optimization for Safe Bimanual Manipulation Ziwei Wang Team 2508.18268 link  
2025-08-25 MMTok: Multimodal Coverage Maximization for Efficient Inference of VLMs Qi Qian Team 2508.18264 link  
2025-08-25 SEAM: Semantically Equivalent Across Modalities Benchmark for Vision-Language Models Ashton Anderson Team 2508.18179 null  
2025-08-25 Scene-Aware Vectorized Memory Multi-Agent Framework with Cross-Modal Differentiated Quantization VLMs for Visually Impaired Assistance Liyong Ren Team 2508.18177 null  
2025-08-25 ArgusCogito: Chain-of-Thought for Cross-Modal Synergy and Omnidirectional Reasoning in Camouflaged Object Segmentation Ye Li Team 2508.18050 null  
2025-08-25 PerPilot: Personalizing VLM-based Mobile Agents via Memory and Exploration Zhen Wang Team 2508.18040 null  
2025-08-25 Alternating Training-based Label Smoothing Enhances Prompt Generalization Yu Zhang Team 2508.17846 null  
2025-08-25 PoRe: Position-Reweighted Visual Token Pruning for Vision Language Models Dan Zeng Team 2508.17807 null  
2025-08-25 F2RVLM: Boosting Fine-grained Fragment Retrieval for Multi-Modal Long-form Dialogue with Vision Language Model Jinchao Zhang Team 2508.17714 null  
2025-08-25 Language-Guided Temporal Token Pruning for Efficient VideoLLM Processing Yogesh Kumar Team 2508.17686 null  
2025-08-25 Hierarchical Vision-Language Learning for Medical Out-of-Distribution Detection Ruixuan Wang Team 2508.17667 null  
2025-08-25 Dynamic Embedding of Hierarchical Visual Features for Efficient Vision-Language Fine-Tuning Chunping Qiu Team 2508.17638 null  
2025-08-25 TinyGiantVLM: A Lightweight Vision-Language Architecture for Spatial Reasoning under Resource Constraints Xuan-Huong Nguyen Team 2508.17595 null  
2025-08-25 MetaGen: A DSL, Database, and Benchmark for VLM-Assisted Metamaterial Generation Wojciech Matusik Team 2508.17568 null  
2025-08-24 MoE-Inference-Bench: Performance Evaluation of Mixture of Expert Large Language and Vision Models Venkatram Vishwanath Team 2508.17467 null  
2025-08-24 Multi-Level LVLM Guidance for Untrimmed Video Action Recognition Yunjie Guo Team 2508.17442 null  
2025-08-24 Constrained Prompt Enhancement for Improving Zero-Shot Generalization of Vision-Language Models Qinghua Hu Team 2508.17417 null  
2025-08-24 Lightweight Joint Optimization of General-Purpose Vision-Language Models and Retrievers for Medical Diagnosis Tom Hope Team 2508.17394 null  
2025-08-26 Mind the (Language) Gap: Towards Probing Numerical and Cross-Lingual Limits of LVLMs Gaurav Harit Team 2508.17334 null  
2025-08-24 Explain Before You Answer: A Survey on Compositional Visual Reasoning Hamid Rezatofighi Team 2508.17298 null  
2025-08-22 Modular Embedding Recomposition for Incremental Learning Simone Calderara Team 2508.16463 null  
2025-08-22 Structuring GUI Elements through Vision Language Models: Towards Action Space Generation Jingdong Chen Team 2508.16271 null  
2025-08-22 RAGSR: Regional Attention Guided Diffusion for Image Super-Resolution Gui-Song Xia Team 2508.16158 null  
2025-08-22 Beyond Human-prompting: Adaptive Prompt Tuning with Semantic Alignment for Anomaly Detection Chao-Chun Chen Team 2508.16157 null  
2025-08-22 Prompting with Sign Parameters for Low-resource Sign Language Instruction Generation Hasan Mahmud Team 2508.16076 null  
2025-08-22 Less Redundancy: Boosting Practicality of Vision Language Model in Walking Assistants Jinchao Zhang Team 2508.16070 null  
2025-08-21 Glo-VLMs: Leveraging Vision-Language Models for Fine-Grained Diseased Glomerulus Classification Ruining Deng Team 2508.15960 null  
2025-08-21 Semantic-Aware Ship Detection with Vision-Language Integration Xiaomeng Huang Team 2508.15930 null  
2025-08-21 VT-LVLM-AR: A Video-Temporal Large Vision-Language Model Adapter for Fine-Grained Action Recognition in Long-Term Videos Zihan Xu Team 2508.15903 null  
2025-08-21 LLM-empowered Dynamic Prompt Routing for Vision-Language Models Tuning under Long-Tailed Distributions Yulong Bian Team 2508.15688 null  
2025-08-21 Mind and Motion Aligned: A Joint Evaluation IsaacSim Benchmark for Task Planning and Low-Level Policies in Mobile Manipulation Alexey K. Kovalev Team 2508.15663 null  
2025-08-21 DesignCLIP: Multimodal Learning with CLIP for Design Patent Understanding Sourav Medya Team 2508.15297 null  
2025-08-21 Normal and Abnormal Pathology Knowledge-Augmented Vision-Language Model for Anomaly Detection in Pathology Images Jin Tae Kwak Team 2508.15256 null  
2025-08-21 Pathology-Informed Latent Diffusion Model for Anomaly Detection in Lymph Node Metastasis Jin Tae Kwak Team 2508.15236 null  
2025-08-21 See it. Say it. Sorted: Agentic System for Compositional Diagram Generation Ed Li Team 2508.15222 null  
2025-08-21 ContextualLVLM-Agent: A Holistic Framework for Multi-Turn Visually-Grounded Dialogue and Complex Instruction Following Taeyang Yoon Team 2508.15164 null  
2025-08-20 MoEcho: Exploiting Side-Channel Attacks to Compromise User Privacy in Mixture-of-Experts LLMs Yunsi Fei Team 2508.15036 null  
2025-08-20 WISE-FUSE: Efficient Whole Slide Image Encoding via Coarse-to-Fine Patch Selection with VLM and LLM Knowledge Fusion Won-Ki Jeong Team 2508.14537 null  
2025-08-19 Multi-Rationale Explainable Object Recognition via Contrastive Conditional Inference Simon Gottschalk Team 2508.14280 null  
2025-08-19 CLIPSym: Delving into Symmetry Detection with CLIP Raymond A. Yeh Team 2508.14197 null  
2025-08-19 LENS: Learning to Segment Anything with Unified Reinforced Reasoning Xinggang Wang Team 2508.14153 link  
2025-08-19 Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation Jianye Hao Team 2508.13998 null  
2025-08-19 Mitigating Cross-Image Information Leakage in LVLMs for Multi-Image Tasks Junsuk Choe Team 2508.13744 link  
2025-08-19 Enhancing Targeted Adversarial Attacks on Large Vision-Language Models through Intermediate Projector Guidance Bin Xiao Team 2508.13739 null  
2025-08-19 Hierarchical Vision-Language Retrieval of Educational Metaverse Content in Agriculture Giuseppe Serra Team 2508.13713 null  
2025-08-19 ViExam: Are Vision Language Models Better than Humans on Vietnamese Multimodal Exam Questions? Daeyoung Kim Team 2508.13680 null  
2025-08-19 Breaking the SFT Plateau: Multimodal Structured Reinforcement Learning for Chart-to-Code Generation Lin Ma Team 2508.13587 null  
2025-08-21 DictAS: A Framework for Class-Generalizable Few-Shot Anomaly Segmentation via Dictionary Lookup Guiguang Ding Team 2508.13560 link  
2025-08-19 Evaluating Open-Source Vision Language Models for Facial Emotion Recognition against Traditional Deep Learning Models Sridevi Bonthu Team 2508.13524 null  
2025-08-19 STER-VLM: Spatio-Temporal With Enhanced Reference Vision-Language Models Tien-Huy Nguyen Team 2508.13470 null  
2025-08-19 CAST: Counterfactual Labels Improve Instruction Following in Vision-Language-Action Models Sergey Levine Team 2508.13446 null  
2025-08-19 Structured Prompting and Multi-Agent Knowledge Distillation for Traffic Video Interpretation and Risk Inference Jidong J. Yang Team 2508.13439 null  
2025-08-19 Mitigating Easy Option Bias in Multiple-Choice Question Answering Basura Fernando Team 2508.13428 null  
2025-08-18 Prune2Drive: A Plug-and-Play Framework for Accelerating Vision-Language Models in Autonomous Driving Linfeng Zhang Team 2508.13305 null  
2025-08-18 CardAIc-Agents: A Multimodal Framework with Hierarchical Adaptation for Cardiac Care Support Jinming Duan Team 2508.13256 null  
2025-08-18 Large VLM-based Vision-Language-Action Models for Robotic Manipulation: A Survey Liqiang Nie Team 2508.13073 link  
2025-08-18 IntelliCap: Intelligent Guidance for Consistent View Sampling Shohei Mori Team 2508.13043 link  
2025-08-18 Breaking Reward Collapse: Adaptive Reinforcement for Open-ended Medical Reasoning with Enhanced Semantic Discrimination Lihua Zhang Team 2508.12957 null  
2025-08-18 RoboRetriever: Single-Camera Robot Object Retrieval via Active and Interactive Perception with Dynamic Scene Graph Yunquan Sun Team 2508.12916 null  
2025-08-18 Preserve and Sculpt: Manifold-Aligned Fine-tuning of Vision-Language Models for Few-Shot Learning Ruixuan Wang Team 2508.12877 null  
2025-08-18 Cross-Domain Few-Shot Learning via Multi-View Collaborative Optimization with Vision-Language Models Ruixuan Wang Team 2508.12861 null  
2025-08-18 HeteroRAG: A Heterogeneous Retrieval-Augmented Generation Framework for Medical Vision Language Tasks Yu Wang Team 2508.12778 null  
2025-08-18 Drifting Away from Truth: GenAI-Driven News Diversity Challenges LVLM-Based Misinformation Detection Wei Zhou Team 2508.12711 null  
2025-08-18 WP-CLIP: Leveraging CLIP to Predict Wölfflin’s Principles in Visual Art Feng Liu Team 2508.12668 link  
2025-08-18 SpotVLM: Cloud-edge Collaborative Real-time VLM based on Context Transfer Zheng Yang Team 2508.12638 null  
2025-08-18 ViLaD: A Large Vision Language Diffusion Framework for End-to-End Autonomous Driving Ziran Wang Team 2508.12603 null  
2025-08-18 Multimodal Chain of Continuous Thought for Latent-Space Reasoning in Vision-Language Models Chris Ngo Team 2508.12587 null  
2025-08-18 REVEAL – Reasoning and Evaluation of Visual Evidence through Aligned Language Yash Butala Team 2508.12543 null  
2025-08-17 LangVision-LoRA-NAS: Neural Architecture Search for Variable LoRA Rank in Vision Language Models Venkatram Vishwanath Team 2508.12512 null  
2025-08-17 Standardization of Neuromuscular Reflex Analysis – Role of Fine-Tuned Vision-Language Model Consortium and OpenAI gpt-oss Reasoning LLM Enabled Decision Support System Kasun De Zoysa Team 2508.12473 null  
2025-08-17 M3PO: Multimodal-Model-Guided Preference Optimization for Visual Instruction Following Yanfei Qian Team 2508.12458 null  
2025-08-17 X-Ray-CoT: Interpretable Chest X-ray Diagnosis with Vision-Language Models via Chain-of-Thought Reasoning Shaoqing Tang Team 2508.12455 null  
2025-08-17 LMAD: Integrated End-to-End Vision-Language Model for Explainable Autonomous Driving Li Zhang Team 2508.12404 null  
2025-08-17 MPCAR: Multi-Perspective Contextual Augmentation for Enhanced Visual Reasoning in Large Vision-Language Models Xueying Huang Team 2508.12400 null  
2025-08-17 Federated Cross-Modal Style-Aware Prompt Generation Amit Sethi Team 2508.12399 null  
2025-08-15 Reinforcing Video Reasoning Segmentation to Think Before It Segments Huchuan Lu Team 2508.11538 null  
2025-08-15 OVSegDT: Segmenting Transformer for Open-Vocabulary Object Goal Navigation Aleksandr Panov Team 2508.11479 null  
2025-08-15 ImagiDrive: A Unified Imagination-and-Planning Framework for Autonomous Driving Li Zhang Team 2508.11428 null  
2025-08-15 Semantically Guided Adversarial Testing of Vision Models Using Language Models Jorge M. Cruz-Duarte Team 2508.11341 null  
2025-08-15 Noise Matters: Optimizing Matching Noise for Diffusion Classifiers Long Chen Team 2508.11330 null  
2025-08-15 Logic Unseen: Revealing the Logical Blindspots of Vision-Language Models Tat-Seng Chua Team 2508.11317 null  
2025-08-15 Vision-Language Models display a strong gender bias Sreedath Panat Team 2508.11262 null  
2025-08-15 Generalized Decoupled Learning for Enhancing Open-Vocabulary Dense Perception Zhuotao Tian Team 2508.11256 null  
2025-08-15 UAV-VL-R1: Generalizing Vision-Language Models via Supervised Fine-Tuning and Multi-Stage GRPO for UAV Visual Reasoning Yue Zhang Team 2508.11196 null  
2025-08-15 Fine-Grained VLM Fine-tuning via Latent Hierarchical Adapter Learning Bin Luo Team 2508.11176 null  
2025-08-15 Better Supervised Fine-tuning for VQA: Integer-Only Loss Junhui Cui Team 2508.11170 null  
2025-08-14 Utilizing Vision-Language Models as Action Models for Intent Recognition and Assistance Rustam Stolkin Team 2508.11093 null  
2025-08-14 Are Large Pre-trained Vision Language Models Effective Construction Safety Inspectors? Zhengbo Zou Team 2508.11011 null  
2025-08-14 Not There Yet: Evaluating Vision Language Models in Simulating the Visual Perception of People with Low Vision Anhong Guo Team 2508.10972 null  
2025-08-14 AEGIS: Authenticity Evaluation Benchmark for AI-Generated Video Sequences Joey Tianyi Zhou Team 2508.10771 null  
2025-08-14 From Diagnosis to Improvement: Probing Spatio-Physical Reasoning in Vision Language Models Wenqi Shao Team 2508.10770 null  
2025-08-14 IADGPT: Unified LVLM for Few-Shot Industrial Anomaly Detection, Localization, and Reasoning via In-Context Learning Bin Li Team 2508.10681 null  
2025-08-14 AddressVLM: Cross-view Alignment Tuning for Image Address Localization using Large Vision-Language Models Jieping Ye Team 2508.10667 null  
2025-08-14 SemPT: Semantic Prompt Tuning for Vision-Language Models Zhenzhong Chen Team 2508.10645 null  
2025-08-14 ChatENV: An Interactive Vision-Language Model for Sensor-Guided Environmental Monitoring and Scenario Simulation Mohsen Guizani Team 2508.10635 null  
2025-08-14 Retrieval-Augmented Prompt for OOD Detection Changqing Zhang Team 2508.10556 null  
2025-08-14 DiFaR: Enhancing Multimodal Misinformation Detection with Diverse, Factual, and Relevant Rationales Zhi Zeng Team 2508.10444 null  
2025-08-14 MM-Food-100K: A 100,000-Sample Multimodal Food Intelligence Dataset with Verifiable Provenance Yi Zhang Team 2508.10429 link  
2025-08-14 STRIDE-QA: Visual Question Answering Dataset for Spatiotemporal Reasoning in Urban Driving Scenes Yu Yamaguchi Team 2508.10427 link  
2025-08-14 PQ-DAF: Pose-driven Quality-controlled Data Augmentation for Data-scarce Driver Distraction Detection Xinghui Song Team 2508.10397 null  
2025-08-14 Contrast Sensitivity Function of Multimodal Vision-Language Models Valero Laparra Team 2508.10367 null  
2025-08-14 JRDB-Reasoning: A Difficulty-Graded Benchmark for Visual Reasoning in Robotics Hamid Rezatofighi Team 2508.10287 null  
2025-08-14 MRFD: Multi-Region Fusion Decoding with Self-Consistency for Mitigating Hallucinations in LVLMs Yujun Cai Team 2508.10264 null  
2025-08-13 Efficient Forward-Only Data Valuation for Pretrained LLMs and VLMs Xiaoxiao Li Team 2508.10180 null  
2025-08-13 SynSpill: Improved Industrial Spill Detection With Synthetic Data Shruti Vyas Team 2508.10171 null  
2025-08-13 Interpretable Oracle Bone Script Decipherment through Radical and Pictographic Analysis with LVLMs Bin Li Team 2508.10113 null  
2025-08-13 LLMC+: Benchmarking Vision-Language Model Compression with a Plug-and-play Toolkit Wenya Wang Team 2508.09981 null  
2025-08-13 January Food Benchmark (JFB): A Public Benchmark Dataset and Evaluation Suite for Multimodal Food Analysis Mark Woodward Team 2508.09966 null  
2025-08-14 Prototype-Guided Diffusion: Visual Conditioning without External Memory Mustapha Lebbah Team 2508.09922 null  
2025-08-12 OpenCUA: Open Foundations for Computer-Use Agents Tao Yu Team 2508.09123 null  
2025-08-12 Bridging Formal Language with Chain-of-Thought Reasoning to Geometry Problem Solving Tian Ding Team 2508.09099 null  
2025-08-12 Addressing Bias in VLMs for Glaucoma Detection Without Protected Attribute Supervision Prashnna Gyawali Team 2508.09087 null  
2025-08-13 GeoVLA: Empowering 3D Representations in Vision-Language-Action Models Jiale Cao Team 2508.09071 link  
2025-08-12 VLM-3D: End-to-End Vision-Language Models for Open-World 3D Perception Lei He Team 2508.09061 null  
2025-08-12 MVISU-Bench: Benchmarking Mobile Agents for Real-World Tasks by Multi-App, Vague, Interactive, Single-App and Unethical Instructions Jin Xu Team 2508.09057 null  
2025-08-12 Rational Inverse Reasoning Leslie Pack Kaelbling Team 2508.08983 null  
2025-08-12 How Does a Virtual Agent Decide Where to Look? – Symbolic Cognitive Reasoning for Embodied Head Rotation Hyeongyeop Kang Team 2508.08930 null  
2025-08-12 Safe Semantics, Unsafe Interpretations: Tackling Implicit Reasoning Safety in Large Vision-Language Models Xuelong Li Team 2508.08926 null  
2025-08-12 3DFroMLLM: 3D Prototype Generation only from Pretrained Multimodal LLMs Eddy Ilg Team 2508.08821 null  
2025-08-12 SafeFix: Targeted Model Repair via Controlled Image Generation Yunhui Guo Team 2508.08701 null  
2025-08-12 STELAR-VISION: Self-Topology-Aware Efficient Learning for Aligned Reasoning in Vision Marios Savvides Team 2508.08688 null  
2025-08-12 AME: Aligned Manifold Entropy for Robust Vision-Language Distillation Yuming Ou Team 2508.08644 null  
2025-08-13 Transferable Model-agnostic Vision-Language Model Adaptation for Efficient Weak-to-Strong Generalization Hyunwoo J. Kim Team 2508.08604 null  
2025-08-12 Superclass-Guided Representation Disentanglement for Spurious Correlation Mitigation Qi Lei Team 2508.08570 null  
2025-08-11 VISOR: Visual Input-based Steering for Output Redirection in Vision-Language Models Ravikumar Balakrishnan Team 2508.08521 null  
2025-08-11 Re:Verse – Can Your VLM Read a Manga? Shruti Vyas Team 2508.08508 null  
2025-08-11 ODYSSEY: Open-World Quadrupeds Exploration and Manipulation for Long-Horizon Tasks Chunhua Shen Team 2508.08240 null  
2025-08-11 Spatial-ORMLLM: Improve Spatial Relation Understanding in the Operating Room with Multimodal Large Language Model Shaoliang Peng Team 2508.08199 null  
2025-08-11 BadPromptFL: A Novel Backdoor Threat to Prompt-based Federated Learning in Multimodal Models Bo Wang Team 2508.08040 null  
2025-08-11 TRIDE: A Text-assisted Radar-Image weather-aware fusion network for Depth Estimation Robert Wille Team 2508.08038 null  
2025-08-11 TAG: A Simple Yet Effective Temporal-Aware Approach for Zero-Shot Video Temporal Grounding Jee-Hyong Lee Team 2508.07925 null  
2025-08-11 RSVLM-QA: A Benchmark Dataset for Remote Sensing Vision Language Model-based Question Answering Mukesh Prasad Team 2508.07918 null  
2025-08-11 CATP: Contextually Adaptive Token Pruning for Efficient and Enhanced Multimodal In-Context Learning Ruixiang Tang Team 2508.07871 null  
2025-08-11 Effortless Vision-Language Model Specialization in Histopathology without Annotation Katharina Breininger Team 2508.07835 null  
2025-08-11 MIMIC: Multimodal Inversion for Model Interpretation and Conceptualization Alexandros Stergiou Team 2508.07833 link  
2025-08-11 Architectural Co-Design for Zero-Shot Anomaly Detection: Decoupling Representation and Dynamically Fusing Features in CLIP Yueyi Luo Team 2508.07819 null  
2025-08-11 SwarmVLM: VLM-Guided Impedance Control for Autonomous Navigation of Heterogeneous Robots in Dynamic Warehousing Dzmitry Tsetserukou Team 2508.07814 null  
2025-08-11 Grasp-HGN: Grasping the Unexpected Gunar Schirner Team 2508.07648 null  
2025-08-11 Breaking Down and Building Up: Mixture of Skill-Based Vision-and-Language Navigation Agents Parisa Kordjamshidi Team 2508.07642 null  
2025-08-11 InterChart: Benchmarking Visual Reasoning Across Decomposed and Distributed Chart Information Vivek Gupta Team 2508.07630 null  
2025-08-11 AR-VRM: Imitating Human Motions for Visual Robot Manipulation with Analogical Reasoning Yang Liu Team 2508.07626 null  
2025-08-11 Adaptive Cache Enhancement for Test-Time Adaptation of Vision-Language Models Duc Thanh Nguyen Team 2508.07570 null  
2025-08-10 FormCoach: Lift Smarter, Not Harder Lingjie Liu Team 2508.07501 null  
2025-08-10 Freeze and Reveal: Exposing Modality Bias in Vision-Language Models Ponnurangam Kumaraguru Team 2508.07432 null  
2025-08-10 AgriVLN: Vision-and-Language Navigation for Agricultural Robots Xiang Li Team 2508.07406 null  
2025-08-10 Small-Large Collaboration: Training-efficient Concept Personalization for Large VLM using a Meta Personalized Small VLM Wentao Zhang Team 2508.07260 null  
2025-08-08 Uncertainty-quantified Rollout Policy Adaptation for Unlabelled Cross-domain Temporal Grounding Kun Shao Team 2508.06317 null  
2025-08-08 Real-Time 3D Vision-Language Embedding Mapping Elmar Rueckert Team 2508.06291 null  
2025-08-08 InfoCausalQA: Can Models Perform Non-explicit Causal Reasoning Based on Infographic? Youngjae Yu Team 2508.06220 null  
2025-08-08 VISTAR: A User-Centric and Role-Driven Benchmark for Text-to-Image Evaluation ChengSheng Deng Team 2508.06152 null  
2025-08-08 Q-CLIP: Unleashing the Power of Vision-Language Models for Video Quality Assessment through Unified Cross-Modal Adaptation Shaohui Liu Team 2508.06092 null  
2025-08-08 AdaptInfer: Adaptive Token Pruning for Vision-Language Model Inference with Dynamical Text Guidance Yunhao Liu Team 2508.06084 null  
2025-08-08 Fourier-VLM: Compressing Vision Tokens in the Frequency Domain for Large Vision-Language Models Zhouhan Lin Team 2508.06038 null  
2025-08-08 More Is Better: A MoE-Based Emotion Recognition Framework with Human Preference Alignment Zhepeng Wang Team 2508.06036 null  
2025-08-08 Mediator-Guided Multi-Agent Collaboration among Open-Source Models for Medical Decision-Making Xiaosong Wang Team 2508.05996 null  
2025-08-08 PASG: A Closed-Loop Framework for Automated Geometric Primitive Extraction and Semantic Anchoring in Robotic Manipulation Yao Mu Team 2508.05976 null  
2025-08-07 HOLODECK 2.0: Vision-Language-Guided 3D World Generation with Editing Chris Callison-Burch Team 2508.05899 null  
2025-08-07 ETTA: Efficient Test-Time Adaptation for Vision-Language Models through Dynamic Embedding Updates Ali Cheraghian Team 2508.05898 null  
2025-08-07 Follow-Your-Instruction: A Comprehensive MLLM Agent for World Data Synthesis Zeyu Wang Team 2508.05580 null  
2025-08-07 Adapting Vision-Language Models Without Labels: A Comprehensive Survey Olga Fink Team 2508.05547 link  
2025-08-07 Explaining Similarity in Vision-Language Encoders with Weighted Banzhaf Interactions Przemyslaw Biecek Team 2508.05430 null  
2025-08-07 From Detection to Correction: Backdoor-Resilient Face Recognition via Vision-Language Trigger Detection and Noise-Based Neutralization Ibrahim Khalil Team 2508.05409 null  
2025-08-07 DeepPHY: Benchmarking Agentic VLMs on Physical Reasoning Bo Zheng Team 2508.05405 null  
2025-08-07 StructVRM: Aligning Multimodal Reasoning with Structured and Verifiable Reward Models Youqiang Zhou Team 2508.05383 null  
2025-08-07 Textual Inversion for Efficient Adaptation of Open-Vocabulary Object Detectors Without Forgetting Hugo Kuijf Team 2508.05323 null  
2025-08-07 Towards Embodied Agentic AI: Review and Classification of LLM- and VLM-Driven Robot Autonomy and Interaction Jorge Peña Queralta Team 2508.05294 null  
2025-08-07 RegionMed-CLIP: A Region-Aware Multimodal Contrastive Learning Pre-trained Model for Medical Image Understanding Guiru Liu Team 2508.05244 null  
2025-08-07 Navigating the Trade-off: A Synthesis of Defensive Strategies for Zero-Shot Adversarial Robustness in Vision-Language Models Jason Sun Team 2508.05237 null  
2025-08-07 ReasoningTrack: Chain-of-Thought Reasoning for Long-term Vision-Language Tracking Zhipeng Zhang Team 2508.05221 null  
2025-08-07 SPEX: A Vision-Language Model for Land Cover Extraction on Spectral Remote Sensing Images Liangpei Zhang Team 2508.05202 null  
2025-08-07 Chemist Eye: A Visual Language Model-Powered System for Safety Monitoring and Robot Decision-Making in Self-Driving Laboratories Andrew I. Cooper Team 2508.05148 null  
2025-08-07 Multimodal Causal-Driven Representation Learning for Generalizable Medical Image Segmentation Zhen Lei Team 2508.05008 null  
2025-08-07 Attribute Guidance With Inherent Pseudo-label For Occluded Person Re-identification Haiyang Zhang Team 2508.04998 null  
2025-08-07 Unified modality separation: A vision-language framework for unsupervised domain adaptation Heng Tao Shen Team 2508.04987 null  
2025-08-07 Accelerating Conditional Prompt Learning via Masked Image Modeling for Vision-Language Models Hyunseung Choo Team 2508.04942 null  
2025-08-06 INTENTION: Inferring Tendencies of Humanoid Robot Motion Through Interactive Intuition and Grounded VLM Nikos Tsagarakis Team 2508.04931 link  
2025-08-06 Automated Bug Frame Retrieval from Gameplay Videos Using Vision-Language Models Cor-Paul Bezemer Team 2508.04895 null  
2025-08-06 Dual-Stream Attention with Multi-Modal Queries for Object Detection in Transportation Applications Wassim Bouachir Team 2508.04868 null  
2025-08-01 MMRAG-DocQA: A Multi-Modal Retrieval-Augmented Generation Method for Document Question-Answering with Hierarchical Index and Multi-Granularity Retrieval Chengcheng Mai Team 2508.00579 null  
2025-08-01 Training-Free Class Purification for Open-Vocabulary Semantic Segmentation Xiaohua Xie Team 2508.00557 null  
2025-08-01 HiPrune: Training-Free Visual Token Pruning via Hierarchical Attention in Vision-Language Models Bin Chen Team 2508.00553 null  
2025-08-01 Your other Left! Vision-Language Models Fail to Identify Relative Positions in Medical Images Timo Ropinski Team 2508.00549 null  
2025-08-01 EFlat-LoRA: Efficiently Seeking Flat Minima for Better Generalization in Fine-Tuning Large Language Models and Beyond Baochang Zhang Team 2508.00522 null  
2025-08-01 When Vision-Language Model (VLM) Meets Beam Prediction: A Multimodal Contrastive Learning Framework Tony Q. S. Quek Team 2508.00456 null  
2025-08-01 CLIPTime: Time-Aware Multimodal Representation Learning from Images and Text Petar Durdevic Team 2508.00447 null  
2025-08-01 AutoDebias: Automated Framework for Debiasing Text-to-Image Models Yang Liu Team 2508.00445 null  
2025-08-01 Sari Sandbox: A Virtual Retail Store Environment for Embodied AI Agents Rowel O. Atienza Team 2508.00400 null  
2025-08-01 iSafetyBench: A video-language benchmark for safety in industrial environment Shruti Vyas Team 2508.00399 null  
2025-08-01 Decouple before Align: Visual Disentanglement Enhances Prompt Tuning Yanfeng Wang Team 2508.00395 null  
2025-08-01 SA-GCS: Semantic-Aware Gaussian Curriculum Scheduling for UAV Vision-Language Navigation Renxin Zhong Team 2508.00390 null  
2025-08-01 CoRGI: Verified Chain-of-Thought Reasoning with Visual Grounding Lin Shang Team 2508.00378 null  
2025-08-01 Analyze-Prompt-Reason: A Collaborative Agent-Based Framework for Multi-Image Vision-Language Reasoning Athanasios Voulodimos Team 2508.00356 null  
2025-08-01 Evaluating the Efficacy of Large Language Models for Generating Fine-Grained Visual Privacy Policies in Homes Hewu Li Team 2508.00321 null  
2025-08-01 DocTron-Formula: Generalized Formula Recognition in Complex and Structured Scenarios Lin Ma Team 2508.00311 null  
2025-08-01 Instruction-Grounded Visual Projectors for Continual Learning of Generative Vision-Language Models Eunwoo Kim Team 2508.00260 null  
2025-08-01 Towards Higher Effective Rank in Parameter-efficient Fine-tuning using Khatri–Rao Product Ehsan Abbasnejad Team 2508.00230 null  
2025-07-31 On the Risk of Misleading Reports: Diagnosing Textual Biases in Multimodal Clinical AI Enzo Ferrante Team 2508.00171 null  
2025-07-31 ART: Adaptive Relation Tuning for Generalized Relation Prediction Stefan Roth Team 2507.23543 null  
2025-07-23 BetterCheck: Towards Safeguarding VLMs for Automotive Perception Systems Christian Berger Team 2507.17722 null  
2025-07-23 InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation Jiangmiao Pang Team 2507.17520 null  
2025-07-23 Dynamic Scoring with Enhanced Semantics for Training-Free Human-Object Interaction Detection Elisa Ricci Team 2507.17456 null  
2025-07-23 VLM-Guided Visual Place Recognition for Planet-Scale Geo-Localization Shoaib Ehsan Team 2507.17455 null  
2025-07-23 Dynamic-DINO: Fine-Grained Mixture of Experts Tuning for Real-time Open-Vocabulary Object Detection Xi Li Team 2507.17436 null  
2025-07-23 Language-Conditioned Open-Vocabulary Mobile Manipulation with Pretrained Models Guanghui Sun Team 2507.17379 null  
2025-07-23 RoadBench: A Vision-Language Foundation Model and Benchmark for Road Damage Understanding Tianyang Wang Team 2507.17353 null  
2025-07-23 HySafe-AI: Hybrid Safety Architectural Analysis Framework for AI Systems: A Case Study Maria Spence Team 2507.17118 null  
2025-07-23 FedVLM: Scalable Personalized Vision-Language Models through Federated Learning Habeeb Olufowobi Team 2507.17088 null  
2025-07-22 VL-CLIP: Enhancing Multimodal Recommendations via Visual Grounding and LLM-Augmented CLIP Embeddings Kannan Achan Team 2507.17080 null  
2025-07-22 Controllable Hybrid Captioner for Improved Long-form Video Understanding Arun Reddy Team 2507.17047 null  
2025-07-22 Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning Kai Chen Team 2507.16814 null  
2025-07-22 Cooling Matters: Benchmarking Large Language Models and Vision-Language Models on Liquid-Cooled Versus Air-Cooled H100 GPU Systems Arslan Munir Team 2507.16781 null  
2025-07-22 Enhancing Remote Sensing Vision-Language Models Through MLLM and LLM-Based High-Quality Image-Text Dataset Generation Ke Yang Team 2507.16716 null  
2025-07-22 Experience is the Best Teacher: Grounding VLMs for Robotics through Self-Generated Memory Marco Hutter Team 2507.16713 null  
2025-07-22 Spatial 3D-LLM: Exploring Spatial Awareness in 3D Vision-Language Models Chao Zhang Team 2507.16524 null  
2025-07-22 SceneLoom: Communicating Data with Scene Context Siming Chen Team 2507.16466 null  
2025-07-22 Quality Text, Robust Vision: The Role of Language in Enhancing Visual Robustness of Vision-Language Models Isao Echizen Team 2507.16257 null  
2025-07-22 SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction Jiaqi Wang Team 2507.15852 null  
2025-07-21 Can Your Model Separate Yolks with a Water Bottle? Benchmarking Physical Commonsense Understanding in Video Generation Models Erkut Erdem Team 2507.15824 null  
2025-07-23 Visual-Language Model Knowledge Distillation Method for Image Quality Assessment Jiarun Song Team 2507.15680 null  
2025-07-21 Smart Eyes for Silent Threats: VLMs and In-Context Learning for THz Imaging Margret Keuper Team 2507.15576 null  
2025-07-21 HOLa: Zero-Shot HOI Detection with Low-Rank Decomposed VLM Feature Adaptation Robby T. Tan Team 2507.15542 null  
2025-07-21 Chart-R1: Chain-of-Thought Supervision and Reinforcement for Advanced Chart Reasoner Lin Ma Team 2507.15509 null  
2025-07-21 One Last Attention for Your Vision-Language Model Zhiqiang Shen Team 2507.15480 null  
2025-07-21 EgoPrune: Efficient Token Pruning for Egomotion Video Reasoning in Embodied Agent Xinlei Chen Team 2507.15428 null  
2025-07-21 In-context Learning of Vision Language Models for Detection of Physical and Digital Attacks against Face Recognition Systems Christoph Busch Team 2507.15285 null  
2025-07-21 VLM-UDMC: VLM-Enhanced Unified Decision-Making and Motion Control for Urban Autonomous Driving Tong Heng Lee Team 2507.15266 null  
2025-07-20 Survey of GenAI for Automotive Software Development: From Requirements to Executable Code Alois Knoll Team 2507.15025 null  
2025-07-20 Hierarchical Cross-modal Prompt Learning for Vision-Language Models Zhenhua Huang Team 2507.14976 null  
2025-07-20 FinChart-Bench: Benchmarking Financial Chart Comprehension in Vision-Language Models Mengnan Du Team 2507.14823 null  
2025-07-19 IRGPT: Understanding Real-world Infrared Image with Bi-cross-modal Curriculum on Large-scale Benchmark Ruiheng Zhang Team 2507.14449 null  
2025-07-18 CLIPTTA: Robust Contrastive Vision-Language Test-Time Adaptation Nicolas Thome Team 2507.14312 null  
2025-07-18 In-Depth and In-Breadth: Pre-training Multimodal Language Models Customized for Comprehensive Chart Understanding Leonid Sigal Team 2507.14298 null  
2025-07-18 VLA-Mark: A cross modal watermark for large vision-language alignment model Xuming Hu Team 2507.14067 null  
2025-07-18 EdgeVLA: Efficient Vision-Language-Action Models Benjamin Bolte Team 2507.14049 null  
2025-07-18 Moodifier: MLLM-Enhanced Emotion-Driven Image Editing Sharon X. Huang Team 2507.14024 null  
2025-07-18 When Seeing Overrides Knowing: Disentangling Knowledge Conflicts in Vision-Language Models Alberto Cazzaniga Team 2507.13868 null  
2025-07-18 Teaching Vision-Language Models to Ask: Resolving Ambiguity in Visual Questions Jiajun Zhang Team 2507.13773 null  
2025-07-17 LoRA-Loop: Closing the Synthetic Replay Cycle for Continual VLM Learning Margrit Betke Team 2507.13568 null  
2025-07-17 COREVQA: A Crowd Observation and Reasoning Entailment Visual Question Answering Benchmark Vasu Sharma Team 2507.13405 null  
2025-07-17 VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning Jiaya Jia Team 2507.13348 null  
2025-07-17 Leveraging Language Prior for Infrared Small Target Detection Pravendra Singh Team 2507.13113 null  
2025-07-17 GLAD: Generalizable Tuning for Vision-Language Models Shifeng Chen Team 2507.13089 null  
2025-07-17 Advancing Complex Wide-Area Scene Understanding with Hierarchical Coresets Selection Changwen Zheng Team 2507.13061 null  
2025-07-21 LaViPlan: Language-Guided Visual Path Planning with RLVR Hayeon Oh Team 2507.12911 null  
2025-07-17 City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning Xiaowen Chu Team 2507.12795 null  
2025-07-16 VLMgineer: Vision Language Models as Robotic Toolsmiths Dinesh Jayaraman Team 2507.12644 null  
2025-07-16 NLI4VolVis: Natural Language Interaction for Volume Visualization via LLM Multi-Agents and Editable 3D Gaussian Splatting Chaoli Wang Team 2507.12621 null  
2025-07-16 MindJourney: Test-Time Scaling with World Models for Spatial Reasoning Chuang Gan Team 2507.12508 null  
2025-07-16 ReAL-AD: Towards Human-Like Reasoning in End-to-End Autonomous Driving Xinge Zhu Team 2507.12499 null  
2025-07-15 Spatially Grounded Explanations in Vision Language Models for Document Visual Question Answering Dimosthenis Karatzas Team 2507.12490 null  
2025-07-20 PhysX-3D: Physical-Grounded 3D Asset Generation Ziwei Liu Team 2507.12465 null  
2025-07-16 Describe Anything Model for Visual Question Answering on Text-rich Images Min Xu Team 2507.12441 null  
2025-07-16 AutoVDC: Automated Vision Data Cleaning Using Vision-Language Models Sihao Ding Team 2507.12414 null  
2025-07-16 Generate to Ground: Multimodal Text Conditioning Boosts Phrase Grounding in Medical Vision-Language Models Bernhard Kainz Team 2507.12236 null  
2025-07-16 InstructFLIP: Exploring Unified Vision-Language Model for Face Anti-spoofing Wen-Huang Cheng Team 2507.12060 null  
2025-07-16 GS-Bias: Global-Spatial Bias Learner for Single-Image Test-Time Adaptation of Vision-Language Models Rongrong Ji Team 2507.11969 null  
2025-07-16 POLYCHARTQA: Benchmarking Large Vision-Language Models with Multilingual Chart Question Answering Qin Jin Team 2507.11939 null  
2025-07-15 Seeing the Signs: A Survey of Edge-Deployable OCR Models for Billboard Visibility Analysis Lihang Ying Team 2507.11730 null  
2025-07-18 How Far Have Medical Vision-Language Models Come? A Comprehensive Benchmarking Study Rossella Arcucci Team 2507.11200 null  
2025-07-15 Bridging the Gap in Vision Language Models in Identifying Unsafe Concepts Across Modalities Yang Zhang Team 2507.11155 null  
2025-07-15 Assessing Color Vision Test in Large Vision-language Models Hongyang Chen Team 2507.11153 null  
2025-07-15 MSA at ImageCLEF 2025 Multimodal Reasoning: Multilingual Multimodal Reasoning With Ensemble Vision Language Models Hamza Moustafa Team 2507.11114 null  
2025-07-15 Tactical Decision for Multi-UGV Confrontation with a Vision-Language Model-Based Commander Lei Chen Team 2507.11079 null  
2025-07-15 Bridge Feature Matching and Cross-Modal Alignment with Mutual-filtering for Zero-shot Anomaly Detection Guanzhong Tian Team 2507.11003 null  
2025-07-14 EmbRACE-3K: Embodied Reasoning and Action in Complex Environments Xiaojuan Qi Team 2507.10548 null  
2025-07-14 CoralVQA: A Large-Scale Visual Question Answering Dataset for Coral Reef Image Understanding Yi Wang Team 2507.10449 null  
2025-07-14 Beyond Graph Model: Reliable VLM Fine-Tuning via Random Graph Adapter Bin Luo Team 2507.10355 null  
2025-07-14 Synthesizing Near-Boundary OOD Samples for Out-of-Distribution Detection Wenqiang Zhang Team 2507.10225 null  
2025-07-14 BlueGlass: A Framework for Composite AI Safety Kay-Ulrich Scholl Team 2507.10106 null  
2025-07-14 Foundation Model Driven Robotics: A Comprehensive Review Ammar Waheed Team 2507.10087 null  
2025-07-14 LayLens: Improving Deepfake Understanding through Simplified Explanations Abhinav Dhall Team 2507.10066 null  
2025-07-14 CoSMo: A Multimodal Transformer for Page Stream Segmentation in Comic Books Dimosthenis Karatzas Team 2507.10053 null  
2025-07-14 Text-Driven Causal Representation Learning for Source-Free Domain Generalization Zhen Lei Team 2507.09961 null  
2025-07-13 NegRefine: Refining Negative Label-Based Zero-Shot OOD Detection Pulei Xiong Team 2507.09795 null  
2025-07-13 Towards Fine-Grained Adaptation of CLIP via a Self-Trained Alignment Score Muhammad Haris Khan Team 2507.09615 null  
2025-07-13 Advancing Reliable Test-Time Adaptation of Vision-Language Models under Visual Variations Guiguang Ding Team 2507.09500 null  
2025-07-13 GLIMPSE: Do Large Vision-Language Models Truly Think With Videos or Just Glimpse at Them? Huaxiu Yao Team 2507.09491 null  
2025-07-12 Uncertainty-Driven Expert Control: Enhancing the Reliability of Medical Vision-Language Models Tat-Seng Chua Team 2507.09209 null  
2025-07-12 MCA-LLaVA: Manhattan Causal Attention for Reducing Hallucination in Large Vision-Language Models Dahan Wang Team 2507.09184 null  
2025-07-12 OPENXRD: A Comprehensive Benchmark and Enhancement Framework for LLM/MLLM XRD Question Answering Niaz Abdolrahim Team 2507.09155 null  
2025-07-12 RadEyeVideo: Enhancing general-domain Large Vision Language Model for chest X-ray analysis with video representations of eye gaze Honghan Wu Team 2507.09097 null  
2025-07-11 BlindSight: Harnessing Sparsity for Efficient VLMs Steven K. Reinhardt Team 2507.09071 null  
2025-07-11 Beyond vividness: Content analysis of induced hallucinations reveals the hidden structure of individual differences in visual imagery Seana Coulson Team 2507.09011 null  
2025-07-11 VIP: Visual Information Protection through Adversarial Attacks on Vision-Language Models Olivier Déforges Team 2507.08982 null  
2025-07-11 ByDeWay: Boost Your multimodal LLM with DEpth prompting in a Training-Free Way Subarna Tripathi Team 2507.08679 null  
2025-07-11 Adaptive Framework for Ambient Intelligence in Rehabilitation Assistance András Lőrincz Team 2507.08624 null  
2025-07-11 Emergent Natural Language with Communication Games for Improving Image Captioning Capabilities without Additional Data Ambedkar Dukkipati Team 2507.08610 null  
2025-07-11 BayesTTA: Continual-Temporal Test-Time Adaptation for Vision-Language Models via Gaussian Discriminant Analysis Hui Xiong Team 2507.08607 null  
2025-07-11 Efficient Deployment of Vision-Language Models on Mobile Devices: A Case Study on OnePlus 13R Sanidhya Kashyap Team 2507.08505 null  
2025-07-11 LLaPa: A Vision-Language Model Framework for Counterfactual-Aware Procedural Planning Lei Fan Team 2507.08496 null  
2025-07-11 Multi-modal Mutual-Guidance Conditional Prompt Learning for Vision-Language Models Jianping Fan Team 2507.08410 null  
2025-07-11 Making VLMs More Robot-Friendly: Self-Critical Distillation of Low-Level Procedural Reasoning Yejin Choi Team 2507.08224 null  
2025-07-10 CLIP Won’t Learn Object-Attribute Binding from Natural Data and Here is Why Thomas Brox Team 2507.07985 null  
2025-07-10 Scaling RL to Long Videos Song Han Team 2507.07966 null  
2025-07-10 SAGE: A Visual Language Model for Anomaly Detection via Fact Enhancement and Entropy-aware Alignment Lei Fan Team 2507.07939 null  
2025-07-10 MoSE: Skill-by-Skill Mixture-of-Expert Learning for Autonomous Driving Chao Zhang Team 2507.07818 null  
2025-07-10 Energy-Guided Decoding for Object Hallucination Mitigation Christopher Zach Team 2507.07731 null  
2025-07-10 One Object, Multiple Lies: A Benchmark for Cross-task Adversarial Attack on Unified Vision-Language Models Cairong Zhao Team 2507.07709 null  
2025-07-10 Rationale-Enhanced Decoding for Multi-modal Chain-of-Thought Daiki Chijiwa Team 2507.07685 null  
2025-07-11 ViLU: Learning Vision-Language Uncertainties for Failure Prediction Nicolas Thome Team 2507.07620 null  
2025-07-10 LOSC: LiDAR Open-voc Segmentation Consolidator Renaud Marlet Team 2507.07605 null  
2025-07-10 The Synergy Dilemma of Long-CoT SFT and RL: Investigating Post-Training Techniques for Reasoning VLMs Qun Liu Team 2507.07562 null  
2025-07-10 ArchiveGPT: A human-centered evaluation of using a vision language model for image cataloguing Markus Huff Team 2507.07551 null  
2025-07-11 Entity Re-identification in Visual Storytelling via Contrastive Reinforcement Learning David Martins de Matos Team 2507.07340 null  
2025-07-09 ADIEE: Automatic Dataset Creation and Scorer for Instruction-Guided Image Editing Evaluation Suren Kumar Team 2507.07317 null  
2025-07-09 LangNavBench: Evaluation of Natural Language Understanding in Semantic Navigation Angel X. Chang Team 2507.07299 null  
2025-07-09 MagiC: Evaluating Multimodal Cognition Toward Grounded Visual Reasoning Dan Goldwasser Team 2507.07297 null  
2025-07-09 4KAgent: Agentic Any Image to 4K Super-Resolution Zhengzhong Tu Team 2507.07105 null  
2025-07-14 Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models Junfei Xiao Team 2507.07104 link  
2025-07-09 Evaluating Attribute Confusion in Fashion Text-to-Image Generation Davide Talon Team 2507.07079 null  
2025-07-09 Free on the Fly: Enhancing Flexibility in Test-Time Adaptation with Online EM Sibei Yang Team 2507.06973 null  
2025-07-09 CheXPO: Preference Optimization for Chest X-ray VLMs with Counterfactual Rationale Quan Wang Team 2507.06959 null  
2025-07-09 VisualTrap: A Stealthy Backdoor Attack on GUI Agents via Visual Grounding Manipulation Tat-Seng Chua Team 2507.06899 null  
2025-07-09 HVI-CIDNet+: Beyond Extreme Darkness for Low-Light Image Enhancement Yanning Zhang Team 2507.06814 null  
2025-07-09 Finetuning Vision-Language Models as OCR Systems for Low-Resource Languages: A Case Study of Manchu Donghyeok Choi Team 2507.06761 null  
2025-07-09 Text-promptable Object Counting via Quantity Awareness Enhancement Li Li Team 2507.06679 null  
2025-07-09 Cross-Modal Dual-Causal Learning for Long-Term Action Recognition Fan Chao Team 2507.06603 null  
2025-07-09 Bilateral Collaboration with Large Vision-Language Models for Open Vocabulary Human-Object Interaction Detection Xiangmin Xu Team 2507.06510 null  
2025-07-09 3D-Generalist: Self-Improving Vision-Language-Action Models for Crafting 3D Worlds Nick Haber Team 2507.06484 null  
2025-07-08 VisioPath: Vision-Language Enhanced Model Predictive Control for Safe Autonomous Navigation in Mixed Traffic Andreas A. Malikopoulos Team 2507.06441 null  
2025-07-08 CultureCLIP: Empowering CLIP with Cultural Awareness through Synthetic Images and Contextualized Captions Yi R. Fung Team 2507.06210 null  
2025-07-08 Enhancing Scientific Visual Question Answering through Multimodal Reasoning and Ensemble Modeling Naga Harshita Marupaka Team 2507.06183 null  
2025-07-10 Skywork-R1V3 Technical Report Yahui Zhou Team 2507.06167 null  
2025-07-08 LangMamba: A Language-driven Mamba Framework for Low-dose CT Denoising with Vision-language Models Hongming Shan Team 2507.06140 null  
2025-07-08 GeoMag: A Vision-Language Model for Pixel-level Fine-Grained Remote Sensing Image Parsing Hao Liu Team 2507.05887 null  
2025-07-08 Bridging Perception and Language: A Systematic Benchmark for LVLMs’ Understanding of Amodal Completion Reports Hitomi Yanaka Team 2507.05799 null  
2025-07-08 SPADE: Spatial-Aware Denoising Network for Open-vocabulary Panoptic Scene Graph Generation with Long- and Local-range Context Reasoning Tao He Team 2507.05798 null  
2025-07-08 A Satellite-Ground Synergistic Large Vision-Language Model System for Earth Observation Yue Gao Team 2507.05731 null  
2025-07-09 Integrated Structural Prompt Learning for Vision-Language Models Bin Luo Team 2507.05677 null  
2025-07-08 R-VLM: Region-Aware Vision Language Model for Precise GUI Grounding Shabnam Ghadar Team 2507.05673 null  
2025-07-08 Dynamic Rank Adaptation for Vision-Language Models Bin Luo Team 2507.05668 null  
2025-07-08 Structured Task Solving via Modular Embodied Intelligence: A Case Study on Rubik’s Cube Shenghai Yuan Team 2507.05607 null  
2025-07-08 Rethinking Layered Graphic Design Generation with a Top-Down Approach Qifeng Chen Team 2507.05601 null  
2025-07-08 PaddleOCR 3.0 Technical Report Yanjun Ma Team 2507.05595 null  
2025-07-07 Fine-Grained Vision-Language Modeling for Multimodal Training Assistants in Augmented Reality Junxiao Wang Team 2507.05515 null  
2025-07-07 Llama Nemoretriever Colembed: Top-Performing Text-Image Retrieval Model Even Oldridge Team 2507.05513 null  
2025-07-07 OpenWorldSAM: Extending SAM2 for Universal Image Segmentation with Language Prompts Priyadarshini Panda Team 2507.05427 null  
2025-07-07 pFedMMA: Personalized Federated Fine-Tuning with Multi-Modal Adapter for Vision-Language Models Ramtin Pedarsani Team 2507.05394 null  
2025-07-07 NavigScene: Bridging Local Perception and Global Navigation for Beyond-Visual-Range Autonomous Driving Cheng Lu Team 2507.05227 null  
2025-07-07 All in One: Visual-Description-Guided Unified Point Cloud Segmentation Rao Muhammad Anwer Team 2507.05211 null  
2025-07-07 Differential Attention for Multimodal Crisis Event Analysis Abdullah-Al-Zubaer Imran Team 2507.05165 null  
2025-07-07 INTER: Mitigating Hallucination in Large Vision-Language Models by Interaction Guidance Sampling Bo Zheng Team 2507.05056 null  
2025-07-07 Adaptation of Multi-modal Representation Models for Multi-task Surgical Computer Vision Nicolas Padoy Team 2507.05020 null  
2025-07-07 Training-free Generation of Temporally Consistent Rewards from VLMs Jian Tang Team 2507.04789 null  
2025-07-07 Vision-Language Models Can’t See the Obvious Sanath Narayan Team 2507.04741 null  
2025-07-07 An analysis of vision-language models for fabric retrieval Fabio Poiesi Team 2507.04735 null  
2025-07-07 A Visual Leap in CLIP Compositionality Reasoning through Generation of Counterfactual Sets Jie Zhou Team 2507.04699 null  
2025-07-07 MOSU: Autonomous Long-range Robot Navigation with Multi-modal Scene Understanding Dinesh Manocha Team 2507.04686 null  
2025-07-07 Identify, Isolate, and Purge: Mitigating Hallucinations in LVLMs via Self-Evolving Distillation Chang Xu Team 2507.04680 null  
2025-07-06 VLM-TDP: VLM-guided Trajectory-conditioned Diffusion Policy for Robust Long-Horizon Manipulation Lei Han Team 2507.04524 null  
2025-07-08 FA: Forced Prompt Learning of Vision-Language Models for Out-of-Distribution Detection Ruixuan Wang Team 2507.04511 null  
2025-07-06 MVL-Loc: Leveraging Vision-Language Model for Generalizable Multi-Scene Camera Relocalization Changhao Chen Team 2507.04509 null  
2025-07-06 Think Twice Before You Judge: Mixture of Dual Reasoning Experts for Multimodal Sarcasm Detection Sanasam Ranbir Singh Team 2507.04458 null  
2025-07-06 Multi-Modal Semantic Parsing for the Interpretation of Tombstone Inscriptions Johan Bos Team 2507.04377 null  
2025-07-05 LVLM-Composer’s Explicit Planning for Image Generation Amina Grant Team 2507.04152 null  
2025-07-05 Unlocking Compositional Control: Self-Supervision for LVLM-Based Image Generation Hunter Young Team 2507.04151 null  
2025-07-05 PresentAgent: Multimodal Agent for Presentation Video Generation Yang Zhao Team 2507.04036 null  
2025-07-05 A Comparative Study of Specialized LLMs as Dense Retrievers Jiafeng Guo Team 2507.03958 null  
2025-07-03 ArtGS: 3D Gaussian Splatting for Interactive Visual-Physical Modeling and Manipulation of Articulated Objects Cewu Lu Team 2507.02600 null  
2025-07-02 cVLA: Towards Efficient Camera-Space VLAs Thomas Brox Team 2507.02190 null  
2025-07-02 Large Language Models for Crash Detection in Video: A Survey of Methods, Datasets, and Challenges Anuj Sharma Team 2507.02074 null  
2025-07-01 Temporal Chain of Thought: Long-Video Understanding by Thinking in Frames Cordelia Schmid Team 2507.02001 null  
2025-07-02 How Do Vision-Language Models Process Conflicting Information Across Modalities? Ellie Pavlick Team 2507.01790 null  
2025-07-02 Facial Emotion Learning with Text-Guided Multiview Fusion via Vision-Language Model for 3D/4D Facial Expression Recognition Muzammil Behzad Team 2507.01673 null  
2025-07-02 MARVIS: Modality Adaptive Reasoning over VISualizations Chinmay Hegde Team 2507.01544 null  
2025-07-02 Following the Clues: Experiments on Person Re-ID using Cross-Modal Intelligence Martin Schramm Team 2507.01504 null  
2025-07-02 BioMARS: A Multi-Agent Robotic System for Autonomous Biological Experiments Mingzhai Sun Team 2507.01485 null  
2025-07-03 TriVLA: A Triple-System-Based Unified Vision-Language-Action Model for General Robot Control Yanwei Fu Team 2507.01424 null  
2025-07-02 CaptionSmiths: Flexibly Controlling Language Pattern in Image Captioning Yoshitaka Ushiku Team 2507.01409 null  
2025-07-02 Long-Tailed Distribution-Aware Router For Mixture-of-Experts in Large Vision-Language Model Xi Li Team 2507.01351 null  
2025-07-02 AIGVE-MACS: Unified Multi-Aspect Commenting and Scoring Model for AI-Generated Video Evaluation Jiawei Zhang Team 2507.01255 null  
2025-07-02 GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning Jie Tang Team 2507.01006 null  
2025-07-04 Robotic Manipulation by Imitating Generated Videos Without Physical Demonstrations Yunzhu Li Team 2507.00990 null  
2025-07-01 Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact Seyedali Mirjalili Team 2507.00951 null  
2025-07-01 The Age of Sensorial Zero Trust: Why We Can No Longer Trust Our Senses Fabio Correa Xavier Team 2507.00907 null  
2025-07-01 ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models Yaqi Xie Team 2507.00898 null  
2025-07-01 GaussianVLM: Scene-centric 3D Vision-Language Models using Language-aligned Gaussian Splats for Embodied Reasoning and Beyond Luc Van Gool Team 2507.00886 null  
2025-07-01 UPRE: Zero-Shot Domain Adaptation for Object Detection via Unified Prompt and Representation Enhancement Xiangxiang Chu Team 2507.00721 null  
2025-07-01 Contrasting Cognitive Styles in Vision-Language Models: Holistic Attention in Japanese Versus Analytical Focus in English Rajesh Sharma Team 2507.00700 null  
2025-07-01 Context-Aware Academic Emotion Dataset and Benchmark Wenwu Yang Team 2507.00586 null  
2025-07-01 Not All Attention Heads Are What You Need: Refining CLIP’s Image Representation with Attention Ablation Rong Xiao Team 2507.00537 null  
2025-07-01 Box-QAymo: Box-Referring VQA Dataset for Autonomous Driving Yadan Luo Team 2507.00525 null  
2025-06-30 EXPERT: An Explainable Image Captioning Evaluation Metric with Structured Explanations Sungzoon Cho Team 2506.24016 null  
2025-06-30 The Illusion of Progress? A Critical Look at Test-Time Adaptation for Vision-Language Models Tieniu Tan Team 2506.24000 null  
2025-06-30 GroundingDINO-US-SAM: Text-Prompted Multi-Organ Segmentation in Ultrasound with LoRA-Tuned Vision-Language Models Hassan Rivaz Team 2506.23903 null  
2025-06-30 A Closer Look at Conditional Prompt Tuning for Vision-Language Models Heng Tao Shen Team 2506.23856 null  
2025-06-30 Interpretable Zero-Shot Learning with Locally-Aligned Vision-Language Model Fahad Shahbaz Khan Team 2506.23822 null  
2025-06-30 Visual Textualization for Image Prompted Object Detection Yan Xu Team 2506.23785 null  
2025-06-30 PAC Bench: Do Foundation Models Understand Prerequisites for Executing Manipulation Policies? Ransalu Senanayake Team 2506.23725 null  
2025-06-30 On the Domain Robustness of Contrastive Vision-Language Models Erik Rodner Team 2506.23663 null  
2025-06-30 CAI: Caption-Sensitive Attention Intervention for Mitigating Object Hallucination in Large Vision-Language Models Bing Qin Team 2506.23590 null  
2025-06-30 A Clinically-Grounded Two-Stage Framework for Renal CT Report Generation Jie Xu Team 2506.23584 null  
2025-07-01 ZonUI-3B: A Lightweight Vision-Language Model for Cross-Resolution GUI Grounding ShengJing Yang Team 2506.23491 null  
2025-06-30 Sanitizing Manufacturing Dataset Labels Using Vision-Language Models Vinh Nguyen Team 2506.23465 null  
2025-06-29 GeoProg3D: Compositional Visual Reasoning for City-Scale 3D Language Fields Yutaka Matsuo Team 2506.23352 null  
2025-06-29 IR3D-Bench: Evaluating Vision-Language Model Scene Understanding as Agentic Inverse Rendering Brandon Y. Feng Team 2506.23329 null  
2025-07-01 SurgTPGS: Semantic 3D Surgical Scene Understanding with Text Promptable Gaussian Splatting Hongliang Ren Team 2506.23309 null  
2025-06-29 Decoding Memes: Benchmarking Narrative Role Classification across Multilingual and Multimodal Models Tanmoy Chakraborty Team 2506.23122 null  
2025-06-29 MoCa: Modality-aware Continual Pre-training Makes Better Bidirectional Multimodal Embeddings Zhicheng Dou Team 2506.23115 null  
2025-06-29 Empowering Small VLMs to Think with Dynamic Memorization and Exploration Long Chen Team 2506.23061 null  
2025-06-29 SoMi-ToM: Evaluating Multi-Perspective Theory of Mind in Embodied Social Interactions Maarten Sap Team 2506.23046 null  
2025-06-28 Revisiting CroPA: A Reproducibility Study and Enhancements for Cross-Prompt Adversarial Transferability in Vision-Language Models Swadesh Swain Team 2506.22982 null  
2025-06-27 MiCo: Multi-image Contrast for Reinforcement Visual Reasoning Hengshuang Zhao Team 2506.22434 null  
2025-06-27 Test-Time Consistency in Vision Language Models Leonid Sigal Team 2506.22395 null  
2025-06-27 Exploiting Vision Language Model for Training-Free 3D Point Cloud OOD Detection via Graph Score Propagation Xun Xu Team 2506.22375 null  
2025-06-27 Rethinking Visual Token Reduction in LVLMs under Cross-modal Misalignment Bo Du Team 2506.22283 null  
2025-06-27 COOCO – Common Objects Out-of-Context – Semantic Violation in Scenes: Investigating Multimodal Context in Referential Communication Albert Gatt Team 2506.22274 null  
2025-06-27 Visual Structures Helps Visual Reasoning: Addressing the Binding Problem in VLMs Mahdieh Soleymani Baghshah Team 2506.22146 null  
2025-06-27 Universal Retrieval for Multimodal Trajectory Modeling Dehan Kong Team 2506.22056 null  
2025-06-27 Partial CLIP is Enough: Chimera-Seg for Zero-shot Semantic Segmentation Daisuke Deguchi Team 2506.22032 null  
2025-06-27 SODA: Out-of-Distribution Detection in Domain-Shifted Point Clouds via Neighborhood Propagation Xulei Yang Team 2506.21892 null  
2025-06-27 Integrating Multi-Modal Sensors: A Review of Fusion Techniques for Intelligent Vehicles Matthew J. Barth Team 2506.21885 null  
2025-06-27 Do Vision-Language Models Have Internal World Models? Towards an Atomic Evaluation Zhiting Hu Team 2506.21876 null  
2025-06-27 On the Feasibility of Poisoning Text-to-Image AI Models via Adversarial Mislabeling Ben Y. Zhao Team 2506.21874 null  
2025-06-27 Remote Sensing Large Vision-Language Model: Semantic-augmented Multi-level Alignment and Semantic-aware Expert Modeling Yong Man Ro Team 2506.21863 null  
2025-06-27 Embodied Domain Adaptation for Object Detection Feras Dayoub Team 2506.21860 null  
2025-06-27 The Cost of Avoiding Backpropagation Hui Guan Team 2506.21833 null  
2025-06-26 ViStruct: Simulating Expert-Like Reasoning Through Task Decomposition and Visual Attention Cues Carolina Nobre Team 2506.21762 null  
2025-06-26 Fine-Grained Preference Optimization Improves Spatial Reasoning in VLMs Ismini Lourentzou Team 2506.21656 null  
2025-06-26 Mitigating Hallucination of Large Vision-Language Models via Dynamic Logits Calibration Jian Wu Team 2506.21509 null  
2025-06-26 Global and Local Entailment Learning for Natural World Imagery Nathan Jacobs Team 2506.21476 null  
2025-06-26 Spatial Mental Modeling from Limited Views Li Fei-Fei Team 2506.21458 null  
2025-06-27 ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models Ziwei Liu Team 2506.21356 null  
2025-06-26 LLaVA-Pose: Enhancing Human Pose and Action Understanding via Keypoint-Integrated Instruction Tuning Hayaru Shouno Team 2506.21317 null  
2025-06-26 DrishtiKon: Multi-Granular Visual Grounding for Text-Rich Document Images Ganesh Ramakrishnan Team 2506.21316 null  
2025-06-26 World-aware Planning Narratives Enhance Large Vision-Language Model Planner Xipeng Qiu Team 2506.21230 null  
2025-06-26 Personalized Federated Learning via Dual-Prompt Optimization and Cross Fusion Jian Liang Team 2506.21144 null  
2025-06-26 V2X-REALM: Vision-Language Model-Based Robust End-to-End Cooperative Autonomous Driving with Adaptive Long-Tail Modeling Bin Ran Team 2506.21041 null  
2025-06-26 Multimodal Prompt Alignment for Facial Expression Recognition Shutao Li Team 2506.21017 null  
2025-06-26 Style-Aligned Image Composition for Robust Detection of Abnormal Cells in Cytopathology S Kevin Zhou Team 2506.21001 null  
2025-06-26 TSDASeg: A Two-Stage Model with Direct Alignment for Interactive Point Cloud Segmentation Yihong Wu Team 2506.20991 null  
2025-06-26 SharpZO: Hybrid Sharpness-Aware Vision Language Model Prompt Tuning via Forward-Only Passes Zheng Zhang Team 2506.20990 null  
2025-06-26 Parallels Between VLA Model Post-Training and Human Motor Learning: Progress, Challenges, and Trends Zeng-Guang Hou Team 2506.20966 null  
2025-06-26 E-FreeM2: Efficient Training-Free Multi-Scale and Cross-Modal News Verification via MLLMs Minh-Son Dao Team 2506.20944 null  
2025-06-25 Leveraging Vision-Language Models to Select Trustworthy Super-Resolution Samples Generated by Diffusion Models Zafer Dogan Team 2506.20832 null  
2025-06-25 How do Foundation Models Compare to Skeleton-Based Approaches for Gesture Recognition in Human-Robot Interaction? Bastian Leibe Team 2506.20795 null  
2025-06-27 Shape2Animal: Creative Animal Generation from Natural Silhouettes Trung-Nghia Le Team 2506.20616 null  
2025-06-25 HRIBench: Benchmarking Vision-Language Models for Real-Time Human Perception in Human-Robot Interaction Maja Matarić Team 2506.20566 null  
2025-06-25 Med-Art: Diffusion Transformer for 2D Medical Text-to-Image Generation Morten Rieger Hannemose Team 2506.20449 null  
2025-06-25 CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition Michael Gienger Team 2506.20373 null  
2025-06-25 Mobile-R1: Towards Interactive Reinforcement Learning for VLM-Based Mobile Agent via Task-Level Rewards Bo Zheng Team 2506.20332 null  
2025-06-25 MIRAGE: A Benchmark for Multimodal Information-Seeking and Reasoning in Agricultural Expert-Guided Conversations Vikram S. Adve Team 2506.20100 null  
2025-06-24 Unified Vision-Language-Action Model Zhaoxiang Zhang Team 2506.19850 null  
2025-06-24 Evaluating Compliance with Visualization Guidelines in Diagrams for Scientific Publications Using Large Vision Language Models Christoph M. Friedrich Team 2506.19825 null  
2025-06-24 CronusVLA: Transferring Latent Motion Across Time for Multi-Frame Prediction in Manipulation Jiangmiao Pang Team 2506.19816 null  
2025-06-24 UltraAD: Fine-Grained Ultrasound Anomaly Classification via Few-Shot CLIP Adaptation Zhongliang Jiang Team 2506.19694 null  
2025-06-24 PEVLM: Parallel Encoding for Vision-Language Models Yong Wu Team 2506.19651 null  
2025-06-24 V2T-CoT: From Vision to Text Chain-of-Thought for Medical Reasoning and Diagnosis Zuozhu Liu Team 2506.19610 null  
2025-06-24 ChordPrompt: Orchestrating Cross-Modal Prompt Synergy for Multi-Domain Incremental Learning in CLIP Bokui Chen Team 2506.19608 null  
2025-06-24 Fake or Real, Can Robots Tell? Evaluating Embodied Vision-Language Models on Real and 3D-Printed Objects Angelo Cangelosi Team 2506.19579 null  
2025-06-24 Visual hallucination detection in large vision-language models via evidential conflict Liping Jing Team 2506.19513 null  
2025-06-24 T-Rex: Task-Adaptive Spatial Representation Extraction for Robotic Manipulation with Vision-Language Models Qingyao Wu Team 2506.19498 null  
2025-06-24 Emergence of Text Readability in Vision Language Models Bohyung Han Team 2506.19389 null  
2025-06-24 Robotic Perception with a Large Tactile-Vision-Language Model for Physical Property Inference Nutan Chen Team 2506.19303 null  
2025-06-24 Open-Vocabulary Camouflaged Object Segmentation with Cascaded Vision Language Models Dan Zeng Team 2506.19300 null  
2025-06-24 Da Yu: Towards USV-Based Image Captioning for Waterway Surveillance and Scene Understanding Hui Xiong Team 2506.19288 null  
2025-06-24 MSR-Align: Policy-Grounded Multimodal Alignment for Safety-Aware Reasoning in Vision-Language Models Bo Zheng Team 2506.19257 null  
2025-06-24 Scaffolding Dexterous Manipulation with Vision-Language Models Dorsa Sadigh Team 2506.19212 null  
2025-06-23 Reading Smiles: Proxy Bias in Foundation Models for Facial Emotion Recognition Bjoern W. Schuller Team 2506.19079 null  
2025-06-23 HAWAII: Hierarchical Visual Knowledge Transfer for Efficient Vision-Language Models Krzysztof Czarnecki Team 2506.19072 null  
2025-06-23 GLIMPSE: Gradient-Layer Importance Mapping for Prompted Visual Saliency Explanation for Generative LVLMs Guanxi Shen Team 2506.18985 null  
2025-06-23 VQ-Insight: Teaching VLMs for AI-Generated Video Quality Understanding via Progressive Visual Reinforcement Learning Jian Zhang Team 2506.18564 null  
2025-06-23 Generalizing Vision-Language Models to Novel Domains: A Comprehensive Survey Heng Tao Shen Team 2506.18504 null  
2025-06-23 InternSpatial: A Comprehensive Dataset for Spatial Reasoning in Vision-Language Models Wenhai Wang Team 2506.18385 null  
2025-06-23 Taming Vision-Language Models for Medical Image Analysis: A Comprehensive Review Jing Qin Team 2506.18378 null  
2025-06-23 Escaping the SpuriVerse: Can Large Vision-Language Models Generalize Beyond Seen Spurious Correlations? Bill Howe Team 2506.18322 null  
2025-06-24 Referring Expression Instance Retrieval and A Strong End-to-End Baseline Jinqiao Wang Team 2506.18246 null  
2025-06-23 Drive-R1: Bridging Reasoning and Planning in VLMs for Autonomous Driving with Reinforcement Learning Xinhai Zhao Team 2506.18234 null  
2025-06-22 See-in-Pairs: Reference Image-Guided Comparative Vision-Language Models for Medical Diagnosis Xiaoxiao Li Team 2506.18140 null  
2025-06-22 CLGRPO: Reasoning Ability Enhancement for Small VLMs Zhiwang Zhang Team 2506.18048 null  
2025-06-22 Adapting Vision-Language Models for Evaluating World Models Sarah Parisot Team 2506.17967 null  
2025-06-21 RoboMonkey: Scaling Test-Time Sampling and Verification for Vision-Language-Action Models Marco Pavone Team 2506.17811 null  
2025-06-21 MDSAM: Memory-Driven Sparse Attention Matrix for LVLMs Hallucination Mitigation Xiaochuan Shi Team 2506.17664 null  
2025-06-21 Histopathology Image Report Generation by Vision Language Model with Multimodal In-Context Learning Yu-Chiang Frank Wang Team 2506.17645 null  
2025-06-21 CLiViS: Unleashing Cognitive Map through Linguistic-Visual Synergy for Embodied Visual Reasoning Xiaoling Wang Team 2506.17629 null  
2025-06-21 DRAMA-X: A Fine-grained Intent Prediction and Risk Reasoning Benchmark For Driving Zhengzhong Tu Team 2506.17590 null  
2025-06-21 HalluRNN: Mitigating Hallucinations via Recurrent Cross-Layer Reasoning in Large Vision-Language Models Tao He Team 2506.17587 null  
2025-06-20 Trustworthy Few-Shot Transfer of Medical VLMs through Split Conformal Prediction Jose Dolz Team 2506.17503 null  
2025-06-20 Few-Shot, Now for Real: Medical VLMs Adaptation without Balanced Sets or Validation Ismail Ben Ayed Team 2506.17500 null  
2025-06-20 General-Purpose Robotic Navigation via LVLM-Orchestrated Perception, Reasoning, and Acting Georgios Georgakis Team 2506.17462 null  
2025-06-20 Aha Moment Revisited: Are VLMs Truly Capable of Self Verification in Inference-time Scaling? Klara Nahrstedt Team 2506.17417 null  
2025-06-20 VLN-R1: Vision-Language Navigation via Reinforcement Fine-Tuning Hengshuang Zhao Team 2506.17221 null  
2025-06-20 Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens Chuang Gan Team 2506.17218 null  
2025-06-20 Do We Need Large VLMs for Spotting Soccer Actions? Sandeep Chaurasia Team 2506.17144 null  
2025-06-20 Prmpt2Adpt: Prompt-Based Zero-Shot Domain Adaptation for Resource-Constrained Environments Nathaniel D. Bastian Team 2506.16994 null  
2025-06-20 FOCUS: Unified Vision-Language Modeling for Interactive Editing Driven by Referential Segmentation Jinqiao Wang Team 2506.16806 null  
2025-06-20 Co-VisiON: Co-Visibility ReasONing on Sparse Image Sets of Indoor Scenes Chen Feng Team 2506.16805 null  
2025-06-20 Cross-Modal Obfuscation for Jailbreak Attacks on Large Vision-Language Models Xiaohua Xu Team 2506.16760 null  
2025-06-20 TeSG: Textual Semantic Guidance for Infrared and Visible Image Fusion Xinbo Gao Team 2506.16730 null  
2025-06-20 V-CASS: Vision-context-aware Expressive Speech Synthesis for Enhancing User Understanding of Videos Xiaoyu Qin Team 2506.16716 null  
2025-06-20 VLM-Empowered Multi-Mode System for Efficient and Safe Planetary Navigation Liang Ding Team 2506.16703 null  
2025-06-20 LaVi: Efficient Large Vision-Language Models via Internal Feature Modulation Jing Liu Team 2506.16691 null  
2025-06-19 CodeDiffuser: Attention-Enhanced Diffusion Policy via VLM-Generated Code for Instruction Ambiguity Yunzhu Li Team 2506.16652 null  
2025-06-19 History-Augmented Vision-Language Models for Frontier-Based Zero-Shot Object Navigation Fatemeh Afghah Team 2506.16623 null  
2025-06-19 GoalLadder: Incremental Goal Discovery with Vision-Language Models Shimon Whiteson Team 2506.16396 null  
2025-06-19 CLIP-MG: Guiding Semantic Attention with Skeletal Pose Features and RGB Data for Micro-Gesture Recognition on the iMiGUE Dataset Amith Adiraju Team 2506.16385 null  
2025-06-19 FOCoOp: Enhancing Out-of-Distribution Robustness in Federated Prompt Learning for Vision-Language Models Tat-Seng Chua Team 2506.16218 null  
2025-06-19 AutoV: Learning to Retrieve Visual Prompt for Large Vision-Language Models Shanghang Zhang Team 2506.16112 null  
2025-06-19 Stepping Out of Similar Semantic Space for Open-Vocabulary Segmentation Yansong Tang Team 2506.16058 null  
2025-06-19 DualTHOR: A Dual-Arm Humanoid Simulation Platform for Contingency-Aware Planning Zongqing Lu Team 2506.16012 null  
2025-06-18 VectorEdits: A Dataset and Benchmark for Instruction-Based Editing of Vector Graphics Michal Štefánik Team 2506.15903 null  
2025-06-18 GenRecal: Generation after Recalibration from Large to Small Vision-Language Models Yueh-Hua Wu Team 2506.15681 null  
2025-06-18 Dual-Stage Value-Guided Inference with Margin-Based Reward Adjustment for Fast and Faithful VLM Captioning Imran Razzak Team 2506.15649 null  
2025-06-18 FindingDory: A Benchmark to Evaluate Memory in Embodied Agents Zsolt Kira Team 2506.15635 null  
2025-06-18 WikiMixQA: A Multimodal Benchmark for Question Answering over Tables and Charts Rémi Lebret Team 2506.15594 link  
2025-06-18 DiscoSG: Towards Discourse-Level Text Scene Graph Parsing through Iterative Graph Refinement Zhuang Li Team 2506.15583 link  
2025-06-18 Context-Informed Grounding Supervision Minjoon Seo Team 2506.15480 link  
2025-06-19 OpenPath: Open-Set Active Learning for Pathology Image Classification via Pre-trained Vision-Language Models Guotai Wang Team 2506.15318 null  
2025-06-18 MEGC2025: Micro-Expression Grand Challenge on Spot Then Recognize and Visual Question Answering Adrian K. Davison Team 2506.15298 null  
2025-06-18 ReSeDis: A Dataset for Referring-based Object Search across Large-Scale Image Collections Shin’ichi Satoh Team 2506.15180 null  
2025-06-18 DyNaVLM: Zero-Shot Vision-Language Navigation System with Dynamic Viewpoints and Self-Refining Graph Memory Yue Gao Team 2506.15096 null  
2025-06-18 An Empirical Study of Bugs in Data Visualization Libraries Chengnian Sun Team 2506.15084 link  
2025-06-17 PeRL: Permutation-Enhanced Reinforcement Learning for Interleaved Vision-Language Reasoning Yeyun Gong Team 2506.14907 link  
2025-06-17 RobotSmith: Generative Robotic Tool Design for Acquisition of Complex Manipulation Skills Chuang Gan Team 2506.14763 null  
2025-06-17 Casper: Inferring Diverse Intents for Assistive Teleoperation with Vision Language Models Yuke Zhu Team 2506.14727 null  
2025-06-17 AGENTSAFE: Benchmarking the Safety of Embodied Agents on Hazardous Instructions Dacheng Tao Team 2506.14697 null  
2025-06-17 Recognition through Reasoning: Reinforcing Image Geo-localization with Large Vision-Language Models Jiaheng Wei Team 2506.14674 null  
2025-06-17 StreetLens: Enabling Human-Centered AI Agents for Neighborhood Assessment from Street View Imagery Michelle Pasco Team 2506.14670 null  
2025-06-17 SIRI-Bench: Challenging VLMs’ Spatial Intelligence through Complex Reasoning Tasks Liang Lin Team 2506.14512 null  
2025-06-17 Can Pretrained Vision-Language Embeddings Alone Guide Robot Navigation? Soumik Sarkar Team 2506.14507 link  
2025-06-17 Adapting Lightweight Vision Language Models for Radiological Visual Question Answering Chang Sun Team 2506.14451 null  
2025-06-17 Causally Steered Diffusion for Automated Video Counterfactual Generation Sotirios A. Tsaftaris Team 2506.14404 null  
2025-06-17 Narrate2Nav: Real-Time Visual Navigation with Implicit Language Reasoning in Human-Centric Environments Xuesu Xiao Team 2506.14233 null  
2025-06-17 Interpreting Biomedical VLMs on High-Imbalance Out-of-Distributions: An Insight into BiomedCLIP on Radiology Benjamin Kwan Team 2506.14136 null  
2025-06-17 A Hierarchical Test Platform for Vision Language Model (VLM)-Integrated Real-World Autonomous Driving Ziran Wang Team 2506.14100 null  
2025-06-16 Disentangling 3D from Large Vision-Language Models for Controlled Portrait Generation Hyeongwoo Kim Team 2506.14015 null  
2025-06-16 GRaD-Nav++: Vision-Language Model Enabled Visual Drone Navigation with Gaussian Radiance Fields and Differentiable Dynamics Mac Schwager Team 2506.14009 null  
2025-06-16 Comparison of ConvNeXt and Vision-Language Models for Breast Density Assessment in Screening Mammography Alejandro Santos-Díaz Team 2506.13964 null  
2025-06-16 HierVL: Semi-Supervised Segmentation leveraging Hierarchical Vision-Language Synergy with Dynamic Text-Spatial Query Alignment Abdul Bais Team 2506.13925 null  
2025-06-16 Touch begins where vision ends: Generalizable policies for contact-rich manipulation Raunaq Bhirangi Team 2506.13762 null  
2025-06-16 Prompting with the Future: Open-World Model Predictive Control with Interactive Digital Twins Wei-Chiu Ma Team 2506.13761 null  
2025-06-16 OTFusion: Bridging Vision-only and Vision-Language Models via Optimal Transport for Transductive Zero-Shot Learning Yonghang Tai Team 2506.13723 null  
2025-06-16 ROSA: Harnessing Robot States for Vision-Language and Action Alignment Xiaoyan Sun Team 2506.13679 null  
2025-06-16 DualEdit: Dual Editing for Knowledge Updating in Vision-Language Models Hanspeter Pfister Team 2506.13638 null  
2025-06-16 VLM-SFD: VLM-Assisted Siamese Flow Diffusion Framework for Dual-Arm Cooperative Manipulation Wei Pan Team 2506.13428 null  
2025-06-16 Uncertainty-Informed Active Perception for Open Vocabulary Object Goal Navigation Marija Popović Team 2506.13367 null  
2025-06-16 Anomaly Object Segmentation with Vision-Language Models for Steel Scrap Recycling Rei Kawakami Team 2506.13282 null  
2025-06-16 Screen Hijack: Visual Poisoning of VLM Agents in Mobile Environments Ee-Chien Chang Team 2506.13205 null  
2025-06-16 Dynamic Context-oriented Decomposition for Task-aware Low-rank Adaptation with Less Forgetting and Faster Convergence Bernard Ghanem Team 2506.13187 null  
2025-06-16 GreedyPrune: Retenting Critical Visual Token Set for Large Vision Language Models Jun Wang Team 2506.13166 null  
2025-06-16 Rethinking Test-Time Scaling for Medical AI: Model and Task-Aware Strategies for LLMs and VLMs Byung-Hoon Kim Team 2506.13102 null  
2025-06-16 PRISM2: Unlocking Multi-Modal General Pathology AI with Clinical Dialogue Siqi Liu Team 2506.13063 null  
2025-06-17 HKD4VLM: A Progressive Hybrid Knowledge Distillation Framework for Robust Multimodal Hallucination and Factuality Detection in VLMs Xuezhi Cao Team 2506.13038 null  
2025-06-15 CAPO: Reinforcing Consistent Reasoning in Medical Decision-Making Zuozhu Liu Team 2506.12849 null  
2025-06-15 Enhancing Rating-Based Reinforcement Learning to Effectively Leverage Feedback from Large Vision-Language Models Chang D. Yoo Team 2506.12822 null  
2025-06-15 Native Visual Understanding: Resolving Resolution Dilemmas in Vision-Language Models Wentao Zhang Team 2506.12776 null  
2025-06-15 NAP-Tuning: Neural Augmented Prompt Tuning for Adversarially Robust Vision-Language Models Jitao Sang Team 2506.12706 null  
2025-06-15 Evaluating Cell Type Inference in Vision Language Models Under Varying Visual Context Sandeep Singhal Team 2506.12683 null  
2025-06-14 Not All Tokens and Heads Are Equally Important: Dual-Level Attention Intervention for Hallucination Mitigation Yuexian Zou Team 2506.12609 null  
2025-06-13 Affogato: Learning Open-Vocabulary Affordance Grounding with Automated Data Generation at Scale Minsu Cho Team 2506.12009 null  
2025-06-13 How Visual Representations Map to Language Feature Space in Multimodal LLMs Neel Nanda Team 2506.11976 null  
2025-06-13 Rethinking Multilingual Vision-Language Translation: Dataset, Evaluation, and Adaptation Kaifu Zhang Team 2506.11820 null  
2025-06-13 MTabVQA: Evaluating Multi-Tabular Reasoning of Language Models in Visual Space Jan Strich Team 2506.11684 null  
2025-06-13 VLM@school – Evaluation of AI image understanding on German middle school knowledge Vincent Tischler Team 2506.11604 null  
2025-06-16 EasyARC: Evaluating Vision Language Models on True Visual Reasoning Aylin Akkus Team 2506.11595 null  
2025-06-13 Foundation Models in Autonomous Driving: A Survey on Scenario Generation and Scenario Analysis Johannes Betz Team 2506.11526 null  
2025-06-13 Manager: Aggregating Insights from Unimodal Experts in Two-Tower VLMs and MLLMs Min-Yen Kan Team 2506.11515 null  
2025-06-13 Taming Stable Diffusion for Computed Tomography Blind Super-Resolution Lichao Mou Team 2506.11496 null  
2025-06-13 On the Natural Robustness of Vision-Language Models Against Visual Perception Attacks in Autonomous Driving Mert D. Pesé Team 2506.11472 null  
2025-06-12 Poutine: Vision-Language-Trajectory Pre-Training and Reinforcement Learning Post-Training Enable Robust End-to-End Autonomous Driving Liam Paull Team 2506.11234 null  
2025-06-12 AIR: Zero-shot Generative Model Adaptation with Iterative Refinement Ngai-Man Cheung Team 2506.10895 link  
2025-06-13 RationalVLA: A Rational Vision-Language-Action Model with Dual System Haoang Li Team 2506.10826 null  
2025-06-12 Grounded Vision-Language Navigation for UAVs with Open-Vocabulary Goal Understanding Mir Feroskhan Team 2506.10756 null  
2025-06-13 IQE-CLIP: Instance-aware Query Embedding for Zero-/Few-shot Anomaly Detection in Medical Domain Yefeng Zheng Team 2506.10730 link  
2025-06-12 GigaVideo-1: Advancing Video Generation via Automatic Feedback with 4 GPU-Hours Fine-Tuning Guan Huang Team 2506.10639 null  
2025-06-12 Text to Image for Multi-Label Image Recognition with Joint Prompt-Adapter Learning Yong Liu Team 2506.10575 null  
2025-06-12 LLMs Are Not Yet Ready for Deepfake Image Detection Kristen Moore Team 2506.10474 null  
2025-06-12 UrbanSense: A Framework for Quantitative Analysis of Urban Streetscapes leveraging Vision Large Language Models Shuai Lu Team 2506.10342 null  
2025-06-12 Using Vision Language Models to Detect Students’ Academic Emotion through Facial Expressions Gaowei Chen Team 2506.10334 null  
2025-06-12 HalLoc: Token-level Localization of Hallucinations for Vision Language Models Gunhee Kim Team 2506.10286 null  
2025-06-11 Q2E: Query-to-Event Decomposition for Zero-Shot Multilingual Text-to-Video Retrieval Francis Ferraro Team 2506.10202 null  
2025-06-11 Improving Personalized Search with Regularized Low-Rank Parameter Updates Bryan Russell Team 2506.10182 null  
2025-06-11 A Navigation Framework Utilizing Vision-Language Models Kaiyu Tang Team 2506.10172 null  
2025-06-11 One Patient, Many Contexts: Scaling Medical AI Through Contextual Intelligence Marinka Zitnik Team 2506.10157 null  
2025-06-11 ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs Lijuan Wang Team 2506.10128 null  
2025-06-11 Test-Time Adaptation for Generalizable Task Progress Estimation Alessandra Russo Team 2506.10085 null  
2025-06-11 Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing Tieniu Tan Team 2506.09965 link  
2025-06-11 From Intention to Execution: Probing the Generalization Boundaries of Vision-Language-Action Models Chen Feng Team 2506.09930 null  
2025-06-11 3D-Aware Vision-Language Models Fine-Tuning with Geometric Distillation Hyunjung Shim Team 2506.09883 link  
2025-06-11 Adding simple structure at inference improves Vision-Language Compositionality Gorka Azkune Team 2506.09691 link  
2025-06-11 FedVLMBench: Benchmarking Federated Fine-Tuning of Vision-Language Models Liangqiong Qu Team 2506.09638 null  
2025-06-11 Revisit What You See: Disclose Language Prior in Vision Tokens for Efficient Guided Decoding of LVLMs Jaehyung Kim Team 2506.09522 link  
2025-06-11 Provoking Multi-modal Few-Shot LVLM via Exploration-Exploitation In-Context Learning Jia Li Team 2506.09473 null  
2025-06-11 TOGA: Temporally Grounded Open-Ended Video QA with Weak Supervision Susmit Jha Team 2506.09445 null  
2025-06-11 DAVSP: Safety Alignment for Large Vision-Language Models via Deep Aligned Visual Safety Prompt Ge Li Team 2506.09353 null  
2025-06-10 UAD: Unsupervised Affordance Distillation for Generalization in Robotic Manipulation Li Fei-Fei Team 2506.09284 null  
2025-06-10 MultiNet: An Open-Source Software Toolkit & Benchmark Suite for the Evaluation and Adaptation of Multimodal Action Models Harshvardhan Sikka Team 2506.09172 null  
2025-06-10 VIKI-R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning Zhenfei Yin Team 2506.09049 null  
2025-06-11 Same Task, Different Circuits: Disentangling Modality-Specific Mechanisms in VLMs Yonatan Belinkov Team 2506.09047 null  
2025-06-10 Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better Jiaqi Wang Team 2506.09040 null  
2025-06-10 Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models Liansheng Wang Team 2506.08990 null  
2025-06-10 Socratic-MCTS: Test-Time Visual Reasoning by Asking the Right Questions Yejin Choi Team 2506.08927 null  
2025-06-12 Video-CoT: A Comprehensive Dataset for Spatiotemporal Understanding of Videos Based on Chain-of-Thought Shanghang Zhang Team 2506.08817 null  
2025-06-10 Multimodal Representation Alignment for Cross-modal Information Retrieval Luis A. Leiva Team 2506.08774 null  
2025-06-10 PhyBlock: A Progressive Benchmark for Physical Understanding and Planning via 3D Block Assembly Xiaodan Liang Team 2506.08708 null  
2025-06-10 VReST: Enhancing Reasoning in Large Vision-Language Models through Tree Search and Self-Reward Mechanism Weijiang Yu Team 2506.08691 null  
2025-06-10 ATAS: Any-to-Any Self-Distillation for Enhanced Open-Vocabulary Dense Prediction Taesup Kim Team 2506.08678 null  
2025-06-10 Convergence of Spectral Principal Paths: How Deep Networks Distill Linear Representations from Noisy Inputs Ang Li Team 2506.08543 null  
2025-06-10 Better Reasoning with Less Data: Enhancing VLMs Through Unified Modality Scoring Jiaheng Wei Team 2506.08429 null  
2025-06-11 SafeCoT: Improving VLM Safety with Minimal Reasoning Chaochao Lu Team 2506.08399 null  
2025-06-10 SECOND: Mitigating Perceptual Hallucination in Vision-Language Models via Selective and Contrastive Decoding Jaeyoung Do Team 2506.08391 null  
2025-06-09 A Good CREPE needs more than just Sugar: Investigating Biases in Compositional Vision-Language Benchmarks Matthias Bethge Team 2506.08227 null  
2025-06-11 GIQ: Benchmarking 3D Geometric Reasoning of Vision Foundation Models with Simulated and Real Polyhedra Guha Balakrishnan Team 2506.08194 null  
2025-06-09 Open World Scene Graph Generation using Vision Language Models Anuj Karpatne Team 2506.08189 null  
2025-06-09 CuRe: Cultural Gaps in the Long Tail of Text-to-Image Systems Ramya Korlakai Vinayak Team 2506.08071 null  
2025-06-10 Vision Transformers Don’t Need Trained Registers Yossi Gandelsman Team 2506.08010 null  
2025-06-09 Hidden in plain sight: VLMs overlook their visual representations Trevor Darrell Team 2506.08008 null  
2025-06-09 BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models Tieniu Tan Team 2506.07961 null  
2025-06-09 Decoupling the Image Perception and Multimodal Reasoning for Reasoning Segmentation with Digital Twin Representations Yiqing Shen Team 2506.07943 null  
2025-06-09 Mimicking or Reasoning: Rethinking Multi-Modal In-Context Learning in Vision-Language Models Zsolt Kira Team 2506.07936 null  
2025-06-09 SAM2Auto: Auto Annotation Using FLASH Q. M. Jonathan Wu Team 2506.07850 null  
2025-06-09 Image Reconstruction as a Tool for Feature Analysis Andrey Kuznetsov Team 2506.07803 null  
2025-06-09 Re-ranking Reasoning Context with Tree Search Makes Large Vision-Language Models Stronger Shiming Xiang Team 2506.07785 null  
2025-06-09 Language-Vision Planner and Executor for Text-to-Visual Reasoning Ling Liu Team 2506.07778 null  
2025-06-10 ArchiLense: A Framework for Quantitative Analysis of Architectural Styles Based on Vision Large Language Models Shuai Lu Team 2506.07739 null  
2025-06-09 OpenSplat3D: Open-Vocabulary 3D Instance Segmentation using Gaussian Splatting Bastian Leibe Team 2506.07697 null  
2025-06-09 Unblocking Fine-Grained Evaluation of Detailed Captions: An Explaining AutoRater and Critic-and-Revise Pipeline Idan Szpektor Team 2506.07631 null  
2025-06-09 Event-Priori-Based Vision-Language Model for Efficient Visual Understanding Michele Magno Team 2506.07627 null  
2025-06-10 SAFEFLOW: A Principled Protocol for Trustworthy and Transactional Autonomous Agent Systems Zhengzhong Tu Team 2506.07564 null  
2025-06-10 GTR-CoT: Graph Traversal as Visual Chain of Thought for Molecular Structure Recognition Conghui He Team 2506.07553 null  
2025-06-09 Taking Flight with Dialogue: Enabling Natural Language Control for PX4-based Drone Agent Ting Yang Ling Team 2506.07509 null  
2025-06-09 Genesis: Multimodal Driving Scene Generation with Spatio-Temporal and Cross-Modal Consistency Xinggang Wang Team 2506.07497 null  
2025-06-09 CoCoA-Mix: Confusion-and-Confidence-Aware Mixture Model for Context Optimization Hyun Myung Team 2506.07484 null  
2025-06-09 LiteVLM: A Low-Latency Vision-Language Model Inference Pipeline for Resource-Constrained Environments Josh Park Team 2506.07416 null  
2025-06-09 MrM: Black-Box Membership Inference Attacks against Multimodal RAG Systems Tao Qi Team 2506.07399 null  
2025-06-06 CoMemo: LVLMs Need Image Context with Image Memory Jifeng Dai Team 2506.06279 null  
2025-06-06 Movie Facts and Fibs (MF$^2$): A Benchmark for Long Movie Understanding André F. T. Martins Team 2506.06275 null  
2025-06-06 Challenging Vision-Language Models with Surgical Data: A New Dataset and Broad Benchmarking Study Lena Maier-Hein Team 2506.06232 null  
2025-06-06 GenIR: Generative Visual Feedback for Mental Image Retrieval James Davis Team 2506.06220 null  
2025-06-06 STSBench: A Spatio-temporal Scenario Benchmark for Multi-modal Large Language Models in Autonomous Driving Horst Possegger Team 2506.06218 null  
2025-06-06 WisWheat: A Three-Tiered Vision-Language Dataset for Wheat Management Zijian Wang Team 2506.06084 null  
2025-06-06 Full Conformal Adaptation of Medical Vision-Language Models Jose Dolz Team 2506.06076 null  
2025-06-06 BEAST: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning Rudolf Lioutikov Team 2506.06072 null  
2025-06-06 MCA-Bench: A Multimodal Benchmark for Evaluating CAPTCHA Robustness Against VLM-based Attacks Yiren Song Team 2506.05982 null  
2025-06-06 HMVLM: Multistage Reasoning-Enhanced Vision-Language Model for Long-Tailed Driving Scenarios Weihao Gu Team 2506.05883 null  
2025-06-06 Do Large Vision-Language Models Distinguish between the Actual and Apparent Features of Illusions? Hitomi Yanaka Team 2506.05765 null  
2025-06-06 MoralCLIP: Contrastive Alignment of Vision-and-Language Representations with Moral Foundations Theory João Magalhães Team 2506.05696 null  
2025-06-06 DriveAction: A Benchmark for Exploring Human-like Driving Decisions in VLA Models Xianpeng Lang Team 2506.05667 null  
2025-06-05 MORSE-500: A Programmatically Controllable Video Benchmark to Stress-Test Multimodal Reasoning Furong Huang Team 2506.05523 null  
2025-06-05 Degradation-Aware Image Enhancement via Vision-Language Classification Zibo Meng Team 2506.05450 null  
2025-06-09 Coordinated Robustness Evaluation Framework for Vision-Language Models Soumyendu Sarkar Team 2506.05429 null  
2025-06-06 Does Your 3D Encoder Really Work? When Pretrain-SFT from 2D VLMs Meets 3D VLMs Xiaodan Liang Team 2506.05318 null  
2025-06-05 MonkeyOCR: Document Parsing with a Structure-Recognition-Relation Triplet Paradigm Xiang Bai Team 2506.05218 null  
2025-06-05 Quantifying Cross-Modality Memorization in Vision-Language Models Chiyuan Zhang Team 2506.05198 null  
2025-06-05 CIVET: Systematic Evaluation of Understanding in VLMs Giuseppe Riccardi Team 2506.05146 null  
2025-06-05 PixCell: A generative foundation model for digital histopathology images Dimitris Samaras Team 2506.05127 null  
2025-06-05 A Survey on Vietnamese Document Analysis and Recognition: Challenges and Future Directions Dung Nguyen Team 2506.05061 null  
2025-06-05 Hierarchical Language Models for Semantic Navigation and Manipulation in an Aerial-Ground Robotic System Moju Zhao Team 2506.05020 null  
2025-06-05 ConECT Dataset: Overcoming Data Scarcity in Context-Aware E-Commerce MT Mikołaj Koszowski Team 2506.04929 null  
2025-06-05 SRD: Reinforcement-Learned Semantic Perturbation for Backdoor Defense in VLMs Dacheng Tao Team 2506.04743 null  
2025-06-05 Robust Few-Shot Vision-Language Model Adaptation Shu Kong Team 2506.04713 null  
2025-06-05 HoliSafe: Holistic Safety Benchmarking and Modeling with Safety Meta Token for Vision-Language Model Sung Ju Hwang Team 2506.04704 null  
2025-06-05 SmartAvatar: Text- and Image-Guided Human Avatar Generation with VLM AI Agents Yu-Wing Tai Team 2506.04606 null  
2025-06-05 MuSciClaims: Multimodal Scientific Claim Verification Niranjan Balasubramanian Team 2506.04585 null  
2025-06-05 Handle-based Mesh Deformation Guided By Vision Language Model Aniket Bera Team 2506.04562 null  
2025-06-04 RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics Shanghang Zhang Team 2506.04308 null  
2025-06-04 Image Editing As Programs with Diffusion Models Xinchao Wang Team 2506.04158 null  
2025-06-04 Recent Advances in Medical Image Classification Ngoc Quoc Ly Team 2506.04129 null  
2025-06-04 LaF-GRPO: In-Situ Navigation Instruction Generation for the Visually Impaired via GRPO with LLM-as-Follower Reward Jing Li Team 2506.04070 null  
2025-06-04 Mitigating Hallucinations in Large Vision-Language Models via Entity-Centric Multimodal Preference Optimization Min Zhang Team 2506.04039 null  
2025-06-04 Vocabulary-free few-shot learning for Vision-Language Models Christophe De Vleeschouwer Team 2506.04005 null  
2025-06-04 DiffCAP: Diffusion-based Cumulative Adversarial Purification for Vision Language Models Anders Holst Team 2506.03933 null  
2025-06-04 Zero-Shot Temporal Interaction Localization for Egocentric Videos Hesheng Wang Team 2506.03662 null  
2025-06-04 Spatial Understanding from Videos: Structured Prompts Meet Simulation Data Liqiang Nie Team 2506.03642 null  
2025-06-04 VLMs Can Aggregate Scattered Training Patches Chaochao Lu Team 2506.03614 null  
2025-06-04 BiMa: Towards Biases Mitigation for Text-Video Retrieval via Scene Element Guidance Ngan Le Team 2506.03589 null  
2025-06-04 MiMo-VL Technical Report Bingquan Xia Team 2506.03569 null  
2025-06-04 Target Semantics Clustering via Text Representations for Robust Universal Domain Adaptation Yixin Zhang Team 2506.03521 null  
2025-06-04 DenseDPO: Fine-Grained Temporal Preference Optimization for Video Diffusion Models Aliaksandr Siarohin Team 2506.03517 null  
2025-06-04 POLARIS: A High-contrast Polarimetric Imaging Benchmark Dataset for Exoplanetary Disk Representation Learning Weixin Yao Team 2506.03511 link  
2025-06-03 Toward Reliable VLM: A Fine-Grained Benchmark and Framework for Exposure, Bias, and Inference in Korean Street Views Hansaem Kim Team 2506.03371 null  
2025-06-03 Robustness in Both Domains: CLIP Needs a Robust Text Encoder Volkan Cevher Team 2506.03355 null  
2025-06-03 Grounded Vision-Language Interpreter for Integrated Task and Motion Planning Atsushi Hashimoto Team 2506.03270 null  
2025-06-03 OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models Li Yi Team 2506.03135 null  
2025-06-03 EgoVLM: Policy Optimization for Egocentric Video Understanding Linshen Liu Team 2506.03097 null  
2025-06-03 DPO Learning with LLMs-Judge Signal for Computer Use Agents Phillip Howard Team 2506.03095 null  
2025-06-03 From Flat to Hierarchical: Extracting Sparse Representations with Matching Pursuit Demba Ba Team 2506.03093 null  
2025-06-03 Text-guided Generation of Efficient Personalized Inspection Plans Aniket Bera Team 2506.02917 null  
2025-06-04 FlySearch: Exploring how vision-language models explore Maciej Wołczyk Team 2506.02896 null  
2025-06-03 Surfer-H Meets Holo1: Cost-Efficient Web Agent Powered by Open Weights Tony Wu Team 2506.02865 null  
2025-06-03 SemVink: Advancing VLMs’ Semantic Understanding of Optical Illusions via Visual Global Thinking Yiwei Wang Team 2506.02803 null  
2025-06-04 Open-PMC-18M: A High-Fidelity Large Scale Medical Dataset for Multimodal Representation Learning Arash Afkanpour Team 2506.02738 null  
2025-06-03 Iterative Self-Improvement of Vision Language Models for Image Scoring and Self-Explanation Toshihiko Yamasaki Team 2506.02708 null  
2025-06-03 Small Aid, Big Leap: Efficient Test-Time Adaptation for Vision-Language Models with AdaptNet Zhi Wang Team 2506.02671 null  
2025-06-03 Hierarchical Question-Answering for Driving Scene Understanding Using Vision-Language Models Dong Seog Han Team 2506.02615 null  
2025-06-03 Kernel-based Unsupervised Embedding Alignment for Enhanced Visual Representation in Vision-language Models Farzan Farnia Team 2506.02557 null  
2025-06-03 Sign Language: Towards Sign Understanding for Robot Autonomy David Hsu Team 2506.02556 null  
2025-06-03 SurgVLM: A Large Vision-Language Model and Systematic Evaluation Benchmark for Surgical Intelligence Yueming Jin Team 2506.02555 null  
2025-06-03 Rethinking Post-Unlearning Behavior of Large Vision-Language Models Kyomin Jung Team 2506.02541 null  
2025-06-04 MemoryOut: Learning Principal Features via Multimodal Sparse Filtering Network for Semi-supervised Video Anomaly Detection Qingyao Wu Team 2506.02535 null  
2025-06-03 VS-Bench: Evaluating VLMs for Strategic Reasoning and Decision-Making in Multi-Agent Environments Yu Wang Team 2506.02387 null  
2025-06-03 Auto-Labeling Data for Object Detection Jason J. Corso Team 2506.02359 null  
2025-06-03 RATE-Nav: Region-Aware Termination Enhancement for Zero-shot Object Navigation with Vision-Language Models Jianzong Wang Team 2506.02354 null  
2025-05-30 ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL Lili Qiu Team 2505.24875 null  
2025-05-30 ProxyThinker: Test-Time Guidance through Small Visual Reasoners Vicente Ordonez Team 2505.24872 null  
2025-05-30 GenSpace: Benchmarking Spatially-Aware Image Generation Zhou Zhao Team 2505.24870 null  
2025-05-30 Time Blindness: Why Video-Language Models Can’t See What Humans Can? Mohamed Elhoseiny Team 2505.24867 null  
2025-05-30 Conformal Prediction for Zero-Shot Models Jose Dolz Team 2505.24693 null  
2025-05-30 BIMA: Bijective Maximum Likelihood Learning Approach to Hallucination Prediction and Mitigation in Large Vision-Language Models Khoa Luu Team 2505.24649 null  
2025-05-30 SARD: A Large-Scale Synthetic Arabic OCR Dataset for Book-Style Text Recognition Wadii Boulila Team 2505.24600 null  
2025-05-30 AMIA: Automatic Masking and Joint Intention Analysis Makes LVLMs Robust Jailbreak Defenders Liang Ding Team 2505.24519 null  
2025-05-30 CaMMT: Benchmarking Culturally Aware Multimodal Machine Translation Thamar Solorio Team 2505.24456 null  
2025-05-30 Advancing Compositional Awareness in CLIP with Efficient Fine-Tuning Matthias Hein Team 2505.24424 null  
2025-05-30 MMAFFBen: A Multilingual and Multimodal Affective Analysis Benchmark for Evaluating LLMs and VLMs Sophia Ananiadou Team 2505.24423 null  
2025-05-30 Grid-LOGAT: Grid Based Local and Global Area Transcription for Video Question Answering Fadoua Ghourabi Team 2505.24371 null  
2025-05-30 KEVER^2: Knowledge-Enhanced Visual Emotion Reasoning and Retrieval Yong Li Team 2505.24342 null  
2025-05-30 ROAD: Responsibility-Oriented Reward Design for Reinforcement Learning in Autonomous Driving Songan Zhang Team 2505.24317 null  
2025-05-30 Benchmarking Foundation Models for Zero-Shot Biometric Tasks Arun Ross Team 2505.24214 null  
2025-05-30 Bootstrapping LLM Robustness for VLM Safety via Reducing the Pretraining Modality Gap Baharan Mirzasoleiman Team 2505.24208 null  
2025-05-30 DrVD-Bench: Do Vision-Language Models Reason Like Human Doctors in Medical Image Diagnosis? Xuegong Zhang Team 2505.24173 null  
2025-05-30 CSVQA: A Chinese Multimodal Benchmark for Evaluating STEM Reasoning Capabilities of VLMs Xuchen Song Team 2505.24120 null  
2025-05-29 mRAG: Elucidating the Design Space of Multi-modal Retrieval-Augmented Generation Zhengzhong Tu Team 2505.24073 null  
2025-05-29 Multi-RAG: A Multimodal Retrieval-Augmented Generation System for Adaptive Video Understanding Tinoosh Mohsenin Team 2505.23990 null  
2025-05-29 ZeroGUI: Automating Online GUI Learning at Zero Human Cost Jifeng Dai Team 2505.23762 link  
2025-05-29 Puzzled by Puzzles: When Vision-Language Models Can’t Take a Hint David M. Chan Team 2505.23759 link  
2025-05-29 To Trust Or Not To Trust Your Vision-Language Model’s Prediction Olga Fink Team 2505.23745 link  
2025-05-29 LayerPeeler: Autoregressive Peeling for Layer-wise Image Vectorization Jing Liao Team 2505.23740 null  
2025-05-29 Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better Sergey Levine Team 2505.23705 null  
2025-05-29 Grounded Reinforcement Learning for Visual Reasoning Katerina Fragkiadaki Team 2505.23678 null  
2025-05-29 Uni-MuMER: Unified Multi-Task Fine-Tuning of Vision-Language Model for Handwritten Mathematical Expression Recognition Liangcai Gao Team 2505.23566 null  
2025-05-30 Qwen Look Again: Guiding Vision-Language Reasoning Models to Re-attention Visual Information Weiping Li Team 2505.23558 link  
2025-05-29 TRAP: Targeted Redirecting of Agentic Preferences Gagandeep Singh Team 2505.23518 null  
2025-05-29 VCapsBench: A Large-scale Fine-grained Benchmark for Video Caption Quality Evaluation Xu-Cheng Yin Team 2505.23484 link  
2025-05-29 Beam-Guided Knowledge Replay for Knowledge-Rich Image Captioning using Vision-Language Model Muzammil Behzad Team 2505.23358 null  
2025-05-29 LADA: Scalable Label-Specific CLIP Adapter for Continual Learning Min-Ling Zhang Team 2505.23271 link  
2025-05-29 VLM-RRT: Vision Language Model Guided RRT Search for Autonomous UAV Navigation Panayiotis Kolios Team 2505.23267 null  
2025-05-29 Disrupting Vision-Language Model-Driven Navigation Services via Adversarial Object Fusion Tao Xiang Team 2505.23266 null  
2025-05-29 ChartMind: A Comprehensive Benchmark for Complex Real-world Multimodal Chart Question Answering Lei Wang Team 2505.23242 null  
2025-05-29 PhotoArtAgent: Intelligent Photo Retouching with Language Model-Based Artist Agents Jinjin Gu Team 2505.23130 null  
2025-05-29 Are Unified Vision-Language Models Necessary: Generalization Across Understanding and Generation Yu Cheng Team 2505.23043 link  
2025-05-29 An Empirical Study of Federated Prompt Learning for Vision Language Model Mang Ye Team 2505.23024 null  
2025-05-29 SeG-SR: Integrating Semantic Knowledge into Remote Sensing Image Super-Resolution via Vision-Language Model Zhenwei Shi Team 2505.23010 null  
2025-05-29 QLIP: A Dynamic Quadtree Vision Prior Enhances MLLM Performance Without Retraining Muhao Chen Team 2505.23004 link  
2025-05-28 Zero-Shot Vision Encoder Grafting via LLM Surrogates Tom Goldstein Team 2505.22664 link  
2025-05-28 Training Free Stylized Abstraction Vishal M. Patel Team 2505.22663 null  
2025-05-28 VScan: Rethinking Visual Token Reduction for Efficient Large Vision-Language Models Dong Yu Team 2505.22654 null  
2025-05-28 Sherlock: Self-Correcting Reasoning in Vision-Language Models Ruqi Zhang Team 2505.22651 null  
2025-05-28 Hypothesis Testing in Imaging Inverse Problems Marcelo Pereyra Team 2505.22481 null  
2025-05-28 Zero-Shot 3D Visual Grounding from Vision-Language Models Junwei Liang Team 2505.22429 null  
2025-05-28 IKIWISI: An Interactive Visual Pattern Generator for Evaluating the Reliability of Vision-Language Models Without Ground Truth Syed Masum Billah Team 2505.22305 null  
2025-05-28 Investigating Mechanisms for In-Context Vision Language Binding Vineet Gandhi Team 2505.22200 null  
2025-05-29 Improving Brain-to-Image Reconstruction via Fine-Grained Text Bridging Piji Li Team 2505.22150 null  
2025-05-28 3D Question Answering via only 2D Vision-Language Models Qianru Sun Team 2505.22143 null  
2025-05-28 Reinforced Reasoning for Embodied Planning Bo Jin Team 2505.22050 null  
2025-05-28 Balanced Token Pruning: Accelerating Vision Language Models Beyond Local Optimization Xinlei Chen Team 2505.22038 null  
2025-05-28 Pearl: A Multimodal Culturally-Aware Arabic Instruction Dataset Muhammad Abdul-Mageed Team 2505.21979 null  
2025-05-29 DORAEMON: Decentralized Ontology-aware Reliable Agent with Enhanced Memory Oriented Navigation Xin Tan Team 2505.21969 null  
2025-05-28 Seeing the Threat: Vulnerabilities in Vision-Language Models to Adversarial Attack Usman Naseem Team 2505.21967 null  
2025-05-28 Towards Comprehensive Scene Understanding: Integrating First and Third-Person Views for LVLMs Byonghyo Shim Team 2505.21955 null  
2025-05-28 Vision-Language-Action Model with Open-World Embodied Reasoning from Pretrained Knowledge Yi Xu Team 2505.21906 null  
2025-05-28 Test-Time Adaptation of Vision-Language Models for Open-Vocabulary Semantic Segmentation Christian Desrosiers Team 2505.21844 null  
2025-05-27 MMTBENCH: A Unified Benchmark for Complex Multimodal Table Reasoning Vivek Gupta Team 2505.21771 null  
2025-05-27 MedBridge: Bridging Foundation Vision-Language Models to Medical Image Diagnosis Christian Wachinger Team 2505.21698 null  
2025-05-27 ViewSpatial-Bench: Evaluating Multi-perspective Spatial Localization in Vision-Language Models Yueting Zhuang Team 2505.21500 null  
2025-05-27 AdInject: Real-World Black-Box Attacks on Web Agents via Advertising Delivery Qing Wang Team 2505.21499 null  
2025-05-27 Mitigating Hallucination in Large Vision-Language Models via Adaptive Attention Calibration Ziwei Zhu Team 2505.21472 null  
2025-05-27 ID-Align: RoPE-Conscious Position Remapping for Dynamic High-Resolution Adaptation in Vision-Language Models Wentao Zhang Team 2505.21465 null  
2025-05-27 LazyVLM: Neuro-Symbolic Approach to Video Analytics M. Tamer Özsu Team 2505.21459 null  
2025-05-27 DeCAF: Decentralized Consensus-And-Factorization for Low-Rank Adaptation of Foundation Models Soumik Sarkar Team 2505.21382 null  
2025-05-27 XBOUND: Exploring the Capability Boundaries of Device-Control Agents through Trajectory Tree Exploration Min Zhang Team 2505.21279 null  
2025-05-27 Interpreting Social Bias in LVLMs via Information Flow Analysis and Multi-Round Dialogue Evaluation Yutao Yue Team 2505.21106 null  
2025-05-27 DisasterM3: A Remote Sensing Vision-Language Dataset for Disaster Damage Assessment and Response Naoto Yokoya Team 2505.21089 null  
2025-05-27 LPOI: Listwise Preference Optimization for Vision Language Models Gunhee Kim Team 2505.21061 null  
2025-05-27 RefAV: Towards Planning-Centric Scenario Mining Neehar Peri Team 2505.20981 null  
2025-05-27 On VLMs for Diverse Tasks in Multimodal Meme Classification Jasabanta Patro Team 2505.20937 null  
2025-05-27 A Stereotype Content Analysis on Color-related Social Bias in Large Vision Language Models Bugeun Kim Team 2505.20901 null  
2025-05-27 AVCD: Mitigating Hallucinations in Audio-Visual Large Language Models through Contrastive Decoding Joon Son Chung Team 2505.20862 null  
2025-05-27 Rendering-Aware Reinforcement Learning for Vector Graphics Generation Marco Pedersoli Team 2505.20793 null  
2025-05-27 FM-Planner: Foundation Model Guided Path Planning for Autonomous Drone Navigation Mir Feroskhan Team 2505.20783 null  
2025-05-27 Jigsaw-Puzzles: From Seeing to Understanding to Reasoning in Vision-Language Models Yao Yang Team 2505.20728 null  
2025-05-27 ManiTaskGen: A Comprehensive Task Generator for Benchmarking and Improving Vision-Language Agents on Embodied Decision-Making Hao Su Team 2505.20726 null  
2025-05-27 Automating eHMI Action Design with LLMs for Automated Vehicle Communication Takeo Igarashi Team 2505.20711 null  
2025-05-27 GIFARC: Synthetic Dataset for Leveraging Human-Intuitive Analogies to Elevate AI Reasoning Sundong Kim Team 2505.20672 null  
2025-05-26 Seeing is Believing, but How Much? A Comprehensive Analysis of Verbalized Calibration in Vision-Language Models Naoto Yokoya Team 2505.20236 null  
2025-05-26 Agentic 3D Scene Generation with Spatially Contextualized VLMs Chi-Keung Tang Team 2505.20129 null  
2025-05-26 MEBench: A Novel Benchmark for Understanding Mutual Exclusivity Bias in Vision-Language Models James M. Rehg Team 2505.20122 null  
2025-05-27 EmoNet-Face: An Expert-Annotated Benchmark for Synthetic Emotion Recognition Sören Auer Team 2505.20033 null  
2025-05-26 ViTaPEs: Visuotactile Position Encodings for Cross-Modal Alignment in Multimodal Transformers Elmar Rückert Team 2505.20032 null  
2025-05-26 Decomposing Complex Visual Comprehension into Atomic Visual Skills for Vision Language Models Ernest K. Ryu Team 2505.20021 null  
2025-05-26 Can Visual Encoder Learn to See Arrows? Hiroaki Ozaki Team 2505.19944 null  
2025-05-26 Attention! You Vision Language Model Could Be Maliciously Manipulated Shudong Zhang Team 2505.19911 null  
2025-05-26 Underwater Diffusion Attention Network with Contrastive Language-Image Joint Learning for Underwater Image Enhancement Muzammil Behzad Team 2505.19895 null  
2025-05-26 One Surrogate to Fool Them All: Universal, Transferable, and Targeted Adversarial Attacks with CLIP Kehuan Zhang Team 2505.19840 null  
2025-05-26 TeViR: Text-to-Video Reward with Diffusion Models for Efficient Reinforcement Learning Dongbin Zhao Team 2505.19769 null  
2025-05-26 Modeling Beyond MOS: Quality Assessment Models Must Integrate Context, Reasoning, and Multimodality Alessandro Bruno Team 2505.19696 null  
2025-05-26 Grounding Language with Vision: A Conditional Mutual Information Calibrated Decoding Strategy for Reducing Hallucinations in LVLMs Shu-Tao Xia Team 2505.19678 null  
2025-05-26 JailBound: Jailbreaking Internal Safety Boundaries of Vision-Language Models Yingchun Wang Team 2505.19610 null  
2025-05-26 What You Perceive Is What You Conceive: A Cognition-Inspired Framework for Open Vocabulary Image Segmentation Rongrong Ji Team 2505.19569 null  
2025-05-26 FlowCut: Rethinking Redundancy via Information Flow for Efficient Vision-Language Models Ruixuan Li Team 2505.19536 null  
2025-05-26 Locality-Aware Zero-Shot Human-Object Interaction Detection Minsu Cho Team 2505.19503 null  
2025-05-26 Enhancing Visual Reliance in Text Generation: A Bayesian Perspective on Mitigating Hallucination in Large Vision-Language Models Guoliang Kang Team 2505.19498 null  
2025-05-26 Unveiling the Compositional Ability Gap in Vision-Language Reasoning Model Yu Cheng Team 2505.19406 null  
2025-05-27 DiffVLA: Vision-Language Guided Diffusion Planning for Autonomous Driving Hao Zhao Team 2505.19381 null  
2025-05-26 DiSa: Directional Saliency-Aware Prompt Learning for Generalizable Vision-Language Models Fatemeh Afghah Team 2505.19373 null  
2025-05-23 VideoGameBench: Can Vision-Language Models complete popular video games? Ofir Press Team 2505.18134 null  
2025-05-23 One RL to See Them All: Visual Triple Unified Reinforcement Learning Junjie Yan Team 2505.18129 null  
2025-05-23 CXReasonBench: A Benchmark for Evaluating Structured Diagnostic Reasoning in Chest X-rays Edward Choi Team 2505.18087 null  
2025-05-23 FDBPL: Faster Distillation-Based Prompt Learning for Region-Aware Vision-Language Models Adaptation Shibiao Xu Team 2505.18053 null  
2025-05-23 Clip4Retrofit: Enabling Real-Time Image Labeling on Edge Devices via Cross-Architecture CLIP Distillation Bogdan Sorin Coseriu Team 2505.18039 null  
2025-05-23 Few-Shot Learning from Gigapixel Images via Hierarchical Vision-Language Alignment and Modeling Mun Yong Yi Team 2505.17982 null  
2025-05-23 VLM Models and Automated Grading of Atopic Dermatitis Hamed Ghodrati Team 2505.17835 null  
2025-05-23 Seeing It or Not? Interpretable Vision-aware Latent Steering to Mitigate Object Hallucinations Chao Shen Team 2505.17812 null  
2025-05-23 U2-BENCH: Benchmarking Large Vision-Language Models on Ultrasound Understanding Hongcheng Guo Team 2505.17779 null  
2025-05-23 SafeMVDrive: Multi-view Safety-Critical Driving Video Synthesis in the Real World Domain Yu Li Team 2505.17727 null  
2025-05-23 Seek-CAD: A Self-refined Generative Modeling for 3D Parametric CAD Using Local Inference via DeepSeek Xiangdong Zhou Team 2505.17702 null  
2025-05-23 Towards General Continuous Memory for Vision-Language Models Biwei Huang Team 2505.17670 null  
2025-05-23 EVADE: Multimodal Benchmark for Evasive Content Detection in E-Commerce Applications Min Yang Team 2505.17654 null  
2025-05-23 HoloLLM: Multisensory Foundation Model for Language-Grounded Human Sensing and Reasoning Jianfei Yang Team 2505.17645 null  
2025-05-23 Enhancing Large Vision-Language Models with Layout Modality for Table Question Answering on Japanese Annual Securities Reports Takahiro Omi Team 2505.17625 null  
2025-05-23 CAS-IQA: Teaching Vision-Language Models for Synthetic Angiography Quality Assessment Zeng-Guang Hou Team 2505.17619 null  
2025-05-23 Decoupled Visual Interpretation and Linguistic Reasoning for Math Problem Solving Wangmeng Zuo Team 2505.17609 null  
2025-05-23 A Unified Multi-Scale Attention-Based Network for Automatic 3D Segmentation of Lung Parenchyma & Nodules In Thoracic CT Images Furqan Shaukat Team 2505.17602 null  
2025-05-23 Multimodal Conversation Structure Understanding David Bamman Team 2505.17536 null  
2025-05-23 Do You Keep an Eye on What I Ask? Mitigating Multimodal Hallucination via Attention-Guided Ensemble Decoding Sungzoon Cho Team 2505.17529 null  
2025-05-22 Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models Mike Zheng Shou Team 2505.16854 link  
2025-05-23 LaViDa: A Large Diffusion Language Model for Multimodal Understanding Aditya Grover Team 2505.16839 link  
2025-05-22 From EduVisBench to EduVisAgent: A Benchmark and Multi-Agent Framework for Pedagogical Visualization Huaxiu Yao Team 2505.16832 link  
2025-05-22 Perceptual Quality Assessment for Embodied AI Guangtao Zhai Team 2505.16815 link  
2025-05-22 SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving Hongsheng Li Team 2505.16805 null  
2025-05-22 REOBench: Benchmarking Robustness of Earth Observation Foundation Models Tianjin Huang Team 2505.16793 link  
2025-05-22 Single Domain Generalization for Few-Shot Counting via Universal Representation Matching Xinghao Chen Team 2505.16778 link  
2025-05-22 IFEval-Audio: Benchmarking Instruction-Following Capability in Audio-based Large Language Models AiTi Aw Team 2505.16774 link  
2025-05-22 Self-Rewarding Large Vision-Language Models for Optimizing Prompts in Text-to-Image Generation Jianbing Shen Team 2505.16763 null  
2025-05-22 SD-MAD: Sign-Driven Few-shot Multi-Anomaly Detection in Medical Images Mahsa Baktashmotlagh Team 2505.16659 null  
2025-05-22 Point, Detect, Count: Multi-Task Medical Image Understanding with Instruction-Tuned Vision-Language Models Pål Halvorsen Team 2505.16647 null  
2025-05-22 MEgoHand: Multimodal Egocentric Hand-Object Interaction Motion Generation Zongqing Lu Team 2505.16602 null  
2025-05-22 ManipLVM-R1: Reinforcement Learning for Reasoning in Embodied Manipulation with Large Vision-Language Models Xiuying Chen Team 2505.16517 null  
2025-05-22 Implicit Jailbreak Attacks via Cross-Modal Information Concealment on Vision-Language Models Yaochu Jin Team 2505.16446 null  
2025-05-22 Circle-RoPE: Cone-like Decoupled Rotary Positional Embedding for Large Vision-Language Models Kai Han Team 2505.16416 link  
2025-05-22 Mitigating Hallucinations in Vision-Language Models through Image-Guided Head Suppression Souvik Kundu Team 2505.16411 link  
2025-05-22 VL-SAFE: Vision-Language Guided Safety-Aware Reinforcement Learning with World Models for Autonomous Driving Samuel Labi Team 2505.16377 null  
2025-05-22 MM-MovieDubber: Towards Multi-Modal Learning for Multi-Modal Movie Dubbing Xinhan Di Team 2505.16279 null  
2025-05-22 When VLMs Meet Image Classification: Test Sets Renovation via Missing Label Identification Jiaheng Wei Team 2505.16149 null  
2025-05-22 Steering LVLMs via Sparse Autoencoder for Hallucination Mitigation Junfeng Fang Team 2505.16146 null  
2025-05-21 InstructSAM: A Training-Free Framework for Instruction-Oriented Remote Sensing Object Recognition Xue Yang Team 2505.15818 null  
2025-05-21 From Grounding to Manipulation: Case Studies of Foundation Model Integration in Embodied Robotic Systems Soujanya Poria Team 2505.15685 null  
2025-05-21 FragFake: A Dataset for Fine-Grained Detection of Edited Images with Vision Language Models Qian Wang Team 2505.15644 null  
2025-05-21 Visual Perturbation and Adaptive Hard Negative Contrastive Learning for Compositional Reasoning in Vision-Language Models Ya Wang Team 2505.15576 link  
2025-05-21 TinyDrive: Multiscale Visual Question Answering with Selective Token Routing for Autonomous Driving Abdallah Shami Team 2505.15564 null  
2025-05-21 Clapper: Compact Learning and Video Representation in VLMs Fuzheng Zhang Team 2505.15529 null  
2025-05-21 Robo2VLM: Visual Question Answering from Large-Scale In-the-Wild Robot Manipulation Datasets Ken Goldberg Team 2505.15517 null  
2025-05-21 Visual Thoughts: A Unified Perspective of Understanding Multimodal Chain-of-Thought Libo Qin Team 2505.15510 null  
2025-05-21 Prompt Tuning Vision Language Models with Margin Regularizer for Few-Shot Learning under Distribution Shifts Soma Biswas Team 2505.15506 link  
2025-05-21 Beyond Linearity: Squeeze-and-Recalibrate Blocks for Few-Shot Whole Slide Image Classification Irwin King Team 2505.15504 null  
2025-05-21 Seeing Through Deception: Uncovering Misleading Creator Intent in Multimodal News with Vision-Language Models Bryan Hooi Team 2505.15489 null  
2025-05-21 Chain-of-Focus: Adaptive Visual Search and Zooming for Multimodal Reasoning via RL Qing Li Team 2505.15436 null  
2025-05-21 TimeCausality: Evaluating the Causal Ability in Time Dimension for Vision Language Models Keze Wang Team 2505.15435 null  
2025-05-21 On the Robustness of Medical Vision-Language Models: Are they Truly Generalizable? Mohammad Yaqub Team 2505.15425 null  
2025-05-21 Are Vision-Language Models Safe in the Wild? A Meme-Based Benchmark Study Hwanjo Yu Team 2505.15389 null  
2025-05-21 RAZER: Robust Accelerated Zero-Shot 3D Open-Vocabulary Panoptic Reconstruction with Spatio-Temporal Aggregation Farshad Khorrami Team 2505.15373 null  
2025-05-21 Better Safe Than Sorry? Overreaction Problem of Vision Language Models in Visual Emergency Recognition Youngsook Song Team 2505.15367 null  
2025-05-21 AgentThink: A Unified Framework for Tool-Augmented Chain-of-Thought Reasoning in Vision-Language Models for Autonomous Driving Diange Yang Team 2505.15298 null  
2025-05-21 Blind Spot Navigation: Evolutionary Discovery of Sensitive Semantic Concepts for LVLMs Zibin Zheng Team 2505.15265 null  
2025-05-21 Fooling the LVLM Judges: Visual Biases in LVLM-Based Evaluation Kyomin Jung Team 2505.15249 null  
2025-05-20 UniCTokens: Boosting Personalized Understanding and Generation via Unified Concept Tokens Wentao Zhang Team 2505.14671 null  
2025-05-20 CAD-Coder: An Open-Source Vision-Language Model for Computer-Aided Design Code Generation Faez Ahmed Team 2505.14646 null  
2025-05-20 Debating for Better Reasoning: An Unsupervised Multimodal Approach Mirella Lapata Team 2505.14627 null  
2025-05-21 PlanGPT-VL: Enhancing Urban Planning with Domain-Specific Vision-Language Models Wenjia Zhang Team 2505.14481 null  
2025-05-20 RAVENEA: A Benchmark for Multimodal Retrieval-Augmented Visual Culture Understanding Serge Belongie Team 2505.14462 link  
2025-05-20 SCAN: Semantic Document Layout Analysis for Textual and Visual Retrieval-Augmented Generation Masafumi Oyamada Team 2505.14381 null  
2025-05-20 Towards Embodied Cognition in Robots via Spatially Grounded Synthetic Worlds Agnieszka Wykowska Team 2505.14366 null  
2025-05-20 DeepEyes: Incentivizing “Thinking with Images” via Reinforcement Learning Xing Yu Team 2505.14362 link  
2025-05-20 Vision-Language Modeling Meets Remote Sensing: Models, Datasets and Perspectives Gui-Song Xia Team 2505.14361 null  
2025-05-20 Plane Geometry Problem Solving with Multi-modal Reasoning: A Survey Dongwoo Kim Team 2505.14340 null  
2025-05-20 Aligning Attention Distribution to Information Flow for Hallucination Mitigation in Large Vision-Language Models Chong Feng Team 2505.14257 null  
2025-05-20 Visual Agentic Reinforcement Fine-Tuning Jiaqi Wang Team 2505.14246 link  
2025-05-20 VoQA: Visual-only Question Answering Lei Huang Team 2505.14227 null  
2025-05-20 Breaking Language Barriers or Reinforcing Bias? A Study of Gender and Racial Disparities in Multilingual Contrastive Vision Language Models Matthew Purver Team 2505.14160 null  
2025-05-20 Building a Stable Planner: An Extended Finite State Machine Based Planning Module for Mobile GUI Agent Xuming Hu Team 2505.14141 null  
2025-05-20 NOVA: A Benchmark for Anomaly Localization and Clinical Reasoning in Brain MRI Benedikt Wiestler Team 2505.14064 null  
2025-05-20 ShieldVLM: Safeguarding the Multimodal Implicit Toxicity via Deliberative Reasoning with LVLMs Minlie Huang Team 2505.14035 null  
2025-05-20 Toward Effective Reinforcement Learning Fine-Tuning for Medical VQA in Vision-Language Models Yalin Wang Team 2505.13973 null  
2025-05-20 APEX: Empowering LLMs with Physics-Based Task Planning for Real-time Insight Ambuj Singh Team 2505.13921 link  
2025-05-20 InSpire: Vision-Language-Action Models with Intrinsic Spatial Reasoning Jingkuan Song Team 2505.13888 null  
2025-05-19 ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models Greg Durrett Team 2505.13444 null  
2025-05-19 G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning Baobao Chang Team 2505.13426 link  
2025-05-19 Seeing, Saying, Solving: An LLM-to-TL Framework for Cooperative Robots Shreyas Kousik Team 2505.13376 null  
2025-05-20 Unlabeled Data or Pre-trained Model: Rethinking Semi-Supervised Learning and Pretrain-Finetuning Lan-Zhe Guo Team 2505.13317 null  
2025-05-19 I’ll believe it when I see it: Images increase misinformation sharing in Vision-Language Models R. Maria del Rio-Chanona Team 2505.13302 link  
2025-05-19 Computer Vision Models Show Human-Like Sensitivity to Geometric and Topological Concepts Sashank Varma Team 2505.13281 null  
2025-05-19 From Local Details to Global Context: Advancing Vision-Language Models with Attention-Based Selection Jian Liang Team 2505.13233 link  
2025-05-19 ViPlan: A Benchmark for Visual Planning with Symbolic Predicates and Vision-Language Models Pekka Marttinen Team 2505.13180 link  
2025-05-19 Hearing from Silence: Reasoning Audio Descriptions from Silent Videos via Vision-Language Model Dong Yu Team 2505.13062 null  
2025-05-20 3D Visual Illusion Depth Estimation Yunde Jia Team 2505.13061 link  
2025-05-19 MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO Ying Shan Team 2505.13031 link  
2025-05-19 Uniformity First: Uniformity-aware Test-time Adaptation of Vision-language Models against Image Corruption Tomoki Hamagami Team 2505.12912 link  
2025-05-19 TinyAlign: Boosting Lightweight Vision-Language Models by Mitigating Modal Alignment Bottlenecks Jin Dong Team 2505.12884 null  
2025-05-19 FlightGPT: Towards Generalizable and Interpretable UAV Vision-and-Language Navigation with Vision-Language Models Renxin Zhong Team 2505.12835 null  
2025-05-19 VLC Fusion: Vision-Language Conditioned Sensor Fusion for Robust Object Detection Ransalu Senanayake Team 2505.12715 null  
2025-05-19 TS-VLM: Text-Guided SoftSort Pooling for Vision-Language Models in Multi-View Driving Reasoning Soodeh Nikan Team 2505.12670 null  
2025-05-19 Predicting Reaction Time to Comprehend Scenes with Foveated Scene Understanding Maps Miguel P. Eckstein Team 2505.12660 null  
2025-05-19 AutoMat: Enabling Automated Crystal Structure Reconstruction from Microscopy via Agentic Tool Use Fei Wei Team 2505.12650 link  
2025-05-19 Use as Many Surrogates as You Want: Selective Ensemble Attack to Unleash Transferability without Sacrificing Resource Efficiency Zhengyu Zhao Team 2505.12644 null  
2025-05-19 Scalable Video-to-Dataset Generation for Cross-Platform Mobile Agents Honglak Lee Team 2505.12632 null  
2025-05-16 Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner Hong Bu Team 2505.11404 null  
2025-05-16 Search-TTA: A Multimodal Test-Time Adaptation Framework for Visual Search in the Wild Guillaume Sartoretti Team 2505.11350 null  
2025-05-16 Temporally-Grounded Language Generation: A Benchmark for Real-Time Vision-Language Models Joyce Chai Team 2505.11326 null  
2025-05-16 Sample Efficient Reinforcement Learning via Large Vision Language Model Distillation Chang D. Yoo Team 2505.11221 null  
2025-05-16 Redundancy-Aware Pretraining of Vision-Language Foundation Models in Remote Sensing Begüm Demir Team 2505.11121 null  
2025-05-16 CUBIC: Concept Embeddings for Unsupervised Bias Identification using VLMs Natalia Díaz-Rodríguez Team 2505.11060 null  
2025-05-16 Exploiting the Asymmetric Uncertainty Structure of Pre-trained VLMs on the Unit Hypersphere Prashant Singh Team 2505.11029 null  
2025-05-16 On DeepSeekMoE: Statistical Benefits of Shared Experts and Normalized Sigmoid Gating Alessandro Rinaldo Team 2505.10860 null  
2025-05-16 Benchmarking performance, explainability, and evaluation strategies of vision-language models for surgery: Challenges and opportunities Shan Lin Team 2505.10764 null  
2025-05-15 GeoGrid-Bench: Can Foundation Models Understand Multimodal Gridded Geo-Spatial Data? Tanwi Mallick Team 2505.10714 null  
2025-05-15 MOSAIC: A Multi-View 2.5D Organ Slice Selector with Cross-Attentional Reasoning for Anatomically-Aware CT Localization in Medical Organ Segmentation Muzammil Behzad Team 2505.10672 null  
2025-05-15 CLIP Embeddings for AI-Generated Image Detection: A Few-Shot Study with Lightweight Classifier Ziyang Ou Team 2505.10664 null  
2025-05-15 Mitigate Language Priors in Large Vision-Language Models by Cross-Images Contrastive Decoding Chong Feng Team 2505.10634 null  
2025-05-15 MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly Mark Steedman Team 2505.10610 null  
2025-05-18 MASSV: Multimodal Adaptation and Self-Data Distillation for Speculative Decoding of Vision-Language Models Vithursan Thangarasa Team 2505.10526 null  
2025-05-16 AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenges Manoj Karkee Team 2505.10468 null  
2025-05-15 Vision language models have difficulty recognizing virtual objects J. G. Trafton Team 2505.10453 null  
2025-05-15 MMRL++: Parameter-Efficient and Interaction-Aware Representation Learning for Vision-Language Models Xiaodong Gu Team 2505.10088 link  
2025-05-15 AdaptCLIP: Adapting CLIP for Universal Visual Anomaly Detection Chengjie Wang Team 2505.09926 link  
2025-05-14 Unfettered Forceful Skill Acquisition with Physical Reasoning and Coordinate Frame Labeling Nikolaus Correll Team 2505.09731 null  
2025-05-14 ManipBench: Benchmarking Vision-Language Models for Low-Level Robot Manipulation Daniel Seita Team 2505.09698 null  
2025-05-14 LAS: Loss-less ANN-SNN Conversion for Fully Spike-Driven Large Language Models Yanan Sun Team 2505.09659 link  
2025-05-14 Variational Visual Question Answering Marcus Rohrbach Team 2505.09591 null  
2025-05-14 VTLA: Vision-Tactile-Language-Action Model with Preference Learning for Insertion Manipulation Shuo Wang Team 2505.09577 null  
2025-05-14 Flash-VL 2B: Optimizing Vision-Language Model Performance for Ultra-Low Latency and High Throughput Lin Ma Team 2505.09498 null  
2025-05-14 Unsupervised Multiview Contrastive Language-Image Joint Learning with Pseudo-Labeled Prompts Via Vision-Language Model for 3D/4D Facial Expression Recognition Muzammil Behzad Team 2505.09336 null  
2025-05-14 MetaUAS: Universal Anomaly Segmentation with One-Prompt Meta-Learning Bin-Bin Gao Team 2505.09265 null  
2025-05-14 Beyond General Prompts: Automated Prompt Refinement using Contrastive Class Alignment Scores for Disambiguating Objects in Vision-Language Models Ross Greer Team 2505.09139 null  
2025-05-14 Seeing Beyond the Scene: Enhancing Vision-Language Models with Interactional Reasoning Qing Li Team 2505.09118 null  
2025-05-14 OpenLKA: An Open Dataset of Lane Keeping Assist from Recent Car Models under Real-world Driving Conditions Hao Zhou Team 2505.09092 link  
2025-05-13 Prioritizing Image-Related Tokens Enhances Vision-Language Pre-Training Heng Ji Team 2505.08971 link  
2025-05-15 Behind Maya: Building a Multilingual Vision Language Model Alham Fikri Aji Team 2505.08910 link  
2025-05-12 Position: Restructuring of Categories and Implementation of Guidelines Essential for VLM Adoption in Healthcare Imon Banerjee Team 2505.08818 null  
2025-05-13 Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving Xiang Bai Team 2505.08725 link  
2025-05-13 OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning Yu Cheng Team 2505.08617 link  
2025-05-13 From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation Jianye Hao Team 2505.08548 null  
2025-05-13 Judging the Judges: Can Large Vision-Language Models Fairly Evaluate Chart Comprehension and Reasoning? Jimmy Huang Team 2505.08468 link  
2025-05-13 MA-ROESL: Motion-aware Rapid Reward Optimization for Efficient Robot Skill Learning from Single Videos Wei Zhang Team 2505.08367 null  
2025-05-13 Removing Watermarks with Partial Regeneration using Semantic Information Michael W. Mahoney Team 2505.08234 link  
2025-05-13 CLTP: Contrastive Language-Tactile Pre-training for 3D Contact Geometry Understanding Shuo Wang Team 2505.08194 null  
2025-05-13 DSADF: Thinking Fast and Slow for Decision Making Shufei Zhang Team 2505.08189 null  
2025-05-12 Imagine, Verify, Execute: Memory-Guided Agentic Exploration with Vision-Language Models Jia-Bin Huang Team 2505.07815 null  
2025-05-12 Reproducibility, Replicability, and Insights into Visual Document Retrieval with Late Interaction Andrew Yates Team 2505.07730 null  
2025-05-12 Through the Looking Glass: Common Sense Consistency Evaluation of Weird Images Vasily Konovalov Team 2505.07704 null  
2025-05-12 Beyond CLIP Generalization: Against Forward&Backward Forgetting Adapter for Continual Learning of Vision-Language Models Yihong Gong Team 2505.07690 null  
2025-05-12 Simple Semi-supervised Knowledge Distillation from Vision-Language Models via $\mathbf{\texttt{D}}$ual-$\mathbf{\texttt{H}}$ead $\mathbf{\texttt{O}}$ptimization Sung Ju Hwang Team 2505.07675 null  
2025-05-12 Discrete Visual Tokens of Autoregression, by Diffusion, and for Reasoning Hanwang Zhang Team 2505.07538 null  
2025-05-12 AI-Enabled Accurate Non-Invasive Assessment of Pulmonary Hypertension Progression via Multi-Modal Echocardiography Xiaomeng Li Team 2505.07347 null  
2025-05-12 Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning Yahui Zhou Team 2505.07263 null  
2025-05-12 Incomplete In-context Learning Yangshijie Zhang Team 2505.07251 null  
2025-05-12 UAV-CodeAgents: Scalable UAV Mission Planning via Multi-Agent ReAct and Vision-Language Reasoning Dzmitry Tsetserukou Team 2505.07236 null  
2025-05-12 Language-Driven Dual Style Mixing for Single-Domain Generalized Object Detection Ningjiang Chen Team 2505.07219 link  
2025-05-12 Internet of Agents: Fundamentals, Applications, and Challenges Dusit Niyato Team 2505.07176 null  
2025-05-12 Critique Before Thinking: Mitigating Hallucination through Rationale-Augmented Instruction Tuning Weiping Wang Team 2505.07172 null  
2025-05-12 EmoVLM-KD: Fusing Distilled Expertise with Vision-Language Models for Visual Emotion Analysis Eunil Park Team 2505.07164 null  
2025-05-11 A Vision-Language Foundation Model for Leaf Disease Identification Luyl-Da Quach Team 2505.07019 null  
2025-05-11 Hallucination-Aware Multimodal Benchmark for Gastrointestinal Image Analysis with Large Vision-Language Models Binod Bhattarai Team 2505.07001 null  
2025-05-11 UniDiffGrasp: A Unified Framework Integrating VLM Reasoning and VLM-Guided Part Diffusion for Open-Vocabulary Constrained Grasping with Dual Arms Zhenze Liu Team 2505.06832 null  
2025-05-10 STRIVE: Structured Representation Integrating VLM Reasoning for Efficient Object Navigation Jean Oh Team 2505.06729 null  
2025-05-10 METOR: A Unified Framework for Mutual Enhancement of Objects and Relationships in Open-vocabulary Video Visual Relationship Detection Shuo Yang Team 2505.06663 link  
2025-05-10 Integrating Video and Text: A Balanced Approach to Multimodal Summary Generation and Evaluation Nancy F. Chen Team 2505.06594 null  
2025-05-09 MM-Skin: Enhancing Dermatology Vision-Language Model with an Image-Text Dataset Derived from Textbooks Bo Yan Team 2505.06152 link  
2025-05-09 Leveraging Vision-Language Models for Visual Grounding and Analysis of Automotive UI Dominik Bollmann Team 2505.05895 null  
2025-05-09 Describe Anything in Medical Images Min Xu Team 2505.05804 null  
2025-05-09 3D CAVLA: Leveraging Depth and 3D Context to Generalize Vision Language Action Models for Unseen Tasks Farshad Khorrami Team 2505.05800 null  
2025-05-08 Fine-Tuning Video-Text Contrastive Model for Primate Behavior Retrieval from Unlabeled Raw Videos Nina S. T. Hirata Team 2505.05681 null  
2025-05-08 X-Transfer Attacks: Towards Super Transferable Adversarial Attacks on CLIP James Bailey Team 2505.05528 link  
2025-05-08 Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging Junxian He Team 2505.05464 link  
2025-05-08 SITE: towards Spatial Intelligence Thorough Evaluation Boqing Gong Team 2505.05456 null  
2025-05-08 DSDrive: Distilling Large Language Model for Lightweight End-to-End Autonomous Driving with Unified Reasoning and Planning Jun Ma Team 2505.05360 null  
2025-05-08 Hearing and Seeing Through CLIP: A Framework for Self-Supervised Sound Source Localization Joon Son Chung Team 2505.05343 link  
2025-05-08 Mapping User Trust in Vision Language Models: Research Landscape, Challenges, and Prospects Matteo Matteucci Team 2505.05318 null  
2025-05-08 Biomed-DPT: Dual Modality Prompt Tuning for Biomedical Vision-Language Models Meng Zhang Team 2505.05189 null  
2025-05-08 OpenworldAUC: Towards Unified Evaluation and Optimization for Open-world Prompt Tuning Qingming Huang Team 2505.05180 link  
2025-05-08 Probabilistic Embeddings for Frozen Vision-Language Models: Uncertainty Quantification with Gaussian Process Latent Variable Models Joachim Denzler Team 2505.05163 null  
2025-05-08 CacheFL: Efficient Federated Cache Model Fine-Tuning for Vision-Language Models Furao Shen Team 2505.05130 null  
2025-05-08 X-Driver: Explainable Autonomous Driving with Vision-Language Models Zengfeng Zeng Team 2505.05098 null  
2025-05-08 Image-Text Relation Prediction for Multilingual Tweets Edison Marrese-Taylor Team 2505.05040 null  
2025-05-09 G-FOCUS: Towards a Robust Method for Assessing UI Design Persuasiveness Youngjae Yu Team 2505.05026 null  
2025-05-08 Split Matching for Inductive Zero-shot Semantic Segmentation Daisuke Deguchi Team 2505.05023 null  
2025-05-08 LVLM-MPC Collaboration for Autonomous Driving: A Safety-Aware and Task-Scalable Control Architecture Tatsuya Suzuki Team 2505.04980 null  
2025-05-07 Vision-Language-Action Models: Concepts, Progress, Applications and Challenges Manoj Karkee Team 2505.04769 null  
2025-05-07 “I Can See Forever!”: Evaluating Real-time VideoLLMs for Assisting Individuals with Visual Impairments Xinlei He Team 2505.04488 null  
2025-05-07 DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception Zhuotao Tian Team 2505.04410 link  
2025-05-07 CM1 – A Dataset for Evaluating Few-Shot Information Extraction with Large Vision Language Models Gernot A. Fink Team 2505.04214 null  
2025-05-07 R^3-VQA: “Read the Room” by Video Social Reasoning Lifeng Fan Team 2505.04147 null  
2025-05-06 X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains Hoifung Poon Team 2505.03981 null  
2025-05-06 Fill the Gap: Quantifying and Reducing the Modality Gap in Image-Text Representation Learning Victor Amblard Team 2505.03703 null  
2025-05-06 Distribution-Conditional Generation: From Class Distribution to Creative Generation Xin Geng Team 2505.03667 null  
2025-05-06 Learning Unknown Spoof Prompts for Generalized Face Anti-Spoofing Using Only Real Face Images Zhenan Sun Team 2505.03611 null  
2025-05-06 Learning Knowledge-based Prompts for Robust 3D Mask Presentation Attack Detection Ming-Hsuan Yang Team 2505.03610 null  
2025-05-06 Mitigating Image Captioning Hallucinations in Vision-Language Models Xi Li Team 2505.03420 null  
2025-05-07 Enhancing Target-unspecific Tasks through a Features Matrix Jun Yu Team 2505.03414 null  
2025-05-06 Reducing Annotation Burden in Physical Activity Research Using Vision-Language Models Aiden Doherty Team 2505.03374 null  
2025-05-06 A Vision-Language Model for Focal Liver Lesion Classification Chen Yen-Wei Team 2505.03350 null  
2025-05-06 From Word to Sentence: A Large-Scale Multi-Instance Dataset for Open-Set Aerial Detection Rong Xiao Team 2505.03334 null  
2025-05-06 Seeing the Abstract: Translating the Abstract Language for Vision Language Models Yiming Wang Team 2505.03242 link  
2025-05-06 VLM Q-Learning: Aligning Vision-Language Models for Interactive Decision-Making Juan Carlos Niebles Team 2505.03181 null  
2025-05-06 Robust Fairness Vision-Language Learning for Medical Image Analysis Shu Hu Team 2505.03153 link  
2025-05-05 Adversarial Robustness Analysis of Vision-Language Models in Medical Image Segmentation Manish Dhakal Team 2505.02971 null  
2025-05-05 LISAT: Language-Instructed Segmentation Assistant for Satellite Imagery David M. Chan Team 2505.02829 null  
2025-05-05 HapticVLM: VLM-Driven Texture Recognition Aimed at Intelligent Haptic Interaction Dzmitry Tsetserukou Team 2505.02569 null  
2025-05-05 Tevatron 2.0: Unified Document Retrieval Toolkit across Scale, Language, and Modality Jimmy Lin Team 2505.02466 null  
2025-05-05 Recent Advances in Out-of-Distribution Detection with CLIP-Like Models: A Survey Songcan Chen Team 2505.02448 null  
2025-05-05 SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing Sijie Zhu Team 2505.02370 link  
2025-05-05 TeDA: Boosting Vision-Language Models for Zero-Shot 3D Object Retrieval via Testing-time Distribution Alignment Xinwei He Team 2505.02325 null  
2025-05-04 Compositional Image-Text Matching and Retrieval by Grounding Entities Jana Košecká Team 2505.02278 null  
2025-05-04 Handling Imbalanced Pseudolabels for Vision-Language Models with Concept Alignment and Confusion-Aware Calibrated Margin Xinyang Chen Team 2505.02056 null  
2025-05-04 A Comprehensive Analysis for Visual Object Hallucination in Large Vision-Language Models Xinya Du Team 2505.01958 null  
2025-05-03 PhysNav-DG: A Novel Adaptive Framework for Robust VLM-Sensor Fusion in Navigation Applications Santosh Patapati Team 2505.01881 null  
2025-05-03 Enhancing the Learning Experience: Using Vision-Language Models to Generate Questions for Educational Videos Anett Hoppe Team 2505.01790 null  
2025-05-03 An LLM-Empowered Low-Resolution Vision System for On-Device Human Behavior Understanding Guoliang Xing Team 2505.01743 null  
2025-05-03 Vision and Intention Boost Large Language Model in Long-Term Action Anticipation Yanning Zhang Team 2505.01713 null  
2025-05-03 RoBridge: A Hierarchical Architecture Bridging Cognition and Execution for General Robotic Manipulation Xiaodan Liang Team 2505.01709 null  
2025-05-03 Topology-Aware CLIP Few-Shot Learning Dazhi Huang Team 2505.01694 null  
2025-05-02 TEMPURA: Temporal Event Masked Prediction and Understanding for Reasoning in Action Jenq-Neng Hwang Team 2505.01583 null  
2025-05-02 Grounding Task Assistance with Multimodal Cues from a Single Demonstration Andrew D. Wilson Team 2505.01578 null  
2025-05-02 Dynamic Robot Tool Use with Vision Language Models Ahmed H. Qureshi Team 2505.01399 null  
2025-05-02 Evaluating Vision Language Model Adaptations for Radiology Report Generation in Low-Resource Languages Valerio Guarrasi Team 2505.01096 null  
2025-05-02 Any-to-Any Vision-Language Model for Multimodal X-ray Imaging and Radiological Report Generation Valerio Guarrasi Team 2505.01091 null  
2025-05-02 Transferable Adversarial Attacks on Black-Box Vision-Language Models Matt Fredrikson Team 2505.01050 null  
2025-04-30 Entropy Heat-Mapping: Localizing GPT-Based OCR Errors with Sliding-Window Shannon Analysis Alexei Kaltchenko Team 2505.00746 null  
2025-05-01 Robotic Visual Instruction Xianzheng Ma Team 2505.00693 null  
2025-05-01 Visual Test-time Scaling for GUI Agent Grounding Honglak Lee Team 2505.00684 null  
2025-05-01 DeCo: Task Decomposition and Skill Composition for Zero-Shot Generalization in Long-Horizon 3D Manipulation Yang Gao Team 2505.00527 null  
2025-05-01 LightEMMA: Lightweight End-to-End Multimodal Model for Autonomous Driving Henry X. Liu Team 2505.00284 null  
2025-05-01 AdCare-VLM: Leveraging Large Vision Language Model (LVLM) to Monitor Long-Term Medication Adherence and Care Tianming Liu Team 2505.00275 null  
2025-04-30 V3LMA: Visual 3D-enhanced Language Model for Autonomous Driving Markus Lienkamp Team 2505.00156 null  
2025-04-30 Detecting and Mitigating Hateful Content in Multimodal Memes with Vision-Language Models Xintao Wu Team 2505.00150 null  
2025-04-30 Investigating Zero-Shot Diagnostic Pathology in Vision-Language Models with Efficient Prompt Design Mahdi S. Hosseini Team 2505.00134 null  
2025-04-30 Early Exit and Multi Stage Knowledge Distillation in VLMs for Video Summarization Ganesh Ramakrishnan Team 2504.21831 null  
2025-04-30 Black-Box Visual Prompt Engineering for Mitigating Object Hallucination in Large Vision Language Models Lin Lee Cheong Team 2504.21559 null  
2025-04-30 RoboGround: Robotic Manipulation with Grounded Vision-Language Priors Zhou Zhao Team 2504.21530 null  
2025-04-30 Vision-Language Model-Based Semantic-Guided Imaging Biomarker for Early Lung Cancer Detection William Hsu Team 2504.21344 null  
2025-04-29 MemeBLIP2: A novel lightweight multimodal system to detect harmful memes Lisha Xu Team 2504.21226 null  
2025-04-29 GLIP-OOD: Zero-Shot Graph OOD Detection with Foundation Model Yue Zhao Team 2504.21186 null  
2025-04-29 Token-Level Prompt Mixture with Parameter-Free Routing for Federated Domain Generalization Xiaojun Chang Team 2504.21063 null  
2025-04-29 Real-Time Wayfinding Assistant for Blind and Low-Vision Users Farhan Sadaf Team 2504.20976 null  
2025-04-29 FedMVP: Federated Multi-modal Visual Prompt Tuning for Vision-Language Models Elisa Ricci Team 2504.20860 null  
2025-04-29 In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer Yi Yang Team 2504.20690 null  
2025-04-29 SpaRE: Enhancing Spatial Reasoning in Vision-Language Models with Synthetic Data Freda Shi Team 2504.20648 null  
2025-04-29 PRISM: Projection-based Reward Integration for Scene-Aware Real-to-Sim-to-Real Transfer with Few Demonstrations Xuguang Lan Team 2504.20520 null  
2025-04-29 Antidote: A Unified Framework for Mitigating LVLM Hallucinations in Counterfactual Presupposition and Object Perception Xiaoqiang Li Team 2504.20468 null  
2025-04-29 Plant Disease Detection through Multimodal Large Language Models and Convolutional Neural Networks Dimitrios K. Nasiopoulos Team 2504.20419 null  
2025-04-29 FiLA-Video: Spatio-Temporal Compression for Fine-Grained Long Video Understanding Bo Zheng Team 2504.20384 null  
2025-04-28 A Multimodal Pipeline for Clinical Data Extraction: Applying Vision-Language Models to Scans of Transfusion Reaction Reports Christoph M. Friedrich Team 2504.20220 null  
2025-04-28 Weaving Context Across Images: Improving Vision-Language Models through Focus-Centric Visual Chains Rui Yan Team 2504.20199 null  
2025-04-28 SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning Alan Yuille Team 2504.20024 null  
2025-04-28 EcoWikiRS: Learning Ecological Representation of Satellite Images from Weak Supervision with Species Observations and Wikipedia Diego Marcos Team 2504.19742 null  
2025-04-28 Contrastive Language-Image Learning with Augmented Textual Prompts for 3D/4D FER Using Vision-Language Model Guoying Zhao Team 2504.19739 null  
2025-04-28 VCM: Vision Concept Modeling Based on Implicit Contrastive Learning with Vision-Language Instruction Fine-Tuning Xiaobo Xia Team 2504.19627 null  
2025-04-28 LR-IAD: Mask-Free Industrial Anomaly Detection with Logical Reasoning Aimin Yang Team 2504.19524 null  
2025-04-27 DeepSPG: Exploring Deep Semantic Prior Guidance for Low-light Image Enhancement with Multimodal Learning Shini Han Team 2504.19127 null  
2025-04-27 Boosting Single-domain Generalized Object Detection via Vision-Language Knowledge Interaction Jian Liu Team 2504.19086 null  
2025-04-26 Multi-Resolution Pathology-Language Pre-training Model with Text-Guided Visual Representation Arif Mahmood Team 2504.18856 null  
2025-04-26 Video CLIP Model for Multi-View Echocardiography Interpretation Norihiko Takeda Team 2504.18800 null  
2025-04-25 A Review of 3D Object Detection with Vision-Language Models Manoj Karkee Team 2504.18738 null  
2025-04-25 Proof-of-TBI – Fine-Tuned Vision Language Model Consortium and OpenAI-o3 Reasoning LLM-Based Medical Diagnosis Support System for Mild Traumatic Brain Injury (TBI) Prediction Donna Broshek Team 2504.18671 null  
2025-04-25 Generalization Capability for Imitation Learning Yixiao Wang Team 2504.18538 null  
2025-04-25 Fast-Slow Thinking for Large Vision-Language Model Reasoning Fei Wu Team 2504.18458 null  
2025-04-25 Reason Like a Radiologist: Chain-of-Thought and Reinforcement Learning for Verifiable Report Generation Guang Yang Team 2504.18453 null  
2025-04-25 Revisiting Data Auditing in Large Vision-Language Models Zhuosheng Zhang Team 2504.18349 null  
2025-04-25 A Large Vision-Language Model based Environment Perception System for Visually Impaired People Shiguo Lian Team 2504.18027 null  
2025-04-24 CAMU: Context Augmentation for Meme Understanding Aditya Joshi Team 2504.17902 null  
2025-04-24 FashionM3: Multimodal, Multitask, and Multiround Fashion Assistant based on Unified Vision-Language Model Waikeung Wong Team 2504.17826 null  
2025-04-25 Data-Driven Calibration of Prediction Sets in Large Vision-Language Models Based on Inductive Conformal Prediction Weiyan Wen Team 2504.17671 null  
2025-04-24 SDVPT: Semantic-Driven Visual Prompt Tuning for Open-World Object Counting Qingming Huang Team 2504.17395 null  
2025-04-24 M-MRE: Extending the Mutual Reinforcement Effect to Multimodal Information Extraction Tatsunori Mori Team 2504.17353 null  
2025-04-24 DIMT25@ICDAR2025: HW-TSC’s End-to-End Document Image Machine Translation System Leveraging Large Vision-Language Model Hao Yang Team 2504.17315 null  
2025-04-24 Cracking the Code of Action: a Generative Approach to Affordances for Reinforcement Learning Khimya Khetarpal Team 2504.17282 null  
2025-04-24 Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation Minhyuk Sung Team 2504.17207 null  
2025-04-23 Distilling semantically aware orders for autoregressive image generation Marco Pedersoli Team 2504.17069 null  
2025-04-23 DyMU: Dynamic Merging and Virtual Unmerging for Efficient VLMs Ran Xu Team 2504.17040 null  
2025-04-24 V$^2$R-Bench: Holistically Evaluating LVLM Robustness to Fundamental Visual Variations Yi R. Fung Team 2504.16727 null  
2025-04-23 Streetscape Analysis with Generative AI (SAGAI): Vision-Language Assessment and Mapping of Urban Scenes Giovanni Fusco Team 2504.16538 null  
2025-04-23 TraveLLaMA: Facilitating Multi-modal Large Language Models to Understand Urban Scenes and Provide Travel Assistance Jiaya Jia Team 2504.16505 null  
2025-04-23 FrogDogNet: Fourier frequency Retained visual prompt Output Guidance for Domain Generalization of CLIP in Remote Sensing Biplab Banerjee Team 2504.16433 null  
2025-04-22 CLIP-IT: CLIP-based Pairing for Histology Images Classification Eric Granger Team 2504.16181 null  
2025-04-22 MMInference: Accelerating Pre-filling for Long-Context VLMs via Modality-Aware Permutation Sparse Attention Lili Qiu Team 2504.16083 null  
2025-04-22 MR. Video: “MapReduce” is the Principle for Long Video Understanding Yu-Xiong Wang Team 2504.16082 null  
2025-04-22 Describe Anything: Detailed Localized Image and Video Captioning Yin Cui Team 2504.16072 null  
2025-04-22 Vision language models are unreliable at trivial spatial cognition J. Gregory Trafton Team 2504.16061 null  
2025-04-22 Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation Joyce Chai Team 2504.16060 null  
2025-04-22 Evaluating Vision Language Models (VLMs) for Radiology: A Comprehensive Analysis Judy Gichoya Team 2504.16047 null  
2025-04-22 LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale Mike Zheng Shou Team 2504.16030 null  
2025-04-24 Meta-Entity Driven Triplet Mining for Aligning Medical Vision-Language Models Tolga Çukur Team 2504.15929 null  
2025-04-21 CAPTURe: Evaluating Spatial Reasoning in Vision Language Models via Occluded Object Counting Mohit Bansal Team 2504.15485 null  
2025-04-21 Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models Guilin Liu Team 2504.15271 null  
2025-04-21 KGMEL: Knowledge Graph-Enhanced Multimodal Entity Linking Kijung Shin Team 2504.15135 link  
2025-04-21 Benchmarking Large Vision-Language Models on Fine-Grained Image Tasks: A Comprehensive Evaluation Serge Belongie Team 2504.14988 link  
2025-04-21 VLM as Policy: Common-Law Content Moderation Framework for Short Video Platform Kun Gai Team 2504.14904 null  
2025-04-21 Object-Level Verbalized Confidence Calibration in Vision-Language Models via Semantic Perturbation Yunji Chen Team 2504.14848 null  
2025-04-20 OmniV-Med: Scaling Medical Vision-Language Model for Universal Visual Understanding Zuozhu Liu Team 2504.14692 null  
2025-04-20 NVSMask3D: Hard Visual Prompting with Camera Pose Interpolation for 3D Open Vocabulary Instance Segmentation Juho Kannala Team 2504.14638 null  
2025-04-20 LGD: Leveraging Generative Descriptions for Zero-Shot Referring Image Segmentation Yongsheng Gao Team 2504.14467 null  
2025-04-20 Neglected Risks: The Disturbing Reality of Children’s Images in Datasets and the Urgent Call for Accountability Sandra Avila Team 2504.14446 null  
2025-04-19 Hydra: An Agentic Reasoning Approach for Enhancing Adversarial Robustness and Mitigating Hallucinations in Vision-Language Models Nathaniel D. Bastian Team 2504.14395 null  
2025-04-19 How Well Can General Vision-Language Models Learn Medicine By Watching Public Educational Videos? James Zou Team 2504.14391 null  
2025-04-19 A Multimodal Recaptioning Framework to Account for Perceptual Diversity in Multilingual Vision-Language Modeling Adriana Kovashka Team 2504.14359 null  
2025-04-19 Diffusion-based Dynamic Contract for Federated AI Agent Construction in Mobile Metaverses Chau Yuen Team 2504.14326 null  
2025-04-19 Enhancing Multimodal In-Context Learning for Image Classification through Coreset Optimization Xu Yang Team 2504.14200 null  
2025-04-19 Bayesian Principles Improve Prompt Learning In Vision-Language Models Mijung Park Team 2504.14123 null  
2025-04-19 PEFT A2Z: Parameter-Efficient Fine-Tuning Survey for Large Language and Vision Models Ozlem Ozmen Garibay Team 2504.14117 null  
2025-04-21 Analysing the Robustness of Vision-Language-Models to Common Corruptions Umair Bin Mansoor Team 2504.13690 null  
2025-04-18 EyecareGPT: Boosting Comprehensive Ophthalmology Understanding with Tailored Dataset, Benchmark and Model Beng Chin Ooi Team 2504.13650 link  
2025-04-18 PV-VLM: A Multimodal Vision-Language Approach Incorporating Sky Images for Intra-Hour Photovoltaic Power Forecasting Miao Yu Team 2504.13624 null  
2025-04-18 Chain-of-Thought Textual Reasoning for Few-shot Temporal Action Localization Huadong Ma Team 2504.13460 null  
2025-04-18 Towards a Multi-Agent Vision-Language System for Zero-Shot Novel Hazardous Object Detection for Autonomous Driving Safety Ross Greer Team 2504.13399 null  
2025-04-17 VLLFL: A Vision-Language Model Based Lightweight Federated Learning Framework for Smart Agriculture Yanbo Huang Team 2504.13365 null  
2025-04-17 Chain-of-Modality: Learning Manipulation Programs from Multimodal Human Videos with Vision-Language-Models Jacky Liang Team 2504.13351 null  
2025-04-17 WildFireCan-MMD: A Multimodal dataset for Classification of User-generated Content During Wildfires in Canada Marzieh Amini Team 2504.13231 null  
2025-04-17 PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding Christoph Feichtenhofer Team 2504.13180 null  
2025-04-17 Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling David M. Chan Team 2504.13169 link  
2025-04-17 Low-hallucination Synthetic Captions for Large-Scale Vision-Language Model Pre-training Zhanhui Kang Team 2504.13123 null  
2025-04-17 Probing and Inducing Combinational Creativity in Vision-Language Models Zilong Zheng Team 2504.13120 null  
2025-04-17 Object-Driven Narrative in AR: A Scenario-Metaphor Framework with VLM Integration Yong Hong Kuo Team 2504.13119 null  
2025-04-17 Early Accessibility: Automating Alt-Text Generation for UI Icons During App Development Christoph Csallner Team 2504.13069 null  
2025-04-17 NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation Michael Qizhe Shieh Team 2504.13055 null  
2025-04-17 Embodied-R: Collaborative Framework for Activating Embodied Spatial Reasoning in Foundation Models via Reinforcement Learning Wenwu Zhu Team 2504.12680 link  
2025-04-17 VLMGuard-R1: Proactive Safety Alignment for VLMs via Reasoning-Driven Prompt Optimization Siheng Chen Team 2504.12661 null  
2025-04-16 Sparsity Outperforms Low-Rank Projections in Few-Shot Adaptation Éric Granger Team 2504.12436 link  
2025-04-16 FLIP Reasoning Challenge Roger Wattenhofer Team 2504.12256 null  
2025-04-16 Efficient Contrastive Decoding with Probabilistic Hallucination Detection - Mitigating Hallucinations in Large Vision Language Models - Hanno Gottschalk Team 2504.12137 null  
2025-04-17 Securing the Skies: A Comprehensive Survey on Anti-UAV Methods, Benchmarking, and Future Directions Zhi-Qi Cheng Team 2504.11967 null  
2025-04-16 Beyond Words: Augmenting Discriminative Richness via Diffusions in Unsupervised Prompt Learning Yi Chang Team 2504.11930 null  
2025-04-16 A Visual RAG Pipeline for Few-Shot Fine-Grained Product Classification Janis Keuper Team 2504.11838 null  
2025-04-17 DVLTA-VQA: Decoupled Vision-Language Modeling with Text-Guided Adaptation for Blind Video Quality Assessment Moncef Gabbouj Team 2504.11733 null  
2025-04-16 Interpreting the Linear Structure of Vision-language Model Embedding Spaces Stephanie Gil Team 2504.11695 null  
2025-04-16 VLM-Fuzz: Vision Language Model Assisted Recursive Depth-first Search Exploration for Effective UI Testing of Android Apps Mariano Ceccato Team 2504.11675 null  
2025-04-15 Co-STAR: Collaborative Curriculum Self-Training with Adaptive Regularization for Source-Free Video Domain Adaptation Majid Mirmehdi Team 2504.11669 null  
2025-04-17 PATFinger: Prompt-Adapted Transferable Fingerprinting against Unauthorized Multimodal Dataset Usage Lina Wang Team 2504.11509 null  
2025-04-15 From Gaze to Insight: Bridging Human Visual Attention and Vision Language Model Explanation for Weakly-Supervised Medical Image Segmentation Jungong Han Team 2504.11368 null  
2025-04-17 UI-E2I-Synth: Advancing GUI Grounding with Large-Scale Instruction Synthesis Yan Lu Team 2504.11257 null  
2025-04-15 R-TPT: Improving Adversarial Robustness of Vision-Language Models through Test-Time Prompt Tuning Ran He Team 2504.11195 null  
2025-06-30 Benchmarking Vision Language Models on German Factual Data Vincent Tischler Team 2504.11108 null  
2025-04-16 Consensus Entropy: Harnessing Multi-VLM Agreement for Self-Verifying and Self-Improving OCR Gongshen Liu Team 2504.11101 null  
2025-04-15 QAVA: Query-Agnostic Visual Attack to Large Vision-Language Models Yu Wang Team 2504.11038 null  
2025-04-15 Can Vision-Language Models Understand and Interpret Dynamic Gestures from Pedestrians? Pilot Datasets and Exploration Towards Instructive Nonverbal Commands for Cooperative Autonomous Vehicles Ross Greer Team 2504.10873 null  
2025-04-15 LVLM_CSP: Accelerating Large Vision Language Models via Clustering, Scattering, and Pruning for Reasoning Segmentation Mohsen Imani Team 2504.10854 null  
2025-04-15 Enhancing Features in Long-tailed Data Using Large Vision Model Xuesong Li Team 2504.10852 null  
2025-04-14 ReasonDrive: Efficient Visual Question Answering for Autonomous Vehicles with Reasoning-Enhanced Small Vision-Language Models Lifeng Zhou Team 2504.10757 null  
2025-04-14 AgMMU: A Comprehensive Agricultural Multimodal Understanding and Reasoning Benchmark Yu-Xiong Wang Team 2504.10568 null  
2025-04-14 Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding Jiashi Feng Team 2504.10465 null  
2025-04-15 GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents Run Luo Team 2504.10458 null  
2025-04-14 SlowFastVAD: Video Anomaly Detection via Integrating Simple Detector and RAG-Enhanced Vision-Language Model Yanning Zhang Team 2504.10320 null  
2025-04-15 Breaking the Data Barrier – Building GUI Agents Through Task Generalization Junxian He Team 2504.10127 null  
2025-04-14 AGO: Adaptive Grounding for Open World 3D Occupancy Prediction Andreas Zell Team 2504.10117 null  
2025-04-14 CameraBench: Benchmarking Visual Reasoning in MLLMs via Photography Jun-Cheng Chen Team 2504.10090 null  
2025-04-14 Summarization of Multimodal Presentations with Vision-Language Models: Study of the Effect of Modalities and Structure Frédéric Dufaux Team 2504.10049 null  
2025-04-14 Aligning Anime Video Generation with Human Feedback Zuxuan Wu Team 2504.10044 null  
2025-04-14 KeyMPs: One-Shot Vision-Language Guided Motion Generation by Sequencing DMPs for Occlusion-Rich Tasks Takamitsu Matsubara Team 2504.10011 null  
2025-04-14 GenTe: Generative Real-world Terrains for General Legged Robot Locomotion Control Xiaoqiang Ji Team 2504.09997 null  
2025-04-14 Resampling Benchmark for Efficient Comprehensive Evaluation of Large Vision-Language Models Keisuke Ozawa Team 2504.09979 null  
2025-04-14 Can VLMs Assess Similarity Between Graph Visualizations? Jinwook Seo Team 2504.09859 null  
2025-04-14 VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents Jun Suzuki Team 2504.09795 null  
2025-04-13 A Survey on Efficient Vision-Language Models Nirmalya Roy Team 2504.09724 null  
2025-04-13 Metropolis-Hastings Captioning Game: Knowledge Fusion of Vision Language Models via Decentralized Bayesian Inference Tadahiro Taniguchi Team 2504.09620 null  
2025-04-13 DualPrompt-MedCap: A Dual-Prompt Enhanced Approach for Medical Image Captioning Mukesh Prasad Team 2504.09598 null  
2025-04-15 Vision-Language Model for Object Detection and Segmentation: A Review and Evaluation Yunhong Wang Team 2504.09480 null  
2025-04-13 Identity-Aware Vision-Language Model for Explainable Face Forgery Detection Yu-Gang Jiang Team 2504.09439 null  
2025-04-13 BabyVLM: Data-Efficient Pretraining of VLMs Inspired by Infant Learning Boqing Gong Team 2504.09426 null  
2025-04-12 PathVLM-R1: A Reinforcement Learning-Driven Reasoning Model for Pathology Visual-Language Tasks Yang Liu Team 2504.09258 null  
2025-04-11 AstroLLaVA: towards the unification of astronomical data and natural language Dimitrios Tanoglidis Team 2504.08583 null  
2025-04-11 EO-VLM: VLM-Guided Energy Overload Attacks on Vision Models Jinwoo Kim Team 2504.08205 null  
2025-04-10 Investigating Vision-Language Model for Point Cloud-based Vehicle Classification Camille Kamga Team 2504.08154 null  
2025-04-10 The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search David Ha Team 2504.08066 null  
2025-04-10 VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning Feng Zhao Team 2504.07956 null  
2025-04-10 SAMJAM: Zero-Shot Video Scene Graph Generation for Egocentric Kitchen Videos Yuhao Chen Team 2504.07867 null  
2025-04-10 CollEX – A Multimodal Agentic RAG System Enabling Interactive Exploration of Scientific Collections Chris Biemann Team 2504.07643 null  
2025-04-10 VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model Tiancheng Zhao Team 2504.07615 link  
2025-04-10 TokenFocus-VQA: Enhancing Text-to-Image Alignment with Position-Aware Focus and Multi-Perspective Aggregations on LVLMs Xuezhi Cao Team 2504.07556 null  
2025-04-10 Why We Feel: Breaking Boundaries in Emotional Reasoning with Multimodal Large Language Models Xian-Sheng Hua Team 2504.07521 link  
2025-04-10 Kimi-VL Technical Report Ziwei Chen Team 2504.07491 link  
2025-04-09 Perception in Reflection Vishal M. Patel Team 2504.07165 null  
2025-04-09 Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation Marzieh Fadaee Team 2504.07072 null  
2025-04-09 Are Vision-Language Models Ready for Dietary Assessment? Exploring the Next Frontier in AI-Powered Food Image Recognition Aythami Morales Team 2504.06925 null  
2025-04-09 MovSAM: A Single-image Moving Object Segmentation Framework Based on Deep Thinking Hesheng Wang Team 2504.06863 null  
2025-04-09 ZIP: An Efficient Zeroth-order Prompt Tuning for Black-box Vision-Language Models Namhoon Lee Team 2504.06838 null  
2025-04-09 LVC: A Lightweight Compression Framework for Enhancing VLMs in Long Video Understanding Bo XU Team 2504.06835 null  
2025-04-08 PromptHMR: Promptable Human Mesh Recovery Muhammed Kocabas Team 2504.06397 null  
2025-04-08 SemiDAViL: Semi-supervised Domain Adaptation with Vision-Language Guidance for Semantic Segmentation Zhaozheng Yin Team 2504.06389 null  
2025-04-08 OmniSVG: A Unified Scalable Vector Graphics Generation Model Yu-Gang Jiang Team 2504.06263 null  
2025-04-08 Latent Multimodal Reconstruction for Misinformation Detection Panagiotis C. Petrantonakis Team 2504.06010 link  
2025-04-08 Measuring Déjà vu Memorization Efficiently Kamalika Chaudhuri Team 2504.05651 null  
2025-04-08 A Lightweight Large Vision-language Model for Multimodal Medical Images Navid Toosy Saidy Team 2504.05575 null  
2025-04-10 ChartQAPro: A More Diverse and Challenging Benchmark for Chart Question Answering Shafiq Joty Team 2504.05506 null  
2025-04-07 Trust Through Transparency: Explainable Social Navigation for Autonomous Mobile Robots via Vision-Language Models Aliasghar Arab Team 2504.05477 null  
2025-04-07 Taxonomy-Aware Evaluation of Vision-Language Models Stella Frank Team 2504.05457 null  
2025-04-07 Probing the Visualization Literacy of Vision Language Models: the Good, the Bad, and the Ugly Anamaria Crisan Team 2504.05445 null  
2025-04-07 InteractVLM: 3D Interaction Reasoning from 2D Foundational Models Dimitrios Tzionas Team 2504.05303 null  
2025-04-07 SmolVLM: Redefining small and efficient multimodal models Thomas Wolf Team 2504.05299 null  
2025-04-07 A Reality Check of Vision-Language Pre-training in Radiology: Have We Progressed Using Text? Ismail Ben Ayed Team 2504.05227 null  
2025-04-07 Vision-Language Model Predictive Control for Manipulation Planning and Trajectory Generation Wei Zhang Team 2504.05225 null  
2025-04-08 A Taxonomy of Self-Handover Katsushi Ikeuchi Team 2504.04939 null  
2025-04-07 SCAM: A Real-World Typographic Robustness Evaluation for Multimodal Foundation Models Lorenz Hufe Team 2504.04893 null  
2025-04-07 Don’t Lag, RAG: Training-Free Adversarial Detection Using RAG Ofer Hadar Team 2504.04858 null  
2025-04-07 OCC-MLLM-CoT-Alpha: Towards Multi-stage Occlusion Recognition Based on Large Language Models via 3D-Aware Supervision and Chain-of-Thoughts Guidance Xinhan Di Team 2504.04781 null  
2025-04-07 Feedback-Enhanced Hallucination-Resistant Vision-Language Model for Real-Time Scene Understanding Zahir Alsulaimawi Team 2504.04772 null  
2025-04-07 Grounding 3D Object Affordance with Language Instructions, Visual Observations and Interactions Yue Wang Team 2504.04744 null  
2025-04-07 Enhancing Compositional Reasoning in Vision-Language Models with Synthetic Preference Data Venkatesh Saligrama Team 2504.04740 null  
2025-04-06 M2IV: Towards Efficient and Fine-grained Multimodal In-Context Learning in Large Vision-Language Models Ruixiang Tang Team 2504.04633 null  
2025-04-06 Foundation Models for Software Engineering of Cyber-Physical Systems: the Road Ahead Shaukat Ali Team 2504.04630 null  
2025-04-06 Enhance Then Search: An Augmentation-Search Strategy with Foundation Models for Cross-Domain Few-Shot Object Detection Xiaomeng Huang Team 2504.04517 link  
2025-04-06 OmniDrive: A Holistic Vision-Language Dataset for Autonomous Driving with Counterfactual Reasoning Jose M. Alvarez Team 2504.04348 null  
2025-04-06 MedM-VL: What Makes a Good Medical LVLM? Ji Wu Team 2504.04323 null  
2025-04-05 GROVE: A Generalized Reward for Learning Open-Vocabulary Physical Skill Siyuan Huang Team 2504.04191 null  
2025-04-05 LATTE: Lightweight Attention-based Traffic Accident Anticipation Engine Zhenning Li Team 2504.04103 null  
2025-04-05 TARAC: Mitigating Hallucination in LVLMs via Temporal Attention Real-time Accumulative Connection Xiaohua Xu Team 2504.04099 null  
2025-04-04 VideoComp: Advancing Fine-Grained Compositional and Temporal Alignment in Video-Text Models Anelia Angelova Team 2504.03970 null  
2025-04-04 Know What You do Not Know: Verbalized Uncertainty Estimation Robustness on Corrupted Images in Vision-Language Models Matias Valdenegro-Toro Team 2504.03440 null  
2025-04-04 SARLANG-1M: A Benchmark for Vision-Language Modeling in SAR Image Understanding Naoto Yokoya Team 2504.03254 null  
2025-04-04 Seeing is Believing: Belief-Space Planning with Foundation Models as Uncertainty Estimators Lawson L. S. Wong Team 2504.03245 null  
2025-04-04 Mamba as a Bridge: Where Vision Foundation Models Meet Vision Language Models for Domain-Generalized Semantic Segmentation Robby T. Tan Team 2504.03193 null  
2025-04-04 NuScenes-SpatialQA: A Spatial Understanding and Reasoning Benchmark for Vision-Language Models in Autonomous Driving Zhengzhong Tu Team 2504.03164 null  
2025-04-04 TokenFLEX: Unified VLM Training for Flexible Visual Tokens Inference Xianpeng Lang Team 2504.03154 null  
2025-04-04 MORAL: A Multimodal Reinforcement Learning Framework for Decision Making in Autonomous Laboratories Arvind Ramanathan Team 2504.03153 null  
2025-04-03 QID: Efficient Query-Informed ViTs in Data-Scarce Regimes for OCR-free Visual Document Understanding Bryan Wang Team 2504.02971 null  
2025-04-03 STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection Naoufel Werghi Team 2504.02823 null  
2025-04-03 Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models Zeynep Akata Team 2504.02821 null  
2025-04-03 Systematic Evaluation of Large Vision-Language Models for Surgical Artificial Intelligence Serena Yeung-Levy Team 2504.02799 null  
2025-04-03 Robot-Led Vision Language Model Wellbeing Assessment of Children Hatice Gunes Team 2504.02765 null  
2025-04-04 Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme Pengfei Liu Team 2504.02587 null  
2025-04-03 Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision Shibiao Xu Team 2504.02477 null  
2025-04-03 Scaling Video-Language Models to 10K Frames via Hierarchical Differential Distillation Rui Yan Team 2504.02438 null  
2025-04-03 ReuseDroid: A VLM-empowered Android UI Test Migrator Boosted by Active Feedback Hailong Wang Team 2504.02357 null  
2025-04-03 Large (Vision) Language Models are Unsupervised In-Context Learners Maria Brbic Team 2504.02349 link  
2025-04-03 Re-thinking Temporal Search for Long-Form Video Understanding Manling Li Team 2504.02259 null  
2025-04-03 SocialGesture: Delving into Multi-person Gesture Understanding James M. Rehg Team 2504.02244 null  
2025-04-02 FineLIP: Extending CLIP’s Reach via Fine-Grained Alignment with Longer Text Inputs Fatima Albreiki Team 2504.01916 link  
2025-04-02 Is Temporal Prompting All We Need For Limited Labeled Action Recognition? Xiaobo Jin Team 2504.01890 null  
2025-04-02 Prompting Medical Vision-Language Models to Mitigate Diagnosis Bias by Generating Realistic Dermoscopic Images Abdullah-Al-Zubaer Imran Team 2504.01838 link  
2025-04-02 BlenderGym: Benchmarking Foundational Model Systems for Graphics Editing Leonidas Guibas Team 2504.01786 null  
2025-04-02 AdPO: Enhancing the Adversarial Robustness of Large Vision-Language Models with Preference Optimization Linli Xu Team 2504.01735 null  
2025-04-02 Reasoning LLMs for User-Aware Multimodal Conversational Agents Mohamed Chetouani Team 2504.01700 null  
2025-04-02 CLIP-SLA: Parameter-Efficient CLIP Adaptation for Continuous Sign Language Recognition Hamzah Luqman Team 2504.01666 link  
2025-04-02 BioAtt: Anatomical Prior Driven Low-Dose CT Denoising UiHyun Cho Team 2504.01662 null  
2025-04-02 Text Speaks Louder than Vision: ASCII Art Reveals Textual Biases in Vision-Language Models Ming-Hsuan Yang Team 2504.01589 null  
2025-03-25 Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning Jinqiao Wang Team 2503.18013 link  
2024-12-02 VARCO-VISION: Expanding Frontiers in Korean Vision-Language Models Youngjune Kim Team 2411.19103 link  
2025-05-19 Evaluating Vision-Language Models as Evaluators in Path Planning Ziyu Yao Team 2411.18711 null  
2025-03-11 ScVLM: Enhancing Vision-Language Model for Safety-Critical Event Understanding Feng Guo Team 2410.00982 null  
2024-09-24 Behavioral Bias of Vision-Language Models: A Behavioral Finance View Ming-Chang Chiu Team 2409.15256 null  
2024-08-01 Vision-Language Model Based Handwriting Verification Sargur Srihari Team 2407.21788 null  
2025-10-15 Exploring the Frontier of Vision-Language Models: A Survey of Current Methodologies and Future Directions Aman Chadha Team 2404.07214 null  
2024-05-10 Multi-Frame, Lightweight & Efficient Vision-Language Models for Question Answering in Autonomous Driving Mohan Trivedi Team 2403.19838 null  
2024-01-17 Evaluation and Enhancement of Semantic Grounding in Large Vision-Language Models Jie Yang Team 2309.04041 null  
2023-10-13 Distilling Large Vision-Language Model with Out-of-Distribution Generalizability Hao Su Team 2307.03135 link  
2023-06-16 LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models Ping Luo Team 2306.09265 null  
2023-11-08 Investigating the Role of Attribute Context in Vision-Language Models for Object Recognition and Detection Adriana Kovashka Team 2303.10093 null  
2024-04-19 VLP: A Survey on Vision-Language Pre-training Bo Xu Team 2202.09061 null  
2022-10-07 Learning to Prompt for Vision-Language Models Ziwei Liu Team 2109.01134 null  

VLA

Publish Date Title Authors PDF Code
2025-11-20 InternData-A1: Pioneering High-Fidelity Synthetic Data for Pre-training Generalist Policy Jiangmiao Pang Team 2511.16651 null
2025-11-20 VLA-Pruner: Temporal-Aware Dual-Level Visual Token Pruning for Efficient Vision-Language-Action Inference Bo Zhao Team 2511.16449 null
2025-11-20 FT-NCFM: An Influence-Aware Data Distillation Framework for Efficient VLA Models Mingsheng Shang Team 2511.16233 null
2025-11-20 When Alignment Fails: Multimodal Adversarial Attacks on Vision-Language-Action Models Yaochu Jin Team 2511.16203 null
2025-11-20 Mantis: A Versatile Vision-Language-Action Model with Disentangled Visual Foresight Zhijie Deng Team 2511.16175 null
2025-11-20 EvoVLA: Self-Evolving Vision-Language-Action Model Hao Tang Team 2511.16166 null
2025-11-19 SRPO: Self-Referential Policy Optimization for Vision-Language-Action Models Xipeng Qiu Team 2511.15605 null
2025-11-19 Zero-Shot Open-Vocabulary Human Motion Grounding with Test-Time Training Jianfei Yang Team 2511.15379 null
2025-11-19 Look, Zoom, Understand: The Robotic Eyeball for Embodied Perception Wenzhao Lian Team 2511.15279 null
2025-11-19 Generating Natural-Language Surgical Feedback: From Structured Representation to Domain-Grounded Evaluation Andrew J. Hung Team 2511.15159 null
2025-11-19 $π^{*}_{0.6}$: a VLA That Learns From Experience Zhiyuan Zhou Team 2511.14759 null
2025-11-18 NORA-1.5: A Vision-Language-Action Model Trained using World Model- and Action-based Preference Rewards Soujanya Poria Team 2511.14659 link
2025-11-18 Enhancing End-to-End Autonomous Driving with Risk Semantic Distillation from VLM Siyuan Cheng Team 2511.14499 null
2025-11-18 Continuous Vision-Language-Action Co-Learning with Semantic-Physical Alignment for Behavioral Cloning Hongpeng Wang Team 2511.14396 link
2025-11-18 Towards Deploying VLA without Fine-Tuning: Plug-and-Play Inference-Time VLA Policy Steering via Embodied Evolutionary Diffusion Fei Chen Team 2511.14178 null
2025-11-19 RoboTidy: A 3D Gaussian Splatting Household Tidying Benchmark for Embodied Navigation and Action Jiayu Chen Team 2511.14161 null
2025-11-18 MVI-Bench: A Comprehensive Benchmark for Evaluating Robustness to Misleading Visual Inputs in LVLMs Lu Cheng Team 2511.14159 null
2025-11-18 AsyncVLA: Asynchronous Flow Matching for Vision-Language-Action Models Biqing Qi Team 2511.14148 null
2025-11-18 Multi-view Phase-aware Pedestrian-Vehicle Incident Reasoning Framework with Vision-Language Models Jidong J. Yang Team 2511.14120 null
2025-11-19 Searching in Space and Time: Unified Memory-Action Loops for Open-World Object Retrieval Roberto Martín-Martín Team 2511.14004 null
2025-11-17 TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models Ying-Cong Chen Team 2511.13704 link
2025-11-17 Yanyun-3: Enabling Cross-Platform Strategy Game Operation with Vision-Language Models Yuxiang Sun Team 2511.12937 null
2025-11-18 Uni-Hand: Universal Hand Motion Forecasting in Egocentric Views Hesheng Wang Team 2511.12878 null
2025-11-16 BridgeEQA: Virtual Embodied Agents for Real Bridge Inspections Vedhus Hoskere Team 2511.12676 null
2025-11-16 RoboAfford++: A Generative AI-Enhanced Dataset for Multimodal Affordance Learning in Robotic Manipulation and Navigation Long Chen Team 2511.12436 null
2025-11-16 VLA-R: Vision-Language Action Retrieval toward Open-World End-to-End Autonomous Driving David Hyunchul Shim Team 2511.12405 null
2025-11-15 AttackVLA: Benchmarking Adversarial and Backdoor Attacks on Vision-Language-Action Models Yu-Gang Jiang Team 2511.12149 null
2025-11-15 Decoupled Action Head: Confining Task Knowledge to Conditioning Layers Qi Wu Team 2511.12101 null
2025-11-18 Rethinking Progression of Memory State in Robotic Manipulation: An Object-Centric Perspective Ngan Le Team 2511.11478 null
2025-11-14 EcoAlign: An Economically Rational Framework for Efficient LVLM Alignment Hongyi Zhang Team 2511.11301 null
2025-11-14 Experiences from Benchmarking Vision-Language-Action Models for Robotic Manipulation Xi Zheng Team 2511.11298 null
2025-11-14 AdaptPNP: Integrating Prehensile and Non-Prehensile Skills for Adaptive Robotic Manipulation Lin Shao Team 2511.11052 null
2025-11-14 DEFT-LLM: Disentangled Expert Feature Tuning for Micro-Expression Recognition Jianqin Yin Team 2511.10948 null
2025-11-13 Towards Blind and Low-Vision Accessibility of Lightweight VLMs and Custom LLM-Evals Pawan Goyal Team 2511.10615 null
2025-11-14 OmniVGGT: Omni-Modality Driven Visual Geometry Grounded Transformer Ziwei Liu Team 2511.10560 link
2025-11-13 SemanticVLA: Semantic-Aligned Sparsification and Enhancement for Efficient Robotic Manipulation Liqiang Nie Team 2511.10518 link
2025-11-13 Facial-R1: Aligning Reasoning and Recognition for Facial Emotion Analysis Min Cao Team 2511.10254 null
2025-11-13 SUGAR: Learning Skeleton Representation with Visual-Motion Knowledge for Action Recognition Zitong Yu Team 2511.10091 null
2025-11-13 Phantom Menace: Exploring and Enhancing the Robustness of VLA Models against Physical Sensor Attacks Wenyuan Xu Team 2511.10008 null
2025-11-13 Audio-VLA: Adding Contact Audio Perception to Vision-Language-Action Model for Robotic Manipulation Changbo Wang Team 2511.09958 null
2025-11-12 MAP-VLA: Memory-Augmented Prompting for Vision-Language-Action Model in Robotic Manipulation Ziwei Wang Team 2511.09516 null
2025-11-12 WMPO: World Model-based Policy Optimization for Vision-Language-Action Models Song Guo Team 2511.09515 link
2025-11-12 Think, Remember, Navigate: Zero-Shot Object-Goal Navigation with VLM-Powered Reasoning Fatemeh Afghah Team 2511.08942 null
2025-11-12 Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds Guang Shi Team 2511.08892 null
2025-11-12 MirrorLimb: Implementing hand pose acquisition and robot teleoperation based on RealMirror Tao Shen Team 2511.08865 null
2025-11-11 SONIC: Supersizing Motion Tracking for Natural Humanoid Whole-Body Control Yuke Zhu Team 2511.07820 link
2025-11-11 ViPRA: Video Prediction for Robot Actions Deepak Pathak Team 2511.07732 link
2025-11-11 LLM-GROP: Visually Grounded Robot Task and Motion Planning with Large Language Models Shiqi Zhang Team 2511.07727 null
2025-11-10 How Do VLAs Effectively Inherit from VLMs? Jiang Bian Team 2511.06619 null
2025-11-09 ExpReS-VLA: Specializing Vision-Language-Action Models Through Experience Replay and Retrieval Jeff Ichnowski Team 2511.06202 null
2025-11-09 OpenVLN: Open-world aerial Vision-Language Navigation Yang Cong Team 2511.06182 null
2025-11-08 10 Open Challenges Steering the Future of Vision-Language-Action Models David Hsu Team 2511.05936 null
2025-11-11 Causal Tracing of Object Representations in Large Vision Language Models: Mechanistic Interpretability and Hallucination Mitigation Xiachong Feng Team 2511.05923 null
2025-11-07 Lite VLA: Efficient Vision-Language-Action Control on CPU-Bound Edge Robots Mrinmoy Sarkar Team 2511.05642 null
2025-11-07 Visual Spatial Tuning Hengshuang Zhao Team 2511.05491 null
2025-11-07 EveryDayVLA: A Vision-Language-Action Model for Affordable Robotic Manipulation Samuel Dickerson Team 2511.05397 null
2025-11-07 TwinVLA: Data-Efficient Bimanual Manipulation with Twin Single-Arm Vision-Language-Action Models Youngwoon Lee Team 2511.05275 link
2025-11-06 Evo-1: Lightweight Vision-Language-Action Model with Preserved Semantic Alignment Bo Zhao Team 2511.04555 link
2025-11-06 GraSP-VLA: Graph-based Symbolic Action Representation for Long-Horizon Planning with VLA Policies Cédric Buche Team 2511.04357 null
2025-11-04 TWIST2: Scalable, Portable, and Holistic Humanoid Data Collection System C. Karen Liu Team 2511.02832 link
2025-11-04 XR-1: Towards Versatile Vision-Language-Action Models via Learning Unified Vision-Motion Representations Jian Tang Team 2511.02776 null
2025-11-01 iFlyBot-VLA Technical Report Jia Pan Team 2511.01914 null
2025-11-03 Unified Diffusion VLA: Vision-Language-Action Model via Joint Discrete Denoising Diffusion Process Haoang Li Team 2511.01718 null
2025-11-03 PixelVLA: Advancing Pixel-level Understanding in Vision-Language-Action Model Yang Cong Team 2511.01571 null
2025-11-03 RobustVLA: Robustness-Aware Reinforcement Post-Training for Vision-Language-Action Models Donglin Wang Team 2511.01331 null
2025-11-03 Embodiment Transfer Learning for Vision-Language-Action Models Yaxin Peng Team 2511.01224 null
2025-11-06 OmniVLA: Physically-Grounded Multimodal VLA with Unified Multi-Sensor Perception for Robotic Manipulation Lili Qiu Team 2511.01210 null
2025-10-31 End-to-End Dexterous Arm-Hand VLA Policies via Shared Autonomy: VR Teleoperation Augmented by Autonomous Hand VLA Policy for Efficient Data Collection Zhibin Li Team 2511.00139 null
2025-10-30 Self-Improving Vision-Language-Action Models with Data Generation via Residual RL Yuke Zhu Team 2511.00091 null
2025-10-30 Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail Marco Pavone Team 2511.00088 null
2025-11-04 Dual-Stream Diffusion for World-Model Augmented Vision-Language-Action Model Jinwoo Shin Team 2510.27607 null
2025-10-31 EBT-Policy: Energy Unlocks Emergent Physical Reasoning Capabilities Luhui Hu Team 2510.27545 null
2025-10-30 RoboOS-NeXT: A Unified Memory-based Framework for Lifelong, Scalable, and Robust Multi-Robot Collaboration Shanghang Zhang Team 2510.26536 null
2025-10-30 Human-in-the-loop Online Rejection Sampling for Robotic Manipulation Yansong Tang Team 2510.26406 null
2025-10-29 $π_\texttt{RL}$: Online RL Fine-tuning for Flow-based Vision-Language-Action Models Chao Yu Team 2510.25889 null
2025-10-29 Robotic Assistant: Completing Collaborative Tasks with Dexterous Vision-Language-Action Models Robert Katzschmann Team 2510.25713 null
2025-10-29 Don’t Blind Your VLA: Aligning Visual Representations for OOD Generalization Aleksandr I. Panov Team 2510.25616 null
2025-10-29 NanoVLA: Routing Decoupled Vision-Language Understanding for Nano-sized Generalist Robotic Policies Jinghui Lu Team 2510.25122 null
2025-10-27 A Survey on Efficient Vision-Language-Action Models Heng Tao Shen Team 2510.24795 null
2025-10-28 BLM$_1$: A Boundless Large Model for Cross-Space, Cross-Task, and Cross-Embodiment Learning Heng Tao Shen Team 2510.24161 null
2025-11-01 RoboOmni: Proactive Robot Manipulation in Omni-modal Context Xipeng Qiu Team 2510.23763 null
2025-10-27 UrbanVLA: A Vision-Language-Action Model for Urban Micromobility He Wang Team 2510.23576 null
2025-10-27 Dexbotic: Open-Source Vision-Language-Action Toolbox Ziyu Zhang Team 2510.23511 link
2025-10-28 Evaluation of Vision-LLMs in Surveillance Video Jelte P. Mense Team 2510.23190 null
2025-10-25 ACG: Action Coherence Guidance for Flow-based VLA models Jaegul Choo Team 2510.22201 null
2025-10-23 Butter-Bench: Evaluating LLM Controlled Robots for Practical Intelligence Elias Aronsson Team 2510.21860 null
2025-10-21 VITA-E: Natural Embodied Interaction with Concurrent Seeing, Hearing, Speaking, and Acting Ran He Team 2510.21817 link
2025-10-24 Scalable Vision-Language-Action Model Pretraining for Robotic Manipulation with Real-Life Human Activity Videos Baining Guo Team 2510.21571 link
2025-10-23 SutureBot: A Precision Framework & Benchmark For Autonomous End-to-End Suturing Axel Krieger Team 2510.20965 null
2025-10-23 VAMOS: A Hierarchical Vision-Language-Action Model for Capability-Modulated and Steerable Navigation Abhishek Gupta Team 2510.20818 null
2025-10-23 MemER: Scaling Up Memory for Robot Control via Experience Retrieval Chelsea Finn Team 2510.20328 link
2025-10-22 Learning Affordances at Inference-Time for Vision-Language-Action Models Sergey Levine Team 2510.19752 null
2025-10-22 GigaBrain-0: A World Model-Powered Vision-Language-Action Model Zheng Zhu Team 2510.19430 link
2025-10-22 Seeing Across Views: Benchmarking Spatial Reasoning of Vision-Language Models in Robotic Scenes Baining Guo Team 2510.19400 link
2025-10-23 MoTVLA: A Vision-Language-Action Model with Unified Fast-Slow Reasoning Heng Yang Team 2510.18337 null
2025-10-24 RESample: A Robust Data Augmentation Framework via Exploratory Sampling for Robotic Manipulation Ziwei Wang Team 2510.17640 null
2025-10-20 From Spatial to Actions: Grounding Vision-Language-Action Model in Spatial Foundation Priors Pan Zhou Team 2510.17439 link
2025-10-20 Bridging Embodiment Gaps: Deploying Vision-Language-Action Models on Soft Robots Josie Hughes Team 2510.17369 null
2025-10-21 DiffVLA++: Bridging Cognitive Reasoning and End-to-End Driving through Metric-Guided Alignment Wang Jijun Team 2510.17148 null
2025-10-23 Efficient Vision-Language-Action Models for Embodied Manipulation: A Systematic Survey Jian Cheng Team 2510.17111 null
2025-10-18 MoS-VLA: A Vision-Language-Action Model with One-Shot Skill Adaptation Ufuk Topcu Team 2510.16617 null
2025-10-18 Do What You Say: Steering Vision-Language-Action Models via Runtime Reasoning-Action Alignment Verification Claudia Pérez-D’Arpino Team 2510.16281 null
2025-10-21 NEBULA: Do We Evaluate Vision-Language-Action Agents Correctly? Yu Yin Team 2510.16263 link
2025-10-17 Cosmos-Surg-dVRK: World Foundation Model-based Automated Online Evaluation of Surgical Robot Policy Learning Sean Huver Team 2510.16240 null
2025-10-17 VDRive: Leveraging Reinforced VLA and Diffusion Policy for End-to-end Autonomous Driving Zufeng Zhang Team 2510.15446 null
2025-10-16 RDD: Retrieval-Based Demonstration Decomposer for Planner Alignment in Long-Horizon Tasks Jiachen Li Team 2510.14968 null
2025-10-17 From Language to Locomotion: Retargeting-free Humanoid Control via Motion Latent Guidance Chang Xu Team 2510.14952 null
2025-10-16 VLA^2: Empowering Vision-Language-Action Models with an Agentic Framework for Unseen Concept Manipulation Donglin Wang Team 2510.14902 null
2025-10-16 QDepth-VLA: Quantized Depth Prediction as Auxiliary Supervision for Vision-Language-Action Models Haoran Li Team 2510.14836 null
2025-10-16 Expertise need not monopolize: Action-Specialized Mixture of Experts for Vision-Language-Action Learning Yao Mu Team 2510.14300 null
2025-10-15 InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy Yangkun Zhu Team 2510.13778 null
2025-10-15 LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models Xipeng Qiu Team 2510.13626 null
2025-10-15 DepthVLA: Enhancing Vision-Language-Action Models with Depth-Aware Spatial Reasoning Hang Zhao Team 2510.13375 null
2025-10-15 Model-agnostic Adversarial Attack and Defense for Vision-Language-Action Models Jingfeng Zhang Team 2510.13237 null
2025-10-15 VLA-0: Building State-of-the-Art VLAs with Zero Modification Fabio Ramos Team 2510.13054 null
2025-10-14 DriveVLA-W0: World Models Amplify Data Scaling Law in Autonomous Driving Zhaoxiang Zhang Team 2510.12796 null
2025-10-14 Reflection-Based Task Adaptation for Self-Improving VLA Hongbin Zha Team 2510.12710 null
2025-10-17 Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model Haoang Li Team 2510.12276 null
2025-10-14 ManiAgent: An Agentic Framework for General Robotic Manipulation Xudong Liu Team 2510.11660 null
2025-10-13 Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning Zhi Hou Team 2510.11027 null
2025-10-14 RoVer: Robot Reward Model as Test-Time Verifier for Vision-Language-Action Model Xinyu Wu Team 2510.10975 null
2025-10-13 TabVLA: Targeted Backdoor Attacks on Vision-Language-Action Models Yu-Gang Jiang Team 2510.10932 null
2025-10-11 X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model Xianyuan Zhan Team 2510.10274 null
2025-10-11 Dejavu: Post-Deployment Learning for Embodied Agents via Experience Feedback Hongtao Lu Team 2510.10181 null
2025-10-11 Reinforcement Fine-Tuning of Flow-Matching Policies for Vision-Language-Action Models Yi Zeng Team 2510.09976 null
2025-10-08 OmniSAT: Compact Action Token, Faster Auto Regression Changsheng Xu Team 2510.09667 null
2025-10-10 VITA-VLA: Efficiently Teaching Vision-Language Models to Act via Action Expert Distillation Caifeng Shan Team 2510.09607 link
2025-10-10 PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs Ying-Cong Chen Team 2510.09507 null
2025-10-10 Goal-oriented Backdoor Attack against Vision-Language-Action Models via Physical Objects Jingfeng Zhang Team 2510.09269 null
2025-10-09 Don’t Run with Scissors: Pruning Breaks VLA Models but They Can Be Recovered Shayegan Omidshafiei Team 2510.08464 null
2025-10-15 USIM and U0: A Vision-Language-Action Dataset and Model for General Underwater Robots Zhengxing Wu Team 2510.07869 link
2025-10-09 IntentionVLA: Generalizable and Efficient Embodied Intention Reasoning for Human-Robot Interaction Liqiang Nie Team 2510.07778 null
2025-10-09 DEAS: DEtached value learning with Action Sequence for Scalable Offline RL Yuke Zhu Team 2510.07730 link
2025-10-08 TrackVLA++: Unleashing Reasoning and Memory Capabilities in VLA Models for Embodied Visual Tracking He Wang Team 2510.07134 link
2025-10-08 Vision-Language-Action Models for Robotics: A Review Towards Real-World Applications Yuke Zhu Team 2510.07077 link
2025-10-08 Bring the Apple, Not the Sofa: Impact of Irrelevant Context in Embodied AI Commands on VLA Models Elena Tutubalina Team 2510.07067 null
2025-10-08 RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training Yu Wang Team 2510.06710 link
2025-10-07 EmbodiedCoder: Parameterized Embodied Mobile Manipulation via Modern Coding Model Zhaoxiang Zhang Team 2510.06207 link
2025-10-07 Verifier-free Test-Time Sampling for Vision Language Action Models Jinwoo Shin Team 2510.05681 null
2025-10-07 MetaVLA: Unified Meta Co-training For Efficient Embodied Adaption Marios Savvides Team 2510.05580 null
2025-10-06 HyperVLA: Efficient Inference in Vision-Language-Action Models via Hypernetworks Shimon Whiteson Team 2510.04898 null
2025-10-05 ContextVLA: Vision-Language-Action Model with Amortized Multi-Frame Context Jinwoo Shin Team 2510.04246 link
2025-10-05 SITCOM: Scaling Inference-Time COMpute for VLAs Esha Pahwa Team 2510.04041 null
2025-10-04 Bridge Thinking and Acting: Unleashing Physical Potential of VLM with Generalizable Action Expert Chunhua Shen Team 2510.03896 null
2025-10-04 NoTVLA: Narrowing of Dense Action Trajectories for Generalizable Robot Manipulation Chunhua Shen Team 2510.03895 null
2025-10-04 LIBERO-PRO: Towards Robust and Fair Evaluation of Vision-Language-Action Models Beyond Memorization Lichao Sun Team 2510.03827 null
2025-10-02 Gemini Robotics 1.5: Pushing the Frontier of Generalist Robots with Advanced Embodied Reasoning, Thinking, and Motion Transfer Yuxiang Zhou Team 2510.03342 null
2025-10-03 MM-Nav: Multi-View VLA Model for Robust Visual Navigation via Multi-Expert Learning He Wang Team 2510.03142 link
2025-10-02 Contrastive Representation Regularization for Vision-Language-Action Models Jinwoo Shin Team 2510.01711 null
2025-10-02 FailSafe: Reasoning and Recovery from Failures in Vision-Language-Action Models Bihan Wen Team 2510.01642 link
2025-10-02 VLA-R1: Enhancing Reasoning in Vision-Language-Action Models Zheng Zhu Team 2510.01623 null
2025-10-01 INSIGHT: INference-time Sequence Introspection for Generating Help Triggers in Vision-Language-Action Models Tesca Fitzgerald Team 2510.01389 null
2025-10-01 Compose Your Policies! Improving Diffusion-based or Flow-based Robot Policies via Test-time Distribution-level Composition Andrew F. Luo Team 2510.01068 link
2025-10-02 HAMLET: Switch your Vision-Language-Action Model into a History-Aware Policy Jinwoo Shin Team 2510.00695 link
2025-10-01 Hybrid Training for Vision-Language-Action Models Daniel Dijkman Team 2510.00600 null
2025-10-01 VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators Weihua Su Team 2510.00406 null
2025-09-30 MLA: A Multisensory Language-Action Model for Multimodal Understanding and Forecasting in Robotic Manipulation Shanghang Zhang Team 2509.26642 null
2025-09-30 Seeing Space and Motion: Enhancing Latent Actions with Spatial and Dynamic Awareness for VLA Ruqi Huang Team 2509.26251 null
2025-09-30 MUVLA: Learning to Explore Object Navigation via Map Understanding Jianye Hao Team 2509.25966 null
2025-09-30 TacRefineNet: Tactile-Only Grasp Refinement Between Arbitrary In-Hand Object Poses Yangwei You Team 2509.25746 null
2025-09-30 VLA Model Post-Training via Action-Chunked PPO and Self Behavior Cloning Zeng-Guang Hou Team 2509.25718 null
2025-09-30 dVLA: Diffusion Vision-Language-Action Model with Multimodal Chain-of-Thought Yi Xu Team 2509.25681 null
2025-09-29 AIRoA MoMa Dataset: A Large-Scale Hierarchical Dataset for Mobile Manipulation Tetsuya Ogata Team 2509.25032 null
2025-09-29 World-Env: Leveraging World Model as a Virtual Environment for VLA Post-Training Qing Zhang Team 2509.24948 null
2025-09-29 IA-VLA: Input Augmentation for Vision-Language-Action models in settings with semantically complex tasks Ville Kyrki Team 2509.24768 null
2025-09-29 Emergent World Representations in OpenVLA Omar G. Younis Team 2509.24559 null
2025-09-29 PhysiAgent: An Embodied Agent Framework in Physical World Xianyuan Zhan Team 2509.24524 null
2025-09-28 AutoPrune: Each Complexity Deserves a Pruning Policy Zhipeng Zhang Team 2509.23931 null
2025-09-28 Control Your Robot: A Unified System for Robot Control and Policy Deployment Bingshan Hu Team 2509.23823 link
2025-09-28 Focusing on What Matters: Object-Agent-centric Tokenization for Vision Language Action models Pietro Mazzaglia Team 2509.23655 null
2025-09-27 Leave No Observation Behind: Real-time Correction for VLA Action Chunks Yusuke Iwasawa Team 2509.23224 null
2025-09-27 Transferring Vision-Language-Action Models to Industry Applications: Architectures, Performance, and Challenges Zhibo Pang Team 2509.23121 null
2025-09-26 VLA-Reasoner: Empowering Vision-Language-Action Models with Reasoning via Online Monte Carlo Tree Search Ziwei Wang Team 2509.22643 null
2025-09-26 UnderwaterVLA: Dual-brain Vision-Language-Action architecture for Autonomous Underwater Navigation Dixia Fan Team 2509.22441 null
2025-09-26 EMMA: Generalizing Real-World Robot Manipulation via Generative Visual Transfer Guan Huang Team 2509.22407 null
2025-09-29 MimicDreamer: Aligning Human and Robot Demonstrations for Scalable VLA Training Xingang Wang Team 2509.22199 null
2025-09-26 Actions as Language: Fine-Tuning VLMs into VLAs Without Catastrophic Forgetting Anirudha Majumdar Team 2509.22195 null
2025-09-26 Action-aware Dynamic Pruning for Efficient Vision-Language-Action Manipulation Chang Xu Team 2509.22093 null
2025-09-26 Developing Vision-Language-Action Model from Egocentric Videos Shinsuke Mori Team 2509.21986 null
2025-09-20 KV-Efficient VLA: A Method of Speed up Vision Language Model with RNN-Gated Chunked KV Cache Long Zhuang Team 2509.21354 null
2025-09-25 RetoVLA: Reusing Register Tokens for Spatial Reasoning in Vision-Language-Action Models Andrew Jaeyong Choi Team 2509.21243 null
2025-09-24 Discrete Diffusion for Reflective Vision-Language-Action Models in Autonomous Driving Xianpeng Lang Team 2509.20109 null
2025-09-24 FreezeVLA: Action-Freezing Attacks against Vision-Language-Action Models Yu-Gang Jiang Team 2509.19870 null
2025-09-24 Beyond Human Demonstrations: Diffusion-Based Reinforcement Learning to Generate Data for VLA Training Yi Chen Team 2509.19752 null
2025-09-23 Agentic Scene Policies: Unifying Space, Semantics, and Affordances for Robot Action Liam Paull Team 2509.19571 link
2025-09-23 OmniVLA: An Omni-Modal Vision-Language-Action Model for Robot Navigation Sergey Levine Team 2509.19480 null
2025-09-25 Pure Vision Language Action (VLA) Models: A Comprehensive Survey Qingguo Zhou Team 2509.19012 null
2025-09-23 Eva-VLA: Evaluating Vision-Language-Action Models’ Robustness Under Real-World Physical Variations Wen Yao Team 2509.18953 null
2025-09-22 Latent Action Pretraining Through World Modeling Ian Reid Team 2509.18428 null
2025-09-18 VLA-LPAF: Lightweight Perspective-Adaptive Fusion for Vision-Language-Action to Enable More Unconstrained Robotic Manipulation Anzhou Hou Team 2509.18183 null
2025-09-19 CoReVLA: A Dual-Stage End-to-End Autonomous Driving Framework for Long-Tail Scenarios via Collect-and-Refine Jian Sun Team 2509.15968 null
2025-09-19 A Vision-Language-Action-Critic Model for Robotic Real-World Reinforcement Learning Jiangmiao Pang Team 2509.15937 null
2025-09-18 RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation Xin Li Team 2509.15212 link
2025-09-18 Robot Control Stack: A Lean Ecosystem for Robot Learning at Scale Florian Walter Team 2509.14932 null
2025-09-18 CollabVLA: Self-Reflective Vision-Language-Action Model Dreaming Together with Human Huaping Liu Team 2509.14889 null
2025-09-18 RealMirror: A Comprehensive, Open-Source Vision-Language-Action Platform for Embodied AI Tao Shen Team 2509.14687 null
2025-09-18 Toward Embodiment Equivariant Vision-Language-Action Policy Yue Wang Team 2509.14630 null
2025-09-17 CLAW: A Vision-Language-Action Framework for Weight-Aware Robotic Grasping Lifeng Zhou Team 2509.14143 null
2025-09-17 SeqVLA: Sequential Task Execution for Long-Horizon Manipulation with Completion-Aware Vision-Language-Action Model Yiming Feng Team 2509.14138 null
2025-09-22 GeoAware-VLA: Implicit Geometry Aware Vision-Language-Action Model Dezhen Song Team 2509.14117 null
2025-09-17 Dual-Actor Fine-Tuning of VLA Models: A Talk-and-Tweak Human-in-the-Loop Approach Yangwei You Team 2509.13774 null
2025-09-17 AdaThinkDrive: Adaptive Thinking via Reinforcement Learning for Autonomous Driving Zhi-xin Yang Team 2509.13769 null
2025-09-13 OpenHA: A Series of Open-Source Hierarchical Agentic Models in Minecraft Yitao Liang Team 2509.13347 null
2025-09-21 The Better You Learn, The Smarter You Prune: Towards Efficient Vision-language-action Models via Differentiable Token Pruning Xianpeng Lang Team 2509.12594 link
2025-09-17 TrajBooster: Boosting Humanoid Whole-Body Manipulation via Trajectory-Centric Learning Donglin Wang Team 2509.11839 null
2025-09-15 Cross-Platform Scaling of Vision-Language-Action Models from Edge to Cloud GPUs Yanzhi Wang Team 2509.11480 null
2025-09-17 Enhancing Generalization in Vision-Language-Action Models by Preserving Pretrained Representations Xuanlin Li Team 2509.11417 link
2025-09-11 SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning Ning Ding Team 2509.09674 null
2025-09-22 VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model Donglin Wang Team 2509.09372 link
2025-09-11 SQAP-VLA: A Synergistic Quantization-Aware Pruning Framework for High-Performance Vision-Language-Action Models Huanrui Yang Team 2509.09090 null
2025-09-10 RoboChemist: Long-Horizon and Safety-Compliant Robotic Chemical Experimentation Hao Zhao Team 2509.08820 link
2025-09-09 TA-VLA: Elucidating the Design Space of Torque-aware Vision-Language-Action Models Hao Zhao Team 2509.07962 link
2025-09-09 Graph-Fused Vision-Language-Action for Policy Reasoning in Multi-Arm Robotic Manipulation Yingbai Hu Team 2509.07957 null
2025-09-09 F1: A Vision-Language-Action Model Bridging Understanding and Generation to Actions Jiangmiao Pang Team 2509.06951 link
2025-09-11 LLaDA-VLA: Vision Language Diffusion Action Models Xiaoyan Sun Team 2509.06932 null
2025-09-08 CRISP – Compliant ROS2 Controllers for Learning-Based Manipulation Policies and Teleoperation Angela P. Schöllig Team 2509.06819 null
2025-09-09 Leveraging Vision-Language Large Models for Interpretable Video Action Recognition with Semantic Tokenization Surasakdi Siripong Team 2509.05695 null
2025-09-06 SpecPrune-VLA: Accelerating Vision-Language-Action Models via Action-Aware Self-Speculative Pruning Guohao Dai Team 2509.05614 null
2025-09-06 OccVLA: Vision-Language-Action Model with Implicit 3D Occupancy Supervision Hang Zhao Team 2509.05578 null
2025-09-05 OpenEgo: A Large-Scale Multimodal Egocentric Dataset for Dexterous Manipulation Yu Xiang Team 2509.05513 null
2025-09-05 FLOWER: Democratizing Generalist Robot Policies with Efficient Vision-Language-Action Flow Policies Rudolf Lioutikov Team 2509.04996 null
2025-09-04 Balancing Signal and Variance: Adaptive Offline RL Post-Training for VLA Flow Models Donglin Wang Team 2509.04063 null
2025-09-04 FPC-VLA: A Vision-Language-Action Framework with a Supervisor for Failure Prediction and Correction Jingtai Liu Team 2509.04018 null
2025-09-03 ANNIE: Be Careful of Your Robots Yiming Gan Team 2509.03383 null
2025-09-05 Align-Then-stEer: Adapting the Vision-Language Action Models through Unified Latent Guidance Xuelong Li Team 2509.02055 null
2025-09-02 AutoDrive-R$^2$: Incentivizing Reasoning and Self-Reflection Capacity for VLA Model in Autonomous Driving Shuo Li Team 2509.01944 null
2025-08-31 OmniReason: A Temporal-Guided Vision-Language-Action Framework for Autonomous Driving Jun Ma Team 2509.00789 null
2025-08-30 Galaxea Open-World Dataset and G0 Dual-System VLA Model Hang Zhao Team 2509.00576 link
2025-08-30 Mechanistic interpretability for steering vision-language-action models Claire Tomlin Team 2509.00328 link
2025-09-09 EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control Dong Wang Team 2508.21112 null
2025-10-02 CogVLA: Cognition-Aligned Vision-Language-Action Model via Instruction-Driven Routing & Sparsification Liqiang Nie Team 2508.21046 link
2025-08-27 Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies Ping Luo Team 2508.20072 null
2025-08-28 Long-VLA: Unleashing Long-Horizon Capability of Vision Language Action Model for Robot Manipulation Donglin Wang Team 2508.19958 link
2025-08-28 Ego-centric Predictive Model Conditioned on Hand Trajectories Mike Zheng Shou Team 2508.19852 null
2025-08-15 TTF-VLA: Temporal Token Fusion via Pixel-Attention Integration for Vision-Language-Action Models Huiling Duan Team 2508.19257 null
2025-08-26 MemoryVLA: Perceptual-Cognitive Memory in Vision-Language-Action Models for Robotic Manipulation Gao Huang Team 2508.19236 link
2025-08-26 FlowVLA: Thinking in Motion with a Visual Chain of Thought Haoang Li Team 2508.18269 null
2025-09-06 4D Visual Pre-training for Robot Learning Huazhe Xu Team 2508.17230 null
2025-08-23 NinA: Normalizing Flows in Action. Training VLA Models with Normalizing Flows Vladislav Kurenkov Team 2508.16845 null
2025-08-22 Do What? Teaching Vision-Language-Action Models to Reject the Impossible David M. Chan Team 2508.16292 null
2025-11-13 Survey of Vision-Language-Action Models for Embodied Manipulation Dongbin Zhao Team 2508.15201 null
2025-08-19 CAST: Counterfactual Labels Improve Instruction Following in Vision-Language-Action Models Sergey Levine Team 2508.13446 null
2025-08-18 Grounding Actions in Camera Space: Observation-Centric Vision-Language-Action Policy Zhi Hou Team 2508.13103 null
2025-09-01 Large VLM-based Vision-Language-Action Models for Robotic Manipulation: A Survey Liqiang Nie Team 2508.13073 link
2025-08-17 Improving Pre-Trained Vision-Language-Action Policies with Model-Based Search Glen Berseth Team 2508.12211 null
2025-08-16 Toward General Physical Intelligence for Resilient Agile Manufacturing Automation Sunny Katyara Team 2508.11960 null
2025-08-14 CorrectNav: Self-Correction Flywheel Empowers Vision-Language-Action Navigation Model Hao Dong Team 2508.10416 null
2025-08-14 Large Model Empowered Embodied AI: A Survey on Decision-Making and Embodied Learning Ping Kuang Team 2508.10399 null
2025-08-14 ReconVLA: Reconstructive Vision-Language-Action Model as Effective Robot Perceiver Haoang Li Team 2508.10333 null
2025-08-13 GeoVLA: Empowering 3D Representations in Vision-Language-Action Models Jiale Cao Team 2508.09071 link
2025-08-12 Spatial Traces: Enhancing VLA Models with Spatial-Temporal Understanding Aleksandr I. Panov Team 2508.09032 null
2025-08-22 OmniVTLA: Vision-Tactile-Language-Action Model with Semantic-Aligned Tactile Sensing Hengdi Zhang Team 2508.08706 link
2025-08-14 Reinforcement Learning in Vision: A Survey Mike Zheng Shou Team 2508.08189 null
2025-08-12 MolmoAct: Action Reasoning Models that can Reason in Space Ranjay Krishna Team 2508.07917 link
2025-08-13 AgentWorld: An Interactive Simulation Platform for Scene Construction and Mobile Robotic Manipulation Lei Han Team 2508.07770 null
2025-08-23 GraphCoT-VLA: A 3D Spatial-Aware Reasoning Vision-Language-Action Model for Robotic Manipulation with Ambiguous Instructions Hong Zhang Team 2508.07650 null
2025-08-15 IRL-VLA: Training an Vision-Language-Action Policy via Reward World Model Li Sun Team 2508.06571 null
2025-08-06 Static and Plugged: Make Embodied Evaluation Simple Guangtao Zhai Team 2508.06553 null
2025-08-06 A tutorial note on collecting simulated data for vision-language-action models Jingfeng Zhang Team 2508.06547 null
2025-08-07 Information-Theoretic Graph Fusion with Vision-Language-Action Model for Policy Reasoning and Dual Robotic Control Hamid Reza Karimi Team 2508.05342 null
2025-08-14 Towards Embodied Agentic AI: Review and Classification of LLM- and VLM-Driven Robot Autonomy and Interaction Jorge Peña Queralta Team 2508.05294 null
2025-08-07 Learning to See and Act: Task-Aware View Planning for Robotic Manipulation Liang Lin Team 2508.05186 link
2025-08-06 Perceiving and Acting in First-Person: A Dataset and Benchmark for Egocentric Human-Object-Human Interactions Xiaokang Yang Team 2508.04681 link
2025-08-06 Following Route Instructions using Large Vision-Language Models: A Comparison between Low-level and Panoramic Action Spaces Pierre Lison Team 2508.02917 null
2025-08-04 MonoDream: Monocular Vision-Language Navigation with Panoramic Dreaming Zhaoxin Fan Team 2508.02549 null
2025-08-04 CO-RFT: Efficient Fine-Tuning of Vision-Language-Action Models through Chunked Offline Reinforcement Learning Chunhe Xia Team 2508.02219 null
2025-08-04 FedVLA: Federated Vision-Language-Action Learning with Dual Gating Mixture-of-Experts for Robotic Manipulation Xiaodong Wang Team 2508.02190 null
2025-08-04 RICL: Adding In-Context Adaptability to Pre-Trained Vision-Language-Action Models Insup Lee Team 2508.02062 null
2025-07-31 XRoboToolkit: A Cross-Platform Framework for Robot Teleoperation Ning Yang Team 2508.00097 link
2025-07-31 villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models Jiang Bian Team 2507.23682 link
2025-07-31 A Unified Perception-Language-Action Framework for Adaptive Autonomous Driving Alois Knoll Team 2507.23540 null
2025-08-02 FastDriveVLA: Efficient End-to-End Driving via Plug-and-Play Reconstruction-based Token Pruning Shanghang Zhang Team 2507.23318 null
2025-07-30 Spec-VLA: Speculative Decoding for Vision-Language-Action Models with Relaxed Acceptance Derek F. Wong Team 2507.22424 null
2025-07-23 InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation Jiangmiao Pang Team 2507.17520 null
2025-07-23 ERMV: Editing 4D Robotic Multi-view images to enhance embodied agents Hesheng Wang Team 2507.17462 null
2025-07-23 Confidence Calibration in Vision-Language-Action Models Richard Zemel Team 2507.17383 null
2025-07-29 VLA-Touch: Enhancing Vision-Language-Action Models with Dual-Level Tactile Feedback Harold Soh Team 2507.17294 null
2025-07-22 ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning Fu-En Yang Team 2507.16815 link
2025-07-21 Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos Zongqing Lu Team 2507.15597 null
2025-07-22 GR-3 Technical Report Yichu Yang Team 2507.15493 link
2025-07-18 EdgeVLA: Efficient Vision-Language-Action Models Benjamin Bolte Team 2507.14049 null
2025-07-23 LaViPlan: Language-Guided Visual Path Planning with RLVR Hayeon Oh Team 2507.12911 null
2025-07-17 AnyPos: Automated Task-Agnostic Actions for Bimanual Manipulation Jun Zhu Team 2507.12768 null
2025-07-18 EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos Xiaolong Wang Team 2507.12440 link
2025-07-14 Vision Language Action Models in Robotic Manipulation: A Systematic Review Irfan Hussain Team 2507.10672 null
2025-07-12 Tactile-VLA: Unlocking Vision-Language-Action Model’s Physical Knowledge for Tactile Generalization Yang Gao Team 2507.09160 null
2025-07-09 3D-Generalist: Self-Improving Vision-Language-Action Models for Crafting 3D Worlds Nick Haber Team 2507.06484 link
2025-07-07 NavigScene: Bridging Local Perception and Global Navigation for Beyond-Visual-Range Autonomous Driving Cheng Lu Team 2507.05227 null
2025-10-06 VOTE: Vision-Language-Action Optimization with Trajectory Ensemble Voting Yanzhi Wang Team 2507.05116 null
2025-07-17 DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge Xin Jin Team 2507.04447 null
2025-07-06 Hijacking JARVIS: Benchmarking Mobile GUI Agents against Unprivileged Third Parties Yunxin Liu Team 2507.04227 null
2025-07-03 DexVLG: Dexterous Vision-Language-Grasp Model at Scale He Wang Team 2507.02747 null
2025-07-02 cVLA: Towards Efficient Camera-Space VLAs Thomas Brox Team 2507.02190 null
2025-07-03 A Survey on Vision-Language-Action Models: An Action Tokenization Perspective Yaodong Yang Team 2507.01925 null
2025-07-02 MoIRA: Modular Instruction Routing Architecture for Multi-Task Robotics Nadiya Shvai Team 2507.01843 null
2025-07-03 TriVLA: A Triple-System-Based Unified Vision-Language-Action Model for General Robot Control Yanwei Fu Team 2507.01424 null
2025-07-01 VQ-VLA: Improving Vision-Language-Action Models via Scaling Vector-Quantized Action Tokenizers Tong He Team 2507.01016 null
2025-07-01 Evo-0: Vision-Language-Action Model with Implicit Spatial Understanding Bo Zhao Team 2507.00416 null
2025-06-30 A Survey on Vision-Language-Action Models for Autonomous Driving Lijun Sun Team 2506.24044 null
2025-06-27 4D-VLA: Spatiotemporal Vision-Language-Action Pretraining with Cross-Scene Calibration Li Zhang Team 2506.22242 null
2025-08-08 Can Vision Language Models Understand Mimed Actions? Jonathan May Team 2506.21586 null
2025-06-26 WorldVLA: Towards Autoregressive Action World Model Hao Chen Team 2506.21539 null
2025-06-26 Parallels Between VLA Model Post-Training and Human Motor Learning: Progress, Challenges, and Trends Zeng-Guang Hou Team 2506.20966 null
2025-06-25 Unified Vision-Language-Action Model Zhaoxiang Zhang Team 2506.19850 null
2025-06-24 CronusVLA: Transferring Latent Motion Across Time for Multi-Frame Prediction in Manipulation Jiangmiao Pang Team 2506.19816 null
2025-07-07 RoboMonkey: Scaling Test-Time Sampling and Verification for Vision-Language-Action Models Marco Pavone Team 2506.17811 null
2025-06-21 RLRC: Reinforcement Learning-based Recovery for Compressed Vision-Language-Action Models Xiao Li Team 2506.17639 null
2025-06-21 VLA-OS: Structuring and Dissecting Planning Representations and Paradigms in Vision-Language-Action Models Lin Shao Team 2506.17561 null
2025-06-19 CapsDT: Diffusion-Transformer for Capsule Robot Manipulation Hongliang Ren Team 2506.16263 null
2025-06-19 ControlVLA: Few-shot Object-centric Adaptation for Pre-trained Vision-Language-Action Models Siyuan Huang Team 2506.16211 null
2025-06-19 ClutterDexGrasp: A Sim-to-Real System for General Dexterous Grasping in Cluttered Scenes Hao Dong Team 2506.14317 null
2025-06-16 GRaD-Nav++: Vision-Language Model Enabled Visual Drone Navigation with Gaussian Radiance Fields and Differentiable Dynamics Mac Schwager Team 2506.14009 null
2025-06-16 AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning Jiaqi Ma Team 2506.13757 link
2025-06-19 LeVERB: Humanoid Whole-Body Control with Latent Vision-Language Instruction Shankar Sastry Team 2506.13751 null
2025-06-16 CEED-VLA: Consistency Vision-Language-Action Model with Early-Exit Decoding Haoang Li Team 2506.13725 null
2025-06-16 ROSA: Harnessing Robot States for Vision-Language and Action Alignment Xiaoyan Sun Team 2506.13679 null
2025-06-16 Block-wise Adaptive Caching for Accelerating Diffusion Policy Zhi Wang Team 2506.13456 null
2025-06-19 A Comprehensive Survey on Continual Learning in Generative Models Cheng-Lin Liu Team 2506.13045 link
2025-06-19 SP-VLA: A Joint Model Scheduling and Token Pruning Approach for VLA Model Acceleration Wenwu Zhu Team 2506.12723 null
2025-06-13 RationalVLA: A Rational Vision-Language-Action Model with Dual System Haoang Li Team 2506.10826 null
2025-06-11 EfficientVLA: Training-Free Acceleration and Compression for Vision-Language-Action Models Linfeng Zhang Team 2506.10100 null
2025-06-11 SAFE: Multitask Failure Detection for Vision-Language-Action Models Florian Shkurti Team 2506.09937 null
2025-06-11 From Intention to Execution: Probing the Generalization Boundaries of Vision-Language-Action Models Chen Feng Team 2506.09930 null
2025-06-17 An Open-Source Software Toolkit & Benchmark Suite for the Evaluation and Adaptation of Multimodal Action Models Harshvardhan Sikka Team 2506.09172 null
2025-06-10 FreqPolicy: Efficient Flow-based Visuomotor Policy via Frequency Consistency Jian Tang Team 2506.08822 null
2025-06-10 Hybrid Reasoning for Perception, Explanation, and Autonomous Action in Manufacturing Sebastian W. Pattinson Team 2506.08462 null
2025-06-11 TGRPO: Fine-tuning Vision-Language-Action Model via Trajectory-wise Group Relative Policy Optimization Qi Wang Team 2506.08440 null
2025-06-11 HiBerNAC: Hierarchical Brain-emulated Robotic Neural Agent Collective for Disentangling Complex Manipulation Cong Wang Team 2506.08296 null
2025-06-14 Agentic Surgical AI: Surgeon Style Fingerprinting and Privacy Risk Quantification via Discrete Diffusion in a Vision-Language-Action Framework Jason H. Moore Team 2506.08185 link
2025-06-09 BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models Tieniu Tan Team 2506.07961 null
2025-06-09 Fast ECoT: Efficient Embodied Chain-of-Thought via Thoughts Reuse Chris Xiaoxuan Lu Team 2506.07639 null
2025-06-09 BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation Xilin Chen Team 2506.07530 link
2025-06-09 Real-Time Execution of Action Chunking Flow Policies Sergey Levine Team 2506.07339 null
2025-06-12 Robotic Policy Learning via Human-assisted Action Preference Optimization Di Hu Team 2506.07127 null
2025-06-07 RoboCerebra: A Large-scale Benchmark for Long-horizon Robotic Manipulation Evaluation Si Liu Team 2506.06677 null
2025-06-06 MapleGrasp: Mask-guided Feature Pooling for Language-driven Efficient Robotic Grasping Farshad Khorrami Team 2506.06535 null
2025-06-06 DriveAction: A Benchmark for Exploring Human-like Driving Decisions in VLA Models Xianpeng Lang Team 2506.05667 null
2025-06-04 SwitchVLA: Execution-Aware Task Switching for Vision-Language-Action Models Jian Tang Team 2506.03574 null
2025-06-03 Adversarial Attacks on Robotic Vision Language Action Models J. Zico Kolter Team 2506.03350 link
2025-06-02 Fast-in-Slow: A Dual-System Foundation Model Unifying Fast Manipulation within Slow Reasoning Pheng-Ann Heng Team 2506.01953 null
2025-06-02 SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics Remi Cadene Team 2506.01844 link
2025-06-02 MLA-Trust: Benchmarking Trustworthiness of Multimodal LLM Agents in GUI Environments Jun Zhu Team 2506.01616 null
2025-06-02 ReAgent-V: A Reward-Driven Multi-Agent Framework for Video Understanding Huaxiu Yao Team 2506.01300 null
2025-06-01 OG-VLA: 3D-Aware Vision Language Action Model via Orthographic Image Generation Valts Blukis Team 2506.01196 null
2025-05-31 LoHoVLA: A Unified Vision-Language-Action Model for Long-Horizon Embodied Tasks Zhijie Deng Team 2506.00411 null
2025-05-30 Towards a Generalizable Bimanual Foundation Policy via Flow-based Video Prediction Xuelong Li Team 2505.24156 null
2025-05-29 Impromptu VLA: Open Weights and Open Data for Driving Vision-Language-Action Models Hao Zhao Team 2505.23757 link
2025-05-29 Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better Sergey Levine Team 2505.23705 null
2025-05-29 Agentic Robot: A Brain-Inspired Framework for Vision-Language-Action Models in Embodied Agents Lichao Sun Team 2505.23450 null
2025-05-29 TrackVLA: Embodied Visual Tracking in the Wild He Wang Team 2505.23189 null
2025-05-28 ForceVLA: Enhancing VLA Models with a Force-aware MoE for Contact-rich Manipulation Wenqiang Zhang Team 2505.22159 null
2025-05-29 ChatVLA-2: Vision-Language-Action Model with Open-World Embodied Reasoning from Pretrained Knowledge Yi Xu Team 2505.21906 null
2025-05-27 EaqVLA: Encoding-aligned Quantization for Vision-Language-Action Models Xiang Chen Team 2505.21567 null
2025-06-02 Hume: Introducing System-2 Thinking in Visual-Language-Action Model Xuelong Li Team 2505.21432 null
2025-05-27 Think Twice, Act Once: Token-Aware Compression and Action Reuse for Efficient Inference in Vision-Language-Action Models Tao Chen Team 2505.21200 null
2025-05-26 Embodied AI with Foundation Models for Mobile Service Robots: A Systematic Review Goldie Nejat Team 2505.20503 null
2025-05-26 What Can RL Bring to VLA Generalization? An Empirical Study Yu Wang Team 2505.19789 null
2025-05-26 RFTF: Reinforcement Fine-tuning for Embodied Agents with Temporal Feedback Yongtao Wang Team 2505.19767 null
2025-05-25 ReFineVLA: Reasoning-Aware Teacher-Guided Transfer Fine-Tuning Minh Nhat Vu Team 2505.19080 null
2025-05-24 Genie Centurion: Accelerating Scalable Real-World Robot Training with Human Rewind-and-Refine Guidance Maoqing Yao Team 2505.18793 null
2025-05-24 VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning Ziwei Wang Team 2505.18719 link
2025-05-22 ScanBot: Towards Intelligent Surface Scanning in Embodied Robotic Systems Farhad Imani Team 2505.17295 null
2025-05-22 Interactive Post-Training for Vision-Language-Action Models Philipp Krähenbühl Team 2505.17016 null
2025-05-22 Perceptual Quality Assessment for Embodied AI Guangtao Zhai Team 2505.16815 link
2025-05-22 BadVLA: Towards Backdoor Attacks on Vision-Language-Action Models via Objective-Decoupled Optimization Lichao Sun Team 2505.16640 null
2025-05-22 DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving Junchi Yan Team 2505.16278 null
2025-05-21 From Grounding to Manipulation: Case Studies of Foundation Model Integration in Embodied Robotic Systems Soujanya Poria Team 2505.15685 link
2025-05-24 Exploring the Limits of Vision-Language-Action Manipulations in Cross-task Generalization Junwei Liang Team 2505.15660 link
2025-05-21 FLARE: Robot Learning with Implicit World Modeling Linxi Fan Team 2505.15659 null
2025-05-21 Saliency-Aware Quantized Imitation Learning for Efficient Robotic Control Jungwook Choi Team 2505.15304 null
2025-05-21 EndoVLA: Dual-Phase Vision-Language-Action Model for Autonomous Tracking in Endoscopy Hongliang Ren Team 2505.15206 null
2025-05-21 Object-Focus Actor for Data-efficient Robot Generalization Dexterous Manipulation Xiaodong He Team 2505.15098 null
2025-05-20 AutoBio: A Simulation and Benchmark for Robotic Automation in Digital Biology Laboratory Ping Luo Team 2505.14030 null
2025-05-22 InSpire: Vision-Language-Action Models with Intrinsic Spatial Reasoning Jingkuan Song Team 2505.13888 link
2025-05-25 RoboFAC: A Comprehensive Framework for Robotic Failure Analysis and Correction Bo Zhao Team 2505.12224 null
2025-05-17 OneTwoVLA: A Unified Vision-Language-Action Model with Adaptive Reasoning Yang Gao Team 2505.11917 null
2025-05-16 Unveiling the Potential of Vision-Language-Action Models with Open-Ended Multimodal Instructions Donglin Wang Team 2505.11214 null
2025-05-16 Conditioning Matters: Training Diffusion Policies is Faster Than You Think Jianye Hao Team 2505.11123 null
2025-05-14 Real2Render2Real: Scaling Robot Data Without Dynamics Simulation or Robot Hardware Ken Goldberg Team 2505.09601 null
2025-05-14 RT-cache: Efficient Robot Trajectory Retrieval System Amir Barati Farimani Team 2505.09040 null
2025-05-13 From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation Jianye Hao Team 2505.08548 null
2025-05-17 Training Strategies for Efficient Embodied Reasoning Sergey Levine Team 2505.08243 null
2025-05-12 Pixel Motion as Universal Representation for Robot Control Michael S Ryoo Team 2505.07817 null
2025-05-12 ReinboT: Amplifying Robot Visual-Language Manipulation with Reinforcement Learning Donglin Wang Team 2505.07395 null
2025-05-15 UniVLA: Learning to Act Anywhere with Task-centric Latent Actions Hongyang Li Team 2505.06111 link
2025-05-09 3D CAVLA: Leveraging Depth and 3D Context to Generalize Vision Language Action Models for Unseen Tasks Farshad Khorrami Team 2505.05800 null
2025-05-08 Benchmarking Vision, Language, & Action Models in Procedurally Generated, Open Ended Action Environments Harshvardhan Sikka Team 2505.05540 link
2025-05-09 Vision-Language-Action Models: Concepts, Progress, Applications and Challenges Manoj Karkee Team 2505.04769 null
2025-05-06 OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation Donglin Wang Team 2505.03912 link
2025-05-16 Task Reconstruction and Extrapolation for $π_0$ using Text Latent Quanyi Li Team 2505.03500 null
2025-05-06 GraspVLA: a Grasping Foundation Model Pre-trained on Billion-scale Synthetic Action Data He Wang Team 2505.03233 null
2025-05-06 Automated Data Curation Using GPS & NLP to Generate Instruction-Action Pairs for Autonomous Vehicle Vision-Language Navigation Datasets Ross Greer Team 2505.03174 null
2025-05-04 CrayonRobo: Object-Centric Prompt-Driven Vision-Language-Action Model for Robotic Manipulation Hao Dong Team 2505.02166 null
2025-05-04 Interleave-VLA: Enhancing Robot Manipulation with Interleaved Image-Text Instructions Mingyu Ding Team 2505.02152 null
2025-04-28 NORA: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks Soujanya Poria Team 2504.19854 null
2025-04-22 $π_{0.5}$: a Vision-Language-Action Model with Open-World Generalization Ury Zhilinsky Team 2504.16054 null
2025-04-22 Few-Shot Vision-Language Action-Incremental Policy Learning Weili Guan Team 2504.15517 null
2025-04-18 GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents Xiaobo Xia Team 2504.10458 null
2025-04-09 OPAL: Encoding Causal Understanding of Physical Systems for Robot Learning Tyler Fenstermaker Team 2504.06538 null
2025-04-02 Grounding Multimodal LLMs to Embodied Agents that Ask for Help with Reinforcement Learning Roozbeh Mottaghi Team 2504.00907 null
2025-03-30 OpenDriveVLA: Towards End-to-end Autonomous Driving with Large Vision Language Action Model Alois C. Knoll Team 2503.23463 link
2025-03-27 CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models Tsung-Yi Lin Team 2503.22020 null
2025-04-14 MoLe-VLA: Dynamic Layer-skipping Vision Language Action Model via Mixture-of-Layers for Efficient Robot Manipulation Shanghang Zhang Team 2503.20384 null
2025-03-25 Gemini Robotics: Bringing AI into the Physical World Yuxiang Zhou Team 2503.20020 null
2025-03-25 Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy Yuntao Chen Team 2503.19757 null
2025-03-25 DataPlatter: Boosting Robotic Manipulation Generalization with Minimal Costly Data Lin Ma Team 2503.19516 null
2025-03-27 GR00T N1: An Open Foundation Model for Generalist Humanoid Robots Yuke Zhu Team 2503.14734 null
2025-03-15 ReBot: Scaling Robot Learning with Real-to-Sim-to-Real Robotic Video Synthesis Mingyu Ding Team 2503.14526 null
2025-03-17 MoManipVLA: Transferring Vision-language-action Models for General Mobile Manipulation Haibin Yan Team 2503.13446 null
2025-03-17 HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model Shanghang Zhang Team 2503.10631 null
2025-03-13 SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment Oleg Sinavski Team 2503.09594 null
2025-03-12 CombatVLA: An Efficient Vision-Language-Action Model for Combat Tasks in 3D Action Role-Playing Games Bo Zheng Team 2503.09527 null
2025-03-11 MoRE: Unlocking Scalability in Reinforcement Learning for Quadruped Vision-Language-Action Models Zongyuan Ge Team 2503.08007 null
2025-03-10 PointVLA: Injecting the 3D World into Vision-Language-Action Models Yichen Zhu Team 2503.07511 null
2025-03-11 CLAD: Constrained Latent Action Diffusion for Vision-Language Procedure Planning Andreas Bulling Team 2503.06637 null
2025-03-06 Refined Policy Distillation: From VLA Generalists to RL Experts Florian Walter Team 2503.05833 null
2025-03-06 VLA Model-Expert Collaboration for Bi-directional Manipulation Learning Zeng-Guang Hou Team 2503.04163 null
2025-03-26 OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction Pieter Abbeel Team 2503.03734 null
2025-03-05 SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Safe Reinforcement Learning Yaodong Yang Team 2503.03480 null
2025-03-04 Accelerating Vision-Language-Action Model Integrated with Action Chunking via Parallel Decoding Haoang Li Team 2503.02310 null
2025-03-03 CognitiveDrone: A VLA Model and Evaluation Benchmark for Real-Time Cognitive Task Solving and Reasoning in UAVs Dzmitry Tsetserukou Team 2503.01378 null
2025-10-15 CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving Issei Yamamoto Team 2408.10845 link
2024-07-23 Can VLMs be used on videos for action recognition? LLMs are Visual Reasoning Coordinators Harsh Lunia Team 2407.14834 null
2024-03-15 3D-VLA: A 3D Vision-Language-Action Generative World Model Chuang Gan Team 2403.09631 link
2022-07-19 Zero-Shot Temporal Action Detection via Vision-Language Prompting Tao Xiang Team 2207.08184 link
2022-06-01 ADAPT: Vision-Language Navigation with Modality-Aligned Action Prompts Xiaodan Liang Team 2205.15509 null
2022-08-16 A Dataset for Interactive Vision-Language Navigation with Unknown Command Feasibility Bryan A. Plummer Team 2202.02312 null
2017-04-25 An Analysis of Action Recognition Datasets for Language and Vision Tasks Frank Keller Team 1704.07129 null

Humanoid

Publish Date Title Authors PDF Code
2025-11-20 InEKFormer: A Hybrid State Estimator for Humanoid Robots Frank Kirchner Team 2511.16306 null
2025-11-19 VIRAL: Visual Sim-to-Real at Scale for Humanoid Loco-Manipulation Yuke Zhu Team 2511.15200 link
2025-11-18 HMC: Learning Heterogeneous Meta-Control for Contact-Rich Loco-Manipulation Xiaolong Wang Team 2511.14756 null
2025-11-15 Learning Adaptive Neural Teleoperation for Humanoid Robots: From Inverse Kinematics to End-to-End Control Sanjar Atamuradov Team 2511.12390 null
2025-11-14 Humanoid Whole-Body Badminton via Multi-Stage Reinforcement Learning Xiaoyu Ren Team 2511.11218 null
2025-11-13 DecARt Leg: Design and Evaluation of a Novel Humanoid Robot Leg with Decoupled Actuation for Agile Locomotion Roman Gorbachev Team 2511.10021 null
2025-11-12 SPIDER: Scalable Physics-Informed Dexterous Retargeting Francois Hogan Team 2511.09484 link
2025-11-12 Unveiling the Impact of Data and Model Scaling on High-Level Control for Humanoid Robots Siheng Chen Team 2511.09241 null
2025-11-12 RGMP: Recurrent Geometric-prior Multimodal Policy for Generalizable Humanoid Robot Manipulation Miao Li Team 2511.09141 null
2025-11-10 Unified Humanoid Fall-Safety Policy from a Few Demonstrations Stella X. Yu Team 2511.07407 null
2025-11-10 Human-Level Actuation for Humanoids MD-Nazmus Sunbeam Team 2511.06796 null
2025-11-11 Towards Adaptive Humanoid Control via Multi-Behavior Distillation and Reinforced Fine-Tuning Chenjia Bai Team 2511.06371 null
2025-11-08 Towards Human-AI-Robot Collaboration and AI-Agent based Digital Twins for Parkinson’s Disease Management: Review and Outlook Tareq Y. Al-Naffouri Team 2511.06036 null
2025-11-06 ScheduleStream: Temporal Planning with Samplers for GPU-Accelerated Multi-Arm Task and Motion Planning & Scheduling Fabio Ramos Team 2511.04758 link
2025-11-06 GentleHumanoid: Learning Upper-body Compliance for Contact-rich Human and Object Interaction C. Karen Liu Team 2511.04679 link
2025-11-06 BFM-Zero: A Promptable Behavioral Foundation Model for Humanoid Control Using Unsupervised Reinforcement Learning Guanya Shi Team 2511.04131 null
2025-11-06 Learning Vision-Driven Reactive Soccer Skills for Humanoid Robots Mingguo Zhao Team 2511.03996 link
2025-11-05 OneOcc: Semantic Occupancy Prediction for Legged Robots with a Single Panoramic Camera Kaiwei Wang Team 2511.03571 link
2025-11-04 TWIST2: Scalable, Portable, and Holistic Humanoid Data Collection System C. Karen Liu Team 2511.02832 link
2025-11-02 Heuristic Step Planning for Learning Dynamic Bipedal Locomotion: A Comparative Study of Model-Based and Model-Free Approaches Roman Gorbachev Team 2511.00840 null
2025-10-31 EgoMI: Learning Active Vision and Whole-Body Manipulation from Egocentric Human Demonstrations Philipp Wu Team 2511.00153 null
2025-10-31 Towards a Multi-Embodied Grasping Agent Gerhard Neumann Team 2510.27420 null
2025-10-30 Cooperative Task Spaces for Multi-Arm Manipulation Control based on Similarity Transformations Sylvain Calinon Team 2510.26362 null
2025-11-05 Thor: Towards Human-Level Whole-Body Reactions for Intense Contact-Rich Environments Shaqi Luo Team 2510.26280 null
2025-11-01 Beyond the Uncanny Valley: A Mixed-Method Investigation of Anthropomorphism in Protective Responses to Robot Abuse Renkai Ma Team 2510.26082 null
2025-10-28 A Humanoid Visual-Tactile-Action Dataset for Contact-Rich Manipulation Kyung-Joong Kim Team 2510.25725 null
2025-10-27 Awakening Facial Emotional Expressions in Human-Robot Jianwei Zhang Team 2510.23059 null
2025-11-05 Toward Humanoid Brain-Body Co-design: Joint Optimization of Control and Morphology for Fall Recovery Guiliang Liu Team 2510.22336 null
2025-10-21 SLICE: SLO-Driven Scheduling for LLM Inference on Edge Computing Devices Yueyue Dai Team 2510.18544 null
2025-10-20 Humanoid Goalkeeper: Learning from Position Conditioned Task-Motion Constraints Jiangmiao Pang Team 2510.18002 null
2025-10-20 SoftMimic: Learning Compliant Whole-body Control from Examples Pulkit Agrawal Team 2510.17792 link
2025-10-19 CBF-RL: Safety Filtering Reinforcement Learning in Training with Control Barrier Functions Aaron D. Ames Team 2510.14959 null
2025-10-17 From Language to Locomotion: Retargeting-free Humanoid Control via Motion Latent Guidance Chang Xu Team 2510.14952 null
2025-10-16 Towards Adaptable Humanoid Control via Adaptive Motion Tracking Jiangmiao Pang Team 2510.14454 null
2025-10-15 A Modular Object Detection System for Humanoid Robots Using YOLO Meng Cheng Lau Team 2510.13625 null
2025-10-15 Development of an Intuitive GUI for Non-Expert Teleoperation of Humanoid Robots Meng Cheng Lau Team 2510.13594 null
2025-10-14 PolygMap: A Perceptive Locomotion Framework for Humanoid Robot Stair Climbing Yucong Wu Team 2510.12346 null
2025-10-13 Ego-Vision World Model for Humanoid Contact Planning Koushil Sreenath Team 2510.11682 null
2025-10-13 Simultaneous Calibration of Noise Covariance and Kinematics for State Estimation of Legged Robots via Bi-level Optimization Xiaobin Xiong Team 2510.11539 null
2025-10-13 Path and Motion Optimization for Efficient Multi-Location Inspection with Humanoid Robots Yao Su Team 2510.11401 null
2025-10-13 DemoHLM: From One Demonstration to Generalizable Humanoid Loco-Manipulation Zongqing Lu Team 2510.11258 null
2025-10-13 PhysHSI: Towards a Real-World Generalizable and Natural Humanoid-Scene Interaction System Jiangmiao Pang Team 2510.11072 link
2025-10-12 Preference-Conditioned Multi-Objective RL for Integrated Command Tracking and Force Compliance in Humanoid Locomotion Mingguo Zhao Team 2510.10851 null
2025-10-11 It Takes Two: Learning Interactive Whole-Body Control Between Humanoid Robots Siheng Chen Team 2510.10206 null
2025-10-10 Enhancing Diffusion Policy with Classifier-Free Guidance for Temporal Robotic Tasks Zhicheng He Team 2510.09786 null
2025-10-09 Humanoid Everyday: A Comprehensive Robotic Dataset for Open-World Humanoid Manipulation Yue Wang Team 2510.08807 null
2025-10-09 DexMan: Learning Bimanual Dexterous Manipulation from Human and Generated Videos Tsung-Wei Ke Team 2510.08475 link
2025-10-09 Reliability of Single-Level Equality-Constrained Inverse Optimal Control Vincent Bonnet Team 2510.08406 null
2025-10-15 Towards Proprioception-Aware Embodied Planning for Dual-Arm Humanoid Robots Zongqing Lu Team 2510.07882 null
2025-10-10 DPL: Depth-only Perceptive Humanoid Locomotion via Realistic Depth Synthesis and Cross-Attention Terrain Reconstruction Qiang Zhang Team 2510.07152 null
2025-10-07 A Co-Design Framework for Energy-Aware Monoped Jumping with Detailed Actuator Modeling Shishir Kolathaya Team 2510.05923 null
2025-10-06 Walking, Rolling, and Beyond: First-Principles and RL Locomotion on a TARS-Inspired Robot Abhishek Warrier Team 2510.05001 null
2025-10-05 Stability-Aware Retargeting for Humanoid Multi-Contact Teleoperation Robert Griffin Team 2510.04353 null
2025-10-03 LapSurgie: Humanoid Robots Performing Surgery via Teleoperated Handheld Laparoscopy Michael C. Yip Team 2510.03529 null
2025-10-03 Embracing Evolution: A Call for Body-Control Co-Design in Embodied Humanoid Robot Kui Jia Team 2510.03081 null
2025-10-03 HumanoidExo: Scalable Whole-Body Humanoid Manipulation via Wearable Exoskeleton Yi Xu Team 2510.03022 null
2025-10-02 Retargeting Matters: General Motion Retargeting for Humanoid Motion Tracking C. Karen Liu Team 2510.02252 null
2025-10-02 Stand Up, NAO! Increasing the Reliability of Stand-Up Motions Through Error Compensation in Position Control Tim Laue Team 2510.02129 null
2025-10-02 Like Playing a Video Game: Spatial-Temporal Optimization of Foot Trajectories for Controlled Football Kicking in Bipedal Robots Peng Lu Team 2510.01843 null
2025-09-30 Learning Human Reaching Optimality Principles from Minimal Observation Inverse Reinforcement Learning Ludovic Righetti Team 2510.00329 null
2025-10-08 OmniRetarget: Interaction-Preserving Data Generation for Humanoid Whole-Body Loco-Manipulation and Scene Interaction Guanya Shi Team 2509.26633 link
2025-09-30 ISyHand: A Dexterous Multi-finger Robot Hand with an Articulated Palm Katherine J. Kuchenbecker Team 2509.26236 null
2025-09-30 Evolutionary Continuous Adaptive RL-Powered Co-Design for Humanoid Chin-Up Performance Frank Kirchner Team 2509.26082 null
2025-10-06 CoTaP: Compliant Task Pipeline and Reinforcement Learning of Its Controller with Compliance Modulation Yoshihiko Nakamura Team 2509.25443 null
2025-09-29 Stabilizing Humanoid Robot Trajectory Generation via Physics-Informed Learning and Control-Informed Steering Daniele Pucci Team 2509.24697 null
2025-09-29 Game Theory to Study Cooperation in Human-Robot Mixed Groups: Exploring the Potential of the Public Good Game Alessandra Sciutti Team 2509.24530 null
2025-09-29 Preference-Based Long-Horizon Robotic Stacking with Multimodal Large Language Models Sethu Vijayakumar Team 2509.24163 null
2025-09-28 SIG-Chat: Spatial Intent-Guided Conversational Gesture Generation Involving How, When and Where Chuanchen Luo Team 2509.23852 null
2025-09-25 SEEC: Stable End-Effector Control with Model-Enhanced Residual Learning for Humanoid Loco-Manipulation Ye Zhao Team 2509.21231 null
2025-09-25 RuN: Residual Policy for Natural Humanoid Locomotion Yong Liu Team 2509.20696 null
2025-09-24 Large Pre-Trained Models for Bimanual Manipulation in 3D David Meger Team 2509.20579 null
2025-09-24 VisualMimic: Visual Humanoid Loco-Manipulation via Motion Tracking and Generation Jiajun Wu Team 2509.20322 link
2025-09-25 HL-IK: A Lightweight Implementation of Human-Like Inverse Kinematics in Humanoid Arms Houde Liu Team 2509.20263 null
2025-09-23 Chasing Stability: Humanoid Running via Control Lyapunov Function Guided Reinforcement Learning Aaron D. Ames Team 2509.19573 null
2025-09-23 RoMoCo: Robotic Motion Control Toolbox for Reduced-Order Model-Based Locomotion on Bipedal and Humanoid Robots Aaron D. Ames Team 2509.19545 null
2025-09-25 Residual Off-Policy RL for Finetuning Behavior Cloning Policies Anusha Nagabandi Team 2509.19301 link
2025-09-27 HDMI: Learning Interactive Humanoid Whole-Body Control from Human Videos Guanya Shi Team 2509.16757 null
2025-09-20 KungfuBot2: Learning Versatile Motion Skills for Humanoid Whole-Body Control Chenjia Bai Team 2509.16638 null
2025-09-19 A Framework for Optimal Ankle Design of Humanoid Robots Daniele Pucci Team 2509.16469 null
2025-09-19 A Matter of Height: The Impact of a Robotic Object on Human Compliance Hadas Erel Team 2509.16032 null
2025-09-18 Implicit Kinodynamic Motion Retargeting for Human-to-humanoid Imitation Learning Haodong Zhang Team 2509.15443 null
2025-09-18 CAD-Driven Co-Design for Flight-Ready Jet-Powered Humanoids Daniele Pucci Team 2509.14935 null
2025-09-18 RealMirror: A Comprehensive, Open-Source Vision-Language-Action Platform for Embodied AI Tao Shen Team 2509.14687 null
2025-09-23 Cybersecurity AI: Humanoid Robots as Attack Vectors Kevin Finisterre Team 2509.14139 null
2025-09-17 The Cybersecurity of a Humanoid Robot Víctor Mayoral-Vilches Team 2509.14096 null
2025-09-17 Behavior Foundation Model for Humanoid Robots Jiangmiao Pang Team 2509.13780 null
2025-09-17 FSR-VLN: Fast and Slow Reasoning for Vision-Language Navigation with Hierarchical Multi-modal Scene Graph Zhizhong Su Team 2509.13733 null
2025-09-16 Embracing Bulky Objects with Humanoid Robots: Whole-Body Manipulation with Reinforcement Learning Jun Ma Team 2509.13534 null
2025-09-18 StageACT: Stage-Conditioned Imitation for Robust Humanoid Door Opening Shayegan Omidshafiei Team 2509.13200 null
2025-09-14 Quantum deep reinforcement learning for humanoid robot navigation task Ahmed Biyabani Team 2509.11388 null
2025-09-16 FEWT: Improving Humanoid Robot Perception with Frequency-Enhanced Wavelet-based Transformers Zhigong Song Team 2509.11109 null
2025-09-16 Data-fused Model Predictive Control with Guarantees: Application to Flying Humanoid Robots Daniele Pucci Team 2509.10353 null
2025-09-11 MimicDroid: In-Context Learning for Humanoid Robot Manipulation from Human Play Videos Yuke Zhu Team 2509.09769 null
2025-09-11 AGILOped: Agile Open-Source Humanoid Robot for Research Sven Behnke Team 2509.09364 null
2025-09-09 Attribute-based Object Grounding and Robot Grasp Detection with Spatial Reasoning Changhyun Choi Team 2509.08126 null
2025-09-09 Interactive Shaping of Granular Media Using Reinforcement Learning Maren Bennewitz Team 2509.06469 null
2025-09-06 Learning to Walk in Costume: Adversarial Motion Priors for Aesthetically Constrained Humanoids Dennis W. Hong Team 2509.05581 null
2025-09-08 Hierarchical Reduced-Order Model Predictive Control for Robust Locomotion on Humanoid Robots Aaron D. Ames Team 2509.04722 null
2025-09-03 The Role of Embodiment in Intuitive Whole-Body Teleoperation for Mobile Manipulation Georgia Chalvatzaki Team 2509.03222 null
2025-09-01 ManiFlow: A General Robot Manipulation Policy via Consistency Flow Training Dieter Fox Team 2509.01819 null
2025-09-04 HITTER: A HumanoId Table TEnnis Robot via Hierarchical Planning and Learning S. Shankar Sastry Team 2508.21043 null
2025-09-16 Traversing the Narrow Path: A Two-Stage Reinforcement Learning Framework for Humanoid Beam Walking Shiwu Zhang Team 2508.20661 link
2025-08-26 HuBE: Cross-Embodiment Human-like Behavior Execution for Humanoid Robots Guodong Guo Team 2508.19002 null
2025-08-21 PriorFormer: A Transformer for Real-time Monocular 3D Human Pose Estimation with Versatile Geometric Priors Vincent Bonnet Team 2508.18238 null
2025-09-01 SoK: Cybersecurity Assessment of Humanoid Ecosystem Yuval Elovici Team 2508.17481 null
2025-08-20 LookOut: Real-World Humanoid Egocentric Navigation Leonidas J. Guibas Team 2508.14466 null
2025-08-18 Scaling Whole-body Multi-contact Manipulation with Contact Optimization Sethu Vijayakumar Team 2508.12980 null
2025-08-18 Foundation Model for Skeleton-Based Human Action Understanding Liang Wang Team 2508.12586 link
2025-08-27 Robot Trains Robot: Automatic Real-World Policy Adaptation and Learning for Humanoids Shuran Song Team 2508.12252 null
2025-08-17 Humanoid Motion Scripting with Postural Synergies Oussama Khatib Team 2508.12184 null
2025-08-16 Contact-Rich and Deformable Foot Modeling for Locomotion Control of the Human Musculoskeletal System Yanan Sui Team 2508.11885 null
2025-08-16 From Screen to Stage: Kid Cosmo, A Life-Like, Torque-Controlled Humanoid for Entertainment Robotics Dennis W. Hong Team 2508.11884 null
2025-08-15 Anticipatory and Adaptive Footstep Streaming for Teleoperated Bipedal Robots Robert Griffin Team 2508.11802 null
2025-08-15 A Comparative Study of Floating-Base Space Parameterizations for Agile Whole-Body Motion Planning Konstantinos Chatzilygeroudis Team 2508.11520 null
2025-08-15 Learning Differentiable Reachability Maps for Optimization-based Humanoid Motion Generation Fumio Kanehiro Team 2508.11275 null
2025-08-15 Geometry-Aware Predictive Safety Filters on Humanoids: From Poisson Safety Functions to CBF Constrained MPC Aaron D. Ames Team 2508.11129 null
2025-08-14 MASH: Cooperative-Heterogeneous Multi-Agent Reinforcement Learning for Single Humanoid Robot Locomotion Yanjie Li Team 2508.10423 null
2025-08-13 GBC: Generalized Behavior-Cloning Framework for Whole-Body Humanoid Imitation Jun-Guo Lu Team 2508.09960 null
2025-08-11 PCHands: PCA-based Hand Pose Synergy Representation on Manipulators with N-DoF Lorenzo Natale Team 2508.07945 null
2025-08-11 End-to-End Humanoid Robot Safe and Comfortable Locomotion Policy Junwei Liang Team 2508.07611 null
2025-08-09 Learning a Vision-Based Footstep Planner for Hierarchical Walking Control Michael Posa Team 2508.06779 null
2025-08-07 Examining the legibility of humanoid robot arm movements in a pointing task Igor Farkaš Team 2508.05104 null
2025-08-06 INTENTION: Inferring Tendencies of Humanoid Robot Motion Through Interactive Intuition and Grounded VLM Nikos Tsagarakis Team 2508.04931 link
2025-08-06 On the causality between affective impact and coordinated human-robot reactions Kasper Støy Team 2508.04834 null
2025-08-06 Binaural Sound Event Localization and Detection Neural Network based on HRTF Localization Cues for Humanoid Robots Gyeong-Tae Lee Team 2508.04333 null
2025-08-08 Would you let a humanoid play storytelling with your child? A usability study on LLM-powered narrative Human-Robot Interaction Agnieszka Wykowska Team 2508.02505 null
2025-08-04 Towards Immersive Human-X Interaction: A Real-Time Framework for Physically Plausible Motion Synthesis Jingya Wang Team 2508.02106 null
2025-08-02 Coordinated Humanoid Robot Locomotion with Symmetry Equivariant Reinforcement Learning Policy Yue Gao Team 2508.01247 null
2025-08-01 A Whole-Body Motion Imitation Framework from Human Data for Full-Size Humanoid Robot Rong Xiong Team 2508.00362 null
2025-08-01 TOP: Time Optimization Policy for Stable and Accurate Standing Manipulation with Humanoid Robots Rong Xiong Team 2508.00355 null
2025-07-31 CHILD (Controller for Humanoid Imitation and Live Demonstration): a Whole-Body Humanoid Teleoperation System Joohyung Kim Team 2508.00162 null
2025-07-31 The Monado SLAM Dataset for Egocentric Visual-Inertial Tracking Taihú Pire Team 2508.00088 null
2025-07-28 Binaural Sound Event Localization and Detection based on HRTF Cues for Humanoid Robots Yong-Hwa Park Team 2507.20530 null
2025-07-28 LLMs-guided adaptive compensator: Bringing Adaptivity to Automatic Control Systems with Large Language Models Yusuke Iwasawa Team 2507.20509 null
2025-07-29 Humanoid Occupancy: Enabling A Generalized Multimodal Occupancy Perception System on Humanoid Robots Qiang Zhang Team 2507.20217 null
2025-07-27 LRR-Bench: Left, Right or Rotate? Vision-Language models Still Struggle With Spatial Understanding Tasks Xiaoshuang Shi Team 2507.20174 null
2025-07-25 How Age Influences the Interpretation of Emotional Body Language in Humanoid Robots – long paper version Giuseppe Palestra Team 2507.19335 null
2025-07-24 Experimental Comparison of Whole-Body Control Formulations for Humanoid Robots in Task Acceleration and Task Force Spaces Christian Ott Team 2507.18502 link
2025-07-22 Humanoid Robot Whole-body Geometric Calibration with Embedded Sensors and a Single Plane Florent Lamiraux Team 2507.16369 null
2025-07-20 Integrating Reason-Based Moral Decision-Making in the Reinforcement Learning Architecture Lisa Dargasz Team 2507.15895 null
2025-07-21 EMP: Executable Motion Prior for Humanoid Robot Standing Upper-body Motion Imitation Rong Xiong Team 2507.15649 null
2025-07-16 Robot Drummer: Learning Rhythmic Skills for Humanoid Drumming Loris Roveda Team 2507.11498 null
2025-07-15 From Production Logistics to Smart Manufacturing: The Vision for a New RoboCup Industrial League Shohei Yasuda Team 2507.11402 null
2025-07-14 Physics-Informed Neural Networks with Unscented Kalman Filter for Sensorless Joint Torque Estimation in Humanoid Robots Daniele Pucci Team 2507.10105 null
2025-07-11 Learning Robust Motion Skills via Critical Adversarial Attacks for Humanoid Robots Yue Gao Team 2507.08303 null
2025-07-10 UniTracker: Learning Universal Whole-Body Motion Tracker for Humanoid Robots Weinan Zhang Team 2507.07356 null
2025-07-09 ULC: A Unified and Fine-Grained Controller for Humanoid Loco-Manipulation Zongwu Xie Team 2507.06905 null
2025-07-08 Learning to Evaluate Autonomous Behaviour in Human-Robot Interaction Alessio Del Bue Team 2507.06404 null
2025-07-05 Learning Humanoid Arm Motion via Centroidal Momentum Regularized Multi-Agent Reinforcement Learning Sangbae Kim Team 2507.04140 null
2025-07-01 HumanoidGen: Data Generation for Bimanual Dexterous Manipulation via LLM Reasoning Chenjia Bai Team 2507.00833 null
2025-06-30 Mechanical Intelligence-Aware Curriculum Reinforcement Learning for Humanoids with Parallel Actuation Dennis Hong Team 2507.00273 null
2025-07-02 DexH2R: A Benchmark for Dynamic Dexterous Grasping in Human-to-Robot Handover Yuexin Ma Team 2506.23152 null
2025-06-29 Learning Motion Skills with Adaptive Assistive Curriculum Force in Humanoid Robots Yue Gao Team 2506.23125 null
2025-07-10 Hierarchical Vision-Language Planning for Multi-Step Humanoid Manipulation Navid Azizan Team 2506.22827 null
2025-06-20 Unsupervised Discovery of Behavioral Primitives from Sensorimotor Dynamic Functional Connectivity Matej Hoffmann Team 2506.22473 null
2025-07-14 Ark: An Open-source Python-based Framework for Robot Learning Haitham Bou-Ammar Team 2506.21628 null
2025-07-18 A Survey of Behavior Foundation Model: Next-Generation Whole-Body Control System of Humanoid Robots Wenjun Zeng Team 2506.20487 null
2025-06-19 DualTHOR: A Dual-Arm Humanoid Simulation Platform for Contingency-Aware Planning Zongqing Lu Team 2506.16012 link
2025-06-18 TACT: Humanoid Whole-body Contact Manipulation through Deep Imitation Learning with Tactile Modality Eiichi Yoshida Team 2506.15146 null
2025-06-18 Booster Gym: An End-to-End Reinforcement Learning Framework for Humanoid Robot Locomotion Mingguo Zhao Team 2506.15132 link
2025-06-17 GMT: General Motion Tracking for Humanoid Whole-Body Control Xiaolong Wang Team 2506.14770 null
2025-06-17 Whole-Body Control Framework for Humanoid Robots with Heavy Limbs: A Model-Based Approach Yun-Hui Liu Team 2506.14278 null
2025-06-15 KungfuBot: Physics-Based Humanoid Whole-Body Control for Learning Highly-Dynamic Skills Xuelong Li Team 2506.12851 null
2025-06-19 From Experts to a Generalist: Toward General Whole-Body Control for Humanoid Robots Zongqing Lu Team 2506.12779 null
2025-06-15 RL from Physical Feedback: Aligning Large Motion Models with Humanoid Control Zongqing Lu Team 2506.12769 null
2025-06-14 Explosive Output to Enhance Jumping Ability: A Variable Reduction Ratio Design Paradigm for Humanoid Robots Knee Joint Qiang Huang Team 2506.12314 null
2025-06-13 mimic-one: a Scalable Model Recipe for General Purpose Robot Dexterity Robert K. Katzschmann Team 2506.11916 null
2025-06-11 Exploring EEG Responses during Observation of Actions Performed by Human Actor and Humanoid Robot Michelle J. Johnson Team 2506.10170 null
2025-06-11 Locomotion on Constrained Footholds via Layered Architectures and Model Predictive Control Aaron D. Ames Team 2506.09979 null
2025-06-11 Attention-Based Map Encoding for Learning Generalized Legged Locomotion Marco Hutter Team 2506.09588 null
2025-06-11 Bipedal Balance Control with Whole-body Musculoskeletal Standing and Falling Simulations Yanan Sui Team 2506.09383 null
2025-06-11 SkillBlender: Towards Versatile Humanoid Whole-Body Loco-Manipulation via Skill Blending Yue Wang Team 2506.09366 link
2025-06-10 Fast Estimation of Globally Optimal Independent Contact Regions for Robust Grasping and Manipulation Nancy S. Pollard Team 2506.08856 null
2025-06-12 MoRE: Mixture of Residual Experts for Humanoid Lifelike Gaits Learning on Complex Terrains Xuelong Li Team 2506.08840 null
2025-06-10 Periodic Bipedal Gait Learning Using Reward Composition Based on a Novel Gait Planner for Humanoid Robots Lijun Zhu Team 2506.08416 null
2025-06-05 Realizing Text-Driven Motion Generation on NAO Robot: A Reinforcement Learning-Optimized Control Pipeline Qijun Chen Team 2506.05117 link
2025-06-04 Phase-based Nonlinear Model Predictive Control for Humanoid Walking Stabilization with Single and Double Support Time Adjustments Jaeheung Park Team 2506.03856 null
2025-06-03 AURA: Agentic Upskilling via Reinforced Abstractions Dennis Hong Team 2506.02507 null
2025-06-02 Reinforcement Learning with Data Bootstrapping for Dynamic Subgoal Pursuit in Humanoid Robot Navigation Ayonga Hereid Team 2506.02206 null
2025-06-02 Learning with pyCub: A New Simulation and Exercise Framework for Humanoid Robotics Matej Hoffmann Team 2506.01756 null
2025-06-05 Hierarchical Intention-Aware Expressive Motion Generation for Humanoid Robots Chengxu Zhou Team 2506.01563 null
2025-06-01 Humanoid World Models: Open World Foundation Models for Humanoid Robotics Mohammad Al-Sharman Team 2506.01182 null
2025-06-01 iRonCub 3: The Jet-Powered Flying Humanoid Robot Daniele Pucci Team 2506.01125 null
2025-05-30 Learning Aerodynamics for the Control of Flying Humanoid Robots Daniele Pucci Team 2506.00305 null
2025-05-30 Interactive Imitation Learning for Dexterous Robotic Manipulation: Challenges and Perspectives – A Survey Rania Rayyes Team 2506.00098 null
2025-06-05 SignBot: Learning Human-to-Humanoid Sign Language Interaction Guiliang Liu Team 2505.24266 null
2025-05-30 Humanoid Loco-Manipulations Pattern Generation and Stabilization Control Abderrahmane Kheddar Team 2505.24116 null
2025-05-29 Humanoid Loco-manipulation Planning based on Graph Search and Reachability Maps Abderrahmane Kheddar Team 2505.23505 null
2025-05-29 Centroidal Trajectory Generation and Stabilization based on Preview Control for Humanoid Multi-contact Motion Fumio Kanehiro Team 2505.23499 link
2025-06-01 FastTD3: Simple, Fast, and Capable Reinforcement Learning for Humanoid Control Pieter Abbeel Team 2505.22642 null
2025-05-27 Learning Unified Force and Position Control for Legged Loco-Manipulation Siyuan Huang Team 2505.20829 null
2025-05-27 Gait-Conditioned Reinforcement Learning with Multi-Phase Curriculum for Humanoid Locomotion Chengxu Zhou Team 2505.20619 null
2025-05-26 Integrating emotional intelligence, memory architecture, and gestures to achieve empathetic humanoid robot interaction in an educational setting Paul Craig Team 2505.19803 null
2025-05-26 Extremum Flow Matching for Offline Goal Conditioned Reinforcement Learning Jean-Baptiste Mouret Team 2505.19717 null
2025-05-26 Whole-body Multi-contact Motion Control for Humanoid Robots Based on Distributed Tactile Sensors Eiichi Yoshida Team 2505.19580 link
2025-05-26 Heavy lifting tasks via haptic teleoperation of a wheeled humanoid Joao Ramos Team 2505.19530 null
2025-05-26 SMAP: Self-supervised Motion Adaptation for Physically Plausible Humanoid Whole-body Control Junting Dong Team 2505.19463 null
2025-05-25 Towards Humanoid Robot Autonomy: A Dynamic Architecture Integrating Continuous Thought Machines (CTM) and Model Context Protocol (MCP) Libo Wang Team 2505.19339 link
2025-05-25 Staircase Recognition and Location Based on Polarization Vision Zhiying Tan Team 2505.19026 null
2025-05-23 DanceTogether! Identity-Preserving Multi-Person Interactive Video Generation Ruqi Huang Team 2505.18078 null
2025-08-11 Unified Multi-Rate Model Predictive Control for a Jet-Powered Humanoid Robot Daniele Pucci Team 2505.16478 null
2025-05-19 TD-GRPC: Temporal Difference Learning with Group Relative Policy Constraint for Humanoid Locomotion Minh Nhat Vu Team 2505.13549 null
2025-05-19 DreamGen: Unlocking Generalization in Robot Learning through Neural Trajectories Linxi Fan Team 2505.12705 null
2025-05-19 Dribble Master: Learning Agile Humanoid Dribbling Through Legged Locomotion Qi Wu Team 2505.12679 null
2025-05-16 Bracing for Impact: Robust Humanoid Push Recovery and Locomotion with Reduced Order Models Aaron D. Ames Team 2505.11495 null
2025-05-16 X2C: A Dataset Featuring Nuanced Facial Expressions for Realistic Humanoid Imitation Xiaohan Yu Team 2505.11146 link
2025-05-15 NavDP: Learning Sim-to-Real Navigation Diffusion Policy with Privileged Information Guidance Jiangmiao Pang Team 2505.08712 null
2025-05-13 Rethink Repeatable Measures of Robot Performance with Statistical Query Dylan Khor Team 2505.08216 null
2025-05-14 Neural Brain: A Neuroscience-inspired Framework for Embodied Agents Lin Wang Team 2505.07634 link
2025-05-12 HuB: Learning Extreme Humanoid Balance Yang Gao Team 2505.07294 null
2025-05-11 Dynamic Safety in Complex Environments: Synthesizing Safety Filters with Poisson’s Equation Aaron D. Ames Team 2505.06794 null
2025-05-10 JAEGER: Dual-Level Humanoid Whole-Body Controller Zongqing Lu Team 2505.06584 null
2025-05-09 Let Humanoids Hike! Integrative Skill Development on Complex Trails Stella X. Yu Team 2505.06218 null
2025-05-09 Safe-EF: Error Feedback for Nonsmooth Constrained Optimization Ilyas Fatkhullin Team 2505.06053 null
2025-05-09 Human-Robot Collaboration for the Remote Control of Mobile Humanoid Robots with Torso-Arm Coordination Zhi Li Team 2505.05773 null
2025-05-07 Vision-Language-Action Models: Concepts, Progress, Applications and Challenges Manoj Karkee Team 2505.04769 null
2025-05-06 AMO: Adaptive Motion Optimization for Hyper-Dexterous Humanoid Whole-Body Control Xiaolong Wang Team 2505.03738 null
2025-05-13 Visual Imitation Enables Contextual Humanoid Control Angjoo Kanazawa Team 2505.03729 null
2025-05-05 TWIST: Teleoperated Whole-Body Imitation System C. Karen Liu Team 2505.02833 null
2025-04-30 LangWBC: Language-directed Humanoid Whole-Body Control via End-to-end Learning Koushil Sreenath Team 2504.21738 null
2025-04-29 SoccerDiffusion: Toward Learning End-to-End Humanoid Robot Soccer from Gameplay Recordings Jianwei Zhang Team 2504.20808 null
2025-04-27 Personalized Artificial General Intelligence (AGI) via Neuroscience-Inspired Continuous Learning Systems Jairaj Singh Shaktawat Team 2504.20109 null
2025-04-24 Demonstrating Berkeley Humanoid Lite: An Open-source, Accessible, and Customizable 3D-printed Humanoid Robot Koushil Sreenath Team 2504.17249 null
2025-04-20 ExFace: Expressive Facial Control for Humanoid Robots with Diffusion Transformers and Bootstrap Training Jiahao Chen Team 2504.14477 null
2025-04-19 Adversarial Locomotion and Motion Imitation for Humanoid Policy Learning Xuelong Li Team 2504.14305 null
2025-04-18 Robust Humanoid Walking on Compliant and Uneven Terrain with Deep Reinforcement Learning Fumio Kanehiro Team 2504.13619 link
2025-04-16 EmoACT: a Framework to Embed Emotions into Artificial Agents Based on Affect Control Theory Carmine Tommaso Recchiuto Team 2504.12125 null
2025-04-14 Teacher Motion Priors: Enhancing Robot Locomotion over Challenging Terrain Zhengtao Zhang Team 2504.10390 null
2025-04-14 PreCi: Pretraining and Continual Improvement of Humanoid Locomotion via Model-Assumption-Based Regularization Sehoon Ha Team 2504.09833 null
2025-04-13 Embodied Chain of Action Reasoning with Multi-Modal Foundation Model for Humanoid Loco-manipulation Yi Fang Team 2504.09532 null
2025-04-11 Spectral Normalization for Lipschitz-Constrained Policies on Learning Humanoid Locomotion Jaeheung Park Team 2504.08246 null
2025-04-07 MotionPRO: Exploring the Role of Pressure in Human MoCap and Beyond Xun Cao Team 2504.05046 null
2025-04-07 A High-Force Gripper with Embedded Multimodal Sensing for Powerful and Perception Driven Grasping Nikos G. Tsagarakis Team 2504.04970 null
2025-04-06 Public speech recognition transcripts as a configuring parameter Christian Licoppe Team 2504.04488 null
2025-04-02 The Social Life of Industrial Arms: How Arousal and Attention Shape Human-Robot Interaction Matthew K. X. J. Pan Team 2504.01260 null
2025-04-01 Extended Hybrid Zero Dynamics for Bipedal Walking of the Knee-less Robot SLIDER Petar Kormushev Team 2504.01165 null
2025-04-11 Learning Bipedal Locomotion on Gear-Driven Humanoid Robot Using Foot-Mounted IMUs Masaya Kinoshita Team 2504.00614 null
2025-03-30 Exploring GPT-4 for Robotic Agent Strategy with Real-Time State Feedback and a Reactive Behaviour Framework Ysobel Sims Team 2503.23601 null
2025-03-28 Control of Humanoid Robots with Parallel Mechanisms using Kinematic Actuation Models Nicolas Mansard Team 2503.22459 null
2025-03-28 FLAM: Foundation Model-Based Body Stabilization for Humanoid Locomotion and Manipulation Debin Zhao Team 2503.22249 null
2025-03-27 OminiAdapt: Learning Cross-Task Invariance for Robust and Environment-Aware Robotic Manipulation Wanting Li Team 2503.21257 null
2025-03-26 Anti Robot Speciesism Miklos Sarvary Team 2503.20842 null
2025-03-25 Can Vision-Language Models Answer Face to Face Questions in the Real-World? Roland Memisevic Team 2503.19356 null
2025-03-19 StyleLoco: Generative Adversarial Distillation for Natural Humanoid Robot Locomotion Siyuan Huang Team 2503.15082 null
2025-03-27 GR00T N1: An Open Foundation Model for Generalist Humanoid Robots Yuke Zhu Team 2503.14734 null
2025-03-24 Humanoid Policy ~ Human Policy Xiaolong Wang Team 2503.13441 null
2025-03-17 Humanoids in Hospitals: A Technical Study of Humanoid Surrogates for Dexterous Medical Interventions Michael Yip Team 2503.12725 null
2025-03-16 Being-0: A Humanoid Robotic Agent with Vision-Language Models and Modular Skills Zongqing Lu Team 2503.12533 null
2025-03-14 Fast and Robust Localization for Humanoid Soccer Robot via Iterative Landmark Matching Dennis W. Hong Team 2503.11020 null
2025-03-13 NIL: No-data Imitation Learning by Leveraging Pre-trained Video Diffusion Models Michael Black Team 2503.10626 null
2025-03-13 NuExo: A Wearable Exoskeleton Covering all Upper Limb ROM for Outdoor Data Collection and Teleoperation of Humanoid Robots Huimin Lu Team 2503.10554 null
2025-03-12 Natural Humanoid Robot Locomotion with Generative Motion Prior Rong Xiong Team 2503.09015 null
2025-03-13 HumanoidPano: Hybrid Spherical Panoramic-LiDAR Cross-Modal Perception for Humanoid Robots Renjing Xu Team 2503.09010 null
2025-03-11 LiPS: Large-Scale Humanoid Robot Reinforcement Learning with Parallel-Series Structures Renjing Xu Team 2503.08349 null
2025-04-29 Learning Getting-Up Policies for Real-World Humanoid Robots Saurabh Gupta Team 2502.12152 link
2024-10-17 Harmon: Whole-Body Motion Generation of Humanoid Robots from Language Descriptions Yuke Zhu Team 2410.12773 link
2023-12-29 How to Raise a Robot – A Case for Neuro-Symbolic AI in Constrained Task Planning for Humanoid Assistive Robots Hannes Hartenstein Team 2312.08820 link
2022-11-28 Optimization of Humanoid Robot Designs for Human-Robot Ergonomic Payload Lifting Daniele Pucci Team 2211.13503 null
2022-10-20 Dialogue system with humanoid robot Naoki Igo Team 2210.10151 null
2021-04-20 The MIT Humanoid Robot: Design, Motion Planning, and Control For Acrobatic Behaviors Sangbae Kim Team 2104.09025 null
2019-09-24 Whole-Body Geometric Retargeting for Humanoid Robots Daniele Pucci Team 1909.10080 null
2019-09-06 NimbRo Robots Winning RoboCup 2018 Humanoid AdultSize Soccer Competitions Sven Behnke Team 1909.02385 null
2018-10-22 NimbRo-OP2X: Adult-sized Open-source 3D Printed Humanoid Robot Sven Behnke Team 1810.08395 null
2018-10-22 Online Balanced Motion Generation for Humanoid Robots Sven Behnke Team 1810.08388 null
2018-10-01 NimbRo-OP2: Grown-up 3D Printed Open Humanoid Platform for Research Sven Behnke Team 1809.11144 null
2018-10-01 A ROS-based Software Framework for the NimbRo-OP Humanoid Open Platform Sven Behnke Team 1809.11051 null
2017-01-11 Automatic Gain Tuning of a Momentum Based Balancing Controller for Humanoid Robots Francesco Nori Team 1610.02849 null
2017-07-18 Walking of the iCub humanoid robot in different scenarios: implementation and performance analysis Katja Mombaur Team 1607.08525 null
2017-01-16 Walking on Partial Footholds Including Line Contacts with the Humanoid Robot Atlas Jerry Pratt Team 1607.08089 null
2016-07-19 Design and implementation of computational platform for social-humanoid robot Lumen as an exhibition guide in Electrical Engineering Days 2015 Ary Setijadi Prihatmanto Team 1607.04763 null
2016-11-18 Gaze Stabilization for Humanoid Robots: a Comprehensive Framework Lorenzo Natale Team 1411.3525 null

Dexterous

Publish Date Title Authors PDF Code
2025-11-20 Dexterity from Smart Lenses: Multi-Fingered Robot Manipulation with In-the-Wild Human Demonstrations Homanga Bharadhwaj Team 2511.16661 null
2025-11-20 InternData-A1: Pioneering High-Fidelity Synthetic Data for Pre-training Generalist Policy Jiangmiao Pang Team 2511.16651 null
2025-11-19 VIRAL: Visual Sim-to-Real at Scale for Humanoid Loco-Manipulation Yuke Zhu Team 2511.15200 link
2025-11-18 Toward Robust and Harmonious Adaptation for Cross-modal Retrieval Xi Peng Team 2511.14416 null
2025-11-17 From Power to Precision: Learning Fine-grained Dexterity for Multi-fingered Robotic Hands Xiaolong Wang Team 2511.13710 link
2025-11-17 ZeroDexGrasp: Zero-Shot Task-Oriented Dexterous Grasp Synthesis with Prompt-Based Multi-Stage Semantic Reasoning Ruizhen Hu Team 2511.13327 null
2025-11-14 Dexterous Manipulation Transfer via Progressive Kinematic-Dynamic Alignment Yi Sun Team 2511.10987 null
2025-11-13 Opinion: Towards Unified Expressive Policy Optimization for Robust Robot Learning Xiaocong Li Team 2511.10087 null
2025-11-12 ScaleADFG: Affordance-based Dexterous Functional Grasping via Scalable Dataset Peng Wang Team 2511.09602 null
2025-11-12 IFG: Internet-Scale Guidance for Functional Grasping Generation Deepak Pathak Team 2511.09558 link
2025-11-12 SPIDER: Scalable Physics-Informed Dexterous Retargeting Francois Hogan Team 2511.09484 link
2025-11-12 RGMP: Recurrent Geometric-prior Multimodal Policy for Generalizable Humanoid Robot Manipulation Miao Li Team 2511.09141 null
2025-11-12 MirrorLimb: Implementing hand pose acquisition and robot teleoperation based on RealMirror Tao Shen Team 2511.08865 null
2025-11-10 Lightning Grasp: High Performance Procedural Grasp Synthesis with Contact Fields Pieter Abbeel Team 2511.07418 link
2025-11-09 Robust Differentiable Collision Detection for General Objects He Wang Team 2511.06267 null
2025-11-08 Adversarial Game-Theoretic Algorithm for Dexterous Grasp Synthesis Jeffrey Ichnowski Team 2511.05809 null
2025-11-06 Isaac Lab: A GPU-Accelerated Simulation Framework for Multi-Modal Robot Learning Gavriel State Team 2511.04831 link
2025-11-05 Dexterous Intramyocardial Needle Ablation (d-INA): Design, Fabrication, and In-Vivo Validation Yue Chen Team 2511.03763 null
2025-11-09 Development of the Bioinspired Tendon-Driven DexHand 021 with Proprioceptive Compliance Control Sheng Yi Team 2511.03481 null
2025-11-10 3D Cal: An Open-Source Software Library for Calibrating Tactile Sensors Gregory Reardon Team 2511.03078 null
2025-11-04 TWIST2: Scalable, Portable, and Holistic Humanoid Data Collection System C. Karen Liu Team 2511.02832 link
2025-11-04 Dexterous Robotic Piano Playing at Scale Dieter Büchler Team 2511.02504 null
2025-11-10 Whole-body motion planning and safety-critical control for aerial manipulation Jeonghyun Byun Team 2511.02342 null
2025-11-03 GenDexHand: Generative Simulation for Dexterous Hands Yi Ma Team 2511.01791 null
2025-11-09 Scaling Cross-Embodiment World Models for Dexterous Manipulation Hao Su Team 2511.01177 null
2025-10-31 End-to-End Dexterous Arm-Hand VLA Policies via Shared Autonomy: VR Teleoperation Augmented by Autonomous Hand VLA Policy for Efficient Data Collection Zhibin Li Team 2511.00139 null
2025-10-31 Whole-Body Proprioceptive Morphing: A Modular Soft Gripper for Robust Cross-Scale Grasping Xiaonan Huang Team 2510.27666 null
2025-10-30 SpikeATac: A Multimodal Tactile Finger with Taxelized Dynamic Sensing for Dexterous Manipulation Matei Ciocarlie Team 2510.27048 null
2025-10-28 A Humanoid Visual-Tactile-Action Dataset for Contact-Rich Manipulation Kyung-Joong Kim Team 2510.25725 null
2025-10-27 OmniDexGrasp: Generalizable Dexterous Grasping via Foundation Model and Force Feedback Wei-Shi Zheng Team 2510.23119 link
2025-10-24 Scalable Vision-Language-Action Model Pretraining for Robotic Manipulation with Real-Life Human Activity Videos Baining Guo Team 2510.21571 link
2025-10-23 SutureBot: A Precision Framework & Benchmark For Autonomous End-to-End Suturing Axel Krieger Team 2510.20965 null
2025-10-21 RAPID Hand Prototype: Design of an Affordable, Fully-Actuated Biomimetic Hand for Dexterous Teleoperation Hui Cheng Team 2510.16931 null
2025-10-23 DexCanvas: Bridging Human Demonstrations and Robot Learning for Dexterous Manipulation Yiwen Lu Team 2510.15786 null
2025-10-16 Open TeleDex: A Hardware-Agnostic Teleoperation System for Imitation Learning based Dexterous Manipulation Shan An Team 2510.14771 null
2025-10-16 Leveraging Neural Descriptor Fields for Learning Contact-Aware Dynamic Recovery Dmitry Berenson Team 2510.14768 null
2025-10-16 Spatially anchored Tactile Awareness for Robust Dexterous Manipulation Kaifeng Zhang Team 2510.14647 null
2025-10-16 Restoring Noisy Demonstration for Imitation Learning With Diffusion Models Shao-Hua Sun Team 2510.14467 null
2025-10-14 Learning to Grasp Anything by Playing with Random Toys Roei Herzig Team 2510.12866 null
2025-10-14 T(R,O) Grasp: Efficient Graph Diffusion of Robot-Object Spatial Transformation for Cross-Embodiment Dexterous Grasping Lin Shao Team 2510.12724 null
2025-10-10 Glovity: Learning Dexterous Contact-Rich Manipulation via Spatial Wrench Feedback Teleoperation System Pai Zheng Team 2510.09229 null
2025-10-10 PLEXUS Hand: Lightweight Four-Motor Prosthetic Hand Enabling Precision-Lateral Dexterous Manipulation Kazutoshi Tanaka Team 2510.09209 null
2025-10-09 DexNDM: Closing the Reality Gap for Dexterous In-Hand Rotation via Joint-Wise Neural Dynamics Model Li Yi Team 2510.08556 link
2025-10-09 DexMan: Learning Bimanual Dexterous Manipulation from Human and Generated Videos Tsung-Wei Ke Team 2510.08475 link
2025-10-08 AVO: Amortized Value Optimization for Contact Mode Switching in Multi-Finger Manipulation Dmitry Berenson Team 2510.07548 null
2025-10-07 Cross-Embodiment Dexterous Hand Articulation Generation via Morphology-Aware Learning Yan Wu Team 2510.06068 null
2025-10-06 A multi-modal tactile fingertip design for robotic hands to enhance dexterous manipulation Zeynep Temel Team 2510.05382 null
2025-10-01 ISyHand: A Dexterous Multi-finger Robot Hand with an Articulated Palm Katherine J. Kuchenbecker Team 2509.26236 null
2025-09-28 DexFlyWheel: A Scalable and Self-improving Data Generation Framework for Dexterous Manipulation Yuanpei Chen Team 2509.23829 null
2025-09-26 DemoGrasp: Universal Dexterous Grasping from a Single Demonstration Zongqing Lu Team 2509.22149 null
2025-09-25 Residual Off-Policy RL for Finetuning Behavior Cloning Policies Anusha Nagabandi Team 2509.19301 link
2025-09-23 Lang2Morph: Language-Driven Morphological Design of Robotic Hands Josie Hughes Team 2509.18937 null
2025-10-07 Learning Geometry-Aware Nonprehensile Pushing and Pulling with Dexterous Hands Daniel Seita Team 2509.18455 null
2025-09-23 Learning Dexterous Manipulation with Quantized Hand State Cewu Lu Team 2509.17450 null
2025-09-18 A Novel Task-Driven Diffusion-Based Policy with Affordance Learning for Generalizable Manipulation of Articulated Objects Yongduan Song Team 2509.14939 null
2025-09-18 Learning to Pick: A Visuomotor Policy for Clustered Strawberry Picking Chen Peng Team 2509.14530 null
2025-09-17 LeVR: A Modular VR Teleoperation Framework for Imitation Learning in Dexterous Manipulation Han Liu Team 2509.14349 null
2025-09-16 Gen2Real: Towards Demo-Free Dexterous Manipulation by Harnessing Generated Video Rui Huang Team 2509.14178 null
2025-09-17 Whole-body Motion Control of an Omnidirectional Wheel-Legged Mobile Manipulator via Contact-Aware Dynamic Optimization Yiqun Li Team 2509.14010 null
2025-10-03 Beyond Anthropomorphism: Enhancing Grasping and Eliminating a Degree of Freedom by Fusing the Abduction of Digits Four and Five Robert K. Katzschmann Team 2509.13074 null
2025-09-16 MoiréTac: A Dual-Mode Visuotactile Sensor for Multidimensional Perception Using Moiré Pattern Amplification Wenbo Ding Team 2509.12714 null
2025-09-11 Dexplore: Scalable Neural Control for Dexterous Manipulation from Reference-Scoped Exploration Wei Yang Team 2509.09671 null
2025-09-10 Grasp Like Humans: Learning Generalizable Multi-Fingered Grasping from Human Proprioceptive Sensorimotor Integration Huimin Lu Team 2509.08354 null
2025-09-09 Text2Touch: Tactile In-Hand Manipulation with LLM-Designed Reward Functions Nathan F. Lepora Team 2509.07445 null
2025-09-05 OpenEgo: A Large-Scale Multimodal Egocentric Dataset for Dexterous Manipulation Yu Xiang Team 2509.05513 null
2025-09-08 DEXOP: A Device for Robotic Transfer of Dexterous Human Manipulation Pulkit Agrawal Team 2509.04441 link
2025-09-09 EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control Dong Wang Team 2508.21112 null
2025-08-31 HERMES: Human-to-Robot Embodied Learning from Multi-Source Motion Data for Mobile Dexterous Manipulation Huazhe Xu Team 2508.20085 null
2025-08-24 LodeStar: Long-horizon Dexterity via Synthetic Data Augmentation from Human Demonstrations Hao Su Team 2508.17547 null
2025-08-21 Exploiting Policy Idling for Dexterous Manipulation Dushyant Rao Team 2508.15669 null
2025-08-20 GraspQP: Differentiable Optimization of Force Closure for Diverse and Robust Dexterous Grasping Marco Hutter Team 2508.15002 null
2025-08-20 FBI: Learning Dexterous In-hand Manipulation with Dynamic Visuotactile Shortcut Policy Cewu Lu Team 2508.14441 null
2025-08-17 Geodesic Tracing-Based Kinematic Integration of Rolling and Sliding Contact on Manifold Meshes for Dexterous In-Hand Manipulation Nancy S. Pollard Team 2508.12439 null
2025-08-15 Towards Affordance-Aware Robotic Dexterous Grasping with Human-like Priors Hua Zou Team 2508.08896 null
2025-08-22 OmniVTLA: Vision-Tactile-Language-Action Model with Semantic-Aligned Tactile Sensing Hengdi Zhang Team 2508.08706 link
2025-08-11 PCHands: PCA-based Hand Pose Synergy Representation on Manipulators with N-DoF Lorenzo Natale Team 2508.07945 null
2025-08-29 DexFruit: Dexterous Manipulation and Gaussian Splatting Inspection of Fruit Monroe Kennedy III Team 2508.07118 null
2025-08-05 UniFucGrasp: Human-Hand-Inspired Unified Functional Grasp Annotation Strategy and Dataset for Diverse Dexterous Hands Yaonan Wang Team 2508.03339 link
2025-08-03 DexReMoE: In-hand Reorientation of General Object via Mixtures of Experts Yunlong Dong Team 2508.01695 null
2025-08-01 Video Generators are Robot Policies Carl Vondrick Team 2508.00795 null
2025-07-31 XRoboToolkit: A Cross-Platform Framework for Robot Teleoperation Ning Yang Team 2508.00097 link
2025-09-11 villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models Jiang Bian Team 2507.23682 link
2025-07-19 A 21-DOF Humanoid Dexterous Hand with Hybrid SMA-Motor Actuation: CYJ Hand-0 Erbao Dong Team 2507.14538 null
2025-07-18 Improving Low-Cost Teleoperation: Augmenting GELLO with Force Kai Arulkumaran Team 2507.13602 null
2025-07-16 The Developments and Challenges towards Dexterous and Embodied Robotic Manipulation: A Survey Jiming Chen Team 2507.11840 null
2025-07-14 Demonstrating the Octopi-1.5 Visual-Tactile-Language Model Harold Soh Team 2507.09985 null
2025-07-09 Hierarchical Reinforcement Learning for Articulated Tool Manipulation with Multifingered Hand Xinjun Sheng Team 2507.06822 null
2025-07-07 A Careful Examination of Large Behavior Models for Multitask Dexterous Manipulation Russ Tedrake Team 2507.05331 null
2025-07-06 SimLauncher: Launching Sample-Efficient Real-world Robotic Reinforcement Learning via Simulation Pre-training Hao Dong Team 2507.04452 null
2025-07-03 DexVLG: Dexterous Vision-Language-Grasp Model at Scale He Wang Team 2507.02747 null
2025-07-03 TypeTele: Releasing Dexterity in Teleoperation by Dexterous Manipulation Types Wei-Shi Zheng Team 2507.01857 link
2025-07-01 HumanoidGen: Data Generation for Bimanual Dexterous Manipulation via LLM Reasoning Chenjia Bai Team 2507.00833 link
2025-06-26 Lightweight Fingernail Haptic Device: Unobstructed Fingerpad Force and Vibration Feedback for Enhanced Virtual Dexterous Manipulation Shoichi Hasegawa Team 2506.21417 null
2025-06-24 Scaffolding Dexterous Manipulation with Vision-Language Models Dorsa Sadigh Team 2506.19212 null
2025-06-24 The MOTIF Hand: A Robotic Hand for Multimodal Observations with Thermal, Inertial, and Force Sensors Daniel Seita Team 2506.19201 null
2025-06-21 VLA-OS: Structuring and Dissecting Planning Representations and Paradigms in Vision-Language-Action Models Lin Shao Team 2506.17561 null
2025-06-20 Dex1B: Learning with 1B Demonstrations for Dexterous Manipulation Xiaolong Wang Team 2506.17198 link
2025-06-19 ViTacFormer: Learning Cross-Modal Representation for Visuo-Tactile Dexterous Manipulation Jitendra Malik Team 2506.15953 null
2025-06-17 Tactile Beyond Pixels: Multisensory Touch Representations for Robot Manipulation Mustafa Mukadam Team 2506.14754 null
2025-06-16 CEED-VLA: Consistency Vision-Language-Action Model with Early-Exit Decoding Haoang Li Team 2506.13725 null
2025-06-13 ViTaSCOPE: Visuo-tactile Implicit Representation for In-hand Pose and Extrinsic Contact Estimation Nima Fazeli Team 2506.12239 null
2025-06-13 ExoStart: Efficient learning for dexterous manipulation with sensorized exoskeleton demonstrations Maria Bauza Villalonga Team 2506.11775 null
2025-06-30 Adaptive event-triggered robust tracking control of soft robots Marios M. Polycarpou Team 2506.09523 null
2025-06-11 Analyzing Key Objectives in Human-to-Robot Retargeting for Dexterous Manipulation Xiang Li Team 2506.09384 null
2025-06-09 TensorTouch: Calibration of Tactile Sensors for High Resolution Stress Tensor and Deformation for Dexterous Manipulation Monroe Kennedy III Team 2506.08291 null
2025-06-09 RAPID Hand: A Robust, Affordable, Perception-Integrated, Dexterous Manipulation Platform for Generalist Robot Autonomy Hui Cheng Team 2506.07490 null
2025-06-05 GEX: Democratizing Dexterity with Fully-Actuated Dexterous Hand and Exoskeleton Glove Zelin Deng Team 2506.04982 link
2025-06-06 ArtVIP: Articulated Digital Assets of Visual Realism, Modular Interaction, and Physical Fidelity for Robot Learning Jian Tang Team 2506.04941 null
2025-06-03 Reachability Weighted Offline Goal-conditioned Resampling Joni Pajarinen Team 2506.02577 null
2025-05-30 Interactive Imitation Learning for Dexterous Robotic Manipulation: Challenges and Perspectives – A Survey Rania Rayyes Team 2506.00098 null
2025-05-30 DexMachina: Functional Retargeting for Bimanual Dexterous Manipulation Shuran Song Team 2505.24853 null
2025-05-28 ForceVLA: Enhancing VLA Models with a Force-aware MoE for Contact-rich Manipulation Wenqiang Zhang Team 2505.22159 null
2025-10-03 DexUMI: Using Human Hand as the Universal Manipulation Interface for Dexterous Manipulation Shuran Song Team 2505.21864 null
2025-05-27 Learning Generalizable Robot Policy with Human Demonstration Video as a Prompt Jianyu Chen Team 2505.20795 null
2025-05-25 MaskedManipulator: Versatile Whole-Body Control for Loco-Manipulation Xue Bin Peng Team 2505.19086 null
2025-05-24 Beyond Domain Randomization: Event-Inspired Perception for Visually Robust Adversarial Imitation from Videos Mario Bijelic Team 2505.18899 link
2025-05-24 DiffusionRL: Efficient Training of Diffusion Policies for Robotic Grasping Using RL-Adapted Large-Scale Datasets Dzmitry Tsetserukou Team 2505.18876 null
2025-05-27 GenPO: Generative Diffusion Models Meet On-Policy Reinforcement Learning Ye Shi Team 2505.18763 null
2025-05-22 TacCompress: A Benchmark for Multi-Point Tactile Data Compression in Dexterous Manipulation Hengdi Zhang Team 2505.16289 null
2025-05-21 Object-Focus Actor for Data-efficient Robot Generalization Dexterous Manipulation Xiaodong He Team 2505.15098 null
2025-05-20 Adaptive Visuo-Tactile Fusion with Predictive Force Attention for Dexterous Manipulation Hao Dong Team 2505.13982 null
2025-05-19 Approximating Global Contact-Implicit MPC via Sampling and Local Complementarity Michael Posa Team 2505.13350 null
2025-05-19 TeleOpBench: A Simulator-Centric Benchmark for Dual-Arm Dexterous Teleoperation Jiangmiao Pang Team 2505.12748 null
2025-05-18 PartDexTOG: Generating Dexterous Task-Oriented Grasping via Language-driven Part Analysis Zhipong Cai Team 2505.12294 null
2025-05-17 OneTwoVLA: A Unified Vision-Language-Action Model with Adaptive Reasoning Yang Gao Team 2505.11917 null
2025-05-16 EgoDex: Learning Dexterous Manipulation from Large-Scale Egocentric Video Jian Zhang Team 2505.11709 null
2025-05-16 Self-supervised perception for tactile skin covered dexterous hands Mustafa Mukadam Team 2505.11420 null
2025-05-16 Learning Multimodal AI Algorithms for Amplifying Limited User Input into High-dimensional Control Space Reza Abiri Team 2505.11366 link
2025-05-16 Estimating Deformable-Rigid Contact Interactions for a Deformable Tool via Learning and Model-Based Optimization Nima Fazeli Team 2505.10884 null
2025-05-15 SRT-H: A Hierarchical Framework for Autonomous Surgery via Language Conditioned Imitation Learning Axel Krieger Team 2505.10251 null
2025-05-13 HandCept: A Visual-Inertial Fusion Framework for Accurate Proprioception in Dexterous Hands Yunhui Liu Team 2505.08213 null
2025-05-12 DexWild: Dexterous Human Interactions for In-the-Wild Robot Policies Deepak Pathak Team 2505.07813 null
2025-05-08 Morphologically Symmetric Reinforcement Learning for Ambidextrous Bimanual Manipulation Georgia Chalvatzaki Team 2505.05287 null
2025-05-04 Prompt-responsive Object Retrieval with Memory-augmented Student-Teacher Learning Sven Behnke Team 2505.02232 null
2025-05-04 KineDex: Learning Tactile-Informed Visuomotor Policies via Kinesthetic Teaching for Dexterous Manipulation Yang Gao Team 2505.01974 null
2025-05-02 DexFlow: A Unified Approach for Dexterous Hand Pose Retargeting and Interaction Miao Li Team 2505.01083 null
2025-05-02 DexCtrl: Towards Sim-to-Real Dexterity with Adaptive Controller Learning Masayoshi Tomizuka Team 2505.00991 null
2025-05-01 Multi-Goal Dexterous Hand Manipulation using Probabilistic Model-based Reinforcement Learning Yunduan Cui Team 2504.21585 null
2025-04-27 PolyTouch: A Robust Multi-Modal Tactile Sensor for Contact-rich Manipulation Using Tactile-Diffusion Policies Edward Adelson Team 2504.19341 null
2025-04-23 PP-Tac: Paper Picking Using Tactile Feedback in Dexterous Robotic Hands Ziyuan Jiao Team 2504.16649 null
2025-04-22 $π_{0.5}$: a Vision-Language-Action Model with Open-World Generalization Ury Zhilinsky Team 2504.16054 null
2025-04-21 LAPP: Large Language Model Feedback for Preference-Driven Reinforcement Learning Boyuan Chen Team 2504.15472 null
2025-04-21 SuFIA-BC: Generating High Quality Demonstration Data for Visuomotor Policy Learning in Surgical Subtasks Animesh Garg Team 2504.14857 null
2025-04-20 BiDexHand: Design and Evaluation of an Open-Source 16-DoF Biomimetic Dexterous Hand Zhengyang Kris Weng Team 2504.14712 null
2025-04-18 On the Importance of Tactile Sensing for Imitation Learning: A Case Study on Robotic Match Lighting Jan Peters Team 2504.13618 null
2025-04-17 RUKA: Rethinking the Design of Humanoid Hands with Learning Lerrel Pinto Team 2504.13165 null
2025-04-17 Adaptive Task Space Non-Singular Terminal Super-Twisting Sliding Mode Control of a 7-DOF Robotic Manipulator E. Witrant Team 2504.13056 null
2025-04-17 Krysalis Hand: A Lightweight, High-Payload, 18-DoF Anthropomorphic End-Effector for Robotic Learning and Dexterous Manipulation Iman Soltani Team 2504.12967 null
2025-04-22 Crossing the Human-Robot Embodiment Gap with Sim-to-Real RL using One Human Demonstration Jeannette Bohg Team 2504.12609 null
2025-04-14 Look-to-Touch: A Vision-Enhanced Proximity and Tactile Sensor for Distance and Geometry Perception in Robotic Manipulation Guoying Gu Team 2504.10280 null
2025-04-08 Functionally graded keratin facilitates tactile sensing in elephant whiskers Katherine J. Kuchenbecker Team 2504.07143 null
2025-04-08 ViTaMIn: Learning Contact-Rich Tasks Through Robot-Free Visuo-Tactile Manipulation Interface Rui Chen Team 2504.06156 null
2025-04-06 DexTOG: Learning Task-Oriented Dexterous Grasp with Language Cewu Lu Team 2504.04573 null
2025-04-06 DexSinGrasp: Learning a Unified Policy for Dexterous Object Singulation and Grasping in Cluttered Environments Lin Shao Team 2504.04516 null
2025-04-05 ORCA: An Open-Source, Reliable, Cost-Effective, Anthropomorphic Robotic Hand for Uninterrupted Dexterous Task Learning Robert K. Katzschmann Team 2504.04259 null
2025-09-11 Dexterous Manipulation through Imitation Learning: A Survey Hong Zhang Team 2504.03515 null
2025-03-29 Dexterous Non-Prehensile Manipulation for Ungraspable Object via Extrinsic Dexterity Yuanpei Chen Team 2503.23120 null
2025-03-27 ManipTrans: Efficient Dexterous Bimanual Manipulation Transfer via Residual Learning Siyuan Huang Team 2503.21860 null
2025-03-25 G-DexGrasp: Generalizable Dexterous Grasping Synthesis Via Part-Aware Prior Retrieval and Prior-Assisted Generation Ruizhen Hu Team 2503.19457 null
2025-03-16 Being-0: A Humanoid Robotic Agent with Vision-Language Models and Modular Skills Zongqing Lu Team 2503.12533 null
2025-03-14 Is Your Imitation Learning Policy Better than Mine? Policy Comparison with Near-Optimal Stopping Haruki Nishimura Team 2503.10966 null
2025-03-12 Sequential Multi-Object Grasping with One Dexterous Hand Daniel Seita Team 2503.09078 null
2025-03-16 DexGrasp Anything: Towards Universal Robotic Dexterous Grasping with Physics Awareness Yuexin Ma Team 2503.08257 link
2025-03-13 AgiBot World Colosseo: A Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems Jianchao Zhu Team 2503.06669 link
2025-03-08 ReJSHand: Efficient Real-Time Hand Pose Estimation and Mesh Reconstruction Using Refined Joint and Skeleton Features Hong Zhang Team 2503.05995 link
2025-03-07 Kaiwu: A Multimodal Manipulation Dataset and Framework for Robot Learning and Human-Robot Interaction Bin He Team 2503.05231 null
2025-03-06 Dexterous Hand Manipulation via Efficient Imitation-Bootstrapped Online Reinforcement Learning Xiaodong He Team 2503.04014 null
2025-03-05 LensDFF: Language-enhanced Sparse Feature Distillation for Efficient Few-Shot Dexterous Manipulation Alois Knoll Team 2503.03890 null
2025-03-05 Selective Tweezing and Immobilization of Colloids for Dexterous Manipulation of Biological Materials Kimani C. Toussaint Jr Team 2503.03102 null
2025-03-03 TacCap: A Wearable FBG-Based Tactile Sensor for Seamless Human-to-Robot Skill Transfer Mark R. Cutkosky Team 2503.01789 null
2025-03-03 RoboDexVLM: Visual Language Model-Enabled Task Planning and Motion Control for Dexterous Robot Manipulation Jun Ma Team 2503.01616 null
2025-03-03 Exo-ViHa: A Cross-Platform Exoskeleton System with Visual and Haptic Feedback for Efficient Dexterous Skill Learning Wenbo Ding Team 2503.01543 null
2025-03-03 KineSoft: Learning Proprioceptive Manipulation Policies with Soft Robot Hands Jeffrey Ichnowski Team 2503.01078 null
2025-02-27 Sim-to-Real Reinforcement Learning for Vision-Based Dexterous Manipulation on Humanoids Yuke Zhu Team 2502.20396 null
2025-02-28 ObjectVLA: End-to-End Open-World Object Manipulation Without Demonstration Feifei Feng Team 2502.19250 null
2025-02-26 Retrieval Dexterity: Efficient Object Retrieval in Clutters with Dexterous Hand Yuanpei Chen Team 2502.18423 null
2025-02-07 Dexterous Cable Manipulation: Taxonomy, Multi-Fingered Hand Design, and Long-Horizon Manipulation Robert B. Fisher Team 2502.00396 link
2024-12-23 Dexterous Manipulation Based on Prior Dexterous Grasp Pose Knowledge Cewu Lu Team 2412.15587 null
2024-08-22 Tilde: Teleoperation for Dexterous In-Hand Manipulation Learning with a DeltaHand Oliver Kroemer Team 2405.18804 null
2023-12-13 DEFT: Dexterous Fine-Tuning for Real-World Hand Policies Deepak Pathak Team 2310.19797 link
2023-12-27 DELTAHANDS: A Synergistic Dexterous Hand Framework Based on Delta Robots F. Zeynep Temel Team 2310.05266 null
2023-10-17 Sequential Dexterity: Chaining Dexterous Policies for Long-Horizon Manipulation C. Karen Liu Team 2309.00987 null
2023-08-23 Dexterous Soft Hands Linearize Feedback-Control for In-Hand Manipulation Oliver Brock Team 2308.10691 null
2023-04-20 Progressive Transfer Learning for Dexterous In-Hand Manipulation with Multi-Fingered Anthropomorphic Hand Jia Sun Team 2304.09526 null
2022-03-25 Dexterous Imitation Made Easy: A Learning-Based Framework for Efficient Dexterous Manipulation Lerrel Pinto Team 2203.13251 null
2022-05-11 RBO Hand 3 – A Platform for Soft Dexterous Manipulation Oliver Brock Team 2201.10883 null
2019-01-23 Learning Dexterous In-Hand Manipulation Wojciech Zaremba Team 1808.00177 null
2018-06-27 Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations Sergey Levine Team 1709.10087 link
2017-03-21 Learning Dexterous Manipulation for a Soft Robotic Hand from Human Demonstration Pieter Abbeel Team 1603.06348 null