-
A Survey on Vision-Language-Action Models: An Action Tokenization Perspective
Paper • 2507.01925 • Published • 39 -
Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning
Paper • 2507.16746 • Published • 35 -
MolmoAct: Action Reasoning Models that can Reason in Space
Paper • 2508.07917 • Published • 45 -
Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies
Paper • 2508.20072 • Published • 32
Collections
Discover the best community collections!
Collections including paper arxiv:2603.16932
-
Beyond Language Modeling: An Exploration of Multimodal Pretraining
Paper • 2603.03276 • Published • 105 -
Qwen3-Coder-Next Technical Report
Paper • 2603.00729 • Published • 65 -
Learning When to Act or Refuse: Guarding Agentic Reasoning Models for Safe Multi-Step Tool Use
Paper • 2603.03205 • Published • 13 -
AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios
Paper • 2602.23166 • Published • 45
-
Specialized Language Models with Cheap Inference from Limited Domain Data
Paper • 2402.01093 • Published • 47 -
Attention Heads of Large Language Models: A Survey
Paper • 2409.03752 • Published • 92 -
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Paper • 2409.01704 • Published • 83 -
jina-embeddings-v3: Multilingual Embeddings With Task LoRA
Paper • 2409.10173 • Published • 36
-
dLLM: Simple Diffusion Language Modeling
Paper • 2602.22661 • Published • 153 -
OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data
Paper • 2603.15594 • Published • 149 -
Qianfan-OCR: A Unified End-to-End Model for Document Intelligence
Paper • 2603.13398 • Published • 155 -
Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders
Paper • 2603.06569 • Published • 120
-
microsoft/bitnet-b1.58-2B-4T
Text Generation • 0.8B • Updated • 14.9k • 1.45k -
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
Paper • 2504.10449 • Published • 15 -
nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct
8B • Updated • 109 • 18 -
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
Paper • 2504.11536 • Published • 63
-
A Survey on Vision-Language-Action Models: An Action Tokenization Perspective
Paper • 2507.01925 • Published • 39 -
Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning
Paper • 2507.16746 • Published • 35 -
MolmoAct: Action Reasoning Models that can Reason in Space
Paper • 2508.07917 • Published • 45 -
Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies
Paper • 2508.20072 • Published • 32
-
dLLM: Simple Diffusion Language Modeling
Paper • 2602.22661 • Published • 153 -
OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data
Paper • 2603.15594 • Published • 149 -
Qianfan-OCR: A Unified End-to-End Model for Document Intelligence
Paper • 2603.13398 • Published • 155 -
Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders
Paper • 2603.06569 • Published • 120
-
Beyond Language Modeling: An Exploration of Multimodal Pretraining
Paper • 2603.03276 • Published • 105 -
Qwen3-Coder-Next Technical Report
Paper • 2603.00729 • Published • 65 -
Learning When to Act or Refuse: Guarding Agentic Reasoning Models for Safe Multi-Step Tool Use
Paper • 2603.03205 • Published • 13 -
AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios
Paper • 2602.23166 • Published • 45
-
microsoft/bitnet-b1.58-2B-4T
Text Generation • 0.8B • Updated • 14.9k • 1.45k -
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
Paper • 2504.10449 • Published • 15 -
nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct
8B • Updated • 109 • 18 -
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
Paper • 2504.11536 • Published • 63
-
Specialized Language Models with Cheap Inference from Limited Domain Data
Paper • 2402.01093 • Published • 47 -
Attention Heads of Large Language Models: A Survey
Paper • 2409.03752 • Published • 92 -
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Paper • 2409.01704 • Published • 83 -
jina-embeddings-v3: Multilingual Embeddings With Task LoRA
Paper • 2409.10173 • Published • 36