xiangan

https://anxiangsir.github.io/

anxiangsir

AI & ML interests

None yet

Recent Activity

new activity 2 days ago

lmms-lab-encoder/onevision-encoder-large-lang:Add metadata and link to paper/code

authored a paper 3 days ago

4DThinker: Thinking with 4D Imagery for Dynamic Spatial Understanding

authored a paper 3 days ago

LLaVA-OneVision-2: Towards Next-Generation Perceptual Intelligence

upvoted a paper 4 days ago

From Pixels to Words -- Towards Native One-Vision Models at Scale

Paper • 2605.28820 • Published 5 days ago • 68

upvoted a collection 5 days ago

LLaVA-OneVision-2

2 items • Updated 5 days ago • 2

upvoted a paper 5 days ago

LLaVA-OneVision-2: Towards Next-Generation Perceptual Intelligence

Paper • 2605.25979 • Published 7 days ago • 25

upvoted a paper 6 days ago

ParaVT: Taming the Tool Prior Paradox for Parallel Tool Use in Agentic Video Reinforcement Learning

Paper • 2605.20342 • Published 13 days ago • 34

upvoted a collection 25 days ago

LLaVA-OneVision-2

2 items • Updated 12 days ago • 6

upvoted a paper about 2 months ago

FileGram: Grounding Agent Personalization in File-System Behavioral Traces

Paper • 2604.04901 • Published Apr 6 • 40

upvoted 2 papers 3 months ago

LLaDA-o: An Effective and Length-Adaptive Omni Diffusion Model

Paper • 2603.01068 • Published Mar 1 • 22

From Pixels to Words -- Towards Native Vision-Language Primitives at Scale

Paper • 2510.14979 • Published Oct 16, 2025 • 70

upvoted an article 3 months ago

NEO-unify: Building Native Multimodal Unified Models End to End

Article • sensenova • Mar 5 • 163

upvoted a paper 3 months ago

UniG2U-Bench: Do Unified Models Advance Multimodal Understanding?

Paper • 2603.03241 • Published Mar 3 • 87

upvoted a changelog 3 months ago

Public Storage Add-ons

Hugging Face Changelog • Feb 26 • 168

upvoted a collection 3 months ago

onevision-encoder

4 items • Updated 25 days ago • 6

upvoted 3 papers 3 months ago

UniT: Unified Multimodal Chain-of-Thought Test-time Scaling

Paper • 2602.12279 • Published Feb 12 • 20

CoPE-VideoLM: Codec Primitives For Efficient Video Language Models

Paper • 2602.13191 • Published Feb 13 • 32

OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence

Paper • 2602.08683 • Published Feb 9 • 52

upvoted 2 papers 4 months ago

GigaBrain-0.5M*: a VLA That Learns From World Model-Based Reinforcement Learning

Paper • 2602.12099 • Published Feb 12 • 62

Innovator-VL: A Multimodal Large Language Model for Scientific Discovery

Paper • 2601.19325 • Published Jan 27 • 82

upvoted 2 papers 5 months ago

Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding

Paper • 2601.10611 • Published Jan 15 • 35

DanQing: An Up-to-Date Large-Scale Chinese Vision-Language Pre-training Dataset

Paper • 2601.10305 • Published Jan 15 • 37

upvoted a collection 5 months ago

OneVision-Encoder

HEVC-Style Vision Transformer • 2 items • Updated Feb 10 • 3