Dataset Viewer
Duplicate
The dataset viewer is not available for this split.
Cannot load the dataset split (in streaming mode) to extract the first rows.
Error code:   StreamingRowsError
Exception:    ValueError
Message:      Invalid string class label OddGridBench@9387cde0416686d6e9ed91633844f948846d07fa
Traceback:    Traceback (most recent call last):
                File "/src/services/worker/src/worker/utils.py", line 99, in get_rows_or_raise
                  return get_rows(
                         ^^^^^^^^^
                File "/src/libs/libcommon/src/libcommon/utils.py", line 272, in decorator
                  return func(*args, **kwargs)
                         ^^^^^^^^^^^^^^^^^^^^^
                File "/src/services/worker/src/worker/utils.py", line 77, in get_rows
                  rows_plus_one = list(itertools.islice(ds, rows_max_number + 1))
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                File "/usr/local/lib/python3.12/site-packages/datasets/iterable_dataset.py", line 2690, in __iter__
                  for key, example in ex_iterable:
                                      ^^^^^^^^^^^
                File "/usr/local/lib/python3.12/site-packages/datasets/iterable_dataset.py", line 2240, in __iter__
                  example = _apply_feature_types_on_example(
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                File "/usr/local/lib/python3.12/site-packages/datasets/iterable_dataset.py", line 2157, in _apply_feature_types_on_example
                  encoded_example = features.encode_example(example)
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                File "/usr/local/lib/python3.12/site-packages/datasets/features/features.py", line 2152, in encode_example
                  return encode_nested_example(self, example)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                File "/usr/local/lib/python3.12/site-packages/datasets/features/features.py", line 1437, in encode_nested_example
                  {k: encode_nested_example(schema[k], obj.get(k), level=level + 1) for k in schema}
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                File "/usr/local/lib/python3.12/site-packages/datasets/features/features.py", line 1460, in encode_nested_example
                  return schema.encode_example(obj) if obj is not None else None
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^
                File "/usr/local/lib/python3.12/site-packages/datasets/features/features.py", line 1143, in encode_example
                  example_data = self.str2int(example_data)
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
                File "/usr/local/lib/python3.12/site-packages/datasets/features/features.py", line 1080, in str2int
                  output = [self._strval2int(value) for value in values]
                            ^^^^^^^^^^^^^^^^^^^^^^^
                File "/usr/local/lib/python3.12/site-packages/datasets/features/features.py", line 1101, in _strval2int
                  raise ValueError(f"Invalid string class label {value}")
              ValueError: Invalid string class label OddGridBench@9387cde0416686d6e9ed91633844f948846d07fa

Need help to make the dataset viewer work? Make sure to review how to configure the dataset viewer, and open a discussion for direct support.

YAML Metadata Warning:empty or missing yaml metadata in repo card

Check out the documentation for more information.

OddGridBench: Exposing the Lack of Fine-Grained Visual Discrepancy Sensitivity in Multimodal Large Language Models

🚀 New! Our paper "OddGridBench: Exposing the Lack of Fine-Grained Visual Discrepancy Sensitivity in Multimodal Large Language Models" has been accepted to CVPR 2026! 🎉
This repository contains the official evaluation code and data for our work.

📚 Read the paper: arXiv PDF | arXiv Page

🌐 Project Homepage

📊 HuggingFace Dataset

🤖 HuggingFace Model

Introduction

MLLMs have achieved remarkable performance across a wide range of vision-language tasks. However, their ability in low-level visual perception, particularly in detecting fine-grained visual discrepancies, remains underexplored and lacks systematic analysis.In this work, we introduce OddGridBench, a controllable benchmark for evaluating the visual discrepancy sensitivity of MLLMs. OddGridBench comprises over 1,400 grid-based images, where a single element differs from all others by one or multiple visual attributes such as color, size, rotation, or position. Experiments reveal that all evaluated MLLMs—including open-source families such as Qwen3-VL and InternVL3.5, and proprietary systems like Gemini-2.5-Pro and GPT-5—perform far below human levels in visual discrepancy detection.We further propose OddGrid-GRPO, a reinforcement learning framework that integrates curriculum learning and distance-aware reward. By progressively controlling the difficulty of training samples and incorporating spatial proximity constraints into the reward design, OddGrid-GRPO significantly enhances the model’s fine-grained visual discrimination ability. We hope OddGridBench and OddGrid-GRPO will lay the groundwork for advancing perceptual grounding and visual discrepancy sensitivity in multimodal intelligence.

Dataset Creation

OddGridBench is designed to systematically evaluate the visual discrepancy sensitivity of multimodal large language models in a controlled and interpretable setting. The benchmark consists of over 1,400 grid-based images, where most elements follow a shared visual pattern and only one element deviates from the others. The discrepancy can arise from one or multiple visual attributes, including color, size, rotation, and position, enabling fine-grained assessment of perceptual sensitivity under varying difficulty levels. Please refer to our huggingface 🤗 Dataset for more details.

Load Dataset

from datasets import load_dataset

# Login using e.g. `huggingface-cli login` to access this dataset
ds = load_dataset("wwwtttjjj/OddGridBench")

Load Model

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "wwwtttjjj/OddGrid-GRPO"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

Citation

BibTeX:

      @inproceedings{weng2026oddgridbench,
      title={OddGridBench: Exposing the Lack of Fine-Grained Visual Discrepancy Sensitivity in Multimodal Large Language Models},
      author={Weng, Tengjin and Jiang, Wenhao and Wang, Jingyi and Li, Ming and Ma, Lin and Ming, Zhong},
      booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
      year={2026}
      }
Downloads last month
32

Paper for wwwtttjjj/OddGridBench