Qwen3-4B CoT Compression — Level 4 (Ultra-compact)

LoRA adapter for Qwen/Qwen3-4B-Instruct-2507 that produces Level-4 reasoning: roughly 45 characters, written as a short chain of variables linked by arrows.

Part of a 5-level Pareto study on chain-of-thought compression for math reasoning. See the collection and the eval artifacts at ssurface/qwen3-4b-cot-compress-eval.

GSM8K-test results (single seed = 42)

metric	value
accuracy	0.579985
n_correct / n_total	765 / 1319
mean think tokens	39.05
median think tokens	35.00

Baseline (stock Qwen/Qwen3-4B-Instruct-2507) is ~0.896 accuracy at ~294 completion tokens.
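
To make the trade-off concrete, the sketch below recomputes the accuracy from the reported counts and compares it with the baseline figures quoted above. The variable names are illustrative. The baseline numbers are the approximate values from this card, and the comparison assumes that "think tokens" and "completion tokens" are close enough to compare directly, which this card does not confirm.

```python
# Figures copied from the results above.
n_correct, n_total = 765, 1319
l4_think_tokens = 39.05      # mean think tokens, Level 4
baseline_accuracy = 0.896    # approximate, stock model
baseline_tokens = 294        # approximate completion tokens, stock model

accuracy = n_correct / n_total
accuracy_drop = baseline_accuracy - accuracy
# Assumption: think tokens and completion tokens are comparable quantities.
token_ratio = baseline_tokens / l4_think_tokens

print(f"accuracy        {accuracy:.6f}")
print(f"accuracy drop   {accuracy_drop:.3f}")
print(f"token reduction {token_ratio:.1f}x")
```

Under these assumptions, Level 4 uses about 7.5x fewer tokens than the baseline at a cost of about 0.32 in accuracy.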

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the base model in fp16, then attach the Level-4 LoRA adapter.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B-Instruct-2507", torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "ssurface/qwen3-4b-cot-compress-l4")
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B-Instruct-2507")

# Ask for the Level-4 style explicitly in the user turn.
prompt = tok.apply_chat_template(
    [{"role": "user",
      "content": "Solve this using Level 4 (Ultra-compact).\n"
                 "Problem: Natalia sold clips to 48 friends in April, "
                 "then half as many in May. How many in total?"}],
    tokenize=False, add_generation_prompt=True,
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
# Stop generation at the chat end-of-turn token.
out = model.generate(
    **inputs, max_new_tokens=256,
    eos_token_id=tok.convert_tokens_to_ids("<|im_end|>"),
)
# Prints the prompt followed by the completion, special tokens included.
print(tok.decode(out[0], skip_special_tokens=False))
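
The decoded string contains the compressed reasoning as well as the answer, so scoring needs a way to pull out the final number. The helper below is a minimal sketch of one common heuristic, taking the last number in the text. It is not necessarily the extraction rule used for the results above, and the sample completion is written by hand for illustration, not taken from the model.

```python
import re
from typing import Optional

# Matches integers and decimals, allowing thousands separators such as 1,234.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_last_number(text: str) -> Optional[float]:
    """Return the last number found in `text`, or None if there is none."""
    matches = _NUMBER.findall(text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


# Hypothetical completion, written by hand; real model output may differ.
sample = "a=48→b=a/2=24→a+b=72<|im_end|>"
print(extract_last_number(sample))
```

Pass only the generated part of the decoded text to the helper, otherwise numbers from the prompt can be picked up when the completion contains none.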

Training

Base: Qwen/Qwen3-4B-Instruct-2507, 4-bit NF4 via bitsandbytes
Adapter: LoRA (PEFT) via TRL SFTTrainer
Hardware: 4× RTX 2080 Ti, fp16 (Turing — no bf16, no FlashAttention 2)
Data: GSM8K train, filtered to rows whose teacher-generated Level-4 reasoning matches the GSM8K ground-truth answer.
Loss: full-sequence, so prompt tokens also contribute to the loss (no completion-only masking in this TRL version).
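
The data filter described above can be sketched roughly as follows. GSM8K reference solutions end with a line of the form "#### <answer>", which gives the ground truth. The column name `teacher_l4`, the rule that the teacher's answer is the last number it wrote, and the two toy rows are all assumptions made for illustration, not details taken from the actual pipeline.

```python
import re
from typing import Optional

_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def gold_answer(gsm8k_answer: str) -> Optional[float]:
    """Parse the final answer that follows '####' in a GSM8K solution."""
    if "####" not in gsm8k_answer:
        return None
    tail = gsm8k_answer.split("####")[-1]
    match = _NUMBER.search(tail)
    return float(match.group().replace(",", "")) if match else None


def teacher_answer(reasoning: str) -> Optional[float]:
    """Assumption: the teacher's final answer is the last number it wrote."""
    found = _NUMBER.findall(reasoning)
    return float(found[-1].replace(",", "")) if found else None


def keep_row(row: dict) -> bool:
    """Keep a row only when the teacher's answer equals the ground truth."""
    gold = gold_answer(row["answer"])
    pred = teacher_answer(row["teacher_l4"])  # hypothetical column name
    return gold is not None and pred is not None and abs(gold - pred) < 1e-6


# Toy rows written by hand: the first agrees with the ground truth, the second does not.
rows = [
    {"answer": "48/2 = 24 clips in May.\n48+24 = 72\n#### 72",
     "teacher_l4": "a=48→b=24→72"},
    {"answer": "3+4+3 = 10\n#### 10",
     "teacher_l4": "x=3→y=4→12"},
]
kept = [r for r in rows if keep_row(r)]
print(len(kept), "of", len(rows), "rows kept")
```

Filtering this way removes teacher traces that reach a wrong answer, so the adapter is trained only on compressed reasoning that ends at the reference result.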

Caveats

Single seed (42). Error bars not yet computed.
Levels 4 and 5 trade accuracy for token count; Level 2 is the accuracy-preserving sweet spot. If you don't need extreme compression, prefer Level 1 or Level 2.
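
Until multi-seed error bars are available, a rough sense of the test-set sampling noise can be had from the reported counts alone. The sketch below assumes each GSM8K-test problem is an independent Bernoulli trial and uses a normal approximation. It says nothing about seed-to-seed variance, so it does not replace the missing error bars.

```python
import math

# Counts copied from the GSM8K-test results for this adapter.
n_correct, n_total = 765, 1319
p = n_correct / n_total

# Standard error of a proportion under a binomial model.
se = math.sqrt(p * (1 - p) / n_total)

# Normal-approximation 95% interval (z = 1.96).
z = 1.96
lo, hi = p - z * se, p + z * se
print(f"accuracy {p:.3f}, 95% interval [{lo:.3f}, {hi:.3f}]")
```

With 1319 test problems the interval is roughly plus or minus 2.7 accuracy points, far smaller than the gap to the baseline.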

Citation

Paper draft in progress.
