Qwen3-4B CoT Compression — Level 4 (Ultra-compact)

LoRA adapter for Qwen/Qwen3-4B-Instruct-2507 that produces Level-4 reasoning: a short chain of variable assignments joined by arrows, about 45 characters long.

Part of a 5-level Pareto study on chain-of-thought compression for math reasoning. See the collection and the eval artifacts at ssurface/qwen3-4b-cot-compress-eval.

GSM8K-test results (single seed = 42)

metric                 value
accuracy               0.579985
n_correct / n_total    765 / 1319
mean think tokens      39.05
median think tokens    35.00

Baseline (stock Qwen/Qwen3-4B-Instruct-2507) is ~0.896 accuracy at ~294 completion tokens.
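The trade-off implied by these numbers can be checked with a few lines of arithmetic. This is only a sketch using the figures quoted above; the variable names are illustrative, the baseline values are the approximate ones from this card, and it assumes the adapter's "think tokens" and the baseline's "completion tokens" are comparable measures, which the card does not confirm.

```python
# Figures copied from the results above; baseline values are approximate.
n_correct, n_total = 765, 1319
adapter_accuracy = n_correct / n_total
baseline_accuracy = 0.896   # stock Qwen/Qwen3-4B-Instruct-2507 (approximate)
adapter_tokens = 39.05      # mean think tokens for this adapter
baseline_tokens = 294       # baseline completion tokens (approximate)

# Absolute accuracy lost and the factor by which token count shrinks.
accuracy_drop = baseline_accuracy - adapter_accuracy
token_ratio = baseline_tokens / adapter_tokens

print(f"accuracy: {adapter_accuracy:.6f}")
print(f"accuracy drop vs. baseline: {accuracy_drop:.3f}")
print(f"token reduction: {token_ratio:.1f}x")
```

Under that assumption, Level 4 gives up roughly 32 points of accuracy for about a 7.5x reduction in tokens.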

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the base model in fp16, then attach the Level-4 LoRA adapter.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B-Instruct-2507", torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "ssurface/qwen3-4b-cot-compress-l4")
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B-Instruct-2507")

# The instruction line selects the compression level.
prompt = tok.apply_chat_template(
    [{"role": "user",
       "content": "Solve this using Level 4 (Ultra-compact).\n"
                  "Problem: Natalia sold clips to 48 friends in April, "
                  "then half as many in May. How many in total?"}],
    tokenize=False, add_generation_prompt=True,
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
# Stop at the end of the assistant turn.
out = model.generate(
    **inputs, max_new_tokens=256,
    eos_token_id=tok.convert_tokens_to_ids("<|im_end|>"),
)
# out[0] holds prompt + completion; special tokens are kept in the decoded text.
print(tok.decode(out[0], skip_special_tokens=False))
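Because the snippet above decodes the whole sequence with skip_special_tokens=False, the printed text contains the prompt and the ChatML turn markers as well as the model's answer. A minimal sketch for pulling out just the assistant turn follows. The helper name extract_assistant_reply is hypothetical, and the sample reply in the usage lines is made up for illustration, not real model output.

```python
def extract_assistant_reply(decoded: str) -> str:
    """Return the text of the last assistant turn in a ChatML-formatted string."""
    start_marker = "<|im_start|>assistant\n"
    end_marker = "<|im_end|>"

    # Find the last assistant turn; fall back to the whole string if absent.
    start = decoded.rfind(start_marker)
    if start == -1:
        return decoded.strip()

    reply = decoded[start + len(start_marker):]
    # Cut at the end-of-turn marker when generation stopped cleanly.
    end = reply.find(end_marker)
    if end != -1:
        reply = reply[:end]
    return reply.strip()


# Illustrative input only: the reply text here is invented.
sample = (
    "<|im_start|>user\nSolve this using Level 4 (Ultra-compact).<|im_end|>\n"
    "<|im_start|>assistant\nA=48→M=24→T=72<|im_end|>"
)
print(extract_assistant_reply(sample))
```

An alternative is to slice the generated ids at the prompt length before decoding, which avoids string parsing altogether.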

Training

  • Base: Qwen/Qwen3-4B-Instruct-2507, 4-bit NF4 via bitsandbytes
  • Adapter: LoRA (PEFT) via TRL SFTTrainer
  • Hardware: 4× RTX 2080 Ti, fp16 (Turing architecture, which supports neither bf16 nor FlashAttention 2)
  • Data: GSM8K train, filtered to rows whose teacher-generated Level-4 reasoning matches the GSM8K ground-truth answer.
  • Loss: computed over the full sequence, prompt tokens included (no completion-only masking in this TRL version).
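The data filter described in the list above could look roughly like the sketch below. It is not the author's pipeline: the function names are hypothetical, and it assumes the teacher's final answer is the last number appearing in its Level-4 reasoning. The only grounded detail is that GSM8K reference solutions end with a line of the form "#### <answer>".

```python
import re
from typing import Optional

# Matches integers and decimals, with optional sign and thousands separators.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def _normalize(raw: str) -> Optional[str]:
    """Strip formatting so that '1,200' and '1200.0' compare equal."""
    cleaned = raw.strip().replace(",", "").replace("$", "")
    try:
        value = float(cleaned)
    except ValueError:
        return None
    return str(int(value)) if value.is_integer() else str(value)


def gsm8k_gold(answer_field: str) -> Optional[str]:
    """Return the ground-truth answer that follows '####' in a GSM8K solution."""
    if "####" not in answer_field:
        return None
    return _normalize(answer_field.split("####")[-1])


def last_number(text: str) -> Optional[str]:
    """Assumption: the teacher's final answer is the last number in its output."""
    matches = _NUMBER.findall(text)
    return _normalize(matches[-1]) if matches else None


def keep_row(teacher_reasoning: str, answer_field: str) -> bool:
    """Keep a training row only if the teacher reached the ground-truth answer."""
    gold = gsm8k_gold(answer_field)
    return gold is not None and last_number(teacher_reasoning) == gold
```

With the Hugging Face datasets library, a predicate like this would typically be passed to Dataset.filter; the column holding the teacher output depends on how the data was stored.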

Caveats

  • Single seed (42). Error bars not yet computed.
  • Levels 4 and 5 trade accuracy for token count; Level 2 is the accuracy-preserving sweet spot. If you don't need extreme compression, prefer L1 or L2.
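Until multi-seed error bars are available, a binomial confidence interval on the reported counts gives a rough sense of test-set sampling noise. The sketch below uses the standard Wilson score interval; it treats the 1319 problems as independent trials and says nothing about variance across training seeds, so it does not replace the missing error bars.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    p = successes / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half_width, center + half_width


# Counts from the GSM8K-test results in this card.
low, high = wilson_interval(765, 1319)
print(f"95% interval for accuracy: [{low:.3f}, {high:.3f}]")
```

For these counts the interval spans roughly ±2.7 percentage points around the reported accuracy.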

Citation

Paper draft in progress.
