# Qwen 3.6 Chat Template A universally fixed Jinja chat template for Qwen 3.6 that serves as a drop-in upgrade for **all inference engines** (vLLM, llama.cpp, text-generation-webui, LM Studio, oMLX, etc). The official template continues to crash on C++ tool calls, struggles with the new `preserve_thinking` feature by spamming empty tags, is vulnerable to model hallucinations, and lacks a way to cleanly toggle thinking inline. This universal template handles all of that. ## What's broken in the official template 1. **Tool calls crash on C++ engines.** The official template uses Python's `|items` dictionary filter and `|safe`, neither of which exist in C++ Jinja runtimes (like those used by LM Studio or MLX). Any tool call triggers an out-of-bounds error. It also crashes if the arguments payload is returned as a raw string instead of an object. 2. **No `"developer"` role.** Modern APIs sometimes send `message.role == "developer"`. The official template raises an exception and dies. 3. **Empty `preserve_thinking` block spam.** Qwen 3.6 introduces a `preserve_thinking` kwarg. If toggled on, the official template wraps *every past turn* in a `` block, which means a non-reasoning turn wastes context tokens with `\n\n`. 4. **The `` hallucination.** The Qwen 3.6 LLM sometimes mistakenly generates `` at the end of its reasoning block. The official parser expects strictly ``, resulting in parsing failure and leaking `` tokens into the chat. 5. **No-user-query exception breaks tool calling.** `raise_exception` crashes agentic loops and resets in OpenClaw and similar runtimes. 6. **Unclosed thinking before tool call.** Model starts reasoning then calls a tool without closing the thinking block, producing malformed output. ## What this template does ### Universal tool arguments compatibility Replaced `|items` iteration with direct dictionary key lookups. Swapped `is sequence` for `is iterable` (which strict C++ runtimes require). Removed `|safe` wrappers and safely map raw JSON fallback schemas so that primitive parameters (like booleans) serialize precisely to JSON standard `true` instead of crashing environments by generating Python-flavored titlecase `"True"`. ### `"developer"` role support Intercepts `"developer"` messages and implicitly maps them to `"system"`. No crash, no data loss. ### Smarter `preserve_thinking` historical context **Now ON by default without any required kwargs!** Instead of mindlessly generating empty XML tags for past turns, this template checks if the historical context actually contains reasoning `(reasoning_content|trim|length > 0)`. Only then does it emit an active block into the chat cache, keeping context windows hyper-efficient. Furthermore, history is tied to the `<|think_off|>` override: disabling thinking in the prompt automatically sweeps older thinking blocks from the cache to drastically accelerate processing. ### `` Hallucination handling During the assistant phase, the logic actively looks for boundary hallucinations. If Qwen generates ``, this template dynamically splits on that literal instead of ``, cleanly isolating tags seamlessly. If generation is interrupted mid-thought (max tokens/aborts) preventing a closing `` tag from surfacing, the parser actively rescues the incomplete thought-stream instead of injecting invalid raw `` pairs into the timeline. ### Auto-close unclosed thinking before tool calls The model sometimes starts a thinking block and then immediately calls a tool without emitting the closing tag. The unclosed thinking tag bleeds into the tool call, producing malformed output. This template detects the pattern and auto-injects the closing tag before the tool call boundary. ### No-user-query crash fix The official template scans messages in reverse to find the last real user query. If all user messages are tool results or there are none, it fires `raise_exception` and hard-crashes. This breaks agentic tool-calling chains and session resets. The fix replaces the exception with a graceful fallback. ### Thinking toggle from any message Drop `<|think_on|>` or `<|think_off|>` anywhere in a prompt. The template detects the tag, strips it iteratively without sequential state-bleeding so the model never sees it, and cascades the thinking state down to the generator prompt dynamically. ```text System: You are a coding assistant. <|think_off|> User: Check the weather in Paris. ``` The tag disappears. The model answers fast, generating `\n\n\n\n` natively. ```text System: You are a coding assistant. <|think_on|> User: Implement a red-black tree in Rust. ``` The model gets its `\n` prompt and reasons deeply before answering. ## Comparison | Feature | Official | **This Fixed Template** | |---|---|---| | Tool arguments work | Crashes | **Fixed** | | `\|safe` removed | Crashes | **Fixed** | | `"developer"` role | Missing | **Added** | | Thinking toggle | None | **`<\|think_off\|>` anywhere** | | `preserve_thinking` | Spams empty blocks | **Dynamic length checks** | | Tag extraction | Fails on `` | **Supports ``** | | No-user-query crash | Crashes | **Graceful fallback** | | Auto-close thinking before tool | Not handled | **Auto-injects close tag** | ## Installation This template can be used anywhere standard HuggingFace Jinja templates are supported. ### General (vLLM, llama.cpp, TextGen) Simply replace your model's existing `chat_template` string in your `tokenizer_config.json` with the minified contents of this file, or load it as a custom template in your UI. ### LM Studio 1. Open LM Studio 2. Go to the **My Models** tab (or the right-side panel in Chat) 3. Select your Qwen 3.6 model 4. Scroll to **Prompt Template** 5. Delete the default template, paste this one in 6. Save ### oMLX 1. Unload any `chat_template_kwargs` arguments you may have forced. It is handled by the template actively. 2. Make sure you load the `--jinja` flag so the engine utilizes the custom parsing rules. 3. Overwrite the `chat_template.jinja` source file locally.