# Qwen 3.6 Chat Template
A universally fixed Jinja chat template for Qwen 3.6 that serves as a drop-in upgrade for **all inference engines** (vLLM, llama.cpp, text-generation-webui, LM Studio, oMLX, etc). The official template continues to crash on C++ tool calls, struggles with the new `preserve_thinking` feature by spamming empty tags, is vulnerable to model hallucinations, and lacks a way to cleanly toggle thinking inline. This universal template handles all of that.
## What's broken in the official template
1. **Tool calls crash on C++ engines.** The official template uses Python's `|items` dictionary filter and `|safe`, neither of which exist in C++ Jinja runtimes (like those used by LM Studio or MLX). Any tool call triggers an out-of-bounds error. It also crashes if the arguments payload is returned as a raw string instead of an object.
2. **No `"developer"` role.** Modern APIs sometimes send `message.role == "developer"`. The official template raises an exception and dies.
3. **Empty `preserve_thinking` block spam.** Qwen 3.6 introduces a `preserve_thinking` kwarg. If toggled on, the official template wraps *every past turn* in a `` block, which means a non-reasoning turn wastes context tokens with `\n\n`.
4. **The `` hallucination.** The Qwen 3.6 LLM sometimes mistakenly generates `` at the end of its reasoning block. The official parser expects strictly ``, resulting in parsing failure and leaking `` tokens into the chat.
5. **No-user-query exception breaks tool calling.** `raise_exception` crashes agentic loops and resets in OpenClaw and similar runtimes.
6. **Unclosed thinking before tool call.** Model starts reasoning then calls a tool without closing the thinking block, producing malformed output.
## What this template does
### Universal tool arguments compatibility
Replaced `|items` iteration with direct dictionary key lookups. Swapped `is sequence` for `is iterable` (which strict C++ runtimes require). Removed `|safe` wrappers and safely map raw JSON fallback schemas so that primitive parameters (like booleans) serialize precisely to JSON standard `true` instead of crashing environments by generating Python-flavored titlecase `"True"`.
### `"developer"` role support
Intercepts `"developer"` messages and implicitly maps them to `"system"`. No crash, no data loss.
### Smarter `preserve_thinking` historical context
**Now ON by default without any required kwargs!** Instead of mindlessly generating empty XML tags for past turns, this template checks if the historical context actually contains reasoning `(reasoning_content|trim|length > 0)`. Only then does it emit an active block into the chat cache, keeping context windows hyper-efficient. Furthermore, history is tied to the `<|think_off|>` override: disabling thinking in the prompt automatically sweeps older thinking blocks from the cache to drastically accelerate processing.
### `` Hallucination handling
During the assistant phase, the logic actively looks for boundary hallucinations. If Qwen generates ``, this template dynamically splits on that literal instead of ``, cleanly isolating tags seamlessly. If generation is interrupted mid-thought (max tokens/aborts) preventing a closing `` tag from surfacing, the parser actively rescues the incomplete thought-stream instead of injecting invalid raw `` pairs into the timeline.
### Auto-close unclosed thinking before tool calls
The model sometimes starts a thinking block and then immediately calls a tool without emitting the closing tag. The unclosed thinking tag bleeds into the tool call, producing malformed output. This template detects the pattern and auto-injects the closing tag before the tool call boundary.
### No-user-query crash fix
The official template scans messages in reverse to find the last real user query. If all user messages are tool results or there are none, it fires `raise_exception` and hard-crashes. This breaks agentic tool-calling chains and session resets. The fix replaces the exception with a graceful fallback.
### Thinking toggle from any message
Drop `<|think_on|>` or `<|think_off|>` anywhere in a prompt. The template detects the tag, strips it iteratively without sequential state-bleeding so the model never sees it, and cascades the thinking state down to the generator prompt dynamically.
```text
System: You are a coding assistant. <|think_off|>
User: Check the weather in Paris.
```
The tag disappears. The model answers fast, generating `\n\n\n\n` natively.
```text
System: You are a coding assistant. <|think_on|>
User: Implement a red-black tree in Rust.
```
The model gets its `\n` prompt and reasons deeply before answering.
## Comparison
| Feature | Official | **This Fixed Template** |
|---|---|---|
| Tool arguments work | Crashes | **Fixed** |
| `\|safe` removed | Crashes | **Fixed** |
| `"developer"` role | Missing | **Added** |
| Thinking toggle | None | **`<\|think_off\|>` anywhere** |
| `preserve_thinking` | Spams empty blocks | **Dynamic length checks** |
| Tag extraction | Fails on `` | **Supports ``** |
| No-user-query crash | Crashes | **Graceful fallback** |
| Auto-close thinking before tool | Not handled | **Auto-injects close tag** |
## Installation
This template can be used anywhere standard HuggingFace Jinja templates are supported.
### General (vLLM, llama.cpp, TextGen)
Simply replace your model's existing `chat_template` string in your `tokenizer_config.json` with the minified contents of this file, or load it as a custom template in your UI.
### LM Studio
1. Open LM Studio
2. Go to the **My Models** tab (or the right-side panel in Chat)
3. Select your Qwen 3.6 model
4. Scroll to **Prompt Template**
5. Delete the default template, paste this one in
6. Save
### oMLX
1. Unload any `chat_template_kwargs` arguments you may have forced. It is handled by the template actively.
2. Make sure you load the `--jinja` flag so the engine utilizes the custom parsing rules.
3. Overwrite the `chat_template.jinja` source file locally.