Instructions to use froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-4bit with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-4bit with MLX:
# Make sure mlx-vlm is installed # pip install --upgrade mlx-vlm from mlx_vlm import load, generate from mlx_vlm.prompt_utils import apply_chat_template from mlx_vlm.utils import load_config # Load the model model, processor = load("froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-4bit") config = load_config("froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-4bit") # Prepare input image = ["http://images.cocodataset.org/val2017/000000039769.jpg"] prompt = "Describe this image." # Apply chat template formatted_prompt = apply_chat_template( processor, config, prompt, num_images=1 ) # Generate output output = generate(model, processor, formatted_prompt, image) print(output) - Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- Pi new
How to use froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-4bit with Pi:
Start the MLX server
# Install MLX LM: uv tool install mlx-lm # Start a local OpenAI-compatible server: mlx_lm.server --model "froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-4bit"
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "mlx-lm": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-4bit" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-4bit with Hermes Agent:
Start the MLX server
# Install MLX LM: uv tool install mlx-lm # Start a local OpenAI-compatible server: mlx_lm.server --model "froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-4bit"
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-4bit
Run Hermes
hermes
Qwen 3.6 Chat Template
A universally fixed Jinja chat template for Qwen 3.6 that serves as a drop-in upgrade for all inference engines (vLLM, llama.cpp, text-generation-webui, LM Studio, oMLX, etc). The official template continues to crash on C++ tool calls, struggles with the new preserve_thinking feature by spamming empty tags, is vulnerable to model hallucinations, and lacks a way to cleanly toggle thinking inline. This universal template handles all of that.
What's broken in the official template
- Tool calls crash on C++ engines. The official template uses Python's
|itemsdictionary filter and|safe, neither of which exist in C++ Jinja runtimes (like those used by LM Studio or MLX). Any tool call triggers an out-of-bounds error. It also crashes if the arguments payload is returned as a raw string instead of an object. - No
"developer"role. Modern APIs sometimes sendmessage.role == "developer". The official template raises an exception and dies. - Empty
preserve_thinkingblock spam. Qwen 3.6 introduces apreserve_thinkingkwarg. If toggled on, the official template wraps every past turn in a<think></think>block, which means a non-reasoning turn wastes context tokens with<think>\n\n</think>. - The
</thinking>hallucination. The Qwen 3.6 LLM sometimes mistakenly generates</thinking>at the end of its reasoning block. The official parser expects strictly</think>, resulting in parsing failure and leaking<thinking>tokens into the chat. - No-user-query exception breaks tool calling.
raise_exceptioncrashes agentic loops and resets in OpenClaw and similar runtimes. - Unclosed thinking before tool call. Model starts reasoning then calls a tool without closing the thinking block, producing malformed output.
What this template does
Universal tool arguments compatibility
Replaced |items iteration with direct dictionary key lookups. Swapped is sequence for is iterable (which strict C++ runtimes require). Removed |safe wrappers and safely map raw JSON fallback schemas so that primitive parameters (like booleans) serialize precisely to JSON standard true instead of crashing environments by generating Python-flavored titlecase "True".
"developer" role support
Intercepts "developer" messages and implicitly maps them to "system". No crash, no data loss.
Smarter preserve_thinking historical context
Now ON by default without any required kwargs! Instead of mindlessly generating empty XML tags for past turns, this template checks if the historical context actually contains reasoning (reasoning_content|trim|length > 0). Only then does it emit an active block into the chat cache, keeping context windows hyper-efficient. Furthermore, history is tied to the <|think_off|> override: disabling thinking in the prompt automatically sweeps older thinking blocks from the cache to drastically accelerate processing.
</thinking> Hallucination handling
During the assistant phase, the logic actively looks for boundary hallucinations. If Qwen generates </thinking>, this template dynamically splits on that literal instead of </think>, cleanly isolating tags seamlessly. If generation is interrupted mid-thought (max tokens/aborts) preventing a closing </think> tag from surfacing, the parser actively rescues the incomplete thought-stream instead of injecting invalid raw <think> pairs into the timeline.
Auto-close unclosed thinking before tool calls
The model sometimes starts a thinking block and then immediately calls a tool without emitting the closing tag. The unclosed thinking tag bleeds into the tool call, producing malformed output. This template detects the pattern and auto-injects the closing tag before the tool call boundary.
No-user-query crash fix
The official template scans messages in reverse to find the last real user query. If all user messages are tool results or there are none, it fires raise_exception and hard-crashes. This breaks agentic tool-calling chains and session resets. The fix replaces the exception with a graceful fallback.
Thinking toggle from any message
Drop <|think_on|> or <|think_off|> anywhere in a prompt. The template detects the tag, strips it iteratively without sequential state-bleeding so the model never sees it, and cascades the thinking state down to the generator prompt dynamically.
System: You are a coding assistant. <|think_off|>
User: Check the weather in Paris.
The tag disappears. The model answers fast, generating <think>\n\n</think>\n\n natively.
System: You are a coding assistant. <|think_on|>
User: Implement a red-black tree in Rust.
The model gets its <think>\n prompt and reasons deeply before answering.
Comparison
| Feature | Official | This Fixed Template |
|---|---|---|
| Tool arguments work | Crashes | Fixed |
|safe removed |
Crashes | Fixed |
"developer" role |
Missing | Added |
| Thinking toggle | None | <|think_off|> anywhere |
preserve_thinking |
Spams empty blocks | Dynamic length checks |
| Tag extraction | Fails on </thinking> |
Supports </thinking> |
| No-user-query crash | Crashes | Graceful fallback |
| Auto-close thinking before tool | Not handled | Auto-injects close tag |
Installation
This template can be used anywhere standard HuggingFace Jinja templates are supported.
General (vLLM, llama.cpp, TextGen)
Simply replace your model's existing chat_template string in your tokenizer_config.json with the minified contents of this file, or load it as a custom template in your UI.
LM Studio
- Open LM Studio
- Go to the My Models tab (or the right-side panel in Chat)
- Select your Qwen 3.6 model
- Scroll to Prompt Template
- Delete the default template, paste this one in
- Save
oMLX
- Unload any
chat_template_kwargsarguments you may have forced. It is handled by the template actively. - Make sure you load the
--jinjaflag so the engine utilizes the custom parsing rules. - Overwrite the
chat_template.jinjasource file locally.