gpt-oss-20b · Hermes-Agent tool finetune · MLX
Apple Silicon native. Runs on M-series Macs through MLX with no PyTorch detour. Tested on M2 Max and M3 Pro.
- Format — MLX (safetensors + index)
- Size on disk — ~12 GB
- Unified memory needed — 24 GB minimum, 32 GB comfortable
- Recommended runtime — mlx-lm ≥ 0.18
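As a rough sanity check on the size figure above, the arithmetic can be sketched as follows. The 21B parameter count comes from this card; the quantization layout (4-bit weights, groups of 64, one fp16 scale and one fp16 bias per group) is an assumption, not something this card states, and the estimate treats every parameter as quantized.

```python
def quantized_size_gb(n_params: float, bits: int = 4, group_size: int = 64,
                      overhead_bits: int = 32) -> float:
    """Approximate weight size in GB (10^9 bytes) under group-wise quantization.

    overhead_bits is the per-group metadata: here an assumed fp16 scale plus
    an fp16 bias, i.e. 32 bits shared by `group_size` weights.
    """
    bits_per_weight = bits + overhead_bits / group_size  # 4 + 32/64 = 4.5
    return n_params * bits_per_weight / 8 / 1e9


# 21B parameters (from the card) under the assumed layout.
estimate = quantized_size_gb(21e9)
print(f"{estimate:.1f} GB")
```

Under these assumptions the estimate lands close to the ~12 GB quoted above. The gap between that and the 24 GB minimum leaves room for the KV cache, activations, and the rest of the system.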
What this is
A tool-use finetune of OpenAI's gpt-oss-20b for Hermes-Agent,
a local agent framework that needs models which call tools reliably, follow
multi-turn instructions, and don't argue with system prompts.
The base model is the 21B-parameter (3.6B active) Mixture-of-Experts release from OpenAI. This finetune preserves the Harmony chat template and the reasoning-effort knob, and improves:
- Function-calling adherence (correct JSON, no commentary mid-call)
- Long agent loops (10+ turns of tool → observe → plan)
- System-prompt fidelity (respects role boundaries and refusal/allow-list rules)
It is not affiliated with NousResearch's Hermes model series. "Hermes-Agent" here refers to the local agent framework only.
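To make "correct JSON, no commentary mid-call" concrete, here is a minimal sketch of the kind of check an agent loop might run on a tool call's argument string. The `validate_tool_call` helper and the `READ_FILE_SCHEMA` tool are hypothetical names for illustration; they are not part of Hermes-Agent or of this model's training setup.

```python
import json

# Hypothetical tool schema, written in the JSON-Schema style used by
# OpenAI-compatible "tools" definitions.
READ_FILE_SCHEMA = {
    "type": "object",
    "properties": {"path": {"type": "string"}},
    "required": ["path"],
}


def validate_tool_call(raw_arguments: str, schema: dict) -> dict:
    """Parse a tool call's argument string and check its required keys.

    Raises ValueError if the string is not a pure JSON object (for example
    because commentary surrounds it) or if a required key is missing.
    """
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        raise ValueError(f"arguments are not valid JSON: {exc}") from exc
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    missing = [key for key in schema.get("required", []) if key not in args]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    return args
```

A call such as `validate_tool_call('{"path": "README.md"}', READ_FILE_SCHEMA)` returns the parsed dict, while a string with prose before or after the JSON raises `ValueError`.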
Quickstart
pip install -U mlx-lm
One-shot generate
mlx_lm.generate \
--model fesalfayed/gpt-oss-20b-hermes_agent-tool-finetune_mlx \
--prompt "List three bash one-liners that find files larger than 100 MB." \
--max-tokens 256
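The same one-shot generation can be sketched from Python with the `mlx_lm` `load` and `generate` functions. The helper names `build_messages` and `generate_once` are illustrative, and the import is deferred so the file can still be imported on a machine without MLX.

```python
MODEL_ID = "fesalfayed/gpt-oss-20b-hermes_agent-tool-finetune_mlx"


def build_messages(prompt: str) -> list[dict]:
    """Wrap a plain prompt as a single-turn chat conversation."""
    return [{"role": "user", "content": prompt}]


def generate_once(prompt: str, max_tokens: int = 256) -> str:
    """Load the model and generate one completion (requires Apple Silicon)."""
    # Imported lazily: mlx_lm is only available where MLX is installed.
    from mlx_lm import load, generate

    model, tokenizer = load(MODEL_ID)
    # Apply the model's chat template so the prompt matches its training format.
    chat_prompt = tokenizer.apply_chat_template(
        build_messages(prompt), add_generation_prompt=True
    )
    return generate(
        model, tokenizer, prompt=chat_prompt, max_tokens=max_tokens, verbose=True
    )
```

Calling `generate_once("List three bash one-liners that find files larger than 100 MB.")` downloads the weights on first use, so expect the first run to take a while.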
Local OpenAI-compatible server
mlx_lm.server --model fesalfayed/gpt-oss-20b-hermes_agent-tool-finetune_mlx --port 1234
Point Hermes-Agent (or any OpenAI client) at http://127.0.0.1:1234/v1.
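For a client without extra dependencies, a request against that endpoint might look like the sketch below. It uses only the standard library and assumes the server follows the usual OpenAI `/chat/completions` request and response shape; `build_chat_request` and `chat` are illustrative helper names.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:1234/v1"  # matches --port 1234 above
MODEL_ID = "fesalfayed/gpt-oss-20b-hermes_agent-tool-finetune_mlx"


def build_chat_request(user_text: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a chat-completions POST request."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": user_text}],
        "max_tokens": 256,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(user_text: str) -> str:
    """Send the request to the running server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(user_text)) as resp:
        body = json.load(resp)
    # Assumes the standard OpenAI response layout.
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("Hello")` returns the assistant's reply as a string.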
Hermes-Agent integration
Add a profile in ~/.hermes/config.yaml:
profiles:
  gpt-oss-20b-tools:
    provider: openai
    base_url: http://127.0.0.1:1234/v1 # LM Studio / vLLM / mlx_lm.server
    model: fesalfayed/gpt-oss-20b-hermes_agent-tool-finetune_mlx
    temperature: 0.7
    top_p: 0.95
    min_p: 0.1 # important for MoE stability
    max_tokens: 8192
    tool_choice: auto
Then run hermes profile use gpt-oss-20b-tools, and the agent loop will route
tool calls through this model.
Sampling
| Param | Value | Why |
|---|---|---|
| temperature | 0.7 | balanced; drop to 0.2 for strict tool calls |
| top_p | 0.95 | standard nucleus |
| min_p | 0.1 | required for MoE — prevents dead-expert tokens |
| repetition_penalty | 1.0 | the model handles repetition itself |
Harmony reasoning effort: put a line of the form Reasoning: low, Reasoning: medium,
or Reasoning: high in the system message. The high setting produces roughly 3-4x
more output tokens but gives noticeably better multi-step tool plans.
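The table and the reasoning-effort setting can be combined into one request body, sketched below. The values come from the table above; passing `min_p` and `repetition_penalty` as top-level request fields is an assumption about the serving runtime, which may ignore or reject fields it does not support. `build_payload` is an illustrative name.

```python
# Sampling defaults taken from the table above.
SAMPLING = {
    "temperature": 0.7,
    "top_p": 0.95,
    "min_p": 0.1,
    "repetition_penalty": 1.0,
}
EFFORT_LEVELS = ("low", "medium", "high")


def build_payload(user_text: str, effort: str = "medium",
                  strict_tools: bool = False) -> dict:
    """Build a chat payload with a reasoning-effort system line."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"effort must be one of {EFFORT_LEVELS}")
    params = dict(SAMPLING)
    if strict_tools:
        # The table suggests a lower temperature for strict tool calls.
        params["temperature"] = 0.2
    return {
        "messages": [
            {"role": "system", "content": f"Reasoning: {effort}"},
            {"role": "user", "content": user_text},
        ],
        **params,
    }
```

For example, `build_payload("Plan the migration", effort="high", strict_tools=True)` asks for the longest reasoning trace while keeping tool-call sampling tight.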
Training
- Base: openai/gpt-oss-20b
- Method: LoRA SFT (rank 64, alpha 16) merged back into BF16
- Framework: Unsloth + TRL on a single H100 (80 GB)
- Data: ~42k tool-use traces from Hermes-Agent sessions, filtered for successful tool calls and clean JSON. No synthetic distillation.
- Length: 8192 tokens, packing on
- Loss: assistant-only, mask user/system/tool
The _16bit repo holds the merged BF16 weights. The _4bit, _mlx, and
_gguf repos are quantizations of that checkpoint.
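The "assistant-only" loss listed above can be illustrated with a small label-masking sketch. This is a generic illustration of the idea, not the training code used for this model: the per-token `roles` list is a hypothetical input, and `-100` is the default `ignore_index` of PyTorch's cross-entropy loss, which training stacks commonly use to mark tokens that should not contribute to the loss.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_labels(token_ids: list[int], roles: list[str]) -> list[int]:
    """Keep labels for assistant tokens and mask everything else.

    `roles` holds one role name per token ("system", "user", "tool",
    or "assistant"); how that alignment is produced is left out here.
    """
    if len(token_ids) != len(roles):
        raise ValueError("token_ids and roles must have the same length")
    return [
        tok if role == "assistant" else IGNORE_INDEX
        for tok, role in zip(token_ids, roles)
    ]
```

With this masking, the gradient comes only from the tokens the assistant produced, so the model is not trained to imitate user messages, system prompts, or tool outputs.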
Limitations
- Math and code-generation are unchanged from the base — this finetune optimizes the agent loop, not raw reasoning.
- The model can over-call tools when given vague instructions. Add an "if you can answer directly, do so" line to the system prompt.
- English only. Other languages were not in the training mix.
- Not safety-tuned beyond what gpt-oss-20b already provides.
Other formats
- BF16 reference — full precision, vLLM / Transformers
- MXFP4 4-bit — fits a 16 GB GPU
- MLX — Apple Silicon native
- GGUF — llama.cpp / Ollama / LM Studio
License
Apache-2.0, inherited from the base model. No additional restrictions.
Citation
@misc{fesalfayed_gptoss20b_hermesagent_2025,
author = {Fayed, Fesal},
title = {gpt-oss-20b Hermes-Agent tool finetune (mlx)},
year = {2025},
url = {https://huggingface.co/fesalfayed/gpt-oss-20b-hermes_agent-tool-finetune_mlx},
}