How to use fesalfayed/gpt-oss-20b-hermes_agent-tool-finetune_mlx with MLX:

# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("fesalfayed/gpt-oss-20b-hermes_agent-tool-finetune_mlx")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)

How to use fesalfayed/gpt-oss-20b-hermes_agent-tool-finetune_mlx with Pi:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "fesalfayed/gpt-oss-20b-hermes_agent-tool-finetune_mlx"

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "fesalfayed/gpt-oss-20b-hermes_agent-tool-finetune_mlx"
        }
      ]
    }
  }
}
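If ~/.pi/agent/models.json already lists other providers, pasting the snippet over the file would drop them. Below is a minimal merge sketch in Python. It assumes Pi reads only the fields shown above, and the helper name add_mlx_provider is hypothetical, not part of Pi:

```python
import json
from pathlib import Path

MODEL_ID = "fesalfayed/gpt-oss-20b-hermes_agent-tool-finetune_mlx"

# Connection settings copied from the models.json snippet above.
MLX_SETTINGS = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
}


def add_mlx_provider(path: Path) -> dict:
    """Merge the "mlx-lm" provider into a Pi models.json, keeping other providers."""
    # Start from the existing file if there is one, otherwise from an empty config.
    config = json.loads(path.read_text()) if path.exists() else {}
    providers = config.setdefault("providers", {})
    entry = providers.setdefault("mlx-lm", {})
    # Overwrite connection settings so a stale port or URL gets corrected.
    entry.update(MLX_SETTINGS)
    # Append the model only once, so running the helper twice changes nothing.
    models = entry.setdefault("models", [])
    if not any(m.get("id") == MODEL_ID for m in models):
        models.append({"id": MODEL_ID})
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Call it as add_mlx_provider(Path.home() / ".pi" / "agent" / "models.json"). Rerunning it is harmless because the model id is only appended once.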

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use fesalfayed/gpt-oss-20b-hermes_agent-tool-finetune_mlx with Hermes Agent:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "fesalfayed/gpt-oss-20b-hermes_agent-tool-finetune_mlx"

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default fesalfayed/gpt-oss-20b-hermes_agent-tool-finetune_mlx

Run Hermes

hermes

MLX LM

How to use fesalfayed/gpt-oss-20b-hermes_agent-tool-finetune_mlx with MLX LM:

Generate or start a chat session

# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "fesalfayed/gpt-oss-20b-hermes_agent-tool-finetune_mlx"

Run an OpenAI-compatible server

# Install MLX LM
uv tool install mlx-lm
# Start the server
mlx_lm.server --model "fesalfayed/gpt-oss-20b-hermes_agent-tool-finetune_mlx"
# Calling the OpenAI-compatible server with curl
curl -X POST "http://localhost:8080/v1/chat/completions" \
   -H "Content-Type: application/json" \
   --data '{
     "model": "fesalfayed/gpt-oss-20b-hermes_agent-tool-finetune_mlx",
     "messages": [
       {"role": "user", "content": "Hello"}
     ]
   }'
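Here is the same request from Python, using only the standard library. It is a sketch: build_chat_request and chat are hypothetical helper names, and the response parsing assumes the standard OpenAI chat-completions shape (choices[0].message.content). mlx_lm.server listens on port 8080 unless --port says otherwise.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/v1"  # mlx_lm.server's default port
MODEL_ID = "fesalfayed/gpt-oss-20b-hermes_agent-tool-finetune_mlx"


def build_chat_request(messages, base_url=BASE_URL, model=MODEL_ID):
    """Build, without sending, the same POST the curl example makes."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(messages, **kwargs):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(messages, **kwargs)
    # Local generation can be slow, so allow a generous timeout.
    with urllib.request.urlopen(request, timeout=300) as response:
        reply = json.load(response)
    return reply["choices"][0]["message"]["content"]
```

With the server running, print(chat([{"role": "user", "content": "Hello"}])) should print the model's reply.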

gpt-oss-20b · Hermes-Agent tool finetune · MLX

Apple Silicon native. Runs on M-series Macs through MLX with no PyTorch detour. Tested on M2 Max and M3 Pro.

Format — MLX (safetensors + index)
Size on disk — ~12 GB
Unified memory needed — 24 GB minimum, 32 GB comfortable
Recommended runtime — mlx-lm ≥ 0.18
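The ~12 GB figure lines up with 4-bit weights. The back-of-envelope check below assumes MLX's default affine quantization (4-bit, group size 64, one FP16 scale and one FP16 bias per group) is applied to all 21B parameters. In practice a few tensors, such as norms, stay unquantized, and the KV cache and the OS need memory on top of the weights. That is why the unified-memory figures above leave headroom beyond the file size.

```python
def quantized_size_gb(params: float, bits: int = 4, group_size: int = 64, scale_bits: int = 16) -> float:
    """Approximate weight storage in decimal GB under group-wise affine quantization.

    Every group of `group_size` weights also stores one scale and one bias of
    `scale_bits` each, adding 2 * scale_bits / group_size bits per weight.
    """
    bits_per_weight = bits + 2 * scale_bits / group_size  # 4 + 0.5 = 4.5 with the defaults
    return params * bits_per_weight / 8 / 1e9


print(f"{quantized_size_gb(21e9):.1f} GB")  # prints 11.8 GB
```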

What this is

A tool-use finetune of OpenAI's gpt-oss-20b for Hermes-Agent, a local agent framework that needs a model to call tools reliably, follow multi-turn instructions, and not argue with system prompts.

The base model is the 21B-parameter (3.6B active) Mixture-of-Experts release from OpenAI. This finetune preserves the Harmony chat template and the reasoning-effort knob, and improves:

Function-calling adherence (correct JSON, no commentary mid-call)
Long agent loops (10+ turns of tool → observe → plan)
System-prompt fidelity (respects role boundaries and refusal/allow-list rules)
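To measure function-calling adherence on your own traces, a check like the one below is a reasonable starting point. It is a hypothetical helper that assumes OpenAI-style tool calls, where the arguments arrive as a JSON-encoded string. It rejects anything that does not parse as a single JSON object (for example, a call with commentary mixed in) or that lacks a required field.

```python
import json


def validate_tool_call(arguments: str, required: list[str]) -> dict:
    """Parse a tool call's `arguments` string and check that required keys exist.

    Raises ValueError if the string is not one JSON object (for example because
    prose was mixed into the call) or if a required key is missing.
    """
    try:
        args = json.loads(arguments)
    except json.JSONDecodeError as exc:
        raise ValueError(f"arguments are not valid JSON: {exc}") from exc
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    missing = [key for key in required if key not in args]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    return args
```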

It is not affiliated with NousResearch's Hermes model series. "Hermes-Agent" here refers to the local agent framework only.

Quickstart

pip install -U mlx-lm

One-shot generate

mlx_lm.generate \
  --model fesalfayed/gpt-oss-20b-hermes_agent-tool-finetune_mlx \
  --prompt "List three bash one-liners that find files larger than 100 MB." \
  --max-tokens 256

Local OpenAI-compatible server

mlx_lm.server --model fesalfayed/gpt-oss-20b-hermes_agent-tool-finetune_mlx --port 1234

Point Hermes-Agent (or any OpenAI client) at http://127.0.0.1:1234/v1.

Hermes-Agent integration

Add a profile in ~/.hermes/config.yaml:

profiles:
  gpt-oss-20b-tools:
    provider: openai
    base_url: http://127.0.0.1:1234/v1   # LM Studio / vLLM / mlx_lm.server
    model: fesalfayed/gpt-oss-20b-hermes_agent-tool-finetune_mlx
    temperature: 0.7
    top_p: 0.95
    min_p: 0.1                            # important for MoE stability
    max_tokens: 8192
    tool_choice: auto

Then run hermes profile use gpt-oss-20b-tools, and the agent loop will route tool calls through this model.
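For orientation, a profile like the one above corresponds roughly to an OpenAI-style chat-completions body such as the one below. This is an assumption about the shape, not a dump of what Hermes-Agent actually sends. find_large_files is a made-up tool. min_p is left out because it is not a standard OpenAI field, and whether a given server accepts it as an extra key is worth checking.

```python
import json

MODEL_ID = "fesalfayed/gpt-oss-20b-hermes_agent-tool-finetune_mlx"

# A hypothetical tool, described with the OpenAI function-calling schema.
FIND_LARGE_FILES = {
    "type": "function",
    "function": {
        "name": "find_large_files",
        "description": "List files under a directory larger than a size threshold.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string"},
                "min_mb": {"type": "integer"},
            },
            "required": ["path", "min_mb"],
        },
    },
}


def build_payload(messages, tools, strict=False):
    """Chat-completions body mirroring the gpt-oss-20b-tools profile values."""
    return {
        "model": MODEL_ID,
        "messages": messages,
        "tools": tools,
        "tool_choice": "auto",  # let the model decide whether to call a tool
        # The sampling table suggests 0.7 by default and 0.2 for strict tool calls.
        "temperature": 0.2 if strict else 0.7,
        "top_p": 0.95,
        "max_tokens": 8192,
    }


body = json.dumps(build_payload([{"role": "user", "content": "Find files over 100 MB in ~/Downloads"}], [FIND_LARGE_FILES]))
```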

Sampling

Parameter	Value	Why
temperature	0.7	balanced; drop to 0.2 for strict tool calls
top_p	0.95	standard nucleus sampling
min_p	0.1	strongly recommended for this MoE; drops tokens below 10% of the top token's probability
repetition_penalty	1.0	off (1.0 is a no-op); the model handles repetition itself
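To make the min_p row concrete, here is a pure-Python sketch of the standard min-p rule on a toy next-token distribution. It keeps every token whose probability is at least min_p times the top token's probability, then renormalizes the rest. It illustrates the rule, not mlx-lm's internal implementation.

```python
def min_p_filter(probs: dict[str, float], min_p: float = 0.1) -> dict[str, float]:
    """Keep tokens with probability >= min_p * top probability, then renormalize."""
    threshold = min_p * max(probs.values())
    kept = {tok: p for tok, p in probs.items() if p >= threshold}
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}


# Toy distribution: the top token has p = 0.5, so the cutoff is 0.1 * 0.5 = 0.05.
dist = {"find": 0.5, "ls": 0.3, "du": 0.15, "cat": 0.04, "rm": 0.01}
print(list(min_p_filter(dist)))  # prints ['find', 'ls', 'du']
```

Unlike top_p, which keeps a fixed cumulative mass, this cutoff scales with the model's confidence. A peaked distribution is pruned hard, while a flat one keeps more candidates.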

Harmony reasoning effort: set the system message to Reasoning: low|medium|high. Setting it to high produces roughly 3-4x more output tokens but is noticeably better on multi-step tool plans.
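Below is a small sketch that builds the chat with the reasoning line in the system message, as described above. harmony_messages is a hypothetical helper, and its output can be passed to tokenizer.apply_chat_template like the messages list in the MLX example. The stock gpt-oss chat template in Transformers also reads a reasoning_effort variable (tokenizer.apply_chat_template(messages, add_generation_prompt=True, reasoning_effort="high")). Whether this repo's template honors it is an assumption to verify.

```python
VALID_EFFORTS = ("low", "medium", "high")


def harmony_messages(user_prompt: str, effort: str = "medium", instructions: str = "") -> list[dict]:
    """Build a chat whose system message starts with the Harmony reasoning line."""
    if effort not in VALID_EFFORTS:
        raise ValueError(f"effort must be one of {VALID_EFFORTS}, got {effort!r}")
    system = f"Reasoning: {effort}"
    # Optional extra rules follow the reasoning line in the same system message.
    if instructions:
        system += "\n\n" + instructions
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_prompt},
    ]


messages = harmony_messages(
    "Plan a backup of ~/projects",
    effort="high",
    instructions="If you can answer directly, do so.",
)
```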

Training

Base: openai/gpt-oss-20b
Method: LoRA SFT (rank 64, alpha 16) merged back into BF16
Framework: Unsloth + TRL on a single H100 (80 GB)
Data: ~42k tool-use traces from Hermes-Agent sessions, filtered for successful tool calls and clean JSON. No synthetic distillation.
Sequence length: 8192 tokens, with packing
Loss: assistant tokens only (user, system, and tool tokens masked)
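Assistant-only loss means that only the model's own turns produce gradient. The toy sketch below shows the masking with made-up token ids. TRL and Unsloth both offer built-in ways to do this on real chat templates, and which one this run used isn't stated. -100 is the label value PyTorch's cross-entropy ignores by default.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def mask_labels(segments):
    """Flatten (role, token_ids) segments into input_ids and labels.

    Assistant tokens keep their ids as labels. User, system, and tool tokens
    get IGNORE_INDEX, so they are seen as context but add nothing to the loss.
    """
    input_ids, labels = [], []
    for role, ids in segments:
        input_ids.extend(ids)
        labels.extend(ids if role == "assistant" else [IGNORE_INDEX] * len(ids))
    return input_ids, labels


ids, labels = mask_labels([
    ("system", [1, 2]),
    ("user", [3]),
    ("assistant", [4, 5]),  # e.g. a tool call
    ("tool", [6]),          # the tool's observation
    ("assistant", [7]),     # the final answer
])
```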

The _16bit repo holds the merged BF16 weights. The _4bit, _mlx, and _gguf repos are quantizations of that checkpoint.
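Merging the adapter back into BF16 folds the low-rank update into each adapted weight matrix. The formula below assumes the conventional LoRA scaling of α/r. Rank-stabilized LoRA would use α/√r = 16/8 = 2 instead. With the card's rank 64 and alpha 16, the merge is:

```latex
W_{\text{merged}} = W_0 + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d_{\text{out}} \times r},\quad
A \in \mathbb{R}^{r \times d_{\text{in}}},\quad
\frac{\alpha}{r} = \frac{16}{64} = 0.25
```

Each adapted matrix adds r(d_in + d_out) trainable parameters during training and none at inference once merged.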

Limitations

Math and code generation are unchanged from the base; this finetune optimizes the agent loop, not raw reasoning.
The model can over-call tools when given vague instructions. Add an "If you can answer directly, do so." line to the system prompt.
English only. Other languages were not in the training mix.
Not safety-tuned beyond what gpt-oss-20b already provides.

Other formats

BF16 reference — full precision, vLLM / Transformers
MXFP4 4-bit — fits a 16 GB GPU
MLX — Apple Silicon native
GGUF — llama.cpp / Ollama / LM Studio

License

Apache-2.0, inherited from the base model. No additional restrictions.

Citation

@misc{fesalfayed_gptoss20b_hermesagent_2025,
  author = {Fayed, Fesal},
  title  = {gpt-oss-20b Hermes-Agent tool finetune (mlx)},
  year   = {2025},
  url    = {https://huggingface.co/fesalfayed/gpt-oss-20b-hermes_agent-tool-finetune_mlx},
}

Model size: 21B params
Tensor type: F16, U32 (Safetensors, MLX)
Hardware compatibility: 4-bit
Base model: openai/gpt-oss-20b (this model is listed among its quantized derivatives)
Collection: finetuned_hermes-function-calling-v1, "Models fine-tuned on Hermes tool calling dataset" (4 items)