toke-7b-gate2
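A 7B-parameter language model fine-tuned to generate code in toke, a programming language designed to reduce the token cost of AI-generated code. In the Gate 2 evaluation, every generated program compiled (Compilation Pass@1 of 100% on 700 tasks); functional correctness is much lower, at about 8% Functional Pass@1.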
Model Details
| Property | Value |
|---|---|
| Base model | Qwen 2.5 Coder 7B-Instruct |
| Method | QLoRA (rank 64, alpha 128, 3 epochs) |
| Training data | 25,953 records — 18,890 synthetic + 6,069 from loke production (87K lines) |
| Training time | 37 hours on NVIDIA A10G (24 GB) |
| Weights | AWQ 4-bit quantized (this repo) |
| Context length | 32,768 tokens |
| License | Apache 2.0 |
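The QLoRA settings in the table can be unpacked with a little arithmetic. The sketch below assumes the standard LoRA formulation, in which each adapted weight matrix gets two low-rank factors and the update is scaled by alpha / rank. The 4096 x 4096 layer shape is a hypothetical example, not a dimension taken from this model, and the set of adapted layers is not stated in this card.

```python
# Minimal sketch of what "rank 64, alpha 128" implies under standard LoRA.
# The layer shape used below is hypothetical, chosen only for illustration.

RANK = 64    # from the Model Details table
ALPHA = 128  # from the Model Details table


def lora_scaling(rank: int, alpha: int) -> float:
    """Factor applied to the low-rank update in standard LoRA (alpha / rank)."""
    return alpha / rank


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added to one d_in x d_out weight matrix.

    LoRA adds a (rank x d_in) factor and a (d_out x rank) factor.
    """
    return rank * (d_in + d_out)


if __name__ == "__main__":
    print(lora_scaling(RANK, ALPHA))       # 2.0
    print(lora_params(4096, 4096, RANK))   # 524288
```

For the hypothetical 4096 x 4096 layer, the adapter adds 524,288 trainable parameters against 16,777,216 frozen ones, which is why a 7B model can be fine-tuned on a single 24 GB GPU when the base weights are also quantized.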
What is toke?
toke is a statically typed, compiled language with a 55-character alphabet (lowercase a-z, digits 0-9, and 19 symbols). It compiles to native binaries via LLVM. A purpose-built BPE tokenizer produces 52% fewer tokens on average than cl100k_base.
- 13 keywords: `m` `f` `t` `i` `if` `el` `lp` `br` `let` `mut` `as` `rt` `mt`
- No comments in source: documentation lives in companion files (`.tkc.md`)
- Errors as values: no exceptions, result types with `mt` (match)
- Website: tokelang.dev | Console: console.tokelang.dev
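The figures quoted above can be checked and applied with simple arithmetic. This is only a back-of-the-envelope sketch: the 52% reduction is an average, so the estimate for any single program may be well off, and the 1,000-token baseline is a hypothetical input.

```python
# Back-of-the-envelope arithmetic for the alphabet size and token savings.
# TOKEN_REDUCTION is the average quoted in this card, not a per-program guarantee.

LOWERCASE_LETTERS = 26
DIGITS = 10
SYMBOLS = 19
ALPHABET_SIZE = LOWERCASE_LETTERS + DIGITS + SYMBOLS  # 55

TOKEN_REDUCTION = 0.52  # average reduction relative to cl100k_base


def estimated_toke_tokens(baseline_tokens: int,
                          reduction: float = TOKEN_REDUCTION) -> int:
    """Estimate the toke token count from a cl100k_base token count."""
    return round(baseline_tokens * (1 - reduction))


if __name__ == "__main__":
    print(ALPHABET_SIZE)                 # 55
    print(estimated_toke_tokens(1000))   # 480
```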
Gate 2 Results (May 2026)
| Metric | Gate 1 | Gate 2 |
|---|---|---|
| Compilation Pass@1 | 63.7% | 100% |
| Tasks evaluated | 1,000 | 700 |
| Functional Pass@1 | — | ~8% |
| Training records | 73,000 | 25,953 |
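To make the percentages concrete, the sketch below assumes Pass@1 is computed in the usual way for one sample per task: the fraction of tasks whose single generated program passes. The helper names are illustrative, and the functional count is approximate because the table reports "~8%".

```python
# Sketch of a one-sample-per-task Pass@1 calculation (an assumption about
# how the table's metrics are defined; helper names are illustrative).


def pass_at_1(results: list[bool]) -> float:
    """Fraction of tasks whose single generated sample passed."""
    if not results:
        raise ValueError("results must contain at least one task")
    return sum(results) / len(results)


def implied_passes(rate_percent: float, tasks: int) -> int:
    """Number of passing tasks implied by a percentage and a task count."""
    return round(rate_percent / 100 * tasks)


if __name__ == "__main__":
    print(implied_passes(100, 700))    # 700 tasks compiled in Gate 2
    print(implied_passes(8, 700))      # about 56 tasks functionally correct
    print(implied_passes(63.7, 1000))  # 637 tasks compiled in Gate 1
```

Read this way, Gate 2 produced compilable output for all 700 tasks, while only about 56 of them also behaved correctly.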
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("karwalski/toke", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("karwalski/toke")
prompt = """<|im_start|>system
Write toke programs. m=mod; f=name(p:type):ret{body}; let x=42; <expr return. Semicolons everywhere. Start with m=.
<|im_end|>
<|im_start|>user
Write a hello world program
<|im_end|>
<|im_start|>assistant
"""
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.2, do_sample=True)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
# m=hello;i=io:std.io;f=main():i64{io.println("hello world");<0};
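The prompt above is assembled by hand from `<|im_start|>` and `<|im_end|>` markers. A small helper can rebuild the same string for other tasks; `build_prompt` is a hypothetical name introduced here, and the layout simply mirrors the example above rather than any documented template.

```python
# Hypothetical helper that reproduces the prompt layout used in the Usage
# example, so only the task description needs to change between calls.

SYSTEM_PROMPT = (
    "Write toke programs. m=mod; f=name(p:type):ret{body}; let x=42; "
    "<expr return. Semicolons everywhere. Start with m=."
)


def build_prompt(user: str, system: str = SYSTEM_PROMPT) -> str:
    """Return a prompt string ending at the start of the assistant turn."""
    return (
        f"<|im_start|>system\n{system}\n<|im_end|>\n"
        f"<|im_start|>user\n{user}\n<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


if __name__ == "__main__":
    print(build_prompt("Write a hello world program"))
```

The returned string can be passed to `tokenizer(prompt, return_tensors="pt")` exactly as in the example above.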
API Access
Free API access via console.tokelang.dev — no credit card required.
curl -X POST https://api.tokelang.dev/v1/generate \
-H "X-Api-Key: YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{"description": "Return the absolute value of an integer"}'
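The same request can be built from Python with only the standard library. This sketch mirrors the curl call above; the response format is not documented in this card, so `send` returns the raw response text rather than assuming a schema. Both function names are illustrative.

```python
# Standard-library equivalent of the curl call above. The response schema
# is not documented here, so the body is returned as plain text.
import json
import urllib.request

API_URL = "https://api.tokelang.dev/v1/generate"


def build_generate_request(description: str,
                           api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({"description": description}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"X-Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `send(build_generate_request("Return the absolute value of an integer", "YOUR_KEY"))` performs the same request as the curl command, with `YOUR_KEY` replaced by a key from the console.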
Links
- tokelang.dev — Project website
- console.tokelang.dev — Free API access
- GitHub: toke — Compiler, spec, stdlib
- GitHub: loke — 87K lines of production toke
- Live tokenizer — Compare token counts in-browser