Tweety-Tatar-7B: A Tatar Large Language Model

Tweety Tatar / Base 7b / 2024-v1

Model description

This model is our trans-tokenized LLM for the Tatar language, converted from the Mistral-7B-Instruct-v0.2 model trained by MistralAI. Trans-tokenized LLMs are language models finetuned to produce output in a particular language, using a novel tokenizer native to that language.

Developed by: François Remy (UGent), Alfiya Khabibullina (BeCode), et al.
Funded by: IDLab / GPULab (UGent)
Model type: Foundation model using the Mistral architecture
Language(s) (NLP): Tatar
License: Apache 2.0

In-scope usage

This model can be used as-is for basic language modeling in Tatar, or finetuned for more complex tasks. It has not undergone instruction or chat finetuning, so it works best in few-shot settings, where the prompt itself contains a few worked examples of the task.
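
As a rough illustration of what a few-shot setting looks like for a base model, the sketch below assembles a prompt from solved examples followed by one open query. The helper name `build_few_shot_prompt` and the `* input : output` line layout are assumptions made for this example, not a format the model authors prescribe.

```python
from typing import Sequence, Tuple


def build_few_shot_prompt(
    instruction: str,
    examples: Sequence[Tuple[str, str]],
    query: str,
) -> str:
    """Assemble a few-shot prompt: instruction, solved examples, open query.

    Hypothetical helper; the bullet layout is an assumption, not a
    documented requirement of the model.
    """
    lines = [instruction]
    for source, target in examples:
        lines.append(f"* {source} : {target}")
    # The last line is left unanswered so the model continues after the colon.
    lines.append(f"* {query} :")
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Бу аналоглар таблицасын тутырыгыз:",
    [("Мәскәү", "Русия")],
    "Әнкара",
)
print(prompt)
```

The resulting string can be passed as-is to a text-generation pipeline; adding more solved examples to the list is usually the first thing to try when completions drift off task.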

Usage instructions

This model can be used like any other causal language model in the Hugging Face Transformers library:

import transformers

MODEL_NAME = "Tweeties/tweety-tatar-base-7b-2024-v1"
generate = transformers.pipeline("text-generation", model=MODEL_NAME)

Word Analogies

ANALOGY_PROMPT = """Бу аналоглар таблицасын тутырыгыз:
* {x1} : {y1}
* {x2} :"""
def score_analogy(x1, y1, x2, y2):
    prompt = ANALOGY_PROMPT.replace('{x1}', x1).replace('{y1}', y1).replace('{x2}', x2)
    # Greedy decoding, stopping at the first newline or end-of-sequence token
    answer = generate(
        prompt, use_cache=True, do_sample=False, max_new_tokens=10, return_full_text=False,
        pad_token_id=generate.tokenizer.eos_token_id,
        eos_token_id=generate.tokenizer.convert_tokens_to_ids(['<0x0A>', '</s>']),
    )[0]['generated_text'].strip()
    return 1 if answer == y2 else 0

score_analogy('Мәскәү', 'Русия', 'Әнкара', 'Төркия') # 1
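
To turn the single check above into a small benchmark, one option is to average the 0/1 scores over a list of analogy quadruples. The sketch below makes several assumptions: `analogy_accuracy` and the stand-in scorer `exact_match_stub` are hypothetical names, and the stub exists only so the snippet runs without loading the model. In practice `score_analogy` would be passed as `score_fn`.

```python
from typing import Callable, Iterable, Tuple

Quad = Tuple[str, str, str, str]  # (x1, y1, x2, expected y2)


def analogy_accuracy(
    quads: Iterable[Quad],
    score_fn: Callable[[str, str, str, str], int],
) -> float:
    """Return the fraction of analogies that score_fn marks as correct (1)."""
    scores = [score_fn(*quad) for quad in quads]
    if not scores:
        raise ValueError("at least one analogy is required")
    return sum(scores) / len(scores)


def exact_match_stub(x1: str, y1: str, x2: str, y2: str) -> int:
    """Stand-in scorer (not the model): looks answers up in a fixed table."""
    known = {"Әнкара": "Төркия"}
    return 1 if known.get(x2) == y2 else 0


quads = [
    ("Мәскәү", "Русия", "Әнкара", "Төркия"),  # expected answer matches
    ("Мәскәү", "Русия", "Әнкара", "Русия"),   # deliberately wrong expected answer
]
print(analogy_accuracy(quads, exact_match_stub))  # 0.5 with the stub scorer
```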

Summarization


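import torch

# Reuse the tokenizer and model loaded by the pipeline above
tokenizer = generate.tokenizer
model = generate.model
cuda_device = model.device  # device the model weights live on (CPU or GPU)

SUMMARIZE = "Түбәндәге текстка йомгак ясагыз:\n"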
LONG_TEXT = "\n\nОзын текст:\n"
LONG_TEXT_DEMO = "Кеше организмы катлаулы организм, аның өчен кирәкле туклыклы матдәләрнең аерым баланс таләп итә. Кеше организмының туклану рационы нигездә пешекләнгән ризыклардан тора икән, аның организмы бу ысул белән туклануга җайлаша. Әмма, шул ук кеше кинәт чимал диетасына күчә икән, аның организмы әлеге үзгәрешне кабул итә алмый, бу мөмкин кадәр зыян китерергә мөмкин." # The human body is a complex organism that requires a specific balance of nutrients. If the human body's diet consists mainly of cooked foods, its body adapts to this type of nutrition. However, if the same person suddenly switches to a raw diet, his body cannot adapt to this change, which can be harmful.
SHORT_TEXT = "\n\nКыска текст:\n"
SHORT_TEXT_DEMO = "Әмма пешкән ризык ашауга гына күнгән организмга кинәт чи ризык белән туклануга күчүнең зарарлы нәтиҗәсе дә булырга мөмкин." # However, a body accustomed to eating only cooked food can have harmful consequences when suddenly switching to eating raw food.

def generate_tatar_summary(tatar_text_to_summarize: str) -> str:

    # craft the 1-shot example
    input_ids = torch.concat([
        tokenizer.encode(SUMMARIZE, return_tensors='pt'),
        tokenizer.encode(LONG_TEXT, add_special_tokens=False, return_tensors='pt'),
        tokenizer.encode(LONG_TEXT_DEMO, add_special_tokens=False, return_tensors='pt'),
        tokenizer.encode(SHORT_TEXT, add_special_tokens=False, return_tensors='pt'),
        tokenizer.encode(SHORT_TEXT_DEMO, add_special_tokens=False, return_tensors='pt'),
        tokenizer.encode("\n\n", add_special_tokens=False, return_tensors='pt')
    ], axis=1)
    
    # craft the input
    input_ids = torch.concat([
        input_ids,
        tokenizer.encode(SUMMARIZE, return_tensors='pt'),
        tokenizer.encode(LONG_TEXT, add_special_tokens=False, return_tensors='pt'),
        tokenizer.encode(tatar_text_to_summarize, add_special_tokens=False, return_tensors='pt'),
        tokenizer.encode(SHORT_TEXT, add_special_tokens=False, return_tensors='pt'),
    ], axis=1)

    # generate the output
    model_inputs = {'input_ids':input_ids.to(cuda_device)}
    model_outputs = model.generate(
        **model_inputs,
        max_new_tokens=80,
        num_beams=8,
        no_repeat_ngram_size=6,
        early_stopping=False,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.convert_tokens_to_ids(['<0x0A>','</s>']),
    )

    # decode the output
    return (tokenizer.decode(model_outputs[0][input_ids.shape[1]:])).rstrip()

generate_tatar_summary("Зур шартлау (ингл. Big Bang) – Галәмнең башлангыч, сингуляр халәттә торган чорын тасвирлаучы космологик модель. Әле ХХ гасырда да без яшәгән Галәм статик структуралы, дигән фикер яшәгән. Ягъни, Галәмнең башы һәм ахыры юк, имеш, ул һәрвакыт булган һәм булачак. Бу фикер фән дөньясында бик озак, астрономия фәненең бөтен нигезләрен җимереп яңа теория барлыкка килгәнче яшәгән. Бу теориянең исеме – «Зур шартлау» теориясе.")
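
For debugging, it can help to look at the one-shot prompt as plain text rather than as concatenated token IDs. The sketch below rebuilds the same layout as a single string. `build_summary_prompt` is a hypothetical helper, and the English placeholder strings stand in for the real demo and input texts. Tokenizing the joined string is not guaranteed to give the same IDs as encoding the pieces separately; in particular, the function above encodes the second instruction with special tokens enabled, which a plain string cannot reproduce.

```python
SUMMARIZE = "Түбәндәге текстка йомгак ясагыз:\n"
LONG_TEXT = "\n\nОзын текст:\n"
SHORT_TEXT = "\n\nКыска текст:\n"


def build_summary_prompt(demo_long: str, demo_short: str, text: str) -> str:
    """Return the one-shot summarization prompt as a single string."""
    # Solved demonstration: instruction, long text, short text.
    demo = SUMMARIZE + LONG_TEXT + demo_long + SHORT_TEXT + demo_short + "\n\n"
    # Open query: same layout, ending right after the short-text header
    # so that the model writes the summary as its continuation.
    query = SUMMARIZE + LONG_TEXT + text + SHORT_TEXT
    return demo + query


# Placeholder strings stand in for the real demo and input texts.
prompt = build_summary_prompt("<demo long text>", "<demo short text>", "<new text>")
print(prompt)
```

Printing the prompt this way makes it easy to check that the demonstration and the query follow the same layout before spending time on beam search.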

Citation

If you use this model, please cite our work as:

@article{tweeties2024,
    title = {Trans-Tokenization and Cross-lingual Vocabulary Transfers: Language Adaptation of LLMs for Low-Resource NLP},
    author = {François Remy and Pieter Delobelle and Hayastan Avetisyan and Alfiya Khabibullina and Miryam de Lhoneux and Thomas Demeester},
    url = {https://arxiv.org/abs/2408.04303},
    year = {2024},
    note = {Accepted at COLM 2024}
}