Model Card for multiscreen_psi16_768

This is an unofficial, experimental Multiscreen model pre-trained on the TinyStories dataset. It was trained using TRL.

Quick start

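import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "kurogane/tinystorys_multiscreen_vocab768"
# Download cache location; change this to a directory on your own machine.
cache_dir = r"/media/kurogane/backup/cache"

# trust_remote_code=True allows the custom model code from the repository to run.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    cache_dir=cache_dir,
)
# Use the first GPU when available, otherwise fall back to the CPU.
model.to("cuda:0" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    padding_side="left",
    cache_dir=cache_dir,
)

# Tokenize the prompt and move the tensors to the same device as the model.
model_inputs = tokenizer(["A list of colors: red, blue"], return_tensors="pt").to(model.device)
generated_ids = model.generate(**model_inputs)

# Decode the first (and only) sequence in the batch.
s_output = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(s_output)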

Example output

A list of colors: red, blue, yellow, green, orange. All the people

Training procedure

This model was trained with supervised fine-tuning (SFT) using TRL.

Framework versions

  • TRL: 0.24.0
  • Transformers: 5.8.0
  • PyTorch: 2.11.0+cu129
  • Datasets: 4.3.0
  • Tokenizers: 0.22.2
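
To check whether a local environment matches the versions listed above, a small helper along the following lines can be used. This is a minimal sketch using only the Python standard library; the names CARD_VERSIONS and compare_versions are illustrative and not part of this repository, and the distribution names (for example "torch" for PyTorch) are assumed to be the usual PyPI names.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this model card, keyed by (assumed) PyPI distribution name.
CARD_VERSIONS = {
    "trl": "0.24.0",
    "transformers": "5.8.0",
    "torch": "2.11.0+cu129",
    "datasets": "4.3.0",
    "tokenizers": "0.22.2",
}


def compare_versions(expected):
    """Return {package: (expected, installed)} for each mismatch.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, installed) in compare_versions(CARD_VERSIONS).items():
        print(f"{package}: card lists {wanted}, found {installed or 'not installed'}")
```

A mismatch does not necessarily mean the model will fail to load; the comparison is an exact string match and is only meant as a starting point when debugging.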

Architecture

This model is an experimental tiny language model trained on TinyStories using a Multiscreen-style architecture inspired by the paper "Screening Is Enough" by Ken M. Nakanishi. The implementation is an experimental Hugging Face Transformers port, written with reference to the unofficial PyTorch implementation dieOD/multiscreen-pytorch. It is not an official implementation released by the author of the Multiscreen paper.

Dataset

The training data is based on the TinyStories dataset by Ronen Eldan and Yuanzhi Li.

Model size: 14.9M params (Safetensors, tensor type F32)
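
As a rough guide to the download and memory footprint, the 14.9M parameter count and F32 tensor type listed above can be turned into an approximate weight size. This is a back-of-the-envelope sketch: the parameter count is rounded, file metadata and runtime activations are ignored, and the function name weight_bytes is illustrative only.

```python
# Figures taken from this model card.
NUM_PARAMS = 14.9e6      # "14.9M params" (rounded value)
BYTES_PER_PARAM = 4      # F32 = 32 bits = 4 bytes per parameter


def weight_bytes(num_params, bytes_per_param):
    """Approximate size of the raw weights in bytes (ignores file metadata)."""
    return num_params * bytes_per_param


size = weight_bytes(NUM_PARAMS, BYTES_PER_PARAM)
print(f"~{size / 1e6:.1f} MB ({size / 2**20:.1f} MiB)")
```

Under these assumptions the weights come to roughly 60 MB, small enough to run on a CPU.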