Instructions to use google/paligemma2-10b-pt-896 with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use google/paligemma2-10b-pt-896 with Transformers:
```python
# Use a pipeline as a high-level helper
# (the "image-text-to-text" task only exists in recent transformers releases)
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="google/paligemma2-10b-pt-896")

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("google/paligemma2-10b-pt-896")
model = AutoModelForImageTextToText.from_pretrained("google/paligemma2-10b-pt-896")
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use google/paligemma2-10b-pt-896 with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "google/paligemma2-10b-pt-896"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "google/paligemma2-10b-pt-896",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
Use Docker
```bash
docker model run hf.co/google/paligemma2-10b-pt-896
```
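Whichever way the vLLM server is started, the curl call above can also be issued from Python. The sketch below uses only the standard library and assumes the server listens on `localhost:8000`, as in the curl example; the helper names `build_completion_request` and `send_completion_request` are hypothetical. Like the curl payload, it sends a text-only prompt with no image attached.

```python
import json
import urllib.request


def build_completion_request(
    base_url: str = "http://localhost:8000",  # vLLM default port; the SGLang server uses 30000
    model: str = "google/paligemma2-10b-pt-896",
    prompt: str = "Once upon a time,",
    max_tokens: int = 512,
    temperature: float = 0.5,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for an OpenAI-compatible /v1/completions route."""
    payload = {
        "model": model,
        "prompt": prompt,  # text only, exactly like the curl example
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_completion_request(request: urllib.request.Request, timeout: float = 120.0) -> dict:
    """Send the request and decode the JSON reply; this needs a running server."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For the SGLang server, pass `base_url="http://localhost:30000"`. Building the request needs no network access, so the payload can be inspected before anything is sent.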
- SGLang
How to use google/paligemma2-10b-pt-896 with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "google/paligemma2-10b-pt-896" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "google/paligemma2-10b-pt-896",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
Use Docker images
```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "google/paligemma2-10b-pt-896" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "google/paligemma2-10b-pt-896",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
- Docker Model Runner
How to use google/paligemma2-10b-pt-896 with Docker Model Runner:
```bash
docker model run hf.co/google/paligemma2-10b-pt-896
```
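Both the vLLM and SGLang servers above are called through an OpenAI-compatible `/v1/completions` route, so their replies should follow the OpenAI completions shape: a `choices` list whose entries carry `index`, `text`, and `finish_reason`. A minimal parsing sketch, assuming that shape; `completion_texts` and `truncated_flags` are hypothetical helper names.

```python
import json


def _ordered_choices(response: dict) -> list:
    """Return the response's choices sorted by their "index" field."""
    return sorted(response.get("choices", []), key=lambda choice: choice.get("index", 0))


def completion_texts(response: dict) -> list[str]:
    """Return each choice's generated text, ordered by index."""
    choices = _ordered_choices(response)
    if not choices:
        # An error reply, for instance, carries no choices; show the start of the body.
        raise ValueError("no choices in response: " + json.dumps(response)[:200])
    return [choice.get("text", "") for choice in choices]


def truncated_flags(response: dict) -> list[bool]:
    """True for each choice that stopped because it hit max_tokens (finish_reason "length")."""
    return [choice.get("finish_reason") == "length" for choice in _ordered_choices(response)]
```

Checking `finish_reason` matters with the `"max_tokens": 512` setting used above: a `"length"` value means the text was cut off rather than finished.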
Error deploying on Inference Endpoints
KeyError: "Unknown task image-text-to-text, available tasks are ['audio-classification', 'automatic-speech-recognition', 'depth-estimation', 'document-question-answering', 'feature-extraction', 'fill-mask', 'image-classification', 'image-feature-extraction', 'image-segmentation', 'image-to-image', 'image-to-text', 'mask-generation', 'ner', 'object-detection', 'question-answering', 'sentiment-analysis', 'summarization', 'table-question-answering', 'text-classification', 'text-generation', 'text-to-audio', 'text-to-speech', 'text2text-generation', 'token-classification', 'translation', 'video-classification', 'visual-question-answering', 'vqa', 'zero-shot-audio-classification', 'zero-shot-classification', 'zero-shot-image-classification', 'zero-shot-object-detection', 'translation_XX_to_YY']"
Hi @yongyi169,
You are getting this error because `image-text-to-text` is not among the tasks currently supported in the Hugging Face Inference Endpoints environment; the error message lists the tasks it does accept. The paligemma2-10b-pt-896 model generates text from an image plus an optional text prompt, so the task you should specify for this functionality is likely `image-to-text`, which corresponds to generating text from image inputs.
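The task list in the error message above is the set of names the endpoint's environment accepts, so a candidate task can be checked against it before redeploying. A minimal sketch, assuming the message keeps the `available tasks are [...]` format shown above; `available_tasks_from_error` and `pick_task` are hypothetical helpers, not part of any Hugging Face API.

```python
import ast
import re

# Matches the bracketed list printed after "available tasks are".
_TASKS_PATTERN = re.compile(r"available tasks are (\[.*?\])")


def available_tasks_from_error(message: str) -> list[str]:
    """Extract the task names from an "Unknown task ..., available tasks are [...]" message."""
    match = _TASKS_PATTERN.search(message)
    if match is None:
        raise ValueError("no 'available tasks are [...]' list found in message")
    # The list is printed as a Python literal, so ast.literal_eval parses it without running code.
    return ast.literal_eval(match.group(1))


def pick_task(message: str, preferred: str, fallback: str) -> str:
    """Return `preferred` if the environment lists it, otherwise `fallback` if that is listed."""
    tasks = available_tasks_from_error(message)
    for task in (preferred, fallback):
        if task in tasks:
            return task
    raise ValueError(f"neither {preferred!r} nor {fallback!r} is an available task")
```

Applied to the full error message above, `pick_task(message, "image-text-to-text", "image-to-text")` returns `"image-to-text"`, since only the latter appears in the list.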
So, please update your endpoint deployment script as shown below:

```
"model": "google/paligemma2-10b-pt-896",
"task": "image-to-text"
```
Thank you.