NVIDIA Parakeet TDT 0.6B V3 (Multilingual) model converted to ONNX format for onnx-asr.
Install onnx-asr
pip install "onnx-asr[cpu,gpu,hub]"
Load the Parakeet TDT model and transcribe WAV files
# https://github.com/istupakov/onnx-asr
# https://istupakov.github.io/onnx-asr/usage/#using-soundfile
# https://istupakov.github.io/onnx-asr/conversion
# https://docs.nvidia.com/nemo-framework/user-guide/24.09/nemotoolkit/core/export.html
# https://github.com/NVIDIA-NeMo/NeMo/blob/main/examples/asr/export/transducer/infer_transducer_onnx.py
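from pathlib import Path

import onnx_asr
from huggingface_hub import snapshot_download

# Download the converted ONNX model files from the Hugging Face Hub
model_path = snapshot_download("grimavatar/parakeet-tdt-0.6b-v3")

# Prefer CUDA and fall back to CPU
providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]
model = onnx_asr.load_model(
    "nemo-parakeet-tdt-0.6b-v3", path=model_path, providers=providers
).with_timestamps()

# Collect every WAV file in the current directory; the list is repeated 15 times to form a larger batch
audio_paths = [str(e) for e in Path(".").absolute().glob("*.wav")] * 15
results = model.recognize(audio_paths)
texts = [e.text.strip() for e in results]

# Token-level alignments: a token ends where the next one starts.
# Non-alphanumeric tokens (e.g. punctuation) get zero duration, and the last token ends at its own start.
alignments = []
for output in results:
    alignments.append([
        {"start": start, "end": end if text.strip().isalnum() else start, "text": text}
        for start, end, text in zip(
            output.timestamps, output.timestamps[1:] + output.timestamps[-1:], output.tokens
        )
    ])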
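The alignment step can be tried without loading the model. The following is a minimal sketch of the same pairing logic as a standalone function; the name token_alignment and the sample timestamps and tokens are hypothetical and are not part of onnx-asr or of real model output.

```python
def token_alignment(timestamps, tokens):
    """Pair each token with its start time and an estimated end time."""
    # Approximate a token's end by the start of the next token; the last
    # token has no successor, so its own start time is reused.
    next_starts = list(timestamps[1:]) + list(timestamps[-1:])
    return [
        # Non-alphanumeric tokens (e.g. punctuation) are given zero duration.
        {"start": start, "end": end if text.strip().isalnum() else start, "text": text}
        for start, end, text in zip(timestamps, next_starts, tokens)
    ]


# Hypothetical values, for illustration only
timestamps = [0.0, 0.32, 0.48, 0.80]
tokens = [" hel", "lo", ",", " world"]
alignment = token_alignment(timestamps, tokens)
for item in alignment:
    print(item)
```

Because the model gives only one timestamp per token, the end times are estimates: any silence between two tokens is attributed to the earlier token.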
Base model: nvidia/parakeet-tdt-0.6b-v3