generate() - Kitten TTS

Signature

KittenTTS.generate(
    text,
    voice="expr-voice-5-m",
    speed=1.0,
    clean_text=False,
) -> numpy.ndarray

Parameters

text

str

required

The input text to synthesize. Pass plain prose. Numbers, currencies, and abbreviations are expanded into spoken form only when clean_text=True.

voice

str

default: "expr-voice-5-m"

Voice to use for synthesis. Accepts any friendly name from available_voices: Bella, Jasper, Luna, Bruno, Rosie, Hugo, Kiki, Leo. The default "expr-voice-5-m" is the internal ID for Leo. You can pass either the friendly name or the internal ID.
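A voice name can be checked before synthesis so that a typo fails early. The sketch below is illustrative only: check_voice is a hypothetical helper, not part of KittenTTS, and it knows only the names listed above plus the one internal ID this page documents. Other internal IDs, and any case-insensitive matching the library may perform, are not covered.

```python
# Friendly names as listed in the parameter description above.
FRIENDLY_VOICES = ("Bella", "Jasper", "Luna", "Bruno", "Rosie", "Hugo", "Kiki", "Leo")
# The only internal ID documented on this page (it maps to Leo).
DEFAULT_VOICE_ID = "expr-voice-5-m"


def check_voice(voice):
    """Return voice unchanged if it is a listed friendly name or the default ID.

    Hypothetical helper: exact, case-sensitive matching is an assumption.
    """
    if voice in FRIENDLY_VOICES or voice == DEFAULT_VOICE_ID:
        return voice
    raise ValueError(
        f"unknown voice {voice!r}; expected one of {', '.join(FRIENDLY_VOICES)}"
    )
```

The returned value can then be passed as the voice argument to generate().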

speed

float

default: 1.0

Speech speed multiplier.

1.0 — normal speed
Values below 1.0 slow down speech (e.g., 0.75 is 75% speed)
Values above 1.0 speed it up (e.g., 1.5 is 150% speed)
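As a rough guide to the effect of speed on output length, the sketch below assumes the multiplier scales duration linearly, so a clip that lasts T seconds at 1.0 lasts about T / speed otherwise. That linear relationship is an assumption for illustration, and expected_duration is a hypothetical helper rather than a KittenTTS function. Actual durations may differ.

```python
def expected_duration(base_seconds, speed):
    """Estimate clip length at a given speed, assuming linear scaling.

    base_seconds is the length of the same text at speed=1.0.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    return base_seconds / speed


# A 3-second clip slowed to 75% speed and sped up to 150% speed.
slow = expected_duration(3.0, 0.75)
fast = expected_duration(3.0, 1.5)
```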

clean_text

bool

default: False

If True, runs the TextPreprocessor pipeline before synthesis. This expands numbers, currencies, abbreviations, and more into spoken form. The default is False, so either pass text that is already in spoken form or enable this option to let KittenTTS handle expansion automatically.

Returns

audio

numpy.ndarray

Audio samples as a 1-D float32 numpy array at 24 kHz. You can write this directly to a file with soundfile.write() or play it back with sounddevice.play().
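If soundfile is not available, the samples can also be written with the Python standard library. The sketch below converts float samples to 16-bit PCM and reports the clip length as the sample count divided by 24,000. It assumes the samples lie in the range -1.0 to 1.0, which this page does not state, and clips anything outside it. write_wav_16bit is a hypothetical helper, and a generated sine tone stands in for the array returned by generate().

```python
import math
import struct
import wave

SAMPLE_RATE = 24_000  # Hz, per the Returns description


def write_wav_16bit(path, samples, sample_rate=SAMPLE_RATE):
    """Write float samples to a mono 16-bit PCM WAV file; return its duration in seconds."""
    frames = bytearray()
    for s in samples:
        # Assumption: samples are in [-1.0, 1.0]; clip to avoid integer overflow.
        clipped = max(-1.0, min(1.0, float(s)))
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)        # mono
        wav.setsampwidth(2)        # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    return len(samples) / sample_rate


# Stand-in for the output of generate(): half a second of a 440 Hz tone.
audio = [
    0.2 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
    for n in range(SAMPLE_RATE // 2)
]
duration = write_wav_16bit("kitten_demo.wav", audio)
```

The same function accepts a 1-D numpy array, since it only iterates over the samples and converts each one to a Python float.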

clean_text defaults to False in KittenTTS.generate(). If you pass raw text containing numbers or special characters without enabling clean_text, the model may mispronounce them. Either pre-process the text yourself or set clean_text=True.
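For callers who prefer to pre-process text themselves, the following is a minimal sketch of the idea. It spells out only standalone integers from 0 to 99 and deliberately leaves years, decimals, and currency amounts untouched. It is not the TextPreprocessor pipeline and makes no claim to match its output; number_to_words and spell_small_numbers are hypothetical names.

```python
import re

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]

# One or two digits that are not part of a longer number, decimal, or price.
SMALL_INT = re.compile(r"(?<![\w.,$])\d{1,2}(?!\w|[.,]\d)")


def number_to_words(n):
    """Spell out an integer from 0 to 99."""
    if not 0 <= n <= 99:
        raise ValueError("only 0 to 99 are supported in this sketch")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def spell_small_numbers(text):
    """Replace standalone one- or two-digit integers with words."""
    return SMALL_INT.sub(lambda m: number_to_words(int(m.group())), text)


spoken = spell_small_numbers("I have 3 cats and 42 dogs.")
```

Text containing anything beyond this narrow case (dates, prices, abbreviations) is likely better served by setting clean_text=True.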

Usage examples

from kittentts import KittenTTS

tts = KittenTTS()
audio = tts.generate("Welcome to KittenTTS.")
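A fuller sketch that sets every parameter from the signature and writes the result to disk with soundfile.write(), as mentioned under Returns. synthesize_to_wav is a hypothetical wrapper, and it assumes both kittentts and soundfile are installed. The imports sit inside the function so that defining it does not require either package.

```python
def synthesize_to_wav(text, path, voice="Leo", speed=1.0, clean_text=True):
    """Synthesize text with KittenTTS and write it to path as a 24 kHz WAV file."""
    # Local imports: only needed when the function is actually called.
    import soundfile as sf
    from kittentts import KittenTTS

    tts = KittenTTS()
    audio = tts.generate(
        text,
        voice=voice,            # friendly name or internal ID
        speed=speed,            # 1.0 is normal speed
        clean_text=clean_text,  # expand numbers, currencies, abbreviations
    )
    sf.write(path, audio, 24000)  # generate() returns 24 kHz samples
    return audio
```

For example, synthesize_to_wav("The total is $12.50.", "output.wav", speed=0.9) would write a slightly slowed clip to output.wav, with the amount expanded by the preprocessing step.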
