Best TTS For Oobabooga Web UI


Oobabooga (text-generation-webui) is a web UI for large language models (LLMs) that can generate text from prompts, edit grammar, and customize generation settings. It also has a chat interface for interacting with LLMs conversationally: users can ask questions, request information, or just have fun, all in natural language.

One feature that can make Oobabooga’s chat interface more engaging and immersive is text-to-speech (TTS), which is added through extensions. TTS converts the text an LLM generates into spoken audio, so users can hear the model’s replies instead of only reading them. Some TTS engines can also add emotions, expressions, and sound effects to the speech, making it more realistic and lively.

Best TTS For Oobabooga

bark_tts

bark_tts is an extension that uses suno-ai/bark, a transformer-based text-to-audio model that can generate realistic speech as well as other audio, such as music, background noise, and simple sound effects. It can also produce nonverbal sounds such as laughing, sighing, and crying. Emotions can be controlled with trigger words in brackets, such as [sad] or [laughs].
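
Bark reads these cues as part of the text it is asked to speak, so a reply with cues is ordinary text containing bracketed words. The sketch below is a hypothetical helper, not part of bark_tts, that lists the cues in a reply and produces a cue-free transcript. It assumes each cue is a single bracketed span with no nesting.

```python
import re

# Matches one bracketed cue such as [laughs] or [sighs].
# Assumption: cues are never nested and contain no square brackets themselves.
CUE_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def find_cues(text: str) -> list[str]:
    """Return the bracketed cue words in the order they appear."""
    return CUE_PATTERN.findall(text)


def strip_cues(text: str) -> str:
    """Remove bracketed cues and collapse the leftover whitespace."""
    return " ".join(CUE_PATTERN.sub("", text).split())


if __name__ == "__main__":
    reply = "Well [sighs] that did not go as planned. [laughs] Let's try again."
    print(find_cues(reply))   # ['sighs', 'laughs']
    print(strip_cues(reply))  # Well that did not go as planned. Let's try again.
```

A cue-free transcript is handy when the chat window should show clean text while the audio keeps the cues.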

bark_tts supports various languages out of the box and automatically detects the language of the input text. It also lets users add custom speakers by placing .npz files in the voices folder. The extension is easy to install and use, and the output quality is high. However, bark_tts has some limitations.
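
Custom speakers are picked up from files on disk. A minimal sketch of such a lookup, assuming a flat voices folder where each .npz file’s name (without the extension) is the speaker name. The function name is hypothetical, and bark_tts may locate the folder and name speakers differently.

```python
import tempfile
from pathlib import Path


def list_speakers(voices_dir: str) -> list[str]:
    """Return sorted speaker names, one per .npz file in voices_dir.

    Assumption: speakers live directly in the folder (no subfolders) and the
    file stem is the name shown to the user.
    """
    folder = Path(voices_dir)
    if not folder.is_dir():
        return []  # no voices folder yet: nothing to offer
    return sorted(p.stem for p in folder.glob("*.npz"))


if __name__ == "__main__":
    # Build a throwaway folder with two speaker files and one unrelated file.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("narrator.npz", "alice.npz", "notes.txt"):
            (Path(tmp) / name).touch()
        print(list_speakers(tmp))  # ['alice', 'narrator']
```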

To use bark_tts with Oobabooga’s chat interface, users need to enable the extension in the settings and select the desired speaker. Then they can type their prompts in the chat box and press Enter. The text generated by the LLM is converted into audio and played automatically. Trigger words in brackets change the emotion of the speech when they appear in the text being spoken, that is, in the model’s reply.
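
Under the hood, a TTS extension of this kind post-processes each reply: it synthesizes audio, saves it to a file, and returns HTML with an autoplaying audio player. The sketch below mirrors that flow under stated assumptions. The hook name output_modifier matches the function text-generation-webui extensions define to modify the model’s output, but the signature here is simplified; synthesize is a hypothetical stand-in that writes one second of silence instead of calling Bark; and the file/ URL prefix is an assumption about how the UI serves local files.

```python
import tempfile
import wave
from pathlib import Path


def synthesize(text: str, out_path: Path, sample_rate: int = 24000) -> Path:
    """Hypothetical engine stand-in: writes one second of 16-bit mono silence.

    A real extension would pass `text` to its TTS model here instead.
    """
    with wave.open(str(out_path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * sample_rate)
    return out_path


def output_modifier(string: str, out_dir: Path, show_text: bool = True) -> str:
    """Simplified reply hook: speak the reply and return HTML for the chat window."""
    audio_file = synthesize(string, out_dir / "reply.wav")
    # `autoplay` asks the browser to start playback without a click.
    player = f'<audio src="file/{audio_file.as_posix()}" controls autoplay></audio>'
    return f"{player}\n\n{string}" if show_text else player


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(output_modifier("Hello! How can I help?", Path(tmp)))
```

Returning the player together with the text (show_text=True) keeps the written reply visible in the chat; returning only the player gives an audio-only chat.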

XTTSv2

XTTSv2 is a variant of the coqui_tts extension that ships in the main text-generation-webui repository. Both use the XTTS v2 model from the coqui-ai/TTS library, a neural TTS model that can clone a voice from a short audio sample. XTTSv2 has a “narrator” feature that reads text written between asterisks with a separate voice, which can be useful for storytelling or dialogue.
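
The narrator feature implies that a reply is divided into segments before synthesis, each routed to either the narrator voice or the character’s voice. A minimal sketch of that split, assuming balanced, non-nested single asterisks. The function and the "narrator"/"character" labels are hypothetical, and the extension’s own parser may treat edge cases such as markdown bold (**text**) differently.

```python
import re

# One *...* span; assumes asterisks are balanced and not nested.
NARRATION = re.compile(r"\*([^*]+)\*")


def split_narration(text: str) -> list[tuple[str, str]]:
    """Split a reply into (voice, segment) pairs.

    Text inside *...* goes to 'narrator', everything else to 'character'.
    Empty segments are dropped.
    """
    segments: list[tuple[str, str]] = []
    pos = 0
    for match in NARRATION.finditer(text):
        before = text[pos:match.start()].strip()
        if before:
            segments.append(("character", before))
        narrated = match.group(1).strip()
        if narrated:
            segments.append(("narrator", narrated))
        pos = match.end()
    rest = text[pos:].strip()
    if rest:
        segments.append(("character", rest))
    return segments


if __name__ == "__main__":
    reply = "*She smiles.* Nice to meet you! *waves*"
    for voice, segment in split_narration(reply):
        print(f"{voice}: {segment}")
```

Each segment would then be synthesized with the matching voice and the pieces played in order.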

XTTSv2 is fast and easy to use, and the output quality is good. It supports multiple languages and allows users to add custom voices by placing .wav files in the voices folder. The extension also has options to adjust the speed, pitch, and volume of the speech.

To use XTTSv2 with Oobabooga’s chat interface, users need to enable the extension in the settings and select the desired voice. Then they can type their prompts in the chat box and press Enter. The text generated by the LLM is converted into audio and played automatically. Any text between asterisks in the reply is read with the narrator voice.

SpeakLocal

SpeakLocal is an extension that uses pyttsx4, a cross-platform text-to-speech library that relies on the native TTS capabilities of the host machine (Linux, macOS, Windows). The languages and voices it can use depend on the TTS engines and voices installed on that machine.

SpeakLocal is 100% offline, low-resource, and has no word limit. The extension has options to select the voice, adjust the speech rate, and change the bitrate. This makes it suitable for using Oobabooga’s chat interface from a mobile device, where a lower bitrate conserves bandwidth on long, high-token responses. However, SpeakLocal also has some limitations.
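
The bandwidth saving from a lower bitrate is simple arithmetic: a constant-bitrate stream costs bitrate × duration ÷ 8 bytes. The sketch below estimates the audio payload for one long reply. The 300-word reply length and the 150 words-per-minute speaking rate are illustrative assumptions, constant-bitrate encoding is assumed, and real files add some container overhead.

```python
def speech_duration_s(word_count: int, words_per_minute: float) -> float:
    """Rough spoken duration in seconds, assuming a steady speaking rate."""
    return word_count / words_per_minute * 60


def audio_size_bytes(bitrate_kbps: float, duration_s: float) -> float:
    """Payload of a constant-bitrate stream: bits per second * seconds / 8.

    Container headers and metadata are ignored.
    """
    return bitrate_kbps * 1000 * duration_s / 8


if __name__ == "__main__":
    seconds = speech_duration_s(300, 150)  # assumed 300-word reply at 150 wpm -> 120 s
    for kbps in (128, 64, 32):
        kb = audio_size_bytes(kbps, seconds) / 1000
        print(f"{kbps:>3} kbps: {kb:.0f} kB")
    # 128 kbps: 1920 kB
    #  64 kbps: 960 kB
    #  32 kbps: 480 kB
```

Halving the bitrate halves the transfer, which matters most on long replies over a mobile connection.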

To use SpeakLocal with Oobabooga’s chat interface, users need to enable the extension in the settings and select the desired voice and other options. Then they can type their prompts in the chat box and press Enter. The text generated by the LLM is converted into audio and played automatically.

Diffusion_TTS

Diffusion_TTS is an extension that uses Diff-TTS, a denoising diffusion model for text-to-speech. Diff-TTS can transform a noise signal into a mel-spectrogram via diffusion time steps, conditioned on the input text. It can synthesize speech with natural prosody and expression, and can also clone voices from short audio samples.
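
Turning noise into a spectrogram over diffusion time steps follows the standard DDPM reverse (ancestral) sampling loop. The sketch below implements that loop in plain Python under illustrative assumptions: a flat list of floats stands in for a mel-spectrogram, the 50-step linear noise schedule is not Diff-TTS’s actual configuration, and the text-conditioned network is replaced by a hypothetical "oracle" that already knows the target frame. With that oracle the final step recovers the target (up to rounding), which makes the loop easy to check.

```python
import math
import random


def linear_betas(steps: int, beta_start: float = 1e-4, beta_end: float = 0.05) -> list[float]:
    """Noise schedule beta_1..beta_T, linearly spaced (illustrative values)."""
    if steps < 2:
        raise ValueError("need at least two diffusion steps")
    return [beta_start + (beta_end - beta_start) * i / (steps - 1) for i in range(steps)]


def oracle_eps(x_t: list[float], alpha_bar: float, target: list[float]) -> list[float]:
    """Hypothetical stand-in for the denoiser eps_theta(x_t, t, text).

    It knows the target frame and returns the exact noise separating x_t from it,
    using x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps.
    """
    return [(x - math.sqrt(alpha_bar) * x0) / math.sqrt(1.0 - alpha_bar)
            for x, x0 in zip(x_t, target)]


def ddpm_sample(target: list[float], steps: int = 50, seed: int = 0) -> list[float]:
    """DDPM ancestral sampling: start from Gaussian noise and denoise step by step."""
    rng = random.Random(seed)
    betas = linear_betas(steps)
    alphas = [1.0 - b for b in betas]
    alpha_bars: list[float] = []
    running = 1.0
    for a in alphas:  # alpha_bar_t = alpha_1 * ... * alpha_t
        running *= a
        alpha_bars.append(running)

    x = [rng.gauss(0.0, 1.0) for _ in target]  # x_T ~ N(0, I)
    for t in reversed(range(steps)):  # 0-based index: T-1 down to 0
        eps = oracle_eps(x, alpha_bars[t], target)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        # Reverse-step mean: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
        mean = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps)]
        if t > 0:
            sigma = math.sqrt(betas[t])  # one common choice: sigma_t^2 = beta_t
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # no noise is added on the last step
    return x


if __name__ == "__main__":
    frame = [0.5, -1.0, 2.0]  # toy "mel frame" the oracle steers toward
    print([round(v, 4) for v in ddpm_sample(frame)])  # [0.5, -1.0, 2.0]
```

Because every step produces a whole frame at once rather than one sample after another, the loop also shows why this family of models is non-autoregressive.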

Diffusion_TTS is non-autoregressive and highly efficient. It can generate speech much faster than real time on a single GPU, and can use an accelerated sampling method to further boost inference speed without significantly degrading perceptual quality. The extension also supports multiple languages and allows users to choose from different voice models and speakers. However, Diffusion_TTS also has some challenges.
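
Accelerated sampling in diffusion models generally means visiting only a subset of the time steps. The sketch below uses DDIM-style deterministic updates, one widely used scheme that may differ in its details from Diff-TTS’s own accelerated sampling, to go from 400 steps to 20 denoiser evaluations. A hypothetical "oracle" that knows the target frame stands in for the text-conditioned network so the result can be checked, and the schedule values are illustrative.

```python
import math
import random


def alpha_bars_linear(steps: int, beta_start: float = 1e-4, beta_end: float = 0.05) -> list[float]:
    """Cumulative products alpha_bar_t of a linear beta schedule (illustrative values)."""
    out, running = [], 1.0
    for i in range(steps):
        beta = beta_start + (beta_end - beta_start) * i / (steps - 1)
        running *= 1.0 - beta
        out.append(running)
    return out


def oracle_eps(x_t, alpha_bar, target):
    """Hypothetical stand-in for the text-conditioned denoiser eps_theta."""
    return [(x - math.sqrt(alpha_bar) * x0) / math.sqrt(1.0 - alpha_bar)
            for x, x0 in zip(x_t, target)]


def ddim_sample(target, steps=400, stride=20, seed=0):
    """Deterministic DDIM-style sampling that visits every `stride`-th time step.

    Returns the sample and the number of denoiser evaluations used.
    """
    if steps < 2 or stride < 1:
        raise ValueError("need steps >= 2 and stride >= 1")
    alpha_bars = alpha_bars_linear(steps)
    timesteps = list(range(steps - 1, -1, -stride))  # e.g. 399, 379, ..., 19
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from pure noise
    for i, t in enumerate(timesteps):
        ab_t = alpha_bars[t]
        eps = oracle_eps(x, ab_t, target)
        # Predict the clean sample, then jump straight to the next visited step.
        x0_pred = [(xi - math.sqrt(1.0 - ab_t) * ei) / math.sqrt(ab_t) for xi, ei in zip(x, eps)]
        ab_prev = alpha_bars[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        x = [math.sqrt(ab_prev) * p + math.sqrt(1.0 - ab_prev) * e for p, e in zip(x0_pred, eps)]
    return x, len(timesteps)


if __name__ == "__main__":
    sample, evals = ddim_sample([0.5, -1.0, 2.0])
    print(evals, [round(v, 4) for v in sample])  # 20 [0.5, -1.0, 2.0]
```

With a real, imperfect network the skipped steps are where quality can slip, which is the trade-off behind "without significantly degrading" rather than "without degrading".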

MOZTTS

MOZTTS is an extension that uses Mozilla TTS, an open-source project that aims to create a universal, modular, and multilingual TTS engine. It uses deep learning to synthesize speech with natural prosody and expression. It supports multiple characters and allows users to preset voices for their custom characters in a configuration file.

MOZTTS is versatile and customizable, and the output quality is excellent. It supports many languages and regions, and allows users to choose from different voice models and speakers. The extension also has some options to enable CUDA, adjust the speed, and use vocoders.

To use MOZTTS with Oobabooga’s chat interface, users need to enable the extension in the settings and select the desired voice model and speaker. Then they can type their prompts in the chat box and press Enter. The text generated by the LLM is converted into audio and played automatically. To give different characters their own voices, users can edit the configuration file.
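
Per-character voice presets usually boil down to defaults plus per-character overrides, with unknown characters falling back to the defaults. A minimal sketch of that lookup follows. The JSON layout, the keys, and the model and speaker names are hypothetical placeholders; MOZTTS’s actual configuration file may be structured differently.

```python
import json

# Hypothetical config layout: defaults plus per-character overrides.
# MOZTTS's real configuration file may use different keys and structure;
# the model and speaker names below are placeholders.
EXAMPLE_CONFIG = """
{
  "default": {"model": "model-a", "speaker": "speaker-1", "speed": 1.0},
  "characters": {
    "Alice": {"speaker": "speaker-2"},
    "Narrator": {"model": "model-b", "speaker": "speaker-3", "speed": 0.9}
  }
}
"""


def voice_for(character: str, config: dict) -> dict:
    """Return the voice settings for a character: defaults overlaid with its preset."""
    settings = dict(config.get("default", {}))  # copy so the config is not mutated
    settings.update(config.get("characters", {}).get(character, {}))
    return settings


if __name__ == "__main__":
    config = json.loads(EXAMPLE_CONFIG)
    print(voice_for("Alice", config))  # {'model': 'model-a', 'speaker': 'speaker-2', 'speed': 1.0}
    print(voice_for("Bob", config))    # unknown character -> the defaults
```

Overriding only the keys that differ keeps each character entry short and lets a change to the defaults apply to every character that does not override it.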

SK is a versatile writer deeply passionate about anime, evolution, storytelling, art, AI, game development, and VFX. His writings transcend genres, exploring these interests and more. Dive into his captivating world of words and explore the depths of his creative universe.