FlashVSR - Video and Image Upscaler: [Runs on 12GB VRAM, 32GB RAM] Diffusion-Based Streaming Video Super-Resolution

NeuTTS Air is the world’s first super-realistic, on-device, TTS speech language model with instant voice cloning. Built off a 0.5B
[NVIDIA ONLY] Stable Diffusion WebUI Forge supporting Flux, Qwen, Wan, Nunchaku, and more in a lightweight WebUI. https://github.com/Haoming02/sd-webui-forge-classic/tree/neo
All-in-one Gradio interface for Chatterbox. Voice cloning from uploaded audio samples, automatic text processing for long content, and real-time speech generation with configurable parameters. (Minimum Requirements 4GB VRAM / Recommended Requirements 8GB VRAM)
SongBloom, a novel framework for full-length song generation

Super-fast multilingual TTS supporting 54 voices across 8 languages.

A Step Towards Music Generation Foundation Model

(WINDOWS, NVIDIA) Hallo2: Long-Duration and High-Resolution Audio-driven Portrait Image Animation
[NVIDIA ONLY] AllTalk-TTS is a unified UI for F5-TTS, XTTS, VITS, Piper TTS, Parler TTS, and RVC, based on CoquiTTS, including a fine-tuning mode.
[NVIDIA ONLY] [RTX 50 Support] Image generation, image editing and free-form manipulation with a VLM (Minimum Requirements 12GB VRAM / 32GB RAM; Recommended Requirements 24GB VRAM / 48GB RAM)
[NVIDIA ONLY] Image generation, image editing and free-form manipulation with a VLM (Minimum Requirements 12GB VRAM / 32GB RAM; Recommended Requirements 24GB VRAM / 48GB RAM)
Fast lip-sync application for smaller GPUs.
Manage your ComfyUI environments with Docker
Kokoro, KittenTTS, Higgs Audio, Chatterbox/Multi, Fish-Speech, F5, index-tts, indextts2, VoxCPM, and VibeVoice in one app
Forget everything you thought you knew about AI art generation - RuinedFooocus is here to completely reinvent the game!
Fast and High-Quality Zero-Shot Voice-Cloning Text-to-Speech with Flow Matching
Fast and High-Quality Zero-Shot Voice-Cloning Text-to-Speech with Flow Matching (Multilingual)
Audio Transcription App with Parakeet-TDT-0.6b-v2