Alibaba Unveils Revolutionary Voice AI: Clone Any Voice in 3 Seconds or Design a New One from Scratch

Once again, Chinese tech giant Alibaba is making headlines in AI. The company has introduced two new speech-synthesis models: Qwen3-TTS-VD-Flash (VoiceDesign) and Qwen3-TTS-VC-Flash (VoiceClone).

  • Qwen3-TTS-VC-Flash (VoiceClone) is a voice-cloning model. Its standout feature is efficiency: it needs just 3 seconds of target audio to capture and replicate a voice’s timbre. The cloned voice can then generate speech in ten languages, which supports multilingual content creation and localization. Alibaba claims VC-Flash outperforms well-known competitors such as ElevenLabs and MiniMax in multilingual accuracy tests. The model also handles complex texts and can even mimic animal sounds.
  • Qwen3-TTS-VD-Flash (VoiceDesign), by contrast, builds a voice from scratch. Where VC-Flash copies an existing voice, VD-Flash lets users design a unique vocal identity from a plain text description (e.g., “a deep, smooth baritone with a slight rasp and a British accent”). The developers state that in some benchmarks this model surpasses rival offerings such as the GPT-4o mini-tts API and Gemini 2.5 Pro. As with GPT-4o’s approach, it goes beyond picking from preset voices and lets users design a custom vocal persona.

What This Means for the Industry
Tools like these make professional-grade speech synthesis far more accessible. Barriers such as the need for lengthy, high-quality recordings or expensive studio solutions are beginning to fall. That opens opportunities for small businesses, independent content creators, game developers, and educational platforms, which can now add high-quality, distinctive voiceovers to their projects with little effort. Ethical concerns about deepfakes and responsible use remain paramount, but the technological advance itself is undeniable.

Qwen3-TTS Voice Design Demo

Qwen Voice Cloning and Synthesis
