Google Unveils Veo 3.1: Video Generation Model Now with Native Sound and Advanced Editing

October 15, 2025. Google has officially released Veo 3.1, an update to its video generation model. The new version succeeds Veo 3.0, introduced in May, and brings several key improvements, the most significant of which is native audio generation. The model is available now in the Google Flow video editor.

What’s New in Veo 3.1?

According to Google, the main enhancements focus on three areas: realism, prompt understanding, and, most importantly, sound.

  1. Realistic Sound. Veo 3.1 no longer just creates video clips from text descriptions; it now automatically generates full audio accompaniment for them. This includes sound effects, ambient noises, music, and even synchronized character speech. For instance, you can create a dialogue scene where the AI will “voice” the characters’ lines, assign them voices, and sync lip movements.
  2. Enhanced Detail and Scenario Understanding. The model has become better at “understanding” complex user prompts and more accurately reproducing the textures and physics of the real world. This allows for the creation of more “lifelike” and cinematic clips that adhere to a specified style and narrative.
  3. Vertical Video Support. A key new feature for content creators is the official support for the vertical 9:16 format, simplifying the process of making videos for social networks like Instagram Reels and TikTok.

New Creative Features

Beyond quality improvements, Veo 3.1 introduces new creative tools:

  • Video Extension (Extend): This feature allows users to “extend” a video by generating a continuation of the scene for up to 8 seconds. By repeating the process, you can assemble a seamless clip up to a minute long, with continuous sound.
  • Object Insertion and Removal (Insert/Remove): These experimental functions enable precise editing of generated video—adding new characters or removing unwanted objects. The model automatically adjusts shadows and lighting for believability.
  • Multilingual Voice Synthesis: Veo 3.1 can synthesize speech in different languages.

Limitations Remain

Despite the breakthrough in quality, the maximum length of a single generated segment is unchanged: 8 seconds at 720p resolution. Long videos cannot yet be created from a single prompt, so users must apply the Extend feature repeatedly to reach minute-long content.
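
To put the 8-second limit in perspective, here is a rough back-of-the-envelope sketch of how many Extend passes a clip of a given length would take. It assumes that the initial generation and every Extend pass each contribute a full 8-second segment, which is a simplification; the function name `extensions_needed` is hypothetical and not part of any Google tool or API.

```python
import math

# Maximum length of one generated segment, per the figures quoted above.
SEGMENT_SECONDS = 8


def extensions_needed(target_seconds: float,
                      segment_seconds: float = SEGMENT_SECONDS) -> int:
    """Estimate how many Extend passes follow the initial clip.

    Assumption: the first generation and each Extend pass all add a
    full segment of `segment_seconds`. Real passes may be shorter.
    """
    if target_seconds <= 0 or segment_seconds <= 0:
        raise ValueError("durations must be positive")
    # Total segments needed to cover the target length.
    total_segments = math.ceil(target_seconds / segment_seconds)
    # The first segment comes from the original prompt, not from Extend.
    return max(total_segments - 1, 0)


if __name__ == "__main__":
    print(extensions_needed(60))  # 7 passes after the first 8-second clip
```

Under these assumptions, a one-minute clip means one initial generation plus seven Extend passes (8 segments, 64 seconds of footage). Since each pass produces "up to" 8 seconds, the real number may be higher.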

Experts note that Veo 3.1 marks a significant leap in generative video: by integrating advanced audio and scene composition capabilities into a single model for the first time, Google has raised competition in the AI video market to a new level.
