Google: With Veo 3 you can turn your photos into mini videos (audio included)

With Google's Veo 3, the AI model launched by Google DeepMind, a single photograph is enough to generate an eight-second video, complete with audio.

You can take a photo and turn it into an animated mini movie, complete with realistic sounds and movement. Veo 3, the AI model launched by Google DeepMind in May, is now integrated directly into the Gemini app: a single image is enough to generate an eight-second video, complete with audio. The feature is designed to bring memories, illustrations or even sketches to life, exploiting the power of automatic video generation. It is already available to subscribers to the Pro and Ultra plans in over 150 countries around the world. Here’s how it works and why it could revolutionize digital creativity.

Animated images. The process is simple: you go to Gemini.google.com (on desktop), choose the “video” function, upload a photo and write a brief description of what you want to get, including sound details. After a few moments, the image comes to life as a short video clip. It can be a natural scene in motion, a drawing that comes alive or a surreal reconstruction: the user’s imagination decides.

The novelty compared to other generators with similar functions is the built-in inclusion of audio: ambient noises, soundtrack, even synthetic dialogue, all produced automatically. This is a leap forward compared to competing solutions such as OpenAI's Sora or Runway Gen-2, which instead require audio to be added later with dedicated tools.

Enhanced creativity. Less than two months after launch, over 40 million videos have already been created with Veo 3, across the Gemini app and Flow (a Google tool designed to help videomakers and creators make short videos with AI). Some users have used it to reinvent classic fairy tales in a modern key, others to create ASMR visual and sound experiences (videos and sounds that stimulate feelings of relaxation, such as the sound of lava cooling or the rustle of wind in the leaves).

Google has not specified the precise technical limits of the model, but according to several experts the current resolution (720p) is only the beginning: future updates could move to full HD or 4K. Meanwhile, what is striking is the consistency between the original image, the animation and the audio, which makes the generated content extremely credible, at least at first glance.

What about security? But there is a downside: when it comes to content generated by artificial intelligence, safety is a central theme.

Google said it has adopted a dual marking system: all videos produced with Veo 3 include a visible watermark with the word “Veo” and a second, invisible one, called SynthID, to ensure the traceability of the content. This countermeasure is designed to prevent misuse and to make generated videos distinguishable from real ones.

In addition, each update of the model is tested with “red team” sessions, a method that, put simply, involves simulating cyberattacks to anticipate bugs and critical issues. Users can also contribute with a thumbs up or down, to rate the quality of the videos and help Google further improve the experience.

About the author

Dr. Kyle Muller

Dr. Kyle Mueller is a Research Analyst at the Harris County Juvenile Probation Department in Houston, Texas. He earned his Ph.D. in Criminal Justice from Texas State University in 2019, where his dissertation was supervised by Dr. Scott Bowman. Dr. Mueller's research focuses on juvenile justice policies and evidence-based interventions aimed at reducing recidivism among youth offenders. His work has been instrumental in shaping data-driven strategies within the juvenile justice system, emphasizing rehabilitation and community engagement.