"Next-Gen Cinema, AI-Powered in Beverly Hills
Smart Films, Brighter Futures.

Blog

Gemini

Gemini Robotics: How Google DeepMind is Teaching AI to Think and Act in the Real World

The team at Google DeepMind has made breakthroughs in creating multimodal AI that understands text, images, audio, and video. Until now, however, these capabilities have been confined to the digital realm. For artificial intelligence to become truly useful, it must learn "embodied" reasoning: the human-like ability to understand the physical world and act safely within it. Google DeepMind's answer to this challenge is two new models built on Gemini 2.0, which the company presents as the start of a new generation of robots.

Introducing Gemini Robotics and Gemini Robotics-ER

Google DeepMind has introduced not one, but two key developments:

  1. Gemini Robotics is an advanced Vision-Language-Action (VLA) model. Essentially, it adds the capability for direct physical control of robots to the powerful intelligence of Gemini 2.0. The model perceives the environment, understands commands in natural language, and immediately generates actions for robotic arms.
  2. Gemini Robotics-ER (ER stands for Embodied Reasoning) is a model focused on deep spatial understanding. It does not control robots directly but serves as a “brain” for roboticists, allowing them to connect it to their own control systems to generate complex commands and code.
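To make the "brain for roboticists" idea concrete, here is a minimal sketch of how a developer might wire an embodied-reasoning model into their own control stack. Everything below is illustrative: the function names, the scene representation, and the fixed pick-and-place plan are invented for this example and are not Google's actual API; in a real system, the plan would come from the model itself.

```python
from dataclasses import dataclass

@dataclass
class SceneObject:
    name: str
    position: tuple  # (x, y, z) in metres, robot base frame

def plan_pick_and_place(target_name: str, destination_name: str, scene: list):
    """Illustrative planner: turn a pick-and-place request into ordered
    low-level steps for a downstream controller. In a real integration,
    an embodied-reasoning model would produce this plan; here it is mocked."""
    target = next(o for o in scene if o.name == target_name)
    dest = next(o for o in scene if o.name == destination_name)
    return [
        ("move_above", target.position),
        ("grasp", target.position),
        ("move_above", dest.position),
        ("release", dest.position),
    ]

scene = [SceneObject("banana", (0.4, 0.1, 0.02)),
         SceneObject("bowl", (0.2, -0.3, 0.05))]
plan = plan_pick_and_place("banana", "bowl", scene)
```

The point of the split architecture is exactly this seam: the model reasons about *what* to do, and the roboticist's own controllers decide *how* each step is executed on their hardware.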

The Three Pillars of a Useful Robot: Generality, Interactivity, Dexterity

The developers highlighted three key qualities needed for helper robots. Gemini Robotics shows significant progress in all areas.

1. Generality
The model draws on Gemini's broad world knowledge to solve tasks it was never specifically trained on: it handles new objects, varied instructions, and unfamiliar environments. According to the accompanying technical report, Gemini Robotics more than doubles the performance of comparable state-of-the-art models on generalization benchmarks.

2. Interactivity
A helper robot must respond flexibly to change. Gemini Robotics understands commands given in conversational language and adapts its actions in real time: if an object slips from the gripper or a person moves the target, the model quickly re-plans and continues. This "controllability" is critical for operating in dynamic environments, from kitchens to warehouses.
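The behavior described above is, at its core, closed-loop control: instead of executing a plan blindly, the system re-reads the world every cycle and corrects course. The sketch below shows that principle in a toy 2-D setting, with all names and numbers invented for illustration; a target moved mid-task is still reached.

```python
def track_target(gripper, get_target_pos, step=0.1, tol=0.05, max_iters=200):
    """Closed-loop control sketch: re-read the target position each cycle
    and move the gripper a small step toward it. Because sensing happens
    every iteration, a target that moves mid-task is still reached."""
    for _ in range(max_iters):
        tx, ty = get_target_pos()
        dx, dy = tx - gripper[0], ty - gripper[1]
        dist = (dx * dx + dy * dy) ** 0.5
        if dist < tol:
            return gripper, True
        scale = min(step, dist) / dist
        gripper = (gripper[0] + dx * scale, gripper[1] + dy * scale)
    return gripper, False

# The target sits at (1, 0) for the first five readings, then a "person"
# moves it to (0, 1); the loop converges to the new location anyway.
positions = iter([(1.0, 0.0)] * 5 + [(0.0, 1.0)] * 195)
final, reached = track_target((0.0, 0.0), lambda: next(positions))
```

An open-loop version of the same routine, which read the target once and committed, would miss: the difference is the per-cycle sensing, not a smarter plan.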

3. Dexterity
Many actions simple for humans but requiring fine motor skills remain a major challenge for robots. Gemini Robotics handles multi-step processes like folding origami or carefully packing groceries into a bag, demonstrating an unprecedented level of dexterity.

One Model for Different Robots

Robots come in many forms, and the model was built with that in mind. Although it was trained primarily on data from the two-armed ALOHA 2 platform, it has already been adapted to the Franka arms that are common in research labs. Work is also underway to apply the model to Apptronik's Apollo humanoid robot, signaling an ambition to create a universal "brain" for robotics.

Safety and Responsible Development

Integrating AI into the physical world requires increased attention to safety. Google DeepMind’s approach includes multiple layers:

  • Physical Safety: Gemini Robotics-ER integrates with classic low-level controllers that ensure collision avoidance and force control.
  • Semantic Safety: The company has developed a framework for creating “constitutions” for robots—sets of rules in natural language that help the AI choose safer actions. To assist researchers, a special ASIMOV dataset has been released for safety assessment.
  • Expert Review: The work is carried out in collaboration with internal responsibility boards and external experts.
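The "constitution" idea can be sketched as a filter that screens proposed actions against natural-language rules before execution. The snippet below is a deliberately crude stand-in: the rules, keywords, and function names are invented for illustration, and a real semantic-safety layer would have the language model itself judge each action against the rules rather than match keywords.

```python
# Illustrative natural-language rules, in the spirit of a robot "constitution".
CONSTITUTION = [
    "Do not place objects near the edge of a surface.",
    "Do not hand sharp objects blade-first to a person.",
    "Do not exceed safe force limits when interacting with people.",
]

def screen_action(action_description, risk_keywords=("edge", "blade-first")):
    """Toy semantic-safety filter: flag a proposed action whose description
    matches a risk pattern drawn from the constitution. Keyword matching is
    only a placeholder for the model's own judgment."""
    violations = [kw for kw in risk_keywords
                  if kw in action_description.lower()]
    return len(violations) == 0, violations

ok, why = screen_action("Place the cup near the edge of the table")
safe_ok, _ = screen_action("Hand the spoon to the person handle-first")
```

The key design point is that the rules live in plain language, so they can be reviewed, audited, and extended without retraining the model.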

Partners and the Future

Google DeepMind is actively collaborating with leaders in the robotics industry. The partnership with Apptronik aims to create next-generation humanoid robots, and the Gemini Robotics-ER model is already available for testing by companies such as Boston Dynamics, Agility Robotics, and others.

The announcement of Gemini Robotics is a significant step towards creating truly useful and versatile helper robots. By combining the power of large language models with an understanding of the physical world, Google DeepMind is laying the foundation for a future where AI can not only think but also act safely alongside us in reality.
