Today, Google launched the preview of a new state-of-the-art robotics embodied reasoning model, Gemini Robotics-ER 1.5. If you’ve ever used Gemini Live with a camera, which allows the model to see what you see, imagine giving it a physical body. That’s how I understand the potential of this model.
Its capabilities range from pointing at and finding objects to planning trajectories and orchestrating long-horizon tasks.
While similar to other Gemini models, Gemini Robotics-ER 1.5 is purpose-built to enhance robotic perception and real-world interaction. It provides advanced reasoning to solve physical problems by interpreting complex visual data, performing spatial reasoning, and planning actions from natural language commands.
Google API Docs
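To get a feel for the pointing capability before strapping it to a drone, here's a rough sketch of how a single frame could be sent to the model using the google-genai Python SDK. The model ID and the exact output format are my assumptions based on the announcement; I haven't verified them against the preview yet.

# Sketch only: the model ID and the JSON output convention are assumptions.
from google import genai
from PIL import Image

client = genai.Client(api_key="YOUR_API_KEY")

# A still frame, e.g. grabbed from the drone's camera feed
frame = Image.open("living_room.jpg")

prompt = (
    "Point to every person and doorway in this image. "
    "Return a JSON list of objects, each with a 'label' and a normalized "
    "'point' as [y, x] in the range 0-1000."
)

response = client.models.generate_content(
    model="gemini-robotics-er-1.5-preview",  # assumed preview model ID
    contents=[frame, prompt],
)

# Expected shape: [{"label": "person", "point": [412, 507]}, ...]
print(response.text)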
Project Goals
Since I’m not planning to buy additional hardware, I’ll be using a couple of RoboMaster TT drones I already have to see if I can get them to navigate an environment autonomously to carry out instructions. An example would be to “go downstairs and tell me if you see a person”. This is something I’ve been thinking about for quite a while, and this new model seems like the perfect way to pull it off.
A stretch goal is to have each drone act as an agent performing coordinated tasks. Those tasks won’t be anything impressive and have yet to be determined, but the main goal is to learn and explore the possibilities. One example might be coordinating the two drones to check for a person downstairs, passing information between them using their dot matrix screens.
I’d also like to include the ability to talk to them to give them tasks to carry out, even if it’s through my laptop. There’s an ESP-32 attached to each, so I could try to add a mic and speaker, but that’s added weight and complexity for initial experimentation.
Drone API
The Google example for calling a custom robot API seems to be the best fit for my use case. It tells a robotic arm to put a blue block in an orange bowl.
prompt = f"""
You are a robotic arm with six degrees-of-freedom. You have the
following functions available to you:
def move(x, y, high):
  # moves the arm to the given coordinates. The boolean value 'high' set
  # to True means the robot arm should be lifted above the scene for
  # avoiding obstacles during motion. 'high' set to False means the robot
  # arm should have the gripper placed on the surface for interacting with
  # objects.

def setGripperState(opened):
  # Opens the gripper if opened set to true, otherwise closes the gripper

def returnToOrigin():
  # Returns the robot to an initial state. Should be called as a cleanup
  # operation.
The origin point for calculating the moves is at normalized point
y={robot_origin_y}, x={robot_origin_x}. Use this as the new (0,0) for
calculating moves, allowing x and y to be negative.
Perform a pick and place operation where you pick up the blue block at
normalized coordinates ({block_x}, {block_y}) (relative coordinates:
{block_relative_x}, {block_relative_y}) and place it into the orange
bowl at normalized coordinates ({bowl_x}, {bowl_y})
(relative coordinates: {bowl_relative_x}, {bowl_relative_y}).
Provide the sequence of function calls as a JSON list of objects, where
each object has a "function" key (the function name) and an "args" key
(a list of arguments for the function).
Also, include your reasoning before the JSON output.
For example:
Reasoning: To pick up the block, I will first move the arm to a high
position above the block, open the gripper, move down to the block,
close the gripper, lift the arm, move to a high position above the bowl,
move down to the bowl, open the gripper, and then lift the arm back to
a high position.
"""
Here’s an example of the drone’s API functions that could be called instead to allow the model to explore its environment:
# take off
flight.takeoff().wait_for_completed()
# fly up 100 cm (1 m)
flight.up(distance=100).wait_for_completed()
# fly forward 200 cm
flight.forward(distance=200).wait_for_completed()
# rotate clockwise 90°
flight.rotate(angle=90).wait_for_completed()
# land
flight.land().wait_for_completed()
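Wiring the two together, the idea is to describe these flight functions in the prompt the same way the arm example does, then dispatch whatever JSON the model returns onto the RoboMaster SDK. A rough sketch, assuming the model sticks to the same "function"/"args" format and using the extract_calls helper from above:

# Sketch: dispatches the model's JSON function calls onto the RoboMaster TT flight API.
from robomaster import robot

drone = robot.Drone()
drone.initialize()
flight = drone.flight

# Whitelist of actions the model is allowed to call, mirroring what the prompt describes.
ACTIONS = {
    "takeoff": lambda: flight.takeoff(),
    "up": lambda distance: flight.up(distance=distance),
    "forward": lambda distance: flight.forward(distance=distance),
    "rotate": lambda angle: flight.rotate(angle=angle),
    "land": lambda: flight.land(),
}

def execute(calls):
    # Run each {"function": ..., "args": [...]} object from the model in order,
    # blocking until the drone finishes each maneuver.
    for call in calls:
        action = ACTIONS[call["function"]]
        action(*call.get("args", [])).wait_for_completed()

# e.g. execute(extract_calls(response.text))
drone.close()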
I’ll be sharing how it goes and will add links to articles related to this project here.
Until then, you can learn all about the model in the Google Developers blog article that announced the release.