NVIDIA’s Robots Dream of Trajectories, Not Electric Sheep, with GR00T-Dreams



NVIDIA has released a blueprint, Isaac GR00T-Dreams, which it says delivers better and faster training for artificially intelligent autonomous robots by using generative AI world foundation models to create synthetic training data, dubbed “dreams.”

“The Isaac GR00T-Dreams blueprint is a reference workflow for generating vast amounts of synthetic trajectory data,” NVIDIA’s Oyindamola Omotuyi, Spencer Huang, Kalyan Meher Vadrevu, and Dennis Lynch explain in a joint announcement. “This data is used for teaching humanoid robots to perform new actions in novel environments. The blueprint enables robots to generalize across behaviors and adapt to new environments with minimal human demonstration data. As a result, a small team of human demonstrators can create the same amount of training data it would otherwise take thousands of people to produce.”

Machine learning and artificial intelligence systems rely on vast quantities of training data in order to respond correctly during the inference stage of operation. Without plenty of diverse data, a robot might only recognize blue cups and not red, for example — or, more seriously, an autonomous vehicle might only recognize pedestrians and not cyclists.

Gathering and labelling the data is hard work, which is where Isaac GR00T-Dreams comes in: a blueprint that, NVIDIA says, can supplement real-world data with synthetic data from a generative AI world foundation model. Using the Cosmos Predict-2 physical AI model combined with the Cosmos-Reason1 multimodal model, the blueprint is claimed to be capable of turning real robot data into synthetic trajectories — effectively “dreaming” new scenarios not present in the training data.

“First, developers collect a limited set of human-teleoperated trajectories for a humanoid robot performing a single task, such as pick-and-place, in a single environment,” NVIDIA’s staff explain. “Next, developers prompt the fine-tuned Cosmos model with an initial image and new text-based instructions for the generated robot to perform. This prompts the generative model to create a vast number of diverse and novel task scenarios or future world states (also called dreams) such as opening, closing, arranging objects, cleaning and sorting. These scenarios are created in the form of 2D videos.”
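NVIDIA describes this generation step at a high level rather than as a public API, but the fan-out from a handful of teleoperated seeds to many prompted scenarios can be sketched in Python. In the sketch below, generate_dream_video() is a hypothetical stand-in for prompting the fine-tuned Cosmos model, and the seed images and text instructions are purely illustrative:

```python
# Illustrative sketch only: generate_dream_video() stands in for whatever
# interface the blueprint exposes for prompting the fine-tuned world model.
from dataclasses import dataclass
from typing import List

@dataclass
class Dream:
    instruction: str     # text prompt describing the new task to perform
    frames: List[bytes]  # generated 2D video frames (placeholder type)

def generate_dream_video(initial_image: bytes, instruction: str) -> Dream:
    """Hypothetical call: prompt the model with an initial image and a new instruction."""
    return Dream(instruction=instruction, frames=[initial_image])  # placeholder output

# A small set of human-teleoperated episodes supplies the seed images;
# the text prompts fan each seed out into many novel task scenarios.
seed_images = [b"frame_from_teleop_episode_0", b"frame_from_teleop_episode_1"]
instructions = ["open the drawer", "close the drawer", "sort the blocks by color"]

dreams = [
    generate_dream_video(image, instruction)
    for image in seed_images
    for instruction in instructions
]
print(f"Generated {len(dreams)} candidate dreams")  # 2 seeds x 3 prompts = 6
```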

The company likens the generated data to “dreams,” and uses a separate model to weed out “bad dreams” from the training dataset. (📹: NVIDIA)

“Once a large number of dreams are generated,” the team continues, “the Cosmos Reason model can be used to evaluate the quality and success of each dream. It filters out ‘bad’ dreams, which depict unsuccessful or flawed task attempts, ensuring only the highest-quality and most relevant scenarios are selected for the next stage.
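Again as a rough sketch rather than the blueprint's actual interface, the filtering step amounts to scoring each generated video and keeping only those above a success threshold. Here score_dream() and SUCCESS_THRESHOLD are assumptions standing in for the Cosmos Reason evaluation:

```python
# Illustrative sketch only: score_dream() stands in for asking a vision-language
# reasoning model whether a generated video shows the instructed task completed.
from typing import NamedTuple, List

class Dream(NamedTuple):
    instruction: str
    frames: List[bytes]

def score_dream(dream: Dream) -> float:
    """Hypothetical success score in [0, 1] from the reasoning model."""
    return 0.9 if dream.frames else 0.1  # placeholder heuristic

SUCCESS_THRESHOLD = 0.8  # assumed cutoff; the blueprint's real criterion may differ

candidates = [
    Dream("open the drawer", [b"frame0", b"frame1"]),
    Dream("sort the blocks", []),  # a "bad dream": an empty or failed rollout
]
good_dreams = [d for d in candidates if score_dream(d) >= SUCCESS_THRESHOLD]
print(f"Kept {len(good_dreams)} of {len(candidates)} dreams after filtering")
```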

“The selected dreams, which are initially just pixels in a 2D video, are then processed using an Inverse Dynamics Model (IDM), a generative AI model for action labeling, to generate 3D action trajectories. Finally, these neural trajectories are used as a large-scale synthetic dataset to train visuomotor policies either by co-training alongside real-world data to enhance performance, or through solely training on them to enable generalization to novel behaviors and unseen environments.”
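The final two steps, action labeling and policy training, can likewise be sketched. In the snippet below, inverse_dynamics() is a hypothetical placeholder for the IDM, the action format is assumed to be a simple tuple of joint deltas, and the co-training step is reduced to concatenating synthetic and real (observation, action) pairs:

```python
# Illustrative sketch only: inverse_dynamics() is a hypothetical stand-in for the
# Inverse Dynamics Model (IDM) that infers the action linking consecutive frames.
from typing import List, Tuple

Frame = bytes
Action = Tuple[float, ...]  # e.g. joint-space deltas; the exact format is an assumption

def inverse_dynamics(prev_frame: Frame, next_frame: Frame) -> Action:
    """Hypothetical IDM call: predict the action that transforms prev into next."""
    return (0.0, 0.0, 0.0)  # placeholder output

def label_trajectory(frames: List[Frame]) -> List[Tuple[Frame, Action]]:
    """Turn a filtered 2D dream video into (observation, action) training pairs."""
    return [
        (frames[i], inverse_dynamics(frames[i], frames[i + 1]))
        for i in range(len(frames) - 1)
    ]

dream_frames = [b"f0", b"f1", b"f2"]
neural_trajectory = label_trajectory(dream_frames)

# Co-training: mix the synthetic trajectories with real teleoperated data before
# fitting the visuomotor policy (the mixing here is purely illustrative).
real_data = [(b"real_obs", (0.1, 0.0, -0.2))]
training_set = real_data + neural_trajectory
print(f"Training set size: {len(training_set)}")
```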

More information is available on the NVIDIA technical blog; a precursor to GR00T-Dreams, dubbed DreamGen, is published on GitHub under the permissive Apache 2.0 license, with a supporting preprint paper on Cornell’s arXiv server.
