Unlocking the potential of Embodied AI has become a hot topic in recent years, and researchers have been making remarkable strides in this field. In this article, we delve into six groundbreaking research studies that have pushed the boundaries of Embodied AI.
From enhancing human-robot interaction to revolutionizing autonomous systems, these studies shed light on the latest developments in the realm of Embodied AI. Join us as we explore the cutting-edge advancements that are paving the way for a future where intelligent machines seamlessly integrate with the physical world.
Transforming Robot Decision Making with Spatial Language Attention
This paper explores how to train robots to make decisions using Transformers, powerful models for processing language and large amounts of data. The authors propose a new method called Spatial Language Attention Policies (SLAP) that uses three-dimensional tokens to represent spatial information and trains a language-conditioned action-prediction policy.
The goal of SLAP is to enable robots to quickly adapt to new environments, handle changes in object appearance, and remain robust to irrelevant clutter. The approach uses attention-driven robot policies as a skill representation to predict goal poses and execute goal-driven motion.
The paper shows that SLAP outperforms prior work in terms of success rates on various tasks, including those with unseen distractors and configurations. This work has important implications for the development of robots that can operate in diverse human environments and generalize well to new situations.
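As a rough illustration of the attention-over-spatial-tokens idea (a toy sketch with assumed shapes and names, not the authors' SLAP implementation), a language embedding can query per-point features and produce an attention-weighted goal position:

```python
import numpy as np

# Hypothetical sketch: each 3D point becomes a token; a language
# embedding queries the tokens to attend to task-relevant regions
# and predict a goal position. Dimensions are assumptions.

def predict_goal_position(point_tokens, point_xyz, lang_embedding):
    """Attend over per-point features with a language query, then
    return the attention-weighted 3D position as a goal estimate."""
    scores = point_tokens @ lang_embedding   # (N,) similarity scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                 # softmax attention weights
    return weights @ point_xyz               # weighted-average position

rng = np.random.default_rng(0)
tokens = rng.normal(size=(100, 16))    # per-point features (assumed dim 16)
xyz = rng.uniform(-1, 1, size=(100, 3))
lang = rng.normal(size=16)             # embedding of, say, "pick up the mug"
goal = predict_goal_position(tokens, xyz, lang)
print(goal.shape)  # (3,)
```

The real SLAP operates on richer point-cloud features, but the key property survives even in this sketch: the same weights mechanism re-focuses on different spatial regions as the language input changes.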
Teaching Robots How to Reason
This paper proposes a method for training an AI agent that can use language reasoning and actions together. The researchers augment the agent with word outputs so it can generate textual captions interleaved with actions, combining language-based reasoning with decision making in reinforcement learning. They train an autoregressive transformer to predict both actions and text captions in a unified way. At test time, the model predicts actions towards the goal as well as text tokens to reason. The researchers experiment with BabyAI, a grid-world platform for studying language-grounded tasks, and find that the method consistently outperforms a caption-free baseline on the most challenging BabyAI task.
In simpler terms, the paper proposes a way to teach robots to use language and actions together. They do this by creating a policy that can switch between language reasoning and actions. This way, the robot can use captions in its training data to reason while taking actions. They test the method on BabyAI and find that it works better than methods that don’t use captions.
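The interleaved decoding loop described above might look roughly like this (a toy sketch; the token names and the scripted stand-in policy are assumptions for illustration, not the paper's code):

```python
# A single autoregressive policy emits both text tokens (reasoning)
# and action tokens from one vocabulary, interleaving captions with
# actions. Token names here are made up.

ACTIONS = {"<left>", "<right>", "<forward>", "<pickup>"}

def rollout(policy, obs, max_steps=20):
    """Decode tokens one at a time; text tokens accumulate as reasoning,
    action tokens get executed, and <done> ends the episode."""
    history, executed = [obs], []
    for _ in range(max_steps):
        token = policy(history)        # next token: word or action
        history.append(token)
        if token in ACTIONS:
            executed.append(token)     # only action tokens drive the env
        if token == "<done>":
            break
    return executed, history

# A scripted stand-in policy that "reasons" in words, then acts.
script = iter(["go", "to", "the", "red", "door",
               "<forward>", "<pickup>", "<done>"])
actions, trace = rollout(lambda h: next(script), "obs0")
print(actions)  # ['<forward>', '<pickup>']
```

The point of the unified vocabulary is that the same model weights produce both streams, so the reasoning text can condition the subsequent action choices.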
Transforming Human-Robot Interaction with ChatGPT Language Model
This paper comes from Microsoft. It discusses how to improve natural human-robot interaction using OpenAI’s ChatGPT language model. The goal is to make it easier for people to interact with robots without requiring complex programming languages or knowledge of robotics.
Currently, engineers need to translate a task’s requirements into code for the system, which can be slow, expensive, and inefficient. ChatGPT would allow a non-technical user to monitor and provide high-level feedback to the large language model (LLM) while controlling different robots.
The paper outlines a set of design principles for creating prompts that let ChatGPT solve robotics tasks by defining high-level robot APIs or function libraries that map to existing robot control stacks or perception libraries. The user can evaluate ChatGPT’s output and provide feedback if needed.
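The prompt-design pattern of exposing a small, named robot API can be sketched as follows (the function names are illustrative assumptions, not the paper's actual API):

```python
# Expose a constrained function library in the prompt so the LLM
# writes code only against those functions; a human reviews the
# generated code before it runs on hardware.

ROBOT_API = """You may only call these functions:
  move_to(x, y, z)       # move the end effector to a position (meters)
  grasp()                # close the gripper
  release()              # open the gripper
  get_object_pose(name)  # returns (x, y, z) of a named object
"""

def build_prompt(task: str) -> str:
    """Combine the API description with a high-level task request."""
    return f"{ROBOT_API}\nTask: {task}\nWrite Python using only the API above."

prompt = build_prompt("stack the red block on the blue block")
print("move_to" in prompt)  # True
```

Constraining the model to a named API is what lets its output map onto an existing control stack; the user then evaluates the generated code and gives corrective feedback in plain language.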
Some examples of ChatGPT in action include zero-shot task planning, where it controlled a drone through intuitive language-based commands and performed complex tasks such as taking a selfie; manipulation scenarios, where it composed APIs into more complex high-level functions, like stacking blocks, through conversational feedback; and perception-action loops, where it explored an environment until it found a user-specified object.
Overall, the paper shows that with some guidance, ChatGPT has the potential to revolutionize robotics systems by enabling more natural human-robot interaction and keeping non-technical users in the control loop.
How Language Models are Making Robots More Helpful
This paper is similar to the previous one but comes from Google.
The challenge of programming helpful robots that can interact with humans is that robots need to understand the way we communicate and reason through human tasks. Google Research and Everyday Robots are collaborating on PaLM-SayCan, an effort that uses Pathways Language Model (PaLM) in a robot learning model running on an Everyday Robots helper robot. This joint research is the first implementation that uses a large-scale language model to plan for a real robot. PaLM-SayCan makes it possible for people to communicate with helper robots via text or speech, while also enhancing the robot’s overall performance and ability to execute more complex and abstract tasks by tapping into the world knowledge encoded in the language model.
PaLM-SayCan enables the robot to understand and interpret commands more naturally through language. The model learns via chain of thought prompting by processing more complex, open-ended prompts and responding to them in ways that are reasonable and sensible. A language model may suggest something that appears reasonable, but may not be safe or realistic in a given setting. By fusing language from PaLM with robotic knowledge, we can improve the overall performance of a robot system in real-world environments.
For example, if you ask PaLM-SayCan to “Bring me a snack and something to wash it down with”, it uses chain-of-thought prompting to recognize that “wash it down” means bringing a drink, while a bag of chips may make an adequate snack. The combined system cross-references the two models: PaLM’s language-based suggestions are weighed against what is actually feasible given the robot’s skills in the real-world environment.
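That cross-referencing step can be sketched as multiplying the language model's usefulness score for each candidate skill by a feasibility (affordance) score from the robot's value functions. This is a minimal sketch with made-up numbers and helper names:

```python
# SayCan-style skill selection: the LLM scores how useful each skill
# is for the instruction, a value function scores how feasible it is
# in the current state, and the robot executes the highest product.
# All scores below are illustrative, not from the paper.

def select_skill(llm_scores, affordances):
    """Pick the argmax of usefulness * feasibility over candidate skills."""
    combined = {s: llm_scores[s] * affordances[s] for s in llm_scores}
    return max(combined, key=combined.get), combined

llm_scores = {"find chips": 0.6, "find apple": 0.3, "find drink": 0.1}
affordances = {"find chips": 0.9, "find apple": 0.2, "find drink": 0.8}
skill, combined = select_skill(llm_scores, affordances)
print(skill)  # find chips  (0.54 beats 0.06 and 0.08)
```

The product is what keeps the system grounded: a skill the LLM loves but the robot cannot currently perform scores near zero, and a feasible but irrelevant skill is suppressed by the language side.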
The researchers conducted the experiments responsibly, following Google’s AI principles: physical controls, safety protocols, and multiple levels of safety measures such as algorithmic protections, risk assessments, and emergency stops mitigate risky scenarios and keep interactions between humans and robots safe. Although these are baby steps towards human-centered robots that comprehend spoken language as humans do and solve both the mechanical and intelligence challenges of robotics, these promising research outcomes show enormous potential.
Teaching Robots to Rearrange Cluttered Objects without Explicit Models
This research paper addresses the challenge of training robots to rearrange objects in cluttered environments without explicit object models. The researchers generated over 650K cluttered scenes in diverse everyday environments, such as cabinets and shelves. They used these scenes to train CabiNet, a collision model that takes object and scene point clouds as input and predicts collisions for SE(3) object poses in the scene. CabiNet allows the robot to navigate tight spaces during rearrangement, improving performance by nearly 35% over baselines.
The researchers extended prior work on neural collision checking to scale to multiple cluttered environments, reaching an inference speed of around 7µs per query (30 times faster than prior work) and training on nearly 60 billion collision queries. They learned a scene SDF-based waypoint sampler from this dataset and used it with a Model Predictive Path Integral (MPPI) planner to generate collision-free trajectories for picking and placing objects in clutter.
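As a runnable stand-in for the collision-query interface described above (the real CabiNet is a learned network over point clouds; here a simple sphere SDF is assumed so the example executes):

```python
import numpy as np

# Toy collision query: a signed distance function (SDF) for a
# spherical obstacle stands in for the learned scene model. A planner
# such as MPPI would issue many of these queries per candidate pose.

def scene_sdf(points, obstacle_center, obstacle_radius):
    """Signed distance of query points to a spherical obstacle
    (negative inside, positive outside)."""
    return np.linalg.norm(points - obstacle_center, axis=-1) - obstacle_radius

def in_collision(object_points, pose_translation, center, radius, margin=0.0):
    """Translate the object's points by a candidate pose and check
    whether any point falls within the margin of the obstacle."""
    moved = object_points + pose_translation
    return bool(np.any(scene_sdf(moved, center, radius) < margin))

obj = np.array([[0.0, 0.0, 0.0], [0.05, 0.0, 0.0]])  # two object points
center, radius = np.array([1.0, 0.0, 0.0]), 0.1
print(in_collision(obj, np.array([0.0, 0.0, 0.0]), center, radius))  # False
print(in_collision(obj, np.array([1.0, 0.0, 0.0]), center, radius))  # True
```

In CabiNet the analytic SDF is replaced by a network conditioned on scene and object point clouds, which is what makes the ~7µs-per-query speed matter: sampling-based planners evaluate thousands of candidate poses per step.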
The research showed that their approach transfers directly to real-world scenarios despite being trained exclusively in simulation. Robot integration, which includes environment modeling, can be expensive; this research aims to learn a single representation that requires minimal engineering effort for each new scene type.
CabiNet paper: https://arxiv.org/abs/2304.09302
Teaching Miniature Robots Soccer
This research paper looks into whether Deep Reinforcement Learning (Deep RL) can help a miniature humanoid robot learn complex and safe movements in dynamic environments. The researchers used Deep RL to train the robot to play a simplified 1v1 soccer game. They trained the robot’s skills separately and then combined them in a self-play setting. The policy that resulted showed many robust and dynamic movement skills, such as quick fall recovery, walking, turning, and kicking. The robots showed not only individual skills but also transitions between them that were smooth, stable, and efficient.
Furthermore, the agents also learned a basic strategic understanding of the game, anticipating ball movements and positioning themselves to block opponent shots. These behaviors emerged from a simple reward configuration during training and are not what we would have intuitively expected from a robot.
Training happened entirely in simulation, and the resulting policies were tested on real robots without any modification to find out whether transfer is possible. The researchers found that a sufficiently high-frequency control loop, combined with targeted dynamics randomization and perturbations during simulated training, allows for good-quality transfer even with unmodeled effects and variations across robot instances.
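Per-episode dynamics randomization of the kind described above might be sketched as follows (the parameter names and ranges are assumptions for illustration, not the paper's values):

```python
import random

# Each training episode samples perturbed physics so the policy
# cannot overfit to one exact simulator, which is what makes
# zero-shot sim-to-real transfer plausible.

def sample_dynamics(rng):
    """Draw one episode's randomized dynamics parameters."""
    return {
        "mass_scale": rng.uniform(0.8, 1.2),       # +/-20% link masses
        "friction": rng.uniform(0.5, 1.5),         # ground friction coeff
        "motor_delay_ms": rng.uniform(0.0, 20.0),  # actuation latency
        "push_force_n": rng.uniform(0.0, 5.0),     # random external shoves
    }

rng = random.Random(42)
episode_params = [sample_dynamics(rng) for _ in range(3)]
for p in episode_params:
    assert 0.8 <= p["mass_scale"] <= 1.2
print(len(episode_params))  # 3
```

Because the policy never sees the same dynamics twice, it is pushed towards behaviors that work across the whole parameter range, including the real robot's unknown point in that range.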
Overall, this research shows that Deep RL can synthesize adaptive movements that are safe to execute on real robots, even low-cost ones. It advances learning-based approaches towards general embodied intelligence that can act in physical environments with agility, dexterity, and understanding, a long-standing goal of AI researchers and roboticists alike.
Robot reinforcement training: https://arxiv.org/abs/2304.13653