A New Microsoft AI Research Shows How ChatGPT Can Convert Natural Language Instructions Into Executable Robot Actions – Codelivly

Recent advances in natural language processing have made possible large language models (LLMs) that can understand and produce human-like language. Having learned from large amounts of data, some LLMs can be adapted to specific tasks in a few-shot manner through conversation. ChatGPT is a good example of such an LLM. Robotics is one fascinating area of application, where ChatGPT can translate natural language commands into executable code for controlling robots. Generating robot programs from natural language commands is a desirable goal, and several existing studies pursue it, some of them based on LLMs.

Unfortunately, most of these studies lack human-in-the-loop capability, are limited in scope, or depend on specific hardware. Moreover, most of this research relies on particular datasets, so adapting or extending the models to different robotic scenarios requires collecting new data and retraining. From a practical standpoint, an ideal robotic system would adapt easily to multiple applications or operating conditions without requiring extensive data collection or model retraining. The advantage of adopting ChatGPT for robotic applications is that the model can be adapted to a particular application with only a modest amount of sample data, while its language understanding and interaction capabilities serve as the interface.

Figure 1: Real-world examples showing how ChatGPT can translate multi-step human instructions into actionable robot sequences that can be executed in diverse settings.

Although ChatGPT’s potential for robotic applications is gaining attention, there is currently no proven approach for practical use. In this study, researchers from Microsoft give a concrete example of how ChatGPT can be applied in a few-shot setting to translate natural language commands into a sequence of actions that a robot can perform (Figure 1). The prompts were designed to meet requirements common to many real-world applications while remaining easy to adapt.

To meet these requirements, they designed input prompts that encourage ChatGPT to 1) output a sequence of predefined robot actions, with explanations, in human-readable JSON format; 2) represent the operating environment in a formalized style; and 3) infer and output the updated state of the operating environment, which can be reused as the next input, allowing ChatGPT to operate on the memory of only the most recent operations. They ran experiments to test how well the proposed prompts infer appropriate actions for multi-step language instructions in different environments. The requirements they set out for this work were: 1) simple integration with robot execution systems or visual recognition software; 2) applicability to diverse domestic settings; and 3) the ability to provide any number of instructions in plain English while reducing the impact of ChatGPT’s token limit.
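The three prompt design points above can be sketched as a small loop: describe the environment formally, ask for a JSON reply, check the actions against a predefined set, and carry the updated environment into the next turn. This is only an illustrative sketch, not the researchers' implementation. The action names, the JSON keys (`task_sequence`, `explanation`, `environment_after`), the environment layout, and the `fake_model` stand-in for a ChatGPT call are all assumptions made for this example.

```python
import json

# Hypothetical action vocabulary; the article does not list the real action set.
ALLOWED_ACTIONS = {"move_hand", "grasp_object", "release_object"}


def build_prompt(environment, instruction):
    """Combine a formal environment description with a plain-English instruction."""
    return (
        f"Environment: {json.dumps(environment)}\n"
        f"Instruction: {instruction}\n"
        "Reply in JSON with keys task_sequence, explanation, environment_after."
    )


def parse_response(raw):
    """Parse the JSON reply and reject actions outside the predefined set."""
    reply = json.loads(raw)
    steps = reply["task_sequence"]
    unknown = [s["action"] for s in steps if s["action"] not in ALLOWED_ACTIONS]
    if unknown:
        raise ValueError(f"actions outside the predefined set: {unknown}")
    return steps, reply["environment_after"]


def fake_model(prompt):
    """Stand-in for a ChatGPT call (assumption): always returns a canned reply."""
    return json.dumps({
        "task_sequence": [
            {"action": "move_hand", "object": "cup"},
            {"action": "grasp_object", "object": "cup"},
        ],
        "explanation": "Reach for the cup, then grasp it.",
        "environment_after": {
            "objects": ["cup", "table"],
            "relations": ["held(cup)"],
        },
    })


def run_turn(environment, instruction, model=fake_model):
    """One turn: only the latest environment state is sent, not the full history."""
    return parse_response(model(build_prompt(environment, instruction)))


environment = {"objects": ["cup", "table"], "relations": ["on(cup, table)"]}
actions, environment = run_turn(environment, "Pick up the cup.")
print([step["action"] for step in actions])
print(environment["relations"])
```

Because each turn sends only the most recent environment state, the prompt stays short no matter how many instructions have already been issued, which is the idea behind reducing the impact of the token limit.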

They also noted that ChatGPT’s conversational capabilities allow users to adjust its output with natural language feedback, which is critical to building an application that is both safe and robust while offering a user-friendly interface. The set of robot actions, the environment representation, and the object names in the proposed prompts are easy to modify, so the prompts can serve as templates. The contribution of this paper is to create and share generic prompts that can be easily adapted to each experimenter’s needs, giving the robotics research community a practical resource. The prompts are open source and freely available on GitHub, along with guidelines for their use.
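One way to picture the template idea is a prompt with placeholders for the action set, the object names, and the environment description, so that adapting it to a new setting means changing only the values filled in. The sketch below uses Python's standard `string.Template`. The template wording, the placeholder names, and the sample values are hypothetical and are not taken from the published prompts.

```python
from string import Template

# Hypothetical prompt template; the published prompts may be worded differently.
PROMPT_TEMPLATE = Template(
    "You control a robot. Use only these actions: $actions.\n"
    "Objects in the scene: $objects.\n"
    "Environment: $environment"
)


def render_prompt(actions, objects, environment):
    """Fill the template for one particular robot and scene."""
    return PROMPT_TEMPLATE.substitute(
        actions=", ".join(actions),
        objects=", ".join(objects),
        environment=environment,
    )


# The same template, filled in for two different (made-up) settings.
kitchen_prompt = render_prompt(
    ["grasp_object", "release_object"], ["cup", "table"], "on(cup, table)"
)
office_prompt = render_prompt(
    ["open_drawer", "close_drawer"], ["drawer", "desk"], "closed(drawer)"
)
print(kitchen_prompt)
print(office_prompt)
```

`substitute` raises `KeyError` when a placeholder is left unfilled, which gives an early warning if a template is adapted incompletely.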

