The Human Guided Exploration (HuGE) method allows AI agents to rapidly learn with assistance from humans, even in cases where humans may make errors.
To instruct an AI agent in a novel task, such as opening a kitchen cabinet, researchers typically employ reinforcement learning—a trial-and-error process where the agent is rewarded for actions bringing it closer to the goal.
Traditionally, a human expert crafts a reward function, motivating the agent to explore, and iteratively refines it as the agent attempts different actions. This process is time-consuming, inefficient, and challenging to scale, particularly for complex tasks with numerous steps.
Researchers from MIT, Harvard University, and the University of Washington have introduced a reinforcement learning approach that doesn’t depend on an expertly designed reward function. Instead, it utilizes crowdsourced feedback from non-expert users to guide the agent’s learning process.
While some methods also leverage non-expert feedback, this new approach enables the AI agent to learn faster, even when the crowdsourced data contains errors that might cause other methods to fail.
Furthermore, this approach permits asynchronous feedback gathering, allowing non-expert users worldwide to contribute to teaching the agent.
Pulkit Agrawal, an assistant professor at MIT, emphasizes the challenge of designing reward functions, highlighting the scalability issue with the current paradigm of expert researchers designing them for various tasks. This new approach aims to scale robot learning by crowdsourcing the design of reward functions, making it feasible for non-experts to provide valuable feedback.
In the future, this method could expedite a robot’s learning of specific tasks in a user’s home without the owner physically demonstrating each task. The robot could explore independently, guided by crowdsourced non-expert feedback.
The reward function in this method directs the agent’s exploration rather than providing precise instructions on completing the task. This allows the agent to learn more effectively, even if human supervision is somewhat inaccurate and noisy.
Lead author Marcel Torne, a research assistant in the Improbable AI Lab, explains that their method empowers the agent to explore based on the reward function, enhancing its learning capabilities. The research will be presented at the Conference on Neural Information Processing Systems next month.
Noisy Feedback
A method for collecting user feedback in reinforcement learning involves presenting a user with two photos depicting states achieved by an agent and asking them to identify which state is closer to a goal. For example, if a robot’s objective is to open a kitchen cabinet, one image might show the cabinet successfully opened, while the other might depict the microwave being opened. The user selects the photo representing the more favorable state.
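As a concrete (hypothetical) illustration of this kind of pairwise feedback, the sketch below records which of two reached states an annotator judges closer to the goal. The `Comparison` record and `ask_human` callback are illustrative names for this sketch, not the paper's actual interface, and states are assumed to be simple vectors or image arrays.

```python
# Minimal sketch of pairwise feedback collection (hypothetical helper names;
# the authors' data format and labeling interface may differ).
import random
from dataclasses import dataclass

@dataclass
class Comparison:
    state_a: list        # a state reached by the agent, e.g. an image or feature vector
    state_b: list        # a second reached state
    label: int           # 0 if the annotator picked state_a as closer to the goal, 1 for state_b

def collect_comparison(reached_states, ask_human) -> Comparison:
    """Show a non-expert two reached states and record which looks closer to the goal."""
    state_a, state_b = random.sample(reached_states, 2)
    label = ask_human(state_a, state_b)   # assumed to return 0 or 1; answers may be noisy
    return Comparison(state_a, state_b, label)
```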
Some previous approaches attempt to use this binary crowdsourced feedback to optimize a reward function guiding the agent’s learning process. However, due to the likelihood of mistakes from non-experts, the reward function can become excessively noisy, potentially causing the agent to become stuck and fail to reach its goal.
Torne and his collaborators addressed this by dividing the process into two separate parts, each driven by its own algorithm, in a new reinforcement learning method called HuGE (Human Guided Exploration).
On one side, a goal selector algorithm is continuously updated with crowdsourced human feedback. This feedback isn’t treated as a reward function but instead guides the agent’s exploration. Essentially, non-expert users leave incremental breadcrumbs that guide the agent toward its goal.
On the other side, the agent autonomously explores in a self-supervised manner, guided by the goal selector. It captures images or videos of its actions, which are then sent to humans for feedback, refining the goal selector.
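As a rough illustration of how these two parts could fit together, the sketch below fits a simple goal selector to the pairwise labels (a Bradley-Terry-style logistic update) and uses it to choose which previously reached state the agent should explore toward next. It reuses the `Comparison` record from the sketch above; the linear model, the state dimension, and the update rule are simplifying assumptions for illustration, not the authors' implementation.

```python
import numpy as np

STATE_DIM = 8                          # assumed state-feature size for this sketch
rng = np.random.default_rng(0)
w = np.zeros(STATE_DIM)                # goal-selector weights: score(s) = w . s

def selector_score(state):
    """Higher scores mean the selector believes the state is closer to the goal."""
    return float(w @ np.asarray(state))

def update_goal_selector(comparisons, lr=0.1):
    """Nudge scores so the state each annotator preferred ranks higher
    (a logistic / Bradley-Terry-style update on every pairwise label)."""
    global w
    for c in comparisons:
        preferred, other = (c.state_a, c.state_b) if c.label == 0 else (c.state_b, c.state_a)
        diff = np.asarray(preferred) - np.asarray(other)
        p = 1.0 / (1.0 + np.exp(-(w @ diff)))   # model's current agreement with the label
        w += lr * (1.0 - p) * diff              # gradient step toward agreeing with it

def pick_exploration_goal(reached_states):
    """Guide exploration toward the reached state the selector currently rates best."""
    return max(reached_states, key=selector_score)
```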
This approach narrows down the exploration space for the agent, leading it to more promising areas closer to its goal. Even in the absence of feedback or if it takes time to arrive, the agent continues learning independently, albeit at a slower pace. This flexibility enables infrequent and asynchronous feedback gathering.
The exploration loop can continue autonomously, so the agent keeps learning new things even without input. When stronger feedback signals arrive, the agent explores in a more targeted way, letting each component progress at its own pace. Importantly, because the feedback only gently steers the agent's behavior, the agent eventually learns to complete the task even if users sometimes provide incorrect answers.
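Building on the helpers sketched above, the loop below shows one way the asynchronous structure could be wired up: the agent explores and learns self-supervised every round, and folds in human labels only when any have arrived on a queue. The `policy.rollout` and `policy.update_from` interface and the random-goal fallback are illustrative assumptions, not the paper's exact procedure.

```python
import queue

feedback_queue = queue.Queue()          # annotators push Comparison objects here, at any time

def huge_style_loop(env, policy, n_rounds=1000):
    reached_states = [env.reset()]
    for _ in range(n_rounds):
        # Drain whatever feedback happened to arrive since the last round (possibly none).
        new_labels = []
        while not feedback_queue.empty():
            new_labels.append(feedback_queue.get())
        if new_labels:
            update_goal_selector(new_labels)
            goal = pick_exploration_goal(reached_states)
        else:
            # No fresh guidance: keep exploring anyway, just less directed.
            goal = reached_states[rng.integers(len(reached_states))]
        # Explore toward the chosen goal and learn self-supervised from whatever was reached.
        trajectory = policy.rollout(env, goal)       # hypothetical interface
        reached_states.extend(trajectory)
        policy.update_from(trajectory)               # e.g. a goal-conditioned self-supervised update
```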
Accelerated Learning
The researchers conducted tests on various simulated and real-world tasks to assess the effectiveness of the HuGE method. In simulations, HuGE demonstrated proficiency in learning tasks involving extended sequences of actions, such as orderly block stacking and navigating complex mazes.
Real-world experiments involved training robotic arms using HuGE to draw the letter “U” and perform pick-and-place tasks. Data from 109 non-expert users across 13 countries and three continents were crowdsourced for these tests.
Results from both real-world and simulated experiments showed that HuGE enabled agents to achieve goals more rapidly compared to alternative methods. Interestingly, the researchers discovered that performance improved when using crowdsourced non-expert data compared to synthetic data generated and labeled by the researchers. Non-expert users could label 30 images or videos in less than two minutes, indicating the method’s scalability.
In a related paper presented at the Conference on Robot Learning, the researchers enhanced HuGE to allow an AI agent not only to learn a task but also to autonomously reset the environment for continued learning. For example, if the agent learns to open a cabinet, HuGE guides it to close the cabinet without requiring human intervention.
The researchers underscore the importance of ensuring that AI agents align with human values in various learning approaches. Future plans involve refining HuGE to enable agents to learn from diverse forms of communication, such as natural language and physical interactions with robots. Additionally, they aim to apply the method to teach multiple agents simultaneously. The research is supported, in part, by the MIT-IBM Watson AI Lab.