The company has popularized
generative pre-trained transformers (GPT).
===OpenAI's original GPT model ("GPT-1")===
The original paper on generative pre-training of a
transformer-based language model was written by
Alec Radford and his colleagues, and published as a preprint on OpenAI's website on June 11, 2018. It showed how a
generative model of language could acquire world knowledge and process long-range dependencies by pre-training on a diverse corpus with long stretches of contiguous text.
===GPT-2===
Generative Pre-trained Transformer 2 ("GPT-2") is an
unsupervised transformer language model and the successor to OpenAI's original GPT model ("GPT-1"). GPT-2 was announced in February 2019, with only limited demonstrative versions initially released to the public. The full version of GPT-2 was not immediately released due to concerns about potential misuse, including applications for writing
fake news. Some experts expressed skepticism that GPT-2 posed a significant threat. In response to GPT-2, the Allen Institute for Artificial Intelligence developed a tool to detect "neural fake news". Other researchers, such as Jeremy Howard, warned of "the technology to totally fill Twitter, email, and the web up with reasonable-sounding, context-appropriate prose, which would drown out all other speech and be impossible to filter". In November 2019, OpenAI released the complete version of the GPT-2 language model. Several websites host interactive demonstrations of different instances of GPT-2 and other transformer models. GPT-2's authors argue that unsupervised language models are general-purpose learners, illustrated by GPT-2 achieving state-of-the-art accuracy and
perplexity on 7 of 8
zero-shot tasks (i.e., the model was not further trained on any task-specific input-output examples). The corpus it was trained on, called WebText, contains slightly over 40 gigabytes of text from
URLs shared in
Reddit submissions with at least 3
upvotes. GPT-2 avoids certain issues that arise when encoding vocabulary with word tokens by using byte pair encoding, which permits representing any string of characters by encoding both individual characters and multiple-character tokens.
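GPT-2 itself applies byte pair encoding at the byte level with a learned merge table; the following is a minimal character-level sketch of the merge procedure over an illustrative toy corpus, showing how frequent adjacent pairs are iteratively fused into longer tokens while unseen strings fall back to single characters.

<syntaxhighlight lang="python">
from collections import Counter

def get_pair_counts(tokens):
    """Count adjacent token pairs in a sequence."""
    return Counter(zip(tokens, tokens[1:]))

def merge_pair(tokens, pair):
    """Replace each occurrence of `pair` with the concatenated symbol."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(pair[0] + pair[1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

def learn_bpe(text, num_merges):
    """Learn merge rules from raw text, starting from single characters,
    so any string stays representable even when no rule applies."""
    tokens, merges = list(text), []
    for _ in range(num_merges):
        counts = get_pair_counts(tokens)
        if not counts:
            break
        pair = counts.most_common(1)[0][0]  # most frequent adjacent pair
        merges.append(pair)
        tokens = merge_pair(tokens, pair)
    return merges

def encode(text, merges):
    """Apply the learned merges, in order, to new text."""
    tokens = list(text)
    for pair in merges:
        tokens = merge_pair(tokens, pair)
    return tokens

merges = learn_bpe("low lower lowest low low", num_merges=4)
print(encode("lowest", merges))  # ['low', 'e', 's', 't']
</syntaxhighlight>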
===GPT-3===
First described in May 2020, Generative Pre-trained Transformer 3 (GPT-3) is an unsupervised transformer language model and the successor to
GPT-2. OpenAI stated that GPT-3 succeeded at certain "
meta-learning" tasks and could generalize the purpose of a single input-output pair. The GPT-3 release paper gave examples of translation and cross-linguistic
transfer learning between English and Romanian, and between English and German. GPT-3 dramatically improved benchmark results over GPT-2. OpenAI cautioned that such scaling-up of language models could be approaching or encountering the fundamental capability limitations of predictive language models. Pre-training GPT-3 required several thousand petaflop/s-days of compute, compared to tens of petaflop/s-days for the full GPT-2 model. On September 23, 2020, GPT-3 was licensed exclusively to Microsoft.
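The "meta-learning" behavior described above was elicited in the GPT-3 paper through few-shot prompting: the task is specified entirely within the prompt, with no gradient updates. A sketch of that prompt format for English-German translation (the example pairs here are illustrative, not taken from the paper):

<syntaxhighlight lang="python">
# Few-shot prompt: the task is demonstrated by in-context examples,
# and the model is expected to continue the pattern (e.g. " Käse").
prompt = (
    "Translate English to German:\n"
    "sea otter => Seeotter\n"
    "peppermint => Pfefferminze\n"
    "cheese =>"
)
print(prompt)
</syntaxhighlight>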
===Codex===
Announced in mid-2021, Codex is a descendant of GPT-3 that has additionally been trained on code from 54 million GitHub repositories, and is the AI powering the code
autocompletion tool
GitHub Copilot. According to OpenAI, the model can create working code in over a dozen programming languages, most effectively in Python. OpenAI announced that they would discontinue support for the Codex API on March 23, 2023.
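Before its retirement, the Codex API was typically reached through the completions endpoint of the legacy (pre-1.0) openai Python client; the sketch below assumes the code-davinci-002 Codex model, and neither this client interface nor the model remains available.

<syntaxhighlight lang="python">
import openai  # legacy pre-1.0 client

openai.api_key = "sk-..."  # placeholder key

completion = openai.Completion.create(
    model="code-davinci-002",  # a Codex model, since retired
    prompt='def fizzbuzz(n):\n    """Return the FizzBuzz sequence up to n."""\n',
    max_tokens=150,
    temperature=0,
)
print(completion.choices[0].text)
</syntaxhighlight>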
===GPT-4===
On March 14, 2023, OpenAI announced the release of Generative Pre-trained Transformer 4 (GPT-4), capable of accepting text or image inputs. OpenAI announced that the updated technology passed a simulated law school bar exam with a score around the top 10% of test takers (by contrast, GPT-3.5 scored around the bottom 10%). They said that GPT-4 could also read, analyze or generate up to 25,000 words of text, and write code in all major programming languages. Observers reported that the iteration of ChatGPT using GPT-4 was an improvement on the previous GPT-3.5-based iteration, with the caveat that GPT-4 retained some of the problems of earlier revisions. GPT-4 is also capable of taking images as input on ChatGPT. OpenAI has declined to reveal various technical details and statistics about GPT-4, such as the precise size of the model.
===GPT-4o===
On May 13, 2024, OpenAI announced and released
GPT-4o, which can process and generate text, images and audio. GPT-4o achieved state-of-the-art results in voice, multilingual, and vision benchmarks, setting new records in audio speech recognition and translation. It scored 88.7% on the Massive Multitask Language Understanding (
MMLU) benchmark compared to 86.5% by GPT-4. On July 18, 2024, OpenAI released GPT-4o mini, a smaller version of GPT-4o replacing GPT-3.5 Turbo on the ChatGPT interface. Its
API costs $0.15 per million input tokens and $0.60 per million output tokens, compared to $5 and $15, respectively, for GPT-4o. OpenAI expects it to be particularly useful for enterprises, startups and developers seeking to automate services with AI agents. In March 2025, OpenAI released GPT-4o's native image generation feature, as an alternative to DALL-E 3.
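As a worked example of the per-million-token rates quoted above (the helper function is hypothetical, not part of any OpenAI API):

<syntaxhighlight lang="python">
def cost_usd(input_tokens, output_tokens, in_rate, out_rate):
    """Cost of one request; rates are USD per million tokens."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# One million tokens in and one million out at the quoted rates:
print(cost_usd(1_000_000, 1_000_000, 0.15, 0.60))   # GPT-4o mini: 0.75
print(cost_usd(1_000_000, 1_000_000, 5.00, 15.00))  # GPT-4o: 20.0
</syntaxhighlight>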
===GPT-4.5===
On February 27, 2025, OpenAI released
GPT-4.5, codenamed Orion. Sam Altman claimed that GPT-4.5 would present inaccurate information less frequently than previous models, and described it as a "giant, expensive model".
===GPT-4.1===
On April 14, 2025, OpenAI released the GPT-4.1 model, along with two "smaller, faster, and cheaper" models, GPT-4.1 mini and GPT-4.1 nano.
===GPT-5===
GPT-5 is OpenAI's flagship model, released on August 7, 2025. It replaced earlier models like
GPT-4o,
GPT-4.5, and
o3. GPT-5 uses a dynamic router that chooses between quick responses and deeper "thinking" when needed. According to OpenAI, it performs at PhD level across domains like math, coding, health, and multimodal tasks, and it scored 74.9% on SWE-bench Verified and 88% on Aider polyglot. Reporters described the GPT-5 launch as a major milestone toward AGI, praising its intelligence, accessibility, and affordability, but some early feedback called it "evolutionary rather than revolutionary", noting mixed results in creative writing and pointing to competition from models like
Grok 4 Heavy.
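OpenAI has not published the router's design; as a toy illustration of the general idea only, the sketch below dispatches a prompt either to a fast path or to a deliberate "thinking" path, using a crude keyword check as a stand-in for what is, in GPT-5, a learned routing signal. All names here are hypothetical.

<syntaxhighlight lang="python">
def answer_fast(prompt):
    """Stand-in for a low-latency response path."""
    return f"quick answer to {prompt!r}"

def answer_with_reasoning(prompt):
    """Stand-in for a slower path that spends extra compute 'thinking'."""
    return f"carefully reasoned answer to {prompt!r}"

def looks_hard(prompt):
    """Crude keyword heuristic; GPT-5's actual routing decision is learned."""
    return any(word in prompt.lower() for word in ("prove", "derive", "debug"))

def route(prompt):
    """Dispatch to the deeper path only when the prompt seems to need it."""
    return answer_with_reasoning(prompt) if looks_hard(prompt) else answer_fast(prompt)

print(route("What time is it in Tokyo?"))
print(route("Prove that sqrt(2) is irrational."))
</syntaxhighlight>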
===o1===
On September 12, 2024, OpenAI released the o1-preview and o1-mini models, which are designed to take more time to think about their responses, leading to higher accuracy. These models are particularly effective in science, coding, and reasoning tasks, and were made available to ChatGPT Plus and Team members. In December 2024, o1-preview was replaced by o1. In March 2025, the o1-pro model was made available through OpenAI's developer API, having previously been available to ChatGPT Pro users since December 2024. The pricing is $150 per million input tokens and $600 per million output tokens.
===o3===
On December 20, 2024, OpenAI unveiled o3, the successor to the o1 reasoning model, along with o3-mini, a lighter and faster version. As of December 21, 2024, these models were not available for public use; according to OpenAI, they were still being tested, and safety and security researchers could apply for early access until January 10, 2025. The model is called o3 rather than o2 to avoid confusion with the telecommunications services provider O2.
===Deep research===
Deep research is an
AI agent developed by OpenAI, unveiled on February 2, 2025. It leverages OpenAI's o3 model to perform extensive web browsing, data analysis, and synthesis, delivering comprehensive reports within 5 to 30 minutes. With browsing and
Python tools enabled, it reached an accuracy of 26.6 percent on
the HLE (Humanity's Last Exam) benchmark. In April 2025, OpenAI started rolling out a lightweight version of deep research to all ChatGPT free users.
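OpenAI has not published deep research's implementation; the toy loop below only illustrates the pattern described above, in which an agent repeatedly gathers material, analyzes it, and synthesizes a report. Every function here is a hypothetical placeholder.

<syntaxhighlight lang="python">
def search_web(query):
    """Placeholder tool: a real agent would browse live sources."""
    return [f"summary of a source found for {query!r}"]

def analyze(snippets):
    """Placeholder tool: a real agent might run Python over fetched data."""
    return "; ".join(snippets)

def deep_research(question, max_steps=3):
    """Toy agent loop: gather, analyze, then synthesize a report."""
    notes = []
    for step in range(1, max_steps + 1):
        query = f"{question} (angle {step})"
        notes.append(analyze(search_web(query)))
    return f"Report on {question!r}:\n- " + "\n- ".join(notes)

print(deep_research("adoption of byte pair encoding in language models"))
</syntaxhighlight>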
===GPT-OSS===
GPT-OSS (stylized as gpt-oss) is a set of open-weight reasoning models released by OpenAI on August 5, 2025. They come in two variants: a larger 117-billion-parameter model called
gpt-oss-120b, and a smaller 21-billion-parameter model called
gpt-oss-20b. Both models are released under an
Apache 2.0 license, allowing commercial and non-commercial use. In terms of performance, they are comparable to
o4-mini and
o3-mini respectively, according to OpenAI. gpt-oss-20b is small enough to run on a device with over 16
gigabytes of random-access memory.
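A minimal local-inference sketch, assuming the Hugging Face Transformers library and the published openai/gpt-oss-20b checkpoint; exact loading options and memory use depend on hardware and library version.

<syntaxhighlight lang="python">
from transformers import pipeline  # also requires torch and accelerate

generator = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",
    torch_dtype="auto",  # use the checkpoint's native precision
    device_map="auto",   # spread weights across available GPU/CPU memory
)
messages = [{"role": "user", "content": "Explain byte pair encoding in one sentence."}]
result = generator(messages, max_new_tokens=128)
# With chat-format input, the reply is appended as the final chat message.
print(result[0]["generated_text"][-1]["content"])
</syntaxhighlight>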
==Image classification==