Roboschool provides new OpenAI Gym environments for controlling robots in simulation. Eight of these environments serve as free alternatives to pre-existing MuJoCo implementations, re-tuned to produce more realistic motion. We also include several new, challenging environments.
Roboschool also makes it easy to train multiple agents together in the same environment.
After we launched Gym, one issue we heard from many users was that the MuJoCo component required a paid license (though MuJoCo recently added free student licenses for personal and class work). Roboschool removes this constraint, letting everyone conduct research regardless of their budget. Roboschool is based on the Bullet Physics Engine, an open-source, permissively licensed physics library that has been used by other simulation software such as Gazebo and V-REP.
Roboschool ships with twelve environments, including tasks familiar to MuJoCo users as well as new challenges, such as harder versions of the Humanoid walker task and a multi-player Pong environment. We plan to expand this collection over time and look forward to the community contributing as well.
For the existing MuJoCo environments, besides porting them to Bullet, we have modified them to be more realistic. Here are three of the environments we ported, with explanations of how they differ from the existing environments.
You can find trained policies for all of these environments in the `agent_zoo` folder of the GitHub repository. You can also run the `demo_race` script to start a race between three robots.
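Roboschool environments expose the standard Gym interface (`reset`, `step`, and the familiar `(observation, reward, done, info)` return). The sketch below uses a self-contained stand-in class rather than the real library, so the names, dimensions, and reward are all placeholders; with Roboschool installed, you would instead build the environment with something like `gym.make("RoboschoolAnt-v1")` after `import roboschool`.

```python
import random

class RoboschoolEnvStub:
    """Stand-in with the classic Gym interface that Roboschool
    environments expose. All sizes and rewards here are placeholders."""

    ACTION_DIM = 8   # e.g. one torque command per joint
    OBS_DIM = 28     # e.g. joint angles, velocities, body pose

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.t = 0
        return [0.0] * self.OBS_DIM

    def step(self, action):
        """Advance one timestep; return (obs, reward, done, info)."""
        self.t += 1
        obs = [random.gauss(0.0, 1.0) for _ in range(self.OBS_DIM)]
        reward = 1.0           # placeholder progress reward
        done = self.t >= 100   # episode ends after 100 steps
        return obs, reward, done, {}

env = RoboschoolEnvStub()
obs = env.reset()
total_reward = 0.0
done = False
while not done:
    action = [0.0] * env.ACTION_DIM  # a trained policy would act here
    obs, reward, done, info = env.step(action)
    total_reward += reward
```

Because Roboschool keeps this interface, any agent written against Gym's API can be pointed at the new environments without modification.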
## Interactive and robust control
In several of the previous OpenAI Gym environments, the goal was to learn a walking controller. However, these environments involved a very basic version of the problem, where the goal is simply to move forward. In practice, the walking policies would learn a single cyclic trajectory and leave most of the state space unvisited. Furthermore, the final policies tended to be very fragile: a small push would often cause the robot to crash and fall.
We have added two more environments with the 3D humanoid, which make the locomotion problem more interesting and challenging. These environments require _interactive control_: the robots must run towards a flag whose position varies randomly over time.
HumanoidFlagrun is designed to teach the robot to slow down and turn. The goal is to run towards the flag, whose position varies randomly.
HumanoidFlagrunHarder additionally allows the robot to fall and gives it time to get back on its feet. Each episode starts with the robot either upright or lying on the ground, and the robot is constantly bombarded with white cubes that push it off its trajectory.
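The core of such a task can be expressed as a progress reward toward a randomly resampled target. The sketch below is not Roboschool's actual reward shaping (which is more involved); it is a minimal illustration of the idea, and every name in it is hypothetical.

```python
import math
import random

def resample_flag(rng, radius=10.0):
    """Pick a new random flag position within `radius` meters of the origin."""
    angle = rng.uniform(0.0, 2.0 * math.pi)
    r = rng.uniform(1.0, radius)
    return (r * math.cos(angle), r * math.sin(angle))

def progress_reward(prev_xy, xy, flag_xy):
    """Reward the decrease in distance to the flag since the last step,
    so running toward the flag is positive and running away is negative."""
    return math.dist(prev_xy, flag_xy) - math.dist(xy, flag_xy)

rng = random.Random(0)
flag = resample_flag(rng)
# Moving 10% of the way toward the flag yields a positive reward...
r_toward = progress_reward((0.0, 0.0), (0.1 * flag[0], 0.1 * flag[1]), flag)
# ...and moving directly away yields a negative one.
r_away = progress_reward((0.0, 0.0), (-0.1 * flag[0], -0.1 * flag[1]), flag)
```

Resampling the flag during an episode is what forces the policy to slow down, turn, and steer instead of memorizing a single forward-running cycle.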
We ship trained policies for both HumanoidFlagrun and HumanoidFlagrunHarder. The walks aren’t as fast and natural-looking as the ones we see from the regular humanoid, but these policies can recover from many situations, and they know how to steer. The policy itself is still a multilayer perceptron, which has no internal state, so we believe that in some cases the agent uses its arms to store information.
Roboschool lets you both run and train multiple agents in the same environment. We start with RoboschoolPong, with more environments to follow.
With multiplayer training, you can train the same agent playing for both parties (so it plays with itself), you can train two different agents using the same algorithm, or you can even set two different algorithms against each other.
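These configurations all reduce to plugging two (possibly identical) policies into one environment step. The toy sketch below is not Roboschool's Pong and its names are made up; it only illustrates the zero-sum, two-policy wiring, including the degenerate self-play case.

```python
import random

def pong_step(action_left, action_right, rng):
    """Toy two-player step: each action is a paddle position in [-1, 1].
    The player who tracks the (random) ball better scores; rewards are
    zero-sum."""
    ball = rng.uniform(-1.0, 1.0)
    err_left = abs(ball - action_left)
    err_right = abs(ball - action_right)
    reward_left = err_right - err_left
    return reward_left, -reward_left

def play_episode(policy_left, policy_right, steps=50, seed=0):
    """Run both policies in the same environment; return left's return."""
    rng = random.Random(seed)
    total_left = 0.0
    for _ in range(steps):
        r_left, _ = pong_step(policy_left(rng), policy_right(rng), rng)
        total_left += r_left
    return total_left

center = lambda rng: 0.0                    # always hold the center
wild = lambda rng: rng.choice([-1.0, 1.0])  # lunge to a random edge

# Self-play: the same policy on both sides of a zero-sum game nets zero.
selfplay_return = play_episode(center, center)
# Two different policies pitted against each other.
versus_return = play_episode(center, wild)
```

The same loop covers all three setups: pass one policy twice for self-play, two instances trained by the same algorithm, or two entirely different algorithms.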
The multi-agent setting presents some interesting challenges. If you train both players simultaneously, you’ll likely see a learning curve like the following one, obtained from a policy gradient method:
_Learning curves for Pong, with both policies updated simultaneously by a policy gradient algorithm._
Here’s what’s happening: each agent keeps adapting to exploit its opponent’s current policy, so whenever one player improves, the other’s performance drops until it counter-adapts in turn. As a result, the policies oscillated, and neither agent learned anything useful after hours of training. As in generative adversarial networks, learning in an adversarial setting is tricky, but we think it’s an interesting research problem because this interplay can lead to sophisticated strategies even in simple environments, and it can provide a natural curriculum.
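This oscillation has a simple game-theoretic analogue: in a zero-sum game like matching pennies, alternately playing a best response to the opponent's current strategy cycles forever instead of converging to the mixed equilibrium at (0.5, 0.5). The sketch below is this analogue, not the Pong setup itself.

```python
def best_response_p1(q):
    """P1 wins on a match: expected payoff of heads is 2q - 1, of tails
    1 - 2q, so the best response is heads exactly when q > 0.5."""
    return 1.0 if q > 0.5 else 0.0

def best_response_p2(p):
    """P2 wins on a mismatch: heads pays 1 - 2p, tails pays 2p - 1, so
    the best response is heads exactly when p < 0.5."""
    return 1.0 if p < 0.5 else 0.0

p, q = 0.9, 0.1  # initial probabilities of playing heads
history = []
for _ in range(8):
    p = best_response_p1(q)  # P1 adapts to P2's current strategy
    q = best_response_p2(p)  # P2 immediately counter-adapts
    history.append((p, q))
# The strategy pair flips between (0, 1) and (1, 0) forever, never
# settling at the mixed equilibrium (0.5, 0.5).
```

Simultaneous policy gradient updates behave like a smoothed version of this alternating best-response dynamic, which is one reading of the oscillating learning curves above.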
There’s been a lot of work by the community to create environments for OpenAI Gym, some of which are based on open-source physics simulators. In one recent project, researchers created a fork of OpenAI Gym that replaced MuJoCo with the open-source physics simulator DART. They showed that policies can even be transferred between the two physics simulators, MuJoCo and DART.