We’re releasing Neural MMO, a massively multiagent game environment for reinforcement learning agents. Our platform supports a large, variable number of agents within a persistent and open-ended task. The inclusion of many agents and species leads to better exploration, divergent niche formation, and greater overall competence.
In recent years, multiagent settings have become an effective platform for deep reinforcement learning research. Despite this progress, two main challenges remain for multiagent reinforcement learning. First, we need open-ended tasks with a high complexity ceiling: current environments are either complex but too narrow or open-ended but too simple. Properties such as persistence and large population scale are key. Second, we need more benchmark environments to quantify learning progress in the presence of large population scales and persistence. Massively Multiplayer Online Games (MMOs) simulate a large ecosystem of a variable number of players competing in persistent and extensive environments.
To address these challenges, we built our Neural MMO to meet the following criteria:
Players _(agents)_ may join any available server _(environment)_, each containing an automatically generated tile-based game map of configurable size. Some tiles, such as food-bearing forest tiles and grass tiles, are traversable. Others, such as water and solid stone, are not. Agents spawn at a random location along the edges of the environment. They must obtain food and water, and avoid combat damage from other agents, in order to sustain their health. Stepping on a forest tile or next to a water tile refills a portion of the agent’s food or water supply, respectively. However, forest tiles have a limited supply of food, which regenerates slowly over time. This means that agents must compete for food tiles while periodically refilling their water supply from infinite water tiles. Players engage in combat using three combat styles, denoted _Melee, Range,_ and _Mage_ for flavor.
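The foraging rules above can be sketched in a few lines of Python. This is a minimal illustration, not the released implementation: the `Agent` class, the `tick` function, every numeric constant, the per-tick resource drain, and the four-neighbour definition of "next to" are all assumptions made for the example. Combat and forest regeneration are left out.

```python
from dataclasses import dataclass

# Illustrative constants only; none of these values come from the release.
MAX_RESOURCE = 10
FOREST_RESTORE = 5
WATER_RESTORE = 5


@dataclass
class Agent:
    row: int
    col: int
    food: int = MAX_RESOURCE
    water: int = MAX_RESOURCE
    health: int = MAX_RESOURCE


def adjacent_to_water(grid, row, col):
    """True if any of the four neighbouring tiles is water (assumed adjacency)."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] == "water":
            return True
    return False


def tick(agent, grid, forest_stock):
    """Advance one game tick for a single agent; returns True while it is alive."""
    # Assumption: food and water each drain by one unit per tick.
    agent.food = max(agent.food - 1, 0)
    agent.water = max(agent.water - 1, 0)

    # A stocked forest tile gives up one unit of its limited supply.
    pos = (agent.row, agent.col)
    if grid[agent.row][agent.col] == "forest" and forest_stock.get(pos, 0) > 0:
        forest_stock[pos] -= 1
        agent.food = min(agent.food + FOREST_RESTORE, MAX_RESOURCE)

    # Water tiles are never depleted.
    if adjacent_to_water(grid, agent.row, agent.col):
        agent.water = min(agent.water + WATER_RESTORE, MAX_RESOURCE)

    # Assumption: an empty food or water supply costs one health per tick.
    if agent.food == 0 or agent.water == 0:
        agent.health = max(agent.health - 1, 0)
    return agent.health > 0
```

Because the forest tile's stock is shared state while the water check is stateless, two agents standing on the same forest tile contend for food but never for water, which is the asymmetry the paragraph describes.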
Input: Agents observe a square crop of tiles centered on their current position. This includes tile terrain types and select properties (health, food, water, and position) of occupying agents.
Output: Agents output action choices for the next game tick _(timestep)_. Actions consist of one movement and one attack.
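To make the input and output shapes concrete, here is a small sketch of a centered square crop and a two-part action. The function name `crop_observation`, the `Action` fields and their string encodings, and the choice to pad out-of-bounds cells with stone are assumptions for illustration; the properties of occupying agents are omitted.

```python
from dataclasses import dataclass

PAD = "stone"  # assumption: cells beyond the map edge appear as impassable stone


def crop_observation(grid, row, col, radius):
    """Return a (2 * radius + 1)-square crop of tile types centered on (row, col)."""
    height, width = len(grid), len(grid[0])
    crop = []
    for r in range(row - radius, row + radius + 1):
        crop_row = []
        for c in range(col - radius, col + radius + 1):
            inside = 0 <= r < height and 0 <= c < width
            crop_row.append(grid[r][c] if inside else PAD)
        crop.append(crop_row)
    return crop


@dataclass(frozen=True)
class Action:
    """One movement and one attack per game tick (hypothetical encoding)."""
    move: str    # e.g. "north", "south", "east", "west", or "stay"
    attack: str  # e.g. "melee", "range", "mage", or "none"
```

Padding matters here because agents spawn along the edges of the map, so a crop centered on a newly spawned agent always extends past the boundary.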
As a simple baseline, we train a small, fully connected architecture using vanilla policy gradients, with a value function baseline and reward discounting as the only enhancements. Rather than being rewarded for achieving particular objectives, agents optimize only for their lifetime _(trajectory length)_: they receive reward 1 for each tick of their lifetime. We convert variable-length observations, such as the list of surrounding players, into a single fixed-length vector by computing the maximum across all players (OpenAI Five also utilized this trick). The source release includes our full distributed training implementation, which is based on PyTorch and Ray.
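The three ingredients named above (maximum over players, reward 1 per tick with discounting, and a value baseline) can be sketched in plain Python. This is an illustrative sketch rather than the released PyTorch code: the function names are hypothetical, and returning zeros when no players are visible is an assumption.

```python
def max_pool_entities(entity_features, feature_dim):
    """Collapse a variable-length list of per-player feature vectors into one
    fixed-length vector by taking the element-wise maximum."""
    if not entity_features:
        return [0.0] * feature_dim  # assumption: zeros when nobody is visible
    return [max(column) for column in zip(*entity_features)]


def discounted_returns(lifetime, gamma):
    """Discounted return at each tick when every tick survived pays reward 1."""
    returns = [0.0] * lifetime
    running = 0.0
    for t in reversed(range(lifetime)):
        running = 1.0 + gamma * running
        returns[t] = running
    return returns


def advantages(returns, value_estimates):
    """Policy-gradient weights: discounted return minus the value baseline."""
    return [g - v for g, v in zip(returns, value_estimates)]
```

For example, `discounted_returns(3, 0.5)` gives `[1.75, 1.5, 1.0]`: with reward 1 per tick, the return at each step is larger the longer the agent goes on to survive, which is what ties the objective to lifetime.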
## Evaluation results
Agents’ policies are sampled uniformly from a number of populations—agents in different populations share architectures, but only agents in the same population share weights. Initial experiments show that agent competence scales with increasing multiagent interaction. Increasing the maximum number of concurrent players magnifies exploration; increasing the number of populations magnifies niche formation—that is, the tendency of populations to spread out and forage within different parts of the map.
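One way to picture the population setup is the sketch below, where each population owns a single weight object and every agent spawned into that population holds a reference to it. The names `make_populations` and `spawn_agent`, and the dictionary representation of an agent, are hypothetical and chosen only for illustration.

```python
import random


def make_populations(num_populations, make_weights):
    """Create one weight object per population; members reuse the same object."""
    return [make_weights() for _ in range(num_populations)]


def spawn_agent(populations, rng):
    """Sample a population uniformly and give the new agent its shared weights."""
    population_id = rng.randrange(len(populations))
    return {"population": population_id, "weights": populations[population_id]}


rng = random.Random(0)
populations = make_populations(4, lambda: {"layer": [0.0, 0.0]})
agents = [spawn_agent(populations, rng) for _ in range(8)]
```

Since agents in one population reference the same object, an update to that object changes the policy of every member at once, while other populations with the same architecture are unaffected.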
### Server merge tournaments: Multiagent magnifies competence
There is no standard procedure among MMOs for evaluating relative player competence across multiple servers. However, MMO servers sometimes undergo merges where the player bases from multiple servers are placed within a single server. We implement “tournament” style evaluation by merging the player bases trained in different servers. This allows us to directly compare the policies learned in different experiment settings. We vary test time scale and find that agents trained in larger settings consistently outperform agents trained in smaller settings.
### Increased population size magnifies exploration
In the natural world, competition among animals can incentivize them to spread out to avoid conflict. We observe that map coverage increases as the number of concurrent agents increases. Agents learn to explore only because the presence of other agents provides a natural incentive for doing so.
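The post does not spell out how map coverage is computed. One plausible reading, sketched here as an assumption, is the fraction of traversable tiles visited by at least one agent; the function name `map_coverage` and its inputs are hypothetical.

```python
def map_coverage(trajectories, traversable_tiles):
    """Fraction of traversable tiles visited by at least one agent.

    trajectories: iterable of paths, each a sequence of (row, col) positions.
    traversable_tiles: collection of (row, col) positions agents can occupy.
    """
    traversable = set(traversable_tiles)
    if not traversable:
        return 0.0
    visited = set()
    for path in trajectories:
        visited.update(path)
    # Ignore any recorded position that is not a traversable tile.
    visited = visited & traversable
    return len(visited) / len(traversable)
```

Under this definition, agents that crowd the same few tiles add nothing to the metric, so coverage only rises when agents actually spread out.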
### Increased species count magnifies niche formation
Given a sufficiently large and resource-rich environment, we found that different populations of agents separated across the map to avoid competing with one another as the number of populations increased. Because agents cannot out-compete other agents of their own population (i.e., agents with whom they share weights), they tend to seek areas of the map that contain enough resources to sustain their population. Similar effects were also independently observed in concurrent multiagent research by DeepMind.
## Additional insights
We visualize agent-agent dependencies by fixing an agent at the center of a hypothetical map crop. For each position visible to that agent, we show what the value function would be if there were a second agent at that position. We find that agents learn policies dependent on those of other agents, in both the foraging and combat environments. Agents learn “bull’s eye” avoidance maps to begin foraging more effectively after only a few minutes of training. As agents learn the combat mechanics of the environment, they begin to appropriately value effective engagement ranges and angles of approach.
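The visualization procedure amounts to sweeping a second agent over every cell of the crop and recording the value estimate each time. The sketch below assumes a callable `value_fn(dr, dc)` that returns the value when the second agent sits at offset `(dr, dc)` from the observer; `toy_value_fn` is a hand-written stand-in, not a learned value function.

```python
def value_map(value_fn, radius):
    """Evaluate value_fn with a second agent placed at each offset in the crop.

    The observing agent stays fixed at offset (0, 0), the center of the crop.
    """
    offsets = range(-radius, radius + 1)
    return [[value_fn(dr, dc) for dc in offsets] for dr in offsets]


def toy_value_fn(dr, dc):
    """Stand-in value function: the farther away the other agent, the better."""
    return float(max(abs(dr), abs(dc)))


for row in value_map(toy_value_fn, 2):
    print(" ".join(f"{v:.0f}" for v in row))
```

The toy function is chosen so that the printed grid forms concentric rings around the center, which gives a rough idea of the shape a “bull’s eye” avoidance map takes.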
Neural MMO resolves two key limitations of previous game-based environments, but many others remain unsolved. It strikes a middle ground between environment complexity and population scale. We’ve designed this environment with open-source expansion in mind, for the research community to build upon.
If you are excited about conducting research on multiagent systems, consider joining OpenAI.
Joseph Suarez, Yilun Du, Phillip Isola, Igor Mordatch
Thanks to Clare Zhu for her substantial work on the 3D client.
We also thank the following for feedback on drafts of this post: Greg Brockman, Ilya Sutskever, Jack Clark, Ashley Pilipiszyn, Ryan Lowe, Julian Togelius, Joel Liebo, Cinjon Resnick.