This 16-game arcade for AIs tests their playing prowess

Figuring out just what an AI is good at is one of the hardest things about understanding them.
To help determine this, OpenAI has designed a set of games that can help researchers tell whether their machine learning agent is actually
learning basic skills or, what is equally likely, has figured out how to rig the system in its favor. It's one of those aspects of AI research
that never fails to delight: the ways an agent will bend or break the rules in its endeavors to appear good at whatever the researchers are
asking it to do.
Cheating may be thinking outside the box, but it isn't always welcome, and one way to check for it is to change the rules a bit and see if the
system breaks down. What the agent actually learned can be determined by
seeing if those "skills" can be applied when it is put into new circumstances where only some of its knowledge is relevant. For instance, say
you want to find out whether an AI has learned to play a Mario-like game where it travels right and jumps over obstacles.
You could switch things around so it has to walk left; you could change the order of the obstacles; or you could change the game entirely
and have monsters appear that the AI has to shoot while it travels right instead. If the agent has really learned something about playing a
game like this, it should be able to pick up the modified versions of the game much more quickly than something entirely new.
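To make the idea concrete, here is a minimal toy sketch in Python. It is not OpenAI's code: the level format, the `play` function, and the two agents are all hypothetical, invented purely for illustration. One agent has memorized where to jump in a single level; the other has learned the rule "jump when an obstacle is ahead." Reordering the obstacles is enough to tell them apart.

```python
from typing import Callable, List

Level = List[int]  # 1 = obstacle on that tile, 0 = empty ground
Policy = Callable[[Level, int], str]


def play(level: Level, policy: Policy) -> bool:
    """Walk right one tile at a time; True means the agent reached the end."""
    for pos in range(len(level)):
        # The policy is asked what to do about the tile it is about to enter.
        if level[pos] == 1 and policy(level, pos) != "jump":
            return False  # walked into an obstacle
    return True


def make_memorizer(training_level: Level) -> Policy:
    """Hypothetical agent that only memorized where to jump in one level."""
    jump_at = {i for i, tile in enumerate(training_level) if tile == 1}
    return lambda level, pos: "jump" if pos in jump_at else "walk"


def rule_follower(level: Level, pos: int) -> str:
    """Hypothetical agent that learned the rule: jump when an obstacle is ahead."""
    return "jump" if level[pos] == 1 else "walk"


training_level = [0, 1, 0, 0, 1, 0]
reordered_level = [0, 0, 1, 0, 0, 1]  # same obstacles, different order

memorizer = make_memorizer(training_level)
results = {
    "memorizer": (play(training_level, memorizer), play(reordered_level, memorizer)),
    "rule_follower": (play(training_level, rule_follower), play(reordered_level, rule_follower)),
}
print(results)  # (cleared training level, cleared reordered level) per agent
```

The memorizing agent clears only the level it was trained on, while the rule-following agent clears the reordered one as well.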
This is called "generalizing" (applying existing knowledge to a new set of circumstances), and humans do it constantly. OpenAI
researchers have run into the question of generalization many times in their research, and in order to test generalizable AI knowledge at a
basic level, they've designed a sort of AI arcade where an agent has to prove its mettle in a variety of games with varying overlap of
gameplay concepts. The 16 game environments they designed are similar to games we know and love, like Pac-Man, Super Mario Bros.,
Asteroids, and so on.
The difference is the environments have been built from the ground up for AI play, with simplified controls, rewards, and graphics. Each
taxes an AI's abilities in a different way.
For instance, in one game there may be no penalty for sitting still and observing the game environment for a few seconds, while in others it
may place the agent in danger.
In some the AI must explore the environment; in others it may be focused on a single big boss spaceship.
But they're all made to be unmistakably different games, not unlike (though obviously a bit different from) what you might find available
for an Atari or NES console. Here's the full list, as seen in the gif below from top to bottom, left to right:

Ninja: Climb a tower while avoiding bombs or destroying them with throwing stars.
Coinrun: Get the coin at the right side of the level while avoiding traps and monsters.
Plunder: Fire cannonballs from the bottom of the screen to hit enemy ships and avoid friendlies.
Caveflyer: Navigate caves using Asteroids-style controls, shooting enemies and avoiding obstacles.
Jumper: Open-world platformer with a double-jumping rabbit and compass pointing towards the goal.
Miner: Dig through dirt to get diamonds and boulders that obey Atari-era gravity rules.
Maze: Navigate randomly generated mazes of various sizes.
Bigfish: Eat fish smaller than you to become the bigger fish, while avoiding a similar fate.
Chaser: Like Pac-Man, eat the dots and use power pellets strategically to eat enemies.
Starpilot: Gradius-like shmup focused on dodging and quick elimination of enemy ships.
Bossfight: One-on-one battle with a boss ship with randomly selected attacks and replenishing shields.
Heist: Navigate a maze with colored locks and corresponding keys.
Fruitbot: Ascend through levels while collecting fruit and avoiding non-fruit.
Dodgeball: Move around a room without touching walls, hitting others with balls and avoiding getting hit.
Climber: Climb a series of platforms collecting stars along the way and avoiding monsters.
Leaper: Frogger-type lane-crossing game with cars, logs, etc.

You can imagine that an AI might be created that excels at the grid-based ones like Heist, Maze, and Chaser, but loses its way in Jumper,
Coinrun, and Bossfight, just like a human, because there are different skills involved in each.
But there are shared ones as well: understanding that contact between the player character and moving objects may have consequences, or
that certain parts of the play area are inaccessible.
An AI that can generalize and adapt quickly will learn to dominate all these games in a shorter time than one that doesn't generalize
well. The set of games and methods for observing and rating agent performance in them is called the ProcGen benchmark, since the
environments and enemy placements in the games are procedurally generated.
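Procedural generation is what makes this kind of testing cheap: a level is built from a seed number, so researchers can let an agent practice on one set of seeds and grade it on another. The Python sketch below illustrates the idea under stated assumptions. The `generate_level` function, the tile format, and the seed ranges (200 practice seeds, 50 held-out seeds) are hypothetical choices made for this example, not the benchmark's actual generator or settings.

```python
import random
from typing import List


def generate_level(seed: int, length: int = 12, obstacle_rate: float = 0.3) -> List[int]:
    """Build a toy level from a seed: the same seed always gives the same layout."""
    rng = random.Random(seed)  # private generator, so levels do not affect each other
    # Keep the first tile empty so the agent always has a safe start.
    return [0] + [1 if rng.random() < obstacle_rate else 0 for _ in range(length - 1)]


# Assumed split for illustration only.
train_seeds = range(0, 200)    # levels the agent is allowed to practice on
test_seeds = range(200, 250)   # held-out levels used only for evaluation

train_levels = [generate_level(seed) for seed in train_seeds]
test_levels = [generate_level(seed) for seed in test_seeds]

print(len(train_levels), "practice levels,", len(test_levels), "held-out levels")
```

In a toy this small, two different seeds can happen to produce the same layout; a richer generator makes such collisions far less likely, which is what lets a held-out score say something about generalization rather than memory.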
You can read more about them, or learn to build your own little AI arcade, at the project's GitHub page.
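For readers who want to try it, the sketch below shows roughly how one of the environments can be created from Python. It assumes the `procgen` package and a compatible version of `gym` are installed, and it relies on the environment naming scheme and the `num_levels`, `start_level`, and `distribution_mode` options as described in the project's README; check the GitHub page for the current details. The helper functions (`env_id`, `env_kwargs`, `make_env`) are hypothetical conveniences written for this example, not part of the library.

```python
from typing import Any, Dict

# Lowercase environment names, matching the 16 games listed above.
GAMES = [
    "bigfish", "bossfight", "caveflyer", "chaser", "climber", "coinrun",
    "dodgeball", "fruitbot", "heist", "jumper", "leaper", "maze",
    "miner", "ninja", "plunder", "starpilot",
]


def env_id(game: str) -> str:
    """Hypothetical helper: build the gym ID string for one of the games."""
    if game not in GAMES:
        raise ValueError(f"unknown game: {game}")
    return f"procgen:procgen-{game}-v0"


def env_kwargs(num_levels: int, start_level: int = 0,
               distribution_mode: str = "easy") -> Dict[str, Any]:
    """Hypothetical helper: collect the level-selection options in one place."""
    return {
        "num_levels": num_levels,      # 0 means an unlimited supply of levels
        "start_level": start_level,    # seed of the first level in the set
        "distribution_mode": distribution_mode,
    }


def make_env(game: str, **kwargs: Any):
    """Create the environment; gym is imported lazily so the helpers above work without it."""
    import gym  # assumes `pip install procgen` and a compatible gym release
    return gym.make(env_id(game), **kwargs)


# Example (not run here): practice on a fixed set of levels, evaluate on unlimited ones.
# train_env = make_env("coinrun", **env_kwargs(num_levels=200))
# test_env = make_env("coinrun", **env_kwargs(num_levels=0))
```

Limiting `num_levels` during training and lifting the limit for evaluation is one way to measure how well an agent generalizes rather than memorizes.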