INSUBCONTINENT EXCLUSIVE:

Figuring out just what an AI is good at is one of the hardest things about understanding it.

To help determine this, OpenAI has designed a set of games that can help researchers tell whether their machine learning agent is actually

learning basic skills or, what is equally likely, has figured out how to rig the system in its favor. It's one of those aspects of AI research

that never fails to delight: the ways an agent will bend or break the rules in its endeavors to appear good at whatever the researchers are

asking it to do.

Cheating may be thinking outside the box, but it isn't always welcome, and one way to check is to change the rules a bit and see if the

system breaks down. What the agent actually learned can be determined by

seeing if those "skills" can be applied when it is put into new circumstances where only some of its knowledge is relevant. For instance, say

you want to find out whether an AI has learned to play a Mario-like game where it travels right and jumps over obstacles.

You could switch things around so it has to walk left; you could change the order of the obstacles; or you could change the game entirely

and have monsters appear that the AI has to shoot while it travels right instead. If the agent has really learned something about playing a

game like this, it should be able to pick up the modified versions of the game much more quickly than something entirely new.
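The idea of changing the rules and watching for a breakdown can be sketched with a toy example. This is only an illustration under stated assumptions: the one-dimensional level, the two agents, and every name in the code are hypothetical and are not taken from OpenAI's environments.

```python
# Toy illustration only: a one-dimensional "run right and jump" level.
# Both agents and the level format are hypothetical, not OpenAI's code.


def run_level(agent, obstacles, length=20):
    """Return True if the agent reaches the end without hitting an obstacle."""
    for position in range(length):
        obstacle_ahead = (position + 1) in obstacles
        action = agent(position, obstacle_ahead)
        if obstacle_ahead and action != "jump":
            return False  # walked straight into the obstacle
    return True


def make_memorizing_agent(training_obstacles):
    """Agent that memorized *where* to jump in the training level."""
    jump_positions = {p - 1 for p in training_obstacles}
    return lambda position, obstacle_ahead: (
        "jump" if position in jump_positions else "run"
    )


def reactive_agent(position, obstacle_ahead):
    """Agent that learned the rule: jump when an obstacle is ahead."""
    return "jump" if obstacle_ahead else "run"


training_level = {4, 9, 15}
modified_level = {3, 11, 16}  # same game, obstacles moved

memorizer = make_memorizing_agent(training_level)

results = {
    "memorizer/training": run_level(memorizer, training_level),
    "memorizer/modified": run_level(memorizer, modified_level),
    "reactive/training": run_level(reactive_agent, training_level),
    "reactive/modified": run_level(reactive_agent, modified_level),
}
for name, passed in results.items():
    print(f"{name}: {'cleared' if passed else 'failed'}")
```

Both agents clear the level they were trained on, so the original level alone cannot tell them apart. Only the reactive agent carries its skill over to the modified level.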

This is called "generalizing" (applying existing knowledge to a new set of circumstances), and humans do it constantly. OpenAI

researchers have encountered this many times in their research, and in order to test generalizable AI knowledge at a basic level, they've

designed a sort of AI arcade where an agent has to prove its mettle in a variety of games with varying overlap of gameplay concepts. The

16 game environments they designed are similar to games we know and love, like Pac-Man, Super Mario Bros., Asteroids, and so on.

The difference is that the environments have been built from the ground up for AI play, with simplified controls, rewards, and graphics. Each

taxes an AI's abilities in a different way.

For instance, in one game there may be no penalty for sitting still and observing the game environment for a few seconds, while in others doing so

may place the agent in danger.

In some, the AI must explore the environment; in others, it may be focused on a single big boss spaceship.

But they're all made to be unmistakably different games, not unlike (though obviously a bit different from) what you might find available

for an Atari or NES console.

Here's the full list, as seen in the gif below from top to bottom, left to right:

Ninja: Climb a tower while avoiding bombs or destroying them with throwing stars.

Coinrun: Get the coin at the right side of the level while avoiding traps and monsters.

Plunder: Fire cannonballs from the bottom of the screen to hit enemy ships and avoid friendlies.

Caveflyer: Navigate caves using Asteroids-style controls, shooting enemies and avoiding obstacles.

Jumper: Open-world platformer with a double-jumping rabbit and a compass pointing towards the goal.

Miner: Dig through dirt to get diamonds and boulders that obey Atari-era gravity rules.

Maze: Navigate randomly generated mazes of various sizes.

Bigfish: Eat fish smaller than you to become the bigger fish, while avoiding a similar fate.

Chaser: Like Pac-Man, eat the dots and use power pellets strategically to eat enemies.

Starpilot: Gradius-like shmup focused on dodging and quick elimination of enemy ships.

Bossfight: One-on-one battle with a boss ship with randomly selected attacks and replenishing shields.

Heist: Navigate a maze with colored locks and corresponding keys.

Fruitbot: Ascend through levels while collecting fruit and avoiding non-fruit.

Dodgeball: Move around a room without touching walls, hitting others with balls and avoiding getting hit.

Climber: Climb a series of platforms, collecting stars along the way and avoiding monsters.

Leaper: Frogger-type lane-crossing game with cars, logs, etc.

You can imagine that an AI might be created that excels at the grid-based ones like Heist, Maze, and Chaser, but loses its way in Jumper, Coinrun,

and Bossfight.

That is just like a human, because there are different skills involved in each.

But there are shared ones as well: understanding that contact between the player character and moving objects may have consequences, or that certain areas

of the play area are inaccessible.

An AI that can generalize and adapt quickly will learn to dominate all these games in a shorter time than one that doesn't generalize

well. The set of games and methods for observing and rating agent performance in them is called the Procgen Benchmark, since the

environments and enemy placements in the games are procedurally generated.
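To see why procedural generation helps with this kind of test, here is a minimal sketch: a seed deterministically produces a level, so an agent can be trained on one set of seeds and scored on seeds it has never seen. Everything here is an assumption made for illustration, including the level format, the memorizing agent, and the seed ranges. It does not use or reproduce the benchmark's actual code.

```python
import random

# Toy sketch only: level format, agent, and seed ranges are hypothetical.

LEVEL_LENGTH = 30
OBSTACLES_PER_LEVEL = 3


def generate_level(seed):
    """Procedurally generate obstacle positions; one seed always gives one level."""
    rng = random.Random(seed)
    return frozenset(rng.sample(range(1, LEVEL_LENGTH), OBSTACLES_PER_LEVEL))


def train_memorizer(train_seeds):
    """Build an agent that jumps only where it saw obstacles during training."""
    seen = set()
    for seed in train_seeds:
        seen |= generate_level(seed)
    return lambda next_position: "jump" if next_position in seen else "run"


def clears(policy, level):
    """True if the policy jumps over every obstacle in the level."""
    return all(policy(obstacle) == "jump" for obstacle in level)


def success_rate(policy, seeds):
    """Fraction of the levels generated from `seeds` that the policy clears."""
    levels = [generate_level(seed) for seed in seeds]
    return sum(clears(policy, level) for level in levels) / len(levels)


train_seeds = range(0, 3)        # levels the agent trained on
test_seeds = range(1000, 1100)   # levels it has never seen

agent = train_memorizer(train_seeds)
train_score = success_rate(agent, train_seeds)
test_score = success_rate(agent, test_seeds)

print(f"train: {train_score:.2f}  test: {test_score:.2f}  "
      f"gap: {train_score - test_score:.2f}")
```

The difference between the two scores is one simple way to put a number on how well an agent generalizes. An agent that had learned the underlying rule rather than the positions would show little or no gap.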

You can read more about them, or learn to build your own little AI arcade, at the project's GitHub page.

This 16-game arcade for AIs tests their playing prowess