
Marshmallows, Kindergartners, and Reinforcement Learning: A Tale of Strategy and AI

On the TED stage in 2010, Tom Wujec introduced the “Marshmallow Challenge.” The challenge involves twenty sticks of spaghetti, one yard of tape, one yard of string, and a marshmallow. Participants have 18 minutes to build the tallest free-standing structure they can from these materials, and the structure only counts as complete when it holds the marshmallow on top. He has run this experiment hundreds of times with different groups of people, from MBA graduates to recent graduates of kindergarten, and each group tackles the problem a little differently.

MBAs spend some time organizing themselves and negotiating for power. They think, sketch, and plan, and eventually get started on building the tower. They build until time is almost up and only then place the marshmallow on top. They stand back to admire their work, and often the tower crumbles under the weight of the marshmallow.

Kindergarten students approach the task completely differently. They don’t spend any time planning; they immediately start building. The key difference between the strategies, though, is that they place the marshmallow on the tower continuously throughout the experiment. The 5-year-olds build the tower up a little bit, place the marshmallow, get the feedback (did the tower fall or not), and continue iterating and building from there. As a result, kindergartners build towers that are, on average, more than twice as tall as those built by MBA graduates.

Now I know what you’re thinking: what in the world does this have to do with machine learning? Past blogs described other types of machine learning, while this story is meant to illustrate the concept of reinforcement learning and how a computer uses feedback to learn.

In reinforcement learning, engineers program the ‘rules’ of a game into a computer. In this case they would program the types of items, the effect of gravity, the load-bearing capability of spaghetti, the weight of the marshmallow, etc. Once the rules are programmed, the engineers define feedback that ‘rewards’ the computer for taller towers. The computer then plays many iterations of the game and learns which sequence of actions and configuration of items earns the highest reward. Think of the computer as mimicking the kindergarten students, who tried an iteration, placed the marshmallow, got the feedback, and then reconfigured their structure.
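
For readers who like to see the idea in code, here is a minimal sketch of that try, get feedback, adjust loop. It is a toy, not a real physics simulation: the stand probabilities in STAND_PROBABILITY are made-up numbers, and the names build_tower and learn are hypothetical. The technique is a simple epsilon-greedy strategy: mostly build the height that has worked best so far, but occasionally try a random one.

```python
import random

# Assumed (made-up) chance that a tower of each height, in levels,
# still stands once the marshmallow is placed on top.
STAND_PROBABILITY = {1: 1.0, 2: 0.95, 3: 0.9, 4: 0.85, 5: 0.3, 6: 0.1}
HEIGHTS = list(STAND_PROBABILITY)

def build_tower(height, rng):
    """One attempt: the reward is the height if the tower stands, else 0."""
    return height if rng.random() < STAND_PROBABILITY[height] else 0

def learn(episodes=5000, epsilon=0.1, seed=0):
    """Estimate the average reward of each height from repeated attempts."""
    rng = random.Random(seed)
    estimates = {h: 0.0 for h in HEIGHTS}
    counts = {h: 0 for h in HEIGHTS}
    for _ in range(episodes):
        if rng.random() < epsilon:
            height = rng.choice(HEIGHTS)                        # explore
        else:
            height = max(HEIGHTS, key=lambda h: estimates[h])   # exploit
        reward = build_tower(height, rng)                       # feedback
        counts[height] += 1
        # Running average of the rewards seen so far for this height.
        estimates[height] += (reward - estimates[height]) / counts[height]
    return estimates

estimates = learn()
best = max(estimates, key=estimates.get)
print("Preferred height:", best, "levels")
```

With these assumed probabilities the expected reward peaks at four levels (4 times 0.85 is 3.4), so the estimates should come to favor that height. Like the kindergartners, the program never plans ahead; it only keeps score of what happened each time the marshmallow went on.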

A well-known example of reinforcement learning in action came in 2017, when DeepMind’s AlphaGo, a program that had learned to play Go, beat Ke Jie, then the world’s top-ranked player. Go is more complicated than chess and has vastly more possible board positions and move sequences, so the win signified a huge step forward for reinforcement learning.

Reinforcement learning is one of the most exciting and revolutionary branches of AI. It is most useful when you are trying to determine an optimal set of actions in an environment with clear rules and an end-state that can be defined as ‘success’ or ‘failure’. Business cases include:

  • Self-driving cars
  • Optimizing an investment portfolio trading strategy
  • Using robots to restock and pick inventory in warehouses
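
To make "an optimal set of actions" with a clear success or failure end-state a little more concrete, here is a second hedged sketch, this time using tabular Q-learning on the marshmallow game. At every step the program chooses to add a level or to stop; stopping is a success worth the current height, and a collapse is a failure worth nothing. The ADD_SUCCESS numbers are invented for illustration, and step, allowed, and train are hypothetical helper names, not part of any library.

```python
import random

MAX_LEVELS = 6
# Assumed (made-up) chance that adding a level to reach this height succeeds.
ADD_SUCCESS = {1: 1.0, 2: 0.95, 3: 0.95, 4: 0.95, 5: 0.4, 6: 0.2}

def allowed(height):
    """Actions available at a given height; a full tower can only stop."""
    return ("add", "stop") if height < MAX_LEVELS else ("stop",)

def step(height, action, rng):
    """Rules of the toy game. Returns (next_height, reward, done)."""
    if action == "stop":
        return height, float(height), True   # success: reward is the height
    if rng.random() < ADD_SUCCESS[height + 1]:
        return height + 1, 0.0, False        # level added, keep building
    return 0, 0.0, True                      # failure: the tower collapsed

def train(episodes=30000, alpha=0.02, epsilon=0.2, seed=0):
    """Learn a value for every (height, action) pair by trial and error."""
    rng = random.Random(seed)
    q = {(h, a): 0.0 for h in range(MAX_LEVELS + 1) for a in allowed(h)}
    for _ in range(episodes):
        height, done = 0, False
        while not done:
            actions = allowed(height)
            if rng.random() < epsilon:
                action = rng.choice(actions)                          # explore
            else:
                action = max(actions, key=lambda a: q[(height, a)])   # exploit
            next_height, reward, done = step(height, action, rng)
            # Value of the best follow-up action, or 0 if the game is over.
            future = 0.0 if done else max(
                q[(next_height, a)] for a in allowed(next_height))
            q[(height, action)] += alpha * (reward + future - q[(height, action)])
            height = next_height
    return q

q = train()
policy = {h: max(allowed(h), key=lambda a: q[(h, a)])
          for h in range(MAX_LEVELS + 1)}
print(policy)
```

Under these assumed numbers, the learned policy should be to keep adding levels up to four and then stop, because going for a fifth level trades a sure reward of 4 for a 40% chance at 5. The same pattern (rules, actions, a reward at the end) is what the business cases above share, although real systems are far more complex than this toy.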

When companies are able to conceptualize and understand reinforcement learning, they move from simply being able to predict, as with other types of machine learning, towards an ability to optimize. While competitors have their MBAs jockeying for power, planning, and watching their tower crumble under the weight of a marshmallow, those who operationalize reinforcement learning have the blueprint to the tallest and strongest tower already printed out.