Research Collection – Reinforcement Learning at Microsoft

This post has been republished via RSS; it originally appeared at: Microsoft Research.

Reinforcement learning is about agents taking information from the world and learning a policy for interacting with it, so that they perform better. So, you can imagine a future where, every time you type on the keyboard, the keyboard learns to understand you better. Or every time you interact with some website, it understands better what your preferences are, so the world just starts working better and better at interacting with people.

John Langford, Partner Research Manager, MSR NYC

Fundamentally, reinforcement learning (RL) is an approach to machine learning in which a software agent interacts with its environment, receives rewards, and chooses actions that will maximize those rewards. Research on reinforcement learning goes back many decades and is rooted in work in many different fields, including animal psychology, and some of its basic concepts were explored in the earliest research on artificial intelligence – such as Marvin Minsky’s 1951 SNARC machine, which used an ancestor of modern reinforcement learning techniques to simulate a rat solving a maze.
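
To make the interact, receive rewards, choose actions loop concrete, here is a minimal sketch in Python. It is an illustration only, not a method taken from any of the work listed in this post: the environment is a hypothetical three-action "bandit" that pays a reward of 1 with a hidden probability per action, and the agent uses a simple epsilon-greedy rule (mostly pick the action with the best average reward so far, occasionally try a random one). The function name, the reward probabilities, and the parameter values are all assumptions chosen for the example.

```python
import random


def run_epsilon_greedy(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Run an epsilon-greedy agent on a toy Bernoulli bandit (hypothetical environment)."""
    rng = random.Random(seed)
    n_actions = len(reward_probs)
    counts = [0] * n_actions    # how many times each action was chosen
    values = [0.0] * n_actions  # running average reward observed per action
    total_reward = 0.0

    for _ in range(steps):
        # Choose an action: explore at random with probability epsilon,
        # otherwise exploit the action with the highest estimated value.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: values[a])

        # The environment responds with a reward of 1 or 0, drawn from
        # the chosen action's hidden success probability.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0

        # Learn: update the running average for the chosen action.
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]
        total_reward += reward

    return values, counts, total_reward


if __name__ == "__main__":
    values, counts, total = run_epsilon_greedy([0.2, 0.5, 0.8])
    best = max(range(len(values)), key=lambda a: values[a])
    print("estimated values:", [round(v, 2) for v in values])
    print("times each action was chosen:", counts)
    print("action the agent currently prefers:", best)
```

The agent is never told the hidden probabilities; it only sees rewards, yet over many steps its estimates tend to favor the action that pays most often. Real systems face far richer environments, but the same loop of acting, observing a reward, and updating underlies them.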

In the 1990s and 2000s, theoretical and practical work in reinforcement learning began to accelerate, leading to the rapid progress we see today. The theory behind reinforcement learning continues to advance, while its applications in real-world scenarios are leading to meaningful impact in many areas – from training autonomous systems to operate more safely and reliably in real-world environments, to making games more engaging and entertaining, to delivering more personalized information and experiences on the web.

Below is a timeline of advances that researchers and their collaborators across Microsoft have made in reinforcement learning, along with key milestones in the field generally.

Foundational work in reinforcement learning (1992-2014)

In 1992, this paper and its REINFORCE algorithm were instrumental in the development of policy optimization algorithms.
This 1995 paper (and a later journal version) presented a novel approach to solving the “multiarmed bandit problem” without making any statistical assumptions about the distribution of payoffs.
This 1998 paper (and a later journal version) showed how to learn optimal behavior in solving Markov Decision Processes generally.
This 2002 paper established the first conditions under which learning to improve a policy locally yields an optimal policy.
In 2007, bandits that are generalized to use features and context are named contextual bandits.
Also in 2007, the first public version of Vowpal Wabbit is released, offering fast, efficient and flexible online machine learning techniques, as well as other machine learning approaches. John Langford and several of his colleagues on this project later join Microsoft Research to continue their work.
Microsoft researcher John Langford presents a tutorial on interactive learning at the Neural Information Processing Systems conference. (NIPS 2013)
In 2014, Richard Sutton and Andrew Barto publish Reinforcement Learning: An Introduction, recounting work in the field that began in the late 1970s.

2016

Work begins on Project Malmo

Researchers at Microsoft Research Cambridge introduce the Malmo Platform for Artificial Intelligence Experimentation (Project Malmo), which uses Minecraft as a platform to help AI learn to make sense of complex environments, learn from others, interact with the world, learn transferable skills and apply them to solve new problems.
- Publication
  
  The Malmo Platform for Artificial Intelligence Experimentation
- Download
  
  Malmo

2017

AirSim for real world RL

Microsoft researchers begin work on the Aerial Informatics and Robotics Platform (AirSim), an open-source robotics simulation platform that designers can use to generate the massive datasets required to train ground vehicles, wheeled robotics, aerial drones and other devices – without costly real-world field operations.
- Podcast
  
  Autonomous systems, aerial robotics and Game of Drones with Gurdeep Pall and Dr. Ashish Kapoor
- Blog
  
  Toward AI that operates in the real world

Hybrid Reward Architecture wins Ms. Pac-Man

The Hybrid Reward Architecture project is established, combining standard reinforcement learning techniques with deep neural networks, with the aim of outperforming humans in Arcade Learning Environment (ALE) games. It achieves a perfect score on Ms. Pac-Man.
- Blog
  
  Hybrid Reward Architecture (HRA) Achieving super-human performance on Ms. Pac-Man
- Publication
  
  Hybrid Reward Architecture for Reinforcement Learning
- Podcast
  
  Hybrid Reward Architecture and the Fall of Ms. Pac-Man with Dr. Harm van Seijen

2018

Bonsai: RL for autonomous systems

Microsoft acquires Bonsai, which developed a novel “machine teaching” approach that is based on reinforcement learning but abstracts away its low-level mechanics. This enables subject matter experts to specify and train autonomous systems to accomplish tasks, regardless of their AI experience.

Teaching agents language, decision-making using games

Microsoft Research Montreal researchers introduce TextWorld, an open-source, extensible engine that generates and simulates text games. This can be used to train reinforcement learning agents to learn skills such as language understanding and grounding, as well as sequential decision-making.
- Download
  
  TextWorld
- Blog
  
  TextWorld: A learning environment for training reinforcement learning agents, inspired by text-based games
- Publication
  
  TextWorld: A Learning Environment for Text-based Games

Podcast: Malmo, Minecraft and machine learning with Dr. Katja Hofmann

Podcast excerpt: “I look at how artificial agents can learn to interact with complex environments. And I’m particularly excited about possibilities of those environments being ones where they interact with humans. So, one area is, for example, in video games, where AI agents that learn to interact intelligently could really enrich video games and create new types of experiences. For example, learn directly from their interactions with players, remember what kinds of interactions they’ve had and be really more relatable and more responsive to what is actually going on in the game and how they’re interacting with the player.” Katja Hofmann, Principal Researcher, Microsoft Research Cambridge.

- Blog
  
  Project Malmo: Reinforcement learning in a complex world
- Blog
  
  Challenge accepted – MARLÖ competition among conference highlight

2019

Microsoft launches Azure Cognitive Services Personalizer

Microsoft researchers establish the Real World Reinforcement Learning project, with the goal of enabling the next generation of machine learning using interactive reinforcement-based approaches to solve real-world problems.

One result of this work is the Azure Cognitive Services Personalizer, built on Microsoft Research’s Custom Decision Service and also supported by Vowpal Wabbit. In addition to its availability to the developer community, it is used by many teams at Microsoft, including Xbox, MSN, Microsoft.com and the Experiences & Devices division.
- Blog
  
  Real world interactive learning at cusp of enabling new class of applications
- Tutorial
  
  ICML 2017 Tutorial on Real World Interactive Learning

Game of Drones competition

At NeurIPS, Microsoft researchers host the first “Game of Drones” competition, in which teams race a quadrotor drone in AirSim to push the boundaries of building competitive autonomous systems. The competition focuses on trajectory planning and control, computer vision, and opponent drone avoidance.
- Blog
  
  Game of Drones at NeurIPS 2019: Simulation-based drone-racing competition built on AirSim

Project Paidia established

Microsoft Research Cambridge and game developer Ninja Theory establish Project Paidia, to drive state-of-the-art research in reinforcement learning aimed at novel applications in modern video games. Specifically, its early work focuses on creating agents that learn to collaborate with human players.

Also in 2019:
- AirSim is released on the Unity platform.
- Microsoft holds its first MineRL competition on sample-efficient reinforcement learning, in which participants attempt to mine a diamond in Minecraft using only four days of training time. The top solutions are recounted in this 2020 paper.
- Blog
  
  Three new reinforcement learning methods aim to improve AI in gaming and beyond
- Podcast
  
  Reinforcement learning for the real world with Dr. John Langford and Rafah Hosn
- Video
  
  MineRL Competition 2019
- Webinar
  
  Exploring Reinforcement Learning Methods from Algorithm to Application
- Webinar
  
  Reinforcement learning for the real world with Dr. John Langford and Rafah Hosn

2020

At Microsoft Build, the company makes Project Bonsai available for public preview, and introduces the Moab robotics platform for developers to test its capabilities.
- Podcast
  
  Provably efficient reinforcement learning with Dr. Akshay Krishnamurthy
- Blog
  
  Microsoft holds its second MineRL competition
- Blog
  
  NeurIPS 2020: Moving toward real-world reinforcement learning via batch RL, strategic exploration, and representation learning

2021

Microsoft will host its third Reinforcement Learning Day event in January 2021. This virtual workshop will feature talks by a number of outstanding speakers whose research covers a broad swath of the topic, from statistics to neuroscience, from computer science to control. A key objective is to bring together the research communities of all these areas to learn from each other and build on the latest knowledge.
