Using the Ornstein-Uhlenbeck process for random exploration

Johannes Nauta, Yara Khaluf, Pieter Simoens

Research output: Chapter in Book/Report/Conference proceeding › Conference paper › Academic › peer-review

3 Citations (Scopus)

Abstract

In model-based Reinforcement Learning, an agent aims to learn a transition model between attainable states. Since the agent initially has zero knowledge of the transition model, it needs to resort to random exploration in order to learn the model. In this work, we demonstrate how the Ornstein-Uhlenbeck process can be used as a sampling scheme to generate exploratory Brownian motion in the absence of a transition model. Whereas current approaches rely on knowledge of the transition model to generate the steps of Brownian motion, the Ornstein-Uhlenbeck process does not. Additionally, the Ornstein-Uhlenbeck process naturally includes a drift term originating from a potential function. We show that this potential can be controlled by the agent itself, and allows executing non-equilibrium behavior such as ballistic motion or local trapping.
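The sampling idea in the abstract can be illustrated with a minimal sketch. It assumes the textbook one-dimensional Ornstein-Uhlenbeck form dx = -theta (x - mu) dt + sigma dW, discretized with the Euler-Maruyama method, where the drift is the negative gradient of the quadratic potential U(x) = 0.5 theta (x - mu)^2. The function name `simulate_ou` and the parameters `theta`, `mu`, `sigma`, and `dt` are illustrative choices, not taken from the paper, and the paper's own formulation may differ.

```python
import math
import random


def simulate_ou(n_steps, dt=0.01, theta=1.0, mu=0.0, sigma=1.0, x0=0.0, seed=None):
    """Sample a 1-D Ornstein-Uhlenbeck path with Euler-Maruyama steps.

    Assumed (hypothetical) parameterization:
        dx = -theta * (x - mu) * dt + sigma * dW
    No transition model is consulted; each step needs only the current
    position and a Gaussian random increment.
    """
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(n_steps):
        # Drift is -dU/dx for the quadratic potential U(x) = 0.5 * theta * (x - mu)**2.
        drift = -theta * (x - mu)
        # Brownian increment over a step of length dt has standard deviation sqrt(dt).
        noise = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        x = x + drift * dt + noise
        path.append(x)
    return path


if __name__ == "__main__":
    free = simulate_ou(1000, theta=0.0, seed=1)      # no drift: plain Brownian motion
    trapped = simulate_ou(1000, theta=5.0, seed=1)   # strong pull toward mu
    print(len(free), len(trapped))
```

In this sketch, setting `theta = 0` removes the drift and leaves plain Brownian motion, while a larger `theta` keeps the walker near `mu`, which is one simple reading of the local trapping mentioned in the abstract.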

Original language: English
Title of host publication: COMPLEXIS 2019 - Proceedings of the 4th International Conference on Complexity, Future Information Systems and Risk
Editors: Victor Mendez Munoz, Farshad Firouzi, Ernesto Estrada, Victor Chang
Publisher: SciTePress
Pages: 59-66
Number of pages: 8
ISBN (Electronic): 9789897583667
Publication status: Published - 2019
Externally published: Yes
Event: 4th International Conference on Complexity, Future Information Systems and Risk, COMPLEXIS 2019 - Heraklion, Crete, Greece
Duration: 2 May 2019 to 4 May 2019

Conference

Conference: 4th International Conference on Complexity, Future Information Systems and Risk, COMPLEXIS 2019
Country/Territory: Greece
City: Heraklion, Crete
Period: 2/05/19 to 4/05/19

Keywords

  • Brownian Motion
  • Exploration
  • Ornstein-Uhlenbeck Process
