OAR Format

The OAR format is a set of rules for checking that a pandas DataFrame is valid for SARA reinforcement learning applications.

Let us construct and validate such an OAR DataFrame for a simple golf environment:

  • Observations: the ball position

  • Actions: the ball move

  • Rewards: the negative distance from the hole (at 0)

  • Episodes: truncated after 10 shots
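To make the environment concrete, here is a minimal sketch of one shot and of a full episode. The names `golf_step` and `rollout` are hypothetical (they are not part of SARA), and the dynamics are an assumption chosen to match the data generated below: the ball moves by the action, and the reward is the negative distance of the new position from the hole.

```python
from __future__ import annotations

import numpy as np


def golf_step(position: float, move: float) -> tuple[float, float]:
    """Hypothetical one-shot transition: move the ball, reward = -distance to hole."""
    next_position = position + move
    reward = -abs(next_position)  # 0 only when the ball is exactly in the hole
    return next_position, reward


def rollout(initial_position: float, moves: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Play one episode; it is truncated after len(moves) shots."""
    positions = [initial_position]
    rewards = []
    for move in moves:
        next_position, reward = golf_step(positions[-1], float(move))
        positions.append(next_position)
        rewards.append(reward)
    return np.array(positions), np.array(rewards)


# Start at 0.5 and play 10 shots of -0.1 towards (and then past) the hole.
positions, rewards = rollout(0.5, np.full(10, -0.1))
```

An episode of 10 shots yields 11 positions (the start plus one per shot) but only 10 rewards.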

Basic setup

Import the libraries, set some parameters and seed the random number generator:

>>> import numpy as np
>>> import pandas as pd
>>> num_episodes = 1000
>>> len_episode = 10
>>> rng = np.random.default_rng(42)

Generate some random actions:

>>> moves = rng.normal(size=(num_episodes, len_episode))
>>> moves = moves / len_episode

Generate the observations (ball positions) as the cumulative sum of the moves, starting from a random initial position:

>>> initial_positions = rng.normal(size=(num_episodes, 1))
>>> positions = np.concatenate([initial_positions, moves], axis=1)
>>> positions = np.cumsum(positions, axis=1)
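The same construction on a tiny, hand-written example shows why `positions` has one more column than `moves`: column `k` is the position before shot `k`, and column `k + 1` is the position after it. The sizes and values here are illustrative only.

```python
import numpy as np

# Tiny version of the construction above: 2 episodes, 3 shots each.
initial_positions = np.array([[1.0], [-2.0]])  # shape (2, 1)
moves = np.array([[0.1, 0.2, 0.3],
                  [0.5, 0.5, 0.5]])            # shape (2, 3)

# Prepend the initial position, then accumulate the moves along time.
positions = np.cumsum(np.concatenate([initial_positions, moves], axis=1), axis=1)

print(positions.shape)                                 # (2, 4)
# Consecutive differences recover the moves.
print(np.allclose(np.diff(positions, axis=1), moves))  # True
```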

Generate the rewards as the negative distance of each position from 0:

>>> rewards = -np.abs(positions)
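Since `rewards` is computed from `positions`, it also has `len_episode + 1` columns. If a reward is observed after the shot, the reward of shot `k` sits in column `k + 1`, so the per-shot rewards are `rewards[:, 1:]`. A toy check with made-up values:

```python
import numpy as np

positions = np.array([[0.3, 0.1, -0.2]])  # one episode, two shots
rewards = -np.abs(positions)              # one reward per position

# The reward observed after shot k belongs to positions[:, k + 1].
rewards_after_shots = rewards[:, 1:]
print(rewards_after_shots)  # [[-0.1 -0.2]]
```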

The data is ready; let us see how to organize it into the OAR format.

Pack data into an OAR DataFrame

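The DataFrame columns should be a MultiIndex with exactly two levels, named “signal” and “key” respectively. The key names each quantity within a signal (here “position”, “move” and “distance”), while the signal is the type of data in the RL process: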

>>> from sara.oar.schemas import SIGNALS
>>> SIGNALS == {'rew1', 'obs1', 'act0', 'term1', 'obs0', 'rtg0'}
True

Only “act0” and “rew1” are mandatory; the others are optional.
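These two rules on signals can be sketched as a plain pandas check. `check_signals` is a hypothetical helper written for illustration; it is not SARA's validation code and does not replace `OARSchema`, it only mirrors the rules stated above.

```python
import pandas as pd

# Signal names copied from the SIGNALS set shown above.
KNOWN_SIGNALS = {"obs0", "act0", "rew1", "obs1", "term1", "rtg0"}
MANDATORY_SIGNALS = {"act0", "rew1"}


def check_signals(df: pd.DataFrame) -> None:
    """Hypothetical helper (not part of SARA) checking the "signal" column level."""
    signals = set(df.columns.get_level_values("signal"))
    unknown = signals - KNOWN_SIGNALS
    missing = MANDATORY_SIGNALS - signals
    if unknown:
        raise ValueError(f"unknown signals: {sorted(unknown)}")
    if missing:
        raise ValueError(f"missing mandatory signals: {sorted(missing)}")


columns = pd.MultiIndex.from_tuples(
    [("act0", "move"), ("rew1", "distance")], names=["signal", "key"])
check_signals(pd.DataFrame([[0.1, -0.5]], columns=columns))  # passes: no error

try:
    check_signals(pd.DataFrame([[0.1]], columns=columns[:1]))  # no "rew1"
except ValueError as err:
    print(err)  # missing mandatory signals: ['rew1']
```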

Let us pack the data into a pandas DataFrame:

>>> df = pd.DataFrame({
...     ("obs0", "position"): positions[:, :-1].flatten(),  # before each shot
...     ("act0", "move"): moves.flatten(),
...     ("rew1", "distance"): rewards[:, 1:].flatten(),  # after each shot
... })

and name the two column levels:

>>> df.columns.names = ["signal", "key"]
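A DataFrame built from a dict with tuple keys gets a two-level column MultiIndex, which is what the “signal”/“key” layout relies on. A small sketch with made-up values:

```python
import pandas as pd

# Tuple keys in the dict become a two-level column MultiIndex.
df = pd.DataFrame({
    ("obs0", "position"): [0.3, 0.1],
    ("act0", "move"): [-0.2, 0.05],
    ("rew1", "distance"): [-0.1, -0.15],
})
df.columns.names = ["signal", "key"]

print(df.columns.nlevels)                           # 2
print(list(df.columns.get_level_values("signal")))  # ['obs0', 'act0', 'rew1']
# Selecting one signal returns a DataFrame whose columns are that signal's keys.
print(list(df["act0"].columns))                     # ['move']
```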

The DataFrame index should also be a MultiIndex with:

  • at least one index level for episodes

  • exactly one level named “date” for the time within an episode
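The index rules can likewise be sketched in a few lines of pandas. `check_index` is a hypothetical helper that only mirrors the two rules listed above; the actual validation is done by SARA.

```python
import pandas as pd


def check_index(df: pd.DataFrame) -> None:
    """Hypothetical helper (not part of SARA) mirroring the index rules above."""
    names = list(df.index.names)
    if names.count("date") != 1:
        raise ValueError('expected exactly one index level named "date"')
    if len(names) < 2:
        raise ValueError('expected at least one episode level besides "date"')


dates = pd.date_range("2000-01-01 10:00", periods=2, freq="min")
good = pd.DataFrame(
    {"x": range(4)},
    index=pd.MultiIndex.from_product([[0, 1], dates], names=["episodes", "date"]))
check_index(good)  # passes: one episode level plus "date"

bad = pd.DataFrame({"x": range(2)}, index=pd.Index(dates, name="date"))
try:
    check_index(bad)
except ValueError as err:
    print(err)  # expected at least one episode level besides "date"
```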

Let us set a MultiIndex with episode IDs and a date for each shot:

>>> episodes = list(range(num_episodes))
>>> dates = pd.date_range("2000-01-01 10:00", periods=len_episode, freq="min")
>>> idx = pd.MultiIndex.from_product(
...     [episodes, dates], names=["episodes", "date"])
>>> df = df.set_index(idx)
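This index lines up with the flattened arrays because both use row-major order: `flatten()` (C order by default) walks through each episode's shots in turn, and `MultiIndex.from_product` also varies its last level fastest. A toy check with values that encode their own (episode, shot) position:

```python
import numpy as np
import pandas as pd

num_episodes, len_episode = 3, 4
# values[e, t] encodes episode e and shot t as e * 10 + t.
values = np.arange(num_episodes)[:, None] * 10 + np.arange(len_episode)

dates = pd.date_range("2000-01-01 10:00", periods=len_episode, freq="min")
idx = pd.MultiIndex.from_product(
    [range(num_episodes), dates], names=["episodes", "date"])
s = pd.Series(values.flatten(), index=idx)

# Each value lands on its own (episode, date) pair.
print(s.loc[(2, dates[1])])  # 21
```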

Each row of the DataFrame corresponds to one transition. The suffix “0” in a signal name (e.g. “obs0”, “act0”) marks data that come before the environment step. Conversely, the suffix “1” (e.g. “obs1”, “rew1”) marks data that come after the environment step.
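The optional “obs1” signal could be added in the same way. This sketch assumes it stores the position after each shot, as the suffix convention suggests; within an episode, the “obs1” of a transition then equals the “obs0” of the next one. It uses a single toy episode, because with several episodes the shift would have to be done per episode (e.g. with `groupby(level="episodes")`).

```python
import numpy as np
import pandas as pd

# One toy episode: 3 shots, hence 4 positions.
positions = np.array([[0.3, 0.1, -0.2, 0.0]])
moves = np.diff(positions, axis=1)

df = pd.DataFrame({
    ("obs0", "position"): positions[:, :-1].flatten(),  # before each shot
    ("act0", "move"): moves.flatten(),
    ("obs1", "position"): positions[:, 1:].flatten(),   # after each shot
    ("rew1", "distance"): -np.abs(positions[:, 1:]).flatten(),
})
df.columns.names = ["signal", "key"]

# Within an episode, obs1 of a transition is obs0 of the next one.
next_obs0 = df[("obs0", "position")].shift(-1)
print(np.allclose(df[("obs1", "position")].iloc[:-1], next_obs0.iloc[:-1]))  # True
```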

Now we can validate our OAR DataFrame:

>>> from sara.oar import OARSchema
>>> df = OARSchema.validate(df)

Congratulations, you now have a well-structured OAR DataFrame! Let’s save it for the next guides.

>>> df.to_pickle("docs/source/golf_oar.pkl")
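A later guide can reload the file with `pd.read_pickle`. The round trip below writes to a temporary directory instead of `docs/source/` so it runs anywhere; unlike a CSV export, pickling preserves the MultiIndex columns and their level names.

```python
import os
import tempfile

import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [("act0", "move"), ("rew1", "distance")], names=["signal", "key"])
df = pd.DataFrame([[0.1, -0.5], [-0.2, -0.3]], columns=columns)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "golf_oar.pkl")
    df.to_pickle(path)
    reloaded = pd.read_pickle(path)

# The reloaded frame is identical, column MultiIndex included.
pd.testing.assert_frame_equal(df, reloaded)
print(list(reloaded.columns.names))  # ['signal', 'key']
```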