Computing Return-to-go
======================

Let us load the previous OAR dataframe:

>>> import pandas as pd
>>> df = pd.read_pickle("docs/source/golf_oar.pkl")

Return-to-go (rtg) is the discounted sum of rewards from the current date
until the episode's end. It is an important signal, since it is what actions
should maximize. However, computing rtg for truncated episodes (as is the
case here) is a bit tricky, since truncated episodes never actually reach
their end. By default, our method of rtg computation assumes such episodes
are infinite and scales them properly, so that the rtg at the beginning of
the episode and the rtg near the truncation are comparable.

>>> from sara.oar import enrich_rtg
>>> df = enrich_rtg(df, discount=0.75)
>>> df.head()  # doctest: +NORMALIZE_WHITESPACE
signal                            obs0      act0      rew1                rtg0
key                           position      move  distance cumulative distance
episodes date
0        2000-01-01 10:00:00  0.176277  0.030472 -0.176277           -0.631686
         2000-01-01 10:01:00  0.206749 -0.103998 -0.206749           -0.605225
         2000-01-01 10:02:00  0.102750  0.075045 -0.102750           -0.523078
         2000-01-01 10:03:00  0.177795  0.094056 -0.177795           -0.566191
         2000-01-01 10:04:00  0.271852 -0.195104 -0.271852           -0.507397

As you can see, `enrich_rtg` simply adds a new ("rtg0", "cumulative distance")
column to the dataframe.

Because it is not easy to select a discount, we provide a simple helper
function that returns a discount based on a horizon, chosen so that the rtg
tail beyond this horizon is relatively negligible.

>>> from sara.oar import discount_from_horizon
>>> discount_from_horizon(10)
0.7411344491069477

Congrats, you now have a dataframe enriched with rtg. Let's save it for the
next guide:

>>> df.to_pickle("docs/source/golf_enriched.pkl")
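To build intuition for what return-to-go is, here is a minimal sketch of the
plain (non-truncation-aware) quantity: a reverse discounted cumulative sum of
rewards. The function name `return_to_go` and its signature are ours for
illustration only; this is not the sara API, and it omits the infinite-episode
scaling that `enrich_rtg` applies for truncated episodes.

```python
def return_to_go(rewards, discount):
    """Discounted sum of future rewards at each step: rtg[t] = r[t] + discount * rtg[t+1]."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the already-computed tail.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + discount * running
        rtg[t] = running
    return rtg

print(return_to_go([1.0, 1.0, 1.0], 0.5))  # [1.75, 1.5, 1.0]
```

Note how the rtg at the start of the episode aggregates all future rewards,
which is why actions are trained to maximize it.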
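As a hedged sketch of how such a helper could be derived (the actual sara
implementation may differ): choose the discount so that `discount**horizon`
equals a small tail weight. With a 5% tail, this reproduces the value shown
above for a horizon of 10. The `tail` parameter is our assumption, not a
documented argument.

```python
def discount_from_horizon_sketch(horizon, tail=0.05):
    """Discount such that rewards beyond `horizon` steps carry only `tail` weight."""
    # Solve discount**horizon == tail for discount.
    return tail ** (1.0 / horizon)

print(discount_from_horizon_sketch(10))  # 0.7411344491069477
```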