Pandera Schemas

Pandera schemas (see the Pandera documentation) are used to validate the OAR structure of a dataframe.

You can define your own schema that specializes sara.oar.OARSchema:

from typing import Final

import numpy as np
import pandas as pd
import pandera.pandas as pa
from pandera.typing.pandas import Index, Series

from sara.oar import OARSchema


class GolfSchema(OARSchema):
    """Pandera OAR Schema for golf env."""

    position: Series[pd.Float64Dtype] = pa.Field(
        alias=("obs0", "position"),  # alias: the (signal, key) name of the MultiIndex column
        metadata={"description": "the ball position"},  # metadata can hold a
        # description of each variable
    )
    move: Series[pd.Float64Dtype] = pa.Field(
        alias=("act0", "move"),
        metadata={"description": "the ball variation of position"},
    )
    distance: Series[pd.Float64Dtype] = pa.Field(
        alias=("rew1", "distance"),
        le=0.0,  # you can use validators to impose conditions. Here, `le`
        #         ensures the distance is never positive (<= 0)
        metadata={"description": "the negative distance from the hole"},
    )
    episodes: Index[pd.Int64Dtype] = pa.Field(
        metadata={"description": "the episode number"},
    )
    date: Index[np.datetime64] = pa.Field(
        metadata={"description": "date of the move"},
    )

    class Config:
        metadata: Final[dict] = {
            "description": (
                "A simple golf environment. "
                "The objective is to shot a ball in a hole"
            ),
            "process": (
                "1. Agent observes ball position.\n"
                "2. Agent shots the ball.\n"
                "3. Ball moves.\n"
            ),
        }

With this schema, you can check that a dataframe has a valid structure:

>>> import pandas as pd
>>> from examples.golf import GolfSchema
>>> df = pd.read_pickle("docs/source/golf_oar.pkl")
>>> GolfSchema.validate(df).head()
signal                            obs0      act0      rew1
key                           position      move  distance
episodes date
0        2000-01-01 10:00:00  0.176277  0.030472 -0.176277
         2000-01-01 10:01:00  0.206749 -0.103998 -0.206749
         2000-01-01 10:02:00  0.102750  0.075045 -0.102750
         2000-01-01 10:03:00  0.177795  0.094056 -0.177795
         2000-01-01 10:04:00  0.271852 -0.195104 -0.271852
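
The layout printed above can also be rebuilt by hand, which makes the role of the tuple aliases easier to see. The following is only a sketch using plain pandas: the first two rows are copied from the output above, the column level names "signal" and "key" are read from that output, and the index dtypes are left at the pandas defaults, so this frame is not guaranteed to pass GolfSchema as is.

```python
import pandas as pd

# Two-level column header: (signal, key), as in the printed dataframe.
columns = pd.MultiIndex.from_tuples(
    [("obs0", "position"), ("act0", "move"), ("rew1", "distance")],
    names=["signal", "key"],
)

# Two-level row index: (episodes, date), one row per minute.
index = pd.MultiIndex.from_product(
    [[0], pd.date_range("2000-01-01 10:00:00", periods=2, freq="min")],
    names=["episodes", "date"],
)

df = pd.DataFrame(
    [
        [0.176277, 0.030472, -0.176277],
        [0.206749, -0.103998, -0.206749],
    ],
    index=index,
    columns=columns,
).astype("Float64")  # nullable float dtype, as in Series[pd.Float64Dtype]

# A column is addressed by the same tuple that the schema uses as alias.
position = df[("obs0", "position")]
```

Each `alias=(...)` in the schema therefore names one full column key of this two-level header.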

Because GolfSchema inherits from OARSchema, the same dataframe also validates against the base schema:

>>> from sara.oar import OARSchema
>>> df = OARSchema.validate(df)

You can also annotate functions to indicate that they return dataframes conforming to a given schema:

import pandas as pd
import pandera.pandas as pa
from pandera.typing.pandas import DataFrame

from examples.golf import GolfSchema


@pa.check_types  # use this to check schemas at runtime
def read_golf_oar(
    num_episodes: int = 1000,
    len_episode: int = 10,
    seed: int = 42,
) -> DataFrame[GolfSchema]:
    """Load the pickled golf dataframe (the arguments are unused in this example)."""
    df = pd.read_pickle("docs/source/golf_oar.pkl")
    return DataFrame[GolfSchema](df)

To display the schema-level description, you can use the following (textwrap is optional; it only wraps the text for display):

>>> import textwrap
>>> from examples.golf import GolfSchema
>>> print(textwrap.fill(GolfSchema.Config.metadata["description"], width=80))
A simple golf environment. The objective is to shot a ball in a hole
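
One caveat when applying the same call to the "process" entry: by default textwrap.fill replaces newlines with spaces, so the numbered steps end up on a single line. A small sketch that wraps each line separately instead; it uses a literal copy of the string from the schema above rather than importing GolfSchema, and wrap_lines is a hypothetical helper name:

```python
import textwrap

# Literal copy of GolfSchema.Config.metadata["process"] from the schema above.
process = (
    "1. Agent observes ball position.\n"
    "2. Agent shots the ball.\n"
    "3. Ball moves.\n"
)


def wrap_lines(text: str, width: int = 80) -> str:
    """Wrap each line on its own so that existing line breaks are kept."""
    return "\n".join(textwrap.fill(line, width=width) for line in text.splitlines())


collapsed = textwrap.fill(process, width=80)  # newlines become spaces
preserved = wrap_lines(process, width=80)  # one step per line

print(preserved)
```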

To display the description of a specific field, you can use:

>>> print(GolfSchema.to_schema().columns[("obs0", "position")].metadata["description"])
the ball position