Pandera Schemas
Pandera schemas (see the Pandera documentation) are used to validate the OAR structure of a dataframe.
You can write your own schema by subclassing sara.oar.OARSchema and declaring the fields of your environment:
from typing import Final

import numpy as np
import pandas as pd
import pandera.pandas as pa
from pandera.typing.pandas import Index, Series

from sara.oar import OARSchema


class GolfSchema(OARSchema):
    """Pandera OAR Schema for golf env."""

    position: Series[pd.Float64Dtype] = pa.Field(
        alias=("obs0", "position"),  # the alias is the (signal, key) name of the MultiIndex column
        metadata={"description": "the ball position"},  # metadata can hold
        # a description of the variable
    )
    move: Series[pd.Float64Dtype] = pa.Field(
        alias=("act0", "move"),
        metadata={"description": "the variation of the ball position"},
    )
    distance: Series[pd.Float64Dtype] = pa.Field(
        alias=("rew1", "distance"),
        le=0.0,  # validators impose conditions: here `le` guarantees
        # that the distance is never positive
        metadata={"description": "the negative distance from the hole"},
    )
    episodes: Index[pd.Int64Dtype] = pa.Field(
        metadata={"description": "the episode number"},
    )
    date: Index[np.datetime64] = pa.Field(
        metadata={"description": "date of the move"},
    )

    class Config:
        metadata: Final[dict] = {
            "description": (
                "A simple golf environment. "
                "The objective is to shot a ball in a hole"
            ),
            "process": (
                "1. Agent observes ball position.\n"
                "2. Agent shoots the ball.\n"
                "3. Ball moves.\n"
            ),
        }
With GolfSchema, you can check that a dataframe has a valid structure:
>>> import pandas as pd
>>> from examples.golf import GolfSchema
>>> df = pd.read_pickle("docs/source/golf_oar.pkl")
>>> GolfSchema.validate(df).head()
signal                            obs0      act0      rew1
key                           position      move  distance
episodes date
0        2000-01-01 10:00:00  0.176277  0.030472 -0.176277
         2000-01-01 10:01:00  0.206749 -0.103998 -0.206749
         2000-01-01 10:02:00  0.102750  0.075045 -0.102750
         2000-01-01 10:03:00  0.177795  0.094056 -0.177795
         2000-01-01 10:04:00  0.271852 -0.195104 -0.271852
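Because the columns form a two-level MultiIndex (signal, key) and the rows a two-level index (episodes, date), plain pandas indexing is enough to pick out one signal, one variable, or one episode. The sketch below rebuilds the first three rows printed above by hand (values copied from that output) and assumes nothing beyond pandas itself; the variable names are illustrative.

```python
import pandas as pd

# Hand-built frame mimicking the first three rows of the output above
columns = pd.MultiIndex.from_tuples(
    [("obs0", "position"), ("act0", "move"), ("rew1", "distance")],
    names=["signal", "key"],
)
index = pd.MultiIndex.from_arrays(
    [
        [0, 0, 0],
        pd.to_datetime(
            ["2000-01-01 10:00:00", "2000-01-01 10:01:00", "2000-01-01 10:02:00"]
        ),
    ],
    names=["episodes", "date"],
)
sample = pd.DataFrame(
    [
        [0.176277, 0.030472, -0.176277],
        [0.206749, -0.103998, -0.206749],
        [0.102750, 0.075045, -0.102750],
    ],
    index=index,
    columns=columns,
)

observations = sample["obs0"]  # sub-frame holding the single key "position"
distance = sample[("rew1", "distance")]  # one variable as a Series
episode0 = sample.xs(0, level="episodes")  # every step of episode 0, indexed by date
keys = sample.columns.get_level_values("key")  # position, move, distance
print(episode0)
```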
Of course, since GolfSchema inherits from OARSchema, the dataframe also validates against the base schema:
>>> from sara.oar import OARSchema
>>> df = OARSchema.validate(df)
You can also annotate functions to indicate that they return dataframes conforming to a given schema:
import pandas as pd
import pandera.pandas as pa
from pandera.typing.pandas import DataFrame

from examples.golf import GolfSchema


@pa.check_types  # validates the returned dataframe against GolfSchema at runtime
def read_golf_oar(
    num_episodes: int = 1000,
    len_episode: int = 10,
    seed: int = 42,
) -> DataFrame[GolfSchema]:
    # The parameters are unused here: this example reads pre-generated data.
    df = pd.read_pickle("docs/source/golf_oar.pkl")
    return DataFrame[GolfSchema](df)
To display the global description, you can use the following (textwrap is optional; it only wraps the text for a nicer display):
>>> import textwrap
>>> from examples.golf import GolfSchema
>>> print(textwrap.fill(GolfSchema.Config.metadata["description"], width=80))
A simple golf environment. The objective is to shot a ball in a hole
To display the description of a single field, you can use:
>>> print(GolfSchema.to_schema().columns[("obs0", "position")].metadata["description"])
the ball position