Provencia¶

Provencia sara.viz example.

>>> from examples.provencia import ProvenciaEnvSchema
>>> import textwrap
>>> print(textwrap.fill(ProvenciaEnvSchema.Config.metadata["description"], width=80))
Créé en 1963 à Annecy, le groupe Provencia est aujourd hui un acteur majeur de
la grande distribution en Rhône Alpes. Cet environnement modélise la commande de
produits pour le rayon boucherie.

Data¶

Online connection with Provencia DataBase

Q-table visualization¶

>>> from examples.provencia import etl_pipeline, env_pipeline, get_store_ids, get_product_codes
>>> from sara.oar import enrich_rtg, discount_from_horizon, bin_with_quantiles
>>> from sara.viz import plot_insight
>>> from pathlib import Path
>>> filename = Path("docs/source/provencia/provencia.png")
>>> store_ids = get_store_ids().index.to_list()
>>> product_codes = get_product_codes(store_ids)
>>> sausage_codes = product_codes.index[
...     product_codes['description'].str.contains('CHIPOS MERGUEZ')].to_list()
>>> df = etl_pipeline(store_ids, sausage_codes)
>>> df = env_pipeline(df)
>>> df = enrich_rtg(df, discount_from_horizon(30))
>>> num_quantiles = {
...    ('act0', 'ordered'): 10,
...    ('obs0', 'stock_after_delivery'): 10}
>>> df = bin_with_quantiles(df, num_quantiles)
>>> _,_,_=plot_insight(
...     df,
...     col_labels=list(num_quantiles),
...     filename=filename)

Interpretation¶

High returns are found for high ordering and high score, indicating that it is important to keep a high level of stock. Also, very high stock levels leading to throwing away are not experienced. Together with the fact that 0 ordering is more often selected, this indicates that Provencia should move toward a more important purchasing policy.

API reference¶

class examples.provencia.ProvenciaETLSchema(*args, **kwargs)[source]¶: Describe Schema for Provencia ETL dataframe.

class examples.provencia.ProvenciaEnvSchema(*args, **kwargs)[source]¶: Pandera OAR Schema for provencia supply chain environment.

examples.provencia.env_pipeline(df_etl: DataFrame, lag_profit: int = 0) → DataFrame[OARSchema][source]¶

Transform Provencia ETL dataframe into Env dataframe with environment definition.

Parameters:

df_etl – dataframe from etl_pipeline()
lag_profit – number of days between ‘total_paid’ (at ‘date’) and ‘gross’ (at ‘date’+lag_profit) in profit calculcation for the reward.

Returns:

OAR dataframe formatted with environment

Return type:

DataFrame[OARSchema]

Exemples:

>>> from examples.provencia import etl_pipeline
>>> df = etl_pipeline(
...          ['JE', 'KV', 'EV'],
...          [2870622000000, 2870557000000, 2870549000000])
>>> env_pipeline(df)
signal                               obs0                                                           act0    rew1
key                               sold_yd gross_yd stock_after_delivery delivered purchase_price ordered  profit
store_id product_code  date
EV       2870549000000 2023-12-06       0        0                    5         0            647       0     0.0
                       2023-12-07       0        0                    5         0            647       0     0.0
                       2023-12-08       0        0                    5         0            647       0  1834.0
                       2023-12-09       4     1834                    1         0            647       0     0.0
                       2023-12-10       0        0                    1         0            647       0     0.0
...                                   ...      ...                  ...       ...            ...     ...     ...
         2870622000000 2024-07-28       0        0                    0         0            560       0     0.0
                       2024-07-29       0        0                    0         0            560       0     0.0
                       2024-07-30       0        0                    0         0            560       0     0.0
                       2024-07-31       0        0                    0         0            560       0     0.0
                       2024-08-01       0        0                    0         0            560       1 -1680.0

[515 rows x 7 columns]

examples.provencia.etl_pipeline(store_ids: list[str], product_codes: list[int]) → DataFrame[ProvenciaETLSchema][source]¶

Provencia SQL connector.

Parameters:

store_ids – the list of store ids
product_codes – the list of product codes

Returns:

the Provencia ETL dataframe,: see ProvenciaETLSchema for a description of its structure.

Return type:

DataFrame[ProvenciaETLSchema]

Examples

>>> etl_pipeline(
...     ['JE', 'KV', 'EV'],
...     [2870622000000, 2870557000000, 2870549000000])
                                   delivered  purchase_price  ordered  total_paid  sold  gross  stock_evening
store_id product_code  date
EV       2870549000000 2023-12-05          4             647        0         0.0     0      0              5
                       2023-12-06          0             647        0         0.0     0      0              5
                       2023-12-07          0             647        0         0.0     0      0              5
                       2023-12-08          0             647        0         0.0     4   1834              1
                       2023-12-09          0             647        0         0.0     0      0              1
...                                      ...             ...      ...         ...   ...    ...            ...
         2870622000000 2024-07-28          0             560        0         0.0     0      0              0
                       2024-07-29          0             560        0         0.0     0      0              0
                       2024-07-30          0             560        0         0.0     0      0              0
                       2024-07-31          0             560        0         0.0     0      0              0
                       2024-08-01          0             560        1      1680.0     0      0              0

[517 rows x 7 columns]

Note

product_code=2870626000000 (foie de porc) is problematic in order table

examples.provencia.get_product_codes(store_ids: list[str]) → DataFrame[source]¶

Get all product codes and description in specified shops.

Parameters:: store_ids – a list of store ids to get product codes from
Returns:: the product_codes as index, the description as column
Return type:: pd.DataFrame

Example

>>> get_product_codes(['JE', 'KV', 'EV'])
                       description
product_code
2870622000000      *PO-SAUTE X 3KG
2870557000000            PORC FOIE
2870549000000   KG SAUTE PORC FRAN
2477358000000  *EL-ROTI VEAU FARCI
2276357000000  *SA-SAUCI MENAGE X3
...                            ...
2477424000000  *VO-CUISSE DE POULE
2443970000000     SAUCISSON TRUFFE
2280328000000    EL-CREPINETTE X 3
2224892000000   GIGOT *** ENTIER S
2610547000000     BLOC DE DINDE RA

[519 rows x 1 columns]

examples.provencia.get_store_ids() → DataFrame[source]¶

Get all the store ids and name as DataFrame.

Specifically, the store_ids in public.order table.

Returns:: the store ids as index, the store name as column
Return type:: pd.DataFrame

Examples

>>> get_store_ids()
                              name
store_id
JE                          Seynod
KV            Saint-Jeoire-Prieuré
EV                        Faverges
BS                 Annecy-le-Vieux
MO                    Villeurbanne
...                            ...
LW                Thonon-les-Bains
KM           Saint-Jean-de-Moirans
FW                             Gex
GF                   Grésy-sur-Aix
EM                        Douvaine