Main entry point

This file defines the main pipeline. It calls the data-cleaning functions and saves the final parquet file.

main.customerPrediction(df: DataFrame)

Runs the full customer prediction pipeline.

Parameters:

df (pd.DataFrame) – The cleaned DataFrame.

main.itemPrediction(df: DataFrame)

Runs the full item prediction pipeline.

Parameters:

df (pd.DataFrame) – The cleaned DataFrame.

main.main()

Runs the full pipeline including the predictions.
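The source does not show how main() chains the other functions, so the following is only a minimal sketch of one plausible ordering: clean the data once, then pass the cleaned result to each prediction step. The name run_pipeline and its parameters preprocess and predictors are hypothetical and not part of this module. The callables stand in for preprocessData, customerPrediction and itemPrediction.

```python
from typing import Any, Callable, Iterable


def run_pipeline(
    preprocess: Callable[[str, str, Any], Any],
    predictors: Iterable[Callable[[Any], Any]],
    rechnung_path: str,
    kunden_path: str,
    nomi: Any = None,
) -> Any:
    """Hypothetical driver: clean the data once, then run each prediction step."""
    # Cleaning step (assumed to play the role of preprocessData).
    df = preprocess(rechnung_path, kunden_path, nomi)
    # Prediction steps (assumed to play the roles of customerPrediction
    # and itemPrediction); each receives the same cleaned data.
    for predict in predictors:
        predict(df)
    return df
```

Passing the steps in as callables keeps the sketch runnable without the real data files. In the actual module, main() presumably calls the functions directly.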

main.preprocessData(rechnung_path: str, kunden_path: str, nomi) → DataFrame

Runs the cleaning pipeline to convert the initial data into a cleaned parquet file.

Parameters:
  • rechnung_path (str) – The path to the Rechnungen_new.parquet file.

  • kunden_path (str) – The path to the Kunden.csv file.

Returns:

The cleaned DataFrame.

Return type:

pd.DataFrame