Main entry point
This file defines the main pipeline. It calls the data-cleaning functions and saves the final parquet file.
- main.customerPrediction(df: DataFrame)
Runs the full customer prediction pipeline.
- Parameters:
df (pd.DataFrame) – The cleaned DataFrame.
- main.itemPrediction(df: DataFrame)
Runs the full item prediction pipeline.
- Parameters:
df (pd.DataFrame) – The cleaned DataFrame.
- main.main()
Runs the full pipeline including the predictions.
- main.preprocessData(rechnung_path: str, kunden_path: str, nomi) → DataFrame
Runs the cleaning pipeline to convert the initial data into a cleaned parquet file.
- Parameters:
rechnung_path (str) – The path to the Rechnungen_new.parquet file.
kunden_path (str) – The path to the Kunden.csv file.
- Returns:
The cleaned DataFrame.
- Return type:
pd.DataFrame
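The call order implied by the entries above (clean first, then run both prediction pipelines on the cleaned DataFrame) can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: the function bodies are stand-ins, the column names (customer_id, item_id, amount) are hypothetical, the prediction functions are assumed to return a DataFrame (the reference above does not document a return value), and run_pipeline is a hypothetical name for what main.main() is described as doing. The stub for preprocessData ignores its paths and the undocumented nomi argument so that the example runs without any input files.

```python
import pandas as pd


def preprocessData(rechnung_path: str, kunden_path: str, nomi=None) -> pd.DataFrame:
    """Stand-in for main.preprocessData.

    The real function cleans the invoice and customer files. This stub
    ignores both paths and ``nomi`` and returns hypothetical cleaned data.
    """
    return pd.DataFrame(
        {
            "customer_id": [1, 2, 2],
            "item_id": [10, 10, 20],
            "amount": [5.0, 7.5, 2.5],
        }
    )


def customerPrediction(df: pd.DataFrame) -> pd.DataFrame:
    """Stand-in: sums amounts per customer instead of running a model."""
    return df.groupby("customer_id", as_index=False)["amount"].sum()


def itemPrediction(df: pd.DataFrame) -> pd.DataFrame:
    """Stand-in: sums amounts per item instead of running a model."""
    return df.groupby("item_id", as_index=False)["amount"].sum()


def run_pipeline(rechnung_path: str, kunden_path: str):
    """Hypothetical orchestration: clean once, then reuse the result."""
    df = preprocessData(rechnung_path, kunden_path)
    customers = customerPrediction(df)
    items = itemPrediction(df)
    return df, customers, items


if __name__ == "__main__":
    cleaned, customers, items = run_pipeline("Rechnungen_new.parquet", "Kunden.csv")
    print(customers)
    print(items)
```

Cleaning once and passing the same DataFrame to both prediction functions keeps the two pipelines consistent with each other and avoids reading the source files twice.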