Customer predictions

This part of the pipeline predicts customers' future purchases from their purchase history. The predictions use the RandomForestRegressor algorithm from the scikit-learn library; the model is trained on historical data and then used to predict future purchases. Note that predictRevenuePerCluster, documented below, forecasts with an LSTM or ARIMA model, selected via predictionType.

customerPredictions.predictRevenuePerCluster(clustered_customers_df: DataFrame, allOrders: DataFrame, predictionType: str = 'LSTM', clusterID: int = 1, modeltype: int = 1, predweeks: int = 52) → DataFrame

Takes the complete dataset and the cluster assigned to each customer, and predicts the weekly net revenue of the selected cluster for the next n weeks (n = predweeks).

Parameters:
  • clustered_customers_df (pd.DataFrame) – The output DataFrame of the clusterRFM() function, including CustomerID and the assigned cluster.

  • allOrders (pd.DataFrame) – The preprocessed DataFrame. Required columns are “OrderDate”, “CustomerID”, and “NetRevenue”.

  • predictionType (str) – Choice of prediction model approach. Either “LSTM” or “ARIMA”.

  • clusterID (int) – The cluster to predict for, given as that cluster's number in clustered_customers_df.

  • modeltype (int) – Choice of neural network architecture for the LSTM prediction. Either 1 for a sequential model with one LSTM layer and the Huber loss function, or 2 for a sequential model with two LSTM layers, two Dropout layers, and the MSE loss function.

  • predweeks (int) – Sets the number of weeks in the future to predict the revenue for. Default is 52 weeks, i.e., 1 year.
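
The two modeltype options can be pictured with a minimal Keras sketch. This is not the pipeline's actual code: the source does not say which framework, layer sizes, dropout rate, optimizer, input window, or output layer are used, so TensorFlow/Keras, the helper name build_lstm_model, and the values window=12, units=32, dropout=0.2, the Adam optimizer, and the final Dense(1) layer are all assumptions made for illustration.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_lstm_model(modeltype: int, window: int = 12,
                     units: int = 32, dropout: float = 0.2) -> keras.Model:
    """Hypothetical helper mirroring the two documented modeltype options.

    window, units, dropout, the optimizer, and the Dense(1) output layer
    are assumptions; the documentation only fixes the number of LSTM and
    Dropout layers and the loss function.
    """
    if modeltype == 1:
        # Option 1: one LSTM layer, trained with the Huber loss.
        model = keras.Sequential([
            keras.Input(shape=(window, 1)),
            layers.LSTM(units),
            layers.Dense(1),
        ])
        model.compile(optimizer="adam", loss=keras.losses.Huber())
    elif modeltype == 2:
        # Option 2: two LSTM layers, two Dropout layers, trained with MSE.
        model = keras.Sequential([
            keras.Input(shape=(window, 1)),
            layers.LSTM(units, return_sequences=True),
            layers.Dropout(dropout),
            layers.LSTM(units),
            layers.Dropout(dropout),
            layers.Dense(1),
        ])
        model.compile(optimizer="adam", loss="mse")
    else:
        raise ValueError("modeltype must be 1 or 2")
    return model


model_1 = build_lstm_model(1)
model_2 = build_lstm_model(2)
```

The first LSTM layer in option 2 uses return_sequences=True so that the second LSTM layer receives a full sequence rather than only the last hidden state.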

Returns:

A DataFrame containing the predicted future revenue values. For the LSTM model this includes the original data, the training, the testing, and the predicted data. For the ARIMA model this includes the original data, the fitting data, and the predicted data.

Return type:

pd.DataFrame
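
The weekly per-cluster revenue series that the description refers to can be illustrated with a short pandas sketch. It is not the function's implementation, only one plausible way to derive such a series from the two inputs. The helper name weekly_revenue_for_cluster, the cluster column name "Cluster", the use of calendar weeks ending on Sunday, and the toy data are assumptions; only “OrderDate”, “CustomerID”, and “NetRevenue” come from the documentation.

```python
import pandas as pd


def weekly_revenue_for_cluster(clustered_customers_df: pd.DataFrame,
                               allOrders: pd.DataFrame,
                               clusterID: int,
                               cluster_col: str = "Cluster") -> pd.Series:
    """Hypothetical helper: weekly NetRevenue sum for one cluster.

    cluster_col is an assumed column name; the documentation does not
    name the cluster column produced by clusterRFM().
    """
    # Attach each order to its customer's cluster (assumes one row per customer).
    merged = allOrders.merge(
        clustered_customers_df[["CustomerID", cluster_col]],
        on="CustomerID",
        how="inner",
    )
    # Keep only the orders of the requested cluster.
    selected = merged[merged[cluster_col] == clusterID].copy()
    selected["OrderDate"] = pd.to_datetime(selected["OrderDate"])
    # Sum net revenue per calendar week (weeks end on Sunday with "W").
    return selected.set_index("OrderDate")["NetRevenue"].resample("W").sum()


# Toy data for illustration only.
clusters = pd.DataFrame({"CustomerID": [1, 2, 3], "Cluster": [1, 1, 2]})
orders = pd.DataFrame({
    "OrderDate": ["2023-01-02", "2023-01-04", "2023-01-10", "2023-01-11"],
    "CustomerID": [1, 2, 1, 3],
    "NetRevenue": [100.0, 50.0, 25.0, 999.0],
})

weekly = weekly_revenue_for_cluster(clusters, orders, clusterID=1)
print(weekly)
```

With this toy data, the order of customer 3 is excluded because that customer belongs to cluster 2, and the remaining orders fall into two weekly buckets.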