Item clustering

This part of the pipeline clusters items using the KMeans implementation of k-means from the scikit-learn library. The number of clusters is specified by the user.

itemClustering.clusterRFC(itemDataset: DataFrame) → DataFrame

Prepares the dataset for clustering and performs K-Means clustering at the item level.

Parameters:: itemDataset (pd.DataFrame) – A DataFrame prepared for clustering.
Returns:: A DataFrame with the RFC metrics and an additional column indicating the cluster assignment.
Return type:: pd.DataFrame
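A minimal sketch of the clustering step, assuming that every numeric column of the input holds an RFC metric and that the user-chosen number of clusters is passed in as a parameter. The function name cluster_rfc_sketch, the n_clusters and random_state parameters, the standardization step, the column names metric_a and metric_b, and the output column name Cluster are all hypothetical; they are not taken from clusterRFC itself.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def cluster_rfc_sketch(item_dataset: pd.DataFrame, n_clusters: int = 3,
                       random_state: int = 0) -> pd.DataFrame:
    """Hypothetical stand-in for itemClustering.clusterRFC.

    Assumes every numeric column of ``item_dataset`` is an RFC metric and
    that the caller supplies the number of clusters.
    """
    # Select the numeric columns (assumed to be the RFC metrics).
    features = item_dataset.select_dtypes(include="number")
    # Standardize so that no single metric dominates the Euclidean distance.
    scaled = StandardScaler().fit_transform(features)
    # Fit K-Means with the user-chosen number of clusters.
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = model.fit_predict(scaled)
    # Return a copy with the cluster assignment appended as a new column.
    result = item_dataset.copy()
    result["Cluster"] = labels
    return result


# Usage with two clearly separated groups of items (hypothetical data).
items = pd.DataFrame({
    "Item": ["A", "B", "C", "D", "E", "F"],
    "metric_a": [5, 6, 4, 60, 65, 70],
    "metric_b": [40, 38, 42, 3, 2, 4],
})
clustered = cluster_rfc_sketch(items, n_clusters=2)
print(clustered)
```

Standardizing first is a common choice for K-Means because the algorithm relies on Euclidean distance, so a metric measured on a larger scale would otherwise dominate the cluster assignment. The non-numeric Item column is carried through unchanged and excluded from the features.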

itemClustering.getDistributionCentres(row)

Assigns a Distribution Center based on the row's ‘State’ value.

Parameters:: row (pd.Series) – A row of data containing the ‘State’ field.
Returns:: The name of the Distribution Center corresponding to the state.
Return type:: str
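The state-to-center lookup can be sketched as a dictionary lookup. The mapping below, the function name get_distribution_centre_sketch and the "Unassigned" fallback are all hypothetical; the actual states and Distribution Centers used by getDistributionCentres are not documented here.

```python
# Hypothetical state-to-distribution-centre mapping; the real mapping used
# by itemClustering.getDistributionCentres is not documented here.
STATE_TO_DC = {
    "CA": "West DC",
    "OR": "West DC",
    "NY": "East DC",
    "NJ": "East DC",
    "TX": "Central DC",
}


def get_distribution_centre_sketch(row) -> str:
    """Return the distribution centre for the row's 'State' value.

    Works with any mapping-like row, such as a dict or a pandas Series.
    """
    # Fall back to a placeholder label (an assumption) for unmapped states.
    return STATE_TO_DC.get(row["State"], "Unassigned")


print(get_distribution_centre_sketch({"State": "NY"}))  # prints "East DC"
```

Applied row by row, as the documented signature suggests, the sketch would be called as df.apply(get_distribution_centre_sketch, axis=1). When only the State column matters, df["State"].map(STATE_TO_DC) performs the same lookup without a per-row Python call, returning NaN rather than the fallback for unknown states.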

itemClustering.getItemDataset(df: DataFrame) → DataFrame

Prepares the dataset for item-level clustering by filtering the data and calculating the necessary metrics.

Parameters:: df (pd.DataFrame) – The original DataFrame.
Returns:: A DataFrame filtered and prepared for clustering, with Distribution Centers assigned.
Return type:: pd.DataFrame
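A minimal sketch of the preparation step under several assumptions: the input has Item, State and Quantity columns, filtering means dropping rows with an unmapped state or a non-positive quantity, and the metrics are an order count and a total quantity per item and Distribution Center. The function name get_item_dataset_sketch, the mapping, the column names and the chosen metrics are all hypothetical; the real filters and metrics computed by getItemDataset are not documented here.

```python
import pandas as pd

# Hypothetical mapping; stands in for the logic of getDistributionCentres.
STATE_TO_DC = {"CA": "West DC", "NY": "East DC", "TX": "Central DC"}


def get_item_dataset_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for itemClustering.getItemDataset.

    Assumes columns 'Item', 'State' and 'Quantity'.
    """
    # Filter: keep rows with a known state and a positive quantity.
    mask = df["State"].isin(list(STATE_TO_DC)) & (df["Quantity"] > 0)
    filtered = df[mask].copy()
    # Assign each row to a distribution centre.
    filtered["DistributionCentre"] = filtered["State"].map(STATE_TO_DC)
    # Aggregate per item and centre into simple per-item metrics.
    return (
        filtered.groupby(["Item", "DistributionCentre"], as_index=False)
        .agg(Orders=("Quantity", "count"), TotalQuantity=("Quantity", "sum"))
    )


# Usage with a small hypothetical order table.
raw = pd.DataFrame({
    "Item": ["A", "A", "B", "B", "C"],
    "State": ["CA", "CA", "NY", "ZZ", "TX"],
    "Quantity": [2, 3, 1, 5, -1],
})
result = get_item_dataset_sketch(raw)
print(result)
```

Here the ZZ row is dropped because its state has no mapping and the C row because its quantity is negative, leaving one aggregated row per remaining item and Distribution Center.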