Item clustering

This stage of the pipeline clusters the items using the KMeans algorithm from the scikit-learn library. The number of clusters is chosen by the user.

itemClustering.clusterRFC(itemDataset: DataFrame) → DataFrame

Prepares the dataset for clustering and performs K-Means clustering on the item level.

Parameters:

itemDataset (pd.DataFrame) – A DataFrame prepared for clustering.

Returns:

A DataFrame with the RFC metrics and an additional column indicating the cluster assignment.

Return type:

pd.DataFrame
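
The following is a minimal sketch of item-level K-Means clustering with scikit-learn, not the actual body of clusterRFC. The function name cluster_items, the n_clusters parameter, the metric column names, the "Cluster" column name, and the standardization step are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def cluster_items(item_dataset: pd.DataFrame, n_clusters: int = 3) -> pd.DataFrame:
    """Hypothetical stand-in for clusterRFC: scale numeric columns, then run K-Means."""
    # Assumption: every numeric column is a clustering feature.
    numeric = item_dataset.select_dtypes(include="number")
    # Standardize so that no single metric dominates the distance computation.
    scaled = StandardScaler().fit_transform(numeric)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    result = item_dataset.copy()
    # Append the cluster assignment as an additional column.
    result["Cluster"] = model.fit_predict(scaled)
    return result


# Toy data with two clearly separated groups (column names are illustrative only).
items = pd.DataFrame({
    "metric_a": [1.0, 1.2, 0.9, 8.0, 8.3, 7.9],
    "metric_b": [10.0, 11.0, 9.5, 50.0, 52.0, 49.0],
})
clustered = cluster_items(items, n_clusters=2)
print(clustered)
```

Fixing random_state makes the assignment reproducible between runs; the numeric cluster labels themselves carry no ordering.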

itemClustering.getDistributionCentres(row)

Assigns a Distribution Center based on the row's State value.

Parameters:

row (pd.Series) – A row of data containing the ‘State’ field.

Returns:

The name of the Distribution Center corresponding to the state.

Return type:

str
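
Because the function takes a single row, it is presumably applied row-wise with DataFrame.apply. The sketch below shows that pattern under stated assumptions: the state-to-center mapping, the "Unassigned" fallback, and the output column name are hypothetical and do not come from the module.

```python
import pandas as pd

# Hypothetical mapping; the real state-to-center assignments are not documented here.
STATE_TO_DC = {"California": "West DC", "New York": "East DC"}


def get_distribution_centre(row: pd.Series) -> str:
    """Illustrative stand-in: look up the center for the row's State value."""
    return STATE_TO_DC.get(row["State"], "Unassigned")


orders = pd.DataFrame({"State": ["California", "New York", "Texas"]})
# axis=1 passes each row to the function as a pd.Series.
orders["Distribution Center"] = orders.apply(get_distribution_centre, axis=1)
print(orders)
```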

itemClustering.getItemDataset(df: DataFrame) → DataFrame

Prepares the dataset for item-level clustering by filtering and calculating necessary metrics.

Parameters:

df (pd.DataFrame) – The original DataFrame.

Returns:

A DataFrame filtered and prepared for clustering, with Distribution Centers assigned.

Return type:

pd.DataFrame
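
As a rough illustration of what filtering and item-level metric calculation can look like in pandas, the sketch below drops unusable rows, assigns a Distribution Center, and aggregates per item. The column names, the filter rules, the mapping, and the chosen metrics are all assumptions; the actual logic of getItemDataset is not described in this reference.

```python
import pandas as pd

# Hypothetical order-level data; the real input schema is not documented here.
raw = pd.DataFrame({
    "Item": ["A", "A", "B", "B", None],
    "State": ["California", "California", "New York", "New York", "Texas"],
    "Quantity": [2, 3, 1, 0, 4],
    "Sales": [20.0, 30.0, 15.0, 0.0, 40.0],
})
# Hypothetical state-to-center mapping.
state_to_dc = {"California": "West DC", "New York": "East DC"}


def build_item_dataset(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative stand-in: filter rows, assign centers, aggregate per item."""
    # Assumed filter rules: keep rows with a known item and a positive quantity.
    filtered = df.dropna(subset=["Item"])
    filtered = filtered[filtered["Quantity"] > 0].copy()
    filtered["Distribution Center"] = (
        filtered["State"].map(state_to_dc).fillna("Unassigned")
    )
    # One output row per item and center, with assumed summary metrics.
    return filtered.groupby(["Item", "Distribution Center"], as_index=False).agg(
        order_count=("Quantity", "size"),
        total_quantity=("Quantity", "sum"),
        total_sales=("Sales", "sum"),
    )


item_dataset = build_item_dataset(raw)
print(item_dataset)
```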