Item clustering
This part of the pipeline clusters items using the KMeans algorithm from the scikit-learn library. The user chooses the number of clusters.
- itemClustering.clusterRFC(itemDataset: DataFrame) → DataFrame
Prepares the dataset for clustering and performs K-Means clustering at the item level.
- Parameters:
itemDataset (pd.DataFrame) – A DataFrame prepared for clustering.
- Returns:
A DataFrame with the RFC metrics and an additional column indicating the cluster assignment.
- Return type:
pd.DataFrame
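The clustering step can be sketched as follows. This is a minimal illustration under stated assumptions, not the module's actual code: the function name cluster_rfc, the placeholder metric columns "R", "F" and "C", the default of three clusters, and the StandardScaler preprocessing are all assumptions. Only the use of scikit-learn's KMeans, a user-chosen number of clusters, and an added cluster-assignment column come from the description above.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def cluster_rfc(item_dataset: pd.DataFrame, n_clusters: int = 3,
                feature_cols=("R", "F", "C")) -> pd.DataFrame:
    """Hypothetical stand-in for clusterRFC: K-Means on RFC-style columns.

    feature_cols holds placeholder names; the real RFC column names used
    by itemClustering are not documented here.
    """
    features = item_dataset[list(feature_cols)]
    # Assumption: standardise each metric so none dominates the
    # Euclidean distance that K-Means minimises.
    scaled = StandardScaler().fit_transform(features)
    # n_n_init and random_state are set explicitly for reproducible labels.
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    result = item_dataset.copy()
    # Keep the original metrics and add one column with the cluster label.
    result["Cluster"] = model.fit_predict(scaled)
    return result


# Example: two clearly separated groups of items.
items = pd.DataFrame({
    "R": [1, 2, 30, 31],
    "F": [10, 11, 1, 2],
    "C": [5.0, 6.0, 50.0, 52.0],
})
print(cluster_rfc(items, n_clusters=2))
```

Scaling before K-Means is a common choice because the algorithm relies on Euclidean distance, so a metric measured in large units would otherwise dominate the cluster assignment.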
- itemClustering.getDistributionCentres(row)
Assigns a Distribution Center based on the row's ‘State’ value.
- Parameters:
row (pd.Series) – A row of data containing the ‘State’ field.
- Returns:
The name of the Distribution Center corresponding to the state.
- Return type:
str
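A row-wise dictionary lookup like the one below is one plausible way a function with this signature could work. The state names, the Distribution Center names, the fallback value "Unassigned", the column name "Distribution Center" and the function name assign_distribution_center are all hypothetical; the actual mapping used by getDistributionCentres is not given in this document.

```python
import pandas as pd

# Hypothetical state -> Distribution Center mapping (placeholder values).
STATE_TO_DC = {
    "California": "West DC",
    "Oregon": "West DC",
    "Texas": "Central DC",
    "New York": "East DC",
}


def assign_distribution_center(row: pd.Series, default: str = "Unassigned") -> str:
    """Return the Distribution Center for the row's 'State' field.

    States missing from the mapping fall back to `default` (an assumption).
    """
    return STATE_TO_DC.get(row["State"], default)


orders = pd.DataFrame({"State": ["Texas", "Oregon", "Ohio"]})
# Row-wise application, matching the documented (row) -> str signature.
orders["Distribution Center"] = orders.apply(assign_distribution_center, axis=1)
print(orders)
```

Because the lookup is a plain dictionary, the vectorised form orders["State"].map(STATE_TO_DC) gives the same assignments on large frames without a per-row Python call; unmapped states become NaN there, so follow it with .fillna("Unassigned") if a default is needed.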
- itemClustering.getItemDataset(df: DataFrame) → DataFrame
Prepares the dataset for item-level clustering by filtering and calculating necessary metrics.
- Parameters:
df (pd.DataFrame) – The original DataFrame.
- Returns:
A DataFrame filtered and prepared for clustering, with Distribution Centers assigned.
- Return type:
pd.DataFrame
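The preparation step might be structured as below. Only the overall shape (filter the rows, assign Distribution Centers, compute per-item metrics) comes from the description above; the column names Item, State and Quantity, the filter rules, the two metrics OrderLines and TotalQuantity, the mapping, and the function name get_item_dataset are hypothetical placeholders, since this document does not say which metrics getItemDataset actually computes.

```python
import pandas as pd

# Hypothetical state -> Distribution Center mapping (placeholder values).
STATE_TO_DC = {"California": "West DC", "Texas": "Central DC", "New York": "East DC"}


def get_item_dataset(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch of getItemDataset: filter, assign DCs, aggregate."""
    # Assumed filter: drop rows without an item ID or with non-positive quantity.
    filtered = df.dropna(subset=["Item"])
    filtered = filtered[filtered["Quantity"] > 0].copy()
    # Assign a Distribution Center per row from the 'State' column.
    filtered["Distribution Center"] = (
        filtered["State"].map(STATE_TO_DC).fillna("Unassigned")
    )
    # Assumed metrics: number of order lines and total quantity per item and DC.
    return filtered.groupby(["Item", "Distribution Center"], as_index=False).agg(
        OrderLines=("Quantity", "count"),
        TotalQuantity=("Quantity", "sum"),
    )


raw = pd.DataFrame({
    "Item": ["A", "A", "B", None],
    "State": ["Texas", "Texas", "California", "Texas"],
    "Quantity": [2, 3, 0, 1],
})
# Item B (zero quantity) and the row without an item ID are filtered out.
print(get_item_dataset(raw))
```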