Customer Decision Tree
R Programming – Customer decision tree
I want to build a customer decision tree where I have one y (dependent) variable and my x variables are all categorical / qualitative. I want to see which of the categorical variables matter most to my dependent variable, ranked in order of significance.
Those are all the details. I want to understand which of my n variables are important, and I want that displayed as a decision tree.
Solution
Noor.R
library(readxl)
Sample_Superstore_Sales_Excel_ <- read_excel("~/Desktop/Noor/Sample - Superstore Sales (Excel).xls")
View(Sample_Superstore_Sales_Excel_)
x <- Sample_Superstore_Sales_Excel_ # load the data into x; substitute your own data frame here
x <- x[,-6] # drop Customer Ethnicity: this categorical variable has more than 53 levels, and randomForest() cannot handle factors with more than 53 categories
# Besides, with so many ethnicities the variable gives little useful inference anyway
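# (Generic sketch, not part of the original answer: drop any remaining categorical
# column with more than 53 levels automatically, since randomForest() refuses them.)
too_many_levels <- sapply(x, function(col) (is.character(col) || is.factor(col)) && length(unique(col)) > 53)
x <- x[, !too_many_levels]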
names(x) <- make.names(names(x))
x$Order.Priority <- as.factor(x$Order.Priority)
x$Ship.Mode <- as.factor(x$Ship.Mode)
x$Customer.Income <- as.factor(x$Customer.Income)
x$Province <- as.factor(x$Province)
x$Region <- as.factor(x$Region)
x$Customer.Segment <- as.factor(x$Customer.Segment)
x$Product.Category <- as.factor(x$Product.Category)
x$Product.Sub.Category <- as.factor(x$Product.Sub.Category)
x$Product.Container <- as.factor(x$Product.Container) # convert each categorical variable into a factor
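# Alternative sketch (assumes every remaining character column really is categorical):
# convert them all to factors in one pass instead of column by column.
char_cols <- sapply(x, is.character)
x[char_cols] <- lapply(x[char_cols], as.factor)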
library(randomForest)
fit <- randomForest(Sales ~ ., data = x, ntree = 1000) # the required random forest; the formula is Sales ~ . so that Sales itself is not also used as a predictor
importance(fit) # importance() ships with randomForest, so the caret package is not needed here
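# Optional: varImpPlot() (also from randomForest) draws the same IncNodePurity ranking as a sorted dot chart
varImpPlot(fit)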
#the features with higher IncNodePurity values are more important
#in this run, Product.Sub.Category and Product.Container come out as the most important predictors
#other important predictors include loyalty to the product, Discount, Order.Priority, Ship.Mode, Customer.Income, etc.
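The question also asks for the result to be displayed as a decision tree. The random forest above only ranks the variables; to actually draw a tree, a single regression tree can be fitted with the rpart package and plotted with rpart.plot. This is a minimal sketch, assuming the same cleaned data frame x and the Sales column used above:

library(rpart)
library(rpart.plot)
tree_fit <- rpart(Sales ~ ., data = x, method = "anova") # one regression tree on the same predictors
rpart.plot(tree_fit) # variables appearing in the top splits are the most influential

The splits near the root of the plotted tree should correspond roughly to the predictors that importance(fit) ranks highest.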