Customer Decision Tree

R Programming – Customer decision tree

I want to build a customer decision tree where I have one y / dependent variable and my x variables are all categorical / qualitative. I want to see which of the categorical variables matter most to my dependent variable, ranked in order of significance.

These are all the details. I want to understand which of my n variables are important, and I want the result displayed as a decision tree.

Solution 

Noor.R 

library(readxl)

Sample_Superstore_Sales_Excel_ <- read_excel("~/Desktop/Noor/Sample – Superstore Sales (Excel).xls")

View(Sample_Superstore_Sales_Excel_)

x <- Sample_Superstore_Sales_Excel_ # this step loads your data; you can load your own data set here

x <- x[,-6] # drop the customer ethnicity column: it has more than 53 categories, and randomForest cannot handle a categorical predictor with more than 53 categories

# Further, since there are so many ethnicities, the variable hardly gives any useful inference
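If you are not sure which columns break that 53-category limit, a quick count of the distinct values in each column of the original data (run before the drop above) will show you; this is a minimal base-R check:

sapply(Sample_Superstore_Sales_Excel_, function(col) length(unique(col))) # columns with more than 53 distinct categories must be dropped or regrouped before calling randomForest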

names(x) <- make.names(names(x))
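make.names() rewrites the spreadsheet's column headers (which contain spaces and punctuation) into syntactically valid names so the columns can be referenced with $ below. For example, assuming the original header spellings:

make.names(c("Order Priority", "Product Sub-Category")) # returns "Order.Priority" "Product.Sub.Category"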

x$Order.Priority <- as.factor(x$Order.Priority)

x$Ship.Mode <- as.factor(x$Ship.Mode)

x$Customer.Income <- as.factor(x$Customer.Income)

x$Province <- as.factor(x$Province)

x$Region <- as.factor(x$Region)

x$Customer.Segment <- as.factor(x$Customer.Segment)

x$Product.Category <- as.factor(x$Product.Category)

x$Product.Sub.Category <- as.factor(x$Product.Sub.Category)

x$Product.Container <- as.factor(x$Product.Container) # these commands transform the categorical variables into factors
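As a shorter alternative to repeating as.factor() for each column, all of the text columns can be converted in one step; this sketch assumes every categorical column was read in as character text:

char_cols <- sapply(x, is.character) # identify the character (categorical) columns

x[char_cols] <- lapply(x[char_cols], as.factor) # convert them all to factors at once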

library(randomForest)

fit <- randomForest(Sales ~ ., data = x, ntree = 1000) # here is the random forest used to rank the predictors; use Sales ~ . rather than x$Sales ~ . so that Sales is not also treated as a predictor
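Printing the fitted object is a quick sanity check before interpreting importance; for a numeric Sales column the regression forest reports its out-of-bag mean of squared residuals and the percentage of variance explained:

print(fit) # out-of-bag error and % variance explained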

library(caret) # optional here: importance() below comes from randomForest itself; caret's varImp(fit) gives a similar ranking

importance(fit)

# Features with higher IncNodePurity values are more important
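The same ranking can also be visualized with randomForest's built-in importance plot:

varImpPlot(fit) # dot chart of the predictors ordered by IncNodePurity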

# Product Sub-Category and Product Container are the most important predictors

# Other important predictors include product loyalty, Discount, Order Priority, Ship Mode, Customer Income, etc.
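Finally, since the question asked for the result displayed as a decision tree, a single tree can be fitted and drawn with the rpart and rpart.plot packages. This is a minimal sketch on the same data (install the packages first if needed); with a numeric Sales column rpart fits a regression tree:

library(rpart)

library(rpart.plot)

tree <- rpart(Sales ~ ., data = x) # fit a single decision tree on the same predictors

rpart.plot(tree) # plot the tree; the variables used in the top splits are the most influential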