randomForest + cforest method

#### Forked from Skobnikoff in Titanic: Machine Learning from Disaster

#read train/test data
train<-read.csv("../input/train.csv",na.strings=c('NA',''),stringsAsFactors=F)
test<-read.csv("../input/test.csv",na.strings=c('NA',''),stringsAsFactors=F)
#train<-read.csv("train.csv",na.strings=c('NA',''),stringsAsFactors=F)
#test<-read.csv("test.csv",na.strings=c('NA',''),stringsAsFactors=F)

#loading libraries
library(randomForest)
library(party)
library(rpart)
# library(rattle)

#checking the missing data
check.missing<-function(x) return(paste0(round(sum(is.na(x))/length(x),4)*100,'%'))
data.frame(sapply(train,check.missing))
data.frame(sapply(test,check.missing))

#combine train/test data for pre-processing
train$Cat<-'train'
test$Cat<-'test'
test$Survived<-NA
full Google -> "S"…
full$Embarked[is.na(full$Embarked)]<-'S'

#Extract Title from Name
full$Title<-sapply(full$Name,function(x) strsplit(x,'[.,]')[[1]][2])
full$Title<-gsub(' ','',full$Title)
aggregate(Age~Title,full,median)
full$Title[full$Title %in% c('Capt', 'Don', 'Major', 'Sir')] <- 'Sir'
full$Title[full$Title %in% c('Dona', 'Lady', 'the Countess', 'Jonkheer')] <- 'Lady'

#check the result
aggregate(Age~Title,full,summary, digits=2)
#   Title  Age.Min. Age.1st Qu. Age.Median Age.Mean Age.3rd Qu. Age.Max.
#1  Col    47       52          54         54       57          60
#2  Dr     23       38          49         44       52          54
#3  Lady   38       38          39         42       44          48
#4  Master 0.33     2           4          5.50     9           14
#5  Miss   0.17     15          22         22       30          63
#6  Mlle   24       24…
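The title extraction above can be tried on a couple of names in isolation. The sketch below is a minimal illustration, not part of the original script: `sample_names`, `raw_title`, `all_spaces_removed` and `trimmed` are hypothetical names introduced here, and the two strings are only assumed to follow the "Surname, Title. Given names" layout of the `Name` column. It shows that `gsub(' ','',...)` deletes every space, so a multi-word title would no longer match a lookup value such as 'the Countess'. Base R's `trimws()` is shown as one possible alternative that strips only the surrounding whitespace.

```r
# Illustrative names, assumed to follow the "Surname, Title. Given names" layout
sample_names <- c("Braund, Mr. Owen Harris",
                  "Rothes, the Countess. of (Lucy Noel Martha Dyer-Edwards)")

# Same extraction as the script: split on '.' or ',' and keep the second piece
raw_title <- sapply(sample_names,
                    function(x) strsplit(x, '[.,]')[[1]][2],
                    USE.NAMES = FALSE)

# gsub(' ', '', ...) removes every space, including the one inside a title
all_spaces_removed <- gsub(' ', '', raw_title)

# trimws() (base R) removes only leading and trailing whitespace
trimmed <- trimws(raw_title)

print(all_spaces_removed)
print(trimmed)

# Only the trimmed form still matches a multi-word lookup value
print(all_spaces_removed %in% c('Dona', 'Lady', 'the Countess', 'Jonkheer'))
print(trimmed %in% c('Dona', 'Lady', 'the Countess', 'Jonkheer'))
```

Whether to switch to `trimws()` depends on how the rest of the pipeline expects titles to be spelled. Changing it would also change which rows fall into the 'Lady' group, and with it the summary table.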


