How to split Train and Test data in R

Today we’ll see how to split data into training and test data sets in R. When creating a machine learning model, we train the model on one part of the available data and test its accuracy on the remaining part.

There are two ways to split the data and both are very easy to follow:

1. Using the sample() function

#read the data
data <- read.csv("data.csv")
#draw random row numbers between 1 and the number of rows in the data,
#keeping 70% of them for the training data
data1 <- sort(sample(nrow(data), floor(nrow(data) * 0.7)))
#create the training data set by selecting the sampled rows
train <- data[data1, ]
#create the test data set by dropping the sampled rows
test <- data[-data1, ]

 

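Here the sample() function works as sample(x, size, replace): it draws size values from x (or from 1 to x when x is a single number), without replacement unless replace = TRUE.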

> sample(10,7)
[1] 8 4 9 2 7 10 5

We then use the output of sample() as row indices to select only those rows.
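To make the split repeatable, it helps to fix the random seed first. The snippet below is a minimal sketch of that idea, assuming R's built-in mtcars data set as a stand-in for data.csv; the seed value 123 and the names n_train and train_idx are arbitrary choices for illustration.

```r
# Built-in mtcars data set stands in for data.csv (an assumption for this sketch)
data <- mtcars

# Fix the random seed so the same rows are drawn on every run (123 is arbitrary)
set.seed(123)

# Number of training rows: 70% of the data, rounded down
n_train <- floor(0.7 * nrow(data))

# Draw n_train distinct row numbers from 1..nrow(data)
train_idx <- sample(nrow(data), n_train)

# Sampled rows go to training, all remaining rows go to test
train <- data[train_idx, ]
test  <- data[-train_idx, ]

# Sanity checks: the sizes add up and no row appears in both sets
nrow(train) + nrow(test) == nrow(data)
length(intersect(rownames(train), rownames(test))) == 0
```

Without set.seed(), every run would produce a different split, which makes model results hard to compare between runs.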

 

2. Using the caTools package

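#loading package
library(caTools)
#read the data
data <- read.csv("data.csv")
#sample.split() needs one value per row, so pass a column (ideally the outcome), not the whole data frame;
#data[[1]] (the first column) is only a placeholder here. SplitRatio = 0.7 marks 70% of the rows TRUE
data1 <- sample.split(data[[1]], SplitRatio = 0.7)
#subsetting into Train data (rows marked TRUE)
train <- subset(data, data1 == TRUE)
#subsetting into Test data (rows marked FALSE)
test <- subset(data, data1 == FALSE)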
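As a rough sketch of the caTools approach on data that ships with R, the example below uses the built-in iris data set in place of data.csv (an assumption for illustration) and passes the Species column to sample.split(). Because sample.split() works on a vector of labels, it returns one TRUE/FALSE flag per row and tries to keep the label proportions similar in both sets. The seed value and the name flags are arbitrary.

```r
# Requires the caTools package to be installed
library(caTools)

# Built-in iris data set stands in for data.csv (an assumption for this sketch)
data <- iris

# Fix the random seed so the split is repeatable (123 is arbitrary)
set.seed(123)

# One logical flag per row; roughly 70% TRUE within each Species level
flags <- sample.split(data$Species, SplitRatio = 0.7)

# TRUE rows form the training set, FALSE rows the test set
train <- subset(data, flags)
test  <- subset(data, !flags)

# Compare how many rows of each species landed in each set
table(train$Species)
table(test$Species)
```

Comparing the two tables is a quick way to check that every class is represented in both the training and the test data.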

 

That covers splitting data into training and test data sets. Both methods are easy to follow.

Keep visiting Analytics Tuts for more tutorials.

Thanks for reading! Comment your suggestions and queries.

 

 
