In this example we will demonstrate k-NN, the Naive Bayes algorithm and Support Vector Machines. The libraries we are going to use are listed below.

Load the libraries

library(class)
library(e1071)
library(caret)
library(gmodels)

The Iris dataset

We will use a simpler dataset to illustrate how the algorithms work. You can load the iris dataset directly from the web, and you can find more information at this webpage: http://archive.ics.uci.edu/ml/datasets/Iris

Since there is no header in the file, we name our variables appropriately.

iris = read.table("http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data", sep = ",", header = FALSE)
names(iris) = c("sepal.length", "sepal.width", "petal.length", "petal.width", "iris.type")
# make sure the class label is a factor (recent R versions read strings as character by default)
iris$iris.type = factor(iris$iris.type)

Make sure that you run some basic statistics and visualizations (as in the data analysis/visualization part) to inspect your dataset.
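
For example, a quick look could be (using only base R and the column names defined above):

# dimensions, variable types and summary statistics
dim(iris)
str(iris)
summary(iris)

# class distribution
table(iris$iris.type)

# pairwise scatterplots of the four numeric variables, coloured by iris type
pairs(iris[, 1:4], col = factor(iris$iris.type))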

Split to training and testing (validation) sets

First, we split our dataset into a training and a testing (or validation) set

train = rbind(iris[1:25, -5], iris[51:75, -5], iris[101:125, -5])
test = rbind(iris[26:50, -5], iris[76:100, -5], iris[126:150, -5])
train_cl = factor(c(iris[1:25, 5], iris[51:75, 5], iris[101:125, 5]))
test_cl = factor(c(iris[26:50, 5], iris[76:100, 5], iris[126:150, 5]))

train_svm = rbind(iris[1:25, ], iris[51:75, ], iris[101:125, ])
test_svm = rbind(iris[26:50, ], iris[76:100, ], iris[126:150, ])

The k-NN algorithm

Remember that k-NN is a lazy algorithm: it does not build an explicit model, but instead makes each decision at prediction time using all the available training data. That is why prediction can become expensive in terms of time, but it is also the reason that no training step is needed.

prc_test_pred = knn(train, test, train_cl, k = 3, prob = TRUE)

Then we can test the algorithm performance using a confusion matrix as before.

ckNN=confusionMatrix(data=prc_test_pred,reference=test_cl)

Alternatively, you can use CrossTable, which provides the same information.

CrossTable(x=test_cl,y=prc_test_pred,prop.chisq=FALSE)

Try experimenting with different values of k: run k-NN for a range of k and plot the resulting test accuracy. Does it increase or decrease?
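
One possible sketch of such an experiment, reusing the train/test objects created above:

# test accuracy of k-NN for a range of k values
ks = 1:25
acc = sapply(ks, function(k) {
    pred = knn(train, test, train_cl, k = k)
    mean(as.character(pred) == as.character(test_cl))
})
plot(ks, acc, type = "b", xlab = "k", ylab = "test accuracy")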

Naive Bayes

The Naive Bayes algorithm, unlike k-NN, does its work up front: it fits a probabilistic model to the training data. If you print the model afterwards, you can see the probabilities it has estimated (the class priors and, for numeric features, the per-class means and standard deviations).

m = naiveBayes(train, train_cl)
m

Now let’s summarise the predictions on the training and the test set in confusion tables.

trainpreds = table(train_cl, predict(m, train))
testpreds = table(test_cl, predict(m, test))

Can you now compute precision, recall and F1-measures for the three classes? Check them against the ones computed by confusionMatrix and/or CrossTable.
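
As a hint, since the rows of these tables are the true classes and the columns the predicted ones, one way to compute the per-class measures is:

# per-class precision, recall and F1 from the test confusion table
precision = diag(testpreds) / colSums(testpreds)
recall = diag(testpreds) / rowSums(testpreds)
f1 = 2 * precision * recall / (precision + recall)
round(cbind(precision, recall, f1), 3)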

SVM

Support vector machines are the last class of algorithms we are going to experiment with. We will start with the simplest kernel, the linear one, and use only two variables to demonstrate how SVMs work.

model = svm(iris.type ~ sepal.length + sepal.width, data = train_svm, kernel = "linear")
model

In the model output, you can see the cost value, the gamma value and the number of support vectors. We want the latter to be as low as possible. Why?

Also, try to explore your “model” object and see its other components; for example, what does model$fitted represent? Use ?svm to understand the rest.

There is a plot method for SVMs that makes it possible to see the decision boundaries with respect to two of the variables.

plot(model, train_svm, sepal.width ~ sepal.length, slice = list(sepal.width = 1,
        sepal.length = 2))

Next, we check the performance of the algorithm on both the training and the test dataset. You should be able to do this by yourselves now.

matrix1 = confusionMatrix(data = model$fitted, reference = train_svm[, 5])

test_labels_p=predict(model, newdata=test_svm[,1:4])
matrix2=confusionMatrix(data=test_labels_p,reference=test_svm[,5])

Now, let’s try the linear kernel with all four features. That means using the complete formula.

model2 = svm(iris.type~., data = train_svm, kernel = "linear")
model2

Can you predict the accuracy here? Is it better or worse than Naive Bayes? What about k-NN?
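
If you want to check your guess, one way, following the same pattern as before (test_pred2 is just an illustrative name), is:

# test-set predictions and confusion matrix for the full linear model
test_pred2 = predict(model2, newdata = test_svm[, 1:4])
confusionMatrix(data = test_pred2, reference = test_svm[, 5])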

Non-linear kernels

Now let’s try playing with a non-linear kernel. For a simple problem like this it might not be needed, but for more complex problems it is necessary. Notice that kernels other than the linear one require extra parameters. See ?svm for a full list of those.

model3 = svm(iris.type ~ sepal.length + sepal.width, data = train_svm, kernel="radial", gamma = 0.1)
model3

We will again plot the decision regions and the support vectors.

plot(model3, train_svm, sepal.width ~ sepal.length, slice = list(sepal.width = 1,
        sepal.length = 2))

Try changing the parameter gamma of the RBF kernel (try small values like 0.01 or 0.1 and large values like 10 or 100). What do you observe? What is the role of the gamma parameter?
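
One way to structure this experiment (acc_rbf is just an illustrative name):

# test accuracy of the two-variable RBF model for different gamma values
gammas = c(0.01, 0.1, 1, 10, 100)
acc_rbf = sapply(gammas, function(g) {
    fit = svm(iris.type ~ sepal.length + sepal.width, data = train_svm,
              kernel = "radial", gamma = g)
    pred = predict(fit, newdata = test_svm[, 1:4])
    mean(as.character(pred) == as.character(test_svm[, 5]))
})
cbind(gammas, acc_rbf)

You can also re-draw the decision-boundary plot for each value of gamma to see how the boundary changes.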

Then try to run the full model as well. What is the performance now?
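
For instance, a sketch of the full radial model and its test accuracy (model3_full is an illustrative name):

# RBF kernel with all four features
model3_full = svm(iris.type ~ ., data = train_svm, kernel = "radial", gamma = 0.1)
pred3_full = predict(model3_full, newdata = test_svm[, 1:4])
confusionMatrix(data = pred3_full, reference = test_svm[, 5])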

Finally, let’s experiment with the polynomial kernel.

model4 = svm(iris.type ~ sepal.length + sepal.width, data = train_svm, kernel = "polynomial", degree = 3, gamma = 1)
model4

plot(model4, train_svm, sepal.width ~ sepal.length, slice = list(sepal.width = 1, sepal.length = 2))

Then try to run the full model as well. What is the performance now?