Implementing Support Vector Classifier in Python

Support Vector Classifier (SVC) is a form of Support Vector Machine (SVM) that classifies inputs after supervised training. This blog post discusses how to implement an SVC in Python with the scikit-learn module. We will walk through a simple example below.

To start off, import the necessary support modules: numpy, SVC from sklearn.svm, and joblib. Note that joblib is imported directly as its own package, because newer scikit-learn releases no longer ship sklearn.externals.joblib.

#!/usr/bin/python
import numpy as np
from sklearn.svm import SVC
import joblib

Next we create two arrays, X and y. X is a list of feature vectors, each of which is itself a list. In the example below, each feature vector has two elements, and there are 8 sample feature vectors. y is the label vector: each element is the label you want the SVC to associate with the corresponding input. In the example below, the feature vector [-2, -2] is associated with label ‘d’ (the fifth element of y).

# X is a list of feature vectors.
X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1], [-2, -2], [8, 4], [5, 3], [4, 8]])
# y is a list of labels. A label can be an integer or a string.
y = np.array(['a', 'a', 'c', 'b', 'd', 'c', 'd', 'e'])
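
As a quick sanity check on the layout described above, the following sketch (not part of the original script) confirms that X has one row per sample and that each row lines up positionally with its label. X and y are the arrays defined above; the name pairs is introduced here purely for illustration.

```python
import numpy as np

X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1], [-2, -2], [8, 4], [5, 3], [4, 8]])
y = np.array(['a', 'a', 'c', 'b', 'd', 'c', 'd', 'e'])

# X should be (n_samples, n_features); y should hold one label per sample.
print(X.shape)   # (8, 2)
print(y.shape)   # (8,)

# Pair each feature vector with its label by position.
pairs = [(row.tolist(), str(label)) for row, label in zip(X, y)]
for features, label in pairs:
    print(features, "->", label)
```

The fifth pair printed is [-2, -2] -> d, which matches the association described above.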

Now, we can create an SVC and train it with X and y:

# clf is the SVM Classifier
clf = SVC()
# Train the SVC
clf.fit(X, y)
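
Once fit() returns, the classifier exposes a few attributes that show what it learned. The sketch below is an optional addition to the walkthrough, assuming the same X and y as above; it prints the distinct labels the model knows about and the shape of the array of support vectors it kept.

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1], [-2, -2], [8, 4], [5, 3], [4, 8]])
y = np.array(['a', 'a', 'c', 'b', 'd', 'c', 'd', 'e'])

clf = SVC()
clf.fit(X, y)

# classes_ holds the distinct labels seen during training, in sorted order.
print(clf.classes_)   # ['a' 'b' 'c' 'd' 'e']

# support_vectors_ holds the training points the model kept as support
# vectors; it has one row per support vector and one column per feature.
print(clf.support_vectors_.shape)
```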

To test the classifier’s ability to generalize, we pick a feature vector that is very close to one of the feature vectors used for training. We do this because we then know what the right answer ought to be. For instance, since [-2, -2] is associated with label ‘d’, we would expect [-1.8, -2.1] to be classified as ‘d’ as well.

# Print the prediction - note [-1.8, -2.1] is very close to [-2, -2],
# which maps to 'd' positionally (5th position), so 'd' should be the output.
# predict() expects a 2D array with one row per sample.
print(clf.predict([[-1.8, -2.1]]))
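
predict() takes a 2D array of shape (n_samples, n_features), so several feature vectors can be classified in one call. The following is a minimal sketch under that assumption; the second query point, [4.2, 7.9], and the names samples, single, and predictions are made up here for illustration and do not appear in the original script.

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1], [-2, -2], [8, 4], [5, 3], [4, 8]])
y = np.array(['a', 'a', 'c', 'b', 'd', 'c', 'd', 'e'])
clf = SVC().fit(X, y)

# Two query points, one per row. The second point is hypothetical.
samples = np.array([[-1.8, -2.1], [4.2, 7.9]])
predictions = clf.predict(samples)

# One predicted label comes back per input row.
print(predictions.shape)   # (2,)
print(predictions)

# A single 1D feature vector can be reshaped into a one-row 2D array first.
single = np.array([-1.8, -2.1]).reshape(1, -1)
print(clf.predict(single))
```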

Finally, after the SVC is trained, we want the learning to persist by saving the SVC to a file. This is done using joblib:

# save file as binary
joblib.dump(clf, "mysvm_save.pkl", compress=9)

To test the saved file, load it into another variable, clf2, and run the same prediction:

# reload the saved file and run the SVC
clf2 = joblib.load("mysvm_save.pkl")
# predict() expects a 2D array with one row per sample
print(clf2.predict([[-1.8, -2.1]]))
# 'd' should be the output

Now run the code:

$ python sci2.py 
['d']
['d']

Sure enough, the results are as expected.
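
The save-and-reload step can also be checked programmatically rather than by eye. The sketch below is one possible way to do it, assuming joblib is installed as a standalone package: it writes the model to a temporary directory (so nothing is left behind on disk) and confirms that the reloaded classifier makes the same predictions as the original on every training sample. The names tmp_dir, path, and same are illustrative.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.svm import SVC

X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1], [-2, -2], [8, 4], [5, 3], [4, 8]])
y = np.array(['a', 'a', 'c', 'b', 'd', 'c', 'd', 'e'])
clf = SVC().fit(X, y)

# Save to a throwaway directory, then load the model back into clf2.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "mysvm_save.pkl")
    joblib.dump(clf, path, compress=9)
    clf2 = joblib.load(path)

# The reloaded model should agree with the original on every sample.
same = np.array_equal(clf.predict(X), clf2.predict(X))
print(same)   # True
```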
