Support Vector Machine (SVM)

Assumptions

Suppose we are given a training dataset in a feature space:

$$T=\left\{ \left( x_{1},y_{1} \right),\left( x_{2},y_{2} \right),\dots,\left( x_{N},y_{N} \right) \right\}$$

where $x_{i}\in R^{n}$, $y_{i}\in \left\{ +1,-1 \right\}$, $i=1,2,\dots,N$.

  1. $x_{i}$ is the $i$-th instance ($N>1$), represented as a vector.

  2. $y_{i}$ is the class label of $x_{i}$: when $y_{i}=+1$, $x_{i}$ is called a positive example; when $y_{i}=-1$, a negative example.

  3. $\left(x_{i},y_{i}\right)$ is called a sample point.

Linearly Separable Support Vector Machine

Given a linearly separable training dataset, the separating hyperplane obtained by maximizing the margin is

$$\omega^{T}\cdot\Phi \left ( x \right )+b=0$$

The corresponding classification decision function $f\left( x \right) = \mathrm{sign}\left( \omega^{T}\cdot\Phi\left( x \right)+b \right)$ is called the linearly separable support vector machine. Here $\Phi\left(x\right)$ is a fixed feature-space transformation that maps $x$ into a higher-dimensional space.

Deriving the Objective Function

Let $y\left(x\right)=\omega^{T}\cdot\Phi\left( x \right)+b$. For a correctly classified sample, $y_{i}\cdot y\left(x_{i}\right) > 0$. Since the geometric margin is invariant under proportional scaling of the parameters, we consider:

$$\frac{y_{i}\cdot y\left(x_{i}\right)}{\left\| \omega \right\|}=\frac{y_{i}\cdot\left( \omega^{T}\cdot\Phi\left( x_{i} \right)+b \right)}{\left\| \omega \right\|}$$
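In particular, scaling the parameters as $\left(\omega,b\right)\rightarrow\left(\kappa\omega,\kappa b\right)$ for any $\kappa>0$ leaves this geometric margin unchanged:

$$\frac{y_{i}\cdot\left( \kappa\omega^{T}\cdot\Phi\left( x_{i} \right)+\kappa b \right)}{\left\| \kappa\omega \right\|}=\frac{y_{i}\cdot\left( \omega^{T}\cdot\Phi\left( x_{i} \right)+b \right)}{\left\| \omega \right\|}$$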

The objective function:

$$\max_{\omega,b}\left\{ \frac{1}{\left\| \omega \right\|} \min_{i}\left[ y_{i}\cdot\left( \omega^{T}\cdot\Phi\left( x_{i} \right)+b \right) \right]\right\}$$

Subject to the constraint:

$$y_{i}\cdot\left ( \omega^{T}\cdot\Phi \left ( x_{i} \right )+b \right)\geqslant 1$$

By proportionally scaling $\omega$ and $b$ so that the functional margin of the closest samples equals 1, every sample satisfies the constraint above, and the objective reduces to:

$$\max \frac{1}{\left\| \omega \right\|}$$

Since maximizing $\frac{1}{\left\| \omega \right\|}$ is equivalent to minimizing $\left\| \omega \right\|$, and squaring (with the factor $\frac{1}{2}$ for convenience of differentiation) preserves the minimizer, this is converted to:

$$\min \frac{1}{2}\left\| \omega \right\|^{2}$$

Solving with the Method of Lagrange Multipliers

$$L\left( \omega,b,\alpha \right) = \frac{1}{2}\left\| \omega \right\|^{2}-\sum_{i=1}^{n} \alpha_{i}\left( y_{i}\cdot\left( \omega^{T}\cdot\Phi\left( x_{i} \right)+b \right)-1 \right)$$

The primal problem is a min-max problem; its dual swaps the order into a max-min problem (the two coincide here by strong duality):

$$\min_{\omega,b}\:\max_{\alpha} L\left( \omega,b,\alpha \right) \Rightarrow \max_{\alpha}\:\min_{\omega,b} L\left( \omega,b,\alpha \right)$$

Taking the partial derivatives of the Lagrangian $L$ with respect to $\omega$ and $b$ and setting them to zero:

$$\frac{\partial L}{\partial \omega}=0 \Rightarrow \omega =\sum_{i=1}^{n}\alpha_{i}y_{i}\Phi\left( x_{i} \right)$$

$$\frac{\partial L}{\partial b}=0 \Rightarrow 0 =\sum_{i=1}^{n}\alpha_{i}y_{i}$$

Substituting these back into the Lagrangian, the dual reduces to:

$$\alpha^{*} = \arg\max_{\alpha}\left\{ \sum_{i=1}^{n}\alpha_{i} - \frac{1}{2}\sum_{i,j=1}^{n}\alpha_{i}\alpha_{j}y_{i}y_{j}\Phi^{T}\left( x_{i} \right)\Phi\left( x_{j} \right) \right\}$$

That is:

$$\min_{\alpha}\left\{ \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_{i}\alpha_{j}y_{i}y_{j}\left(\Phi\left( x_{i} \right)\cdot\Phi\left( x_{j} \right)\right) -\sum_{i=1}^{n}\alpha_{i} \right\}$$

subject to:

$$\sum_{i=1}^{n}\alpha_{i}y_{i}=0, \quad \alpha_{i}\geqslant 0, \quad i=1,2,\dots,n$$
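A minimal sketch of solving this dual numerically (assuming scipy is available; the 3-point toy data and a linear kernel with $\Phi$ the identity are illustrative):

import numpy as np
from scipy.optimize import minimize

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])  # toy separable data
y = np.array([1.0, 1.0, -1.0])
K = X @ X.T  # Gram matrix of inner products Phi(x_i).Phi(x_j)

def dual_objective(a):
    # (1/2) sum_ij a_i a_j y_i y_j K_ij - sum_i a_i, to be minimized
    return 0.5 * (a * y) @ K @ (a * y) - a.sum()

constraint = {'type': 'eq', 'fun': lambda a: a @ y}  # sum_i alpha_i y_i = 0
res = minimize(dual_objective, np.zeros(3),
               bounds=[(0, None)] * 3, constraints=constraint)
print(res.x)  # optimal alpha*; the nonzero entries mark the support vectors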

Learning Algorithm for the Linearly Separable SVM

  1. Compute:

$$\omega^{*}=\sum_{i=1}^{n}\alpha_{i}^{*}y_{i}\Phi\left( x_{i} \right)$$

$$b^{*}=y_{j} - \sum_{i=1}^{n}\alpha_{i}^{*}y_{i}\left(\Phi\left( x_{i} \right)\cdot\Phi\left( x_{j} \right)\right)$$

where $j$ is the index of any support vector, i.e. any $j$ with $\alpha_{j}^{*}>0$.

  2. The separating hyperplane obtained:

$$\omega^{*}\cdot\Phi\left( x \right)+b^{*}=0$$

  3. The classification decision function:

$$f\left( x \right) = \mathrm{sign}\left( \omega^{*}\cdot\Phi\left( x \right)+b^{*} \right)$$
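As a minimal sanity check of the algorithm above (assuming scikit-learn; the 3-point toy data and the large `C` used to approximate the hard margin are illustrative), $\omega^{*}$ recovered from the dual solution can be compared with the hyperplane the solver reports; `dual_coef_` stores $\alpha_{i}^{*}y_{i}$ for the support vectors:

import numpy as np
from sklearn.svm import SVC

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])  # toy, linearly separable
y = np.array([1, 1, -1])

clf = SVC(kernel='linear', C=1e6)  # very large C approximates the hard margin
clf.fit(X, y)

w = clf.dual_coef_ @ clf.support_vectors_  # w* = sum_i alpha_i* y_i Phi(x_i)
print(w, clf.coef_)       # the two should agree
print(clf.intercept_)     # b*
print(clf.predict([[4.0, 4.0]]))  # decision function sign(w*.Phi(x) + b*)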

Linear Support Vector Machine

If the data are not linearly separable, introduce a slack variable $\xi_{i}\geqslant 0$ for each sample so that the functional margin plus the slack is at least 1; the constraint becomes:

$$y_{i}\left(\omega \cdot x_{i}+ b \right)\geqslant 1-\xi_{i}$$

The objective function of the linear SVM:

$$\min \frac{1}{2}\left\| \omega \right\|^{2} + C\sum_{i=1}^{n}\xi_{i}, \quad \xi_{i} \geqslant 0, \quad i=1,2,\dots,n$$
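The penalty parameter $C$ trades margin width against slack: a larger $C$ punishes violations more heavily. A minimal sketch (assuming scikit-learn; the random blobs are illustrative) showing how $C$ changes the number of support vectors:

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (50, 2)),  # two overlapping classes
               rng.normal(1.0, 1.0, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel='linear', C=C).fit(X, y)
    print(C, clf.n_support_)  # number of support vectors per class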

The Lagrangian of the SVM with slack variables:

$$L\left(\omega,b,\xi,\alpha,\mu\right)\equiv \frac{1}{2}\left\|\omega\right\|^2 + C\sum_{i=1}^{n}\xi_{i} - \sum_{i=1}^{n} \alpha_{i}\left( y_{i}\left( \omega\cdot\Phi\left( x_{i} \right)+b \right)-1 +\xi_{i}\right) - \sum_{i=1}^{n}\mu_{i}\xi_{i}$$

Taking the partial derivatives with respect to $\omega$, $b$, and $\xi$ and setting them to zero:

$$\frac{\partial L}{\partial \omega}=0 \Rightarrow \omega =\sum_{i=1}^{n}\alpha_{i}y_{i}\Phi\left( x_{i} \right)$$

$$\frac{\partial L}{\partial b}=0 \Rightarrow 0 =\sum_{i=1}^{n}\alpha_{i}y_{i}$$

$$\frac{\partial L}{\partial \xi_{i}}=0 \Rightarrow C - \alpha_{i} - \mu_{i} = 0$$

Substituting these back into the Lagrangian gives:


$$\min_{\omega,b,\xi}\: L\left(\omega,b,\xi,\alpha,\mu\right)=-\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_{i}\alpha_{j}y_{i}y_{j}\Phi^{T}\left( x_{i} \right)\Phi\left( x_{j} \right) + \sum_{i=1}^{n} \alpha_{i}$$

Maximizing the above with respect to $\alpha$ yields:

$$\max_{\alpha}\: -\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_{i}\alpha_{j}y_{i}y_{j}\Phi^{T}\left( x_{i} \right)\Phi\left( x_{j} \right) + \sum_{i=1}^{n} \alpha_{i}$$

$$\sum_{i=1}^{n}\alpha_{i}y_{i}=0$$

$$C - \alpha_{i} - \mu_{i} = 0$$

$$0\leqslant \alpha_{i}\leqslant C$$

$$\mu_{i} \geqslant 0, \quad i=1,2,\dots,n$$

Equivalently, written as a minimization, the dual problem is:

$$\min_{\alpha}\: \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_{i}\alpha_{j}y_{i}y_{j}\Phi^{T}\left( x_{i} \right)\Phi\left( x_{j} \right) - \sum_{i=1}^{n} \alpha_{i}$$

Learning Algorithm for the Linear SVM

  1. Compute:

$$\omega^{*}=\sum_{i=1}^{n}\alpha_{i}^{*}y_{i}\Phi\left( x_{i} \right)$$

$$b^{*}=y_{j} - \sum_{i=1}^{n}\alpha_{i}^{*}y_{i}\left(\Phi\left( x_{i} \right)\cdot\Phi\left( x_{j} \right)\right)$$

When computing $b^{*}$, choose a support vector whose multiplier satisfies $0<\alpha_{j}^{*}<C$; in practice, one often averages the $b^{*}$ values obtained from all such support vectors (see the sketch after this algorithm).

  2. The separating hyperplane obtained:

$$\omega^{*}\cdot\Phi\left( x \right)+b^{*}=0$$

  3. The classification decision function:

$$f\left( x \right) = \mathrm{sign}\left( \omega^{*}\cdot\Phi\left( x \right)+b^{*} \right)$$
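A minimal sketch of the averaging just described (assuming scikit-learn; the random data and the tolerance `1e-8` are illustrative): `dual_coef_` stores $\alpha_{j}^{*}y_{j}$, so the free support vectors are those with $\left|\alpha_{j}^{*}y_{j}\right|<C$:

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 1.0, (30, 2)), rng.normal(2.0, 1.0, (30, 2))])
y = np.array([-1] * 30 + [1] * 30)

C = 1.0
clf = SVC(kernel='linear', C=C).fit(X, y)

alpha_y = clf.dual_coef_.ravel()   # alpha_j* * y_j for each support vector
sv = clf.support_vectors_
y_sv = np.sign(alpha_y)            # recover y_j (since alpha_j* > 0)
free = np.abs(alpha_y) < C - 1e-8  # free support vectors: 0 < alpha_j* < C

w = alpha_y @ sv                    # w* = sum_j alpha_j* y_j x_j
b_vals = y_sv[free] - sv[free] @ w  # b* = y_j - w*.x_j for each free SV
print(b_vals.mean(), clf.intercept_[0])  # the average should match sklearn's b*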

Nonlinear Support Vector Machine

Kernel Functions

A kernel function can be used to map the original input space into a new feature space, so that samples that are linearly inseparable in the original space may become separable in the kernel space. Common kernels (see the sketch after this list):

  1. Polynomial kernel: $\kappa\left(x_{1},x_{2}\right)=\left(x_{1} \cdot x_{2} + c \right)^{d}$

  2. Gaussian (RBF) kernel: $\kappa\left(x_{1},x_{2}\right)=\exp\left( - \gamma \cdot \left\| x_{1} - x_{2} \right\|^{2} \right)$

  3. Sigmoid kernel: $\kappa\left(x_{1},x_{2}\right)=\tanh\left(x_{1} \cdot x_{2} + c \right)$
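A minimal numpy sketch of these three kernels (the default hyperparameters `c`, `d`, and `gamma` are illustrative, not canonical):

import numpy as np

def poly_kernel(x1, x2, c=1.0, d=2):
    # polynomial kernel: (x1 . x2 + c)^d
    return (np.dot(x1, x2) + c) ** d

def rbf_kernel(x1, x2, gamma=0.5):
    # Gaussian (RBF) kernel: exp(-gamma * ||x1 - x2||^2)
    return np.exp(-gamma * np.sum((np.asarray(x1) - np.asarray(x2)) ** 2))

def sigmoid_kernel(x1, x2, c=0.0):
    # sigmoid kernel: tanh(x1 . x2 + c)
    return np.tanh(np.dot(x1, x2) + c)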

In practice, choosing an effective kernel usually relies on prior domain knowledge or schemes such as cross-validation; if no further prior information is available, the Gaussian kernel is the usual default.
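A minimal sketch of kernel selection by cross-validation (assuming scikit-learn; the iris data and `cv=5` are illustrative):

from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score

X, y = datasets.load_iris(return_X_y=True)
for kernel in ('linear', 'poly', 'rbf', 'sigmoid'):
    scores = cross_val_score(svm.SVC(kernel=kernel), X, y, cv=5)
    print(kernel, scores.mean())  # mean 5-fold accuracy for each kernel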

Practice 1: Iris Classification

1. Python code for iris classification

# -*- coding: utf-8 -*-
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn import svm

iris = datasets.load_iris()
x = iris.data
y = iris.target

# hold out 60% of the samples as the test set
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=1, test_size=0.6)

# one-vs-rest linear SVM with a small penalty parameter
clf = svm.SVC(C=0.1, kernel='linear', decision_function_shape='ovr')
clf.fit(x_train, y_train.ravel())

print(clf.score(x_train, y_train))  # accuracy on the training set
print('Training set accuracy:', accuracy_score(y_train, clf.predict(x_train)))
print(clf.score(x_test, y_test))    # accuracy on the test set
print('Test set accuracy:', accuracy_score(y_test, clf.predict(x_test)))

2. Iris classification results

Training set accuracy: 0.966666666667
Test set accuracy: 0.944444444444

Practice 2: Handwritten Digit Recognition

1. Python code for handwritten digit recognition

# -*- coding:utf-8 -*-
import numpy as np
from sklearn import svm
import matplotlib
import matplotlib.pyplot as plt
from PIL import Image
from sklearn.metrics import accuracy_score
import os
from sklearn.model_selection import GridSearchCV
from time import time


def show_accuracy(a, b, tip):
    acc = a.ravel() == b.ravel()
    print(tip + ' accuracy: %.2f%%' % (100 * np.mean(acc)))


def save_image(im, i):
    im *= 15.9375            # rescale 0-16 pixel values to 0-255
    im = 255 - im            # invert: dark digits on a light background
    a = im.astype(np.uint8)
    output_path = '.\\HandWritten'
    if not os.path.exists(output_path):
        os.mkdir(output_path)
    Image.fromarray(a).save(output_path + ('\\%d.png' % i))


if __name__ == "__main__":
    print('Load Training File Start...')
    data = np.loadtxt('./data/optdigits.tra', dtype=float, delimiter=',')
    x, y = np.split(data, (-1, ), axis=1)    # last column holds the labels
    images = x.reshape(-1, 8, 8)
    y = y.ravel().astype(int)

    print('Load Test Data Start...')
    data = np.loadtxt('./data/optdigits.tes', dtype=float, delimiter=',')
    x_test, y_test = np.split(data, (-1, ), axis=1)
    print(y_test.shape)
    images_test = x_test.reshape(-1, 8, 8)
    y_test = y_test.ravel().astype(int)
    print('Load Data OK...')

    # x, x_test, y, y_test = train_test_split(x, y, test_size=0.4, random_state=1)
    # images = x.reshape(-1, 8, 8)
    # images_test = x_test.reshape(-1, 8, 8)

    matplotlib.rcParams['font.sans-serif'] = ['SimHei']  # kept for CJK rendering
    matplotlib.rcParams['axes.unicode_minus'] = False
    plt.figure(figsize=(15, 9), facecolor='w')
    for index, image in enumerate(images[:16]):
        plt.subplot(4, 8, index + 1)
        plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
        plt.title('Training image: %i' % y[index])
    for index, image in enumerate(images_test[:16]):
        plt.subplot(4, 8, index + 17)
        plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
        save_image(image.copy(), index)
        plt.title('Test image: %i' % y_test[index])
    plt.tight_layout()
    plt.show()

    # params = {'C': np.logspace(0, 3, 7), 'gamma': np.logspace(-5, 0, 11)}
    # model = GridSearchCV(svm.SVC(kernel='rbf'), param_grid=params, cv=3)
    model = svm.SVC(C=10, kernel='rbf', gamma=0.001)
    print('Start Learning...')
    t0 = time()
    model.fit(x, y)
    t1 = time()
    t = t1 - t0
    print('Training + CV time: %d min %.3f s' % (int(t / 60), t - 60 * int(t / 60)))
    # print('Best parameters:\t', model.best_params_)
    # clf.fit(x, y)
    print('Learning is OK...')
    print('Training set accuracy:', accuracy_score(y, model.predict(x)))
    y_hat = model.predict(x_test)
    print('Test set accuracy:', accuracy_score(y_test, y_hat))
    print(y_hat)
    print(y_test)

    err_images = images_test[y_test != y_hat]   # images the model got wrong
    err_y_hat = y_hat[y_test != y_hat]
    err_y = y_test[y_test != y_hat]
    print(err_y_hat)
    print(err_y)
    plt.figure(figsize=(10, 8), facecolor='w')
    for index, image in enumerate(err_images):
        if index >= 12:
            break
        plt.subplot(3, 4, index + 1)
        plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
        plt.title('Predicted: %i, true: %i' % (err_y_hat[index], err_y[index]))
    plt.tight_layout()
    plt.show()

2. Experiment results

Load Training File Start...
Load Test Data Start...
(1797, 1)
Load Data OK...
Start Learning...
Training + CV time: 0 min 0.632 s
Learning is OK...
Training set accuracy: 1.0
Test set accuracy: 0.982749026155