Adaline: Adaptive Linear Neuron Classifier

An implementation of the ADAptive LInear NEuron (Adaline) for binary classification tasks.

from mlxtend.classifier import Adaline

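Overview

Figure: Illustration of the ADAptive LInear NEuron (Adaline), a single-layer artificial linear neuron with a threshold unit.

The Adaline classifier is closely related to the Ordinary Least Squares (OLS) linear regression algorithm; in OLS regression, we find the line (or hyperplane) that minimizes the vertical offsets. In other words, we define the best-fitting line as the line that minimizes the sum of squared errors (SSE) or the mean squared error (MSE) between the target variable (y) and the predicted output over all samples $i$ in our dataset of size $n$.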

$SSE = \sum_i (\text{target}^{(i)} - \text{output}^{(i)})^2$

$MSE = \frac{1}{n} \times SSE$
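As a small illustration of the two formulas above, the following sketch computes SSE and MSE with NumPy. The `target` and `output` arrays are made-up values chosen for the example, not data from this guide.

```python
import numpy as np

# Hypothetical target values and model outputs, for illustration only.
target = np.array([1.0, -1.0, 1.0, -1.0])
output = np.array([0.8, -0.5, 0.4, -1.1])

errors = target - output
sse = np.sum(errors ** 2)   # sum of squared errors
mse = sse / len(target)     # mean squared error = SSE / n

print("SSE:", sse)
print("MSE:", mse)
```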

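LinearRegression implements a linear regression model for performing ordinary least squares regression. In Adaline, we add a threshold function $g(\cdot)$ that converts the continuous output into a categorical class label:

$$y = g(z) = \begin{cases} 1 & \text{if } z \ge 0 \\ -1 & \text{otherwise.} \end{cases}$$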
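The threshold function can be sketched in a few lines of NumPy. The function name `threshold` and the sample inputs are assumptions made for this example.

```python
import numpy as np


def threshold(z):
    """Map continuous net input z to class labels: 1 if z >= 0, else -1."""
    return np.where(z >= 0.0, 1, -1)


# Hypothetical net-input values, for illustration only.
z = np.array([-0.3, 0.0, 2.5])
labels = threshold(z)
print(labels)
```

Note that an input of exactly 0 falls on the "1" side of the threshold, matching the $z \ge 0$ condition.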

An Adaline model can be trained by one of the following three approaches:

- Normal Equations
- Gradient Descent
- Stochastic Gradient Descent

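Normal Equations (closed-form solution)

The closed-form solution should be preferred for "smaller" datasets, where computing a ("costly") matrix inverse is not a concern. For very large datasets, or for datasets where the inverse of $[X^T X]$ may not exist (the matrix is non-invertible or singular, e.g., in the case of perfect multicollinearity), the gradient descent or stochastic gradient descent approaches should be preferred.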
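To make the singular case concrete, here is a minimal sketch of perfect multicollinearity. The design matrix is hypothetical: its third column is an exact multiple of the second, so $X^T X$ loses rank and has no inverse.

```python
import numpy as np

# Hypothetical design matrix: a column of ones, one feature,
# and a second feature that is exactly twice the first.
x1 = np.array([1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones(4), x1, 2.0 * x1])

gram = X.T @ X                       # the matrix that would be inverted
rank = np.linalg.matrix_rank(gram)   # numerical rank of X^T X

print("shape:", gram.shape, "rank:", rank)
```

A rank below the number of columns indicates that the closed-form solution is not available for this data as is.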

The linear function (linear regression model) is defined as

$z = w_0x_0 + w_1x_1 + ... + w_mx_m = \sum_{j=0}^{m} w_j x_j = \mathbf{w}^T\mathbf{x}$

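where $y$ is the response variable, $\mathbf{x}$ is an $m$-dimensional sample vector, and $\mathbf{w}$ is the weight vector (vector of coefficients). Note that $w_0$ represents the y-axis intercept of the model, and therefore $x_0=1$.

Using the closed-form solution (normal equation), we compute the weights of the model as follows: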

$\mathbf{w} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^Ty$
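The normal equation can be evaluated directly with NumPy. The sketch below uses hypothetical data generated from $y = 1 + 2x$, so the recovered weights should be close to $w_0 = 1$ and $w_1 = 2$. It is an illustration of the formula, not the library's internal code.

```python
import numpy as np

# Hypothetical data generated exactly from y = 1 + 2 * x.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x

# Prepend a column of ones so that w[0] acts as the intercept (x_0 = 1).
X = np.column_stack([np.ones_like(x), x])

# w = (X^T X)^{-1} X^T y
w = np.linalg.inv(X.T @ X) @ X.T @ y
print(w)
```

In practice, `np.linalg.solve(X.T @ X, X.T @ y)` or `np.linalg.lstsq` is usually preferred over forming the explicit inverse, since it is numerically more stable.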

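Gradient Descent (GD) and Stochastic Gradient Descent (SGD)

In the current implementation, the Adaline model is learned via gradient descent or stochastic gradient descent.

For details, see Gradient Descent and Stochastic Gradient Descent and Deriving the Gradient Descent Rule for Linear Regression and Adaline.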
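As a rough sketch of full-batch gradient descent for Adaline, assume the cost is $J(\mathbf{w}) = \frac{1}{2} SSE$, whose negative gradient with respect to the weights is $\mathbf{X}^T(\mathbf{y} - \mathbf{z})$. The function name `fit_gd`, the separate bias term `b`, and the toy data are assumptions made for this example and are not taken from the library.

```python
import numpy as np


def fit_gd(X, y, eta=0.01, epochs=50):
    """Full-batch gradient descent on the sum of squared errors (illustrative)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    cost = []
    for _ in range(epochs):
        errors = y - (X @ w + b)       # target minus continuous output
        w += eta * (X.T @ errors)      # one update computed from all samples
        b += eta * errors.sum()
        cost.append(float(np.sum((y - (X @ w + b)) ** 2)))  # SSE after the update
    return w, b, cost


# Hypothetical toy data with labels in {-1, 1}.
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])

w, b, cost = fit_gd(X, y)
pred = np.where(X @ w + b >= 0.0, 1, -1)   # apply the threshold function
print(pred, cost[0], cost[-1])
```

The weights are updated from the continuous output rather than from the thresholded labels; the threshold is applied only when predicting.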

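Random shuffling is implemented as follows:

for one or more epochs
- randomly shuffle the samples in the training set
  - for each training sample i
    - compute the gradients and perform the weight updates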
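The shuffling loop above might look like the following in NumPy. This is a sketch under assumptions: the function name `fit_sgd`, the per-sample update rule, and the toy data are illustrative and not copied from the library.

```python
import numpy as np


def fit_sgd(X, y, eta=0.01, epochs=15, seed=1):
    """Online SGD: one weight update per training sample (illustrative)."""
    rng = np.random.RandomState(seed)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        # Visit the samples in a new random order in every epoch.
        for i in rng.permutation(len(y)):
            error = y[i] - (X[i] @ w + b)
            w += eta * error * X[i]
            b += eta * error
    return w, b


# Hypothetical toy data with labels in {-1, 1}.
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])

w, b = fit_sgd(X, y)
pred = np.where(X @ w + b >= 0.0, 1, -1)
print(pred)
```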

References

B. Widrow, M. E. Hoff, et al. Adaptive switching circuits. 1960.

Example 1 - Closed Form Solution

from mlxtend.data import iris_data
from mlxtend.plotting import plot_decision_regions
from mlxtend.classifier import Adaline
import matplotlib.pyplot as plt

# Loading Data

X, y = iris_data()
X = X[:, [0, 3]] # sepal length and petal width
X = X[0:100] # class 0 and class 1
y = y[0:100] # class 0 and class 1

# standardize
X[:,0] = (X[:,0] - X[:,0].mean()) / X[:,0].std()
X[:,1] = (X[:,1] - X[:,1].mean()) / X[:,1].std()


ada = Adaline(epochs=30, 
              eta=0.01, 
              minibatches=None, 
              random_seed=1)
ada.fit(X, y)
plot_decision_regions(X, y, clf=ada)
plt.title('Adaline - Closed Form')

plt.show()

Example 2 - Gradient Descent

from mlxtend.data import iris_data
from mlxtend.plotting import plot_decision_regions
from mlxtend.classifier import Adaline
import matplotlib.pyplot as plt

# Loading Data

X, y = iris_data()
X = X[:, [0, 3]] # sepal length and petal width
X = X[0:100] # class 0 and class 1
y = y[0:100] # class 0 and class 1

# standardize
X[:,0] = (X[:,0] - X[:,0].mean()) / X[:,0].std()
X[:,1] = (X[:,1] - X[:,1].mean()) / X[:,1].std()


ada = Adaline(epochs=30, 
              eta=0.01, 
              minibatches=1, # for Gradient Descent Learning
              random_seed=1,
              print_progress=3)

ada.fit(X, y)
plot_decision_regions(X, y, clf=ada)
plt.title('Adaline - Gradient Descent')
plt.show()

plt.plot(range(len(ada.cost_)), ada.cost_)
plt.xlabel('Iterations')
plt.ylabel('Cost')
plt.show()

Iteration: 30/30 | Cost 3.79 | Elapsed: 0:00:00 | ETA: 0:00:00

Example 3 - Stochastic Gradient Descent

from mlxtend.data import iris_data
from mlxtend.plotting import plot_decision_regions
from mlxtend.classifier import Adaline
import matplotlib.pyplot as plt

# Loading Data

X, y = iris_data()
X = X[:, [0, 3]] # sepal length and petal width
X = X[0:100] # class 0 and class 1
y = y[0:100] # class 0 and class 1

# standardize
X[:,0] = (X[:,0] - X[:,0].mean()) / X[:,0].std()
X[:,1] = (X[:,1] - X[:,1].mean()) / X[:,1].std()


ada = Adaline(epochs=15, 
              eta=0.02, 
              minibatches=len(y), # for SGD learning 
              random_seed=1,
              print_progress=3)

ada.fit(X, y)
plot_decision_regions(X, y, clf=ada)
plt.title('Adaline - Stochastic Gradient Descent')
plt.show()

plt.plot(range(len(ada.cost_)), ada.cost_)
plt.xlabel('Iterations')
plt.ylabel('Cost')
plt.show()

Iteration: 15/15 | Cost 3.81 | Elapsed: 0:00:00 | ETA: 0:00:00

Example 4 - Stochastic Gradient Descent with Minibatches

from mlxtend.data import iris_data
from mlxtend.plotting import plot_decision_regions
from mlxtend.classifier import Adaline
import matplotlib.pyplot as plt

# Loading Data

X, y = iris_data()
X = X[:, [0, 3]] # sepal length and petal width
X = X[0:100] # class 0 and class 1
y = y[0:100] # class 0 and class 1

# standardize
X[:,0] = (X[:,0] - X[:,0].mean()) / X[:,0].std()
X[:,1] = (X[:,1] - X[:,1].mean()) / X[:,1].std()


ada = Adaline(epochs=15, 
              eta=0.02, 
              minibatches=5, # for SGD learning w. minibatch size 20
              random_seed=1,
              print_progress=3)

ada.fit(X, y)
plot_decision_regions(X, y, clf=ada)
plt.title('Adaline - Stochastic Gradient Descent w. Minibatches')
plt.show()

plt.plot(range(len(ada.cost_)), ada.cost_)
plt.xlabel('Iterations')
plt.ylabel('Cost')
plt.show()

Iteration: 15/15 | Cost 3.87 | Elapsed: 0:00:00 | ETA: 0:00:00

API

Adaline(eta=0.01, epochs=50, minibatches=None, random_seed=None, print_progress=0)

ADAptive LInear NEuron classifier.

Note that this implementation of Adaline expects binary class labels in {0, 1}.

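Parameters

eta : float (default: 0.01)

Solver learning rate (between 0.0 and 1.0).
epochs : int (default: 50)

Number of passes over the training dataset. Prior to each epoch, the dataset is shuffled if minibatches > 1 to prevent cycles in stochastic gradient descent.
minibatches : int (default: None)

The number of minibatches for gradient-based optimization.
- If None: normal equations (closed-form solution)
- If 1: gradient descent learning
- If len(y): stochastic gradient descent (SGD) online learning
- If 1 < minibatches < len(y): SGD minibatch learning
random_seed : int (default: None)

Sets the random state for shuffling and for initializing the weights.
print_progress : int (default: 0)

Prints the fitting progress to stderr when the solver is not the normal equation (closed-form solution).
- 0: no output
- 1: epochs elapsed and cost
- 2: 1 plus time elapsed
- 3: 2 plus estimated time until completion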
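To see how the minibatches setting relates to batch size, the sketch below partitions 100 shuffled sample indices into 1, 5, and 100 parts. It uses `np.array_split` as a stand-in; how the library splits the data internally is not stated here, so treat this as an assumption.

```python
import numpy as np

n_samples = 100
rng = np.random.RandomState(1)
indices = rng.permutation(n_samples)   # shuffled sample indices

# Map "number of minibatches" to "samples per minibatch".
sizes = {}
for n_batches in (1, 5, n_samples):
    batches = np.array_split(indices, n_batches)
    sizes[n_batches] = len(batches[0])

print(sizes)
```

With 100 samples, one minibatch holds the full dataset (gradient descent), five minibatches hold 20 samples each, and 100 minibatches hold a single sample each (online SGD).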

Attributes

w_ : 2d-array, shape={n_features, 1}

Model weights after fitting.
b_ : 1d-array, shape={1,}

Bias unit after fitting.
cost_ : list

Sum of squared errors after each epoch.

Examples

For usage examples, please see https://mlxtend.cn/mlxtend/user_guide/classifier/Adaline/

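Methods

fit(X, y, init_params=True)

Learn the model from training data.

Parameters

X : {array-like, sparse matrix}, shape = [n_samples, n_features]

Training vectors, where n_samples is the number of samples and n_features is the number of features.
y : array-like, shape = [n_samples]

Target values.
init_params : bool (default: True)

Re-initializes the model parameters prior to fitting. Set to False to continue training with the weights from a previous model fitting.

Returns

self : object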

get_params(deep=True)

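Get the parameters for this estimator.

Parameters

deep : boolean, optional

If True, returns the parameters for this estimator and for contained subobjects that are estimators.

Returns

params : mapping of string to any

Parameter names mapped to their values.

Adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py Author: Gael Varoquaux gael.varoquaux@normalesup.org License: BSD 3 clause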

predict(X)

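Predict targets from X.

Parameters

X : {array-like, sparse matrix}, shape = [n_samples, n_features]

Training vectors, where n_samples is the number of samples and n_features is the number of features.

Returns

target_values : array-like, shape = [n_samples]

Predicted target values.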

score(X, y)

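Compute the prediction accuracy.

Parameters

X : {array-like, sparse matrix}, shape = [n_samples, n_features]

Training vectors, where n_samples is the number of samples and n_features is the number of features.
y : array-like, shape = [n_samples]

Target values (true class labels).

Returns

acc : float

The prediction accuracy as a float between 0.0 and 1.0 (1.0 is a perfect score).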
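Prediction accuracy is the fraction of samples whose predicted label equals the true label. The sketch below computes it with NumPy on made-up labels; it illustrates the quantity, not the library's internal code.

```python
import numpy as np

# Hypothetical true and predicted class labels in {0, 1}.
y_true = np.array([0, 0, 1, 1, 1])
y_pred = np.array([0, 1, 1, 1, 1])

# Fraction of matching labels: 4 of 5 predictions are correct.
acc = np.mean(y_pred == y_true)
print(acc)
```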

set_params(params)

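Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter>, so that each component of a nested object can be updated.

Returns

self

Adapted from https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py Author: Gael Varoquaux gael.varoquaux@normalesup.org License: BSD 3 clause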
