lift_score: Lift score for classification and association rule mining

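Scoring function to compute the lift metric: the ratio of the fraction of true positives among the predicted positives (the precision) to the fraction of actual positives in the test dataset.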

from mlxtend.evaluate import lift_score

Overview

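In the context of classification, lift [1] compares model predictions to randomly generated predictions. Lift is often used in marketing research, combined with gain and lift charts as a visual aid [2]. For example, assuming a 10% customer response as a baseline, a lift value of 3 would correspond to a 30% customer response when using the predictive model. Note that lift has the range $[0, \infty)$.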

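There are many strategies to compute lift; below, we illustrate the computation of the lift score using a classic confusion matrix. For instance, let's assume the following prediction and target labels, where "1" is the positive class: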

$\text{true labels}: [0, 0, 1, 0, 0, 1, 1, 1, 1, 1]$
$\text{prediction}: [1, 0, 1, 0, 0, 0, 0, 1, 0, 0]$

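Then, our confusion matrix contains 2 true positives (TP), 4 false negatives (FN), 1 false positive (FP), and 3 true negatives (TN).

Based on the confusion matrix above, with "1" as the positive label, we compute lift as follows: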

$\text{lift} = \frac{TP/(TP+FP)}{(TP+FN)/(TP+TN+FP+FN)}$

Plugging in the actual values from the example above, we arrive at the following lift value:

$\frac{2/(2+1)}{(2+4)/(2+3+1+4)} = 1.1111111111111112$
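
The calculation above can be reproduced in a few lines of plain Python. This is a minimal sketch that counts the confusion-matrix cells by hand; the helper name `lift_from_confusion` is hypothetical and is not part of mlxtend, and the sketch assumes at least one positive prediction so that the precision is defined.

```python
def lift_from_confusion(y_true, y_pred, positive_label=1):
    """Compute lift from confusion-matrix counts (illustrative helper, not mlxtend API)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive_label and p == positive_label)
    fp = sum(1 for t, p in pairs if t != positive_label and p == positive_label)
    fn = sum(1 for t, p in pairs if t == positive_label and p != positive_label)
    tn = sum(1 for t, p in pairs if t != positive_label and p != positive_label)

    # Assumes tp + fp > 0, i.e. the model predicted the positive class at least once.
    precision = tp / (tp + fp)                   # TP / (TP + FP)
    base_rate = (tp + fn) / (tp + tn + fp + fn)  # (TP + FN) / N
    return precision / base_rate


y_true = [0, 0, 1, 0, 0, 1, 1, 1, 1, 1]
y_pred = [1, 0, 1, 0, 0, 0, 0, 1, 0, 0]
print(lift_from_confusion(y_true, y_pred))  # approximately 1.1111
```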

An alternative way of computing lift is by using the support metric [3]:

$\text{lift} = \frac{\text{support}(\text{true labels} \cap \text{prediction})}{\text{support}(\text{true labels}) \times \text{support}(\text{prediction})},$

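where support is $x / N$, $x$ is the number of times an event is observed, and $N$ is the total number of samples in the dataset. $\text{true labels} \cap \text{prediction}$ are the true positives, $\text{true labels}$ are the true positives plus false negatives, and $\text{prediction}$ are the true positives plus false positives. Plugging the values from our example into the equation above, we arrive at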

$\frac{2/10}{(6/10 \times 3/10)} = 1.1111111111111112$
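
The support-based formula can be checked the same way. The sketch below assumes binary labels (0 and 1) and uses a hypothetical `support` helper written for illustration; it is not necessarily how mlxtend implements the metric internally.

```python
def support(*label_lists):
    """Fraction of samples for which every given label list equals 1 (binary labels assumed)."""
    n = len(label_lists[0])
    hits = sum(1 for row in zip(*label_lists) if all(v == 1 for v in row))
    return hits / n


y_true = [0, 0, 1, 0, 0, 1, 1, 1, 1, 1]
y_pred = [1, 0, 1, 0, 0, 0, 0, 1, 0, 0]

# support(true labels AND prediction) / (support(true labels) * support(prediction))
lift = support(y_true, y_pred) / (support(y_true) * support(y_pred))
print(lift)  # approximately 1.1111
```

Both routes give the same value because the support-based ratio is an algebraic rearrangement of the confusion-matrix formula.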

References

[1] S. Brin, R. Motwani, J. D. Ullman, and S. Tsur. Dynamic itemset counting and implication rules for market basket data. In Proc. of the ACM SIGMOD Int'l Conf. on Management of Data (ACM SIGMOD '97), pages 265-276, 1997.
[2] https://www3.nd.edu/~busiforc/Lift_chart.html
[3] https://en.wikipedia.org/wiki/Association_rule_learning#Support

Example 1 - Computing Lift

This example demonstrates the basic use of the lift_score function, using the example from the Overview section.

import numpy as np
from mlxtend.evaluate import lift_score

y_target =    np.array([0, 0, 1, 0, 0, 1, 1, 1, 1, 1])
y_predicted = np.array([1, 0, 1, 0, 0, 0, 0, 1, 0, 0])

lift_score(y_target, y_predicted)

1.1111111111111112

Example 2 - Using `lift_score` in `GridSearch`

The lift_score function can also be used with scikit-learn objects, such as GridSearchCV:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import make_scorer

# make custom scorer
lift_scorer = make_scorer(lift_score)


iris = load_iris()
X, y = iris.data, iris.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=123)

hyperparameters = [{'kernel': ['rbf'], 'gamma': [1e-3, 1e-4],
                     'C': [1, 10, 100, 1000]},
                   {'kernel': ['linear'], 'C': [1, 10, 100, 1000]}]

clf = GridSearchCV(SVC(), hyperparameters, cv=10,
                   scoring=lift_scorer)
clf.fit(X_train, y_train)

print(clf.best_score_)
print(clf.best_params_)

3.0
{'gamma': 0.001, 'kernel': 'rbf', 'C': 1000}
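
A best score of 3.0 is what one would expect if the predictions for class 1 were perfect in every fold, assuming that lift_score treats class 1 as the positive class (its default `positive_label=1`) and that the stratified folds keep the three iris classes balanced. The base rate of the positive class is then 1/3, so a precision of 1 gives a lift of 1 / (1/3) = 3. The following sketch illustrates this with made-up labels in plain Python; it does not call mlxtend or scikit-learn.

```python
# Hypothetical, balanced three-class labels (4 samples per class), predicted perfectly.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
y_pred = list(y_true)

positive_label = 1
# Map to a binary problem: positive class -> 1, every other class -> 0.
t = [1 if y == positive_label else 0 for y in y_true]
p = [1 if y == positive_label else 0 for y in y_pred]

n = len(t)
tp = sum(a * b for a, b in zip(t, p))  # samples that are positive in both lists
precision = tp / sum(p)                # TP / (TP + FP)
base_rate = sum(t) / n                 # (TP + FN) / N
lift = precision / base_rate
print(lift)
```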

API

lift_score(y_target, y_predicted, binary=True, positive_label=1)

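Lift measures the degree to which the predictions of a classification model are better than randomly generated predictions.

In terms of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), the lift score is computed as: [ TP / (TP + FP) ] / [ (TP + FN) / (TP + TN + FP + FN) ]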

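Parameters

y_target : array-like, shape=[n_samples]

True class labels.

y_predicted : array-like, shape=[n_samples]

Predicted class labels.

binary : bool (default: True)

Maps a multi-class problem onto a binary problem, where the positive class is 1 and all other classes are 0.

positive_label : int (default: 1)

Class label of the positive class.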
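
To make the effect of `binary` and `positive_label` concrete, the mapping described above can be sketched in plain Python. The helper `to_binary` is a hypothetical name used for illustration only; it mirrors the parameter descriptions and is not necessarily mlxtend's internal code.

```python
def to_binary(labels, positive_label=1):
    """Map labels to 1 where they equal positive_label and to 0 otherwise."""
    return [1 if y == positive_label else 0 for y in labels]


labels = [0, 1, 2, 2, 1, 0]
print(to_binary(labels))                    # [0, 1, 0, 0, 1, 0]
print(to_binary(labels, positive_label=2))  # [0, 0, 1, 1, 0, 0]
```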

Returns

score : float

Lift score in the range $[0, \infty)$

Examples

For usage examples, please see https://mlxtend.cn/mlxtend/user_guide/evaluate/lift_score/
