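DenseTransformer: Convert a sparse array into a dense NumPy array, e.g., in a scikit-learn pipeline

A simple transformer that converts a sparse array into a dense NumPy array. This is needed, for example, in a scikit-learn Pipeline when a CountVectorizer is combined with an estimator that does not accept sparse matrices.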

from mlxtend.preprocessing import DenseTransformer
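To make the sparse versus dense distinction concrete, the following minimal sketch shows what CountVectorizer returns and what a dense conversion looks like. It uses only scikit-learn, SciPy, and NumPy; the variable names (`docs`, `X_sparse`, `X_dense`) are chosen here for illustration.

```python
import numpy as np
from scipy.sparse import issparse
from sklearn.feature_extraction.text import CountVectorizer

docs = np.array(['abc def ghi', 'this is a test',
                 'this is a test', 'this is a test'])

# CountVectorizer returns a SciPy sparse matrix of token counts.
X_sparse = CountVectorizer().fit_transform(docs)
print(issparse(X_sparse))  # True

# .toarray() materializes the same counts as a dense NumPy array,
# which is the kind of output a densifying pipeline step has to produce.
X_dense = X_sparse.toarray()
print(type(X_dense).__name__, X_dense.shape)
```

A dense array stores every zero explicitly, so this conversion is only practical when the vocabulary and the number of documents are small enough to fit in memory.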

Example 1

from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from mlxtend.preprocessing import DenseTransformer
import numpy as np

X_train = np.array(['abc def ghi', 'this is a test',
                    'this is a test', 'this is a test'])
y_train = np.array([0, 0, 1, 1])

pipe_1 = Pipeline([
    ('vect', CountVectorizer()),
    ('to_dense', DenseTransformer()),
    ('clf', RandomForestClassifier())
])

parameters_1 = dict(
    clf__n_estimators=[50, 100, 200],
    clf__max_features=['sqrt', 'log2', None],)

grid_search_1 = GridSearchCV(pipe_1, 
                             parameters_1, 
                             n_jobs=1, 
                             verbose=1,
                             scoring='accuracy',
                             cv=2)


print("Performing grid search...")
print("pipeline:", [name for name, _ in pipe_1.steps])
print("parameters:")
grid_search_1.fit(X_train, y_train)
print("Best score: %0.3f" % grid_search_1.best_score_)
print("Best parameters set:")
best_parameters_1 = grid_search_1.best_estimator_.get_params()
for param_name in sorted(parameters_1.keys()):
    print("\t%s: %r" % (param_name, best_parameters_1[param_name]))

Performing grid search...
pipeline: ['vect', 'to_dense', 'clf']
parameters:
Fitting 2 folds for each of 9 candidates, totalling 18 fits
Best score: 0.500
Best parameters set:
    clf__max_features: 'sqrt'
    clf__n_estimators: 50


[Parallel(n_jobs=1)]: Done  18 out of  18 | elapsed:    3.9s finished

API

DenseTransformer(return_copy=True)

Convert a sparse array into a dense array.

For usage examples, please see https://mlxtend.cn/mlxtend/user_guide/preprocessing/DenseTransformer/
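As a rough mental model, a transformer of this kind could be written in a few lines. The class below, `SketchDenseTransformer`, is a hypothetical stand-in written for this page and is not the mlxtend source. In particular, it assumes that `return_copy` only affects input that is already dense, which the description above does not state.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse


class SketchDenseTransformer:
    """Hypothetical stand-in for illustration; not the mlxtend implementation."""

    def __init__(self, return_copy=True):
        self.return_copy = return_copy

    def fit(self, X, y=None):
        # Nothing to learn; return self so calls can be chained.
        return self

    def transform(self, X, y=None):
        if issparse(X):
            # Sparse input is always materialized as a new dense array.
            return X.toarray()
        # Assumption: return_copy only matters for already-dense input.
        return X.copy() if self.return_copy else X

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


X_sparse = csr_matrix(np.array([[0, 1, 0],
                                [2, 0, 3]]))
X_dense = SketchDenseTransformer().fit_transform(X_sparse)
print(X_dense)

already_dense = np.array([[1.0, 2.0]])
same = SketchDenseTransformer(return_copy=False).transform(already_dense)
copied = SketchDenseTransformer(return_copy=True).transform(already_dense)
print(same is already_dense, copied is already_dense)
```

Returning a copy by default protects the caller's array from later in-place modification, at the cost of extra memory.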

Methods

fit(X, y=None)

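Mock method. Does nothing.

Parameters

X : {array-like, sparse matrix}, shape = [n_samples, n_features]

Training vectors, where n_samples is the number of samples and n_features is the number of features.

y : array-like, shape = [n_samples] (default: None)

Returns

self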

fit_transform(X, y=None)

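Return a dense version of the input array.

Parameters

X : {array-like, sparse matrix}, shape = [n_samples, n_features]

Training vectors, where n_samples is the number of samples and n_features is the number of features.

y : array-like, shape = [n_samples] (default: None)

Returns

X_dense : dense version of the input X array.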

get_params(deep=True)

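Get parameters for this estimator.

Parameters

deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params : mapping of string to any

Parameter names mapped to their values.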

set_params(**params)

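Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter>, so that each component of a nested object can be updated.

Returns

self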

transform(X, y=None)

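Return a dense version of the input array.

Parameters

X : {array-like, sparse matrix}, shape = [n_samples, n_features]

Training vectors, where n_samples is the number of samples and n_features is the number of features.

y : array-like, shape = [n_samples] (default: None)

Returns

X_dense : dense version of the input X array.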
