mnist_data: MNIST 数据集的分类子集

一个将 MNIST 数据集加载到 NumPy 数组中的函数。

from mlxtend.data import mnist_data

概述

MNIST 数据集由美国国家标准与技术研究院 (NIST) 的两个数据集构建而成。训练集包含来自 250 个不同人的手写数字,其中 50% 是高中生,50% 是人口普查局的员工。注意,测试集包含来自不同人的手写数字,分布与训练集相同。

特征

每个特征向量(特征矩阵中的行)由 784 个像素(强度)组成——从原始 28x28 像素图像展开而来。

  • 样本数:5000 张图像的子集(每个类别的最前 500 位数字)

  • 目标变量 (离散): {每个类别 500 个样本}

参考文献

示例 1 - 数据集概述

from mlxtend.data import mnist_data
X, y = mnist_data()

print('Dimensions: %s x %s' % (X.shape[0], X.shape[1]))
print('1st row', X[0])
Dimensions: 5000 x 784
1st row [   0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.   51.  159.  253.  159.   50.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.   48.  238.
  252.  252.  252.  237.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.   54.  227.  253.  252.  239.  233.  252.   57.    6.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.   10.   60.  224.  252.  253.  252.  202.   84.  252.
  253.  122.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.  163.  252.  252.  252.  253.
  252.  252.   96.  189.  253.  167.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.   51.  238.
  253.  253.  190.  114.  253.  228.   47.   79.  255.  168.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.   48.  238.  252.  252.  179.   12.   75.  121.   21.    0.    0.
  253.  243.   50.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.   38.  165.  253.  233.  208.   84.    0.    0.
    0.    0.    0.    0.  253.  252.  165.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    7.  178.  252.  240.   71.
   19.   28.    0.    0.    0.    0.    0.    0.  253.  252.  195.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.   57.
  252.  252.   63.    0.    0.    0.    0.    0.    0.    0.    0.    0.
  253.  252.  195.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.  198.  253.  190.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.  255.  253.  196.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.   76.  246.  252.  112.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.  253.  252.  148.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.   85.  252.
  230.   25.    0.    0.    0.    0.    0.    0.    0.    0.    7.  135.
  253.  186.   12.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.   85.  252.  223.    0.    0.    0.    0.    0.    0.    0.
    0.    7.  131.  252.  225.   71.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.   85.  252.  145.    0.    0.    0.
    0.    0.    0.    0.   48.  165.  252.  173.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.   86.  253.
  225.    0.    0.    0.    0.    0.    0.  114.  238.  253.  162.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.   85.  252.  249.  146.   48.   29.   85.  178.  225.  253.
  223.  167.   56.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.   85.  252.  252.  252.  229.  215.
  252.  252.  252.  196.  130.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.   28.  199.
  252.  252.  253.  252.  252.  233.  145.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.   25.  128.  252.  253.  252.  141.   37.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
    0.    0.    0.    0.]
import numpy as np
print('Classes: Setosa, Versicolor, Virginica')
print(np.unique(y))
print('Class distribution: %s' % np.bincount(y))
Classes: Setosa, Versicolor, Virginica
[0 1 2 3 4 5 6 7 8 9]
Class distribution: [500 500 500 500 500 500 500 500 500 500]

示例 2 - 可视化 MNIST

%matplotlib inline
import matplotlib.pyplot as plt
def plot_digit(X, y, idx):
    img = X[idx].reshape(28,28)
    plt.imshow(img, cmap='Greys',  interpolation='nearest')
    plt.title('true label: %d' % y[idx])
    plt.show()
plot_digit(X, y, 4)       

png

API

mnist_data()

来自 MNIST 手写数字数据集的 5000 个样本。

  • 数据来源 : https://yann.lecun.com/exdb/mnist/

返回值

  • X, y : [样本数, 特征数], [类别标签数]

    X 是特征矩阵,包含 5000 个图像样本作为行,每行由从原始 28x28 像素图像展开而来的 784 个像素特征向量组成。y 包含 10 个唯一的类别标签 0-9。

示例

有关使用示例,请参阅 https://mlxtend.cn/mlxtend/user_guide/data/mnist_data/