tokenizer_emoticons: Tokenizers for emoticons

Functions for tokenizing text.

from mlxtend.text import tokenizer_emoticons
from mlxtend.text import tokenizer_words_and_emoticons

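Overview

Text tokenization functions for natural language processing tasks, for example, building a bag-of-words model for text classification.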
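
To make the bag-of-words use case concrete, here is a minimal sketch using only the standard library. The names `simple_tokenize` and `bag_of_words` are hypothetical helpers introduced for illustration and are not part of mlxtend; any callable that maps a string to a list of tokens could be passed as `tokenizer`.

```python
import re
from collections import Counter


def simple_tokenize(text):
    """Hypothetical stand-in tokenizer: lowercase, then split on non-word runs."""
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]


def bag_of_words(documents, tokenizer=simple_tokenize):
    """Return a sorted vocabulary and one term-count vector per document."""
    counts = [Counter(tokenizer(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    # A Counter returns 0 for missing terms, so every vector has full length.
    vectors = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, vectors


docs = ["This is a test", "This test is a good test"]
vocabulary, vectors = bag_of_words(docs)
print(vocabulary)  # ['a', 'good', 'is', 'test', 'this']
print(vectors)     # [[1, 0, 1, 1, 1], [1, 1, 1, 2, 1]]
```

Keeping the tokenizer as a parameter means the counting step stays the same whether tokens are plain words, emoticons, or both.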

Example 1 - Extracting Emoticons

from mlxtend.text import tokenizer_emoticons

tokenizer_emoticons('</a>This :) is :( a test :-)!')

[':)', ':(', ':-)']

Example 2 - Extracting Words and Emoticons

from mlxtend.text import tokenizer_words_and_emoticons

tokenizer_words_and_emoticons('</a>This :) is :( a test :-)!')

['this', 'is', 'a', 'test', ':)', ':(', ':-)']

API

tokenizer_emoticons(text)

Return the emoticons found in a text.

Examples

    >>> tokenizer_emoticons('</a>This :) is :( a test :-)!')
    [':)', ':(', ':-)']

    For usage examples, please see
    https://mlxtend.cn/mlxtend/user_guide/text/tokenizer_emoticons/
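
As a rough illustration of how emoticon extraction of this kind could work, the sketch below uses a regular expression. It is one possible approach under stated assumptions, not necessarily how mlxtend implements it: `extract_emoticons` is a hypothetical helper, and the pattern only covers emoticons made of eyes (`:`, `;`, `=`), an optional nose (`-`), and a mouth (`)`, `(`, `D`, `P`).

```python
import re

# Assumed emoticon shape: eyes, optional nose, mouth.
EMOTICON_PATTERN = re.compile(r"[:;=]-?[()DP]")
# Markup such as '</a>' is stripped before matching.
TAG_PATTERN = re.compile(r"<[^>]*>")


def extract_emoticons(text):
    """Hypothetical helper: strip markup tags, then collect emoticons in order."""
    without_tags = TAG_PATTERN.sub("", text)
    return EMOTICON_PATTERN.findall(without_tags)


print(extract_emoticons("</a>This :) is :( a test :-)!"))
# [':)', ':(', ':-)']
```

A text without any matching pattern yields an empty list.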

tokenizer_words_and_emoticons(text)

Convert a text into lowercase words and emoticons.

Examples

    >>> tokenizer_words_and_emoticons('</a>This :) is :( a test :-)!')
    ['this', 'is', 'a', 'test', ':)', ':(', ':-)']

    For more usage examples, please see
    https://mlxtend.cn/mlxtend/user_guide/text/tokenizer_words_and_emoticons/
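
The following sketch shows one way to produce lowercase words followed by emoticons with regular expressions. It rests on assumptions and is not presented as mlxtend's implementation: `words_and_emoticons` is a hypothetical helper, and the emoticon pattern is limited to eyes (`:`, `;`, `=`), an optional nose (`-`), and a mouth (`)`, `(`, `D`, `P`).

```python
import re

# Assumed emoticon shape: eyes, optional nose, mouth.
EMOTICON_PATTERN = re.compile(r"[:;=]-?[()DP]")
TAG_PATTERN = re.compile(r"<[^>]*>")


def words_and_emoticons(text):
    """Hypothetical helper: lowercase words first, then emoticons in order."""
    without_tags = TAG_PATTERN.sub("", text)
    emoticons = EMOTICON_PATTERN.findall(without_tags)
    # Blank out emoticons so letters such as the 'D' in ':D' are not kept as words.
    without_emoticons = EMOTICON_PATTERN.sub(" ", without_tags)
    words = re.sub(r"\W+", " ", without_emoticons.lower()).split()
    return words + emoticons


print(words_and_emoticons("</a>This :) is :( a test :-)!"))
# ['this', 'is', 'a', 'test', ':)', ':(', ':-)']
```

Emoticons keep their original case, while words are lowercased. This matters for patterns such as `:D`, which would no longer match the assumed pattern after lowercasing.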
