Example code:
from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
# Input sentence
text = "The quick brown fox jumped over the lazy dog."
# Tokenize with BERT's WordPiece tokenizer
tokens = tokenizer.tokenize(text)
# No word in this sentence is split into "##" subword pieces, but the round
# trip is still lossy: bert-base-uncased lowercases the input and splits off
# punctuation, so joining the tokens does not reproduce the original text.
assert " ".join(tokens) == "the quick brown fox jumped over the lazy dog ."
assert " ".join(tokens) != text
# Another tokenizer for comparison (for reference only)
from nltk.tokenize import word_tokenize
# Requires the Punkt tokenizer data: nltk.download('punkt')
text = "The quick brown fox jumped over the lazy dog."
tokens = word_tokenize(text)
# word_tokenize preserves case but splits the final period into its own
# token, so the joined result differs from the original by one space.
assert " ".join(tokens) == "The quick brown fox jumped over the lazy dog ."