cardiffnlp/twitter-roberta-base-sentiment
Twitter-roBERTa-base for Sentiment Analysis
This is a RoBERTa-base model trained on ~58M tweets and fine-tuned for sentiment analysis with the TweetEval benchmark. It is intended for English text (for a similar multilingual model, see XLM-T).
- Reference Paper: TweetEval (Findings of EMNLP 2020).
- Git Repo: TweetEval official repository.
Labels:
0 -> Negative;
1 -> Neutral;
2 -> Positive
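The label ids above can also be decoded without downloading `mapping.txt`. The following is a minimal sketch that assumes you already have the three per-class scores as a list. `ID2LABEL` and `decode` are hypothetical names. The lowercase label strings follow the mapping file and the sample output of the classification example.

```python
# Hypothetical helper: this id-to-label table is copied from the "Labels" list
# in this card; the model card itself does not ship it as a Python object.
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}


def decode(scores):
    """Return the label name of the highest-scoring class (argmax over ids 0..2)."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return ID2LABEL[best]


print(decode([0.0076, 0.1458, 0.8466]))  # prints "positive"
```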
New! We just released a new sentiment analysis model trained on a larger and more recent set of tweets.
See twitter-roberta-base-sentiment-latest and TweetNLP for more details.
Example of classification
```python
from transformers import AutoModelForSequenceClassification
from transformers import AutoTokenizer
import numpy as np
from scipy.special import softmax
import csv
import urllib.request

# Preprocess text (username and link placeholders)
def preprocess(text):
    new_text = []
    for t in text.split(" "):
        t = '@user' if t.startswith('@') and len(t) > 1 else t
        t = 'http' if t.startswith('http') else t
        new_text.append(t)
    return " ".join(new_text)

# Tasks:
# emoji, emotion, hate, irony, offensive, sentiment
# stance/abortion, stance/atheism, stance/climate, stance/feminist, stance/hillary

task = 'sentiment'
MODEL = f"cardiffnlp/twitter-roberta-base-{task}"

tokenizer = AutoTokenizer.from_pretrained(MODEL)

# download label mapping (one "<id>\t<label>" pair per line)
mapping_link = f"https://raw.githubusercontent.com/cardiffnlp/tweeteval/main/datasets/{task}/mapping.txt"
with urllib.request.urlopen(mapping_link) as f:
    lines = f.read().decode('utf-8').split("\n")
csvreader = csv.reader(lines, delimiter='\t')
labels = [row[1] for row in csvreader if len(row) > 1]

# PT (PyTorch)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
# Optional: save a local copy. On later runs from_pretrained(MODEL) resolves to
# this local folder, so save the tokenizer as well to keep the copy complete.
model.save_pretrained(MODEL)
tokenizer.save_pretrained(MODEL)
text = "Good night 😊"
text = preprocess(text)
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
scores = output[0][0].detach().numpy()  # logits for the single input
scores = softmax(scores)                # logits -> probabilities

# # TF (TensorFlow)
# from transformers import TFAutoModelForSequenceClassification
# model = TFAutoModelForSequenceClassification.from_pretrained(MODEL)
# model.save_pretrained(MODEL)
# text = "Good night 😊"
# text = preprocess(text)
# encoded_input = tokenizer(text, return_tensors='tf')
# output = model(encoded_input)
# scores = output[0][0].numpy()
# scores = softmax(scores)

# Print labels from most to least likely
ranking = np.argsort(scores)
ranking = ranking[::-1]
for i in range(scores.shape[0]):
    l = labels[ranking[i]]
    s = scores[ranking[i]]
    print(f"{i+1}) {l} {np.round(float(s), 4)}")
```
Output:
```
1) positive 0.8466
2) neutral 0.1458
3) negative 0.0076
```
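The post-processing at the end of the example (softmax over the logits, then sorting from most to least likely) can also be written as a standalone function. This is a sketch: `rank_labels` is a hypothetical helper, the softmax is a plain NumPy equivalent of `scipy.special.softmax`, and the logits in the usage lines are made-up values, not real model output.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max()  # shift so the largest logit is 0; avoids overflow in exp
    e = np.exp(z)
    return e / e.sum()


def rank_labels(logits, labels):
    """Hypothetical helper: return (label, probability) pairs, most likely first."""
    probs = softmax(logits)
    order = np.argsort(probs)[::-1]  # indices in descending probability
    return [(labels[i], float(probs[i])) for i in order]


# Illustrative logits only (not produced by the model)
labels = ["negative", "neutral", "positive"]
for rank, (label, p) in enumerate(rank_labels([-1.2, 1.8, 3.5], labels), start=1):
    print(f"{rank}) {label} {round(p, 4)}")
```

Subtracting the maximum logit before exponentiating leaves the probabilities unchanged but keeps `np.exp` from overflowing on large logits.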
BibTeX entry and citation info
Please cite the reference paper if you use this model.
```
@inproceedings{barbieri-etal-2020-tweeteval,
    title = "{T}weet{E}val: Unified Benchmark and Comparative Evaluation for Tweet Classification",
    author = "Barbieri, Francesco and
      Camacho-Collados, Jose and
      Espinosa Anke, Luis and
      Neves, Leonardo",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.148",
    doi = "10.18653/v1/2020.findings-emnlp.148",
    pages = "1644--1650"
}
```