cardiffnlp/twitter-roberta-base-sentiment


Twitter-roBERTa-base for Sentiment Analysis

This is a RoBERTa-base model trained on ~58M tweets and fine-tuned for sentiment analysis on the TweetEval benchmark. It is intended for English text; for a similar multilingual model, see XLM-T.

Reference Paper: TweetEval (Findings of EMNLP 2020).

Git Repo: TweetEval official repository.

Labels:
0 -> Negative;
1 -> Neutral;
2 -> Positive
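
The label mapping above can be kept as a small lookup table in code. The sketch below is illustrative only: `ID2LABEL` and `decode_label` are hypothetical names, and the handling of generic class names such as `LABEL_2` assumes a wrapper that reports classes in that form.

```python
# Hypothetical helper: turn a class index (or a generic "LABEL_n" string)
# into the sentiment names listed above.
ID2LABEL = {0: "Negative", 1: "Neutral", 2: "Positive"}


def decode_label(raw):
    """Accept an int such as 2 or a string such as "LABEL_2"."""
    if isinstance(raw, str):
        # Assumption: generic names end in the class index after an underscore.
        raw = int(raw.rsplit("_", 1)[-1])
    return ID2LABEL[raw]


print(decode_label(0))          # Negative
print(decode_label("LABEL_2"))  # Positive
```

Keeping the mapping in one place avoids scattering the magic numbers 0, 1 and 2 through downstream code.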

New! We have released a newer sentiment analysis model trained on a larger and more recent set of tweets.
See twitter-roberta-base-sentiment-latest and TweetNLP for more details.

Example of classification

```python
from transformers import AutoModelForSequenceClassification
from transformers import TFAutoModelForSequenceClassification
from transformers import AutoTokenizer
import numpy as np
from scipy.special import softmax
import csv
import urllib.request

# Preprocess text (username and link placeholders)
def preprocess(text):
    new_text = []
    for t in text.split(" "):
        t = '@user' if t.startswith('@') and len(t) > 1 else t
        t = 'http' if t.startswith('http') else t
        new_text.append(t)
    return " ".join(new_text)

# Tasks:
# emoji, emotion, hate, irony, offensive, sentiment
# stance/abortion, stance/atheism, stance/climate, stance/feminist, stance/hillary

task='sentiment'
MODEL = f"cardiffnlp/twitter-roberta-base-{task}"

tokenizer = AutoTokenizer.from_pretrained(MODEL)

# download label mapping
labels=[]
mapping_link = f"https://raw.githubusercontent.com/cardiffnlp/tweeteval/main/datasets/{task}/mapping.txt"
with urllib.request.urlopen(mapping_link) as f:
    html = f.read().decode('utf-8').split("\n")
    csvreader = csv.reader(html, delimiter='\t')
labels = [row[1] for row in csvreader if len(row) > 1]

# PT
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
model.save_pretrained(MODEL)

text = "Good night "
text = preprocess(text)
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
scores = output[0][0].detach().numpy()
scores = softmax(scores)

# # TF
# model = TFAutoModelForSequenceClassification.from_pretrained(MODEL)
# model.save_pretrained(MODEL)

# text = "Good night "
# encoded_input = tokenizer(text, return_tensors='tf')
# output = model(encoded_input)
# scores = output[0][0].numpy()
# scores = softmax(scores)

ranking = np.argsort(scores)
ranking = ranking[::-1]
for i in range(scores.shape[0]):
    l = labels[ranking[i]]
    s = scores[ranking[i]]
    print(f"{i+1}) {l} {np.round(float(s), 4)}")
```
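
To see what the preprocessing step does on its own, the sketch below reuses the `preprocess` function from the example and applies it to a made-up tweet. The sample text is hypothetical; it needs no model download.

```python
def preprocess(text):
    """Replace user mentions with '@user' and links with 'http'."""
    new_text = []
    for t in text.split(" "):
        t = "@user" if t.startswith("@") and len(t) > 1 else t
        t = "http" if t.startswith("http") else t
        new_text.append(t)
    return " ".join(new_text)


# Hypothetical tweet, not taken from the TweetEval data.
sample = "@alice thanks for the link https://example.com/post !"
print(preprocess(sample))  # @user thanks for the link http !
```

A lone `@` is left unchanged because of the `len(t) > 1` check, and any token that merely starts with `http` is collapsed to the placeholder.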

Output:

```
1) positive 0.8466
2) neutral 0.1458
3) negative 0.0076
```
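
The ranked scores come from applying a softmax to the raw model outputs and sorting in descending order. The sketch below reproduces that step with NumPy alone; the logits are invented for illustration and are not real model output, and the hand-written `softmax` stands in for `scipy.special.softmax` used in the example.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()


labels = ["negative", "neutral", "positive"]
logits = np.array([-2.0, 1.0, 3.0])  # hypothetical values, not model output

scores = softmax(logits)
ranking = np.argsort(scores)[::-1]  # indices from highest to lowest score

for rank, idx in enumerate(ranking, start=1):
    print(f"{rank}) {labels[idx]} {scores[idx]:.4f}")
```

Because softmax normalizes the scores to sum to 1, they can be read as class probabilities for the three sentiment labels.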

BibTeX entry and citation info

Please cite the reference paper if you use this model.
```bibtex
@inproceedings{barbieri-etal-2020-tweeteval,
    title = "{T}weet{E}val: Unified Benchmark and Comparative Evaluation for Tweet Classification",
    author = "Barbieri, Francesco and
      Camacho-Collados, Jose and
      Espinosa Anke, Luis and
      Neves, Leonardo",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.findings-emnlp.148",
    doi = "10.18653/v1/2020.findings-emnlp.148",
    pages = "1644--1650"
}
```
