Tag: HF Multimodal
microsoft/xclip-base-patch16-zero-shot
X-CLIP (base-sized model) X-CLIP model (base-sized, patch resolution of 16) trained on Kinetics-400. It was introduced in the paper Expanding Lang...
prithivida/bert-for-patents-64d
Motivation This model is based on anferico/bert-for-patents – a BERT-large model (see the next section for details). By default, the pre-trained ...
google/vit-base-patch16-224-in21k
Vision Transformer (base-sized model) Vision Transformer (ViT) model pre-trained on ImageNet-21k (14 million images, 21,843 classes) at resolutio...
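The "patch16-224" part of names like these encodes how the transformer sees an image: the 224×224 input is cut into 16×16 patches, each patch becomes one token, and a [CLS] token is prepended. A quick sketch of that arithmetic (the function name is my own, not from the card):

```python
def vit_sequence_length(image_size: int, patch_size: int) -> int:
    """Number of tokens a ViT encoder sees: one per patch, plus [CLS]."""
    patches_per_side = image_size // patch_size
    return patches_per_side ** 2 + 1

# 224x224 image, 16x16 patches -> 14*14 = 196 patches + [CLS] = 197 tokens
print(vit_sequence_length(224, 16))  # → 197
```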
facebook/dino-vitb16
Model description Vision Transformer (ViT) is a transformer encoder model (similar to BERT), pre-trained in a self-supervised fashion on a large collection of images (namely ImageNet-1k) at a resolution of 224×22...
microsoft/wavlm-base
WavLM-Base Microsoft’s WavLM. The base model pretrained on 16kHz sampled speech audio. When using the model, make sure that your speech input is als...
laion/clap-htsat-unfused
Model card for CLAP: Contrastive Language-Audio Pretraining. Table of Contents: TL;DR, Model Details, Usage, Uses ...
asapp/sew-d-tiny-100k
SEW-D-tiny SEW-D by ASAPP Research. The base model pretrained on 16kHz sampled speech audio. When using the model, make sure that your speech input i...
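Both WavLM and SEW-D above stress that input audio must be sampled at 16 kHz. A minimal linear-interpolation resampler in numpy, as a sketch of that preprocessing step (a real pipeline would use a proper resampler such as those in torchaudio or librosa; the function name here is illustrative):

```python
import numpy as np

def resample_to_16k(audio: np.ndarray, orig_sr: int, target_sr: int = 16_000) -> np.ndarray:
    """Crude linear-interpolation resample; a sketch, not production DSP."""
    if orig_sr == target_sr:
        return audio
    n_out = int(round(len(audio) * target_sr / orig_sr))
    old_idx = np.arange(len(audio))
    new_idx = np.linspace(0, len(audio) - 1, n_out)
    return np.interp(new_idx, old_idx, audio)

# one second of a 440 Hz tone recorded at 44.1 kHz -> 16 000 samples out
tone = np.sin(2 * np.pi * 440 * np.arange(44_100) / 44_100)
print(len(resample_to_16k(tone, 44_100)))  # → 16000
```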
nielsr/lilt-xlm-roberta-base
LiLT + XLM-RoBERTa-base This model is created by combining the Language-Independent Layout Transformer (LiLT) with XLM-RoBERTa, a multilingual RoB...
intfloat/e5-small
E5-small Text Embeddings by Weakly-Supervised Contrastive Pre-training. Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Ji...
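E5 is a text-embedding model: each text becomes a vector, and relatedness between texts is scored by cosine similarity between their vectors. A sketch of just the scoring step on hand-made toy vectors (real E5 embeddings would come from the transformers library, and the E5 cards prefix inputs with "query: " / "passage: "):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# toy "embeddings": parallel vectors score near 1, orthogonal ones near 0
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # close to 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))            # → 0.0
```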
monsoon-nlp/hindi-bert
Releasing Hindi ELECTRA model This is a first attempt at a Hindi language model trained with Google Research’s ELECTRA. As of 2022 I recommend Go...