Tag: HF Multimodal
microsoft/xclip-base-patch16-zero-shot
X-CLIP (base-sized model) X-CLIP model (base-sized, patch resolution of 16) trained on Kinetics-400. It was introduced in the paper Expanding Lang...
prithivida/bert-for-patents-64d
Motivation This model is based on anferico/bert-for-patents – a BERT-large model (see the next section for details). By default, the pre-trained ...
google/vit-base-patch16-224-in21k
Vision Transformer (base-sized model) Vision Transformer (ViT) model pre-trained on ImageNet-21k (14 million images, 21,843 classes) at resolutio...
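The "patch16-224" part of names like these encodes how the transformer sees an image: the 224×224 input is cut into 16×16 patches, each patch becomes one token, and a [CLS] token is prepended. A quick sketch of that arithmetic (the function name is my own, not from the card):

```python
def vit_sequence_length(image_size: int, patch_size: int) -> int:
    """Number of tokens a ViT encoder sees: one per patch, plus [CLS]."""
    patches_per_side = image_size // patch_size
    return patches_per_side ** 2 + 1

# 224x224 image, 16x16 patches -> 14*14 = 196 patches + [CLS] = 197 tokens
print(vit_sequence_length(224, 16))  # → 197
```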
facebook/dino-vitb16
Model description Vision Transformer (ViT) is a transformer encoder model (similar to BERT), pre-trained in a self-supervised fashion on a large collection of images (namely ImageNet-1k) at a resolution of 224×22...
microsoft/wavlm-base
WavLM-Base Microsoft’s WavLM. The base model pretrained on 16kHz sampled speech audio. When using the model, make sure that your speech input is als...
laion/clap-htsat-unfused
Model card for CLAP: Contrastive Language-Audio Pretraining. Table of Contents: TL;DR, Model Details, Usage, Uses ...
asapp/sew-d-tiny-100k
SEW-D-tiny SEW-D by ASAPP Research. The base model pretrained on 16kHz sampled speech audio. When using the model, make sure that your speech input i...
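Both WavLM and SEW-D above stress that input audio must be sampled at 16 kHz. A minimal linear-interpolation resampler in numpy, as a sketch of that preprocessing step (a real pipeline would use a proper resampler such as those in torchaudio or librosa; the function name here is illustrative):

```python
import numpy as np

def resample_to_16k(audio: np.ndarray, orig_sr: int, target_sr: int = 16_000) -> np.ndarray:
    """Crude linear-interpolation resample; a sketch, not production DSP."""
    if orig_sr == target_sr:
        return audio
    n_out = int(round(len(audio) * target_sr / orig_sr))
    old_idx = np.arange(len(audio))
    new_idx = np.linspace(0, len(audio) - 1, n_out)
    return np.interp(new_idx, old_idx, audio)

# one second of a 440 Hz tone recorded at 44.1 kHz -> 16 000 samples out
tone = np.sin(2 * np.pi * 440 * np.arange(44_100) / 44_100)
print(len(resample_to_16k(tone, 44_100)))  # → 16000
```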
nielsr/lilt-xlm-roberta-base
LiLT + XLM-RoBERTa-base This model is created by combining the Language-Independent Layout Transformer (LiLT) with XLM-RoBERTa, a multilingual RoB...
intfloat/e5-small
E5-small Text Embeddings by Weakly-Supervised Contrastive Pre-training. Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Ji...
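E5 is a text-embedding model: each text becomes a vector, and relatedness between texts is scored by cosine similarity between their vectors. A sketch of just the scoring step on hand-made toy vectors (real E5 embeddings would come from the transformers library, and the E5 cards prefix inputs with "query: " / "passage: "):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# toy "embeddings": parallel vectors score near 1, orthogonal ones near 0
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # close to 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))            # → 0.0
```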
monsoon-nlp/hindi-bert
Releasing Hindi ELECTRA model This is a first attempt at a Hindi language model trained with Google Research’s ELECTRA. As of 2022 I recommend Go...