Self-supervised Learning Model Zoo

Pretrained models

MAE

Pretrained on the ImageNet dataset.

| Config | Backbone | Params (backbone/total) | Train memory (GB) | FLOPs | Inference time (V100, ms/img) | Epochs | Download |
|---|---|---|---|---|---|---|---|
| mae_vit_base_patch16_8xb64_400e | ViT-B/16 | 85M/111M | 9.5 | 9.8G | 8.03 | 400 | model |
| mae_vit_base_patch16_8xb64_1600e | ViT-B/16 | 85M/111M | 9.5 | 9.8G | 8.03 | 1600 | model |
| mae_vit_large_patch16_8xb32_1600e | ViT-L/16 | 303M/329M | 11.3 | 20.8G | 16.30 | 1600 | model |

Fast ConvMAE

Pretrained on the ImageNet dataset.

| Config | Backbone | Params (backbone/total) | Train memory (GB) | FLOPs | Inference time (V100, ms/img) | Total train time | Epochs | Download |
|---|---|---|---|---|---|---|---|---|
| fast_convmae_vit_base_patch16_8xb64_50e | ConvViT-B/16 | 88M/115M | 30.3 | 45.1G | 6.88 | 20h (8*A100) | 50 | model - log |

The FLOPs of Fast ConvMAE are about four times those of MAE. MAE's mask keeps only 25% of the tokens in each forward pass, whereas Fast ConvMAE uses a complementary masking strategy: the tokens are split into four disjoint parts of 25% each, and all four parts are processed in the same forward pass. This is equivalent to learning from four samples per forward pass, i.e. roughly four times the training signal per pass.
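A minimal sketch of the complementary masking idea, in plain Python. It assumes a 224×224 input with 16×16 patches (196 tokens) and works only at the level of token indices. The function names `complementary_masks` and `mae_random_mask` are hypothetical and are not this repository's API; the actual Fast ConvMAE implementation may differ in detail.

```python
import random
from typing import List, Optional


def complementary_masks(num_tokens: int, num_parts: int = 4,
                        seed: Optional[int] = None) -> List[List[int]]:
    """Split token indices into `num_parts` disjoint, equal-size visible sets.

    Each list holds the tokens one view keeps visible; together the views
    cover every token exactly once (complementary masking).
    """
    if num_tokens % num_parts != 0:
        raise ValueError("num_tokens must be divisible by num_parts")
    rng = random.Random(seed)
    order = list(range(num_tokens))
    rng.shuffle(order)                  # one random permutation shared by all views
    size = num_tokens // num_parts      # 25% of the tokens when num_parts == 4
    return [sorted(order[k * size:(k + 1) * size]) for k in range(num_parts)]


def mae_random_mask(num_tokens: int, keep_ratio: float = 0.25,
                    seed: Optional[int] = None) -> List[int]:
    """MAE-style baseline: a single random visible subset per forward pass."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(num_tokens), int(num_tokens * keep_ratio)))


if __name__ == "__main__":
    num_tokens = (224 // 16) ** 2       # 196 patch tokens at 224x224 with 16x16 patches (assumed)
    views = complementary_masks(num_tokens, num_parts=4, seed=0)
    print([len(v) for v in views])      # [49, 49, 49, 49]
    covered = sorted(i for v in views for i in v)
    print(covered == list(range(num_tokens)))  # True
    print(len(mae_random_mask(num_tokens, seed=0)))  # 49
```

With `num_parts=4`, each view keeps 49 of the 196 tokens and the four views together cover every token exactly once, whereas a single MAE-style mask keeps 49 tokens and leaves the other 147 masked in that pass.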

DINO

Pretrained on the ImageNet dataset.

| Config | Backbone | Params (backbone/total) | Train memory (GB) | Inference time (V100, ms/img) | Epochs | Download |
|---|---|---|---|---|---|---|
| dino_deit_small_p16_8xb32_100e | DeiT-S/16 | 21M/88M | 10.5 | 6.17 | 100 | model |

MoBY

Pretrained on the ImageNet dataset.

| Config | Backbone | Params (backbone/total) | FLOPs | Train memory (GB) | Inference time (V100, ms/img) | Epochs | Download |
|---|---|---|---|---|---|---|---|
| moby_deit_small_p16_4xb128_300e | DeiT-S/16 | 21M/26M | 18.6G | 21.4 | 6.17 | 300 | model - log |
| moby_swin_tiny_8xb64_300e | Swin-T | 27M/33M | 18.1G | 16.1 | 9.74 | 300 | model - log |

MoCo V2

Pretrained on the ImageNet dataset.

| Config | Backbone | Params (backbone/total) | FLOPs | Train memory (GB) | Inference time (V100, ms/img) | Epochs | Download |
|---|---|---|---|---|---|---|---|
| mocov2_resnet50_8xb32_200e | ResNet50 | 23M/28M | 8.2G | 5.4 | 8.59 | 200 | model |

SwAV

Pretrained on the ImageNet dataset.

| Config | Backbone | Params (backbone/total) | FLOPs | Train memory (GB) | Inference time (V100, ms/img) | Epochs | Download |
|---|---|---|---|---|---|---|---|
| swav_resnet50_8xb32_200e | ResNet50 | 23M/28M | 12.9G | 11.3 | 8.59 | 200 | model - log |

Benchmarks

For detailed usage of the benchmark tools, please refer to the benchmark README.md.

COCO2017 Object Detection

| Algorithm | Eval config | Pretrained config | mAP (Box) | mAP (Mask) | Download |
|---|---|---|---|---|---|
| Fast ConvMAE | mask_rcnn_conv_vitdet_50e_coco | fast_convmae_vit_base_patch16_8xb64_50e | 51.3 | 45.6 | eval model |
| SwAV | mask_rcnn_r50_fpn_1x_coco | swav_resnet50_8xb32_200e | 40.38 | 36.48 | eval model - log |
| MoCo-v2 | mask_rcnn_r50_fpn_1x_coco | mocov2_resnet50_8xb32_200e | 39.9 | 35.8 | eval model - log |
| MoBY | mask_rcnn_swin_tiny_1x_coco | moby_swin_tiny_8xb64_300e | 43.11 | 39.37 | eval model - log |

VOC2012 Aug Semantic Segmentation

| Algorithm | Eval config | Pretrained config | mIoU | Download |
|---|---|---|---|---|
| SwAV | fcn_r50-d8_512x512_60e_voc12aug | swav_resnet50_8xb32_200e | 63.91 | eval model - log |
| MoCo-v2 | fcn_r50-d8_512x512_60e_voc12aug | mocov2_resnet50_8xb32_200e | 68.49 | eval model - log |