Detection Model Zoo

Inference uses a V100 16G by default.

YOLOX-PAI

Pretrained on the COCO2017 dataset. (The results have been optimized with PAI-Blade, and the reported speed covers only model inference time. For end-to-end inference time, refer to export.md.)

| Algorithm | Config | Params | Speed (V100, fp16, b32) | mAP val 0.5:0.95 | AP val 50 | Download |
| --- | --- | --- | --- | --- | --- | --- |
| YOLOX-s | yolox_s_8xb16_300e_coco | 9M | 0.68ms | 40.0 | 58.9 | model - log |
| PAI-YOLOXs | yoloxs_pai_8xb16_300e_coco | 16M | 0.71ms | 41.4 | 60.0 | model - log |
| PAI-YOLOXs-ASFF | yoloxs_pai_asff_8xb16_300e_coco | 21M | 0.87ms | 42.8 | 61.8 | model - log |
| PAI-YOLOXs-ASFF-TOOD3 | yoloxs_pai_asff_tood3_8xb16_300e_coco | 24M | 1.15ms | 43.9 | 62.1 | model - log |
| YOLOX-m | yolox_m_8xb16_300e_coco | 25M | 1.52ms | 46.3 | 64.9 | model - log |
| YOLOX-l | yolox_l_8xb8_300e_coco | 54M | 2.47ms | 48.9 | 67.5 | model - log |
| YOLOX-x | yolox_x_8xb8_300e_coco | 99M | 4.74ms | 50.9 | 69.2 | model - log |
| YOLOX-tiny | yolox_tiny_8xb16_300e_coco | 5M | 0.28ms | 31.5 | 49.2 | model - log |
| YOLOX-nano | yolox_nano_8xb16_300e_coco | 2.2M | 0.19ms | 26.5 | 42.6 | model - log |
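
The speed and accuracy columns above describe a trade-off, and a common use of such a table is to pick the most accurate model that fits a latency budget. The following is a minimal sketch of that selection, not part of the toolkit itself: `ZooEntry`, `YOLOX_PAI_ZOO`, and `best_under_budget` are hypothetical names introduced for illustration, and the numbers are copied from the YOLOX-PAI table (V100, fp16, b32, model inference time only).

```python
from typing import List, NamedTuple, Optional


class ZooEntry(NamedTuple):
    """One row of the YOLOX-PAI table (hypothetical helper type)."""
    name: str
    latency_ms: float  # "Speed" column: V100, fp16, b32, model-only time
    map_val: float     # "mAP val 0.5:0.95" column


# Values copied from the YOLOX-PAI table; nothing here is measured anew.
YOLOX_PAI_ZOO: List[ZooEntry] = [
    ZooEntry("YOLOX-s", 0.68, 40.0),
    ZooEntry("PAI-YOLOXs", 0.71, 41.4),
    ZooEntry("PAI-YOLOXs-ASFF", 0.87, 42.8),
    ZooEntry("PAI-YOLOXs-ASFF-TOOD3", 1.15, 43.9),
    ZooEntry("YOLOX-m", 1.52, 46.3),
    ZooEntry("YOLOX-l", 2.47, 48.9),
    ZooEntry("YOLOX-x", 4.74, 50.9),
    ZooEntry("YOLOX-tiny", 0.28, 31.5),
    ZooEntry("YOLOX-nano", 0.19, 26.5),
]


def best_under_budget(entries: List[ZooEntry],
                      max_latency_ms: float) -> Optional[ZooEntry]:
    """Return the highest-mAP entry within the latency budget, or None."""
    candidates = [e for e in entries if e.latency_ms <= max_latency_ms]
    return max(candidates, key=lambda e: e.map_val, default=None)


if __name__ == "__main__":
    choice = best_under_budget(YOLOX_PAI_ZOO, max_latency_ms=1.0)
    print(choice.name if choice else "no model fits the budget")
```

Because the listed speeds exclude pre- and post-processing, a budget applied this way is only a first filter; end-to-end latency should be checked separately, as the note above points out.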

ViTDet

| Algorithm | Config | Params (backbone/total) | Train memory (GB) | Inference time (V100, ms/img) | bbox_mAP val 0.5:0.95 | mask_mAP val 0.5:0.95 | Download |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ViTDet_MaskRCNN | vitdet_maskrcnn | 86M/111M | 13.3 (fp16) | 138ms | 50.65 | 45.41 | model - log |

FCOS

| Algorithm | Config | Params (backbone/total) | Train memory (GB) | Inference time (V100, ms/img) | mAP val 0.5:0.95 | AP val 50 | Download |
| --- | --- | --- | --- | --- | --- | --- | --- |
| FCOS-r50(caffe) | fcos-r50 | 23M/32M | 5.0 | 85.8ms | 38.58 | 57.18 | model - log |
| FCOS-r50(torch) | fcos-r50 | 23M/32M | 4.0 (fp16) | 105.3ms | 38.88 | 58.01 | model - log |

DETR

| Algorithm | Config | Params (backbone/total) | Train memory (GB) | Inference time (V100, ms/img) | bbox_mAP val 0.5:0.95 | AP val 50 | Download |
| --- | --- | --- | --- | --- | --- | --- | --- |
| DETR-r50 | detr-r50 | 23M/41M | 8.5 | 48.5ms | 39.92 | 60.52 | model - log |
| DAB-DETR-r50 | dab-detr-r50 | 23M/43M | 2.6 | 58.5ms | 42.52 | 63.03 | model - log |
| DN-DETR-r50 | dab-detr-r50 | 23M/43M | 7.8 | 58.5ms | 44.39 | 64.66 | model - log |

DINO

| Algorithm | Config | Params (backbone/total) | Inference time (ms/img) | bbox_mAP val 0.5:0.95 | AP val 50 | Download | Comment |
| --- | --- | --- | --- | --- | --- | --- | --- |
| DINO_4sc_r50_12e | DINO_4sc_r50_12e | 23M/47M | 184ms | 48.71 | 66.27 | model - log | Inference uses V100 32G |
| DINO_4sc_r50_36e | DINO_4sc_r50_36e | 23M/47M | 184ms | 50.69 | 68.60 | model - log | Inference uses V100 32G |
| DINO_4sc_swinl_12e | DINO_4sc_swinl_12e | 195M/217M | 155ms | 56.86 | 75.61 | model - log | Inference uses V100 32G |
| DINO_4sc_swinl_36e | DINO_4sc_swinl_36e | 195M/217M | 155ms | 58.04 | 76.76 | model - log | Inference uses V100 32G |
| DINO_5sc_swinl_36e | DINO_5sc_swinl_36e | 195M/217M | 235ms | 58.47 | 77.10 | model - log | Inference uses V100 32G |
| DINO++_5sc_swinl_18e | DINO++_5sc_swinl_18e | 195M/218M | 325ms | 63.39 | 80.25 | model - log | Inference uses A100 80G |