Prepare Datasets¶

EasyCV supports a variety of datasets for multiple tasks. Follow the guide below to prepare the data, and keep the directory structure exactly as shown.

Image Classification
Object Detection
Self-Supervised Learning
Pose (Keypoint)
Image Segmentation
Object Detection 3D

Image Classification¶

Cifar10
Cifar100
Imagenet-1k
Imagenet-1k-TFrecords

Cifar10¶

CIFAR-10 is a labeled subset of the 80 Million Tiny Images dataset. It was collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton.

It consists of 60000 32x32 colour images in 10 classes, with 6000 images per class.

There are 50000 training images and 10000 test images.

Here is the list of classes in the CIFAR-10: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck.

For more detailed information, please refer to CIFAR.

Download¶

Download cifar-10-python.tar.gz (163MB) and uncompress the files to data/cifar10.

Directory structure is as follows:

data/cifar10
└── cifar-10-batches-py
    ├── batches.meta
    ├── data_batch_1
    ├── data_batch_2
    ├── data_batch_3
    ├── data_batch_4
    ├── data_batch_5
    ├── readme.html
    ├── read.py
    └── test_batch
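
Once extracted, each batch file can be read with Python's pickle module. The sketch below is illustrative rather than EasyCV's own loader, and `load_cifar_batch` is a hypothetical helper name. It assumes the layout described on the CIFAR page: each batch is a pickled dictionary with `data` and `labels` entries, whose keys come back as bytes under Python 3. The real batch files store `data` as a NumPy array, so NumPy must be installed to unpickle them.

```python
import pickle
from pathlib import Path


def load_cifar_batch(path):
    """Unpickle one CIFAR-10 batch file and return (data, labels).

    Hypothetical helper for illustration; not part of EasyCV.
    """
    with open(path, "rb") as f:
        # The files were pickled with Python 2, so decode keys as bytes.
        batch = pickle.load(f, encoding="bytes")
    return batch[b"data"], batch[b"labels"]


if __name__ == "__main__":
    batch_file = Path("data/cifar10/cifar-10-batches-py/data_batch_1")
    if batch_file.exists():
        data, labels = load_cifar_batch(batch_file)
        # Each row of `data` is one flattened 32x32 RGB image (3072 values).
        print(len(data), "images,", len(labels), "labels")
```

Only unpickle files that come from a source you trust, since pickle can execute arbitrary code while loading.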

Cifar100¶

CIFAR-100 is a labeled subset of the 80 Million Tiny Images dataset. It was collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton.

This dataset is just like CIFAR-10, except that it has 100 classes containing 600 images each.

There are 500 training images and 100 testing images per class.

The 100 classes in the CIFAR-100 are grouped into 20 superclasses. Each image comes with a “fine” label (the class to which it belongs) and a “coarse” label (the superclass to which it belongs).

For more detailed information, please refer to CIFAR.

Download¶

Download cifar-100-python.tar.gz (161MB) and uncompress the files to data/cifar100.

Directory structure should be as follows:

data/cifar100
└── cifar-100-python
    ├── file.txt~
    ├── meta
    ├── test
    ├── train
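
The uncompress step can also be scripted with the standard library. This is a minimal sketch, assuming the archive has already been downloaded into the working directory; `extract_archive` is a hypothetical helper name, not an EasyCV API. It returns the extracted file paths so they can be compared with the tree above.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a tar archive into dest_dir and list the extracted files.

    Hypothetical helper for illustration; not part of EasyCV.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # tarfile.open detects gzip compression automatically.
    with tarfile.open(archive_path) as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions can reject unsafe archive members.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return sorted(
        p.relative_to(dest).as_posix() for p in dest.rglob("*") if p.is_file()
    )


if __name__ == "__main__":
    archive = Path("cifar-100-python.tar.gz")
    if archive.exists():
        for name in extract_archive(archive, "data/cifar100"):
            print(name)
```

The same helper applies to cifar-10-python.tar.gz with data/cifar10 as the destination.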

Imagenet-1k¶

ImageNet is an image database organized according to the WordNet hierarchy (currently only the nouns).

It is used in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) and is a standard benchmark for image classification.

For more detailed information, please refer to ImageNet.

Download¶

ILSVRC2012 is the most widely used version. Download it as follows:

1. Go to the download-url, register an account, and log in.
2. ILSVRC2012 is recommended. Download the following files:
   - Training images (Task 1 & 2). 138GB.
   - Validation images (all tasks). 6.3GB.
3. Unzip the downloaded files.
4. Use this script to generate the data meta files.

Directory structure should be as follows:

data/imagenet
└── train
    └── n01440764
    └── n01443537
    └── ...
└── val
    └── n01440764
    └── n01443537
    └── ...
└── meta
    ├── train.txt
    ├── val.txt
    ├── ...
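
After unzipping, a quick sanity check is to confirm that train and val contain the same class folders. The sketch below only inspects the directory layout shown above; `class_dirs` and `compare_splits` are hypothetical helper names, and the check does not replace the meta-generation script.

```python
from pathlib import Path


def class_dirs(split_dir):
    """Return the sorted names of the class sub-directories in one split."""
    return sorted(p.name for p in Path(split_dir).iterdir() if p.is_dir())


def compare_splits(root):
    """Return (classes only in train, classes only in val).

    Hypothetical helper for illustration; not part of EasyCV.
    """
    root = Path(root)
    train = set(class_dirs(root / "train"))
    val = set(class_dirs(root / "val"))
    return sorted(train - val), sorted(val - train)


if __name__ == "__main__":
    root = Path("data/imagenet")
    if (root / "train").is_dir() and (root / "val").is_dir():
        only_train, only_val = compare_splits(root)
        print("classes in train:", len(class_dirs(root / "train")))
        print("missing from val:", only_train)
        print("missing from train:", only_val)
```

For ImageNet-1k, each split should contain 1000 class folders and both difference lists should be empty.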

Imagenet-1k-TFrecords¶

The original ImageNet raw images, packed in TFRecord format.

For more detailed information about Imagenet dataset, please refer to ImageNet.

Download¶

Go to the download-url, register an account, and log in.
The dataset is divided into two parts, part0 (79GB) and part1 (75GB). You need to download both.

Put each image file and its idx file in the same folder. The directory structure should be as follows:

data/imagenet
└── train
    ├── train-00000-of-01024
    ├── train-00000-of-01024.idx
    ├── train-00001-of-01024
    ├── train-00001-of-01024.idx
    ├── ...
└── validation
    ├── validation-00000-of-00128
    ├── validation-00000-of-00128.idx
    ├── validation-00001-of-00128
    ├── validation-00001-of-00128.idx
    ├── ...
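
Because every shard needs its idx file next to it, it can help to check for missing pairs before training. A minimal sketch, assuming the file naming shown in the tree above; `shards_missing_index` is a hypothetical helper name, not an EasyCV API.

```python
from pathlib import Path


def shards_missing_index(split_dir):
    """Return the names of shard files that have no matching .idx file.

    Hypothetical helper for illustration; not part of EasyCV.
    """
    split = Path(split_dir)
    # Shard files have no extension; index files end in ".idx".
    shards = [p for p in split.iterdir() if p.is_file() and p.suffix != ".idx"]
    return sorted(
        p.name for p in shards if not p.with_name(p.name + ".idx").exists()
    )


if __name__ == "__main__":
    for split_name in ("train", "validation"):
        split_dir = Path("data/imagenet") / split_name
        if split_dir.is_dir():
            print(split_name, "shards without idx:", shards_missing_index(split_dir))
```

An empty list for both splits means every shard has its index file.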

Object Detection¶

PAI-iTAG detection
COCO2017
VOC2007
VOC2012

PAI-iTAG detection¶

PAI-iTAG is a platform for intelligent data annotation. It supports annotating various data types, such as images, text, video, and audio, as well as multi-modal mixed annotation.

Please refer to the iTAG intelligent annotation documentation (智能标注iTAG) for the file format and data annotation.

Download¶

Download the SmallCOCO dataset to data/demo_itag_coco. The directory structure should be as follows:

data/demo_itag_coco/
├── train2017
├── train2017_20_local.manifest
├── val2017
└── val2017_20_local.manifest

COCO2017¶

The COCO dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset. The dataset consists of 328K images.

The COCO dataset has been released in several editions, of which COCO2017 is widely used. In the 2017 edition, the training/validation split is 118K/5K, and the test set is a 41K-image subset of the 2015 test set.

For more detailed information, please refer to COCO.

Download¶

Download train2017.zip (18G), val2017.zip (1G), and annotations_trainval2017.zip (241MB), and uncompress the files to data/coco2017.

Directory structure is as follows:

data/coco2017
└── annotations
    ├── instances_train2017.json
    ├── instances_val2017.json
└── train2017
    ├── 000000000009.jpg
    ├── 000000000025.jpg
    ├── ...
└── val2017
    ├── 000000000139.jpg
    ├── 000000000285.jpg
    ├── ...
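
To confirm that an annotation file is intact, its top-level sections can be counted. This is a sketch under the assumption that the file follows the standard COCO instances format, with `images`, `annotations`, and `categories` lists; `summarize_coco` is a hypothetical helper name, not an EasyCV API.

```python
import json
from pathlib import Path


def summarize_coco(annotation_file):
    """Count the entries in the main sections of a COCO instances file.

    Hypothetical helper for illustration; not part of EasyCV.
    """
    with open(annotation_file, "r", encoding="utf-8") as f:
        coco = json.load(f)
    return {
        "images": len(coco["images"]),
        "annotations": len(coco["annotations"]),
        "categories": len(coco["categories"]),
    }


if __name__ == "__main__":
    ann = Path("data/coco2017/annotations/instances_val2017.json")
    if ann.exists():
        print(summarize_coco(ann))
```

The training annotation file is much larger than the validation one, so loading it this way takes noticeably more time and memory.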

VOC2007¶

PASCAL VOC 2007 is a dataset for image recognition. The twenty object classes that have been selected are:

Person: person
Animal: bird, cat, cow, dog, horse, sheep
Vehicle: aeroplane, bicycle, boat, bus, car, motorbike, train
Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor

Each image in this dataset has pixel-level segmentation annotations, bounding box annotations, and object class annotations.

For more detailed information, please refer to voc2007.

Download¶

Download VOCtrainval_06-Nov-2007.tar (439MB) and uncompress the files to data/VOCdevkit.

Directory structure is as follows:

data/VOCdevkit
└── VOC2007
    └── Annotations
        ├── 000005.xml
        ├── 001010.xml
        ├── ...
    └── JPEGImages
        ├── 000005.jpg
        ├── 001010.jpg
        ├── ...
    └── SegmentationClass
        ├── 000005.png
        ├── 001010.png
        ├── ...
    └── SegmentationObject
        ├── 000005.png
        ├── 001010.png
        ├── ...
    └── ImageSets
        └── Layout
            ├── train.txt
            ├── trainval.txt
            ├── val.txt
        └── Main
            ├── train.txt
            ├── val.txt
            ├── ...
        └── Segmentation
            ├── train.txt
            ├── trainval.txt
            ├── val.txt
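
Each file under Annotations is an XML document describing the objects in one image. The sketch below reads class names and bounding boxes with the standard library. It assumes the usual PASCAL VOC schema, in which every `object` element has a `name` and a `bndbox` with `xmin`, `ymin`, `xmax`, and `ymax`; `read_voc_boxes` is a hypothetical helper name, not an EasyCV API.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def read_voc_boxes(xml_path):
    """Return a list of (class_name, xmin, ymin, xmax, ymax) tuples.

    Hypothetical helper for illustration; not part of EasyCV.
    """
    root = ET.parse(xml_path).getroot()
    boxes = []
    # Only top-level objects are read; nested <part> boxes are ignored.
    for obj in root.findall("object"):
        name = obj.findtext("name")
        bndbox = obj.find("bndbox")
        coords = tuple(
            int(float(bndbox.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((name,) + coords)
    return boxes


if __name__ == "__main__":
    xml_file = Path("data/VOCdevkit/VOC2007/Annotations/000005.xml")
    if xml_file.exists():
        for box in read_voc_boxes(xml_file):
            print(box)
```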

VOC2012¶

The PASCAL VOC 2012 dataset contains 20 object categories including:

Person: person
Animal: bird, cat, cow, dog, horse, sheep
Vehicle: aeroplane, bicycle, boat, bus, car, motorbike, train
Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor

Each image in this dataset has pixel-level segmentation annotations, bounding box annotations, and object class annotations.

For more detailed information, please refer to voc2012.

Download¶

Download VOCtrainval_11-May-2012.tar (2G) and uncompress the files to data/VOCdevkit.

Directory structure is as follows:

data/VOCdevkit
└── VOC2012
    └── Annotations
        ├── 000005.xml
        ├── 001010.xml
        ├── ...
    └── JPEGImages
        ├── 000005.jpg
        ├── 001010.jpg
        ├── ...
    └── SegmentationClass
        ├── 000005.png
        ├── 001010.png
        ├── ...
    └── SegmentationObject
        ├── 000005.png
        ├── 001010.png
        ├── ...
    └── ImageSets
        └── Layout
            ├── train.txt
            ├── trainval.txt
            ├── val.txt
        └── Main
            ├── train.txt
            ├── val.txt
            ├── ...
        └── Segmentation
            ├── train.txt
            ├── trainval.txt
            ├── val.txt
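
The split files under ImageSets list image ids, one per line, and the ids map onto file names in the other folders. A minimal sketch of that mapping for the segmentation split, assuming the layout in the tree above; `voc_segmentation_pairs` is a hypothetical helper name, not an EasyCV API.

```python
from pathlib import Path


def voc_segmentation_pairs(voc_root, split="train"):
    """Return (image_path, mask_path) pairs for one segmentation split.

    Hypothetical helper for illustration; not part of EasyCV.
    """
    root = Path(voc_root)
    split_file = root / "ImageSets" / "Segmentation" / (split + ".txt")
    # Each non-empty line of the split file is one image id.
    ids = [line.strip() for line in split_file.read_text().splitlines() if line.strip()]
    return [
        (root / "JPEGImages" / (i + ".jpg"), root / "SegmentationClass" / (i + ".png"))
        for i in ids
    ]


if __name__ == "__main__":
    voc_root = Path("data/VOCdevkit/VOC2012")
    if voc_root.is_dir():
        pairs = voc_segmentation_pairs(voc_root, split="val")
        print(len(pairs), "image/mask pairs in the val split")
```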

Self-Supervised Learning¶

Imagenet-1k
Imagenet-1k-TFrecords

Imagenet-1k¶

Refer to Image Classification: Imagenet-1k.

Imagenet-1k-TFrecords¶

Refer to Image Classification: Imagenet-1k-TFrecords.

Pose¶

COCO2017

COCO2017¶

The COCO dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset. The dataset consists of 328K images.

The COCO dataset has been released in several editions, of which COCO2017 is widely used. In the 2017 edition, the training/validation split is 118K/5K, and the test set is a 41K-image subset of the 2015 test set.

For more detailed information, please refer to COCO.

Download¶

Download it as follows:

Download data: train2017.zip (18G), val2017.zip (1G)
Download annotations: annotations_trainval2017.zip (241MB)
Download person detection results: HRNet-Human-Pose-Estimation provides person detection results for COCO val2017, which are needed to reproduce its multi-person pose estimation results. Download them from OneDrive or GoogleDrive (26.2MB).

Then uncompress the files to data/coco2017. The directory structure is as follows:

data/coco2017
└── annotations
    ├── person_keypoints_train2017.json
    ├── person_keypoints_val2017.json
└── person_detection_results
    ├── COCO_val2017_detections_AP_H_56_person.json
    ├── COCO_test-dev2017_detections_AP_H_609_person.json
└── train2017
    ├── 000000000009.jpg
    ├── 000000000025.jpg
    ├── ...
└── val2017
    ├── 000000000139.jpg
    ├── 000000000285.jpg
    ├── ...
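
In the person_keypoints files, each annotation stores its keypoints as a flat list of (x, y, v) triplets, where v is 0 for an unlabeled point, 1 for a labeled but occluded point, and 2 for a labeled and visible point. The sketch below unpacks that list. It assumes the standard COCO keypoint format, and `labeled_keypoints` is a hypothetical helper name, not an EasyCV API.

```python
def labeled_keypoints(annotation):
    """Return the (x, y) coordinates of all labeled keypoints.

    Hypothetical helper for illustration; not part of EasyCV.
    """
    flat = annotation["keypoints"]
    # Group the flat list into (x, y, v) triplets.
    triplets = [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]
    # v == 0 means the keypoint was not labeled, so skip it.
    return [(x, y) for x, y, v in triplets if v > 0]


if __name__ == "__main__":
    # A shortened, made-up annotation with three keypoints.
    example = {"keypoints": [10, 20, 2, 0, 0, 0, 5, 6, 1]}
    print(labeled_keypoints(example))
```

A full COCO person annotation holds 17 keypoints, so its `keypoints` list has 51 values.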

Image Segmentation¶

COCO Stuff 164k

COCO Stuff 164k¶

For COCO Stuff 164k dataset, please run the following commands to download and convert the augmented dataset.

# download
mkdir coco_stuff164k && cd coco_stuff164k
wget http://images.cocodataset.org/zips/train2017.zip
wget http://images.cocodataset.org/zips/val2017.zip
wget http://calvin.inf.ed.ac.uk/wp-content/uploads/data/cocostuffdataset/stuffthingmaps_trainval2017.zip

# unzip
unzip train2017.zip -d images/
unzip val2017.zip -d images/
unzip stuffthingmaps_trainval2017.zip -d annotations/

# --nproc 8 runs the conversion with 8 processes; the option can be omitted.
python tools/prepare_data/coco_stuff164k.py /path/to/coco_stuff164k --nproc 8

By convention, mask labels in /path/to/coco_stuff164k/annotations/*2017/*_labelTrainIds.png are used for COCO Stuff 164k training and testing.
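
The naming convention above makes it straightforward to pair each mask with its image. This is a minimal sketch, assuming the images were unzipped to images/train2017 and images/val2017 as in the commands above and that the image files use the .jpg extension; `stuff_pairs` is a hypothetical helper name, not an EasyCV API.

```python
from pathlib import Path

MASK_SUFFIX = "_labelTrainIds.png"


def stuff_pairs(root, split="train2017"):
    """Return (image_path, mask_path) pairs for one COCO Stuff 164k split.

    Hypothetical helper for illustration; not part of EasyCV.
    """
    root = Path(root)
    pairs = []
    for mask in sorted((root / "annotations" / split).glob("*" + MASK_SUFFIX)):
        # Strip the mask suffix to recover the image id.
        image_id = mask.name[: -len(MASK_SUFFIX)]
        pairs.append((root / "images" / split / (image_id + ".jpg"), mask))
    return pairs


if __name__ == "__main__":
    root = Path("/path/to/coco_stuff164k")
    if root.is_dir():
        pairs = stuff_pairs(root, split="val2017")
        missing = [str(img) for img, _ in pairs if not img.exists()]
        print(len(pairs), "masks,", len(missing), "without a matching image")
```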

Details of this dataset can be found here.

Object Detection 3D¶

NuScenes

NuScenes¶

Download the nuScenes V1.0 full dataset and the CAN bus expansion data HERE. Then prepare the nuScenes data by running:

python tools/prepare_data/prepare_nuscenes.py \
--root_path=./data/nuscenes \
--canbus_root_path=./data/canbus \
--out_dir=./data/nuscenes \
--version=v1.0

This generates the nuscenes_infos_temporal_{train,val}.pkl files.

The data structure is as follows:

data/nuscenes
 ├── can_bus
 ├── nuscenes-v1.0
 │   ├── maps
 │   ├── samples
 │   ├── sweeps
 │   ├── v1.0-test
 │   ├── v1.0-trainval
 │   ├── nuscenes_infos_temporal_train.pkl
 │   ├── nuscenes_infos_temporal_val.pkl