easycv.datasets.detection.pipelines package

class easycv.datasets.detection.pipelines.MMToTensor[source]

Bases: object

Transform image to Tensor. Required key: ‘img’. Modifies key: ‘img’.

Parameters

results (dict) – contains all information about training.

class easycv.datasets.detection.pipelines.NormalizeTensor(mean, std)[source]

Bases: object

Normalize the Tensor image (CxHxW) with mean and std. Required key: ‘img’. Modifies key: ‘img’.

Parameters
  • mean (list[float]) – Mean values of 3 channels.

  • std (list[float]) – Std values of 3 channels.

__init__(mean, std)[source]

Initialize self. See help(type(self)) for accurate signature.

class easycv.datasets.detection.pipelines.MMMosaic(img_scale=(640, 640), center_ratio_range=(0.5, 1.5), pad_val=114)[source]

Bases: object

Mosaic augmentation. Given 4 images, the mosaic transform combines them into one output image composed of parts from each sub-image.

.. code:: text

                      mosaic transform
                         center_x
              +------------------------------+
              |       pad        |  pad      |
              |      +-----------+           |
              |      |           |           |
              |      |  image1   |--------+  |
              |      |           |        |  |
              |      |           | image2 |  |
   center_y   |----+-------------+-----------|
              |    |   cropped   |           |
              |pad |   image3    |  image4   |
              |    |             |           |
              +----|-------------+-----------+
                   |             |
                   +-------------+

The mosaic transform steps are as follows:
  1. Choose the mosaic center as the intersection point of the 4 images.

  2. Get the top-left image according to the index, and randomly sample another 3 images from the custom dataset.

  3. Each sub-image is cropped if it is larger than its mosaic patch.

Parameters
  • img_scale (Sequence[int]) – Image size after the mosaic pipeline of a single image. Defaults to (640, 640).

  • center_ratio_range (Sequence[float]) – Center ratio range of mosaic output. Defaults to (0.5, 1.5).

  • pad_val (int) – Pad value. Defaults to 114.
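
To make the usage concrete, here is a minimal sketch of how MMMosaic is typically wired into a pipeline config; the train_pipeline list and the dataset-wrapper note are illustrative assumptions, not fixed API.

.. code-block:: python

# Hypothetical pipeline snippet: the transform is referenced by its
# registered type name. Mosaic needs a dataset wrapper (such as
# DetImagesMixDataset, see get_indexes below) that supplies the extra
# sample indexes.
img_scale = (640, 640)

train_pipeline = [
    dict(type='MMMosaic', img_scale=img_scale, pad_val=114),
    # ... resize/flip/normalize transforms would follow here.
]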

__init__(img_scale=(640, 640), center_ratio_range=(0.5, 1.5), pad_val=114)[source]

Initialize self. See help(type(self)) for accurate signature.

get_indexes(dataset)[source]

Call function to collect indexes.

Parameters

dataset (DetImagesMixDataset) – The dataset.

Returns

indexes.

Return type

list

class easycv.datasets.detection.pipelines.MMMixUp(img_scale=(640, 640), ratio_range=(0.5, 1.5), flip_ratio=0.5, pad_val=114, max_iters=15, min_bbox_size=5, min_area_ratio=0.2, max_aspect_ratio=20)[source]

Bases: object

MixUp data augmentation.

.. code:: text

                  mixup transform
         +------------------------------+
         | mixup image   |              |
         |      +--------|--------+     |
         |      |        |        |     |
         |---------------+        |     |
         |      |                 |     |
         |      |      image      |     |
         |      |                 |     |
         |      |                 |     |
         |      |-----------------+     |
         |             pad              |
         +------------------------------+

The mixup transform steps are as follows:
  1. Another random image is picked from the dataset and embedded in the top-left patch (after padding and resizing).

  2. The target of the mixup transform is the weighted average of the mixup image and the original image.

Parameters
  • img_scale (Sequence[int]) – Image output size after mixup pipeline. Default: (640, 640).

  • ratio_range (Sequence[float]) – Scale ratio of mixup image. Default: (0.5, 1.5).

  • flip_ratio (float) – Horizontal flip ratio of mixup image. Default: 0.5.

  • pad_val (int) – Pad value. Default: 114.

  • max_iters (int) – The maximum number of iterations. If the number of iterations is greater than max_iters, but gt_bbox is still empty, then the iteration is terminated. Default: 15.

  • min_bbox_size (float) – Width and height threshold to filter bboxes. If the height or width of a box is smaller than this value, it will be removed. Default: 5.

  • min_area_ratio (float) – Threshold of area ratio between original bboxes and wrapped bboxes. If smaller than this value, the box will be removed. Default: 0.2.

  • max_aspect_ratio (float) – Aspect ratio of width and height threshold to filter bboxes. If max(h/w, w/h) larger than this value, the box will be removed. Default: 20.
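
As a rough illustration of step 2 above, a numpy sketch of the weighted average; the equal 0.5/0.5 weighting is an assumption for illustration, not necessarily the exact weights used by MMMixUp.

.. code-block:: python

import numpy as np

def blend(origin_img, mixup_img):
    # Weighted average of the (already padded/resized) mixup image and
    # the original image; 0.5/0.5 weights are illustrative.
    return 0.5 * origin_img.astype(np.float32) + \
           0.5 * mixup_img.astype(np.float32)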

__init__(img_scale=(640, 640), ratio_range=(0.5, 1.5), flip_ratio=0.5, pad_val=114, max_iters=15, min_bbox_size=5, min_area_ratio=0.2, max_aspect_ratio=20)[source]

Initialize self. See help(type(self)) for accurate signature.

get_indexes(dataset)[source]

Call function to collect indexes. :param dataset: The dataset. :type dataset: DetImagesMixDataset

Returns

indexes.

Return type

list

class easycv.datasets.detection.pipelines.MMRandomAffine(max_rotate_degree=10.0, max_translate_ratio=0.1, scaling_ratio_range=(0.5, 1.5), max_shear_degree=2.0, border=(0, 0), border_val=(114, 114, 114), min_bbox_size=2, min_area_ratio=0.2, max_aspect_ratio=20)[source]

Bases: object

Random affine transform data augmentation (for YOLOX). This operation randomly generates an affine transform matrix that combines rotation, translation, shear and scaling transforms.

Parameters
  • max_rotate_degree (float) – Maximum degrees of rotation transform. Default: 10.
  • max_translate_ratio (float) – Maximum ratio of translation. Default: 0.1.

  • scaling_ratio_range (tuple[float]) – Min and max ratio of scaling transform. Default: (0.5, 1.5).

  • max_shear_degree (float) – Maximum degrees of shear transform. Default: 2.

  • border (tuple[int]) – Distance from height and width sides of input image to adjust output shape. Only used in mosaic dataset. Default: (0, 0).

  • border_val (tuple[int]) – Border padding values of 3 channels. Default: (114, 114, 114).

  • min_bbox_size (float) – Width and height threshold to filter bboxes. If the height or width of a box is smaller than this value, it will be removed. Default: 2.

  • min_area_ratio (float) – Threshold of area ratio between original bboxes and wrapped bboxes. If smaller than this value, the box will be removed. Default: 0.2.

  • max_aspect_ratio (float) – Aspect ratio of width and height threshold to filter bboxes. If max(h/w, w/h) is larger than this value, the box will be removed. Default: 20.
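
To make the parameter roles concrete, a small sketch of composing sampled rotation, scaling, shear and translation into a single 3x3 warp matrix; the exact sampling and composition order inside MMRandomAffine may differ.

.. code-block:: python

import math
import numpy as np

def build_affine_matrix(rotate_deg, scale, shear_x_deg, shear_y_deg,
                        trans_x, trans_y):
    # Rotation combined with isotropic scaling.
    rad = math.radians(rotate_deg)
    rotation = np.array([[math.cos(rad) * scale, -math.sin(rad) * scale, 0.],
                         [math.sin(rad) * scale, math.cos(rad) * scale, 0.],
                         [0., 0., 1.]])
    # Shear along x and y, expressed in degrees.
    shear = np.array([[1., math.tan(math.radians(shear_x_deg)), 0.],
                      [math.tan(math.radians(shear_y_deg)), 1., 0.],
                      [0., 0., 1.]])
    # Translation in pixels.
    translate = np.array([[1., 0., trans_x],
                          [0., 1., trans_y],
                          [0., 0., 1.]])
    return translate @ shear @ rotation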

__init__(max_rotate_degree=10.0, max_translate_ratio=0.1, scaling_ratio_range=(0.5, 1.5), max_shear_degree=2.0, border=(0, 0), border_val=(114, 114, 114), min_bbox_size=2, min_area_ratio=0.2, max_aspect_ratio=20)[source]

Initialize self. See help(type(self)) for accurate signature.

filter_gt_bboxes(origin_bboxes, wrapped_bboxes)[source]

class easycv.datasets.detection.pipelines.MMPhotoMetricDistortion(brightness_delta=32, contrast_range=(0.5, 1.5), saturation_range=(0.5, 1.5), hue_delta=18)[source]

Bases: object

Apply photometric distortion to the image sequentially; every transformation is applied with a probability of 0.5. Random contrast is applied either second or second to last:

  1. random brightness
  2. random contrast (mode 0)
  3. convert color from BGR to HSV
  4. random saturation
  5. random hue
  6. convert color from HSV to BGR
  7. random contrast (mode 1)
  8. randomly swap channels

Parameters
  • brightness_delta (int) – delta of brightness.

  • contrast_range (tuple) – range of contrast.

  • saturation_range (tuple) – range of saturation.

  • hue_delta (int) – delta of hue.
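
A minimal sketch of the coin-flip pattern each step follows, shown for the brightness step only; the function name and float-image assumption are illustrative.

.. code-block:: python

import numpy as np

rng = np.random.default_rng()

def maybe_adjust_brightness(img, brightness_delta=32):
    # With probability 0.5, add a uniform brightness offset; assumes a
    # float image with values in [0, 255].
    if rng.random() < 0.5:
        img = img + rng.uniform(-brightness_delta, brightness_delta)
    return np.clip(img, 0, 255)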

__init__(brightness_delta=32, contrast_range=(0.5, 1.5), saturation_range=(0.5, 1.5), hue_delta=18)[source]

Initialize self. See help(type(self)) for accurate signature.

class easycv.datasets.detection.pipelines.MMResize(img_scale=None, multiscale_mode='range', ratio_range=None, keep_ratio=True, bbox_clip_border=True, backend='cv2', override=False)[source]

Bases: object

Resize images & bbox & mask. This transform resizes the input image to some scale. Bboxes and masks are then resized with the same scale factor. If the input dict contains the key “scale”, the scale in the input dict is used; otherwise the scale specified in the init method is used. If the input dict contains the key “scale_factor” (i.e. MultiScaleFlipAug gives scale_factor instead of img_scale), the actual scale is computed from the image shape and scale_factor.

img_scale can either be a tuple (single-scale) or a list of tuples (multi-scale). There are 3 multiscale modes:

  • ratio_range is not None: randomly sample a ratio from the ratio range and multiply it with the image scale.

  • ratio_range is None and multiscale_mode == "range": randomly sample a scale from the multiscale range.

  • ratio_range is None and multiscale_mode == "value": randomly sample a scale from multiple scales.

Parameters
  • img_scale (tuple or list[tuple]) – Images scales for resizing.

  • multiscale_mode (str) – Either “range” or “value”.

  • ratio_range (tuple[float]) – (min_ratio, max_ratio).

  • keep_ratio (bool) – Whether to keep the aspect ratio when resizing the image.
  • bbox_clip_border (bool, optional) – Whether clip the objects outside the border of the image. Defaults to True.

  • backend (str) – Image resize backend; choices are ‘cv2’ and ‘pillow’. These two backends generate slightly different results. Defaults to ‘cv2’.

  • override (bool, optional) – Whether to override scale and scale_factor so as to call resize twice. If True, after the first resizing, the existing scale and scale_factor will be ignored so a second resizing can be performed. This option is a workaround for applying resize multiple times in DETR. Defaults to False.
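
The three multiscale modes map onto configs like the following sketch (the scales are example values, not recommendations):

.. code-block:: python

# ratio_range is not None: multiply the base scale by a sampled ratio.
resize_ratio = dict(type='MMResize', img_scale=(1333, 800),
                    ratio_range=(0.8, 1.2), keep_ratio=True)

# multiscale_mode='range': sample a scale between the two bounds.
resize_range = dict(type='MMResize', img_scale=[(1333, 640), (1333, 800)],
                    multiscale_mode='range', keep_ratio=True)

# multiscale_mode='value': pick one of the listed scales.
resize_value = dict(type='MMResize',
                    img_scale=[(1333, 672), (1333, 736), (1333, 800)],
                    multiscale_mode='value', keep_ratio=True)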

__init__(img_scale=None, multiscale_mode='range', ratio_range=None, keep_ratio=True, bbox_clip_border=True, backend='cv2', override=False)[source]

Initialize self. See help(type(self)) for accurate signature.

static random_select(img_scales)[source]

Randomly select an img_scale from given candidates.

Parameters

img_scales (list[tuple]) – Images scales for selection.

Returns

Returns a tuple (img_scale, scale_idx), where img_scale is the selected image scale and scale_idx is the selected index in the given candidates.

Return type

(tuple, int)

static random_sample(img_scales)[source]

Randomly sample an img_scale when multiscale_mode=='range'.

Parameters

img_scales (list[tuple]) – Images scale range for sampling. There must be two tuples in img_scales, which specify the lower and upper bound of image scales.

Returns

Returns a tuple (img_scale, None), where img_scale is the sampled scale and None is just a placeholder to be consistent with random_select().

Return type

(tuple, None)

static random_sample_ratio(img_scale, ratio_range)[source]

Randomly sample an img_scale when ratio_range is specified. A ratio will be randomly sampled from the range specified by ratio_range, then multiplied with img_scale to generate the sampled scale.

Parameters
  • img_scale (tuple) – Images scale base to multiply with ratio.

  • ratio_range (tuple[float]) – The minimum and maximum ratio to scale the img_scale.

Returns

Returns a tuple (scale, None), where scale is the sampled ratio multiplied with img_scale and None is just a placeholder to be consistent with random_select().

Return type

(tuple, None)
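
A minimal sketch of the sampling described above, keeping the (scale, None) return shape for consistency with random_select(); the function name is illustrative.

.. code-block:: python

import random

def sample_scale_by_ratio(img_scale, ratio_range):
    # Sample a ratio uniformly from ratio_range and scale img_scale by it.
    min_ratio, max_ratio = ratio_range
    ratio = random.uniform(min_ratio, max_ratio)
    scale = (int(img_scale[0] * ratio), int(img_scale[1] * ratio))
    return scale, None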

class easycv.datasets.detection.pipelines.MMRandomFlip(flip_ratio=None, direction='horizontal')[source]

Bases: object

Flip the image & bbox & mask. If the input dict contains the key “flip”, that flag will be used; otherwise flipping is randomly decided by a ratio specified in the init method. When random flip is enabled, flip_ratio/direction can either be a float/string or a tuple of float/string. There are 3 flip modes:

  • flip_ratio is float, direction is string: the image will be direction-flipped with probability flip_ratio. E.g., flip_ratio=0.5, direction='horizontal': the image will be horizontally flipped with probability 0.5.

  • flip_ratio is float, direction is list of string: the image will be direction[i]-flipped with probability flip_ratio/len(direction). E.g., flip_ratio=0.5, direction=['horizontal', 'vertical']: the image will be horizontally flipped with probability 0.25 and vertically flipped with probability 0.25.

  • flip_ratio is list of float, direction is list of string: given len(flip_ratio) == len(direction), the image will be direction[i]-flipped with probability flip_ratio[i]. E.g., flip_ratio=[0.3, 0.5], direction=['horizontal', 'vertical']: the image will be horizontally flipped with probability 0.3 and vertically flipped with probability 0.5.

Parameters
  • flip_ratio (float | list[float], optional) – The flipping probability. Default: None.

  • direction (str | list[str], optional) – The flipping direction. Options are ‘horizontal’, ‘vertical’, ‘diagonal’. Default: ‘horizontal’. If the input is a list, its length must equal that of flip_ratio; each element in flip_ratio indicates the flip probability of the corresponding direction.

__init__(flip_ratio=None, direction='horizontal')[source]

Initialize self. See help(type(self)) for accurate signature.

bbox_flip(bboxes, img_shape, direction)[source]

Flip bboxes horizontally.

Parameters
  • bboxes (numpy.ndarray) – Bounding boxes, shape (…, 4*k).

  • img_shape (tuple[int]) – Image shape (height, width).

  • direction (str) – Flip direction. Options are ‘horizontal’, ‘vertical’.

Returns

Flipped bounding boxes.

Return type

numpy.ndarray
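
For the horizontal case, the mirroring can be sketched as below; it assumes (x1, y1, x2, y2) boxes, with the strided indexing handling the (…, 4*k) layout.

.. code-block:: python

import numpy as np

def bbox_flip_horizontal(bboxes, img_shape):
    # Mirror x-coordinates around the image width; y-coordinates and the
    # vertical branch are untouched in this sketch.
    h, w = img_shape[:2]
    flipped = bboxes.copy()
    flipped[..., 0::4] = w - bboxes[..., 2::4]
    flipped[..., 2::4] = w - bboxes[..., 0::4]
    return flipped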

class easycv.datasets.detection.pipelines.MMPad(size=None, size_divisor=None, pad_to_square=False, pad_val={'img': 0, 'masks': 0, 'seg': 255})[source]

Bases: object

Pad the image & mask. There are two padding modes: (1) pad to a fixed size and (2) pad to the minimum size that is divisible by some number. Added keys are “pad_shape”, “pad_fixed_size” and “pad_size_divisor”.

Parameters
  • size (tuple, optional) – Fixed padding size.

  • size_divisor (int, optional) – The divisor of padded size.

  • pad_to_square (bool) – Whether to pad the image into a square. Currently only used for YOLOX. Default: False.

  • pad_val (dict, optional) – A dict for padding values; the default value is dict(img=0, masks=0, seg=255).
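
A minimal sketch of the size_divisor mode, assuming an HWC numpy image and a scalar pad value:

.. code-block:: python

import numpy as np

def pad_to_divisor(img, divisor=32, pad_val=0):
    # Pad bottom/right so height and width become multiples of divisor.
    h, w = img.shape[:2]
    pad_h = -h % divisor
    pad_w = -w % divisor
    return np.pad(img, ((0, pad_h), (0, pad_w), (0, 0)),
                  constant_values=pad_val)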

__init__(size=None, size_divisor=None, pad_to_square=False, pad_val={'img': 0, 'masks': 0, 'seg': 255})[source]

Initialize self. See help(type(self)) for accurate signature.

class easycv.datasets.detection.pipelines.MMNormalize(mean, std, to_rgb=True)[source]

Bases: object

Normalize the image. Added key is “img_norm_cfg”.

Parameters
  • mean (sequence) – Mean values of 3 channels.

  • std (sequence) – Std values of 3 channels.

  • to_rgb (bool) – Whether to convert the image from BGR to RGB. Default: True.
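
A sketch of the normalization, assuming an HWC BGR image:

.. code-block:: python

import numpy as np

def normalize(img, mean, std, to_rgb=True):
    # Optionally convert BGR to RGB, then apply per-channel
    # (img - mean) / std.
    img = img.astype(np.float32)
    if to_rgb:
        img = img[..., ::-1]
    return (img - np.asarray(mean, dtype=np.float32)) / \
           np.asarray(std, dtype=np.float32)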

__init__(mean, std, to_rgb=True)[source]

Initialize self. See help(type(self)) for accurate signature.

class easycv.datasets.detection.pipelines.LoadImageFromFile(to_float32=False, color_type='color', file_client_args={'backend': 'disk'})[source]

Bases: object

Load an image from file. Required keys are “img_prefix” and “img_info” (a dict that must contain the key “filename”). Added or updated keys are “filename”, “img”, “img_shape”, “ori_shape” (same as img_shape), “pad_shape” (same as img_shape), “scale_factor” (1.0) and “img_norm_cfg” (means=0 and stds=1).

Parameters
  • to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is a uint8 array. Defaults to False.
  • color_type (str) – The flag argument for mmcv.imfrombytes(). Defaults to ‘color’.

  • file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').
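
A sketch of how the loading transforms typically open a training pipeline; the flags shown are example values:

.. code-block:: python

train_pipeline_head = [
    dict(type='LoadImageFromFile', to_float32=True),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=False),
    # ... augmentation and formatting transforms would follow.
]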

__init__(to_float32=False, color_type='color', file_client_args={'backend': 'disk'})[source]

Initialize self. See help(type(self)) for accurate signature.

class easycv.datasets.detection.pipelines.LoadImageFromWebcam(to_float32=False, color_type='color', file_client_args={'backend': 'disk'})[source]

Bases: easycv.datasets.detection.pipelines.mm_transforms.LoadImageFromFile

Load an image from webcam.

Similar to LoadImageFromFile, but the image read from the webcam is in results['img'].

class easycv.datasets.detection.pipelines.LoadMultiChannelImageFromFiles(to_float32=False, color_type='unchanged', file_client_args={'backend': 'disk'})[source]

Bases: object

Load multi-channel images from a list of separate channel files. Required keys are “img_prefix” and “img_info” (a dict that must contain the key “filename”, which is expected to be a list of filenames). Added or updated keys are “filename”, “img”, “img_shape”, “ori_shape” (same as img_shape), “pad_shape” (same as img_shape), “scale_factor” (1.0) and “img_norm_cfg” (means=0 and stds=1).

Parameters
  • to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is a uint8 array. Defaults to False.
  • color_type (str) – The flag argument for mmcv.imfrombytes(). Defaults to ‘color’.

  • file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').

__init__(to_float32=False, color_type='unchanged', file_client_args={'backend': 'disk'})[source]

Initialize self. See help(type(self)) for accurate signature.

class easycv.datasets.detection.pipelines.LoadAnnotations(with_bbox=True, with_label=True, with_mask=False, with_seg=False, poly2mask=True, file_client_args={'backend': 'disk'})[source]

Bases: object

Load multiple types of annotations.

Parameters
  • with_bbox (bool) – Whether to parse and load the bbox annotation. Default: True.
  • with_label (bool) – Whether to parse and load the label annotation. Default: True.

  • with_mask (bool) – Whether to parse and load the mask annotation. Default: False.

  • with_seg (bool) – Whether to parse and load the semantic segmentation annotation. Default: False.

  • poly2mask (bool) – Whether to convert the instance masks from polygons to bitmaps. Default: True.

  • file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').

__init__(with_bbox=True, with_label=True, with_mask=False, with_seg=False, poly2mask=True, file_client_args={'backend': 'disk'})[source]

Initialize self. See help(type(self)) for accurate signature.

process_polygons(polygons)[source]

Convert polygons to a list of ndarray and filter invalid polygons.

Parameters

polygons (list[list]) – Polygons of one instance.

Returns

Processed polygons.

Return type

list[numpy.ndarray]

class easycv.datasets.detection.pipelines.MMMultiScaleFlipAug(transforms, img_scale=None, scale_factor=None, flip=False, flip_direction='horizontal')[source]

Bases: object

Test-time augmentation with multiple scales and flipping. An example configuration is as follows:

.. code-block::

img_scale=[(1333, 400), (1333, 800)],
flip=True,
transforms=[
    dict(type='Resize', keep_ratio=True),
    dict(type='RandomFlip'),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='ImageToTensor', keys=['img']),
    dict(type='Collect', keys=['img']),
]

After MultiScaleFlipAug with the above configuration, the results are wrapped into lists of the same length, as follows:

.. code-block::

dict(
    img=[...],
    img_shape=[...],
    scale=[(1333, 400), (1333, 400), (1333, 800), (1333, 800)],
    flip=[False, True, False, True],
    ...
)

Parameters
  • transforms (list[dict]) – Transforms to apply in each augmentation.

  • img_scale (tuple | list[tuple] | None) – Images scales for resizing.

  • scale_factor (float | list[float] | None) – Scale factors for resizing.

  • flip (bool) – Whether apply flip augmentation. Default: False.

  • flip_direction (str | list[str]) – Flip augmentation directions, options are “horizontal”, “vertical” and “diagonal”. If flip_direction is a list, multiple flip augmentations will be applied. It has no effect when flip == False. Default: “horizontal”.

__init__(transforms, img_scale=None, scale_factor=None, flip=False, flip_direction='horizontal')[source]

Initialize self. See help(type(self)) for accurate signature.

class easycv.datasets.detection.pipelines.MMRandomCrop(crop_size, crop_type='absolute', allow_negative_crop=False, recompute_bbox=False, bbox_clip_border=True)[source]

Bases: object

Random crop the image & bboxes & masks.

The absolute crop_size is sampled based on crop_type and image_size, then the cropped results are generated.

Parameters
  • crop_size (tuple) – The relative ratio or absolute pixels of height and width.

  • crop_type (str, optional) – one of “relative_range”, “relative”, “absolute”, “absolute_range”. “relative” randomly crops (h * crop_size[0], w * crop_size[1]) part from an input of size (h, w). “relative_range” uniformly samples relative crop size from range [crop_size[0], 1] and [crop_size[1], 1] for height and width respectively. “absolute” crops from an input with absolute size (crop_size[0], crop_size[1]). “absolute_range” uniformly samples crop_h in range [crop_size[0], min(h, crop_size[1])] and crop_w in range [crop_size[0], min(w, crop_size[1])]. Default “absolute”.

  • allow_negative_crop (bool, optional) – Whether to allow a crop that does not contain any bbox area. Default False.

  • recompute_bbox (bool, optional) – Whether to re-compute the boxes based on cropped instance masks. Default False.

  • bbox_clip_border (bool, optional) – Whether clip the objects outside the border of the image. Defaults to True.

Note

  • If the image is smaller than the absolute crop size, the original image is returned.

  • The keys for bboxes, labels and masks must be aligned. That is, gt_bboxes corresponds to gt_labels and gt_masks, and gt_bboxes_ignore corresponds to gt_labels_ignore and gt_masks_ignore.

  • If the crop does not contain any gt-bbox region and allow_negative_crop is set to False, skip this image.
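
To make the four crop_type modes concrete, a sketch of how the absolute (crop_h, crop_w) could be derived; clamping and edge cases are omitted, and the function name is illustrative.

.. code-block:: python

import random

def sample_crop_size(image_size, crop_size, crop_type='absolute'):
    h, w = image_size
    if crop_type == 'absolute':
        return min(crop_size[0], h), min(crop_size[1], w)
    if crop_type == 'absolute_range':
        crop_h = random.randint(crop_size[0], min(h, crop_size[1]))
        crop_w = random.randint(crop_size[0], min(w, crop_size[1]))
        return crop_h, crop_w
    if crop_type == 'relative':
        return int(h * crop_size[0] + 0.5), int(w * crop_size[1] + 0.5)
    # 'relative_range': sample each ratio uniformly from [crop_size[i], 1].
    ratio_h = random.uniform(crop_size[0], 1.0)
    ratio_w = random.uniform(crop_size[1], 1.0)
    return int(h * ratio_h + 0.5), int(w * ratio_w + 0.5)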

__init__(crop_size, crop_type='absolute', allow_negative_crop=False, recompute_bbox=False, bbox_clip_border=True)[source]

Initialize self. See help(type(self)) for accurate signature.

class easycv.datasets.detection.pipelines.MMFilterAnnotations(min_gt_bbox_wh=(1.0, 1.0), min_gt_mask_area=1, by_box=True, by_mask=False, keep_empty=True)[source]

Bases: object

Filter invalid annotations.

Parameters
  • min_gt_bbox_wh (tuple[float]) – Minimum width and height of ground truth boxes. Default: (1., 1.).
  • min_gt_mask_area (int) – Minimum foreground area of ground truth masks. Default: 1

  • by_box (bool) – Filter instances with bounding boxes not meeting the min_gt_bbox_wh threshold. Default: True

  • by_mask (bool) – Filter instances with masks not meeting min_gt_mask_area threshold. Default: False

  • keep_empty (bool) – Whether to return None when no boxes remain after filtering. Default: True.
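
A numpy sketch of the by_box criterion, with boxes as (x1, y1, x2, y2) rows; the helper name is illustrative.

.. code-block:: python

import numpy as np

def box_keep_mask(gt_bboxes, min_gt_bbox_wh=(1.0, 1.0)):
    # Keep boxes whose width and height both exceed the thresholds.
    w = gt_bboxes[:, 2] - gt_bboxes[:, 0]
    h = gt_bboxes[:, 3] - gt_bboxes[:, 1]
    return (w > min_gt_bbox_wh[0]) & (h > min_gt_bbox_wh[1])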

__init__(min_gt_bbox_wh=(1.0, 1.0), min_gt_mask_area=1, by_box=True, by_mask=False, keep_empty=True)[source]

Initialize self. See help(type(self)) for accurate signature.

Submodules

easycv.datasets.detection.pipelines.mm_transforms module

class easycv.datasets.detection.pipelines.mm_transforms.MMToTensor[source]

Bases: object

Transform image to Tensor. Required key: ‘img’. Modifies key: ‘img’.

Parameters

results (dict) – contains all information about training.

class easycv.datasets.detection.pipelines.mm_transforms.NormalizeTensor(mean, std)[source]

Bases: object

Normalize the Tensor image (CxHxW) with mean and std. Required key: ‘img’. Modifies key: ‘img’.

Parameters
  • mean (list[float]) – Mean values of 3 channels.

  • std (list[float]) – Std values of 3 channels.

__init__(mean, std)[source]

Initialize self. See help(type(self)) for accurate signature.

class easycv.datasets.detection.pipelines.mm_transforms.MMMosaic(img_scale=(640, 640), center_ratio_range=(0.5, 1.5), pad_val=114)[source]

Bases: object

Mosaic augmentation. Given 4 images, the mosaic transform combines them into one output image composed of parts from each sub-image.

.. code:: text

                      mosaic transform
                         center_x
              +------------------------------+
              |       pad        |  pad      |
              |      +-----------+           |
              |      |           |           |
              |      |  image1   |--------+  |
              |      |           |        |  |
              |      |           | image2 |  |
   center_y   |----+-------------+-----------|
              |    |   cropped   |           |
              |pad |   image3    |  image4   |
              |    |             |           |
              +----|-------------+-----------+
                   |             |
                   +-------------+

The mosaic transform steps are as follows:
  1. Choose the mosaic center as the intersection point of the 4 images.

  2. Get the top-left image according to the index, and randomly sample another 3 images from the custom dataset.

  3. Each sub-image is cropped if it is larger than its mosaic patch.

Parameters
  • img_scale (Sequence[int]) – Image size after the mosaic pipeline of a single image. Defaults to (640, 640).

  • center_ratio_range (Sequence[float]) – Center ratio range of mosaic output. Defaults to (0.5, 1.5).

  • pad_val (int) – Pad value. Defaults to 114.

__init__(img_scale=(640, 640), center_ratio_range=(0.5, 1.5), pad_val=114)[source]

Initialize self. See help(type(self)) for accurate signature.

get_indexes(dataset)[source]

Call function to collect indexes.

Parameters

dataset (DetImagesMixDataset) – The dataset.

Returns

indexes.

Return type

list

class easycv.datasets.detection.pipelines.mm_transforms.MMMixUp(img_scale=(640, 640), ratio_range=(0.5, 1.5), flip_ratio=0.5, pad_val=114, max_iters=15, min_bbox_size=5, min_area_ratio=0.2, max_aspect_ratio=20)[source]

Bases: object

MixUp data augmentation.

.. code:: text

                  mixup transform
         +------------------------------+
         | mixup image   |              |
         |      +--------|--------+     |
         |      |        |        |     |
         |---------------+        |     |
         |      |                 |     |
         |      |      image      |     |
         |      |                 |     |
         |      |                 |     |
         |      |-----------------+     |
         |             pad              |
         +------------------------------+

The mixup transform steps are as follows:
  1. Another random image is picked from the dataset and embedded in the top-left patch (after padding and resizing).

  2. The target of the mixup transform is the weighted average of the mixup image and the original image.

Parameters
  • img_scale (Sequence[int]) – Image output size after mixup pipeline. Default: (640, 640).

  • ratio_range (Sequence[float]) – Scale ratio of mixup image. Default: (0.5, 1.5).

  • flip_ratio (float) – Horizontal flip ratio of mixup image. Default: 0.5.

  • pad_val (int) – Pad value. Default: 114.

  • max_iters (int) – The maximum number of iterations. If the number of iterations is greater than max_iters, but gt_bbox is still empty, then the iteration is terminated. Default: 15.

  • min_bbox_size (float) – Width and height threshold to filter bboxes. If the height or width of a box is smaller than this value, it will be removed. Default: 5.

  • min_area_ratio (float) – Threshold of area ratio between original bboxes and wrapped bboxes. If smaller than this value, the box will be removed. Default: 0.2.

  • max_aspect_ratio (float) – Aspect ratio of width and height threshold to filter bboxes. If max(h/w, w/h) larger than this value, the box will be removed. Default: 20.

__init__(img_scale=(640, 640), ratio_range=(0.5, 1.5), flip_ratio=0.5, pad_val=114, max_iters=15, min_bbox_size=5, min_area_ratio=0.2, max_aspect_ratio=20)[source]

Initialize self. See help(type(self)) for accurate signature.

get_indexes(dataset)[source]

Call function to collect indexes. :param dataset: The dataset. :type dataset: DetImagesMixDataset

Returns

indexes.

Return type

list

class easycv.datasets.detection.pipelines.mm_transforms.MMRandomAffine(max_rotate_degree=10.0, max_translate_ratio=0.1, scaling_ratio_range=(0.5, 1.5), max_shear_degree=2.0, border=(0, 0), border_val=(114, 114, 114), min_bbox_size=2, min_area_ratio=0.2, max_aspect_ratio=20)[source]

Bases: object

Random affine transform data augmentation (for YOLOX). This operation randomly generates an affine transform matrix that combines rotation, translation, shear and scaling transforms.

Parameters
  • max_rotate_degree (float) – Maximum degrees of rotation transform. Default: 10.
  • max_translate_ratio (float) – Maximum ratio of translation. Default: 0.1.

  • scaling_ratio_range (tuple[float]) – Min and max ratio of scaling transform. Default: (0.5, 1.5).

  • max_shear_degree (float) – Maximum degrees of shear transform. Default: 2.

  • border (tuple[int]) – Distance from height and width sides of input image to adjust output shape. Only used in mosaic dataset. Default: (0, 0).

  • border_val (tuple[int]) – Border padding values of 3 channels. Default: (114, 114, 114).

  • min_bbox_size (float) – Width and height threshold to filter bboxes. If the height or width of a box is smaller than this value, it will be removed. Default: 2.

  • min_area_ratio (float) – Threshold of area ratio between original bboxes and wrapped bboxes. If smaller than this value, the box will be removed. Default: 0.2.

  • max_aspect_ratio (float) – Aspect ratio of width and height threshold to filter bboxes. If max(h/w, w/h) is larger than this value, the box will be removed. Default: 20.

__init__(max_rotate_degree=10.0, max_translate_ratio=0.1, scaling_ratio_range=(0.5, 1.5), max_shear_degree=2.0, border=(0, 0), border_val=(114, 114, 114), min_bbox_size=2, min_area_ratio=0.2, max_aspect_ratio=20)[source]

Initialize self. See help(type(self)) for accurate signature.

filter_gt_bboxes(origin_bboxes, wrapped_bboxes)[source]

class easycv.datasets.detection.pipelines.mm_transforms.MMPhotoMetricDistortion(brightness_delta=32, contrast_range=(0.5, 1.5), saturation_range=(0.5, 1.5), hue_delta=18)[source]

Bases: object

Apply photometric distortion to the image sequentially; every transformation is applied with a probability of 0.5. Random contrast is applied either second or second to last:

  1. random brightness
  2. random contrast (mode 0)
  3. convert color from BGR to HSV
  4. random saturation
  5. random hue
  6. convert color from HSV to BGR
  7. random contrast (mode 1)
  8. randomly swap channels

Parameters
  • brightness_delta (int) – delta of brightness.

  • contrast_range (tuple) – range of contrast.

  • saturation_range (tuple) – range of saturation.

  • hue_delta (int) – delta of hue.

__init__(brightness_delta=32, contrast_range=(0.5, 1.5), saturation_range=(0.5, 1.5), hue_delta=18)[source]

Initialize self. See help(type(self)) for accurate signature.

class easycv.datasets.detection.pipelines.mm_transforms.MMResize(img_scale=None, multiscale_mode='range', ratio_range=None, keep_ratio=True, bbox_clip_border=True, backend='cv2', override=False)[source]

Bases: object

Resize images & bbox & mask. This transform resizes the input image to some scale. Bboxes and masks are then resized with the same scale factor. If the input dict contains the key “scale”, the scale in the input dict is used; otherwise the scale specified in the init method is used. If the input dict contains the key “scale_factor” (i.e. MultiScaleFlipAug gives scale_factor instead of img_scale), the actual scale is computed from the image shape and scale_factor.

img_scale can either be a tuple (single-scale) or a list of tuples (multi-scale). There are 3 multiscale modes:

  • ratio_range is not None: randomly sample a ratio from the ratio range and multiply it with the image scale.

  • ratio_range is None and multiscale_mode == "range": randomly sample a scale from the multiscale range.

  • ratio_range is None and multiscale_mode == "value": randomly sample a scale from multiple scales.

Parameters
  • img_scale (tuple or list[tuple]) – Images scales for resizing.

  • multiscale_mode (str) – Either “range” or “value”.

  • ratio_range (tuple[float]) – (min_ratio, max_ratio).

  • keep_ratio (bool) – Whether to keep the aspect ratio when resizing the image.
  • bbox_clip_border (bool, optional) – Whether clip the objects outside the border of the image. Defaults to True.

  • backend (str) – Image resize backend; choices are ‘cv2’ and ‘pillow’. These two backends generate slightly different results. Defaults to ‘cv2’.

  • override (bool, optional) – Whether to override scale and scale_factor so as to call resize twice. If True, after the first resizing, the existing scale and scale_factor will be ignored so a second resizing can be performed. This option is a workaround for applying resize multiple times in DETR. Defaults to False.

__init__(img_scale=None, multiscale_mode='range', ratio_range=None, keep_ratio=True, bbox_clip_border=True, backend='cv2', override=False)[source]

Initialize self. See help(type(self)) for accurate signature.

static random_select(img_scales)[source]

Randomly select an img_scale from given candidates.

Parameters

img_scales (list[tuple]) – Images scales for selection.

Returns

Returns a tuple (img_scale, scale_idx), where img_scale is the selected image scale and scale_idx is the selected index in the given candidates.

Return type

(tuple, int)

static random_sample(img_scales)[source]

Randomly sample an img_scale when multiscale_mode=='range'.

Parameters

img_scales (list[tuple]) – Images scale range for sampling. There must be two tuples in img_scales, which specify the lower and upper bound of image scales.

Returns

Returns a tuple (img_scale, None), where img_scale is the sampled scale and None is just a placeholder to be consistent with random_select().

Return type

(tuple, None)

static random_sample_ratio(img_scale, ratio_range)[source]

Randomly sample an img_scale when ratio_range is specified. A ratio will be randomly sampled from the range specified by ratio_range, then multiplied with img_scale to generate the sampled scale.

Parameters
  • img_scale (tuple) – Images scale base to multiply with ratio.

  • ratio_range (tuple[float]) – The minimum and maximum ratio to scale the img_scale.

Returns

Returns a tuple (scale, None), where scale is the sampled ratio multiplied with img_scale and None is just a placeholder to be consistent with random_select().

Return type

(tuple, None)

class easycv.datasets.detection.pipelines.mm_transforms.MMRandomFlip(flip_ratio=None, direction='horizontal')[source]

Bases: object

Flip the image & bbox & mask. If the input dict contains the key “flip”, that flag will be used; otherwise flipping is randomly decided by a ratio specified in the init method. When random flip is enabled, flip_ratio/direction can either be a float/string or a tuple of float/string. There are 3 flip modes:

  • flip_ratio is float, direction is string: the image will be direction-flipped with probability flip_ratio. E.g., flip_ratio=0.5, direction='horizontal': the image will be horizontally flipped with probability 0.5.

  • flip_ratio is float, direction is list of string: the image will be direction[i]-flipped with probability flip_ratio/len(direction). E.g., flip_ratio=0.5, direction=['horizontal', 'vertical']: the image will be horizontally flipped with probability 0.25 and vertically flipped with probability 0.25.

  • flip_ratio is list of float, direction is list of string: given len(flip_ratio) == len(direction), the image will be direction[i]-flipped with probability flip_ratio[i]. E.g., flip_ratio=[0.3, 0.5], direction=['horizontal', 'vertical']: the image will be horizontally flipped with probability 0.3 and vertically flipped with probability 0.5.

Parameters
  • flip_ratio (float | list[float], optional) – The flipping probability. Default: None.

  • direction (str | list[str], optional) – The flipping direction. Options are ‘horizontal’, ‘vertical’, ‘diagonal’. Default: ‘horizontal’. If the input is a list, its length must equal that of flip_ratio; each element in flip_ratio indicates the flip probability of the corresponding direction.

__init__(flip_ratio=None, direction='horizontal')[source]

Initialize self. See help(type(self)) for accurate signature.

bbox_flip(bboxes, img_shape, direction)[source]

Flip bboxes horizontally.

Parameters
  • bboxes (numpy.ndarray) – Bounding boxes, shape (…, 4*k).

  • img_shape (tuple[int]) – Image shape (height, width).

  • direction (str) – Flip direction. Options are ‘horizontal’, ‘vertical’.

Returns

Flipped bounding boxes.

Return type

numpy.ndarray

class easycv.datasets.detection.pipelines.mm_transforms.MMRandomCrop(crop_size, crop_type='absolute', allow_negative_crop=False, recompute_bbox=False, bbox_clip_border=True)[source]

Bases: object

Random crop the image & bboxes & masks.

The absolute crop_size is sampled based on crop_type and image_size, then the cropped results are generated.

Parameters
  • crop_size (tuple) – The relative ratio or absolute pixels of height and width.

  • crop_type (str, optional) – one of “relative_range”, “relative”, “absolute”, “absolute_range”. “relative” randomly crops (h * crop_size[0], w * crop_size[1]) part from an input of size (h, w). “relative_range” uniformly samples relative crop size from range [crop_size[0], 1] and [crop_size[1], 1] for height and width respectively. “absolute” crops from an input with absolute size (crop_size[0], crop_size[1]). “absolute_range” uniformly samples crop_h in range [crop_size[0], min(h, crop_size[1])] and crop_w in range [crop_size[0], min(w, crop_size[1])]. Default “absolute”.

  • allow_negative_crop (bool, optional) – Whether to allow a crop that does not contain any bbox area. Default False.

  • recompute_bbox (bool, optional) – Whether to re-compute the boxes based on cropped instance masks. Default False.

  • bbox_clip_border (bool, optional) – Whether clip the objects outside the border of the image. Defaults to True.

Note

  • If the image is smaller than the absolute crop size, the original image is returned.

  • The keys for bboxes, labels and masks must be aligned. That is, gt_bboxes corresponds to gt_labels and gt_masks, and gt_bboxes_ignore corresponds to gt_labels_ignore and gt_masks_ignore.

  • If the crop does not contain any gt-bbox region and allow_negative_crop is set to False, skip this image.

__init__(crop_size, crop_type='absolute', allow_negative_crop=False, recompute_bbox=False, bbox_clip_border=True)[source]

Initialize self. See help(type(self)) for accurate signature.

class easycv.datasets.detection.pipelines.mm_transforms.MMPad(size=None, size_divisor=None, pad_to_square=False, pad_val={'img': 0, 'masks': 0, 'seg': 255})[source]

Bases: object

Pad the image & mask. There are two padding modes: (1) pad to a fixed size and (2) pad to the minimum size that is divisible by some number. Added keys are “pad_shape”, “pad_fixed_size” and “pad_size_divisor”.

Parameters
  • size (tuple, optional) – Fixed padding size.

  • size_divisor (int, optional) – The divisor of padded size.

  • pad_to_square (bool) – Whether to pad the image into a square. Currently only used for YOLOX. Default: False.

  • pad_val (dict, optional) – A dict for padding values; the default value is dict(img=0, masks=0, seg=255).

__init__(size=None, size_divisor=None, pad_to_square=False, pad_val={'img': 0, 'masks': 0, 'seg': 255})[source]

Initialize self. See help(type(self)) for accurate signature.

class easycv.datasets.detection.pipelines.mm_transforms.MMNormalize(mean, std, to_rgb=True)[source]

Bases: object

Normalize the image. Added key is “img_norm_cfg”.

Parameters
  • mean (sequence) – Mean values of 3 channels.

  • std (sequence) – Std values of 3 channels.

  • to_rgb (bool) – Whether to convert the image from BGR to RGB. Default: True.

__init__(mean, std, to_rgb=True)[source]

Initialize self. See help(type(self)) for accurate signature.

class easycv.datasets.detection.pipelines.mm_transforms.LoadImageFromFile(to_float32=False, color_type='color', file_client_args={'backend': 'disk'})[source]

Bases: object

Load an image from file. Required keys are “img_prefix” and “img_info” (a dict that must contain the key “filename”). Added or updated keys are “filename”, “img”, “img_shape”, “ori_shape” (same as img_shape), “pad_shape” (same as img_shape), “scale_factor” (1.0) and “img_norm_cfg” (means=0 and stds=1).

Parameters
  • to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is a uint8 array. Defaults to False.
  • color_type (str) – The flag argument for mmcv.imfrombytes(). Defaults to ‘color’.

  • file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').

__init__(to_float32=False, color_type='color', file_client_args={'backend': 'disk'})[source]

Initialize self. See help(type(self)) for accurate signature.

class easycv.datasets.detection.pipelines.mm_transforms.LoadImageFromWebcam(to_float32=False, color_type='color', file_client_args={'backend': 'disk'})[source]

Bases: easycv.datasets.detection.pipelines.mm_transforms.LoadImageFromFile

Load an image from webcam.

Similar to LoadImageFromFile, but the image read from the webcam is in results['img'].

class easycv.datasets.detection.pipelines.mm_transforms.LoadMultiChannelImageFromFiles(to_float32=False, color_type='unchanged', file_client_args={'backend': 'disk'})[source]

Bases: object

Load multi-channel images from a list of separate channel files. Required keys are “img_prefix” and “img_info” (a dict that must contain the key “filename”, which is expected to be a list of filenames). Added or updated keys are “filename”, “img”, “img_shape”, “ori_shape” (same as img_shape), “pad_shape” (same as img_shape), “scale_factor” (1.0) and “img_norm_cfg” (means=0 and stds=1).

Parameters
  • to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is a uint8 array. Defaults to False.
  • color_type (str) – The flag argument for mmcv.imfrombytes(). Defaults to ‘color’.

  • file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').

__init__(to_float32=False, color_type='unchanged', file_client_args={'backend': 'disk'})[source]

Initialize self. See help(type(self)) for accurate signature.

class easycv.datasets.detection.pipelines.mm_transforms.LoadAnnotations(with_bbox=True, with_label=True, with_mask=False, with_seg=False, poly2mask=True, file_client_args={'backend': 'disk'})[source]

Bases: object

Load multiple types of annotations.

Parameters
  • with_bbox (bool) – Whether to parse and load the bbox annotation. Default: True.
  • with_label (bool) – Whether to parse and load the label annotation. Default: True.

  • with_mask (bool) – Whether to parse and load the mask annotation. Default: False.

  • with_seg (bool) – Whether to parse and load the semantic segmentation annotation. Default: False.

  • poly2mask (bool) – Whether to convert the instance masks from polygons to bitmaps. Default: True.

  • file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').

__init__(with_bbox=True, with_label=True, with_mask=False, with_seg=False, poly2mask=True, file_client_args={'backend': 'disk'})[source]

Initialize self. See help(type(self)) for accurate signature.

process_polygons(polygons)[source]

Convert polygons to a list of ndarray and filter invalid polygons.

Parameters

polygons (list[list]) – Polygons of one instance.

Returns

Processed polygons.

Return type

list[numpy.ndarray]

class easycv.datasets.detection.pipelines.mm_transforms.LoadPanopticAnnotations(with_bbox=True, with_label=True, with_mask=True, with_seg=True, file_client_args={'backend': 'disk'})[source]

Bases: easycv.datasets.detection.pipelines.mm_transforms.LoadAnnotations

Load multiple types of panoptic annotations.

Parameters
  • with_bbox (bool) – Whether to parse and load the bbox annotation. Default: True.

  • with_label (bool) – Whether to parse and load the label annotation. Default: True.

  • with_mask (bool) – Whether to parse and load the mask annotation. Default: True.

  • with_seg (bool) – Whether to parse and load the semantic segmentation annotation. Default: True.

  • file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').

__init__(with_bbox=True, with_label=True, with_mask=True, with_seg=True, file_client_args={'backend': 'disk'})[source]

Initialize self. See help(type(self)) for accurate signature.

class easycv.datasets.detection.pipelines.mm_transforms.MMMultiScaleFlipAug(transforms, img_scale=None, scale_factor=None, flip=False, flip_direction='horizontal')[source]

Bases: object

Test-time augmentation with multiple scales and flipping. An example configuration is as follows:

.. code-block::

img_scale=[(1333, 400), (1333, 800)],
flip=True,
transforms=[
    dict(type='Resize', keep_ratio=True),
    dict(type='RandomFlip'),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='ImageToTensor', keys=['img']),
    dict(type='Collect', keys=['img']),
]

After MultiScaleFlipAug with the above configuration, the results are wrapped into lists of the same length, as follows:

.. code-block::

dict(
    img=[...],
    img_shape=[...],
    scale=[(1333, 400), (1333, 400), (1333, 800), (1333, 800)],
    flip=[False, True, False, True],
    ...
)

Parameters
  • transforms (list[dict]) – Transforms to apply in each augmentation.

  • img_scale (tuple | list[tuple] | None) – Images scales for resizing.

  • scale_factor (float | list[float] | None) – Scale factors for resizing.

  • flip (bool) – Whether apply flip augmentation. Default: False.

  • flip_direction (str | list[str]) – Flip augmentation directions, options are “horizontal”, “vertical” and “diagonal”. If flip_direction is a list, multiple flip augmentations will be applied. It has no effect when flip == False. Default: “horizontal”.

__init__(transforms, img_scale=None, scale_factor=None, flip=False, flip_direction='horizontal')[source]

Initialize self. See help(type(self)) for accurate signature.

class easycv.datasets.detection.pipelines.mm_transforms.MMFilterAnnotations(min_gt_bbox_wh=(1.0, 1.0), min_gt_mask_area=1, by_box=True, by_mask=False, keep_empty=True)[source]

Bases: object

Filter invalid annotations.

Parameters
  • min_gt_bbox_wh (tuple[float]) – Minimum width and height of ground truth boxes. Default: (1., 1.).
  • min_gt_mask_area (int) – Minimum foreground area of ground truth masks. Default: 1

  • by_box (bool) – Filter instances with bounding boxes not meeting the min_gt_bbox_wh threshold. Default: True

  • by_mask (bool) – Filter instances with masks not meeting min_gt_mask_area threshold. Default: False

  • keep_empty (bool) – Whether to return None when no boxes remain after filtering. Default: True.

__init__(min_gt_bbox_wh=(1.0, 1.0), min_gt_mask_area=1, by_box=True, by_mask=False, keep_empty=True)[source]

Initialize self. See help(type(self)) for accurate signature.