easycv.datasets.detection.pipelines package¶
- class easycv.datasets.detection.pipelines.MMToTensor[source]¶
Bases:
object
Transform image to Tensor. Required key: ‘img’. Modifies key: ‘img’.
- Parameters
results (dict) – Contains all information about training.
- class easycv.datasets.detection.pipelines.NormalizeTensor(mean, std)[source]¶
Bases:
object
Normalize the Tensor image (CxHxW) with mean and std. Required key: ‘img’. Modifies key: ‘img’.
- Parameters
mean (list[float]) – Mean values of 3 channels.
std (list[float]) – Std values of 3 channels.
- class easycv.datasets.detection.pipelines.MMMosaic(img_scale=(640, 640), center_ratio_range=(0.5, 1.5), pad_val=114)[source]¶
Bases:
object
Mosaic augmentation. Given 4 images, the mosaic transform combines them into one output image. The output image is composed of parts from each sub-image.
[Diagram: mosaic transform. Sub-images are arranged around the mosaic center (center_x, center_y); the surviving rows show a cropped image3 with padding on its left and image4 beside it.]
- The mosaic transform steps are as follows:
1. Choose the mosaic center as the intersection of the 4 images.
2. Get the top-left image according to the index, and randomly sample another 3 images from the custom dataset.
3. A sub-image is cropped if it is larger than the mosaic patch.
- Parameters
img_scale (Sequence[int]) – Image size of a single image after the mosaic pipeline. Defaults to (640, 640).
center_ratio_range (Sequence[float]) – Center ratio range of mosaic output. Defaults to (0.5, 1.5).
pad_val (int) – Pad value. Defaults to 114.
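The first step above (choosing the mosaic center) can be sketched with the standard library alone. This is an illustration, not EasyCV's implementation: it assumes img_scale is (height, width), that the output canvas is twice that size, and that center_ratio_range scales the single-image size. The helper names are hypothetical.

```python
import random


def sample_mosaic_center(img_scale=(640, 640), center_ratio_range=(0.5, 1.5), rng=random):
    """Sample a mosaic center on a canvas twice the size of img_scale.

    Assumption: img_scale is (height, width), so a ratio of 1.0 lands
    exactly in the middle of the (2 * height, 2 * width) canvas.
    """
    height, width = img_scale
    center_x = int(rng.uniform(*center_ratio_range) * width)
    center_y = int(rng.uniform(*center_ratio_range) * height)
    return center_x, center_y


def quadrant_regions(center_x, center_y, img_scale=(640, 640)):
    """Return the (x1, y1, x2, y2) canvas region available to each sub-image."""
    canvas_h, canvas_w = 2 * img_scale[0], 2 * img_scale[1]
    return {
        "top_left": (0, 0, center_x, center_y),
        "top_right": (center_x, 0, canvas_w, center_y),
        "bottom_left": (0, center_y, center_x, canvas_h),
        "bottom_right": (center_x, center_y, canvas_w, canvas_h),
    }
```

A sub-image larger than its region would be cropped to fit, and any uncovered area would keep pad_val.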
- class easycv.datasets.detection.pipelines.MMMixUp(img_scale=(640, 640), ratio_range=(0.5, 1.5), flip_ratio=0.5, pad_val=114, max_iters=15, min_bbox_size=5, min_area_ratio=0.2, max_aspect_ratio=20)[source]¶
Bases:
object
MixUp data augmentation.
- The mixup transform steps are as follows:
1. Another random image is picked from the dataset and embedded in the top-left patch (after padding and resizing).
2. The target of the mixup transform is the weighted average of the mixup image and the original image.
- Parameters
img_scale (Sequence[int]) – Image output size after mixup pipeline. Default: (640, 640).
ratio_range (Sequence[float]) – Scale ratio of mixup image. Default: (0.5, 1.5).
flip_ratio (float) – Horizontal flip ratio of mixup image. Default: 0.5.
pad_val (int) – Pad value. Default: 114.
max_iters (int) – The maximum number of iterations. If the number of iterations is greater than max_iters, but gt_bbox is still empty, then the iteration is terminated. Default: 15.
min_bbox_size (float) – Width and height threshold to filter bboxes. If the height or width of a box is smaller than this value, it will be removed. Default: 5.
min_area_ratio (float) – Threshold of area ratio between original bboxes and wrapped bboxes. If smaller than this value, the box will be removed. Default: 0.2.
max_aspect_ratio (float) – Aspect ratio of width and height threshold to filter bboxes. If max(h/w, w/h) is larger than this value, the box will be removed. Default: 20.
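The three box-filtering thresholds can be read as one predicate. The sketch below is a plain-Python illustration of that reading, not the library's code; the function name keep_box and the handling of values exactly on a threshold are assumptions.

```python
def keep_box(orig_box, new_box, min_bbox_size=5, min_area_ratio=0.2, max_aspect_ratio=20):
    """Return True if a transformed box passes all three thresholds.

    Boxes are (x1, y1, x2, y2). Values exactly on a threshold are kept,
    which is an assumption about the boundary behaviour.
    """
    eps = 1e-16  # guards against division by zero for degenerate boxes
    orig_w, orig_h = orig_box[2] - orig_box[0], orig_box[3] - orig_box[1]
    new_w, new_h = new_box[2] - new_box[0], new_box[3] - new_box[1]

    # 1. Width and height threshold.
    if new_w < min_bbox_size or new_h < min_bbox_size:
        return False
    # 2. Area ratio between the transformed box and the original box.
    if (new_w * new_h) / (orig_w * orig_h + eps) < min_area_ratio:
        return False
    # 3. Aspect ratio, max(h/w, w/h).
    aspect_ratio = max(new_w / (new_h + eps), new_h / (new_w + eps))
    return aspect_ratio <= max_aspect_ratio
```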
- class easycv.datasets.detection.pipelines.MMRandomAffine(max_rotate_degree=10.0, max_translate_ratio=0.1, scaling_ratio_range=(0.5, 1.5), max_shear_degree=2.0, border=(0, 0), border_val=(114, 114, 114), min_bbox_size=2, min_area_ratio=0.2, max_aspect_ratio=20)[source]¶
Bases:
object
Random affine transform data augmentation, used for YOLOX. This operation randomly generates an affine transform matrix that includes rotation, translation, shear and scaling transforms.
- Parameters
max_rotate_degree (float) – Maximum degrees of rotation transform. Default: 10.
max_translate_ratio (float) – Maximum ratio of translation. Default: 0.1.
scaling_ratio_range (tuple[float]) – Min and max ratio of scaling transform. Default: (0.5, 1.5).
max_shear_degree (float) – Maximum degrees of shear transform. Default: 2.
border (tuple[int]) – Distance from height and width sides of input image to adjust output shape. Only used in mosaic dataset. Default: (0, 0).
border_val (tuple[int]) – Border padding values of 3 channels. Default: (114, 114, 114).
min_bbox_size (float) – Width and height threshold to filter bboxes. If the height or width of a box is smaller than this value, it will be removed. Default: 2.
min_area_ratio (float) – Threshold of area ratio between original bboxes and wrapped bboxes. If smaller than this value, the box will be removed. Default: 0.2.
max_aspect_ratio (float) – Aspect ratio of width and height threshold to filter bboxes. If max(h/w, w/h) is larger than this value, the box will be removed. Default: 20.
- class easycv.datasets.detection.pipelines.MMPhotoMetricDistortion(brightness_delta=32, contrast_range=(0.5, 1.5), saturation_range=(0.5, 1.5), hue_delta=18)[source]¶
Bases:
object
Apply photometric distortion to the image sequentially; every transformation is applied with a probability of 0.5. Random contrast is applied either second or second to last.
1. random brightness
2. random contrast (mode 0)
3. convert color from BGR to HSV
4. random saturation
5. random hue
6. convert color from HSV to BGR
7. random contrast (mode 1)
8. randomly swap channels
- Parameters
brightness_delta (int) – Delta of brightness.
contrast_range (tuple) – Range of contrast.
saturation_range (tuple) – Range of saturation.
hue_delta (int) – Delta of hue.
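The ordering rule above can be illustrated by planning which operations one call would run, without touching pixel data. This is a sketch under assumptions: the function name is hypothetical, the mode is assumed to be chosen with equal probability, and the two color-space conversions are assumed to run unconditionally.

```python
import random


def plan_photometric_ops(rng=random):
    """Return the ordered list of operation names one call would apply."""

    def coin():
        return rng.random() < 0.5

    contrast_early = coin()  # mode 0: contrast at step 2; mode 1: at step 7
    ops = []
    if coin():
        ops.append("brightness")
    if contrast_early and coin():
        ops.append("contrast")
    ops.append("bgr2hsv")  # assumed unconditional
    if coin():
        ops.append("saturation")
    if coin():
        ops.append("hue")
    ops.append("hsv2bgr")  # assumed unconditional
    if not contrast_early and coin():
        ops.append("contrast")
    if coin():
        ops.append("swap_channels")
    return ops
```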
- class easycv.datasets.detection.pipelines.MMResize(img_scale=None, multiscale_mode='range', ratio_range=None, keep_ratio=True, bbox_clip_border=True, backend='cv2', override=False)[source]¶
Bases:
object
Resize images & bbox & mask. This transform resizes the input image to some scale. Bboxes and masks are then resized with the same scale factor. If the input dict contains the key “scale”, then the scale in the input dict is used; otherwise the scale specified in the init method is used. If the input dict contains the key “scale_factor” (if MultiScaleFlipAug does not give img_scale but scale_factor), the actual scale will be computed from the image shape and scale_factor. img_scale can either be a tuple (single-scale) or a list of tuples (multi-scale). There are 3 multiscale modes:
- ratio_range is not None: randomly sample a ratio from the ratio range and multiply it with the image scale.
- ratio_range is None and multiscale_mode == "range": randomly sample a scale from the multiscale range.
- ratio_range is None and multiscale_mode == "value": randomly sample a scale from multiple scales.
- Parameters
img_scale (tuple or list[tuple]) – Image scales for resizing.
multiscale_mode (str) – Either “range” or “value”.
ratio_range (tuple[float]) – (min_ratio, max_ratio)
keep_ratio (bool) – Whether to keep the aspect ratio when resizing the image.
bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. Defaults to True.
backend (str) – Image resize backend; choices are ‘cv2’ and ‘pillow’. These two backends generate slightly different results. Defaults to ‘cv2’.
override (bool, optional) – Whether to override scale and scale_factor so as to call resize twice. If True, after the first resizing, the existing scale and scale_factor will be ignored so that a second resizing is allowed. This option is a workaround for resizing multiple times in DETR. Defaults to False.
- __init__(img_scale=None, multiscale_mode='range', ratio_range=None, keep_ratio=True, bbox_clip_border=True, backend='cv2', override=False)[source]¶
Initialize self. See help(type(self)) for accurate signature.
- static random_select(img_scales)[source]¶
Randomly select an img_scale from given candidates.
- Parameters
img_scales (list[tuple]) – Image scales for selection.
- Returns
Returns a tuple (img_scale, scale_idx), where img_scale is the selected image scale and scale_idx is the selected index in the given candidates.
- Return type
(tuple, int)
- static random_sample(img_scales)[source]¶
Randomly sample an img_scale when multiscale_mode=='range'.
- Parameters
img_scales (list[tuple]) – Image scale range for sampling. There must be two tuples in img_scales, which specify the lower and upper bound of image scales.
- Returns
Returns a tuple (img_scale, None), where img_scale is the sampled scale and None is just a placeholder to be consistent with random_select().
- Return type
(tuple, None)
- static random_sample_ratio(img_scale, ratio_range)[source]¶
Randomly sample an img_scale when ratio_range is specified. A ratio will be randomly sampled from the range specified by ratio_range. Then it is multiplied with img_scale to generate the sampled scale.
- Parameters
img_scale (tuple) – Image scale base to multiply with ratio.
ratio_range (tuple[float]) – The minimum and maximum ratio to scale the img_scale.
- Returns
Returns a tuple (scale, None), where scale is the sampled ratio multiplied with img_scale and None is just a placeholder to be consistent with random_select().
- Return type
(tuple, None)
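The three multiscale modes and their return conventions can be sketched in plain Python. This mirrors the documented behaviour of random_select, random_sample and random_sample_ratio but is not the library's code: the rng argument, the sample_scale dispatcher, and the choice to sample long and short edges independently are assumptions made for illustration.

```python
import random


def random_select(img_scales, rng=random):
    """Pick one candidate scale; return (img_scale, scale_idx)."""
    scale_idx = rng.randrange(len(img_scales))
    return img_scales[scale_idx], scale_idx


def random_sample(img_scales, rng=random):
    """Sample a scale between two bounds; return (img_scale, None)."""
    assert len(img_scales) == 2
    long_edges = [max(s) for s in img_scales]
    short_edges = [min(s) for s in img_scales]
    long_edge = rng.randint(min(long_edges), max(long_edges))
    short_edge = rng.randint(min(short_edges), max(short_edges))
    return (long_edge, short_edge), None


def random_sample_ratio(img_scale, ratio_range, rng=random):
    """Scale img_scale by a ratio drawn from ratio_range; return (scale, None)."""
    min_ratio, max_ratio = ratio_range
    assert min_ratio <= max_ratio
    ratio = rng.uniform(min_ratio, max_ratio)
    return (int(img_scale[0] * ratio), int(img_scale[1] * ratio)), None


def sample_scale(img_scale, multiscale_mode="range", ratio_range=None, rng=random):
    """Hypothetical dispatcher over the three multiscale modes."""
    scales = img_scale if isinstance(img_scale, list) else [img_scale]
    if ratio_range is not None:
        return random_sample_ratio(scales[0], ratio_range, rng)
    if len(scales) == 1:
        return scales[0], 0
    if multiscale_mode == "range":
        return random_sample(scales, rng)
    if multiscale_mode == "value":
        return random_select(scales, rng)
    raise ValueError(f"unknown multiscale_mode: {multiscale_mode!r}")
```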
- class easycv.datasets.detection.pipelines.MMRandomFlip(flip_ratio=None, direction='horizontal')[source]¶
Bases:
object
Flip the image & bbox & mask. If the input dict contains the key “flip”, then the flag will be used; otherwise it will be randomly decided by a ratio specified in the init method. When random flip is enabled, flip_ratio/direction can either be a float/string or tuple of float/string. There are 3 flip modes:
- flip_ratio is float, direction is string: the image will be flipped in the given direction with probability flip_ratio. E.g., with flip_ratio=0.5 and direction='horizontal', the image will be horizontally flipped with probability 0.5.
- flip_ratio is float, direction is list of string: the image will be flipped in direction[i] with probability flip_ratio/len(direction). E.g., with flip_ratio=0.5 and direction=['horizontal', 'vertical'], the image will be horizontally flipped with probability 0.25 and vertically with probability 0.25.
- flip_ratio is list of float, direction is list of string: given len(flip_ratio) == len(direction), the image will be flipped in direction[i] with probability flip_ratio[i]. E.g., with flip_ratio=[0.3, 0.5] and direction=['horizontal', 'vertical'], the image will be horizontally flipped with probability 0.3 and vertically with probability 0.5.
- Parameters
flip_ratio (float | list[float], optional) – The flipping probability. Default: None.
direction (str | list[str], optional) – The flipping direction. Options are ‘horizontal’, ‘vertical’, ‘diagonal’. Default: ‘horizontal’. If input is a list, the length must equal that of flip_ratio. Each element in flip_ratio indicates the flip probability of the corresponding direction.
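The three flip modes reduce to one table of per-direction probabilities plus a "no flip" remainder. The following is a minimal sketch of that bookkeeping, assuming hypothetical helper names and treating flip_ratio=None as "never flip"; it is not the library's implementation.

```python
import random


def flip_probabilities(flip_ratio, direction):
    """Return [(direction_or_None, probability), ...] for the three flip modes."""
    if flip_ratio is None:
        return [(None, 1.0)]  # assumption: no ratio means no random flip
    directions = direction if isinstance(direction, list) else [direction]
    if isinstance(flip_ratio, list):
        assert len(flip_ratio) == len(directions)
        ratios = list(flip_ratio)
    else:
        # A single ratio is shared equally among the directions.
        ratios = [flip_ratio / len(directions)] * len(directions)
    assert 0 <= sum(ratios) <= 1
    return list(zip(directions, ratios)) + [(None, 1 - sum(ratios))]


def sample_direction(flip_ratio, direction, rng=random):
    """Draw a flip direction, or None when the image should not be flipped."""
    choices, weights = zip(*flip_probabilities(flip_ratio, direction))
    return rng.choices(choices, weights=weights, k=1)[0]
```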
- __init__(flip_ratio=None, direction='horizontal')[source]¶
Initialize self. See help(type(self)) for accurate signature.
- bbox_flip(bboxes, img_shape, direction)[source]¶
Flip bboxes in the given direction.
- Parameters
bboxes (numpy.ndarray) – Bounding boxes, shape (…, 4*k).
img_shape (tuple[int]) – Image shape (height, width).
direction (str) – Flip direction. Options are ‘horizontal’, ‘vertical’.
- Returns
Flipped bounding boxes.
- Return type
numpy.ndarray
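For intuition, the coordinate arithmetic behind a box flip can be written without NumPy. This is a simplified sketch on a list of (x1, y1, x2, y2) tuples rather than an array of shape (…, 4*k), and it covers only the two directions listed above.

```python
def bbox_flip(bboxes, img_shape, direction):
    """Flip (x1, y1, x2, y2) boxes inside an image of shape (height, width)."""
    height, width = img_shape[:2]
    flipped = []
    for x1, y1, x2, y2 in bboxes:
        if direction == "horizontal":
            # Mirror x about the vertical axis; x1 and x2 swap roles.
            flipped.append((width - x2, y1, width - x1, y2))
        elif direction == "vertical":
            # Mirror y about the horizontal axis; y1 and y2 swap roles.
            flipped.append((x1, height - y2, x2, height - y1))
        else:
            raise ValueError(f"unsupported direction: {direction!r}")
    return flipped
```

Flipping twice in the same direction returns the original boxes, which is a quick sanity check for the arithmetic.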
- class easycv.datasets.detection.pipelines.MMPad(size=None, size_divisor=None, pad_to_square=False, pad_val={'img': 0, 'masks': 0, 'seg': 255})[source]¶
Bases:
object
Pad the image & mask. There are two padding modes: (1) pad to a fixed size and (2) pad to the minimum size that is divisible by some number. Added keys are “pad_shape”, “pad_fixed_size” and “pad_size_divisor”.
- Parameters
size (tuple, optional) – Fixed padding size.
size_divisor (int, optional) – The divisor of padded size.
pad_to_square (bool) – Whether to pad the image into a square. Currently only used for YOLOX. Default: False.
pad_val (dict, optional) – A dict for padding value, the default value is dict(img=0, masks=0, seg=255).
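The target shape for each padding mode is simple integer arithmetic. The helper below is a hypothetical illustration; in particular, the precedence of pad_to_square over size and size_divisor is an assumption.

```python
def padded_shape(img_shape, size=None, size_divisor=None, pad_to_square=False):
    """Return the (height, width) an image would be padded to."""
    height, width = img_shape
    if pad_to_square:
        side = max(height, width)
        return (side, side)
    if size is not None:
        return tuple(size)  # mode 1: fixed size
    if size_divisor is not None:
        # Mode 2: round each side up to the next multiple of size_divisor.
        pad_h = -(-height // size_divisor) * size_divisor
        pad_w = -(-width // size_divisor) * size_divisor
        return (pad_h, pad_w)
    return (height, width)
```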
- class easycv.datasets.detection.pipelines.MMNormalize(mean, std, to_rgb=True)[source]¶
Bases:
object
Normalize the image. Added key is “img_norm_cfg”.
- Parameters
mean (sequence) – Mean values of 3 channels.
std (sequence) – Std values of 3 channels.
to_rgb (bool) – Whether to convert the image from BGR to RGB. Default: True.
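On a single pixel, the normalization amounts to an optional channel reversal followed by (value - mean) / std per channel. The sketch below assumes that mean and std are given in the channel order of the output (RGB when to_rgb is True); the function name is hypothetical.

```python
def normalize_pixel(pixel_bgr, mean, std, to_rgb=True):
    """Normalize one 3-channel pixel; returns a list of floats."""
    # Reverse the channel order first when converting BGR to RGB.
    channels = list(reversed(pixel_bgr)) if to_rgb else list(pixel_bgr)
    return [(float(c) - m) / s for c, m, s in zip(channels, mean, std)]
```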
- class easycv.datasets.detection.pipelines.LoadImageFromFile(to_float32=False, color_type='color', file_client_args={'backend': 'disk'})[source]¶
Bases:
object
Load an image from file. Required keys are “img_prefix” and “img_info” (a dict that must contain the key “filename”). Added or updated keys are “filename”, “img”, “img_shape”, “ori_shape” (same as img_shape), “pad_shape” (same as img_shape), “scale_factor” (1.0) and “img_norm_cfg” (means=0 and stds=1).
- Parameters
to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is a uint8 array. Defaults to False.
color_type (str) – The flag argument for mmcv.imfrombytes(). Defaults to ‘color’.
file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').
- class easycv.datasets.detection.pipelines.LoadImageFromWebcam(to_float32=False, color_type='color', file_client_args={'backend': 'disk'})[source]¶
Bases:
easycv.datasets.detection.pipelines.mm_transforms.LoadImageFromFile
Load an image from webcam.
Similar to LoadImageFromFile, but the image read from the webcam is in results['img'].
- class easycv.datasets.detection.pipelines.LoadMultiChannelImageFromFiles(to_float32=False, color_type='unchanged', file_client_args={'backend': 'disk'})[source]¶
Bases:
object
Load multi-channel images from a list of separate channel files. Required keys are “img_prefix” and “img_info” (a dict that must contain the key “filename”, which is expected to be a list of filenames). Added or updated keys are “filename”, “img”, “img_shape”, “ori_shape” (same as img_shape), “pad_shape” (same as img_shape), “scale_factor” (1.0) and “img_norm_cfg” (means=0 and stds=1).
- Parameters
to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is a uint8 array. Defaults to False.
color_type (str) – The flag argument for mmcv.imfrombytes(). Defaults to ‘unchanged’.
file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').
- class easycv.datasets.detection.pipelines.LoadAnnotations(with_bbox=True, with_label=True, with_mask=False, with_seg=False, poly2mask=True, file_client_args={'backend': 'disk'})[source]¶
Bases:
object
Load multiple types of annotations.
- Parameters
with_bbox (bool) – Whether to parse and load the bbox annotation. Default: True.
with_label (bool) – Whether to parse and load the label annotation. Default: True.
with_mask (bool) – Whether to parse and load the mask annotation. Default: False.
with_seg (bool) – Whether to parse and load the semantic segmentation annotation. Default: False.
poly2mask (bool) – Whether to convert the instance masks from polygons to bitmaps. Default: True.
file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').
- class easycv.datasets.detection.pipelines.MMMultiScaleFlipAug(transforms, img_scale=None, scale_factor=None, flip=False, flip_direction='horizontal')[source]¶
Bases:
object
Test-time augmentation with multiple scales and flipping. An example configuration is as follows:
    img_scale=[(1333, 400), (1333, 800)],
    flip=True,
    transforms=[
        dict(type='Resize', keep_ratio=True),
        dict(type='RandomFlip'),
        dict(type='Normalize', **img_norm_cfg),
        dict(type='Pad', size_divisor=32),
        dict(type='ImageToTensor', keys=['img']),
        dict(type='Collect', keys=['img']),
    ]
After MultiScaleFlipAug with the above configuration, the results are wrapped into lists of the same length as follows:
    dict(
        img=[...],
        img_shape=[...],
        scale=[(1333, 400), (1333, 400), (1333, 800), (1333, 800)],
        flip=[False, True, False, True],
        ...
    )
- Parameters
transforms (list[dict]) – Transforms to apply in each augmentation.
img_scale (tuple | list[tuple] | None) – Images scales for resizing.
scale_factor (float | list[float] | None) – Scale factors for resizing.
flip (bool) – Whether to apply flip augmentation. Default: False.
flip_direction (str | list[str]) – Flip augmentation directions, options are “horizontal”, “vertical” and “diagonal”. If flip_direction is a list, multiple flip augmentations will be applied. It has no effect when flip == False. Default: “horizontal”.
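The way scales and flips combine into the wrapped lists shown above can be sketched as a nested loop. The helper name enumerate_tta is hypothetical, and the sketch only enumerates the (scale, flip) settings; it does not run any transforms.

```python
def enumerate_tta(img_scale, flip=False, flip_direction="horizontal"):
    """List the (scale, flip, flip_direction) settings applied at test time."""
    scales = img_scale if isinstance(img_scale, list) else [img_scale]
    directions = flip_direction if isinstance(flip_direction, list) else [flip_direction]
    flip_args = [(False, None)]
    if flip:
        # flip_direction has no effect when flip is False.
        flip_args += [(True, d) for d in directions]
    return [
        dict(scale=s, flip=f, flip_direction=d)
        for s in scales
        for f, d in flip_args
    ]


combos = enumerate_tta([(1333, 400), (1333, 800)], flip=True)
print([c["scale"] for c in combos])  # [(1333, 400), (1333, 400), (1333, 800), (1333, 800)]
print([c["flip"] for c in combos])   # [False, True, False, True]
```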
- class easycv.datasets.detection.pipelines.MMRandomCrop(crop_size, crop_type='absolute', allow_negative_crop=False, recompute_bbox=False, bbox_clip_border=True)[source]¶
Bases:
object
Random crop the image & bboxes & masks.
The absolute crop_size is sampled based on crop_type and image_size, then the cropped results are generated.
- Parameters
crop_size (tuple) – The relative ratio or absolute pixels of height and width.
crop_type (str, optional) – one of “relative_range”, “relative”, “absolute”, “absolute_range”. “relative” randomly crops (h * crop_size[0], w * crop_size[1]) part from an input of size (h, w). “relative_range” uniformly samples relative crop size from range [crop_size[0], 1] and [crop_size[1], 1] for height and width respectively. “absolute” crops from an input with absolute size (crop_size[0], crop_size[1]). “absolute_range” uniformly samples crop_h in range [crop_size[0], min(h, crop_size[1])] and crop_w in range [crop_size[0], min(w, crop_size[1])]. Default “absolute”.
allow_negative_crop (bool, optional) – Whether to allow a crop that does not contain any bbox area. Default False.
recompute_bbox (bool, optional) – Whether to re-compute the boxes based on cropped instance masks. Default False.
bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. Defaults to True.
Note
- If the image is smaller than the absolute crop size, return the original image.
- The keys for bboxes, labels and masks must be aligned. That is, gt_bboxes corresponds to gt_labels and gt_masks, and gt_bboxes_ignore corresponds to gt_labels_ignore and gt_masks_ignore.
- If the crop does not contain any gt-bbox region and allow_negative_crop is set to False, skip this image.
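The four crop_type options differ only in how the absolute crop size is derived from crop_size and the image size. A minimal sketch of that derivation follows; the function name, the rng argument and the rounding to the nearest integer are assumptions rather than the library's exact behaviour.

```python
import random


def get_crop_size(image_size, crop_size, crop_type="absolute", rng=random):
    """Return the absolute (crop_h, crop_w) for an image of size (h, w)."""
    h, w = image_size
    if crop_type == "absolute":
        # Never crop larger than the image itself.
        return min(crop_size[0], h), min(crop_size[1], w)
    if crop_type == "absolute_range":
        assert crop_size[0] <= crop_size[1]
        crop_h = rng.randint(min(h, crop_size[0]), min(h, crop_size[1]))
        crop_w = rng.randint(min(w, crop_size[0]), min(w, crop_size[1]))
        return crop_h, crop_w
    if crop_type == "relative":
        return int(h * crop_size[0] + 0.5), int(w * crop_size[1] + 0.5)
    if crop_type == "relative_range":
        ratio_h = rng.uniform(crop_size[0], 1)
        ratio_w = rng.uniform(crop_size[1], 1)
        return int(h * ratio_h + 0.5), int(w * ratio_w + 0.5)
    raise ValueError(f"unknown crop_type: {crop_type!r}")
```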
- class easycv.datasets.detection.pipelines.MMFilterAnnotations(min_gt_bbox_wh=(1.0, 1.0), min_gt_mask_area=1, by_box=True, by_mask=False, keep_empty=True)[source]¶
Bases:
object
Filter invalid annotations.
- Parameters
min_gt_bbox_wh (tuple[float]) – Minimum width and height of ground truth boxes. Default: (1., 1.)
min_gt_mask_area (int) – Minimum foreground area of ground truth masks. Default: 1
by_box (bool) – Filter instances with bounding boxes not meeting the min_gt_bbox_wh threshold. Default: True
by_mask (bool) – Filter instances with masks not meeting min_gt_mask_area threshold. Default: False
keep_empty (bool) – Whether to return None when it becomes an empty bbox after filtering. Default: True
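Filtering by box size (by_box=True) can be illustrated on plain lists. This sketch covers only the min_gt_bbox_wh check; the function name is hypothetical, boxes exactly at the minimum are kept as an assumption, and the by_mask and keep_empty behaviour is not modelled.

```python
def filter_by_box(bboxes, labels, min_gt_bbox_wh=(1.0, 1.0)):
    """Drop (x1, y1, x2, y2) boxes narrower or shorter than the minimum."""
    min_w, min_h = min_gt_bbox_wh
    kept_boxes, kept_labels = [], []
    for box, label in zip(bboxes, labels):
        width, height = box[2] - box[0], box[3] - box[1]
        if width >= min_w and height >= min_h:
            kept_boxes.append(box)
            kept_labels.append(label)  # labels stay aligned with boxes
    return kept_boxes, kept_labels
```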
Submodules¶
easycv.datasets.detection.pipelines.mm_transforms module¶
- class easycv.datasets.detection.pipelines.mm_transforms.MMToTensor[source]¶
Bases:
object
Transform image to Tensor. Required key: ‘img’. Modifies key: ‘img’.
- Parameters
results (dict) – Contains all information about training.
- class easycv.datasets.detection.pipelines.mm_transforms.NormalizeTensor(mean, std)[source]¶
Bases:
object
Normalize the Tensor image (CxHxW) with mean and std. Required key: ‘img’. Modifies key: ‘img’.
- Parameters
mean (list[float]) – Mean values of 3 channels.
std (list[float]) – Std values of 3 channels.
- class easycv.datasets.detection.pipelines.mm_transforms.MMMosaic(img_scale=(640, 640), center_ratio_range=(0.5, 1.5), pad_val=114)[source]¶
Bases:
object
Mosaic augmentation. Given 4 images, the mosaic transform combines them into one output image. The output image is composed of parts from each sub-image.
[Diagram: mosaic transform. Sub-images are arranged around the mosaic center (center_x, center_y); the surviving rows show a cropped image3 with padding on its left and image4 beside it.]
- The mosaic transform steps are as follows:
1. Choose the mosaic center as the intersection of the 4 images.
2. Get the top-left image according to the index, and randomly sample another 3 images from the custom dataset.
3. A sub-image is cropped if it is larger than the mosaic patch.
- Parameters
img_scale (Sequence[int]) – Image size of a single image after the mosaic pipeline. Defaults to (640, 640).
center_ratio_range (Sequence[float]) – Center ratio range of mosaic output. Defaults to (0.5, 1.5).
pad_val (int) – Pad value. Defaults to 114.
- class easycv.datasets.detection.pipelines.mm_transforms.MMMixUp(img_scale=(640, 640), ratio_range=(0.5, 1.5), flip_ratio=0.5, pad_val=114, max_iters=15, min_bbox_size=5, min_area_ratio=0.2, max_aspect_ratio=20)[source]¶
Bases:
object
MixUp data augmentation.
- The mixup transform steps are as follows:
1. Another random image is picked from the dataset and embedded in the top-left patch (after padding and resizing).
2. The target of the mixup transform is the weighted average of the mixup image and the original image.
- Parameters
img_scale (Sequence[int]) – Image output size after mixup pipeline. Default: (640, 640).
ratio_range (Sequence[float]) – Scale ratio of mixup image. Default: (0.5, 1.5).
flip_ratio (float) – Horizontal flip ratio of mixup image. Default: 0.5.
pad_val (int) – Pad value. Default: 114.
max_iters (int) – The maximum number of iterations. If the number of iterations is greater than max_iters, but gt_bbox is still empty, then the iteration is terminated. Default: 15.
min_bbox_size (float) – Width and height threshold to filter bboxes. If the height or width of a box is smaller than this value, it will be removed. Default: 5.
min_area_ratio (float) – Threshold of area ratio between original bboxes and wrapped bboxes. If smaller than this value, the box will be removed. Default: 0.2.
max_aspect_ratio (float) – Aspect ratio of width and height threshold to filter bboxes. If max(h/w, w/h) is larger than this value, the box will be removed. Default: 20.
- class easycv.datasets.detection.pipelines.mm_transforms.MMRandomAffine(max_rotate_degree=10.0, max_translate_ratio=0.1, scaling_ratio_range=(0.5, 1.5), max_shear_degree=2.0, border=(0, 0), border_val=(114, 114, 114), min_bbox_size=2, min_area_ratio=0.2, max_aspect_ratio=20)[source]¶
Bases:
object
Random affine transform data augmentation, used for YOLOX. This operation randomly generates an affine transform matrix that includes rotation, translation, shear and scaling transforms.
- Parameters
max_rotate_degree (float) – Maximum degrees of rotation transform. Default: 10.
max_translate_ratio (float) – Maximum ratio of translation. Default: 0.1.
scaling_ratio_range (tuple[float]) – Min and max ratio of scaling transform. Default: (0.5, 1.5).
max_shear_degree (float) – Maximum degrees of shear transform. Default: 2.
border (tuple[int]) – Distance from height and width sides of input image to adjust output shape. Only used in mosaic dataset. Default: (0, 0).
border_val (tuple[int]) – Border padding values of 3 channels. Default: (114, 114, 114).
min_bbox_size (float) – Width and height threshold to filter bboxes. If the height or width of a box is smaller than this value, it will be removed. Default: 2.
min_area_ratio (float) – Threshold of area ratio between original bboxes and wrapped bboxes. If smaller than this value, the box will be removed. Default: 0.2.
max_aspect_ratio (float) – Aspect ratio of width and height threshold to filter bboxes. If max(h/w, w/h) is larger than this value, the box will be removed. Default: 20.
- class easycv.datasets.detection.pipelines.mm_transforms.MMPhotoMetricDistortion(brightness_delta=32, contrast_range=(0.5, 1.5), saturation_range=(0.5, 1.5), hue_delta=18)[source]¶
Bases:
object
Apply photometric distortion to the image sequentially; every transformation is applied with a probability of 0.5. Random contrast is applied either second or second to last.
1. random brightness
2. random contrast (mode 0)
3. convert color from BGR to HSV
4. random saturation
5. random hue
6. convert color from HSV to BGR
7. random contrast (mode 1)
8. randomly swap channels
- Parameters
brightness_delta (int) – Delta of brightness.
contrast_range (tuple) – Range of contrast.
saturation_range (tuple) – Range of saturation.
hue_delta (int) – Delta of hue.
- class easycv.datasets.detection.pipelines.mm_transforms.MMResize(img_scale=None, multiscale_mode='range', ratio_range=None, keep_ratio=True, bbox_clip_border=True, backend='cv2', override=False)[source]¶
Bases:
object
Resize images & bbox & mask. This transform resizes the input image to some scale. Bboxes and masks are then resized with the same scale factor. If the input dict contains the key “scale”, then the scale in the input dict is used; otherwise the scale specified in the init method is used. If the input dict contains the key “scale_factor” (if MultiScaleFlipAug does not give img_scale but scale_factor), the actual scale will be computed from the image shape and scale_factor. img_scale can either be a tuple (single-scale) or a list of tuples (multi-scale). There are 3 multiscale modes:
- ratio_range is not None: randomly sample a ratio from the ratio range and multiply it with the image scale.
- ratio_range is None and multiscale_mode == "range": randomly sample a scale from the multiscale range.
- ratio_range is None and multiscale_mode == "value": randomly sample a scale from multiple scales.
- Parameters
img_scale (tuple or list[tuple]) – Image scales for resizing.
multiscale_mode (str) – Either “range” or “value”.
ratio_range (tuple[float]) – (min_ratio, max_ratio)
keep_ratio (bool) – Whether to keep the aspect ratio when resizing the image.
bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. Defaults to True.
backend (str) – Image resize backend; choices are ‘cv2’ and ‘pillow’. These two backends generate slightly different results. Defaults to ‘cv2’.
override (bool, optional) – Whether to override scale and scale_factor so as to call resize twice. If True, after the first resizing, the existing scale and scale_factor will be ignored so that a second resizing is allowed. This option is a workaround for resizing multiple times in DETR. Defaults to False.
- __init__(img_scale=None, multiscale_mode='range', ratio_range=None, keep_ratio=True, bbox_clip_border=True, backend='cv2', override=False)[source]¶
Initialize self. See help(type(self)) for accurate signature.
- static random_select(img_scales)[source]¶
Randomly select an img_scale from given candidates.
- Parameters
img_scales (list[tuple]) – Image scales for selection.
- Returns
Returns a tuple (img_scale, scale_idx), where img_scale is the selected image scale and scale_idx is the selected index in the given candidates.
- Return type
(tuple, int)
- static random_sample(img_scales)[source]¶
Randomly sample an img_scale when multiscale_mode=='range'.
- Parameters
img_scales (list[tuple]) – Image scale range for sampling. There must be two tuples in img_scales, which specify the lower and upper bound of image scales.
- Returns
Returns a tuple (img_scale, None), where img_scale is the sampled scale and None is just a placeholder to be consistent with random_select().
- Return type
(tuple, None)
- static random_sample_ratio(img_scale, ratio_range)[source]¶
Randomly sample an img_scale when ratio_range is specified. A ratio will be randomly sampled from the range specified by ratio_range. Then it is multiplied with img_scale to generate the sampled scale.
- Parameters
img_scale (tuple) – Image scale base to multiply with ratio.
ratio_range (tuple[float]) – The minimum and maximum ratio to scale the img_scale.
- Returns
Returns a tuple (scale, None), where scale is the sampled ratio multiplied with img_scale and None is just a placeholder to be consistent with random_select().
- Return type
(tuple, None)
- class easycv.datasets.detection.pipelines.mm_transforms.MMRandomFlip(flip_ratio=None, direction='horizontal')[source]¶
Bases:
object
Flip the image & bbox & mask. If the input dict contains the key “flip”, then the flag will be used; otherwise it will be randomly decided by a ratio specified in the init method. When random flip is enabled, flip_ratio/direction can either be a float/string or tuple of float/string. There are 3 flip modes:
- flip_ratio is float, direction is string: the image will be flipped in the given direction with probability flip_ratio. E.g., with flip_ratio=0.5 and direction='horizontal', the image will be horizontally flipped with probability 0.5.
- flip_ratio is float, direction is list of string: the image will be flipped in direction[i] with probability flip_ratio/len(direction). E.g., with flip_ratio=0.5 and direction=['horizontal', 'vertical'], the image will be horizontally flipped with probability 0.25 and vertically with probability 0.25.
- flip_ratio is list of float, direction is list of string: given len(flip_ratio) == len(direction), the image will be flipped in direction[i] with probability flip_ratio[i]. E.g., with flip_ratio=[0.3, 0.5] and direction=['horizontal', 'vertical'], the image will be horizontally flipped with probability 0.3 and vertically with probability 0.5.
- Parameters
flip_ratio (float | list[float], optional) – The flipping probability. Default: None.
direction (str | list[str], optional) – The flipping direction. Options are ‘horizontal’, ‘vertical’, ‘diagonal’. Default: ‘horizontal’. If input is a list, the length must equal that of flip_ratio. Each element in flip_ratio indicates the flip probability of the corresponding direction.
- __init__(flip_ratio=None, direction='horizontal')[source]¶
Initialize self. See help(type(self)) for accurate signature.
- bbox_flip(bboxes, img_shape, direction)[source]¶
Flip bboxes in the given direction.
- Parameters
bboxes (numpy.ndarray) – Bounding boxes, shape (…, 4*k).
img_shape (tuple[int]) – Image shape (height, width).
direction (str) – Flip direction. Options are ‘horizontal’, ‘vertical’.
- Returns
Flipped bounding boxes.
- Return type
numpy.ndarray
- class easycv.datasets.detection.pipelines.mm_transforms.MMRandomCrop(crop_size, crop_type='absolute', allow_negative_crop=False, recompute_bbox=False, bbox_clip_border=True)[source]¶
Bases:
object
Random crop the image & bboxes & masks.
The absolute crop_size is sampled based on crop_type and image_size, then the cropped results are generated.
- Parameters
crop_size (tuple) – The relative ratio or absolute pixels of height and width.
crop_type (str, optional) – one of “relative_range”, “relative”, “absolute”, “absolute_range”. “relative” randomly crops (h * crop_size[0], w * crop_size[1]) part from an input of size (h, w). “relative_range” uniformly samples relative crop size from range [crop_size[0], 1] and [crop_size[1], 1] for height and width respectively. “absolute” crops from an input with absolute size (crop_size[0], crop_size[1]). “absolute_range” uniformly samples crop_h in range [crop_size[0], min(h, crop_size[1])] and crop_w in range [crop_size[0], min(w, crop_size[1])]. Default “absolute”.
allow_negative_crop (bool, optional) – Whether to allow a crop that does not contain any bbox area. Default False.
recompute_bbox (bool, optional) – Whether to re-compute the boxes based on cropped instance masks. Default False.
bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. Defaults to True.
Note
- If the image is smaller than the absolute crop size, return the original image.
- The keys for bboxes, labels and masks must be aligned. That is, gt_bboxes corresponds to gt_labels and gt_masks, and gt_bboxes_ignore corresponds to gt_labels_ignore and gt_masks_ignore.
- If the crop does not contain any gt-bbox region and allow_negative_crop is set to False, skip this image.
- class easycv.datasets.detection.pipelines.mm_transforms.MMPad(size=None, size_divisor=None, pad_to_square=False, pad_val={'img': 0, 'masks': 0, 'seg': 255})[source]¶
Bases:
object
Pad the image & mask. There are two padding modes: (1) pad to a fixed size and (2) pad to the minimum size that is divisible by some number. Added keys are “pad_shape”, “pad_fixed_size” and “pad_size_divisor”.
- Parameters
size (tuple, optional) – Fixed padding size.
size_divisor (int, optional) – The divisor of padded size.
pad_to_square (bool) – Whether to pad the image into a square. Currently only used for YOLOX. Default: False.
pad_val (dict, optional) – A dict for padding value, the default value is dict(img=0, masks=0, seg=255).
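To make the padding modes concrete, the following sketch computes only the target shape. The helper name `padded_shape` is hypothetical, and the precedence used here (pad_to_square first, then size, then size_divisor) is an assumption of the sketch.

```python
def padded_shape(img_shape, size=None, size_divisor=None, pad_to_square=False):
    """Return the (height, width) an image would be padded to.

    Hypothetical helper; it computes shapes only and pads no pixels.
    """
    h, w = img_shape
    if pad_to_square:
        # Square side equals the longer edge of the input image.
        side = max(h, w)
        size = (side, side)
    if size is not None:
        # Mode 1: pad to a fixed size.
        return tuple(size)
    if size_divisor is not None:
        # Mode 2: round each edge up to the next multiple of size_divisor.
        return (
            (h + size_divisor - 1) // size_divisor * size_divisor,
            (w + size_divisor - 1) // size_divisor * size_divisor,
        )
    return (h, w)
```

With `size_divisor=32`, a 427x640 image would be padded to 448x640, since 448 is the smallest multiple of 32 that is not less than 427.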
- class easycv.datasets.detection.pipelines.mm_transforms.MMNormalize(mean, std, to_rgb=True)[source]¶
Bases:
object
Normalize the image. Added key is “img_norm_cfg”.
- Parameters
mean (sequence) – Mean values of 3 channels.
std (sequence) – Std values of 3 channels.
to_rgb (bool) – Whether to convert the image from BGR to RGB. Default: True.
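The arithmetic behind the normalization can be sketched for a single pixel. The helper `normalize_pixel` is hypothetical, and the sketch assumes that the channel swap happens before the mean is subtracted, so that mean and std are given in the output channel order.

```python
def normalize_pixel(pixel_bgr, mean, std, to_rgb=True):
    """Normalize one 3-channel pixel: (value - mean) / std per channel.

    Hypothetical helper for illustration; real pipelines operate on
    whole image arrays rather than single pixels.
    """
    channels = list(pixel_bgr)
    if to_rgb:
        # BGR -> RGB is a reversal of the channel order.
        channels.reverse()
    return [(c - m) / s for c, m, s in zip(channels, mean, std)]
```

For example, `normalize_pixel([10, 20, 30], [0, 0, 0], [10, 10, 10], to_rgb=False)` gives `[1.0, 2.0, 3.0]`.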
- class easycv.datasets.detection.pipelines.mm_transforms.LoadImageFromFile(to_float32=False, color_type='color', file_client_args={'backend': 'disk'})[source]¶
Bases:
object
Load an image from file. Required keys are “img_prefix” and “img_info” (a dict that must contain the key “filename”). Added or updated keys are “filename”, “img”, “img_shape”, “ori_shape” (same as img_shape), “pad_shape” (same as img_shape), “scale_factor” (1.0) and “img_norm_cfg” (means=0 and stds=1).
- Parameters
to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is a uint8 array. Defaults to False.
color_type (str) – The flag argument for mmcv.imfrombytes(). Defaults to ‘color’.
file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').
- class easycv.datasets.detection.pipelines.mm_transforms.LoadImageFromWebcam(to_float32=False, color_type='color', file_client_args={'backend': 'disk'})[source]¶
Bases:
easycv.datasets.detection.pipelines.mm_transforms.LoadImageFromFile
Load an image from webcam.
Similar to LoadImageFromFile, but the image read from the webcam is in results['img'].
- class easycv.datasets.detection.pipelines.mm_transforms.LoadMultiChannelImageFromFiles(to_float32=False, color_type='unchanged', file_client_args={'backend': 'disk'})[source]¶
Bases:
object
Load multi-channel images from a list of separate channel files. Required keys are “img_prefix” and “img_info” (a dict that must contain the key “filename”, which is expected to be a list of filenames). Added or updated keys are “filename”, “img”, “img_shape”, “ori_shape” (same as img_shape), “pad_shape” (same as img_shape), “scale_factor” (1.0) and “img_norm_cfg” (means=0 and stds=1).
- Parameters
to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is a uint8 array. Defaults to False.
color_type (str) – The flag argument for mmcv.imfrombytes(). Defaults to ‘unchanged’.
file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').
- class easycv.datasets.detection.pipelines.mm_transforms.LoadAnnotations(with_bbox=True, with_label=True, with_mask=False, with_seg=False, poly2mask=True, file_client_args={'backend': 'disk'})[source]¶
Bases:
object
Load multiple types of annotations.
- Parameters
with_bbox (bool) – Whether to parse and load the bbox annotation. Default: True.
with_label (bool) – Whether to parse and load the label annotation. Default: True.
with_mask (bool) – Whether to parse and load the mask annotation. Default: False.
with_seg (bool) – Whether to parse and load the semantic segmentation annotation. Default: False.
poly2mask (bool) – Whether to convert the instance masks from polygons to bitmaps. Default: True.
file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').
- class easycv.datasets.detection.pipelines.mm_transforms.LoadPanopticAnnotations(with_bbox=True, with_label=True, with_mask=True, with_seg=True, file_client_args={'backend': 'disk'})[source]¶
Bases:
easycv.datasets.detection.pipelines.mm_transforms.LoadAnnotations
Load multiple types of panoptic annotations.
- Parameters
with_bbox (bool) – Whether to parse and load the bbox annotation. Default: True.
with_label (bool) – Whether to parse and load the label annotation. Default: True.
with_mask (bool) – Whether to parse and load the mask annotation. Default: True.
with_seg (bool) – Whether to parse and load the semantic segmentation annotation. Default: True.
file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').
- class easycv.datasets.detection.pipelines.mm_transforms.MMMultiScaleFlipAug(transforms, img_scale=None, scale_factor=None, flip=False, flip_direction='horizontal')[source]¶
Bases:
object
Test-time augmentation with multiple scales and flipping. An example configuration is as follows:
    img_scale=[(1333, 400), (1333, 800)],
    flip=True,
    transforms=[
        dict(type='Resize', keep_ratio=True),
        dict(type='RandomFlip'),
        dict(type='Normalize', **img_norm_cfg),
        dict(type='Pad', size_divisor=32),
        dict(type='ImageToTensor', keys=['img']),
        dict(type='Collect', keys=['img']),
    ]
After MultiScaleFlipAug with the above configuration, the results are wrapped into lists of the same length as follows:
    dict(
        img=[...],
        img_shape=[...],
        scale=[(1333, 400), (1333, 400), (1333, 800), (1333, 800)],
        flip=[False, True, False, True],
        ...
    )
- Parameters
transforms (list[dict]) – Transforms to apply in each augmentation.
img_scale (tuple | list[tuple] | None) – Image scales for resizing.
scale_factor (float | list[float] | None) – Scale factors for resizing.
flip (bool) – Whether to apply flip augmentation. Default: False.
flip_direction (str | list[str]) – Flip augmentation directions, options are “horizontal”, “vertical” and “diagonal”. If flip_direction is a list, multiple flip augmentations will be applied. It has no effect when flip == False. Default: “horizontal”.
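The order of the wrapped results can be reproduced with a short sketch that enumerates the (scale, flip, direction) combinations. The function `enumerate_tta` is a hypothetical helper, and the nesting order (scales in the outer loop, the unflipped variant first) is inferred from the example output shown above.

```python
def enumerate_tta(img_scale, flip=False, flip_direction="horizontal"):
    """List (scale, flip, direction) combinations for test-time augmentation.

    Hypothetical helper; it only enumerates settings and applies no transforms.
    """
    scales = img_scale if isinstance(img_scale, list) else [img_scale]
    directions = (
        flip_direction if isinstance(flip_direction, list) else [flip_direction]
    )
    # The unflipped variant always comes first for each scale.
    flip_args = [(False, None)]
    if flip:
        flip_args += [(True, direction) for direction in directions]
    return [
        (scale, flipped, direction)
        for scale in scales
        for flipped, direction in flip_args
    ]


combos = enumerate_tta([(1333, 400), (1333, 800)], flip=True)
scales = [c[0] for c in combos]  # each scale appears twice, in input order
flips = [c[1] for c in combos]   # [False, True, False, True]
```

With two scales and a single flip direction this yields four entries, matching the `scale` and `flip` lists in the example above.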
- class easycv.datasets.detection.pipelines.mm_transforms.MMFilterAnnotations(min_gt_bbox_wh=(1.0, 1.0), min_gt_mask_area=1, by_box=True, by_mask=False, keep_empty=True)[source]¶
Bases:
object
Filter invalid annotations.
- Parameters
min_gt_bbox_wh (tuple[float]) – Minimum width and height of ground truth boxes. Default: (1., 1.)
min_gt_mask_area (int) – Minimum foreground area of ground truth masks. Default: 1
by_box (bool) – Filter instances with bounding boxes not meeting the min_gt_bbox_wh threshold. Default: True
by_mask (bool) – Filter instances with masks not meeting min_gt_mask_area threshold. Default: False
keep_empty (bool) – Whether to return None when it becomes an empty bbox after filtering. Default: True
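A minimal sketch of box-based filtering follows. The helper `filter_boxes` is hypothetical, and the use of a strict comparison (width and height must exceed the thresholds) is an assumption; only the by_box path is shown.

```python
def filter_boxes(bboxes, labels, min_gt_bbox_wh=(1.0, 1.0), keep_empty=True):
    """Drop boxes whose width or height does not exceed the thresholds.

    Hypothetical helper. Boxes are (x1, y1, x2, y2). Returns None when
    nothing survives and keep_empty is True, mirroring the documented flag.
    """
    min_w, min_h = min_gt_bbox_wh
    kept = [
        (box, label)
        for box, label in zip(bboxes, labels)
        # Strict '>' is an assumption of this sketch.
        if (box[2] - box[0]) > min_w and (box[3] - box[1]) > min_h
    ]
    if not kept:
        return None if keep_empty else ([], [])
    kept_boxes, kept_labels = zip(*kept)
    return list(kept_boxes), list(kept_labels)
```

Labels are filtered together with their boxes so that the two lists stay aligned after filtering.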