easycv.datasets.loader package

class easycv.datasets.loader.GroupSampler(dataset, samples_per_gpu=1)

Bases: Generic[torch.utils.data.sampler.T_co]

__init__(dataset, samples_per_gpu=1): Initialize self. See help(type(self)) for accurate signature.

class easycv.datasets.loader.DistributedGroupSampler(dataset, samples_per_gpu=1, seed=0, num_replicas=None, rank=None)

Bases: Generic[torch.utils.data.sampler.T_co]

Sampler that restricts data loading to a subset of the dataset. It is especially useful in conjunction with torch.nn.parallel.DistributedDataParallel. In that case, each process can pass a DistributedGroupSampler instance as a DataLoader sampler and load a subset of the original dataset that is exclusive to it.

Note: The dataset is assumed to be of constant size.

Parameters

dataset – Dataset used for sampling.
seed (int, optional) – Random seed. Defaults to 0.
num_replicas (optional) – Number of processes participating in distributed training.
rank (optional) – Rank of the current process within num_replicas.

__init__(dataset, samples_per_gpu=1, seed=0, num_replicas=None, rank=None): Initialize self. See help(type(self)) for accurate signature.

set_epoch(epoch)
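The idea of giving each process an exclusive subset, reshuffled per epoch, can be sketched in plain Python. This is a minimal illustration, not EasyCV's implementation: `shard_indices` is a hypothetical helper, and seeding with `seed + epoch` is an assumption borrowed from the usual distributed-sampler pattern. It also ignores the grouping by `samples_per_gpu` that the class name suggests.

```python
import random


def shard_indices(dataset_len, num_replicas, rank, seed=0, epoch=0):
    """Return the dataset indices assigned to one rank for one epoch.

    Hypothetical helper for illustration only.
    """
    # Every rank seeds with seed + epoch, so all ranks derive the same
    # permutation, and the permutation changes when the epoch changes.
    rng = random.Random(seed + epoch)
    indices = list(range(dataset_len))
    rng.shuffle(indices)

    # Pad by wrapping around so the list divides evenly among the ranks.
    per_rank = -(-dataset_len // num_replicas)  # ceiling division
    total = per_rank * num_replicas
    indices += indices[:total - len(indices)]

    # Each rank takes a contiguous, non-overlapping slice.
    return indices[rank * per_rank:(rank + 1) * per_rank]


shards = [shard_indices(10, num_replicas=2, rank=r, epoch=0) for r in range(2)]
```

Calling `set_epoch(epoch)` before each epoch plays the role of the `epoch` argument here: without it, a sampler of this kind would typically reuse the same ordering every epoch.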

easycv.datasets.loader.build_dataloader(dataset, imgs_per_gpu, workers_per_gpu, num_gpus=1, dist=True, shuffle=True, replace=False, seed=None, reuse_worker_cache=False, odps_config=None, persistent_workers=False, collate_hooks=None, use_repeated_augment_sampler=False, sampler=None, pin_memory=False, **kwargs)

Build a PyTorch DataLoader. In distributed training, each GPU/process has its own dataloader. In non-distributed training, there is only one dataloader for all GPUs.

Parameters

dataset (Dataset) – A PyTorch dataset.
imgs_per_gpu – Number of images on each GPU, i.e., the batch size of each GPU.
workers_per_gpu (int) – How many subprocesses to use for data loading for each GPU.
num_gpus (int) – Number of GPUs. Only used in non-distributed training.
dist (bool) – Distributed training/test or not. Default: True.
shuffle (bool) – Whether to shuffle the data at every epoch. Default: True.
replace (bool) – Whether to sample with replacement in the random shuffle. Only takes effect when shuffle is True.
seed (int, optional) – Random seed. Defaults to None.
reuse_worker_cache (bool) – If True, reuse worker processes so that data cached in a worker process can be reused.
persistent_workers (bool) – Since PyTorch 1.7, persistent_workers=True can be used to avoid reconstructing the data workers before each epoch, which speeds up the start of each epoch.
use_repeated_augment_sampler (bool) – If True, use RASampler. Default: False.
kwargs – Any keyword arguments used to initialize the DataLoader.

Returns

A PyTorch dataloader.

Return type

DataLoader
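The difference between the per-process loaders of distributed training and the single loader of non-distributed training can be illustrated with a small sketch. The helper `loader_sizes` is hypothetical, and the scaling by `num_gpus` in the non-distributed branch is an assumption based on the parameter descriptions above, not a statement about what `build_dataloader` does internally.

```python
def loader_sizes(imgs_per_gpu, workers_per_gpu, num_gpus=1, dist=True):
    """Return (batch_size, num_workers) for a single DataLoader.

    Hypothetical helper; the non-distributed scaling is an assumption.
    """
    if dist:
        # One loader per process/GPU: the per-GPU values are used directly.
        return imgs_per_gpu, workers_per_gpu
    # One loader feeds every GPU: scale both values by the number of GPUs.
    return num_gpus * imgs_per_gpu, num_gpus * workers_per_gpu


per_process = loader_sizes(32, 4, num_gpus=8, dist=True)
single_loader = loader_sizes(32, 4, num_gpus=8, dist=False)
```

Under this assumption the global batch size is the same in both modes; only the place where it is split across GPUs differs.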

class easycv.datasets.loader.DistributedGivenIterationSampler(dataset, total_iter, batch_size, num_replicas=None, rank=None, last_iter=-1)

Bases: Generic[torch.utils.data.sampler.T_co]

__init__(dataset, total_iter, batch_size, num_replicas=None, rank=None, last_iter=-1): Initialize self. See help(type(self)) for accurate signature.

set_uniform_indices(labels, num_classes)

gen_new_list()

set_epoch(epoch)

class easycv.datasets.loader.DistributedMPSampler(dataset, num_replicas=None, rank=None, shuffle=True, split_huge_listfile_byrank=False, **kwargs)

Bases: torch.utils.data.sampler.Sampler[torch.utils.data.distributed.T_co]

__init__(dataset, num_replicas=None, rank=None, shuffle=True, split_huge_listfile_byrank=False, **kwargs)

A distributed sampler that supports sampling m instances from one class at a time, for classification datasets.

Parameters

dataset – PyTorch dataset object.
num_replicas (optional) – Number of processes participating in distributed training.
rank (optional) – Rank of the current process within num_replicas.
shuffle (optional) – If True (default), the sampler shuffles the indices.
split_huge_listfile_byrank – If True, return all indices for each rank, because the list for each rank has already been split before the dataset is built in distributed training.

generate_indice()

get_label_dict()

calculate_this_label_list()
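Sampling m instances from one class at a time can be pictured with the following plain-Python sketch. It is an illustration under stated assumptions rather than the sampler's real logic: `m_per_class_groups` is a hypothetical helper, labels are passed in as a plain list, and leftover samples that do not fill a group of m are simply dropped.

```python
import random
from collections import defaultdict


def m_per_class_groups(labels, m, seed=0):
    """Return index groups, each holding m samples of a single class.

    Hypothetical helper for illustration only.
    """
    # Collect the dataset indices that belong to each label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)
    groups = []
    for idxs in by_label.values():
        rng.shuffle(idxs)
        # Keep only full groups of m; leftovers are dropped in this sketch.
        for start in range(0, len(idxs) - m + 1, m):
            groups.append(idxs[start:start + m])

    # Shuffle the groups so classes are interleaved across the epoch.
    rng.shuffle(groups)
    return groups


labels = [0, 0, 0, 0, 1, 1, 1, 2]
groups = m_per_class_groups(labels, m=2)
```

Flattening `groups` gives an index order in which every consecutive run of m indices shares a label, which is the property a sampler of this kind provides to the batches built from it.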

class easycv.datasets.loader.RASampler(dataset, num_replicas=None, rank=None, shuffle=True, num_repeats: int = 3, **kwargs)

Bases: Generic[torch.utils.data.sampler.T_co]

Sampler that restricts data loading to a subset of the dataset for distributed training, with repeated augmentation. It ensures that each augmented version of a sample is visible to a different process (GPU). Heavily based on torch.utils.data.DistributedSampler.

__init__(dataset, num_replicas=None, rank=None, shuffle=True, num_repeats: int = 3, **kwargs): Initialize self. See help(type(self)) for accurate signature.

set_epoch(epoch)
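A minimal sketch of how repeated augmentation can route the copies of a sample to different processes is shown below. `repeated_aug_indices` is a hypothetical helper, the `seed + epoch` seeding is an assumption, and details such as truncating the per-rank list are left out, so it should not be read as RASampler's exact behavior.

```python
import random


def repeated_aug_indices(dataset_len, num_replicas, rank, num_repeats=3,
                         seed=0, epoch=0, shuffle=True):
    """Return the indices one rank would see under repeated augmentation.

    Hypothetical helper for illustration only.
    """
    indices = list(range(dataset_len))
    if shuffle:
        random.Random(seed + epoch).shuffle(indices)

    # Repeat each index so that its copies sit next to each other.
    repeated = [i for i in indices for _ in range(num_repeats)]

    # Pad by wrapping around so the list divides evenly among the ranks.
    per_rank = -(-len(repeated) // num_replicas)  # ceiling division
    total = per_rank * num_replicas
    repeated += repeated[:total - len(repeated)]

    # Strided slicing sends adjacent copies to different ranks.
    return repeated[rank:total:num_replicas]


per_rank_lists = [
    repeated_aug_indices(4, num_replicas=3, rank=r, shuffle=False)
    for r in range(3)
]
```

With `num_replicas` equal to `num_repeats`, every rank receives each sample once, and each rank then applies its own random augmentation to it. With fewer processes than repeats, some copies necessarily land on the same rank.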

class easycv.datasets.loader.DistributedSampler(dataset, num_replicas=None, rank=None, shuffle=True, seed=0, replace=False, split_huge_listfile_byrank=False)

Bases: torch.utils.data.sampler.Sampler[torch.utils.data.distributed.T_co]

__init__(dataset, num_replicas=None, rank=None, shuffle=True, seed=0, replace=False, split_huge_listfile_byrank=False)

A distributed sampler that supports sampling m instances from one class at a time, for classification datasets.

Parameters

dataset – PyTorch dataset object.
num_replicas – Number of processes participating in distributed training.
rank (optional) – Rank of the current process within num_replicas.
shuffle (optional) – If True (default), the sampler shuffles the indices.
seed (int, optional) – Random seed. Defaults to 0.
split_huge_listfile_byrank – If True, return all indices for each rank, because the list for each rank has already been split before the dataset is built in distributed training.

generate_new_list()

set_uniform_indices(labels, num_classes)
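The effect of the `shuffle` and `replace` options on the index list can be sketched as follows. `draw_indices` is a hypothetical helper written for this illustration, and seeding with `seed + epoch` is an assumption; the sketch only contrasts a permutation with a draw with replacement.

```python
import random


def draw_indices(dataset_len, shuffle=True, replace=False, seed=0, epoch=0):
    """Return one epoch's index list. Hypothetical helper for illustration."""
    if not shuffle:
        # No shuffling: plain sequential order.
        return list(range(dataset_len))

    rng = random.Random(seed + epoch)
    if replace:
        # With replacement: an index may appear several times or not at all.
        return [rng.randrange(dataset_len) for _ in range(dataset_len)]

    # Without replacement: a permutation, every index appears exactly once.
    return rng.sample(range(dataset_len), dataset_len)


permutation = draw_indices(8)
with_replacement = draw_indices(8, replace=True)
```

Both lists have the same length; only the permutation is guaranteed to cover every sample in the epoch.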

Submodules

easycv.datasets.loader.build_loader module

easycv.datasets.loader.build_loader.build_dataloader(dataset, imgs_per_gpu, workers_per_gpu, num_gpus=1, dist=True, shuffle=True, replace=False, seed=None, reuse_worker_cache=False, odps_config=None, persistent_workers=False, collate_hooks=None, use_repeated_augment_sampler=False, sampler=None, pin_memory=False, **kwargs)

Build a PyTorch DataLoader. In distributed training, each GPU/process has its own dataloader. In non-distributed training, there is only one dataloader for all GPUs.

Parameters

dataset (Dataset) – A PyTorch dataset.
imgs_per_gpu – Number of images on each GPU, i.e., the batch size of each GPU.
workers_per_gpu (int) – How many subprocesses to use for data loading for each GPU.
num_gpus (int) – Number of GPUs. Only used in non-distributed training.
dist (bool) – Distributed training/test or not. Default: True.
shuffle (bool) – Whether to shuffle the data at every epoch. Default: True.
replace (bool) – Whether to sample with replacement in the random shuffle. Only takes effect when shuffle is True.
seed (int, optional) – Random seed. Defaults to None.
reuse_worker_cache (bool) – If True, reuse worker processes so that data cached in a worker process can be reused.
persistent_workers (bool) – Since PyTorch 1.7, persistent_workers=True can be used to avoid reconstructing the data workers before each epoch, which speeds up the start of each epoch.
use_repeated_augment_sampler (bool) – If True, use RASampler. Default: False.
kwargs – Any keyword arguments used to initialize the DataLoader.

Returns

A PyTorch dataloader.

Return type

DataLoader

easycv.datasets.loader.build_loader.worker_init_fn(worker_id, num_workers, rank, seed, odps_config=None)

class easycv.datasets.loader.build_loader.InfiniteDataLoader(*args, **kwargs)

Bases: Generic[torch.utils.data.dataloader.T_co]

Dataloader that reuses workers; see https://github.com/pytorch/pytorch/issues/15849. Uses the same syntax as the vanilla DataLoader.

__init__(*args, **kwargs): Initialize self. See help(type(self)) for accurate signature.

dataset: torch.utils.data.dataset.Dataset[torch.utils.data.dataloader.T_co]

batch_size: Optional[int]

num_workers: int

pin_memory: bool

drop_last: bool

timeout: float

sampler: Union[torch.utils.data.sampler.Sampler, Iterable]

pin_memory_device: str

prefetch_factor: int
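One approach discussed around the linked PyTorch issue is to wrap the sampler so that it never runs out, which lets a single iterator (and its worker processes) stay alive across epochs. The sketch below shows only that wrapping idea in plain Python. `RepeatSampler` is a hypothetical name, and whether InfiniteDataLoader is built exactly this way is an assumption.

```python
import itertools


class RepeatSampler:
    """Wrap a sampler (any iterable of indices) and restart it forever.

    Hypothetical class for illustration only. The wrapped sampler must be
    non-empty, otherwise iteration never yields anything.
    """

    def __init__(self, sampler):
        self.sampler = sampler

    def __iter__(self):
        while True:
            # Start a fresh pass each time the wrapped sampler is exhausted.
            yield from iter(self.sampler)


# Take seven indices from a three-element sampler that repeats endlessly.
first_seven = list(itertools.islice(RepeatSampler(range(3)), 7))
```

Because the wrapped iterator never raises StopIteration, the consumer has to decide where an epoch ends, for example by counting `len(sampler)` items per epoch.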

easycv.datasets.loader.sampler module

class easycv.datasets.loader.sampler.DistributedMPSampler(dataset, num_replicas=None, rank=None, shuffle=True, split_huge_listfile_byrank=False, **kwargs)

Bases: torch.utils.data.sampler.Sampler[torch.utils.data.distributed.T_co]

__init__(dataset, num_replicas=None, rank=None, shuffle=True, split_huge_listfile_byrank=False, **kwargs)

A distributed sampler that supports sampling m instances from one class at a time, for classification datasets.

Parameters

dataset – PyTorch dataset object.
num_replicas (optional) – Number of processes participating in distributed training.
rank (optional) – Rank of the current process within num_replicas.
shuffle (optional) – If True (default), the sampler shuffles the indices.
split_huge_listfile_byrank – If True, return all indices for each rank, because the list for each rank has already been split before the dataset is built in distributed training.

generate_indice()

get_label_dict()

calculate_this_label_list()

class easycv.datasets.loader.sampler.DistributedSampler(dataset, num_replicas=None, rank=None, shuffle=True, seed=0, replace=False, split_huge_listfile_byrank=False)

Bases: torch.utils.data.sampler.Sampler[torch.utils.data.distributed.T_co]

__init__(dataset, num_replicas=None, rank=None, shuffle=True, seed=0, replace=False, split_huge_listfile_byrank=False)

A distributed sampler that supports sampling m instances from one class at a time, for classification datasets.

Parameters

dataset – PyTorch dataset object.
num_replicas – Number of processes participating in distributed training.
rank (optional) – Rank of the current process within num_replicas.
shuffle (optional) – If True (default), the sampler shuffles the indices.
seed (int, optional) – Random seed. Defaults to 0.
split_huge_listfile_byrank – If True, return all indices for each rank, because the list for each rank has already been split before the dataset is built in distributed training.

generate_new_list()

set_uniform_indices(labels, num_classes)

class easycv.datasets.loader.sampler.GroupSampler(dataset, samples_per_gpu=1)

Bases: Generic[torch.utils.data.sampler.T_co]

__init__(dataset, samples_per_gpu=1): Initialize self. See help(type(self)) for accurate signature.

class easycv.datasets.loader.sampler.DistributedGroupSampler(dataset, samples_per_gpu=1, seed=0, num_replicas=None, rank=None)

Bases: Generic[torch.utils.data.sampler.T_co]

Sampler that restricts data loading to a subset of the dataset. It is especially useful in conjunction with torch.nn.parallel.DistributedDataParallel. In that case, each process can pass a DistributedGroupSampler instance as a DataLoader sampler and load a subset of the original dataset that is exclusive to it.

Note: The dataset is assumed to be of constant size.

Parameters

dataset – Dataset used for sampling.
seed (int, optional) – Random seed. Defaults to 0.
num_replicas (optional) – Number of processes participating in distributed training.
rank (optional) – Rank of the current process within num_replicas.

__init__(dataset, samples_per_gpu=1, seed=0, num_replicas=None, rank=None): Initialize self. See help(type(self)) for accurate signature.

set_epoch(epoch)

class easycv.datasets.loader.sampler.DistributedGivenIterationSampler(dataset, total_iter, batch_size, num_replicas=None, rank=None, last_iter=-1)

Bases: Generic[torch.utils.data.sampler.T_co]

__init__(dataset, total_iter, batch_size, num_replicas=None, rank=None, last_iter=-1): Initialize self. See help(type(self)) for accurate signature.

set_uniform_indices(labels, num_classes)

gen_new_list()

set_epoch(epoch)

class easycv.datasets.loader.sampler.RASampler(dataset, num_replicas=None, rank=None, shuffle=True, num_repeats: int = 3, **kwargs)

Bases: Generic[torch.utils.data.sampler.T_co]

Sampler that restricts data loading to a subset of the dataset for distributed training, with repeated augmentation. It ensures that each augmented version of a sample is visible to a different process (GPU). Heavily based on torch.utils.data.DistributedSampler.

__init__(dataset, num_replicas=None, rank=None, shuffle=True, num_repeats: int = 3, **kwargs): Initialize self. See help(type(self)) for accurate signature.

set_epoch(epoch)