easycv.datasets.loader package

class easycv.datasets.loader.GroupSampler(dataset, samples_per_gpu=1)[source]

Bases: Generic[torch.utils.data.sampler.T_co]

__init__(dataset, samples_per_gpu=1)[source]

Initialize self. See help(type(self)) for accurate signature.

class easycv.datasets.loader.DistributedGroupSampler(dataset, samples_per_gpu=1, seed=0, num_replicas=None, rank=None)[source]

Bases: Generic[torch.utils.data.sampler.T_co]

Sampler that restricts data loading to a subset of the dataset. It is especially useful in conjunction with torch.nn.parallel.DistributedDataParallel: in that case, each process can pass a DistributedSampler instance as a DataLoader sampler and load a subset of the original dataset that is exclusive to it.

Note: the dataset is assumed to be of constant size.
Parameters
  • dataset – Dataset used for sampling.

  • seed (int, Optional) – The seed. Default to 0.

  • num_replicas (optional) – Number of processes participating in distributed training.

  • rank (optional) – Rank of the current process within num_replicas.

__init__(dataset, samples_per_gpu=1, seed=0, num_replicas=None, rank=None)[source]

Initialize self. See help(type(self)) for accurate signature.

set_epoch(epoch)[source]
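
Example (a minimal, hedged sketch of using DistributedGroupSampler directly): the integer flag attribute on the toy dataset below is an assumption about the grouping interface the sampler expects (one group id per sample, e.g. an aspect-ratio group), and num_replicas/rank are passed explicitly so no initialized process group is required.

import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset

from easycv.datasets.loader import DistributedGroupSampler


class FlaggedDataset(Dataset):
    # Toy dataset exposing a per-sample integer group flag (assumed interface).
    def __init__(self, n=16):
        self.data = torch.randn(n, 3, 8, 8)
        self.flag = np.array([i % 2 for i in range(n)], dtype=np.int64)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]


dataset = FlaggedDataset()
sampler = DistributedGroupSampler(
    dataset, samples_per_gpu=4, seed=0, num_replicas=2, rank=0)
loader = DataLoader(dataset, batch_size=4, sampler=sampler)

for epoch in range(2):
    sampler.set_epoch(epoch)  # deterministic re-shuffle each epoch
    for batch in loader:
        pass  # batches are assembled samples_per_gpu at a time, per group
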
easycv.datasets.loader.build_dataloader(dataset, imgs_per_gpu, workers_per_gpu, num_gpus=1, dist=True, shuffle=True, replace=False, seed=None, reuse_worker_cache=False, odps_config=None, persistent_workers=False, collate_hooks=None, use_repeated_augment_sampler=False, sampler=None, pin_memory=False, **kwargs)[source]

Build a PyTorch DataLoader. In distributed training, each GPU/process has its own dataloader; in non-distributed training, there is a single dataloader for all GPUs.

Parameters
  • dataset (Dataset) – A PyTorch dataset.

  • imgs_per_gpu (int) – Number of images on each GPU, i.e., the batch size of each GPU.

  • workers_per_gpu (int) – How many subprocesses to use for data loading for each GPU.

  • num_gpus (int) – Number of GPUs. Only used in non-distributed training.

  • dist (bool) – Distributed training/test or not. Default: True.

  • shuffle (bool) – Whether to shuffle the data at every epoch. Default: True.

  • replace (bool) – Whether to sample with replacement when shuffling. Only takes effect when shuffle is True.

  • seed (int, Optional) – The random seed. Defaults to None.

  • reuse_worker_cache (bool) – If True, worker processes are reused so that data cached in them can be reused.

  • persistent_workers (bool) – With PyTorch 1.7+, set persistent_workers=True to avoid recreating data workers before each epoch, which speeds up epoch startup.

  • use_repeated_augment_sampler (bool) – If True, RASampler is used. Default: False.

  • kwargs – Any other keyword arguments used to initialize the DataLoader.

Returns

A PyTorch dataloader.

Return type

DataLoader
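
Example (a minimal, hedged sketch of non-distributed usage): the toy dataset below stands in for a real EasyCV dataset, and the exact batch structure depends on the collate function that build_dataloader installs.

import torch
from torch.utils.data import Dataset

from easycv.datasets.loader import build_dataloader


class ToyDataset(Dataset):
    # Stand-in dataset returning a dict per sample.
    def __len__(self):
        return 32

    def __getitem__(self, idx):
        return {'img': torch.randn(3, 32, 32), 'label': idx % 4}


data_loader = build_dataloader(
    ToyDataset(),
    imgs_per_gpu=8,      # batch size per GPU
    workers_per_gpu=2,   # data-loading subprocesses per GPU
    num_gpus=1,          # only used in non-distributed training
    dist=False,          # single process, no torch.distributed required
    shuffle=True,
    seed=42,
)

for batch in data_loader:
    pass  # one pass over the toy dataset
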

class easycv.datasets.loader.DistributedGivenIterationSampler(dataset, total_iter, batch_size, num_replicas=None, rank=None, last_iter=-1)[source]

Bases: Generic[torch.utils.data.sampler.T_co]

__init__(dataset, total_iter, batch_size, num_replicas=None, rank=None, last_iter=-1)[source]

Initialize self. See help(type(self)) for accurate signature.

set_uniform_indices(labels, num_classes)[source]
gen_new_list()[source]
set_epoch(epoch)[source]
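
Example (a hedged sketch): the sampler is read here as yielding enough indices for a fixed number of iterations (total_iter) at a given batch size, revisiting the dataset as needed; this reading follows the constructor arguments, and num_replicas/rank are passed explicitly so no process group is required.

import torch
from torch.utils.data import DataLoader, TensorDataset

from easycv.datasets.loader import DistributedGivenIterationSampler

dataset = TensorDataset(torch.arange(10).float())

# total_iter * batch_size indices in total for this rank.
sampler = DistributedGivenIterationSampler(
    dataset, total_iter=5, batch_size=4, num_replicas=1, rank=0)
loader = DataLoader(dataset, batch_size=4, sampler=sampler)

for (batch,) in loader:
    pass  # 5 iterations of 4 samples each
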
class easycv.datasets.loader.DistributedMPSampler(dataset, num_replicas=None, rank=None, shuffle=True, split_huge_listfile_byrank=False, **kwargs)[source]

Bases: torch.utils.data.sampler.Sampler[torch.utils.data.distributed.T_co]

__init__(dataset, num_replicas=None, rank=None, shuffle=True, split_huge_listfile_byrank=False, **kwargs)[source]

A distributed sampler that supports sampling m instances from one class at a time for classification datasets.

Parameters
  • dataset – A PyTorch dataset object.

  • num_replicas (optional) – Number of processes participating in distributed training.

  • rank (optional) – Rank of the current process within num_replicas.

  • shuffle (optional) – If True (default), the sampler shuffles the indices.

  • split_huge_listfile_byrank – If True, return all indices for each rank, because the list file for each rank has already been split before building the dataset in distributed training.

generate_indice()[source]
get_label_dict()[source]
calculate_this_label_list()[source]
class easycv.datasets.loader.RASampler(dataset, num_replicas=None, rank=None, shuffle=True, num_repeats: int = 3, **kwargs)[source]

Bases: Generic[torch.utils.data.sampler.T_co]

Sampler that restricts data loading to a subset of the dataset for distributed training, with repeated augmentation. It ensures that each augmented version of a sample is visible to a different process (GPU). Heavily based on torch.utils.data.DistributedSampler.

__init__(dataset, num_replicas=None, rank=None, shuffle=True, num_repeats: int = 3, **kwargs)[source]

Initialize self. See help(type(self)) for accurate signature.

set_epoch(epoch)[source]
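
Example (a minimal, hedged sketch of repeated-augmentation sampling): num_replicas/rank are passed explicitly so no process group is required; with num_repeats=3 the repeated copies of each sample are spread across the participating processes.

import torch
from torch.utils.data import DataLoader, TensorDataset

from easycv.datasets.loader import RASampler

dataset = TensorDataset(torch.arange(1024).float())
sampler = RASampler(dataset, num_replicas=2, rank=0,
                    shuffle=True, num_repeats=3)
loader = DataLoader(dataset, batch_size=32, sampler=sampler)

for epoch in range(2):
    sampler.set_epoch(epoch)  # change the shuffling each epoch
    for (batch,) in loader:
        pass
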
class easycv.datasets.loader.DistributedSampler(dataset, num_replicas=None, rank=None, shuffle=True, seed=0, replace=False, split_huge_listfile_byrank=False)[source]

Bases: torch.utils.data.sampler.Sampler[torch.utils.data.distributed.T_co]

__init__(dataset, num_replicas=None, rank=None, shuffle=True, seed=0, replace=False, split_huge_listfile_byrank=False)[source]

A distributed sampler that supports sampling m instances from one class at a time for classification datasets.

Parameters
  • dataset – A PyTorch dataset object.

  • num_replicas (optional) – Number of processes participating in distributed training.

  • rank (optional) – Rank of the current process within num_replicas.

  • shuffle (optional) – If True (default), the sampler shuffles the indices.

  • seed (int, Optional) – The seed. Default to 0.

  • split_huge_listfile_byrank – If True, return all indices for each rank, because the list file for each rank has already been split before building the dataset in distributed training.

generate_new_list()[source]
set_uniform_indices(labels, num_classes)[source]
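
Example (a minimal, hedged sketch): num_replicas/rank are passed explicitly so the sampler can be exercised without an initialized process group; each of the two replicas then sees a disjoint shard of the shuffled indices.

import torch
from torch.utils.data import DataLoader, TensorDataset

from easycv.datasets.loader import DistributedSampler

dataset = TensorDataset(torch.arange(20).float())
sampler = DistributedSampler(dataset, num_replicas=2, rank=0,
                             shuffle=True, seed=0)
loader = DataLoader(dataset, batch_size=5, sampler=sampler)

for (batch,) in loader:
    pass  # this rank iterates over its half of the dataset
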

Submodules

easycv.datasets.loader.build_loader module

easycv.datasets.loader.build_loader.build_dataloader(dataset, imgs_per_gpu, workers_per_gpu, num_gpus=1, dist=True, shuffle=True, replace=False, seed=None, reuse_worker_cache=False, odps_config=None, persistent_workers=False, collate_hooks=None, use_repeated_augment_sampler=False, sampler=None, pin_memory=False, **kwargs)[source]

Build a PyTorch DataLoader. In distributed training, each GPU/process has its own dataloader; in non-distributed training, there is a single dataloader for all GPUs.

Parameters
  • dataset (Dataset) – A PyTorch dataset.

  • imgs_per_gpu (int) – Number of images on each GPU, i.e., the batch size of each GPU.

  • workers_per_gpu (int) – How many subprocesses to use for data loading for each GPU.

  • num_gpus (int) – Number of GPUs. Only used in non-distributed training.

  • dist (bool) – Distributed training/test or not. Default: True.

  • shuffle (bool) – Whether to shuffle the data at every epoch. Default: True.

  • replace (bool) – Whether to sample with replacement when shuffling. Only takes effect when shuffle is True.

  • seed (int, Optional) – The random seed. Defaults to None.

  • reuse_worker_cache (bool) – If True, worker processes are reused so that data cached in them can be reused.

  • persistent_workers (bool) – With PyTorch 1.7+, set persistent_workers=True to avoid recreating data workers before each epoch, which speeds up epoch startup.

  • use_repeated_augment_sampler (bool) – If True, RASampler is used. Default: False.

  • kwargs – Any other keyword arguments used to initialize the DataLoader.

Returns

A PyTorch dataloader.

Return type

DataLoader

easycv.datasets.loader.build_loader.worker_init_fn(worker_id, num_workers, rank, seed, odps_config=None)[source]
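
Example (a hedged sketch of wiring worker_init_fn into a DataLoader): build_dataloader normally sets this up internally; here functools.partial binds the non-worker arguments, and per-worker deterministic seeding from (num_workers, rank, seed) is assumed from the signature.

from functools import partial

import torch
from torch.utils.data import DataLoader, TensorDataset

from easycv.datasets.loader.build_loader import worker_init_fn

dataset = TensorDataset(torch.arange(8).float())
init_fn = partial(worker_init_fn, num_workers=2, rank=0, seed=42)

loader = DataLoader(dataset, batch_size=2, num_workers=2,
                    worker_init_fn=init_fn)
for (batch,) in loader:
    pass
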
class easycv.datasets.loader.build_loader.InfiniteDataLoader(*args, **kwargs)[source]

Bases: Generic[torch.utils.data.dataloader.T_co]

A dataloader that reuses its workers (see https://github.com/pytorch/pytorch/issues/15849). Uses the same syntax as the vanilla DataLoader.

__init__(*args, **kwargs)[source]

Initialize self. See help(type(self)) for accurate signature.

dataset: torch.utils.data.dataset.Dataset[torch.utils.data.dataloader.T_co]
batch_size: Optional[int]
num_workers: int
pin_memory: bool
drop_last: bool
timeout: float
sampler: Union[torch.utils.data.sampler.Sampler, Iterable]
pin_memory_device: str
prefetch_factor: int
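
Example (a minimal sketch): InfiniteDataLoader accepts the same arguments as torch.utils.data.DataLoader, but keeps its worker processes alive across epochs instead of respawning them.

import torch
from torch.utils.data import TensorDataset

from easycv.datasets.loader.build_loader import InfiniteDataLoader

dataset = TensorDataset(torch.arange(16).float())
loader = InfiniteDataLoader(dataset, batch_size=4, num_workers=2,
                            shuffle=True)

for epoch in range(3):
    # the same worker processes serve every epoch
    for (batch,) in loader:
        pass
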

easycv.datasets.loader.sampler module

class easycv.datasets.loader.sampler.DistributedMPSampler(dataset, num_replicas=None, rank=None, shuffle=True, split_huge_listfile_byrank=False, **kwargs)[source]

Bases: torch.utils.data.sampler.Sampler[torch.utils.data.distributed.T_co]

__init__(dataset, num_replicas=None, rank=None, shuffle=True, split_huge_listfile_byrank=False, **kwargs)[source]

A distributed sampler that supports sampling m instances from one class at a time for classification datasets.

Parameters
  • dataset – A PyTorch dataset object.

  • num_replicas (optional) – Number of processes participating in distributed training.

  • rank (optional) – Rank of the current process within num_replicas.

  • shuffle (optional) – If True (default), the sampler shuffles the indices.

  • split_huge_listfile_byrank – If True, return all indices for each rank, because the list file for each rank has already been split before building the dataset in distributed training.

generate_indice()[source]
get_label_dict()[source]
calculate_this_label_list()[source]
class easycv.datasets.loader.sampler.DistributedSampler(dataset, num_replicas=None, rank=None, shuffle=True, seed=0, replace=False, split_huge_listfile_byrank=False)[source]

Bases: torch.utils.data.sampler.Sampler[torch.utils.data.distributed.T_co]

__init__(dataset, num_replicas=None, rank=None, shuffle=True, seed=0, replace=False, split_huge_listfile_byrank=False)[source]

A distributed sampler that supports sampling m instances from one class at a time for classification datasets.

Parameters
  • dataset – A PyTorch dataset object.

  • num_replicas (optional) – Number of processes participating in distributed training.

  • rank (optional) – Rank of the current process within num_replicas.

  • shuffle (optional) – If True (default), the sampler shuffles the indices.

  • seed (int, Optional) – The seed. Default to 0.

  • split_huge_listfile_byrank – If True, return all indices for each rank, because the list file for each rank has already been split before building the dataset in distributed training.

generate_new_list()[source]
set_uniform_indices(labels, num_classes)[source]
class easycv.datasets.loader.sampler.GroupSampler(dataset, samples_per_gpu=1)[source]

Bases: Generic[torch.utils.data.sampler.T_co]

__init__(dataset, samples_per_gpu=1)[source]

Initialize self. See help(type(self)) for accurate signature.

class easycv.datasets.loader.sampler.DistributedGroupSampler(dataset, samples_per_gpu=1, seed=0, num_replicas=None, rank=None)[source]

Bases: Generic[torch.utils.data.sampler.T_co]

Sampler that restricts data loading to a subset of the dataset. It is especially useful in conjunction with torch.nn.parallel.DistributedDataParallel: in that case, each process can pass a DistributedSampler instance as a DataLoader sampler and load a subset of the original dataset that is exclusive to it.

Note: the dataset is assumed to be of constant size.
Parameters
  • dataset – Dataset used for sampling.

  • seed (int, Optional) – The seed. Default to 0.

  • num_replicas (optional) – Number of processes participating in distributed training.

  • rank (optional) – Rank of the current process within num_replicas.

__init__(dataset, samples_per_gpu=1, seed=0, num_replicas=None, rank=None)[source]

Initialize self. See help(type(self)) for accurate signature.

set_epoch(epoch)[source]
class easycv.datasets.loader.sampler.DistributedGivenIterationSampler(dataset, total_iter, batch_size, num_replicas=None, rank=None, last_iter=-1)[source]

Bases: Generic[torch.utils.data.sampler.T_co]

__init__(dataset, total_iter, batch_size, num_replicas=None, rank=None, last_iter=-1)[source]

Initialize self. See help(type(self)) for accurate signature.

set_uniform_indices(labels, num_classes)[source]
gen_new_list()[source]
set_epoch(epoch)[source]
class easycv.datasets.loader.sampler.RASampler(dataset, num_replicas=None, rank=None, shuffle=True, num_repeats: int = 3, **kwargs)[source]

Bases: Generic[torch.utils.data.sampler.T_co]

Sampler that restricts data loading to a subset of the dataset for distributed training, with repeated augmentation. It ensures that each augmented version of a sample is visible to a different process (GPU). Heavily based on torch.utils.data.DistributedSampler.

__init__(dataset, num_replicas=None, rank=None, shuffle=True, num_repeats: int = 3, **kwargs)[source]

Initialize self. See help(type(self)) for accurate signature.

set_epoch(epoch)[source]