espnet2.iterators package

espnet2.iterators.multiple_iter_factory

class espnet2.iterators.multiple_iter_factory.MultipleIterFactory(build_funcs: Collection[Callable[[], espnet2.iterators.abs_iter_factory.AbsIterFactory]], seed: int = 0, shuffle: bool = False)[source]

Bases: espnet2.iterators.abs_iter_factory.AbsIterFactory

build_iter(epoch: int, shuffle: bool = None) → Iterator[source]
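MultipleIterFactory delegates to a collection of build functions, constructing each sub-factory lazily. The following is a minimal pure-Python sketch of that lazy-delegation idea (illustrative only, not the espnet2 implementation; the function name `chained_build_iter` is hypothetical):

```python
import random
from typing import Callable, Iterator, List

def chained_build_iter(build_funcs: List[Callable[[], List[int]]],
                       epoch: int, seed: int = 0,
                       shuffle: bool = False) -> Iterator[int]:
    # Sketch: optionally shuffle the factory order with an epoch-derived
    # seed, then build and exhaust each sub-iterator only when reached.
    funcs = list(build_funcs)
    if shuffle:
        random.Random(seed + epoch).shuffle(funcs)
    for build in funcs:
        # Each sub-factory is materialized lazily here.
        yield from build()
```

With `shuffle=False` the sub-iterators are simply chained in order.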

espnet2.iterators.chunk_iter_factory

class espnet2.iterators.chunk_iter_factory.ChunkIterFactory(dataset, batch_size: int, batches: Union[espnet2.samplers.abs_sampler.AbsSampler, Sequence[Sequence[Any]]], chunk_length: Union[int, str], chunk_shift_ratio: float = 0.5, num_cache_chunks: int = 1024, num_samples_per_epoch: Optional[int] = None, seed: int = 0, shuffle: bool = False, num_workers: int = 0, collate_fn=None, pin_memory: bool = False, excluded_key_prefixes: Optional[List[str]] = None)[source]

Bases: espnet2.iterators.abs_iter_factory.AbsIterFactory

Creates chunks from a sequence

Examples

>>> batches = [["id1"], ["id2"], ...]
>>> batch_size = 128
>>> chunk_length = 1000
>>> iter_factory = ChunkIterFactory(dataset, batch_size, batches, chunk_length)
>>> it = iter_factory.build_iter(epoch)
>>> for ids, batch in it:
...     ...
  • The number of mini-batches varies from epoch to epoch and cannot be known in advance, because the IterFactory is not given the length information.

  • For the same reason, “num_iters_per_epoch” cannot be implemented for this iterator; “num_samples_per_epoch” is provided instead.
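The chunking strategy can be illustrated with a small sketch: a long sequence is cut into fixed-length windows whose hop is `chunk_length * chunk_shift_ratio`. The function below is hypothetical (names mirror the constructor arguments above, but this is not the library code):

```python
from typing import List, Tuple

def make_chunks(length: int, chunk_length: int,
                chunk_shift_ratio: float = 0.5) -> List[Tuple[int, int]]:
    # Sketch: enumerate (start, end) windows over a sequence of `length`
    # frames, hopping by chunk_length * chunk_shift_ratio each time.
    shift = max(1, int(chunk_length * chunk_shift_ratio))
    return [(start, start + chunk_length)
            for start in range(0, length - chunk_length + 1, shift)]
```

For a 3000-frame utterance with `chunk_length=1000` and the default shift ratio of 0.5, this yields overlapping chunks starting every 500 frames.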

build_iter(epoch: int, shuffle: Optional[bool] = None) → Iterator[Tuple[List[str], Dict[str, torch.Tensor]]][source]

espnet2.iterators.sequence_iter_factory

class espnet2.iterators.sequence_iter_factory.RawSampler(batches)[source]

Bases: espnet2.samplers.abs_sampler.AbsSampler

generate(seed)[source]
class espnet2.iterators.sequence_iter_factory.SequenceIterFactory(dataset, batches: Union[espnet2.samplers.abs_sampler.AbsSampler, Sequence[Sequence[Any]]], num_iters_per_epoch: int = None, seed: int = 0, shuffle: bool = False, shuffle_within_batch: bool = False, num_workers: int = 0, collate_fn=None, pin_memory: bool = False)[source]

Bases: espnet2.iterators.abs_iter_factory.AbsIterFactory

Build iterator for each epoch.

This class simply creates a pytorch DataLoader, except for the following points:

  • The random seed is decided according to the epoch number. This feature guarantees reproducibility when resuming from the middle of a training process.

  • The number of samples for one epoch can be restricted. This feature controls the interval between training and evaluation.
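The epoch-seeded shuffle described above can be sketched as follows. Deriving the RNG seed from the base seed and the epoch number makes the batch order a pure function of `(seed, epoch)`, so resuming training reproduces the same order (the seed scheme here is an assumption for illustration, not the exact espnet2 code):

```python
import random
from typing import List, Sequence

def epoch_shuffled_batches(batches: Sequence, epoch: int,
                           seed: int = 0) -> List:
    # Sketch: seed the RNG from (base seed, epoch) so the shuffle is
    # deterministic per epoch and reproducible after a restart.
    order = list(batches)
    random.Random(seed + epoch).shuffle(order)
    return order
```

Calling this twice with the same epoch returns the identical order, which is what makes mid-training resumption reproducible.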

build_iter(epoch: int, shuffle: bool = None) → torch.utils.data.dataloader.DataLoader[source]
espnet2.iterators.sequence_iter_factory.worker_init_fn(worker_id, base_seed=0)[source]

Set random seed for each worker in DataLoader.
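A function with this signature typically gives each DataLoader worker a distinct but reproducible seed. The body below is a minimal sketch matching the documented signature (the actual espnet2 implementation may also seed numpy or torch):

```python
import random

def worker_init_fn(worker_id: int, base_seed: int = 0) -> None:
    # Sketch: derive a per-worker seed from the base seed so that each
    # worker's random stream is distinct yet reproducible across runs.
    random.seed(base_seed + worker_id)
```

It would be passed to the DataLoader via its `worker_init_fn` argument, usually with `base_seed` bound by `functools.partial`.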

espnet2.iterators.abs_iter_factory

class espnet2.iterators.abs_iter_factory.AbsIterFactory[source]

Bases: abc.ABC

abstract build_iter(epoch: int, shuffle: bool = None) → Iterator[source]
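The interface above requires only one method. A toy concrete subclass can illustrate the contract (a restated sketch; `ListIterFactory` is hypothetical and not part of espnet2):

```python
from abc import ABC, abstractmethod
from typing import Iterator, Optional, Sequence

class AbsIterFactory(ABC):
    # Restatement of the abstract interface documented above.
    @abstractmethod
    def build_iter(self, epoch: int,
                   shuffle: Optional[bool] = None) -> Iterator:
        raise NotImplementedError

class ListIterFactory(AbsIterFactory):
    # Toy implementation: serves the same fixed sequence every epoch.
    def __init__(self, data: Sequence):
        self.data = data

    def build_iter(self, epoch: int,
                   shuffle: Optional[bool] = None) -> Iterator:
        return iter(self.data)
```

Trainers only call `build_iter(epoch)` once per epoch, so any strategy (sequence, chunk, multiple) can be swapped in behind this interface.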

espnet2.iterators.__init__