espnet2.train package
espnet2.train.spk_trainer
Trainer module for speaker recognition.
class espnet2.train.spk_trainer.SpkTrainer
Bases: espnet2.train.trainer.Trainer
Trainer designed for speaker recognition. Training is performed as closed-set classification, and validation computes the open-set equal error rate (EER).
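For reference, the equal error rate is the operating point where the false-acceptance rate (FAR) and the false-rejection rate (FRR) of a score threshold coincide. The NumPy sketch below illustrates that definition under simple assumptions. It is not the SpkTrainer code. The helper `compute_eer` is hypothetical, and higher scores are assumed to mean "same speaker".

```python
import numpy as np


def compute_eer(scores, labels):
    """Estimate the equal error rate (hypothetical helper, not part of ESPnet).

    scores: similarity score per trial (higher means "same speaker").
    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n_target = labels.sum()
    n_nontarget = len(labels) - n_target

    fars, frrs = [], []
    # Try every observed score as a threshold: accept a trial if score >= t.
    for t in np.unique(scores):
        accept = scores >= t
        fars.append(np.sum(accept & (labels == 0)) / n_nontarget)  # false acceptance
        frrs.append(np.sum(~accept & (labels == 1)) / n_target)    # false rejection
    fars, frrs = np.array(fars), np.array(frrs)

    # The EER lies where FAR and FRR are closest; report their mean there.
    idx = np.argmin(np.abs(fars - frrs))
    return float((fars[idx] + frrs[idx]) / 2)


print(compute_eer([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]))  # 0.0 (perfectly separated)
```

Sweeping only the observed scores keeps the sketch short. Production scorers usually interpolate along the ROC curve instead.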
espnet2.train.abs_gan_espnet_model
ESPnetModel abstract class for GAN-based training.
class espnet2.train.abs_gan_espnet_model.AbsGANESPnetModel
Bases: espnet2.train.abs_espnet_model.AbsESPnetModel, torch.nn.modules.module.Module, abc.ABC
The common abstract class for all GAN-based tasks.
An “ESPnetModel” is a class that inherits torch.nn.Module and holds the DNN models used in “forward” as its member fields (a.k.a. the delegate pattern). Its “forward” must accept the argument “forward_generator” and return a dict with the keys “loss”, “stats”, “weight”, and “optim_idx”. “optim_idx” must be 0 for the generator and 1 for the discriminator.
Example
>>> from espnet2.tasks.abs_task import AbsTask
>>> class YourESPnetModel(AbsGANESPnetModel):
...     def forward(self, input, input_lengths, forward_generator=True):
...         ...
...         if forward_generator:
...             # return loss for the generator
...             # optim idx 0 indicates generator optimizer
...             return dict(loss=loss, stats=stats, weight=weight, optim_idx=0)
...         else:
...             # return loss for the discriminator
...             # optim idx 1 indicates discriminator optimizer
...             return dict(loss=loss, stats=stats, weight=weight, optim_idx=1)
>>> class YourTask(AbsTask):
...     @classmethod
...     def build_model(cls, args: argparse.Namespace) -> YourESPnetModel:
Initializes internal Module state, shared by both nn.Module and ScriptModule.
abstract forward(forward_generator: bool = True, **batch) → Dict[str, Union[torch.Tensor, Dict[str, torch.Tensor], int]]
Return the generator loss or the discriminator loss.
This method must have an argument “forward_generator” to switch between the generator loss calculation and the discriminator loss calculation. If forward_generator is True, return the generator loss with optim_idx 0. If forward_generator is False, return the discriminator loss with optim_idx 1.
Parameters:
forward_generator (bool) – Whether to return the generator loss or the discriminator loss. This argument must have a default value.
Returns:
loss (Tensor): Loss scalar tensor.
stats (Dict[str, float]): Statistics to be monitored.
weight (Tensor): Weight tensor to summarize losses.
optim_idx (int): Optimizer index (0 for G and 1 for D).
Return type:
Dict[str, Any]
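The return contract can be illustrated without PyTorch or ESPnet. In the sketch below, the class `ToyGANModel` and the checker `check_gan_output` are hypothetical. Plain floats stand in for tensors, and the losses are dummies. The only point is the dict layout and which `optim_idx` each branch reports.

```python
from typing import Any, Dict


class ToyGANModel:
    """Hypothetical stand-in for an AbsGANESPnetModel subclass (no torch)."""

    def forward(self, forward_generator: bool = True, **batch) -> Dict[str, Any]:
        x = batch["x"]
        if forward_generator:
            loss = sum(v * v for v in x)  # dummy generator loss
            # optim_idx 0 selects the generator optimizer
            return dict(loss=loss, stats={"gen_loss": loss}, weight=len(x), optim_idx=0)
        loss = sum(abs(v) for v in x)  # dummy discriminator loss
        # optim_idx 1 selects the discriminator optimizer
        return dict(loss=loss, stats={"dis_loss": loss}, weight=len(x), optim_idx=1)


def check_gan_output(out: Dict[str, Any], forward_generator: bool) -> None:
    """Assert that a forward() result follows the documented dict contract."""
    assert set(out) == {"loss", "stats", "weight", "optim_idx"}
    assert out["optim_idx"] == (0 if forward_generator else 1)


model = ToyGANModel()
for flag in (True, False):
    check_gan_output(model.forward(forward_generator=flag, x=[1.0, -2.0]), flag)
```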
espnet2.train.collate_fn
class espnet2.train.collate_fn.CommonCollateFn(float_pad_value: Union[float, int] = 0.0, int_pad_value: int = -32768, not_sequence: Collection[str] = ())
Bases: object
Functor class of common_collate_fn()
class espnet2.train.collate_fn.HuBERTCollateFn(float_pad_value: Union[float, int] = 0.0, int_pad_value: int = -32768, label_downsampling: int = 1, pad: bool = False, rand_crop: bool = True, crop_audio: bool = True, not_sequence: Collection[str] = ())
Bases: espnet2.train.collate_fn.CommonCollateFn
Functor class of common_collate_fn()
espnet2.train.collate_fn.common_collate_fn(data: Collection[Tuple[str, Dict[str, numpy.ndarray]]], float_pad_value: Union[float, int] = 0.0, int_pad_value: int = -32768, not_sequence: Collection[str] = ()) → Tuple[List[str], Dict[str, torch.Tensor]]
Concatenate a list of ndarrays into a single array and convert it to torch.Tensor.
Examples
>>> from espnet2.samplers.constant_batch_sampler import ConstantBatchSampler
>>> from espnet2.train.collate_fn import common_collate_fn
>>> from espnet2.train.dataset import ESPnetDataset
>>> sampler = ConstantBatchSampler(...)
>>> dataset = ESPnetDataset(...)
>>> keys = next(iter(sampler))
>>> batch = [dataset[key] for key in keys]
>>> uttids, batch = common_collate_fn(batch)
>>> model(**batch)
Note that the dict keys of batch are propagated unchanged from those of the dataset.
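To make the padding concrete, here is a NumPy-only sketch of a collate function in the spirit of common_collate_fn. It stacks per-utterance arrays into a padded batch and records their lengths. Several details are assumptions for illustration. Integer arrays are padded with int_pad_value and all others with float_pad_value. Lengths are stored under "<key>_lengths". The function name `pad_collate` is hypothetical, and the real function returns torch.Tensor objects rather than NumPy arrays.

```python
from typing import Dict, List, Tuple

import numpy as np


def pad_collate(
    data: List[Tuple[str, Dict[str, np.ndarray]]],
    float_pad_value: float = 0.0,
    int_pad_value: int = -32768,
) -> Tuple[List[str], Dict[str, np.ndarray]]:
    """Hypothetical NumPy sketch of padding-based batch collation."""
    uttids = [uid for uid, _ in data]
    output: Dict[str, np.ndarray] = {}
    for key in data[0][1]:
        arrays = [d[key] for _, d in data]
        # Integer arrays (e.g. token ids) use int_pad_value; others use float_pad_value.
        pad = int_pad_value if arrays[0].dtype.kind == "i" else float_pad_value
        max_len = max(a.shape[0] for a in arrays)
        padded = np.full((len(arrays), max_len) + arrays[0].shape[1:], pad,
                         dtype=arrays[0].dtype)
        for i, a in enumerate(arrays):
            padded[i, : a.shape[0]] = a  # copy the unpadded prefix
        output[key] = padded
        output[key + "_lengths"] = np.array([a.shape[0] for a in arrays])
    return uttids, output


uttids, batch = pad_collate([
    ("utt1", {"text": np.array([3, 5, 7])}),
    ("utt2", {"text": np.array([2])}),
])
print(batch["text"].tolist())          # [[3, 5, 7], [2, -32768, -32768]]
print(batch["text_lengths"].tolist())  # [3, 1]
```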
espnet2.train.dataset
class espnet2.train.dataset.AdapterForLabelScpReader(loader)
Bases: collections.abc.Mapping
class espnet2.train.dataset.AdapterForSingingScoreScpReader(loader)
Bases: collections.abc.Mapping
class espnet2.train.dataset.AdapterForSoundScpReader(loader, dtype=None)
Bases: collections.abc.Mapping
class espnet2.train.dataset.ESPnetDataset(path_name_type_list: Collection[Tuple[str, str, str]], preprocess: Callable[[str, Dict[str, numpy.ndarray]], Dict[str, numpy.ndarray]] = None, float_dtype: str = 'float32', int_dtype: str = 'long', max_cache_size: Union[float, int, str] = 0.0, max_cache_fd: int = 0)
Bases: espnet2.train.dataset.AbsDataset
PyTorch Dataset class for ESPnet.
Examples
>>> dataset = ESPnetDataset([('wav.scp', 'input', 'sound'),
...                          ('token_int', 'output', 'text_int')],
...                         )
>>> uttid, data = dataset['uttid']
>>> data
{'input': per_utt_array, 'output': per_utt_array}
espnet2.train.gan_trainer
Trainer module for GAN-based training.
class espnet2.train.gan_trainer.GANTrainer
Bases: espnet2.train.trainer.Trainer
Trainer for GAN-based training.
If you’d like to use this trainer, the model must inherit espnet2.train.abs_gan_espnet_model.AbsGANESPnetModel.
classmethod add_arguments(parser: argparse.ArgumentParser)
Add additional arguments for GAN-trainer.
classmethod build_options(args: argparse.Namespace) → espnet2.train.trainer.TrainerOptions
Build options consumed by train(), eval(), and plot_attention().
classmethod train_one_epoch(model: torch.nn.modules.module.Module, iterator: Iterable[Tuple[List[str], Dict[str, torch.Tensor]]], optimizers: Sequence[torch.optim.optimizer.Optimizer], schedulers: Sequence[Optional[espnet2.schedulers.abs_scheduler.AbsScheduler]], scaler: Optional[torch.cuda.amp.grad_scaler.GradScaler], reporter: espnet2.train.reporter.SubReporter, summary_writer, options: espnet2.train.gan_trainer.GANTrainerOptions, distributed_option: espnet2.train.distributed_utils.DistributedOption) → bool
Train one epoch.
classmethod validate_one_epoch(model: torch.nn.modules.module.Module, iterator: Iterable[Dict[str, torch.Tensor]], reporter: espnet2.train.reporter.SubReporter, options: espnet2.train.gan_trainer.GANTrainerOptions, distributed_option: espnet2.train.distributed_utils.DistributedOption) → None
Validate one epoch.
class espnet2.train.gan_trainer.GANTrainerOptions(ngpu: int, resume: bool, use_amp: bool, train_dtype: str, grad_noise: bool, accum_grad: int, grad_clip: float, grad_clip_type: float, log_interval: Optional[int], no_forward_run: bool, use_matplotlib: bool, use_tensorboard: bool, use_wandb: bool, output_dir: Union[pathlib.Path, str], max_epoch: int, seed: int, sharded_ddp: bool, patience: Optional[int], keep_nbest_models: Union[int, List[int]], nbest_averaging_interval: int, early_stopping_criterion: Sequence[str], best_model_criterion: Sequence[Sequence[str]], val_scheduler_criterion: Sequence[str], unused_parameters: bool, wandb_model_log_interval: int, create_graph_in_tensorboard: bool, generator_first: bool)
Bases: espnet2.train.trainer.TrainerOptions
Trainer option dataclass for GANTrainer.
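Compared with TrainerOptions, GANTrainerOptions adds generator_first. The framework-free sketch below makes one assumption: the flag only decides whether the generator or the discriminator is updated first for each batch. The helpers `gan_turns`, `run_turns`, and `toy_forward` are hypothetical. The optimizer for each turn is picked through the optim_idx that forward returns.

```python
from typing import Any, Callable, Dict, List


def gan_turns(generator_first: bool) -> List[str]:
    """Order of the two GAN updates for one batch (hypothetical helper)."""
    if generator_first:
        return ["generator", "discriminator"]
    return ["discriminator", "generator"]


def run_turns(forward: Callable[..., Dict[str, Any]], batch: Dict[str, Any],
              generator_first: bool) -> List[int]:
    """Call forward once per turn and collect the optimizer index it selects."""
    used = []
    for turn in gan_turns(generator_first):
        retval = forward(forward_generator=(turn == "generator"), **batch)
        # A real trainer would back-propagate retval["loss"] here and then
        # step optimizers[retval["optim_idx"]].
        used.append(retval["optim_idx"])
    return used


def toy_forward(forward_generator: bool = True, **batch) -> Dict[str, Any]:
    # Minimal stand-in that only reports which optimizer it targets.
    return {"loss": 0.0, "stats": {}, "weight": 1,
            "optim_idx": 0 if forward_generator else 1}


print(run_turns(toy_forward, {"x": 1}, generator_first=True))   # [0, 1]
print(run_turns(toy_forward, {"x": 1}, generator_first=False))  # [1, 0]
```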
espnet2.train.class_choices
class espnet2.train.class_choices.ClassChoices(name: str, classes: Mapping[str, type], type_check: type = None, default: str = None, optional: bool = False)
Bases: object
Helper class to manage the options for variable objects and their configurations.
Example:
>>> class A:
...     def __init__(self, foo=3): pass
>>> class B:
...     def __init__(self, bar="aaaa"): pass
>>> choices = ClassChoices("var", dict(a=A, b=B), default="a")
>>> import argparse
>>> parser = argparse.ArgumentParser()
>>> choices.add_arguments(parser)
>>> args = parser.parse_args(["--var", "a", "--var_conf", "foo=4"])
>>> args.var
'a'
>>> args.var_conf
{'foo': 4}
>>> class_obj = choices.get_class(args.var)
>>> a_object = class_obj(**args.var_conf)
espnet2.train.iterable_dataset
Iterable dataset module.
class espnet2.train.iterable_dataset.IterableESPnetDataset(path_name_type_list: Collection[Tuple[str, str, str]], preprocess: Callable[[str, Dict[str, numpy.ndarray]], Dict[str, numpy.ndarray]] = None, float_dtype: str = 'float32', int_dtype: str = 'long', key_file: str = None)
Bases: torch.utils.data.dataset.IterableDataset
PyTorch Dataset class for ESPnet.
Examples
>>> dataset = IterableESPnetDataset([('wav.scp', 'input', 'sound'),
...                                  ('token_int', 'output', 'text_int')],
...                                 )
>>> for uid, data in dataset:
...     data
{'input': per_utt_array, 'output': per_utt_array}
espnet2.train.distributed_utils
class espnet2.train.distributed_utils.DistributedOption(distributed: bool = False, dist_backend: str = 'nccl', dist_init_method: str = 'env://', dist_world_size: Union[int, NoneType] = None, dist_rank: Union[int, NoneType] = None, local_rank: Union[int, NoneType] = None, ngpu: int = 0, dist_master_addr: Union[str, NoneType] = None, dist_master_port: Union[int, NoneType] = None, dist_launcher: Union[str, NoneType] = None, multiprocessing_distributed: bool = True)
Bases: object
dist_backend = 'nccl'
dist_init_method = 'env://'
dist_launcher = None
dist_master_addr = None
dist_master_port = None
dist_rank = None
dist_world_size = None
distributed = False
local_rank = None
multiprocessing_distributed = True
ngpu = 0
espnet2.train.distributed_utils.free_port()
Find a free port using bind().
There is an interval between finding this port and using it, during which another process might take the port. Thus, it is not guaranteed that the port is still free when it is used.
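Below is a minimal standard-library sketch of the bind() technique. Binding to port 0 asks the operating system for an unused ephemeral port, and getsockname() reports which port it assigned. The helper name `find_free_port` is hypothetical. The race described above still applies, because the socket is closed before the port is used.

```python
import socket


def find_free_port() -> int:
    """Return a port number that was free at the time of the call (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Port 0 lets the OS pick an unused port; getsockname() reveals which one.
        sock.bind(("", 0))
        return sock.getsockname()[1]


port = find_free_port()
print(port)
```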
espnet2.train.distributed_utils.get_local_rank(prior=None, launcher: str = None) → Optional[int]
espnet2.train.distributed_utils.get_master_addr(prior=None, launcher: str = None) → Optional[str]
espnet2.train.distributed_utils.get_node_rank(prior=None, launcher: str = None) → Optional[int]
Get the node rank.
Used in “multiprocessing distributed” mode. In this case, the initial RANK equals the node ID, and the real rank is set to (nGPU * NodeID) + LOCAL_RANK in torch.distributed.
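The rank formula above can be written out directly. In this sketch, `global_rank` is a hypothetical helper. It assumes every node runs the same number of GPU processes, one process per GPU.

```python
def global_rank(ngpu_per_node: int, node_id: int, local_rank: int) -> int:
    """Rank in "multiprocessing distributed" mode: nGPU * NodeID + LOCAL_RANK."""
    if not 0 <= local_rank < ngpu_per_node:
        raise ValueError("local_rank must be in [0, ngpu_per_node)")
    return ngpu_per_node * node_id + local_rank


# Two nodes with four GPUs each: ranks 0-3 live on node 0, ranks 4-7 on node 1.
print([global_rank(4, node, lr) for node in range(2) for lr in range(4)])
# [0, 1, 2, 3, 4, 5, 6, 7]
```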
espnet2.train.reporter
Reporter module.
class espnet2.train.reporter.Average(value: Union[float, int, complex, torch.Tensor, numpy.ndarray])
class espnet2.train.reporter.Reporter(epoch: int = 0)
Bases: object
Reporter class.
Examples
>>> reporter = Reporter()
>>> with reporter.observe('train') as sub_reporter:
...     for batch in iterator:
...         stats = dict(loss=0.2)
...         sub_reporter.register(stats)
check_early_stopping(patience: int, key1: str, key2: str, mode: str, epoch: int = None, logger=None) → bool
matplotlib_plot(output_dir: Union[str, pathlib.Path])
Plot stats using Matplotlib and save images.
observe(key: str, epoch: int = None) → AbstractContextManager[espnet2.train.reporter.SubReporter]
class espnet2.train.reporter.SubReporter(key: str, epoch: int, total_count: int)
Bases: object
This class is used by Reporter.
See the docstring of Reporter for its usage.
class espnet2.train.reporter.WeightedAverage(value: Tuple[Union[float, int, complex, torch.Tensor, numpy.ndarray], Union[float, int, complex, torch.Tensor, numpy.ndarray]], weight: Union[float, int, complex, torch.Tensor, numpy.ndarray])
espnet2.train.reporter.aggregate(values: Sequence[ReportedValue]) → Union[float, int, complex, torch.Tensor, numpy.ndarray]
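The reporter distinguishes plain averages (Average) from weighted averages (WeightedAverage). The sketch below shows the arithmetic one would expect when combining a sequence of such values. It is an assumption-based illustration, not the ESPnet implementation. The dataclasses `Avg` and `WAvg` and the function `aggregate_values` are hypothetical, and they hold scalar floats only.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class Avg:
    """Hypothetical stand-in for a plain-average reported value."""
    value: float


@dataclass
class WAvg:
    """Hypothetical stand-in for a weighted reported value."""
    value: float
    weight: float


def aggregate_values(values: Sequence[Union[Avg, WAvg]]) -> float:
    """Combine reported values of one kind into a single number."""
    if all(isinstance(v, Avg) for v in values):
        # Plain mean of the reported values.
        return sum(v.value for v in values) / len(values)
    if all(isinstance(v, WAvg) for v in values):
        # Weighted mean: sum(value * weight) / sum(weight).
        total_weight = sum(v.weight for v in values)
        return sum(v.value * v.weight for v in values) / total_weight
    raise TypeError("values must all be of the same kind")


print(aggregate_values([Avg(1.0), Avg(3.0)]))              # 2.0
print(aggregate_values([WAvg(1.0, 3.0), WAvg(3.0, 1.0)]))  # 1.5
```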
espnet2.train.__init__
espnet2.train.preprocessor
class espnet2.train.preprocessor.CommonPreprocessor(train: bool, token_type: str = None, token_list: Union[pathlib.Path, str, Iterable[str]] = None, bpemodel: Union[pathlib.Path, str, Iterable[str]] = None, text_cleaner: Collection[str] = None, g2p_type: str = None, unk_symbol: str = '<unk>', space_symbol: str = '<space>', non_linguistic_symbols: Union[pathlib.Path, str, Iterable[str]] = None, delimiter: str = None, rir_scp: str = None, rir_apply_prob: float = 1.0, noise_scp: str = None, noise_apply_prob: float = 1.0, noise_db_range: str = '3_10', short_noise_thres: float = 0.5, aux_task_names: Collection[str] = None, speech_volume_normalize: float = None, speech_name: str = 'speech', text_name: str = 'text', fs: int = 0, nonsplit_symbol: Iterable[str] = None)
class espnet2.train.preprocessor.CommonPreprocessor_multi(train: bool, token_type: str = None, token_list: Union[pathlib.Path, str, Iterable[str]] = None, bpemodel: Union[pathlib.Path, str, Iterable[str]] = None, text_cleaner: Collection[str] = None, g2p_type: str = None, unk_symbol: str = '<unk>', space_symbol: str = '<space>', non_linguistic_symbols: Union[pathlib.Path, str, Iterable[str]] = None, delimiter: str = None, rir_scp: str = None, rir_apply_prob: float = 1.0, noise_scp: str = None, noise_apply_prob: float = 1.0, noise_db_range: str = '3_10', short_noise_thres: float = 0.5, aux_task_names: Collection[str] = None, speech_volume_normalize: float = None, speech_name: str = 'speech', text_name: List[str] = ['text'], fs: int = 0, speaker_change_symbol: Iterable[str] = None)
class espnet2.train.preprocessor.DynamicMixingPreprocessor(train: bool, source_scp: str = None, ref_num: int = 2, dynamic_mixing_gain_db: float = 0.0, speech_name: str = 'speech_mix', speech_ref_name_prefix: str = 'speech_ref', mixture_source_name: str = None, utt2spk: str = None, categories: Optional[List] = None)
class espnet2.train.preprocessor.EnhPreprocessor(train: bool, rir_scp: str = None, rir_apply_prob: float = 1.0, noise_scp: str = None, noise_apply_prob: float = 1.0, noise_db_range: str = '3_10', short_noise_thres: float = 0.5, speech_volume_normalize: float = None, speech_name: str = 'speech_mix', speech_ref_name_prefix: str = 'speech_ref', noise_ref_name_prefix: str = 'noise_ref', dereverb_ref_name_prefix: str = 'dereverb_ref', use_reverberant_ref: bool = False, num_spk: int = 1, num_noise_type: int = 1, sample_rate: int = 8000, force_single_channel: bool = False, channel_reordering: bool = False, categories: Optional[List] = None)
Bases: espnet2.train.preprocessor.CommonPreprocessor
Preprocessor for Speech Enhancement (Enh) task.
class espnet2.train.preprocessor.MutliTokenizerCommonPreprocessor(train: bool, token_type: List[str] = [None], token_list: List[Union[pathlib.Path, str, Iterable[str]]] = [None], bpemodel: List[Union[pathlib.Path, str, Iterable[str]]] = [None], text_cleaner: Collection[str] = None, g2p_type: str = None, unk_symbol: str = '<unk>', space_symbol: str = '<space>', non_linguistic_symbols: Union[pathlib.Path, str, Iterable[str]] = None, delimiter: str = None, rir_scp: str = None, rir_apply_prob: float = 1.0, noise_scp: str = None, noise_apply_prob: float = 1.0, noise_db_range: str = '3_10', short_noise_thres: float = 0.5, speech_volume_normalize: float = None, speech_name: str = 'speech', text_name: List[str] = ['text'], tokenizer_encode_conf: List[Dict] = [{}, {}])
class espnet2.train.preprocessor.SLUPreprocessor(train: bool, token_type: str = None, token_list: Union[pathlib.Path, str, Iterable[str]] = None, transcript_token_list: Union[pathlib.Path, str, Iterable[str]] = None, bpemodel: Union[pathlib.Path, str, Iterable[str]] = None, text_cleaner: Collection[str] = None, g2p_type: str = None, unk_symbol: str = '<unk>', space_symbol: str = '<space>', non_linguistic_symbols: Union[pathlib.Path, str, Iterable[str]] = None, delimiter: str = None, rir_scp: str = None, rir_apply_prob: float = 1.0, noise_scp: str = None, noise_apply_prob: float = 1.0, noise_db_range: str = '3_10', short_noise_thres: float = 0.5, speech_volume_normalize: float = None, speech_name: str = 'speech', text_name: str = 'text')
class espnet2.train.preprocessor.SVSPreprocessor(train: bool, token_type: str = None, token_list: Union[pathlib.Path, str, Iterable[str]] = None, bpemodel: Union[pathlib.Path, str, Iterable[str]] = None, text_cleaner: Collection[str] = None, g2p_type: str = None, unk_symbol: str = '<unk>', space_symbol: str = '<space>', non_linguistic_symbols: Union[pathlib.Path, str, Iterable[str]] = None, delimiter: str = None, singing_volume_normalize: float = None, singing_name: str = 'singing', text_name: str = 'text', label_name: str = 'label', midi_name: str = 'score', fs: numpy.int32 = 0, hop_length: numpy.int32 = 256, phn_seg: dict = {1: [1], 2: [0.25, 1], 3: [0.1, 0.5, 1], 4: [0.05, 0.1, 0.5, 1]})
Bases: espnet2.train.preprocessor.AbsPreprocessor
Preprocessor for Singing Voice Synthesis (SVS) task.
class espnet2.train.preprocessor.SpkPreprocessor(train: bool, spk2utt: str, target_duration: float, sample_rate: int = 16000, num_eval: int = 10, rir_scp: str = None, rir_apply_prob: float = 1.0, noise_info: List[Tuple[float, str, Tuple[int, int], Tuple[float, float]]] = None, noise_apply_prob: float = 1.0, short_noise_thres: float = 0.5)
Bases: espnet2.train.preprocessor.CommonPreprocessor
Preprocessor for Speaker tasks.
Parameters:
train (bool) – Whether the preprocessor is used in training mode.
spk2utt (str) – Path to the spk2utt file.
target_duration (float) – Target duration in seconds.
sample_rate (int) – Sampling rate.
num_eval (int) – Number of utterances to be used for evaluation.
rir_scp (str) – Path to the RIR scp file.
rir_apply_prob (float) – Probability of applying RIR.
noise_info (List[Tuple[float, str, Tuple[int, int], Tuple[float, float]]]) – List of tuples of noise information. Each tuple represents a noise type and consists of (prob, noise_scp, num_to_mix, db_range):
    prob (float) is the probability of applying the noise type.
    noise_scp (str) is the path to the noise scp file.
    num_to_mix (Tuple[int, int]) is the range of the number of noises to be mixed.
    db_range (Tuple[float, float]) is the range of noise levels in dB.
noise_apply_prob (float) – Probability of applying noise.
short_noise_thres (float) – Threshold of short noise.
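To make the noise_info layout concrete, the sketch below draws one noise plan per call. Each (prob, noise_scp, num_to_mix, db_range) entry fires with probability prob. For an entry that fires, the number of noises is drawn from num_to_mix and the level from db_range. Uniform sampling, the inclusive integer range, and the helper name `sample_noise_plan` are all assumptions for illustration, not SpkPreprocessor internals. The scp paths are placeholders.

```python
import random
from typing import List, Optional, Tuple

NoiseInfo = Tuple[float, str, Tuple[int, int], Tuple[float, float]]


def sample_noise_plan(noise_info: List[NoiseInfo],
                      rng: Optional[random.Random] = None) -> List[Tuple[str, int, float]]:
    """Return (noise_scp, num_noises, level_db) for each noise type that fires."""
    if rng is None:
        rng = random.Random()
    plan = []
    for prob, noise_scp, (min_num, max_num), (min_db, max_db) in noise_info:
        if rng.random() >= prob:
            continue  # this noise type is skipped for the current utterance
        num_noises = rng.randint(min_num, max_num)  # inclusive on both ends
        level_db = rng.uniform(min_db, max_db)
        plan.append((noise_scp, num_noises, level_db))
    return plan


noise_info = [
    (1.0, "data/noise/noise.scp", (1, 1), (0.0, 15.0)),    # placeholder path
    (0.5, "data/noise/babble.scp", (3, 5), (13.0, 20.0)),  # placeholder path
]
print(sample_noise_plan(noise_info, random.Random(0)))
```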
class espnet2.train.preprocessor.TSEPreprocessor(train: bool, train_spk2enroll: str = None, enroll_segment: int = None, load_spk_embedding: bool = False, load_all_speakers: bool = False, rir_scp: str = None, rir_apply_prob: float = 1.0, noise_scp: str = None, noise_apply_prob: float = 1.0, noise_db_range: str = '3_10', short_noise_thres: float = 0.5, speech_volume_normalize: float = None, speech_name: str = 'speech_mix', speech_ref_name_prefix: str = 'speech_ref', noise_ref_name_prefix: str = 'noise_ref', dereverb_ref_name_prefix: str = 'dereverb_ref', use_reverberant_ref: bool = False, num_spk: int = 1, num_noise_type: int = 1, sample_rate: int = 8000, force_single_channel: bool = False, channel_reordering: bool = False, categories: Optional[List] = None)
Bases: espnet2.train.preprocessor.EnhPreprocessor
Preprocessor for Target Speaker Extraction.
espnet2.train.preprocessor.detect_non_silence(x: numpy.ndarray, threshold: float = 0.01, frame_length: int = 1024, frame_shift: int = 512, window: str = 'boxcar') → numpy.ndarray
Power-based voice activity detection.
Parameters:
x – (Channel, Time)
>>> import numpy as np
>>> x = np.random.randn(1000)
>>> detect = detect_non_silence(x)
>>> assert x.shape == detect.shape
>>> assert detect.dtype == np.bool_
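For intuition, here is a simplified NumPy take on power-based voice activity detection. The signal is split into frames with a rectangular window, and mean power is computed per frame. A frame counts as non-silent when its power exceeds threshold times the average frame power. The relative-threshold rule, the edge padding, the 1-D-only input, and the helper name `simple_power_vad` are assumptions for this sketch. It is not the ESPnet function.

```python
import numpy as np


def simple_power_vad(x: np.ndarray, threshold: float = 0.01,
                     frame_length: int = 1024, frame_shift: int = 512) -> np.ndarray:
    """Per-sample speech/non-speech mask for a 1-D signal (hypothetical helper)."""
    if x.shape[-1] < frame_length:
        return np.ones(x.shape, dtype=bool)  # too short to judge: keep everything
    n_frames = 1 + (len(x) - frame_length) // frame_shift
    # Mean power of each rectangular-window frame.
    power = np.array([
        np.mean(x[i * frame_shift: i * frame_shift + frame_length] ** 2)
        for i in range(n_frames)
    ])
    mean_power = power.mean()
    if mean_power == 0:
        return np.ones(x.shape, dtype=bool)
    # A frame is "non-silent" if its power exceeds threshold * average frame power.
    frame_mask = power / mean_power > threshold
    # Expand each frame decision to frame_shift samples, then edge-pad to len(x).
    mask = np.repeat(frame_mask, frame_shift)[: len(x)]
    return np.pad(mask, (0, len(x) - len(mask)), mode="edge")


rng = np.random.default_rng(0)
signal = np.concatenate([np.zeros(4096), rng.standard_normal(4096)])
mask = simple_power_vad(signal)
print(mask.shape, mask[:3072].any(), mask[-1024:].all())  # (8192,) False True
```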
espnet2.train.abs_espnet_model
class espnet2.train.abs_espnet_model.AbsESPnetModel
Bases: torch.nn.modules.module.Module, abc.ABC
The common abstract class for all tasks.
An “ESPnetModel” is a class that inherits torch.nn.Module, holds the DNN models used in forward as its member fields (a.k.a. the delegate pattern), and defines “loss”, “stats”, and “weight” for the task.
If you intend to implement a new task in ESPnet, the model must inherit this class. In other words, the only “mediators” between our training system and your task class are these three values: loss, stats, and weight.
Example
>>> from espnet2.tasks.abs_task import AbsTask
>>> class YourESPnetModel(AbsESPnetModel):
...     def forward(self, input, input_lengths):
...         ...
...         return loss, stats, weight
>>> class YourTask(AbsTask):
...     @classmethod
...     def build_model(cls, args: argparse.Namespace) -> YourESPnetModel:
Initializes internal Module state, shared by both nn.Module and ScriptModule.
abstract forward(**batch) → Tuple[torch.Tensor, Dict[str, torch.Tensor], torch.Tensor]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
espnet2.train.uasr_trainer
Trainer module for GAN-based UASR training.
class espnet2.train.uasr_trainer.UASRTrainer
Bases: espnet2.train.trainer.Trainer
Trainer for GAN-based UASR training.
If you’d like to use this trainer, the model must inherit espnet2.train.abs_gan_espnet_model.AbsGANESPnetModel.
classmethod add_arguments(parser: argparse.ArgumentParser)
Add additional arguments for GAN-trainer.
classmethod build_options(args: argparse.Namespace) → espnet2.train.trainer.TrainerOptions
Build options consumed by train(), eval(), and plot_attention().
classmethod train_one_epoch(model: torch.nn.modules.module.Module, iterator: Iterable[Tuple[List[str], Dict[str, torch.Tensor]]], optimizers: Sequence[torch.optim.optimizer.Optimizer], schedulers: Sequence[Optional[espnet2.schedulers.abs_scheduler.AbsScheduler]], scaler: Optional[torch.cuda.amp.grad_scaler.GradScaler], reporter: espnet2.train.reporter.SubReporter, summary_writer, options: espnet2.train.uasr_trainer.UASRTrainerOptions, distributed_option: espnet2.train.distributed_utils.DistributedOption) → bool
Train one epoch for UASR.
classmethod validate_one_epoch(model: torch.nn.modules.module.Module, iterator: Iterable[Dict[str, torch.Tensor]], reporter: espnet2.train.reporter.SubReporter, options: espnet2.train.uasr_trainer.UASRTrainerOptions, distributed_option: espnet2.train.distributed_utils.DistributedOption) → None
Validate one epoch.
class espnet2.train.uasr_trainer.UASRTrainerOptions(ngpu: int, resume: bool, use_amp: bool, train_dtype: str, grad_noise: bool, accum_grad: int, grad_clip: float, grad_clip_type: float, log_interval: Optional[int], no_forward_run: bool, use_matplotlib: bool, use_tensorboard: bool, use_wandb: bool, output_dir: Union[pathlib.Path, str], max_epoch: int, seed: int, sharded_ddp: bool, patience: Optional[int], keep_nbest_models: Union[int, List[int]], nbest_averaging_interval: int, early_stopping_criterion: Sequence[str], best_model_criterion: Sequence[Sequence[str]], val_scheduler_criterion: Sequence[str], unused_parameters: bool, wandb_model_log_interval: int, create_graph_in_tensorboard: bool, generator_first: bool, max_num_warning: int)
Bases: espnet2.train.trainer.TrainerOptions
Trainer option dataclass for UASRTrainer.
espnet2.train.trainer
Trainer module.
class espnet2.train.trainer.Trainer
Bases: object
Trainer with a single optimizer.
If you’d like to use multiple optimizers, inherit this class and override the methods as necessary, at least “train_one_epoch()”:
>>> class TwoOptimizerTrainer(Trainer):
...     @classmethod
...     def add_arguments(cls, parser):
...         ...
...
...     @classmethod
...     def train_one_epoch(cls, model, optimizers, ...):
...         loss1 = model.model1(...)
...         loss1.backward()
...         optimizers[0].step()
...
...         loss2 = model.model2(...)
...         loss2.backward()
...         optimizers[1].step()
classmethod add_arguments(parser: argparse.ArgumentParser)
Reserved for future development of another Trainer.
classmethod build_options(args: argparse.Namespace) → espnet2.train.trainer.TrainerOptions
Build options consumed by train(), eval(), and plot_attention().
classmethod plot_attention(model: torch.nn.modules.module.Module, output_dir: Optional[pathlib.Path], summary_writer, iterator: Iterable[Tuple[List[str], Dict[str, torch.Tensor]]], reporter: espnet2.train.reporter.SubReporter, options: espnet2.train.trainer.TrainerOptions) → None
static resume(checkpoint: Union[str, pathlib.Path], model: torch.nn.modules.module.Module, reporter: espnet2.train.reporter.Reporter, optimizers: Sequence[torch.optim.optimizer.Optimizer], schedulers: Sequence[Optional[espnet2.schedulers.abs_scheduler.AbsScheduler]], scaler: Optional[torch.cuda.amp.grad_scaler.GradScaler], ngpu: int = 0)
classmethod run(model: espnet2.train.abs_espnet_model.AbsESPnetModel, optimizers: Sequence[torch.optim.optimizer.Optimizer], schedulers: Sequence[Optional[espnet2.schedulers.abs_scheduler.AbsScheduler]], train_iter_factory: espnet2.iterators.abs_iter_factory.AbsIterFactory, valid_iter_factory: espnet2.iterators.abs_iter_factory.AbsIterFactory, plot_attention_iter_factory: Optional[espnet2.iterators.abs_iter_factory.AbsIterFactory], trainer_options, distributed_option: espnet2.train.distributed_utils.DistributedOption) → None
Perform training. This method performs the main process of training.
classmethod train_one_epoch(model: torch.nn.modules.module.Module, iterator: Iterable[Tuple[List[str], Dict[str, torch.Tensor]]], optimizers: Sequence[torch.optim.optimizer.Optimizer], schedulers: Sequence[Optional[espnet2.schedulers.abs_scheduler.AbsScheduler]], scaler: Optional[torch.cuda.amp.grad_scaler.GradScaler], reporter: espnet2.train.reporter.SubReporter, summary_writer, options: espnet2.train.trainer.TrainerOptions, distributed_option: espnet2.train.distributed_utils.DistributedOption) → bool
class espnet2.train.trainer.TrainerOptions(ngpu: int, resume: bool, use_amp: bool, train_dtype: str, grad_noise: bool, accum_grad: int, grad_clip: float, grad_clip_type: float, log_interval: Union[int, NoneType], no_forward_run: bool, use_matplotlib: bool, use_tensorboard: bool, use_wandb: bool, output_dir: Union[pathlib.Path, str], max_epoch: int, seed: int, sharded_ddp: bool, patience: Union[int, NoneType], keep_nbest_models: Union[int, List[int]], nbest_averaging_interval: int, early_stopping_criterion: Sequence[str], best_model_criterion: Sequence[Sequence[str]], val_scheduler_criterion: Sequence[str], unused_parameters: bool, wandb_model_log_interval: int, create_graph_in_tensorboard: bool)
Bases: object
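TrainerOptions includes, among other fields, accum_grad and grad_clip. The NumPy sketch below shows one common reading of these two options, which is an assumption and not the Trainer code. Gradients from accum_grad consecutive mini-batches are summed into a single update. That update is rescaled when its global L2 norm exceeds grad_clip, similar in spirit to torch.nn.utils.clip_grad_norm_. Lists of arrays stand in for parameter gradients, and both helper names are hypothetical. A real loop typically also divides each loss by accum_grad so that the sum becomes an average; this sketch omits that scaling.

```python
from typing import Iterable, Iterator, List

import numpy as np


def clip_by_global_norm(grads: List[np.ndarray], max_norm: float) -> List[np.ndarray]:
    """Rescale gradients so that their joint L2 norm does not exceed max_norm."""
    total_norm = float(np.sqrt(sum(float(np.sum(g ** 2)) for g in grads)))
    if total_norm <= max_norm:
        return grads
    scale = max_norm / total_norm
    return [g * scale for g in grads]


def accumulated_updates(batch_grads: Iterable[List[np.ndarray]], accum_grad: int,
                        grad_clip: float) -> Iterator[List[np.ndarray]]:
    """Yield one clipped update every accum_grad mini-batches (hypothetical helper)."""
    acc = None
    for step, grads in enumerate(batch_grads, start=1):
        # Sum gradients across mini-batches until an optimizer step is due.
        acc = list(grads) if acc is None else [a + g for a, g in zip(acc, grads)]
        if step % accum_grad == 0:
            yield clip_by_global_norm(acc, grad_clip)
            acc = None  # start a fresh accumulation window


batches = [[np.array([3.0, 0.0])], [np.array([0.0, 4.0])]]
updates = list(accumulated_updates(batches, accum_grad=2, grad_clip=1.0))
print(np.round(updates[0][0], 3).tolist())  # [0.6, 0.8]
```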