espnet.asr package¶
Initialize sub package.
espnet.asr.__init__¶
Initialize sub package.
espnet.asr.asr_mix_utils¶
This script provides utility functions designed for multi-speaker ASR.
- Copyright 2017 Johns Hopkins University (Shinji Watanabe)
Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
- Most functions can be used directly, as in asr_utils.py:
CompareValueTrigger, restore_snapshot, adadelta_eps_decay, chainer_load, torch_snapshot, torch_save, torch_resume, AttributeDict, get_model_conf.
-
class
espnet.asr.asr_mix_utils.
PlotAttentionReport
(att_vis_fn, data, outdir, converter, device, reverse=False)[source]¶ Bases:
chainer.training.extension.Extension
Plot attention reporter.
- Parameters:
att_vis_fn (espnet.nets.*_backend.e2e_asr.calculate_all_attentions) – Function for attention visualization.
data (list[tuple(str, dict[str, dict[str, Any]])]) – List of json utterance key items.
outdir (str) – Directory to save figures.
converter (espnet.asr.*_backend.asr.CustomConverter) – CustomConverter object. Function to convert data.
device (torch.device) – The destination device to send tensors to.
reverse (bool) – If True, input and output lengths are reversed.
Initialize PlotAttentionReport.
-
draw_attention_plot
(att_w)[source]¶ Visualize attention weights matrix.
- Parameters:
att_w (Tensor) – Attention weight matrix.
- Returns:
pyplot object with attention matrix image.
- Return type:
matplotlib.pyplot
-
get_attention_weight
(idx, att_w, spkr_idx)[source]¶ Transform attention weight with regard to self.reverse.
-
get_attention_weights
()[source]¶ Return attention weights.
- Returns:
Attention weights (dtype=float). The shape depends on the backend:
pytorch -> 1) multi-head case: (B, H, Lmax, Tmax); 2) other case: (B, Lmax, Tmax).
chainer -> (B, Lmax, Tmax).
- Return type:
arr_ws_sd (numpy.ndarray)
-
espnet.asr.asr_mix_utils.
add_results_to_json
(js, nbest_hyps_sd, char_list)[source]¶ Add N-best results to json.
- Parameters:
js (dict[str, Any]) – Groundtruth utterance dict.
nbest_hyps_sd (list[dict[str, Any]]) – List of hypotheses for multiple speakers (# Utts x # Spkrs).
char_list (list[str]) – List of characters.
- Returns:
Utterance dict with N-best results added.
- Return type:
dict[str, Any]
espnet.asr.asr_utils¶
-
class
espnet.asr.asr_utils.
CompareValueTrigger
(key, compare_fn, trigger=(1, 'epoch'))[source]¶ Bases:
object
Trigger invoked when a key value gets bigger or smaller than before.
- Parameters:
key (str) – Key of value.
compare_fn ((float, float) -> bool) – Function to compare the values.
trigger (tuple(int, str)) – Trigger that decides the comparison interval.
-
class
espnet.asr.asr_utils.
PlotAttentionReport
(att_vis_fn, data, outdir, converter, transform, device, reverse=False, ikey='input', iaxis=0, okey='output', oaxis=0, subsampling_factor=1)[source]¶ Bases:
chainer.training.extension.Extension
Plot attention reporter.
- Parameters:
att_vis_fn (espnet.nets.*_backend.e2e_asr.E2E.calculate_all_attentions) – Function for attention visualization.
data (list[tuple(str, dict[str, list[Any]])]) – List of json utterance key items.
outdir (str) – Directory to save figures.
converter (espnet.asr.*_backend.asr.CustomConverter) – Function to convert data.
device (int | torch.device) – Device.
reverse (bool) – If True, input and output lengths are reversed.
ikey (str) – Key to access input (for ASR/ST ikey="input", for MT ikey="output").
iaxis (int) – Dimension to access input (for ASR/ST iaxis=0, for MT iaxis=1).
okey (str) – Key to access output (for ASR/ST okey="output", for MT okey="output").
oaxis (int) – Dimension to access output (for ASR/ST oaxis=0, for MT oaxis=0).
subsampling_factor (int) – Subsampling factor in the encoder.
-
draw_attention_plot
(att_w)[source]¶ Plot the att_w matrix.
- Returns:
pyplot object with attention matrix image.
- Return type:
matplotlib.pyplot
-
draw_han_plot
(att_w)[source]¶ Plot the att_w matrix for hierarchical attention.
- Returns:
pyplot object with attention matrix image.
- Return type:
matplotlib.pyplot
-
class
espnet.asr.asr_utils.
PlotCTCReport
(ctc_vis_fn, data, outdir, converter, transform, device, reverse=False, ikey='input', iaxis=0, okey='output', oaxis=0, subsampling_factor=1)[source]¶ Bases:
chainer.training.extension.Extension
Plot CTC reporter.
- Parameters:
ctc_vis_fn (espnet.nets.*_backend.e2e_asr.E2E.calculate_all_ctc_probs) – Function for CTC visualization.
data (list[tuple(str, dict[str, list[Any]])]) – List of json utterance key items.
outdir (str) – Directory to save figures.
converter (espnet.asr.*_backend.asr.CustomConverter) – Function to convert data.
device (int | torch.device) – Device.
reverse (bool) – If True, input and output lengths are reversed.
ikey (str) – Key to access input (for ASR/ST ikey="input", for MT ikey="output").
iaxis (int) – Dimension to access input (for ASR/ST iaxis=0, for MT iaxis=1).
okey (str) – Key to access output (for ASR/ST okey="output", for MT okey="output").
oaxis (int) – Dimension to access output (for ASR/ST oaxis=0, for MT oaxis=0).
subsampling_factor (int) – Subsampling factor in the encoder.
-
draw_ctc_plot
(ctc_prob)[source]¶ Plot the ctc_prob matrix.
- Returns:
pyplot object with CTC prob matrix image.
- Return type:
matplotlib.pyplot
-
espnet.asr.asr_utils.
adadelta_eps_decay
(eps_decay)[source]¶ Extension to perform adadelta eps decay.
- Parameters:
eps_decay (float) – Decay rate of eps.
- Returns:
An extension function.
-
espnet.asr.asr_utils.
adam_lr_decay
(eps_decay)[source]¶ Extension to perform Adam learning-rate decay.
- Parameters:
eps_decay (float) – Decay rate of the learning rate.
- Returns:
An extension function.
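As an illustration, both decay extensions are typically attached to the trainer with a CompareValueTrigger, rolling back to the best snapshot before decaying. The following is a hedged sketch based on the usual ESPnet training flow (the report key and the snapshot path are illustrative):

    from espnet.asr.asr_utils import (
        CompareValueTrigger,
        adadelta_eps_decay,
        restore_snapshot,
        torch_load,
    )

    # Decay Adadelta's eps whenever validation loss worsens, after first
    # restoring the best snapshot (path is illustrative).
    trainer.extend(
        restore_snapshot(model, "exp/train/results/model.loss.best", load_fn=torch_load),
        trigger=CompareValueTrigger(
            "validation/main/loss",
            lambda best_value, current_value: best_value < current_value,
        ),
    )
    trainer.extend(
        adadelta_eps_decay(0.01),
        trigger=CompareValueTrigger(
            "validation/main/loss",
            lambda best_value, current_value: best_value < current_value,
        ),
    )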
-
espnet.asr.asr_utils.
add_gradient_noise
(model, iteration, duration=100, eta=1.0, scale_factor=0.55)[source]¶ Adds noise from a standard normal distribution to the gradients.
The standard deviation (sigma) is controlled by the three hyper-parameters below. sigma goes to zero (no noise) with more iterations.
- Parameters:
model (torch.nn.Module) – Model.
iteration (int) – Number of iterations.
duration (int) – Interval length (in iterations) that controls how often sigma changes.
eta (float) – The magnitude of sigma.
scale_factor (float) – The scale of sigma.
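The schedule anneals sigma toward zero as training proceeds. A minimal sketch of the idea (the interval-based schedule is assumed from the parameters above; this is not the verbatim ESPnet code):

    import torch

    def add_gradient_noise_sketch(model, iteration, duration=100, eta=1.0, scale_factor=0.55):
        """Add annealed Gaussian noise to every gradient of the model."""
        interval = (iteration // duration) + 1  # grows stepwise every `duration` iterations
        sigma = eta / interval ** scale_factor  # sigma -> 0 as iterations increase
        for param in model.parameters():
            if param.grad is not None:
                param.grad += sigma * torch.randn_like(param.grad)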
-
espnet.asr.asr_utils.
add_results_to_json
(js, nbest_hyps, char_list)[source]¶ Add N-best results to json.
- Parameters:
js (dict[str, Any]) – Groundtruth utterance dict.
nbest_hyps (list[dict[str, Any]]) – List of hypotheses.
char_list (list[str]) – List of characters.
- Returns:
Utterance dict with N-best results added.
- Return type:
dict[str, Any]
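A typical decoding loop uses this helper to assemble the output json. A hedged sketch (load_feature is a hypothetical helper, and model.recognize follows the usual ESPnet E2E interface):

    import json
    from espnet.asr.asr_utils import add_results_to_json

    new_js = {}
    for name in js["utts"]:
        feat = load_feature(name)  # hypothetical feature loader
        nbest_hyps = model.recognize(feat, args, char_list, rnnlm)
        new_js[name] = add_results_to_json(js["utts"][name], nbest_hyps, char_list)

    with open("decode.json", "w") as f:
        json.dump({"utts": new_js}, f, indent=4, ensure_ascii=False, sort_keys=True)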
-
espnet.asr.asr_utils.
chainer_load
(path, model)[source]¶ Load chainer model parameters.
- Parameters:
path (str) – Model path or snapshot file path to be loaded.
model (chainer.Chain) – Chainer model.
-
espnet.asr.asr_utils.
format_mulenc_args
(args)[source]¶ Format args for multi-encoder setup.
It handles the following situations (when args.num_encs=2): 1. args.elayers = None -> args.elayers = [4, 4]; 2. args.elayers = 4 -> args.elayers = [4, 4]; 3. args.elayers = [4, 4, 4] -> args.elayers = [4, 4].
-
espnet.asr.asr_utils.
get_model_conf
(model_path, conf_path=None)[source]¶ Get model config information by reading a model config file (model.json).
- Parameters:
model_path (str) – Model path.
conf_path (str) – Optional model config path.
- Returns:
Config information loaded from json file.
- Return type:
list[int, int, dict[str, Any]]
-
espnet.asr.asr_utils.
parse_hypothesis
(hyp, char_list)[source]¶ Parse hypothesis.
- Parameters:
hyp (list[dict[str, Any]]) – Recognition hypothesis.
char_list (list[str]) – List of characters.
- Returns:
Recognized text, tokens, token IDs, and score.
- Return type:
tuple(str, str, str, float)
-
espnet.asr.asr_utils.
plot_spectrogram
(plt, spec, mode='db', fs=None, frame_shift=None, bottom=True, left=True, right=True, top=False, labelbottom=True, labelleft=True, labelright=True, labeltop=False, cmap='inferno')[source]¶ Plot spectrogram using matplotlib.
- Parameters:
plt (matplotlib.pyplot) – pyplot object.
spec (numpy.ndarray) – Input STFT (Freq, Time).
mode (str) – db or linear.
fs (int) – Sampling frequency, used to convert the y-axis to kHz.
frame_shift (int) – Frame shift of the STFT, used to convert the x-axis to seconds.
bottom, left, right, top (bool) – Whether to draw the respective ticks.
labelbottom, labelleft, labelright, labeltop (bool) – Whether to draw the respective tick labels.
cmap (str) – Colormap defined in matplotlib.
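A hedged usage sketch (the random array stands in for a real magnitude STFT; fs and frame_shift are illustrative):

    import matplotlib
    matplotlib.use("Agg")  # headless backend so the figure can be saved
    import matplotlib.pyplot as plt
    import numpy as np
    from espnet.asr.asr_utils import plot_spectrogram

    spec = np.abs(np.random.randn(257, 200))  # stand-in |STFT|, shape (Freq, Time)
    plot_spectrogram(plt, spec, mode="db", fs=16000, frame_shift=160)
    plt.savefig("spectrogram.png")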
-
espnet.asr.asr_utils.
restore_snapshot
(model, snapshot, load_fn=None)[source]¶ Extension to restore snapshot.
- Returns:
An extension function.
-
espnet.asr.asr_utils.
snapshot_object
(target, filename)[source]¶ Returns a trainer extension to take snapshots of a given object.
- Parameters:
target (model) – Object to serialize.
filename (str) – Name of the file into which the object is serialized. It can be a format string, where the trainer object is passed to the str.format method. For example, 'snapshot_{.updater.iteration}' is converted to 'snapshot_10000' at the 10,000th iteration.
- Returns:
An extension function.
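For example, ESPnet training scripts keep the best model by pairing this extension with a Chainer trigger; a sketch (the report key is illustrative):

    from chainer import training
    from espnet.asr.asr_utils import snapshot_object

    trainer.extend(
        snapshot_object(model, "model.loss.best"),
        trigger=training.triggers.MinValueTrigger("validation/main/loss"),
    )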
-
espnet.asr.asr_utils.
torch_load
(path, model)[source]¶ Load torch model states.
- Parameters:
path (str) – Model path or snapshot file path to be loaded.
model (torch.nn.Module) – Torch model.
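Together with get_model_conf, this supports the usual load-for-inference flow. A hedged sketch (the path and the build_model step are assumptions):

    from espnet.asr.asr_utils import get_model_conf, torch_load

    model_path = "exp/train/results/model.acc.best"
    idim, odim, train_args = get_model_conf(model_path)
    model = build_model(idim, odim, train_args)  # hypothetical model constructor
    torch_load(model_path, model)  # fills the model's parameters in place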
-
espnet.asr.asr_utils.
torch_resume
(snapshot_path, trainer)[source]¶ Resume from snapshot for pytorch.
- Parameters:
snapshot_path (str) – Snapshot file path.
trainer (chainer.training.Trainer) – Chainer’s trainer instance.
espnet.asr.chainer_backend.__init__¶
Initialize sub package.
espnet.asr.chainer_backend.asr¶
Training/decoding definition for the speech recognition task.
espnet.asr.pytorch_backend.asr_mix¶
This script is used for multi-speaker speech recognition.
- Copyright 2017 Johns Hopkins University (Shinji Watanabe)
Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
-
class
espnet.asr.pytorch_backend.asr_mix.
CustomConverter
(subsampling_factor=1, dtype=torch.float32, num_spkrs=2)[source]¶ Bases:
object
Custom batch converter for Pytorch.
- Parameters:
subsampling_factor (int) – The subsampling factor.
dtype (torch.dtype) – Data type to convert.
num_spkrs (int) – Number of speakers.
Initialize the converter.
espnet.asr.pytorch_backend.recog¶
V2 backend for asr_recog.py using espnet.nets.beam_search.BeamSearch.
-
espnet.asr.pytorch_backend.recog.
recog_v2
(args)[source]¶ Decode with custom models that implement ScorerInterface.
Notes
The previous backend espnet.asr.pytorch_backend.asr.recog only supports E2E and RNNLM.
- Parameters:
args (namespace) – The program arguments. See espnet.bin.asr_recog.get_parser for details.
espnet.asr.pytorch_backend.__init__¶
Initialize sub package.
espnet.asr.pytorch_backend.asr_init¶
Finetuning methods.
-
espnet.asr.pytorch_backend.asr_init.
create_transducer_compatible_state_dict
(model_state_dict, encoder_type, encoder_units)[source]¶ Create a compatible transducer model state dict for transfer learning.
If RNN encoder modules from a non-Transducer model are found in the pre-trained model state dict, the corresponding module keys are renamed for compatibility.
- Parameters:
model_state_dict (Dict) – Pre-trained model state dict
encoder_type (str) – Type of pre-trained encoder.
encoder_units (int) – Number of encoder units in pre-trained model.
- Returns:
Transducer compatible pre-trained model state dict.
- Return type:
new_state_dict (Dict)
-
espnet.asr.pytorch_backend.asr_init.
filter_modules
(model_state_dict, modules)[source]¶ Filter non-matched modules in model state dict.
- Parameters:
model_state_dict (Dict) – Pre-trained model state dict.
modules (List) – Specified module(s) to transfer.
- Returns:
Filtered module list.
- Return type:
new_mods (List)
-
espnet.asr.pytorch_backend.asr_init.
freeze_modules
(model, modules)[source]¶ Freeze model parameters according to modules list.
- Parameters:
model (torch.nn.Module) – Main model.
modules (List) – Specified module(s) to freeze.
- Returns:
Updated main model. model_params (filter): Filtered model parameters.
- Return type:
model (torch.nn.Module)
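A minimal sketch of the freezing pattern described above (prefix matching on parameter names is an assumption; this is not the verbatim ESPnet code):

    def freeze_modules_sketch(model, modules):
        """Disable gradients for parameters whose names match the given prefixes."""
        for name, param in model.named_parameters():
            if any(name.startswith(m) for m in modules):
                param.requires_grad = False
        # Hand only the still-trainable parameters to the optimizer.
        model_params = filter(lambda p: p.requires_grad, model.parameters())
        return model, model_params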
-
espnet.asr.pytorch_backend.asr_init.
get_lm_state_dict
(lm_state_dict)[source]¶ Create compatible ASR decoder state dict from LM state dict.
- Parameters:
lm_state_dict (Dict) – Pre-trained LM state dict.
- Returns:
State dict with compatible key names.
- Return type:
new_state_dict (Dict)
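A hedged sketch of the key-renaming idea (the prefix mapping below is hypothetical, not ESPnet's exact mapping):

    def rename_lm_keys_sketch(lm_state_dict):
        """Map LM parameter keys onto the ASR decoder's namespace."""
        new_state_dict = {}
        for key, value in lm_state_dict.items():
            # e.g. an LM embedding can initialize the decoder embedding
            new_key = key.replace("predictor.", "dec.")  # hypothetical prefix mapping
            new_state_dict[new_key] = value
        return new_state_dict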
-
espnet.asr.pytorch_backend.asr_init.
get_partial_state_dict
(model_state_dict, modules)[source]¶ Create state dict with specified modules matching input model modules.
- Parameters:
model_state_dict (Dict) – Pre-trained model state dict.
modules (Dict) – Specified module(s) to transfer.
- Returns:
State dict with specified modules weights.
- Return type:
new_state_dict (Dict)
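The selection itself amounts to prefix matching; a minimal sketch under that assumption:

    def get_partial_state_dict_sketch(model_state_dict, modules):
        """Keep only entries whose keys start with one of the specified prefixes."""
        return {
            key: value
            for key, value in model_state_dict.items()
            if any(key.startswith(m) for m in modules)
        }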
-
espnet.asr.pytorch_backend.asr_init.
get_trained_model_state_dict
(model_path, new_is_transducer)[source]¶ Extract the trained model state dict for pre-initialization.
- Parameters:
model_path (str) – Path to trained model.
new_is_transducer (bool) – Whether the new model is Transducer-based.
- Returns:
Trained model state dict.
- Return type:
(Dict)
-
espnet.asr.pytorch_backend.asr_init.
load_trained_model
(model_path, training=True)[source]¶ Load the trained model for recognition.
- Parameters:
model_path (str) – Path to model.***.best
training (bool) – Training mode specification for transducer model.
- Returns:
Trained model. train_args (Namespace): Trained model arguments.
- Return type:
model (torch.nn.Module)
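A hedged usage sketch (the path is illustrative):

    from espnet.asr.pytorch_backend.asr_init import load_trained_model

    model, train_args = load_trained_model("exp/train/results/model.acc.best", training=False)
    model.eval()  # switch to inference mode for recognition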
-
espnet.asr.pytorch_backend.asr_init.
load_trained_modules
(idim, odim, args, interface=<class 'espnet.nets.asr_interface.ASRInterface'>)[source]¶ Load ASR/MT/TTS model with pre-trained weights for specified modules.
- Parameters:
idim (int) – Input dimension.
odim (int) – Output dimension.
args (Namespace) – Model arguments.
interface (ASRInterface|MTInterface|TTSInterface) – Model interface.
- Returns:
Model with pre-initialized weights.
- Return type:
main_model (torch.nn.Module)
-
espnet.asr.pytorch_backend.asr_init.
transfer_verification
(model_state_dict, partial_state_dict, modules)[source]¶ Verify tuples (key, shape) for input model modules match specified modules.
- Parameters:
model_state_dict (Dict) – Main model state dict.
partial_state_dict (Dict) – Pre-trained model state dict.
modules (List) – Specified module(s) to transfer.
- Returns:
Whether transfer learning is allowed.
- Return type:
(bool)
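A minimal sketch of the check (assuming a straightforward (key, shape) comparison):

    def transfer_verification_sketch(model_state_dict, partial_state_dict, modules):
        """Allow transfer only if every (key, shape) pair of the partial dict
        also exists in the main model's state dict."""
        model_pairs = {(k, tuple(v.shape)) for k, v in model_state_dict.items()}
        partial_pairs = {(k, tuple(v.shape)) for k, v in partial_state_dict.items()}
        return partial_pairs <= model_pairs  # subset test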
espnet.asr.pytorch_backend.asr¶
Training/decoding definition for the speech recognition task.
-
class
espnet.asr.pytorch_backend.asr.
CustomConverter
(subsampling_factor=1, dtype=torch.float32)[source]¶ Bases:
object
Custom batch converter for Pytorch.
- Parameters:
subsampling_factor (int) – The subsampling factor.
dtype (torch.dtype) – Data type to convert.
Construct a CustomConverter object.
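A hedged sketch of how the converter is used (the feature dimension and token IDs are illustrative; the converter returns padded inputs, input lengths, and padded targets):

    import numpy as np
    import torch
    from espnet.asr.pytorch_backend.asr import CustomConverter

    converter = CustomConverter(subsampling_factor=1, dtype=torch.float32)
    device = torch.device("cpu")

    # One minibatch as produced by the training iterator: a single (inputs, targets)
    # pair holding two utterances of random features and token IDs.
    xs = [np.random.randn(100, 83).astype(np.float32),
          np.random.randn(80, 83).astype(np.float32)]
    ys = [np.array([1, 2, 3]), np.array([4, 5])]
    batch = [(xs, ys)]

    xs_pad, ilens, ys_pad = converter(batch, device)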
-
class
espnet.asr.pytorch_backend.asr.
CustomConverterMulEnc
(subsampling_factors=[1, 1], dtype=torch.float32)[source]¶ Bases:
object
Custom batch converter for Pytorch in multi-encoder case.
- Parameters:
subsampling_factors (list) – List of subsampling factors for each encoder.
dtype (torch.dtype) – Data type to convert.
Initialize the converter.
-
class
espnet.asr.pytorch_backend.asr.
CustomEvaluator
(model, iterator, target, device, ngpu=None, use_ddp=False)[source]¶ Bases:
espnet.utils.training.evaluator.BaseEvaluator
Custom Evaluator for Pytorch.
- Parameters:
model (torch.nn.Module) – The model to evaluate.
iterator (chainer.dataset.Iterator) – The train iterator.
target (link | dict[str, link]) – Link object or a dictionary of links to evaluate. If this is just a link object, the link is registered by the name 'main'.
device (torch.device) – The device used.
ngpu (int) – The number of GPUs.
use_ddp (bool) – The flag to use DDP.
-
class
espnet.asr.pytorch_backend.asr.
CustomUpdater
(model, grad_clip_threshold, train_iter, optimizer, device, ngpu, grad_noise=False, accum_grad=1, use_apex=False, use_ddp=False)[source]¶ Bases:
chainer.training.updaters.standard_updater.StandardUpdater
Custom Updater for Pytorch.
- Parameters:
model (torch.nn.Module) – The model to update.
grad_clip_threshold (float) – The gradient clipping value to use.
train_iter (chainer.dataset.Iterator) – The training iterator.
optimizer (torch.optim.optimizer) – The training optimizer.
device (torch.device) – The device to use.
ngpu (int) – The number of gpus to use.
grad_noise (bool) – Whether to add gradient noise during backprop.
accum_grad (int) – The number of gradient accumulation steps.
use_apex (bool) – The flag to use Apex in backprop.
use_ddp (bool) – The flag to use DDP for multi-GPU training.
-
class
espnet.asr.pytorch_backend.asr.
DistributedDictSummary
(device=None)[source]¶ Bases:
object
Distributed version of DictSummary.
This implementation is based on the official Chainer implementation: https://github.com/chainer/chainer/blob/v6.7.0/chainer/reporter.py
To gather statistics from all processes and compute exact mean values, this class runs an AllReduce operation in compute_mean().
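A hedged sketch of the exact-mean idea behind compute_mean() (assumes an initialized torch.distributed process group; not the verbatim implementation):

    import torch
    import torch.distributed as dist

    def distributed_mean_sketch(local_sum, local_count, device):
        """Compute the exact global mean by all-reducing (sum, count) pairs."""
        stats = torch.tensor([local_sum, local_count], dtype=torch.float64, device=device)
        dist.all_reduce(stats)  # element-wise sum across all processes
        return (stats[0] / stats[1]).item()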
-
espnet.asr.pytorch_backend.asr.
enhance
(args)[source]¶ Dump enhanced speech and masks.
- Parameters:
args (namespace) – The program arguments.