espnet.bin package¶
Initialize sub package.
espnet.bin.asr_train¶
Automatic speech recognition model training script.
espnet.bin.tts_train¶
Text-to-speech model training script.
espnet.bin.mt_train¶
Neural machine translation model training script.
espnet.bin.vc_train¶
Voice conversion model training script.
espnet.bin.st_train¶
End-to-end speech translation model training script.
espnet.bin.mt_trans¶
Neural machine translation model decoding script.
espnet.bin.vc_decode¶
Voice conversion (VC) model decoding script.
espnet.bin.asr_align¶
This program performs CTC segmentation to align utterances within audio files.
- Inputs:
- --data-json:
A JSON file containing the list of utterances and audio files.
- --model:
An already trained ASR model.
- Output:
- --output:
A plain segments file with utterance positions in the audio files.
- Selected parameters:
- --min-window-size:
Minimum window size considered for a single utterance. The current default value should be OK in most cases. Larger values might give better results; values that are too large cause IndexErrors.
- --subsampling-factor:
If the encoder sub-samples its input, the number of frames at the CTC layer is reduced by this factor.
- --frame-duration:
The non-overlapping duration of a single frame in milliseconds (the inverse of frames per millisecond).
- --set-blank:
In the rare case that the blank token does not have index 0 in the character dictionary, this parameter sets the index of the blank token.
- --gratis-blank:
Sets the transition cost for blank tokens to zero. Useful if there are longer unrelated segments between segments.
- --replace-spaces-with-blanks:
Spaces are replaced with blanks. Helps to model pauses between words. May increase the length of the ground truth. May lead to misaligned segments when combined with the option --gratis-blank.
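As an illustration of how --subsampling-factor and --frame-duration relate a position at the CTC layer to a position in the audio, the following is a minimal sketch. The function name `ctc_index_to_seconds` and the example values are assumptions for illustration and are not part of the script; the exact offsets the script computes may differ.

```python
def ctc_index_to_seconds(index, subsampling_factor, frame_duration_ms):
    """Convert a CTC-layer frame index to a time offset in seconds.

    Hypothetical helper, not part of espnet.bin.asr_align.
    """
    # Each CTC-layer frame stands for `subsampling_factor` input frames,
    # and each input frame advances time by `frame_duration_ms` milliseconds.
    return index * subsampling_factor * frame_duration_ms / 1000.0


# Illustrative values: encoder subsamples by 4, input frames are 10 ms apart.
offset = ctc_index_to_seconds(250, subsampling_factor=4, frame_duration_ms=10)
print(offset)
```

With these values, one CTC-layer frame corresponds to 40 ms of audio, so a wrong subsampling factor or frame duration scales every reported utterance position by the same ratio.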
espnet.bin.asr_align.ctc_align(args, device)¶
ESPnet-specific interface for CTC segmentation.
Parses configuration, infers the CTC posterior probabilities, and then aligns start and end of utterances using CTC segmentation. Results are written to the output file given in the args.
- Parameters:
args – the given configuration
device – device used for inference; one of ['cuda', 'cpu']
- Returns:
0 on success
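A caller could wrap this interface as sketched below. The helpers `select_device` and `run_alignment` are hypothetical names introduced here; the sketch only assumes the documented signature (args, device) and the documented return value of 0 on success, and takes the aligner as a parameter so that it runs without ESPnet installed.

```python
def select_device(ngpu):
    """Hypothetical helper: map a GPU count to one of 'cuda' or 'cpu'."""
    return "cuda" if ngpu > 0 else "cpu"


def run_alignment(args, ngpu, align_fn):
    """Call an aligner with the (args, device) signature and check its status.

    `align_fn` is expected to return 0 on success, as documented for ctc_align.
    """
    device = select_device(ngpu)
    status = align_fn(args, device)
    if status != 0:
        raise RuntimeError(f"alignment failed with status {status}")
    return device
```

In practice, `align_fn` would be `espnet.bin.asr_align.ctc_align` and `args` the parsed configuration; how that configuration is built is left out here.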
espnet.bin.st_trans¶
End-to-end speech translation model decoding script.
espnet.bin.tts_decode¶
Text-to-speech (TTS) model decoding script.
espnet.bin.lm_train¶
Language model training script.
espnet.bin.asr_enhance¶
espnet.bin.__init__¶
Initialize sub package.
espnet.bin.asr_recog¶
End-to-end speech recognition model decoding script.