espnet.bin package

Initialize sub package.

espnet.bin.asr_train

Automatic speech recognition model training script.

espnet.bin.asr_train.get_parser(parser=None, required=True)[source]

Get default arguments.
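The signature `get_parser(parser=None, required=True)` suggests a common argparse pattern: the caller may pass in an existing parser to extend, and `required` toggles whether mandatory options are enforced. The following is a minimal standalone sketch of that pattern using only the standard library. It is not the ESPnet implementation; the option names and the description string are placeholders chosen for illustration.

```python
import argparse


def get_parser(parser=None, required=True):
    """Return a parser, creating one only if the caller did not pass one."""
    if parser is None:
        parser = argparse.ArgumentParser(description="Hypothetical training script")
    # Placeholder options; the real script defines its own, much larger set.
    parser.add_argument("--outdir", type=str, required=required, help="Output directory")
    parser.add_argument("--verbose", "-V", type=int, default=0, help="Verbosity level")
    return parser


def main(cmd_args):
    """Parse a list of command-line strings and return the resulting namespace."""
    parser = get_parser()
    # parse_known_args tolerates extra options that this sketch does not define.
    args, _ = parser.parse_known_args(cmd_args)
    return args


args = main(["--outdir", "exp/demo", "--verbose", "1"])
print(args.outdir, args.verbose)
```

Passing `required=False` is handy when another tool wants to reuse the option definitions (for example to render help text) without supplying every mandatory value.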

espnet.bin.asr_train.main(cmd_args)[source]

Run the main training function.

espnet.bin.asr_train.setup_logging(verbose)[source]

Make logging setup with a given log level.
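As a rough illustration of what a verbosity-driven logging setup can look like, the sketch below maps an integer `verbose` value to a level of the standard `logging` module. The mapping (positive values give INFO, otherwise WARNING), the helper `level_for`, and the format string are assumptions made for this example, not a description of the actual function body.

```python
import logging


def level_for(verbose):
    """Map an integer verbosity to a logging level (assumed mapping)."""
    return logging.INFO if verbose > 0 else logging.WARNING


def setup_logging(verbose):
    """Configure the root logger; force=True replaces any existing handlers."""
    logging.basicConfig(
        level=level_for(verbose),
        format="%(asctime)s (%(module)s:%(lineno)d) %(levelname)s: %(message)s",
        force=True,  # available since Python 3.8
    )


setup_logging(1)
logging.info("training started")
```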

espnet.bin.tts_train

Text-to-speech model training script.

espnet.bin.tts_train.get_parser()[source]

Get parser of training arguments.

espnet.bin.tts_train.main(cmd_args)[source]

Run training.

espnet.bin.mt_train

Neural machine translation model training script.

espnet.bin.mt_train.get_parser(parser=None, required=True)[source]

Get default arguments.

espnet.bin.mt_train.main(cmd_args)[source]

Run the main training function.

espnet.bin.vc_train

Voice conversion model training script.

espnet.bin.vc_train.get_parser()[source]

Get parser of training arguments.

espnet.bin.vc_train.main(cmd_args)[source]

Run training.

espnet.bin.st_train

End-to-end speech translation model training script.

espnet.bin.st_train.get_parser(parser=None, required=True)[source]

Get default arguments.

espnet.bin.st_train.main(cmd_args)[source]

Run the main training function.

espnet.bin.mt_trans

Neural machine translation model decoding script.

espnet.bin.mt_trans.get_parser()[source]

Get default arguments.

espnet.bin.mt_trans.main(args)[source]

Run the main decoding function.

espnet.bin.vc_decode

VC decoding script.

espnet.bin.vc_decode.get_parser()[source]

Get parser of decoding arguments.

espnet.bin.vc_decode.main(args)[source]

Run decoding.

espnet.bin.asr_align

This program performs CTC segmentation to align utterances within audio files.

Inputs:
--data-json:

A JSON file containing the list of utterances and audio files

--model:

An already trained ASR model

Output:
--output:

A plain segments file with utterance positions in the audio files.

Selected parameters:
--min-window-size:

Minimum window size considered for a single utterance. The current default value should be OK in most cases. Larger values might give better results; values that are too large cause IndexErrors.

--subsampling-factor:

If the encoder sub-samples its input, the number of frames at the CTC layer is reduced by this factor.

--frame-duration:

This is the non-overlapping duration of a single frame in milliseconds (the inverse of frames per millisecond).

--set-blank:

In the rare case that the blank token does not have index 0 in the character dictionary, this parameter sets the index of the blank token.

--gratis-blank:

Sets the transition cost for blank tokens to zero. Useful if there are longer unrelated segments between segments.

--replace-spaces-with-blanks:

Spaces are replaced with blanks. Helps to model pauses between words. May increase the length of the ground truth. May lead to misaligned segments when combined with the option --gratis-blank.
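The subsampling factor and the frame duration together determine how much audio one index at the CTC layer covers. The sketch below works through that arithmetic. The function names are hypothetical, the values 10 ms and 4 are illustrative rather than the script's defaults, and the floor division is an assumption, since the exact rounding depends on the encoder.

```python
def num_ctc_frames(num_input_frames, subsampling_factor):
    """Approximate number of frames left at the CTC layer after sub-sampling."""
    return num_input_frames // subsampling_factor


def ctc_index_duration_ms(frame_duration_ms, subsampling_factor):
    """Audio time covered by one CTC index, in milliseconds."""
    return frame_duration_ms * subsampling_factor


def ctc_index_to_seconds(index, frame_duration_ms, subsampling_factor):
    """Convert a CTC index into a position in the audio, in seconds."""
    return index * ctc_index_duration_ms(frame_duration_ms, subsampling_factor) / 1000.0


# Illustrative values: 10 ms input frames, encoder sub-sampling by a factor of 4.
frames = num_ctc_frames(1000, 4)               # 1000 input frames -> 250 CTC frames
step_ms = ctc_index_duration_ms(10, 4)         # each CTC index spans 40 ms
end_s = ctc_index_to_seconds(frames, 10, 4)    # 250 * 40 ms = 10.0 s
print(frames, step_ms, end_s)
```

If either value passed to the script does not match the model, the reported utterance positions are scaled by the corresponding ratio, so both are worth checking against the model configuration.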

espnet.bin.asr_align.ctc_align(args, device)[source]

ESPnet-specific interface for CTC segmentation.

Parses configuration, infers the CTC posterior probabilities, and then aligns start and end of utterances using CTC segmentation. Results are written to the output file given in the args.

Parameters:
  • args – given configuration

  • device – for inference; one of [‘cuda’, ‘cpu’]

Returns:

0 on success

espnet.bin.asr_align.get_parser()[source]

Get default arguments.

espnet.bin.asr_align.main(args)[source]

Run the main decoding function.

espnet.bin.st_trans

End-to-end speech translation model decoding script.

espnet.bin.st_trans.get_parser()[source]

Get default arguments.

espnet.bin.st_trans.main(args)[source]

Run the main decoding function.

espnet.bin.tts_decode

TTS decoding script.

espnet.bin.tts_decode.get_parser()[source]

Get parser of decoding arguments.

espnet.bin.tts_decode.main(args)[source]

Run decoding.

espnet.bin.lm_train

Language model training script.

espnet.bin.lm_train.get_parser(parser=None, required=True)[source]

Get parser.

espnet.bin.lm_train.main(cmd_args)[source]

Train LM.

espnet.bin.asr_enhance

espnet.bin.asr_enhance.get_parser()[source]

espnet.bin.asr_enhance.main(args)[source]

espnet.bin.__init__

Initialize sub package.

espnet.bin.asr_recog

End-to-end speech recognition model decoding script.

espnet.bin.asr_recog.get_parser()[source]

Get default arguments.

espnet.bin.asr_recog.main(args)[source]

Run the main decoding function.