Python utility tools
ESPnet provides several command-line Python tools under utils/:
addjson.py: add multiple json values to an input or output value
apply-cmvn.py: apply mean-variance normalization to files
average_checkpoints.py: average models from snapshot
calculate_rtf.py: calculate real time factor (RTF)
change_yaml.py: change specified attributes of a YAML file
compute-cmvn-stats.py: compute cepstral mean and variance normalization statistics. If a wspecifier is provided, statistics are computed per utterance by default, or per speaker if the spk2utt option is provided; if a wxfilename is provided, global statistics are computed.
compute-fbank-feats.py: compute FBANK feature from WAV
compute-stft-feats.py: compute STFT feature from WAV
concat_json_multiref.py: concatenate multiple json files for data augmentation
concatjson.py: concatenate json files
convert_fbank_to_wav.py: convert FBANK to WAV using Griffin-Lim algorithm
copy-feats.py: copy feature with preprocessing
dump-pcm.py: dump PCM files from a WAV scp file
eval-source-separation.py: evaluate enhanced speech, e.g. eval-source-separation.py --ref ref.scp --enh enh.scp --outdir outputdir or eval-source-separation.py --ref ref.scp ref2.scp --enh enh.scp enh2.scp --outdir outputdir
eval_perm_free_error.py: evaluate permutation-free error
feat-to-shape.py: convert feature to its shape
feats2npy.py: convert Kaldi-style features to NumPy arrays
filt.py: filter words in a text file
generate_wav_from_fbank.py: generate wav from FBANK using wavenet vocoder
get_yaml.py: get a specified attribute from a YAML file
json2sctm.py: convert json to sctm
json2text.py: convert ASR recognized json to text
json2trn.py: convert a json to a transcription file with a token dictionary
json2trn_mt.py: convert json to machine translation transcription
json2trn_wo_dict.py: convert a json to a transcription file without a token dictionary
make_pair_json.py: Merge source and target data.json files into one json file.
mcd_calculate.py: calculate mel-cepstral distortion (MCD)
merge_scp2json.py: given file paths in the format <key>:<file>:<type>, where <type> can be omitted and defaults to “str”, e.g. merge_scp2json.py --input-scps feat:data/feats.scp shape:data/utt2feat_shape:shape --input-scps feat:data/feats2.scp shape:data/utt2feat2_shape:shape --output-scps text:data/text shape:data/utt2text_shape:shape --scps utt2spk:data/utt2spk
mergejson.py: merge json files
mix-mono-wav-scp.py: mix mono wav.scp files into a multi-channel wav.scp using sox
result2json.py: convert sclite’s result.txt file to json
score_lang_id.py: language identification scoring
scp2json.py: convert scp to json
splitjson.py: split a json file for parallel processing
text2token.py: convert raw text to tokenized text
text2vocabulary.py: create a vocabulary file from text files
trim_silence.py: trim silence with simple power thresholding and make a segments file
trn2ctm.py: convert trn to ctm
trn2stm.py: convert trn to stm
addjson.py
add multiple json values to an input or output value
usage: addjson.py [-h] [-i IS_INPUT] [--verbose VERBOSE] jsons [jsons ...]
Positional Arguments
- jsons
json files
Named Arguments
- -i, --is-input
If true, add to input. If false, add to output
Default: True
- --verbose, -V
Verbose option
Default: 0
apply-cmvn.py
apply mean-variance normalization to files
usage: apply-cmvn.py [-h] [--verbose VERBOSE]
[--in-filetype {mat,hdf5,sound.hdf5,sound}]
[--stats-filetype {mat,hdf5,npy}]
[--out-filetype {mat,hdf5}] [--norm-means NORM_MEANS]
[--norm-vars NORM_VARS] [--reverse REVERSE]
[--spk2utt SPK2UTT] [--utt2spk UTT2SPK]
[--write-num-frames WRITE_NUM_FRAMES]
[--compress COMPRESS]
[--compression-method COMPRESSION_METHOD]
stats_rspecifier_or_rxfilename rspecifier wspecifier
Positional Arguments
- stats_rspecifier_or_rxfilename
Input stats. e.g. ark:stats.ark or stats.mat
- rspecifier
Read specifier id. e.g. ark:some.ark
- wspecifier
Write specifier id. e.g. ark:some.ark
Named Arguments
- --verbose, -V
Verbose option
Default: 0
- --in-filetype
Possible choices: mat, hdf5, sound.hdf5, sound
Specify the file format for the rspecifier. “mat” is the matrix format in kaldi
Default: “mat”
- --stats-filetype
Possible choices: mat, hdf5, npy
Specify the file format for the input stats. “mat” is the matrix format in kaldi
Default: “mat”
- --out-filetype
Possible choices: mat, hdf5
Specify the file format for the wspecifier. “mat” is the matrix format in kaldi
Default: “mat”
- --norm-means
Do mean normalization or not.
Default: True
- --norm-vars
Do variance normalization or not.
Default: False
- --reverse
Do reverse mode or not
Default: False
- --spk2utt
A text file of speaker to utterance-list map. (Don’t give rspecifier format, such as “ark:spk2utt”)
- --utt2spk
A text file of utterance to speaker map. (Don’t give rspecifier format, such as “ark:utt2spk”)
- --write-num-frames
Specify wspecifier for utt2num_frames
- --compress
Save in compressed format
Default: False
- --compression-method
Specify the method (if mat) or gzip level (if hdf5)
Default: 2
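The effect of --norm-means and --norm-vars can be illustrated with a minimal Python sketch. It is not the tool's implementation: it assumes Kaldi-style accumulated stats (a 2 x (D+1) matrix whose first row holds per-dimension sums with the frame count in the last column, and whose second row holds per-dimension sums of squares), and the helper name apply_cmvn is hypothetical. --reverse and file I/O are not modeled.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def apply_cmvn(
    frames: Sequence[Sequence[float]],
    stats: Sequence[Sequence[float]],
    norm_means: bool = True,
    norm_vars: bool = False,
    eps: float = 1e-20,
) -> Matrix:
    """Normalize frames with Kaldi-style CMVN stats (hypothetical helper)."""
    dim = len(stats[0]) - 1
    count = stats[0][dim]  # frame count is stored in the last column of row 0
    mean = [stats[0][d] / count for d in range(dim)]
    # Variance = E[x^2] - E[x]^2, floored to avoid division by zero.
    var = [max(stats[1][d] / count - mean[d] ** 2, eps) for d in range(dim)]
    std = [math.sqrt(v) for v in var]

    normalized: Matrix = []
    for frame in frames:
        row = list(frame)
        if norm_means:
            row = [x - m for x, m in zip(row, mean)]
        if norm_vars:
            row = [x / s for x, s in zip(row, std)]
        normalized.append(row)
    return normalized


stats = [[4.0, 8.0, 2.0], [10.0, 40.0, 0.0]]  # stats of [[1, 2], [3, 6]]
print(apply_cmvn([[1.0, 2.0], [3.0, 6.0]], stats, norm_vars=True))
# [[-1.0, -1.0], [1.0, 1.0]]
```

With norm_vars=True each dimension ends up with zero mean and unit variance over the frames the stats were accumulated from.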
average_checkpoints.py
average models from snapshots
usage: average_checkpoints.py [-h] --snapshots SNAPSHOTS [SNAPSHOTS ...]
--out OUT [--num NUM] [--backend BACKEND]
[--log [LOG]]
[--metric [{acc,bleu,cer,cer_ctc,loss,perplexity}]]
[--max-epoch [MAX_EPOCH]]
Named Arguments
- --snapshots
- --out
- --num
Default: 10
- --backend
Default: “chainer”
- --log
- --metric
Possible choices: acc, bleu, cer, cer_ctc, loss, perplexity
Default: “”
- --max-epoch
Default: 10000000
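The options above carry no help text. As a rough illustration of checkpoint averaging in general, the sketch below ranks snapshots by a validation score and takes the element-wise mean of their parameters. It assumes parameters are plain lists of floats rather than framework tensors, and that lower is better for loss, perplexity and error rates while higher is better for acc and bleu; select_best and average_params are hypothetical helpers, not this script's functions.

```python
from typing import Dict, List, Sequence

Params = Dict[str, List[float]]


def select_best(scores: Dict[str, float], num: int, higher_is_better: bool) -> List[str]:
    """Return the names of the `num` best snapshots by validation score."""
    ranked = sorted(scores, key=scores.__getitem__, reverse=higher_is_better)
    return ranked[:num]


def average_params(snapshots: Sequence[Params]) -> Params:
    """Element-wise mean of each named parameter across snapshots."""
    n = len(snapshots)
    return {
        name: [sum(values) / n for values in zip(*(snap[name] for snap in snapshots))]
        for name in snapshots[0]
    }


# Example: keep the two snapshots with the highest accuracy, then average them.
best = select_best({"snapshot.ep.1": 0.90, "snapshot.ep.2": 0.95, "snapshot.ep.3": 0.80}, 2, True)
print(best)  # ['snapshot.ep.2', 'snapshot.ep.1']
print(average_params([{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]))  # {'w': [2.0, 3.0]}
```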
calculate_rtf.py
calculate real time factor (RTF)
usage: calculate_rtf.py [-h] [--log-dir LOG_DIR]
[--log-name {decode,asr_inference}]
[--input-shift INPUT_SHIFT]
[--start-times-marker {input lengths,speech length}]
[--end-times-marker {prediction,best hypo}]
[--inf-num INF_NUM]
Named Arguments
- --log-dir
path to logging directory
- --log-name
Possible choices: decode, asr_inference
name of logfile, e.g., ‘decode’ (espnet1) and ‘asr_inference’ (espnet2)
Default: “decode”
- --input-shift
shift of inputs in milliseconds
Default: 10.0
- --start-times-marker
Possible choices: input lengths, speech length
String marking start of decoding in logfile, e.g., ‘input lengths’ (espnet1) and ‘speech length’ (espnet2)
Default: “input lengths”
- --end-times-marker
Possible choices: prediction, best hypo
String marking end of decoding in logfile, e.g., ‘prediction’ (espnet1) and ‘best hypo’ (espnet2)
Default: “prediction”
- --inf-num
number of inference hypotheses for each utterance, e.g., >1 in multi-speaker ASR
Default: 1
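The real time factor is decoding time divided by the duration of the decoded audio. A minimal sketch, assuming the audio duration is recovered from input frame counts multiplied by --input-shift (in milliseconds); real_time_factor is a hypothetical helper, and log parsing is omitted.

```python
from typing import Sequence


def real_time_factor(
    decode_seconds: Sequence[float],
    input_frames: Sequence[int],
    input_shift_ms: float = 10.0,
) -> float:
    """RTF = total decoding time / total audio duration (hypothetical helper)."""
    audio_seconds = sum(input_frames) * input_shift_ms / 1000.0
    return sum(decode_seconds) / audio_seconds


# Two utterances of 500 frames each at a 10 ms shift = 10 s of audio,
# decoded in 2 s + 3 s.
print(real_time_factor([2.0, 3.0], [500, 500]))  # 0.5
```

An RTF below 1 means decoding runs faster than real time.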
change_yaml.py
change specified attributes of a YAML file
usage: change_yaml.py [-h] [-o OUTYAML | --outdir OUTDIR] [-a ARG] [-d DELETE]
[inyaml]
Positional Arguments
- inyaml
Named Arguments
- -o, --outyaml
- --outdir
- -a, --arg
e.g., -a a.b.c=4 -> {‘a’: {‘b’: {‘c’: 4}}}
Default: []
- -d, --delete
e.g., -d a -> “a” is removed from the input yaml
Default: []
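The dotted-key behavior of -a and -d can be sketched on a plain nested dict. This assumes values are parsed as YAML scalars; the sketch only approximates that with int, float, or string conversion, and set_dotted and delete_dotted are hypothetical helpers.

```python
from typing import Any, Dict


def _parse_scalar(raw: str) -> Any:
    """Rough stand-in for YAML scalar parsing: int, then float, else string."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def set_dotted(cfg: Dict[str, Any], assignment: str) -> Dict[str, Any]:
    """Apply an '-a a.b.c=4' style assignment to a nested dict in place."""
    key, _, raw = assignment.partition("=")
    *parents, leaf = key.split(".")
    node = cfg
    for name in parents:
        node = node.setdefault(name, {})  # create intermediate levels as needed
    node[leaf] = _parse_scalar(raw)
    return cfg


def delete_dotted(cfg: Dict[str, Any], key: str) -> Dict[str, Any]:
    """Apply a '-d a' style deletion; missing keys are ignored."""
    *parents, leaf = key.split(".")
    node = cfg
    for name in parents:
        node = node.get(name, {})
    node.pop(leaf, None)
    return cfg


print(set_dotted({}, "a.b.c=4"))             # {'a': {'b': {'c': 4}}}
print(delete_dotted({"a": 1, "b": 2}, "a"))  # {'b': 2}
```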
compute-cmvn-stats.py
Compute cepstral mean and variance normalization statistics. If a wspecifier is provided, statistics are computed per utterance by default, or per speaker if the spk2utt option is provided; if a wxfilename is provided, global statistics are computed.
usage: compute-cmvn-stats.py [-h] [--spk2utt SPK2UTT] [--verbose VERBOSE]
[--in-filetype {mat,hdf5,sound.hdf5,sound}]
[--out-filetype {mat,hdf5,npy}]
[--preprocess-conf PREPROCESS_CONF]
rspecifier wspecifier_or_wxfilename
Positional Arguments
- rspecifier
Read specifier for feats. e.g. ark:some.ark
- wspecifier_or_wxfilename
Write specifier. e.g. ark:some.ark
Named Arguments
- --spk2utt
A text file of speaker to utterance-list map. (Don’t give rspecifier format, such as “ark:spk2utt”)
- --verbose, -V
Verbose option
Default: 0
- --in-filetype
Possible choices: mat, hdf5, sound.hdf5, sound
Specify the file format for the rspecifier. “mat” is the matrix format in kaldi
Default: “mat”
- --out-filetype
Possible choices: mat, hdf5, npy
Specify the file format for the wspecifier. “mat” is the matrix format in kaldi
Default: “mat”
- --preprocess-conf
The configuration file for the pre-processing
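A sketch of how such stats might be accumulated, under an assumed Kaldi layout (row 0: per-dimension sums plus the frame count; row 1: per-dimension sums of squares). Per-utterance mode would call accumulate_cmvn_stats on each utterance, global mode on all frames at once, and per_speaker_stats pools frames according to a spk2utt map. Both helper names are hypothetical.

```python
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def accumulate_cmvn_stats(frames: Sequence[Sequence[float]]) -> Matrix:
    """Row 0: per-dimension sums + frame count; row 1: per-dimension sums of squares."""
    dim = len(frames[0])
    stats = [[0.0] * (dim + 1), [0.0] * (dim + 1)]
    for frame in frames:
        for d, x in enumerate(frame):
            stats[0][d] += x
            stats[1][d] += x * x
        stats[0][dim] += 1.0  # frame count
    return stats


def per_speaker_stats(
    feats: Dict[str, Sequence[Sequence[float]]],
    spk2utt: Dict[str, List[str]],
) -> Dict[str, Matrix]:
    """Pool all frames of each speaker's utterances, mirroring the --spk2utt mode."""
    return {
        spk: accumulate_cmvn_stats([frame for utt in utts for frame in feats[utt]])
        for spk, utts in spk2utt.items()
    }


feats = {"utt1": [[1.0, 2.0]], "utt2": [[3.0, 6.0]]}
print(per_speaker_stats(feats, {"spk1": ["utt1", "utt2"]}))
# {'spk1': [[4.0, 8.0, 2.0], [10.0, 40.0, 0.0]]}
```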
compute-fbank-feats.py
compute FBANK feature from WAV
usage: compute-fbank-feats.py [-h] [--fs FS] [--fmax [FMAX]] [--fmin [FMIN]]
[--n_mels N_MELS] [--n_fft N_FFT]
[--n_shift N_SHIFT] [--win_length [WIN_LENGTH]]
[--window {hann,hamming}]
[--write-num-frames WRITE_NUM_FRAMES]
[--filetype {mat,hdf5}] [--compress COMPRESS]
[--compression-method COMPRESSION_METHOD]
[--verbose VERBOSE] [--normalize {1,16,24,32}]
[--segments SEGMENTS]
rspecifier wspecifier
Positional Arguments
- rspecifier
WAV scp file
- wspecifier
Write specifier
Named Arguments
- --fs
Sampling frequency
- --fmax
Maximum frequency
- --fmin
Minimum frequency
- --n_mels
Number of mel filterbank channels
Default: 80
- --n_fft
FFT length in points
Default: 1024
- --n_shift
Shift length in points
Default: 512
- --win_length
Analysis window length in points
- --window
Possible choices: hann, hamming
Type of window
Default: “hann”
- --write-num-frames
Specify wspecifier for utt2num_frames
- --filetype
Possible choices: mat, hdf5
Specify the file format for output. “mat” is the matrix format in kaldi
Default: “mat”
- --compress
Save in compressed format
Default: False
- --compression-method
Specify the method (if mat) or gzip level (if hdf5)
Default: 2
- --verbose, -V
Verbose option
Default: 0
- --normalize
Possible choices: 1, 16, 24, 32
Bit depth of the PCM; if given, the data is normalized to the range [-1, 1]
- --segments
Segments file. Each line has the format <segment-id> <recording-id> <start-time> <end-time>, e.g. call-861225-A-0050-0065 call-861225-A 5.0 6.5
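Because --n_fft, --n_shift and --win_length are given in samples (points), their duration depends on --fs. The sketch converts points to milliseconds and estimates the number of output frames, assuming a centered STFT that pads n_fft // 2 samples on both sides (as librosa.stft does with center=True); whether this tool pads exactly that way is an assumption, and both helpers are hypothetical.

```python
def points_to_ms(points: int, fs: int) -> float:
    """Convert a length in samples to milliseconds."""
    return 1000.0 * points / fs


def num_frames_centered(num_samples: int, n_fft: int, n_shift: int) -> int:
    """Frame count of an STFT that pads n_fft // 2 samples on both sides."""
    padded = num_samples + 2 * (n_fft // 2)
    return 1 + (padded - n_fft) // n_shift


# Defaults n_fft=1024, n_shift=512 on one second of 22.05 kHz audio.
print(round(points_to_ms(512, 22050), 2))      # 23.22
print(num_frames_centered(22050, 1024, 512))  # 44
```

For example, at 16 kHz a shift of 160 points is 10 ms.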
compute-stft-feats.py
compute STFT feature from WAV
usage: compute-stft-feats.py [-h] [--fs FS] [--n_fft N_FFT]
[--n_shift N_SHIFT] [--win_length [WIN_LENGTH]]
[--window {hann,hamming}]
[--write-num-frames WRITE_NUM_FRAMES]
[--filetype {mat,hdf5}] [--compress COMPRESS]
[--compression-method COMPRESSION_METHOD]
[--verbose VERBOSE] [--normalize {1,16,24,32}]
[--segments SEGMENTS]
rspecifier wspecifier
Positional Arguments
- rspecifier
WAV scp file
- wspecifier
Write specifier
Named Arguments
- --fs
Sampling frequency
- --n_fft
FFT length in points
Default: 1024
- --n_shift
Shift length in points
Default: 512
- --win_length
Analysis window length in points
- --window
Possible choices: hann, hamming
Type of window
Default: “hann”
- --write-num-frames
Specify wspecifier for utt2num_frames
- --filetype
Possible choices: mat, hdf5
Specify the file format. “mat” is the matrix format in kaldi
Default: “mat”
- --compress
Save in compressed format
Default: False
- --compression-method
Specify the method (if mat) or gzip level (if hdf5)
Default: 2
- --verbose, -V
Verbose option
Default: 0
- --normalize
Possible choices: 1, 16, 24, 32
Bit depth of the PCM; if given, the data is normalized to the range [-1, 1]
- --segments
Segments file. Each line has the format <segment-id> <recording-id> <start-time> <end-time>, e.g. call-861225-A-0050-0065 call-861225-A 5.0 6.5
concat_json_multiref.py
concatenate multiple json files for data augmentation
usage: concat_json_multiref.py [-h] jsons [jsons ...]
Positional Arguments
- jsons
json files
concatjson.py
concatenate json files
usage: concatjson.py [-h] jsons [jsons ...]
Positional Arguments
- jsons
json files
convert_fbank_to_wav.py
convert FBANK to WAV using Griffin-Lim algorithm
usage: convert_fbank_to_wav.py [-h] [--fs FS] [--fmax [FMAX]] [--fmin [FMIN]]
[--n_fft N_FFT] [--n_shift N_SHIFT]
[--win_length [WIN_LENGTH]] [--n_mels [N_MELS]]
[--window {hann,hamming}] [--iters ITERS]
[--filetype {mat,hdf5}]
rspecifier outdir
Positional Arguments
- rspecifier
Input feature
- outdir
Output directory
Named Arguments
- --fs
Sampling frequency
Default: 22050
- --fmax
Maximum frequency
- --fmin
Minimum frequency
- --n_fft
FFT length in points
Default: 1024
- --n_shift
Shift length in points
Default: 512
- --win_length
Analysis window length in points
- --n_mels
Number of mel filterbank channels
- --window
Possible choices: hann, hamming
Type of window
Default: “hann”
- --iters
Number of iterations in Griffin-Lim
Default: 100
- --filetype
Possible choices: mat, hdf5
Specify the file format for the rspecifier. “mat” is the matrix format in kaldi
Default: “mat”
copy-feats.py
copy feature with preprocessing
usage: copy-feats.py [-h] [--verbose VERBOSE]
[--in-filetype {mat,hdf5,sound.hdf5,sound}]
[--out-filetype {mat,hdf5,sound.hdf5,sound}]
[--write-num-frames WRITE_NUM_FRAMES]
[--compress COMPRESS]
[--compression-method COMPRESSION_METHOD]
[--preprocess-conf PREPROCESS_CONF]
rspecifier wspecifier
Positional Arguments
- rspecifier
Read specifier for feats. e.g. ark:some.ark
- wspecifier
Write specifier. e.g. ark:some.ark
Named Arguments
- --verbose, -V
Verbose option
Default: 0
- --in-filetype
Possible choices: mat, hdf5, sound.hdf5, sound
Specify the file format for the rspecifier. “mat” is the matrix format in kaldi
Default: “mat”
- --out-filetype
Possible choices: mat, hdf5, sound.hdf5, sound
Specify the file format for the wspecifier. “mat” is the matrix format in kaldi
Default: “mat”
- --write-num-frames
Specify wspecifier for utt2num_frames
- --compress
Save in compressed format
Default: False
- --compression-method
Specify the method (if mat) or gzip level (if hdf5)
Default: 2
- --preprocess-conf
The configuration file for the pre-processing
dump-pcm.py
dump PCM files from a WAV scp file
usage: dump-pcm.py [-h] [--write-num-frames WRITE_NUM_FRAMES]
[--filetype {mat,hdf5,sound.hdf5,sound}] [--format FORMAT]
[--compress COMPRESS]
[--compression-method COMPRESSION_METHOD]
[--verbose VERBOSE] [--normalize {1,16,24,32}]
[--preprocess-conf PREPROCESS_CONF]
[--keep-length KEEP_LENGTH] [--segments SEGMENTS]
rspecifier wspecifier
Positional Arguments
- rspecifier
WAV scp file
- wspecifier
Write specifier
Named Arguments
- --write-num-frames
Specify wspecifier for utt2num_frames
- --filetype
Possible choices: mat, hdf5, sound.hdf5, sound
Specify the file format for output. “mat” is the matrix format in kaldi
Default: “mat”
- --format
The file format for output pcm. This option is only valid when “--filetype” is “sound.hdf5” or “sound”
- --compress
Save in compressed format
Default: False
- --compression-method
Specify the method (if mat) or gzip level (if hdf5)
Default: 2
- --verbose, -V
Verbose option
Default: 0
- --normalize
Possible choices: 1, 16, 24, 32
Bit depth of the PCM; if given, the data is normalized to the range [-1, 1]
- --preprocess-conf
The configuration file for the pre-processing
- --keep-length
Truncate or zero-pad the output if preprocessing changes its length relative to the input
Default: True
- --segments
Segments file. Each line has the format <segment-id> <recording-id> <start-time> <end-time>, e.g. call-861225-A-0050-0065 call-861225-A 5.0 6.5
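A sketch of how a segments file maps onto sample ranges of a recording, assuming start and end times are in seconds and are converted to sample indices by truncation. The exact rounding used by the tools, and the names Segment, parse_segments and cut_segments, are assumptions for illustration.

```python
from typing import Dict, List, NamedTuple, Sequence


class Segment(NamedTuple):
    segment_id: str
    recording_id: str
    start: float  # seconds
    end: float  # seconds


def parse_segments(lines: Sequence[str]) -> List[Segment]:
    """Parse '<segment-id> <recording-id> <start-time> <end-time>' lines."""
    segments = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        seg_id, rec_id, start, end = line.split()
        segments.append(Segment(seg_id, rec_id, float(start), float(end)))
    return segments


def cut_segments(
    recordings: Dict[str, Sequence[float]], segments: Sequence[Segment], fs: int
) -> Dict[str, Sequence[float]]:
    """Slice each recording's samples to the [start, end) interval of its segments."""
    return {
        seg.segment_id: recordings[seg.recording_id][int(seg.start * fs) : int(seg.end * fs)]
        for seg in segments
    }


segs = parse_segments(["call-861225-A-0050-0065 call-861225-A 5.0 6.5"])
cut = cut_segments({"call-861225-A": list(range(100))}, segs, fs=10)
print(cut["call-861225-A-0050-0065"])  # samples 50 to 64
```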
eval-source-separation.py
Evaluate enhanced speech, e.g. eval-source-separation.py --ref ref.scp --enh enh.scp --outdir outputdir or eval-source-separation.py --ref ref.scp ref2.scp --enh enh.scp enh2.scp --outdir outputdir
usage: eval-source-separation.py [-h] [--verbose VERBOSE] --ref REFFILES
[REFFILES ...] --enh ENHFILES [ENHFILES ...]
--outdir OUTDIR [--keylist KEYLIST]
[--evaltypes {SDR,STOI,ESTOI,PESQ} [{SDR,STOI,ESTOI,PESQ} ...]]
[--permutation PERMUTATION]
[--bss-eval-images BSS_EVAL_IMAGES]
[--bss-eval-version {v3,v4}]
Named Arguments
- --verbose, -V
Verbose option
Default: 0
- --ref
WAV file lists for reference
- --enh
WAV file lists for enhanced speech
- --outdir
- --keylist
Specify the target samples. By default, all keys in the first reference file are used
- --evaltypes
Possible choices: SDR, STOI, ESTOI, PESQ
Default: [‘SDR’, ‘STOI’, ‘ESTOI’, ‘PESQ’]
- --permutation
Compute all permutations or use the pair of input order
Default: True
- --bss-eval-images
Use bss_eval_images or bss_eval_sources. For more details, see the museval source code.
Default: True
- --bss-eval-version
Possible choices: v3, v4
Specify bss-eval-version: v3 or v4
Default: “v3”
eval_perm_free_error.py
evaluate permutation-free error
usage: eval_perm_free_error.py [-h] [--num-spkrs NUM_SPKRS]
results [results ...]
Positional Arguments
- results
the scores between references and hypotheses, in ascending order of references (1st) and hypotheses (2nd), e.g. [r1h1, r1h2, r2h1, r2h2] in 2-speaker-mix case.
Named Arguments
- --num-spkrs
number of mixed speakers.
Default: 2
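With the reference-major ordering described above, scores[r * num_spkrs + h] is the error between reference r and hypothesis h. The sketch picks the assignment of hypotheses to references with the lowest summed error. It is a hypothetical helper operating on raw scores; the actual script may aggregate sclite counts differently.

```python
from itertools import permutations
from typing import Sequence, Tuple


def perm_free_error(scores: Sequence[float], num_spkrs: int = 2) -> Tuple[float, Tuple[int, ...]]:
    """Return (lowest total error, hypothesis index assigned to each reference)."""
    if len(scores) != num_spkrs * num_spkrs:
        raise ValueError("expected num_spkrs ** 2 scores")
    best = None
    for perm in permutations(range(num_spkrs)):
        # perm[r] is the hypothesis paired with reference r.
        total = sum(scores[r * num_spkrs + h] for r, h in enumerate(perm))
        if best is None or total < best[0]:
            best = (total, perm)
    return best


# [r1h1, r1h2, r2h1, r2h2]: swapping the outputs (r1-h2, r2-h1) gives 2 + 3 = 5.
print(perm_free_error([10, 2, 3, 9]))  # (5, (1, 0))
```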
feat-to-shape.py
convert feature to its shape
usage: feat-to-shape.py [-h] [--verbose VERBOSE]
[--filetype {mat,hdf5,sound.hdf5,sound}]
[--preprocess-conf PREPROCESS_CONF]
rspecifier [out]
Positional Arguments
- rspecifier
Read specifier for feats. e.g. ark:some.ark
- out
The output filename. If omitted, then output to sys.stdout
Default: sys.stdout
Named Arguments
- --verbose, -V
Verbose option
Default: 0
- --filetype
Possible choices: mat, hdf5, sound.hdf5, sound
Specify the file format for the rspecifier. “mat” is the matrix format in kaldi
Default: “mat”
- --preprocess-conf
The configuration file for the pre-processing
feats2npy.py
Convert Kaldi-style features to NumPy arrays
usage: feats2npy.py [-h] scp_file out_dir
Positional Arguments
- scp_file
scp file
- out_dir
output directory
filt.py
filter words in a text file
usage: filt.py [-h] [--exclude] filt infile
Positional Arguments
- filt
filter list
- infile
input file
Named Arguments
- --exclude, -v
exclude filter words
Default: False
generate_wav_from_fbank.py
generate wav from FBANK using wavenet vocoder
usage: generate_wav_from_fbank.py [-h] [--fs FS] [--n_fft N_FFT]
[--n_shift N_SHIFT] [--model MODEL]
[--filetype {mat,hdf5}]
rspecifier outdir
Positional Arguments
- rspecifier
Input feature e.g. scp:feat.scp
- outdir
Output directory
Named Arguments
- --fs
Sampling frequency
Default: 22050
- --n_fft
FFT length in points
Default: 1024
- --n_shift
Shift length in points
Default: 256
- --model
WaveNet model
- --filetype
Possible choices: mat, hdf5
Specify the file format for the rspecifier. “mat” is the matrix format in kaldi
Default: “mat”
get_yaml.py
get a specified attribute from a YAML file
usage: get_yaml.py [-h] inyaml attr
Positional Arguments
- inyaml
- attr
foo.bar will access yaml.load(inyaml)[“foo”][“bar”]
json2sctm.py
convert json to sctm
usage: json2sctm.py [-h] [--num-spkrs [NUM_SPKRS]] [--refs [REFS [REFS ...]]]
[--hyps [HYPS [HYPS ...]]] [--orig-stm [ORIG_STM]]
[--stm STM [STM ...]] [--ctm CTM [CTM ...]] [--bpe [BPE]]
[json] dict
Positional Arguments
- json
input json
- dict
dict
Named Arguments
- --num-spkrs
number of speakers
Default: 1
- --refs
ref for all speakers
- --hyps
hyp for all outputs
- --orig-stm
original stm
- --stm
output stm
- --ctm
output ctm
- --bpe
BPE model if applicable
json2text.py
convert ASR recognized json to text
usage: json2text.py [-h] json dict ref hyp
Positional Arguments
- json
json files
- dict
dict
- ref
ref
- hyp
hyp
json2trn.py
convert a json to a transcription file with a token dictionary
usage: json2trn.py [-h] [--num-spkrs NUM_SPKRS] [--refs REFS [REFS ...]]
[--hyps HYPS [HYPS ...]]
json dict
Positional Arguments
- json
json files
- dict
dict
Named Arguments
- --num-spkrs
number of speakers
Default: 1
- --refs
ref for all speakers
- --hyps
hyp for all outputs
json2trn_mt.py
convert json to machine translation transcription
usage: json2trn_mt.py [-h] [--refs REFS [REFS ...]] [--hyps HYPS [HYPS ...]]
[--srcs SRCS [SRCS ...]] [--dict-src [DICT_SRC]]
json dict
Positional Arguments
- json
json files
- dict
dict for target language
Named Arguments
- --refs
ref for all speakers
- --hyps
hyp for all outputs
- --srcs
src for all outputs
- --dict-src
dict for source language
Default: False
json2trn_wo_dict.py
convert a json to a transcription file without a token dictionary
usage: json2trn_wo_dict.py [-h] [--num-spkrs NUM_SPKRS]
[--refs REFS [REFS ...]] [--hyps HYPS [HYPS ...]]
json
Positional Arguments
- json
json files
Named Arguments
- --num-spkrs
number of speakers
Default: 1
- --refs
ref for all speakers
- --hyps
hyp for all outputs
make_pair_json.py
Merge source and target data.json files into one json file.
usage: make_pair_json.py [-h] [--src-json SRC_JSON] [--trg-json TRG_JSON]
[--num_utts NUM_UTTS] [--verbose VERBOSE] [--out OUT]
Named Arguments
- --src-json
Json file for the source speaker
- --trg-json
Json file for the target speaker. If not specified, use source only.
- --num_utts
Number of utterances to take from the head
Default: -1
- --verbose, -V
Verbose option
Default: 1
- --out, -O
The output filename. If omitted, then output to sys.stdout
mcd_calculate.py
calculate mel-cepstral distortion (MCD)
usage: mcd_calculate.py [-h] --wavdir WAVDIR --gtwavdir GTWAVDIR
[--mcep_dim MCEP_DIM] [--mcep_alpha MCEP_ALPHA]
[--fftl FFTL] [--shiftms SHIFTMS] --f0min F0MIN
--f0max F0MAX [--n_jobs N_JOBS]
Named Arguments
- --wavdir
path of directory for converted waveforms
- --gtwavdir
path of directory for ground truth waveforms
- --mcep_dim
dimension of the mel-cepstrum coefficients
Default: 41
- --mcep_alpha
all-pass constant
Default: 0.41
- --fftl
fft length
Default: 1024
- --shiftms
frame shift (ms)
Default: 5
- --f0min
F0 search range (min)
- --f0max
F0 search range (max)
- --n_jobs
number of parallel jobs
Default: 40
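For reference, a common definition of mel-cepstral distortion for one pair of time-aligned frames is (10 / ln 10) * sqrt(2 * sum over d = 1..D of (c_d - c'_d)^2) dB, excluding the 0th (power) coefficient, averaged over all frames. The sketch assumes the frames are already aligned (alignment, e.g. by DTW, and mel-cepstrum extraction are not shown) and is not necessarily the exact computation in mcd_calculate.py; mel_cepstral_distortion is a hypothetical helper.

```python
import math
from typing import Sequence


def mel_cepstral_distortion(
    converted: Sequence[Sequence[float]], reference: Sequence[Sequence[float]]
) -> float:
    """Mean MCD in dB over aligned frames, skipping the 0th (power) coefficient."""
    if len(converted) != len(reference):
        raise ValueError("frames must be time-aligned and of equal count")
    const = 10.0 / math.log(10.0) * math.sqrt(2.0)
    total = 0.0
    for c, r in zip(converted, reference):
        dist2 = sum((x - y) ** 2 for x, y in zip(c[1:], r[1:]))
        total += const * math.sqrt(dist2)
    return total / len(converted)


# The 0th coefficients differ but are ignored; c1 differs by 1.0.
print(round(mel_cepstral_distortion([[0.0, 1.0, 0.0]], [[5.0, 0.0, 0.0]]), 4))  # 6.1419
```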
merge_scp2json.py
Given file paths in the format <key>:<file>:<type>, where <type> can be omitted and defaults to “str”, e.g. merge_scp2json.py --input-scps feat:data/feats.scp shape:data/utt2feat_shape:shape --input-scps feat:data/feats2.scp shape:data/utt2feat2_shape:shape --output-scps text:data/text shape:data/utt2text_shape:shape --scps utt2spk:data/utt2spk
usage: merge_scp2json.py [-h] [--input-scps [INPUT_SCPS [INPUT_SCPS ...]]]
[--output-scps [OUTPUT_SCPS [OUTPUT_SCPS ...]]]
[--scps SCPS [SCPS ...]] [--verbose VERBOSE]
[--allow-one-column ALLOW_ONE_COLUMN] [--out OUT]
Named Arguments
- --input-scps
scp files for the inputs
Default: []
- --output-scps
scp files for the outputs
Default: []
- --scps
The scp files other than the inputs and outputs
Default: []
- --verbose, -V
Verbose option
Default: 1
- --allow-one-column
Allow one column in input scp files. In this case, the value will be an empty string.
Default: False
- --out, -O
The output filename. If omitted, then output to sys.stdout
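A sketch of the <key>:<file>:<type> convention described above. parse_scp_spec and convert_value are hypothetical helpers; treating a “shape” value such as 100,83 as a list of integers is an assumption about what the shape type means, and other types are not modeled.

```python
from typing import List, Tuple, Union


def parse_scp_spec(spec: str) -> Tuple[str, str, str]:
    """Split '<key>:<file>[:<type>]' into (key, file, type); type defaults to 'str'."""
    parts = spec.split(":")
    if len(parts) == 2:
        key, path = parts
        return key, path, "str"
    if len(parts) == 3:
        key, path, kind = parts
        return key, path, kind
    raise ValueError(f"cannot parse {spec!r}")


def convert_value(value: str, kind: str) -> Union[str, List[int]]:
    """Convert one scp value by type (assumed behavior, for illustration)."""
    if kind == "shape":
        return [int(v) for v in value.split(",")]
    return value


print(parse_scp_spec("feat:data/feats.scp"))              # ('feat', 'data/feats.scp', 'str')
print(parse_scp_spec("shape:data/utt2feat_shape:shape"))  # ('shape', 'data/utt2feat_shape', 'shape')
print(convert_value("100,83", "shape"))                   # [100, 83]
```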
mergejson.py
merge json files
usage: mergejson.py [-h] [--input-jsons INPUT_JSONS [INPUT_JSONS ...]]
[--output-jsons OUTPUT_JSONS [OUTPUT_JSONS ...]]
[--jsons JSONS [JSONS ...]] [--verbose VERBOSE]
[-O OUTPUT]
Named Arguments
- --input-jsons
Json files for the inputs
Default: []
- --output-jsons
Json files for the outputs
Default: []
- --jsons
The json files except for the input and outputs
Default: []
- --verbose, -V
Verbose option
Default: 0
- -O
Output json file
mix-mono-wav-scp.py
Mix mono wav.scp files into a multi-channel wav.scp using sox.
usage: mix-mono-wav-scp.py [-h] scp [scp ...] [out]
Positional Arguments
- scp
Input wav.scp files
- out
The output filename. If omitted, then output to sys.stdout
Default: sys.stdout
result2json.py
convert sclite’s result.txt file to json
usage: result2json.py [-h] [--key KEY]
Named Arguments
- --key, -k
key
score_lang_id.py
language identification scoring
usage: score_lang_id.py [-h] --ref REF --hyp HYP [--out OUT]
Named Arguments
- --ref
input reference
- --hyp
input hypotheses
- --out
The output filename. If omitted, then output to sys.stdout
Default: sys.stdout
scp2json.py
convert scp to json
usage: scp2json.py [-h] [--key KEY]
Named Arguments
- --key, -k
key
splitjson.py
split a json file for parallel processing
usage: splitjson.py [-h] [--parts PARTS] json
Positional Arguments
- json
json file
Named Arguments
- --parts, -p
Number of subparts to be prepared
Default: 0
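A sketch of splitting an ESPnet-style data.json (a top-level “utts” dictionary) into near-equal contiguous parts. split_utts is a hypothetical helper; how the real tool distributes utterances and names its output files is not shown.

```python
from typing import Any, Dict, List


def split_utts(data: Dict[str, Any], parts: int) -> List[Dict[str, Any]]:
    """Split {"utts": {...}} into `parts` contiguous chunks whose sizes differ by at most one."""
    keys = list(data["utts"])
    base, extra = divmod(len(keys), parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # the first `extra` chunks get one more
        chunk_keys = keys[start : start + size]
        chunks.append({"utts": {k: data["utts"][k] for k in chunk_keys}})
        start += size
    return chunks


data = {"utts": {f"utt{i}": {} for i in range(5)}}
print([list(c["utts"]) for c in split_utts(data, 2)])
# [['utt0', 'utt1', 'utt2'], ['utt3', 'utt4']]
```

Each chunk could then be written with json.dump for a separate decoding job.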
text2token.py
convert raw text to tokenized text
usage: text2token.py [-h] [--nchar NCHAR] [--skip-ncols SKIP_NCOLS]
[--space SPACE] [--non-lang-syms NON_LANG_SYMS]
[--trans_type {char,phn}]
[text]
Positional Arguments
- text
input text
Default: False
Named Arguments
- --nchar, -n
number of characters to split, i.e., aabb -> a a b b with -n 1 and aa bb with -n 2
Default: 1
- --skip-ncols, -s
skip first n columns
Default: 0
- --space
space symbol
Default: “<space>”
- --non-lang-syms, -l
list of non-linguistic symbols, e.g., <NOISE> etc.
- --trans_type, -t
Possible choices: char, phn
Transcript type: char or phn. e.g., for TIMIT FADG0_SI1279: if trans_type is char, read from the SI1279.WRD file -> “bricks are an alternative”; if trans_type is phn, read from the SI1279.PHN file -> “sil b r ih sil k s aa r er n aa l sil t er n ih sil t ih v sil”
Default: “char”
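The --nchar, --skip-ncols, --space and --non-lang-syms options can be sketched as follows. This is a simplified approximation rather than the script's code (for example, --trans_type phn is not modeled), and the function below is hypothetical.

```python
import re
from typing import Sequence


def text2token(
    line: str,
    nchar: int = 1,
    skip_ncols: int = 0,
    space: str = "<space>",
    non_lang_syms: Sequence[str] = (),
) -> str:
    """Split text into chunks of `nchar` characters, keeping symbols like <NOISE> whole."""
    fields = line.split(" ")
    # Leading columns (e.g. an utterance id) are copied through unchanged.
    head, text = fields[:skip_ncols], " ".join(fields[skip_ncols:])
    if non_lang_syms:
        pattern = "(" + "|".join(re.escape(s) for s in non_lang_syms) + ")"
        pieces = re.split(pattern, text)  # the capture group keeps the symbols
    else:
        pieces = [text]
    tokens = []
    for piece in pieces:
        if piece in non_lang_syms:
            tokens.append(piece)
            continue
        for i in range(0, len(piece), nchar):
            tokens.append(piece[i : i + nchar].replace(" ", space))
    return " ".join(head + tokens)


print(text2token("aabb"))                      # a a b b
print(text2token("aabb", nchar=2))             # aa bb
print(text2token("utt1 ab c", skip_ncols=1))   # utt1 a b <space> c
```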
text2vocabulary.py
create a vocabulary file from text files
usage: text2vocabulary.py [-h] [--output OUTPUT] [--cutoff CUTOFF]
[--vocabsize VOCABSIZE]
[text_files [text_files ...]]
Positional Arguments
- text_files
input text files
Named Arguments
- --output, -o
output vocabulary file
Default: “”
- --cutoff, -c
cut-off frequency
Default: 0
- --vocabsize, -s
vocabulary size
Default: 20000
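A sketch of building a frequency-sorted vocabulary. The assumptions are that --cutoff drops words whose count does not exceed it, that ties are broken alphabetically, and that no special tokens are added; build_vocabulary is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, List


def build_vocabulary(lines: Iterable[str], cutoff: int = 0, vocabsize: int = 20000) -> List[str]:
    """Most frequent words first, dropping words seen `cutoff` times or fewer."""
    counts = Counter(word for line in lines for word in line.split())
    kept = [(w, c) for w, c in counts.items() if c > cutoff]
    kept.sort(key=lambda wc: (-wc[1], wc[0]))  # frequency desc, then alphabetical
    return [w for w, _ in kept[:vocabsize]]


print(build_vocabulary(["a b a", "c a b"], cutoff=1))  # ['a', 'b']
```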
trim_silence.py
Trim silence with simple power thresholding and make a segments file.
usage: trim_silence.py [-h] [--fs FS] [--threshold THRESHOLD]
[--win_length WIN_LENGTH] [--shift_length SHIFT_LENGTH]
[--min_silence MIN_SILENCE] [--figdir FIGDIR]
[--verbose VERBOSE] [--normalize {1,16,24,32}]
rspecifier wspecifier
Positional Arguments
- rspecifier
WAV scp file.
- wspecifier
Segments file.
Named Arguments
- --fs
Sampling frequency.
- --threshold
Threshold in decibels.
Default: 60
- --win_length
Analysis window length in points.
Default: 1200
- --shift_length
Shift length in points.
Default: 300
- --min_silence
Minimum silence length in sec.
Default: 0.01
- --figdir
Directory to save figures.
- --verbose
Verbosity level.
Default: 0
- --normalize
Possible choices: 1, 16, 24, 32
Bit depth of the PCM; if given, the data is normalized to the range [-1, 1].
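A sketch of simple power thresholding: frame powers are computed in dB, and frames within --threshold dB of the loudest frame count as speech, giving one start and end time per recording for a <segment-id> <recording-id> <start-time> <end-time> line. Thresholding relative to the peak and ignoring --min_silence are assumptions; detect_speech_region is a hypothetical helper.

```python
import math
from typing import Optional, Sequence, Tuple


def detect_speech_region(
    x: Sequence[float],
    fs: int,
    threshold_db: float = 60.0,
    win_length: int = 1200,
    shift_length: int = 300,
) -> Optional[Tuple[float, float]]:
    """Return (start, end) in seconds spanning frames within threshold_db of the peak."""
    powers_db = []
    for start in range(0, max(len(x) - win_length, 0) + 1, shift_length):
        frame = x[start : start + win_length]
        power = sum(v * v for v in frame) / max(len(frame), 1)
        powers_db.append(10.0 * math.log10(power + 1e-20))  # floor avoids log(0)
    peak = max(powers_db)
    voiced = [i for i, p in enumerate(powers_db) if p > peak - threshold_db]
    if not voiced:
        return None
    start_sec = voiced[0] * shift_length / fs
    end_sec = (voiced[-1] * shift_length + win_length) / fs
    return start_sec, min(end_sec, len(x) / fs)


# 2 s of silence, 1 s of signal, 2 s of silence at a toy 100 Hz sampling rate.
x = [0.0] * 200 + [1.0] * 100 + [0.0] * 200
print(detect_speech_region(x, fs=100, win_length=10, shift_length=10))  # (2.0, 3.0)
```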
trn2ctm.py
convert trn to ctm
usage: trn2ctm.py [-h] [trn] [ctm]
Positional Arguments
- trn
input trn
- ctm
output ctm
trn2stm.py
convert trn to stm
usage: trn2stm.py [-h] [--orig-stm [ORIG_STM]] [trn] [stm]
Positional Arguments
- trn
input trn
- stm
output stm
Named Arguments
- --orig-stm
Original stm file to add additional information to the generated one