espnet2.fileio package
espnet2.fileio.datadir_writer
class espnet2.fileio.datadir_writer.DatadirWriter(p: Union[pathlib.Path, str])
Bases: object
Writer class to create a Kaldi-like data directory.
Examples
>>> with DatadirWriter("output") as writer:
...     # output/sub.txt is created here
...     subwriter = writer["sub.txt"]
...     # Write "uttidA some/where/a.wav"
...     subwriter["uttidA"] = "some/where/a.wav"
...     subwriter["uttidB"] = "some/where/b.wav"
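The nested-writer pattern in the example can be illustrated without ESPnet. The following is a minimal sketch, not the library's implementation: `MiniDatadirWriter` is a hypothetical stand-in, and it opens each file lazily on the first assignment, which may differ from when DatadirWriter creates it.

```python
import tempfile
from pathlib import Path


class MiniDatadirWriter:
    """Hypothetical stand-in: indexing yields child writers, assignment writes lines."""

    def __init__(self, path):
        self.path = Path(path)
        self.children = {}  # file name -> child writer
        self.fd = None      # file handle, opened on the first assignment

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()

    def __getitem__(self, name):
        # Indexing creates (or reuses) a child writer below this path.
        if name not in self.children:
            self.children[name] = MiniDatadirWriter(self.path / name)
        return self.children[name]

    def __setitem__(self, key, value):
        # Assignment appends one "key value" line to this writer's own file.
        if self.fd is None:
            self.path.parent.mkdir(parents=True, exist_ok=True)
            self.fd = self.path.open("w", encoding="utf-8")
        self.fd.write(f"{key} {value}\n")

    def close(self):
        for child in self.children.values():
            child.close()
        if self.fd is not None:
            self.fd.close()


with tempfile.TemporaryDirectory() as tmp:
    with MiniDatadirWriter(Path(tmp) / "output") as writer:
        sub = writer["sub.txt"]
        sub["uttidA"] = "some/where/a.wav"
        sub["uttidB"] = "some/where/b.wav"
    written = (Path(tmp) / "output" / "sub.txt").read_text(encoding="utf-8")
```

Here `written` holds the two "key value" lines that a Kaldi-style reader would expect in output/sub.txt.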
espnet2.fileio.score_scp
class espnet2.fileio.score_scp.MIDReader(fname, add_rest=True, dtype=<class 'numpy.int16'>)
Bases: collections.abc.Mapping
Reader class for ‘mid.scp’.
Examples
key1 /some/path/a.mid
key2 /some/path/b.mid
key3 /some/path/c.mid
key4 /some/path/d.mid
…
>>> reader = MIDReader('mid.scp')
>>> tempo, note_list = reader['key1']
class espnet2.fileio.score_scp.SingingScoreReader(fname, dtype=<class 'numpy.int16'>)
Bases: collections.abc.Mapping
Reader class for ‘score.scp’.
Examples
key1 /some/path/score.json
key2 /some/path/score.json
key3 /some/path/score.json
key4 /some/path/score.json
…
>>> reader = SingingScoreReader('score.scp')
>>> score = reader['key1']
class espnet2.fileio.score_scp.SingingScoreWriter(outdir: Union[pathlib.Path, str], scpfile: Union[pathlib.Path, str])
Bases: object
Writer class for ‘score.scp’
Examples
key1 /some/path/score.json
key2 /some/path/score.json
key3 /some/path/score.json
key4 /some/path/score.json
…
>>> writer = SingingScoreWriter('./data/', './data/score.scp')
>>> writer['aa'] = score_obj
>>> writer['bb'] = score_obj
class espnet2.fileio.score_scp.XMLReader(fname, dtype=<class 'numpy.int16'>)
Bases: collections.abc.Mapping
Reader class for ‘xml.scp’.
Examples
key1 /some/path/a.xml
key2 /some/path/b.xml
key3 /some/path/c.xml
key4 /some/path/d.xml
…
>>> reader = XMLReader('xml.scp')
>>> tempo, note_list = reader['key1']
class espnet2.fileio.score_scp.XMLWriter(outdir: Union[pathlib.Path, str], scpfile: Union[pathlib.Path, str])
Bases: object
Writer class for ‘xml.scp’
Examples
key1 /some/path/a.musicxml
key2 /some/path/b.musicxml
key3 /some/path/c.musicxml
key4 /some/path/d.musicxml
…
>>> writer = XMLWriter('./data/', './data/xml.scp')
>>> writer['aa'] = xml_obj
>>> writer['bb'] = xml_obj
espnet2.fileio.sound_scp
class espnet2.fileio.sound_scp.SoundScpReader(fname, dtype=None, always_2d: bool = False, multi_columns: bool = False, concat_axis=1)
Bases: collections.abc.Mapping
Reader class for ‘wav.scp’.
Examples
wav.scp is a text file that looks like the following:
key1 /some/path/a.wav
key2 /some/path/b.wav
key3 /some/path/c.wav
key4 /some/path/d.wav
…
>>> reader = SoundScpReader('wav.scp')
>>> rate, array = reader['key1']
If multi_columns=True is given and multiple files are listed on one line, separated by spaces, the output arrays are concatenated along the channel axis:
key1 /some/path/a.wav /some/path/a2.wav
key2 /some/path/b.wav /some/path/b2.wav
…
>>> reader = SoundScpReader('wav.scp', multi_columns=True)
>>> rate, array = reader['key1']
In the above case, a.wav and a2.wav are concatenated.
Note that even if multi_columns=True is given, SoundScpReader still supports a normal wav.scp, i.e., one wav file per line. The option is disabled by default because it requires keeping a dict[str, list[str]] object, which increases memory usage.
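As a rough illustration of concatenation along the channel direction, the sketch below stacks two mono signals into per-sample frames using plain lists. `concat_channels` is a hypothetical helper, the sample values are made up, and the requirement that all signals have equal length is an assumption of this sketch rather than documented SoundScpReader behavior.

```python
def concat_channels(signals):
    """Stack equal-length mono signals into frames of [ch0, ch1, ...]."""
    lengths = {len(signal) for signal in signals}
    if len(lengths) != 1:
        raise ValueError(f"signals differ in length: {sorted(lengths)}")
    # zip(*signals) pairs up the i-th sample of every signal.
    return [list(frame) for frame in zip(*signals)]


a = [0.1, 0.2, 0.3]       # stand-in for the samples of a.wav
a2 = [-0.1, -0.2, -0.3]   # stand-in for the samples of a2.wav
array = concat_channels([a, a2])
```

The result has one row per sample and one column per input file, i.e. a (samples, channels) layout.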
class espnet2.fileio.sound_scp.SoundScpWriter(outdir: Union[pathlib.Path, str], scpfile: Union[pathlib.Path, str], format='wav', multi_columns: bool = False, output_name_format: str = '{key}.{audio_format}', output_name_format_multi_columns: str = '{key}-CH{channel}.{audio_format}', subtype: str = None)
Bases: object
Writer class for ‘wav.scp’
- Parameters:
outdir –
scpfile –
format – The output audio format
multi_columns – Save multi channel data as multiple monaural audio files
output_name_format – The naming format of generated audio files
output_name_format_multi_columns – The naming format of generated audio files when multi_columns is given
dtype –
subtype –
Examples
>>> writer = SoundScpWriter('./data/', './data/wav.scp')
>>> writer['aa'] = 16000, numpy_array
>>> writer['bb'] = 16000, numpy_array
aa ./data/aa.wav
bb ./data/bb.wav
>>> writer = SoundScpWriter(
...     './data/', './data/feat.scp', multi_columns=True,
... )
>>> numpy_array.shape
(100, 2)
>>> writer['aa'] = 16000, numpy_array
aa ./data/aa-CH0.wav ./data/aa-CH1.wav
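The file names in the listings above follow from the two default templates in the signature. The sketch below only reproduces the naming, not the audio writing; `scp_line` is a hypothetical helper, and joining `outdir` and the file name with a single slash is an assumption made to match the listings.

```python
# Default templates copied from the SoundScpWriter signature.
output_name_format = "{key}.{audio_format}"
output_name_format_multi_columns = "{key}-CH{channel}.{audio_format}"


def scp_line(outdir, key, num_channels, audio_format="wav", multi_columns=False):
    """Hypothetical helper: build the scp line that would describe one utterance."""
    prefix = outdir.rstrip("/")
    if multi_columns:
        # One monaural file per channel.
        names = [
            output_name_format_multi_columns.format(
                key=key, channel=channel, audio_format=audio_format
            )
            for channel in range(num_channels)
        ]
    else:
        # A single file holding all channels.
        names = [output_name_format.format(key=key, audio_format=audio_format)]
    return " ".join([key] + [f"{prefix}/{name}" for name in names])


single = scp_line("./data/", "aa", num_channels=1)
multi = scp_line("./data/", "aa", num_channels=2, multi_columns=True)
```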
espnet2.fileio.vad_scp
class espnet2.fileio.vad_scp.VADScpReader(fname, dtype=<class 'numpy.float32'>)
Bases: collections.abc.Mapping
Reader class for ‘vad.scp’.
Unlike segments, vad.scp describes voice activity at the utterance level, whereas segments are expected to cover a whole session. Its main use in ESPnet is to guide silence trimming for UASR.
Examples
key1 0:1.2000
key2 3.0000:4.5000 7.0000:9.0000
…
>>> reader = VADScpReader('vad.scp')
>>> array = reader['key1']
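To make the line format concrete, here is a minimal parsing sketch. `parse_vad_line` is a hypothetical helper, and returning plain (start, end) float tuples is a choice of this sketch; VADScpReader itself returns an array whose exact layout is not described here.

```python
def parse_vad_line(line):
    """Split 'key start:end start:end ...' into (key, [(start, end), ...])."""
    key, *intervals = line.split()
    segments = []
    for interval in intervals:
        start, end = interval.split(":")
        segments.append((float(start), float(end)))
    return key, segments


key, segments = parse_vad_line("key2 3.0000:4.5000 7.0000:9.0000")
```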
class espnet2.fileio.vad_scp.VADScpWriter(scpfile: Union[pathlib.Path, str], dtype=None)
Bases: object
Writer class for ‘vad.scp’
Examples
key1 0:1.2000
key2 3.0000:4.5000 7.0000:9.0000
…
>>> writer = VADScpWriter('./data/vad.scp')
>>> writer['aa'] = [(0.0, 1.2)]  # list of (start, end) tuples
>>> writer['bb'] = [(3.0, 4.5), (7.0, 9.0)]
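Going the other way, a list of (start, end) tuples can be rendered as one vad.scp line. This is a sketch under assumptions: `format_vad_line` is a hypothetical helper, and the four-decimal formatting is inferred from the listing above rather than taken from VADScpWriter.

```python
def format_vad_line(key, segments):
    """Render (start, end) tuples as 'key start:end ...' with four decimals."""
    intervals = " ".join(f"{start:.4f}:{end:.4f}" for start, end in segments)
    return f"{key} {intervals}"


line = format_vad_line("key2", [(3.0, 4.5), (7.0, 9.0)])
```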
espnet2.fileio.rand_gen_dataset
class espnet2.fileio.rand_gen_dataset.FloatRandomGenerateDataset(shape_file: Union[pathlib.Path, str], dtype: Union[str, numpy.dtype] = 'float32', loader_type: str = 'csv_int')
Bases: collections.abc.Mapping
Generate float array from shape.txt.
Examples
shape.txt
uttA 123,83
uttB 34,83
>>> dataset = FloatRandomGenerateDataset("shape.txt")
>>> array = dataset["uttA"]
>>> assert array.shape == (123, 83)
>>> array = dataset["uttB"]
>>> assert array.shape == (34, 83)
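The idea of generating data from shape.txt can be sketched with the standard library alone. `parse_shape_line` and `random_floats` are hypothetical helpers, nested lists stand in for numpy arrays, and the uniform [0, 1) distribution is an assumption of this sketch.

```python
import random


def parse_shape_line(line):
    """Turn 'uttA 123,83' into ('uttA', (123, 83))."""
    key, dims = line.split(maxsplit=1)
    return key, tuple(int(dim) for dim in dims.split(","))


def random_floats(shape, rng):
    """Build nested lists of uniform floats with the given 2-D shape."""
    rows, cols = shape
    return [[rng.random() for _ in range(cols)] for _ in range(rows)]


shapes = dict(parse_shape_line(line) for line in ["uttA 123,83", "uttB 34,83"])
rng = random.Random(0)  # seeded so the sketch is reproducible
array = random_floats(shapes["uttA"], rng)
```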
class espnet2.fileio.rand_gen_dataset.IntRandomGenerateDataset(shape_file: Union[pathlib.Path, str], low: int, high: int = None, dtype: Union[str, numpy.dtype] = 'int64', loader_type: str = 'csv_int')
Bases: collections.abc.Mapping
Generate int array from shape.txt.
Examples
shape.txt
uttA 123,83
uttB 34,83
>>> dataset = IntRandomGenerateDataset("shape.txt", low=0, high=10)
>>> array = dataset["uttA"]
>>> assert array.shape == (123, 83)
>>> array = dataset["uttB"]
>>> assert array.shape == (34, 83)
espnet2.fileio.__init__
espnet2.fileio.npy_scp
class espnet2.fileio.npy_scp.NpyScpReader(fname: Union[pathlib.Path, str])
Bases: collections.abc.Mapping
Reader class for an scp file of numpy files.
Examples
key1 /some/path/a.npy
key2 /some/path/b.npy
key3 /some/path/c.npy
key4 /some/path/d.npy
…
>>> reader = NpyScpReader('npy.scp')
>>> array = reader['key1']
class espnet2.fileio.npy_scp.NpyScpWriter(outdir: Union[pathlib.Path, str], scpfile: Union[pathlib.Path, str])
Bases: object
Writer class for an scp file of numpy files.
Examples
key1 /some/path/a.npy
key2 /some/path/b.npy
key3 /some/path/c.npy
key4 /some/path/d.npy
…
>>> writer = NpyScpWriter('./data/', './data/feat.scp')
>>> writer['aa'] = numpy_array
>>> writer['bb'] = numpy_array
espnet2.fileio.read_text
class espnet2.fileio.read_text.RandomTextReader(text_and_scp: str)
Bases: collections.abc.Mapping
Reader class for random access to text.
Simple text reader for unpaired text data (for unsupervised ASR).
Instead of loading the whole text into memory (often large for UASR), the reader uses an scp file that stores the byte offsets of each line of the text file, and randomly selects unpaired text from it for training using mmap.
Examples
- text
text1line
text2line
text3line
- scp
11
00000000000000000010
00000000110000000020
00000000210000000030
- scp explanation
(number of digits per int value)
(text start at bytes 0 and end at bytes 10 (including "\n"))
(text start at bytes 11 and end at bytes 20 (including "\n"))
(text start at bytes 21 and end at bytes 30 (including "\n"))
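The byte-offset idea can be sketched with the standard `mmap` module. This is an illustration under stated assumptions, not RandomTextReader itself: `build_index` and `read_entry` are hypothetical helpers, offsets are zero-padded to an assumed width of 10 digits, and each entry addresses a half-open byte range [start, end), which may differ from the exact convention ESPnet uses.

```python
import mmap
import random
import tempfile
from pathlib import Path

WIDTH = 10  # assumed number of digits per zero-padded offset


def build_index(text_path):
    """Return one 'start' + 'end' string per line, each padded to WIDTH digits."""
    entries, start = [], 0
    with open(text_path, "rb") as f:
        for line in f:
            end = start + len(line)  # range [start, end) includes the newline
            entries.append(f"{start:0{WIDTH}d}{end:0{WIDTH}d}")
            start = end
    return entries


def read_entry(mm, entry):
    """Decode the line addressed by one index entry, without its newline."""
    start, end = int(entry[:WIDTH]), int(entry[WIDTH:])
    return mm[start:end].decode("utf-8").rstrip("\n")


with tempfile.TemporaryDirectory() as tmp:
    text_path = Path(tmp) / "text"
    text_path.write_bytes(b"text1line\ntext2line\ntext3line\n")
    index = build_index(text_path)
    with open(text_path, "rb") as f:
        # Map the file read-only; only the requested byte ranges are touched.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            lines = [read_entry(mm, entry) for entry in index]
            picked = read_entry(mm, random.Random(0).choice(index))
        finally:
            mm.close()
```

Because every index entry has the same width, a random line can be selected by picking an entry and slicing the mapped file, without reading the whole text into memory.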
espnet2.fileio.read_text.load_num_sequence_text(path: Union[pathlib.Path, str], loader_type: str = 'csv_int') → Dict[str, List[Union[float, int]]]
Read a text file containing sequences of numbers.
Examples
key1 1 2 3
key2 34 5 6
>>> d = load_num_sequence_text('text')
>>> np.testing.assert_array_equal(d["key1"], np.array([1, 2, 3]))
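A minimal sketch of this kind of parsing follows. `parse_num_sequences` is a hypothetical helper, and its `delimiter` and `dtype` parameters are assumptions introduced for illustration; how `loader_type` values map to a delimiter and a number type is not stated in this section.

```python
def parse_num_sequences(lines, delimiter=" ", dtype=int):
    """Map each key to the list of numbers that follows it on the line."""
    result = {}
    for line in lines:
        key, rest = line.strip().split(maxsplit=1)
        result[key] = [dtype(token) for token in rest.split(delimiter)]
    return result


d = parse_num_sequences(["key1 1 2 3", "key2 34 5 6"])
# Comma-separated values, as in the shape.txt examples elsewhere on this page.
shapes = parse_num_sequences(["uttA 123,83"], delimiter=",")
```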
espnet2.fileio.read_text.read_2columns_text(path: Union[pathlib.Path, str]) → Dict[str, str]
Read a text file having 2 columns as dict object.
Examples
- wav.scp:
key1 /some/path/a.wav
key2 /some/path/b.wav
>>> read_2columns_text('wav.scp')
{'key1': '/some/path/a.wav', 'key2': '/some/path/b.wav'}
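The two-column convention can be sketched as follows. `parse_2columns` is a hypothetical helper operating on lines already read from a file; skipping blank lines and rejecting duplicated keys are choices made for this sketch, not confirmed behavior of read_2columns_text.

```python
def parse_2columns(lines):
    """Map the first column of each line to the remainder of that line."""
    data = {}
    for lineno, line in enumerate(lines, 1):
        parts = line.strip().split(maxsplit=1)
        if not parts:
            continue  # blank line
        key = parts[0]
        value = parts[1] if len(parts) == 2 else ""
        if key in data:
            raise ValueError(f"line {lineno}: duplicated key {key!r}")
        data[key] = value
    return data


table = parse_2columns(["key1 /some/path/a.wav", "key2 /some/path/b.wav"])
```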
espnet2.fileio.read_text.read_label(path: Union[pathlib.Path, str]) → Dict[str, List[Union[float, int]]]
Read a label file in which each line lists (start time, end time, phone) entries for one key.
Examples
key1 start_time_1 end_time_1 phone_1 start_time_2 end_time_2 phone_2 …
key2 start_time_1 end_time_1 phone_1
>>> d = read_label('label')
>>> np.testing.assert_array_equal(d["key1"], [0.1, 0.2, "啊"])
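The label line layout can be made concrete with a small sketch. `parse_label_line` is a hypothetical helper, the sample line is made up, and converting the times to float while keeping the phone as a string is an assumption; the value types read_label actually returns may differ.

```python
def parse_label_line(line):
    """Split 'key s1 e1 p1 s2 e2 p2 ...' into (key, [(start, end, phone), ...])."""
    key, *fields = line.split()
    if len(fields) % 3 != 0:
        raise ValueError(f"expected (start, end, phone) triples for {key!r}")
    triples = [
        (float(fields[i]), float(fields[i + 1]), fields[i + 2])
        for i in range(0, len(fields), 3)
    ]
    return key, triples


key, triples = parse_label_line("key1 0.1 0.2 啊 0.2 0.5 哦")
```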
espnet2.fileio.read_text.read_multi_columns_text(path: Union[pathlib.Path, str], return_unsplit: bool = False) → Tuple[Dict[str, List[str]], Optional[Dict[str, str]]]
Read a text file having 2 or more columns as dict object.
Examples
- wav.scp:
key1 /some/path/a1.wav /some/path/a2.wav
key2 /some/path/b1.wav /some/path/b2.wav /some/path/b3.wav
key3 /some/path/c1.wav
…
>>> read_multi_columns_text('wav.scp')
{'key1': ['/some/path/a1.wav', '/some/path/a2.wav'],
 'key2': ['/some/path/b1.wav', '/some/path/b2.wav', '/some/path/b3.wav'],
 'key3': ['/some/path/c1.wav']}
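A sketch of the multi-column variant follows. `parse_multi_columns` is a hypothetical helper; the meaning given to `return_unsplit` here (also returning each value as one unsplit string) is inferred from the return type in the signature and should be treated as an assumption.

```python
def parse_multi_columns(lines, return_unsplit=False):
    """Map each key to the list of columns after it; optionally keep raw values."""
    data = {}
    unsplit = {} if return_unsplit else None
    for line in lines:
        parts = line.strip().split(maxsplit=1)
        if not parts:
            continue  # blank line
        key = parts[0]
        rest = parts[1] if len(parts) == 2 else ""
        data[key] = rest.split()
        if unsplit is not None:
            unsplit[key] = rest
    return data, unsplit


rows = [
    "key1 /some/path/a1.wav /some/path/a2.wav",
    "key3 /some/path/c1.wav",
]
data, unsplit = parse_multi_columns(rows, return_unsplit=True)
```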
espnet2.fileio.rttm
class espnet2.fileio.rttm.RttmReader(fname: str)
Bases: collections.abc.Mapping
Reader class for ‘rttm.scp’.
Examples
SPEAKER file1 1 0 1023 <NA> <NA> spk1 <NA>
SPEAKER file1 2 4000 3023 <NA> <NA> spk2 <NA>
SPEAKER file1 3 500 4023 <NA> <NA> spk1 <NA>
END file1 <NA> 4023 <NA> <NA> <NA> <NA>
This is an extended version of the standard RTTM format for ESPnet. The differences are:
1. It uses sample numbers instead of absolute time.
2. It has an END label to represent the duration of a recording.
3. It replaces duration (5th field) with end time (For standard RTTM,
…
>>> reader = RttmReader('rttm')
>>> spk_label = reader["file1"]
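The extended format can be read field by field, as the sketch below shows. `parse_rttm` is a hypothetical helper that only collects segments and recording lengths; the field positions follow the listing above, and the sketch does not reproduce the speaker-label array that RttmReader returns.

```python
def parse_rttm(lines):
    """Collect (speaker, start, end) segments and END lengths per file id."""
    segments = {}  # file id -> list of (speaker, start_sample, end_sample)
    lengths = {}   # file id -> sample index given on the END line
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "SPEAKER":
            file_id, speaker = fields[1], fields[7]
            start, end = int(fields[3]), int(fields[4])  # 5th field is the end
            segments.setdefault(file_id, []).append((speaker, start, end))
        elif fields[0] == "END":
            lengths[fields[1]] = int(fields[3])
    return segments, lengths


rttm_lines = [
    "SPEAKER file1 1 0 1023 <NA> <NA> spk1 <NA>",
    "SPEAKER file1 3 500 4023 <NA> <NA> spk1 <NA>",
    "END file1 <NA> 4023 <NA> <NA> <NA> <NA>",
]
segments, lengths = parse_rttm(rttm_lines)
```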