espnet.lm package
Initialize sub package.
espnet.lm.__init__
Initialize sub package.
espnet.lm.lm_utils
class espnet.lm.lm_utils.MakeSymlinkToBestModel(key, prefix='model', suffix='best')
Bases: chainer.training.extension.Extension
Extension that makes a symbolic link to the best model.
Parameters:
- key (str) – Key of the reported value to monitor
- prefix (str) – Prefix of model files and link target
- suffix (str) – Suffix of link target
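A minimal usage sketch (the trainer object, the snapshot naming, and the monitored key are assumptions for illustration, not part of this module):

from espnet.lm.lm_utils import MakeSymlinkToBestModel

# Hypothetical setup: `trainer` is a configured chainer Trainer whose
# snapshots are written as 'model.<epoch>'. When the monitored value
# improves, the extension is assumed to point the link
# '<prefix>.<suffix>' (here 'model.best') at the best epoch's file.
trainer.extend(MakeSymlinkToBestModel('validation/main/loss'))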
class espnet.lm.lm_utils.ParallelSentenceIterator(dataset, batch_size, max_length=0, sos=0, eos=0, repeat=True, shuffle=True)
Bases: chainer.dataset.iterator.Iterator
Dataset iterator to create a batch of sentences.
This iterator returns pairs of sentences in which one token is shifted between input and target, e.g. '<sos> w1 w2 w3' and 'w1 w2 w3 <eos>'. Batches are first formed in order of decreasing sentence length and then randomly shuffled. A usage sketch follows the member list below.
property epoch_detail
property previous_epoch_detail
serialize(serializer)
Serializes the internal state of the iterator.
This method supports the serializer protocol of Chainer.
Note
It should only serialize the internal state that changes over the iteration. It should not serialize what is set manually by users, such as the batch size.
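A minimal usage sketch of the iterator (the toy dataset and ID values are assumptions; each batch element is assumed to be an (input, target) pair):

import numpy as np
from espnet.lm.lm_utils import ParallelSentenceIterator

# Toy corpus of three ID sequences; 0 doubles as <sos>/<eos> here.
dataset = [np.array([1, 2, 3]), np.array([4, 5]), np.array([6])]
it = ParallelSentenceIterator(dataset, batch_size=2, sos=0, eos=0,
                              repeat=False, shuffle=False)
for batch in it:
    for x, t in batch:
        print(x, t)  # e.g. [0 1 2 3] vs. [1 2 3 0]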
espnet.lm.lm_utils.compute_perplexity(result)
Computes and adds the perplexity to the LogReport.
Parameters:
- result (dict) – The current observations
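A hedged sketch of the idea behind such a hook (the observation keys 'main/loss' and 'main/count' are assumptions about what the training loop reports):

import numpy as np

# Perplexity is exp(total cross-entropy / token count); a hook like
# this can be passed as a postprocess function to chainer's LogReport.
def add_perplexity(result):
    result['perplexity'] = np.exp(result['main/loss'] / result['main/count'])

observations = {'main/loss': 250.0, 'main/count': 100}
add_perplexity(observations)
print(observations['perplexity'])  # exp(2.5), roughly 12.18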
espnet.lm.lm_utils.count_tokens(data, unk_id=None)
Count tokens and OOVs in token ID sequences.
Parameters:
- data (list[np.ndarray]) – list of token ID sequences
- unk_id (int) – ID of the unknown token
Returns: tuple of the number of token occurrences and the number of OOV tokens
Return type: tuple
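A minimal usage sketch (the toy data and the assumption that ID 0 is the unknown token are illustrative):

import numpy as np
from espnet.lm.lm_utils import count_tokens

data = [np.array([3, 0, 7]), np.array([5, 2])]
n_tokens, n_oovs = count_tokens(data, unk_id=0)
print(n_tokens, n_oovs)  # 5 tokens in total, 1 of them unknown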
espnet.lm.lm_utils.load_dataset(path, label_dict, outdir=None)
Load a text dataset for LM and save the token IDs and statistics as HDF5.
Parameters:
- path (str) – The path of an input text dataset file
- label_dict (dict[str, int]) – dictionary that maps a token label string to its ID number
- outdir (str) – The path of an output directory
Returns: tuple of the token IDs (np.int32) converted by read_tokens, the number of tokens counted by count_tokens, and the number of OOVs counted by count_tokens
Return type: tuple[list[np.ndarray], int, int]
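A minimal usage sketch (the file name, token table, and output directory are placeholders):

from espnet.lm.lm_utils import load_dataset

label_dict = {'<unk>': 0, 'hello': 1, 'world': 2}
sequences, n_tokens, n_oovs = load_dataset('train.txt', label_dict, outdir='exp/lm')
print(len(sequences), n_tokens, n_oovs)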
espnet.lm.lm_utils.make_lexical_tree(word_dict, subword_dict, word_unk)
Make a lexical tree to compute word-level probabilities.
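A hedged usage sketch (the toy dictionaries are placeholders; the exact node layout of the returned tree is an implementation detail not documented here):

from espnet.lm.lm_utils import make_lexical_tree

# Index each word by its subword (here, character) sequence so a decoder
# can walk the tree one subword at a time while scoring hypotheses.
word_dict = {'<unk>': 0, 'ab': 1, 'abc': 2}
subword_dict = {'a': 0, 'b': 1, 'c': 2}
root = make_lexical_tree(word_dict, subword_dict, word_unk=0)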
espnet.lm.lm_utils.read_tokens(filename, label_dict)
Read tokens as a sequence of sentences.
Parameters:
- filename (str) – The name of the input file
- label_dict (dict) – dictionary that maps a token label string to its ID number
Returns: list of ID sequences
Return type: list
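A minimal usage sketch (the file name and token table are placeholders; each line of the file is assumed to be a whitespace-separated token sequence):

from espnet.lm.lm_utils import read_tokens

label_dict = {'<unk>': 0, 'hello': 1, 'world': 2}
sentences = read_tokens('train.txt', label_dict)
print(sentences[0])  # int32 ID sequence for the first line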
espnet.lm.chainer_backend.lm
class espnet.lm.chainer_backend.lm.BPTTUpdater(train_iter, optimizer, schedulers, device, accum_grad)
Bases: chainer.training.updaters.standard_updater.StandardUpdater
An updater for a chainer LM.
Parameters:
- train_iter (chainer.dataset.Iterator) – The train iterator
- optimizer – The optimizer for training
- schedulers – The schedulers of the optimizer
- device (int) – The device id
- accum_grad (int) – The number of gradient accumulation steps
class espnet.lm.chainer_backend.lm.ClassifierWithState(predictor, lossfun=<function softmax_cross_entropy>, label_key=-1)
Bases: chainer.link.Chain
A wrapper for a chainer RNNLM.
Parameters:
- predictor (link.Chain) – The RNNLM
- lossfun (function) – The loss function to use
- label_key (int or str) – Key specifying the label in the input arguments
class espnet.lm.chainer_backend.lm.DefaultRNNLM(**links)
Bases: espnet.nets.lm_interface.LMInterface, chainer.link.Chain
Default RNNLM wrapper that computes reduced framewise loss values.
Parameters:
- n_vocab (int) – The size of the vocabulary
- args (argparse.Namespace) – configurations. See add_arguments
class espnet.lm.chainer_backend.lm.LMEvaluator(val_iter, eval_model, device)
Bases: espnet.utils.training.evaluator.BaseEvaluator
A custom evaluator for a chainer LM.
Parameters:
- val_iter (chainer.dataset.Iterator) – The validation iterator
- eval_model – The model to evaluate
- device (int) – The device id to use
evaluate()
Evaluates the model and returns a result dictionary.
This method runs the evaluation loop over the validation dataset. It accumulates the reported values to DictSummary and returns a dictionary whose values are means computed by the summary.
Note that this function assumes that the main iterator raises StopIteration or that code in the evaluation loop raises an exception. If this assumption does not hold, the function could be caught in an infinite loop.
Users can override this method to customize the evaluation routine.
Note
This method encloses eval_func calls with a function.no_backprop_mode() context, so all calculations using FunctionNodes inside eval_func do not build computational graphs. This reduces memory consumption.
Returns: Result dictionary. This dictionary is further reported via report() without specifying any observer.
Return type: dict
class espnet.lm.chainer_backend.lm.RNNLM(n_vocab, n_layers, n_units, typ='lstm')
Bases: chainer.link.Chain
A chainer RNNLM.
Parameters:
- n_vocab (int) – The size of the vocabulary
- n_layers (int) – The number of layers to create
- n_units (int) – The number of units per layer
- typ (str) – The RNN type
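A minimal instantiation sketch (the sizes are arbitrary; wrapping with ClassifierWithState, as done for training, is assumed here):

from espnet.lm.chainer_backend.lm import RNNLM, ClassifierWithState

# A 2-layer, 650-unit LSTM LM over a 10k-token vocabulary, wrapped so
# that calling the model computes the softmax cross-entropy loss.
model = ClassifierWithState(RNNLM(n_vocab=10000, n_layers=2, n_units=650))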
espnet.lm.chainer_backend.lm.train(args)
Train with the given args.
Parameters:
- args (Namespace) – The program arguments
espnet.lm.chainer_backend.extlm
espnet.lm.chainer_backend.__init__
Initialize sub package.
espnet.lm.pytorch_backend.lm
LM training in pytorch.
class espnet.lm.pytorch_backend.lm.BPTTUpdater(train_iter, model, optimizer, schedulers, device, gradclip=None, use_apex=False, accum_grad=1)
Bases: chainer.training.updaters.standard_updater.StandardUpdater
An updater for a pytorch LM.
Initialize class.
Parameters:
- train_iter (chainer.dataset.Iterator) – The train iterator
- model (LMInterface) – The model to update
- optimizer (torch.optim.Optimizer) – The optimizer for training
- schedulers (espnet.scheduler.scheduler.SchedulerInterface) – The schedulers of the optimizer
- device (int) – The device id
- gradclip (float) – The gradient clipping value to use
- use_apex (bool) – The flag to use Apex in backprop
- accum_grad (int) – The number of gradient accumulation steps
class espnet.lm.pytorch_backend.lm.LMEvaluator(val_iter, eval_model, reporter, device)
Bases: espnet.utils.training.evaluator.BaseEvaluator
A custom evaluator for a pytorch LM.
Initialize class.
Parameters:
- val_iter (chainer.dataset.Iterator) – The validation iterator
- eval_model (LMInterface) – The model to evaluate
- reporter (chainer.Reporter) – The observations reporter
- device (int) – The device id to use
class espnet.lm.pytorch_backend.lm.Reporter(**links)
Bases: chainer.link.Chain
Dummy module to use chainer's trainer.
espnet.lm.pytorch_backend.lm.compute_perplexity(result)
Compute and add the perplexity to the LogReport.
Parameters:
- result (dict) – The current observations
espnet.lm.pytorch_backend.lm.concat_examples(batch, device=None, padding=None)
Concat examples in a minibatch.
Parameters:
- batch (np.ndarray) – The batch to concatenate
- device (int) – The device to send to
- padding (Tuple[int, int]) – The padding to use
Returns: (inputs, targets)
Return type: (torch.Tensor, torch.Tensor)
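A hedged usage sketch (the batch layout as (input, target) pairs and the padding values are assumptions; device=-1 is taken to mean CPU):

import numpy as np
from espnet.lm.pytorch_backend.lm import concat_examples

batch = [(np.array([1, 2, 3]), np.array([2, 3, 4])),
         (np.array([5, 6]), np.array([6, 7]))]
# Shorter sequences are padded so inputs and targets become rectangular.
x, t = concat_examples(batch, device=-1, padding=(0, -100))
print(x.shape, t.shape)  # expected: torch.Size([2, 3]) for both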
espnet.lm.pytorch_backend.extlm
class espnet.lm.pytorch_backend.extlm.LookAheadWordLM(wordlm, word_dict, subword_dict, oov_penalty=0.0001, open_vocab=True)
Bases: torch.nn.modules.module.Module
forward(state, x)
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
logzero = -10000000000.0
zero = 1e-10
class espnet.lm.pytorch_backend.extlm.MultiLevelLM(wordlm, subwordlm, word_dict, subword_dict, subwordlm_weight=0.8, oov_penalty=1.0, open_vocab=True)
Bases: torch.nn.modules.module.Module
forward(state, x)
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
logzero = -10000000000.0
zero = 1e-10
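A hedged construction sketch (word_rnnlm and subword_rnnlm are placeholders for trained LM modules with the interface these wrappers expect; the toy dictionaries and the <eos>/<unk> entries are assumptions):

import torch
from espnet.lm.pytorch_backend.extlm import LookAheadWordLM, MultiLevelLM

word_dict = {'<eos>': 0, '<unk>': 1, 'ab': 2}
subword_dict = {'<eos>': 0, 'a': 1, 'b': 2}

lookahead = LookAheadWordLM(word_rnnlm, word_dict, subword_dict)
multilevel = MultiLevelLM(word_rnnlm, subword_rnnlm, word_dict, subword_dict)

# Both wrappers are scored one subword step at a time during beam search;
# passing state=None is assumed to create the initial state.
state, logp = lookahead(None, torch.tensor(subword_dict['a']))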
espnet.lm.pytorch_backend.__init__
Initialize sub package.