pytorch_end2end package

Module contents

class pytorch_end2end.CTCDecoder(beam_width=100, after_logsoftmax=False, blank_idx=0, time_major=False, labels=None, lm_path=None, lmwt=1.0, wip=1.0, oov_penalty=-10, case_sensitive=True)[source]

Decoder class to perform CTC decoding

Parameters
  • beam_width

    width of beam (number of stored hypotheses), default 100.

    If 1, the decoder always performs greedy (argmax) decoding.

  • after_logsoftmax

    whether the inputs are log-probabilities (logits after log softmax), default False.

    If False, the decoder expects raw logits, not passed through any softmax. Greedy decoding ignores this parameter and works with either raw logits or log-softmax outputs.

  • blank_idx – id of blank label, default 0

  • time_major – whether logits are time-major (otherwise batch-major), default False

  • labels – list of strings with labels (including blank symbol), e.g. ["_", "a", "b", "c"]

  • lm_path – path to language model (ARPA format or gzipped ARPA)

  • lmwt – language model weight, default 1.0; only meaningful when a language model is provided

  • wip – word insertion penalty, default 1.0; only meaningful when labels are provided

  • oov_penalty – penalty for each out-of-vocabulary (OOV) word, default -10.0

  • case_sensitive – whether to obtain language model scores with respect to case, default True
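
A minimal construction sketch; the label set and the "lm.arpa.gz" path are illustrative placeholders, not shipped with the package:

    import pytorch_end2end

    # Alphabet with the blank symbol at index 0, matching blank_idx=0.
    labels = ["_", "a", "b", "c", " "]

    # Plain beam-search decoder, no language model.
    decoder = pytorch_end2end.CTCDecoder(beam_width=100, labels=labels)

    # Decoder rescored with an ARPA language model; lmwt, wip and
    # oov_penalty only take effect when lm_path is given.
    lm_decoder = pytorch_end2end.CTCDecoder(
        beam_width=100,
        labels=labels,
        lm_path="lm.arpa.gz",  # hypothetical path
        lmwt=1.0,
        wip=1.0,
        oov_penalty=-10,
    )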

decode(logits, logits_lengths=None)[source]

Performs prefix beam search decoding as described in https://arxiv.org/abs/1408.2873

Parameters
  • logits

    tensor with neural network outputs (raw logits, or log-probabilities when the decoder was constructed with after_logsoftmax=True)

    of shape (sequence_length, batch_size, alphabet_size) if time_major

    else of shape (batch_size, sequence_length, alphabet_size)

  • logits_lengths – tensor with the valid length of each sequence in the batch; default None

Returns

namedtuple(decoded_targets, decoded_targets_lengths, decoded_sentences)

decoded_targets:

tensor with resulting targets of shape (batch_size, sequence_length); does not contain blank symbols

decoded_targets_lengths:

tensor with the lengths of the decoded targets

decoded_sentences:

list of strings of length batch_size. If labels is None, a list of empty strings is returned.
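
A usage sketch with random batch-major inputs (time_major defaults to False); the tensor shapes and the lengths values are illustrative assumptions:

    import torch
    import pytorch_end2end

    labels = ["_", "a", "b", "c"]  # blank symbol "_" at index 0
    decoder = pytorch_end2end.CTCDecoder(beam_width=10, labels=labels)

    batch_size, seq_len = 2, 50
    logits = torch.randn(batch_size, seq_len, len(labels))  # random raw logits
    lengths = torch.tensor([50, 42])  # valid length of each sequence

    result = decoder.decode(logits, lengths)
    print(result.decoded_targets)          # (batch_size, sequence_length), no blanks
    print(result.decoded_targets_lengths)  # lengths of the decoded targets
    print(result.decoded_sentences)        # batch_size strings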

decode_greedy(logits, logits_lengths=None)[source]

Performs greedy (argmax) decoding

Parameters
  • logits

    tensor with neural network outputs (raw logits or log-probabilities; greedy decoding accepts either)

    of shape (sequence_length, batch_size, alphabet_size) if time_major

    else of shape (batch_size, sequence_length, alphabet_size)

  • logits_lengths – tensor with the valid length of each sequence in the batch; default None

Returns

(decoded_targets, decoded_targets_lengths, decoded_sentences)

decoded_targets:

tensor with resulting targets of shape (batch_size, sequence_length); does not contain blank symbols

decoded_targets_lengths:

tensor with the lengths of the decoded targets

decoded_sentences:

list of strings of length batch_size. If labels is None, a list of empty strings is returned.
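
decode_greedy follows the same calling convention; a short sketch reusing the decoder and tensors from the example above:

    targets, target_lengths, sentences = decoder.decode_greedy(logits, lengths)
    print(sentences)  # same structure as decode, built without beam search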

class pytorch_end2end.CTCEncoder(characters, blank_id=0, transform_fn=str.upper)[source]

Simple CTC encoder for text

clean(text)[source]

decode(ids_list)[source]

decode_pure(ids_list)[source]

encode(text)[source]
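
The encoder methods carry no docstrings in this reference; the sketch below is a plausible reading, assuming encode maps text to label ids and decode maps ids back to text. The characters value is illustrative:

    import pytorch_end2end

    # Blank assumed at index 0 (blank_id=0); the default transform_fn
    # (str.upper) normalizes case before encoding.
    encoder = pytorch_end2end.CTCEncoder(characters="_ABC ")

    ids = encoder.encode("abc")  # assumed: list of label ids after str.upper
    text = encoder.decode(ids)   # assumed: inverse mapping back to a string
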
class pytorch_end2end.CTCLoss(size_average=None, reduce=None, after_logsoftmax=False, time_major=False, blank_idx=0)[source]

Criterion to compute CTC Loss as described in http://www.cs.toronto.edu/~graves/icml_2006.pdf

Parameters
  • size_average – whether to average the loss over the batch (only applies when reduce is True)

  • reduce – whether to reduce the per-sample losses to a single value (if None, returns the full tensor of shape (batch_size,))

  • after_logsoftmax

    whether log softmax has already been applied to the network outputs

    (if False, the criterion takes raw network outputs)

  • time_major – whether logits are time-major (otherwise batch-major), default False

  • blank_idx – id of blank label, default 0

training: bool
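
A sketch of typical use. The forward call signature (logits, targets, logits_lengths, targets_lengths) is an assumption by analogy with standard CTC criteria; it is not documented in this reference:

    import torch
    import pytorch_end2end

    criterion = pytorch_end2end.CTCLoss(reduce=None)  # keep per-sample losses

    batch_size, seq_len, alphabet_size = 2, 50, 4
    logits = torch.randn(batch_size, seq_len, alphabet_size)  # raw outputs, batch-major
    targets = torch.randint(1, alphabet_size, (batch_size, 20))  # ids > 0, no blanks
    logits_lengths = torch.tensor([50, 42])
    targets_lengths = torch.tensor([20, 15])

    loss = criterion(logits, targets, logits_lengths, targets_lengths)  # assumed signature
    print(loss.shape)  # (batch_size,) because reduce is None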