s3_align.c File Reference

Engine for Sphinx 3 aligner.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>
#include <feat.h>
#include <strfuncs.h>
#include <s3types.h>
#include "mdef.h"
#include "tmat.h"
#include "dict.h"
#include "logs3.h"
#include "s3_align.h"

Classes

struct  pnode_s
struct  plink_s
struct  history_s
struct  snode_s
struct  slink_s

Defines

#define ACTIVE_LIST_SIZE_INCR   16380

Typedefs

typedef struct pnode_s pnode_t
typedef struct plink_s plink_t
typedef struct history_s history_t
typedef struct snode_s snode_t
typedef struct slink_s slink_t

Functions

int32 align_build_sent_hmm (char *wordstr, int insert_sil)
int32 align_destroy_sent_hmm (void)
void align_sen_active (uint8 *senlist, int32 n_sen)
int32 align_start_utt (char *uttid)
int32 align_frame (int32 *senscr)
int32 align_end_utt (align_stseg_t **stseg_out, align_phseg_t **phseg_out, align_wdseg_t **wdseg_out)
int32 align_init (mdef_t *_mdef, tmat_t *_tmat, dict_t *_dict, cmd_ln_t *_config, logmath_t *_logmath)
void align_free (void)

Detailed Description

Engine for Sphinx 3 aligner.


Define Documentation

#define ACTIVE_LIST_SIZE_INCR   16380

Referenced by align_build_sent_hmm().


Typedef Documentation

typedef struct history_s history_t

Viterbi search history for each state at each time.

typedef struct plink_s plink_t

A phone node may have links (transitions) to several successor or predecessor nodes. They are captured by a list of the following plink_t type.

typedef struct pnode_s pnode_t

SOME ASSUMPTIONS

  • All phones (ciphones and triphones) have same HMM topology with n_state states.
  • Initial state = state 0; final state = state n_state-1.
  • Final state is a non-emitting state with no arcs out of it.
  • Some form of Bakis topology (i.e., no cycles, except for self-transitions).

Phone-level sentence HMM structures:

  • pnode_t: nodes of phones forming sentence HMM.
  • plink_t: a link between two pnode_t nodes.

A phone node may have multiple successors and/or predecessors because of multiple alternative pronunciations for a word, as well as the presence of OPTIONAL filler words.

Assumptions:

  • No cycles in phone level sentence HMM.

typedef struct slink_s slink_t

typedef struct snode_s snode_t

Head of list of all history nodes. State DAG structures, similar to the phone DAG structures.
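The node-and-link arrangement described above can be sketched as follows. This is a minimal illustration only: toy_node_t, toy_link_t and the helper functions are hypothetical names invented for this example, and the real pnode_s, plink_s, snode_s and slink_s structures carry more fields (see their member references elsewhere on this page).

```c
#include <assert.h>
#include <stdlib.h>

struct toy_node_s;                    /* forward declaration */

/* Hypothetical stand-in for a link: one entry in a node's link list. */
typedef struct toy_link_s {
    struct toy_node_s *node;          /* node at the other end of the link */
    struct toy_link_s *next;          /* next link in the same list */
} toy_link_t;

/* Hypothetical stand-in for a DAG node with successor/predecessor lists. */
typedef struct toy_node_s {
    int id;
    toy_link_t *succlist;             /* links to successor nodes */
    toy_link_t *predlist;             /* links to predecessor nodes */
} toy_node_t;

/* Prepend a link to *list. Returns 0 on success, -1 if allocation fails. */
int toy_link_add(toy_link_t **list, toy_node_t *node)
{
    toy_link_t *l;

    assert(list != NULL && node != NULL);
    l = malloc(sizeof(*l));
    if (l == NULL)
        return -1;
    l->node = node;
    l->next = *list;
    *list = l;
    return 0;
}

/* Record the arc src -> dst in src's successor list and dst's predecessor
 * list. On failure the first list may already hold its new link. */
int toy_connect(toy_node_t *src, toy_node_t *dst)
{
    if (toy_link_add(&src->succlist, dst) < 0)
        return -1;
    return toy_link_add(&dst->predlist, src);
}

/* Count the links in a list. */
int toy_link_count(const toy_link_t *l)
{
    int n = 0;

    for (; l != NULL; l = l->next)
        n++;
    return n;
}

/* Free every link in a list (the nodes themselves are not freed). */
void toy_link_free(toy_link_t *l)
{
    while (l != NULL) {
        toy_link_t *next = l->next;
        free(l);
        l = next;
    }
}
```

Keeping both a successor and a predecessor list per node lets a forward pass follow succlist while a backward pass follows predlist, at the cost of two link allocations per arc.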


Function Documentation

int32 align_build_sent_hmm ( char *  wordstr,
int  insert_sil 
)

Build a sentence HMM for the given transcription (wordstr). A two-level DAG is built: phone-level and state-level.

  • <s> and </s> always added at the beginning and end of sentence to form an augmented transcription.
  • Optional <sil> and noise words added between words in the augmented transcription.

wordstr must contain only the transcript, with no extraneous material such as an utterance ID. The phone-level HMM structure has replicated nodes to allow for different left and right context CI phones; hence, each pnode corresponds to a unique triphone in the sentence HMM.

Returns 0 if successful, <0 on any error (e.g., an OOV word encountered).

Parameters:
wordstr In: Word transcript
insert_sil In: Whether to insert silences/fillers

References ACTIVE_LIST_SIZE_INCR, BAD_S3CIPID, BAD_S3PID, BAD_S3SENID, BAD_S3WID, pnode_s::ci, dict_basewid, dict_wordid(), dict_t::finishwid, snode_s::hist, pnode_s::id, IS_S3WID, pnode_s::lc, mdef_t::n_emit_state, pnode_s::next, NOT_S3WID, pnode_s::pid, snode_s::pnode, snode_s::predlist, pnode_s::predlist, pnode_s::rc, snode_s::sen, pnode_s::startstate, dict_t::startwid, snode_s::state, snode_s::succlist, pnode_s::succlist, and pnode_s::wid.
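The first step described above, wrapping the transcript in <s> and </s>, might look like the sketch below. toy_augment_transcript is a hypothetical helper written for this illustration; it is not part of s3_align.c, and it does not perform dictionary lookup, filler insertion, or any HMM construction.

```c
#include <assert.h>
#include <string.h>

/* Write "<s> " + wordstr + " </s>" into out (capacity outsz bytes,
 * including the terminating NUL).
 * Returns 0 on success, -1 if out is too small. */
int toy_augment_transcript(const char *wordstr, char *out, size_t outsz)
{
    static const char pre[] = "<s> ";
    static const char post[] = " </s>";
    size_t n, need;

    assert(wordstr != NULL && out != NULL);
    n = strlen(wordstr);
    need = (sizeof(pre) - 1) + n + (sizeof(post) - 1) + 1;
    if (need > outsz)
        return -1;

    memcpy(out, pre, sizeof(pre) - 1);
    memcpy(out + (sizeof(pre) - 1), wordstr, n);
    /* sizeof(post) includes the NUL, so this also terminates the string */
    memcpy(out + (sizeof(pre) - 1) + n, post, sizeof(post));
    return 0;
}
```

For example, the input "hello world" yields "<s> hello world </s>" when the output buffer holds at least 21 bytes.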

int32 align_destroy_sent_hmm ( void   ) 

int32 align_end_utt ( align_stseg_t **  stseg_out,
align_phseg_t **  phseg_out,
align_wdseg_t **  wdseg_out 
)

All frames consumed. Trace back best Viterbi state sequence and dump it out.

Parameters:
stseg_out Out: list of state segmentation
phseg_out Out: list of phone segmentation
wdseg_out Out: list of word segmentation

References snode_s::active_frm, history_s::alloc_next, snode_s::hist, slink_s::next, align_wdseg_s::next, align_phseg_s::next, align_stseg_s::next, slink_s::node, history_s::pred, snode_s::predlist, slink_s::prob, and snode_s::score.

int32 align_frame ( int32 *  senscr  ) 

One frame of Viterbi time alignment.

Parameters:
senscr In: array of senone scores this frame

References snode_s::active_frm, snode_s::hist, IS_S3SENID, snode_s::newhist, snode_s::newscore, slink_s::next, slink_s::node, snode_s::predlist, slink_s::prob, S3_LOGPROB_ZERO, snode_s::score, snode_s::sen, and snode_s::succlist.
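A single Viterbi frame update can be sketched on a toy left-to-right chain of states. Everything here is an assumption made for illustration: TOY_LOGPROB_ZERO stands in for S3_LOGPROB_ZERO, the arrays replace the snode_s/slink_s DAG, and the in-place backward loop replaces whatever buffering the newscore and newhist members provide in the real code.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical "log of zero" sentinel; halved to leave headroom. */
#define TOY_LOGPROB_ZERO (INT_MIN / 2)

/* One Viterbi step over a left-to-right chain of n states.
 *   score[i]   in: best log score ending in state i at the previous frame
 *              out: best log score ending in state i at this frame
 *   self_lp[i] log probability of the self-transition i -> i
 *   next_lp[i] log probability of the transition i -> i+1
 *   obs[i]     log observation score of state i in this frame
 *   pred[i]    out: best predecessor state, or -1 if state i is unreachable
 */
void toy_viterbi_frame(int n, int *score, const int *self_lp,
                       const int *next_lp, const int *obs, int *pred)
{
    int i;

    assert(n > 0);
    /* Iterate from the last state down so that score[i-1] still holds
     * the previous frame's value when state i is updated. */
    for (i = n - 1; i >= 0; i--) {
        int best = TOY_LOGPROB_ZERO;
        int from = -1;

        if (score[i] > TOY_LOGPROB_ZERO) {
            best = score[i] + self_lp[i];
            from = i;
        }
        if (i > 0 && score[i - 1] > TOY_LOGPROB_ZERO) {
            int cand = score[i - 1] + next_lp[i - 1];
            if (cand > best) {
                best = cand;
                from = i - 1;
            }
        }
        score[i] = (from >= 0) ? best + obs[i] : TOY_LOGPROB_ZERO;
        pred[i] = from;
    }
}
```

Calling this once per frame, with obs[] taken from that frame's senone scores, advances the alignment by one time step; the pred[] values are what a history record would need to store for the later traceback.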

void align_free ( void   ) 

Referenced by main().

int32 align_init ( mdef_t *  _mdef,
tmat_t *  _tmat,
dict_t *  _dict,
cmd_ln_t *  _config,
logmath_t *  _logmath 
)

void align_sen_active ( uint8 *  senlist,
int32  n_sen 
)

Flag the active senones.

Parameters:
senlist Out: senlist[s] TRUE iff active in frame
n_sen In: Size of senlist[] array

References IS_S3SENID, and snode_s::sen.
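The flagging step can be sketched as below. This is an assumption-laden illustration: toy_sen_active takes the senone ids of the active states as a plain array, uses -1 for states without a valid senone (the role IS_S3SENID appears to play in the real code), and clears the flag array first; the real function reads the ids from the active snode_s list.

```c
#include <assert.h>
#include <string.h>

/* Set senlist[s] = 1 for every valid senone id s among the active states,
 * and 0 for all other entries.
 *   senlist     out: flag array with n_sen entries
 *   active_sen  senone id of each active state; -1 means "no senone"
 *   n_active    number of entries in active_sen
 */
void toy_sen_active(unsigned char *senlist, int n_sen,
                    const int *active_sen, int n_active)
{
    int i;

    assert(senlist != NULL && n_sen >= 0);
    memset(senlist, 0, (size_t) n_sen);
    for (i = 0; i < n_active; i++) {
        int s = active_sen[i];
        if (s >= 0 && s < n_sen)   /* skip states without a valid senone */
            senlist[s] = 1;
    }
}
```

A caller could then compute acoustic scores only for entries whose flag is set, which is the usual reason for maintaining such a list.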

int32 align_start_utt ( char *  uttid  ) 

Start Viterbi alignment using the sentence HMM previously built. Assumes that each utterance will only be aligned once; state member variables initialized during sentence HMM building.

References snode_s::active_frm, snode_s::hist, slink_s::next, slink_s::node, snode_s::score, and snode_s::succlist.


Generated on 7 Mar 2010 by  doxygen 1.6.1