s3_align.h File Reference

data structure for alignment

#include <logmath.h>
#include <s3types.h>

Classes

struct  align_stseg_s
struct  align_phseg_s
struct  align_wdseg_s

Typedefs

typedef struct align_stseg_s align_stseg_t
typedef struct align_phseg_s align_phseg_t
typedef struct align_wdseg_s align_wdseg_t

Functions

int32 align_init (mdef_t *_mdef, tmat_t *_tmat, dict_t *_dict, cmd_ln_t *_config, logmath_t *_logmath)
void align_free (void)
int32 align_build_sent_hmm (char *transcript, int insert_sil)
int32 align_destroy_sent_hmm (void)
int32 align_start_utt (char *uttid)
void align_sen_active (uint8 *senlist, int32 n_sen)
int32 align_frame (int32 *senscr)
int32 align_end_utt (align_stseg_t **stseg, align_phseg_t **phseg, align_wdseg_t **wdseg)

Detailed Description

data structure for alignment


Typedef Documentation

typedef struct align_phseg_s align_phseg_t

Phone level segmentation/alignment information

typedef struct align_stseg_s align_stseg_t

State level segmentation/alignment; one entry per frame

typedef struct align_wdseg_s align_wdseg_t

Word level segmentation/alignment information


Function Documentation

int32 align_build_sent_hmm ( char *  wordstr,
int  insert_sil 
)

Build a sentence HMM for the given transcription (wordstr). A two-level DAG is built: phone-level and state-level.

  • <s> and </s> always added at the beginning and end of sentence to form an augmented transcription.
  • Optional <sil> and noise words added between words in the augmented transcription.

wordstr must contain only the transcript, with no extraneous material such as an utterance ID. The phone-level HMM structure has replicated nodes to allow for different left- and right-context CI phones; hence, each pnode corresponds to a unique triphone in the sentence HMM.

Returns 0 if successful, <0 on any error (e.g., an OOV word is encountered).

Parameters:
wordstr In: Word transcript
insert_sil In: Whether to insert silences/fillers

References ACTIVE_LIST_SIZE_INCR, BAD_S3CIPID, BAD_S3PID, BAD_S3SENID, BAD_S3WID, pnode_s::ci, dict_basewid, dict_wordid(), dict_t::finishwid, snode_s::hist, pnode_s::id, IS_S3WID, pnode_s::lc, mdef_t::n_emit_state, pnode_s::next, NOT_S3WID, pnode_s::pid, snode_s::pnode, snode_s::predlist, pnode_s::predlist, pnode_s::rc, snode_s::sen, pnode_s::startstate, dict_t::startwid, snode_s::state, snode_s::succlist, pnode_s::succlist, and pnode_s::wid.
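
As a rough illustration of the augmented transcription described above, the following C sketch builds the flat word sequence only. augment_transcript and append_word are hypothetical helpers, not part of s3_align.h. The sketch writes <sil> into every gap between words, whereas in the sentence HMM these fillers are optional; noise words and the phone- and state-level DAG are left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append `word` to the space-separated list in `out`.
 * Returns 0 on success, -1 if `out` (capacity `outsz`) is too small. */
static int append_word(char *out, size_t outsz, const char *word)
{
    size_t used = strlen(out);
    size_t need = strlen(word) + (used > 0 ? 1 : 0);

    if (used + need + 1 > outsz)
        return -1;
    if (used > 0)
        out[used++] = ' ';
    strcpy(out + used, word);
    return 0;
}

/* Hypothetical helper: build "<s> w1 ... wn </s>" from `wordstr`.
 * If `insert_sil` is nonzero, a "<sil>" is written into every gap.
 * Returns the number of words written, or -1 on overflow. */
static int augment_transcript(const char *wordstr, int insert_sil,
                              char *out, size_t outsz)
{
    char buf[1024];
    char *tok;
    int n_words = 0;

    assert(wordstr != NULL && out != NULL && outsz > 0);
    if (strlen(wordstr) >= sizeof(buf))
        return -1;
    strcpy(buf, wordstr);               /* strtok modifies its input */
    out[0] = '\0';

    if (append_word(out, outsz, "<s>") < 0)
        return -1;
    n_words++;

    for (tok = strtok(buf, " \t\n"); tok != NULL; tok = strtok(NULL, " \t\n")) {
        if (insert_sil) {               /* gap before this word */
            if (append_word(out, outsz, "<sil>") < 0)
                return -1;
            n_words++;
        }
        if (append_word(out, outsz, tok) < 0)
            return -1;
        n_words++;
    }

    if (insert_sil) {                   /* gap before </s> */
        if (append_word(out, outsz, "<sil>") < 0)
            return -1;
        n_words++;
    }
    if (append_word(out, outsz, "</s>") < 0)
        return -1;
    return n_words + 1;
}
```

With insert_sil set to 0, the transcript "hello world" becomes "<s> hello world </s>"; with insert_sil nonzero, a <sil> appears in each of the three gaps.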

int32 align_destroy_sent_hmm ( void   ) 

int32 align_end_utt ( align_stseg_t **  stseg_out,
align_phseg_t **  phseg_out,
align_wdseg_t **  wdseg_out 
)

All frames consumed. Trace back best Viterbi state sequence and dump it out.

Parameters:
stseg_out Out: list of state segmentation
phseg_out Out: list of phone segmentation
wdseg_out Out: list of word segmentation

References snode_s::active_frm, history_s::alloc_next, snode_s::hist, slink_s::next, align_wdseg_s::next, align_phseg_s::next, align_stseg_s::next, slink_s::node, history_s::pred, snode_s::predlist, slink_s::prob, and snode_s::score.

int32 align_frame ( int32 *  senscr  ) 

Step the time aligner one frame forward. Wind up utterance and return final result (READ-ONLY). Results only valid until the next utterance is begun.

One frame of Viterbi time alignment.

Parameters:
senscr In: array of senone scores this frame

References snode_s::active_frm, snode_s::hist, IS_S3SENID, snode_s::newhist, snode_s::newscore, slink_s::next, slink_s::node, snode_s::predlist, slink_s::prob, S3_LOGPROB_ZERO, snode_s::score, snode_s::sen, and snode_s::succlist.

void align_free ( void   ) 

Referenced by main().

int32 align_init ( mdef_t *  _mdef,
tmat_t *  _tmat,
dict_t *  _dict,
cmd_ln_t *  _config,
logmath_t *  _logmath 
)

void align_sen_active ( uint8 *  senlist,
int32  n_sen 
)

Called at the beginning of a frame to flag the active senones (any senone used by active HMMs) in that frame.

Flag the active senones.

Parameters:
senlist Out: senlist[s] TRUE iff active in frame
n_sen In: Size of senlist[] array

References IS_S3SENID, and snode_s::sen.
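
The flagging step can be sketched in a few lines of C. The demo_state_t type and flag_active_senones function are hypothetical stand-ins rather than the library's own structures, and the fixed-width types from <stdint.h> replace the uint8 and int32 typedefs so that the sketch compiles on its own. A negative senone ID is assumed here to mark a state with no senone.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an active HMM state; only the senone ID
 * matters for this step. */
typedef struct {
    int32_t sen;    /* senone ID, negative if the state has none */
} demo_state_t;

/* Hypothetical helper: set senlist[s] to 1 for every senone s used by
 * an active state, and to 0 for every other senone. */
static void flag_active_senones(const demo_state_t *active, int32_t n_active,
                                uint8_t *senlist, int32_t n_sen)
{
    int32_t i;

    assert(senlist != NULL && n_sen >= 0 && n_active >= 0);
    memset(senlist, 0, (size_t)n_sen);
    for (i = 0; i < n_active; i++) {
        int32_t s = active[i].sen;
        if (s >= 0 && s < n_sen)    /* skip states without a valid senone */
            senlist[s] = 1;
    }
}
```

A caller can then restrict acoustic scoring to the flagged entries before passing the resulting scores to the frame step.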

int32 align_start_utt ( char *  uttid  ) 

Start Viterbi alignment using the sentence HMM previously built. Assumes that each utterance will be aligned only once; state member variables are initialized during sentence HMM building.

References snode_s::active_frm, snode_s::hist, slink_s::next, slink_s::node, snode_s::score, and snode_s::succlist.


Generated on 7 Mar 2010 by  doxygen 1.6.1