approx_cont_mgau.h File Reference

Master function to compute the approximate score of a mixture of Gaussians.

#include <logmath.h>
#include <profile.h>
#include "cont_mgau.h"
#include "subvq.h"
#include "gs.h"
#include "fast_algo_struct.h"
#include "ascr.h"
#include "mdef.h"
#include "s3types.h"


Functions

S3DECODER_EXPORT int32 approx_cont_mgau_frame_eval (mdef_t *mdef, subvq_t *svq, gs_t *gs, mgau_model_t *g, fast_gmm_t *fastgmm, ascr_t *a, float32 *feat, int32 frame, int32 *cache_ci_senscr, ptmr_t *tm_ovrhd, logmath_t *logmath)
S3DECODER_EXPORT void approx_cont_mgau_ci_eval (subvq_t *svq, gs_t *gs, mgau_model_t *g, fast_gmm_t *fg, mdef_t *mdef, float32 *feat, int32 *ci_senscr, int32 *best_score, int32 fr, logmath_t *logmath)

Detailed Description

Master function to compute the approximate score of a mixture of Gaussians.

Warning:
You need some knowledge of fast GMM computation in order to modify this function.

The following schemes are currently included:
1, VQ-based Gaussian Selection
2, Subvq-based Gaussian Selection
3, Context Independent Phone-based GMM Selection
4, Down Sampling
   a, dumb approach
   b, conditional down sampling (currently can only be used with VQ-based Gaussian Selection)
   c, distance-based down sampling

The above method of categorizing GMM computation into 4 levels was presented at ICSLP 2004. For the publication, please visit Arthur Chan's web site at www.cs.cmu.edu/~archan/ .


Function Documentation

S3DECODER_EXPORT void approx_cont_mgau_ci_eval ( subvq_t *  svq,
gs_t *  gs,
mgau_model_t *  g,
fast_gmm_t *  fg,
mdef_t *  mdef,
float32 *  feat,
int32 *  ci_senscr,
int32 *  best_score,
int32  fr,
logmath_t *  logmath 
)

Evaluate the approximate Gaussian scores of the CI senones for one frame. In Sphinx 3.X (X=4,5), this routine is used to precompute the CI senone scores as a kind of approximate match of the CD scores.

In this function:
1, Only the CI-phone scores are computed.
2, The scores are not normalized. This routine is supposed to be used before approx_cont_mgau_frame_eval; the best score is determined by the latter function.

Parameters:
fg Input/Output: the fast GMM structure, a wrapper for all Fast GMM beams and parameters; during the computation, the
mdef In: model definition
feat In: the feature vector
ci_senscr Input/Output: CI senone scores, a one-dimensional array
best_score Input/Output: the best score, a scalar
fr In: the frame number
logmath In: the log-math computation object

References mgau_model_t::frm_ci_gau_eval, mgau_model_t::frm_ci_sen_eval, fast_gmm_t::gaus, gc_compute_closest_cw(), mdef_cd2cisen, mdef_is_cisenone(), mgau_eval(), mgau_n_comp, subvq_gautbl_eval_logs3(), and gau_select_t::subvqbeam.

S3DECODER_EXPORT int32 approx_cont_mgau_frame_eval ( mdef_t *  mdef,
subvq_t *  svq,
gs_t *  gs,
mgau_model_t *  g,
fast_gmm_t *  fastgmm,
ascr_t *  a,
float32 *  feat,
int32  frame,
int32 *  cache_ci_senscr,
ptmr_t *  tm_ovrhd,
logmath_t *  logmath 
)

approx_cont_mgau_frame_eval encapsulates all approximations in the Gaussian computation. It assumes that programmers do NOT initialize the senone scores at every frame before using this function. This modularizes the routine but complicates issues such as frame-dropping, which can also be done in the front end.

This layer of code controls the optimization performance at the Frame Level and the GMM Level.

Frame Level:

^^^^^^^^^^^^

We compute the scores of a frame only if it is not similar to the most recently computed frame. There are multiple ways to configure this.

Naive down-sampling : Compute the scores only once every n frames and skip the frames in between.

Conditional down-sampling : Skip the computation only if the current frame belongs to the same neighborhood as the most recently computed frame. This neighborhood corresponds to the codeword that the feature vector is found to be closest to.
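The two skipping strategies can be sketched as a single per-frame decision. This is only an illustration under stated assumptions, not the Sphinx-3 code: ds_state_t, closest_codeword and frame_is_skipped are hypothetical names, the codebook is a flat array, and the neighborhood test is a plain squared-Euclidean nearest-codeword search.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical state kept between frames (not a Sphinx-3 structure). */
typedef struct {
    int ds_ratio;      /* naive down-sampling ratio n (1 = never skip) */
    int cond_ds;       /* non-zero enables conditional down-sampling */
    int last_codeword; /* codeword of the last computed frame, -1 if none */
} ds_state_t;

/* Index of the codeword closest to feat (squared Euclidean distance).
 * codebook holds n_cw codewords of dim floats each, stored contiguously. */
int closest_codeword(const float *feat, const float *codebook,
                     int n_cw, int dim)
{
    int best = 0;
    float best_dist = FLT_MAX;

    assert(n_cw > 0 && dim > 0);
    for (int c = 0; c < n_cw; c++) {
        float d = 0.0f;
        for (int i = 0; i < dim; i++) {
            float diff = feat[i] - codebook[c * dim + i];
            d += diff * diff;
        }
        if (d < best_dist) {
            best_dist = d;
            best = c;
        }
    }
    return best;
}

/* Returns 1 if the frame may be skipped, 0 if it must be computed. */
int frame_is_skipped(ds_state_t *st, int frame, const float *feat,
                     const float *codebook, int n_cw, int dim)
{
    if (st->cond_ds) {
        /* Conditional: skip only when the frame falls in the same
         * codeword neighborhood as the last computed frame. */
        int cw = closest_codeword(feat, codebook, n_cw, dim);
        if (cw == st->last_codeword)
            return 1;
        st->last_codeword = cw; /* this frame becomes the reference */
        return 0;
    }
    /* Naive: compute one frame out of every ds_ratio frames. */
    assert(st->ds_ratio >= 1);
    return (frame % st->ds_ratio) != 0;
}
```

In this sketch the conditional mode ignores ds_ratio entirely; whether the real decoder also bounds the number of consecutive skipped frames is not stated here.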

No matter which down-sampling scheme is used, the following problem appears in the computation. Active senones of a frame that is supposed to be skipped may not have been computed in the most recently computed frame. In those cases, we choose to compute those senones completely.
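A minimal sketch of that fallback for a skipped frame follows. The function names, the flag arrays and the stand-in scorer are assumptions made for illustration (the arrays only mirror the roles of sen_active, rec_sen_active and senscr); the actual routine is organized differently.

```c
#include <assert.h>

/* Hypothetical scorer type: returns the fully computed score of senone s. */
typedef int (*score_fn)(int s);

/* Stand-in scorer for illustration only; a real one would evaluate a GMM. */
int demo_full_score(int s)
{
    return -100 * (s + 1);
}

/* Fill senscr for a frame that is being skipped.
 * sen_active[s]     : senone s is active in the current frame
 * rec_sen_active[s] : senone s was computed in the last computed frame
 * Returns the number of senones that had to be computed in full. */
int fill_skipped_frame(int n_sen, const unsigned char *sen_active,
                       const unsigned char *rec_sen_active,
                       int *senscr, score_fn full_score)
{
    int n_computed = 0;

    assert(n_sen >= 0);
    for (int s = 0; s < n_sen; s++) {
        if (!sen_active[s])
            continue;          /* inactive: nothing to do */
        if (rec_sen_active[s])
            continue;          /* reuse the score already in senscr[s] */
        senscr[s] = full_score(s); /* no recent score: compute completely */
        n_computed++;
    }
    return n_computed;
}
```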

GMM Level:

^^^^^^^^^^

The implementation of CI-based GMM selection makes use of the fact that in s3.3, CI models are always placed before all CD models. Hence the following logic is implemented:

    if(it is CI senone)
        compute score
    else if (it is CD senone)
        if the ci-phone beam was not set
            compute score
        else
            if the CD senone's parent has a score within the beam
                compute_score
            else CD senone's parent has a score out of the beam
                back-off using the parent senone score.

In s3.5, the idea of bestidx in a GMM was changed, and the above logic becomes

    if(it is CI senone)
        compute score
    else if (it is CD senone)
        if the ci-phone beam was not set
            compute score
        else
            if the CD senone's parent has a score within the beam
                compute_score
            else CD senone's parent has a score out of the beam
                if the bestindex of the last frame exists
                    compute score using the bestidx
                then
                    back-off using the parent senone score.
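The s3.5 decision logic can be sketched as a pure function that says what to do for one senone. Everything here is hypothetical naming: score_action_t, choose_action and NO_BEST_IDX are not Sphinx-3 identifiers (NO_BEST_IDX only mirrors the role of NO_BSTIDX). The sketch also assumes that scores and the beam are integer log values with a non-positive beam, and that back-off applies only when no best index from the previous frame exists.

```c
#include <assert.h>

#define NO_BEST_IDX (-1) /* hypothetical marker: no best Gaussian recorded */

typedef enum {
    SCORE_FULL,      /* evaluate every Gaussian of the mixture */
    SCORE_BEST_ONLY, /* evaluate only the previous frame's best Gaussian */
    SCORE_BACKOFF    /* reuse the parent CI senone score */
} score_action_t;

/* Decide how to score one senone.
 * is_ci         : non-zero if the senone is context-independent
 * ci_beam_set   : non-zero if the CI-phone beam is enabled
 * parent_score  : score of the CD senone's parent CI senone
 * best_ci_score : best CI senone score of this frame
 * ci_beam       : log-domain beam width, assumed to be <= 0
 * prev_best_idx : best Gaussian index of the last frame, or NO_BEST_IDX */
score_action_t choose_action(int is_ci, int ci_beam_set, int parent_score,
                             int best_ci_score, int ci_beam,
                             int prev_best_idx)
{
    assert(ci_beam <= 0);
    if (is_ci)
        return SCORE_FULL;
    if (!ci_beam_set)
        return SCORE_FULL;
    if (parent_score >= best_ci_score + ci_beam)
        return SCORE_FULL;      /* parent is within the beam */
    if (prev_best_idx != NO_BEST_IDX)
        return SCORE_BEST_ONLY; /* cheap one-Gaussian approximation */
    return SCORE_BACKOFF;       /* parent out of beam, nothing cached */
}
```

Returning an action instead of a score keeps the decision separate from the Gaussian evaluation, so the same function would describe the s3.3 logic if SCORE_BEST_ONLY were never returned.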

About renormalization

^^^^^^^^^^^^^^^^^^^^^

Sphinx 3.4 generally renormalizes the scores using the best score. Notice that this introduces extra complication to the implementation. I have separated out the logic that decides whether or not to compute the scores, which clarifies the code a bit.
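Renormalizing by the best score can be sketched as two passes over the active senones. This is an illustrative stand-alone version, assuming integer log scores and a per-senone activity flag; renormalize_scores is a hypothetical name and the real code interleaves this with the score computation.

```c
#include <assert.h>
#include <limits.h>

/* Subtract the best active score from every active senone score, so the
 * best senone ends up at 0 and all others are <= 0.
 * Returns the best score found, or INT_MIN if no senone is active. */
int renormalize_scores(int *senscr, const unsigned char *sen_active,
                       int n_sen)
{
    int best = INT_MIN;

    assert(n_sen >= 0);
    /* Pass 1: find the best score among active senones. */
    for (int s = 0; s < n_sen; s++) {
        if (sen_active[s] && senscr[s] > best)
            best = senscr[s];
    }
    if (best == INT_MIN)
        return best; /* nothing active, nothing to normalize */

    /* Pass 2: shift the active scores; inactive ones are left untouched. */
    for (int s = 0; s < n_sen; s++) {
        if (sen_active[s])
            senscr[s] -= best;
    }
    return best;
}
```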

Accounting of senone and gaussian computation

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This function assumes that approx_cont_mgau_ci_eval was run before it; hence, at the end, the score is added on top of it.

Design

^^^^^^

The whole idea of this function is based on my paper on "4-level categorization of GMM computation", which basically describes how different techniques of fast GMM computation should interact with each other. The current implementation is meant to keep the code as short as possible. I hope that no one will try to make the code longer than 500 lines.

Imperfection

^^^^^^^^^^^^

Imperfections of the code can be easily seen by experts, so I want to point them out before they freak out. There are synchronization mechanisms in the bestindex and rec_sen_active, which can easily be a source of error. I didn't fix this because trusting just the best matching index of the previous frame is somehow slightly different from trusting the score of the previous frame.

The sen_active, rec_sen_active and senscr should be inside the GMM structure rather than being separate arrays. I didn't fix it because this change would also touch other data structures.

See also:
approx_mgau_eval
Returns:
the best senone score
Parameters:
mdef, svq, gs Input: the mdef, svq and gs structures
fastgmm Input/Output: wrapper for parameters for Fast GMM , for all beams and parameters, during the computation, the
a Input/Output: wrapper for all acoustic scores arrays
feat Input: the current feature vector
frame Input: The frame number
cache_ci_senscr Input: The cache CI scores for this frame
tm_ovrhd Output: the timer used for computing overhead

References mgau_t::bstidx, mgau_t::bstscr, gmm_select_t::ci_occu, gmm_select_t::ci_pbeam, mgau_model_t::frm_gau_eval, mgau_model_t::frm_sen_eval, fast_gmm_t::gaus, gc_compute_closest_cw(), fast_gmm_t::gmms, gmm_select_t::max_cd, mdef_cd2cisen, mdef_is_cisenone(), mgau_model_t::mgau, mgau_eval(), mdef_t::n_ci_sen, mgau_model_t::n_mgau, mdef_t::n_sen, NO_BSTIDX, gau_select_t::rec_bstcid, ascr_t::rec_sen_active, ascr_t::sen_active, ascr_t::senscr, subvq_gautbl_eval_logs3(), gau_select_t::subvqbeam, gmm_select_t::tighten_factor, and mgau_t::updatetime.


Generated on 7 Mar 2010 by  doxygen 1.6.1