Implementation detail of approx_cont_mgau.
#include "approx_cont_mgau.h"
#include <stdlib.h>
Defines | |
#define | DEBUG_GSCORE 1 |
Functions | |
void | approx_cont_mgau_ci_eval (subvq_t *svq, gs_t *gs, mgau_model_t *g, fast_gmm_t *fg, mdef_t *mdef, float32 *feat, int32 *ci_senscr, int32 *best_score, int32 fr, logmath_t *logmath) |
int32 | approx_cont_mgau_frame_eval (mdef_t *mdef, subvq_t *svq, gs_t *gs, mgau_model_t *g, fast_gmm_t *fastgmm, ascr_t *a, float32 *feat, int32 frame, int32 *cache_ci_senscr, ptmr_t *tm_ovrhd, logmath_t *logmath) |
Variables | |
int32 * | ci |
#define DEBUG_GSCORE 1 |
void approx_cont_mgau_ci_eval | ( | subvq_t * | svq, | |
gs_t * | gs, | |||
mgau_model_t * | g, | |||
fast_gmm_t * | fg, | |||
mdef_t * | mdef, | |||
float32 * | feat, | |||
int32 * | ci_senscr, | |||
int32 * | best_score, | |||
int32 | fr, | |||
logmath_t * | logmath | |||
) |
This function (1) computes only the CI-phone scores and (2) does not normalize them. It is meant to be called before approx_cont_mgau_frame_eval; the best score is determined by the latter function.
fg | Input/Output: the fast GMM structure, a wrapper for all Fast GMM beams and parameters | |
mdef | In : model definition | |
feat | In : the feature vector | |
ci_senscr | Input/Output : CI senone scores, a one-dimensional array | |
best_score | Input/Output: the best score, a scalar | |
fr | In : the frame number | |
logmath | In : the log-math object used for log-domain arithmetic |
References mgau_model_t::frm_ci_gau_eval, mgau_model_t::frm_ci_sen_eval, fast_gmm_t::gaus, gc_compute_closest_cw(), mdef_cd2cisen, mdef_is_cisenone(), mgau_eval(), mgau_n_comp, subvq_gautbl_eval_logs3(), and gau_select_t::subvqbeam.
int32 approx_cont_mgau_frame_eval | ( | mdef_t * | mdef, | |
subvq_t * | svq, | |||
gs_t * | gs, | |||
mgau_model_t * | g, | |||
fast_gmm_t * | fastgmm, | |||
ascr_t * | a, | |||
float32 * | feat, | |||
int32 | frame, | |||
int32 * | cache_ci_senscr, | |||
ptmr_t * | tm_ovrhd, | |||
logmath_t * | logmath | |||
) |
approx_cont_mgau_frame_eval encapsulates all approximations in the Gaussian computation. It assumes that programmers do NOT initialize the senone scores at every frame before calling this function. This keeps the routine modular, but it complicates issues such as frame-dropping, which can also be done in the front end.
This layer of code controls the optimizations at the frame level and the GMM level.
Frame Level:
^^^^^^^^^^^^
We compute the scores for a frame only if it is not similar to the most recently computed frame. There are multiple ways to configure this.
Naive down-sampling : Compute the scores for only one frame out of every n and skip the rest.
Conditional down-sampling : Skip the computation only if the current frame belongs to the same neighborhood as the most recently computed frame. This neighborhood corresponds to the codeword that is closest to the feature vector.
Whichever down-sampling scheme is used, the following problem appears in the computation. A senone that is active in a frame that is supposed to be skipped may not have been computed in the most recently computed frame. In those cases, we compute those senones completely.
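The two down-sampling schemes can be illustrated with a minimal sketch. This is not the Sphinx 3 code: `skip_state_t`, `frame_should_skip` and all field names are hypothetical, and capping consecutive skips at `ds_ratio - 1` in conditional mode is an assumption made for the example. The handling of active senones that were missed in the last computed frame is not covered here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for the skip decision (not a Sphinx 3 structure). */
typedef struct {
    int ds_ratio;     /* compute at least one frame out of every ds_ratio */
    int cond_ds;      /* 0: naive down-sampling, non-zero: conditional */
    int skip_count;   /* frames skipped since the last computed frame */
    int last_cw;      /* closest codeword of the last computed frame */
    int computed_any; /* 0 until the first frame has been computed */
} skip_state_t;

/* Return 1 if the frame may be skipped, 0 if it must be computed.
 * closest_cw is the index of the codeword nearest to the feature vector. */
int frame_should_skip(skip_state_t *st, int closest_cw)
{
    int similar;

    assert(st != NULL && st->ds_ratio >= 1);

    /* Naive mode treats every frame as similar to the last computed one. */
    similar = st->cond_ds ? (closest_cw == st->last_cw) : 1;

    if (st->computed_any && similar && st->skip_count < st->ds_ratio - 1) {
        st->skip_count++;
        return 1;
    }

    /* Compute this frame and remember its neighborhood. */
    st->skip_count = 0;
    st->last_cw = closest_cw;
    st->computed_any = 1;
    return 0;
}
```

With `ds_ratio = 2` and `cond_ds = 0`, every other frame is skipped. With `cond_ds` set, a frame whose closest codeword differs from that of the last computed frame is always computed.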
GMM Level:
^^^^^^^^^^
The implementation of CI-based GMM selection makes use of the fact that in s3.3, CI models are always placed before all CD models. Hence the following logic is implemented:
if (it is CI senone)
    compute score
else if (it is CD senone)
    if the ci-phone beam was not set
        compute score
    else if the CD senone's parent has a score within the beam
        compute score
    else (the CD senone's parent has a score out of the beam)
        back-off using the parent senone score.
In s3.5, the idea of bestidx in a GMM was changed, and the above logic becomes
if (it is CI senone)
    compute score
else if (it is CD senone)
    if the ci-phone beam was not set
        compute score
    else if the CD senone's parent has a score within the beam
        compute score
    else (the CD senone's parent has a score out of the beam)
        if the bestindex of the last frame exists
            compute score using the bestidx
        else
            back-off using the parent senone score.
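The s3.5 decision logic above can be sketched as a small pure function. All names here (`sen_action_t`, `choose_action`, `SKETCH_NO_BSTIDX`) are hypothetical and are not the identifiers used in the library. The beam test is also an assumption: scores are taken to be log-domain integers and the beam a non-positive offset from the best CI score.

```c
#include <assert.h>

/* What to do for one senone in the current frame. */
typedef enum {
    ACT_FULL,       /* evaluate every Gaussian of the senone */
    ACT_BEST_ONLY,  /* evaluate only the best Gaussian remembered from the last frame */
    ACT_BACKOFF     /* reuse the score of the parent CI senone */
} sen_action_t;

/* Sentinel meaning "no best index recorded"; a stand-in for the real constant. */
#define SKETCH_NO_BSTIDX (-1)

sen_action_t choose_action(int is_ci, int beam_set, int parent_scr,
                           int best_ci_scr, int ci_pbeam, int bstidx)
{
    assert(ci_pbeam <= 0);  /* a log-domain beam is an offset below the best score */

    if (is_ci)
        return ACT_FULL;            /* CI senones are always computed */
    if (!beam_set)
        return ACT_FULL;            /* no CI-phone beam: compute every CD senone */
    if (parent_scr >= best_ci_scr + ci_pbeam)
        return ACT_FULL;            /* parent is within the beam */
    if (bstidx != SKETCH_NO_BSTIDX)
        return ACT_BEST_ONLY;       /* parent out of beam, best index available */
    return ACT_BACKOFF;             /* parent out of beam, nothing to reuse */
}
```

Returning an action instead of computing the score directly keeps the decision separate from the Gaussian evaluation, which makes each branch easy to check in isolation.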
About renormalization
^^^^^^^^^^^^^^^^^^^^^
Sphinx 3.4 generally renormalizes the scores using the best score. Notice that this introduces extra complication into the implementation. I have separated out the logic that decides whether or not to compute the scores, which clarifies the code a bit.
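As an illustration of renormalizing by the best score, here is a minimal sketch. The function name `renormalize_scores`, the `active` flag array and the choice to touch only active senones are assumptions made for the example, not the library's interface.

```c
#include <assert.h>
#include <stddef.h>

/* Subtract the best (largest) active score from every active score, so that
 * the best active senone ends up at 0. Returns the best score that was
 * subtracted, or 0 if no senone is active (scores are then left untouched). */
int renormalize_scores(int *senscr, const unsigned char *active, size_t n)
{
    int best = 0;
    int found = 0;
    size_t i;

    assert(senscr != NULL && active != NULL);

    /* First pass: find the best score among the active senones. */
    for (i = 0; i < n; i++) {
        if (active[i] && (!found || senscr[i] > best)) {
            best = senscr[i];
            found = 1;
        }
    }
    if (!found)
        return 0;

    /* Second pass: shift the active scores so that the best becomes 0. */
    for (i = 0; i < n; i++) {
        if (active[i])
            senscr[i] -= best;
    }
    return best;
}
```

Because the best score is only known once all scores of the frame are available, the normalization has to run as a separate pass after the scoring decisions.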
Accounting of senone and gaussian computation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This function assumes approx_cont_mgau_ci_eval was run before it; hence, at the end, the score is added on top of the result of that function.
Design
^^^^^^
The whole idea of this function is based on my paper on "4-level categorization of GMM computation", which describes how different techniques of fast GMM computation should interact with each other. The current implementation is meant to keep the code as short as possible. I hope that no one will make the code longer than 500 lines.
Imperfection
^^^^^^^^^^^^
Imperfections of the code can easily be seen by experts, so I want to point them out before they freak out. There is a synchronization mechanism between bestindex and rec_sen_active, which can easily be a source of error. I left it this way because trusting only the best-matching index of the previous frame gives slightly different results from trusting the score of the previous frame.
The sen_active, rec_sen_active and senscr arrays should be inside the GMM structure rather than being separate arrays. I didn't fix this because the change would also touch other data structures.
gs | Input: mdef, svq and gs | |
fastgmm | Input/Output: wrapper for all Fast GMM beams and parameters | |
a | Input/Output: wrapper for all acoustic scores arrays | |
feat | Input: the current feature vector | |
frame | Input: The frame number | |
cache_ci_senscr | Input: The cache CI scores for this frame | |
tm_ovrhd | Output: the timer used for computing overhead |
References mgau_t::bstidx, mgau_t::bstscr, gmm_select_t::ci_occu, gmm_select_t::ci_pbeam, mgau_model_t::frm_gau_eval, mgau_model_t::frm_sen_eval, fast_gmm_t::gaus, gc_compute_closest_cw(), fast_gmm_t::gmms, gmm_select_t::max_cd, mdef_cd2cisen, mdef_is_cisenone(), mgau_model_t::mgau, mgau_eval(), mdef_t::n_ci_sen, mgau_model_t::n_mgau, mdef_t::n_sen, NO_BSTIDX, gau_select_t::rec_bstcid, ascr_t::rec_sen_active, ascr_t::sen_active, ascr_t::senscr, subvq_gautbl_eval_logs3(), gau_select_t::subvqbeam, gmm_select_t::tighten_factor, and mgau_t::updatetime.
int32* ci |