srch.h File Reference

search abstraction. More...

#include <stdio.h>
#include <s3types.h>
#include <glist.h>
#include "dag.h"
#include "lm.h"
#include "ascr.h"
#include "adaptor.h"
#include "stat.h"
#include "fast_algo_struct.h"
#include "kbcore.h"
#include "kb.h"
#include "srch_allphone.h"
#include "srch_fsg.h"
#include "srch_flat_fwd.h"
#include "srch_time_switch_tree.h"
#include "srch_word_switch_tree.h"
#include "srch_do_nothing.h"
#include "srch_debug.h"
#include "srch_output.h"

Go to the source code of this file.

Classes

struct  grp_str_t
struct  srch_funcs_s
struct  srch_s

Defines

#define SRCH_SUCCESS   0
#define SRCH_FAILURE   1
#define OPERATION_ALIGN   0
#define OPERATION_ALLPHONE   1
#define OPERATION_GRAPH   2
#define OPERATION_FLATFWD   3
#define OPERATION_TST_DECODE   4
#define OPERATION_WST_DECODE   5
#define OPERATION_EVANDRO_MODE   6
#define OPERATION_DAVID_MODE   7
#define OPERATION_ARTHUR_MODE   8
#define OPERATION_YITAO_MODE   9
#define OPERATION_RAVI_MODE   10
#define OPERATION_STEVE_MODE   88
#define OPERATION_DO_NOTHING   1368
#define OPERATION_DEBUG   1369
#define GRAPH_STRUCT_FLAT   0
#define GRAPH_STRUCT_TST   1
#define GRAPH_STRUCT_WST   2
#define GRAPH_STRUCT_GENGRAPH   3
#define GRAPH_STRUCT_PHMM   4
#define GMM_STRUCT_CDHMM   0
#define GMM_STRUCT_SCHMM   1
#define GAUDEN_EVAL_WINDOW   8
#define DFLT_UTT_SIZE   5000
#define DFLT_NUM_SEGS   200

Typedefs

typedef struct srch_funcs_s srch_funcs_t
typedef struct srch_s srch_t

Functions

int32 srch_mode_str_to_index (const char *mode_str)
char * srch_mode_index_to_str (int32 index)
srch_tsrch_init (kb_t *kb, int32 op_mode)
void srch_report (srch_t *srch)
int32 srch_utt_begin (srch_t *srch)
S3DECODER_EXPORT int32 srch_utt_decode_blk (srch_t *srch, float ***block_feat, int32 block_nfeatvec, int32 *curfrm)
int32 srch_utt_end (srch_t *srch)
int32 srch_uninit (srch_t *srch)
glist_t srch_get_hyp (srch_t *srch)
dag_tsrch_get_dag (srch_t *srch)
void reg_result_dump (srch_t *s, int32 id)
void write_bstsenscr (FILE *fp, int32 numframe, int32 *scale)
S3DECODER_EXPORT int32 srch_set_lm (srch_t *srch, const char *lmname)
int32 srch_delete_lm (srch_t *srch, const char *lmname)

Detailed Description

search abstraction.

Written at 20050510 by ARCHAN.

Mechanism/implementation separation

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Starting from Sphinx 3.6, the implementation of the search has conceptually separated into two layers. The top layer srch.c defines the top level atomic function defintion (implemented as function pointers) that every search-like operations will share.

To understand what it means, consider two seemingly different search-like operations, alignment and decoding. Both of them required 1) initialization, 2) propropagte 1 frame. 3) termniation. Each of these operation can well be defined as one single interface.

Some people call this type of implementation as "C-class" which I think is very appropiate. Because in general, C++, Objective-C and D actually used the same mechanism to implement the concept "class". Obviously, the actual implementation was hidden in the compiler in those cases so that's why so many people are confused.

Why polymorphism is important?

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Before sphinx 3.6, maintainers (that includes me) tend to think that different operations would require different treatment. The consequence of that is different opertations tends to have different implementations, styles. Code tends to be duplicated if programmers think in this way. One big problem is the fact that we had so called s3 slow and s3 fast and there were 3000 lines duplication.

Code duplication also tends compound. i.e. if code was duplicated twice, it will be duplicated four times. That reduces productivitiy of the team.

Other reasons why srch.c is written

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

At the time when we implemented this routine, we actually didn't know whether mode 5 (word tree copies) will work or not. Therefore, letting the code of mode 4 (lucky wheel search) and mode 5 to coexist will be the best.

"Embedding" of multi-stage search into the dynamic programming

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Another interesting part (Interesting in the sense that I would write a paper on this.) of the search is that we (I will say mainly me because this is not new at all.) realize that approximiate search (i.e. something like phoneme-lookahead), detail search (i.e. something like hard-core tree and fsm search) and high level rescoring using knowledge source (e.g. using LM to rescore N-best) could all be incorporated into the first pass search and potentially make the code more verstille.

They key concept on how to do this is here: at one frame, one can carry out the above three operations and make the search faster because of 1) heuristics value obtained in the approximate search 2) early incorporation of high-level knowledge in the rescoring stage.

The naming of function interfaces shows this tendency. lv1 means the approximate first-stage search. lv2 means the detail second-stage search. rescoring means the use of high-level information in rescoring during the search.

The scope of what srch.c defines

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Written at 20050510

At the time I wrote this, we only implement mode 4 (Ravi's implementation in Sphinx 3.2, I usually call it "lucky wheel" implementation. (See my description in srch_time_switch_tree.h), mode 5 (My implementation using tree copies.) and mode 1369 (A debug mode of the search mechanism where only a prompt will be displayed and show that a function is called. Why 1369? coz I am a nerd. :-))

Though only three modes of the search were written. I found it to be utmost important to share my imagination of what we can actually do using this interface of the search. In theory, the slow search (pre-defined as mode 3) and the fast searches (pre-defined as mode 4 and mode 5) can be incorporated using this interface. s3align and s3allphone (pre-defined as mode 0 and 1) in s3.0 family can also be implemented in this way.

Applications such as keyword spotting and speaker verifcation. They are not defined at this point but they are possible using the architecture

What limits this architecture is that it might not be able to implement a wide class of segmental models. I will speculate if a recursive formulat can be derived for a particular type of model (segmental or non-segmental, HMM or non-HMM). The current interface could be overloaded and allow implementation of them. There are definitely some types of segmental model are very hard to implement using this structure; e.g. MIT segmental model (pre-1999) where segmentation was generated by low-level information first and search were performed on segments.

A general guide-line on how to write a search implementation

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

1, The batch mode search operation and the on-line mode operation can always share code. If the implementation is duplicated, it shows a bad omen.

2, Knowledge source (such as LM and finite state grammer) always change the graph structure. That is why srch_{set,add,delete}_lm was defined, if we have multiple types of knowledge sources. Interfaces should be added in srch.c and the search operation (e.g. search/srch_time_switch_tree.c) provides the actualy implemenation

3, GMM computation and search can be easily separated into two routines. Therefore, instead of wrap it up in individual searches, a new set of function called gmmwrap.c is used which conform the interface in srch.c.

4, Mechanism such as phoneme lookahead and generally fast match are all similar in the sense that. They look forward a couple of frames and compute a heuristic score. The windows controlling mechanism should be coded in srch.c . The actual heuristic should be coded somewhere else.

5, grh contains the actual data structure that stores the graph. We didn't use WFST as a common structure because we found that it has certain fact we don't know at the time we implemented.

6, Always get back to the debug mode (mode 1369) if you want to debug the search mechanism

7, (V. Imp.) When in doubt, ask Arthur Chan. If he is not in CMU any more, brainstorm how to get him back. He is a nice guy and has no problem in talking.


Define Documentation

#define DFLT_NUM_SEGS   200

The default number of segments of an utterance

#define DFLT_UTT_SIZE   5000

The default number of frames of an utterance

#define GAUDEN_EVAL_WINDOW   8
#define GMM_STRUCT_CDHMM   0
#define GMM_STRUCT_SCHMM   1
#define GRAPH_STRUCT_FLAT   0
#define GRAPH_STRUCT_GENGRAPH   3
#define GRAPH_STRUCT_PHMM   4
#define GRAPH_STRUCT_TST   1
#define GRAPH_STRUCT_WST   2
#define OPERATION_ALIGN   0

The many operation modes of srch.c.

Mode 1-4 are working modes. Not all of them are functioning as at but we have plans to migrate to that.

Just for the sake of it. mode 6-10 are assigned to 5 working programmers of CMU Sphinx. We don't expect too much from them. So let's see how it goes.

Mode 88 is also assigned but this is just for the sake of it. (Reserved but not used) force alignment mode. It is questionable whether this belongs in the decoder at all since it's mostly used in training.

#define OPERATION_ALLPHONE   1

Phoneme recognition.

#define OPERATION_ARTHUR_MODE   8

(Reserved but not used) Arthur Chan is the guy who think of this search mode thing. He wrote mode 5. Just like plan 9. It is quite a failure. Though, this guy is not that easy-going, he must have think of something this time.

#define OPERATION_DAVID_MODE   7

(Reserved but not used) It is not an illusion that David Huggins-Daines is productive. The reason he hasn't touched the search part yet is because he hasn't read Richard Bellman's "Dynamic Programming" and Arthur Chan's C++ implementation of arec. He will definitely show his super-fast speed in search implementation as well.

#define OPERATION_DEBUG   1369

Debug mode, it dumps the internal function call in srch.c layer. ARCHAN 20050329: 1369 has no meaning at all, I just love to call it in that way.

#define OPERATION_DO_NOTHING   1368

Do nothing mode. as it means it does nothing. It is used as one of the function test of the search operator.

#define OPERATION_EVANDRO_MODE   6

(Reserved but not used) Dr. Evandro Gouvea is a very defensive programmer. He is very clever and has strong mathematical background. I would guess his search code would be very solid.

#define OPERATION_FLATFWD   3

Flat-lexicon decoding mode In CMU, it is well-used as a research tool for acoustic and language modeling research. Its shell, decode_anytopo is still one of the standard executables.

#define OPERATION_GRAPH   2

FSG mode, adapated from sphinx 2 specify -fsg in the current decode interface should work.

#define OPERATION_RAVI_MODE   10

(Reserved but not used) Mosur Ravishankar works out both s2 and s3 He is also the original author of mode 2, 3 and 4. (In practice, Arthur Chan's mode 5 really couldn't match mode 4 at all.) He always has a lot of interesting ideas in his mind. Likely it might think of interesting search similar to mode 4.

#define OPERATION_STEVE_MODE   88

(Reserved but not used) Steven Lee is the guy who work out the a-star search for the MIT segmental recognizer. He yielded to Arthur Chan and become one of the developers of CMU Sphinx. But his heart is still with MIT, MIT SLS. Likely he will still be happy that 88 is a very lucky number in Chinese.

#define OPERATION_TST_DECODE   4

Tree-lexicon decoding with time-switching tree copies the current default. Or what I called the "magic wheel" search. It is a practical and useful search techniques.

#define OPERATION_WST_DECODE   5

(Reserved but not used) Tree-lexicon decoding with word-switching tree copies. Work for bigram with trigram rescoring. Not well developed enough to be deployed.

#define OPERATION_YITAO_MODE   9

(Reserved but not used) Yitao Sun works mainly on grammar, CFG, FSG and stuffs. Likely his search will also reflect his research direction.

#define SRCH_FAILURE   1
#define SRCH_SUCCESS   0

Typedef Documentation

typedef struct srch_funcs_s srch_funcs_t
typedef struct srch_s srch_t

Function Documentation

void reg_result_dump ( srch_t s,
int32  id 
)

Dump recognition result

Parameters:
s A search structure
id Utterance ID
int32 srch_delete_lm ( srch_t srch,
const char *  lmname 
)

delete lm

Parameters:
srch A search structure
lmname LM fie name
dag_t* srch_get_dag ( srch_t srch  ) 

Get the complete set of possible hypotheses, as a word graph (DAG). The dag_t returned by this function remains owned by srch_t. Do NOT attempt to free it.

glist_t srch_get_hyp ( srch_t srch  ) 

Get a recognition backtrace, as a list of srch_hyp_t structures. You must free this with glist_free() after using it.

Parameters:
srch In: a search structure
srch_t* srch_init ( kb_t kb,
int32  op_mode 
)

Initialize the search routine, this will specify the type of search drivers and initialized all resouces. It will do the following things

1, Set all function pointers depends on the mode number.

2, Initialize parameters which the default search abstraction mechanism required. This currently include, cache_win, cache_win_start. They control how much delay of the first level and second level search. Also senscale, a parameter which store the largest gmm score for each frame.

3, Initialize search-specific data structure using srch_init method of the search class. That is s->srch_init will be called.

Returns:
an initialized srch_t with operation mode op_mode.
Parameters:
kb In: knowledge base
op_mode In: operation mode of the search
char* srch_mode_index_to_str ( int32  index  ) 

Translate mode string to mode index. User need to supply an initialized

Returns:
a mode string;
See also:
srchmode_str_to_index
int32 srch_mode_str_to_index ( const char *  mode_str  ) 

Translate mode string (as passed in the -mode argument) to number. Names chosen for familiarity with Sphinx2/PocketSphinx.

The currently valid operation modes are

Strings Operation Mode(Mode Idx) --------------------------------------- allphone OPERATION_ALLPHONE(1)

fsg OPERATION_GRAPH(2)

fwdflat OPERATION_FLATFWD(3)

fwdtree OPERATION_TST_DECODE(4)

Returns:
the mode index if it is valid, -1 if it is not.
void srch_report ( srch_t srch  ) 

Report the search routine using file name of the model definition and directory name to initialize Report the search structure

Parameters:
srch In: a search structure
S3DECODER_EXPORT int32 srch_set_lm ( srch_t srch,
const char *  lmname 
)

using file name of the LM or defined lmctlfn mechanism

Parameters:
srch A search structure
lmname LM fie name
int32 srch_uninit ( srch_t srch  ) 

Wrap up the search routine

Parameters:
srch In: a search structure
int32 srch_utt_begin ( srch_t srch  ) 

Begin decoding of speech for one utterance.

Returns:
SRCH_SUCCESS if succeed, SRCH_FAILURE if failed.
See also:
srch_utt_end
Parameters:
srch In: a search structure
S3DECODER_EXPORT int32 srch_utt_decode_blk ( srch_t srch,
float ***  block_feat,
int32  block_nfeatvec,
int32 *  curfrm 
)

Decode one block of speech and provide the implementation of the default search abstraction

Parameters:
srch In: a search structure
block_feat In: a pointer of a two dimensional array
block_nfeatvec In: Number of feature vector
curfrm In/Out: a pointer of the current frame index
int32 srch_utt_end ( srch_t srch  ) 

End decoding of speech for one utterance.

Parameters:
srch In: a search structure
void write_bstsenscr ( FILE *  fp,
int32  numframe,
int32 *  scale 
)

Dump the best senone score for each frame

Parameters:
fp A file pointer
numframe Number of frame in one recognition
scale Scales

Generated on 7 Mar 2010 by  doxygen 1.6.1