lm_s Struct Reference

#include <lm.h>

Public Attributes

char * name
int32 n_ug
int32 n_bg
int32 n_tg
int32 max_ug
int32 n_ng
char ** wordstr
uint32 log_bg_seg_sz
uint32 bg_seg_sz
ug_t * ug
s3lmwid32_t * dict2lmwid
s3lmwid32_t startlwid
s3lmwid32_t finishlwid
bg_t * bg
tg_t * tg
membg_t * membg
tginfo_t ** tginfo
lm_tgcache_entry_t * tgcache
bg32_t * bg32
tg32_t * tg32
membg32_t * membg32
tginfo32_t ** tginfo32
lm_tgcache_entry32_t * tgcache32
lmlog_t * bgprob
lmlog_t * tgprob
lmlog_t * tgbowt
int32 * tg_segbase
int32 n_bgprob
int32 n_tgprob
int32 n_tgbowt
FILE * fp
int32 byteswap
int32 bgoff
int32 tgoff
float32 lw
int32 wip
int32 n_bg_fill
int32 n_bg_inmem
int32 n_bg_score
int32 n_bg_bo
int32 n_tg_fill
int32 n_tg_inmem
int32 n_tg_score
int32 n_tg_bo
int32 n_tgcache_hit
int32 access_type
int32 isLM_IN_MEMORY
int32 dict_size
hash_table_t * HT
lmclass_t ** lmclass
int32 n_lmclass
int32 * inclass_ugscore
int32 inputenc
int32 outputenc
int32 version
int32 is32bits
sorted_list_t sorted_prob2
sorted_list_t sorted_bowt2
sorted_list_t sorted_prob3
int32 max_sorted_entries
logmath_t * logmath

Member Data Documentation

int32 lm_s::access_type

Updated on every lm_{tg,bg,ug}_score call to reflect the order of the n-gram actually accessed: 3 for a 3-gram, 2 for a 2-gram, and 1 for a 1-gram.
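
The way access_type tracks back-off can be illustrated with a small sketch. It is not the Sphinx-3 implementation: toy_lm_t, toy_tg_score, toy_bg_score, toy_ug_score and the have_tg/have_bg flags are hypothetical stand-ins, and the assumption that a backed-off trigram request also counts as a bigram request is ours.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for lm_s: only the fields this sketch touches. */
typedef struct {
    int32_t access_type; /* 3, 2 or 1: order of the n-gram last accessed */
    int32_t n_tg_score;  /* #tg_score operations */
    int32_t n_tg_bo;     /* #tg_score ops backed off to bg */
    int32_t n_bg_score;  /* #bg_score operations */
    int32_t n_bg_bo;     /* #bg_score ops backed off to ug */
} toy_lm_t;

/* Unigram level: always succeeds, so the access type is 1. */
static int32_t toy_ug_score(toy_lm_t *lm)
{
    lm->access_type = 1;
    return lm->access_type;
}

/* Bigram level: use the bigram if present, otherwise back off to the unigram. */
static int32_t toy_bg_score(toy_lm_t *lm, int have_bg)
{
    lm->n_bg_score++;
    if (!have_bg) {
        lm->n_bg_bo++;
        return toy_ug_score(lm);
    }
    lm->access_type = 2;
    return lm->access_type;
}

/* Trigram level: use the trigram if present, otherwise back off to the bigram.
 * Returns the access type instead of a real score to keep the sketch short. */
static int32_t toy_tg_score(toy_lm_t *lm, int have_tg, int have_bg)
{
    assert(lm != NULL);
    lm->n_tg_score++;
    if (!have_tg) {
        lm->n_tg_bo++;
        return toy_bg_score(lm, have_bg);
    }
    lm->access_type = 3;
    return lm->access_type;
}
```

Calling toy_tg_score(&lm, 0, 1) on a zero-initialised toy_lm_t leaves access_type at 2 and increments both n_tg_bo and n_bg_score.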

bg_t* lm_s::bg

Bigrams; NULL iff disk-based

bg32_t* lm_s::bg32

32-bit bigrams; NULL iff disk-based

int32 lm_s::bgoff

BG offsets into DMP file (used iff disk-based)

lmlog_t* lm_s::bgprob

Table of actual bigram probs

int32 lm_s::byteswap

Whether this file is in the wrong byte order for the host and must be byte-swapped when read
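
As a minimal sketch of what such a flag typically controls, the helper below reverses the byte order of a 32-bit value only when the flag is set. The names swap_u32 and read_u32 are hypothetical and are not taken from lm.h.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the four bytes of a 32-bit value. */
static uint32_t swap_u32(uint32_t x)
{
    return ((x & 0x000000FFu) << 24) | ((x & 0x0000FF00u) << 8) |
           ((x & 0x00FF0000u) >> 8)  | ((x & 0xFF000000u) >> 24);
}

/* Hypothetical reader hook: 'raw' is a value as it was read from the file,
 * 'byteswap' plays the role of the lm_s::byteswap flag (0 or 1). */
static uint32_t read_u32(uint32_t raw, int32_t byteswap)
{
    assert(byteswap == 0 || byteswap == 1);
    return byteswap ? swap_u32(raw) : raw;
}
```

With byteswap set, read_u32(0x11223344u, 1) yields 0x44332211; with the flag clear the value passes through unchanged.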

s3lmwid32_t* lm_s::dict2lmwid

A mapping from dictionary word to LM word.

Only used in class-based LMs, because a class-based LM is addressed in the dictionary space.

s3lmwid32_t lm_s::finishlwid

S3_FINISH_WORD id, if it exists

FILE* lm_s::fp
hash_table_t* lm_s::HT

Hash table for the word-string -> word-id map

int32* lm_s::inclass_ugscore

An array of inter-class unigram probability

int32 lm_s::inputenc

Input encoding method

int32 lm_s::is32bits

Whether the current LM is 32 bits or not. Derived from version and n_ug

int32 lm_s::isLM_IN_MEMORY

Whether the LM is in memory. This is a per-LM property, so the code could potentially allow some models to be disk-based and others not.

lmclass_t** lm_s::lmclass

LM class for this LM

uint32 lm_s::log_bg_seg_sz

See big comment above

logmath_t* lm_s::logmath
float32 lm_s::lw

Language weight currently in effect for this LM

int32 lm_s::max_sorted_entries

Temporary variable: 2x MAX_SORTED_ENTRIES

int32 lm_s::max_ug

Maximum to which n_ug can grow with dynamic addition of words

membg_t* lm_s::membg

membg[w1] = bigrams for lm wid w1 (used iff disk-based)

membg32_t* lm_s::membg32

32-bit version of membg: bigrams for lm wid w1 (used iff disk-based)

int32 lm_s::n_bg

#bigrams in entire LM

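int32 lm_s::n_bg_bo

#bg_score ops backed off to ug

int32 lm_s::n_bg_fill

#bg fill operations

int32 lm_s::n_bg_inmem

#bg in memory

int32 lm_s::n_bg_score

#bg_score operations

int32 lm_s::n_lmclass

#LM classes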

int32 lm_s::n_ng

n_ng=1 for a unigram LM, n_ng=2 for a bigram LM, and so on

int32 lm_s::n_tg

#trigrams in entire LM

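int32 lm_s::n_tg_bo

#tg_score ops backed off to bg

int32 lm_s::n_tg_fill

Similar stats for trigrams

int32 lm_s::n_tg_inmem

#tg in memory

int32 lm_s::n_tg_score

#tg_score operations

int32 lm_s::n_tgcache_hit

# of trigram cache hit ops backed off to bg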

int32 lm_s::n_ug

#unigrams in LM

char* lm_s::name

The name of the LM

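int32 lm_s::outputenc

Output encoding method

sorted_list_t lm_s::sorted_bowt2

Temporary variable: sorted list

sorted_list_t lm_s::sorted_prob2

Temporary variable: sorted list

sorted_list_t lm_s::sorted_prob3

Temporary variable: sorted list

s3lmwid32_t lm_s::startlwid

S3_START_WORD id, if it exists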

tg_t* lm_s::tg

Trigrams; NULL iff disk-based

tg32_t* lm_s::tg32

32-bit trigrams; NULL iff disk-based

int32* lm_s::tg_segbase

tg_segbase[i>>lm_t.log_bg_seg_sz] = index of 1st trigram for bigram segment (i>>lm_t.log_bg_seg_sz)
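
The indexing rule can be sketched as follows. This is a sketch under stated assumptions, not code from lm.h: we assume bigrams are grouped into segments of 2^log_bg_seg_sz entries, and that each bigram stores only a small offset relative to its segment's first trigram. The array rel_firsttg and the function first_tg are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Absolute index of the first trigram that follows bigram number b.
 *   tg_segbase[s]  : index of the first trigram of bigram segment s
 *   rel_firsttg[b] : hypothetical 16-bit offset of bigram b's first trigram,
 *                    relative to the base of the segment b belongs to
 *   log_bg_seg_sz  : log2 of the number of bigrams per segment */
static int32_t first_tg(const int32_t *tg_segbase, const uint16_t *rel_firsttg,
                        uint32_t log_bg_seg_sz, int32_t b)
{
    assert(b >= 0);
    return tg_segbase[b >> log_bg_seg_sz] + (int32_t)rel_firsttg[b];
}
```

For example, with log_bg_seg_sz = 2 (four bigrams per segment), bigram 5 falls in segment 5 >> 2 = 1, so its first trigram is tg_segbase[1] + rel_firsttg[5]. Under this assumption the segment base keeps the per-bigram offsets small even when the total trigram count exceeds 65535.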

lmlog_t* lm_s::tgbowt

Table of actual trigram backoff weights

lm_tgcache_entry_t* lm_s::tgcache

<w0,w1,w2> is hashed to an entry in this array. Only the last trigram mapping to any given hash entry is kept in that entry. (The cache doesn't have to be super-efficient.)
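
The behaviour described here is that of a direct-mapped cache, which can be sketched as follows. The entry layout, table size and hash function are all hypothetical (toy_tgcache_entry_t, TOY_TGCACHE_SIZE and toy_hash are not the definitions from lm.h).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_TGCACHE_SIZE 1021u /* hypothetical number of cache slots */

/* Hypothetical cache entry: the trigram's word ids plus its cached score. */
typedef struct {
    uint32_t lwid[3];
    int32_t  score;
    int      valid; /* 0 until the slot has been filled once */
} toy_tgcache_entry_t;

/* Hypothetical hash: maps <w0,w1,w2> to exactly one slot. */
static uint32_t toy_hash(uint32_t w0, uint32_t w1, uint32_t w2)
{
    return (w0 * 961u + w1 * 31u + w2) % TOY_TGCACHE_SIZE;
}

/* Store a score; whatever occupied the slot before is simply overwritten,
 * so only the last trigram hashed to a slot is kept. */
static void toy_tgcache_store(toy_tgcache_entry_t *cache, uint32_t w0,
                              uint32_t w1, uint32_t w2, int32_t score)
{
    toy_tgcache_entry_t *e;
    assert(cache != NULL);
    e = &cache[toy_hash(w0, w1, w2)];
    e->lwid[0] = w0;
    e->lwid[1] = w1;
    e->lwid[2] = w2;
    e->score = score;
    e->valid = 1;
}

/* Return 1 and write *score on a hit, 0 on a miss. */
static int toy_tgcache_lookup(const toy_tgcache_entry_t *cache, uint32_t w0,
                              uint32_t w1, uint32_t w2, int32_t *score)
{
    const toy_tgcache_entry_t *e;
    assert(cache != NULL && score != NULL);
    e = &cache[toy_hash(w0, w1, w2)];
    if (e->valid && e->lwid[0] == w0 && e->lwid[1] == w1 && e->lwid[2] == w2) {
        *score = e->score;
        return 1;
    }
    return 0;
}
```

Because a colliding trigram evicts the previous occupant without any probing or chaining, lookups and stores each touch a single slot; the cost is an occasional extra miss, which matches the remark that the cache does not have to be super-efficient.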

tginfo_t** lm_s::tginfo

tginfo[w2] = fast trigram access info for bigrams (*,w2)

tginfo32_t** lm_s::tginfo32

32-bit version of tginfo: fast trigram access info for bigrams (*,w2)

int32 lm_s::tgoff

TG offsets into DMP file (used iff disk-based)

lmlog_t* lm_s::tgprob

Table of actual trigram probs

ug_t* lm_s::ug

Unigrams

int32 lm_s::version

The version number of the LM; in particular, the version of the LM most recently read in.

int32 lm_s::wip

logs3(word insertion penalty) in effect for this LM
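
A common way to combine a language weight and a word insertion penalty with a raw log-domain LM score is to scale the score by lw and then add wip. The sketch below illustrates that convention only; whether lm_s applies exactly this formula is an assumption, and apply_lw_wip is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: scale a raw log-domain LM score by the language
 * weight, then add the (already log-domain) word insertion penalty. */
static int32_t apply_lw_wip(int32_t raw_logprob, float lw, int32_t wip)
{
    assert(lw > 0.0f);
    return (int32_t)((float)raw_logprob * lw) + wip;
}
```

For instance, a raw score of -1000 with lw = 9.5 and wip = -200 gives -9500 - 200 = -9700.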

char** lm_s::wordstr

The LM word list (in unigram order)


Generated on 7 Mar 2010 by doxygen 1.6.1