Sunday, June 19, 2011

Fixing bad voices produced with festvox (or "Hey! My voice don't work!") (or "How to fix bad labellings")

Someone asked on a mailing list for possible ways to fix a bad voice they produced using festvox. I realized that my answer took me quite some time to figure out without any help, so I thought I'd post my response here.

Basically, if your voice is bad, chances are your labelling of some of the prompts is bad. (Even if it isn't, it doesn't hurt to make sure the labels are good.) You want to fix the bad labellings.

To do so, copy the contents of your wav folder and the contents of your lab folder into the same directory (or set up links to make it seem that way). Once you've done that, open the wav files with wavesurfer and choose the "transcription" view for all of them. Now you can go through them one by one and check whether the labellings are right.

For each bad labelling you have two options:

  1. Re-record the prompt. Remember to run bin/make_labs again before checking the labels again. I made this mistake once, and kept re-recording and thinking that the autolabeller sucked. Also, to save time, you can run bin/make_labs prompt-wav/test001.wav to relabel just test001.wav instead of all the recordings, which can be time-consuming.
  2. Hand-correct the labels. You can literally just drag the labels within wavesurfer (remember to copy your changes back to the lab/ directory).
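If you have a lot of prompts, a quick automated pass can help you decide which files to open in wavesurfer first. Below is a rough sketch in C that counts suspiciously short segments in the text of one label file. It is not part of festvox: the function name count_short_segments and the duration threshold are made up for illustration, and it assumes the label file has header lines ending with a line that starts with "#", followed by lines of the form "end_time number label". Check one of your own .lab files before trusting that assumption.

```c
#include <stdio.h>
#include <string.h>

/* Count segments shorter than min_dur seconds in the text of a label file.
 * Assumed format (verify against your own files): header lines, then a
 * line starting with '#', then one "<end_time> <number> <label>" per line.
 * Each short segment is also printed so you know where to look. */
int count_short_segments(const char *text, double min_dur)
{
    int in_body = 0;        /* becomes 1 once the '#' line has been seen */
    int count = 0;
    double prev_end = 0.0;  /* end time of the previous segment */
    const char *p = text;

    while (*p != '\0') {
        /* Copy the current line into a local buffer (truncating if long). */
        char line[256];
        size_t len = strcspn(p, "\n");
        size_t n = len < sizeof line - 1 ? len : sizeof line - 1;
        memcpy(line, p, n);
        line[n] = '\0';
        p += len;
        if (*p == '\n')
            p++;

        if (!in_body) {
            if (line[0] == '#')
                in_body = 1;
            continue;
        }

        double end;
        int number;
        char label[64];
        if (sscanf(line, "%lf %d %63s", &end, &number, label) != 3)
            continue;   /* skip lines that do not look like a segment */

        if (end - prev_end < min_dur) {
            printf("suspicious: %s ends at %.3f (%.3f s long)\n",
                   label, end, end - prev_end);
            count++;
        }
        prev_end = end;
    }
    return count;
}
```

A short segment is only a hint, not proof of a bad label, so this is for picking which prompts to inspect first, not for skipping the manual check.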

Once you've got all the labels as perfect as you care to have them, just repeat all the steps after "bin/make_labs prompt-wav/*.wav" from whatever tutorial you are following, and you should get the voice built with proper labelling. (Come on, I know that if you knew how to do anything with festvox without a tutorial in front of you, there's no way you would need to be reading this post.)

Thursday, June 9, 2011

Switching between multiple grammars with pocketsphinx

I was having difficulty understanding the pocketsphinx API, specifically when it comes to switching between multiple grammars.

Here's how it works:

Pocketsphinx actually keeps track of a set of grammars at all times. Normally, this set has only one element. However, it can contain multiple grammars, with only one switched on at a time. The basic method is:

  1. Get this set of grammars using ps_get_fsgset()
  2. Add your grammar to the set using fsg_set_add()
  3. Select your grammar from the set as the active one using fsg_set_select()
  4. Notify the recognizer that you have updated the grammar using ps_update_fsgset()
(This assumes that the recognizer was initially instantiated with an FSG rather than an N-gram model. Otherwise, you first need to switch it to an FSG model.)

Example code:

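ps_decoder_t *p = ...; // Decoder already initialized somehow
fsg_model_t *m = ...; // Load the model using fsg_model_read, or jsgf_parse_file and jsgf_build_fsg
fsg_set_t *fsgset = ps_get_fsgset(p); // 1. Get the decoder's set of grammars
fsg_set_add(fsgset, "newgrammarname", m); // 2. Add the new grammar under a name
fsg_set_select(fsgset, "newgrammarname"); // 3. Make it the active grammar
ps_update_fsgset(p); // 4. Tell the decoder that the set has changed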
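To make the "a set of grammars, only one switched on" idea concrete, here is a toy model in plain C. It is not pocketsphinx code and says nothing about how pocketsphinx implements this internally: every name in it (toy_grammar_set, toy_set_add, toy_set_select and so on) is made up for illustration. It only mimics the bookkeeping: adding a grammar does not activate it, and selecting by name switches the active one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_GRAMMARS 8

/* Hypothetical stand-in for a grammar; a real one would hold an FSG. */
typedef struct {
    const char *name;
    const char *description;
} toy_grammar;

/* A set of named grammars in which at most one is active. */
typedef struct {
    toy_grammar items[MAX_GRAMMARS];
    size_t count;
    int active;   /* index of the active grammar, or -1 if none */
} toy_grammar_set;

void toy_set_init(toy_grammar_set *set)
{
    set->count = 0;
    set->active = -1;
}

/* Add a grammar under a unique name without changing the active one.
 * Returns 0 on success, -1 if the set is full or the name is taken. */
int toy_set_add(toy_grammar_set *set, const char *name, const char *description)
{
    size_t i;
    assert(set != NULL && name != NULL);
    if (set->count == MAX_GRAMMARS)
        return -1;
    for (i = 0; i < set->count; i++)
        if (strcmp(set->items[i].name, name) == 0)
            return -1;
    set->items[set->count].name = name;
    set->items[set->count].description = description;
    set->count++;
    return 0;
}

/* Make the named grammar the active one.
 * Returns 0 on success, -1 (active grammar unchanged) if the name is unknown. */
int toy_set_select(toy_grammar_set *set, const char *name)
{
    size_t i;
    assert(set != NULL && name != NULL);
    for (i = 0; i < set->count; i++) {
        if (strcmp(set->items[i].name, name) == 0) {
            set->active = (int)i;
            return 0;
        }
    }
    return -1;
}

/* Name of the active grammar, or NULL if nothing has been selected yet. */
const char *toy_set_active_name(const toy_grammar_set *set)
{
    return set->active < 0 ? NULL : set->items[set->active].name;
}
```

The point to take away is that add and select are separate steps, which is why the real example needs both fsg_set_add and fsg_set_select before ps_update_fsgset.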

NOTE: I realize that even jsgf_build_fsg is confusing. Here's how you should handle it:

jsgf_build_fsg(jsgfmodel, rule, ps_get_logmath(p), 6.5);

where jsgfmodel is the JSGF grammar loaded using jsgf_parse_file, p is the decoder from the example above, and "rule" is a rule chosen from that grammar (use the jsgf_* functions to select the rule). Also, once the fsg has been created, free the JSGF grammar using jsgf_grammar_free.

Oh yeah, and the 6.5 just seems to be a magic number. In two places I've seen it used without any explanation, and the documentation I found says nothing about what the "lw" parameter does. My best guess is that "lw" stands for language weight, since 6.5 matches the default of pocketsphinx's -lw option. So I'd just stick to the value 6.5 and hope for the best...