Quick Notes: May 2012

Wednesday, 16 May 2012

Git - SSH authentication with a key on Windows.

A common problem, as I've noticed, is setting up an SSH connection to a remote Git repo so that you don't have to enter your password every time. If you follow all the "standard" procedures, you get something like:

Access denied
fatal: The remote end hung up unexpectedly

I don't know why this happens, and don't even want to know. So here is a quick recipe to avoid it.

1. on Windows, set the GIT_SSH environment variable to point to plink.exe [C:\putty\plink.exe]
2. ssh to the remote host where the repo is
3. cd ~
4. mkdir .ssh
5. cd .ssh
6. ssh-keygen -t dsa (press Enter at every prompt)
7. cat id_dsa.pub >> authorized_keys
8. copy id_dsa to your Windows machine (anywhere)
9. run puttygen.exe
10. press "Load" and open the id_dsa file
11. press "Save private key" (save it anywhere)
12. run pageant.exe
13. add the key that you saved from puttygen to pageant
14. git clone...
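
The server-side part of the recipe (creating .ssh and appending the public key) can be wrapped in a small helper. This is only a sketch and not part of the original recipe: install_pubkey is a hypothetical function name, and the chmod calls are my own addition, on the assumption that the remote sshd refuses keys when .ssh or authorized_keys is too widely accessible (OpenSSH's StrictModes check). It also skips the append if the key is already there, so it is safe to run twice.

```shell
# Hypothetical helper: append a public key to authorized_keys.
# Usage: install_pubkey PUBKEY_FILE [SSH_DIR]   (SSH_DIR defaults to ~/.ssh)
install_pubkey() {
    pubkey_file=$1
    ssh_dir=${2:-"$HOME/.ssh"}

    [ -f "$pubkey_file" ] || { echo "no such key file: $pubkey_file" >&2; return 1; }

    # Assumption: sshd wants the directory and file private to the owner.
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
    auth="$ssh_dir/authorized_keys"
    touch "$auth" && chmod 600 "$auth" || return 1

    # Append only if this exact key line is not already present.
    if ! grep -qxFf "$pubkey_file" "$auth"; then
        cat "$pubkey_file" >> "$auth"
    fi
}
```

On the remote host, after the ssh-keygen step, this would be called as: install_pubkey ~/.ssh/id_dsa.pub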

It should work from now on, with no need to provide a password every time.
Adapted from here

Tuesday, 1 May 2012

Sphinx4 custom acoustic model files notes.

The whole process of creating a custom acoustic model is described here.
Read it thoroughly. If you are still not sure which files are required and where to get them, this note is for you.

Given that the structure is:

your_db.dic - Phonetic dictionary
your_db.phone - Phoneset file
your_db.lm.DMP - Language model
your_db.filler - List of fillers
your_db_train.fileids - List of files for training
your_db_train.transcription - Transcription for training
your_db_test.fileids - List of files for testing
your_db_test.transcription - Transcription for testing

speaker_1
    file_1.wav - Recording of speech utterance
speaker_2
    file_2.wav
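
Before starting training it can help to check that none of the eight files listed above is missing. The sketch below is my own addition: missing_db_files is a hypothetical name, and it assumes all eight files sit directly in one directory, which may not match the layout your training setup expects.

```shell
# Hypothetical helper: print every file from the list above that is
# missing for database NAME.
# Assumption: all eight files live directly in DIR.
# Usage: missing_db_files DIR NAME
missing_db_files() {
    dir=$1
    name=$2
    for suffix in .dic .phone .lm.DMP .filler \
                  _train.fileids _train.transcription \
                  _test.fileids _test.transcription; do
        [ -f "$dir/$name$suffix" ] || echo "$name$suffix"
    done
}
```

For example, missing_db_files . your_db prints nothing when everything is in place, and one file name per line otherwise.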

The following files can be built by the lmtool web service:

your_db.dic - Phonetic dictionary
your_db.phone - Phoneset file
your_db.filler - List of fillers

After you've got those files ready, you'll need the .DMP file:

your_db.lm.DMP - Language model

It is generated from the .lm file with the sphinx_lm_convert program, which ships with the sphinxbase-0.7 archive. See this section for sphinxbase installation instructions. The first command below generates the .DMP file; the second goes the other way and converts a .DMP file back to a text (ARPA) .lm file:

sphinx_lm_convert -i model.lm -o model.dmp

sphinx_lm_convert -i model.dmp -ifmt dmp -o model.lm -ofmt arpa
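
To get the your_db.lm.DMP name used in the file list without retyping it, the first conversion can be wrapped like this. It is a sketch under assumptions: lm_to_dmp and the SPHINX_LM_CONVERT override variable are names I made up, the input is assumed to be a text (ARPA) .lm file, and the formats are passed explicitly with -ifmt and -ofmt rather than relying on the tool guessing them from the file extension.

```shell
# Hypothetical wrapper: convert NAME.lm into NAME.lm.DMP next to it.
# Assumption: the input is an ARPA-format text language model.
# SPHINX_LM_CONVERT may be set to another command (e.g. "echo") for a dry run.
lm_to_dmp() {
    lm=$1
    case $lm in
        *.lm) ;;
        *) echo "expected a .lm file, got: $lm" >&2; return 1 ;;
    esac
    "${SPHINX_LM_CONVERT:-sphinx_lm_convert}" \
        -i "$lm" -ifmt arpa -o "$lm.DMP" -ofmt dmp
}
```

Running SPHINX_LM_CONVERT=echo lm_to_dmp your_db.lm only prints the arguments that would be passed, which is a cheap way to check the file names before the real run.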

After you've got that running, list all the audio files that you want to use for training, and their matching phrases, in the remaining files:

your_db_train.fileids - List of files for training
your_db_train.transcription - Transcription for training
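
The fileids list does not have to be typed by hand. The sketch below is my own addition and assumes the format I remember from the CMU Sphinx training tutorial: one entry per recording, written as the path relative to the audio folder with the .wav extension dropped (for example speaker_1/file_1). Check that against the tutorial before relying on it; list_fileids is a hypothetical name.

```shell
# Hypothetical helper: print one fileids entry per .wav file under WAV_DIR.
# Assumed entry format: path relative to WAV_DIR, without ".wav".
# Usage: list_fileids WAV_DIR > your_db_train.fileids
list_fileids() {
    wav_dir=$1
    ( cd "$wav_dir" && find . -type f -name '*.wav' ) |
        sed -e 's|^\./||' -e 's|\.wav$||' |
        sort
}
```

The transcription file still has to be written by hand, since only you know what was said in each recording.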

Ubuntu 11.10 install sphinxbase.

There is a good manual at http://cmusphinx.sourceforge.net/wiki/tutorialpocketsphinx
Anyway, after you've downloaded and unpacked the sphinxbase-0.7 archive:

1. ./autogen.sh (It won't start right away, but will print what software packages are missing)
2. sudo apt-get install autoconf
3. sudo apt-get install libtool
4. sudo apt-get install automake
5. sudo apt-get install bison
6. sudo ./autogen.sh
7. sudo make
8. sudo make install
9. export LD_LIBRARY_PATH=/usr/local/lib
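
Two small helpers can save a round trip through the steps above. Both are sketches of my own, with hypothetical names: missing_tools reports which build tools are not on PATH before autogen.sh complains about them, and prepend_lib_path sets LD_LIBRARY_PATH like the last step does, but keeps whatever value was already there instead of overwriting it.

```shell
# Hypothetical helper: print each given command that is not found on PATH.
# Usage: missing_tools autoconf automake bison
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Hypothetical helper: put DIR at the front of LD_LIBRARY_PATH,
# keeping any existing entries after it.
# Usage: prepend_lib_path /usr/local/lib
prepend_lib_path() {
    LD_LIBRARY_PATH=$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
    export LD_LIBRARY_PATH
}
```

Note that a package name and the command it installs are not always the same (the libtool package may provide libtoolize rather than libtool), so the names passed to missing_tools are an assumption to adjust for your system. The export only lasts for the current shell session; adding the call to a shell startup file is one way to keep it.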

Now try

sphinx_lm_convert

If you see the following message:

ERROR: "cmd_ln.c", line 675: No arguments given, available options are:
Arguments list definition:
[NAME] [DEFLT] [DESCR]
-case Ether 'lower' or 'upper' - case fold to lower/upper case (NOT UNICODE AWARE)
-debug Verbosity level for debugging messages
-help no Shows the usage of the tool
-i Input language model file (required)
-ienc Input language model text encoding (no conversion done if not specified)
-ifmt Input language model format (will guess if not specified)
-logbase 1.0001 Base in which all log-likelihoods calculated
-mmap no Use memory-mapped I/O for reading binary LM files
-o Output language model file (required)
-oenc utf8 Output language model text encoding
-ofmt Output language model file (will guess if not specified)

then everything works fine.
NOTE: this is a required step before you can start training your own acoustic models for CMU Sphinx 4.