cachetts

Currently there are two free navigation applications for the n900: mapero and navit. Both use text-to-speech synthesis for voice navigation. This works more or less well for English, but Russian speech synthesis is much more complicated. Russian is supported by espeak and festival (launching festival on the n900 is another story), but the speech quality of espeak is very poor: it takes too much attention to understand what it is saying, so it is not suitable for voice navigation. Festival with the voice developed at MSU sounds good, but needs about 130 MB of RAM during synthesis. The n900 has 256 MB on board, so festival is very slow and takes about 30 seconds to generate a phrase, which is not suitable either.
Commercial navigators successfully use prerecorded audio files. This works because a navigator uses only a limited number of phrases, but such navigators can't pronounce street names as navit does. This is how the idea of a caching text-to-speech proxy came about. It is now called cachetts and is available in extras-devel.
cachetts is a daemon which listens on DBus. When a message arrives, it is divided into phrases. If there is a prerecorded audio file for a phrase, it is played. Otherwise festival is launched and the phrase is generated into an audio file. If festival is still processing the phrase at the moment when it should be pronounced, the phrase is synthesized via espeak instead (poor quality for Russian, but pronounced immediately). Next time cachetts finds the file generated by festival and plays it. Of course this behavior is configurable.
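The lookup order described above can be sketched roughly as follows. This is only an illustration, not the actual cachetts code: the function names, the dictionary of voice files and the md5-based cache file naming are all assumptions made for the example.

```python
import hashlib
import os


def cache_name(phrase):
    # Hypothetical naming scheme: cachetts may name its cache files differently.
    return hashlib.md5(phrase.encode("utf-8")).hexdigest() + ".wav"


def choose_source(phrase, voice_files, cache_dir, festival_in_time):
    """Decide how a single phrase would be spoken.

    voice_files      -- dict mapping a phrase to a prerecorded file (assumed shape)
    festival_in_time -- True if festival finishes before the phrase is due
    """
    # 1. A prerecorded file from the voice folder is used first.
    if phrase in voice_files:
        return ("prerecorded", voice_files[phrase])
    # 2. Otherwise reuse a file that festival generated on an earlier request.
    cached = os.path.join(cache_dir, cache_name(phrase))
    if os.path.exists(cached):
        return ("cache", cached)
    # 3. Nothing on disk yet: festival generates the file; if it is too slow,
    #    espeak speaks the phrase right away and the file is used next time.
    if festival_in_time:
        return ("festival", cached)
    return ("espeak", phrase)
```

With this ordering a street name is spoken by espeak at most once; every later request for the same phrase hits the cache.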
As a bonus, the input string is passed through a replacement procedure, which allows fixing mistakes in phrases generated by navigators. For example, festival fails to say "2.2" in Russian, so in all decimal numbers "." is replaced with "point" (actually with "и" in the default configuration).
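A minimal sketch of such a replacement step, assuming each rule is a (pattern, replacement) pair passed to re.sub. The rule shown here is written for illustration only; the exact pattern and spacing in the default configuration may differ.

```python
import re

# Hypothetical rule list: each entry holds the first two arguments of re.sub.
RULES = [
    # Replace the decimal point between two digits with the Russian "и".
    (r"(\d)\.(\d)", r"\1 и \2"),
]


def apply_replacements(text, rules=RULES):
    """Run every rule over the text, in order, and return the result."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text


print(apply_replacements("Через 2.2 километра поверните налево"))
```

A full stop at the end of a sentence is left alone, because the pattern requires a digit on both sides of the dot.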

Usage

First of all you need to configure cachetts. The settings are located in /home/user/.cachetts.py:
[cachetts]
# Mode for festival
textmode=fundamental
# Folder where phrases generated by festival are placed
cachedir=/home/user/MyDocs/cachetts/cache/
# Folder with voice files
prefix=/home/user/MyDocs/cachetts/voice/
# Command executed in festival. I use it to enable the Russian voice
festival_init_cmd=(voice_msu_ru_nsh_clunits)
# Command to be executed to launch espeak
espeakcmd=espeak -vru
# File with the list of phrases which have corresponding files in the voice folder
templates=/home/user/MyDocs/cachetts/templates.csv
# File with the list of replacements (regular expressions are used; the file actually contains the first two arguments for re.sub)
replaces=/home/user/MyDocs/cachetts/replaces.csv
# Main sound generator
generator=festival_cache
# Backup sound generator
backup_generator=espeak
Currently festival, festival_cache and espeak generators are allowed.
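The settings file uses the usual INI layout with a single [cachetts] section, so it can be read with a standard INI parser. The sketch below uses Python 3's configparser on a shortened copy of the file. It is not how cachetts itself loads its settings, and the helper names load_settings and check_generators are made up for the example.

```python
import configparser

# Shortened copy of the settings shown above.
SAMPLE = """\
[cachetts]
# Mode for festival
textmode=fundamental
cachedir=/home/user/MyDocs/cachetts/cache/
espeakcmd=espeak -vru
generator=festival_cache
backup_generator=espeak
"""

ALLOWED_GENERATORS = {"festival", "festival_cache", "espeak"}


def load_settings(text):
    """Parse the settings text and return the [cachetts] section as a dict."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return dict(parser["cachetts"])


def check_generators(settings):
    """Raise ValueError if a generator name is not one of the allowed ones."""
    for key in ("generator", "backup_generator"):
        if settings[key] not in ALLOWED_GENERATORS:
            raise ValueError("unknown %s: %s" % (key, settings[key]))


settings = load_settings(SAMPLE)
check_generators(settings)
print(settings["generator"], settings["backup_generator"])
```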
Now you need to launch the application. You can do this by running cachetts.py as user in a console (you will get some debug output, but the application terminates when you close the console), or by running /etc/init.d/cachetts start as root, which starts the application in daemon mode. In that case you can stop it with /etc/init.d/cachetts stop. The application uses one tty device to communicate with festival, so if it fails with an insufficient tty error you will have to close one of the open terminals.
All communication is performed through DBus:
  • Say a string
    dbus-send --system --type=signal /su/kibergus/cachetts su.kibergus.cachetts.say_string string:'String you want to say' boolean:false string:navit boolean:true
     
    The arguments have the following meaning:
    say_string(string, urgent, tag, kickoff)
     
    If the urgent flag is set, the message is pronounced immediately, without waiting in the queue. If the kickoff flag is set, all previous messages with the same tag and with the kickoff flag set are removed from the queue. This is for the case where the navigator generates messages too often.
  • Say a string using the specified generator, with or without the vocabulary (defined in templates.csv)
    dbus-send --system --type=signal /su/kibergus/cachetts su.kibergus.cachetts.say_string_custom string:'String, which should be said' boolean:false string:navit boolean:true boolean:true string:espeak
     
    say_string_custom(string, urgent, tag, kickoff, caching, generator)
     
  • Set default generator
    set_generator (type, generator)
     
    Where type can be main or backup
  • set_caching (caching) Enables and disables vocabulary usage.
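The effect of the urgent and kickoff flags on the message queue can be illustrated with a small model. This is a sketch of the behavior described above, not the cachetts implementation: the SpeechQueue class is hypothetical, and whether an urgent message also interrupts a phrase that is already playing is not modelled here.

```python
from collections import deque


class SpeechQueue:
    """Toy model of the message queue with urgent and kickoff handling."""

    def __init__(self):
        self._items = deque()

    def push(self, text, urgent=False, tag="", kickoff=False):
        if kickoff:
            # Drop earlier messages with the same tag that also had kickoff set.
            self._items = deque(
                m for m in self._items
                if not (m["tag"] == tag and m["kickoff"])
            )
        message = {"text": text, "tag": tag, "kickoff": kickoff}
        if urgent:
            # Urgent messages skip the queue.
            self._items.appendleft(message)
        else:
            self._items.append(message)

    def pending(self):
        return [m["text"] for m in self._items]


queue = SpeechQueue()
queue.push("In 300 meters turn left", tag="navit", kickoff=True)
queue.push("In 200 meters turn left", tag="navit", kickoff=True)
queue.push("Battery low", tag="system")
queue.push("Speed camera ahead", urgent=True, tag="alert")
print(queue.pending())
```

The outdated "300 meters" announcement is dropped when the "200 meters" one arrives, so a navigator that emits messages faster than they can be spoken does not build up a backlog.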
Now you only have to tell the navigator to use this application. For navit the appropriate line in navit.xml would be:
<speech type="cmdline" data="dbus-send --system --type=signal /su/kibergus/cachetts su.kibergus.cachetts.say_string string:'%s' boolean:false string:navit boolean:true" cps="6"/>
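To test the daemon without a navigator, the same dbus-send call can be assembled from a script. The helper below is a hypothetical convenience wrapper, not part of cachetts. It only builds the argument list for the say_string signal shown above and passes it to dbus-send without going through a shell.

```python
import subprocess


def say_string_argv(text, urgent=False, tag="navit", kickoff=True):
    """Build the dbus-send argument list for the say_string signal."""

    def flag(value):
        return "boolean:true" if value else "boolean:false"

    return [
        "dbus-send", "--system", "--type=signal",
        "/su/kibergus/cachetts",
        "su.kibergus.cachetts.say_string",
        "string:" + text,
        flag(urgent),
        "string:" + tag,
        flag(kickoff),
    ]


def say(text, **kwargs):
    # Requires dbus-send and a running cachetts daemon.
    return subprocess.call(say_string_argv(text, **kwargs))
```

Because the arguments are passed as a list, a phrase that contains an apostrophe does not need any extra shell quoting.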

What could be done in future

Currently all sounds are stored as wav. cachetts can play any format supported by gstreamer, so you can convert them to ogg (as far as I know, using mp3 would increase the gaps between words). However, I couldn't find a way to convert them on the n900 without additional packages, so I decided not to add conversion after festival generates them.
And of course new voices can be created. When I started writing cachetts I wanted my girlfriend's navigator to speak with my voice, though I haven't recorded it yet. I hope that somebody will create voices which are better than the synthesized ones.

Comments

It would be much simpler to

It would be much simpler to just focus on navigation and simply pre-record all pieces of text that can come up during a full text navigation session (including errors and redirects). Compress that down with low-quality mono vorbis and it will take less space than festival takes RAM and will play back much faster.

This is what cachetts can do.

This is what cachetts can do. You can prerecord the needed files and put them into the voice folder. They are used first. Festival is only invoked for new pieces, for example street names. Pronouncing numbers is not so easy either, so they are rendered by festival.
And you can encode these files to ogg too. If "extra decoders support" is installed, they will work. In my experience all missing phrases get cached within a short period of time and festival is invoked quite seldom.