Posts

Featured

[GSoC 2017 with CMUSphinx] Post 12#: Final Report: A Summary

For the past three months I have been working with CMUSphinx on the "Collect Pronunciation Dictionaries from Wiktionary" project, as part of Google Summer of Code 2017. The project aims to expand pronunciation dictionaries to cover new words and multiple languages, and to use them in CMUSphinx, for example to train acoustic models. My work falls into two main parts: in the first two months, I developed a standalone toolkit that parses pronunciations from a Wiktionary dump and converts them into IPA and X-SAMPA format; in the third month, I ran benchmarks using the collected dictionaries. Here's the status of the project goals: Developed a toolkit, wikt2pron, which can be used as a black box to collect pronunciations in IPA and X-SAMPA format. It is well tested and makes it easy to expand dictionaries by parsing pronunciations from a Wiktionary dump. The toolkit has detailed documentation covering usage and the API. Ten phonetic dictionaries have been collected from Wiktionary using the toolkit. One proble

Latest Posts

[GSoC 2017 with CMUSphinx] Post 11#: Training Acoustic Model on LibriSpeech

[GSoC 2017 with CMUSphinx] Post 9-10#: Training Acoustic Model on Voxforge Dataset

[GSoC 2017 with CMUSphinx] Post 8#: Grapheme to Phoneme Conversion

[GSoC 2017 with CMUSphinx] Post 7#: Pronunciation Dictionary Collection

[GSoC 2017 with CMUSphinx] Post 6#: IPA Template Generation

[GSoC 2017 with CMUSphinx] Post 5#: Documentation

[GSoC 2017 with CMUSphinx] Post 4#: First Evaluation Report - A Beta Version of pywiktionary

[GSoC 2017 with CMUSphinx] Post 3#: IPA Extraction and X-SAMPA Conversion

[GSoC 2017 with CMUSphinx] Post 2#: Survey of Phoneme Sets