[GSoC 2017 with CMUSphinx] Post 12#: Final Report: A Summary
I have been working on the Collect Pronunciation Dictionaries from Wiktionary project with CMUSphinx for the last three months, as part of Google Summer of Code 2017. The project aims to expand pronunciation dictionaries with new words and more languages, and to use them in CMUSphinx, for example to train acoustic models.

My work falls into two main parts. In the first two months, I developed a standalone toolkit that parses pronunciations from a Wiktionary dump and converts them into IPA and X-SAMPA formats. In the third month, I ran some benchmarks using the collected dictionaries.

Here's the status of the project goals:

Developed a toolkit, wikt2pron, which can be used as a black box to collect pronunciations in IPA and X-SAMPA formats. It is well tested and makes it easy to expand dictionaries by parsing pronunciations from a Wiktionary dump. The toolkit has detailed documentation covering usage and the API.

Ten phonetic dictionaries have been collected from Wiktionary using the toolkit. One proble