Seminars & Events
Mathematics and Computer Science Division
"Contextual Modeling in Continuous Speech Tone Recognition"
DATE: February 18, 2011
TIME: 10:30 AM - 11:30 AM
SPEAKER: Siwei Wang, LANS Postdoc Interviewee
LOCATION: Building 240 Conference Center Rooms 1404 and 1405, Argonne National Laboratory
HOST: Sven Leyffer
Description:
Tone languages like Mandarin Chinese use tones (specific pitch patterns) to distinguish syllables that would otherwise be ambiguous. Tones in Mandarin Chinese have been shown to carry as much information as vowels [1]. However, several contextual factors in continuous speech make successful tone recognition challenging. First, coarticulation between adjacent tones can compromise the realization of underlying tone targets. Second, phrase, sentence, and topic boundaries also affect pitch; pitch variation has been successfully employed to perform sentence and story segmentation. Third, speaker differences, especially gender differences, make it necessary to scale tone targets to compensate for individual variation.
I present two approaches to modeling contextual effects on tone production in continuous speech. These approaches achieve state-of-the-art tone recognition performance on Mandarin Chinese Broadcast News data. The first approach focuses on modeling the coarticulation between adjacent tones. It manipulates a landmark-based vowel detection system to locate the most reliable tone production regions and to remove those regions affected by coarticulation. This approach shows a 15% improvement over two previously published tone recognition frameworks. The second approach employs sequential graphical models with Conditional Random Fields (CRF) to encode tone variation due to phrase-, sentence-, and topic-level intonation. We found that not only do different tones vary under each of these intonation conditions, but the choice of graphical model structure can also impact performance. These techniques can also be extended to other machine learning challenges, e.g., modeling human social processes in dyadic conversations across three cultures (American English, Mexican Spanish, and Iraqi Arabic). Finally, I will briefly talk about how to apply structural learning to network behavior and attack pattern analysis [2,3].
[1] Dinoj Surendran, Gina-Anne Levow, and Yi Xu. Tone recognition in Mandarin using focus. Proceedings of Interspeech/ICSLP, 2005.
[2] Yu Gu, Andrew McCallum, and Don Towsley. Detecting anomalies in network traffic using maximum entropy estimation. Internet Measurement Conference, 2005.
[3] Kapil Kumar Gupta, Baikunth Nath, and Ramamohanarao Kotagiri. Layered approach using conditional random fields for intrusion detection. IEEE Transactions on Dependable and Secure Computing, Vol. 7, No. 1, Jan-Mar 2010.
