Seminar Details:

LANS Informal Seminar
"Contextual Modeling in Continuous Speech Tone Recognition"

DATE: January 20, 2011
TIME: 10:30:00 - 11:30:00
SPEAKER: Siwei Wang, Computer Science Department, University of Chicago
LOCATION: Bldg 240, Room 4301, Argonne National Laboratory

Description:
Tone languages such as Mandarin Chinese use tones (specific pitch patterns) to distinguish syllables that would otherwise be ambiguous. Tones in Mandarin Chinese have been shown to carry as much information as vowels [1]. However, several contextual factors in continuous speech make accurate tone recognition challenging. First, coarticulation between adjacent tones can compromise the realization of underlying tone targets. Second, phrase, sentence, and topic boundaries also affect pitch; indeed, pitch variation has been successfully used for sentence and story segmentation. Third, speaker differences, especially gender differences, make it necessary to scale tone targets to compensate for individual variation.

I present two approaches to modeling contextual effects on tone production in continuous speech. These approaches achieve state-of-the-art tone recognition performance on Mandarin Chinese Broadcast News data. The first approach focuses on modeling the coarticulation between adjacent tones. It manipulates a landmark-based vowel detection system to locate the most reliable tone production regions and to remove the regions affected by coarticulation. This approach shows a 15% improvement over two previously published tone recognition frameworks. The second approach employs sequential graphical models, specifically Conditional Random Fields (CRFs), to encode tone variation due to phrase-, sentence-, and topic-level intonation. We found not only that different tones vary under each of these intonational conditions, but also that the choice of graphical model structure can affect performance. Finally, I will briefly discuss possible applications of these approaches to general machine learning challenges, especially structural learning on network behavior and attack pattern analysis [2,3].

[1] Dinoj Surendran, Gina-Anne Levow, and Yi Xu. Tone recognition in Mandarin using focus. Proceedings of Interspeech/ICSLP 2005, 2005.
[2] Yu Gu, Andrew McCallum, and Don Towsley. Detecting anomalies in network traffic using maximum entropy estimation. Internet Measurement Conference, 2005.
[3] Kapil Kumar Gupta, Baikunth Nath, and Ramamohanarao Kotagiri. Layered approach using conditional random fields for intrusion detection. IEEE Trans. Dependable and Secure Computing, Vol. 7, No. 1, Jan-Mar 2010.

Please send questions or suggestions to Debojyoti Ghosh: ghosh at mcs dot anl dot gov.