Keynote Speakers

 

Modeling prosody variations for communicative speech and the second language towards trans-disciplinary scientific understanding
Yoshinori Sagisaka, Waseda University
Prosody as trans-disciplinary studies
 


Abstract

In this talk, I would like to introduce research activities on prosody variations needed for the modeling of communicative prosody and for second language (L2) studies. For communicative prosody studies, the possibility of lexicon-driven prosody control and the further need for dialogue-act modeling will be discussed. For L2 studies, the possibility of a scientific understanding of timing control characteristics and the need for perceptual studies will be demonstrated. Through the introduction of these research activities, I would like to show the necessity and the merits of trans-disciplinary research collaboration among multiple research areas related to speech science and technology, including linguistics, phonetics, information processing and language education. Finally, I would like to introduce the efforts of the international research consortium AESOP (Asian English Speech cOrpus Project) to collect commonly shareable learners' spoken language data and knowledge of L2 studies for trans-disciplinary scientific understanding.


Biography

Yoshinori Sagisaka, Professor, Global Information and Telecommunication Institute (GITI), Applied Mathematics Department and Language and Speech Science Research Labs., Waseda University. Yoshinori Sagisaka has been a professor at GITI, Waseda University, since 2001. He has been working in the field of speech and language science and engineering for more than thirty years. During this period, he has worked at Electrical Communication Research Laboratories (1975-1986), ATR (1986-2007), Edinburgh University CSTR (1988), AT&T Bell Labs. (1993), Kobe University (1997-2001), NICT (2007-) and Waseda University (2000-). His research interests cover speech synthesis, prosody modeling, speech recognition, speech perception and language processing. He has been engaged in many international research activities, including IEEE Signal Processing Society Committee Member (1990-1994), Speech Communication Journal Editorial Board (1993-2009), France Telecom CNET (Centre National d’Études des Télécommunications) Conseiller Scientifique (1993), KTH (Royal Institute of Technology, Stockholm) CTT (Centre for Speech Technology) International Advisory Committee Member (1993-), Computer Speech and Language Journal Editorial Board (1994-2009), Natural Language Engineering Journal Editorial Board (1994-2004), Member of the Permanent Council of the International Conference on Spoken Language Processing (1998-), Speech Communication Journal Chief Co-Editor (2001-2004), International Speech Communication Association Board (2007-2011) and International Congress of Phonetic Sciences Board (2007-), and he has served on the international scientific committees of many speech-related conferences and workshops. In 2008, he initiated the AESOP (Asian English Speech cOrpus Project) research consortium to promote second language research and collaboration among Asian countries.

 
Neural specializations for pitch in tonal languages
 Jackson Gandour, Purdue University
 Prosody in the Brain
 


Abstract

Tonal languages provide a window for tracing the hierarchical transformation of the pitch of a sound from early sensory to later cognitive stages of processing in the human brain. Hemispheric laterality of pitch is driven by multiple dichotomies or scalar features that apply during real-time intervals at cortical and subcortical levels. Using functional neuroimaging (PET/fMRI), we show that pitch processing recruits the hemispheres differentially as a function of its phonological relevance to the listener. Mismatch negativity, a neural index of early cortical processing, shows that pitch processing is shaped by the relative saliency of tonal features. Frequency following response, a neural index of brainstem pitch encoding, shows that enhancement of pitch features is sensitive to rapidly changing segments of tonal contours, and that ear asymmetries can be modulated by functional changes in pitch based on linguistic status. We conclude that nascent representations of acoustic-phonetic features emerge early along the auditory pathway.


Biography

Jackson Gandour developed an interest in tone languages during his service as a Peace Corps Volunteer in Thailand (1964-66). After receiving his MA degree in linguistics from the University of Pittsburgh (1968), he taught linguistics as a Visiting Fulbright Lecturer at Niigata University in Japan (1968-69). He later returned to Thailand (1975) to carry out multidimensional scaling research on tone perception for his doctoral degree in linguistics at UCLA (1976), and continued this research as a postdoctoral fellow at Bell Labs (1976-77). He has been a faculty member at Purdue University for 34 years, where his research initially focused on the perception of tone in adult speakers of tone languages with communication disorders. He carried out field research in Thailand on tone production and perception in aphasic, laryngectomized, and hearing-impaired populations as a Senior Fulbright Research Scholar at Mahidol University (1988-89). More recently, he has used functional neuroimaging (PET, fMRI) and auditory electrophysiology (MMN, FFR) to study speech prosody in healthy adult speakers of Mandarin and Thai (1997-present). His current research focuses on auditory electrophysiological studies of brainstem pitch encoding in tone languages. He has served on the editorial boards of Brain and Language (1993-2011) and Aphasiology (2002-2011).

 
 Analysis-by-Synthesis in Prosody Research
 Rüdiger Hoffmann, TU Dresden
 Prosody in Speech Processing
 


Abstract

It was recognized early in the history of speech technology that prosody plays an essential role in the communication process and that it is therefore necessary to include prosodic components in speech-based systems for human-computer interaction. Recent text-to-speech (TTS) systems include prosodic components at an elementary level (intonation and duration) for good comprehensibility, but it is also obvious that these components are not powerful enough to produce speech with high naturalness and personality. On the other hand, automatic speech recognition (ASR) systems consider prosody more or less implicitly, and there are only a few examples where prosodic features are explicitly used to improve recognition results. This talk is an attempt to give a more general view of the inclusion of prosody in speech technology. During the last decade, reconsidering the paradigm of analysis-by-synthesis (AbS) in speech technology has produced some algorithmic progress in both TTS and ASR. The UASR system (Unified Approach for Speech Synthesis and Recognition) of TU Dresden was designed to demonstrate the AbS approach in a hierarchical way. It is now time to discuss how prosodic components could be included in such systems. The inclusion of rhythmic phenomena seems to be the most difficult but also a very promising subtask. Speech processing may benefit from musical signal processing, where the identification of rhythm is a very natural task.


Biography

Rüdiger Hoffmann studied Radio Engineering at Technische Universität Dresden and worked in the electronics industry from 1971 to 1982. He completed his Dr.-Ing. (1978) and habilitation (1985) theses in the field of system theory. In 1982, he started his research work in speech communication at Technische Universität Dresden. In 1992, he was appointed to the Chair for Speech Communication. Ten years later, this chair was redefined as the Chair for System Theory and Speech Technology. His research interests are in the field of signals and systems and their application to acoustic (especially speech) signals. The main research projects of his team included components of the German VERBMOBIL project, the Dresden Speech Synthesis System (DRESS), and the Unified Approach for Speech Synthesis and Recognition (UASR). He has directed numerous application projects with different companies. He has written two textbooks on signal processing and is the series editor of the “Studientexte zur Sprachkommunikation” (60 volumes). He is the author of numerous scientific contributions and has supervised about 25 Dr.-Ing. and habilitation theses. He is also active in projects dealing with the history of experimental phonetics and speech technology. He was the General Chair of the 3rd International Conference on Speech Prosody (Dresden 2006). In 2010, he was appointed Guest Professor at the School of Foreign Languages, Tongji University, Shanghai.

 
 Hierarchical Prosody Modeling and Generation
 Jianhua Tao, Chinese Academy of Sciences
 Prosody Modeling
 


Abstract

Prosody is a suprasegmental feature of speech and represents various characteristics of speakers or utterances. It is widely accepted that speech prosody can be modeled hierarchically; however, it is still an open question how the hierarchical model is influenced by acoustic and contextual features. Which features are important for labeling or predicting the hierarchical prosodic structure? Among them, pitch accent and phrasing play important roles. In this talk, we will give a broad overview of recent research on hierarchical prosody models, focusing mainly on the features of phrase and pitch accent from acoustic and perceptual perspectives at different hierarchical levels. The influence of syntactic structure will also be discussed. Finally, we will introduce a prediction model for the hierarchical prosodic structure and show how it can be used in text-to-speech systems to produce more expressive synthetic speech.


Biography

Jianhua Tao received the M.S. degree from Nanjing University, Nanjing, China, in 1996 and the Ph.D. degree from Tsinghua University, Beijing, China, in 2001. He is currently a Professor with the National Laboratory of Pattern Recognition, Chinese Academy of Sciences. His research interests include speech synthesis and recognition and emotional information processing. He developed several of the earliest speech systems in China, and has published more than 100 papers in journals and proceedings, e.g., IEEE Trans. on ASLP, ICASSP, Interspeech, ICME, ISCSLP, Speech Prosody, ICPR, etc. Prof. Tao has received several awards from important conferences, including Eurospeech 2001. He served as Vice-Chairperson of the ISCA Special Interest Group of Chinese Spoken Language Processing (SIG-CSLP) (2006-2010). He is currently an executive committee member of the HUMAINE Association, a steering committee member of IEEE Trans. on Affective Computing, and an associate editor of IJSE, JMI and IJCLCLP.

 
 Bidirectional Tone Sandhi in Tianjin Dialect: Problem and Analysis
 Qiuwu Ma, Tongji University
 Tonal Aspects of Prosody
 


Abstract

The paradoxical problem of Tianjin tone sandhi, first recognized by Chen, has long been a challenge to current phonological theories. The present paper explains the reasons underlying it and then discusses the problems that still exist when some current theories or models are applied to Tianjin tone sandhi.


Biography

Professor Qiuwu Ma graduated from Tianjin Normal University with an MA degree in English Language and Literature in June 1988 and received his Ph.D. degree in Linguistics and Language Teaching from Beijing Normal University in June 2001. His research has concentrated on phonological theories and their application to Chinese languages. His interests also extend to other fields of linguistics, including syntax, pragmatics, sociolinguistics, first and second language learning and acquisition, and cognitive linguistics. He has published over 50 academic papers and a dozen books on language learning. He is currently working on the project "Optimality Theory and Chinese Tone Sandhi".

 

Copyright©2010 sp2012. All rights reserved.
