Prosody and Emotion
Keynote lecture: Prosody and Emotions
by Sylvie Mozziconacci (Q-go and Leiden University, NL)
The investigation of emotional speech constitutes a highly multidisciplinary
research field. In order to benefit from the mutual enrichment that might
result from this interdisciplinarity, it seems important to clarify what the
specific contribution of each field can be. In this paper, an attempt is made
to shed light on the contribution that the tradition of prosody research can
make to the study of emotional expression in speech. Methodological issues
that might make studies of expressive speech more fruitful are discussed.
It is argued that the way intonation conveys emotion and attitude in speech is
best studied when pitch variation is represented within the theoretical
framework of a model of intonation. Using such a model structures and reduces
the data, formalizes the description, and facilitates both the control of
parameters and the generalization of results. Such models have proven useful in
the field of prosody, regardless of the type of model used. Moreover, this
context constitutes an opportunity to test a model's adequacy for dealing with
extreme variations, such as those occurring in emotional speech.
Another point is that complementary studies of production and perception are
felt to be a necessary prerequisite for establishing the communicative
significance of the investigated speech parameters. The need for a reference
baseline in each experiment is also emphasized. Furthermore, the paper discusses
the need to distinguish between the description of the global shape of the F0
curve, i.e., the type of pitch contour, and the description of how this contour
is concretely implemented in terms of pitch events. Both the abstract
phonological information on the type of contour and the concrete phonetic
information on the pitch implementation of the F0 curve are pertinent.
The investigation of type of pitch contour and detailed pitch implementation
seems particularly promising when the experimental design is such as to allow
the independent study of each effect, as well as their combined effect. This may
lead us to make use of an orthogonal experimental design in the investigation of
prosodic parameters. Finally, by considering how well existing assumptions about
intonation account for its expressive functions, some light can be shed on the
functionality of intonation for conveying meaningful information in speech
communication.
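The orthogonal design suggested above, in which contour type and pitch implementation are varied independently, can be illustrated with a minimal full-factorial sketch in Python. The factor names and levels (`contour_1`, `pitch_level`, `pitch_range`, and so on) are hypothetical placeholders, not the stimuli used in the paper; the point is only that every level of each factor is crossed with every level of the others, so that each effect and their combination can be examined separately.

```python
from itertools import product

# Hypothetical factor levels: the abstract names the two factors (type of
# pitch contour and its concrete pitch implementation) but not their levels.
CONTOUR_TYPES = ["contour_1", "contour_2", "contour_3"]
PITCH_IMPLEMENTATIONS = {
    "pitch_level": ["low", "high"],
    "pitch_range": ["narrow", "wide"],
}


def full_factorial(contours, implementations):
    """Cross every contour type with every combination of implementation levels.

    Each level of each factor occurs equally often with every level of every
    other factor, so the design is balanced (orthogonal): the effect of contour
    type, the effect of each implementation factor, and their interactions can
    be estimated independently.
    """
    names = list(implementations)
    conditions = []
    for contour in contours:
        for levels in product(*(implementations[n] for n in names)):
            condition = {"contour": contour}
            condition.update(zip(names, levels))
            conditions.append(condition)
    return conditions


if __name__ == "__main__":
    design = full_factorial(CONTOUR_TYPES, PITCH_IMPLEMENTATIONS)
    print(len(design))  # 3 contours x 2 levels x 2 ranges = 12 conditions
    for cond in design:
        print(cond)
```

The cost of a full crossing grows multiplicatively with the number of levels, which is why, in practice, the number of levels per factor is usually kept small.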
Keynote lecture: ToBI Or Not ToBI?
by Colin Wightman (Minnesota State University, US)
In the decade that has passed since the introduction
of the ToBI system for the transcription of prosody, speech technology has moved
out of the laboratory and into commercial applications on several fronts.
However, virtually none of the commercial products have made large-scale use of
prosody. Nevertheless, researchers in both recognition and synthesis continue to
agree that better utilization of prosody is essential to improving the
performance and acceptability of commercial systems. In this paper, we review
the current state of prosody in commercial systems, and examine how the ongoing
discussions related to what and how to transcribe with respect to prosody have
simultaneously advanced and inhibited the field. In particular, we argue that,
in hindsight, the ToBI system contains several flaws that have limited its
acceptance and application.
Prosody and Syntactic, Semantic, Pragmatic Interpretation
Keynote lecture: Intonation and Interpretation: Phonetics and Phonology
by Carlos Gussenhoven (University of Nijmegen, NL)
Intonational
meaning is located in two components of language, the phonetic implementation
and the intonational grammar. The phonetic implementation is widely used for the
expression of universal meanings that derive from 'biological codes', meaning
dimensions based on aspects of the production process of pitch variation. Three
codes are identified: Ohala's Frequency Code, the Effort Code and the Production
Code. In each case, 'informational' meanings (which relate to the message) are
identified, while for the first two codes also 'affective' meanings (relating to
the state of the speaker) are discussed. Speech communities will vary in the
extent to which they employ those meanings, and in the choices they make when
they conflict. What they will never do, however, is change the natural
form-function relations that they embody. By contrast, grammaticalised meanings
often mimic the natural meanings, but linguistic change may create quite
arbitrary form-meaning relations when forms are phonologised and the semantics
is systematised. In English, grammaticalised intonational meaning concerns
information status.
Keynote lecture: Cerebral Strategies in the Segmentation and Interpretation of Speech
by Kai Alter (Max-Planck-Institute of Cognitive Neuroscience, Leipzig, DE)
The segmentation of the acoustic speech signal is fundamental to the processing
of spoken language. This paper provides a
survey of studies conducted in our lab concerning the detection of segmentation
cues in the speech signal and associated perception of prosodic boundaries. The
first two studies presented here employ the methodology of Event-Related
Potentials (ERP) to study online electrophysiological responses to acoustic
stimuli varying in syntactic and prosodic constituency, as well as in segmental
content. The first study identified an ERP shift, termed the Closure Positive
Shift (CPS), that correlates with the perception of major intonational
boundaries. The second study was especially concerned with listeners' ability
to segment speech when only prosodic cues are present.
A third experiment reviewed here employs functional Magnetic Resonance Imaging
(fMRI), an investigation method based on hemodynamic brain responses. ERP and
fMRI are complementary methodologies: while ERPs provide an accurate measure of
temporal aspects of processing, fMRI methodology is particularly well suited to
localize such processes in the brain.
Keynote lecture: Articulatory Constraints and Tonal Alignment
by Yi Xu (Northwestern University, Illinois, US)
There has been accumulating evidence in recent years
that certain F0 events are consistently aligned with segmental events such as
the syllable boundary. The mechanisms for the observed alignment patterns,
however, are still being closely investigated. In this paper I argue that to
understand the observed tonal alignment patterns, it is imperative to first
understand the role of articulatory constraints in shaping the F0 contours in
speech. In particular, the maximum speed of pitch change limits how fast F0
movements can be produced; and the coordination of laryngeal and supralaryngeal
movements limits how syllables and tones can be aligned to each other. From a
different perspective, these constraints mean that speakers probably have fewer
degrees of freedom than previously thought. This may actually make the speech
signal somewhat easier to understand than before. I will
demonstrate this with a theoretical model of F0 production that is based on the
new understanding of the articulatory constraints. Though conceptually simple,
the model seems to be able to account for a number of phonetic patterns that
have been observed in speech. Finally, I will briefly discuss the implications
of these new insights for our understanding of tonal perception in speech.
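The first constraint, a ceiling on the speed of pitch change, implies a minimum duration for any F0 movement. The sketch below is a back-of-the-envelope illustration of that consequence, not the F0 production model presented in the paper: it converts an F0 interval to semitones and divides by a maximum speed. The value of `max_speed_st_per_s` is a hypothetical parameter supplied by the caller, and treating it as constant regardless of excursion size is a simplifying assumption.

```python
import math


def semitones(f0_start_hz: float, f0_end_hz: float) -> float:
    """Size of an F0 interval in semitones (12 semitones = one octave)."""
    return 12.0 * math.log2(f0_end_hz / f0_start_hz)


def min_movement_time(f0_start_hz: float, f0_end_hz: float,
                      max_speed_st_per_s: float) -> float:
    """Shortest time (s) in which the F0 movement can be completed if pitch
    cannot change faster than max_speed_st_per_s.

    Assumption for illustration: the maximum speed is a single constant,
    independent of the size and direction of the excursion.
    """
    if max_speed_st_per_s <= 0:
        raise ValueError("maximum speed must be positive")
    return abs(semitones(f0_start_hz, f0_end_hz)) / max_speed_st_per_s


def fits_in_syllable(f0_start_hz: float, f0_end_hz: float,
                     syllable_dur_s: float, max_speed_st_per_s: float) -> bool:
    """True if the movement can be fully realised within the syllable."""
    needed = min_movement_time(f0_start_hz, f0_end_hz, max_speed_st_per_s)
    return needed <= syllable_dur_s


if __name__ == "__main__":
    # Illustrative values only: a one-octave rise (100 -> 200 Hz) and an
    # arbitrary hypothetical ceiling of 60 semitones per second.
    print(min_movement_time(100.0, 200.0, 60.0))
    print(fits_in_syllable(100.0, 200.0, 0.15, 60.0))
```

Under these assumed numbers the octave rise needs at least 0.2 s, so it cannot be completed within a 150 ms syllable; the movement would either be truncated or have to spill into the next syllable, which is the kind of alignment consequence the abstract points to.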
Keynote lecture: Acoustic Correlates of Linguistic Rhythm: Perspectives
by Franck Ramus (Maison des Sciences de l'Homme, Paris, FR)
The empirical grounding
of a typology of languages' rhythm is again a hot issue. The currently popular
approach is based on the durations of vocalic and intervocalic intervals and
their variability. Despite some successes, many questions remain. The main
findings still need to be generalised to much larger corpora including many more
languages. But a straightforward continuation of the current work faces many
difficulties. Perspectives are outlined for future work, including proposals for
the cross-linguistic control of speech rate, improvements to the statistical
analyses, and prospects raised by automatic speech processing.
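The interval-based approach described above is commonly operationalised with the measures %V, ΔV and ΔC (Ramus, Nespor and Mehler 1999): the percentage of utterance duration that is vocalic, and the standard deviations of vocalic and intervocalic (consonantal) interval durations. The sketch below computes them from an already segmented utterance; the `(kind, duration)` input format and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import pstdev


def rhythm_metrics(intervals):
    """Compute %V, deltaV and deltaC for one segmented utterance.

    intervals: sequence of (kind, duration_s) pairs, where kind is "V" for a
    vocalic interval and "C" for an intervocalic (consonantal) interval.
    Assumption: population standard deviation (pstdev) is used for the spread.
    """
    vocalic = [d for kind, d in intervals if kind == "V"]
    consonantal = [d for kind, d in intervals if kind == "C"]
    if len(vocalic) + len(consonantal) != len(intervals):
        raise ValueError('interval kinds must be "V" or "C"')
    if not vocalic or not consonantal:
        raise ValueError("need at least one vocalic and one intervocalic interval")
    total = sum(vocalic) + sum(consonantal)
    return {
        "%V": 100.0 * sum(vocalic) / total,  # share of duration that is vocalic
        "deltaV": pstdev(vocalic),           # spread of vocalic durations (s)
        "deltaC": pstdev(consonantal),       # spread of intervocalic durations (s)
    }


if __name__ == "__main__":
    # Hypothetical toy durations in seconds, not measured data.
    toy = [("C", 0.10), ("V", 0.05), ("C", 0.10), ("V", 0.15)]
    print(rhythm_metrics(toy))
```

Because ΔV and ΔC are raw durations in seconds, they tend to shrink as speech gets faster, which illustrates why cross-linguistic control of speech rate appears among the open issues above.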