TR-H-0103 :1994.10.18

Osamu Fujimura

Syllables, Internal Structure and Role in Prosodic Organization

Abstract: This paper discusses a new view of speech organization in relation to the Converter-Distributor (C/D) model of phonetic implementation. Traditionally, speech signals were interpreted as a concatenated string of phonemic segments, each segment being represented as a simultaneous bundle of distinctive features. In this influential classical view, each segment is assumed to be completely specified with all phonetic characteristics as an integral and independent phonetic form, each component feature being associated with its inherent, though inevitably abstract, phonetic manifestation. On this linear string of segmental phonetic events, suprasegmental effects were assumed to be superimposed to form the observable speech signals. Some smoothing process, generally called coarticulation, would be applied to a set of step functions representing phonetic dimensions (such as formants or articulators' positions), formed out of discretely concatenated target values and assigned durations for individual (phonemic) segments, to generate continuously changing and physically realizable time functions for phonetic variables. The suprasegmental phenomena have been discussed with reference to separately observable speech characteristics, in particular the voice fundamental frequency (pitch) contour (i.e., a time function) and spectrographically defined segmental durations. Thus, a concatenated string of phonemes, roughly corresponding to alphabetic text, is the primary (abstract) representation of the speech material, modulated by suprasegmental control superimposed on it in actual utterances.
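The classical account described above can be illustrated with a minimal sketch: per-segment target values and durations are concatenated into a step function, which is then smoothed to give a continuous contour. The function names, the moving-average smoother, and the formant values are assumptions made for illustration only; the text does not specify a particular smoothing process or any numeric values.

```python
def step_function(targets, durations, dt=0.005):
    """Concatenate per-segment targets into a piecewise-constant track.

    targets: one target value per segment (hypothetical formant values, Hz)
    durations: one duration per segment, in seconds
    dt: sampling step in seconds (an assumed value)
    """
    track = []
    for target, duration in zip(targets, durations):
        n_samples = round(duration / dt)
        track.extend([target] * n_samples)
    return track


def smooth(track, window=11):
    """Centered moving average, truncated at the edges.

    This stands in for the 'coarticulation' smoothing; the choice of a
    moving average is an assumption, not something the text prescribes.
    """
    half = window // 2
    smoothed = []
    for i in range(len(track)):
        lo = max(0, i - half)
        hi = min(len(track), i + half + 1)
        smoothed.append(sum(track[lo:hi]) / (hi - lo))
    return smoothed


# Hypothetical first-formant targets (Hz) and durations (s) for three segments.
targets = [300.0, 700.0, 400.0]
durations = [0.08, 0.12, 0.10]

steps = step_function(targets, durations)   # discontinuous at segment boundaries
contour = smooth(steps)                     # continuous-looking time function
```

In this sketch the segment string is primary and the smoothing is a secondary correction applied afterwards, which is the ordering the classical interpretation assumes.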

The new view discussed here deviates radically from this interpretation of speech phenomena. It takes what we will call the prosodic organization of an utterance or of its phrasal components (as a phonetic unit) to be the basic structure of speech phenomena. This structure is associated with a linear string of syllables as the concatenative "segmental" units. The flow of vocalic gestures characterizing the sequence of syllable nuclei forms the base function of the articulatory event that fits into the prosodic structure of the utterance. On this base function, consonantal gestures are superimposed, basically in the way Ohman depicted in his consonantal perturbation model (see also Carre, R. & Chennoukh, S.). The base function is inherently multi-dimensional in the sense that different articulatory dimensions, such as jaw opening, tongue body advancement or retraction, and lip rounding and protrusion, behave more or less independently of each other. Some of these dimensions, presumably mandible abduction/adduction in particular, reflect the prosodic structure more directly. The superimposed consonantal gestures reflect inherent characteristics of the phonological features representing each syllable margin item or, more specifically, in C/D model terminology, the features of the onset, coda, or syllable affix(es). The prosodic characteristics of an utterance are specified phonologically by a metrical tree (or some other symbolic representation). In the phonetic specification of the utterance, which is needed as the input to the C/D system, the metrical tree must be augmented by numeric annotations. The intricate relation between the symbolic, discrete phonological representation and the corresponding continuously variable phonetic characteristics of utterances of a given linguistic message in a given situation is thus explained by an explicitly described generative model of phonetic realization.
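The superposition described above can be sketched for a single articulatory dimension: a slowly varying vocalic base function passes through the syllable nuclei, and localized consonantal gestures are added to it. Everything specific here is an assumption made for illustration: the linear interpolation between nuclei, the raised-cosine pulse shape, the purely additive combination, and all numeric values. None of it is taken from the C/D model's actual specification.

```python
import math


def vocalic_base(times, nuclei):
    """Piecewise-linear track through (time, value) syllable-nucleus targets.

    Values are held constant before the first and after the last nucleus.
    """
    base = []
    for t in times:
        if t <= nuclei[0][0]:
            base.append(nuclei[0][1])
        elif t >= nuclei[-1][0]:
            base.append(nuclei[-1][1])
        else:
            for (t0, v0), (t1, v1) in zip(nuclei, nuclei[1:]):
                if t0 <= t <= t1:
                    base.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
                    break
    return base


def consonant_pulse(t, center, width, amplitude):
    """Raised-cosine pulse, zero outside center +/- width / 2."""
    x = (t - center) / width
    if abs(x) >= 0.5:
        return 0.0
    return amplitude * 0.5 * (1.0 + math.cos(2.0 * math.pi * x))


def superimpose(times, base, pulses):
    """Add every consonantal pulse to the vocalic base function."""
    return [
        b + sum(consonant_pulse(t, *pulse) for pulse in pulses)
        for t, b in zip(times, base)
    ]


# Hypothetical values: time in seconds, one articulatory dimension in
# arbitrary units (larger = more open vocal tract).
times = [i * 0.01 for i in range(51)]
nuclei = [(0.10, 1.0), (0.40, 0.6)]      # two syllable nuclei
pulses = [(0.25, 0.10, -0.5)]            # one constriction: (center, width, amplitude)

base = vocalic_base(times, nuclei)
trajectory = superimpose(times, base, pulses)
```

Outside the pulse the trajectory equals the base function, so the vocalic flow between nuclei continues underneath the consonantal perturbation instead of being interrupted by it.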