Autosegmental-Metrical

AM theory views prosody as a coherent structure consisting of several interacting components (including prosodic trees and the representation of intonation), and it characterizes intonation as a finite-state system consisting of pitch accents, phrase tones and boundary tones (Beckman, 1996; Gussenhoven, 2004; Ladd, 1996; Pierrehumbert, 1980). An algorithm for generating surface F0 contours from the proposed phonological units was put forward by Pierrehumbert (1981), and a modified algorithm was given in Anderson et al. (1984). Many of the tonal categories in AM theory have been adopted as the tonal components of the Tone and Break Indices (ToBI) transcription system (Silverman et al., 1992). Various efforts have been made to convert ToBI parameters to continuous F0 contours (Black & Hunt, 1996; Dusterhoff et al., 1999; Ross & Ostendorf, 1999), but those systems do not implement the basic target-and-interpolation assumption of AM theory. The computational advantages of AM theory therefore have yet to be fully demonstrated.

Algorithmically, AM uses point-based rather than interval-based annotations, following the idea that tonal targets are specified in terms of pitch height and their relation to the segmental string, and that inter-target contours result from linear or sagging interpolation (Pierrehumbert, 1980, 1981). An early version of AMtrainer (Lee, Xu & Prom-on, 2014) implemented the inter-target connections with linear and parabolic interpolation, following Pierrehumbert (1981). In the current CPP implementation, we apply a linear least-squares method that estimates the coefficients of a quadratic equation for each inter-target interval. This uses the same base quadratic equation as parabolic interpolation but draws on all data points in the interval rather than only three. Thus, the AM F0 model in the current version of CPP is expressed as

where c1, c2, and c3 are estimated using the linear least-squares method: the data points in the interval are substituted into the equation, and the resulting system is solved for the coefficients using a pseudoinverse operation.
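The least-squares step can be illustrated with a minimal sketch. It assumes the quadratic takes the form f0(t) = c1*t^2 + c2*t + c3 (this ordering of the coefficients is an assumption), and it uses NumPy for illustration only. The function names `fit_quadratic_segment` and `eval_quadratic` and the sample data are hypothetical and are not taken from CPP.

```python
import numpy as np


def fit_quadratic_segment(t, f0):
    """Least-squares fit of f0(t) ~ c1*t**2 + c2*t + c3 over one interval.

    The coefficient ordering (c1 on t**2) is an assumption of this sketch.
    """
    t = np.asarray(t, dtype=float)
    f0 = np.asarray(f0, dtype=float)
    # Substitute every data point into the equation: one row [t^2, t, 1] each.
    A = np.column_stack([t**2, t, np.ones_like(t)])
    # Solve the overdetermined system A @ c = f0 with the pseudoinverse.
    c1, c2, c3 = np.linalg.pinv(A) @ f0
    return c1, c2, c3


def eval_quadratic(coeffs, t):
    """Evaluate the fitted quadratic at time points t."""
    c1, c2, c3 = coeffs
    t = np.asarray(t, dtype=float)
    return c1 * t**2 + c2 * t + c3


# Hypothetical inter-target interval: 21 samples over 0.2 s with a sagging shape.
t = np.linspace(0.0, 0.2, 21)
f0 = 200.0 * t**2 - 60.0 * t + 120.0

coeffs = fit_quadratic_segment(t, f0)
fitted = eval_quadratic(coeffs, t)
```

With only three data points the system is exactly determined and the fit reduces to parabolic interpolation through those points. With more points, every sample in the interval contributes to the estimate.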

References

Anderson, M., Pierrehumbert, J. and Liberman, M. (1984). Synthesis by rule of English intonation patterns. In Proceedings of ICASSP, San Diego, CA: 77-80.

Beckman, M. E. (1996). The parsing of prosody. Language and Cognitive Processes 11: 17-67.

Black, A. and Hunt, A. (1996). Generating F0 contours from ToBI labels using linear regression. In Proceedings of International Conference on Spoken Language Processing, Philadelphia.

Dusterhoff, K. E., Black, A. W. and Taylor, P. (1999). Using decision trees within the Tilt intonation model to predict F0 contours. In Proceedings of Eurospeech '99, Budapest, Hungary.

Gussenhoven, C. (2004). The Phonology of Tone and Intonation. Cambridge: Cambridge University Press.

Ladd, D. R. (1996). Intonational phonology. Cambridge: Cambridge University Press.

Lee, A., Xu, Y. and Prom-on, S. (2014). Modeling Japanese F0 contours using the PENTAtrainers and AMtrainers. In Proceedings of TAL 2014, 4th International Symposium on Tonal Aspects of Languages, Nijmegen, The Netherlands: 164-167.

Pierrehumbert, J. (1980). The Phonology and Phonetics of English Intonation. Ph.D. dissertation, MIT, Cambridge, MA.

Pierrehumbert, J. (1981). Synthesizing intonation. Journal of the Acoustical Society of America 70: 986-995.

Ross, K. N. and Ostendorf, M. (1999). A dynamical system model for generating fundamental frequency for speech synthesis. IEEE Transactions on Speech and Audio Processing 7: 295-309.

Silverman, K., Beckman, M., Pitrelli, J., Ostendorf, M., Wightman, C., Price, P., Pierrehumbert, J. and Hirschberg, J. (1992). ToBI: A standard for labeling English prosody. In Proceedings of the 1992 International Conference on Spoken Language Processing, Banff: 867-870.