AUC PHILOLOGICA

AUC Philologica (Acta Universitatis Carolinae Philologica) is an academic journal that publishes linguistic as well as literary-historical and theoretical studies. Reviews of scholarly books and reports from the academic community are also an integral part of the journal.

The journal is indexed in the CEEOL, DOAJ, EBSCO and ERIH PLUS databases.

AUC PHILOLOGICA, Vol 2022 No 1 (2022), 51–63

The dynamic effect of speaking fast on speech prosody

Lauri Tavi

DOI: https://doi.org/10.14712/24646830.2022.28
published: 17 January 2023

Abstract

Speaking fast causes several changes in speech prosody and can also be associated with reduced speech intelligibility. In this study, prosodic changes in fast speech were investigated using common prosodic measurements and the syllabic prosody index (SPI), a novel prominence measure that combines f0, energy and duration features. Dynamic changes in long-term prosodic prominence were investigated using functional data analysis (FDA), in which the SPI is transformed into a functional form. The potential detrimental effect of speaking fast on speech intelligibility was evaluated using automatic speech recognition. Phonetic analyses of syllabic units showed that speaking fast decreases duration, f0 and SPI, and increases articulation rate and the proportion of acoustic energy in the 0–1 kHz frequency range. FDA supported these results by revealing dynamically decreased overall prominence in fast speech. Furthermore, speech intelligibility was significantly lower in fast speech than in regular speech: the word error rate (WER) was 0.27 for regular speech and 0.86 for fast speech.
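For readers unfamiliar with the intelligibility metric, the sketch below shows how a word error rate is conventionally computed: the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and a recognizer's output, divided by the number of reference words. The abstract does not say which speech recognizer or scoring tool was used, so the function name `word_error_rate` and the example sentences are illustrative assumptions and not the author's pipeline.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Conventional WER: word-level edit distance / number of reference words.

    Illustrative helper only; the scoring tool used in the study is not
    specified in the abstract.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (initially none) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up sentences: one substitution and one deletion over four words.
    print(word_error_rate("a b c d", "a x c"))
```

Under this conventional definition, the reported values of 0.27 and 0.86 correspond to roughly 27 and 86 word-level errors per 100 reference words. Because insertions count as errors, WER can exceed 1 when the hypothesis is much longer than the reference.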

keywords: fast speech; prosody; prominence; functional data analysis; speech intelligibility

Creative Commons License
The dynamic effect of speaking fast on speech prosody is licensed under a Creative Commons Attribution 4.0 International License.

230 x 157 mm
published: 3 times a year
price of print issue: 150 CZK
ISSN: 0567-8269
E-ISSN: 2464-6830