The dynamics of indexical information in speech: Can recognizability be controlled by the speaker?

Dellwo, Volker; Pellegrino, Elisa; He, Lei; Kathiresan, Thayabaran

AUC PHILOLOGICA

AUC Philologica (Acta Universitatis Carolinae Philologica) is an academic journal that publishes linguistic studies as well as studies in literary history and literary theory. Reviews of scholarly books and reports from the academic community are also an integral part of the journal.

The journal is indexed in the CEEOL, DOAJ, EBSCO and ERIH PLUS databases.

AUC PHILOLOGICA, Vol 2019 No 2 (2019), 57–75

DOI: https://doi.org/10.14712/24646830.2019.18
published: 18 October 2019

Abstract

Human voices are individual, and humans are highly skilled at recognizing speakers by their voices; both phenomena are deeply rooted in the evolution of human behavior. To date, the mechanisms of speaker recognition are not well understood because the acoustic cues to a speaker’s identity are highly variable. We ask what role the speaker plays in making his or her voice more or less recognizable. While it is evident from the literature that humans can control vocal properties to enhance their intelligibility, it is unclear whether speakers can and/or do control vocal characteristics to be more recognizable, and whether such control mechanisms play a role in the communication process. In this paper, we review results from the literature supporting the view that speaker-idiosyncratic information is dynamic and that humans have the ability to control how well they can be recognized. We suggest possible experimental setups with which control over identity in voice can be tested, and we present pilot data on the acoustic characteristics of speech produced to be either (a) intelligible (clear speech) or (b) suitable for person recognition (identity marked speech). The results give reason to believe that speakers apply different mechanisms when making their individuality identifiable than when making their speech better understood. We discuss the predictions that major theories of speech perception make about the control of recognizability and intelligibility.

keywords: indexical information; voice recognition; identity marked speech

The dynamics of indexical information in speech: Can recognizability be controlled by the speaker? is licensed under a Creative Commons Attribution 4.0 International License.

230 x 157 mm
published: 3 times a year
price of print issue: 150 CZK
ISSN: 0567-8269
E-ISSN: 2464-6830
