AUC Philologica (Acta Universitatis Carolinae Philologica) is an academic journal published by Charles University. It publishes scholarly articles on linguistic, literary and cultural topics across a wide range of disciplines: English, German, Greek and Latin, Oriental, Romance and Slavonic studies, as well as phonetics and translation studies. In addition to articles, it publishes reviews of new academic books and of special issues of academic journals.
The journal is indexed in CEEOL, DOAJ, EBSCO, and ERIH PLUS.
AUC PHILOLOGICA, Vol 2019 No 2 (2019), 57–75
The dynamics of indexical information in speech: Can recognizability be controlled by the speaker?
Volker Dellwo, Elisa Pellegrino, Lei He, Thayabaran Kathiresan
DOI: https://doi.org/10.14712/24646830.2019.18
published online: 18. 10. 2019
abstract
Human voices are individual, and humans have elaborate skills in recognizing speakers by their voice; both phenomena are deeply rooted in the evolution of human behavior. To date, the mechanisms of speaker recognition are not well understood because of the high variability of the acoustic cues to a speaker’s identity. We asked what role the speaker plays in making his or her voice more or less recognizable. While the literature shows that humans can control vocal properties to enhance their intelligibility, it is unclear whether speakers can and/or do control vocal characteristics to become more recognizable, and whether such control mechanisms play a role in the communication process. In this paper, we review results from the literature supporting the view that speaker-idiosyncratic information is dynamic and that humans have the ability to control how well they can be recognized. We suggest possible experimental setups for testing control over identity in the voice, and we present a pilot acoustic analysis of speech produced to be either (a) intelligible (clear speech) or (b) suitable for person recognition (identity-marked speech). The results give reason to believe that speakers apply different mechanisms when making their individuality identifiable than when making their speech better understood. We discuss the predictions that speaker control of recognizability and intelligibility entails within major theories of speech perception.
keywords: indexical information; voice recognition; identity-marked speech
The dynamics of indexical information in speech: Can recognizability be controlled by the speaker? is licensed under a Creative Commons Attribution 4.0 International License.
230 × 157 mm
periodicity: 3 times per year
print price: 150 CZK
ISSN: 0567-8269
E-ISSN: 2464-6830