TR-A-0097 :1990.12.20

Alain de Cheveigne

F0 estimation from mixed speech

Abstract: Listeners can often follow and understand the speech of one speaker among many, even when binaural information is not available. This aspect of our ability to organize the sound environment has received renewed interest of late (Palmer 1990, Assmann and Summerfield 1990, Meddis and Hewitt 1990). One might expect vowels to be particularly susceptible to interference from concurrent vowels, because a vowel's identity depends on the steady-state shape of its spectrum, which is strongly affected by the presence of a concurrent vowel. It appears, however, that synthetic vowels within concurrent pairs can be identified at levels far above chance, particularly if their fundamental frequencies differ (Assmann and Summerfield 1990). Various perception models and signal processing methods for monaural concurrent vowel separation have been proposed, most of which require at some stage that the f0 of each individual speaker be determined. However, it is notoriously difficult to design algorithms capable of extracting f0 from real speech (Hess 1983), and such difficulties are likely to be compounded for mixed speech. The work reported here addresses the problem of how to track the f0 of two or more simultaneous speakers, and proposes an algorithm for that purpose. Although I discuss the problem in speech engineering terms, another motivation of this research is to understand how human listeners separate sounds and organize their auditory environment. I start by reviewing previous methods of mixed speech f0 extraction and speech separation, and discuss in detail some differences between the approaches. Then I present the mixed speech f0 estimation algorithm, together with some experimental results that demonstrate its performance. Finally, I attempt to relate the algorithm to the issue of sound organization in hearing. Appendix I shows examples of extracted f0 tracks, and Appendix II discusses various implementation issues and perspectives for future development.