Alain de Cheveigne
F0 estimation from mixed speech
Abstract: Listeners can often follow and understand the speech of one speaker among
many, even when binaural information is not available. This aspect of our ability to
organize the sound environment has recently received renewed interest (Palmer 1990,
Assmann and Summerfield 1990, Meddis and Hewitt 1990).
Among the effects of interfering speech, one might expect vowels to be
particularly susceptible to interference from concurrent vowels, because a vowel's
identity depends on the steady-state shape of the spectrum, and this is strongly affected
by the presence of a concurrent vowel. It appears, however, that synthetic vowels can be
perceptually identified within concurrent pairs at levels far above chance, particularly if
there is a difference in fundamental frequency (Assmann and Summerfield 1990).
Various perception models and signal processing methods for monaural concurrent
vowel separation have been proposed, most of which require at some stage that the f0
of each individual speaker be determined. However, it is notoriously difficult to design
algorithms capable of extracting f0 from real speech (Hess 1983), and such difficulties
are likely to be compounded for mixed speech.
The work reported here addresses the problem of how to track the f0 of two or
more simultaneous speakers, and proposes an algorithm for that purpose. Although I
discuss the problem in speech engineering terms, another motivation of this research is
to understand how human listeners separate sounds and organize their auditory
environment. I start by reviewing previous methods of mixed-speech f0 extraction and
speech separation, and discuss in detail some differences between the approaches.
Then I present the mixed-speech f0 estimation algorithm, together with some
experimental results that demonstrate its performance. Finally, I attempt to relate the
algorithm to the issue of sound organization in hearing.
Appendix I shows examples of extracted f0 tracks, and Appendix II discusses
various implementation issues and perspectives for future development.