Gregor Möhler
Detection of Creaky Voice in Speech Signals
Abstract: Reliable F0 extraction and pitch marking are essential for good unit selection in
concatenative speech synthesis. Natural speech, however, is subject to irregularities, a
phenomenon often described by the terms "creaky voice" or "laryngealization".
The problem is that the fundamental frequency is hard to define in these parts
of speech, and extracting F0 will often result in an F0 contour that jumps between
different harmonics of the signal. This is unacceptable for concatenative speech
synthesis systems. We are therefore looking for a method that can detect sections
of creaky voice in the speech database.
In this work, a method has been developed that detects irregularities in speech signals.
It is based on an F0 algorithm (ADMF) that presents several
candidates for F0 to a Recurrent Neural Network (RNN) classifier. The classifier
is trained and tested on the female voices of a German database (MUSLI) with
annotated creaky periods. This essentially very simple approach leads to a 42%
recognition rate in an open test. A program based on this RNN has been written
that can now detect irregularities in a speech synthesis database.
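
The candidate-generation step can be illustrated with a minimal sketch. It assumes that "ADMF" refers to an average magnitude difference function, which the abstract does not spell out, and it is not the author's implementation. The function names, the F0 search range, and the number of candidates are hypothetical choices made for illustration. The sketch only produces ranked F0 candidates for one frame; the RNN classifier that would consume them is not shown.

```python
import math


def amdf(frame, min_lag, max_lag):
    """Average magnitude difference of a frame for each lag in [min_lag, max_lag]."""
    n = len(frame)
    if n <= max_lag:
        raise ValueError("frame must be longer than max_lag samples")
    values = {}
    for lag in range(min_lag, max_lag + 1):
        diffs = [abs(frame[i] - frame[i + lag]) for i in range(n - lag)]
        values[lag] = sum(diffs) / len(diffs)
    return values


def f0_candidates(frame, sample_rate, f0_min=75.0, f0_max=500.0, n_candidates=3):
    """Return up to n_candidates (f0_hz, amdf_value) pairs, deepest minimum first.

    The search range and candidate count are assumed values, not taken
    from the source.
    """
    min_lag = int(sample_rate / f0_max)
    max_lag = int(sample_rate / f0_min)
    d = amdf(frame, min_lag, max_lag)
    # A lag is a candidate period if the difference function has a local minimum there.
    minima = [
        lag
        for lag in range(min_lag + 1, max_lag)
        if d[lag] < d[lag - 1] and d[lag] <= d[lag + 1]
    ]
    minima.sort(key=lambda lag: d[lag])
    return [(sample_rate / lag, d[lag]) for lag in minima[:n_candidates]]


# Usage: a regular 100 Hz sine sampled at 8 kHz (50 ms frame).
sample_rate = 8000
frame = [math.sin(2 * math.pi * 100.0 * i / sample_rate) for i in range(400)]
for f0_hz, depth in f0_candidates(frame, sample_rate):
    print(f"candidate F0: {f0_hz:.1f} Hz, AMDF value: {depth:.4f}")
```

For regular voicing, one minimum is clearly deeper than the others. In irregular sections, several minima tend to have comparable depth, which is one plausible reason to pass multiple candidates to a classifier instead of committing to a single F0 value.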