TR-IT-0181 :1996.5

Gregor Möhler

Detection of Creaky Voice in Speech Signals

Abstract: Reliable F0 extraction and pitch marking are essential for good unit selection in concatenative speech synthesis. Natural speech, however, is subject to irregularities, a phenomenon often described by the terms "creaky voice" or "laryngealization". The fundamental frequency is hard to define in these parts of speech, and F0 extraction will often produce an F0 contour that jumps between different harmonics of the signal, which is unacceptable for concatenative speech synthesis systems. We are therefore looking for a method that can detect sections of creaky voice in the speech database. In this work, a method has been developed that detects irregularities in speech signals. It is based on an F0 algorithm (ADMF) that presents different F0 candidates to a Recurrent Neural Network (RNN) classifier. The classifier is trained and tested on the female voices of a German database (MUSLI) with annotated creaky periods. This essentially very simple approach leads to a 42% recognition rate in an open test. A program based on this RNN has been written that can now detect irregularities in a speech synthesis database.
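
The candidate-generation step can be illustrated with a minimal sketch. It assumes that the F0 algorithm is of the average-magnitude-difference type, which the abstract does not confirm since it gives only the abbreviation. The function names, the F0 search range, and the number of candidates are hypothetical choices made for illustration and are not taken from the report. The RNN classifier that would consume the candidates is not shown.

```python
import numpy as np


def amdf(frame, min_lag, max_lag):
    """Mean absolute difference between the frame and its lagged copy,
    evaluated for every lag in [min_lag, max_lag]."""
    n = len(frame)
    return np.array([
        np.mean(np.abs(frame[lag:] - frame[:n - lag]))
        for lag in range(min_lag, max_lag + 1)
    ])


def f0_candidates(frame, sample_rate, f0_min=75.0, f0_max=500.0, n_candidates=3):
    """Return up to n_candidates (f0_hz, depth) pairs, deepest minimum first.

    The search range and candidate count are assumptions for this sketch.
    """
    min_lag = int(sample_rate / f0_max)
    max_lag = int(sample_rate / f0_min)
    d = amdf(frame, min_lag, max_lag)

    # Local minima of the difference function mark candidate periods.
    interior = np.arange(1, len(d) - 1)
    is_min = (d[interior] < d[interior - 1]) & (d[interior] <= d[interior + 1])
    minima = interior[is_min]

    # Keep the deepest minima and convert each lag to a frequency in Hz.
    best = minima[np.argsort(d[minima])][:n_candidates]
    return [(float(sample_rate / (min_lag + i)), float(d[i])) for i in best]


if __name__ == "__main__":
    sr = 16000
    t = np.arange(640) / sr               # one 40 ms frame
    frame = np.sin(2 * np.pi * 200.0 * t)  # synthetic 200 Hz tone
    for f0, depth in f0_candidates(frame, sr):
        print(f"{f0:.1f} Hz (depth {depth:.2e})")
```

For a perfectly periodic frame, the candidates include the true F0 together with its subharmonics, all with near-zero depth. In irregular speech the relative depths of such competing minima would be expected to shift from frame to frame, which is the kind of information a classifier could use.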