Alain de Cheveigne
Auditory nerve fiber spike generation model
Abstract: Speech recognition systems are far from reproducing the capabilities of human
listeners. Recognition rates remain low even in the best recording conditions, and
deteriorate even further when the conditions approach those of everyday use: noise,
reverberation, interfering voices, etc. In contrast, people can hear and understand
speech in very adverse conditions.
To a certain degree, we can parse the complex spectrum of superimposed speech and
noise into separate "streams", and concentrate on the speech while eliminating the noise.
We can also "reconstruct" portions of speech that have been obliterated. Such
capabilities likely involve a combination of peripheral processing (under central control
via efferent nerve pathways), and central processing of auditory nerve fiber patterns.
Physiological experiments (recordings of discharge patterns within the auditory nerve)
provide a picture of the data that central processing operates on, and are a starting point
for models of central processing. Despite the large amount of published data available,
questions may arise for which this data is insufficient.
A model of auditory nerve fiber spike generation is useful within this context, because
it allows us to:
a) experiment with models of higher-level auditory processing,
b) extrapolate from existing experimental data to new stimulus conditions, without
carrying out physiological experiments,
c) clarify the relation between the shape of the driving function that controls nerve fiber
discharge, and the shape of histograms as they can be measured from single units of the
auditory nerve,
d) assess the degree to which experimental data has been affected by the choice of
histogram format used to report it (different histograms are not equivalent).
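
Points c) and d) can be illustrated with a minimal sketch. The abstract does not specify the spike generation mechanism, so the code below assumes a generic one: an inhomogeneous Poisson process with an absolute dead time, driven by a half-wave rectified sinusoid. All function names, parameter values and the choice of driving function are hypothetical and are not taken from the model described here. The sketch builds two histogram formats from the same spike train to show that they present the same discharge pattern differently.

```python
import math
import random


def driving_function(t, freq=500.0, peak_rate=200.0):
    """Hypothetical driving function: half-wave rectified sinusoid (spikes/s)."""
    return peak_rate * max(0.0, math.sin(2.0 * math.pi * freq * t))


def generate_spikes(rate_fn, duration, dt=1e-5, dead_time=1e-3, rng=None):
    """Assumed mechanism: inhomogeneous Poisson process with a dead time.

    In each time step of length dt, a spike occurs with probability
    rate_fn(t) * dt, unless the previous spike was less than dead_time ago.
    """
    rng = rng or random.Random(0)
    spikes = []
    last = -math.inf
    for i in range(int(round(duration / dt))):
        t = i * dt
        if t - last < dead_time:
            continue  # still refractory: no spike can occur
        if rng.random() < rate_fn(t) * dt:
            spikes.append(t)
            last = t
    return spikes


def period_histogram(spikes, period, n_bins):
    """Count spikes by their phase within the stimulus period."""
    counts = [0] * n_bins
    for t in spikes:
        phase = (t % period) / period
        counts[min(int(phase * n_bins), n_bins - 1)] += 1
    return counts


def interval_histogram(spikes, bin_width, n_bins):
    """Count intervals between successive spikes (first-order ISI histogram)."""
    counts = [0] * n_bins
    for a, b in zip(spikes, spikes[1:]):
        k = int((b - a) / bin_width)
        if k < n_bins:
            counts[k] += 1
    return counts


spikes = generate_spikes(driving_function, duration=1.0)
period_hist = period_histogram(spikes, period=1.0 / 500.0, n_bins=10)
interval_hist = interval_histogram(spikes, bin_width=1e-3, n_bins=20)
```

Under these assumptions, the period histogram follows the shape of the driving function only approximately, because the dead time removes spikes that the driving function alone would have produced. The interval histogram, computed from the same spikes, shows the dead time directly but hides the stimulus phase.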