Alex Waibel, Hidefumi Sawai, Kiyohiro Shikano
Modularity and Scaling
in Large Phonemic Neural Networks
Abstract: Scaling connectionist models to larger systems is difficult because larger networks require increasing amounts of training time and data, and the complexity of the optimization task quickly reaches computationally unmanageable proportions. In this paper, we train several small Time-Delay Neural Networks aimed at all phonemic subcategories (nasals, fricatives, etc.) and report excellent fine phonemic discrimination performance in all cases. Exploiting the hidden structure of these smaller phonemic subcategory networks, we then propose several techniques that allow us to "grow" larger nets in an incremental and modular fashion without loss in recognition performance and without the need for excessive training time or additional data. These techniques include class discriminatory learning, connectionist glue, selective/partial learning and all-net fine tuning. A set of experiments shows that stop consonant networks (BDGPTK) constructed from subcomponent BDG- and PTK-nets achieved up to 98.6% correct recognition, compared to 98.3% and 98.7% correct for the component BDG- and PTK-nets. Similarly, an incrementally trained network aimed at all consonants achieved recognition scores of about 95.9% correct. These results were found to be comparable to the performance of the subcomponent networks and significantly better than several alternative speech recognition methods.
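As a rough illustration of the modular idea, the following Python sketch combines the hidden layers of two subcomponent networks with a few extra "glue" hidden units under a new six-class output layer. It is a minimal sketch under stated assumptions, not the authors' implementation: the layer sizes, the random stand-in weights, the helper names (`dense`, `init_layer`, `combined_forward`) and the use of plain fully connected sigmoid layers are all hypothetical, and the time-delay structure of a real TDNN is omitted.

```python
import math
import random


def dense(x, weights, biases):
    """Fully connected sigmoid layer; weights holds one row per output unit."""
    return [
        1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
        for row, b in zip(weights, biases)
    ]


def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases (stand-in for trained values)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)

# Illustrative sizes only; none of these numbers come from the paper.
N_IN, N_HID, N_GLUE, N_OUT = 16, 4, 2, 6

# Stand-ins for the hidden layers of two already trained subnetworks.
# In the modular scheme these would be kept fixed while the new parts train.
bdg_hidden = init_layer(N_HID, N_IN, rng)
ptk_hidden = init_layer(N_HID, N_IN, rng)

# Newly added parts: extra "glue" hidden units and a combined output layer.
glue_hidden = init_layer(N_GLUE, N_IN, rng)
output = init_layer(N_OUT, 2 * N_HID + N_GLUE, rng)


def combined_forward(x):
    """Concatenate both subnet hidden activations with the glue units,
    then score the six classes B, D, G, P, T, K."""
    h = dense(x, *bdg_hidden) + dense(x, *ptk_hidden) + dense(x, *glue_hidden)
    return dense(h, *output)


x = [rng.random() for _ in range(N_IN)]
scores = combined_forward(x)
```

In this sketch only `glue_hidden` and `output` would be updated in a first training phase, with all weights then released for a final fine-tuning pass; that schedule is one reading of the techniques listed above rather than a procedure confirmed by this section.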