TR-IT-0335: February 16, 2000

Jeremy Bateman & Nick Campbell

SPACES: A Semantically and Prosodically Annotated Corpus Of English Speech

Abstract: ATR possesses a corpus of approximately one million words of English text, in which every word has been syntactically and semantically tagged and every sentence has been assigned a correct parse using the ATR grammar of general English [1,2]. This paper describes our work towards a large speech corpus (SPACES) containing the same material, annotated using a reduced version of the ToBI prosodic annotation system. For the first time, a large corpus of prosodically annotated read English speech will also be annotated syntactically and semantically. We describe how correlations between the parallel annotations may provide a 'gold standard' against which to continue training and evolving the CHATR speech synthesizer.