Jeremy Bateman & Nick Campbell
SPACES: A Semantically and
Prosodically Annotated Corpus of
English Speech
Abstract: ATR possesses a corpus of approximately one million words of English
text, in which every word has been syntactically and semantically tagged,
and every sentence assigned a correct parse using the ATR grammar of
general English [1,2]. This paper describes our work towards a large
speech corpus (SPACES) containing the same material, annotated using
a reduced version of the ToBI prosodic annotation system. For the first
time, a large corpus of prosodically annotated read English speech will
also be annotated syntactically and semantically. We describe how
correlations between the parallel annotations may provide a 'gold
standard' against which to continue training and evolving the CHATR
speech synthesizer.