Detlef Koll, Eiichiro Sumita, Hitoshi Iida
Massively Parallel
Document Retrieval
in Clustered Databases
Abstract: Our goal has been to develop an effective and efficient document retrieval system
for very large databases, based on the vector space model. To this end, we (1) implemented
a massively parallel retrieval kernel on a SIMD machine and (2) devised a fast
non-hierarchical clustering method for restricting the kernel's search scope without hurting
retrieval effectiveness.
This paper discusses the score-computing algorithm and the load-balancing method of the
parallel kernel, the document clustering method, and how the two parts combine into
a large-scale retrieval system.
Evidence for the efficiency and effectiveness of this approach is given for standard
test suites: (i) the Virginia collections and (ii) the Tipster collection, with some gigabytes of text.
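
To make the combination of the two parts concrete, the following is a minimal sequential sketch in Python of cluster-restricted retrieval in the vector space model. It is an illustration under stated assumptions, not the kernel or the clustering method of this paper: cosine similarity is assumed as the score, cluster centroids are assumed to be plain averages of their members, and all names (cosine, centroid, retrieve, the toy documents and the cluster labels A and B) are hypothetical. The SIMD parallelism and the load balancing are not modelled.

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse term-weight vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def centroid(vectors):
    """Average of a list of sparse vectors (assumed cluster representative)."""
    total = {}
    for vec in vectors:
        for term, w in vec.items():
            total[term] = total.get(term, 0.0) + w
    return {term: w / len(vectors) for term, w in total.items()}

def retrieve(query, docs, members, n_clusters=1, top_k=3):
    """Score only the documents of the n_clusters clusters closest to the query."""
    centroids = {c: centroid([docs[d] for d in ids]) for c, ids in members.items()}
    # Rank clusters by the similarity of their centroid to the query.
    ranked = sorted(centroids, key=lambda c: cosine(query, centroids[c]), reverse=True)
    # Restrict the search scope to the members of the best clusters.
    candidates = [d for c in ranked[:n_clusters] for d in members[c]]
    scored = sorted(((cosine(query, docs[d]), d) for d in candidates),
                    key=lambda pair: (-pair[0], pair[1]))
    return [(d, score) for score, d in scored[:top_k]]

# Toy collection with a hand-made two-cluster partition (hypothetical data).
docs = {
    "d1": {"parallel": 1.0, "retrieval": 1.0},
    "d2": {"parallel": 1.0, "machine": 1.0},
    "d3": {"cluster": 1.0, "document": 1.0},
    "d4": {"document": 1.0, "retrieval": 1.0},
}
members = {"A": ["d1", "d2"], "B": ["d3", "d4"]}
query = {"parallel": 1.0, "retrieval": 1.0}

results = retrieve(query, docs, members, n_clusters=1)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

With n_clusters=1 only the two documents of cluster A are scored, so d4 is never examined even though it shares a term with the query. Raising n_clusters widens the search scope at the cost of more score computations, which is the trade-off between efficiency and effectiveness that the clustering has to manage.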