Turner, Lisa and Dimitrov, Nedialko B. and Fearnhead, Paul (2016) Bayes Linear Methods for Large-Scale Network Search. arxiv.org.
Abstract
Consider the problem of searching a large set of items, such as emails, for a small set which are relevant to a given query. This can be implemented in a sequential manner whereby we use knowledge from earlier items that we have screened to help us choose future items in an informed way. Often the items we are searching have an underlying network structure: for example emails can be related to a network of participants, where an edge in the network relates to the presence of a communication between those two participants. Recent work by Dimitrov, Kress and Nevo has shown that using the information about the network structure together with a modelling assumption that relevant items and participants are likely to cluster together, can greatly increase the rate of screening relevant items. However their approach is computationally expensive and thus limited in applicability to small networks. Here we show how Bayes Linear methods provide a natural approach to modelling such data; that they output posterior summaries that are most relevant to heuristic policies for choosing future items; and that they can easily be applied to large-scale networks. Both on simulated data, and data from the Enron Corpus, Bayes Linear approaches are shown to be applicable to situations where the method of Dimitrov et al. is infeasible; and give substantially better performance than methods that ignore the network structure.