Ellis, Geoffrey (2008) Random sampling as a clutter reduction technique to facilitate interactive visualisation of large datasets. PhD thesis, Lancaster University.
Abstract
Within our physical world lies a digital world populated with an ever-increasing number of sizeable data collections. Exploring these large datasets for patterns or trends is a difficult and complex task, especially when users do not always know what they are looking for. Information visualisation can facilitate this task through an interactive visual representation, thus making the data easier to interpret. However, we can soon reach a limit on the amount of data that can be plotted before the visual display becomes overcrowded or cluttered, so that potentially important information becomes hidden. The main theme of this work is to investigate the use of dynamic random sampling for reducing display clutter. Although randomness has been successfully applied in many areas of computer science and sampling has been used in data processing, the use of random sampling as a dynamic clutter reduction technique is novel. In addition, random sampling is particularly suitable for exploratory tasks as it offers a way of reducing the amount of data without the user having to decide what data is important. Sampling-based scatterplot and parallel coordinate visualisations are developed to experiment with various options and tools. These include simple, dynamic sampling controls with density feedback; a method of checking the reality of the representative sample; the option of global and/or localised clutter reduction using a variety of novel lenses; and an auto-sampling option that automatically maintains a reasonable view of the data within the lens. Furthermore, this work shows that sampling can be added to existing tools and used effectively in conjunction with other clutter reduction techniques. Sampling is evaluated both analytically, using a taxonomy of clutter reduction developed for the purpose, and experimentally, using large datasets.
The analytic route was prompted by an exploratory analysis, which showed that evaluation of information visualisation based on user studies is problematic. This thesis has contributed to several areas of research:
‣ the feasibility and flexibility of global or lens-based sampling as a clutter reduction technique are demonstrated through sampling-based scatterplot and parallel coordinate visualisations.
‣ the novel method of calculating the density for overlapping lines in parallel coordinate plots is both accurate and efficient and enables constant density within a sampling lens to be maintained without user intervention.
‣ the novel criteria-based taxonomy of clutter reduction for information visualisation provides designers with a method to critique existing visualisations and think about new ones.