CASE STUDY:
VISUAL DATA EXPLORATION
An interactive data visualization to explore multidimensional data sets in order to distinguish different classes of galaxies. The data set was part of an astronomical study led by K. Leiter at the Chair of Astronomy, University of Würzburg.
Intro
The multidimensional data comprises X-ray properties of galaxies divided into two groups: galaxies that show megamaser emission in the radio band of the electromagnetic spectrum, which provides an independent way to measure the expansion of the universe, and a control group of so-called nonmaser galaxies. Astronomers created the data set in order to find parameters that can be used to classify new sources into one of the two groups. To complement their machine learning algorithms, I supported the group by performing a visual exploration of the parameters.
A Visual Exploration
The researchers' goal was to obtain efficient classification criteria for finding megamaser candidates in future X-ray source catalogs. Typically, this is done by asking a series of questions, e.g. whether a parameter value of a source is larger or smaller than a predetermined threshold. This concept is called a decision tree. However, the larger the number of parameters in the data set, the harder it is to find the most efficient combination of parameters to check. In this case study, I wanted to explore the following question: can data visualization help to understand and evaluate the classification produced by a machine-learned decision tree?
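To make the idea of "a series of threshold questions" concrete, here is a minimal sketch of a decision tree and of how a tree learner might pick one question. It assumes the common CART-style approach of scoring every candidate threshold with Gini impurity; the case study does not say which criterion the astronomers' algorithm used. The parameter names `log_nh` and `hardness`, the helper names, and all values are hypothetical and for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union


@dataclass
class Leaf:
    label: str  # predicted class, e.g. "maser" or "nonmaser"


@dataclass
class Split:
    parameter: str  # name of the (hypothetical) parameter being tested
    threshold: float
    below: "Node"   # branch followed when value <= threshold
    above: "Node"   # branch followed when value > threshold


Node = Union[Leaf, Split]


def classify(node: Node, source: Dict[str, float]) -> str:
    """Answer one threshold question per level until a leaf is reached."""
    while isinstance(node, Split):
        node = node.below if source[node.parameter] <= node.threshold else node.above
    return node.label


def gini(labels: List[str]) -> float:
    """Gini impurity of a two-class label list (0.0 means the group is pure)."""
    if not labels:
        return 0.0
    p = labels.count("maser") / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(
    sources: List[Dict[str, float]], labels: List[str]
) -> Tuple[float, Optional[str], Optional[float]]:
    """Brute-force search over every parameter and every midpoint threshold.

    Returns (weighted impurity, parameter, threshold) of the best single question.
    """
    best: Tuple[float, Optional[str], Optional[float]] = (float("inf"), None, None)
    n = len(sources)
    for param in sources[0]:
        values = sorted({s[param] for s in sources})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0  # candidate threshold halfway between neighbours
            left = [lab for s, lab in zip(sources, labels) if s[param] <= t]
            right = [lab for s, lab in zip(sources, labels) if s[param] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[0]:
                best = (score, param, t)
    return best
```

The search above only picks one question; a full tree repeats it on each branch, and a random forest trains many such trees on resampled data. Because every level multiplies the number of parameter/threshold combinations, the cost of exploring them grows quickly with the number of parameters.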
Data Mapping & Navigation
For this visualization I used the concept of stacked epicycles, which was once used to describe planetary orbits. Each source follows a unique path through 3D space that is defined solely by its parameter values. Similar sources end up in the same region, while dissimilar sources occupy different locations in space. This allowed the researchers to test different parameter combinations, find the parameter set that best separates megamaser galaxies (red) from nonmaser galaxies (blue), and identify outliers.
Interactivity & Navigation
Interactive controls allow the scientist to define the parameter sets to be explored, to detect and exclude outliers from the classification process, and to inspect the parameter values of each individual source. The user can navigate freely through 3D space using the mouse and keyboard.
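The case study does not specify how parameter values drive the stacked epicycles. The sketch below is one plausible mapping, under stated assumptions: each parameter is min-max normalized across the data set, parameter k becomes the radius of epicycle k with angular frequency k + 1, each circle rides on the rim of the previous one (so the x/y position is the sum of rotating vectors), and time runs along the z axis. The function names `normalize`, `epicycle_path` and `path_distance` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def normalize(data: Sequence[Sequence[float]]) -> List[List[float]]:
    """Min-max scale each parameter (column) to [0, 1] across all sources."""
    cols = list(zip(*data))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # avoid division by zero
    return [[(v - lo) / s for v, lo, s in zip(row, lows, spans)] for row in data]


def epicycle_path(
    params: Sequence[float], steps: int = 100, t_end: float = 2 * math.pi
) -> List[Point3]:
    """Trace a 3D path from stacked epicycles (hypothetical mapping).

    Epicycle k has radius params[k] and angular frequency k + 1. The x/y
    position is the vector sum of all epicycles; z advances linearly with t.
    """
    path: List[Point3] = []
    for i in range(steps + 1):
        t = t_end * i / steps
        x = sum(r * math.cos((k + 1) * t) for k, r in enumerate(params))
        y = sum(r * math.sin((k + 1) * t) for k, r in enumerate(params))
        path.append((x, y, t))
    return path


def path_distance(a: Sequence[Point3], b: Sequence[Point3]) -> float:
    """Mean Euclidean distance between corresponding points of two paths."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)
```

Because each path depends continuously on the normalized parameters, sources with similar values trace nearby paths, and a source whose path lies far from the other members of its class could be flagged as an outlier candidate. Changing which parameters are fed into `epicycle_path` corresponds to testing a different parameter combination.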
Impact
The visual analysis helped the astronomers understand the results of a random forest machine learning algorithm and identify outliers in the large data set. This helped them refine and improve their analysis pipeline.