Astronomy and Astrophysics – Astronomy
Scientific paper
Sep 2009
adsabs.harvard.edu/cgi-bin/nph-data_query?bibcode=2009apj...703.1061s&link_type=abstract
The Astrophysical Journal, Volume 703, Issue 1, pp. 1061-1077 (2009).
Astronomy and Astrophysics
Astronomy
17
Galaxies: Halos, Galaxies: Structure, Methods: Data Analysis, Methods: Numerical
Scientific paper
We describe a density-based hierarchical group finding algorithm capable of identifying structures and substructures of any shape and density in multidimensional data sets where each dimension can be a numeric attribute with arbitrary measurement scale. This has applications in a wide variety of fields from finding structures in galaxy redshift surveys, to identifying halos and subhalos in N-body simulations and group finding in Local Group chemodynamical data sets. In general, clustering schemes require an a priori definition of a metric (a non-negative function that gives the distance between two points in a space) and the quality of clustering depends upon this choice. The general practice is to use a constant global metric which is optimal only if the clusters in the data are self-similar. For complex data configurations even the most finely tuned constant global metric turns out to be suboptimal. Moreover, the correct choice of metric also becomes increasingly important as the number of dimensions increase. To address these problems, we present an entropy-based binary space partitioning algorithm which uses a locally adaptive metric for each data point. The metric is employed to calculate the density at each point and a list of its nearest neighbors, and this information is then used to form a hierarchy of groups. Finally, the ratio of maximum to minimum density of points in a group is used to estimate the significance of the groups. Setting a threshold on this significance can effectively screen out groups arising due to Poisson noise and helps organize the groups into meaningful clusters. For a data set of N points, the algorithm requires only O(N) space and O(N(log N)3) time which makes it ideally suitable for analyzing large data sets. As an example, we apply the algorithm to identify structures in a simulated stellar halo using the full six-dimensional phase space coordinates.
Johnston Kathryn V.
Sharma Sanjib
No associations
LandOfFree
A Group Finding Algorithm for Multidimensional Data Sets does not yet have a rating. At this time, there are no reviews or comments for this scientific paper.
If you have personal experience with A Group Finding Algorithm for Multidimensional Data Sets, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and A Group Finding Algorithm for Multidimensional Data Sets will most certainly appreciate the feedback.
Profile ID: LFWR-SCP-O-1272828