A Group Finding Algorithm for Multidimensional Data Sets

Astronomy and Astrophysics – Astronomy

Scientific paper

Details

17

Galaxies: Halos, Galaxies: Structure, Methods: Data Analysis, Methods: Numerical

We describe a density-based hierarchical group finding algorithm capable of identifying structures and substructures of any shape and density in multidimensional data sets where each dimension can be a numeric attribute with an arbitrary measurement scale. This has applications in a wide variety of fields, from finding structures in galaxy redshift surveys to identifying halos and subhalos in N-body simulations and finding groups in Local Group chemodynamical data sets. In general, clustering schemes require an a priori definition of a metric (a non-negative function that gives the distance between two points in a space), and the quality of the clustering depends on this choice. The general practice is to use a constant global metric, which is optimal only if the clusters in the data are self-similar. For complex data configurations, even the most finely tuned constant global metric turns out to be suboptimal. Moreover, the correct choice of metric becomes increasingly important as the number of dimensions increases. To address these problems, we present an entropy-based binary space partitioning algorithm which uses a locally adaptive metric for each data point. The metric is employed to calculate the density at each point and a list of its nearest neighbors, and this information is then used to form a hierarchy of groups. Finally, the ratio of maximum to minimum density of points in a group is used to estimate the significance of the group. Setting a threshold on this significance effectively screens out groups arising from Poisson noise and helps organize the groups into meaningful clusters. For a data set of N points, the algorithm requires only O(N) space and O(N (log N)^3) time, which makes it well suited to analyzing large data sets. As an example, we apply the algorithm to identify structures in a simulated stellar halo using the full six-dimensional phase space coordinates.
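
The density and nearest-neighbor steps described in the abstract can be illustrated with a small sketch. It is not the paper's method: it swaps the entropy-based, locally adaptive metric for a plain global Euclidean metric, uses a brute-force O(N^2) neighbor search, and returns flat groups rather than a hierarchy. The function names `knn` and `find_groups`, the parameters `k` and `min_ratio`, and the rule of merging two groups when the ratio of the smaller peak density to the density at the point where they meet falls below `min_ratio` are all assumptions made for illustration.

```python
import math
import random

def knn(points, k):
    """Brute-force k nearest neighbours with a global Euclidean metric
    (an assumption; the paper uses a locally adaptive metric instead)."""
    out = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        out.append(dists[:k])
    return out

def find_groups(points, k=8, min_ratio=5.0):
    """Return (labels, density); each label is the index of the group's peak point."""
    n, dim = len(points), len(points[0])
    nbrs = knn(points, k)
    # kNN density estimate k / r_k^dim, with constant factors dropped.
    density = [k / max(nb[-1][0] ** dim, 1e-300) for nb in nbrs]
    order = sorted(range(n), key=lambda i: density[i], reverse=True)
    root = [-1] * n  # union-find parent; -1 means "not visited yet"

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]  # path compression
            i = root[i]
        return i

    # Visit points from densest to sparsest, growing groups downhill.
    for i in order:
        seen = {find(j) for _, j in nbrs[i] if root[j] != -1}
        if not seen:
            root[i] = i  # local density maximum: start a new group
            continue
        best = max(seen, key=lambda r: density[r])
        root[i] = best
        # Point i touches several groups: absorb those whose peak is not
        # significantly denser than this meeting point (hypothetical rule).
        for r in seen:
            if r != best and density[r] / density[i] < min_ratio:
                root[r] = best
    return [find(i) for i in range(n)], density

# Demo: two well-separated Gaussian blobs in three dimensions.
rng = random.Random(0)
def blob(center, size=100):
    return [[rng.gauss(center, 1.0) for _ in range(3)] for _ in range(size)]

points = blob(0.0) + blob(100.0)
labels, density = find_groups(points)
print(len(set(labels)), "groups found")
```

Because the blobs are far apart, every point's neighbors lie in its own blob, so no group can span both. Raising `min_ratio` merges more noise peaks into their neighbors; lowering it keeps more substructure.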
