A Group Finding Algorithm for Multidimensional Data Sets

Astronomy and Astrophysics – Astronomy

Scientific paper

Date: Sep 2009
Link: adsabs.harvard.edu/cgi-bin/nph-data_query?bibcode=2009apj...703.1061s&link_type=abstract
Journal: The Astrophysical Journal, Volume 703, Issue 1, pp. 1061-1077 (2009).
Category: Astronomy and Astrophysics – Astronomy
Pages: 17
Keywords: Galaxies: Halos, Galaxies: Structure, Methods: Data Analysis, Methods: Numerical
Type: Scientific paper
Abstract: We describe a density-based hierarchical group finding algorithm capable of identifying structures and substructures of any shape and density in multidimensional data sets where each dimension can be a numeric attribute with arbitrary measurement scale. This has applications in a wide variety of fields, from finding structures in galaxy redshift surveys to identifying halos and subhalos in N-body simulations and group finding in Local Group chemodynamical data sets. In general, clustering schemes require an a priori definition of a metric (a non-negative function that gives the distance between two points in a space), and the quality of clustering depends upon this choice. The general practice is to use a constant global metric, which is optimal only if the clusters in the data are self-similar. For complex data configurations even the most finely tuned constant global metric turns out to be suboptimal. Moreover, the correct choice of metric becomes increasingly important as the number of dimensions increases. To address these problems, we present an entropy-based binary space partitioning algorithm which uses a locally adaptive metric for each data point. The metric is employed to calculate the density at each point and a list of its nearest neighbors, and this information is then used to form a hierarchy of groups. Finally, the ratio of maximum to minimum density of points in a group is used to estimate the significance of the groups. Setting a threshold on this significance can effectively screen out groups arising due to Poisson noise and helps organize the groups into meaningful clusters. For a data set of N points, the algorithm requires only O(N) space and O(N (log N)^3) time, which makes it well suited for analyzing large data sets. As an example, we apply the algorithm to identify structures in a simulated stellar halo using the full six-dimensional phase space coordinates.
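The pipeline in the abstract (a density at each point, a nearest-neighbor list, a hierarchy of groups, and a significance cut on the ratio of maximum to minimum density) can be illustrated with a small Python sketch. This is not the paper's implementation. It assumes a fixed global Euclidean metric and a brute-force neighbor search in place of the entropy-based partitioning and locally adaptive metric. The names `knn_density`, `find_groups`, `k`, and `min_ratio` are hypothetical, as is the two-blob test data.

```python
import math
import random


def knn_density(points, k):
    """Brute-force k-nearest-neighbor density estimate (global Euclidean metric).

    Returns (density, neighbors); density[i] is proportional to k / r_k**dim,
    where r_k is the distance from point i to its k-th nearest neighbor.
    """
    dim = len(points[0])
    density, neighbors = [], []
    for i, p in enumerate(points):
        nearest = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )[:k]
        r_k = max(nearest[-1][0], 1e-12)  # guard against duplicate points
        density.append(k / r_k**dim)      # constant volume factor omitted
        neighbors.append([j for _, j in nearest])
    return density, neighbors


def find_groups(points, k=8, min_ratio=3.0):
    """Grow groups downward from density peaks and keep the significant ones.

    A group is reported when (max density) / (min density) over its members
    is at least min_ratio (an assumed, illustrative threshold).
    """
    density, neighbors = knn_density(points, k)
    parent, peak, members, found = {}, {}, {}, []

    def root(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def record(group):
        lowest = min(density[j] for j in members[group])
        ratio = density[peak[group]] / lowest
        if ratio >= min_ratio:
            found.append({"members": sorted(members[group]), "ratio": ratio})

    # Visit points from densest to sparsest.
    for i in sorted(range(len(points)), key=lambda n: -density[n]):
        roots = {root(j) for j in neighbors[i] if j in parent}
        if not roots:
            # No denser neighbor has been visited: i is a new density peak.
            parent[i], peak[i], members[i] = i, i, [i]
            continue
        # Point i touches one or more groups; the one with the densest peak
        # survives, and the others are recorded as subgroups and absorbed.
        main = max(roots, key=lambda r: density[peak[r]])
        for r in roots - {main}:
            record(r)
            parent[r] = main
            members[main].extend(members.pop(r))
        parent[i] = main
        members[main].append(i)

    # Groups that were never absorbed form the top of the hierarchy.
    for r in list(members):
        record(r)
    return found


# Hypothetical test data: two well-separated Gaussian blobs in two dimensions.
rng = random.Random(0)
points, labels = [], []
for label, (cx, cy) in enumerate([(0.0, 0.0), (10.0, 10.0)]):
    for _ in range(60):
        points.append((rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)))
        labels.append(label)

groups = find_groups(points, k=8, min_ratio=3.0)
for g in groups:
    print(len(g["members"]), "members, density ratio", round(g["ratio"], 1))
```

The brute-force neighbor search costs O(N^2 log N) time, so this sketch only suits small inputs; the O(N (log N)^3) scaling quoted in the abstract relies on the paper's space partitioning, which is not reproduced here. Raising `min_ratio` discards more of the low-contrast groups that density fluctuations produce.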

Affiliated with

Johnston Kathryn V.

Sharma Sanjib
