Truecluster matching

Computer Science – Artificial Intelligence

Scientific paper

Details

15 pages, 2 figures. Details the matching needed for "Truecluster: robust scalable clustering with model selection" but can al

Cluster matching by permuting cluster labels is important in many clustering contexts, such as cluster validation and cluster ensemble techniques. The classic approach is to minimize the Euclidean distance between two cluster solutions, which induces inappropriate stability in certain settings. We therefore present the truematch algorithm, which introduces two improvements that are most easily explained for the crisp (hard-assignment) case. First, instead of maximizing the trace of the cluster crosstable, we propose to maximize the trace of a chi-square transformation of this crosstable. As a result, the trace is dominated not by the cells with the largest counts but by the cells with the most non-random observations, given the marginals. Second, we add a probabilistic component that breaks ties and makes the matching algorithm truly random on random data. The truematch algorithm is designed as a building block of the truecluster framework and runs in polynomial time. Initial simulation results confirm that the truematch algorithm gives more consistent truecluster results for unequal cluster sizes. Free R software is available.
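The two improvements can be illustrated with a small sketch in Python. It is not the truematch implementation from the R software: the helper names (crosstable, chisq_transform, best_permutation) are hypothetical, the chi-square transformation is assumed to be the signed per-cell contribution sign(O - E) * (O - E)^2 / E with E computed from the marginals, and ties are broken by a uniform draw among all optimal permutations as a stand-in for the paper's probabilistic component. For readability it searches all k! label permutations; a polynomial-time version would solve the same assignment problem with the Hungarian algorithm instead.

```python
import itertools
import math
import random


def crosstable(a, b, k):
    """k x k contingency table: rows are labels of solution a, columns of solution b."""
    table = [[0] * k for _ in range(k)]
    for i, j in zip(a, b):
        table[i][j] += 1
    return table


def chisq_transform(table):
    """Signed per-cell chi-square contribution sign(O-E) * (O-E)^2 / E.

    ASSUMPTION: a stand-in for the paper's chi-square transformation;
    E is the count expected under independence given row and column totals.
    """
    n = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    out = []
    for i, row in enumerate(table):
        new_row = []
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            if expected == 0:
                new_row.append(0.0)  # empty row or column carries no evidence
            else:
                d = observed - expected
                new_row.append(math.copysign(d * d / expected, d))
        out.append(new_row)
    return out


def best_permutation(score, rng, tol=1e-9):
    """Return perm maximizing the trace sum(score[i][perm[i]]).

    Label i of solution a is matched to label perm[i] of solution b.
    Exhaustive O(k!) search for illustration only; permutations whose
    trace ties within tol are chosen among uniformly at random.
    """
    k = len(score)
    best, winners = -math.inf, []
    for perm in itertools.permutations(range(k)):
        total = sum(score[i][perm[i]] for i in range(k))
        if total > best + tol:
            best, winners = total, [perm]
        elif abs(total - best) <= tol:
            winners.append(perm)
    return rng.choice(winners)


# Unequal cluster sizes: 60 pairs (0,0), 30 pairs (0,1), 10 pairs (1,0), none (1,1).
pairs = [(0, 0)] * 60 + [(0, 1)] * 30 + [(1, 0)] * 10
a = [p[0] for p in pairs]
b = [p[1] for p in pairs]
tab = crosstable(a, b, 2)
rng = random.Random(0)
print(best_permutation(tab, rng))                   # (0, 1): raw counts keep labels
print(best_permutation(chisq_transform(tab), rng))  # (1, 0): chi-square swaps them
```

On this 2 x 2 table the raw-count trace keeps the identity labelling because the 60-count cell dominates, although that cell holds fewer observations than its marginals predict (63 expected). The chi-square trace instead matches the cells that exceed their expected counts and swaps the labels. On a table with identical rows every transformed cell is zero, so all permutations tie and the match is drawn at random.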
