Sampling Strategies for Mining in Data-Scarce Domains

Computer Science – Computational Engineering, Finance, and Science

Scientific paper

Data mining has traditionally focused on the task of drawing inferences from large datasets. However, many scientific and engineering domains, such as fluid dynamics and aircraft design, are characterized by scarce data, due to the expense and complexity of the associated experiments and simulations. In such data-scarce domains, it is advantageous to focus the data collection effort on only those regions deemed most important to support a particular data mining objective. This paper describes a mechanism that interleaves bottom-up data mining, to uncover multi-level structures in spatial data, with top-down sampling, to clarify difficult decisions in the mining process. The mechanism exploits relevant physical properties, such as continuity, correspondence, and locality, in a unified framework. This leads to effective mining and sampling decisions that are explainable in terms of domain knowledge and data characteristics. The approach is demonstrated in two diverse applications: mining pockets in spatial data, and qualitative determination of Jordan forms of matrices.
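
The interleaving of mining and sampling described in the abstract can be illustrated with a toy loop. The sketch below is not the paper's mechanism; it is a minimal one-dimensional analogue, assuming an expensive function `f` whose pocket (local minimum) is sought under a fixed sample budget. The function name `adaptive_pocket_search` and its parameters are hypothetical, chosen only for this illustration.

```python
def adaptive_pocket_search(f, lo, hi, n_initial=5, budget=20):
    """Toy 1-D illustration: find a pocket (local minimum) of f with few samples.

    Assumption: f is expensive, so each evaluation counts against `budget`.
    This is an illustrative sketch, not the algorithm from the paper.
    """
    # Coarse initial sample: the only data the mining step starts with.
    xs = [lo + i * (hi - lo) / (n_initial - 1) for i in range(n_initial)]
    samples = {x: f(x) for x in xs}

    while len(samples) < budget:
        pts = sorted(samples)
        # Bottom-up step: mine the current data for the lowest sample,
        # which marks the candidate pocket.
        i = min(range(len(pts)), key=lambda k: samples[pts[k]])

        # Top-down step: the pocket's true location is ambiguous within the
        # gaps next to the candidate, so sample the midpoint of the wider gap
        # (locality: new data is collected only near the candidate).
        gaps = []
        if i > 0:
            gaps.append((pts[i] - pts[i - 1], (pts[i] + pts[i - 1]) / 2))
        if i < len(pts) - 1:
            gaps.append((pts[i + 1] - pts[i], (pts[i] + pts[i + 1]) / 2))
        width, x_new = max(gaps)
        if width < 1e-9:  # gaps too small to be worth another sample
            break
        samples[x_new] = f(x_new)

    best = min(samples, key=samples.get)
    return best, samples


if __name__ == "__main__":
    best, samples = adaptive_pocket_search(lambda x: (x - 0.3) ** 2, 0.0, 1.0)
    print(f"estimated pocket near x = {best:.4f} using {len(samples)} samples")
```

Each new sample is spent where the current data leaves the mining decision least certain, rather than on a uniformly finer grid. That is the general idea the abstract attributes to top-down sampling, shown here in a far simpler setting.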
