Statistics – Applications
Scientific paper
2011-11-27
Submitted
Motivated by the problem of identifying correlations between genes or features of two related biological systems, we propose a model of \emph{feature selection} in which only a subset $X_t$ of the predictors depends on the multidimensional variate $Y$, and the remainder of the predictors constitute a "noise set" $X_u$ independent of $Y$. Using Monte Carlo simulations, we investigate the relative performance of two methods, thresholding and singular-value decomposition (SVD), in combination with stochastic optimization to determine "empirical bounds" on the small-sample accuracy of an asymptotic approximation. We demonstrate the utility of the thresholding and SVD feature selection methods on a recent infant intestinal gene expression and metagenomics dataset.
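The abstract does not spell out the estimators, so the following is only a minimal sketch of the setting it describes, not the authors' implementation. It assumes a linear link between $Y$ and $X_t$ with equal loadings, Gaussian noise, selection scores computed from the sample cross-correlation matrix between predictors and responses, and a known number $k$ of dependent predictors. The function names (`simulate`, `cross_correlation`, `select_by_threshold`, `select_by_svd`) and all parameter values are hypothetical.

```python
import numpy as np


def simulate(n=200, p_signal=5, p_noise=45, q=3, seed=0):
    """Draw Y, then predictors X = [X_t | X_u]; only X_t depends on Y."""
    rng = np.random.default_rng(seed)
    Y = rng.standard_normal((n, q))
    # Assumption: a linear link with equal loadings (not taken from the paper).
    B = np.ones((q, p_signal))
    X_t = Y @ B + 0.5 * rng.standard_normal((n, p_signal))
    # Noise set X_u, generated independently of Y.
    X_u = rng.standard_normal((n, p_noise))
    return np.hstack([X_t, X_u]), Y


def cross_correlation(X, Y):
    """Sample cross-correlation matrix (predictors in rows, responses in columns)."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    Ys = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    return Xs.T @ Ys / X.shape[0]


def select_by_threshold(C, k):
    """Keep the k predictors whose largest absolute correlation with any response is highest."""
    scores = np.abs(C).max(axis=1)
    return np.sort(np.argsort(scores)[-k:])


def select_by_svd(C, k):
    """Keep the k predictors with the largest absolute loading on the leading left singular vector."""
    U, _, _ = np.linalg.svd(C, full_matrices=False)
    scores = np.abs(U[:, 0])
    return np.sort(np.argsort(scores)[-k:])


X, Y = simulate()
C = cross_correlation(X, Y)
thr = select_by_threshold(C, k=5)
svd = select_by_svd(C, k=5)
print("thresholding selects:", thr)
print("SVD selects:", svd)
```

In this sketch the first five columns of `X` play the role of $X_t$, so both index sets can be compared against `range(5)`. Repeating the simulation over many seeds and smaller `n` would give a rough Monte Carlo picture of how often each rule recovers the dependent predictors.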
Carroll Raymond
Chapkin Robert
Ivanov I. I.
Schwartz Scott
Zheng Charles
Feature selection for high-dimensional integrated data