Biology – Quantitative Biology – Quantitative Methods
Scientific paper
2005-04-08
A.N. Gorban, B. Kegl, D.C. Wunsch, A. Zinovyev (eds.) Principal Manifolds for Data Visualization and Dimension Reduction, Lect
18 pages, with program listings for MATLAB, PCA analysis of genomes, and additional animated 3D PCA plots
10.1007/978-3-540-73750-6_14
In this paper, we give a tutorial for undergraduate students studying statistical methods and/or bioinformatics. The students learn how data visualization can help in genomic sequence analysis. They start with a fragment of the genetic text of a bacterial genome and analyze its structure. By means of principal component analysis (PCA) they "discover" that the information in the genome is encoded by non-overlapping triplets. Next, they learn how to find gene positions. This exercise on PCA and K-Means clustering enables active study of basic bioinformatics notions. Appendix 1 contains the program listings that accompany the exercise. Appendix 2 includes 2D PCA plots of triplet usage in a moving frame for a series of bacterial genomes, from GC-poor to GC-rich ones. Animated 3D PCA plots are attached as separate GIF files. The topology (cluster structure) and geometry (mutual positions of clusters) of these plots depend clearly on GC content.
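The exercise described in the abstract can be sketched in a few lines of code. The block below is only an illustration under stated assumptions, not the authors' listings: it is written in Python rather than MATLAB, uses only the standard library, and every name in it (`triplet_frequencies`, `window_vectors`, `principal_axes`, `kmeans`), the window width of 300, the step of 4, the power-iteration PCA, and the farthest-point initialisation of K-Means are choices made here for illustration. The toy "gene" in the demo is generated from a made-up codon table, not from a real genome.

```python
import random
from itertools import product

# The 64 possible triplets in a fixed order; INDEX maps triplet -> coordinate.
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]
INDEX = {t: i for i, t in enumerate(TRIPLETS)}


def triplet_frequencies(window):
    """Frequencies of non-overlapping triplets, read from the first letter."""
    counts = [0.0] * 64
    for i in range(0, len(window) - 2, 3):
        j = INDEX.get(window[i:i + 3])  # triplets with other letters are skipped
        if j is not None:
            counts[j] += 1.0
    total = sum(counts) or 1.0
    return [c / total for c in counts]


def window_vectors(text, width=300, step=4):
    """One 64-dimensional vector per sliding window (width and step are assumptions)."""
    return [triplet_frequencies(text[s:s + width])
            for s in range(0, len(text) - width + 1, step)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def center(rows):
    """Subtract the column means so that PCA sees deviations from the average."""
    mean = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(r, mean)] for r in rows]


def principal_axes(rows, k=2, iters=50):
    """Approximate top-k principal axes of centered rows (power iteration + deflation)."""
    rows = [r[:] for r in rows]
    rng = random.Random(0)
    axes = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in rows[0]]
        start_norm = dot(v, v) ** 0.5
        v = [x / start_norm for x in v]
        cols = list(zip(*rows))
        for _ in range(iters):
            scores = [dot(r, v) for r in rows]
            w = [dot(scores, col) for col in cols]  # X^T (X v)
            norm = dot(w, w) ** 0.5
            if norm == 0.0:
                break  # no variance left in the remaining directions
            v = [x / norm for x in w]
        axes.append(v)
        # Deflate: remove the component along v from every row.
        deflated = []
        for r in rows:
            s = dot(r, v)
            deflated.append([x - s * y for x, y in zip(r, v)])
        rows = deflated
    return axes


def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Lloyd's K-Means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sqdist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sqdist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    # Toy "gene": codons drawn from a biased, made-up usage table (not real data).
    rng = random.Random(1)
    text = "".join(rng.choices(["ATG", "GCC", "CTA", "TTC"], weights=[4, 3, 2, 1], k=200))
    vecs = window_vectors(text)
    centered = center(vecs)
    axes = principal_axes(centered)
    coords = [[dot(r, a) for a in axes] for r in centered]  # 2D PCA coordinates
    labels = kmeans(vecs, 3)
    for s, lab in list(zip(range(0, len(text), 4), labels))[:6]:
        print("window start", s, "reading phase", s % 3, "cluster", lab)
```

Because the step is not a multiple of three, consecutive windows start in different reading phases. On a coding fragment, the cluster labels can then be compared with the window start modulo 3 to see whether the triplet structure shows up, which is the "discovery" the abstract refers to. Locating genes in a real genome additionally involves non-coding regions and both strands, which this sketch does not model.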
Gorban Alexander N.
Zinovyev Andrey Yu.
Title: PCA and K-Means decipher genome