Scientific paper
2010-03-04
Computer Science
Distributed, Parallel, and Cluster Computing
Submitted to HPDC 2010
It has long been recognized that failure events are correlated, not independent. Previous research shows that event correlation mining helps resource allocation, job scheduling, and proactive management. However, logs are hard to analyze because they are inherently unstructured and very large. Previous work fails to resolve this issue in several ways. Some work uses association rule mining algorithms to filter events and find simple temporal and spatial laws or models for failure prediction; however, the prediction results are coarse and high-level, without details. Other efforts propose rule-based algorithms for event prediction; however, they either focus only on some failure patterns, identifying the non-fatal events that precede each fatal event before event correlation mining, or focus only on specific target event types, rather than analyzing a variety of failures in large cluster systems. Our contributions are four-fold: (1) for the first time, we build a general-purpose event correlation mining system; (2) we propose two approaches to mining event correlations, in a single node and in multiple nodes; (3) we propose an innovative abstraction, Failure Correlation Graphs (FCG), to represent event correlations in cluster systems; (4) we present an FCG-based algorithm for event prediction. As a case study, we use LogMaster to analyze three months' logs of a production Hadoop cluster system in the Research Institution of China Mobile, which include 977,858 original event entries. We also use the analysis results to predict one month's logs of the same system.
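To make the idea of mining event correlations and using them for prediction concrete, here is a minimal sketch in Python. It is not the LogMaster algorithm and does not reproduce the paper's definition of a Failure Correlation Graph; it only illustrates the general technique under stated assumptions: events are `(timestamp, node, event_type)` tuples, a correlation is counted when one event type follows another on the same node within a fixed time window, and an edge is kept when its support and confidence pass thresholds. The function names, parameters, thresholds, and sample events are all hypothetical.

```python
from collections import defaultdict


def mine_correlations(events, window, min_support=2, min_confidence=0.5):
    """Build a simple directed correlation graph from log events.

    Assumption: `events` is a list of (timestamp, node, event_type) tuples.
    An edge a -> b records how often event type b followed event type a
    on the same node within `window` time units.
    """
    events = sorted(events)  # order by timestamp
    antecedent_count = defaultdict(int)
    pair_count = defaultdict(int)

    for i, (t_a, node_a, type_a) in enumerate(events):
        antecedent_count[type_a] += 1
        seen = set()  # count each follower type once per antecedent occurrence
        for t_b, node_b, type_b in events[i + 1:]:
            if t_b - t_a > window:
                break  # events are sorted, so later ones are also out of range
            if node_b == node_a and type_b != type_a and type_b not in seen:
                seen.add(type_b)
                pair_count[(type_a, type_b)] += 1

    graph = defaultdict(dict)
    for (a, b), n in pair_count.items():
        confidence = n / antecedent_count[a]
        if n >= min_support and confidence >= min_confidence:
            graph[a][b] = confidence
    return graph


def predict(graph, observed_type):
    """Return likely follow-up event types, most confident first."""
    followers = graph.get(observed_type, {})
    return sorted(followers.items(), key=lambda kv: -kv[1])


# Hypothetical sample data, not taken from the paper's logs.
sample_events = [
    (0, "n1", "disk_warn"), (5, "n1", "disk_fail"),
    (100, "n1", "disk_warn"), (104, "n1", "disk_fail"),
    (200, "n1", "disk_warn"),
    (300, "n2", "net_down"),
]
sample_graph = mine_correlations(sample_events, window=10)
print(predict(sample_graph, "disk_warn"))
```

In this sketch the graph is a plain dictionary of dictionaries, so prediction is a single lookup. Extending it to correlations across multiple nodes would mean relaxing the same-node check, which is a design choice this sketch leaves open.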
Meng Dan
Xu Dongyan
Zhan Jianfeng
Zhang Zhihong
Zhou Wei
LogMaster: Mining Event Correlations in Logs of Large scale Cluster Systems