Scientific paper
2011-06-27
Computer Science
Distributed, Parallel, and Cluster Computing
To appear in: Proceedings of the 23rd European Modeling & Simulation Symposium (Simulation in Industry) 2011
The trend towards cloud computing has initiated a race towards data centres (DCs) of ever-increasing size. The largest DCs now contain many hundreds of thousands of virtual machine (VM) services. Given the finite lifespan of hardware, such large DCs are subject to frequent hardware failure events that can lead to disruption of service. To counter this, multiple redundant copies of task threads may be distributed around a DC to ensure that individual hardware failures do not cause entire jobs to fail. Here, we present results demonstrating the resilience of different job scheduling algorithms in a simulated DC subject to hardware failure. We use a simple model of jobs distributed across a hardware network to demonstrate the relationship between resilience and the additional communication costs of different scheduling methods.
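The link between redundancy and resilience described in the abstract can be illustrated with a minimal sketch. This is not the authors' model or any of their scheduling algorithms. It assumes, purely for illustration, that every server fails independently with the same probability, that each copy of a task sits on a different server, and that a job fails as soon as any one of its tasks has lost all of its copies. The function names and parameters (num_tasks, copies, p_fail) are hypothetical.

```python
import random


def job_survival_probability(num_tasks, copies, p_fail):
    """Closed-form survival probability under the stated assumptions.

    A task is lost only if all of its copies fail (probability p_fail ** copies).
    The job survives only if every task keeps at least one copy.
    """
    return (1.0 - p_fail ** copies) ** num_tasks


def simulate_job_survival(num_tasks, copies, p_fail, trials=10000, seed=0):
    """Monte Carlo estimate of the same quantity (illustrative only)."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    survived = 0
    for _ in range(trials):
        job_ok = True
        for _ in range(num_tasks):
            # Assumption: each copy lives on a distinct, independently failing server.
            if all(rng.random() < p_fail for _ in range(copies)):
                job_ok = False  # every copy of this task was lost
                break
        if job_ok:
            survived += 1
    return survived / trials


if __name__ == "__main__":
    for r in (1, 2, 3):
        exact = job_survival_probability(10, r, 0.1)
        estimate = simulate_job_survival(10, r, 0.1)
        print(f"copies={r}: exact={exact:.4f}, simulated={estimate:.4f}")
```

Under these assumptions, adding copies raises the chance that a job survives, but each extra copy is another thread to place and keep in contact with, which is the kind of additional communication cost the abstract weighs against resilience. The sketch does not model that cost or the network topology.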
Cartlidge John
Sriram Ilango
Modelling Resilience in Cloud-Scale Data Centres