CMU | Carrie Zhang
HPC, MPI, MapReduce
Learning Notes of CMU Course 15-640 Distributed Systems
Posted by haohanz · May 09, 2019 · Stay Hungry · Stay Foolish
HPC
(Supercomputers + MPI)
MPI’s focus:
Low latency
High scalability
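MPI itself is a C/Fortran API, but its low-latency point-to-point style can be illustrated with a short Python sketch using `multiprocessing.Pipe` (the process/pipe setup here only mimics the send/receive pattern of `MPI_Send`/`MPI_Recv`; it is an illustration, not real MPI):

```python
# Illustration of MPI-style point-to-point message passing.
# Each worker "rank" computes a partial sum and sends it to rank 0,
# which gathers the results (like an MPI_Recv loop / MPI_Reduce).
from multiprocessing import Process, Pipe

def worker(rank, conn):
    # Worker rank computes its slice of the range and sends it home.
    partial = sum(range(rank * 100, (rank + 1) * 100))
    conn.send((rank, partial))
    conn.close()

def run(num_ranks=4):
    parents, procs = [], []
    for rank in range(1, num_ranks):
        parent, child = Pipe()
        p = Process(target=worker, args=(rank, child))
        p.start()
        parents.append(parent)
        procs.append(p)
    # Rank 0 does its own share, then gathers the partial sums.
    total = sum(range(0, 100))
    for parent in parents:
        _, partial = parent.recv()
        total += partial
    for p in procs:
        p.join()
    return total

if __name__ == "__main__":
    print(run())  # sum of 0..399
```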
HPC:
Characteristics
High bandwidth
Long-lived processes
In-memory computation
Spatial locality exploited via partitioning
Pros
High resource utilization
Effectiveness
Cons
Requires tuning between application and resources
Low generality; cannot accommodate variation in workloads or hardware
Fault tolerance is hard:
Checkpointing generates large I/O traffic
A failure wastes the intervening computation since the last checkpoint
Scaling is sensitive to the number of failures
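The checkpoint/restart idea behind these costs can be sketched in a few lines: save state periodically, and on failure restart from the last checkpoint, losing only the work since then. A minimal sketch (the file name, interval, and toy "computation" are all assumptions for illustration; real HPC checkpoints dump far more state, which is where the heavy I/O traffic comes from):

```python
# Minimal checkpoint/restart sketch.
import json, os

CKPT = "checkpoint.json"  # hypothetical path, for illustration only

def save_checkpoint(step, state):
    # Write atomically: dump to a temp file, then rename over the old one.
    tmp = CKPT + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"step": step, "state": state}, f)
    os.replace(tmp, CKPT)

def load_checkpoint():
    # Resume from the last checkpoint if one exists, else start fresh.
    if os.path.exists(CKPT):
        with open(CKPT) as f:
            d = json.load(f)
        return d["step"], d["state"]
    return 0, 0

def run(total_steps=10, ckpt_every=3, fail_at=None):
    step, acc = load_checkpoint()
    while step < total_steps:
        if fail_at is not None and step == fail_at:
            raise RuntimeError("simulated node failure")
        acc += step          # the "computation"
        step += 1
        if step % ckpt_every == 0:
            save_checkpoint(step, acc)
    return acc
```

After a simulated failure, a rerun reloads the last checkpoint and only redoes the steps since it, rather than the whole job.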
MPI:
Pros
Handles iterative, interactive computations
Handles tightly-coupled computations
Handles communication-heavy components
Cons
Fault tolerance: one failed process can take down the whole job
Hard to grow programs more complex; the programming model is low-level
Cluster Computing
(MapReduce + Implementation)
How MapReduce deals with stragglers:
Solved: stragglers not caused by poor partitioning (e.g., I/O latency, slow servers, bugs) — reschedule the unfinished tasks on other workers
Not solved: stragglers caused by poor partitioning (later addressed by Spark)
Partitioning hashes on the mapper side: with M mappers and R reducers, each mapper outputs R intermediate files.
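The mapper-side hash partitioning can be sketched as follows (`crc32` stands in for the framework's stable hash; the function name is hypothetical). Every mapper must agree on which reducer owns each key, so a stable hash is used rather than Python's per-process-randomized `hash()`:

```python
# Sketch of mapper-side hash partitioning: with R reducers, each
# mapper splits its (key, value) output into R buckets via
# hash(key) mod R, producing R intermediate files per mapper.
import zlib

def partition(pairs, R):
    buckets = [[] for _ in range(R)]
    for key, value in pairs:
        # crc32 is deterministic across processes, so all mappers
        # route the same key to the same reducer index.
        buckets[zlib.crc32(key.encode()) % R].append((key, value))
    return buckets
```

A skewed key distribution here is exactly the "poor partitioning" case: one bucket (and thus one reducer) can end up with far more work than the others.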
Why use a cluster file system (e.g., HDFS) for input and output?
Fault tolerance
Tasks can be dynamically scheduled onto nodes holding replicated files
Works with heterogeneous nodes
Maximizes cost-performance
Minimal downtime, minimal staffing
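The "dynamic scheduling onto replicated files" point can be illustrated with a toy locality-aware scheduler (the data structures and function are hypothetical; HDFS stores each block on several nodes, which lets the scheduler run a task on any node holding a replica):

```python
# Sketch of locality-aware scheduling over replicated blocks.
def schedule(tasks, replicas, free_nodes):
    """tasks: list of block ids; replicas: block id -> set of nodes
    holding a replica; free_nodes: nodes currently idle."""
    assignment = {}
    free = set(free_nodes)
    for block in tasks:
        local = replicas[block] & free      # prefer a node with the data
        node = local.pop() if local else (free.pop() if free else None)
        if node is None:
            break  # no free node; the task waits
        assignment[block] = node
        free.discard(node)
    return assignment
```

Because any replica holder can run the task, a slow or failed node does not block progress, which is what makes replication useful for both fault tolerance and scheduling.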
MapReduce:
Pros
Simple, effective failure model
High throughput
Simple programming
Loosely-coupled, coarse-grained parallel tasks
Cons
Disk I/O causes high latency
Poor fit for iterative, interactive computation
Abstraction is not expressive enough; specialized analytics systems have to be built on top
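The simple programming model noted above (map, shuffle, reduce over loosely-coupled tasks) can be sketched with the classic word count, entirely in memory (the helper names are hypothetical; the real framework adds network shuffling, scheduling, and fault tolerance around these three phases):

```python
# Minimal in-memory MapReduce word count.
from collections import defaultdict

def map_phase(doc):
    # map: emit (word, 1) for each word in the document
    return [(w, 1) for w in doc.split()]

def shuffle(emitted):
    # shuffle: group all values by key, as the framework would
    groups = defaultdict(list)
    for k, v in emitted:
        groups[k].append(v)
    return groups

def reduce_phase(groups):
    # reduce: sum the counts for each word
    return {k: sum(vs) for k, vs in groups.items()}

def word_count(docs):
    emitted = [kv for d in docs for kv in map_phase(d)]
    return reduce_phase(shuffle(emitted))
```

The programmer only writes the map and reduce functions; everything else (partitioning, rescheduling failed tasks, writing intermediate files) is the framework's job, which is exactly the source of both the simplicity and the expressiveness limits listed above.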