Distributed Systems

A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.

Leslie Lamport, 1987

Why do we need distributed systems?

The trouble with distributed systems

Models describe the key properties of a distributed system in a precise manner.

Theorems (maybe?)

FLP

The FLP result (Fischer, Lynch, and Paterson, 1985) shows that no deterministic protocol can guarantee consensus in an asynchronous distributed system if even one process may crash.
CAP

During a network partition, a system must choose between staying available and staying consistent.
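
The trade-off can be illustrated with a toy sketch. The Replica class, its mode labels ("CP", "AP"), and the partitioned flag are all hypothetical names invented for this example; real systems detect partitions and stale data in far more involved ways.

```python
class Replica:
    """Hypothetical replica holding one register value."""

    def __init__(self, mode: str) -> None:
        self.mode = mode          # "CP" or "AP": labels used only in this sketch
        self.value = None         # last value this replica has seen
        self.partitioned = False  # True when cut off from the other replicas

    def read(self):
        if self.partitioned and self.mode == "CP":
            # Prefer consistency: refuse rather than risk a stale answer.
            raise RuntimeError("unavailable during partition")
        # No partition, or availability preferred: answer from local state,
        # which may be stale if other replicas have moved on.
        return self.value


ap = Replica("AP")
ap.value = "v1"
ap.partitioned = True
print(ap.read())  # still answers, possibly with stale data
```

A "CP" replica in the same situation raises instead of answering, which is the other side of the choice.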

A primary concern in a distributed system is inter-process communication.

Next, processes need to synchronize: mutual exclusion (distributed locks) and distributed transactions (atomic commit).
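
As one illustration of atomic commit, here is a minimal in-memory sketch of two-phase commit. The notes only name atomic commit; two-phase commit is chosen here as a common example. The Participant class and two_phase_commit function are hypothetical, and the sketch ignores timeouts, crashes, and durable logging.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Participant:
    """Hypothetical resource manager that votes on a transaction."""
    name: str
    can_commit: bool = True
    state: str = "init"  # init -> prepared -> committed, or aborted

    def prepare(self) -> bool:
        # Phase 1: vote yes only if the local work could be committed.
        if self.can_commit:
            self.state = "prepared"
            return True
        self.state = "aborted"
        return False

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: List[Participant]) -> str:
    """Return "committed" only if every participant voted yes."""
    # Phase 1 (voting): ask every participant, without short-circuiting.
    votes = [p.prepare() for p in participants]
    # Phase 2 (decision): send the same outcome to all participants.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"


nodes = [Participant("a"), Participant("b", can_commit=False)]
print(two_phase_commit(nodes))  # one "no" vote aborts the whole transaction
```

The point of the sketch is the all-or-nothing outcome: either every participant commits or every participant aborts.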

Consensus Algorithms

Getting a group of nodes to agree on a given value is described as the consensus problem.

The consensus problem: a group of nodes reaches agreement on a single chosen value (single decree).
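
To make "single decree" concrete, here is a minimal sketch in the style of single-decree Paxos, assuming all acceptors are reachable and calls are synchronous. The notes do not name a specific algorithm, so this choice, the Acceptor class, and the propose function are all assumptions made for illustration.

```python
from typing import Any, List, Optional, Tuple


class Acceptor:
    def __init__(self) -> None:
        self.promised = -1  # highest ballot promised so far
        self.accepted: Optional[Tuple[int, Any]] = None  # (ballot, value)

    def prepare(self, ballot: int):
        # Phase 1b: promise to ignore lower ballots; report any accepted value.
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted
        return False, None

    def accept(self, ballot: int, value: Any) -> bool:
        # Phase 2b: accept unless a higher ballot has been promised.
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False


def propose(acceptors: List[Acceptor], ballot: int, value: Any) -> Optional[Any]:
    """Run one proposer round; return the chosen value, or None on failure."""
    majority = len(acceptors) // 2 + 1
    # Phase 1a: collect promises from the acceptors.
    replies = [a.prepare(ballot) for a in acceptors]
    granted = [acc for ok, acc in replies if ok]
    if len(granted) < majority:
        return None
    # If some acceptor already accepted a value, adopt the highest-ballot one.
    prior = [acc for acc in granted if acc is not None]
    if prior:
        value = max(prior, key=lambda bv: bv[0])[1]
    # Phase 2a: ask the acceptors to accept the (possibly adopted) value.
    acks = sum(a.accept(ballot, value) for a in acceptors)
    return value if acks >= majority else None


acceptors = [Acceptor() for _ in range(3)]
first = propose(acceptors, ballot=1, value="x")
second = propose(acceptors, ballot=2, value="y")
print(first, second)  # the second proposer adopts the already chosen value
```

Once a value has been chosen, a later proposer with a higher ballot learns it in phase 1 and re-proposes it, so the group never decides two different values.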

Divide and conquer

Further reading

Online Blogs & Articles

Distributed Systems for fun and profit, mixu

An overview of several topics in distributed systems.
Fallacies of Distributed Computing Explained

The 7 fallacies of distributed computing that L Peter Deutsch put forward at Sun.

Books

Martin Kleppmann: Designing Data-Intensive Applications - The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

Chinese title: 设计数据密集型应用. Offers very good practical guidance; recommended as an introductory read.
Joe Armstrong: Making Reliable Distributed Systems in the presence of software errors

Chinese translation: 面对软件错误构建可靠的分布式系统
Lee Atchison: Architecting for Scale - High Availability for Your Growing Applications

Andrew S. Tanenbaum: Distributed Systems - Principles and Paradigms

Chinese translation: 分布式系统原理与范型; leans theoretical.
Distributed Operating Systems (Chinese translation: 分布式操作系统)
Introduction to Reliable and Secure Distributed Programming
Nancy A. Lynch: Distributed Algorithms

How To Architect

Kate Matsudaira: Scalable Web Architecture and Distributed Systems

This article seeks to cover some of the key issues to consider when designing large websites, as well as some of the building blocks used to achieve these goals.
Google: Designs, Lessons and Advice from Building Large Distributed Systems
Dan Pritchett: Architecting for Latency, eBay, October 2007
Architecting for scale, Beautiful Architecture, Chapter 3