教土豆学计算机
Eric Brewer 在长期设计和部署分布式系统的过程中发现, 其中 3 个核心系统需求(systemic requirements)之间存在着某种特殊的关联.(他主要是谈论 Web 类的应用, Lessons from Giant-Scale Services)
这 3 个核心的系统需求分别是:Consistency,Availability 和 Partition Tolerance,造就了该理论的名字 - CAP
Consistency: Every read receives the most recent write or an error, equivalent to having a single up-to-date copy of data
数据一致性: 每个读请求都可以获取到最新的数据或者错误, 或者说只有一个版本的最新数据
Availability: Every (update) request receives a (non-error) response - without guarantee that it contains the most recent write
可用性: 每一个(更新)请求都能得到响应 - 但不保证总是最新的
Partition tolerance: The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes
网络分区容忍性: 在出现任意消息丢失或延迟时, 系统依然可以工作;
A network partition has long been defined as a communication failure in which the network is split into disjoint sub-networks, with no communication possible across sub-network.
网络分区定义为: 网络因为通信故障被分割为多个不连通的子网络, 子网络之间无法通信
The easiest way to understand CAP is to think of two nodes on opposite sides of a partition. Allowing at least one node to update state will cause the nodes to become inconsistent, thus forfeiting C. Likewise, if the choice is to preserve consistency. one side of the partition must act as if it is unavailable, thus forfeiting A.
理解 CAP 的最容易方式是: 考虑处于网络分区二侧的二个节点, 如果允许至少一个节点更新状态, 就会导致节点间不一致, 也就是放弃 C. 同样的, 如果是选择数据一致性, 节点必须放弃可用性.
No distributed system is safe from network failures, thus network partitioning generally has to be tolerated. In the presence of a partition, one is then left with two options: consistency or availability.
所有的分布式系统都有可能出现网络问题, 因此网络分区问题是必须要接受的. 在出现网络分区的情形下, 需要在数据一致性和可用性之间做出选择.
When choosing consistency over availability, the system will return an error or a time-out if particular information cannot be guaranteed to be up to date due to network partitioning.
When choosing availability over consistency, the system will always process the query and try to return the most recent available version of the information, even if it cannot guarantee it is up to date due to network partitioning.
In the absence of network failure – that is, when the distributed system is running normally – both availability and consistency can be satisfied.
CAP prohibits only a tiny part of the design space: perfect availability and consistency in the presence of partitions, which are rare
CAP 只限制了设计空间的很小一部分: 仅在出现网络分区时, 才需要选择可用性或者数据一致性, 不过这种情形很少会发生
来源: NoSQL: Past Present Future
This section shows the development history of CAP, refers to “NoSQL: Past, Present, Future”
1995 Develop modern cluster-based server, scalable, highly available, commodity clusters (Brewer co-founded Inktomi Corporation in 1996)
Inktomi search engine (1996) and proxy cache (1998)
1997 Published Cluster-Based Scalable Network Services on SOSP
Led to “ACID vs. BASE” spectrum, but BASE was not well received
1998 Developed CAP (conjecture) while teaching, appears in 1999
Eric Brewer felt their design choices were right: sufficient (and faster), necessary (consistency hinders performance/availability). And he started to notice other systems that made similar decisions: Coda, Bayou
1999 Published Harvest, yield, and scalable tolerant systems
2000 Published Towards Robust Distributed Systems on Principles of Distributed Computing (PODC)
2002 Seth Gilbert, Nany Lynh of MIT formally proved Brewer to be correct, in paper Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services, and thus CAP theorem was born
2012 Published CAP Twelve Years Later: How the “Rules” Have Changed
…
ACID 和 BASE 在数据一致性-可用性光谱 (注: Eric Brewer 在 Towards Robust Distributed Systems 中提出) 中代表了二个完全相反的设计理念. ACID 专注于一致性, 是数据库的传统处理方式.
The notion of accepting eventual consistency is supported via an architectural approach known as BASE (Basically Available, Soft-state, Eventually consistent).
BASE
, as its name indicates, is the logical opposite of ACID
, though it would be quite wrong to imply that any architecture should (or could) be based wholly on one or the other. This is an important point to remember, given our industry’s habit of “oooh shiny” strategy adoption.
Basically available indicates that the system does guarantee almost availability, in terms of the CAP theorem.
Soft state indicates that the state of the system may change over time, even without input. This is because of the eventual consistency model.
Eventual consistency indicates that the system will become consistent over time, given that the system doesn’t receive input during that time.
CAP 和 ACID 的关系比较复杂, 并且经常会导致误解, 部分原因是字母 C 和 A 在 ACID 和 CAP 中代表着不同的含义, 另一部分原因是选择可用性影响了一些 ACID 保证.
原子性 (A) 所有系统都从原子操作中受益. 当专注于可用性时, 分区的二部分都应该使用原子操作. 此外, 高级别的原子操作也使得恢复变得简单.
一致性 (C) 在 ACID 中, C 意味着一个事务保持着数据库约束, 比如唯一键约束; 相反地, CAP 中 C 只是指数据保持一致, 是 ACID 中 C 的一个真子集
……
CRDTs (conflict-free replicated data types) are data types on which the same set of operations yields the same outcome, regardless of order of execution and duplication of operations.
This allows data convergence without the need for consensus between replicas. In turn, this allows for easier implementation (no consensus protocol implementation) as well as lower latency (no wait-time for consensus).
Operations on CRDTs need to adhere to the following rules:
Associativity: a + (b + c) = (a + b) + c
, so that grouping does not matter.
Commutativity: a + b = b + a
, so that order of application does not matter.
Idempotent: a + a = a
, so that duplication does not matter.
Some examples of the different data types specified as CRDT’s include:
Counters
Grow-only counter (merge = max(values); payload = single integer)
Positive-negative counter (consists of two grow counters, one for increments and another for decrements)
Registers
Last Write Wins register (timestamps or version numbers; merge = max(ts); payload = blob)
Multi-valued register (vector clocks; merge = take both)
Sets
Grow-only set (merge = union(items); payload = set; no removal)
Two-phase set (consists of two sets, one for adding, and another for removing; elements can be added once and removed once)
Unique set (an optimized version of the two-phase set)
Last write wins set (merge = max(ts); payload = set)
Positive-negative set (consists of one PN-counter per set item)
Observed-remove set
Graphs and text sequences (see the paper)
Cluster-Based Scalable Network Services, Operating Systems Principles (SOSP 97), ACM, 1997, pp. 78-91
Harvest, Yield, and Scalable Tolerant Systems, 1999
Eric Brewer: Towards Robust Distributed Systems, PODC 2000 keynote
Eric Brewer: Lessons from Giant-Scale Services, 2001
Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services, 2002
Seth Gilbert, Nany Lynh of MIT formally proved Brewer to be correct
Dan Pritchett: Architecting for Latency, eBay, 2007
HP’s talk on CAP theorem, titled There is no free lunch with distributed data
Consistency, availability, and partition-tolerance trade-offs on distributed data access systems
Brewer’s CAP Theorum: The kool aid Amazon and Ebay have been drinking, 2009, julian
Nathan Marz: How to beat the CAP theorem, October 13, 2011
Eric Brewer: CAP Twelve Years Later: How the “Rules” Have Changed, 2012, on IEEE
中文翻译 CAP 十二年回顾:”规则”变了
Eric Brewer: NoSQL: Past, Present, Future , November 8, 2012
Martin Kleppmann: A Critique of the CAP Theorem, September 17, 2015