jackyfkc.github.io

Teaching Potato Computer Science (教土豆学计算机)

CAP Conjecture or Theorem

For systems that share data over a network, if a network partition occurs, the only choice is between data consistency and availability.

Introduction

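Over many years of designing and deploying distributed systems, Eric Brewer observed that three core systemic requirements are related in a special way. (He was mainly discussing Web applications; see Lessons from Giant-Scale Services.)

These three core systemic requirements are Consistency, Availability, and Partition Tolerance, which give the theory its name: CAP.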

The CAP theorem asserts that any networked shared-data system can have only two of three desirable properties. However, by explicitly handling partitions, designers can optimize consistency and availability, thereby achieving some tradeoff of all three.

The easiest way to understand CAP is to think of two nodes on opposite sides of a partition. Allowing at least one node to update state will cause the nodes to become inconsistent, thus forfeiting C. Likewise, if the choice is to preserve consistency, one side of the partition must act as if it is unavailable, thus forfeiting A.

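The two-node picture can be sketched in a few lines of Python. This is only an illustrative toy, not a real replication protocol: the `Node` class, the `Unavailable` exception, and the `"CP"`/`"AP"` mode labels are all hypothetical names introduced here. A node in `"CP"` mode refuses writes during a partition (forfeiting A), while a node in `"AP"` mode accepts them locally and lets the replicas diverge (forfeiting C).

```python
from __future__ import annotations

from dataclasses import dataclass


class Unavailable(Exception):
    """Raised when a node refuses a request in order to protect consistency."""


@dataclass
class Node:
    name: str
    mode: str                  # "CP" or "AP": hypothetical labels for this sketch
    value: int = 0
    peer: Node | None = None
    partitioned: bool = False

    def write(self, value: int) -> None:
        if self.partitioned:
            if self.mode == "CP":
                # Preserve consistency: refuse the update, forfeiting A.
                raise Unavailable(f"{self.name} cannot reach its peer")
            # Preserve availability: accept the update locally, forfeiting C.
            self.value = value
            return
        # No partition: update both replicas, so C and A both hold.
        self.value = value
        if self.peer is not None:
            self.peer.value = value


def make_pair(mode: str) -> tuple[Node, Node]:
    """Create two replicas that know about each other."""
    a, b = Node("a", mode), Node("b", mode)
    a.peer, b.peer = b, a
    return a, b


def set_partition(a: Node, b: Node, partitioned: bool) -> None:
    """Cut or restore the (simulated) link between the two replicas."""
    a.partitioned = b.partitioned = partitioned
```

With `make_pair("AP")`, a write during a partition succeeds on one side only, so the two `value` fields differ afterwards. With `make_pair("CP")`, the same write raises `Unavailable` and both replicas keep their old, identical value.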

No distributed system is safe from network failures, thus network partitioning generally has to be tolerated. In the presence of a partition, one is then left with two options: consistency or availability.


In the absence of network failure – that is, when the distributed system is running normally – both availability and consistency can be satisfied.

CAP prohibits only a tiny part of the design space: perfect availability and consistency in the presence of partitions, which are rare.



CAP 2010. Source: NoSQL: Past, Present, Future

History

This section traces the development history of CAP, drawing on “NoSQL: Past, Present, Future”.

The Significance of the Theorem

Why “2 of 3” is misleading

CAP is frequently misunderstood as if one had to choose to abandon one of the three properties at all times. In fact, the choice is really between consistency and availability only when a network partition or failure happens; at all other times, no trade-off has to be made.

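ACID, BASE, and CAP - Acid-Base Balance and CAP

ACID and BASE represent two diametrically opposed design philosophies on the consistency-availability spectrum (note: proposed by Eric Brewer in Towards Robust Distributed Systems). ACID focuses on consistency and is the traditional approach taken by databases.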

The BASE Jump

A latency tolerant alternative to ACID

The notion of accepting eventual consistency is supported via an architectural approach known as BASE (Basically Available, Soft-state, Eventually consistent).

BASE, as its name indicates, is the logical opposite of ACID, though it would be quite wrong to imply that any architecture should (or could) be based wholly on one or the other. This is an important point to remember, given our industry’s habit of “oooh shiny” strategy adoption.

CAP & ACID

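The relationship between CAP and ACID is fairly complex and often leads to misunderstanding, partly because the letters C and A mean different things in ACID and in CAP, and partly because choosing availability affects some of the ACID guarantees.

Atomicity (A). All systems benefit from atomic operations. When the focus is on availability, both sides of a partition should still use atomic operations. Moreover, higher-level atomic operations also make recovery simpler.

Consistency (C). In ACID, C means that a transaction preserves the database constraints, such as unique-key constraints. In contrast, the C in CAP refers only to the data being kept consistent, which is a strict subset of the C in ACID.

The letter C does not really belong in ACID. Joe Hellerstein has remarked that the C in ACID was "tossed in to make the acronym work" in Härder and Reuter's paper, "Principles of Transaction-Oriented Database Recovery".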

ACID & BASE

……

Automatic Conflict resolution - CRDT

CRDTs (conflict-free replicated data types) are data types on which the same set of operations yields the same outcome, regardless of order of execution and duplication of operations.

This allows data convergence without the need for consensus between replicas. In turn, this allows for easier implementation (no consensus protocol implementation) as well as lower latency (no wait-time for consensus).
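As a minimal sketch of how such convergence can work, consider a grow-only counter (G-Counter), a commonly cited CRDT. It is chosen here purely as an illustration and is not taken from the text above; the class and method names are hypothetical. Each replica increments only its own slot, and `merge` takes the element-wise maximum, so merging in any order, or merging the same state twice, yields the same result.

```python
from __future__ import annotations


class GCounter:
    """Grow-only counter: one slot per replica, merged by element-wise max."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counts: dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        # A replica only ever increases its own slot.
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other: GCounter) -> None:
        # Element-wise max is commutative, associative, and idempotent,
        # so no coordination between replicas is needed.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

    def value(self) -> int:
        # The counter's value is the sum of all replica slots.
        return sum(self.counts.values())
```

For instance, if replica `a` increments by 2 and replica `b` increments by 3 while disconnected, then after each merges the other's state (in either order, and even repeatedly) both report the same total.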

Operations on CRDTs need to adhere to the following rules:

Some examples of the different data types specified as CRDTs include:

Further Readings