- High-level trade-offs
everything is a trade-off.
Performance vs scalability
A service is scalable if it results in increased performance in a manner proportional to resources added.
Generally, increasing performance means serving more units of work, but it can also be to handle larger units of work, such as when datasets grow.
Another way to look at performance vs scalability:
If you have a performance problem, your system is slow for a single user.
If you have a scalability problem, your system is fast for a single user but slow under heavy load.
In distributed systems there are other reasons for adding resources to a system;
for example to improve the reliability of the offered service.
Introducing redundancy is an important first line of defense against failures.
An always-on service is said to be scalable if adding resources to facilitate redundancy does not result in a loss of performance.
For the systems we build we must carefully inspect along
which axis we expect the system to grow
where redundancy is required
how one should handle heterogeneity in this system
make sure that architects are aware of which tools they can use for under which conditions, and what the common pitfalls are.
Latency vs throughput
Latency is the time to perform some action or to produce some result.
Throughput is the number of such actions or results per unit of time.
Generally, you should aim for maximal throughput with acceptable latency.
Availability vs consistency
Given that networks aren’t completely reliable, you must tolerate partitions in a distributed system, period.
Fortunately, though, you get to choose what to do when a partition does occur.
According to the CAP theorem, this means we are left with two options: Consistency and Availability.
Consistency - Every read receives the most recent write or an error
Availability - Every request receives a response, without guarantee that it contains the most recent version of the information
Partition Tolerance - The system continues to operate despite arbitrary partitioning due to network failures
CP - Consistency/Partition Tolerance
Wait for a response from the partitioned node which could result in a timeout error.
The system can also choose to return an error, depending on the scenario you desire.
Choose Consistency over Availability when your business requirements dictate atomic reads and writes.
AP - Availability/Partition Tolerance
Return the most recent version of the data you have, which could be stale.
This system state will also accept writes that can be processed later when the partition resolved.
Choose Availability over Consistency when your business requirements allow for some flexibility around when the data in the system synchronizes.
Availability is also a compelling option when the system needs to continue to function in spite of external errors (shopping carts, etc.)
With multiple copies of the same data, we are faced with options on how to synchronize them so clients have a consistent view of the data.
After a write, reads may or may not see it. A best effort approach is taken.
This approach is seen in systems such as memcached.
Weak consistency works well in real time use cases such as VoIP, video chat, and realtime multiplayer games.
For example, if you are on a phone call and lose reception for a few seconds, when you regain connection you do not hear what was spoken during connection loss.
After a write, reads will eventually see it (typically within milliseconds).
Data is replicated asynchronously.
This approach is seen in systems such as DNS and email. Eventual consistency works well in highly available systems.
After a write, reads will see it. Data is replicated synchronously.
This approach is seen in file systems and RDBMSes. Strong consistency works well in systems that need transactions.
There are two main patterns to support high availability: fail-over and replication.
With active-passive fail-over, heartbeats are sent between the active and the passive server on standby.
If the heartbeat is interrupted, the passive server takes over the active’s IP address and resumes service.
The length of downtime is determined by whether the passive server is already running in ‘hot’ standby or whether it needs to start up from ‘cold’ standby. Only the active server handles traffic.
Active-passive failover can also be referred to as master-slave failover.
In active-active, both servers are managing traffic, spreading the load between them.
If the servers are public-facing, the DNS would need to know about the public IPs of both servers. If the servers are internal-facing, application logic would need to know about both servers.
Active-active failover can also be referred to as master-master failover.
Fail-over adds more hardware and additional complexity.
There is a potential for loss of data if the active system fails before any newly written data can be replicated to the passive.