System Design Day 2

Key Concepts

Ramesh Pokhrel
Jul 16, 2024

Scalability

Dimensions along which a system grows: more users (incoming requests), features (new features that expand system capability), data (growth of the data the system handles), complexity (the system broken into smaller independent systems), and geographies (serving users in new regions).

Scale A System

1. Vertical Scaling (Scale up)

Increase the capacity of existing resources (memory, CPU, etc.). Good for small-architecture systems.

2. Horizontal Scaling (Scale out)

Increase the number of resources (add more servers). It's often considered the most effective way to scale large systems.

3. Load Balancing

Load balancing is the process of distributing traffic across multiple servers to ensure no single server becomes overwhelmed.
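Round-robin is one of the simplest load-balancing strategies, and it illustrates the idea above: each incoming request is handed to the next server in rotation. A minimal sketch (the server addresses are hypothetical):

```python
import itertools

# Hypothetical pool of backend servers behind the load balancer.
servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
pool = itertools.cycle(servers)

def pick_server():
    """Return the next server in round-robin rotation."""
    return next(pool)

# Six requests are spread evenly: no single server is overwhelmed.
assigned = [pick_server() for _ in range(6)]
print(assigned)
```

Real load balancers (NGINX, HAProxy, AWS ELB) add health checks and weighting on top of strategies like this.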

4. Caching

Store frequently accessed data in memory (RAM) to reduce the load on the server or database. Implementing caching can dramatically improve response times.
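The idea can be sketched as a tiny in-memory cache with a time-to-live per entry; this is an illustrative stand-in for what systems like Redis or Memcached provide:

```python
import time

class TTLCache:
    """Minimal in-memory cache with per-entry time-to-live (a sketch,
    not production code)."""

    def __init__(self, ttl_seconds=60):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, expiry_timestamp)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires = entry
        if time.time() > expires:   # entry has expired: evict it
            del self.store[key]
            return None
        return value

    def set(self, key, value):
        self.store[key] = (value, time.time() + self.ttl)

cache = TTLCache(ttl_seconds=30)
cache.set("user:42", {"name": "Alice"})
print(cache.get("user:42"))  # served from memory, no database hit
```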

5. Content Delivery Networks (CDNs)

A globally distributed group of servers that caches static assets (images, videos, etc.) from your origin server. Every CDN server has its own local cache, and these caches must stay in sync. This reduces latency and results in faster load times. There are two ways a CDN cache can be populated: push and pull.

Push CDN

Engineers push updated files to the CDN themselves.

Pull CDN

The edge cache is lazily updated: if a server doesn't have a file, it fetches it from the origin server.

Not applicable if:

Users are region-specific; they can simply be served by an origin server near them.

The data is dynamic or sensitive; CDNs are not a good fit for it.

6. Database Sharding and Partitioning

Sharding: a method of distributing data across multiple machines, at the database level. If you have two shards, that means two database servers.

Partitioning: splitting subsets of data within the same database instance, at the data level.

(Figure: five partitions of a 100 GB dataset distributed across two shards, illustrating the difference between partitioning and sharding.)
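Sharding needs a routing rule that maps each key to a shard. Hash-based routing is a common choice; a minimal sketch (shard count and key format are hypothetical):

```python
import hashlib

NUM_SHARDS = 2  # e.g. two database servers

def shard_for(key):
    """Deterministically map a key to a shard index, so all reads and
    writes for the same key always go to the same database server."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

print(shard_for("user:1001"), shard_for("user:1002"))
```

Simple modulo hashing reshuffles most keys when `NUM_SHARDS` changes; production systems often use consistent hashing instead.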

7. Asynchronous communication

Defer long-running or non-critical tasks to background queues or message brokers. This ensures your main application remains responsive to users.

E.g. Kafka, Amazon SQS (Simple Queue Service), RabbitMQ, Kinesis, Google Pub/Sub.

The broker becomes the backbone of the system, and the message flow becomes harder to trace.

Cascading failures with synchronous communication: in a chain of connected components, when one component goes down, it affects all the components connected to it.
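The deferral pattern above can be sketched with an in-process queue and a background worker, as a tiny stand-in for a broker like Kafka or SQS (task names are hypothetical):

```python
import queue
import threading

tasks = queue.Queue()
results = []

def worker():
    """Background worker: drains the queue so the request path
    never waits on slow work."""
    while True:
        job = tasks.get()
        if job is None:        # sentinel: shut the worker down
            break
        results.append(f"processed {job}")
        tasks.task_done()

threading.Thread(target=worker, daemon=True).start()

def handle_request(payload):
    tasks.put(payload)         # enqueue and return immediately
    return "202 Accepted"

print(handle_request("send-welcome-email"))
tasks.join()                   # block here only to demonstrate the result
print(results)
```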

8. Microservices Architecture

Break down your application into smaller, independent services that can be scaled independently. This improves resilience and allows teams to work on specific components in parallel.

9. Auto-Scaling

Automatically adjust the number of active servers based on the current load. This ensures the system can handle spikes in traffic without manual intervention (e.g., AWS Auto Scaling).

10. Multi-region Deployment

Deploy the application in multiple data centers or cloud regions to reduce latency and improve redundancy. Spotify uses multi-region deployments to ensure their music streaming service remains highly available and responsive to users all over the world, regardless of where they are located.

Throughput and Latency

Latency and throughput are two metrics that measure the performance of a computer network.

Latency is the time taken by a single data packet to travel from the source computer to the destination computer. Measured in milliseconds (ms).

Throughput refers to the amount of data that can be transferred over a network in a given period. Measured in megabits per second (Mbps).

The goal: high throughput, low latency.
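A back-of-the-envelope calculation shows how the two metrics combine (all numbers here are hypothetical): for a large transfer, total time is dominated by throughput, with latency adding only a small fixed cost.

```python
# Hypothetical link: 100 MB file, 50 Mbps throughput, 20 ms latency.
file_size_megabits = 100 * 8   # 100 megabytes = 800 megabits
throughput_mbps = 50
latency_s = 0.020

transfer_time_s = file_size_megabits / throughput_mbps + latency_s
print(transfer_time_s)  # 16.02 s: throughput dominates, not latency
```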

Fault Tolerance

Fault tolerance describes a system’s ability to handle errors and outages without any loss of functionality.

We can achieve fault tolerance through:

Multiple hardware systems, multiple instances of the software, and backup power sources.
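One concrete form of the "multiple instances" idea is failover: try each replica in turn until one responds. A sketch with hypothetical replica names and a simulated outage of the primary:

```python
def query_replica(name):
    """Hypothetical replica call; the primary is simulated as down."""
    if name == "primary":
        raise ConnectionError("primary is down")
    return f"result from {name}"

replicas = ["primary", "secondary", "tertiary"]

def fault_tolerant_query():
    last_error = None
    for name in replicas:
        try:
            return query_replica(name)
        except ConnectionError as e:
            last_error = e     # this replica failed; try the next one
    raise last_error           # every replica failed

print(fault_tolerant_query())  # the secondary answers despite the outage
```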

Fault tolerance vs. high availability

High availability refers to a system’s total uptime, and achieving high availability is one of the primary reasons architects look to build fault-tolerant systems.

Technically, fault tolerance and high availability are not exactly the same thing. Keeping an application highly available is not simply a matter of making it fault tolerant. A highly fault-tolerant application could still fail to achieve high availability if, for example, it has to be taken offline regularly to upgrade software components, change the database schema, etc.

Application-level fault tolerance: the application is spread across multiple regions, with each region having its own Kubernetes cluster.

Persistence (database) level fault tolerance: database replication, sharding, and geographical distribution.
