System Design Day 2
Key Concepts
Scalability
Dimensions along which a system grows: more users (incoming requests), more features (expanding system capability), more data (growth in the data the system handles), more complexity (the system broken into smaller independent subsystems), and new geographies (serving users in new regions).
Ways to Scale a System
1. Vertical Scaling (Scale up)
Increase the capacity of existing resources (memory, CPU, etc.). Good for small-scale architectures.
2. Horizontal Scaling (Scale out)
Increase the number of resources (add more servers). Often considered the most effective way to scale large systems.
3. Load Balancing
Load balancing is the process of distributing traffic across multiple servers to ensure no single server becomes overwhelmed.
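The simplest distribution strategy is round-robin. A minimal sketch (class and server names here are illustrative, not any specific load balancer's API):

```python
from itertools import cycle

class RoundRobinBalancer:
    """Hands out servers in rotating order so each gets an equal share of requests."""
    def __init__(self, servers):
        self._servers = cycle(servers)

    def next_server(self):
        return next(self._servers)

lb = RoundRobinBalancer(["server-a", "server-b", "server-c"])
assignments = [lb.next_server() for _ in range(6)]
# 6 requests -> each server is assigned exactly twice
```

Real load balancers add health checks and weighting, but the core idea is the same: spread incoming requests so no single server is overwhelmed.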
4. Caching
Store frequently accessed data in memory (e.g., RAM) to reduce load on the server or database. Implementing caching can dramatically improve response times.
5. Content Delivery Networks (CDNs)
A globally distributed group of servers that caches static assets (images, videos, etc.) for your origin server. Every CDN server has its own local cache, and these caches should stay in sync. This reduces latency and results in faster load times. There are two ways a CDN cache can be populated: push and pull.
Push CDN
Engineers push updated files to the CDN themselves.
Pull CDN
The edge cache is lazily updated: if a CDN server doesn't have a requested asset, it fetches it from the origin server.
Not applicable if:
Users are concentrated in one region; an origin server near them may be enough.
The data is dynamic or sensitive; CDNs are not a good fit for it.
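The pull CDN's lazy behavior (and its staleness trade-off) can be sketched with a toy edge server; the class name and dict-as-origin are illustrative assumptions:

```python
class PullCDNEdge:
    """Edge server: serve from the local cache, else pull once from the origin."""
    def __init__(self, origin):
        self.origin = origin   # a dict standing in for the origin server
        self.cache = {}

    def serve(self, path):
        if path not in self.cache:             # cache miss: pull from origin
            self.cache[path] = self.origin[path]
        return self.cache[path]                # cache hit: origin never contacted

origin = {"/logo.png": b"v1-bytes"}
edge = PullCDNEdge(origin)
edge.serve("/logo.png")                        # first request pulls from origin
origin["/logo.png"] = b"v2-bytes"              # origin updated...
stale = edge.serve("/logo.png")                # ...but the edge still serves v1
```

This shows why pull CDNs need a TTL or explicit invalidation: the edge keeps serving the cached copy until it expires.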
6. Database Sharding and Partitioning
Sharding: a method of distributing data across multiple machines; operates at the database level. Two shards means two database servers.
Partitioning: splitting data into subsets within the same database instance; operates at the data level.
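A common way to route a key to a shard is a stable hash modulo the shard count. A minimal sketch (MD5 is used here only because Python's built-in `hash()` is randomized across runs; real systems often use consistent hashing instead):

```python
import hashlib

def shard_for(key, num_shards):
    """Map a key deterministically to one of num_shards database servers."""
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return int(digest, 16) % num_shards

# The same key always lands on the same shard, so reads find
# the data that writes placed there.
shard = shard_for("user:42", 4)
```

The drawback of plain modulo routing is that changing `num_shards` remaps almost every key, which is the problem consistent hashing solves.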
7. Asynchronous communication
Defer long-running or non-critical tasks to background queues or message brokers. This ensures your main application remains responsive to users.
Eg. Kafka, Message Queueing Service(SQS), RabbitMQ, Kinesis, Google PubSub
The broker becomes the backbone of the system; the trade-off is that message flow becomes harder to trace.
Cascading failures with synchronous communication: in a chain of connected components, when one component goes down it affects every component connected to it.
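The producer/background-worker pattern can be sketched with Python's standard-library queue standing in for a message broker like SQS or RabbitMQ (the job names are illustrative):

```python
import queue
import threading

tasks = queue.Queue()   # stands in for the message broker
results = []

def worker():
    """Background consumer: drains the queue so the caller never blocks."""
    while True:
        job = tasks.get()
        if job is None:          # sentinel value signals shutdown
            break
        results.append(f"processed {job}")

t = threading.Thread(target=worker)
t.start()

# The "web request handler" enqueues work and returns immediately.
for i in range(3):
    tasks.put(f"email-{i}")
tasks.put(None)                  # tell the worker to stop
t.join()
```

Because the handler only enqueues, a slow email provider delays the worker, not the user-facing request; that decoupling is also what breaks the cascading-failure chain.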
8. Microservices Architecture
Break down your application into smaller, independent services that can be scaled independently. This improves resilience and allows teams to work on specific components in parallel.
9. Auto-Scaling
Automatically adjust the number of active servers based on the current load. This ensures the system can handle traffic spikes without manual intervention (e.g., AWS Auto Scaling).
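The core scaling decision resembles target tracking: pick a fleet size that brings average utilization back to a target. A simplified sketch (the function, target, and bounds are illustrative assumptions, not AWS's actual algorithm):

```python
import math

def desired_servers(current, cpu_utilization, target=0.6,
                    min_servers=2, max_servers=20):
    """Return the fleet size that brings average CPU toward the target.

    current: number of servers currently running
    cpu_utilization: current average CPU as a fraction (0.0 - 1.0)
    """
    desired = math.ceil(current * cpu_utilization / target)
    return max(min_servers, min(max_servers, desired))

# 4 servers at 90% CPU -> scale out to 6 so each sits near 60%
desired_servers(4, 0.9)   # -> 6
```

The min/max bounds mirror the floor and ceiling you configure in a real auto-scaling group, preventing runaway scale-out or scale-in below a safe baseline.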
10. Multi-region Deployment
Deploy the application in multiple data centers or cloud regions to reduce latency and improve redundancy. Spotify uses multi-region deployments to ensure their music streaming service remains highly available and responsive to users all over the world, regardless of where they are located.
Throughput and Latency
Latency and throughput are two metrics that measure the performance of a computer network.
Latency is the time a single data packet takes to travel from the source computer to the destination computer. Measured in milliseconds (ms).
Throughput is the amount of data that can be transferred over a network in a given period. Measured in megabits per second (Mbps).
Ideal: high throughput, low latency.
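A worked example of how the two metrics combine: total transfer time is roughly the propagation latency plus the transmission time dictated by throughput (the function name is a hypothetical helper):

```python
def transfer_time_ms(payload_megabits, throughput_mbps, latency_ms):
    """Approximate total time to deliver a payload over a network link."""
    transmission_ms = payload_megabits / throughput_mbps * 1000
    return latency_ms + transmission_ms

# 100 Mb payload over a 100 Mbps link with 50 ms latency:
# 50 ms latency + 1000 ms transmission = 1050 ms total
transfer_time_ms(100, 100, 50)
```

Note which term dominates: for large transfers throughput matters most, while for small requests (a 1 Mb payload takes 50 + 10 = 60 ms here) latency dominates.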
Fault Tolerance
Fault tolerance describes a system’s ability to handle errors and outages without any loss of functionality.
We can achieve fault tolerance through:
Multiple hardware systems, multiple instances of software, and backup power sources
Fault tolerance vs. high availability
High availability refers to a system’s total uptime, and achieving high availability is one of the primary reasons architects look to build fault-tolerant systems.
Technically, fault tolerance and high availability are not exactly the same thing. Keeping an application highly available is not simply a matter of making it fault tolerant. A highly fault-tolerant application could still fail to achieve high availability if, for example, it has to be taken offline regularly to upgrade software components, change the database schema, etc.
Application-level fault tolerance: the application is spread across multiple regions, with each region running its own Kubernetes cluster.
Persistence (database)-level fault tolerance: database replication, sharding, geographical distribution.
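Replication pays off at read time through failover: if one replica is unreachable, try the next before surfacing an error. A minimal sketch (replicas are modeled as plain callables; a real client would hold connections):

```python
def read_with_failover(replicas, key):
    """Try each database replica in order; fail only if every one is down."""
    last_error = None
    for replica in replicas:
        try:
            return replica(key)
        except ConnectionError as exc:
            last_error = exc          # this replica is down; try the next
    raise last_error                  # all replicas failed

def down_replica(key):
    raise ConnectionError("replica unreachable")

def healthy_replica(key):
    return f"value-for-{key}"

# The first replica is down, so the read transparently falls through
# to the healthy one and the caller never sees the outage.
value = read_with_failover([down_replica, healthy_replica], "user:42")
```

This is the essence of fault tolerance at the persistence layer: redundant copies plus routing logic that hides individual failures from the application.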