Design API Rate Limiter

Want to get a detailed explanation? Click here to get Udemy's Bestseller "The Complete Design Interview Course" with 100% Money-back guarantee!

1. Introduction

Rate limiter is used to control the rate of requests sent or received by our system. Many services like Tiny URL’s, Twitter API’s, Facebook API’s use rate limiters. Other term for rate limiting is throttling.

Types of throttling

Hard throttling – Here the number of requests cannot exceed the limit.
Soft throttling – Once the limit is exhausted the user might be allowed to use some additional percentage of the resources.
Dynamic throttling – Once the limit is exhausted the user might be allowed to request more if there are free resources available in the system.

Prerequisites:

System design introduction : 3 principles of distributed system, 5 step guide for System design.

System design concepts & components : Horizontal scaling

2. Requirement analysis

Functional requirements

Limit the number of api requests
The limit must be configurable

Non-functional requirements

System should be highly available since it protects our system.
Rate limiter should not slow down the api requests. There should be little to no latency.

3. API design

boolean isRequestAllowed(params)

4. Define data model

The user or entity accessing the apis should have some unique id against which we will apply rate limiter. We can keep the rate or count related data in-memory.

5. Back-of-the-envelope calculations

1 million active users send 10 requests per minute
10 million requests per minute
If each rate limiting server can handle 1 million requests per minute we will need 10 servers.
Our systems should not be loaded more than 70% any point of time then we need 10*100/70 = 100/7 = 15 servers.

6. High level design

Problem statement:

Calculate the number of requests the user sends per minute.

Solution:

There are various Algorithms which perform similar functionality.

Leaky bucket
Fixed window algorithm
Sliding window
Sliding log algorithm

7. Scaling the design

Let us revisit our non-functional requirements
1. System should be highly available since it protects our system which can be achieved by no single point of failure principle.
2. Rate limiter should not slow down the api requests. This can be achieved by no bottleneck principle.

Capacity to process the sliding window algorithm?

Consider with the counter and the array it is 10KB per user so for 1 million users the capacity needed would be 10KB * 1 million = 10KB * 1000000 = 1000000 KB = 10000MB = 10GB which can easily be handled by a single server.

How to synchronize servers?

Sticky sessions
Common data storage system
Maintain the data in-memory and frequently sync the data with common data storage

9. Next Steps

Ask questions and share your feedback in the course.