Design API Rate Limiter

1. Introduction

Rate limiter is used to control the rate of requests sent or received by our system. Many services like Tiny URL’s, Twitter API’s, Facebook API’s use rate limiters. Other term for rate limiting is throttling.

Types of throttling

  1. Hard throttling – Here the number of requests cannot exceed the limit.
  2. Soft throttling – Once the limit is exhausted the user might be allowed to use some additional percentage of the resources.
  3. Dynamic throttling – Once the limit is exhausted the user might be allowed to request more if there are free resources available in the system.

Prerequisites:

System design introduction : 3 principles of distributed system, 5 step guide for System design.

System design concepts & components : Horizontal scaling

2. Requirement analysis

Functional requirements

  1. Limit the number of api requests
  2. The limit must be configurable

Non-functional requirements

  1. System should be highly available since it protects our system.
  2. Rate limiter should not slow down the api requests. There should be little to no latency.
3. API design

boolean isRequestAllowed(params)

    4. Define data model

    The user or entity accessing the apis should have some unique id against which we will apply rate limiter. We can keep the rate or count related data in-memory.

      5. Back-of-the-envelope calculations
      • 1 million active users send 10 requests per minute
      • 10 million requests per minute
      • If each rate limiting server can handle 1 million requests per minute we will need 10 servers.
      • Our systems should not be loaded more than 70% any point of time then we need 10*100/70 = 100/7 = 15 servers.
      6. High level design

      Problem statement:

      Calculate the number of requests the user sends per minute.

      Solution:

      There are various Algorithms which perform similar functionality.

      1. Leaky bucket
      2. Fixed window algorithm
      3. Sliding window
      4. Sliding log algorithm
      7. Scaling the design

      Let us revisit our non-functional requirements
      1. System should be highly available since it protects our system which can be achieved by no single point of failure principle.
      2. Rate limiter should not slow down the api requests. This can be achieved by no bottleneck principle.

      Capacity to process the sliding window algorithm?

      Consider with the counter and the array it is 10KB per user so for 1 million users the capacity needed would be 10KB * 1 million = 10KB * 1000000 = 1000000 KB = 10000MB = 10GB which can easily be handled by a single server.

      How to synchronize servers?

      1. Sticky sessions
      2. Common data storage system
      3. Maintain the data in-memory and frequently sync the data with common data storage
      9. Next Steps
      Ask questions and share your feedback in the course.

      The Complete Design Interview Course

      Let's connect on LinkedIn

      © Copyright CompleteDesignInterviewCourse.com