Design Tiny URL

1. Introduction

URL shortening is a process where we create short alias URLs for long URLs. These are called as short links. Some websites providing this service are tiny URL, bit.ly, goo.gl

Prerequisites:

System design introduction : 3 principles of distributed system, 5 step guide for System design.

System design concepts & components : Horizontal scaling, Hashing, Consistent hashing, Sharding, SQL vs NoSQL databases, Caching

2. Requirement analysis

Functional requirements

  1. Given URL we should get unique random short URL
  2. Given short link users should be redirected to original URL
  3. Users should optionally be able to choose a custom link
  4. Users should optionally be able to choose a expiration date. We didn’t see this in tiny URL but we can have it in our system.

Non-functional requirements

  1. The system should be highly available. Every time we hit the short URL we should be redirected to the original page.
  2. The system should work with minimum latency. Generation of tiny URLs and the redirection should be fast.
  3. Our system should be scalable it should handle the growing amount of URL creation and read requests.
  4. We should be able to know how many times the link was accessed.
3. API design
  • long createUser(name, email)
  • String createShortLink(originalUrl, customUrl = None, expirationDate = None, developerId = None)
  • String getOriginalUrl(shortLink)
  • void deleteURL(developerId, shortLink)
  • int getHitCount(developerId, shortLink)
    4. Define data model
    • User: UserId<PK>, Name, Email, CreationDate
    • ShortUrl: ShortUrl<PK>, LongUrl, UserId, hitCount, CreationDate, ExpirationDate
    What kind of database to use?
    Since we are designing a highly available and scalable system where the data access should be faster we will go for a NoSQL Column oriented database like Cassandra
      5. Back-of-the-envelope calculations

      Question:

      We have 100 write requests/sec. Each write request is of 10KB. We need to calculate how much disk space we would be using in the next 5 years?

      Answer:

      • Per day: 100 * 60 seconds * 60 minutes * 24 hours = 100 * 3600 * 24 = 360000 * 24 = 360K * 24  400K * 20 = 8000K = 8 Million request per day.
      • Per year: 8M * 365  8M * 400 = 3200M requests per year = 3.2 billion short URLs per year.
      • Writes per second: Consider each write requests of 10KB so we get 100 * 10KB = 1000KB = 1MB writes/s
      • Considering our reads are 100 times more we have approx. 100MB reads/s
      • So each day we write 1MB * 60 * 60 * 24 = 3600MB * 24  4000 * 20 = 80000MB per day = 80GB per day.
      • So each year we write 80 * 365 = 80 * 400 = 32000 GB = 32 TB
      • We don’t want to use more than 70% of our data storage at any time so we procure (32 * 100)/70  (35 * 100)/70 = 50TB disk space.
      • Say each server handles 10MB reads per second we would need minimum 10 servers for reading and 1 server for writing.
      • Similarly say we decide to introduce cache. Considering 20% of data per day is responsible for 80% of traffic so 80 GB * 20% = 80 * 0.2 = 16GB cache is needed.
      • Also because we have calculated that our system has 100MB reads/s and we have applied the 80-20 rule the 80% of 100MB/s = 80MB/s traffic is now read from the cache. Remaining 20% traffic is read directly from the database.
      6. High level design

      The main problem statement is to convert original URL into short URL of some fixed length. Let us see some of the challenges associated with it.

      What are the characters allowed in our short URL?

      Character Encoding: Base64

      Ways to generate the short URL?

      1. One way is to generate a random string using base64 and consider it as short URL. It will cause lot of collisions
      2. To avoid collision we can follow an incremental approach.
      3. Create a hash of the long URL using MD5 or SHA256
      Do we need to generate the short URL only when request arrives?
      No. We can pregenerate all the keys.
      7. Scale the design 

      Our non-functional requirements are

      1. The system should be highly available, which can be achieved by no single point of failure principle
      2. The system should work with minimum latency, which can be achieved by no bottleneck principle.
      3. Our system should be scalable it should handle the growing amount of URL creation and read requests. We can introduce caching and sharding to make our system scalable.

      Sharding criteria?

      1. UserId’s first letter
      2. Short URLs first letter
      3. Consistent hashing based on hash of short URL
      9. Additional things

      Security:

      1. Make the users signup and perform email authentication before giving them developer id.
      2. For all read/write request check for XSS.
      3. Limit the rate at with user can create short links.
      4. Encrypt the requests.
      5. Have proper authentication and authorization for accessing the database and servers.

      Telemetry:

      1. Number of active users per second
      2. Number of read/write requests per second
      3. Telemetry at user level
      10. Next Steps
      Ask questions and share your feedback in the course.

      The Complete Design Interview Course

      Let's connect on LinkedIn

      © Copyright CompleteDesignInterviewCourse.com