Designing a URL Shortening Service (TinyURL)
How to Plan a URL Shortening Service's System Design
Introduction
A URL shortener is a program or service that converts long, complicated URLs into shorter, easier-to-manage links while still redirecting users to the original URL. Short links are easier to type, less prone to typos, and take up less space in print. This makes them especially useful on character-limited platforms like Twitter, in SMS messages, and in printed documents. Services such as Bit.ly and TinyURL let users create short links, optionally with custom aliases, that are easy to share and come with statistics and tracking for traffic monitoring.
For instance, TinyURL can shorten a long URL such as https://example.com to https://tinyurl.com/alias. By making links easier to share, URL shorteners improve accessibility and usability across many media.
Requirements and Features
Features
Custom URL Creation: Users can create custom short URLs of up to 16 characters.
Metrics Gathering: The system should collect and present metrics such as the most visited links and the number of redirects.
Permanent Links: Unless explicitly deleted or expired, shortened URLs are stored indefinitely.
Dynamic Expiry: Users can set a custom expiration date for each link.
Targeted Advertising: Aggregated metrics can support features such as targeted advertising based on link performance.
High Performance: Redirection should be fast and should not degrade as load increases.
Functional Requirements
Short URL Generation: The system should generate a unique, short alias for each long URL.
Redirection: When a user visits a short URL, the system should redirect them to the original long URL.
Custom Short Links: Users should be able to create their own custom short links of up to 16 characters.
Link Deletion: Users with the required permissions should be able to delete short links they have created.
Link Update: Where permitted, users should be able to change the original URL that a short link points to.
Expiration Time: Links should have a default expiration time, which users can override with their own.
Lifetime Retention: If no expiration date is specified, the short URL should remain in the system permanently.
Non-Functional Requirements
Availability: The system must be highly available and fault-tolerant to keep downtime to a minimum.
Scalability: The system should scale horizontally as demand grows, handling up to 100 million new short URLs per month.
Low Latency: Redirects and API responses should stay fast even under heavy load.
Security: Short URLs should be hard to predict so that attackers cannot guess or enumerate them.
Readability: Generated short URLs should be easy to read, type, and tell apart.
Integration: The service should expose REST APIs so that third-party applications can integrate with it.
System Design
Architecture
Capacity and Estimation
Shortened URLs
Generation Rate: Approximately 40 new URLs per second (about 100 million per month), totaling roughly 120 billion URLs over 100 years (40 × 86,400 s × 365 days × 100 years ≈ 126 billion).
Redirection Requests
Rate: Around 8,000 redirections per second (40 writes per second × a 200:1 read-to-write ratio).
Daily Total: ~700 million redirections (8,000 × 86,400 s ≈ 691 million).
Storage Requirements
Data Object Size: Each object (short URL, long URL, timestamps, etc.) is ~500 bytes.
100-Year Storage: ~60 TB (~120 billion objects × 500 bytes).
Memory Requirements
Caching (80:20 Rule): Cache the hottest 20% of daily requests: 0.2 × ~700 million requests × 500 bytes ≈ 70 GB of memory.
Bandwidth
Incoming for Shortening Requests: ~160 Kbps (40 requests per second × 500 bytes × 8 bits).
Outgoing for Redirections: ~32 Mbps (8,000 requests per second × 500 bytes × 8 bits).
Servers
Peak Load: At 100 million redirections per second, ~961 servers are required (100,000,000 / 104,000 ≈ 961), assuming each server handles 104,000 requests per second.
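The capacity figures above follow from a handful of inputs. The Python sketch below recomputes them from those inputs (40 writes per second, a 200:1 read-to-write ratio, ~500 bytes per object, 100-year retention, and caching 20% of daily requests). The constant names are illustrative, and the results are back-of-envelope values, not measurements.

```python
# Back-of-envelope capacity estimates. Every input below is an assumption
# taken from the estimates in this section.
WRITES_PER_SEC = 40          # new short URLs per second
READ_WRITE_RATIO = 200       # redirections per newly created URL
OBJECT_SIZE_BYTES = 500      # short URL, long URL, timestamps, etc.
RETENTION_YEARS = 100
CACHE_FRACTION = 0.2         # 80:20 rule: cache the hottest 20% of daily requests

SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY


def estimate() -> dict:
    """Return the derived figures (decimal units: 1 TB = 10**12 bytes)."""
    reads_per_sec = WRITES_PER_SEC * READ_WRITE_RATIO
    reads_per_day = reads_per_sec * SECONDS_PER_DAY
    total_urls = WRITES_PER_SEC * SECONDS_PER_YEAR * RETENTION_YEARS
    return {
        "reads_per_sec": reads_per_sec,
        "reads_per_day": reads_per_day,
        "total_urls": total_urls,
        "storage_tb": total_urls * OBJECT_SIZE_BYTES / 1e12,
        "cache_gb": reads_per_day * CACHE_FRACTION * OBJECT_SIZE_BYTES / 1e9,
        "ingress_kbps": WRITES_PER_SEC * OBJECT_SIZE_BYTES * 8 / 1e3,
        "egress_mbps": reads_per_sec * OBJECT_SIZE_BYTES * 8 / 1e6,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name:>13}: {value:,.1f}")
```

With these inputs the sketch gives 8,000 reads per second, about 691 million redirections per day, about 126 billion URLs and 63 TB over 100 years, about 69 GB of cache, 160 Kbps of ingress, and 32 Mbps of egress. The storage and memory values round to the ~120 billion, ~60 TB, and ~70 GB quoted in this section.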
Traffic and Load Handling
Requests
Shortening Requests: ~40 per second.
Redirections: ~8,000 per second (1:200 shortening-to-redirection ratio).
Read/Write Ratio
Ratio: Approximately 200:1, with redirections far exceeding URL creation.
Caching and Load Balancing
Caching
Purpose: Reduces latency and database load by serving frequently accessed URLs from memory.
Requirements: Cache the hottest 20% of daily requests, which needs ~70 GB of memory.
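To make the caching idea concrete, here is a minimal in-process LRU (least recently used) cache in Python. It maps short codes to long URLs and evicts the least recently used entry once it reaches capacity. The class and method names are hypothetical. In production this role would be filled by Redis or Memcached, as described in the high-level design.

```python
from collections import OrderedDict
from typing import Optional


class LRUCache:
    """Illustrative LRU cache: short code -> long URL, bounded by capacity."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data: OrderedDict = OrderedDict()

    def get(self, short_code: str) -> Optional[str]:
        if short_code not in self._data:
            return None                        # cache miss
        self._data.move_to_end(short_code)     # mark as most recently used
        return self._data[short_code]

    def put(self, short_code: str, long_url: str) -> None:
        self._data[short_code] = long_url
        self._data.move_to_end(short_code)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)     # evict the least recently used entry
```

Because hot links are read far more often than they are written, recency-based eviction keeps the popular 20% resident while rarely used links fall back to the database.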
Load Balancing
Traffic Distribution: Use a load balancer to evenly distribute traffic across servers, ensuring fault tolerance and availability.
Scaling: Employ horizontal scaling to manage increased traffic efficiently.
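Load balancers support many policies, and round-robin is one of the simplest. The sketch below cycles through a list of hypothetical backend names and skips any that are marked unhealthy. It shows how traffic stays spread across the remaining servers when one fails. A real deployment would use a dedicated load balancer with active health checks rather than this in-process class.

```python
import itertools
from typing import List, Set


class RoundRobinBalancer:
    """Cycles through backends, skipping any currently marked unhealthy."""

    def __init__(self, servers: List[str]) -> None:
        self.servers = servers
        self.unhealthy: Set[str] = set()
        self._cycle = itertools.cycle(servers)

    def mark_down(self, server: str) -> None:
        self.unhealthy.add(server)

    def mark_up(self, server: str) -> None:
        self.unhealthy.discard(server)

    def next_server(self) -> str:
        # Try each server at most once per call so we never loop forever.
        for _ in range(len(self.servers)):
            server = next(self._cycle)
            if server not in self.unhealthy:
                return server
        raise RuntimeError("no healthy servers available")
```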
REST API
TinyURL API Documentation
1. Create Short URL
POST /api/create
This endpoint is used to generate a short URL from a provided long URL. It optionally allows the user to specify a custom short URL.
Request Body
{
  "long_url": "string",
  "api_key": "string",
  "custom_url": "string"
}
long_url: Required. The URL to be shortened.
api_key: Required. A unique API key issued to each user for authentication and access control.
custom_url: Optional. The custom short URL the user wants to create (up to 16 characters).
Response
{
  "status": 200,
  "data": {
    "short_url": "string"
  }
}
status: The HTTP status of the request (e.g., 200 for success).
short_url: The generated short URL (either custom or system-generated).
Error Response
If the request contains an invalid parameter (e.g., a missing or malformed long_url), the service returns an error response such as the one below. A missing or invalid api_key returns 401 Unauthorized instead.
{
  "status": 400,
  "message": "Invalid parameter"
}
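The following Python function sketches the validation logic behind POST /api/create, independent of any web framework. It assumes an in-memory dict as the store, a set of valid API keys, a hypothetical short domain https://short.example/, and custom aliases restricted to 1-16 alphanumeric characters. The response shapes mirror the JSON examples above. Reporting an already-taken alias as a 400 is an assumption.

```python
import secrets
import string
from typing import Dict, Set, Tuple
from urllib.parse import urlparse

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits  # 62 chars
BASE_URL = "https://short.example/"   # hypothetical short domain
MAX_CUSTOM_LEN = 16                   # custom aliases are limited to 16 characters


def _random_code(length: int = 7) -> str:
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def create_short_url(body: dict, valid_keys: Set[str], store: Dict[str, str]) -> Tuple[int, dict]:
    """Sketch of POST /api/create. Returns (HTTP status, JSON-style response body)."""
    api_key = body.get("api_key")
    if not isinstance(api_key, str) or api_key not in valid_keys:
        return 401, {"status": 401, "message": "Invalid or missing API key"}

    long_url = body.get("long_url")
    parsed = urlparse(long_url) if isinstance(long_url, str) else None
    if parsed is None or parsed.scheme not in ("http", "https") or not parsed.netloc:
        return 400, {"status": 400, "message": "Invalid parameter"}

    custom = body.get("custom_url")
    if custom is not None:
        # Assumption: aliases are 1-16 characters drawn from the Base62 alphabet.
        if (not isinstance(custom, str) or not 1 <= len(custom) <= MAX_CUSTOM_LEN
                or any(ch not in ALPHABET for ch in custom)):
            return 400, {"status": 400, "message": "Invalid parameter"}
        if custom in store:
            return 400, {"status": 400, "message": "custom_url already taken"}
        code = custom
    else:
        code = _random_code()
        while code in store:          # regenerate on the rare collision
            code = _random_code()

    store[code] = long_url
    return 200, {"status": 200, "data": {"short_url": BASE_URL + code}}
```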
2. Redirect to Long URL
GET /{short_url}
This endpoint redirects a short URL to the original long URL. It responds with HTTP 302 (a temporary redirect) rather than 301 (permanent). Browsers cache 301 responses, so using 302 ensures every visit reaches the service and can be counted for analytics.
Parameters
short_url: Required. The short URL to be resolved to the original long URL.
Response
HTTP 302 Redirect: The response carries a 302 status code and a Location header pointing to the original long URL.
Error Response
If the short_url is malformed, the response is a 400 error like the one below. If it does not exist, the service returns 404 Not Found.
{
  "status": 400,
  "message": "Invalid URL"
}
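Similarly, the redirect endpoint can be sketched as a function that maps a short code to a status code, headers, and an optional JSON body. The store is again a plain dict. Treating anything other than 1-16 alphanumeric characters as malformed (400) is an assumption.

```python
import string
from typing import Dict, Optional, Tuple

VALID_CHARS = frozenset(string.ascii_letters + string.digits)


def redirect(short_url: str, store: Dict[str, str]) -> Tuple[int, Dict[str, str], Optional[dict]]:
    """Sketch of GET /{short_url}. Returns (HTTP status, headers, JSON body or None)."""
    # Assumption: valid short codes are 1-16 Base62 characters.
    if not 1 <= len(short_url) <= 16 or not set(short_url) <= VALID_CHARS:
        return 400, {}, {"status": 400, "message": "Invalid URL"}
    long_url = store.get(short_url)
    if long_url is None:
        return 404, {}, {"status": 404, "message": "Not Found"}
    # 302 is a temporary redirect: browsers do not cache it by default,
    # so every visit reaches the service and can be counted.
    return 302, {"Location": long_url}, None
```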
Status Codes
200 OK: The request succeeded.
302 Found: Temporary redirect to the original long URL.
400 Bad Request: Missing or invalid parameters in the request.
401 Unauthorized: Invalid or missing API key.
404 Not Found: The short URL does not exist.
500 Internal Server Error: Unexpected error on the server.
Database Schema
1. Users Table
This table stores information related to the users of the service.
| Column Name | Data Type | Description |
|---|---|---|
| id | UUID | Primary key. Unique for each user. |
| name | VARCHAR(255) | Name of the user. |
| email | VARCHAR(255) | Email ID of the user. |
| password | VARCHAR(255) | Hashed user password. |
| token | VARCHAR(255) | Latest JWT issued to the user. |
| created_at | TIMESTAMP | Date and time when the user registered. |
| updated_at | TIMESTAMP | Date and time when the user's details were last changed. |
Table Definition (SQL)
CREATE TABLE users (
    id          UUID PRIMARY KEY,                     -- Unique identifier for each user
    name        VARCHAR(255) NOT NULL,                -- Name of the user
    email       VARCHAR(255) NOT NULL UNIQUE,         -- Email ID, unique for each user
    password    VARCHAR(255) NOT NULL,                -- Hashed password, never plaintext
    token       VARCHAR(255),                         -- Latest JWT issued to the user
    created_at  TIMESTAMP DEFAULT CURRENT_TIMESTAMP,  -- Date of user registration
    updated_at  TIMESTAMP DEFAULT CURRENT_TIMESTAMP   -- Date of last update
);
2. ShortLinks Table
This table stores information about the shortened URLs created by users.
| Column Name | Data Type | Description |
|---|---|---|
| short_url | VARCHAR(16) | Primary key. Unique short URL: 6-7 characters when system-generated, up to 16 for custom aliases. |
| original_url | TEXT | The original long URL that was shortened. |
| user_id | UUID | References id in the users table, linking the short URL to its creator. |
| created_at | TIMESTAMP | Date and time when the short URL was created. |
| updated_at | TIMESTAMP | Date and time when the short URL was last updated. |
Table Definition (SQL)
CREATE TABLE short_links (
    short_url     VARCHAR(16) PRIMARY KEY,              -- Generated (6-7 chars) or custom (up to 16 chars)
    original_url  TEXT NOT NULL,                        -- The original long URL
    user_id       UUID REFERENCES users(id),            -- User who created the short URL
    created_at    TIMESTAMP DEFAULT CURRENT_TIMESTAMP,  -- Date of short URL creation
    updated_at    TIMESTAMP DEFAULT CURRENT_TIMESTAMP   -- Date of last update
);
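To sanity-check the two table definitions, the Python sketch below loads them into an in-memory SQLite database with the standard sqlite3 module and performs the primary-key lookup that a redirect needs. SQLite is only a stand-in for PostgreSQL here. It accepts the UUID, VARCHAR, and TIMESTAMP type names without enforcing them, and it checks foreign keys only after PRAGMA foreign_keys = ON. The DDL in the sketch uses VARCHAR(16) for short_url, so custom aliases fit, and points user_id at users(id). The helper names are hypothetical.

```python
import sqlite3
import uuid
from typing import Optional

SCHEMA = """
CREATE TABLE users (
    id          UUID PRIMARY KEY,
    name        VARCHAR(255) NOT NULL,
    email       VARCHAR(255) NOT NULL UNIQUE,
    password    VARCHAR(255) NOT NULL,
    token       VARCHAR(255),
    created_at  TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at  TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE short_links (
    short_url     VARCHAR(16) PRIMARY KEY,
    original_url  TEXT NOT NULL,
    user_id       UUID REFERENCES users(id),
    created_at    TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at    TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with both tables and foreign-key checks enabled."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce REFERENCES by default
    conn.executescript(SCHEMA)
    return conn


def lookup(conn: sqlite3.Connection, short_url: str) -> Optional[str]:
    """Primary-key lookup used by the redirect path; returns None if absent."""
    row = conn.execute(
        "SELECT original_url FROM short_links WHERE short_url = ?", (short_url,)
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = open_db()
    user_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO users (id, name, email, password) VALUES (?, ?, ?, ?)",
        (user_id, "Alice", "alice@example.com", "<hashed password>"),
    )
    conn.execute(
        "INSERT INTO short_links (short_url, original_url, user_id) VALUES (?, ?, ?)",
        ("aZ3kP9q", "https://example.com/some/long/path", user_id),
    )
    print(lookup(conn, "aZ3kP9q"))
```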
URL Encoding Technique
To convert a long URL into a unique short URL, we can either encode a number in Base62 or hash the long URL with MD5 and encode part of the hash. Both methods have their own advantages and limitations.
Encoding in Base62
Base62 encoding uses an alphabet of letters and digits:
Uppercase Letters (A-Z): 26 characters
Lowercase Letters (a-z): 26 characters
Numbers (0-9): 10 characters
That gives 62 characters in total, so a seven-character Base62 short URL can take 62^7 = 3,521,614,606,208 (about 3.5 trillion) distinct values. This is comfortably more than the ~120 billion URLs estimated for 100 years.
This makes Base62 well suited to short URLs. By comparison, Base10 encoding uses only the digits 0-9 and offers just 10^7 = 10 million seven-character values. For a given long URL, we can generate a random number, convert it to Base62, and use the result as the short URL ID.
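Here is a minimal Python sketch of the Base62 approach. It encodes a non-negative integer with the A-Z, a-z, 0-9 alphabet listed above, decodes it back, and generates a random seven-character ID. The function names are illustrative. Padding shorter encodings with the zero digit (A) to a fixed length is an assumption and not part of the original design.

```python
import secrets

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
BASE = len(ALPHABET)  # 62


def encode_base62(num: int) -> str:
    """Convert a non-negative integer to a Base62 string (most significant digit first)."""
    if num < 0:
        raise ValueError("num must be non-negative")
    if num == 0:
        return ALPHABET[0]
    digits = []
    while num > 0:
        num, rem = divmod(num, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def decode_base62(code: str) -> int:
    """Inverse of encode_base62; leading 'A' characters (zeros) are harmless."""
    num = 0
    for ch in code:
        num = num * BASE + ALPHABET.index(ch)
    return num


def random_short_id(length: int = 7) -> str:
    """Random number in [0, 62**length), encoded and left-padded to `length` chars."""
    num = secrets.randbelow(BASE ** length)
    return encode_base62(num).rjust(length, ALPHABET[0])
```

For example, encode_base62(62) returns "BA", and the largest seven-character value, 62^7 - 1, encodes to "9999999". A random ID still has to be checked against the database before use.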
MD5 Hashing
MD5 produces a 128-bit hash, far more than a seven-character short URL needs. Seven Base62 characters cover 62^7 ≈ 2^41.7 values, so the first 41 bits of the hash (2^41 ≈ 2.2 trillion values, fewer than 62^7) always fit in a seven-character short URL. MD5 offers a wide hash space, but it has drawbacks:
Collisions: Because only part of the hash is kept, different long URLs can map to the same short URL. Left undetected, this causes incorrect redirects and overwritten mappings.
Before assigning a generated ID to a new long URL, we must verify that it is unique by checking the database for an existing short URL with the same ID.
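The MD5 variant can be sketched with Python's hashlib: hash the long URL, keep the first 41 bits, and encode them as exactly seven Base62 characters. On a collision with a different long URL, the sketch appends a counter to the input and rehashes. That retry scheme, the helper names, and the in-memory dict standing in for the database are assumptions. Because MD5 is deterministic, the same long URL always yields the same code, so the sketch reuses an existing mapping instead of creating a duplicate.

```python
import hashlib
from typing import Dict

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
CODE_LENGTH = 7
HASH_BITS = 41   # 2**41 < 62**7, so 41 bits always fit in 7 Base62 characters


def md5_short_code(data: str) -> str:
    digest = hashlib.md5(data.encode("utf-8")).digest()          # 16 bytes = 128 bits
    value = int.from_bytes(digest, "big") >> (128 - HASH_BITS)   # keep the top 41 bits
    chars = []
    for _ in range(CODE_LENGTH):                                 # fixed-width Base62
        value, rem = divmod(value, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def assign_code(long_url: str, store: Dict[str, str], max_attempts: int = 5) -> str:
    """Return a short code for long_url, rehashing with a suffix on collisions."""
    for attempt in range(max_attempts):
        data = long_url if attempt == 0 else f"{long_url}#{attempt}"
        code = md5_short_code(data)
        existing = store.get(code)
        if existing is None:            # free slot: claim it
            store[code] = long_url
            return code
        if existing == long_url:        # same URL already shortened: reuse its code
            return code
    raise RuntimeError("no free short code after several attempts")
```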
High-Level Design (HLD)
Architecture Overview
The URL shortening service follows a microservice-based architecture. Key components include:
API Gateway: Acts as an entry point for users to interact with the system. It routes requests to the appropriate service (e.g., short URL creation, redirection).
URL Shortening Service: Responsible for generating and storing the short URLs. It processes shortening requests, generates short URLs, and stores the mappings in a database.
Redirection Service: Handles redirection when a user accesses a short URL, performing a lookup to find the corresponding long URL and redirecting the user.
Metrics Service: Gathers analytics about the short URLs (e.g., number of redirects, most popular links).
Database: Stores user data, short URL mappings, and analytics. A scalable database system (a relational database such as PostgreSQL, or a NoSQL store) is required to handle growing data.
Cache Layer: To reduce latency and database load, frequently accessed URLs are cached using systems like Redis or Memcached.
Components Breakdown
User Management: Handles user authentication, API keys, and access control.
Short URL Management: Handles creating and retrieving short URLs.
Analytics: Collects and stores user interaction data (e.g., redirection counts, geographic data).
Expiration Management: Manages link expiration based on user preferences.
Deep Dive into HLD
Short URL Creation Process
Request: The user sends a POST request with the long URL to be shortened.
ID Generation: The system generates a unique identifier for the URL using either Base62 encoding or an MD5 hash.
Base62 provides a compact representation of numbers with a large space of unique keys.
MD5 produces a much larger 128-bit hash that must be truncated to fit, and the truncated value still requires collision checks.
Database Insertion: The generated short URL is stored in the database along with the long URL.
Response: The system responds with the generated short URL.
Redirection Process
Request: The user accesses a short URL.
URL Lookup: The system looks up the corresponding long URL, checking the cache before the database.
Redirect: If found, the user is redirected to the long URL with a 302 HTTP status code, so every visit can be tracked.
Caching: Frequently accessed short URLs are cached to reduce database load and speed up response time.
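Together, the lookup and caching steps form a cache-aside (lazy loading) pattern: read from the cache, fall back to the database on a miss, then store the result for later requests. The Python sketch below uses a plain dict as the cache and any callable as the database lookup. Both are stand-ins, and a real cache would be bounded and shared (for example, Redis).

```python
from typing import Callable, Dict, Optional


class RedirectResolver:
    """Cache-aside lookup: cache first, database on a miss, then populate the cache."""

    def __init__(self, db_lookup: Callable[[str], Optional[str]]) -> None:
        self.db_lookup = db_lookup
        self.cache: Dict[str, str] = {}
        self.db_hits = 0                  # counts database round trips

    def resolve(self, short_code: str) -> Optional[str]:
        long_url = self.cache.get(short_code)
        if long_url is not None:
            return long_url               # cache hit: no database access
        self.db_hits += 1
        long_url = self.db_lookup(short_code)
        if long_url is not None:
            self.cache[short_code] = long_url   # populate for subsequent requests
        return long_url
```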
Analytics and Metrics
Collect data on how often a short URL is accessed (e.g., number of redirects, geographical information).
Provide an API to allow users to retrieve analytics on their short URLs.
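At its simplest, redirect counting and a most-popular-links query can be sketched with collections.Counter. The class below is hypothetical and purely in-memory. A production metrics service would more likely persist events and record extra dimensions such as geography.

```python
from collections import Counter
from typing import List, Tuple


class ClickStats:
    """In-memory redirect counter (illustrative only)."""

    def __init__(self) -> None:
        self.redirects: Counter = Counter()

    def record(self, short_code: str) -> None:
        self.redirects[short_code] += 1          # one redirect served

    def count(self, short_code: str) -> int:
        return self.redirects[short_code]        # 0 for codes never seen

    def top(self, n: int) -> List[Tuple[str, int]]:
        return self.redirects.most_common(n)     # most frequently visited links
```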
Scalability
Use horizontal scaling to distribute the load across multiple servers.
Cache hot URLs to reduce database lookups and improve performance.
Implement load balancing to distribute traffic across servers.
Trade-offs in Design
URL Shortening Method (Base62 vs MD5)
Base62:
Pros: Compact, human-readable, and straightforward; 62 possible characters per position give a large key space.
Cons: When IDs are generated randomly, collisions become more likely as the number of stored URLs grows, so each new ID needs a database check for duplicates.
MD5:
Pros: The full 128-bit hash has a very large output space, so collisions are rare before truncation.
Cons: The hash must be truncated to fit the 7-character limit, which reintroduces collisions, and hashing is more computationally expensive than encoding a number.
Database Choice (SQL vs NoSQL)
SQL (e.g., PostgreSQL):
Pros: Strong consistency, easy to manage relationships between users and short URLs.
Cons: Can be harder to scale horizontally for very large datasets.
NoSQL (e.g., Cassandra, MongoDB):
Pros: Better suited for horizontal scaling and high write throughput.
Cons: Weaker consistency guarantees (depending on the configuration), and support for complex queries may be limited.
Caching Strategy
In-memory Cache (e.g., Redis):
Pros: Extremely fast, reduces database load, ideal for frequently accessed URLs.
Cons: Cache consistency issues (e.g., cache invalidation), and memory costs can be high for large datasets.
Database-Backed Cache:
Pros: Easier to maintain consistency, no need for complex cache invalidation strategies.
Cons: Slower than in-memory cache, increases load on the database.
Link Expiry
User-defined Expiry:
Pros: Allows flexibility for the users.
Cons: Adds complexity to the system as you need to track expiration dates.
Default Expiry:
Pros: Simplifies the system.
Cons: Less flexibility for users.
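The two expiry policies can coexist. A user-supplied date wins; otherwise an optional default TTL applies. With no default, the link never expires, matching the lifetime-retention requirement. The Python sketch below encodes that precedence. The function names and the None-means-never convention are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def effective_expiry(created_at: datetime,
                     user_expiry: Optional[datetime],
                     default_ttl: Optional[timedelta] = None) -> Optional[datetime]:
    """User-defined expiry takes precedence, then the default TTL; None means never."""
    if user_expiry is not None:
        return user_expiry
    if default_ttl is not None:
        return created_at + default_ttl
    return None


def is_expired(expires_at: Optional[datetime], now: datetime) -> bool:
    """A link with no expiry date never expires."""
    return expires_at is not None and now >= expires_at


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(effective_expiry(now, None, timedelta(days=30)))
```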
Scaling and High Availability
Vertical Scaling:
Pros: Easier to implement in the short term, fewer infrastructure changes needed.
Cons: A single server can only be scaled so far.
Horizontal Scaling:
Pros: Better long-term solution, allows the system to handle huge traffic volumes.
Cons: More complex, requires a load balancer, and systems to sync data across nodes.
Potential Heading Changes
URL Encoding Technique → URL Generation Mechanism: To make it more intuitive since we are discussing both Base62 and MD5.
Database Schema → Data Model Design: This can clarify that it refers to both relational and non-relational data models, depending on the technology you choose.
Metrics Gathering → Analytics and Reporting: A broader term that encompasses both gathering metrics and providing reports/insights.