URL Shortener System Design
Design a scalable URL shortener like Bitly or TinyURL with detailed architecture, API design, database schema, caching strategy, and scalability considerations.
What is a URL Shortener?
A URL shortener converts long URLs into short, shareable links. Services like Bitly and TinyURL are popular examples used by millions daily.
Example:
Long URL: https://example.com/very/long/article/path/with/many/parameters?utm_source=twitter&utm_campaign=2024
Short URL: https://cwvenu.in/a1b2c3
Requirements Analysis
Functional Requirements
- URL Shortening: Generate a unique short code for any given long URL
- URL Redirection: Redirect users from short URL to original long URL
- Custom Aliases: Allow users to create custom short codes (e.g., cwvenu.in/mylink)
- Analytics: Track clicks, geographic location, referrers, and timestamps
- Expiration: Support URL expiration after a certain time period
- User Accounts: Optional user registration to manage their URLs
- API Access: Provide REST API for programmatic access
Non-Functional Requirements
- High Availability: 99.99% uptime
- Low Latency: Redirection should happen in < 100ms
- Scalability: Handle millions of URLs and billions of redirects
- Durability: URLs should never be lost
- Security: Prevent abuse, spam, and malicious URLs
Capacity Estimation
Assumptions:
- 100 million new URLs per month
- 100:1 read-to-write ratio (10 billion redirects per month)
- Average URL size: 500 bytes
- Store URLs for 5 years
Storage:
- URLs per month: 100M
- URLs in 5 years: 100M × 12 × 5 = 6 billion URLs
- Storage needed: 6B × 500 bytes = 3 TB
Traffic (requests per second):
- Write: 100M URLs/month = ~40 URLs/second
- Read: 10B redirects/month = ~4,000 redirects/second
- Peak traffic: 3-5x average = 12,000-20,000 redirects/second
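The estimates above can be reproduced with a few lines of arithmetic. This is a minimal sketch: the class name `CapacityEstimate` and the 30-day month are assumptions made for illustration, while the input figures come from the assumptions listed above.

```java
/** Back-of-the-envelope capacity math using the assumptions listed above. */
final class CapacityEstimate {
    static final long NEW_URLS_PER_MONTH = 100_000_000L;
    static final long READS_PER_WRITE = 100L;
    static final long BYTES_PER_URL = 500L;
    static final long RETENTION_MONTHS = 12L * 5;
    // Assumption for this sketch: a 30-day month.
    static final long SECONDS_PER_MONTH = 30L * 24 * 60 * 60;

    /** Total URLs stored over the retention period. */
    static long totalUrls() {
        return NEW_URLS_PER_MONTH * RETENTION_MONTHS;
    }

    /** Raw storage for the URL records, ignoring indexes and replication. */
    static long storageBytes() {
        return totalUrls() * BYTES_PER_URL;
    }

    /** Average write rate in URLs per second. */
    static double writesPerSecond() {
        return (double) NEW_URLS_PER_MONTH / SECONDS_PER_MONTH;
    }

    /** Average read rate in redirects per second. */
    static double readsPerSecond() {
        return writesPerSecond() * READS_PER_WRITE;
    }

    public static void main(String[] args) {
        System.out.println("Total URLs: " + totalUrls());
        System.out.println("Storage (bytes): " + storageBytes());
        System.out.println("Writes/second: " + writesPerSecond());
        System.out.println("Reads/second: " + readsPerSecond());
    }
}
```

The exact averages come out slightly under the rounded figures (about 39 writes and 3,858 reads per second), which is why the text uses ~40 and ~4,000.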
API Design
1. Create Short URL
POST /api/v1/shorten
Content-Type: application/json
{
"longUrl": "https://example.com/very/long/url",
"customAlias": "mylink", // optional
"expiresAt": "2024-12-31T23:59:59Z" // optional
}
Response:
{
"shortUrl": "https://cwvenu.in/a1b2c3",
"shortCode": "a1b2c3",
"longUrl": "https://example.com/very/long/url",
"createdAt": "2024-01-15T10:30:00Z",
"expiresAt": "2024-12-31T23:59:59Z"
}
2. Redirect Short URL
GET /{shortCode}
Response: 302 Found (a temporary redirect, so each click reaches the server and can be counted)
Location: https://example.com/very/long/url
3. Get URL Analytics
GET /api/v1/analytics/{shortCode}
Response:
{
"shortCode": "a1b2c3",
"totalClicks": 15420,
"clicksByDate": [...],
"clicksByCountry": {...},
"topReferrers": [...]
}
4. Delete URL
DELETE /api/v1/urls/{shortCode}
Response: 204 No Content
Database Schema
URLs Table
CREATE TABLE urls (
    id BIGSERIAL PRIMARY KEY,
    short_code VARCHAR(10) UNIQUE NOT NULL, -- UNIQUE already creates an index on short_code
    long_url TEXT NOT NULL,
    user_id BIGINT,
    created_at TIMESTAMP NOT NULL DEFAULT NOW(),
    expires_at TIMESTAMP,
    is_active BOOLEAN DEFAULT TRUE
);
-- PostgreSQL has no inline INDEX clause, so secondary indexes are created separately
CREATE INDEX idx_user_id ON urls (user_id);
CREATE INDEX idx_created_at ON urls (created_at);
Analytics Table
CREATE TABLE url_analytics (
    id BIGSERIAL PRIMARY KEY,
    short_code VARCHAR(10) NOT NULL,
    clicked_at TIMESTAMP NOT NULL DEFAULT NOW(),
    ip_address VARCHAR(45),
    country VARCHAR(2),
    city VARCHAR(100),
    referrer TEXT,
    user_agent TEXT
);
-- PostgreSQL has no inline INDEX clause, so indexes are created separately
CREATE INDEX idx_short_code_clicked ON url_analytics (short_code, clicked_at);
CREATE INDEX idx_clicked_at ON url_analytics (clicked_at);
Users Table (Optional)
CREATE TABLE users (
    id BIGSERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    api_key VARCHAR(64) UNIQUE,
    created_at TIMESTAMP NOT NULL DEFAULT NOW()
);
-- The UNIQUE constraints on email and api_key already create indexes in PostgreSQL,
-- so no separate CREATE INDEX statements are needed
Short Code Generation
Approach 1: Base62 Encoding
Use Base62 (a-z, A-Z, 0-9) to encode a unique ID:
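public class Base62Encoder {
    private static final String BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Encodes a non-negative ID; 0 maps to "0" rather than an empty string
    public static String encode(long num) {
        if (num < 0) {
            throw new IllegalArgumentException("num must be non-negative");
        }
        if (num == 0) {
            return "0";
        }
        StringBuilder sb = new StringBuilder();
        while (num > 0) {
            sb.append(BASE62.charAt((int) (num % 62)));
            num /= 62;
        }
        // Digits were appended least-significant first
        return sb.reverse().toString();
    }

    // Decodes a Base62 string back to its ID, rejecting characters outside the alphabet
    public static long decode(String str) {
        long num = 0;
        for (char c : str.toCharArray()) {
            int digit = BASE62.indexOf(c);
            if (digit < 0) {
                throw new IllegalArgumentException("Invalid Base62 character: " + c);
            }
            num = num * 62 + digit;
        }
        return num;
    }
}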
Pros:
- Compact, predictable length (7 characters cover 62^7 ≈ 3.5 trillion IDs)
- No collisions if using sequential IDs
Cons:
- Sequential IDs can be guessed
- Requires distributed ID generation
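One common way to meet the distributed ID requirement is a Snowflake-style generator, where each server builds IDs from a timestamp, its own worker number, and a per-millisecond sequence. The sketch below is illustrative only: the class name, the 41/10/12 bit split, and the custom epoch are assumptions, not part of the design above.

```java
/** Snowflake-style ID generator sketch: timestamp | worker id | sequence. */
final class SnowflakeIdGenerator {
    // Assumption: custom epoch of 2024-01-01T00:00:00Z, in milliseconds.
    private static final long CUSTOM_EPOCH_MS = 1_704_067_200_000L;
    private static final int WORKER_BITS = 10;    // up to 1,024 workers
    private static final int SEQUENCE_BITS = 12;  // up to 4,096 IDs per millisecond per worker
    private static final long MAX_WORKER_ID = (1L << WORKER_BITS) - 1;
    private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;

    private final long workerId;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    SnowflakeIdGenerator(long workerId) {
        if (workerId < 0 || workerId > MAX_WORKER_ID) {
            throw new IllegalArgumentException("workerId out of range: " + workerId);
        }
        this.workerId = workerId;
    }

    /** Returns an ID that is unique across workers and increasing on this worker. */
    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastTimestamp) {
            // A real service needs a policy for clock rollback; this sketch just refuses.
            throw new IllegalStateException("Clock moved backwards");
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                // Sequence exhausted for this millisecond: wait for the next one.
                while ((now = System.currentTimeMillis()) <= lastTimestamp) {
                    Thread.onSpinWait();
                }
            }
        } else {
            sequence = 0L;
        }
        lastTimestamp = now;
        return ((now - CUSTOM_EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }
}
```

IDs this wide encode to roughly 10 or 11 Base62 characters, noticeably longer than the 6-character codes in the earlier examples, and they remain time-ordered and therefore partly guessable. A shorter code needs a smaller ID space, for example a central counter handing out ranges to each server.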
Approach 2: Random Generation with Collision Check
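// requires: import java.security.SecureRandom;
private static final String CHARACTERS = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
private static final int CODE_LENGTH = 7;
private static final int MAX_ATTEMPTS = 5;
// SecureRandom makes codes hard to predict; one shared instance avoids re-seeding on every call
private static final SecureRandom RANDOM = new SecureRandom();

public String generateShortCode() {
    for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
        StringBuilder code = new StringBuilder(CODE_LENGTH);
        for (int i = 0; i < CODE_LENGTH; i++) {
            code.append(CHARACTERS.charAt(RANDOM.nextInt(CHARACTERS.length())));
        }
        String shortCode = code.toString();
        // This check can race with a concurrent insert, so the UNIQUE
        // constraint on short_code remains the final guard
        if (!urlRepository.existsByShortCode(shortCode)) {
            return shortCode;
        }
    }
    throw new RuntimeException("Failed to generate unique short code");
}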
Pros:
- Simple implementation
- Non-sequential codes
Cons:
- Potential collisions
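- Collision retries become more frequent as the key space fills up (with 7-character codes and 6 billion stored URLs, roughly 0.17% of candidates collide)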
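To put a number on the collision risk, the chance that one random candidate is already taken is simply the stored codes divided by the key space. The helper below is a rough sketch with hypothetical names; it assumes codes are drawn uniformly at random.

```java
/** Rough collision odds for uniformly random Base62 codes. */
final class CollisionOdds {
    /** Number of distinct codes of the given length: 62^length. */
    static long keySpace(int codeLength) {
        long space = 1L;
        for (int i = 0; i < codeLength; i++) {
            space = Math.multiplyExact(space, 62L); // throws on overflow instead of wrapping
        }
        return space;
    }

    /** Probability that a single random candidate is already in use. */
    static double collisionProbability(long storedCodes, int codeLength) {
        return (double) storedCodes / keySpace(codeLength);
    }

    /** Probability that every one of the given attempts collides. */
    static double allAttemptsCollide(long storedCodes, int codeLength, int attempts) {
        return Math.pow(collisionProbability(storedCodes, codeLength), attempts);
    }
}
```

With the 6 billion URLs from the capacity estimate and 7-character codes, `collisionProbability(6_000_000_000L, 7)` is about 0.0017, so nearly every request succeeds on its first attempt and five consecutive collisions are vanishingly unlikely.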
Approach 3: Pre-generated Keys (Recommended for Scale)
Use a separate key generation service that pre-generates and stores unused keys:
Key Generation Service → Key Database (unused keys)
                                  ↓
                         URL Service (consumes keys)
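The flow above can be sketched with an in-memory stand-in for the key database. Everything here is hypothetical (`KeyPool`, `refill`, `takeBatch`); a real service would persist the keys and mark a batch as used in a transaction so that two URL servers never receive the same key.

```java
import java.security.SecureRandom;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

/** In-memory sketch of a key generation service with a pool of unused keys. */
final class KeyPool {
    private static final String ALPHABET =
            "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    private final SecureRandom random = new SecureRandom();
    private final Set<String> generated = new HashSet<>();   // every key ever produced
    private final Queue<String> unused = new ArrayDeque<>(); // keys not yet handed out
    private final int codeLength;

    KeyPool(int codeLength) {
        this.codeLength = codeLength;
    }

    /** Generates new unique keys ahead of time, off the request path. */
    synchronized void refill(int count) {
        int added = 0;
        while (added < count) {
            String key = randomKey();
            if (generated.add(key)) { // skip the rare duplicate
                unused.add(key);
                added++;
            }
        }
    }

    /** Hands a batch of keys to one URL server; each key is returned only once. */
    synchronized List<String> takeBatch(int size) {
        List<String> batch = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            String key = unused.poll();
            if (key == null) {
                break; // pool is empty; caller gets a short batch
            }
            batch.add(key);
        }
        return batch;
    }

    private String randomKey() {
        StringBuilder sb = new StringBuilder(codeLength);
        for (int i = 0; i < codeLength; i++) {
            sb.append(ALPHABET.charAt(random.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }
}
```

Handing out keys in batches means a URL server can assign codes from local memory, so the create path needs no collision check. The cost is that keys held by a server that crashes are lost, which is usually acceptable given the size of the key space.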
High-Level Architecture
┌─────────────┐
│   Client    │
└──────┬──────┘
       │
       ↓
┌─────────────────┐
│  Load Balancer  │
└──────┬──────────┘
       │
       ↓
┌──────────────────────────────────┐
│        API Gateway / CDN         │
└──────┬───────────────────────────┘
       │
       ↓
┌──────────────────────────────────┐
│       Application Servers        │
│     (URL Service, Analytics)     │
└──────┬───────────────────────────┘
       │
       ├─────────────┬─────────────┬─────────────┐
       ↓             ↓             ↓             ↓
 ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐
 │   Redis   │ │ Database  │ │   Queue   │ │  Object   │
 │   Cache   │ │ (Primary) │ │(Analytics)│ │  Storage  │
 └───────────┘ └─────┬─────┘ └───────────┘ └───────────┘
                     │
                     ↓
               ┌───────────┐
               │ Database  │
               │(Replicas) │
               └───────────┘
Caching Strategy
Cache Hot URLs
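import java.time.Duration;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.stereotype.Service;

@Service
public class UrlService {
    private static final Duration CACHE_TTL = Duration.ofHours(24);

    @Autowired
    private RedisTemplate<String, String> redisTemplate;
    @Autowired
    private UrlRepository urlRepository;

    // Cache-aside lookup: read from Redis first, fall back to the database on a miss
    public String getLongUrl(String shortCode) {
        String cacheKey = "url:" + shortCode;
        // Try cache first
        String longUrl = redisTemplate.opsForValue().get(cacheKey);
        if (longUrl != null) {
            return longUrl;
        }
        // Cache miss - fetch from database
        Url url = urlRepository.findByShortCode(shortCode)
                .orElseThrow(() -> new NotFoundException("URL not found"));
        // Store in cache with TTL; a URL that expires sooner than this needs a shorter TTL
        redisTemplate.opsForValue().set(cacheKey, url.getLongUrl(), CACHE_TTL);
        return url.getLongUrl();
    }
}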
Cache Eviction Policy
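- LRU (Least Recently Used): Evict the entries that have gone longest without being accessed
- TTL: Set 24-hour expiration for cached entries
- Cache Size: Size the cache to hold the hottest 20% of URLs, which by the 80/20 rule serve about 80% of redirects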
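LRU eviction itself is easy to see in a small in-process example. The sketch below uses Java's `LinkedHashMap` in access order; the class name `LruUrlCache` is an assumption. In the Redis setup described here, eviction would instead be configured on the server (for example with `maxmemory-policy allkeys-lru`) rather than implemented in application code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Minimal LRU cache sketch: evicts the least recently used entry when full. */
final class LruUrlCache extends LinkedHashMap<String, String> {
    private final int maxEntries;

    LruUrlCache(int maxEntries) {
        // accessOrder = true: iteration order runs from least to most recently accessed
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
        // Called after every put; returning true drops the least recently used entry
        return size() > maxEntries;
    }
}
```

With a capacity of 2, inserting `a` and `b`, reading `a`, then inserting `c` evicts `b`, because `a` was touched more recently. `LinkedHashMap` is not thread-safe, so this form only suits single-threaded use or access guarded by external locking.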
Scalability Considerations
1. Database Sharding
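Shard by the first hex digit of a hash of short_code (hashing spreads keys evenly; sharding on the raw Base62 code with these ranges would leave codes starting with g-z or A-Z unassigned):
Shard 0: hashes starting with [0-3]
Shard 1: hashes starting with [4-7]
Shard 2: hashes starting with [8-b]
Shard 3: hashes starting with [c-f]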
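A hash-based router might look like the following sketch. It uses CRC32 and a modulo rather than a hex-prefix table, which is an equivalent way to spread codes over a fixed number of shards; `ShardRouter` and `shardFor` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Maps a short code to a shard index using a stable hash. */
final class ShardRouter {
    private final int shardCount;

    ShardRouter(int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        this.shardCount = shardCount;
    }

    /** Returns a shard index in [0, shardCount) that is the same on every server. */
    int shardFor(String shortCode) {
        CRC32 crc = new CRC32();
        crc.update(shortCode.getBytes(StandardCharsets.UTF_8));
        // getValue() is an unsigned 32-bit value held in a long, so the modulo is never negative
        return (int) (crc.getValue() % shardCount);
    }
}
```

Plain modulo hashing remaps most keys when the shard count changes. If shards are expected to be added over time, consistent hashing or a lookup table of hash ranges keeps data movement smaller.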
2. Read Replicas
- Use read replicas for analytics queries
- Primary handles writes, replicas handle reads
- Eventual consistency is acceptable for analytics
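3. CDN for Redirects
- Cache redirect responses for popular URLs at edge locations
- Reduce latency for global users
- Absorb DDoS traffic before it reaches the origin
- Trade-off: clicks served from the edge never reach the application servers, so count them from CDN logs or keep edge TTLs short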
4. Async Analytics Processing
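import jakarta.servlet.http.HttpServletRequest; // javax.servlet.http on Spring Boot 2.x
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Service
public class RedirectService {
    @Autowired
    private KafkaTemplate<String, ClickEvent> kafkaTemplate;

    // Records the click without blocking the request; the URL lookup and 302 response are omitted here
    public void redirect(String shortCode, HttpServletRequest request) {
        // Behind a load balancer, getRemoteAddr() returns the proxy address,
        // so the client IP usually has to come from the X-Forwarded-For header
        ClickEvent event = new ClickEvent(
                shortCode,
                request.getRemoteAddr(),
                request.getHeader("User-Agent"),
                request.getHeader("Referer")
        );
        // send() returns a future without waiting for the broker acknowledgement
        kafkaTemplate.send("url-clicks", event);
    }
}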
5. Rate Limiting
Prevent abuse with rate limiting:
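import java.time.Duration;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.stereotype.Component;

@Component
public class RateLimiter {
    private static final long MAX_REQUESTS_PER_MINUTE = 100;

    @Autowired
    private RedisTemplate<String, String> redisTemplate;

    // Fixed-window counter: the first request in a window creates the key and starts its 1-minute TTL
    public boolean isAllowed(String apiKey) {
        String key = "rate:" + apiKey;
        Long count = redisTemplate.opsForValue().increment(key);
        if (count == null) {
            // increment() returns null inside a pipeline or transaction; fail closed
            return false;
        }
        if (count == 1) {
            redisTemplate.expire(key, Duration.ofMinutes(1));
        }
        return count <= MAX_REQUESTS_PER_MINUTE;
    }
}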
Security Considerations
1. Malicious URL Detection
- Integrate with Google Safe Browsing API
- Maintain blacklist of known malicious domains
- Scan URLs before shortening
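The domain blocklist check can be sketched as below. The class and method names are hypothetical, and the blocked domains used in the usage note are placeholders. Unparseable URLs are rejected outright, which is a design choice of this sketch rather than something stated above.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

/** Checks a long URL's host against a set of blocked domains, including subdomains. */
final class DomainBlocklist {
    private final Set<String> blockedDomains; // expected in lower case

    DomainBlocklist(Set<String> blockedDomains) {
        this.blockedDomains = Set.copyOf(blockedDomains);
    }

    boolean isBlocked(String longUrl) {
        String host;
        try {
            host = URI.create(longUrl).getHost();
        } catch (IllegalArgumentException e) {
            return true; // not a valid URI: refuse to shorten it
        }
        if (host == null) {
            return true; // no host component (e.g. a relative or opaque URI)
        }
        host = host.toLowerCase(Locale.ROOT);
        // Walk up the labels: cdn.bad.example -> bad.example -> example
        while (true) {
            if (blockedDomains.contains(host)) {
                return true;
            }
            int dot = host.indexOf('.');
            if (dot < 0) {
                return false;
            }
            host = host.substring(dot + 1);
        }
    }
}
```

For example, with `malware.example` on the list, both `https://malware.example/x` and `https://cdn.malware.example/x` are blocked, while `https://notmalware.example/` is allowed because matching is done on whole labels rather than string suffixes.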
2. Prevent Abuse
- Require CAPTCHA for anonymous users
- Rate limit by IP address
- Require API keys for programmatic access
3. HTTPS Only
- Enforce HTTPS for all short URLs
- Prevent man-in-the-middle attacks
Trade-offs and Design Decisions
Base62 vs Random Generation
| Aspect | Base62 | Random |
|--------|--------|--------|
| Collision Risk | None | Low but exists |
| Predictability | High | Low |
| Performance | Fast | Slower (collision check) |
| Scalability | Requires distributed ID | Simpler |
Decision: Of these two, use Base62 with distributed ID generation for production scale.
SQL vs NoSQL
| Aspect | SQL (PostgreSQL) | NoSQL (Cassandra) |
|--------|------------------|-------------------|
| ACID | Full support | Limited |
| Scalability | Vertical + Sharding | Horizontal |
| Queries | Complex queries | Simple lookups |
| Consistency | Strong | Eventual |
Decision: Use PostgreSQL for URLs (need ACID), Cassandra for analytics (high write volume).
Sync vs Async Analytics
Decision: Use async processing with Kafka to avoid impacting redirect latency.
Interview Questions
Q1: How would you handle 1 million requests per second?
Answer:
- Use CDN to cache popular URLs at edge locations
- Implement Redis cluster for distributed caching
- Shard database across multiple servers
- Use read replicas for analytics queries
- Implement rate limiting and DDoS protection
- Use async processing for analytics
Q2: How do you prevent short code collisions?
Answer:
- Use Base62 encoding with distributed unique ID generation (Snowflake)
- If using random generation, check database before inserting
- Use database unique constraint as final safety net
- Pre-generate keys in a separate service
Q3: How would you implement custom aliases?
Answer:
- Check if custom alias is available
- Validate alias (length, characters, not reserved words)
- Store in the same urls table, adding an is_custom flag column
- Charge premium for custom aliases
- Prevent offensive or trademarked aliases
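The validation step might be sketched as follows. The allowed characters, the 3-character minimum, and the reserved words are assumptions for illustration; the 10-character maximum matches the VARCHAR(10) short_code column in the schema. Availability would still be checked against the database afterwards.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

/** Validates the format of a custom alias before checking availability. */
final class AliasValidator {
    // Assumption: letters, digits, hyphen and underscore, 3 to 10 characters
    private static final Pattern ALLOWED = Pattern.compile("[A-Za-z0-9_-]{3,10}");
    // Assumption: paths the service itself needs, compared case-insensitively
    private static final Set<String> RESERVED = Set.of("api", "admin", "login", "static");

    static boolean isValid(String alias) {
        return alias != null
                && ALLOWED.matcher(alias).matches()
                && !RESERVED.contains(alias.toLowerCase(Locale.ROOT));
    }
}
```

Here `isValid("mylink")` passes, while `"api"`, `"ab"`, and `"has space"` are rejected. Reserving `api` matters because the API lives under `/api/v1/...` on the same host as the short codes.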
Q4: How do you handle URL expiration?
Answer:
- Store expires_at timestamp in database
- Run periodic cleanup job to mark expired URLs as inactive
- Check expiration during redirect
- Return 404 Not Found (or 410 Gone, which states that the link existed but was removed) for expired URLs
- Optionally notify users before expiration
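The check performed during redirect can be as small as the sketch below. The names are hypothetical; the parameters mirror the is_active and expires_at columns of the urls table, and a null expires_at is treated as "never expires".

```java
import java.time.Instant;

/** Decides the HTTP status for a redirect request based on expiry and the active flag. */
final class ExpirationPolicy {
    static final int FOUND = 302;
    static final int NOT_FOUND = 404;

    /** A URL with no expires_at never expires; otherwise it expires at that instant. */
    static boolean isExpired(Instant expiresAt, Instant now) {
        return expiresAt != null && !now.isBefore(expiresAt);
    }

    /** Returns 302 for a live URL and 404 for an inactive or expired one. */
    static int redirectStatus(boolean isActive, Instant expiresAt, Instant now) {
        return (isActive && !isExpired(expiresAt, now)) ? FOUND : NOT_FOUND;
    }
}
```

Checking at redirect time keeps the behavior correct even when the cleanup job has not yet run, so the job becomes a storage optimization rather than a correctness requirement.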
Q5: How would you implement analytics without impacting redirect performance?
Answer:
- Send click events to message queue (Kafka)
- Process events asynchronously in background workers
- Store aggregated data in separate analytics database
- Use batch processing for historical analytics
- Cache frequently accessed analytics data
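As an illustration of the background aggregation step, the sketch below rolls raw click events up into per-day counts, the shape a clicksByDate field could be served from. The `Click` type and method names are assumptions, and days are bucketed in UTC.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Rolls raw click events up into per-day counts for one short code. */
final class ClickAggregator {
    /** Minimal click event: which code was clicked, and when. */
    static final class Click {
        final String shortCode;
        final Instant clickedAt;

        Click(String shortCode, Instant clickedAt) {
            this.shortCode = shortCode;
            this.clickedAt = clickedAt;
        }
    }

    /** Returns clicks per UTC day, sorted by date, for the given short code. */
    static Map<LocalDate, Long> clicksByDate(List<Click> clicks, String shortCode) {
        Map<LocalDate, Long> counts = new TreeMap<>();
        for (Click click : clicks) {
            if (click.shortCode.equals(shortCode)) {
                LocalDate day = click.clickedAt.atZone(ZoneOffset.UTC).toLocalDate();
                counts.merge(day, 1L, Long::sum);
            }
        }
        return counts;
    }
}
```

A worker consuming from the queue could run this kind of roll-up over each batch and add the results to stored daily totals, so analytics queries read a few aggregate rows instead of scanning every click.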
Conclusion
A URL shortener seems simple but involves many interesting system design challenges:
- Unique ID generation at scale
- High-throughput read operations
- Low-latency redirects
- Distributed caching strategies
- Analytics processing without impacting performance
The key is to make the right trade-offs based on your specific requirements and scale.