The Challenge
I wanted to build a scheduling platform like Cal.com with three constraints:
- Budget: Under $50/month
- Performance: Sub-100ms response times globally
- Scalability: Handle 10k+ users later
Here's the architecture I chose and why it works (plus where it breaks).
System Architecture

The Stack: 4 Components, Maximum Impact
1. Cloudflare Workers + Hono.js (The Bouncer)
Why: 100k free requests/day, sub-50ms global latency, built-in DDoS protection
What it does:
- Serves static files (HTML, CSS, JS)
- Basic auth validation (JWT verification)
- Rate limiting (100 req/min per IP)
- Blocks 80% of requests from hitting expensive servers
// Lightweight auth check app.use('/api/*', async (c, next) => { const token = c.req.header('Authorization') if (token && !await verifyJWT(token)) { return c.json({error: 'Invalid token'}, 401) } await next() })
2. Express.js on AWS EC2 (The Brain)
Why: No cold starts, predictable costs, WebSocket support
Handles: Booking logic, user management, calendar integrations, email notifications
Why not Lambda? Cold starts (2-3 seconds) kill booking experience.
3. PostgreSQL on RDS (The Memory)
Why: ACID transactions prevent double-bookings, complex availability queries
-- Find available slots (try this in MongoDB!) SELECT time_slot FROM generate_series( '09:00'::time, '17:00'::time, '30 minutes' ) WHERE NOT EXISTS ( SELECT 1 FROM bookings WHERE time_slot BETWEEN start_time AND end_time );
4. Redis + Cloudflare R2 (The Cache & Storage)
- Redis: Sessions, rate limits (runs on same EC2 = $0 extra)
- R2: Profile pics, attachments (10x cheaper than S3)
How a Booking Actually Works
- User visits
/book/john
→ Workers serve static page (50ms globally) - Selects time slot → Workers check auth, forward to Express (100ms)
- Express creates booking → PostgreSQL transaction + Redis cache (200ms)
- Background job → Send emails, sync calendar (async)
Total user wait time: 300ms
Cost Breakdown: Every Dollar Justified
- EC2 t3.small: $20 (API server)
- RDS t3.micro: $16 (PostgreSQL database)
- Cloudflare R2: $3 (file storage)
- Domain: $10
- Workers, Redis, DDoS protection: FREE
Total: $49/month
Compare to alternatives:
- Vercel + PlanetScale: $80-150/month
- Firebase: $100-300/month
- AWS Lambda setup: $60-120/month
Scaling Path: No Rewrites Needed
- Phase 1 (MVP): Current setup
- Phase 2 (1k-10k users): Add read replica ($16), upgrade EC2 ($20)
- Phase 3 (10k+ users): Load balancer, multiple instances, microservices
Each phase builds on the previous—no architecture rewrites.
Where This Architecture BREAKS 💥
Let me be honest about the weaknesses:
1. Single Point of Failure
Problem: One EC2 instance goes down = entire API is down
Reality: 99.5% uptime means ~36 hours downtime/year
Mitigation: Health checks, auto-restart, but still risky for production
2. Database Becomes Bottleneck
Problem: t3.micro has ~100 connection limit
Reality: Breaks around 1000 concurrent users
Mitigation: Connection pooling helps, but you'll hit limits fast
3. No Geographic Distribution
Problem: API server in one region (say US-East)
Reality: Users in Asia get 300-500ms latency for bookings
Impact: Poor UX for global users
4. Limited Real-time Features
Problem: WebSockets don't scale across multiple instances easily
Reality: Real-time availability updates break when you scale
Impact: Users might see stale availability data
5. Vendor Lock-in Risks
Problem: Heavy dependency on Cloudflare ecosystem
Reality: If Cloudflare changes pricing/features, you're stuck
Impact: Migration complexity increases over time
6. Development Complexity
Problem: Managing edge logic vs server logic
Reality: Debugging across Workers + EC2 is harder than monolith
Impact: Slower feature development, more bugs
7. Limited Background Job Processing
Problem: Redis queue on single EC2 instance
Reality: Email sending, calendar sync can back up and fail
Impact: Users don't get confirmations, calendars out of sync
When NOT to Use This Architecture
❌ Don't use this if:
- You need 99.99% uptime SLA
- You have global users who need <100ms API responses
- You're building complex real-time collaboration features
- Your team isn't comfortable with distributed systems debugging
- You need to handle traffic spikes >10x normal load
✅ Perfect for:
- MVP with <10k users
- Budget-conscious startups
- Simple booking workflows
- Single-region users initially
- Learning system design
Key Lessons Learned
- Start simple, scale smart - This architecture buys you time to validate product-market fit
- Edge-first saves money - 80% cost savings by handling requests at the edge
- Monitoring is crucial - Single points of failure require excellent monitoring
- Plan your breaking points - Know exactly when you'll need to upgrade each component
The Bottom Line
This architecture works brilliantly for getting to market fast and cheap. It fails when you need enterprise reliability or global performance.
My recommendation: Use this to build your MVP, get your first 1000 users, and generate revenue. Then invest that revenue in a more robust architecture.
The best architecture isn't the most scalable—it's the one that gets you to profitability.