Serverless 2.0: RIP Operations
What happens when you stop managing retries and start shipping features
This is Part 2. If you haven't read Making Failure Boring Again, start there. It explains why retries shouldn't be independent and how EZThrottle coordinates failure across regions.
This article is about what you can build when Layer 7 is reliable.
Spoiler: A lot. In a weekend. Without IAM policies.
Your Servers Are Faster Than You Think
Every retry loop is wasted capacity.
# What your server is actually doing
import time
import requests

backoff = 1
while True:
    try:
        response = requests.post(api_url)
        if response.status_code == 429:
            time.sleep(backoff)  # Thread blocked. Doing nothing.
            backoff *= 2         # Still nothing.
            continue             # Try again. Maybe nothing.
        break
    except requests.Timeout:
        time.sleep(backoff)  # More nothing.
        backoff *= 2
        continue
That sleep() isn't free. That thread is allocated. That connection is open. Your server is waiting instead of working.
The math nobody does:
- 1000 requests/sec hitting rate limits
- Average 3 retries per request
- Average 2 second backoff per retry
- = 6000 thread-seconds of waiting per second
- Assume ~1 second of actual work per request: that's 6000 seconds of waiting for every 1000 seconds of working
- = Your server is ~85% waiting, ~15% working
Remove the retry burden, and suddenly your Python server handles what you thought required Go. Your Go server handles what you thought required C. Your C server handles what you thought required more servers.
EZThrottle moves retries off your CPU entirely. Your server fires the request and moves on. We handle the waiting. You handle more requests.
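Compare that to fire-and-forget. A minimal sketch of the same call using the Step builder from the examples below; api_url is the same placeholder as in the loop above, and the webhook URL is made up:

from ezthrottle import EZThrottle, Step, StepType

client = EZThrottle(api_key="your_api_key")

# Submit once. No loop, no sleep, no blocked thread.
result = (
    Step(client)
    .url(api_url)
    .method("POST")
    .type(StepType.PERFORMANCE)
    .webhooks([{"url": "https://app.com/done"}])  # Result arrives here later
    .execute()
)
# execute() returns immediately; the thread is free for the next request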
Region Racing
Send the same request to multiple regions. First response wins. Cancel the rest.
from ezthrottle import EZThrottle, Step, StepType

client = EZThrottle(api_key="your_api_key")

result = (
    Step(client)
    .url("https://api.openai.com/v1/chat/completions")
    .method("POST")
    .headers({"Authorization": "Bearer sk-..."})
    .body('{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello"}]}')
    .type(StepType.PERFORMANCE)
    .regions(["iad", "lax", "ord"])  # DC, LA, Chicago
    .execution_mode("race")          # First wins
    .webhooks([{"url": "https://app.com/webhook"}])
    .execute()
)

print(f"Job submitted: {result['job_id']}")
# Webhook arrives from whichever region responds first
Why this matters:
Lambda cold starts: 500ms-2s. Plus your retry logic. Plus your backoff. Plus praying the region is healthy.
EZThrottle region racing: Request already in flight to 3 regions before Lambda finishes booting. Fastest region wins. Others cancelled. No retries needed because one of them worked.
The fastest region is always faster than the average region. And dramatically faster than a cold region retrying.
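If you want the same "first wins, cancel the rest" semantics for anything awaitable, plain asyncio can express the idea in a few lines. This is a sketch of the semantics, not EZThrottle's implementation:

import asyncio

async def race(coros):
    """Run awaitables concurrently; return the first result, cancel the rest."""
    tasks = [asyncio.create_task(c) for c in coros]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # The slower "regions" get cancelled
    return done.pop().result()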
"But 2 RPS Is Too Slow"
I hear this a lot. Let me address it directly.
Yes, 2 requests per second per domain is conservative. That's intentional — it's safe for every API without needing custom configuration.
But here's what people miss:
You're not limited to one provider.
What are the chances that OpenAI, Anthropic, Google, and xAI are all having outages with full queues at the same time across all regions?
Basically zero. One of them, in some region, will have capacity. That's the whole point.
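Back-of-the-envelope, assuming (naively) that each provider is independently available 99% of the time. Real outages correlate, but the point stands:

# Probability all four providers are down at the same moment,
# assuming independent 99% availability each (an optimistic simplification)
p_down = 0.01
print(p_down ** 4)  # 1e-08: about one in a hundred million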
When you can race requests across providers and regions, "slow" becomes "the fastest available option at this moment."
And that's still dramatically faster than Lambda cold starts + retry storms + debugging at 3am.
Fallback Racing
What if OpenAI is rate limited? Try Anthropic. If Anthropic times out, try Google. If Google fails, try xAI.
from ezthrottle import EZThrottle, Step, StepType

client = EZThrottle(api_key="your_api_key")

# Define fallback chain: OpenAI → Anthropic → Google → xAI
xai_fallback = (
    Step()
    .url("https://api.x.ai/v1/chat/completions")
    .method("POST")
    .headers({"Authorization": "Bearer xai-..."})
    .body('{"model": "grok-1", "messages": [...]}')
)

google_fallback = (
    Step()
    .url("https://generativelanguage.googleapis.com/v1/models/gemini-pro:generateContent")
    .method("POST")
    .headers({"Authorization": "Bearer ..."})
    .body('{"contents": [...]}')
    .fallback(xai_fallback, trigger_on_error=[429, 500, 502, 503, 504])
)

anthropic_fallback = (
    Step()
    .url("https://api.anthropic.com/v1/messages")
    .method("POST")
    .headers({"x-api-key": "sk-ant-...", "anthropic-version": "2023-06-01"})
    .body('{"model": "claude-3-opus-20240229", "messages": [...]}')
    .fallback(google_fallback, trigger_on_error=[429, 500, 502, 503, 504])
)

# Primary request with full fallback chain
result = (
    Step(client)
    .url("https://api.openai.com/v1/chat/completions")
    .method("POST")
    .headers({"Authorization": "Bearer sk-..."})
    .body('{"model": "gpt-4", "messages": [...]}')
    .type(StepType.PERFORMANCE)
    .fallback(anthropic_fallback, trigger_on_error=[429, 500, 502, 503, 504])
    .webhooks([{"url": "https://app.com/inference-complete"}])
    .execute()
)
Steady stream of inference. No babysitting. No 3am pages. OpenAI having a bad day? You don't even notice — Anthropic picked it up.
Still faster than Lambda:
Even with fallback overhead, you're not:
- Waiting for cold starts
- Managing retry storms in your code
- Debugging why SQS isn't triggering
- Figuring out which IAM policy is missing
The fallback happens at the infrastructure layer. Your code just gets a webhook with the result.
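For orientation, here's the rough shape of that webhook payload, inferred from the handler example later in this article (the exact field names are the part to verify against the docs):

payload = {
    "status": "success",              # or "failure"
    "response": {"body": "..."},      # the upstream API's response
    "metadata": {"order_id": "..."},  # whatever you attached at submit time
}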
Webhook Fanout
Send results to multiple services. Get quorum.
from ezthrottle import EZThrottle, Step, StepType

client = EZThrottle(api_key="your_api_key")

result = (
    Step(client)
    .url("https://api.stripe.com/v1/charges")
    .method("POST")
    .headers({"Authorization": "Bearer sk_live_..."})
    .body('{"amount": 2000, "currency": "usd", "source": "tok_visa"}')
    .type(StepType.PERFORMANCE)
    .webhooks([
        # Primary app (must succeed)
        {"url": "https://app.com/payment-complete", "has_quorum_vote": True},
        # Analytics (nice to have)
        {"url": "https://analytics.com/track", "has_quorum_vote": False},
        # CRM update (must succeed)
        {"url": "https://crm.com/customer-charged", "has_quorum_vote": True},
        # Backup/audit log (must succeed)
        {"url": "https://audit.com/log", "has_quorum_vote": True},
        # Notification service (nice to have)
        {"url": "https://notify.com/send-receipt", "has_quorum_vote": False}
    ])
    .webhook_quorum(2)  # At least 2 quorum voters must succeed
    .execute()
)
This replaces:
SQS Queue
→ Lambda (parse message)
→ SNS Topic
→ Lambda (send to app.com)
→ Lambda (send to analytics.com)
→ Lambda (send to crm.com)
→ Lambda (send to audit.com)
→ Lambda (send to notify.com)
→ DynamoDB (track which succeeded)
→ Step Function (check quorum)
→ Dead Letter Queue (handle failures)
→ CloudWatch (debug why it's broken)
→ IAM Policies (12 of them, one is wrong, good luck)
With one API call.
Quorum replication is the foundation of Amazon's original Dynamo design. You're getting Dynamo-style quorum semantics for webhook delivery without running Dynamo.
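Conceptually, webhook_quorum(2) means: of the webhooks marked has_quorum_vote=True, at least two must be delivered for the job to count as delivered. A sketch of the check, not EZThrottle's internals:

def quorum_met(deliveries, threshold=2):
    """deliveries: dicts like {"has_quorum_vote": bool, "delivered": bool}."""
    votes = [d for d in deliveries if d["has_quorum_vote"]]
    return sum(1 for d in votes if d["delivered"]) >= threshold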
Workflows (What You Can Build in a Weekend)
Chain success and failure handlers. Build entire pipelines.
from ezthrottle import EZThrottle, Step, StepType

client = EZThrottle(api_key="your_api_key")

# Step 4: Final analytics (runs after notification)
analytics = (
    Step()
    .url("https://analytics.com/track")
    .method("POST")
    .body('{"event": "order_complete"}')
    .type(StepType.FRUGAL)  # Cheap, local execution
)

# Step 3: Send notification (runs after payment)
notification = (
    Step()
    .url("https://api.sendgrid.com/v3/mail/send")
    .method("POST")
    .headers({"Authorization": "Bearer SG..."})
    .body('{"to": "customer@email.com", "subject": "Order confirmed!"}')
    .type(StepType.PERFORMANCE)
    .regions(["iad", "lax"])
    .on_success(analytics)
)

# Step 2: Failure handler (Slack alert if payment fails)
failure_alert = (
    Step()
    .url("https://hooks.slack.com/services/xxx/yyy/zzz")
    .method("POST")
    .body('{"text": "Payment failed! Check dashboard."}')
    .type(StepType.FRUGAL)
)

# Step 1: Primary payment
result = (
    Step(client)
    .url("https://api.stripe.com/v1/charges")
    .method("POST")
    .headers({"Authorization": "Bearer sk_live_..."})
    .body('{"amount": 5000, "currency": "usd"}')
    .type(StepType.PERFORMANCE)
    .regions(["iad", "lax", "ord"])
    .on_success(notification)
    .on_failure(failure_alert)
    .webhooks([{"url": "https://app.com/order-status"}])
    .execute()
)
What this would take in AWS:
- Step Functions state machine (JSON, hundreds of lines)
- 4 Lambda functions (Node.js or Python, each needs deps)
- IAM roles for each Lambda
- SQS queues between steps
- DynamoDB for state tracking
- CloudWatch for logging
- 2-3 weeks of setup and debugging
- Ongoing maintenance forever
What it takes in EZThrottle:
- The code above
- A weekend
- Zero ongoing maintenance
Legacy Code Wrappers
You have existing code. You don't want to rewrite it. You just want it to stop failing.
from ezthrottle import EZThrottle, auto_forward, ForwardToEZThrottle
import requests

client = EZThrottle(api_key="your_api_key")

@auto_forward(client, fallback_on_error=[429, 500, 502, 503])
def call_openai(prompt):
    """
    Your existing function. No changes to the logic.
    Just add the decorator.
    """
    response = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": "Bearer sk-..."},
        json={"model": "gpt-4", "messages": [{"role": "user", "content": prompt}]}
    )
    response.raise_for_status()  # Raises on 429, 500, etc.
    return response.json()

# Call it exactly like before
result = call_openai("Write a poem about distributed systems")
# If it succeeds: you get the response
# If it hits 429/500: automatically forwarded to EZThrottle
# Webhook delivers result later
For more control, raise explicitly:
@auto_forward(client)
def process_payment(order_id, amount):
    try:
        response = requests.post(
            "https://api.stripe.com/v1/charges",
            headers={"Authorization": "Bearer sk_live_..."},
            json={"amount": amount, "currency": "usd"}
        )
        if response.status_code == 429:
            # Explicitly forward to EZThrottle with full context
            raise ForwardToEZThrottle(
                url="https://api.stripe.com/v1/charges",
                method="POST",
                headers={"Authorization": "Bearer sk_live_..."},
                body=f'{{"amount": {amount}, "currency": "usd"}}',
                idempotent_key=f"order_{order_id}",  # Prevent duplicate charges!
                webhooks=[{"url": "https://app.com/payment-complete"}],
                metadata={"order_id": order_id}
            )
        return response.json()
    except requests.Timeout:
        raise ForwardToEZThrottle(
            url="https://api.stripe.com/v1/charges",
            method="POST",
            idempotent_key=f"order_{order_id}",
            webhooks=[{"url": "https://app.com/payment-complete"}]
        )

# Your existing code keeps working
# Failed requests get handled automatically
# No rewrite required
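If you're curious what a decorator like @auto_forward amounts to, it's a small amount of machinery. A conceptual sketch; client.forward() here is a hypothetical method for illustration, not the real ezthrottle API:

import functools
import requests

def auto_forward_sketch(client, fallback_on_error=(429, 500, 502, 503)):
    """Conceptual stand-in for @auto_forward, showing the shape of the idea."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except requests.HTTPError as e:
                if e.response is not None and e.response.status_code in fallback_on_error:
                    # Hand the failed request to the queue instead of retrying locally.
                    # client.forward() is hypothetical, for illustration only.
                    return client.forward(e.response.request)
                raise
        return wrapper
    return decorator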
Onboarding is simple:
- pip install ezthrottle
- Add the @auto_forward decorator
- Deploy
- Sleep through outages
Async/Await for Webhooks
Event-driven architecture. Non-blocking. Your server never waits.
import asyncio
from ezthrottle import EZThrottle, Step, StepType

client = EZThrottle(api_key="your_api_key")

async def process_order(order):
    """Submit order processing, don't wait for completion."""
    result = (
        Step(client)
        .url("https://api.stripe.com/v1/charges")
        .method("POST")
        .body(f'{{"amount": {order["amount"]}, "currency": "usd"}}')
        .type(StepType.PERFORMANCE)
        .idempotent_key(f"order_{order['id']}")
        .webhooks([{"url": "https://app.com/order-complete"}])
        .execute()
    )
    # Returns immediately. Webhook arrives later.
    return {"order_id": order["id"], "job_id": result["job_id"]}

async def process_batch(orders):
    """Process 1000 orders concurrently. All non-blocking."""
    tasks = [process_order(order) for order in orders]
    results = await asyncio.gather(*tasks)
    print(f"Submitted {len(results)} orders")
    # Webhooks arrive as each completes
    # Your server is free to handle more requests

# Submit 1000 orders in parallel
orders = [{"id": i, "amount": 1000 + i} for i in range(1000)]
asyncio.run(process_batch(orders))
FastAPI webhook handler:
from fastapi import FastAPI, Request
from ezthrottle import verify_webhook_signature_strict

app = FastAPI()
WEBHOOK_SECRET = "your_webhook_secret"

@app.post("/order-complete")
async def handle_order_webhook(request: Request):
    # Verify signature (prevent spoofing)
    signature = request.headers.get("X-EZThrottle-Signature", "")
    payload = await request.body()
    verify_webhook_signature_strict(payload, signature, WEBHOOK_SECRET)

    # Process completed order
    data = await request.json()
    order_id = data["metadata"]["order_id"]
    status = data["status"]

    if status == "success":
        response_body = data["response"]["body"]  # The upstream API's response
        # Order succeeded: update database, send confirmation
        # (update_order_status, send_confirmation_email, and alert_support
        # are your own application functions)
        await update_order_status(order_id, "paid")
        await send_confirmation_email(order_id)
    else:
        # Order failed: alert support
        await update_order_status(order_id, "failed")
        await alert_support(order_id)

    return {"ok": True}
The pattern:
- Submit request → returns immediately
- Your server handles more requests
- Webhook arrives when complete
- Process result asynchronously
- Never block. Never wait. Never sleep.
This is event-driven architecture without:
- SQS
- SNS
- EventBridge
- Lambda triggers
- Dead letter queues
- IAM policies
Just HTTP in, webhook out.
RIP Operations
What you don't manage anymore:
| Before | After |
|---|---|
| Lambda functions | Gone |
| SQS queues | Gone |
| DynamoDB tables | Gone |
| Step Functions | Gone |
| IAM policies | Gone |
| CloudWatch alarms | Gone |
| Dead letter queues | Gone |
| Retry logic | Gone |
| Exponential backoff | Gone |
| 3am pages | Gone |
What you have:
pip install ezthrottle
And an API key.
Serverless 1.0: You don't manage servers. You manage serverless infrastructure.
Serverless 2.0: You don't manage infrastructure. You ship features.
Scale your requests. Not your pods.