
Serverless 2.0: RIP Operations

What happens when you stop managing retries and start shipping features

By @RahmiPruitt

This is Part 2. If you haven't read Making Failure Boring Again, start there. It explains why retries shouldn't be independent and how EZThrottle coordinates failure across regions.

This article is about what you can build when Layer 7 is reliable.

Spoiler: A lot. In a weekend. Without IAM policies.

Your Servers Are Faster Than You Think

Every retry loop is CPU thrashing.

# What your server is actually doing
import time
import requests

api_url = "https://api.example.com/endpoint"  # placeholder

backoff = 1
while True:
    try:
        response = requests.post(api_url)
        if response.status_code == 429:
            time.sleep(backoff)  # Thread blocked. Doing nothing.
            backoff *= 2         # Still nothing.
            continue             # Try again. Maybe nothing.
        break
    except requests.Timeout:
        time.sleep(backoff)      # More nothing.
        backoff *= 2             # Still nothing.
        continue

That sleep() isn't free. That thread is allocated. That connection is open. Your server is waiting instead of working.

The math nobody does: remove the retry burden, and suddenly your Python server handles what you thought required Go. Your Go server handles what you thought required C. Your C server handles what you thought required more servers.

EZThrottle moves retries off your CPU entirely. Your server fires the request and moves on. We handle the waiting. You handle more requests.
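
Concretely, here's the same call as a fire-and-forget submit. This is a minimal sketch using the Step API from the examples below, with a placeholder webhook URL:

from ezthrottle import EZThrottle, Step

client = EZThrottle(api_key="your_api_key")

# No loop, no sleep(): submit once and move on.
result = (
    Step(client)
    .url(api_url)                                 # same endpoint as above
    .method("POST")
    .webhooks([{"url": "https://app.com/done"}])  # placeholder endpoint
    .execute()
)

print(result["job_id"])  # the thread is free; EZThrottle owns the waiting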

Region Racing

Send the same request to multiple regions. First response wins. Cancel the rest.

from ezthrottle import EZThrottle, Step, StepType

client = EZThrottle(api_key="your_api_key")

result = (
    Step(client)
    .url("https://api.openai.com/v1/chat/completions")
    .method("POST")
    .headers({"Authorization": "Bearer sk-..."})
    .body('{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello"}]}')
    .type(StepType.PERFORMANCE)
    .regions(["iad", "lax", "ord"])  # DC, LA, Chicago
    .execution_mode("race")          # First wins
    .webhooks([{"url": "https://app.com/webhook"}])
    .execute()
)

print(f"Job submitted: {result['job_id']}")
# Webhook arrives from whichever region responds first

Why this matters:

Lambda cold starts: 500ms-2s. Plus your retry logic. Plus your backoff. Plus praying the region is healthy.

EZThrottle region racing: Request already in flight to 3 regions before Lambda finishes booting. Fastest region wins. Others cancelled. No retries needed because one of them worked.

The fastest region is always faster than the average region. And dramatically faster than a cold region retrying.
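
That claim is just order statistics. A quick simulation, assuming independent per-region latencies with a lognormal spread (made-up numbers, not real region data):

import random

def region_latency():
    """Assumed per-region latency in ms (~150ms median, heavy tail)."""
    return random.lognormvariate(5.0, 0.5)

trials = 10_000
single = sum(region_latency() for _ in range(trials)) / trials
raced = sum(min(region_latency() for _ in range(3)) for _ in range(trials)) / trials

print(f"single region avg: {single:.0f} ms")
print(f"race-of-3 avg:     {raced:.0f} ms")  # reliably lower than the single avg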

"But 2 RPS Is Too Slow"

I hear this a lot. Let me address it directly.

Yes, 2 requests per second per domain is conservative. That's intentional — it's safe for every API without needing custom configuration.

But here's what people miss:

You're not limited to one provider.

What are the chances that OpenAI, Anthropic, Google, and xAI are all having outages with full queues at the same time across all regions?

Basically zero. One of them, in some region, will have capacity. That's the whole point.

When you can race requests across providers and regions, "slow" becomes "the fastest available option at this moment."

And that's still dramatically faster than Lambda cold starts + retry storms + debugging at 3am.
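
Back-of-envelope arithmetic, assuming the conservative 2 RPS floor and treating each provider as its own domain:

rps_per_domain = 2
providers = ["api.openai.com", "api.anthropic.com",
             "generativelanguage.googleapis.com", "api.x.ai"]

# Per-domain limits don't stack against you: each provider is its own bucket.
aggregate_rps = rps_per_domain * len(providers)
print(aggregate_rps)  # 8 requests/second before any custom limits or racing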

Fallback Racing

What if OpenAI is rate limited? Try Anthropic. If Anthropic times out, try Google. If Google fails, try xAI.

from ezthrottle import EZThrottle, Step, StepType

client = EZThrottle(api_key="your_api_key")

# Define fallback chain: OpenAI → Anthropic → Google → xAI
xai_fallback = (
    Step()
    .url("https://api.x.ai/v1/chat/completions")
    .method("POST")
    .headers({"Authorization": "Bearer xai-..."})
    .body('{"model": "grok-1", "messages": [...]}')
)

google_fallback = (
    Step()
    .url("https://generativelanguage.googleapis.com/v1/models/gemini-pro:generateContent")
    .method("POST")
    .headers({"Authorization": "Bearer ..."})
    .body('{"contents": [...]}')
    .fallback(xai_fallback, trigger_on_error=[429, 500, 502, 503, 504])
)

anthropic_fallback = (
    Step()
    .url("https://api.anthropic.com/v1/messages")
    .method("POST")
    .headers({"x-api-key": "sk-ant-...", "anthropic-version": "2023-06-01"})
    .body('{"model": "claude-3-opus-20240229", "messages": [...]}')
    .fallback(google_fallback, trigger_on_error=[429, 500, 502, 503, 504])
)

# Primary request with full fallback chain
result = (
    Step(client)
    .url("https://api.openai.com/v1/chat/completions")
    .method("POST")
    .headers({"Authorization": "Bearer sk-..."})
    .body('{"model": "gpt-4", "messages": [...]}')
    .type(StepType.PERFORMANCE)
    .fallback(anthropic_fallback, trigger_on_error=[429, 500, 502, 503, 504])
    .webhooks([{"url": "https://app.com/inference-complete"}])
    .execute()
)

Steady stream of inference. No babysitting. No 3am pages. OpenAI having a bad day? You don't even notice — Anthropic picked it up.

Still faster than Lambda:

Even with fallback overhead, you're not cold-starting a Lambda, running your own retry loops, or debugging a dead letter queue at 3am.

The fallback happens at the infrastructure layer. Your code just gets a webhook with the result.
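
Concretely, your side of the chain is one webhook endpoint, the same handler no matter which provider ends up serving the request. A minimal sketch, assuming the payload fields used in the FastAPI handler later in this post:

from fastapi import FastAPI, Request

app = FastAPI()

@app.post("/inference-complete")
async def inference_complete(request: Request):
    data = await request.json()
    # Same shape whether OpenAI answered or the chain fell through to xAI.
    if data["status"] == "success":
        completion = data["response"]["body"]
        ...  # store or stream the completion
    return {"ok": True}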

Webhook Fanout

Send results to multiple services. Get quorum.

from ezthrottle import EZThrottle, Step, StepType

client = EZThrottle(api_key="your_api_key")

result = (
    Step(client)
    .url("https://api.stripe.com/v1/charges")
    .method("POST")
    .headers({"Authorization": "Bearer sk_live_..."})
    .body('{"amount": 2000, "currency": "usd", "source": "tok_visa"}')
    .type(StepType.PERFORMANCE)
    .webhooks([
        # Primary app (must succeed)
        {"url": "https://app.com/payment-complete", "has_quorum_vote": True},

        # Analytics (nice to have)
        {"url": "https://analytics.com/track", "has_quorum_vote": False},

        # CRM update (must succeed)
        {"url": "https://crm.com/customer-charged", "has_quorum_vote": True},

        # Backup/audit log (must succeed)
        {"url": "https://audit.com/log", "has_quorum_vote": True},

        # Notification service (nice to have)
        {"url": "https://notify.com/send-receipt", "has_quorum_vote": False}
    ])
    .webhook_quorum(2)  # At least 2 quorum voters must succeed
    .execute()
)

This replaces:

SQS Queue
  → Lambda (parse message)
    → SNS Topic
      → Lambda (send to app.com)
      → Lambda (send to analytics.com)
      → Lambda (send to crm.com)
      → Lambda (send to audit.com)
      → Lambda (send to notify.com)
    → DynamoDB (track which succeeded)
    → Step Function (check quorum)
    → Dead Letter Queue (handle failures)
    → CloudWatch (debug why it's broken)
    → IAM Policies (12 of them, one is wrong, good luck)

With one API call.

Quorum writes are the foundation of Dynamo-style databases. You're getting that same consistency model for webhook delivery without running Dynamo.
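
To make the quorum arithmetic concrete, here's a toy check (illustrative only, not SDK code), assuming quorum counts successful deliveries among the voting webhooks:

# Three of the five webhooks above carry a quorum vote.
voter_delivered = {"app.com": True, "crm.com": False, "audit.com": True}

quorum = 2
successes = sum(voter_delivered.values())

# 2 of 3 voters delivered, so quorum is met even though the CRM hook
# failed; the non-voting analytics and notify hooks never block the job.
print(successes >= quorum)  # True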

Workflows (What You Can Build in a Weekend)

Chain success and failure handlers. Build entire pipelines.

from ezthrottle import EZThrottle, Step, StepType

client = EZThrottle(api_key="your_api_key")

# Step 4: Final analytics (runs after notification)
analytics = (
    Step()
    .url("https://analytics.com/track")
    .method("POST")
    .body('{"event": "order_complete"}')
    .type(StepType.FRUGAL)  # Cheap, local execution
)

# Step 3: Send notification (runs after payment)
notification = (
    Step()
    .url("https://api.sendgrid.com/v3/mail/send")
    .method("POST")
    .headers({"Authorization": "Bearer SG..."})
    .body('{"to": "customer@email.com", "subject": "Order confirmed!"}')
    .type(StepType.PERFORMANCE)
    .regions(["iad", "lax"])
    .on_success(analytics)
)

# Step 2: Failure handler (Slack alert if payment fails)
failure_alert = (
    Step()
    .url("https://hooks.slack.com/services/xxx/yyy/zzz")
    .method("POST")
    .body('{"text": "Payment failed! Check dashboard."}')
    .type(StepType.FRUGAL)
)

# Step 1: Primary payment
result = (
    Step(client)
    .url("https://api.stripe.com/v1/charges")
    .method("POST")
    .headers({"Authorization": "Bearer sk_live_..."})
    .body('{"amount": 5000, "currency": "usd"}')
    .type(StepType.PERFORMANCE)
    .regions(["iad", "lax", "ord"])
    .on_success(notification)
    .on_failure(failure_alert)
    .webhooks([{"url": "https://app.com/order-status"}])
    .execute()
)

What this would take in AWS: Step Functions for the orchestration, a Lambda for each step, SQS between them, and a stack of IAM policies to wire it together.

What it takes in EZThrottle: the code above.

Legacy Code Wrappers

You have existing code. You don't want to rewrite it. You just want it to stop failing.

from ezthrottle import EZThrottle, auto_forward, ForwardToEZThrottle
import requests

client = EZThrottle(api_key="your_api_key")

@auto_forward(client, fallback_on_error=[429, 500, 502, 503])
def call_openai(prompt):
    """
    Your existing function. No changes to the logic.
    Just add the decorator.
    """
    response = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": "Bearer sk-..."},
        json={"model": "gpt-4", "messages": [{"role": "user", "content": prompt}]}
    )
    response.raise_for_status()  # Raises on 429, 500, etc.
    return response.json()

# Call it exactly like before
result = call_openai("Write a poem about distributed systems")

# If it succeeds: you get the response
# If it hits 429/500: automatically forwarded to EZThrottle
# Webhook delivers result later

For more control, raise explicitly:

@auto_forward(client)
def process_payment(order_id, amount):
    try:
        response = requests.post(
            "https://api.stripe.com/v1/charges",
            headers={"Authorization": "Bearer sk_live_..."},
            json={"amount": amount, "currency": "usd"}
        )

        if response.status_code == 429:
            # Explicitly forward to EZThrottle with full context
            raise ForwardToEZThrottle(
                url="https://api.stripe.com/v1/charges",
                method="POST",
                headers={"Authorization": "Bearer sk_live_..."},
                body=f'{{"amount": {amount}, "currency": "usd"}}',
                idempotent_key=f"order_{order_id}",  # Prevent duplicate charges!
                webhooks=[{"url": "https://app.com/payment-complete"}],
                metadata={"order_id": order_id}
            )

        return response.json()

    except requests.Timeout:
        raise ForwardToEZThrottle(
            url="https://api.stripe.com/v1/charges",
            method="POST",
            headers={"Authorization": "Bearer sk_live_..."},
            body=f'{{"amount": {amount}, "currency": "usd"}}',
            idempotent_key=f"order_{order_id}",
            webhooks=[{"url": "https://app.com/payment-complete"}]
        )

# Your existing code keeps working
# Failed requests get handled automatically
# No rewrite required

Onboarding is simple:

  1. pip install ezthrottle
  2. Add @auto_forward decorator
  3. Deploy
  4. Sleep through outages

Async/Await for Webhooks

Event-driven architecture. Non-blocking. Your server never waits.

import asyncio
from ezthrottle import EZThrottle, Step, StepType

client = EZThrottle(api_key="your_api_key")

def submit_order(order):
    """Submit order processing, don't wait for completion."""
    result = (
        Step(client)
        .url("https://api.stripe.com/v1/charges")
        .method("POST")
        .body(f'{{"amount": {order["amount"]}, "currency": "usd"}}')
        .type(StepType.PERFORMANCE)
        .idempotent_key(f"order_{order['id']}")
        .webhooks([{"url": "https://app.com/order-complete"}])
        .execute()
    )

    # .execute() returns as soon as the job is accepted. Webhook arrives later.
    return {"order_id": order["id"], "job_id": result["job_id"]}

async def process_order(order):
    # The submit is a quick, synchronous HTTP call; run it off the event
    # loop so a batch of submissions can overlap.
    return await asyncio.to_thread(submit_order, order)

async def process_batch(orders):
    """Process 1000 orders concurrently. All non-blocking."""
    tasks = [process_order(order) for order in orders]
    results = await asyncio.gather(*tasks)

    print(f"Submitted {len(results)} orders")
    # Webhooks arrive as each completes
    # Your server is free to handle more requests

# Submit 1000 orders in parallel
orders = [{"id": i, "amount": 1000 + i} for i in range(1000)]
asyncio.run(process_batch(orders))

FastAPI webhook handler:

from fastapi import FastAPI, Request
from ezthrottle import verify_webhook_signature_strict

app = FastAPI()
WEBHOOK_SECRET = "your_webhook_secret"

@app.post("/order-complete")
async def handle_order_webhook(request: Request):
    # Verify signature (prevent spoofing)
    signature = request.headers.get("X-EZThrottle-Signature", "")
    payload = await request.body()

    verify_webhook_signature_strict(payload, signature, WEBHOOK_SECRET)

    # Process completed order
    data = await request.json()
    order_id = data["metadata"]["order_id"]
    status = data["status"]

    if status == "success":
        # Order succeeded - update database, send confirmation
        # (update_order_status and send_confirmation_email are your app's own functions)
        response_body = data["response"]["body"]  # upstream response, if you need it
        await update_order_status(order_id, "paid")
        await send_confirmation_email(order_id)
    else:
        # Order failed - alert support
        await update_order_status(order_id, "failed")
        await alert_support(order_id)

    return {"ok": True}
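
If you want to see what the helper is checking (or can't take the dependency at the edge), the verification is standard HMAC. A minimal sketch, assuming the signature is a hex-encoded HMAC-SHA256 of the raw body; confirm the exact scheme against the EZThrottle docs:

import hmac
import hashlib

def verify_signature(payload: bytes, signature: str, secret: str) -> bool:
    """Recompute the HMAC and compare in constant time."""
    expected = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)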

The pattern:

  1. Submit request → returns immediately
  2. Your server handles more requests
  3. Webhook arrives when complete
  4. Process result asynchronously
  5. Never block. Never wait. Never sleep.

This is event-driven architecture without the queues, the topics, or the Lambda glue.

Just HTTP in, webhook out.

RIP Operations

What you don't manage anymore:

Before                 After
Lambda functions       Gone
SQS queues             Gone
DynamoDB tables        Gone
Step Functions         Gone
IAM policies           Gone
CloudWatch alarms      Gone
Dead letter queues     Gone
Retry logic            Gone
Exponential backoff    Gone
3am pages              Gone
What you have:

pip install ezthrottle

And an API key.

Serverless 1.0: You don't manage servers. You manage serverless infrastructure.

Serverless 2.0: You don't manage infrastructure. You ship features.

Scale your requests. Not your pods.

Get Started

SDKs available for:

Rahmi Pruitt

Founder, EZThrottle

@RahmiPruitt

Ready to RIP Operations?

Start with 1 million free requests. No credit card required.

Start Free →

© 2026 EZThrottle. The World's First API Aqueduct™

Built on BEAM by a solo founder who believes engineers deserve to sleep at night.