
Serverless 2.0: RIP Operations

What happens when you stop managing retries and start shipping features

By @RahmiPruitt

This is Part 2. If you haven't read Making Failure Boring Again, start there. It explains why retries shouldn't be independent and how EZThrottle coordinates failure across regions.

This article is about what you can build when Layer 7 is reliable.

Spoiler: A lot. In a weekend. Without IAM policies.

Your Servers Are Faster Than You Think

Every retry loop is CPU thrashing.

# What your server is actually doing
import time
import requests

api_url = "https://api.example.com/endpoint"  # placeholder

backoff = 1
while True:
    try:
        response = requests.post(api_url)
        if response.status_code == 429:
            time.sleep(backoff)  # Thread blocked. Doing nothing.
            backoff *= 2         # Still nothing.
            continue             # Try again. Maybe nothing.
        break
    except requests.Timeout:
        time.sleep(backoff)      # More nothing.
        backoff *= 2             # Still nothing.
        continue

That sleep() isn't free. That thread is allocated. That connection is open. Your server is waiting instead of working.

The math nobody does: remove the retry burden, and suddenly your Python server handles what you thought required Go. Your Go server handles what you thought required C. Your C server handles what you thought required more servers.

EZThrottle moves retries off your CPU entirely. Your server fires the request and moves on. We handle the waiting. You handle more requests.
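
Concretely, here's the same call as a fire-and-forget submit. This is a minimal sketch using the Step API from the examples below, with a placeholder webhook URL:

from ezthrottle import EZThrottle, Step

client = EZThrottle(api_key="your_api_key")

# No loop, no sleep(): submit once and move on.
result = (
    Step(client)
    .url(api_url)                                 # same endpoint as above
    .method("POST")
    .webhooks([{"url": "https://app.com/done"}])  # placeholder endpoint
    .execute()
)

print(result["job_id"])  # the thread is free; EZThrottle owns the waiting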

Region Racing

Send the same request to multiple regions. First response wins. Cancel the rest.

from ezthrottle import EZThrottle, Step, StepType

client = EZThrottle(api_key="your_api_key")

result = (
    Step(client)
    .url("https://api.openai.com/v1/chat/completions")
    .method("POST")
    .headers({"Authorization": "Bearer sk-..."})
    .body('{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello"}]}')
    .type(StepType.PERFORMANCE)
    .regions(["iad", "lax", "ord"])  # DC, LA, Chicago
    .execution_mode("race")          # First wins
    .webhooks([{"url": "https://app.com/webhook"}])
    .execute()
)

print(f"Job submitted: {result['job_id']}")
# Webhook arrives from whichever region responds first

Why this matters:

Lambda cold starts: 500ms-2s. Plus your retry logic. Plus your backoff. Plus praying the region is healthy.

EZThrottle region racing: Request already in flight to 3 regions before Lambda finishes booting. Fastest region wins. Others cancelled. No retries needed because one of them worked.

The fastest region is always faster than the average region. And dramatically faster than a cold region retrying.
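
That claim is just order statistics. A quick simulation, assuming independent per-region latencies with a lognormal spread (made-up numbers, not real region data):

import random

def region_latency():
    """Assumed per-region latency in ms (~150ms median, heavy tail)."""
    return random.lognormvariate(5.0, 0.5)

trials = 10_000
single = sum(region_latency() for _ in range(trials)) / trials
raced = sum(min(region_latency() for _ in range(3)) for _ in range(trials)) / trials

print(f"single region avg: {single:.0f} ms")
print(f"race-of-3 avg:     {raced:.0f} ms")  # reliably lower than the single avg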

"But 2 RPS Is Too Slow"

I hear this a lot. Let me address it directly.

Yes, 2 requests per second per domain is conservative. That's intentional — it's safe for every API without needing custom configuration.

But here's what people miss:

You're not limited to one provider.

What are the chances that OpenAI, Anthropic, Google, and xAI are all having outages with full queues at the same time across all regions?

Basically zero. One of them, in some region, will have capacity. That's the whole point.

When you can race requests across providers and regions, "slow" becomes "the fastest available option at this moment."

And that's still dramatically faster than Lambda cold starts + retry storms + debugging at 3am.
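
Back-of-envelope arithmetic, assuming the conservative 2 RPS floor and treating each provider as its own domain:

rps_per_domain = 2
providers = ["api.openai.com", "api.anthropic.com",
             "generativelanguage.googleapis.com", "api.x.ai"]

# Per-domain limits don't stack against you: each provider is its own bucket.
aggregate_rps = rps_per_domain * len(providers)
print(aggregate_rps)  # 8 requests/second before any custom limits or racing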

Fallback Racing

What if OpenAI is rate limited? Try Anthropic. If Anthropic times out, try Google. If Google fails, try xAI.

from ezthrottle import EZThrottle, Step, StepType

client = EZThrottle(api_key="your_api_key")

# Define fallback chain: OpenAI → Anthropic → Google → xAI
xai_fallback = (
    Step()
    .url("https://api.x.ai/v1/chat/completions")
    .method("POST")
    .headers({"Authorization": "Bearer xai-..."})
    .body('{"model": "grok-1", "messages": [...]}')
)

google_fallback = (
    Step()
    .url("https://generativelanguage.googleapis.com/v1/models/gemini-pro:generateContent")
    .method("POST")
    .headers({"Authorization": "Bearer ..."})
    .body('{"contents": [...]}')
    .fallback(xai_fallback, trigger_on_error=[429, 500, 502, 503, 504])
)

anthropic_fallback = (
    Step()
    .url("https://api.anthropic.com/v1/messages")
    .method("POST")
    .headers({"x-api-key": "sk-ant-...", "anthropic-version": "2023-06-01"})
    .body('{"model": "claude-3-opus-20240229", "messages": [...]}')
    .fallback(google_fallback, trigger_on_error=[429, 500, 502, 503, 504])
)

# Primary request with full fallback chain
result = (
    Step(client)
    .url("https://api.openai.com/v1/chat/completions")
    .method("POST")
    .headers({"Authorization": "Bearer sk-..."})
    .body('{"model": "gpt-4", "messages": [...]}')
    .type(StepType.PERFORMANCE)
    .fallback(anthropic_fallback, trigger_on_error=[429, 500, 502, 503, 504])
    .webhooks([{"url": "https://app.com/inference-complete"}])
    .execute()
)

Steady stream of inference. No babysitting. No 3am pages. OpenAI having a bad day? You don't even notice — Anthropic picked it up.

Still faster than Lambda:

Even with fallback overhead, you're not cold-starting a Lambda, running your own retry loops, or debugging a dead letter queue at 3am.

The fallback happens at the infrastructure layer. Your code just gets a webhook with the result.
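
Concretely, your side of the chain is one webhook endpoint, the same handler no matter which provider ends up serving the request. A minimal sketch, assuming the payload fields used in the FastAPI handler later in this post:

from fastapi import FastAPI, Request

app = FastAPI()

@app.post("/inference-complete")
async def inference_complete(request: Request):
    data = await request.json()
    # Same shape whether OpenAI answered or the chain fell through to xAI.
    if data["status"] == "success":
        completion = data["response"]["body"]
        ...  # store or stream the completion
    return {"ok": True}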

Webhook Fanout

Send results to multiple services. Get quorum.

from ezthrottle import EZThrottle, Step, StepType

client = EZThrottle(api_key="your_api_key")

result = (
    Step(client)
    .url("https://api.stripe.com/v1/charges")
    .method("POST")
    .headers({"Authorization": "Bearer sk_live_..."})
    .body('{"amount": 2000, "currency": "usd", "source": "tok_visa"}')
    .type(StepType.PERFORMANCE)
    .webhooks([
        # Primary app (must succeed)
        {"url": "https://app.com/payment-complete", "has_quorum_vote": True},

        # Analytics (nice to have)
        {"url": "https://analytics.com/track", "has_quorum_vote": False},

        # CRM update (must succeed)
        {"url": "https://crm.com/customer-charged", "has_quorum_vote": True},

        # Backup/audit log (must succeed)
        {"url": "https://audit.com/log", "has_quorum_vote": True},

        # Notification service (nice to have)
        {"url": "https://notify.com/send-receipt", "has_quorum_vote": False}
    ])
    .webhook_quorum(2)  # At least 2 quorum voters must succeed
    .execute()
)

This replaces:

SQS Queue
  → Lambda (parse message)
    → SNS Topic
      → Lambda (send to app.com)
      → Lambda (send to analytics.com)
      → Lambda (send to crm.com)
      → Lambda (send to audit.com)
      → Lambda (send to notify.com)
    → DynamoDB (track which succeeded)
    → Step Function (check quorum)
    → Dead Letter Queue (handle failures)
    → CloudWatch (debug why it's broken)
    → IAM Policies (12 of them, one is wrong, good luck)

With one API call.

Quorum writes are the foundation of Dynamo-style databases. You're getting that same consistency model for webhook delivery without running Dynamo.
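
To make the quorum arithmetic concrete, here's a toy check (illustrative only, not SDK code), assuming quorum counts successful deliveries among the voting webhooks:

# Three of the five webhooks above carry a quorum vote.
voter_delivered = {"app.com": True, "crm.com": False, "audit.com": True}

quorum = 2
successes = sum(voter_delivered.values())

# 2 of 3 voters delivered, so quorum is met even though the CRM hook
# failed; the non-voting analytics and notify hooks never block the job.
print(successes >= quorum)  # True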

Workflows (What You Can Build in a Weekend)

Chain success and failure handlers. Build entire pipelines.

from ezthrottle import EZThrottle, Step, StepType

client = EZThrottle(api_key="your_api_key")

# Step 4: Final analytics (runs after notification)
analytics = (
    Step()
    .url("https://analytics.com/track")
    .method("POST")
    .body('{"event": "order_complete"}')
    .type(StepType.FRUGAL)  # Cheap, local execution
)

# Step 3: Send notification (runs after payment)
notification = (
    Step()
    .url("https://api.sendgrid.com/v3/mail/send")
    .method("POST")
    .headers({"Authorization": "Bearer SG..."})
    .body('{"to": "customer@email.com", "subject": "Order confirmed!"}')
    .type(StepType.PERFORMANCE)
    .regions(["iad", "lax"])
    .on_success(analytics)
)

# Step 2: Failure handler (Slack alert if payment fails)
failure_alert = (
    Step()
    .url("https://hooks.slack.com/services/xxx/yyy/zzz")
    .method("POST")
    .body('{"text": "Payment failed! Check dashboard."}')
    .type(StepType.FRUGAL)
)

# Step 1: Primary payment
result = (
    Step(client)
    .url("https://api.stripe.com/v1/charges")
    .method("POST")
    .headers({"Authorization": "Bearer sk_live_..."})
    .body('{"amount": 5000, "currency": "usd"}')
    .type(StepType.PERFORMANCE)
    .regions(["iad", "lax", "ord"])
    .on_success(notification)
    .on_failure(failure_alert)
    .webhooks([{"url": "https://app.com/order-status"}])
    .execute()
)

What this would take in AWS: Step Functions for the orchestration, a Lambda for each step, SQS between them, and a stack of IAM policies to wire it together.

What it takes in EZThrottle: the code above.

Legacy Code Wrappers

You have existing code. You don't want to rewrite it. You just want it to stop failing.

from ezthrottle import EZThrottle, auto_forward, ForwardToEZThrottle
import requests

client = EZThrottle(api_key="your_api_key")

@auto_forward(client, fallback_on_error=[429, 500, 502, 503])
def call_openai(prompt):
    """
    Your existing function. No changes to the logic.
    Just add the decorator.
    """
    response = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": "Bearer sk-..."},
        json={"model": "gpt-4", "messages": [{"role": "user", "content": prompt}]}
    )
    response.raise_for_status()  # Raises on 429, 500, etc.
    return response.json()

# Call it exactly like before
result = call_openai("Write a poem about distributed systems")

# If it succeeds: you get the response
# If it hits 429/500: automatically forwarded to EZThrottle
# Webhook delivers result later

For more control, raise explicitly:

@auto_forward(client)
def process_payment(order_id, amount):
    try:
        response = requests.post(
            "https://api.stripe.com/v1/charges",
            headers={"Authorization": "Bearer sk_live_..."},
            json={"amount": amount, "currency": "usd"}
        )

        if response.status_code == 429:
            # Explicitly forward to EZThrottle with full context
            raise ForwardToEZThrottle(
                url="https://api.stripe.com/v1/charges",
                method="POST",
                headers={"Authorization": "Bearer sk_live_..."},
                body=f'{{"amount": {amount}, "currency": "usd"}}',
                idempotent_key=f"order_{order_id}",  # Prevent duplicate charges!
                webhooks=[{"url": "https://app.com/payment-complete"}],
                metadata={"order_id": order_id}
            )

        return response.json()

    except requests.Timeout:
        raise ForwardToEZThrottle(
            url="https://api.stripe.com/v1/charges",
            method="POST",
            headers={"Authorization": "Bearer sk_live_..."},
            body=f'{{"amount": {amount}, "currency": "usd"}}',
            idempotent_key=f"order_{order_id}",
            webhooks=[{"url": "https://app.com/payment-complete"}]
        )

# Your existing code keeps working
# Failed requests get handled automatically
# No rewrite required

Onboarding is simple:

  1. pip install ezthrottle
  2. Add @auto_forward decorator
  3. Deploy
  4. Sleep through outages

Async/Await for Webhooks

Event-driven architecture. Non-blocking. Your server never waits.

import asyncio
from ezthrottle import EZThrottle, Step, StepType

client = EZThrottle(api_key="your_api_key")

def submit_order(order):
    """Submit order processing, don't wait for completion."""
    result = (
        Step(client)
        .url("https://api.stripe.com/v1/charges")
        .method("POST")
        .body(f'{{"amount": {order["amount"]}, "currency": "usd"}}')
        .type(StepType.PERFORMANCE)
        .idempotent_key(f"order_{order['id']}")
        .webhooks([{"url": "https://app.com/order-complete"}])
        .execute()
    )

    # .execute() returns as soon as the job is accepted. Webhook arrives later.
    return {"order_id": order["id"], "job_id": result["job_id"]}

async def process_order(order):
    # The submit is a quick, synchronous HTTP call; run it off the event
    # loop so a batch of submissions can overlap.
    return await asyncio.to_thread(submit_order, order)

async def process_batch(orders):
    """Process 1000 orders concurrently. All non-blocking."""
    tasks = [process_order(order) for order in orders]
    results = await asyncio.gather(*tasks)

    print(f"Submitted {len(results)} orders")
    # Webhooks arrive as each completes
    # Your server is free to handle more requests

# Submit 1000 orders in parallel
orders = [{"id": i, "amount": 1000 + i} for i in range(1000)]
asyncio.run(process_batch(orders))

FastAPI webhook handler:

from fastapi import FastAPI, Request
from ezthrottle import verify_webhook_signature_strict

app = FastAPI()
WEBHOOK_SECRET = "your_webhook_secret"

@app.post("/order-complete")
async def handle_order_webhook(request: Request):
    # Verify signature (prevent spoofing)
    signature = request.headers.get("X-EZThrottle-Signature", "")
    payload = await request.body()

    verify_webhook_signature_strict(payload, signature, WEBHOOK_SECRET)

    # Process completed order
    data = await request.json()
    order_id = data["metadata"]["order_id"]
    status = data["status"]

    if status == "success":
        # Order succeeded - update database, send confirmation
        # (update_order_status and send_confirmation_email are your app's own functions)
        response_body = data["response"]["body"]  # upstream response, if you need it
        await update_order_status(order_id, "paid")
        await send_confirmation_email(order_id)
    else:
        # Order failed - alert support
        await update_order_status(order_id, "failed")
        await alert_support(order_id)

    return {"ok": True}
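
If you want to see what the helper is checking (or can't take the dependency at the edge), the verification is standard HMAC. A minimal sketch, assuming the signature is a hex-encoded HMAC-SHA256 of the raw body; confirm the exact scheme against the EZThrottle docs:

import hmac
import hashlib

def verify_signature(payload: bytes, signature: str, secret: str) -> bool:
    """Recompute the HMAC and compare in constant time."""
    expected = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)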

The pattern:

  1. Submit request → returns immediately
  2. Your server handles more requests
  3. Webhook arrives when complete
  4. Process result asynchronously
  5. Never block. Never wait. Never sleep.

This is event-driven architecture without the queues, the topics, or the Lambda glue.

Just HTTP in, webhook out.

RIP Operations

What you don't manage anymore:

Before                 After
Lambda functions       Gone
SQS queues             Gone
DynamoDB tables        Gone
Step Functions         Gone
IAM policies           Gone
CloudWatch alarms      Gone
Dead letter queues     Gone
Retry logic            Gone
Exponential backoff    Gone
3am pages              Gone
What you have:

pip install ezthrottle

And an API key.

Serverless 1.0: You don't manage servers. You manage serverless infrastructure.

Serverless 2.0: You don't manage infrastructure. You ship features.

Scale your requests. Not your pods.

Get Started

SDKs available for:

Rahmi Pruitt

Founder, EZThrottle

@RahmiPruitt

Ready to RIP Operations?

Start with 1 million free requests. No credit card required.

Start Free →

© 2026 EZThrottle. The World's First API Aqueduct™

Built on BEAM by a solo founder who believes engineers deserve to sleep at night.