Stop Losing LangGraph Progress to 429 Errors
How to Scale Agents Without Burning Out Engineers
Part 1 of 2: Making LangGraph workflows production-ready
Why Your Agents Don't Scale
I've seen genuinely nice people become assholes because they get paged every weekend. I've seen organizations play Hunger Games when leadership asks who caused the post-mortem.
The reason your agents don't scale is the same reason serverless doesn't scale.
Serverless doesn't mean operationless.
You still need retry logic. You still need rate limit handling. You still need coordination across workers. You still need someone to wake up at 3am when it breaks.
LangGraph handles state management, workflow orchestration, and complex agent logic beautifully.
But when OpenRouter returns 429 at step 7 of your workflow, LangGraph can't help you. Your workflow crashes. You restart from step 1. Your engineers debug why 100 workers created a retry storm.
At some point, someone suggests: "Let's build a queue."
The Queue You'll Eventually Build
If you want agents to scale without churning through engineers, you'll need some mechanism for queuing. Not optional. It's a real infrastructure problem.
The right architecture is queue-per-URL. Each external dependency gets its own queue with its own rate limits. Stripe gets 100 RPS. OpenAI gets 50 RPS. They don't interfere with each other.
This is doable. It's not magic. It's ~2000 lines of code plus distributed state management plus health checking plus monitoring.
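Even a toy version shows the shape of it. Here's a minimal sketch of queue-per-URL, assuming asyncio and a simple per-host pacer; every name in it is illustrative, not a real library:

import asyncio
import time

class HostQueue:
    """One queue and one rate limit per external dependency."""

    def __init__(self, rps: float):
        self.rps = rps
        self.jobs: asyncio.Queue = asyncio.Queue()
        self._last_send = 0.0

    async def worker(self):
        while True:
            job = await self.jobs.get()   # job: async callable that fires the HTTP request
            # Pace this host: never exceed `rps` requests per second
            delay = self._last_send + 1.0 / self.rps - time.monotonic()
            if delay > 0:
                await asyncio.sleep(delay)
            self._last_send = time.monotonic()
            await job()
            self.jobs.task_done()

# Each dependency gets its own queue with its own limit
queues = {
    "api.stripe.com": HostQueue(rps=100),
    "api.openai.com": HostQueue(rps=50),
}

That's the easy part. The distributed state, health checks, and cross-worker coordination are where the rest of those ~2000 lines come from.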
But here's the part nobody mentions: it's not the time to write it that kills you. It's the ongoing maintenance.
Those queues need to scale as your business grows. They need debugging when they break. They need someone on-call when they fail at 2am. They need a team.
You can build this. Many companies do.
But now you're in the infrastructure business, not the AI agent business.
Netflix didn't become Netflix by managing data centers. They specialized in streaming video and let AWS handle infrastructure.
Same principle here.
What You Have Today
Here's what most LangGraph workflows look like:
from langgraph.graph import StateGraph
from litellm import completion
from litellm.exceptions import RateLimitError  # litellm maps provider 429s to this

def call_llm_node(state):
    try:
        response = completion(
            model="anthropic/claude-3.5-sonnet",
            messages=state["messages"],
            fallbacks=["openai/gpt-4"]
        )
        return {"messages": state["messages"] + [response]}
    except RateLimitError:
        raise  # Workflow crashes
What happens when OpenRouter rate limits at step 7:
- Sequential fallback: Claude times out (5s), THEN you try GPT-4 (5s) = 10s wasted
- Limited to your account: All fallbacks hit YOUR quota
- No coordination: 100 workers retry independently (retry storm)
- Progress lost: Restart from step 1
This works fine at 10 requests/day. It breaks at 1000 requests/day.
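The retry storm is mechanical, not bad luck: 100 workers that see the same 429 at the same moment back off on the same schedule and come back at the same moment. A toy simulation (hypothetical numbers, plain exponential backoff, no jitter, no coordination) makes it obvious:

import collections

def simulate(workers: int = 100, attempts: int = 4):
    """Count how many retries land in each second when every worker
    backs off on an identical schedule."""
    hits = collections.Counter()
    for _ in range(workers):
        t = 0
        for attempt in range(attempts):
            t += 2 ** attempt   # retry 1s, 2s, 4s, 8s after the 429
            hits[t] += 1
    return dict(hits)

print(simulate())   # {1: 100, 3: 100, 7: 100, 15: 100} - a burst of 100, every time

Jitter softens the spikes, but every worker is still retrying against its own private view of the provider. That's the gap coordination fills.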
What You Actually Want
Multi-provider, multi-account fallbacks that race instead of waiting sequentially.
When your primary OpenRouter account hits rate limits, you want the system to automatically try:
- Your backup OpenRouter account
- Direct Anthropic API
- Direct OpenAI API
- Whichever other providers you've configured
All racing simultaneously. Fastest response wins.
Coordinated retries across all your workers so 100 instances don't create a retry storm.
Webhook-based resumption so your LangGraph workflow doesn't block waiting for responses.
Idempotent execution so a 429 at step 7 resumes at step 7, not step 1.
Here's what that looks like:
def call_llm_node(state):
    # `ez` is the client from your coordination-layer setup (not shown here);
    # OPENROUTER_KEY and StepType come from the same setup.
    result = (
        Step(ez)
        .url("https://openrouter.ai/api/v1/chat/completions")
        .method("POST")
        .headers({"Authorization": f"Bearer {OPENROUTER_KEY}"})
        .body({
            "model": "anthropic/claude-3.5-sonnet",
            "messages": state["messages"]
        })
        .type(StepType.PERFORMANCE)
        .fallback_on_error([429, 500, 503])
        .webhooks([{"url": "https://yourapp.com/langgraph-resume"}])
        .idempotent_key(f"workflow_{state['workflow_id']}_step_{state['step']}")
        .execute()
    )
    return {"job_id": result["job_id"], "status": "waiting"}
Behind the scenes, this coordinates retries across all workers, races multiple providers and accounts, and delivers results via webhook when ready.
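"Coordinates retries across all workers" means the fleet shares one rate-limit budget per host instead of each worker keeping its own. A minimal sketch of that idea, assuming Redis as the shared counter (key names and limits are illustrative, and it ignores backoff, fairness, and failover):

import time
import redis

r = redis.Redis()

def acquire_slot(host: str, limit_per_sec: int) -> bool:
    """Every worker increments the same per-host counter, so the whole
    fleet stays under the limit no matter how many workers are retrying."""
    key = f"rl:{host}:{int(time.time())}"   # one counter per host, per second
    count = r.incr(key)
    r.expire(key, 2)                        # let old windows expire
    return count <= limit_per_sec

# Inside a worker's retry loop:
# if not acquire_slot("openrouter.ai", limit_per_sec=50):
#     back off; another worker already spent this second's budget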
You could build this coordination yourself. Or you could ship agents.
Fallback Racing
Sequential fallbacks waste time. You want racing.
# Define fallback chain
anthropic = Step(ez).url("https://api.anthropic.com/v1/messages")

openai = (
    Step(ez)
    .url("https://api.openai.com/v1/chat/completions")
    .fallback(anthropic, trigger_on_timeout=3000)  # Race after 3s
)

result = (
    Step(ez)
    .url("https://openrouter.ai/...")
    .fallback(openai, trigger_on_error=[429, 500])
    .execute()
)
Timeline when OpenRouter returns 429:
0ms: OpenRouter tries
100ms: OpenRouter 429 → OpenAI fallback fires
100ms: OpenRouter retrying + OpenAI both racing
3100ms: OpenAI slow → Anthropic fires
3100ms: All three racing
3200ms: Anthropic wins, others cancelled
All providers race after their triggers fire. Fastest wins.
You can't do this with client-side retries. They're sequential by design.
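Conceptually, what fires after the triggers is a race with cancellation. Here's a simplified asyncio sketch of that mechanic; it is not the coordination layer's actual code, and it leaves out the cross-worker, cross-account bookkeeping that makes racing safe at scale:

import asyncio

async def race(callables):
    """Start every triggered provider call, return the first clean
    response, cancel the rest."""
    tasks = {asyncio.create_task(c()) for c in callables}
    try:
        while tasks:
            done, tasks = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
            for task in done:
                if task.exception() is None:   # fastest successful provider wins
                    return task.result()
        raise RuntimeError("every provider failed")
    finally:
        for task in tasks:
            task.cancel()                      # losers get cancelled

# result = await race([call_openrouter, call_openai, call_anthropic])

The loop is the easy part. Knowing, across 100 workers and 20 accounts, which providers are already saturated before you fire the request is the part that needs shared state.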
Resuming Workflows with Webhooks
Your workflow doesn't block. It continues, and webhooks resume it when ready.
import json

from fastapi import FastAPI, Request, BackgroundTasks

app = FastAPI()

@app.post("/langgraph-resume")
async def resume_workflow(request: Request, background_tasks: BackgroundTasks):
    data = await request.json()
    workflow_id = data["metadata"]["workflow_id"]
    if data["status"] == "success":
        llm_response = json.loads(data["response"]["body"])
        # Resume in background (don't block the webhook response)
        background_tasks.add_task(
            continue_workflow,
            workflow_id,
            llm_response
        )
    return {"ok": True}

async def continue_workflow(workflow_id: str, llm_response: dict):
    # `agent` is your compiled LangGraph graph with a checkpointer;
    # the workflow_id doubles as the checkpoint thread_id
    config = {"configurable": {"thread_id": workflow_id}}
    # Update LangGraph state
    agent.update_state(config, {
        "messages": [..., llm_response],
        "status": "complete"
    })
    # Continue from the next step (input=None resumes from the checkpoint)
    await agent.ainvoke(None, config)
The pattern:
- Submit to coordination layer → returns immediately
- Workflow continues with other work
- Webhook fires when LLM responds
- Resume workflow from checkpoint
No blocking. No retry storms. No lost progress.
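For the resume to work, the graph has to be compiled with a checkpointer and invoked with a thread_id. A minimal sketch of that wiring, using LangGraph's built-in in-memory checkpointer (swap in a durable one for production) and the call_llm_node from earlier:

from typing import TypedDict

from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver

class WorkflowState(TypedDict):
    workflow_id: str
    messages: list
    status: str

builder = StateGraph(WorkflowState)
builder.add_node("call_llm", call_llm_node)   # the node defined earlier
builder.set_entry_point("call_llm")
builder.add_edge("call_llm", END)

# The checkpointer is what makes "resume at step 7" possible
agent = builder.compile(checkpointer=MemorySaver())

# First run: the thread_id ties every checkpoint to this workflow
config = {"configurable": {"thread_id": "workflow-123"}}
agent.invoke({"workflow_id": "workflow-123", "messages": [], "status": "running"}, config)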
What the Industry Actually Needs
The industry needs agents that can be trusted to run for months and years without human intervention.
That means Layer 7 (HTTP) needs to be automated. Retries, rate limits, failover - all handled at the infrastructure layer, not in application code.
Right now, most teams write retry logic in every service. When it breaks, engineers get paged. When traffic spikes, retry storms happen. When providers have outages, everything falls over.
This doesn't scale. Not the technology - the people.
You can build coordination infrastructure yourself. You can dedicate a team to maintaining it. Some companies do.
Or you can treat it like AWS treats compute: infrastructure you don't manage.
The Choice
Build it yourself:
- Queue per URL/dependency (the right architecture)
- Distributed state coordination
- Health checking and failover
- Ongoing maintenance as you scale
- A team to own it
Or:
- Focus on agents
- Let infrastructure handle reliability
- Go home at 5pm
Netflix chose streaming over data centers. What will you choose?
Getting Started
If you want the patterns above without building infrastructure:
Free tier: 1M requests/month at ezthrottle.network
The coordination layer handles 20 accounts across 4 providers working like one pool.
Or build it yourself: Architecture details
My Mission
I'm working to help the industry write scalable serverless software with minimal operations, without having to turn on more servers to get there.
Engineers shouldn't wake up at 3am because OpenRouter rate limited. They shouldn't lose weekends debugging retry storms. They shouldn't sacrifice time with family maintaining infrastructure that leadership calls "good enough."
Layer 7 should be automated. Agents should run for months without human intervention. Engineers should go home at 5pm.
That's what I'm building toward.
Use it or don't. Build it yourself or don't.
But please: stop letting infrastructure steal your time.
Find me on X: @RahmiPruitt
Coming next: Part 2 - Surviving Regional Failures and Partial Outages
🦞