Build a Proxy SaaS with Playwright

A proxy SaaS offers scraping or browser automation as a service. Customers send URLs or tasks; your platform runs Playwright through proxies and returns the data. This guide covers the architecture, how to integrate Playwright and proxies in workers, proxy allocation strategies, scaling, failure handling, billing, and what to monitor.

What You're Building

Typical flow:

Customer submits a job via API (URL, extraction rules, or high-level task).
Your API validates the request, enqueues the job, returns a job ID.
Worker picks up the job from the queue, launches Playwright with a proxy, performs the scrape.
Result is stored (DB, object storage) or sent via webhook. Customer polls or receives callback.

You need: proxy backend (residential provider), automation layer (Playwright), job queue (Redis, SQS, etc.), workers, API with auth and quotas, and billing/usage tracking.

Architecture Overview

Proxy layer: Residential proxy provider as backend. Each worker gets proxy credentials. Traffic exits through the provider's IPs. You resell or bundle proxy access with automation. You don't run proxies yourself.

Automation layer: Playwright runs in workers. Each job gets its own browser instance and thus its own proxy session. Playwright handles JS, cookies, and anti-bot targets better than raw HTTP.

API/gateway: REST or similar. Customers post jobs, poll for results, or receive webhooks. Authenticate with API keys. Enforce rate limits and quotas per customer.

Queue: Redis, SQS, RabbitMQ. Decouples API from workers. Handles retries, backoff, and prioritization. Workers pull jobs; failed jobs can be retried or moved to a dead-letter queue.

Billing: Track usage (requests, pages, bandwidth). Enforce quotas. Charge per plan or usage.

Worker Implementation: Playwright + Proxy

Each worker process runs Playwright. For each job, it launches a browser with the appropriate proxy:

python

def run_scrape_job(job):
    proxy = get_proxy_for_job(job)  # from pool or customer config
    with sync_playwright() as p:
        browser = p.chromium.launch(
            proxy={"server": proxy["server"], "username": proxy["user"], "password": proxy["pass"]}
        )
        page = browser.new_page()
        page.goto(job["url"])
        data = extract_data(page, job.get("rules"))
        browser.close()
    return data

get_proxy_for_job might pull from a shared pool, assign per-customer configs, or use geo parameters from the job. Proxy config is decided by your platform, not the customer. Each job = one browser = one IP from your pool.

Proxy Allocation Strategies

Shared pool: All customers share one proxy pool. Simple. Noisy customers can affect others. Use when starting out or when usage is low.

Per-customer quotas: Each customer gets a quota (e.g. X concurrent sessions or Y requests/min). Isolates abuse. Allows tiers (enterprise = more capacity).

Geo assignment: Jobs specify region (US, DE, etc.). Route to geo-specific proxy configs. Requires provider to support geo.

Sticky vs rotating: Use rotating for independent scrapes. Use sticky when a job needs login or multi-step flow. Pass session ID in proxy config for sticky.

Handling Failures

Retry with new IP: On 403, timeout, or CAPTCHA, close the browser and retry with a new one (new IP). Use exponential backoff (1s, 2s, 4s) between retries. Limit retries (e.g. 3) to avoid infinite loops.

Circuit breaker: If a domain fails repeatedly (e.g. 10 failures in a row), pause jobs for that domain for 5–10 minutes. Avoids wasting proxy budget on a target that's blocking everyone.

Dead letter queue (DLQ): After N retries, move failed jobs to a DLQ. Notify the customer or alert your team. Enables debugging and avoids blocking the main queue.

Scaling Workers

Vertical: Bigger machines, more CPU/RAM. Playwright is heavy; 4–8 workers per machine is typical. Use when you need more throughput per machine.

Horizontal: More machines, more workers. Queue distributes jobs. Scale by adding worker instances. Ensure proxy pool size grows with total worker count—each worker needs IPs.

Concurrency per domain: Even with many workers, cap how many hit the same domain at once. Otherwise you trigger rate limits regardless of IP count. Use a domain-level semaphore or rate limiter.

Billing and Quotas

Common models:

Per request — Charge per URL scraped or per page load.
Per bandwidth — Charge for data transferred.
Subscription tiers — X requests/month per plan.
Proxy usage — Charge for proxy IP-minutes or concurrent sessions.

Implement quotas in your API: reject or throttle when a customer exceeds their limit. Track usage in a DB for invoicing. Emit usage events for real-time dashboards.

Operational Considerations

Cost control: Proxy and compute are the main costs. Optimize browser lifecycle—close promptly, avoid leaks. Use spot/preemptible instances for workers if workload allows. Monitor cost per request.

Compliance: Respect target sites' terms of service and robots.txt. Document acceptable use for customers. Consider data residency (e.g. EU) if you store results.

Monitoring: Track job success rate, latency (p50, p95), block rate, proxy utilization, queue depth. Alerts on degradation, quota breaches, or queue backlog.

Summary

Proxy SaaS = API + queue + workers (Playwright) + proxy backend + billing.
Each worker runs Playwright with proxy config from your pool or per-customer allocation.
Scale workers horizontally; ensure proxy pool grows with them.
Retry with new IP on failure; use circuit breakers for bad domains.
Enforce quotas and track usage for billing.
Monitor success rate, latency, and cost.

Further reading: Proxy Pool Design · Scaling Scraping with Playwright · Playwright Proxy Setup

Build a Proxy SaaS with Playwright

Key Takeaways