Webpage Capture API in Python 2026 — Requests, httpx & Async Examples

Why Python Developers Replace Selenium and Playwright with a Capture API

The standard Python stack for programmatic webpage capture is Selenium or Playwright running a local headless Chromium. That works in a Jupyter notebook and falls apart the moment you try to ship it. Chromium adds 300+ MB to every Docker image, which pushes Lambda packages past the 250 MB unzipped limit and turns ECS task definitions into slow-rolling deployments. Each Chromium instance holds 400 to 600 MB of resident memory, so a FastAPI worker handling a dozen concurrent capture requests needs multiple gigabytes of RAM dedicated purely to browsers. And the crashes — OOM kills, zombie renderer processes, X11 failures in minimal Linux images — all need watchdog logic that has nothing to do with your business problem. A hosted webpage capture API replaces all of that with a single HTTPS call: pass the URL, receive the PNG or PDF bytes, move on.

Minimal Requests Example

import os, requests

API_KEY = os.environ["SNAPAPI_KEY"]

def capture(url, format="png", full_page=True, width=1280, height=800):
    params = {
        "access_key": API_KEY,
        "url": url,
        "format": format,
        "full_page": "1" if full_page else "0",
        "viewport_width": width,
        "viewport_height": height,
    }
    r = requests.get("https://snapapi.pics/screenshot", params=params, timeout=30)
    r.raise_for_status()
    return r.content

png = capture("https://example.com")
with open("out.png", "wb") as f:
    f.write(png)

pdf = capture("https://example.com", format="pdf")
with open("out.pdf", "wb") as f:
    f.write(pdf)
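Because the call is a plain GET, every option travels in the query string, and the target URL has to be percent-encoded so that its own ? and & are not read as capture parameters. requests does this automatically when you pass params. The sketch below reproduces the encoding with the standard library so the final URL can be inspected in a test. The helper name build_capture_url is hypothetical; the endpoint and parameter names are simply the ones used in the example above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE = "https://snapapi.pics/screenshot"  # endpoint as used in the example above

def build_capture_url(api_key, url, fmt="png", full_page=True):
    """Hypothetical helper: return the full GET URL without sending it."""
    query = urlencode({
        "access_key": api_key,
        "url": url,  # urlencode percent-encodes ':', '/', '?' and '&' here
        "format": fmt,
        "full_page": "1" if full_page else "0",
    })
    return f"{BASE}?{query}"

full = build_capture_url("demo-key", "https://example.com/search?q=a&page=2")

# Decoding the query again shows the target URL survived as a single value.
decoded = parse_qs(urlsplit(full).query)
print(decoded["url"][0])
```

Since the key is part of the URL, it is worth masking access_key before writing the full URL to logs.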

Async with httpx

For any pipeline doing more than a handful of captures, the synchronous requests library blocks the worker for the full duration of every call. httpx with asyncio runs many captures concurrently on a single thread, and a semaphore keeps the number of in-flight requests bounded:

import os, asyncio, httpx

API_KEY = os.environ["SNAPAPI_KEY"]
BASE = "https://snapapi.pics/screenshot"

async def capture(client, url, format="png"):
    params = {"access_key": API_KEY, "url": url, "format": format, "full_page": "1"}
    r = await client.get(BASE, params=params, timeout=30.0)
    r.raise_for_status()
    return url, r.content

async def capture_many(urls, concurrency=8):
    sem = asyncio.Semaphore(concurrency)
    async with httpx.AsyncClient() as client:
        async def bounded(u):
            async with sem:
                return await capture(client, u)
        return await asyncio.gather(*(bounded(u) for u in urls), return_exceptions=True)

urls = ["https://example.com", "https://python.org", "https://fastapi.tiangolo.com"]
results = asyncio.run(capture_many(urls))
for r in results:
    if isinstance(r, Exception):
        print("err:", r)
    else:
        url, data = r
        print(url, len(data), "bytes")
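In the loop above, a failed capture arrives as a bare exception, so it is not immediately obvious which URL should be retried. asyncio.gather returns results in the same order as its inputs, which means zipping the URL list with the results recovers that mapping. The sketch below shows the pattern with a stand-in coroutine (fake_capture is hypothetical and makes no network call) so it runs anywhere.

```python
import asyncio

async def fake_capture(url):
    """Stand-in for a real capture coroutine; fails for one URL on purpose."""
    await asyncio.sleep(0)
    if "broken" in url:
        raise RuntimeError("simulated capture failure")
    return b"\x89PNG fake bytes"

async def capture_all(urls, concurrency=2):
    sem = asyncio.Semaphore(concurrency)

    async def bounded(u):
        async with sem:
            return await fake_capture(u)

    results = await asyncio.gather(*(bounded(u) for u in urls), return_exceptions=True)

    ok, failed = {}, {}
    # gather preserves input order, so zip pairs each result with its URL.
    for u, r in zip(urls, results):
        if isinstance(r, Exception):
            failed[u] = r
        else:
            ok[u] = r
    return ok, failed

urls = ["https://example.com", "https://broken.example", "https://python.org"]
ok, failed = asyncio.run(capture_all(urls))
print(sorted(ok), sorted(failed))
```

The failed dictionary can then be fed back into another round of capture_all, or logged with the URL attached.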

FastAPI Download Endpoint

Wrap the API behind your own FastAPI route so the rest of your app doesn't need to know where the screenshot came from. Stream the response straight through to avoid buffering multi-megabyte full-page screenshots in memory:

import os, httpx
from fastapi import FastAPI, HTTPException
from fastapi.responses import StreamingResponse

app = FastAPI()
API_KEY = os.environ["SNAPAPI_KEY"]

@app.get("/capture")
async def capture(url: str, format: str = "png"):
    # The format value ends up in response headers, so accept plain tokens only.
    if not (format.isascii() and format.isalnum()):
        raise HTTPException(status_code=400, detail="invalid format")

    params = {"access_key": API_KEY, "url": url, "format": format, "full_page": "1"}
    client = httpx.AsyncClient(timeout=30.0)
    try:
        req = client.build_request("GET", "https://snapapi.pics/screenshot", params=params)
        upstream = await client.send(req, stream=True)
    except httpx.HTTPError:
        # Connection errors and timeouts: release the client before reporting.
        await client.aclose()
        raise HTTPException(status_code=502, detail="capture failed")

    if upstream.status_code != 200:
        await upstream.aclose()
        await client.aclose()
        raise HTTPException(status_code=502, detail="capture failed")

    mime = "application/pdf" if format == "pdf" else f"image/{format}"
    filename = f"capture.{format}"

    async def pipe():
        # finally also runs when the caller disconnects mid-download,
        # so the upstream connection is never leaked.
        try:
            async for chunk in upstream.aiter_bytes():
                yield chunk
        finally:
            await upstream.aclose()
            await client.aclose()

    return StreamingResponse(
        pipe(),
        media_type=mime,
        headers={"Content-Disposition": f'attachment; filename="{filename}"'},
    )

Celery Batch Jobs

For pipelines that process thousands of URLs (competitor monitoring, archival, QA snapshots), hand each capture to a Celery task. The broker queues the work, and the size of the worker pool sets how many captures run at once:

import os, requests
from celery import Celery

app = Celery("captures", broker=os.environ["REDIS_URL"])
API_KEY = os.environ["SNAPAPI_KEY"]

# RequestException covers HTTP errors, connection resets, and timeouts.
@app.task(autoretry_for=(requests.RequestException,), retry_backoff=True, max_retries=3)
def capture_to_s3(url: str, bucket: str, key: str):
    import boto3
    params = {"access_key": API_KEY, "url": url, "format": "png", "full_page": "1"}
    r = requests.get("https://snapapi.pics/screenshot", params=params, timeout=30)
    r.raise_for_status()

    s3 = boto3.client("s3")
    s3.put_object(
        Bucket=bucket,
        Key=key,
        Body=r.content,
        ContentType="image/png",
        CacheControl="public, max-age=86400",
    )
    return f"s3://{bucket}/{key}"
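The task takes a key argument, but the text does not say how keys are chosen. One possible scheme, shown here purely as an assumption, derives the key from the capture date, the hostname, and a hash of the full URL. A retried or re-enqueued task then overwrites the same object instead of creating duplicates. The helper name s3_key_for and the captures/ prefix are hypothetical.

```python
import hashlib
from datetime import date
from urllib.parse import urlsplit

def s3_key_for(url, day=None, prefix="captures"):
    """Hypothetical helper: deterministic S3 key for one URL on one day."""
    day = day or date.today()
    host = urlsplit(url).hostname or "unknown-host"
    # A short hash keeps distinct paths and query strings on the same host apart.
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    return f"{prefix}/{day.isoformat()}/{host}/{digest}.png"

key = s3_key_for("https://example.com/pricing", day=date(2026, 1, 15))
print(key)

# Enqueueing a batch would then look roughly like this ("my-bucket" is a placeholder):
# for url in urls:
#     capture_to_s3.delay(url, "my-bucket", s3_key_for(url))
```

Grouping by date first makes it easy to apply an S3 lifecycle rule to old captures, though any stable scheme works.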

Error Handling and Retries You Actually Need

The three failure modes worth writing code for are HTTP 429 (rate limited), 5xx (transient backend hiccups), and connection resets. Everything else — bad URLs, timeouts on the target site, auth errors — should surface to the caller immediately. The retry pattern that survives is exponential backoff with jitter, capped at three attempts, with the Retry-After header respected when the server returns 429:

import os, time, random, requests

def capture_with_retry(url, max_attempts=3):
    for attempt in range(max_attempts):
        try:
            r = requests.get("https://snapapi.pics/screenshot", params={
                "access_key": os.environ["SNAPAPI_KEY"],
                "url": url, "format": "png", "full_page": "1",
            }, timeout=30)
            if r.status_code == 429:
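                header = r.headers.get("Retry-After", "")
                # Retry-After can also be an HTTP date; fall back to backoff in that case.
                wait = int(header) if header.isdigit() else 2 ** attempt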
                time.sleep(wait + random.random())
                continue
            if 500 <= r.status_code < 600:
                time.sleep((2 ** attempt) + random.random())
                continue
            r.raise_for_status()
            return r.content
        except (requests.ConnectionError, requests.Timeout):
            if attempt == max_attempts - 1:
                raise
            time.sleep((2 ** attempt) + random.random())
    raise RuntimeError("capture failed after retries")
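HTTP allows Retry-After to carry either a number of seconds or an HTTP date, so a parser that only accepts integers can fail on a valid response. The sketch below handles both forms with the standard library and falls back to a caller-supplied delay when the header is missing or unparseable. The function name retry_after_seconds is hypothetical, and which form a given capture API sends is an assumption to check against its documentation.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def retry_after_seconds(value, fallback, now=None):
    """Hypothetical helper: turn a Retry-After header value into a delay in seconds."""
    if value is None:
        return float(fallback)
    value = value.strip()
    if value.isdigit():
        # Delta-seconds form, e.g. "5"
        return float(value)
    try:
        # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return float(fallback)
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means "retry now", never a negative sleep.
    return max(0.0, (when - now).total_seconds())

fixed_now = datetime(2015, 10, 21, 7, 27, 0, tzinfo=timezone.utc)
print(retry_after_seconds("5", fallback=1))
print(retry_after_seconds("Wed, 21 Oct 2015 07:28:00 GMT", fallback=1, now=fixed_now))
```

In the retry loop, the returned value would replace the integer parse, with 2 ** attempt passed as the fallback and jitter added on top.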

When to Use Hosted vs. Self-Hosted

Self-hosted Playwright still makes sense if you need full programmatic control of the browser — intercepting requests mid-flight, injecting cookies, or running end-to-end test suites. For webpage capture specifically — feeding pages through a pipeline, generating thumbnails, archiving content, producing PDF reports — a hosted API is nearly always cheaper once you factor in the ops cost of running Chromium fleets. SnapAPI handles the browser pool, stealth evasions, device emulation, ad and cookie blocking, and full-page rendering with sticky headers out of the box.

Start Capturing Webpages from Python in Under a Minute

SnapAPI's free tier gives you 200 captures per month — enough to prototype a pipeline, test the retry logic, and benchmark real network latency from your environment before committing. Grab a key at snapapi.pics/register and drop it into any of the examples above. No browser binaries, no Dockerfile surgery, no memory monitoring.
