Guides · April 5, 2026

Web Scraping Without Getting Blocked: Headers, Proxies, and Stealth Mode

Why scraping requests get blocked, and practical techniques to avoid it — from correct request headers and rate limiting to proxy rotation and browser fingerprint stealth.

Why Scrapers Get Blocked

Websites detect and block scrapers using several layers of signals. Understanding each layer helps you address them systematically rather than guessing why requests fail.

IP reputation and rate detection are the most common block mechanisms. Sites track request frequency per IP address, and a single IP making hundreds of requests per hour to the same domain can trip rate limits on many sites. Residential IPs are treated far more leniently than datacenter IPs: AWS, Google Cloud, and DigitalOcean IP ranges are often pre-blocked or heavily throttled by sites that have learned to recognize them.

HTTP header fingerprinting identifies non-browser requests. A real browser sends a distinctive set of headers: Accept, Accept-Language, Accept-Encoding, Connection, Sec-Fetch-*, and others that vary by browser version. A plain Python requests call sends a minimal header set that bot detection systems recognize immediately.
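As a rough illustration of that check, the sketch below compares a header set against a short list of headers that a Chrome page navigation normally carries. The list, the `missing_browser_headers` helper, and the `bare_client` dictionary are illustrative assumptions, not the rules any particular detection vendor applies.

```python
# Headers a Chrome navigation request normally carries (illustrative, not exhaustive).
EXPECTED_BROWSER_HEADERS = (
    "User-Agent",
    "Accept",
    "Accept-Language",
    "Accept-Encoding",
    "Sec-Fetch-Dest",
    "Sec-Fetch-Mode",
    "Sec-Fetch-Site",
    "Upgrade-Insecure-Requests",
)


def missing_browser_headers(headers: dict[str, str]) -> list[str]:
    """Return the expected browser headers absent from `headers` (case-insensitive)."""
    present = {name.lower() for name in headers}
    return [h for h in EXPECTED_BROWSER_HEADERS if h.lower() not in present]


# Approximately what a bare requests call sends when no headers are set.
bare_client = {
    "User-Agent": "python-requests/2.x",
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "keep-alive",
}

print(missing_browser_headers(bare_client))
```

Real detection systems also weigh header order and values, so an empty result here only means the most obvious gaps are closed.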

TLS fingerprinting examines the TLS handshake parameters (cipher suites, extensions, and their ordering) before any HTTP headers are evaluated. Python's requests library and the Node.js https module produce TLS fingerprints that differ from Chromium's, which lets sites behind services like Cloudflare identify scrapers at the network layer before they look at any HTTP content.
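To show why ordering matters, here is a minimal sketch of a JA3-style fingerprint, one widely used public scheme that joins the handshake fields into a string and hashes it with MD5. The numeric values below are made-up sample inputs, and real detection services may use different or additional signals.

```python
import hashlib


def ja3_fingerprint(version, ciphers, extensions, curves, point_formats):
    """Build a JA3-style string from ClientHello fields and return (string, md5 hex)."""
    fields = [
        str(version),
        "-".join(str(c) for c in ciphers),
        "-".join(str(e) for e in extensions),
        "-".join(str(c) for c in curves),
        "-".join(str(p) for p in point_formats),
    ]
    ja3_string = ",".join(fields)
    return ja3_string, hashlib.md5(ja3_string.encode("ascii")).hexdigest()


# Sample values for illustration only; not captured from a real client.
client_a = ja3_fingerprint(771, [4865, 4866, 4867], [0, 23, 65281, 10, 11], [29, 23, 24], [0])
# Same cipher suites offered in a different order.
client_b = ja3_fingerprint(771, [4867, 4866, 4865], [0, 23, 65281, 10, 11], [29, 23, 24], [0])

print(client_a[0])
print(client_a[1] == client_b[1])
```

Because the hash covers the order as well as the contents, two clients that support identical cipher suites still produce different fingerprints if they list them differently.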

Browser automation detection applies to headless browser scrapers. Sites run JavaScript checks for signals like navigator.webdriver === true, the presence of Playwright's __playwright global, Chrome's window.chrome.runtime being undefined in headless mode, and dozens of other automation artifacts that differ from a real browser session.

Layer 1: Request Headers

Set a realistic browser User-Agent and the supporting headers that real browsers send alongside it. An accurate Chrome User-Agent with missing Sec-Fetch-* headers is still a suspicious combination — bot detection systems look at the full header set:

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/123.0.0.0 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.9',
    'Accept-Encoding': 'gzip, deflate, br',  # 'br' needs the brotli package installed so requests can decode it
    'Connection': 'keep-alive',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'none',
    'Sec-Fetch-User': '?1',
    'Upgrade-Insecure-Requests': '1',
}

session = requests.Session()
session.headers.update(headers)
response = session.get('https://example.com')

Use a requests.Session() to maintain cookies and headers across requests to the same domain, mimicking a real browser session that navigates through multiple pages. Keep the User-Agent string updated — outdated Chrome versions can trigger suspicion on sites that track major release versions.
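One detail worth handling when a session moves through several pages: a browser sends Sec-Fetch-Site: none only for navigations the user starts directly, such as a typed URL or a bookmark. Link clicks carry a Referer and a different Sec-Fetch-Site value. The sketch below derives both per request. The `follow_up_headers` helper is hypothetical, and it simplifies by treating every different origin as cross-site, skipping the same-site case that real browsers distinguish.

```python
from typing import Optional
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def _origin(url: str) -> tuple:
    """Return (scheme, host, port) with the default port filled in."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def follow_up_headers(url: str, referer: Optional[str] = None) -> dict[str, str]:
    """Headers to merge into a request for `url` reached from `referer`."""
    if referer is None:
        # Direct navigation: no Referer, Sec-Fetch-Site is 'none'.
        return {"Sec-Fetch-Site": "none"}
    if _origin(url) == _origin(referer):
        return {"Sec-Fetch-Site": "same-origin", "Referer": referer}
    # Simplification: anything else is reported as cross-site, and the
    # Referer is trimmed to the origin as browsers do by default.
    ref = urlsplit(referer)
    return {"Sec-Fetch-Site": "cross-site", "Referer": f"{ref.scheme}://{ref.netloc}/"}


print(follow_up_headers("https://example.com/page/2", "https://example.com/page/1"))
```

The result can be passed as the headers argument of an individual session.get call, where it overrides the session-level defaults for that request only.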

Layer 2: Rate Limiting and Human-Like Timing

Space requests with random delays to avoid the perfectly uniform inter-request timing that bot detection systems flag. A real user clicking through pages introduces variable delays:

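import random
import time

import requests

def scrape_with_delay(urls: list[str], min_delay=1.5, max_delay=4.0):
    """Fetch each URL with the shared `session` from Layer 1, pausing randomly between requests."""
    results = []
    for url in urls:
        try:
            response = session.get(url, timeout=15)
            # Treat 4xx/5xx responses (including 403 and 429 blocks) as errors, not page content
            response.raise_for_status()
            results.append({'url': url, 'html': response.text})
        except requests.RequestException as e:
            results.append({'url': url, 'error': str(e)})
        # Random delay between requests
        time.sleep(random.uniform(min_delay, max_delay))
    return results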

A 1.5 to 4 second random delay averages 2.75 seconds, which caps a single session at about 1,300 requests per hour before response time is counted. That stays under many per-IP rate limits while still processing a meaningful number of URLs, though stricter sites need slower pacing. For high-value targets with aggressive rate limiting, increase the range to 5 to 15 seconds and run a single worker rather than parallel requests.
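Fixed pacing does not help once a site has already started answering with 429 responses. A common complement is exponential backoff with jitter, sketched below. The `backoff_delay` helper and its default values are assumptions for illustration, and it handles only the numeric-seconds form of the Retry-After header.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, base: float = 2.0, cap: float = 120.0,
                  retry_after: Optional[float] = None) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    If the server supplied a numeric Retry-After value, honor it.
    Otherwise pick a random delay between 0 and an exponentially
    growing ceiling ("full jitter"), limited by `cap`.
    """
    if retry_after is not None:
        return max(float(retry_after), 0.0)
    ceiling = min(cap, base * (2 ** attempt))
    return random.uniform(0.0, ceiling)


# Ceilings grow 2, 4, 8, 16, ... seconds until they reach the cap.
for attempt in range(5):
    print(attempt, round(backoff_delay(attempt), 2))
```

Inside a scraping loop, the returned value would go to time.sleep before the same URL is retried, with the attempt counter reset after a successful response.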

Layer 3: Proxy Rotation

Residential proxy networks rotate your outgoing IP address across a pool of real consumer IPs, making each request appear to come from a different household. This addresses IP-level blocks and rate limits that target specific addresses. Commercial residential proxy providers include Bright Data, Oxylabs, and Smartproxy.

import random

PROXY_LIST = [
    "http://user:pass@proxy1.provider.com:8080",
    "http://user:pass@proxy2.provider.com:8080",
    "http://user:pass@proxy3.provider.com:8080",
]

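def get_proxy():
    """Return a requests-style proxies mapping for one randomly chosen proxy."""
    proxy = random.choice(PROXY_LIST)
    return {'http': proxy, 'https': proxy}

# `session` comes from the Layer 1 example; `url` is the page to fetch
response = session.get(url, proxies=get_proxy(), timeout=15)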

Residential proxies add cost — typically $5 to $15 per GB of traffic. For high-volume scraping where individual IP blocks are a persistent problem, this cost is usually justified. For moderate volumes, the header and rate limiting techniques above are often sufficient.
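Pure random choice keeps sending traffic through proxies that a site has already blocked. A minimal sketch of one alternative is a pool that benches a proxy for a cooldown period after a failure. The `ProxyPool` class, its method names, and the 300-second default are hypothetical, not part of any provider's SDK.

```python
import random
import time


class ProxyPool:
    """Random proxy selection that skips proxies which failed recently."""

    def __init__(self, proxies, cooldown=300.0, clock=time.monotonic):
        self._proxies = list(proxies)
        self._cooldown = cooldown
        self._clock = clock          # injectable so the pool can be tested without waiting
        self._benched_until = {}     # proxy URL -> time at which it becomes usable again

    def get(self) -> str:
        """Return a proxy URL that is not currently cooling down."""
        now = self._clock()
        available = [
            p for p in self._proxies
            if p not in self._benched_until or self._benched_until[p] <= now
        ]
        if not available:
            raise RuntimeError("all proxies are cooling down")
        return random.choice(available)

    def report_failure(self, proxy: str) -> None:
        """Bench a proxy after a block response such as 403 or 429."""
        self._benched_until[proxy] = self._clock() + self._cooldown


pool = ProxyPool(["http://proxy-a.invalid:8080", "http://proxy-b.invalid:8080"])
pool.report_failure("http://proxy-a.invalid:8080")
print(pool.get())
```

In use, the URL from pool.get() would fill both the 'http' and 'https' keys of the proxies mapping, and report_failure would be called whenever a response looks like a block.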

Layer 4: Browser Fingerprint Stealth via API

For sites that require full browser fingerprint stealth — Cloudflare-protected pages, JavaScript challenge sites, and platforms with aggressive DataDome or PerimeterX deployments — delegating rendering to a managed stealth browser API eliminates the need to maintain fingerprint patches yourself:

import os

import requests

def stealth_extract(url: str, selector: str) -> str:
    response = requests.post(
        "https://api.snapapi.pics/v1/extract",
        json={"url": url, "selector": selector, "stealth": True, "wait_for": selector},
        headers={"X-Api-Key": os.environ["SNAPAPI_KEY"]},
        timeout=30
    )
    response.raise_for_status()
    return response.json()["text"]

price = stealth_extract("https://protected-shop.example.com/product/123", ".price")
print(price)

SnapAPI's stealth infrastructure handles browser fingerprint randomization, TLS fingerprint matching, proxy selection, and anti-bot bypass at the infrastructure level. Your Python code stays simple — no playwright-extra plugins to install, no fingerprint patches to maintain as detection vendors update their systems.

Start scraping without getting blocked at snapapi.pics — 200 free captures per month, no credit card required. The extract and scrape endpoints with stealth mode enabled are available on all paid plans.
