Web Scraping · Anti-Bot · Stealth · April 5, 2026

Web Scraping Without Getting Blocked (2026 Guide)

Getting blocked while scraping is frustrating, especially when your scraper worked yesterday. Modern bot detection stacks several defenses: rate limiting, IP reputation databases, JavaScript fingerprinting, behavioural analysis, and CAPTCHAs. This guide explains how each detection method works and which countermeasures address it.

Legal note: Always check a site's robots.txt and Terms of Service before scraping. This guide covers technical methods for legitimate use cases — price monitoring, research, and public data aggregation.
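The robots.txt check can also be automated. Below is a minimal sketch, not a complete RFC 9309 implementation. It groups rules by User-agent and treats Allow and Disallow values as plain path prefixes (the * and $ wildcards are not supported). The longest matching rule wins, with Allow taking ties. It also reads the non-standard Crawl-delay line. parseRobots, isAllowed and crawlDelayMs are hypothetical helpers, and MyScraper stands in for your crawler's own product token. Fetching the robots.txt file itself is left to whichever HTTP client you already use.

```javascript
// Simplified robots.txt reader (NOT a full RFC 9309 parser):
// user-agent groups, Allow/Disallow as plain path prefixes, and Crawl-delay.
// Wildcards (* and $) in paths are not supported.
function parseRobots(text) {
  const groups = [];            // [{ agents: [...], rules: [...], crawlDelay }]
  let current = null;
  let lastWasAgent = false;

  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, '').trim();    // drop comments
    const idx = line.indexOf(':');
    if (idx === -1) continue;                            // blank or malformed line
    const key = line.slice(0, idx).trim().toLowerCase();
    const value = line.slice(idx + 1).trim();

    if (key === 'user-agent') {
      // Consecutive User-agent lines share one group
      if (!lastWasAgent) {
        current = { agents: [], rules: [], crawlDelay: null };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
      continue;
    }
    lastWasAgent = false;
    if (!current) continue;                              // rules before any User-agent are ignored

    if ((key === 'allow' || key === 'disallow') && value) {
      // An empty Disallow means "allow everything", so it adds no rule
      current.rules.push({ allow: key === 'allow', path: value });
    } else if (key === 'crawl-delay') {
      const seconds = Number(value);
      if (Number.isFinite(seconds) && seconds >= 0) current.crawlDelay = seconds;
    }
  }
  return groups;
}

// Groups naming our bot exactly (case-insensitive), else the "*" groups
function groupsFor(groups, botName) {
  const name = botName.toLowerCase();
  const own = groups.filter(g => g.agents.includes(name));
  return own.length ? own : groups.filter(g => g.agents.includes('*'));
}

// Longest matching prefix wins; on a tie, Allow beats Disallow
function isAllowed(groups, botName, path) {
  let best = null;
  for (const group of groupsFor(groups, botName)) {
    for (const rule of group.rules) {
      if (!path.startsWith(rule.path)) continue;
      const longer = !best || rule.path.length > best.path.length;
      const tieAllow = best && rule.path.length === best.path.length && rule.allow;
      if (longer || tieAllow) best = rule;
    }
  }
  return best ? best.allow : true;                       // no matching rule: allowed
}

// Crawl-delay in milliseconds, or null if none is given
function crawlDelayMs(groups, botName) {
  const delays = groupsFor(groups, botName).map(g => g.crawlDelay).filter(d => d !== null);
  return delays.length ? Math.max(...delays) * 1000 : null;
}

// Usage
const robots = parseRobots([
  'User-agent: *',
  'Disallow: /private/',
  'Allow: /private/press/',
  'Crawl-delay: 5',
].join('\n'));

console.log(isAllowed(robots, 'MyScraper', '/private/data'));       // false
console.log(isAllowed(robots, 'MyScraper', '/private/press/2026')); // true
console.log(crawlDelayMs(robots, 'MyScraper'));                     // 5000
```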

How Websites Detect Bots

Detection method     | What it checks                 | Countermeasure
Rate limiting        | Request frequency per IP       | Delays + proxy rotation
IP reputation        | Datacenter / proxy IP ranges   | Residential proxies
User-Agent           | Headless Chrome UA string      | Realistic UA rotation
navigator.webdriver  | Automation flag in JS          | CDP patch / stealth plugin
Browser fingerprint  | Canvas, WebGL, fonts, plugins  | Stealth evasion / real browsers
Behavioural analysis | Mouse movement, scroll, timing | Human-like interaction simulation
Honeypot links       | Hidden links only bots follow  | Check visibility before clicking
CAPTCHA              | Human verification challenge   | 2captcha / avoid triggering

Rate Limiting and Polite Delays

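The single most common reason scrapers get blocked is sending too many requests too fast. A person browsing a site typically pauses several seconds (roughly 3–10) between page loads. Space your requests on a similar scale, and add random jitter so the gaps don't form a fixed, machine-like interval.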

// Configurable delay between requests
function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

// Jitter: random delay between min and max ms
function jitter(min = 1000, max = 4000) {
  return sleep(Math.floor(Math.random() * (max - min) + min));
}

async function scrapePagesPolitely(urls) {
  const results = [];
  for (const url of urls) {
    try {
      const data = await fetchPage(url);
      results.push(data);
    } catch (err) {
      console.error(`Failed: ${url}`, err.message);
    }
    await jitter(1500, 5000); // wait 1.5–5s between requests
  }
  return results;
}

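// p-queue for controlled concurrency.
// p-queue v7+ is ESM-only, so run this part as an ES module (.mjs or "type": "module"),
// which also allows top-level await. (p-queue v6 with CommonJS: const { default: PQueue } = require('p-queue');)
import PQueue from 'p-queue';

// At most 2 requests in flight, and at most 2 started per 1000 ms
const queue = new PQueue({ concurrency: 2, interval: 1000, intervalCap: 2 });

// allSettled: one failed page doesn't reject the whole batch
const results = await Promise.allSettled(urls.map(url => queue.add(() => fetchPage(url))));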
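A p-queue instance caps concurrency across all URLs combined, but rate limits are usually applied per site. Below is a minimal sketch of per-host spacing. It assumes you want the same jittered 1.5–5 s gap for each hostname, while different hosts proceed in parallel. HostThrottle is a hypothetical helper, not part of p-queue. The now and sleep options are injectable only so the schedule can be tested without real timers.

```javascript
// Per-host throttle (hypothetical helper): request starts to the same hostname are
// spaced by a random gap, while different hosts don't wait on each other.
class HostThrottle {
  constructor({
    minGapMs = 1500,
    maxGapMs = 5000,
    now = Date.now,
    sleep = ms => new Promise(resolve => setTimeout(resolve, ms)),
  } = {}) {
    Object.assign(this, { minGapMs, maxGapMs, now, sleep });
    this.nextSlot = new Map();   // hostname -> earliest start time (ms) for its next request
  }

  // Reserve this host's next start slot and return the wait in ms.
  // Reservation is synchronous, so concurrent callers get distinct slots.
  reserve(url) {
    const host = new URL(url).hostname;
    const now = this.now();
    const start = Math.max(now, this.nextSlot.get(host) ?? 0);
    const gap = this.minGapMs + Math.random() * (this.maxGapMs - this.minGapMs);
    this.nextSlot.set(host, start + gap);
    return start - now;
  }

  // Wait for the reserved slot, then run the task (e.g. () => fetchPage(url))
  async run(url, task) {
    const waitMs = this.reserve(url);
    if (waitMs > 0) await this.sleep(waitMs);
    return task();
  }
}

// Usage (inside an async function):
// const throttle = new HostThrottle();
// const results = await Promise.allSettled(urls.map(url => throttle.run(url, () => fetchPage(url))));
```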

User-Agent Rotation

Default User-Agent strings such as axios/1.x or python-requests/2.x scream "bot". Always set a realistic Chrome or Firefox UA, and rotate it to avoid pattern detection.

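// Keep these in step with current stable browser releases: an outdated version number is itself a signal
const USER_AGENTS = [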
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36',
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36',
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:124.0) Gecko/20100101 Firefox/124.0',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 14_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15',
];

function randomUA() {
  return USER_AGENTS[Math.floor(Math.random() * USER_AGENTS.length)];
}

const axios = require('axios');

async function fetchPage(url) {
  const { data } = await axios.get(url, {
    headers: {
      'User-Agent':      randomUA(),
      'Accept':          'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
      'Accept-Language': 'en-US,en;q=0.9',
      'Accept-Encoding': 'gzip, deflate, br',
      'Cache-Control':   'no-cache',
      'Pragma':          'no-cache',
      'DNT':             '1',
      'Connection':      'keep-alive',
      'Upgrade-Insecure-Requests': '1',
    },
    timeout: 15000
  });
  return data; // response body only, so callers get the HTML rather than the axios response object
}
Tip: Match your Accept-Language and Accept headers to realistic browser values. Detection systems look at the full header fingerprint, not just User-Agent.
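randomUA() picks a new browser on every request. That means one IP can claim to be Chrome on Windows and then Firefox a second later, a mismatch that detection systems can notice. A sketch of the alternative derives one stable profile per session key (for example the proxy URL or a cookie-jar id), so every request in that session presents the same browser. PROFILES, hashString and headersForSession are hypothetical names. The header values are copied from fetchPage above and are illustrative, not verified current browser defaults. FNV-1a is used only as a cheap deterministic hash.

```javascript
// Illustrative profiles (values reused from the examples above; keep them current)
const PROFILES = [
  { name: 'chrome-win', ua: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36' },
  { name: 'chrome-mac', ua: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36' },
  { name: 'firefox-win', ua: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:124.0) Gecko/20100101 Firefox/124.0' },
];

// FNV-1a (32-bit): deterministic, so the same key always maps to the same profile
function hashString(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;   // multiply by the FNV prime, keep unsigned 32-bit
  }
  return h;
}

// One stable header set per session key (e.g. the proxy URL in use)
function headersForSession(sessionKey) {
  const profile = PROFILES[hashString(sessionKey) % PROFILES.length];
  return {
    'User-Agent': profile.ua,
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.9',
    'Accept-Encoding': 'gzip, deflate, br',
    'Upgrade-Insecure-Requests': '1',
  };
}
```

In fetchWithProxy, passing headers: headersForSession(proxyUrl) in place of the inline User-Agent would keep each proxy tied to one browser identity.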

Proxy Rotation

Datacenter proxies (cheap) are easily detected by IP reputation databases. For protected sites, residential or mobile proxies are more reliable — they route traffic through real consumer IPs.

const axios = require('axios');
const { HttpsProxyAgent } = require('https-proxy-agent');

const PROXIES = [
  'http://user:pass@proxy1.provider.com:8080',
  'http://user:pass@proxy2.provider.com:8080',
  'http://user:pass@proxy3.provider.com:8080',
];

function getRandomProxy() {
  return PROXIES[Math.floor(Math.random() * PROXIES.length)];
}

async function fetchWithProxy(url, retries = 3) {
  for (let i = 0; i < retries; i++) {
    const proxyUrl = getRandomProxy();
    try {
      const agent = new HttpsProxyAgent(proxyUrl);
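      const { data } = await axios.get(url, {
        httpsAgent: agent,   // used for https:// targets
        proxy: false,        // stop axios applying its own proxy settings (e.g. from HTTPS_PROXY) on top of the agent
        headers: { 'User-Agent': randomUA() },
        timeout: 20000
      });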
      return data;
    } catch (err) {
      if (i === retries - 1) throw err;
      await sleep(2000 * 2 ** i); // exponential backoff: 2s, 4s, 8s, ...
    }
  }
}
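getRandomProxy() can hand out a proxy that has just failed, again and again. Below is a sketch of a pool that rotates round-robin and benches a failing proxy for a cooldown that doubles with each consecutive failure. ProxyPool and its methods are hypothetical, not part of https-proxy-agent. The 30 s base and 10 min cap are placeholder values to tune for your provider.

```javascript
// Proxy pool (hypothetical helper): round-robin over healthy proxies, and
// bench a failing proxy for an exponentially growing cooldown.
class ProxyPool {
  constructor(proxies, { baseCooldownMs = 30_000, maxCooldownMs = 10 * 60_000, now = Date.now } = {}) {
    this.now = now;
    this.baseCooldownMs = baseCooldownMs;
    this.maxCooldownMs = maxCooldownMs;
    this.state = proxies.map(url => ({ url, failures: 0, benchedUntil: 0 }));
    this.cursor = 0;
  }

  // Next healthy proxy in round-robin order, or null if every proxy is benched
  next() {
    const t = this.now();
    for (let i = 0; i < this.state.length; i++) {
      const p = this.state[(this.cursor + i) % this.state.length];
      if (p.benchedUntil <= t) {
        this.cursor = (this.cursor + i + 1) % this.state.length;
        return p.url;
      }
    }
    return null;
  }

  // A success clears the failure history
  reportSuccess(url) {
    const p = this.state.find(s => s.url === url);
    if (p) { p.failures = 0; p.benchedUntil = 0; }
  }

  // Each consecutive failure doubles the cooldown, up to maxCooldownMs
  reportFailure(url) {
    const p = this.state.find(s => s.url === url);
    if (!p) return;
    p.failures += 1;
    const cooldown = Math.min(this.baseCooldownMs * 2 ** (p.failures - 1), this.maxCooldownMs);
    p.benchedUntil = this.now() + cooldown;
  }
}
```

In fetchWithProxy you would call pool.next() in place of getRandomProxy() (stopping if it returns null), pool.reportSuccess(proxyUrl) before returning, and pool.reportFailure(proxyUrl) in the catch block.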

Playwright Stealth Mode

A headless Playwright browser has dozens of tell-tale fingerprints. The most important one to fix is navigator.webdriver, which is true by default in automation contexts. Here are the essential patches: a Chromium launch flag, plus an init script that runs before any page script.

const { chromium } = require('playwright');

async function stealthScrape(url) {
  const browser = await chromium.launch({
    headless: true,
    args: [
      '--disable-blink-features=AutomationControlled',
      '--disable-dev-shm-usage',
      '--no-sandbox',
      '--disable-setuid-sandbox',
    ]
  });

  try {
    const context = await browser.newContext({
      userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36',
      viewport: { width: 1366, height: 768 },
      locale: 'en-US',
      timezoneId: 'America/New_York',
      permissions: ['geolocation'],
      extraHTTPHeaders: { 'Accept-Language': 'en-US,en;q=0.9' }
    });

    const page = await context.newPage();

    // Patch automation fingerprints with an init script that runs before any page script
    await page.addInitScript(() => {
      // Define on the prototype, where real Chrome has it; real Chrome reports false, not undefined
      Object.defineProperty(Navigator.prototype, 'webdriver', { get: () => false });
      // Crude: a plain array is not a real PluginArray, so thorough checks can still spot it
      Object.defineProperty(navigator, 'plugins', { get: () => [1, 2, 3, 4, 5] });
      Object.defineProperty(navigator, 'languages', { get: () => ['en-US', 'en'] });
      window.chrome = { runtime: {} };
    });

    // Block known fingerprinting scripts. Caution: some protection vendors expect their
    // script to run, and blocking it can itself trigger a challenge or a block.
    await page.route('**/*', route => {
      const url = route.request().url();
      if (/fingerprintjs|botd|datadome|perimeterx/.test(url)) return route.abort();
      return route.continue();
    });

    await page.goto(url, { waitUntil: 'networkidle', timeout: 30000 });

    // Human-like: random scroll before extracting
    await page.evaluate(() => window.scrollBy(0, Math.floor(Math.random() * 400 + 200)));
    await page.waitForTimeout(Math.floor(Math.random() * 1000 + 500));

    return await page.content();
  } finally {
    await browser.close(); // runs even if navigation or extraction throws
  }
}
puppeteer-extra-plugin-stealth: For Puppeteer users, this plugin automates the patches above along with many more evasions. Install: npm install puppeteer-extra puppeteer-extra-plugin-stealth. Playwright users can load the same plugin through playwright-extra. For full details, see our Puppeteer stealth guide.
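The scroll in stealthScrape is one jump of a random size. A more gradual alternative splits scrolling into uneven mouse-wheel steps with pauses in between. buildScrollPlan and humanScroll are hypothetical helpers. page.viewportSize(), page.mouse.move(), page.mouse.wheel() and page.waitForTimeout() are Playwright Page APIs. The step and pause ranges are guesses, not values known to get past any particular detector.

```javascript
// Multi-step scroll plan (hypothetical helper): uneven wheel steps with pauses,
// instead of a single jump. `rand` is injectable so plans can be reproduced in tests.
function buildScrollPlan({
  totalPx = 1200,
  minStep = 80,
  maxStep = 320,
  minPauseMs = 120,
  maxPauseMs = 900,
  rand = Math.random,
} = {}) {
  const plan = [];
  let scrolled = 0;
  while (scrolled < totalPx) {
    // At least 1px per step so the loop always terminates
    const step = Math.max(1, Math.min(Math.round(minStep + rand() * (maxStep - minStep)), totalPx - scrolled));
    const pauseMs = Math.round(minPauseMs + rand() * (maxPauseMs - minPauseMs));
    plan.push({ deltaY: step, pauseMs });
    scrolled += step;
  }
  return plan;
}

// Replay the plan on a Playwright page. Wheel events go to the element under
// the pointer, so the pointer is moved to the middle of the viewport first.
async function humanScroll(page, plan = buildScrollPlan()) {
  const viewport = page.viewportSize();   // null when the context uses viewport: null
  if (viewport) await page.mouse.move(viewport.width / 2, viewport.height / 2);
  for (const { deltaY, pauseMs } of plan) {
    await page.mouse.wheel(0, deltaY);
    await page.waitForTimeout(pauseMs);
  }
}

// Usage in stealthScrape, after page.goto(...):
// await humanScroll(page);
```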

SnapAPI Stealth Mode (Managed Solution)

Managing proxies, rotating fingerprints, and patching CDP is a full-time job. SnapAPI's stealth: true parameter handles all of it — residential IP rotation, fingerprint randomisation, and human-like behaviour — in a single API call.

const axios = require('axios');

async function stealthScrapeAPI(url) {
  const { data } = await axios.post('https://api.snapapi.pics/v1/scrape', {
    url,
    stealth:     true,   // residential proxy + fingerprint randomisation
    blockAds:    true,
    blockCookieBanners: true,
    waitFor:     'networkidle'
  }, { headers: { 'X-Api-Key': process.env.SNAPAPI_KEY } });

  return data.html; // fully rendered HTML, ready for Cheerio/node-html-parser
}

// Same for screenshots
async function stealthScreenshot(url) {
  const { data } = await axios.post('https://api.snapapi.pics/v1/screenshot', {
    url,
    stealth: true,
    fullPage: true,
    format: 'webp'
  }, { headers: { 'X-Api-Key': process.env.SNAPAPI_KEY } });

  return Buffer.from(data.screenshot, 'base64');
}

The same stealth request from Python, calling the REST endpoint directly with httpx:

import os
import httpx

async def stealth_scrape(url: str) -> str:
    async with httpx.AsyncClient() as client:
        r = await client.post(
            'https://api.snapapi.pics/v1/scrape',
            json={'url': url, 'stealth': True, 'blockAds': True, 'waitFor': 'networkidle'},
            headers={'X-Api-Key': os.environ['SNAPAPI_KEY']},
            timeout=60
        )
        r.raise_for_status()
        return r.json()['html']

Avoiding Honeypot Traps

Honeypots are invisible links or form fields that only automated tools interact with. Following a honeypot link can get your IP flagged immediately.

const cheerio = require('cheerio');

function getVisibleLinks(html, baseUrl) {
  const $ = cheerio.load(html);
  return $('a[href]').map((_, el) => {
    const $el = $(el);
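    // Normalise inline style so "display: none" and "DISPLAY:NONE" match alike
    const style = ($el.attr('style') ?? '').replace(/\s+/g, '').toLowerCase();
    const cls   = $el.attr('class') ?? '';

    // Skip hidden links (honeypots). Cheerio only sees inline styles and attributes;
    // links hidden by stylesheet rules are not caught here.
    const isHidden = (
      style.includes('display:none') ||
      style.includes('visibility:hidden') ||
      /opacity:0(\.0*)?(;|$)/.test(style) ||     // opacity:0, but not opacity:0.5
      /font-size:0(px)?(;|$)/.test(style) ||     // font-size:0, but not font-size:0.9em
      $el.is('[hidden]') ||
      $el.attr('aria-hidden') === 'true' ||
      cls.includes('hidden') ||
      cls.includes('invisible')
    );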

    if (isHidden) return null;

    const href = $el.attr('href');
    if (!href || href.startsWith('javascript:') || href === '#') return null;

    try { return new URL(href, baseUrl).href; } catch { return null; }
  }).get().filter(Boolean);
}
Rule of thumb: Only interact with elements that have non-zero dimensions and are not hidden via CSS. In Playwright: check element.isVisible() before clicking.
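The Cheerio check only sees inline styles and attribute values, so a link hidden by a stylesheet rule slips through. When a Playwright page is already open, one option is to read computed styles and layout boxes instead. Below is a sketch, with isRenderedVisibly and visibleLinks as hypothetical helpers built on page.$$eval, getComputedStyle and getBoundingClientRect. It is still not exhaustive: an ancestor's opacity: 0, for example, is not detected.

```javascript
// Visibility check on the rendered page (hypothetical helpers). Computed styles and
// layout boxes also catch links hidden by stylesheet rules.
function isRenderedVisibly(box) {
  return box.display !== 'none' &&
    box.visibility !== 'hidden' &&
    parseFloat(box.opacity) > 0 &&
    box.width > 0 && box.height > 0 &&                      // collapsed, or inside a display:none parent
    box.left + box.width > 0 && box.top + box.height > 0;   // not pushed off-page (e.g. left: -9999px)
}

async function visibleLinks(page) {
  // The callback runs inside the browser; only plain data comes back to Node
  const boxes = await page.$$eval('a[href]', anchors => anchors.map(a => {
    const style = getComputedStyle(a);
    const rect = a.getBoundingClientRect();
    return {
      href: a.href,                       // absolute URL, resolved by the browser
      display: style.display,
      visibility: style.visibility,
      opacity: style.opacity,
      width: rect.width,
      height: rect.height,
      left: rect.left + window.scrollX,   // document coordinates, so the current
      top: rect.top + window.scrollY,     // scroll position doesn't matter
    };
  }));
  return boxes
    .filter(isRenderedVisibly)
    .map(box => box.href)
    .filter(href => /^https?:/.test(href));   // drop javascript:, mailto:, etc.
}
```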

Anti-Block Checklist

  • ✓ Set realistic User-Agent and browser-like headers
  • ✓ Add random delays (1–5s) between requests
  • ✓ Limit concurrency to 2–3 parallel requests per domain
  • ✓ Use residential proxies for well-protected sites
  • ✓ Patch navigator.webdriver and other automation signals
  • ✓ Simulate human behaviour: random scroll, mouse movement, viewport resize
  • ✓ Skip hidden elements and honeypot links
  • ✓ Respect robots.txt and Crawl-delay directives
  • ✓ Retry with exponential backoff on 429 and 503 responses
  • ✓ Monitor your error rate — rising 403s mean you're being detected
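For the retry item above, here is a sketch of how the wait could be computed. On 429 and 503 responses it honours the server's Retry-After header, which may carry either a number of seconds or an HTTP date. Otherwise it falls back to exponential backoff with full jitter, meaning a random wait between zero and an exponentially growing ceiling. retryDelayMs is a hypothetical helper, and the 1 s base and 60 s cap are placeholder values.

```javascript
// Retry delay for a failed request (hypothetical helper).
// 429/503: honour Retry-After if present (delta-seconds or HTTP-date),
// otherwise exponential backoff with full jitter: random in [0, min(cap, base * 2^attempt)).
// Other statuses (e.g. 403) return null: retrying those blindly rarely helps.
const RETRYABLE_STATUSES = new Set([429, 503]);

function retryDelayMs(status, retryAfter, attempt, {
  baseMs = 1000,
  capMs = 60_000,
  now = Date.now,
  rand = Math.random,
} = {}) {
  if (!RETRYABLE_STATUSES.has(status)) return null;

  if (retryAfter != null && retryAfter !== '') {
    const value = String(retryAfter).trim();
    // Server-specified waits are returned as-is, even above capMs;
    // the caller decides whether that is too long to wait
    if (/^\d+$/.test(value)) return Number(value) * 1000;
    const at = Date.parse(value);
    if (!Number.isNaN(at)) return Math.max(at - now(), 0);
  }

  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(rand() * ceiling);
}

// Usage with axios, inside a retry loop (attempt = 0, 1, 2, ...):
// const wait = retryDelayMs(err.response?.status, err.response?.headers['retry-after'], attempt);
// if (wait === null) throw err;
// await sleep(wait);
```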

Skip the cat-and-mouse game

SnapAPI handles proxies, stealth mode, and fingerprint rotation for you. One stealth: true parameter — that's it. 200 free requests/month.

Try SnapAPI Free →