Web Scraping Playwright SPAs April 5, 2026

How to Scrape Dynamic Websites (React, Vue, Angular) in 2026

Static scrapers using axios or requests return an empty skeleton when targeting modern SPAs — the actual content is rendered by JavaScript after the initial HTML loads. This guide covers every technique for scraping JS-heavy pages: smart waitFor strategies, API interception, infinite scroll, authentication, and using SnapAPI as a managed browser.

Why Static Scrapers Fail on SPAs

A React/Vue/Angular app typically serves an index.html with just a <div id="root"></div> and a JavaScript bundle. The actual HTML is injected by the framework after the bundle runs. axios.get(url) captures the skeleton, not the content.

Approach	Works for	Fails for
axios + Cheerio	Static HTML, SSR (Next.js, Nuxt), MPA	CSR React/Vue/Angular apps
Playwright / Puppeteer	Any page including SPAs	Blocked by advanced bot detection
API interception	Sites with accessible JSON APIs	Sites that don't expose a public API
SnapAPI /scrape	Any page, stealth mode available	Requires API key / usage credits

waitFor Strategies in Playwright

The most common scraping mistake is not waiting long enough for content to render. Playwright has several strategies — choose the right one for the page:

const { chromium } = require('playwright');

async function scrapeWithWait(url) {
  const browser = await chromium.launch({ headless: true });
  const page    = await browser.newPage();

  await page.goto(url, {
    // Strategy 1: wait until no network requests for 500ms (good default for SPAs)
    waitUntil: 'networkidle',
    timeout: 30000
  });

  // Strategy 2: wait for a specific element to appear
  await page.waitForSelector('.product-list', { timeout: 10000 });

  // Strategy 3: wait for a specific text to appear
  await page.waitForFunction(() => document.querySelector('.price')?.textContent?.includes('$'));

  // Strategy 4: wait for an API response
  await page.waitForResponse(resp => resp.url().includes('/api/products') && resp.status() === 200);

  const html = await page.content();
  await browser.close();
  return html;
}

Best strategy: Combine waitUntil: 'networkidle' with waitForSelector on the key content element. networkidle alone can resolve before lazy-loaded content appears.

Intercepting API Calls (The Fast Path)

Many SPAs fetch their data from a JSON API. Intercepting those API calls is faster and more reliable than parsing rendered HTML — you get clean structured data with no selectors.

const { chromium } = require('playwright');

async function interceptAPI(url, apiPattern) {
  const browser  = await chromium.launch({ headless: true });
  const page     = await browser.newPage();
  const captured = [];

  // Listen for API responses matching the pattern
  page.on('response', async response => {
    if (response.url().match(apiPattern) && response.status() === 200) {
      try {
        const json = await response.json();
        captured.push(json);
      } catch {}
    }
  });

  await page.goto(url, { waitUntil: 'networkidle' });
  await browser.close();
  return captured;
}

// Example: scrape product listings from an e-commerce SPA
const data = await interceptAPI(
  'https://shop.example.com/category/electronics',
  /api\/products|api\/catalog/
);
console.log(data[0]); // Raw JSON — no HTML parsing needed

Block unnecessary resources while intercepting

const page = await context.newPage();

// Block images, fonts, and stylesheets — speed up 3-5x
await page.route('**/*', route => {
  const resourceType = route.request().resourceType();
  if (['image', 'stylesheet', 'font', 'media'].includes(resourceType)) {
    return route.abort();
  }
  route.continue();
});

// But allow the API calls through (don't abort those!)

Handling Infinite Scroll

Infinite-scroll pages load more content as the user scrolls. Automate scrolling until no new items appear:

async function scrapeInfiniteScroll(url, itemSelector, maxItems = 500) {
  const browser = await chromium.launch({ headless: true });
  const page    = await browser.newPage();
  await page.goto(url, { waitUntil: 'networkidle' });

  const items = new Set();

  while (items.size < maxItems) {
    // Collect currently visible items
    const newItems = await page.$$eval(itemSelector, els =>
      els.map(el => ({
        title: el.querySelector('h3, h2, .title')?.textContent?.trim(),
        href:  el.querySelector('a')?.href,
      })).filter(i => i.title)
    );

    const prevCount = items.size;
    newItems.forEach(i => items.add(JSON.stringify(i)));

    if (items.size === prevCount) break; // no new items — reached the end

    // Scroll to bottom and wait for new content
    await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
    await page.waitForTimeout(1500 + Math.random() * 1000);

    // Check for "Load more" button and click if present
    const loadMore = await page.$('button:has-text("Load more"), a:has-text("Show more")');
    if (loadMore) await loadMore.click();
  }

  await browser.close();
  return [...items].map(i => JSON.parse(i));
}

Scraping Behind Authentication

Playwright can log in and maintain session cookies across pages:

const { chromium } = require('playwright');
const fs = require('fs');

const STORAGE_PATH = './session.json';

async function loginAndSave(loginUrl, username, password) {
  const browser = await chromium.launch({ headless: true });
  const context = await browser.newContext();
  const page    = await context.newPage();

  await page.goto(loginUrl);
  await page.fill('input[name="email"]', username);
  await page.fill('input[name="password"]', password);
  await page.click('button[type="submit"]');
  await page.waitForNavigation({ waitUntil: 'networkidle' });

  // Save session state (cookies + localStorage)
  await context.storageState({ path: STORAGE_PATH });
  await browser.close();
  console.log('Session saved to', STORAGE_PATH);
}

async function scrapeAuthenticated(targetUrl) {
  if (!fs.existsSync(STORAGE_PATH)) {
    await loginAndSave(process.env.LOGIN_URL, process.env.USERNAME, process.env.PASSWORD);
  }

  const browser = await chromium.launch({ headless: true });
  // Restore saved session
  const context = await browser.newContext({ storageState: STORAGE_PATH });
  const page    = await context.newPage();

  await page.goto(targetUrl, { waitUntil: 'networkidle' });
  const html = await page.content();
  await browser.close();
  return html;
}

SnapAPI — Managed Browser for Dynamic Pages

Managing browser pools, waitFor logic, and stealth evasion in-house is expensive. SnapAPI's /v1/scrape endpoint accepts a URL and a waitFor parameter and returns fully-rendered HTML — no Playwright installation required.

const axios = require('axios');

async function scrapeSPA(url, options = {}) {
  const { data } = await axios.post('https://api.snapapi.pics/v1/scrape', {
    url,
    waitFor:   options.waitFor   ?? 'networkidle',    // or a CSS selector
    stealth:   options.stealth   ?? false,
    blockAds:  options.blockAds  ?? true,
    userAgent: options.userAgent ?? undefined,         // override UA if needed
  }, { headers: { 'X-Api-Key': process.env.SNAPAPI_KEY } });

  return data.html; // fully rendered, JS-executed HTML
}

// Wait for a specific selector to appear before returning
const html = await scrapeSPA('https://app.example.com/dashboard', {
  waitFor: '.data-table',  // CSS selector
  stealth: true
});

import httpx, asyncio, os

async def scrape_spa(url: str, wait_for: str = 'networkidle') -> str:
    async with httpx.AsyncClient() as client:
        r = await client.post(
            'https://api.snapapi.pics/v1/scrape',
            json={'url': url, 'waitFor': wait_for, 'blockAds': True},
            headers={'X-Api-Key': os.environ['SNAPAPI_KEY']},
            timeout=45
        )
        r.raise_for_status()
        return r.json()['html']

Scrape any SPA — no browser setup needed

SnapAPI renders in real Chromium and waits for your selector. One API call replaces a full Playwright setup. 200 free requests/month.

Try SnapAPI Free →