How to Scrape Dynamic Websites (React, Vue, Angular) in 2026
Static scrapers using axios or requests return an empty skeleton when targeting modern SPAs — the actual content is rendered by JavaScript after the initial HTML loads. This guide covers every technique for scraping JS-heavy pages: smart waitFor strategies, API interception, infinite scroll, authentication, and using SnapAPI as a managed browser.
Why Static Scrapers Fail on SPAs
A React/Vue/Angular app typically serves an index.html with just a <div id="root"></div> and a JavaScript bundle. The actual HTML is injected by the framework after the bundle runs. axios.get(url) captures the skeleton, not the content.
| Approach | Works for | Fails for |
|---|---|---|
| axios + Cheerio | Static HTML, SSR (Next.js, Nuxt), MPA | CSR React/Vue/Angular apps |
| Playwright / Puppeteer | Any page including SPAs | Blocked by advanced bot detection |
| API interception | Sites with accessible JSON APIs | Sites that don't expose a public API |
| SnapAPI /scrape | Any page, stealth mode available | Requires API key / usage credits |
waitFor Strategies in Playwright
The most common scraping mistake is not waiting long enough for content to render. Playwright has several strategies — choose the right one for the page:
const { chromium } = require('playwright');
async function scrapeWithWait(url) {
const browser = await chromium.launch({ headless: true });
const page = await browser.newPage();
await page.goto(url, {
// Strategy 1: wait until no network requests for 500ms (good default for SPAs)
waitUntil: 'networkidle',
timeout: 30000
});
// Strategy 2: wait for a specific element to appear
await page.waitForSelector('.product-list', { timeout: 10000 });
// Strategy 3: wait for a specific text to appear
await page.waitForFunction(() => document.querySelector('.price')?.textContent?.includes('$'));
// Strategy 4: wait for an API response
await page.waitForResponse(resp => resp.url().includes('/api/products') && resp.status() === 200);
const html = await page.content();
await browser.close();
return html;
}
waitUntil: 'networkidle' with waitForSelector on the key content element. networkidle alone can resolve before lazy-loaded content appears.
Intercepting API Calls (The Fast Path)
Many SPAs fetch their data from a JSON API. Intercepting those API calls is faster and more reliable than parsing rendered HTML — you get clean structured data with no selectors.
const { chromium } = require('playwright');
async function interceptAPI(url, apiPattern) {
const browser = await chromium.launch({ headless: true });
const page = await browser.newPage();
const captured = [];
// Listen for API responses matching the pattern
page.on('response', async response => {
if (response.url().match(apiPattern) && response.status() === 200) {
try {
const json = await response.json();
captured.push(json);
} catch {}
}
});
await page.goto(url, { waitUntil: 'networkidle' });
await browser.close();
return captured;
}
// Example: scrape product listings from an e-commerce SPA
const data = await interceptAPI(
'https://shop.example.com/category/electronics',
/api\/products|api\/catalog/
);
console.log(data[0]); // Raw JSON — no HTML parsing needed
Block unnecessary resources while intercepting
const page = await context.newPage();
// Block images, fonts, and stylesheets — speed up 3-5x
await page.route('**/*', route => {
const resourceType = route.request().resourceType();
if (['image', 'stylesheet', 'font', 'media'].includes(resourceType)) {
return route.abort();
}
route.continue();
});
// But allow the API calls through (don't abort those!)
Handling Infinite Scroll
Infinite-scroll pages load more content as the user scrolls. Automate scrolling until no new items appear:
async function scrapeInfiniteScroll(url, itemSelector, maxItems = 500) {
const browser = await chromium.launch({ headless: true });
const page = await browser.newPage();
await page.goto(url, { waitUntil: 'networkidle' });
const items = new Set();
while (items.size < maxItems) {
// Collect currently visible items
const newItems = await page.$$eval(itemSelector, els =>
els.map(el => ({
title: el.querySelector('h3, h2, .title')?.textContent?.trim(),
href: el.querySelector('a')?.href,
})).filter(i => i.title)
);
const prevCount = items.size;
newItems.forEach(i => items.add(JSON.stringify(i)));
if (items.size === prevCount) break; // no new items — reached the end
// Scroll to bottom and wait for new content
await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
await page.waitForTimeout(1500 + Math.random() * 1000);
// Check for "Load more" button and click if present
const loadMore = await page.$('button:has-text("Load more"), a:has-text("Show more")');
if (loadMore) await loadMore.click();
}
await browser.close();
return [...items].map(i => JSON.parse(i));
}
Scraping Behind Authentication
Playwright can log in and maintain session cookies across pages:
const { chromium } = require('playwright');
const fs = require('fs');
const STORAGE_PATH = './session.json';
async function loginAndSave(loginUrl, username, password) {
const browser = await chromium.launch({ headless: true });
const context = await browser.newContext();
const page = await context.newPage();
await page.goto(loginUrl);
await page.fill('input[name="email"]', username);
await page.fill('input[name="password"]', password);
await page.click('button[type="submit"]');
await page.waitForNavigation({ waitUntil: 'networkidle' });
// Save session state (cookies + localStorage)
await context.storageState({ path: STORAGE_PATH });
await browser.close();
console.log('Session saved to', STORAGE_PATH);
}
async function scrapeAuthenticated(targetUrl) {
if (!fs.existsSync(STORAGE_PATH)) {
await loginAndSave(process.env.LOGIN_URL, process.env.USERNAME, process.env.PASSWORD);
}
const browser = await chromium.launch({ headless: true });
// Restore saved session
const context = await browser.newContext({ storageState: STORAGE_PATH });
const page = await context.newPage();
await page.goto(targetUrl, { waitUntil: 'networkidle' });
const html = await page.content();
await browser.close();
return html;
}
SnapAPI — Managed Browser for Dynamic Pages
Managing browser pools, waitFor logic, and stealth evasion in-house is expensive. SnapAPI's /v1/scrape endpoint accepts a URL and a waitFor parameter and returns fully-rendered HTML — no Playwright installation required.
const axios = require('axios');
async function scrapeSPA(url, options = {}) {
const { data } = await axios.post('https://api.snapapi.pics/v1/scrape', {
url,
waitFor: options.waitFor ?? 'networkidle', // or a CSS selector
stealth: options.stealth ?? false,
blockAds: options.blockAds ?? true,
userAgent: options.userAgent ?? undefined, // override UA if needed
}, { headers: { 'X-Api-Key': process.env.SNAPAPI_KEY } });
return data.html; // fully rendered, JS-executed HTML
}
// Wait for a specific selector to appear before returning
const html = await scrapeSPA('https://app.example.com/dashboard', {
waitFor: '.data-table', // CSS selector
stealth: true
});
import httpx, asyncio, os
async def scrape_spa(url: str, wait_for: str = 'networkidle') -> str:
async with httpx.AsyncClient() as client:
r = await client.post(
'https://api.snapapi.pics/v1/scrape',
json={'url': url, 'waitFor': wait_for, 'blockAds': True},
headers={'X-Api-Key': os.environ['SNAPAPI_KEY']},
timeout=45
)
r.raise_for_status()
return r.json()['html']
Scrape any SPA — no browser setup needed
SnapAPI renders in real Chromium and waits for your selector. One API call replaces a full Playwright setup. 200 free requests/month.
Try SnapAPI Free →