Why Node.js Developers Drop Puppeteer for a Scraping API

Puppeteer is the default answer to "how do I scrape a JavaScript-rendered site in Node.js?", and it works well in local development. Production is a different story. Puppeteer downloads a browser binary of roughly 300 MB, which pushes deployment bundles past the size limits of many serverless platforms. AWS Lambda caps a function's unzipped deployment package, layers included, at 250 MB, so vanilla Puppeteer cannot be deployed there without a trimmed, custom Chromium build. Vercel, Netlify, and Cloudflare Workers also enforce function size limits that leave little or no room for a bundled headless browser. Even on dedicated servers, managing a pool of Chromium instances (handling memory leaks, recovering from crashes, coordinating concurrent sessions) requires operational code that is often more complex than the scraping logic itself.

SnapAPI exposes scraping as a REST endpoint: a single fetch call from Node.js returns the rendered page content, with no browser binaries, no Docker configuration, and no crash-recovery code on your side.

Basic Scraping with fetch (Node 18+)

Node.js 18 and above ships a native fetch, so scraping a JavaScript-rendered page requires no npm dependencies. The example uses top-level await, so run it as an ES module (a .mjs file, or a package.json with "type": "module"):

const apiKey = process.env.SNAPAPI_KEY;

async function scrape(url, output = 'markdown') {
  const params = new URLSearchParams({ access_key: apiKey, url, output });
  const res = await fetch('https://snapapi.pics/scrape?' + params);
  if (!res.ok) throw new Error('Scrape failed: ' + res.status);
  return res.json();
}

const result = await scrape('https://example.com/blog');
console.log(result.title);    // page title
console.log(result.content);  // cleaned markdown
console.log(result.links);    // array of outbound links
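The native fetch API takes no timeout option, so a slow render can leave a request waiting far longer than a user-facing route should. Below is a minimal sketch of the scrape() helper with a deadline added through AbortSignal.timeout() (available in Node 18) and the start of the response body included in thrown errors. The options object and the fetchImpl hook are illustrative additions for testing, not part of SnapAPI.

```javascript
// Sketch: scrape() with a per-request deadline and more informative errors.
// fetchImpl is a hypothetical injection point (defaults to the global fetch)
// so the function can be exercised without network access.
async function scrapeWithTimeout(url, {
  output = 'markdown',
  timeoutMs = 30000,
  apiKey = process.env.SNAPAPI_KEY,
  fetchImpl = globalThis.fetch,
} = {}) {
  if (!apiKey) throw new Error('SNAPAPI_KEY is not set');
  const params = new URLSearchParams({ access_key: apiKey, url, output });
  // AbortSignal.timeout() aborts the request once timeoutMs milliseconds pass.
  const res = await fetchImpl('https://snapapi.pics/scrape?' + params, {
    signal: AbortSignal.timeout(timeoutMs),
  });
  if (!res.ok) {
    // Include the first 200 characters of the body to make failures debuggable.
    const body = await res.text().catch(() => '');
    throw new Error(`Scrape failed: ${res.status} ${body.slice(0, 200)}`);
  }
  return res.json();
}
```

Call it as `await scrapeWithTimeout('https://example.com/blog', { timeoutMs: 15000 })`. A request that exceeds the deadline rejects with an abort error instead of hanging.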

Structured Data Extraction

The extract endpoint returns typed JSON matching a schema you define. Instead of chaining CSS selectors and parsing the DOM by hand, you describe the data structure and SnapAPI fills it from the rendered page:

async function extract(url, schema) {
  const res = await fetch('https://snapapi.pics/extract', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ access_key: apiKey, url, schema })
  });
  if (!res.ok) throw new Error('Extract failed: ' + res.status);
  return res.json();
}

// Extract job listing data
const job = await extract('https://example.com/jobs/senior-engineer', {
  title: 'string',
  company: 'string',
  location: 'string',
  salary_range: 'string',
  remote: 'boolean',
  requirements: 'array of strings',
  posted_date: 'string'
});
console.log(JSON.stringify(job, null, 2));
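Because the values are read from a live page whose layout can change, a field may come back missing or with an unexpected type, so it can be worth checking the result before storing it. Below is a minimal sketch, assuming the extract response is a flat object keyed by the schema's field names and handling only the three type strings used above ('string', 'boolean', 'array of strings'). checkExtracted is a hypothetical helper, not part of SnapAPI.

```javascript
// Sketch: compare an extract response with the plain-language schema above.
// Assumes the extracted fields sit at the top level of the response object.
function checkExtracted(schema, data) {
  const problems = [];
  for (const [field, type] of Object.entries(schema)) {
    const value = data?.[field];
    if (value === undefined || value === null) {
      problems.push(`${field}: missing`);
      continue;
    }
    let ok;
    if (type === 'string') ok = typeof value === 'string';
    else if (type === 'boolean') ok = typeof value === 'boolean';
    else if (type === 'array of strings') {
      ok = Array.isArray(value) && value.every((v) => typeof v === 'string');
    } else {
      ok = true; // type strings this sketch does not understand are not checked
    }
    if (!ok) problems.push(`${field}: expected ${type}`);
  }
  return problems; // an empty array means every checked field matched
}
```

Pass the same schema object you sent to extract() together with its result, and log or reject records that report problems rather than writing partial data.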

Axios Integration with Retry and Timeout

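For projects already using axios, adding SnapAPI calls is straightforward. Here is an axios-based scraper with exponential-backoff retries via axios-retry and a concurrency cap via p-limit. p-limit has been published as an ES module only since version 4, so the example uses import syntax rather than require (which also allows the top-level await at the end):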

import axios from 'axios';
import axiosRetry from 'axios-retry';
import pLimit from 'p-limit';

const client = axios.create({ baseURL: 'https://snapapi.pics', timeout: 30000 });  // 30 s per attempt
axiosRetry(client, {
  retries: 3,
  retryDelay: axiosRetry.exponentialDelay,
  // Retry on rate limiting (429) and server errors (5xx)
  retryCondition: (err) =>
    err.response?.status === 429 || err.response?.status >= 500
});

async function scrapeWithAxios(url) {
  const { data } = await client.get('/scrape', {
    params: { access_key: process.env.SNAPAPI_KEY, url, output: 'text' }
  });
  return data;
}

// Batch scraping with concurrency control
const limit = pLimit(5);  // max 5 concurrent requests

const urls = ['https://example.com/p1', 'https://example.com/p2', 'https://example.com/p3'];
const results = await Promise.all(urls.map(u => limit(() => scrapeWithAxios(u))));
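If you would rather not add dependencies, both ideas fit in a few lines of plain JavaScript. This is a sketch, not a drop-in replacement for either library: it retries on 429 and 5xx like the axios example, additionally retries on network errors, and doubles the delay on each attempt with a little random jitter. fetchWithRetry, mapWithConcurrency, and the fetchImpl hook are hypothetical names used for illustration.

```javascript
// Sketch: exponential-backoff retry and a concurrency cap using only built-ins.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function fetchWithRetry(url, { retries = 3, baseDelayMs = 100, fetchImpl = globalThis.fetch } = {}) {
  for (let attempt = 0; ; attempt++) {
    let res;
    try {
      res = await fetchImpl(url);
    } catch (err) {
      // Network failure: rethrow once the retry budget is used up.
      if (attempt >= retries) throw err;
    }
    if (res) {
      // Retry on 429 and 5xx, matching the status check in the axios example.
      const retryable = res.status === 429 || res.status >= 500;
      if (!retryable || attempt >= retries) return res; // caller still checks res.ok
      await res.body?.cancel(); // discard the unused body before retrying
    }
    // Exponential backoff (100, 200, 400 ms ... with the defaults) plus up to 20% jitter.
    const delay = baseDelayMs * 2 ** attempt;
    await sleep(delay + Math.random() * delay * 0.2);
  }
}

// Run worker(item, index) over items with at most `limit` calls in flight.
// Results keep the input order, like Promise.all.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;
  async function runner() {
    while (next < items.length) {
      const index = next++; // claimed synchronously, so no two runners share an index
      results[index] = await worker(items[index], index);
    }
  }
  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

For example, `mapWithConcurrency(urls, 5, (u) => fetchWithRetry(u))` plays the same role as the p-limit batch above, with URLs built the same way as in scrape(). This sketch does not honor a Retry-After header; add that if the API sends one.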

Building a Scraping Pipeline with Cheerio

Cheerio provides jQuery-style DOM manipulation for Node.js. Combine SnapAPI scraping with Cheerio processing to extract specific elements from rendered pages. SnapAPI handles the JavaScript rendering and bot bypass; Cheerio handles the element selection from the resulting HTML:

import * as cheerio from 'cheerio';

// Both functions use the scrape() helper defined earlier
async function extractLinks(url) {
  const result = await scrape(url, 'html');  // get full rendered HTML
  const $ = cheerio.load(result.html);
  const links = [];
  $('a[href]').each((_, el) => {
    const href = $(el).attr('href');
    const text = $(el).text().trim();
    // Keep absolute http(s) links only; relative hrefs are skipped here
    if (href && href.startsWith('http')) links.push({ href, text });
  });
  return links;
}

async function extractPrices(url) {
  const result = await scrape(url, 'html');
  const $ = cheerio.load(result.html);
  return $('.price, [class*="price"], [data-price]').map((_, el) => ({
    text: $(el).text().trim(),
    className: $(el).attr('class'),       // class attribute that matched, if any
    dataPrice: $(el).attr('data-price')   // machine-readable price attribute, if present
  })).get();
}
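extractLinks keeps only hrefs that already start with http, so site-relative links such as /pricing are dropped. Here is a sketch of resolving them with the WHATWG URL class instead, which also strips #fragments, removes duplicates, and skips non-HTTP schemes such as mailto:. resolveLinks is a hypothetical helper that takes the raw href strings and the URL of the scraped page.

```javascript
// Sketch: turn hrefs collected by Cheerio into absolute, de-duplicated URLs.
// Relative links are resolved against the page URL instead of being dropped.
function resolveLinks(hrefs, pageUrl) {
  const seen = new Set();
  const links = [];
  for (const href of hrefs) {
    let absolute;
    try {
      absolute = new URL(href, pageUrl);
    } catch {
      continue; // skip hrefs that cannot be parsed as URLs
    }
    // Ignore mailto:, javascript:, tel: and other non-HTTP schemes.
    if (absolute.protocol !== 'http:' && absolute.protocol !== 'https:') continue;
    absolute.hash = ''; // "#section" links point at the same document
    const key = absolute.toString();
    if (!seen.has(key)) {
      seen.add(key);
      links.push(key);
    }
  }
  return links;
}
```

Inside extractLinks, you could collect `$(el).attr('href')` values into an array and return `resolveLinks(hrefs, url)`. Note that this version returns bare URLs and drops the link text.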

Scraping in Next.js API Routes and Edge Functions

SnapAPI integrates cleanly with Next.js API routes since it is a pure HTTP call with no browser dependency. Add a scraping endpoint to a Next.js project in a few lines:

// pages/api/scrape.js  (Pages Router, Node.js runtime)
export default async function handler(req, res) {
  const { url } = req.query;
  if (!url) return res.status(400).json({ error: 'url required' });

  const params = new URLSearchParams({
    access_key: process.env.SNAPAPI_KEY,
    url,
    output: 'markdown'
  });
  const snap = await fetch('https://snapapi.pics/scrape?' + params);
  if (!snap.ok) return res.status(502).json({ error: 'scrape failed' });
  const data = await snap.json();
  res.status(200).json(data);
}

// app/api/scrape/route.js  (App Router, Edge Runtime; a separate file)
export const runtime = 'edge';

export async function GET(request) {
  const url = new URL(request.url).searchParams.get('url');
  if (!url) return Response.json({ error: 'url required' }, { status: 400 });
  const params = new URLSearchParams({ access_key: process.env.SNAPAPI_KEY, url, output: 'text' });
  const snap = await fetch('https://snapapi.pics/scrape?' + params);
  if (!snap.ok) return Response.json({ error: 'scrape failed' }, { status: 502 });
  // SnapAPI responds with JSON, so the body is passed through unchanged
  return new Response(await snap.text(), { headers: { 'Content-Type': 'application/json' } });
}
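These handlers forward the url value to SnapAPI without checking its format, so malformed strings and non-HTTP schemes still cost a round trip, and presumably count against your SnapAPI quota. Here is a minimal sketch of validating the parameter first. parseTargetUrl and its optional allowedHosts list are hypothetical helpers, not part of Next.js or SnapAPI.

```javascript
// Sketch: validate a ?url= parameter before forwarding it to SnapAPI.
// Returns a normalized absolute URL string, or null if the input should be rejected.
function parseTargetUrl(raw, allowedHosts = null) {
  // Rejects non-strings too, e.g. the array produced when ?url= is repeated.
  if (typeof raw !== 'string' || raw.length === 0 || raw.length > 2048) return null;
  let target;
  try {
    target = new URL(raw); // throws for relative or malformed input
  } catch {
    return null;
  }
  if (target.protocol !== 'http:' && target.protocol !== 'https:') return null;
  // Optional allow-list so the route cannot be used as an open scraping proxy.
  if (allowedHosts && !allowedHosts.includes(target.hostname)) return null;
  return target.toString();
}
```

In the Pages Router handler, `const target = parseTargetUrl(req.query.url)` followed by a 400 response when it returns null would replace the simple presence check. The Edge handler can do the same with the searchParams value.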

When to Use Scraping API vs Direct Fetch

Not every URL needs a scraping API. Plain HTML pages that render without JavaScript (most blogs, documentation sites, Wikipedia, government data portals) can be scraped with a plain fetch and an HTML parser such as node-html-parser. An API call is worth its cost when the target site uses client-side rendering (React, Vue, Angular), when it has bot detection that blocks datacenter IPs, or when the content you need loads asynchronously via XHR or WebSocket after the initial page render. Use SnapAPI for SaaS product pages, e-commerce sites, social media profiles, job boards, and any site that returns an empty shell to a plain HTTP request. Use direct fetch for static-HTML sites, REST APIs, and RSS feeds, where JavaScript rendering adds no value and only adds latency.
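One way to apply this rule automatically is to fetch a page directly first and inspect the raw HTML. The heuristic sketch below strips scripts, styles, and tags to estimate the visible text, and treats very little text, or an empty framework mount point, as a sign of client-side rendering. The 200-character threshold and the root/app/__next ids are assumptions to tune for your targets. A false result does not prove that the page needs no JavaScript, and a bot-detection page can still pass the check.

```javascript
// Heuristic sketch: guess whether raw HTML (from a plain fetch, no JS executed)
// is a client-side-rendered shell. Regex-based on purpose: it is an estimate,
// not a parser. Threshold and mount-point ids are illustrative assumptions.
function looksLikeEmptyShell(html, minTextLength = 200) {
  // Remove scripts, styles, and tags, then collapse whitespace to estimate visible text.
  const text = html
    .replace(/<script[\s\S]*?<\/script>/gi, '')
    .replace(/<style[\s\S]*?<\/style>/gi, '')
    .replace(/<[^>]+>/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
  // An empty <div id="root|app|__next"></div> is a typical SPA mount point.
  const emptyMount = /<div[^>]+id=["'](root|app|__next)["'][^>]*>\s*<\/div>/i.test(html);
  return emptyMount || text.length < minTextLength;
}
```

If it returns true for a page's direct response, route that URL through SnapAPI; otherwise parse the direct response with node-html-parser or Cheerio.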

Handling Pagination and Multi-Page Scraping

Many data sources paginate their content: search results, product listings, news feeds, job boards. A pagination loop calls SnapAPI's scrape endpoint once per page, extracts the next-page URL from the rendered HTML, and continues until no next-page link is found or the maximum page count is reached. The example below parses each page with Cheerio; alternatively, calling the extract endpoint on each page returns structured data directly and removes the need to run a DOM parser on every response:

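import * as cheerio from 'cheerio';

// Uses the scrape() helper defined earlier. The '.item' and 'h2' selectors are
// placeholders: adjust them to the markup of the site you are scraping.
async function scrapeAllPages(startUrl, maxPages = 10) {
  const results = [];
  let url = startUrl;
  let page = 0;

  while (url && page < maxPages) {
    const data = await scrape(url, 'html');  // get full rendered HTML
    const $ = cheerio.load(data.html);

    // Extract items from this page
    $('.item').each((_, el) => {
      const href = $(el).find('a').attr('href');
      results.push({
        title: $(el).find('h2').text().trim(),
        link: href ? new URL(href, url).toString() : null,  // make relative links absolute
      });
    });

    // Find the next-page link, if any, and resolve it against the current URL
    const nextHref = $('a[rel="next"], .pagination-next a, [aria-label="Next page"]').attr('href');
    url = nextHref ? new URL(nextHref, url).toString() : null;
    page++;
    if (url) await new Promise(r => setTimeout(r, 500)); // polite delay between pages
  }
  return results;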
}
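Some sites link their last page back to the first, which a loop that relies only on maxPages would keep following until the cap. Here is a sketch of the same loop with a visited-URL set so that cycles end immediately. fetchPage is a hypothetical callback you supply (for example, scrape() plus the Cheerio parsing above) that returns `{ items, nextUrl }`.

```javascript
// Sketch: generic pagination that stops on a missing next link, on maxPages,
// or on a URL it has already visited. fetchPage(url) -> { items, nextUrl }.
async function paginate(startUrl, fetchPage, { maxPages = 10, delayMs = 500 } = {}) {
  const visited = new Set();
  const items = [];
  let url = startUrl;
  while (url && !visited.has(url) && visited.size < maxPages) {
    // Polite delay before every page except the first.
    if (visited.size > 0) await new Promise((r) => setTimeout(r, delayMs));
    visited.add(url);
    const page = await fetchPage(url);
    items.push(...page.items);
    // Resolve relative next links against the current page URL.
    url = page.nextUrl ? new URL(page.nextUrl, url).toString() : null;
  }
  return { items, pagesFetched: visited.size };
}
```

Keeping fetching and parsing in a callback also makes the loop easy to test with canned pages, as the stop conditions do not depend on the network.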

Scraping Behind Authentication with Cookies

Authenticated pages, such as dashboards, paywalled content, and member-only areas, need a valid session cookie to render their content. SnapAPI supports cookie injection through the cookies parameter. Copy the session cookie from a logged-in browser session using DevTools, store it server-side, and pass it with each SnapAPI request. Because SnapAPI renders the page in a real browser with the provided cookies, the rendered content should match what a logged-in user sees. This avoids scripting the login sequence (typing credentials, submitting forms, handling CAPTCHAs) that makes browser automation fragile. Session cookies expire, so plan for refreshes: store the cookie value in your database with an expiry timestamp, and obtain a fresh cookie when the stored one expires or when a SnapAPI request lands on the login page instead of the requested content.
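The refresh rule above can be expressed as a small check that runs before each scrape and after each response. This is a sketch under stated assumptions: the session record shape (`{ cookie, expiresAt }`) is a hypothetical example of what you might store, and the login-page test assumes the scrape response's canonical_url field reveals when the page that rendered was the login page. Verify how your target site and SnapAPI actually report this before relying on it.

```javascript
// Sketch: decide whether a stored session cookie needs to be refreshed.
// session: hypothetical stored record { cookie, expiresAt } (expiresAt in ms).
// scrapeResult: optional SnapAPI scrape response; canonical_url is assumed to
// point at the login page when the session was rejected.
function needsRelogin(session, scrapeResult = null, now = Date.now(), loginPath = '/login') {
  if (!session || !session.cookie) return true;
  // Treat the cookie as expired 5 minutes early so it does not lapse mid-job.
  const safetyMarginMs = 5 * 60 * 1000;
  if (now >= session.expiresAt - safetyMarginMs) return true;
  const landedOn = scrapeResult?.canonical_url;
  if (landedOn) {
    try {
      if (new URL(landedOn).pathname.startsWith(loginPath)) return true;
    } catch {
      // Malformed URL: fall through and treat the session as still valid.
    }
  }
  return false;
}
```

Running it before a batch avoids spending API calls on a cookie that is about to expire; running it on each response catches sessions the site revoked early.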

Saving Scraped Data with Prisma or Mongoose

Node.js scraping pipelines commonly use Prisma (for PostgreSQL or MySQL) or Mongoose (for MongoDB) to persist scraped data. The structured JSON returned by SnapAPI's extract endpoint maps onto ORM models with little transformation. For high-throughput batch inserts, use Prisma's createMany or Mongoose's insertMany rather than saving documents one at a time: a batch insert avoids a database round trip per record and is typically many times faster on large datasets. Add a unique constraint on the source URL and scrape date so scraping jobs can be re-run idempotently without duplicating data. For data that changes over time, store timestamped snapshots rather than updating a single row, so you keep a full history for trend analysis.
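Before handing rows to createMany or insertMany, it helps to shape them as snapshots and drop duplicates on the same key the unique constraint uses, so one run that happens to scrape a page twice does not trip the constraint. Below is a minimal sketch. The row fields (sourceUrl, scrapedDate, scrapedAt) and the `{ url, data }` input shape are hypothetical; map them onto your own Prisma model or Mongoose schema.

```javascript
// Sketch: build snapshot rows from extract results and de-duplicate them on
// (sourceUrl, scrapedDate), the same key as the suggested unique constraint.
function toSnapshotRows(results, scrapedAt = new Date()) {
  const scrapedDate = scrapedAt.toISOString().slice(0, 10); // YYYY-MM-DD in UTC
  const seen = new Set();
  const rows = [];
  for (const { url, data } of results) {
    const key = `${url}|${scrapedDate}`;
    if (seen.has(key)) continue; // keep the first copy of a page scraped twice
    seen.add(key);
    // Spread the extracted fields first so the bookkeeping fields always win.
    rows.push({ ...data, sourceUrl: url, scrapedDate, scrapedAt });
  }
  return rows;
}
```

Rows already stored by an earlier run are still the database's job: Prisma's `createMany({ data: rows, skipDuplicates: true })` skips records that violate the unique constraint (not every database connector supports this option), and Mongoose's `insertMany(rows, { ordered: false })` keeps inserting past duplicate-key errors and reports them in the error it throws afterwards.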

TypeScript Types for SnapAPI Responses

TypeScript projects benefit from typed SnapAPI response interfaces. The scrape endpoint response includes title, content, links, og_title, og_description, og_image, and canonical_url fields, while the extract endpoint response follows the schema you provide. Here is an interface for the scrape response and a typed wrapper around the endpoint:

interface ScrapeResult {
  title: string;
  content: string;
  html?: string;
  links: string[];
  og_title?: string;
  og_description?: string;
  og_image?: string;
  canonical_url?: string;
}

async function typedScrape(url: string): Promise<ScrapeResult> {
  const params = new URLSearchParams({ access_key: process.env.SNAPAPI_KEY!, url, output: 'markdown' });
  const res = await fetch('https://snapapi.pics/scrape?' + params);
  if (!res.ok) throw new Error('Scrape failed: ' + res.status);
  return res.json() as Promise<ScrapeResult>;
}
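A type assertion such as `as Promise<ScrapeResult>` performs no check at runtime, so an unexpected response (an error object, a renamed field) would flow through typed code unnoticed. Here is a sketch of a runtime guard for the ScrapeResult shape declared above, written in plain JavaScript to match the rest of this guide; isScrapeResult is a hypothetical helper, not part of SnapAPI.

```javascript
// Sketch: runtime check mirroring the ScrapeResult interface above.
// Required: title, content (strings) and links (array of strings).
// Optional fields must be either absent or strings.
function isScrapeResult(value) {
  if (typeof value !== 'object' || value === null) return false;
  const optionalStrings = ['html', 'og_title', 'og_description', 'og_image', 'canonical_url'];
  return (
    typeof value.title === 'string' &&
    typeof value.content === 'string' &&
    Array.isArray(value.links) &&
    value.links.every((link) => typeof link === 'string') &&
    optionalStrings.every((key) => value[key] === undefined || typeof value[key] === 'string')
  );
}
```

In a .ts file, declaring it as `function isScrapeResult(value: unknown): value is ScrapeResult` turns it into a type guard, so typedScrape can parse the body as unknown and throw when the check fails instead of casting.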

Web Scraping API JavaScript Guide 2026 — Node.js, fetch & Puppeteer Alternative