Why Node.js Developers Drop Puppeteer for a Scraping API
Puppeteer is the default answer to "how do I scrape a JavaScript-rendered site in Node.js", and it works well in local development. Production is a different story. Puppeteer downloads a Chromium build of roughly 300 MB, which makes Docker images too large for many serverless platforms. AWS Lambda's 250 MB limit on the unzipped deployment package (function code plus layers) rules out a vanilla Puppeteer deployment in a zip-based function unless you use a custom, slimmed-down Chromium build. Vercel, Netlify, and Cloudflare Workers have function size limits that exclude headless browser bundles entirely. Even on dedicated servers, managing a pool of Chromium instances (handling memory leaks, recovering from crashes, coordinating concurrent sessions) requires operational code that is often more complex than the scraping logic itself. SnapAPI exposes scraping as a REST endpoint: a single fetch call from Node.js returns the rendered page content, with no browsers, no Docker configuration, and no crash recovery code on your side.
Basic Scraping with fetch (Node 18+)
Node.js 18 and above includes a native fetch, so scraping a JavaScript-rendered page requires no additional npm dependencies:
const apiKey = process.env.SNAPAPI_KEY;

// Fetch a rendered page through SnapAPI; `output` selects the response format.
async function scrape(url, output = 'markdown') {
  const params = new URLSearchParams({ access_key: apiKey, url, output });
  const res = await fetch('https://snapapi.pics/scrape?' + params);
  if (!res.ok) throw new Error('Scrape failed: ' + res.status);
  return res.json();
}

// Top-level await requires an ES module (.mjs or "type": "module" in package.json).
const result = await scrape('https://example.com/blog');
console.log(result.title);   // page title
console.log(result.content); // cleaned markdown
console.log(result.links);   // array of outbound links
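Rendering a page remotely can take several seconds, and the scrape() call above waits for as long as the server does. The variant below is a minimal sketch that caps the wait with AbortSignal.timeout, which is built into recent Node.js versions. The function name scrapeWithTimeout, the 30-second default, and the injectable fetchImpl parameter are illustrative assumptions, not part of SnapAPI.

```javascript
// Sketch: same request as scrape(), with an upper bound on the wait.
// The 30-second default is an assumption, not a documented SnapAPI value.
async function scrapeWithTimeout(
  url,
  { output = 'markdown', timeoutMs = 30_000, fetchImpl = fetch } = {}
) {
  const params = new URLSearchParams({
    access_key: process.env.SNAPAPI_KEY ?? '',
    url,
    output,
  });
  const res = await fetchImpl('https://snapapi.pics/scrape?' + params, {
    // The request is aborted (and the promise rejects) once timeoutMs elapses.
    signal: AbortSignal.timeout(timeoutMs),
  });
  if (!res.ok) throw new Error('Scrape failed: ' + res.status);
  return res.json();
}
```

Passing fetchImpl is optional; it exists so the function can be exercised in unit tests without touching the network.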
Structured Data Extraction
The extract endpoint returns typed JSON matching a schema you define. This replaces CSS selector chains and manual DOM parsing — describe the data structure and SnapAPI fills it from the rendered page:
async function extract(url, schema) {
  const res = await fetch('https://snapapi.pics/extract', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ access_key: apiKey, url, schema })
  });
  // Surface HTTP errors instead of parsing an error body as if it were data.
  if (!res.ok) throw new Error('Extract failed: ' + res.status);
  return res.json();
}

// Extract job listing data
const job = await extract('https://example.com/jobs/senior-engineer', {
  title: 'string',
  company: 'string',
  location: 'string',
  salary_range: 'string',
  remote: 'boolean',
  requirements: 'array of strings',
  posted_date: 'string'
});
console.log(JSON.stringify(job, null, 2));
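Extracted data is only as good as the page it came from, so it can be worth checking the response against the schema before storing it. The following is a rough sketch of such a check. It assumes the extracted fields come back at the top level of the JSON response, which this article does not confirm, and it covers only the three type strings used in the job-listing example. The names typeChecks and validateAgainstSchema are made up for illustration.

```javascript
// Maps the schema type strings used above to runtime checks.
// Only the three types from the job-listing example are covered.
const typeChecks = new Map([
  ['string', (v) => typeof v === 'string'],
  ['boolean', (v) => typeof v === 'boolean'],
  ['array of strings', (v) => Array.isArray(v) && v.every((x) => typeof x === 'string')],
]);

// Returns a list of human-readable problems; an empty list means the
// result matches the schema as far as this sketch can tell.
function validateAgainstSchema(result, schema) {
  const problems = [];
  for (const [field, type] of Object.entries(schema)) {
    const check = typeChecks.get(type);
    if (!check) {
      problems.push(`${field}: unsupported schema type "${type}"`);
    } else if (!(field in result)) {
      problems.push(`${field}: missing`);
    } else if (!check(result[field])) {
      problems.push(`${field}: expected ${type}, got ${typeof result[field]}`);
    }
  }
  return problems;
}
```

A caller might log the problems and skip the record, or retry the extraction, when the returned list is not empty.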
Axios Integration with Retry and Timeout
For projects already using axios, adding SnapAPI calls is straightforward. Here is an axios-based scraper with exponential backoff retry using axios-retry:
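const axios = require('axios');
// axios-retry 4+ exposes the function on `.default` under CommonJS; older versions export it directly.
const axiosRetryModule = require('axios-retry');
const axiosRetry = axiosRetryModule.default || axiosRetryModule;

const client = axios.create({ baseURL: 'https://snapapi.pics', timeout: 30000 });
axiosRetry(client, {
  retries: 3,
  retryDelay: axiosRetry.exponentialDelay,
  // Retry on rate limiting (429) and server errors (5xx) only.
  retryCondition: (err) =>
    err.response?.status === 429 || err.response?.status >= 500
});

async function scrapeWithAxios(url) {
  const { data } = await client.get('/scrape', {
    params: { access_key: process.env.SNAPAPI_KEY, url, output: 'text' }
  });
  return data;
}

// Batch scraping with concurrency control
// p-limit 4+ is ESM-only; pin p-limit@3 when loading it with require().
const pLimit = require('p-limit');
const limit = pLimit(5); // max 5 concurrent requests
const urls = ['https://example.com/p1', 'https://example.com/p2', 'https://example.com/p3'];

// CommonJS has no top-level await, so run the batch inside an async function.
async function scrapeAll() {
  return Promise.all(urls.map(u => limit(() => scrapeWithAxios(u))));
}
scrapeAll()
  .then(results => console.log(results.length, 'pages scraped'))
  .catch(console.error);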
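One caveat with the batch above: Promise.all rejects as soon as a single request fails, so one bad URL discards the results of every other page. The sketch below shows one possible alternative, a small dependency-free helper that keeps the concurrency cap but records a success or failure for each URL. The name mapSettled and the shape of the result entries are illustrative choices, not part of p-limit or SnapAPI.

```javascript
// Runs `worker` over `items` with at most `concurrency` calls in flight.
// Never rejects: each entry records either a value or an error,
// in the same order as the input.
async function mapSettled(items, worker, concurrency = 5) {
  const results = new Array(items.length);
  let next = 0; // index of the next unclaimed item, shared by all runners

  async function run() {
    while (next < items.length) {
      const i = next++;
      try {
        results[i] = { item: items[i], ok: true, value: await worker(items[i]) };
      } catch (error) {
        results[i] = { item: items[i], ok: false, error };
      }
    }
  }

  // Start up to `concurrency` runners that pull from the shared index.
  const runners = Array.from({ length: Math.min(concurrency, items.length) }, run);
  await Promise.all(runners);
  return results;
}
```

With the scraper defined earlier, a call such as mapSettled(urls, scrapeWithAxios, 5) would return one entry per URL, and the failed ones can be filtered out with results.filter(r => !r.ok) for logging or a later retry.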
Building a Scraping Pipeline with Cheerio
Cheerio provides jQuery-style DOM manipulation for Node.js. Combine SnapAPI scraping with Cheerio processing to extract specific elements from rendered pages. SnapAPI handles the JavaScript rendering and bot bypass; Cheerio handles the element selection from the resulting HTML:
const cheerio = require('cheerio');

async function extractLinks(url) {
  const result = await scrape(url, 'html'); // get full rendered HTML
  const $ = cheerio.load(result.html);
  const links = [];
  $('a[href]').each((_, el) => {
    const href = $(el).attr('href');
    const text = $(el).text().trim();
    // Keeps absolute http(s) links only; relative hrefs are skipped.
    if (href && href.startsWith('http')) links.push({ href, text });
  });
  return links;
}

async function extractPrices(url) {
  const result = await scrape(url, 'html');
  const $ = cheerio.load(result.html);
  // Match common price markup: a .price class, any class containing "price", or a data-price attribute.
  return $('.price, [class*="price"], [data-price]').map((_, el) => ({
    text: $(el).text().trim(),
    selector: el.attribs.class || el.attribs['data-price']
  })).get();
}
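The extractLinks function above keeps only hrefs that already start with "http", which silently drops relative links such as /about or ../pricing. If those matter, one option is to resolve each href against the page URL with the standard URL constructor. This is a small sketch; the helper name toAbsoluteHttpUrl is hypothetical, and it assumes the URL passed to scrape() is a good enough base (it ignores redirects and any <base> element in the page).

```javascript
// Resolves an href found in the page against the page's own URL.
// Returns null for anything that does not resolve to an http(s) link
// (mailto:, javascript:, or input that fails to parse).
function toAbsoluteHttpUrl(href, pageUrl) {
  try {
    const resolved = new URL(href, pageUrl);
    const isHttp = resolved.protocol === 'http:' || resolved.protocol === 'https:';
    return isHttp ? resolved.href : null;
  } catch {
    return null;
  }
}
```

Inside the .each() callback, the startsWith check could then be replaced by calling toAbsoluteHttpUrl(href, url) and pushing the link only when the result is not null.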
Scraping in Next.js API Routes and Edge Functions
SnapAPI integrates cleanly with Next.js API routes since it is a pure HTTP call with no browser dependency. Add a scraping endpoint to a Next.js project in a few lines:
// pages/api/scrape.js (Pages Router)
export default async function handler(req, res) {
  const { url } = req.query;
  if (!url) return res.status(400).json({ error: 'url required' });
  const params = new URLSearchParams({
    access_key: process.env.SNAPAPI_KEY,
    url,
    output: 'markdown'
  });
  const snap = await fetch('https://snapapi.pics/scrape?' + params);
  if (!snap.ok) return res.status(502).json({ error: 'scrape failed' });
  const data = await snap.json();
  res.status(200).json(data);
}

// app/api/scrape/route.js (App Router) on the Edge Runtime (Vercel Edge Functions).
// Route handlers opt in with `export const runtime`; `export const config` is the Pages Router form.
export const runtime = 'edge';
export async function GET(request) {
  const url = new URL(request.url).searchParams.get('url');
  if (!url) return Response.json({ error: 'url required' }, { status: 400 });
  const params = new URLSearchParams({ access_key: process.env.SNAPAPI_KEY, url, output: 'text' });
  const snap = await fetch('https://snapapi.pics/scrape?' + params);
  // Pass the upstream status through so callers can tell failures from successes.
  return new Response(await snap.text(), {
    status: snap.status,
    headers: { 'Content-Type': 'application/json' }
  });
}
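Both handlers forward whatever arrives in the url query parameter, so a public route would spend API credits on any input a visitor sends. A light validation step before building the request is one way to narrow that. The helper below is a sketch under stated assumptions: the name parseTargetUrl, the 2048-character cap, and the optional host allowlist are illustrative choices, not SnapAPI or Next.js features.

```javascript
// Normalizes the `url` query value from a Next.js request.
// Returns a URL string that is safe to forward, or null if the input
// should be rejected with a 400.
function parseTargetUrl(raw, allowedHosts = null) {
  // In the Pages Router, req.query values are arrays when a parameter
  // is repeated (?url=a&url=b); take the first one.
  const value = Array.isArray(raw) ? raw[0] : raw;
  if (typeof value !== 'string' || value.length > 2048) return null;

  let parsed;
  try {
    parsed = new URL(value);
  } catch {
    return null; // not an absolute URL
  }

  // Only web pages make sense as scrape targets.
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') return null;
  // Optional allowlist, e.g. ['example.com'], to restrict what the route will scrape.
  if (allowedHosts && !allowedHosts.includes(parsed.hostname)) return null;
  return parsed.href;
}
```

In either handler, the raw query value would go through parseTargetUrl first, and a null result would map to the existing 400 response.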
When to Use Scraping API vs Direct Fetch
Not every URL needs a scraping API. Plain HTML pages served without JavaScript (most blogs, documentation sites, Wikipedia, government data portals) can be scraped with a raw fetch and an HTML parser like node-html-parser. The cost of an API call is justified when the target site uses client-side rendering (React, Vue, Angular), when it has bot detection that blocks datacenter IPs, or when the content you need loads asynchronously via XHR or WebSocket after the initial page render. Use SnapAPI for SaaS product pages, e-commerce sites, social media profiles, job boards, and any site that returns an empty shell to a plain HTTP request. Use direct fetch for static-HTML sites, REST APIs, and RSS feeds, where JavaScript rendering adds no value and only increases latency.
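When it is not obvious which category a site falls into, the decision can be made at runtime: try a plain fetch first and fall back to the API only if the response looks like an empty shell. The sketch below illustrates the idea with a deliberately crude heuristic. The function names, the 200-character threshold, and the assumption that the API result exposes an html field (as in the Cheerio examples) are all illustrative, and a regex-based text estimate can misjudge some pages.

```javascript
// Heuristic: a client-rendered page usually ships almost no visible text
// in its initial HTML. The 200-character threshold is an arbitrary assumption.
function looksLikeEmptyShell(html, minTextLength = 200) {
  const visibleText = html
    .replace(/<(script|style|noscript)\b[^>]*>[\s\S]*?<\/\1>/gi, ' ') // drop non-visible blocks
    .replace(/<[^>]+>/g, ' ') // drop remaining tags
    .replace(/\s+/g, ' ')
    .trim();
  return visibleText.length < minTextLength;
}

// Tries a plain fetch first and only calls the scraping API when needed.
// `scrapeViaApi` stands in for the scrape() function defined earlier.
async function fetchOrScrape(url, scrapeViaApi, fetchImpl = fetch) {
  try {
    const res = await fetchImpl(url);
    if (res.ok) {
      const html = await res.text();
      if (!looksLikeEmptyShell(html)) return { source: 'direct', html };
    }
  } catch {
    // Network errors and blocked requests fall through to the API.
  }
  const result = await scrapeViaApi(url, 'html');
  return { source: 'api', html: result.html };
}
```

A call such as fetchOrScrape('https://example.com/blog', scrape) returns the HTML along with a source field, which makes it easy to measure how often the paid path is actually taken.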