What can I extract with the Scrape API?

SnapAPI's scrape endpoint returns rendered HTML, extracted text, structured data tables, links, images, and metadata from any web page. The page is rendered in a real browser before extraction, capturing JavaScript-generated content.

Does the Scrape API handle JavaScript-rendered content?

Yes. Unlike simple HTTP scrapers, SnapAPI uses a headless Chromium browser to fully render the page — including all JavaScript, dynamic content, lazy-loaded images, and AJAX responses.

How do I avoid getting blocked while scraping?

SnapAPI uses stealth Chromium mode for scraping requests, which randomizes browser fingerprints and bypasses common bot detection systems. You can also configure delays, custom headers, and rotating user agents.

What formats does the scrape API return?

The API returns a JSON response with fields for rendered HTML, plain text, title, meta tags, structured data (tables, lists), links, and status. You can request specific fields to reduce response size.

Web Scraping API — Rendered HTML from Any URL

What Is a Scrape API and When Do You Need One?

A scrape API solves one specific problem: getting the fully rendered HTML source of a web page, including all content generated by client-side JavaScript, without running a browser in your own infrastructure. When you make a plain HTTP request to a modern web application, you receive the server-sent HTML skeleton — a mostly empty document with script tags. The visible page content does not exist in that response; it is generated client-side by JavaScript after the browser parses the initial document and executes the scripts. To get the real content, you need to execute the JavaScript first, then capture the DOM state.

SnapAPI's /scrape endpoint handles this by loading each request in a real Chromium browser, waiting for the JavaScript to execute and the page to render, and then returning the full document.documentElement.outerHTML — the complete rendered HTML including all dynamically generated content. The response is the same HTML you would see if you opened browser DevTools and inspected the DOM after page load. From that rendered HTML, you can parse with Cheerio, BeautifulSoup, lxml, or any HTML parser to extract the data you need.

Scrape API vs Screenshot API vs Extract API

SnapAPI provides three related but distinct endpoints. Understanding when to use each saves time and API credits:

The /scrape endpoint returns the rendered HTML source as a string. Use this when you need to parse the page yourself with an HTML library, when you need to process the full DOM tree, or when you want maximum flexibility in how you extract data. This is the closest equivalent to Playwright's page.content().

The /extract endpoint accepts a JSON array of CSS selectors and returns a structured JSON object with the matched text values. Use this when you know exactly which elements you want and want to avoid parsing HTML yourself. It is the closest equivalent to Playwright's page.$$eval(selector, els => els.map(e => e.textContent)) — but without needing to write or run any browser automation code.

The /screenshot endpoint returns a PNG, JPEG, or WebP image of the rendered page. Use this for visual capture, OG image generation, monitoring, and archiving use cases where the image itself is the deliverable, not the underlying HTML data.

Quick Start: Scrape a JavaScript-Rendered Page

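# Python
import os

import requests
from bs4 import BeautifulSoup

def scrape_page(url):
    """Fetch the rendered HTML for url and parse it with BeautifulSoup."""
    r = requests.get('https://snapapi.pics/scrape', params={
        'access_key': os.environ['SNAPAPI_KEY'],
        'url': url,
        'wait_for': 'body',  # wait for body element before capturing
    }, timeout=60)  # requests has no default timeout; avoid hanging forever
    r.raise_for_status()  # surface HTTP errors instead of parsing an error page
    return BeautifulSoup(r.text, 'html.parser')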

soup = scrape_page('https://example.com/products')
titles = [t.text.strip() for t in soup.select('.product-title')]
prices = [p.text.strip() for p in soup.select('.product-price')]
print(list(zip(titles, prices)))

// Node.js with Cheerio
import fetch from 'node-fetch';
import * as cheerio from 'cheerio';

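async function scrapePage(url, waitFor = 'body') {
  const params = new URLSearchParams({
    access_key: process.env.SNAPAPI_KEY,
    url,
    wait_for: waitFor, // selector that must be present before capture
  });
  const res = await fetch(`https://snapapi.pics/scrape?${params}`);
  // Fail loudly on HTTP errors instead of parsing an error body as HTML
  if (!res.ok) throw new Error(`Scrape failed: HTTP ${res.status}`);
  const html = await res.text();
  return cheerio.load(html);
}

// Wait for the story rows themselves, since they are what we parse below
const $ = await scrapePage('https://news.ycombinator.com', '.athing');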
const stories = [];
$('.athing').each((i, el) => {
  stories.push($(el).find('.titleline a').first().text());
});
console.log(stories.slice(0, 10));

Structured Data Extraction with /extract

For use cases where you know the CSS selectors for the data you want, the /extract endpoint saves you the step of parsing HTML yourself. Pass a JSON array of selector objects and receive a JSON response with each selector's matched text values:

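import json
import os

import requests

def extract_data(url, selectors):
    """Send the selector list to /extract and return the parsed JSON response."""
    r = requests.get('https://snapapi.pics/extract', params={
        'access_key': os.environ['SNAPAPI_KEY'],
        'url': url,
        'selectors': json.dumps(selectors)  # selectors travel as a JSON string
    }, timeout=60)
    r.raise_for_status()  # do not try to decode an error response as data
    return r.json()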

data = extract_data('https://example.com/product/123', [
    {'key': 'title',       'selector': 'h1.product-title'},
    {'key': 'price',       'selector': '.product-price .amount'},
    {'key': 'rating',      'selector': '.star-rating[data-score]', 'attr': 'data-score'},
    {'key': 'description', 'selector': '.product-description p:first-child'},
    {'key': 'images',      'selector': '.product-gallery img', 'attr': 'src', 'all': True},
])

print(f"Title: {data['title']}")
print(f"Price: {data['price']}")
print(f"Images: {', '.join(data['images'])}")

Handling Dynamic Content and Wait Conditions

Many modern web applications load content asynchronously. A product listing page might fetch items from an API after the initial page load; a social feed might render skeleton loaders before populating with real content; a dashboard might show a loading spinner until chart data arrives. For these pages, capturing the HTML immediately after navigation would return empty containers rather than the actual content.

SnapAPI's wait_for parameter addresses this by delaying the HTML capture until a specified CSS selector appears in the DOM. Pass the selector of an element that only appears after the dynamic content has loaded — a product card class, a chart container, a loaded state attribute. The Chromium instance will wait up to 10 seconds for that element to appear before capturing the HTML. For pages where the load timing is less predictable, the delay parameter adds a fixed millisecond wait after initial page load before capture, giving JavaScript animations and lazy fetch calls time to complete.
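As a minimal sketch of the two strategies, the helper below builds the request URL for either one. The wait_for and delay parameter names come from the paragraph above; the helper name build_scrape_url and the idea that both parameters may be combined in a single request are assumptions made for illustration, not confirmed SnapAPI behavior.

```python
import os
from urllib.parse import urlencode

SCRAPE_ENDPOINT = 'https://snapapi.pics/scrape'

def build_scrape_url(url, access_key, wait_for=None, delay_ms=None):
    """Build a /scrape request URL (hypothetical helper, not part of SnapAPI)."""
    params = {'access_key': access_key, 'url': url}
    if wait_for is not None:
        # Capture only once this CSS selector is present in the DOM.
        params['wait_for'] = wait_for
    if delay_ms is not None:
        # Fixed wait in milliseconds after the initial page load.
        params['delay'] = str(int(delay_ms))
    return f'{SCRAPE_ENDPOINT}?{urlencode(params)}'

key = os.environ.get('SNAPAPI_KEY', 'demo-key')
# Predictable page: wait for the first product card to render.
print(build_scrape_url('https://example.com/products', key, wait_for='.product-card'))
# Unpredictable timing: fall back to a fixed 3-second pause.
print(build_scrape_url('https://example.com/feed', key, delay_ms=3000))
```

Prefer wait_for when a reliable selector exists, since it returns as soon as the content is ready; a fixed delay always costs the full wait.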

Scraping Behind Authentication

Many high-value scraping targets require authentication — dashboards, account pages, price-protected content. SnapAPI supports authenticated scraping via the headers parameter, which accepts a JSON object of HTTP request headers. Pass your session cookies exactly as they appear in your browser's cookie jar for the target site:

import json
import os

import requests

r = requests.get('https://snapapi.pics/scrape', params={
    'access_key': os.environ['SNAPAPI_KEY'],
    'url': 'https://app.example.com/dashboard',
    # The headers parameter is a JSON-encoded object of request headers
    'headers': json.dumps({
        'Cookie': 'session_id=abc123; auth_token=xyz456',
        'Authorization': 'Bearer your_jwt_token'
    }),
    'wait_for': '.dashboard-content'
})
r.raise_for_status()

For applications using OAuth or SSO, capture the session cookies from your authenticated browser session and pass them via the headers parameter. SnapAPI does not maintain persistent cookie sessions between requests — pass the authentication headers on every request that requires them. For scraping workflows that require a full login flow (username/password form submission), SnapAPI is not the right tool — use Playwright or Puppeteer for those cases.
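Because the authentication headers have to accompany every request, it can help to build the headers value in one place. The sketch below is one possible way to do that; build_auth_headers_param is a hypothetical helper name, and the cookie and token values are placeholders.

```python
import json

def build_auth_headers_param(cookies=None, bearer_token=None):
    """Serialize auth headers into the JSON string sent as the headers parameter."""
    headers = {}
    if cookies:
        # Join cookies the way a browser sends them: "name=value; name2=value2".
        headers['Cookie'] = '; '.join(f'{name}={value}' for name, value in cookies.items())
    if bearer_token:
        headers['Authorization'] = f'Bearer {bearer_token}'
    return json.dumps(headers)

# Placeholder values; real ones come from your authenticated browser session.
auth = build_auth_headers_param(
    cookies={'session_id': 'abc123', 'auth_token': 'xyz456'},
    bearer_token='your_jwt_token',
)
params = {
    'url': 'https://app.example.com/dashboard',
    'headers': auth,  # include on every request that needs authentication
    'wait_for': '.dashboard-content',
}
print(params['headers'])
```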

Rate Limiting and Ethical Scraping

Even with a reliable API, responsible scraping requires respecting target site rate limits. Add deliberate delays between requests targeting the same domain — typically 1–5 seconds between requests is sufficient to avoid triggering anti-bot measures on most sites. Check the target site's robots.txt before scraping to understand which paths are off-limits. Avoid scraping at peak hours for consumer-facing sites, and cache results aggressively so each unique URL is only scraped once per refresh interval.
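The per-domain delay described above can be sketched as a small throttle that remembers when each host was last requested. DomainThrottle is a hypothetical class written for this illustration, not a SnapAPI feature; the 1-second interval is just the low end of the range suggested above.

```python
import time
from urllib.parse import urlparse

class DomainThrottle:
    """Enforce a minimum interval between requests to the same host."""

    def __init__(self, min_interval=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_request = {}  # host -> time of the most recent request

    def wait(self, url):
        """Block until url's host may be requested again; return seconds slept."""
        host = urlparse(url).netloc
        now = self._clock()
        last = self._last_request.get(host)
        pause = 0.0
        if last is not None:
            pause = max(0.0, self.min_interval - (now - last))
        if pause > 0:
            self._sleep(pause)
        self._last_request[host] = now + pause
        return pause

throttle = DomainThrottle(min_interval=1.0)
for target in ['https://example.com/a', 'https://example.com/b', 'https://example.org/c']:
    waited = throttle.wait(target)
    print(f'{target}: waited {waited:.1f}s')  # call the scrape endpoint here
```

Requests to different hosts are not delayed against each other, so a mixed list of targets still moves quickly.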

SnapAPI's built-in caching via cache=1&cache_ttl=3600 naturally prevents re-scraping the same URL within the TTL window, which both reduces API usage and lowers your request volume to the target site. For scheduled scraping jobs, structure your job to process each URL once per cycle rather than re-scraping the same URL multiple times — store the extracted data in your database rather than re-fetching from the live page on every access.
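A minimal sketch of both ideas follows: adding the cache parameters to each request, and de-duplicating the URL list so each page is fetched once per cycle. The cache and cache_ttl parameter names come from the text above; cached_scrape_url and unique_urls are hypothetical helper names.

```python
from urllib.parse import urlencode

def cached_scrape_url(url, access_key, ttl_seconds=3600):
    """Build a /scrape URL with caching enabled for ttl_seconds."""
    params = {
        'access_key': access_key,
        'url': url,
        'cache': 1,
        'cache_ttl': ttl_seconds,
    }
    return 'https://snapapi.pics/scrape?' + urlencode(params)

def unique_urls(urls):
    """Drop repeated URLs while keeping first-seen order."""
    return list(dict.fromkeys(urls))

job = [
    'https://example.com/p/1',
    'https://example.com/p/2',
    'https://example.com/p/1',  # duplicate, scraped only once
]
for page in unique_urls(job):
    print(cached_scrape_url(page, 'demo-key'))
```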

Start Scraping with SnapAPI

200 free requests/month. Full JS rendering. HTML scrape + structured extraction + screenshots.

Get Free API Key

Frequently Asked Questions

What is the difference between /scrape and /extract?

The /scrape endpoint returns the full rendered HTML of the page as a string — you parse it yourself with BeautifulSoup, Cheerio, lxml, or another HTML parser. The /extract endpoint accepts CSS selectors and does the parsing for you, returning a structured JSON object. Use /scrape when you need maximum flexibility or plan to run multiple extraction passes on the same HTML. Use /extract for targeted, simple extractions where you know exactly which elements you want.

Does the scrape API handle JavaScript-heavy SPAs?

Yes — SnapAPI uses a real Chromium browser for all requests, so JavaScript executes fully before the HTML is captured. React, Vue, Angular, Svelte, and other SPA frameworks render correctly. Use the wait_for parameter to specify a CSS selector that confirms the dynamic content has loaded before capture, ensuring you receive the fully populated DOM rather than empty skeleton components.

Can I scrape multiple pages in parallel?

Yes — make concurrent HTTP requests to the SnapAPI scrape endpoint for different URLs. Each request is independent; there is no shared state between requests. Use asyncio.gather() in Python, Promise.all() in JavaScript, goroutines in Go, or tokio tasks in Rust to parallelize. Rate-limit your concurrency if you are scraping the same target domain to avoid triggering anti-bot measures on that domain.
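One way to combine asyncio.gather() with a concurrency cap is sketched below. The fetch argument is any blocking function that takes a URL and returns the page HTML, for example a wrapper around a call to the scrape endpoint; scrape_all and fake_fetch are hypothetical names, and the stand-in fetcher avoids real network calls. It needs Python 3.9 or later for asyncio.to_thread.

```python
import asyncio

async def scrape_all(urls, fetch, max_concurrency=5):
    """Run fetch(url) for every URL, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url):
        async with semaphore:
            # fetch is a blocking callable, so run it in a worker thread.
            return await asyncio.to_thread(fetch, url)

    # return_exceptions=True puts failures in the result list
    # instead of raising on the first one.
    return await asyncio.gather(*(bounded(u) for u in urls), return_exceptions=True)

def fake_fetch(url):
    """Stand-in for a real call to the scrape endpoint."""
    return f'<html><!-- rendered {url} --></html>'

pages = asyncio.run(scrape_all(
    ['https://example.com/1', 'https://example.com/2'],
    fake_fetch,
    max_concurrency=2,
))
print(len(pages))
```

Results come back in the same order as the input URLs, so they can be paired with zip(urls, pages); check each entry for an exception before parsing it.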

What HTML parser should I use with SnapAPI's scrape response?

For Python, BeautifulSoup with the lxml parser is a fast and robust option. Install with pip install beautifulsoup4 lxml. For Node.js, Cheerio (npm install cheerio) provides a jQuery-like API for HTML parsing. For Go, golang.org/x/net/html is the usual choice; it is maintained by the Go project but ships as a separate module, not as part of the standard library. For Ruby, Nokogiri is the de facto standard. Any of these can parse the HTML returned by SnapAPI's scrape endpoint.