Web scraping is a $2.5 billion market, and it's only growing. Every company needs web data — pricing intelligence, lead generation, content aggregation, market research, SEO monitoring. But building and maintaining scrapers is a nightmare of proxy rotation, CAPTCHA solving, browser fingerprinting, and constant selector maintenance.
A web scraping API eliminates all that complexity. You send a URL and get back clean, structured data. This guide covers the full spectrum: from DIY scraping to API-based extraction, with real code for both approaches.
Why DIY Scraping Breaks in Production
Every developer who's built a scraper has hit the same walls:
- IP bans: Websites block your server IP after a few hundred requests. You need proxy rotation with residential IPs, which costs $50-500/month alone.
- JavaScript rendering: Many modern websites render their content client-side with JavaScript. Plain HTTP clients (axios, requests) receive only the initial HTML, which is often an empty shell.
- Anti-bot systems: Cloudflare, DataDome, PerimeterX, and hCaptcha detect headless browsers through navigator properties, WebGL fingerprints, and behavioral analysis.
- Selector breakage: CSS selectors break when sites redesign. A single class name change can break your entire pipeline overnight.
- Rate limiting: Sites throttle or block aggressive scraping. You need exponential backoff, request queuing, and respectful crawl delays.
- Scale: Scraping 10 pages is easy. Scraping 10,000 pages concurrently requires browser pools, job queues, and distributed infrastructure.
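The rate-limiting point above mentions exponential backoff. A minimal sketch of what that involves follows. The helper names (`backoffDelay`, `jitteredDelay`, `fetchWithBackoff`) and the default delays are assumptions made for illustration, and treating HTTP 429 and 5xx as retryable is one common choice rather than a rule:

```javascript
// Hypothetical helpers for illustration; not taken from any library.

/** Delay before retry `attempt` (0-based): baseMs * 2^attempt, capped at maxMs. */
function backoffDelay(attempt, baseMs = 500, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

/** "Full jitter": a random delay in [0, backoffDelay) so clients don't retry in lockstep. */
function jitteredDelay(attempt, random = Math.random) {
  return Math.floor(random() * backoffDelay(attempt));
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

/**
 * Call doFetch(url) and retry on 429 or 5xx, up to maxRetries extra attempts.
 * doFetch is injected so the retry logic can be exercised without a network.
 * Thrown network errors are not retried in this sketch.
 */
async function fetchWithBackoff(url, doFetch = fetch, maxRetries = 5) {
  for (let attempt = 0; ; attempt++) {
    const response = await doFetch(url);
    const retryable = response.status === 429 || response.status >= 500;
    if (!retryable || attempt >= maxRetries) return response;
    await sleep(jitteredDelay(attempt));
  }
}
```

With the defaults, the delay ceiling doubles from 500 ms up to a 30-second cap, and the jitter spreads retries out so that many workers hitting the same site do not all come back at the same instant.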
DIY Approach — Playwright + Cheerio
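Here's what a DIY scraper built on Playwright and Cheerio looks like. It handles JavaScript rendering and accepts a proxy list, but it only sets a realistic user agent; defeating fingerprint-based anti-bot systems takes considerably more work: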
import { chromium, Browser, LaunchOptions } from 'playwright';
import * as cheerio from 'cheerio';

interface ScrapeResult {
  url: string;
  html: string;
  text: string;
  statusCode: number;
  timing: number;
}

class WebScraper {
  private browser: Browser | null = null;
  private proxyList: string[];
  private proxyIndex = 0;

  constructor(proxies: string[] = []) {
    this.proxyList = proxies;
  }

  async init(): Promise<void> {
    const launchOptions: LaunchOptions = {
      args: ['--no-sandbox', '--disable-setuid-sandbox'],
    };
    if (this.proxyList.length > 0) {
      // Browser-wide default; each context created in scrape() overrides it
      launchOptions.proxy = {
        server: this.proxyList[0],
      };
    }
    this.browser = await chromium.launch(launchOptions);
  }

  // Round-robin over the proxy list; undefined means "no per-context proxy"
  private nextProxy(): { server: string } | undefined {
    if (this.proxyList.length === 0) return undefined;
    const server = this.proxyList[this.proxyIndex];
    this.proxyIndex = (this.proxyIndex + 1) % this.proxyList.length;
    return { server };
  }

  async scrape(url: string, options: {
    waitFor?: string;
    timeout?: number;
  } = {}): Promise<ScrapeResult> {
    if (!this.browser) await this.init();
    const start = Date.now();

    // Fresh context per request, each using the next proxy in the list.
    // This sets a user agent only; it does not patch browser fingerprints.
    const context = await this.browser!.newContext({
      userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) ' +
        'AppleWebKit/537.36 (KHTML, like Gecko) ' +
        'Chrome/120.0.0.0 Safari/537.36',
      viewport: { width: 1280, height: 720 },
      locale: 'en-US',
      proxy: this.nextProxy(),
    });
    const page = await context.newPage();

    // Block heavy resources for speed
    await page.route('**/*', (route) => {
      const type = route.request().resourceType();
      if (['image', 'media', 'font'].includes(type)) {
        return route.abort();
      }
      return route.continue();
    });

    try {
      const response = await page.goto(url, {
        waitUntil: 'domcontentloaded',
        timeout: options.timeout ?? 30000,
      });
      if (options.waitFor) {
        await page.waitForSelector(options.waitFor, {
          timeout: 10000,
        });
      }
      // Crude fixed delay for late-loading dynamic content
      await page.waitForTimeout(2000);

      const html = await page.content();
      const $ = cheerio.load(html);
      // Remove scripts and styles for clean text
      $('script, style, noscript').remove();
      const text = $('body').text().replace(/\s+/g, ' ').trim();

      return {
        url,
        html,
        text,
        statusCode: response?.status() ?? 0,
        timing: Date.now() - start,
      };
    } finally {
      // Always release the context, even when navigation fails
      await context.close();
    }
  }

  async close(): Promise<void> {
    await this.browser?.close();
  }
}

// Usage
const scraper = new WebScraper([
  'http://proxy1:8080',
  'http://proxy2:8080',
]);
try {
  await scraper.init();
  const result = await scraper.scrape('https://example.com/products', {
    waitFor: '.product-list',
  });
  console.log(result.text);
} finally {
  // Shut the browser down so the process can exit
  await scraper.close();
}
API Approach — One Request, Clean Data
A web scraping API handles all the infrastructure. You send a URL, optionally with extraction rules, and get back clean data:
Basic Scraping
// Scrape any page: JS rendering, anti-bot, and proxies are handled by the API
const response = await fetch('https://api.snapapi.pics/v1/scrape', {
  method: 'POST',
  headers: {
    'X-Api-Key': 'sk_live_your_key_here',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    url: 'https://example.com/products',
    stealth: true,
    formats: ['html', 'markdown', 'text'],
    wait_for: '.product-list',
  }),
});
const data = await response.json();
console.log(data.markdown); // Clean markdown
console.log(data.text);     // Plain text
console.log(data.html);     // Full rendered HTML
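The snippet above hard-codes the API key and assumes the request succeeds. A small wrapper can address both. This is a sketch under stated assumptions: the `SNAPAPI_KEY` environment variable name and the `buildScrapeInit` and `scrapePage` helpers are invented for illustration, and it assumes the API signals failure with a non-2xx status code, which the article does not document:

```javascript
/** Build the fetch options for a scrape call. Kept separate so it can be tested offline. */
function buildScrapeInit(apiKey, body) {
  if (!apiKey) throw new Error('Missing API key');
  return {
    method: 'POST',
    headers: {
      'X-Api-Key': apiKey,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(body),
  };
}

/**
 * Scrape one page and fail loudly on an error response.
 * SNAPAPI_KEY is a hypothetical variable name; use whatever your deployment provides.
 */
async function scrapePage(body, apiKey = process.env.SNAPAPI_KEY) {
  const response = await fetch(
    'https://api.snapapi.pics/v1/scrape',
    buildScrapeInit(apiKey, body),
  );
  if (!response.ok) {
    // Assumption: errors come back with a non-2xx status and a readable body
    throw new Error(`Scrape failed with HTTP ${response.status}: ${await response.text()}`);
  }
  return response.json();
}
```

Calling `scrapePage({ url: 'https://example.com/products', formats: ['markdown'] })` then either returns the parsed response or throws with the status code, instead of failing later on a missing field.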
Structured Data Extraction
The real power of a scraping API is structured extraction. Instead of writing CSS selectors that break, you define a schema and the API extracts matching data:
// Extract structured data without writing CSS selectors
const response = await fetch('https://api.snapapi.pics/v1/extract', {
  method: 'POST',
  headers: {
    'X-Api-Key': 'sk_live_your_key_here',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    url: 'https://example.com/products',
    schema: {
      products: [{
        name: 'string',
        price: 'number',
        description: 'string',
        rating: 'number',
        reviews_count: 'number',
        in_stock: 'boolean',
        image_url: 'string',
      }],
      pagination: {
        current_page: 'number',
        total_pages: 'number',
        next_url: 'string',
      },
    },
  }),
});
const { extracted } = await response.json();
// extracted.products = [
//   { name: "Widget Pro", price: 49.99, rating: 4.5, ... },
//   { name: "Widget Basic", price: 19.99, rating: 4.2, ... },
// ]
// extracted.pagination = { current_page: 1, total_pages: 12, ... }
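The article does not say whether every field in the response is guaranteed to match the requested type, so it can be worth checking extracted data before it reaches a database. The sketch below validates a value against the same schema shorthand used above ('string', 'number', 'boolean', one-element arrays, nested objects). `validateExtracted` is a hypothetical helper written for this example, not part of any SDK:

```javascript
/**
 * Check `value` against a schema in the shorthand used in the request above.
 * Returns a list of problems; an empty list means every field has the expected type.
 */
function validateExtracted(value, schema, path = 'extracted') {
  // Leaf: schema is a type name such as 'string' or 'number'
  if (typeof schema === 'string') {
    const actual = value === null ? 'null' : typeof value;
    return actual === schema ? [] : [`${path}: expected ${schema}, got ${actual}`];
  }
  // List: schema is a one-element array describing each item
  if (Array.isArray(schema)) {
    if (!Array.isArray(value)) return [`${path}: expected array`];
    return value.flatMap((item, i) => validateExtracted(item, schema[0], `${path}[${i}]`));
  }
  // Object: check each key named in the schema
  if (typeof value !== 'object' || value === null) return [`${path}: expected object`];
  return Object.entries(schema).flatMap(([key, sub]) =>
    validateExtracted(value[key], sub, `${path}.${key}`));
}
```

A missing field shows up as "expected string, got undefined", so the same check covers both absent and mistyped values.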
AI-Powered Extraction
For complex or unstructured pages, AI extraction understands context like a human reader:
// AI analyzes the page and answers your question
const response = await fetch('https://api.snapapi.pics/v1/analyze', {
  method: 'POST',
  headers: {
    'X-Api-Key': 'sk_live_your_key_here',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    url: 'https://competitor.com/pricing',
    prompt: 'Extract all pricing plans with their names, prices, and feature lists. Include any free tier details and annual discount percentages.',
  }),
});
const { result } = await response.json();
// Structured analysis from AI, no selectors needed
Real-World Use Cases
E-Commerce Price Monitoring
// db and notify stand in for your own storage and alerting code
async function monitorPrices(urls) {
  const results = [];
  for (const url of urls) {
    const response = await fetch('https://api.snapapi.pics/v1/extract', {
      method: 'POST',
      headers: {
        'X-Api-Key': 'sk_live_your_key_here',
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        url,
        schema: {
          products: [{
            name: 'string',
            price: 'number',
            original_price: 'number',
            discount_percent: 'number',
            in_stock: 'boolean',
          }],
        },
      }),
    });
    if (!response.ok) {
      // Skip this URL rather than aborting the whole run
      console.error(`Extraction failed for ${url}: HTTP ${response.status}`);
      continue;
    }
    const { extracted } = await response.json();
    results.push({ url, products: extracted.products });
  }

  // Compare with the last stored price for each product
  for (const result of results) {
    for (const product of result.products) {
      const previous = await db.getLastPrice(product.name);
      if (previous && product.price !== previous.price) {
        await notify(`Price change: ${product.name} ${previous.price} → ${product.price}`);
      }
      await db.savePrice(product.name, product.price);
    }
  }
}
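The comparison above uses strict inequality, so any difference triggers a notification, including sub-cent noise. One possible refinement is to alert only when the change passes a percentage threshold. The `describePriceChange` helper and its 1% default are assumptions for illustration, not something the article or the API prescribes:

```javascript
/**
 * Describe the change from `previous` to `current`.
 * Returns null when either price is unusable or the change is below `minPercent`.
 */
function describePriceChange(previous, current, minPercent = 1) {
  if (!Number.isFinite(previous) || !Number.isFinite(current) || previous <= 0) {
    return null;
  }
  const percent = ((current - previous) / previous) * 100;
  if (Math.abs(percent) < minPercent) return null;
  return {
    direction: percent > 0 ? 'up' : 'down',
    // Round to one decimal place for display
    percent: Math.round(percent * 10) / 10,
  };
}
```

Inside the loop, the condition would become `const change = describePriceChange(previous.price, product.price)`, with a notification sent only when `change` is not null.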
Lead Generation
// Extract contact info from company websites
async function extractLeads(companyUrls) {
  const leads = [];
  for (const url of companyUrls) {
    const response = await fetch('https://api.snapapi.pics/v1/extract', {
      method: 'POST',
      headers: {
        'X-Api-Key': 'sk_live_your_key_here',
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        url,
        schema: {
          company: {
            name: 'string',
            description: 'string',
            industry: 'string',
          },
          contacts: [{
            name: 'string',
            title: 'string',
            email: 'string',
            linkedin: 'string',
          }],
          tech_stack: ['string'],
        },
      }),
    });
    const { extracted } = await response.json();
    leads.push(extracted);
  }
  return leads;
}
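Leads gathered from many pages tend to contain repeated or malformed contacts, so a cleanup pass before loading them into a CRM is usually worthwhile. The sketch below flattens the `leads` array returned above into unique contacts keyed by email. `normalizeEmail` and `uniqueContacts` are hypothetical helpers, and the email pattern is deliberately loose rather than a full address validator:

```javascript
/** Lower-case and trim an email; return null if it does not look like one. */
function normalizeEmail(email) {
  if (typeof email !== 'string') return null;
  const cleaned = email.trim().toLowerCase();
  // Loose shape check: something@something.tld, no whitespace
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(cleaned) ? cleaned : null;
}

/** Flatten contacts from all leads, dropping invalid emails and keeping the first of each duplicate. */
function uniqueContacts(leads) {
  const seen = new Map();
  for (const lead of leads) {
    for (const contact of lead.contacts ?? []) {
      const email = normalizeEmail(contact.email);
      if (email && !seen.has(email)) {
        seen.set(email, { ...contact, email, company: lead.company?.name ?? null });
      }
    }
  }
  return [...seen.values()];
}
```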
Content Aggregation
// Aggregate news from multiple sources
const sources = [
  {
    url: 'https://techcrunch.com',
    schema: {
      articles: [{ title: 'string', author: 'string', date: 'string', summary: 'string', url: 'string' }],
    },
  },
  {
    url: 'https://news.ycombinator.com',
    schema: {
      posts: [{ title: 'string', points: 'number', comments: 'number', url: 'string' }],
    },
  },
];
const aggregated = await Promise.all(
  sources.map(source =>
    fetch('https://api.snapapi.pics/v1/extract', {
      method: 'POST',
      headers: {
        'X-Api-Key': 'sk_live_your_key_here',
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        url: source.url,
        schema: source.schema,
      }),
    }).then(r => r.json())
  )
);
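`Promise.all` starts every request at once and rejects as soon as any one of them fails, which is fine for two sources but less so for two hundred. A possible alternative is a small worker pool that caps the number of requests in flight and records failures per item. `mapSettled` is a hypothetical helper written for this example; its result shape mirrors `Promise.allSettled`:

```javascript
/**
 * Run `worker` over `items` with at most `limit` calls in flight.
 * Resolves to an array in input order, where each entry is either
 * { status: 'fulfilled', value } or { status: 'rejected', reason }.
 */
async function mapSettled(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  // Each runner pulls the next unclaimed index until none are left
  async function run() {
    while (next < items.length) {
      const i = next++;
      try {
        results[i] = { status: 'fulfilled', value: await worker(items[i], i) };
      } catch (reason) {
        results[i] = { status: 'rejected', reason };
      }
    }
  }

  const runners = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runners }, () => run()));
  return results;
}
```

For the aggregation above, `mapSettled(sources, 5, extractSource)` would keep at most five extractions running, where `extractSource` is whatever function wraps the `fetch` call, and one failing source no longer discards the results from the others.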
SEO Monitoring
// Audit a page's on-page SEO signals and capture a visual snapshot
async function seoAudit(url) {
  // Get page content and metadata
  const scrape = await fetch('https://api.snapapi.pics/v1/extract', {
    method: 'POST',
    headers: {
      'X-Api-Key': 'sk_live_your_key_here',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      url,
      schema: {
        meta: {
          title: 'string',
          description: 'string',
          h1: 'string',
          h2_count: 'number',
          word_count: 'number',
          image_count: 'number',
          images_without_alt: 'number',
        },
        links: {
          internal_count: 'number',
          external_count: 'number',
          broken_count: 'number',
        },
        structured_data: {
          has_schema_org: 'boolean',
          schema_types: ['string'],
        },
      },
    }),
  });

  // Visual snapshot for comparison
  const screenshot = await fetch('https://api.snapapi.pics/v1/screenshot', {
    method: 'POST',
    headers: {
      'X-Api-Key': 'sk_live_your_key_here',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ url, full_page: true }),
  });

  return { seo: await scrape.json(), screenshot: await screenshot.arrayBuffer() };
}
Multi-Language SDK Examples
Python
import requests

# Scrape with stealth mode
response = requests.post(
    'https://api.snapapi.pics/v1/scrape',
    headers={'X-Api-Key': 'sk_live_your_key_here'},
    json={
        'url': 'https://example.com/data',
        'stealth': True,
        'formats': ['markdown', 'text'],
    }
)
data = response.json()
print(data['markdown'])

# Extract structured data
response = requests.post(
    'https://api.snapapi.pics/v1/extract',
    headers={'X-Api-Key': 'sk_live_your_key_here'},
    json={
        'url': 'https://example.com/products',
        'schema': {
            'products': [{
                'name': 'string',
                'price': 'number',
                'in_stock': 'boolean',
            }]
        }
    }
)
extracted = response.json()['extracted']
for product in extracted['products']:
    print(f"{product['name']}: ${product['price']}")
Go
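package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	payload, err := json.Marshal(map[string]interface{}{
		"url":     "https://example.com/products",
		"stealth": true,
		"formats": []string{"markdown", "text"},
	})
	if err != nil {
		log.Fatalf("encode request: %v", err)
	}

	req, err := http.NewRequest("POST",
		"https://api.snapapi.pics/v1/scrape",
		bytes.NewBuffer(payload))
	if err != nil {
		log.Fatalf("build request: %v", err)
	}
	req.Header.Set("X-Api-Key", "sk_live_your_key_here")
	req.Header.Set("Content-Type", "application/json")

	// Check the error before touching resp: it is nil when the request fails
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatalf("send request: %v", err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatalf("read response: %v", err)
	}

	var result map[string]interface{}
	if err := json.Unmarshal(body, &result); err != nil {
		log.Fatalf("decode response: %v", err)
	}
	fmt.Println(result["markdown"])
}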
Web Scraping API Comparison
| Feature | DIY (Playwright + Proxies) | SnapAPI | Firecrawl | ScrapingBee |
|---|---|---|---|---|
| JS rendering | Manual (Playwright) | Built-in | Built-in | Built-in |
| Anti-bot bypass | Stealth plugins (fragile) | Built-in stealth mode | Limited | Built-in |
| Structured extraction | CSS selectors (manual) | Schema-based + AI | LLM extraction | CSS/XPath extraction rules |
| Screenshot | Manual code | Single endpoint | Built-in (screenshot format) | Built-in |
| PDF generation | Manual code | Single endpoint | Not available | Not available |
| Video recording | Complex (ffmpeg) | Single endpoint | Not available | Not available |
| AI analysis | Build your own pipeline | Built-in /analyze | LLM extraction | Not available |
| MCP server | Build your own | npm package ready | Available | Not available |
| Device emulation | Manual config | 30+ presets | Not available | Limited |
| Free tier | N/A (infra costs) | 200 req/month | 500 credits | 1,000 credits |
| Pricing | $100-1,000/mo (servers + proxies) | From $19/mo | From $19/mo | From $49/mo |
Getting Started
Start extracting data from any website in under 5 minutes:
- Sign up free at snapapi.pics — 200 requests/month included
- Get your API key from the dashboard
- Choose your method: scrape (raw content), extract (structured data), or analyze (AI-powered)
- Install an SDK — JavaScript, Python, Go, PHP, Swift, Kotlin, and more
- Add MCP: let AI agents scrape for you with npx snapapi-mcp
Stop Building Scrapers. Start Extracting Data.
200 free requests/month. Schema-based extraction. AI analysis. 8 SDKs. MCP server for AI agents.