April 4, 2026 · 7 min read

Web Scraping with Python in 2026: Requests, Playwright, and API Approaches

Python remains the top choice for web scraping. But the right tool depends on your target site. Here is when to use Requests, when to reach for Playwright, and when an API saves you days of work.

Tier 1 — Simple Static Pages: requests + BeautifulSoup

For plain HTML pages — government data, Wikipedia, news archives, simple directories — requests with BeautifulSoup is still the fastest path. No browser overhead, no JavaScript execution, sub-second response times. Install both with pip install requests beautifulsoup4.

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (compatible; MyBot/1.0)'}
resp = requests.get('https://example.com/data', headers=headers, timeout=10)
resp.raise_for_status()

soup = BeautifulSoup(resp.text, 'html.parser')
rows = soup.select('table.data-table tbody tr')

for row in rows:
    cells = [td.get_text(strip=True) for td in row.find_all('td')]
    print(cells)

This pattern fails when the page requires JavaScript to render, when the content is loaded via XHR or fetch after the initial HTML response, or when the site fingerprints clients and blocks anything that does not look like a real browser.
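One rough way to spot the first failure case before writing any selectors is to measure how much visible text the raw server response contains. The sketch below uses only the standard library; the function name looks_js_rendered and the 200-character threshold are illustrative assumptions, not an established rule, so tune them against pages you know.

```python
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED = ("script", "style")

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def looks_js_rendered(html, min_text_chars=200):
    """Heuristic: True when the raw HTML carries almost no visible text.

    min_text_chars is an arbitrary cut-off chosen for illustration.
    """
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    return len(" ".join(parser.chunks)) < min_text_chars


# A typical single-page-app shell: an empty mount point plus a script tag
shell = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'
print(looks_js_rendered(shell))  # True: the response has no visible text
```

If this returns True for resp.text, the requests approach is unlikely to see the data and a browser-based tool is probably needed.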

Tier 2 — JavaScript-Rendered Pages: Playwright

React, Vue, Angular, and Next.js apps render content client-side. The raw HTML response from the server is often just a shell with an empty <div id="root">. You need a real browser. Playwright is the current best choice over Selenium and Puppeteer: faster, more reliable, better async support, and actively maintained by Microsoft.

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto('https://spa-example.com/products', wait_until='networkidle')

    # Wait for specific element to appear
    page.wait_for_selector('.product-card', timeout=10000)

    products = page.query_selector_all('.product-card')
    for prod in products:
        name = prod.query_selector('h3').inner_text()
        price = prod.query_selector('.price').inner_text()
        print(f'{name}: {price}')

    browser.close()

Playwright works well for medium-scale scraping. The challenge at scale is that each Chromium instance uses roughly 150-200 MB of RAM and can crash under load, so browser pools need careful management. Running 50 concurrent Playwright instances means about 7.5-10 GB of RAM for the browsers alone, which in practice calls for a dedicated, well-tuned machine.

Tier 3 — Protected Sites and Scale: API Approach

Sites with bot protection (Cloudflare, DataDome, PerimeterX) actively detect and block headless browsers. Playwright with default settings is usually blocked on the first request. Stealth tooling (playwright-stealth for Playwright, or undetected-chromedriver for Selenium) works temporarily but requires constant maintenance as detection systems update. For commercial scraping workloads, an API that handles stealth and infrastructure is typically far more reliable.

import requests
from bs4 import BeautifulSoup

# SnapAPI handles stealth, proxies, and browser infrastructure
res = requests.post('https://api.snapapi.pics/v1/scrape',
    headers={'X-Api-Key': 'sk_live_xxx'},
    json={
        'url': 'https://protected-ecommerce-site.com/products',
        'waitFor': '.product-list',
        'stealth': True,
    },
    timeout=60,  # remote rendering can take several seconds
)
res.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error body
data = res.json()
html = data['html']
print(f"Got {len(html)} characters of rendered HTML")

# Parse normally with BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
prices = [el.get_text(strip=True) for el in soup.select('.price')]

Structured Data Extraction

For structured extraction — product details, article content, contact information — SnapAPI's extract endpoint returns JSON directly without you writing CSS selectors. Pass a schema describing the fields you want and the API returns structured data.

import requests

res = requests.post('https://api.snapapi.pics/v1/extract',
    headers={'X-Api-Key': 'sk_live_xxx'},
    json={
        'url': 'https://example.com/product/123',
        # Field name -> expected type
        'schema': {
            'name': 'string',
            'price': 'number',
            'description': 'string',
            'images': 'array',
            'inStock': 'boolean'
        }
    },
    timeout=60,
)
res.raise_for_status()  # surface HTTP errors before reading the body
product = res.json()['data']
print(product)  # {'name': 'Widget X', 'price': 49.99, 'inStock': True, ...}

SnapAPI's free tier gives 200 requests/month, enough for prototyping. Starter ($19/mo) covers 5,000 requests for small production workloads. All plans include scrape, extract, screenshot, PDF, video, and AI analysis under a single key.

Handling Anti-Bot Protection

Modern anti-bot systems (Cloudflare, DataDome, PerimeterX) use TLS fingerprinting, browser API presence checks, canvas fingerprinting, mouse entropy, and IP reputation scoring. Plain Playwright in headless mode fails most checks instantly.

Stealth plugins mask some signals but require constant updates as detection evolves. What worked 6 months ago often fails today. Commercial scraping APIs like SnapAPI maintain stealth configurations as a managed service.

For Cloudflare specifically, residential proxies plus correct TLS profiles plus real browser fingerprints are all required. Most teams that try to maintain this internally spend more on anti-detection engineering than the data is worth.

Rate Limiting and Politeness

Respect robots.txt and implement request delays regardless of which tool you use. Sending thousands of requests per minute strains target servers, increases block risk, and may have legal implications depending on jurisdiction and the site's ToS.
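Checking robots.txt does not need a third-party package: the standard library's urllib.robotparser can do it. The sketch below parses an inline sample file so it runs without network access; the rules, the MyBot/1.0 user agent, and the example.com URLs are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt content, inlined so the example needs no network access
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_text):
    """Return a RobotFileParser loaded from robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
allowed = parser.can_fetch("MyBot/1.0", "https://example.com/data")
blocked = parser.can_fetch("MyBot/1.0", "https://example.com/private/report")
delay = parser.crawl_delay("MyBot/1.0")  # seconds between requests, or None if unset

print(allowed, blocked, delay)
```

Against a live site you would instead call parser.set_url('https://example.com/robots.txt') followed by parser.read(), then use the returned crawl delay (when present) as the minimum pause between requests.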

A practical approach: cap at 1-2 requests/second for courtesy scraping. For higher-throughput pipelines, use a proper queue with configurable concurrency and inter-request delays. Python's asyncio Semaphore caps the number of in-flight requests without external dependencies; combined with a short asyncio.sleep per request, it gives you a clean rate-limiting primitive.
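As a minimal sketch of that idea, the code below bounds concurrency with a Semaphore and holds each slot for a fixed delay after the request finishes. The names crawl, fetch_politely, and fake_fetch are hypothetical, and fake_fetch stands in for a real HTTP call. With max_concurrency=2 and delay=1.0, throughput cannot exceed about 2 requests/second.

```python
import asyncio


async def fetch_politely(url, fetch, semaphore, delay):
    """Run fetch(url) inside the semaphore, then keep the slot for `delay` seconds."""
    async with semaphore:
        result = await fetch(url)
        await asyncio.sleep(delay)  # spacing between requests on this slot
        return result


async def crawl(urls, fetch, max_concurrency=2, delay=1.0):
    """Fetch all URLs with at most `max_concurrency` requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)
    tasks = [fetch_politely(u, fetch, semaphore, delay) for u in urls]
    # gather() returns results in the same order as `urls`
    return await asyncio.gather(*tasks)


async def fake_fetch(url):
    """Placeholder for a real async HTTP request."""
    await asyncio.sleep(0.01)
    return f"fetched {url}"


if __name__ == "__main__":
    pages = [f"https://example.com/page/{i}" for i in range(4)]
    for line in asyncio.run(crawl(pages, fake_fetch)):
        print(line)
```

Holding the slot during the delay is a deliberately simple design: it trades a little throughput for a hard upper bound on request rate. A token-bucket limiter would be the next step if you need burst control.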

Data Pipelines and Storage

Raw scraped HTML needs parsing, cleaning, and storage. Pandas DataFrames work well for tabular data. JSON or SQLite cover small datasets. For production pipelines with millions of records, write in batches directly to PostgreSQL or BigQuery and make the writes idempotent: INSERT ... ON CONFLICT in PostgreSQL, MERGE in BigQuery.
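The idempotent-write idea can be tried locally with SQLite, which supports the same ON CONFLICT upsert clause in version 3.24 and later. This is a sketch under assumptions: the products table, its columns, and the sample rows are hypothetical, and a PostgreSQL version would use a driver such as psycopg with its own placeholder syntax.

```python
import sqlite3


def upsert_products(conn, rows):
    """Insert a batch of rows; rows whose url already exists are updated in place."""
    conn.executemany(
        """
        INSERT INTO products (url, name, price)
        VALUES (:url, :name, :price)
        ON CONFLICT(url) DO UPDATE SET
            name = excluded.name,
            price = excluded.price
        """,
        rows,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (url TEXT PRIMARY KEY, name TEXT, price REAL)")

batch = [
    {"url": "https://example.com/p/1", "name": "Widget X", "price": 49.99},
    {"url": "https://example.com/p/2", "name": "Widget Y", "price": 19.50},
]
upsert_products(conn, batch)
upsert_products(conn, batch)  # re-running the same batch creates no duplicates

print(conn.execute("SELECT COUNT(*) FROM products").fetchone()[0])  # 2
```

Keying on the page URL means a scraper that crashes halfway can simply be restarted from the top without cleanup.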

For ongoing monitoring (price tracking, content changes), store snapshots with timestamps and run diffs. A simple cron job that calls SnapAPI's extract endpoint, writes to PostgreSQL, and alerts on significant changes covers most monitoring use cases without a heavy infrastructure stack.
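The snapshot-and-diff step might look like the sketch below. The helper names (make_snapshot, diff_snapshots, is_significant) and the 5% price threshold are assumptions made for illustration; what counts as a significant change depends on your use case.

```python
from datetime import datetime, timezone


def make_snapshot(data):
    """Wrap extracted fields with a UTC timestamp so snapshots can be ordered."""
    return {"taken_at": datetime.now(timezone.utc).isoformat(), "data": data}


def diff_snapshots(old, new):
    """Return {field: (before, after)} for every field whose value changed."""
    changes = {}
    for field in sorted(old.keys() | new.keys()):
        before, after = old.get(field), new.get(field)
        if before != after:
            changes[field] = (before, after)
    return changes


def is_significant(changes, min_price_change=0.05):
    """Alert on any non-price change, or on a price move of at least 5%."""
    for field, (before, after) in changes.items():
        if field != "price":
            return True
        if not before or after is None:
            return True  # price appeared, disappeared, or was zero
        if abs(after - before) / before >= min_price_change:
            return True
    return False


previous = make_snapshot({"name": "Widget X", "price": 49.99, "inStock": True})
current = make_snapshot({"name": "Widget X", "price": 44.99, "inStock": True})

changes = diff_snapshots(previous["data"], current["data"])
print(changes)                  # {'price': (49.99, 44.99)}
print(is_significant(changes))  # True: the price moved by about 10%
```

In a scheduled job, previous would be loaded from the database and current would come from the latest extraction.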

Choosing the Right Tool

requests + BeautifulSoup: static HTML, government data, Wikipedia, news archives. Fast, no infrastructure.

Playwright: SPAs, pages requiring login, interactive workflows. Best for low-to-medium volume where you control infra.

SnapAPI scrape/extract: protected sites, high volume without infra management, structured extraction without CSS selectors, reliability over custom control.

Start with SnapAPI's free tier (200 requests/month) to evaluate before committing to any infrastructure.

Legal Considerations for Web Scraping in 2026

Web scraping occupies a legally complex space. The key case to understand is hiQ v. LinkedIn: in 2022 the Ninth Circuit held that scraping publicly available data likely does not violate the Computer Fraud and Abuse Act. That ruling is US-specific and does not address terms-of-service violations or copyright claims. The EU's General Data Protection Regulation adds another layer: scraping pages containing personal data of EU residents may require a legal basis under GDPR.

Practical guidance: scraping publicly available data for personal use, research, or building competing products is generally lower-risk than scraping behind authentication or in bulk for resale. Always check a site's robots.txt and Terms of Service before scraping. Aggregating and republishing copyrighted content (news articles, academic papers, creative works) carries separate copyright risk regardless of how you access it.

When in doubt, consult a lawyer familiar with internet law in your jurisdiction. This post is not legal advice.

Python Scraping Libraries Comparison

Library | JS Rendering | Speed | Anti-Bot | Ease of Use
--- | --- | --- | --- | ---
requests | No | Fast | Basic headers only | Very easy
Scrapy | No | Very fast (async) | Basic headers only | Moderate
Selenium | Yes | Slow | Detectable | Moderate
Playwright (Python) | Yes | Moderate | Stealth plugin available | Easy
SnapAPI scrape endpoint | Yes | 2-4s (API) | Managed stealth | Very easy (HTTP call)

For most production Python scraping pipelines, the right choice depends on your target sites and volume. Start with requests for simple cases. Add Playwright when you hit JS rendering requirements. Move to SnapAPI when you encounter anti-bot protection or want to eliminate browser infrastructure management entirely.

SnapAPI's free tier (200 requests/month) is enough to prototype any scraping use case. Get your free key and test your first scrape in minutes.

Getting Started

Register at snapapi.pics/register.html to get a free API key. The key becomes active once you verify your email, and the whole process takes under two minutes. The Free plan gives 200 captures per month with no credit card required, enough to validate any integration before deciding on a paid tier.

Full API documentation including live playground, all request parameters, response schemas, and code examples in JavaScript, Python, Go, PHP, Ruby, Swift, and Kotlin is at snapapi.pics/docs.html. The MCP server (npx snapapi-mcp) integrates directly with Claude Code, Cursor, VS Code, and Windsurf for AI-assisted development workflows.

All paid plans include all endpoints: screenshot, PDF, scrape, extract, video recording, and AI page analysis. There are no per-feature surcharges. Starter ($19/mo): 5,000 requests. Pro ($79/mo): 50,000 requests. Business ($299/mo): 500,000 requests. Annual billing saves approximately 20% on all plans.

Support is available via the chat widget on any page or by emailing support@snapapi.pics. Average response time is under four hours during business hours. Enterprise customers with custom quotas or SLAs can contact the team directly.

SnapAPI runs on dedicated infrastructure with Playwright and Chromium managed at the service level. Browser updates, stealth configuration maintenance, and proxy pool management are all handled transparently. Your integration stays stable across Chromium version updates without any code changes on your side.

SnapAPI is built and maintained by an independent team focused on reliability and developer experience. The API has been in production since 2024, serving customers across e-commerce, SaaS, media, and developer tooling verticals. Browser infrastructure runs on dedicated servers with automatic failover, daily health checks, and 24h watchdog processes ensuring consistent uptime. New SDK versions, documentation improvements, and feature releases are announced on the blog and via the changelog at snapapi.pics. Follow updates at @slwl_dev on X.