Web Scraping API for Python: Extract Data Without Browser Infrastructure
Replace Playwright and Selenium with a single API call. SnapAPI handles JavaScript rendering, anti-bot bypass, and proxy rotation so your Python scraper stays lean and focused on data extraction logic.
Python Web Scraping API Quickstart
Extracting data from a JavaScript-rendered page requires only the requests library. Pass the URL and a CSS selector, and SnapAPI returns the matched text after full browser rendering:
import requests
import os

SNAPAPI_KEY = os.environ["SNAPAPI_KEY"]
BASE_URL = "https://api.snapapi.pics/v1"

def extract(url: str, selector: str, wait_for: str = None) -> str:
    payload = {"url": url, "selector": selector}
    if wait_for:
        payload["wait_for"] = wait_for
    response = requests.post(
        f"{BASE_URL}/extract",
        json=payload,
        headers={"X-Api-Key": SNAPAPI_KEY},
        timeout=30
    )
    response.raise_for_status()
    return response.json()["text"]

# Extract a product price from a JS-rendered page
price = extract(
    url="https://shop.example.com/product/widget",
    selector=".product-price",
    wait_for=".product-price"
)
print(f"Price: {price}")  # "$29.99"
The wait_for parameter instructs the API to wait until an element matching the CSS selector is visible before extracting. This handles lazy-loaded content, infinite scroll pages, and React/Vue/Angular applications that populate data after the initial render.
Scraping Full Page HTML
For cases where you need the complete rendered HTML to parse with BeautifulSoup or lxml, use the scrape endpoint:
from bs4 import BeautifulSoup

# Reuses requests, BASE_URL, and SNAPAPI_KEY from the quickstart above
def scrape(url: str, wait_for: str = None) -> BeautifulSoup:
    payload = {"url": url}
    if wait_for:
        payload["wait_for"] = wait_for
    response = requests.post(
        f"{BASE_URL}/scrape",
        json=payload,
        headers={"X-Api-Key": SNAPAPI_KEY},
        timeout=30
    )
    response.raise_for_status()
    html = response.json()["html"]
    return BeautifulSoup(html, "html.parser")

soup = scrape("https://news.ycombinator.com", wait_for=".athing")
titles = [a.get_text() for a in soup.select(".athing .titleline a")]
print(titles[:5])
The scrape endpoint returns the fully rendered HTML after all JavaScript has executed. Parse it with BeautifulSoup using the html.parser or lxml backend. The workflow is the same as for static pages, with JavaScript rendering handled remotely.
Async Python Scraping with httpx
For high-throughput scraping pipelines, use httpx with asyncio to issue requests concurrently. This pattern works through hundreds of URLs while capping how many requests are in flight at any one time:
import asyncio
import httpx

# Reuses SNAPAPI_KEY from the quickstart above
async def extract_async(client: httpx.AsyncClient, url: str, selector: str) -> dict:
    try:
        response = await client.post(
            "https://api.snapapi.pics/v1/extract",
            json={"url": url, "selector": selector, "wait_for": selector},
            headers={"X-Api-Key": SNAPAPI_KEY},
            timeout=30
        )
        response.raise_for_status()
    except httpx.HTTPError as exc:
        # Record the failure instead of aborting the whole batch
        return {"url": url, "text": "", "error": str(exc)}
    return {"url": url, "text": response.json().get("text", ""), "error": None}

async def scrape_all(urls: list[str], selector: str, concurrency: int = 5) -> list[dict]:
    semaphore = asyncio.Semaphore(concurrency)
    async with httpx.AsyncClient() as client:
        async def bounded(url):
            # Hold a semaphore slot for the duration of each API call
            async with semaphore:
                return await extract_async(client, url, selector)
        return await asyncio.gather(*[bounded(url) for url in urls])

# Usage (placeholder list; substitute your own URLs)
product_urls = ["https://shop.example.com/product/widget"]
results = asyncio.run(scrape_all(product_urls, ".product-price", concurrency=5))
for r in results:
    print(r["url"], r["text"])
The asyncio.Semaphore caps the number of concurrent API calls at 5, which is safe for the Starter plan. Increase it to 20 for Pro accounts. asyncio.gather returns results in the order the URLs were provided, even when requests complete out of order.
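The two properties described above (a cap on in-flight calls and results in input order) can be checked without touching the network. The following is a minimal sketch that swaps the API call for a hypothetical stand-in, fake_extract, which only sleeps; the names fake_extract and run_bounded and the delays are illustrative assumptions, not part of SnapAPI.

```python
import asyncio

async def fake_extract(url: str, delay: float, state: dict) -> str:
    # Hypothetical stand-in for a network call; tracks calls in flight
    state["active"] += 1
    state["peak"] = max(state["peak"], state["active"])
    await asyncio.sleep(delay)
    state["active"] -= 1
    return url

async def run_bounded(jobs: list, concurrency: int) -> tuple:
    semaphore = asyncio.Semaphore(concurrency)
    state = {"active": 0, "peak": 0}

    async def bounded(url: str, delay: float) -> str:
        # At most `concurrency` coroutines get past this line at once
        async with semaphore:
            return await fake_extract(url, delay, state)

    results = await asyncio.gather(*[bounded(u, d) for u, d in jobs])
    return results, state["peak"]

# "a" is the slowest job, yet it still comes back first in the results
jobs = [("a", 0.03), ("b", 0.01), ("c", 0.02), ("d", 0.01)]
ordered, peak = asyncio.run(run_bounded(jobs, concurrency=2))
print(ordered)  # ['a', 'b', 'c', 'd']
print(peak)     # 2
```

The peak counter never exceeds the semaphore size, and the result list mirrors the input list regardless of which job finishes first.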
Pandas Integration for Data Pipelines
Scraped results feed naturally into pandas DataFrames for analysis, deduplication, and export:
import pandas as pd

results = asyncio.run(scrape_all(product_urls, ".product-price"))
df = pd.DataFrame(results)
# Strip currency symbols and separators; empty or malformed values become NaN
cleaned = df["text"].str.replace(r"[^\d.]", "", regex=True)
df["price_numeric"] = pd.to_numeric(cleaned, errors="coerce")
df = df.dropna(subset=["price_numeric"])
df.to_csv("prices.csv", index=False)
print(df.describe())
Clean the extracted price strings with a regex that strips currency symbols and formatting, then cast to float for numerical analysis. Export to CSV, write to PostgreSQL with df.to_sql, or pass to a visualization library for price trend charts.
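To see what the cleaning regex does to individual strings, here is a rough pure-Python equivalent using only the standard library. The helper name parse_price is hypothetical and the sample inputs are made up for illustration.

```python
import re
from typing import Optional

# Same character class as the pandas example: keep digits and dots only
_NON_NUMERIC = re.compile(r"[^\d.]")

def parse_price(text: str) -> Optional[float]:
    """Strip everything except digits and dots, then convert to float."""
    cleaned = _NON_NUMERIC.sub("", text)
    try:
        return float(cleaned)
    except ValueError:
        # Empty result (no digits at all) or malformed input
        return None

print(parse_price("$29.99"))        # 29.99
print(parse_price("USD 1,299.00"))  # 1299.0
print(parse_price("Out of stock"))  # None
print(parse_price("€1.299,99"))     # 1.29999
```

The last case shows a limitation of this regex: prices that use a comma as the decimal separator are silently misread, so sites with that formatting need locale-aware handling before the cast.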
Handling Anti-Bot Protection in Python
Many commercial sites deploy bot detection that blocks requests from Playwright, Selenium, or standard HTTP clients. SnapAPI's stealth mode bypasses these systems transparently. Enable it by adding "stealth": true to your request payload:
response = requests.post(
    f"{BASE_URL}/extract",
    json={
        "url": url,
        "selector": ".price",
        "stealth": True,
        "wait_for": ".price"
    },
    headers={"X-Api-Key": SNAPAPI_KEY},
    timeout=30
)
Stealth mode selects a browser fingerprint optimized for the target domain, handles CAPTCHA avoidance at the infrastructure level, and rotates residential proxies when needed. Updates are deployed server-side, so your Python code requires no changes as anti-bot vendors evolve their detection.
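If several call sites need the same optional fields, one option is a small payload builder. This is a sketch only: build_payload is a hypothetical helper, not part of the SnapAPI SDK, and it uses just the field names that appear in the examples above (url, selector, wait_for, stealth). It also shows that Python's True is serialized as JSON true when the payload is sent.

```python
import json
from typing import Optional

def build_payload(
    url: str,
    selector: str,
    wait_for: Optional[str] = None,
    stealth: bool = False,
) -> dict:
    """Assemble an extract request body, omitting unset optional fields."""
    payload = {"url": url, "selector": selector}
    if wait_for:
        payload["wait_for"] = wait_for
    if stealth:
        payload["stealth"] = True
    return payload

payload = build_payload(
    "https://shop.example.com/product/widget",
    ".price",
    wait_for=".price",
    stealth=True,
)
# json.dumps mirrors what an HTTP client sends when given json=payload
print(json.dumps(payload))
```

The resulting dict can be passed as the json argument of requests.post or client.post in the earlier examples.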
Get started at snapapi.pics with 200 free extractions per month, no credit card required. The Python SDK at github.com/Sleywill/snapapi-python wraps all endpoints with type hints and async support. Install with pip install snapapi-python.