Web Scraping with Python in 2025: requests, BeautifulSoup, and APIs
From basic HTML parsing to JavaScript-rendered SPAs — a practical guide to choosing the right Python scraping approach for every situation.
The Python Web Scraping Stack
Python has one of the richest ecosystems for web scraping. The combination of requests for HTTP, BeautifulSoup for HTML parsing, and lxml for fast parsing and XPath queries covers the majority of static scraping jobs. For JavaScript-rendered content, Playwright or Selenium launch a real browser, execute JavaScript, and expose the final DOM for parsing. The key skill for building reliable scrapers is knowing when to use each approach, and when to skip the stack entirely in favor of a managed API.
Approach 1: requests + BeautifulSoup
For static HTML pages with no JavaScript rendering, this combination is fast, simple, and requires minimal resources. Install with pip install requests beautifulsoup4 lxml and you are ready to parse any static page.
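import requests
from bs4 import BeautifulSoup

# Identify the client and set a timeout so a stalled server cannot hang the script.
resp = requests.get(
    "https://example.com/products",
    headers={"User-Agent": "Mozilla/5.0 (compatible; MyBot/1.0)"},
    timeout=10,
)
resp.raise_for_status()  # Fail loudly on 4xx/5xx instead of parsing an error page.

soup = BeautifulSoup(resp.text, "lxml")
products = []
for card in soup.select(".product-card"):
    title = card.select_one("h2")
    price = card.select_one(".price")
    if title is None or price is None:
        continue  # Skip cards that are missing either field.
    products.append({
        "title": title.get_text(strip=True),
        "price": price.get_text(strip=True),
    })
print(f"Found {len(products)} products")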
This approach works well for news sites, blogs, Wikipedia, government data portals, and any site that serves complete HTML from the server. It fails on client-side-rendered React, Vue, and Angular SPAs, where the server sends a nearly empty HTML shell and JavaScript populates the content in the browser.
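One way to tell which case a site falls into is to inspect the raw HTML before choosing a tool. The sketch below is a rough heuristic, not a definitive test: it uses only the standard library, the names looks_like_empty_shell and min_visible_chars are hypothetical, and the 200-character threshold is an arbitrary assumption that should be tuned per site.

```python
from html.parser import HTMLParser


class _TextCounter(HTMLParser):
    """Counts visible text characters, ignoring <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.visible_chars = 0
        self.script_tags = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
            if tag == "script":
                self.script_tags += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.visible_chars += len(data.strip())


def looks_like_empty_shell(html, min_visible_chars=200):
    """Heuristic: the page loads scripts but carries almost no visible text.

    The threshold is an assumption for illustration only.
    """
    counter = _TextCounter()
    counter.feed(html)
    counter.close()
    return counter.script_tags > 0 and counter.visible_chars < min_visible_chars
```

If the function returns True for the HTML that requests downloads, the page probably needs a browser or a rendering API. A stricter check is to search the raw HTML for the exact selector you plan to scrape.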
Approach 2: Playwright for JavaScript Sites
Playwright drives a real browser (Chromium in this example) and lets you wait for JavaScript-rendered elements to appear before reading the DOM. Install with pip install playwright followed by playwright install chromium.
from playwright.sync_api import sync_playwright

with sync_playwright() as pw:
    browser = pw.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://spa-example.com/products")
    # Block until JavaScript has rendered at least one product card.
    page.wait_for_selector(".product-card")
    cards = page.query_selector_all(".product-card")
    for card in cards:
        title = card.query_selector("h2").inner_text()
        price = card.query_selector(".price").inner_text()
        print(title, price)
    browser.close()
Playwright is powerful but resource-heavy. Each browser instance can consume roughly 150-300 MB of RAM. Running concurrent scrapers requires careful process management to avoid memory exhaustion. On cloud infrastructure, Chromium must be installed with the correct system dependencies, which can be surprisingly complex on minimal Linux images.
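The memory figure above suggests a rough capacity calculation before running browsers in parallel. The following is a minimal sketch under stated assumptions: max_concurrent_browsers and run_bounded are hypothetical helper names, the 512 MB reserve for the operating system is an arbitrary placeholder, and each job is any zero-argument coroutine function (for example, a wrapper around Playwright's async API).

```python
import asyncio


def max_concurrent_browsers(available_mb, per_browser_mb=300, reserve_mb=512):
    """Worst-case estimate of how many browser instances fit in memory.

    Uses the upper end of the 150-300 MB range by default; the reserve
    value is an assumption, not a measured number.
    """
    usable = available_mb - reserve_mb
    return max(1, usable // per_browser_mb)


async def run_bounded(jobs, limit):
    """Run coroutine functions with at most `limit` in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job):
        async with semaphore:
            return await job()

    # gather() returns results in the same order as the input jobs.
    return await asyncio.gather(*(guarded(job) for job in jobs))
```

On a 4 GB machine the estimate works out to (4096 - 512) // 300 = 11 browsers. Measuring real memory use on your own target pages is more reliable than any fixed figure.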
Approach 3: Managed Scraping API
When you need JavaScript rendering but do not want to manage browsers on your servers, a managed scraping API like SnapAPI handles the browser infrastructure remotely. Your Python code stays simple — just an HTTP request — while SnapAPI runs Chromium, executes JavaScript, applies stealth mode, and returns the rendered HTML or extracted data.
import requests
from bs4 import BeautifulSoup

# Get fully rendered HTML from a JavaScript SPA
resp = requests.get(
    "https://api.snapapi.pics/v1/scrape",
    params={"url": "https://spa-example.com/products", "wait_for": ".product-card"},
    headers={"X-Api-Key": "YOUR_API_KEY"},
    timeout=60,  # Remote rendering can take longer than a plain page fetch.
)
resp.raise_for_status()  # Surface auth or quota errors before reading the body.
html = resp.json()["html"]

# Now parse with BeautifulSoup as usual
soup = BeautifulSoup(html, "lxml")
products = soup.select(".product-card")
This hybrid approach combines the simplicity of requests with JavaScript rendering capability. No browser installation, no memory management, no proxy rotation code. SnapAPI handles bot detection evasion, user agent rotation, and residential proxy routing when enabled.
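Because the browser now runs remotely, the main failure mode moves into the response payload. The sketch below validates the decoded JSON before parsing. It assumes the response shape used in this article's example (a JSON object with an "html" key), which has not been checked against SnapAPI's documentation; extract_html is a hypothetical helper name.

```python
def extract_html(payload):
    """Return rendered HTML from a decoded JSON payload, or raise ValueError.

    Assumes the payload is a dict with an "html" string, as in the
    article's example; adjust to the API's documented schema.
    """
    if not isinstance(payload, dict):
        raise ValueError(f"expected a JSON object, got {type(payload).__name__}")
    html = payload.get("html")
    if not isinstance(html, str) or not html.strip():
        raise ValueError("response has no usable 'html' field")
    return html
```

Calling extract_html(resp.json()) in place of resp.json()["html"] turns a bare KeyError into a descriptive error that is easier to log and retry on.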
When to Choose Each Approach
Use requests + BeautifulSoup for static sites, RSS feeds, Wikipedia, and government data — it is fast and free. Use Playwright when you need full browser control for interactive multi-step workflows like form filling and authentication. Use a managed API like SnapAPI when you need JavaScript rendering at scale without managing server infrastructure.
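These rules can also be applied per URL at run time: try the cheap static fetch first and pay for rendering only when the content is missing. The sketch below is one possible arrangement, not a prescribed pattern. fetch_with_fallback is a hypothetical name, and the three callables are supplied by the caller (for example, a requests-based fetcher, a rendering fetcher, and a check for the target selector).

```python
def fetch_with_fallback(url, fetch_static, fetch_rendered, has_content):
    """Try the static fetch first; render only if the content is missing.

    Returns a (html, source) tuple where source is "static" or "rendered".
    All three callables are caller-provided assumptions:
      fetch_static(url) -> str, fetch_rendered(url) -> str,
      has_content(html) -> bool
    """
    html = fetch_static(url)
    if has_content(html):
        return html, "static"
    return fetch_rendered(url), "rendered"
```

Passing the fetchers in as arguments keeps the function independent of any one tool, so the rendered path can be Playwright or a managed API without changing the calling code.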
Sign up at snapapi.pics for 200 free scraping requests per month to test the managed API approach against your target sites. If it covers your use case, you will never need to install Chromium on a production server again.