Proxy Strategies for Web Scraping

Choosing the right proxy type depends on your use case, not on which anti-bot system you're bypassing. This article covers the two main proxy models and when to use each.

Rotating Residential vs ISP Sticky

| | Rotating Residential | ISP Sticky |
|---|---|---|
| IP per request | New IP every request | Same IP across requests |
| Session persistence | None -- stateless | Cookies persist across requests |
| ASN diversity | High (thousands of ASNs) | Low (often single ASN) |
| IP classification | Residential (real ISP subscribers) | Varies -- often flagged as hosting/proxy |
| Cost model | Per GB (~$1-3/GB) | Per IP per month (~$3-6/IP) |
| Best for | Stateless scraping at scale | Login flows, authenticated sessions |
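
Because one model bills per GB and the other per IP, the two are easiest to compare as a monthly estimate. The sketch below is a rough calculation only: the function names, the mid-range default prices ($2/GB, $4.50/IP) and the workload figures are illustrative assumptions, not provider quotes.

```python
def rotating_cost(requests_per_month, avg_response_kb, usd_per_gb=2.0):
    """Estimated monthly cost for per-GB billing (rotating residential)."""
    gb = requests_per_month * avg_response_kb / 1_000_000  # KB -> GB (decimal)
    return gb * usd_per_gb


def sticky_cost(num_ips, usd_per_ip=4.5):
    """Estimated monthly cost for per-IP billing (ISP sticky)."""
    return num_ips * usd_per_ip


# Hypothetical workload: 50k requests/month at ~200 KB each vs 5 sticky IPs.
print(f"rotating: ${rotating_cost(50_000, 200):.2f}/mo")
print(f"sticky:   ${sticky_cost(5):.2f}/mo")
```

Per-GB cost scales with traffic volume, while per-IP cost scales with the number of concurrent identities, so the cheaper option depends on which of the two dominates your workload.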

Rotating Residential

Use when each request is independent and you don't need cookies to persist:

from curl_cffi.requests import Session as CurlSession

session = CurlSession(
    impersonate="chrome",
    proxy="http://user:pass@gate.provider.com:port",
)
# Provider rotates exit IP on every request
resp = session.get("https://target.example.com/search?q=test")

No per-IP health tracking, no cookie warming, no session management. Each request is a fresh IP.

ISP Sticky

Use when cookies obtained during one request must be sent from the same IP later (e.g. after login):

import uuid

def get_sticky_proxy(base_user, password, host, port, sessttl_min=120):
    """Random session ID -> same IP for the next sessttl_min minutes.

    Username/password syntax varies by provider. The example below uses
    DataImpulse's `;sessid.X;sessttl.N` modifier appended to the username
    (semicolon-delimited, N in minutes).
    """
    sid = uuid.uuid4().hex[:12]
    username = f"{base_user};sessid.{sid};sessttl.{sessttl_min}"
    return f"http://{username}:{password}@{host}:{port}"

proxy = get_sticky_proxy("user__cr.fr", "pw", "gw.dataimpulse.com", 823)
session = CurlSession(impersonate="chrome", proxy=proxy)

The sessttl value pins the exit IP for that many minutes. Use a deterministic sessid (e.g. sha256(account_id)[:12]) when you need the same IP for a specific account across processes; a random sessid is fine when you just need within-process stickiness.
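
A deterministic sessid could be derived as in the sketch below. The helper names `deterministic_sessid` and `sticky_proxy_for_account` are hypothetical, and the username modifiers reuse the DataImpulse-style syntax from the example above. Whether a given provider maps the same sessid back to the same exit IP after the TTL expires is provider-dependent and not assumed here.

```python
import hashlib


def deterministic_sessid(account_id, length=12):
    """Derive a stable session ID from an account identifier.

    The same account_id always yields the same sessid, so separate
    processes that build the proxy URL independently request the same
    sticky session.
    """
    digest = hashlib.sha256(str(account_id).encode("utf-8")).hexdigest()
    return digest[:length]


def sticky_proxy_for_account(account_id, base_user, password, host, port,
                             sessttl_min=120):
    # DataImpulse-style `;sessid.X;sessttl.N` modifiers; other providers
    # use different username syntax.
    sid = deterministic_sessid(account_id)
    username = f"{base_user};sessid.{sid};sessttl.{sessttl_min}"
    return f"http://{username}:{password}@{host}:{port}"
```

Hashing the account ID, rather than embedding it directly, keeps the account identifier out of the proxy username while still giving every process the same value.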

The French ISP Proxy Problem

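If you need French IPs, be aware of a structural limitation of ISP sticky proxies: as the comparison table notes, they tend to have low ASN diversity (often a single ASN) and are often flagged as hosting/proxy.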

Rotating residential proxies avoid this entirely: each request is a different IP from a different ASN.

Does IP Classification Matter?

| Anti-Bot System | IP Intelligence Source | Uses IP2Location? |
|---|---|---|
| Akamai Bot Manager | Own proprietary (~30% global traffic) | No |
| Cloudflare Enterprise | Own internal (~20% global traffic) | No |
| DataDome | Own ML + likely IP2Location (~25-30% of score) | Probably |

Akamai and Cloudflare have enough traffic to build their own IP databases. Third-party classification is irrelevant for them -- but the single-ASN problem still applies.

Rate Limiting

With Rotating Proxies

No per-IP state to manage. Focus on behavioral delays -- prevent detectable patterns across the IP pool:

import random
import time

delay = random.uniform(1.0, 3.0)  # jittered delay
time.sleep(delay)

If blocks start appearing, escalate: double your delays for a cooldown period. A reasonable starting threshold is 3 blocks within 2 minutes, which triggers a 120-second cooldown.
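
One way to implement that escalation is a sliding window of block timestamps. The sketch below is one reading of the rule: delays are doubled for 120 seconds once 3 blocks land within a 2-minute window. The class name `BlockBackoff` and its methods are hypothetical, and what counts as a "block" (for example an HTTP 403 or 429) is left to the caller.

```python
import random
import time
from collections import deque


class BlockBackoff:
    """Doubles the jittered delay for a cooldown window after repeated blocks."""

    def __init__(self, threshold=3, window_s=120.0, cooldown_s=120.0,
                 clock=time.monotonic):
        self._threshold = threshold
        self._window_s = window_s
        self._cooldown_s = cooldown_s
        self._clock = clock          # injectable so the logic can be tested
        self._blocks = deque()       # timestamps of recent blocks
        self._cooldown_until = 0.0

    def record_block(self):
        now = self._clock()
        self._blocks.append(now)
        # Drop blocks that have fallen out of the sliding window.
        while self._blocks and now - self._blocks[0] > self._window_s:
            self._blocks.popleft()
        if len(self._blocks) >= self._threshold:
            self._cooldown_until = now + self._cooldown_s
            self._blocks.clear()

    def in_cooldown(self):
        return self._clock() < self._cooldown_until

    def next_delay(self, low=1.0, high=3.0):
        # Same jittered delay as above, doubled while in cooldown.
        delay = random.uniform(low, high)
        return delay * 2 if self.in_cooldown() else delay
```

In a scraping loop, call `record_block()` whenever a response looks blocked, then `time.sleep(backoff.next_delay())` before the next request.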

With Sticky Proxies

Rate limit per account/session, not per IP:

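import random
import time

class RateLimiter:
    """Enforces a jittered minimum gap between requests that share a key."""

    def __init__(self, min_delay=0.5, max_delay=1.5):
        self._min_delay = min_delay
        self._max_delay = max_delay
        self._last_request = {}  # key (account/session ID) -> last request time

    def wait(self, key):
        # Gap is min_delay plus random jitter, so it falls in [min_delay, max_delay]
        jitter = random.uniform(0, self._max_delay - self._min_delay)
        earliest = self._last_request.get(key, 0) + self._min_delay + jitter
        sleep_for = max(0, earliest - time.time())
        if sleep_for > 0:
            time.sleep(sleep_for)
        self._last_request[key] = time.time()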

Scraper API Alternative

All proxy management (rotation, rate limiting, TLS impersonation, challenge solving) can be offloaded to a scraper API:

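import os

import requests

# The API token is read from the environment
SCRAPEDO_TOKEN = os.environ["SCRAPEDO_TOKEN"]

# One HTTP call replaces all proxy/session/impersonation code.
# HTTPS keeps the token out of plaintext traffic.
response = requests.get("https://api.scrape.do/", params={
    "url": "https://target.example.com/page",
    "token": SCRAPEDO_TOKEN,
    "geoCode": "FR",
}, timeout=30)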

Trade-offs:

| | Self-managed | Scraper API |
|---|---|---|
| Cost | ~$6-30/mo (proxies) | ~$29-99/mo |
| Infrastructure | Proxy config, rate limiting, cookies, challenge solvers | One env var |
| Control | Full | None (black box) |
| Reliability | 95%+ (depends on tuning) | ~100% (provider handles bypass) |
| Vendor lock-in | None | Single point of failure |

Created 2026-04-09T11:53:49+02:00, updated 2026-05-15T23:35:48+02:00