Bypassing Akamai Bot Manager with curl_cffi

How to scrape Akamai-protected pages using Chrome TLS impersonation -- without a headless browser.

What Akamai Detects

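Akamai Bot Manager combines several checks, including TLS fingerprinting of the client handshake, HTTP/2 fingerprinting, consistency checks between the declared browser headers and the TLS fingerprint, and cross-IP behavioral correlation.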

Standard Python HTTP clients (requests, httpx, aiohttp) get blocked immediately. Even with perfect headers, the TLS handshake alone identifies them as bots.

curl_cffi: Chrome TLS Impersonation

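curl_cffi is a Python binding for curl-impersonate, a patched build of libcurl that can impersonate real browsers at the TLS and HTTP/2 level. Setting impersonate="chrome" reproduces Chrome's TLS ClientHello (cipher suites, extensions and their order), its HTTP/2 settings, and its default headers. A full session setup: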

from curl_cffi.requests import Session as CurlSession
from curl_cffi.const import CurlOpt

session = CurlSession(
    impersonate="chrome",
    timeout=10,
    allow_redirects=True,
    headers={"Accept-Language": "fr-FR,fr;q=0.9,en-US;q=0.8,en;q=0.7"},
    proxy="http://user:pass@gate.provider.com:port",
    curl_options={
        CurlOpt.TCP_KEEPALIVE: 1,
        CurlOpt.TCP_KEEPIDLE: 60,
        CurlOpt.TCP_KEEPINTVL: 30,
        CurlOpt.DNS_CACHE_TIMEOUT: 300,
        CurlOpt.MAXCONNECTS: 10,
    CurlOpt.PIPEWAIT: 1,           # wait to multiplex over an existing HTTP/2 connection instead of opening a new one
        CurlOpt.CONNECTTIMEOUT_MS: 3000,
        CurlOpt.IPRESOLVE: 1,          # IPv4-only -- skip AAAA + Happy Eyeballs
    },
)

resp = session.get("https://target.example.com/search?q=test")

What NOT to Set Manually

curl_cffi's impersonate= handles User-Agent, Sec-CH-UA, Sec-CH-UA-Mobile, Sec-CH-UA-Platform, Accept, and Accept-Encoding automatically. Do not override these -- conflicting headers (e.g. wrong Sec-CH-UA version) cause Akamai to detect a mismatch between the TLS fingerprint and the declared browser identity.
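The rule above can be enforced with a small filter applied to any caller-supplied headers before they reach the session. This is a minimal sketch: strip_impersonated_headers is a hypothetical helper (not part of curl_cffi), and its header list is simply the one named in this section.

```python
from typing import Dict, Mapping

# Headers that curl_cffi's impersonate= already sets (the list named in this
# section). Overriding them risks a mismatch with the TLS fingerprint.
_IMPERSONATED_HEADERS = frozenset({
    "user-agent",
    "sec-ch-ua",
    "sec-ch-ua-mobile",
    "sec-ch-ua-platform",
    "accept",
    "accept-encoding",
})


def strip_impersonated_headers(headers: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of `headers` without the ones impersonate= manages.

    HTTP header names are case-insensitive, so names are lowercased
    before comparison.
    """
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in _IMPERSONATED_HEADERS
    }


if __name__ == "__main__":
    user_headers = {
        "User-Agent": "python-requests/2.31",
        "Accept-Language": "fr-FR,fr;q=0.9",
        "sec-ch-ua": '"Chromium";v="99"',
    }
    print(strip_impersonated_headers(user_headers))
    # → {'Accept-Language': 'fr-FR,fr;q=0.9'}
```

Only Accept-Language survives the filter, which matches the advice that follows.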

The only header worth setting manually is Accept-Language, because curl_cffi does not localize it. Rotate between a few realistic values for entropy:

import random

_ACCEPT_LANGUAGES = [
    "fr-FR,fr;q=0.9,en-US;q=0.8,en;q=0.7",
    "fr-FR,fr;q=0.9,en-US;q=0.5,en;q=0.3",
    "fr-FR,fr;q=0.9",
]

headers = {"Accept-Language": random.choice(_ACCEPT_LANGUAGES)}

TLS Fingerprint Verification

Verified against tls.peet.ws:

Four runtime divergences remain unfixable (WINDOW_UPDATE cadence, stream dependencies, HPACK encoding, and TLS ALPS behavior), but these are weak signals rather than blocking triggers.

Block Detection

Akamai serves several kinds of block pages. Detect them by combining a marker string with a page-size check:

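import re


def detect_akamai_block(html):
    """Return a label for the Akamai block page in `html`, or None if it looks like a real page."""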
    if 'Pardon Our Interruption' in html:
        return 'pardon'
    if 'Access Denied' in html and len(html) < 10_000:
        return 'access-denied'
    if len(html) < 10_000 and re.search(r'Reference #\d+\.\w+', html):
        return 'akamai-ref'
    if len(html) < 30_000 and 'splashui' in html:
        return 'splashui'
    if len(html) < 5_000 and 'sensor_data' in html:
        return 'sensor-challenge'
    if len(html) < 5_000 and 'sec-cpt-if' in html:
        return 'crypto-challenge'
    return None

The len(html) guards prevent false positives -- a real results page is 500KB+, block pages are typically <10KB.

Avoiding Behavioral Detection

TLS impersonation alone isn't enough. Akamai performs cross-IP behavioral correlation, so machine-speed request patterns get flagged even with rotating proxies. Add jittered delays between requests:

import random
import time

delay = random.uniform(1.0, 3.0)  # seconds, uniformly jittered
time.sleep(delay)

If you start getting blocked, back off. A simple escalation pattern: track how many blocks you've hit recently, and if it crosses a threshold (e.g. 3 blocks in 2 minutes), double your delays for a cooldown period.
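The escalation pattern above can be sketched as a small class. BlockBackoff is a hypothetical name; the 3-blocks-in-120-seconds threshold and the 1-3 s jitter come from this section, while the 300-second cooldown is an assumed value because the text does not give one. The clock is injectable so the logic can be tested without real waiting.

```python
import random
import time
from collections import deque
from typing import Callable, Deque, Optional


class BlockBackoff:
    """Count recent blocks in a sliding window; double delays during a cooldown."""

    def __init__(
        self,
        threshold: int = 3,          # from the text: 3 blocks...
        window_s: float = 120.0,     # ...within 2 minutes
        cooldown_s: float = 300.0,   # ASSUMPTION: cooldown length is not given in the text
        base_min: float = 1.0,       # jitter range from the text (seconds)
        base_max: float = 3.0,
        clock: Callable[[], float] = time.monotonic,
        rng: Optional[random.Random] = None,
    ) -> None:
        self.threshold = threshold
        self.window_s = window_s
        self.cooldown_s = cooldown_s
        self.base_min = base_min
        self.base_max = base_max
        self._clock = clock
        self._rng = rng or random.Random()
        self._blocks: Deque[float] = deque()
        self._cooldown_until = float("-inf")

    def record_block(self) -> None:
        """Call whenever a response is classified as a block page."""
        now = self._clock()
        self._blocks.append(now)
        # Forget blocks older than the sliding window.
        while now - self._blocks[0] > self.window_s:
            self._blocks.popleft()
        if len(self._blocks) >= self.threshold:
            # Threshold crossed: start (or restart) the cooldown and reset the count.
            self._cooldown_until = now + self.cooldown_s
            self._blocks.clear()

    def in_cooldown(self) -> bool:
        return self._clock() < self._cooldown_until

    def next_delay(self) -> float:
        """Jittered delay in seconds, doubled while in cooldown."""
        delay = self._rng.uniform(self.base_min, self.base_max)
        return delay * 2 if self.in_cooldown() else delay

    def sleep(self) -> None:
        time.sleep(self.next_delay())
```

In a scraping loop you would call backoff.sleep() before each request and backoff.record_block() whenever detect_akamai_block returns a label. Clearing the counter after escalating means a fresh burst of blocks is needed to extend the cooldown.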

Results

Created 2026-04-09T11:53:49+02:00