How to scrape Akamai-protected pages using Chrome TLS impersonation -- without a headless browser.
Akamai Bot Manager fingerprints clients at several layers: the TLS handshake, the HTTP/2 connection (SETTINGS values, pseudo-header order: :method, :path, :authority, :scheme), and HPACK encoding, matching each against known browser profiles.

Standard Python HTTP clients (requests, httpx, aiohttp) get blocked immediately. Even with perfect headers, the TLS handshake alone identifies them as bots.
curl_cffi is a Python binding for curl-impersonate (a patched libcurl) that can impersonate real browsers at the TLS level. Setting impersonate="chrome" reproduces Chrome's exact:

- TLS ClientHello (cipher suites, extensions, and their ordering)
- HTTP/2 SETTINGS frame (HEADER_TABLE_SIZE=65536, INITIAL_WINDOW_SIZE=6291456, etc.)
- HTTP/2 priority header (u=0, i)

```python
from curl_cffi.requests import Session as CurlSession
from curl_cffi.const import CurlOpt

session = CurlSession(
    impersonate="chrome",
    timeout=10,
    allow_redirects=True,
    headers={"Accept-Language": "fr-FR,fr;q=0.9,en-US;q=0.8,en;q=0.7"},
    proxy="http://user:pass@gate.provider.com:port",
    curl_options={
        CurlOpt.TCP_KEEPALIVE: 1,
        CurlOpt.TCP_KEEPIDLE: 60,
        CurlOpt.TCP_KEEPINTVL: 30,
        CurlOpt.DNS_CACHE_TIMEOUT: 300,
        CurlOpt.MAXCONNECTS: 10,
        CurlOpt.PIPEWAIT: 1,  # wait for an HTTP/2 connection to multiplex on
        CurlOpt.CONNECTTIMEOUT_MS: 3000,
        CurlOpt.IPRESOLVE: 1,  # IPv4-only -- skip AAAA lookups and Happy Eyeballs
    },
)

resp = session.get("https://target.example.com/search?q=test")
```

curl_cffi's impersonate= handles User-Agent, Sec-CH-UA, Sec-CH-UA-Mobile, Sec-CH-UA-Platform, Accept, and Accept-Encoding automatically. Do not override these -- conflicting headers (e.g. a wrong Sec-CH-UA version) cause Akamai to detect a mismatch between the TLS fingerprint and the declared browser identity.
The only header worth setting manually is Accept-Language, because curl_cffi doesn't localize this. Randomize it for entropy:
```python
import random

_ACCEPT_LANGUAGES = [
    "fr-FR,fr;q=0.9,en-US;q=0.8,en;q=0.7",
    "fr-FR,fr;q=0.9,en-US;q=0.5,en;q=0.3",
    "fr-FR,fr;q=0.9",
]

headers = {"Accept-Language": random.choice(_ACCEPT_LANGUAGES)}
```

Verified against tls.peet.ws -- the session reports Chrome's HTTP/2 fingerprint:

```
1:65536;2:0;4:6291456;6:262144|15663105|0|m,a,s,p
```
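That string follows the Akamai HTTP/2 fingerprint format: SETTINGS pairs, the connection WINDOW_UPDATE increment, priority frames, and pseudo-header order. A small parser makes the comparison scriptable -- a sketch; the field names are my own labels, not an official schema:

```python
from dataclasses import dataclass

@dataclass
class H2Fingerprint:
    settings: dict          # SETTINGS frame: parameter id -> value
    window_update: int      # connection-level WINDOW_UPDATE increment
    priority: str           # PRIORITY frames ("0" when none are sent)
    header_order: str       # pseudo-header order, e.g. "m,a,s,p"

def parse_akamai_h2(fp):
    """Split an Akamai-format HTTP/2 fingerprint string into its four fields."""
    settings, window_update, priority, order = fp.split("|")
    pairs = {
        int(k): int(v)
        for k, v in (item.split(":") for item in settings.split(";"))
    }
    return H2Fingerprint(pairs, int(window_update), priority, order)

chrome = parse_akamai_h2("1:65536;2:0;4:6291456;6:262144|15663105|0|m,a,s,p")
```

Parsing the string tls.peet.ws reports for your session and comparing it field-by-field against the known Chrome value catches silent regressions after a curl_cffi upgrade.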
Four runtime divergences remain unfixable (WINDOW_UPDATE cadence, stream dependencies, HPACK encoding, TLS ALPS behavior), but these are weak signals, not blocking triggers.
Akamai uses several block response patterns. Detect them by content plus size:

```python
import re

def detect_akamai_block(html):
    """Return a block-type label, or None if the page looks legitimate."""
    if 'Pardon Our Interruption' in html:
        return 'pardon'
    if 'Access Denied' in html and len(html) < 10_000:
        return 'access-denied'
    if len(html) < 10_000 and re.search(r'Reference #\d+\.\w+', html):
        return 'akamai-ref'
    if len(html) < 30_000 and 'splashui' in html:
        return 'splashui'
    if len(html) < 5_000 and 'sensor_data' in html:
        return 'sensor-challenge'
    if len(html) < 5_000 and 'sec-cpt-if' in html:
        return 'crypto-challenge'
    return None
```

The len(html) guards prevent false positives -- a real results page is 500KB+, while block pages are typically under 10KB.
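Detection pays off inside a fetch wrapper that retries on block pages. A hypothetical sketch -- the names are mine, not curl_cffi's, and the fetch and detection callables are injected so the retry policy stays testable:

```python
import random
import time

def fetch_with_retries(fetch, is_blocked, url, max_attempts=3, base_delay=2.0):
    """Retry on detected block pages with exponential backoff plus jitter.

    fetch:      callable url -> html, e.g. lambda u: session.get(u).text
    is_blocked: callable html -> label or None, e.g. detect_akamai_block
    """
    last_reason = None
    for attempt in range(max_attempts):
        html = fetch(url)
        reason = is_blocked(html)
        if reason is None:
            return html
        last_reason = reason
        # exponential backoff with jitter before the next attempt
        time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
    raise RuntimeError(f"blocked after {max_attempts} attempts: {last_reason}")
```

Rotating to a fresh proxy session inside fetch between attempts pairs naturally with this loop, since a blocked IP rarely recovers within one backoff window.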
TLS impersonation alone isn't enough. Akamai performs cross-IP behavioral correlation, so machine-speed request patterns get flagged even with rotating proxies. Add jittered delays between requests:
```python
import random
import time

delay = random.uniform(1.0, 3.0)
time.sleep(delay)
```

If you start getting blocked, back off. A simple escalation pattern: track how many blocks you have hit recently, and if the count crosses a threshold (e.g. 3 blocks in 2 minutes), double your delays for a cooldown period.
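That escalation pattern can be sketched as a small tracker -- a hypothetical helper, not part of any library, with thresholds matching the example numbers:

```python
import time

class BlockBackoff:
    """Double delays for a cooldown once too many blocks land in a short window."""

    def __init__(self, threshold=3, window=120.0, cooldown=300.0, multiplier=2.0):
        self.threshold = threshold    # e.g. 3 blocks...
        self.window = window          # ...within 2 minutes
        self.cooldown = cooldown      # then escalate for 5 minutes
        self.multiplier = multiplier
        self._blocks = []
        self._cooldown_until = 0.0

    def record_block(self, now=None):
        now = time.monotonic() if now is None else now
        # keep only blocks inside the sliding window, then add this one
        self._blocks = [t for t in self._blocks if now - t < self.window]
        self._blocks.append(now)
        if len(self._blocks) >= self.threshold:
            self._cooldown_until = now + self.cooldown

    def scale(self, now=None):
        """Multiply the base delay by this factor before sleeping."""
        now = time.monotonic() if now is None else now
        return self.multiplier if now < self._cooldown_until else 1.0
```

Used as delay = random.uniform(1.0, 3.0) * backoff.scale(), pacing returns to normal automatically once the cooldown expires.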
Created 2026-04-09T11:53:49+02:00