How to use the HTTP cache and rate limiter

The stack has an HTTP cache and a rate limiter that are shared across all child projects, which mitigates failure mode F05. This guide covers how to inspect, configure, and clear them so you don't exhaust API quotas or refetch the same DOIs.

Where they live

Both are file-based components built on the standard library and stored under ~/.belico/ (override the directory with BELICO_CACHE_DIR):

~/.belico/http_cache.db — HTTP cache with per-host TTL (SQLite).
~/.belico/rate_limiter.db — persistent per-host token bucket (SQLite).
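
As a minimal sketch of how the two paths could be resolved, the snippet below reads BELICO_CACHE_DIR and falls back to ~/.belico/. The function names `cache_dir` and `db_paths` are hypothetical and only illustrate the override; the stack's own resolution logic may differ.

```python
import os
from pathlib import Path


def cache_dir() -> Path:
    """Return the directory that holds both databases.

    Assumption: BELICO_CACHE_DIR, when set and non-empty, replaces ~/.belico/.
    """
    override = os.environ.get("BELICO_CACHE_DIR")
    return Path(override).expanduser() if override else Path.home() / ".belico"


def db_paths() -> dict:
    """Map each component to its SQLite file inside the cache directory."""
    base = cache_dir()
    return {
        "http_cache": base / "http_cache.db",
        "rate_limiter": base / "rate_limiter.db",
    }
```

To see the paths the tools actually use, run `python tools/cache_inspector.py paths`.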

Different tools share the same cache across invocations: if find_top_sources has already downloaded a work, verify_citations reads it from the cache without hitting OpenAlex.

TTL per host

Host	TTL	Reason
`api.openalex.org`	24h	Metadata stable day to day
`api.semanticscholar.org`	24h	Same
`api.crossref.org`	7d	DOI metadata nearly immutable
`api.elsevier.com`	30d	Paid quota; annual snapshots
`api.zotero.org`	1h	Personal library changes
default	24h	Conservative

Never cached: responses with status ≥ 500 or status 429, anything marked Cache-Control: no-store, and POST requests sent without X-Cache-Idempotent: true.
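
The TTL table and the exclusion rules above can be sketched as a lookup plus a predicate. This is an illustration, not the stack's code: the names `ttl_for` and `excluded_from_cache` are hypothetical, and the sketch assumes that Cache-Control is read from the response headers and X-Cache-Idempotent from the request headers.

```python
from urllib.parse import urlsplit

HOUR = 3600
DAY = 24 * HOUR

# TTLs in seconds, copied from the table above.
TTL_SECONDS = {
    "api.openalex.org": 24 * HOUR,
    "api.semanticscholar.org": 24 * HOUR,
    "api.crossref.org": 7 * DAY,
    "api.elsevier.com": 30 * DAY,
    "api.zotero.org": 1 * HOUR,
}
DEFAULT_TTL = 24 * HOUR


def ttl_for(url: str) -> int:
    """Return the TTL in seconds for the host of `url`."""
    host = urlsplit(url).hostname or ""
    return TTL_SECONDS.get(host, DEFAULT_TTL)


def excluded_from_cache(method, status, request_headers, response_headers) -> bool:
    """Return True when one of the four listed exclusions applies.

    Assumption: headers are plain dicts keyed by their canonical names.
    """
    if status >= 500 or status == 429:
        return True
    if "no-store" in response_headers.get("Cache-Control", "").lower():
        return True
    if method.upper() == "POST":
        # A POST is only eligible when explicitly marked idempotent.
        return request_headers.get("X-Cache-Idempotent", "").lower() != "true"
    return False
```

The predicate encodes only the exclusions listed here; whether other non-2xx responses are stored is not covered by this sketch.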

Rate limiter limits

Host	Capacity (tokens)	Refill (tokens/s)	Polite-pool capacity
`api.openalex.org`	10	10.0	100
`api.semanticscholar.org`	1	0.33	100 (with key)
`api.crossref.org`	50	50.0	50
`api.elsevier.com`	10	10.0	10
default	5	1.0	5
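
For readers unfamiliar with token buckets, here is a minimal in-memory sketch of how capacity and refill rate interact. The class `TokenBucket` and its methods are hypothetical; the stack's limiter persists its state in SQLite and is not shown here.

```python
import time
from typing import Optional


class TokenBucket:
    """In-memory token bucket: starts full, refills continuously up to capacity."""

    def __init__(self, capacity: float, refill_per_s: float, now: Optional[float] = None):
        self.capacity = capacity
        self.refill_per_s = refill_per_s
        self.tokens = capacity
        self.updated = time.monotonic() if now is None else now

    def _refill(self, now: float) -> None:
        # Add tokens for the elapsed time, never exceeding capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_s)
        self.updated = now

    def try_acquire(self, now: Optional[float] = None) -> bool:
        """Take one token if available; return False instead of blocking."""
        self._refill(time.monotonic() if now is None else now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

    def wait_time(self, now: Optional[float] = None) -> float:
        """Seconds until one full token is available (0.0 if one already is)."""
        self._refill(time.monotonic() if now is None else now)
        return max(0.0, 1.0 - self.tokens) / self.refill_per_s
```

With the Semantic Scholar row (capacity 1, refill 0.33), a second request issued right after the first has to wait about three seconds, which is why cache hits matter most for that host.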

Task: enable the polite pool

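If you set BELICO_API_EMAIL, OpenAlex and Semantic Scholar recognize the client and the limiter raises capacity to the polite-pool value in the table above (for OpenAlex, from 10 to 100; Semantic Scholar also requires an API key). Add to .env: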

BELICO_API_EMAIL=your_email@example.com

Task: inspect state

python tools/cache_inspector.py stats        # cache state
python tools/cache_inspector.py rate-stats   # available tokens, throttled count
python tools/cache_inspector.py paths        # DB paths

Task: clear the cache (stale / old data)

# Clear a specific host's cache
python tools/cache_inspector.py clear --host api.openalex.org

# Clear entries older than N days
python tools/cache_inspector.py clear --older-than 7d
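
Conceptually, clearing by age amounts to converting the argument into a cutoff timestamp and deleting older rows. The sketch below shows that idea under loud assumptions: the table name `cache_entries` and its `stored_at` column are hypothetical (the real schema of http_cache.db is not documented here), and only the `Nd` form shown above is parsed.

```python
import re
import sqlite3
import time
from typing import Optional


def parse_age(text: str) -> int:
    """Convert an age such as '7d' into seconds (only the 'Nd' form is handled)."""
    match = re.fullmatch(r"(\d+)d", text.strip())
    if match is None:
        raise ValueError(f"expected an age like '7d', got {text!r}")
    return int(match.group(1)) * 86400


def clear_older_than(conn: sqlite3.Connection, age: str, now: Optional[float] = None) -> int:
    """Delete rows stored before the cutoff and return how many were removed.

    Hypothetical schema: cache_entries(host TEXT, url TEXT, stored_at REAL).
    """
    cutoff = (time.time() if now is None else now) - parse_age(age)
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "DELETE FROM cache_entries WHERE stored_at < ?", (cutoff,)
        )
    return cursor.rowcount
```

In practice, prefer the `clear` command over touching the database directly.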

Stale cache

If you updated metadata on OpenAlex or Zotero but the tool still sees the old value, the cached entry has not reached its TTL yet. Clear that host with clear --host, or wait for the entry to expire.

Automatic behavior

On every call, tools/adapters/rest_json.py (get/post/request_raw) performs these steps in order:

cache_get() — on hit, returns without touching network or rate limiter.
rate_limiter.acquire(host) — blocks until a token is available.
Success → cache_put() with the host TTL.
429/5xx → exponential backoff with jitter, up to 3 retries. Not cached.
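
The steps above can be sketched as a single function with its dependencies injected. This is not the code in rest_json.py: the name `fetch`, its parameters, the dict standing in for the cache (TTL handling omitted), the base delay, and the choice to acquire a token before each retry are all assumptions made for illustration.

```python
import random
import time
from urllib.parse import urlsplit


def fetch(url, *, cache, acquire, send, max_retries=3, base_delay=0.5, sleep=time.sleep):
    """Cache lookup, then rate-limited request with backoff on 429/5xx.

    cache:   dict-like stand-in for cache_get()/cache_put()
    acquire: callable(host) that blocks until a token is available
    send:    callable(url) returning (status, body)
    """
    if url in cache:
        return cache[url]  # hit: no network, no rate limiter

    host = urlsplit(url).hostname or ""
    for attempt in range(max_retries + 1):  # 1 initial try + up to 3 retries
        acquire(host)
        status, body = send(url)
        if 200 <= status < 300:
            cache[url] = body  # success is stored (TTL omitted in this sketch)
            return body
        if status != 429 and status < 500:
            raise RuntimeError(f"unexpected status {status} for {url}")
        if attempt < max_retries:
            # Exponential backoff with full jitter; the failure is never cached.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise RuntimeError(f"gave up on {url} after {max_retries} retries")
```

Injecting `send` and `sleep` keeps the sketch testable without a network or real delays.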

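Per-call opt-out: pass use_cache=False to bypass the cache, as in get(url, use_cache=False), or rate_limited=False to bypass the rate limiter, as in get(url, rate_limited=False).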