How to use the HTTP cache and rate limiter

The stack has an HTTP cache and a rate limiter shared across all child projects (mitigation of failure mode F05). This guide covers how to inspect, configure, and clear them so you don't exhaust quotas or refetch DOIs.

Where they live

Both components are file-based, use only the standard library, and live under ~/.belico/ (override with BELICO_CACHE_DIR):

  • ~/.belico/http_cache.db — HTTP cache with per-host TTL (SQLite).
  • ~/.belico/rate_limiter.db — persistent per-host token bucket (SQLite).

Separate invocations share the same cache, even across different tools: if find_top_sources already downloaded a work, verify_citations reads it from cache without hitting OpenAlex.
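As an illustration of the layout above, here is a minimal sketch of how the two file paths could be resolved. The function names cache_dir and db_paths are hypothetical, and treating an empty BELICO_CACHE_DIR as unset is an assumption; the stack's own resolution logic is not shown in this guide.

```python
import os
from pathlib import Path


def cache_dir(env=None):
    """Return the directory that holds both SQLite files."""
    env = os.environ if env is None else env
    override = env.get("BELICO_CACHE_DIR")
    # Assumption: an empty or missing value falls back to ~/.belico.
    if override:
        return Path(override).expanduser()
    return Path.home() / ".belico"


def db_paths(env=None):
    """Map each component to its database file."""
    base = cache_dir(env)
    return {
        "http_cache": base / "http_cache.db",
        "rate_limiter": base / "rate_limiter.db",
    }
```

To see the paths the tools actually use, run python tools/cache_inspector.py paths.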

TTL per host

Host                      TTL   Reason
api.openalex.org          24h   Metadata stable day to day
api.semanticscholar.org   24h   Same
api.crossref.org          7d    DOI metadata nearly immutable
api.elsevier.com          30d   Paid quota; annual snapshots
api.zotero.org            1h    Personal library changes
default                   24h   Conservative

Never cached: responses with status ≥ 500 or status 429, responses with Cache-Control: no-store, and POST requests without X-Cache-Idempotent: true.
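The TTL table and the exclusion rules can be read as two small lookups. The sketch below is illustrative only: ttl_for and is_cacheable are hypothetical names, and it assumes X-Cache-Idempotent is a request header and that headers arrive as a plain dict with canonical capitalization. It applies only the four exclusions listed in this guide; whether other 4xx responses are cached is not stated here.

```python
from urllib.parse import urlsplit

HOUR = 3600
DAY = 24 * HOUR

# Values copied from the TTL table in this guide.
TTL_SECONDS = {
    "api.openalex.org": 24 * HOUR,
    "api.semanticscholar.org": 24 * HOUR,
    "api.crossref.org": 7 * DAY,
    "api.elsevier.com": 30 * DAY,
    "api.zotero.org": 1 * HOUR,
}
DEFAULT_TTL = 24 * HOUR


def ttl_for(url):
    """Return the TTL in seconds for the host in the URL."""
    host = urlsplit(url).hostname or ""
    return TTL_SECONDS.get(host, DEFAULT_TTL)


def is_cacheable(method, status, response_headers, request_headers=None):
    """Apply the four 'never cached' rules from this guide."""
    request_headers = request_headers or {}
    if status >= 500 or status == 429:
        return False
    if "no-store" in response_headers.get("Cache-Control", "").lower():
        return False
    if method.upper() == "POST":
        # Assumption: the opt-in marker is sent on the request.
        return request_headers.get("X-Cache-Idempotent", "").lower() == "true"
    return True
```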

Rate limiter limits

Host                      Capacity   Refill/s   Polite pool
api.openalex.org          10         10.0       100
api.semanticscholar.org   1          0.33       100 (with key)
api.crossref.org          50         50.0       50
api.elsevier.com          10         10.0       10
default                   5          1.0        5
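To make Capacity and Refill/s concrete, here is a minimal in-memory token bucket. It is a sketch of the general technique, not the stack's implementation: the real limiter persists its state in rate_limiter.db, and the class and method names here are hypothetical. The clock is injectable so the behavior can be checked without sleeping.

```python
import time


class TokenBucket:
    """In-memory token bucket: at most `capacity` tokens, refilled continuously."""

    def __init__(self, capacity, refill_per_s, clock=time.monotonic):
        self.capacity = float(capacity)
        self.refill_per_s = float(refill_per_s)
        self.clock = clock
        self.tokens = float(capacity)  # start full
        self.updated = clock()

    def _refill(self):
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_s)
        self.updated = now

    def try_acquire(self):
        """Take one token if available; never blocks."""
        self._refill()
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

    def wait_time(self):
        """Seconds until one full token is available."""
        self._refill()
        missing = max(0.0, 1.0 - self.tokens)
        return missing / self.refill_per_s
```

With the Semantic Scholar row (capacity 1, refill 0.33/s), an empty bucket needs 1 / 0.33 ≈ 3.03 seconds to yield the next token, so unauthenticated calls are spaced about three seconds apart. With the OpenAlex row (capacity 10, refill 10.0/s), a burst of 10 requests passes immediately and the bucket is full again one second later.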

Task: enable the polite pool

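If you set BELICO_API_EMAIL, OpenAlex and Semantic Scholar recognize the client and raise capacity to the polite-pool value in the limits table above: from 10 to 100 for OpenAlex (10x), and up to 100 for Semantic Scholar, which the table qualifies with an API key. Add to .env: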

BELICO_API_EMAIL=your_email@example.com
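One way to picture the effect of the variable is as a switch between the Capacity and Polite pool columns of the limits table. The sketch below is an assumption about that selection, not the stack's code: effective_capacity and the has_api_key flag are hypothetical, and the flag stands in for the "with key" note on the Semantic Scholar row.

```python
import os

# (base capacity, polite-pool capacity, polite pool needs an API key)
# Values copied from the limits table in this guide.
LIMITS = {
    "api.openalex.org": (10, 100, False),
    "api.semanticscholar.org": (1, 100, True),
    "api.crossref.org": (50, 50, False),
    "api.elsevier.com": (10, 10, False),
}
DEFAULT_LIMITS = (5, 5, False)


def effective_capacity(host, env=None, has_api_key=False):
    """Pick the bucket capacity for a host (illustrative logic only)."""
    env = os.environ if env is None else env
    base, polite, needs_key = LIMITS.get(host, DEFAULT_LIMITS)
    if not env.get("BELICO_API_EMAIL", "").strip():
        return base
    # Hypothetical: hosts marked "with key" stay at base capacity without one.
    if needs_key and not has_api_key:
        return base
    return polite
```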

Task: inspect state

python tools/cache_inspector.py stats        # cache state
python tools/cache_inspector.py rate-stats   # available tokens, throttled count
python tools/cache_inspector.py paths        # DB paths
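If you want to look at a database file directly, the sketch below lists its tables and row counts using only sqlite3 from the standard library. It makes no assumption about the schema, which this guide does not document, and it opens the file read-only so inspection cannot modify it. The function name table_row_counts is hypothetical; cache_inspector.py remains the supported interface.

```python
import sqlite3
from pathlib import Path


def table_row_counts(db_path):
    """Return {table name: row count} for an existing SQLite file."""
    # mode=ro: fail instead of creating the file, and never write to it.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
    finally:
        conn.close()
```

Pass it one of the paths printed by the paths subcommand.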

Task: clear the cache (stale / old data)

# Clear a specific host's cache
python tools/cache_inspector.py clear --host api.openalex.org

# Clear entries older than N days
python tools/cache_inspector.py clear --older-than 7d
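When scripting cleanups, it can help to know which timestamp a value like 7d corresponds to. The sketch below turns such a string into a cutoff time. It is a hypothetical helper, not the CLI's parser: this guide only shows the 7d form, so accepting an h suffix (as used in the TTL table) is an assumption.

```python
import re
from datetime import datetime, timedelta, timezone

_UNITS = {"h": "hours", "d": "days"}


def parse_age(text):
    """Parse strings such as '7d' or '24h' into a timedelta."""
    match = re.fullmatch(r"(\d+)([hd])", text.strip().lower())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def cutoff(text, now=None):
    """Return the moment before which entries count as 'older than' text."""
    now = datetime.now(timezone.utc) if now is None else now
    return now - parse_age(text)
```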

Stale cache

If you updated metadata on OpenAlex or Zotero but the tool still sees the old value, the cached entry has not yet reached its TTL. Clear that host with clear --host, or wait for the entry to expire.

Automatic behavior

Every request made through tools/adapters/rest_json.py (get, post, request_raw) goes through these steps:

  1. cache_get() — on hit, returns without touching network or rate limiter.
  2. rate_limiter.acquire(host) — blocks until a token is available.
  3. Success → cache_put() with the host TTL.
  4. 429/5xx → exponential backoff with jitter, up to 3 retries. Not cached.

Per-call opt-out: get(url, use_cache=False) or get(url, rate_limited=False).
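The four steps and the two opt-out flags can be sketched as a single function. This is an illustration under stated assumptions, not the code in rest_json.py: the cache is a plain dict, send and limiter are injected stand-ins, the backoff base of 2 seconds is invented, and the sketch assumes use_cache=False skips both reading and writing the cache and that each retry acquires a fresh token.

```python
import random
import time

MAX_RETRIES = 3  # "up to 3 retries" from the steps in this guide


def fetch(url, *, send, cache, limiter, host, ttl,
          use_cache=True, rate_limited=True, sleep=time.sleep):
    """send(url) -> (status, body); limiter.acquire(host) blocks for a token."""
    # Step 1: a fresh cache entry short-circuits network and rate limiter.
    if use_cache:
        entry = cache.get(url)
        if entry is not None and entry[0] > time.time():
            return entry[1]
    for attempt in range(MAX_RETRIES + 1):
        # Step 2: wait for a token unless the caller opted out.
        if rate_limited:
            limiter.acquire(host)
        status, body = send(url)
        # Step 4: 429/5xx is retried with exponential backoff plus jitter.
        if status == 429 or status >= 500:
            if attempt < MAX_RETRIES:
                sleep(2 ** attempt + random.uniform(0.0, 1.0))
            continue
        # Step 3: only successful responses are stored, with the host TTL.
        if use_cache and 200 <= status < 300:
            cache[url] = (time.time() + ttl, (status, body))
        return status, body
    raise RuntimeError(f"{url}: still failing after {MAX_RETRIES} retries")
```

Because the cache check comes first, a repeated call for the same URL consumes no token, which is why sharing the cache across tools also protects the quota.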

Canonical source

Derives from docs/shared/CACHE_AND_RATE_LIMIT.md.