How to use the HTTP cache and the rate limiter

The stack has an HTTP cache and a rate limiter shared across all child projects (mitigation for failure F05). This guide covers how to inspect, configure, and clear them so that you neither exhaust quotas nor refetch DOIs.

Where they live

Two stdlib-only, file-based components in ~/.belico/ (override with BELICO_CACHE_DIR):

~/.belico/http_cache.db: HTTP cache with per-host TTL (SQLite).
~/.belico/rate_limiter.db: persistent per-host token bucket (SQLite).
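As an illustration of the override described above, here is a minimal sketch of how the two paths could be resolved. The helper names `cache_dir` and `db_paths` are hypothetical and are not taken from the stack; only the directory, the file names, and the BELICO_CACHE_DIR variable come from this guide.

```python
import os
from pathlib import Path


def cache_dir() -> Path:
    """Return the directory that holds both SQLite files (hypothetical helper)."""
    # BELICO_CACHE_DIR wins when it is set; otherwise fall back to ~/.belico
    override = os.environ.get("BELICO_CACHE_DIR")
    return Path(override).expanduser() if override else Path.home() / ".belico"


def db_paths() -> dict[str, Path]:
    """Return the expected location of each database file."""
    base = cache_dir()
    return {
        "http_cache": base / "http_cache.db",
        "rate_limiter": base / "rate_limiter.db",
    }
```

For the paths the stack actually uses, the `paths` subcommand of tools/cache_inspector.py is the authoritative source.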

Invocations of different tools share the same cache: if find_top_sources has already downloaded a work, verify_citations reads it from the cache without hitting OpenAlex.

TTL per host

Host	TTL	Reason
`api.openalex.org`	24h	Metadata is stable from day to day
`api.semanticscholar.org`	24h	Same as OpenAlex
`api.crossref.org`	7d	DOI metadata is almost immutable
`api.elsevier.com`	30d	Paid quota; annual snapshots
`api.zotero.org`	1h	Personal library changes often
default	24h	Conservative

Not cached: responses with status ≥ 500 or status 429, responses carrying the header Cache-Control: no-store, and POST requests without X-Cache-Idempotent: true.
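The TTL table and the exclusion rules above can be expressed as a small lookup plus a predicate. This is a sketch under assumptions, not the stack's code: the names `ttl_for` and `is_cacheable` are hypothetical, and it assumes that Cache-Control is read from the response while X-Cache-Idempotent is read from the request.

```python
from datetime import timedelta
from urllib.parse import urlsplit

# Values copied from the TTL table in this guide.
TTL_BY_HOST = {
    "api.openalex.org": timedelta(hours=24),
    "api.semanticscholar.org": timedelta(hours=24),
    "api.crossref.org": timedelta(days=7),
    "api.elsevier.com": timedelta(days=30),
    "api.zotero.org": timedelta(hours=1),
}
DEFAULT_TTL = timedelta(hours=24)


def ttl_for(url: str) -> timedelta:
    """Pick the TTL by host, falling back to the default row."""
    host = urlsplit(url).hostname or ""
    return TTL_BY_HOST.get(host, DEFAULT_TTL)


def _lower_keys(headers: dict[str, str]) -> dict[str, str]:
    # HTTP header names are case-insensitive, so normalise before lookup.
    return {name.lower(): value for name, value in headers.items()}


def is_cacheable(
    method: str,
    status: int,
    request_headers: dict[str, str],
    response_headers: dict[str, str],
) -> bool:
    """Apply the four exclusion rules listed above (hypothetical helper)."""
    if status >= 500 or status == 429:
        return False
    cache_control = _lower_keys(response_headers).get("cache-control", "")
    if "no-store" in cache_control.lower():
        return False
    if method.upper() == "POST":
        # Assumption: the idempotency marker is sent by the caller.
        marker = _lower_keys(request_headers).get("x-cache-idempotent", "")
        return marker.lower() == "true"
    return True
```

For example, `ttl_for("https://api.crossref.org/works/...")` resolves to seven days, while any host that is not in the table gets the 24-hour default.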

Rate limiter limits

Host	Capacity (tokens)	Refill (tokens/s)	Polite pool capacity
`api.openalex.org`	10	10.0	100
`api.semanticscholar.org`	1	0.33	100 (with key)
`api.crossref.org`	50	50.0	50
`api.elsevier.com`	10	10.0	10
default	5	1.0	5
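To make the Capacity and Refill/s columns concrete, here is a minimal in-memory token bucket. It is only a sketch of the general technique: the stack's limiter is persisted in SQLite and keyed by host, which this example does not attempt, and the class and method names are hypothetical.

```python
import time


class TokenBucket:
    """In-memory token bucket; one instance would correspond to one host."""

    def __init__(self, capacity: float, refill_per_s: float, now=time.monotonic):
        self.capacity = capacity
        self.refill_per_s = refill_per_s
        self._now = now  # injectable clock, handy for tests
        self.tokens = capacity  # start with a full bucket
        self.updated = now()

    def _refill(self) -> None:
        # Add tokens for the elapsed time, never exceeding capacity.
        current = self._now()
        elapsed = current - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_s)
        self.updated = current

    def try_acquire(self) -> bool:
        """Take one token if available; return False instead of blocking."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def wait_time(self) -> float:
        """Seconds until the next token is available (0.0 if one is ready)."""
        self._refill()
        missing = max(0.0, 1 - self.tokens)
        return missing / self.refill_per_s
```

With the Semantic Scholar row (capacity 1, refill 0.33 tokens/s), a bucket like this allows one request immediately and then roughly one request every 3 seconds. With the OpenAlex row (capacity 10, refill 10.0), a burst of 10 requests passes and the bucket is full again after one second.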

Task: enable the polite pool

If you set BELICO_API_EMAIL, OpenAlex and Semantic Scholar recognize the client and raise the capacity 10x. Add this to .env:

BELICO_API_EMAIL=tu_email@ejemplo.com

Task: inspect the state

python tools/cache_inspector.py stats        # cache status
python tools/cache_inspector.py rate-stats   # available tokens, throttled count
python tools/cache_inspector.py paths        # paths of the DBs

Task: clear the cache (old / stale data)

# Clear the cache for a specific host
python tools/cache_inspector.py clear --host api.openalex.org

# Clear entries older than N days
python tools/cache_inspector.py clear --older-than 7d

Stale cache

If you updated metadata in OpenAlex or Zotero but the tool still sees the old value, the cause is the TTL: the cached entry has not expired yet. Clear the host with clear --host or wait for the entry to expire.

Automatic behavior

tools/adapters/rest_json.py (get/post/request_raw) does the following:

cache_get(): on a hit, returns without touching the network or the rate limiter.
rate_limiter.acquire(host): blocks until a token is available.
Success → cache_put() with the host's TTL.
429/5xx → exponential backoff with jitter, up to 3 retries. NOT cached.

Per-call opt-out: get(url, use_cache=False) or get(url, rate_limited=False).
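The four steps and the two opt-out flags can be sketched as a single function. This is not the code in tools/adapters/rest_json.py: the function name `fetch`, the injected callables (`cache_get`, `cache_put`, `acquire`, `send`), and the `base_delay` value are hypothetical. The sketch also assumes that a token is acquired before every attempt, that `use_cache=False` skips the cache write as well as the read, and that any status other than 429 or 5xx counts as success.

```python
import random
import time
from typing import Any, Callable, Optional


def fetch(
    url: str,
    *,
    cache_get: Callable[[str], Optional[Any]],
    cache_put: Callable[[str, Any], None],
    acquire: Callable[[str], None],
    send: Callable[[str], tuple[int, Any]],
    use_cache: bool = True,
    rate_limited: bool = True,
    max_retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    # Step 1: a cache hit returns without touching network or rate limiter.
    if use_cache:
        hit = cache_get(url)
        if hit is not None:
            return hit

    status = None
    for attempt in range(max_retries + 1):
        # Step 2: block until the host's bucket has a token.
        if rate_limited:
            acquire(url)
        status, body = send(url)

        # Step 3: on success, store the body and return it.
        if status != 429 and status < 500:
            if use_cache:
                cache_put(url, body)
            return body

        # Step 4: 429/5xx is never cached; back off exponentially with jitter.
        if attempt < max_retries:
            delay = base_delay * 2**attempt
            sleep(delay + random.uniform(0, delay))

    raise RuntimeError(f"giving up on {url}: last status {status}")
```

Injecting `send` and `sleep` keeps the sketch testable without network access or real waiting; a real adapter would presumably also apply the per-host TTL inside `cache_put`.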

See also

Troubleshooting: symptoms of rate limiting and stale cache.
FMEA of the stack: the F05 failure that this mitigates.

Canonical source

Derived from docs/shared/CACHE_AND_RATE_LIMIT.md.