EPL Insider

How it works

EPL Insider is a Retrieval-Augmented Generation (RAG) chatbot built on top of a live Premier League news pipeline. Here's what happens under the hood.

Data pipeline
1
RSS ingestion — 18 sources, every 30 minutes
A scheduled job uses feedparser to fetch feeds from BBC Sport, The Guardian, Sky Sports, ESPN, and the other Premier League news outlets in the 18-source list. Articles are deduplicated by SHA-256 content hash before anything is written to the database.
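The deduplication step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the choice to hash the title plus summary, the whitespace and case normalisation, and the names `content_hash` and `deduplicate` are all assumptions. Entries are plain dicts here; feed entries returned by feedparser expose `title` and `summary` in a similar dict-like way.

```python
import hashlib


def content_hash(title: str, summary: str) -> str:
    """Return a SHA-256 hex digest of the normalised article text."""
    # Assumption: collapse whitespace and lowercase so trivial
    # formatting differences do not produce distinct hashes.
    normalised = " ".join(f"{title}\n{summary}".split()).lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def deduplicate(entries, seen_hashes):
    """Keep only entries whose hash is not already in seen_hashes.

    seen_hashes is mutated, so it can carry state across fetch runs
    (in the real pipeline this role would be played by the database).
    """
    fresh = []
    for entry in entries:
        digest = content_hash(entry["title"], entry.get("summary", ""))
        if digest in seen_hashes:
            continue
        seen_hashes.add(digest)
        fresh.append({**entry, "content_hash": digest})
    return fresh
```

Hashing content rather than URLs means the same story syndicated under two links collapses to one row, at the cost of treating lightly edited rewrites as new articles.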
2
Embedding — sentence-transformers/all-MiniLM-L6-v2
Each article's title and summary are concatenated and encoded into a 384-dimensional vector using a local sentence-transformers model. Embeddings are generated in batches of 32 and L2-normalised so that cosine similarity reduces to a dot product at search time.
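A hedged sketch of the embedding step is shown here. The `encode` method of sentence-transformers accepts `batch_size` and `normalize_embeddings` arguments, which cover the batching and L2 normalisation described above. The helper names (`article_text`, `embed_articles`, `load_model`) and the way title and summary are joined are assumptions made for illustration.

```python
def article_text(article: dict) -> str:
    """Join title and summary into the string that gets embedded."""
    # Assumption: a simple "title. summary" concatenation.
    return f"{article['title']}. {article.get('summary', '')}".strip()


def embed_articles(articles, model, batch_size: int = 32):
    """Encode articles in batches and return L2-normalised vectors.

    model is any object with a sentence-transformers style encode().
    """
    texts = [article_text(a) for a in articles]
    return model.encode(
        texts,
        batch_size=batch_size,
        normalize_embeddings=True,  # unit-length vectors for cosine search
    )


def load_model():
    """Load the local model; imported lazily so the module stays light."""
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
```

Typical usage would be `vectors = embed_articles(articles, load_model())`. Passing the model in as an argument keeps the function easy to test with a stub.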
3
Vector storage — Weaviate
Vectors and article metadata are upserted into a Weaviate collection using deterministic UUID5 IDs. An HNSW index with cosine distance powers sub-millisecond similarity search at query time.
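To show why deterministic UUID5 IDs make upserts idempotent, here is a small sketch using only the standard library. It assumes the ID is derived from the article URL, which the description above does not confirm, and it uses a plain dict as a stand-in for the Weaviate collection; `article_uuid` and `upsert` are hypothetical names.

```python
import uuid


def article_uuid(url: str) -> str:
    """Derive a stable UUID5 from the article URL (assumed key)."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, url.strip()))


def upsert(store: dict, article: dict) -> str:
    """Insert or overwrite an article under its deterministic ID.

    store stands in for the vector database collection.
    """
    object_id = article_uuid(article["url"])
    store[object_id] = article
    return object_id
```

Because the same input always maps to the same ID, re-ingesting an article overwrites the existing object instead of creating a duplicate.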
Chat flow
4
Query enrichment
Follow-up questions (short messages or those containing pronouns like "him", "they", "it") are automatically enriched with context from the previous assistant turn before the retrieval step runs — so "tell me more about him" correctly resolves to the entity just discussed.
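One way the follow-up heuristic could look is sketched below. The pronoun list, the four-word threshold for a "short" message, the 300-character context cut-off, and the function names are all assumptions chosen for illustration.

```python
import re

# Assumption: a small pronoun list and a word-count threshold.
PRONOUNS = {"he", "him", "his", "she", "her", "they", "them", "their", "it", "its"}
MAX_SHORT_WORDS = 4
MAX_CONTEXT_CHARS = 300


def needs_enrichment(message: str) -> bool:
    """True for short messages or ones that contain a pronoun."""
    words = re.findall(r"[a-z']+", message.lower())
    return len(words) <= MAX_SHORT_WORDS or any(w in PRONOUNS for w in words)


def enrich_query(message: str, previous_assistant_turn: str) -> str:
    """Append context from the last assistant turn to follow-up questions."""
    if not previous_assistant_turn or not needs_enrichment(message):
        return message
    context = previous_assistant_turn[:MAX_CONTEXT_CHARS]
    return f"{message}\n\nContext from previous answer: {context}"
```

With this sketch, `enrich_query("tell me more about him", "Erling Haaland scored twice.")` returns the original question followed by the earlier answer, so the retrieval embedding carries the player's name.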
5
Retrieval — top-5 nearest neighbours
The user's message is embedded and searched against the article store's HNSW index. The five most semantically similar articles are retrieved and injected into the LLM context window as grounding material.
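The retrieval logic can be illustrated with a brute-force version. This is a stand-in only: the real system delegates the search to Weaviate's HNSW index, which is approximate, whereas this sketch scores every article exactly. It assumes all vectors are already L2-normalised, so the dot product equals cosine similarity. The names `top_k` and `build_context` are hypothetical.

```python
import heapq


def dot(a, b) -> float:
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query_vec, articles, k: int = 5):
    """Return the k articles most similar to the query, best first."""
    scored = ((dot(query_vec, a["vector"]), a) for a in articles)
    best = heapq.nlargest(k, scored, key=lambda pair: pair[0])
    return [article for _, article in best]


def build_context(articles) -> str:
    """Format retrieved articles as grounding text for the LLM prompt."""
    return "\n\n".join(
        f"[{i}] {a['title']}\n{a.get('summary', '')}"
        for i, a in enumerate(articles, start=1)
    )
```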
6
Agentic tool calling — GPT-4o-mini
The LLM decides autonomously whether to call tools before answering: live standings, fixtures, and top scorers from football-data.org, or biographical lookups from the Wikipedia API for owners, managers, and player backgrounds. Tool results fold back into the context before the final answer is generated.
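The tool-calling loop can be sketched as a schema list plus a dispatcher. The schema layout follows the OpenAI chat completions `tools` format, and the returned message uses the `tool` role with a `tool_call_id`. The tool names, their parameters, and the stubbed return values are assumptions; in the real app the handlers would call football-data.org and the Wikipedia API.

```python
import json

# Tool schemas advertised to the model (names are hypothetical).
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_standings",
            "description": "Current Premier League table.",
            "parameters": {"type": "object", "properties": {}},
        },
    },
    {
        "type": "function",
        "function": {
            "name": "lookup_wikipedia",
            "description": "Short biography for a person or club.",
            "parameters": {
                "type": "object",
                "properties": {"title": {"type": "string"}},
                "required": ["title"],
            },
        },
    },
]


def get_standings() -> dict:
    # Stub: a real handler would query the football data API.
    return {"table": []}


def lookup_wikipedia(title: str) -> dict:
    # Stub: a real handler would query the Wikipedia API.
    return {"title": title, "extract": ""}


REGISTRY = {"get_standings": get_standings, "lookup_wikipedia": lookup_wikipedia}


def run_tool_call(name: str, arguments_json: str, call_id: str) -> dict:
    """Execute one tool call and wrap the result as a tool message."""
    handler = REGISTRY.get(name)
    if handler is None:
        result = {"error": f"unknown tool: {name}"}
    else:
        result = handler(**json.loads(arguments_json or "{}"))
    return {"role": "tool", "tool_call_id": call_id, "content": json.dumps(result)}
```

Each returned message is appended to the conversation before the model is called again, which is how tool results reach the final answer.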
7
Streaming response
The final answer streams token-by-token over Server-Sent Events so the response feels instant. Source articles are attached at the end. Conversation history is persisted in PostgreSQL and rehydrated on return visits via a 90-day session cookie.
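A minimal sketch of the Server-Sent Events framing follows. The event names (`sources`, `done`) and the JSON payload shape are assumptions. In a FastAPI app, a generator like `stream_answer` could be returned through `StreamingResponse` with `media_type="text/event-stream"`.

```python
import json
from typing import Iterable, Iterator, Optional


def sse_event(data: dict, event: Optional[str] = None) -> str:
    """Format one SSE frame: optional event name, data line, blank line."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    # json.dumps escapes newlines, so the payload stays on one data line.
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"


def stream_answer(tokens: Iterable[str], sources: list) -> Iterator[str]:
    """Yield one frame per token, then the sources, then a done marker."""
    for token in tokens:
        yield sse_event({"token": token})
    yield sse_event({"sources": sources}, event="sources")
    yield sse_event({}, event="done")
```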
Personalised digest
8
Preference capture & visit tracking
On first visit, users choose which of the 20 Premier League clubs they follow. Every page load is logged so the app knows exactly when you last visited.
9
Digest generation & caching
On return visits, articles published since your last session are retrieved for your chosen clubs and summarised by GPT-4o-mini into 3–5 digest items, each tagged with a category (result, transfer, injury, news) and a suggested follow-up prompt. Results are cached for 3 hours to avoid redundant LLM calls.
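The 3-hour cache can be illustrated with a small time-to-live wrapper. This in-memory version is a stand-in for the PostgreSQL-backed digest cache; the class name, the cache key shape, and the injectable clock are assumptions added to keep the example testable.

```python
import time

CACHE_TTL_SECONDS = 3 * 60 * 60  # 3 hours


class DigestCache:
    """Return a cached digest while it is younger than the TTL."""

    def __init__(self, ttl: float = CACHE_TTL_SECONDS, clock=time.time):
        self._ttl = ttl
        self._clock = clock
        self._entries = {}  # key -> (created_at, digest)

    def get_or_create(self, key, generate):
        """Return the cached digest, or call generate() and store it."""
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        digest = generate()  # the expensive LLM summarisation call
        self._entries[key] = (now, digest)
        return digest
```

A key such as `(user_id, tuple(sorted(clubs)))` would make the cache sensitive to preference changes, so a user who edits their clubs gets a fresh digest rather than a stale one.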
Stack
API: FastAPI. Async Python API with Server-Sent Events for streaming.
Vector DB: Weaviate. HNSW index for cosine similarity search over article embeddings.
Embeddings: MiniLM-L6-v2. Local sentence-transformers model, 384-dim vectors.
LLM: GPT-4o-mini. OpenAI chat completions with tool-calling support.
Database: PostgreSQL. Conversation history, preferences, digest cache.
Observability: Arize Phoenix. OpenTelemetry tracing for every retrieval, tool call, and LLM span.