How it works — EPL Insider

Data pipeline

1

RSS ingestion — 18 sources, every 30 minutes

A scheduled job fetches from BBC Sport, The Guardian, Sky Sports, ESPN, and 14 other Premier League news outlets using feedparser. Articles are deduplicated by SHA-256 content hash before they are written to the database.
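The deduplication step can be sketched roughly as follows. This is a minimal illustration, not the production job: which fields feed the hash and how they are normalised are assumptions, and `content_hash` and `dedupe` are hypothetical helper names. Entries are plain dicts shaped like the `title` and `summary` fields that feedparser exposes.

```python
import hashlib


def content_hash(title: str, summary: str) -> str:
    """SHA-256 hex digest of the article text.

    Assumption: the hash covers title + summary, lower-cased and with
    whitespace collapsed so trivially different copies collide.
    """
    normalised = " ".join(f"{title}\n{summary}".split()).lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def dedupe(entries: list[dict], seen: set[str]) -> list[dict]:
    """Return only entries whose hash is not yet in `seen` (updated in place)."""
    fresh = []
    for entry in entries:
        digest = content_hash(entry.get("title", ""), entry.get("summary", ""))
        if digest not in seen:
            seen.add(digest)
            fresh.append({**entry, "content_hash": digest})
    return fresh
```

In a real deployment the `seen` set would more plausibly be a unique column in the database than an in-memory set.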

2

Embedding — sentence-transformers/all-MiniLM-L6-v2

Each article title + summary is encoded into a 384-dimensional vector using a local sentence-transformers model. Embeddings are generated in batches of 32 and L2-normalised for cosine similarity search.
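To show why L2-normalisation matters, here is a small standard-library sketch of the batching and normalisation around the model call. The model itself is left out so the snippet runs anywhere; in practice sentence-transformers' `encode` method accepts `batch_size` and `normalize_embeddings` arguments that cover both steps. The helper names are hypothetical.

```python
import math
from typing import Iterator

BATCH_SIZE = 32  # batch size stated in the step above


def batched(texts: list[str], size: int = BATCH_SIZE) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` texts."""
    for start in range(0, len(texts), size):
        yield texts[start:start + size]


def l2_normalise(vector: list[float]) -> list[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return vector if norm == 0.0 else [x / norm for x in vector]


def dot(a: list[float], b: list[float]) -> float:
    """Dot product; equals cosine similarity when both inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))
```

Once every stored vector has unit length, cosine similarity reduces to a plain dot product, which is what makes the normalisation worth doing once at write time.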

3

Vector storage — Weaviate

Vectors and article metadata are upserted into a Weaviate collection using deterministic UUID5 IDs. An HNSW index with cosine distance powers sub-millisecond similarity search at query time.
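Deterministic IDs are what make the upsert idempotent: re-ingesting the same article yields the same UUID, so the existing object is overwritten instead of duplicated. A minimal sketch, assuming the ID is derived from the article URL (the page does not say which field is used) and using a made-up namespace:

```python
import uuid

# Hypothetical namespace; any fixed UUID works as long as it never changes.
ARTICLE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/epl-insider/articles")


def article_uuid(article_url: str) -> str:
    """Return a stable UUID5 string for an article (assumed key: its URL)."""
    return str(uuid.uuid5(ARTICLE_NAMESPACE, article_url))
```

The resulting string would then be passed as the object ID when writing to the vector store.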

Chat flow

4

Query enrichment

Follow-up questions (short messages or those containing pronouns like "him", "they", "it") are automatically enriched with context from the previous assistant turn before the retrieval step runs — so "tell me more about him" correctly resolves to the entity just discussed.
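The follow-up heuristic might look something like the sketch below. The pronoun list, the word-count threshold, and the way context is appended are all assumptions for illustration; the page only says that short messages and messages containing pronouns are enriched from the previous assistant turn.

```python
import re

# Assumed pronoun list; the page names "him", "they" and "it" as examples.
PRONOUNS = {"he", "him", "his", "she", "her", "they", "them", "their", "it"}
MAX_SHORT_WORDS = 4  # hypothetical threshold for a "short" message


def is_follow_up(message: str) -> bool:
    """True if the message is short or contains a pronoun as a whole word."""
    words = re.findall(r"[a-z']+", message.lower())
    return len(words) <= MAX_SHORT_WORDS or any(w in PRONOUNS for w in words)


def enrich_query(message: str, previous_assistant_turn: str) -> str:
    """Append the previous answer as context when the message looks like a follow-up."""
    if previous_assistant_turn and is_follow_up(message):
        return f"{message}\n\nContext from previous answer: {previous_assistant_turn}"
    return message
```

The enriched string is used only for retrieval, so the embedding of "tell me more about him" lands near articles about the entity named in the previous answer.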

5

Retrieval — top-5 nearest neighbours

The user's message is embedded and searched against the article store. The five most semantically similar articles are retrieved and injected into the LLM context window as grounding material.
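Conceptually, retrieval is a top-k search by cosine similarity. The sketch below does it by brute force over an in-memory dict, purely to illustrate the ranking; the real system delegates this to the vector database's HNSW index, which returns approximate neighbours without scanning every vector. `top_k` is a hypothetical helper and assumes all vectors are already unit length.

```python
import heapq


def top_k(query: list[float], store: dict[str, list[float]], k: int = 5) -> list[tuple[str, float]]:
    """Return the k (article_id, score) pairs with the highest dot product.

    Assumes query and stored vectors are L2-normalised, so the dot
    product equals cosine similarity.
    """
    scored = (
        (sum(q * v for q, v in zip(query, vector)), article_id)
        for article_id, vector in store.items()
    )
    best = heapq.nlargest(k, scored)  # sorted by descending score
    return [(article_id, score) for score, article_id in best]
```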

6

Agentic tool calling — GPT-4o-mini

The LLM decides autonomously whether to call tools before answering: live standings, fixtures, and top scorers from football-data.org, or biographical lookups from the Wikipedia API for owners, managers, and player backgrounds. Tool results fold back into the context before the final answer is generated.
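On the application side, tool calling comes down to a dispatcher: the model returns a tool name plus a JSON string of arguments, the app runs the matching function, and the result goes back into the conversation as text. The sketch below shows only that dispatch step. The tool names and stub bodies are hypothetical stand-ins for the football-data.org and Wikipedia lookups, and no network calls are made.

```python
import json
from typing import Callable


def get_standings() -> dict:
    """Stub standing in for a live standings lookup."""
    return {"tool": "get_standings", "note": "stubbed result"}


def lookup_biography(name: str) -> dict:
    """Stub standing in for a biographical lookup."""
    return {"tool": "lookup_biography", "name": name, "note": "stubbed result"}


# Hypothetical registry: tool name as advertised to the model -> handler.
TOOLS: dict[str, Callable[..., dict]] = {
    "get_standings": get_standings,
    "lookup_biography": lookup_biography,
}


def run_tool_call(name: str, arguments_json: str) -> str:
    """Execute one requested tool call and return its result as a JSON string."""
    handler = TOOLS.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        kwargs = json.loads(arguments_json or "{}")
        return json.dumps(handler(**kwargs))
    except (TypeError, ValueError) as exc:  # bad JSON or wrong arguments
        return json.dumps({"error": str(exc)})
```

Returning errors as JSON rather than raising lets the model see the failure and either retry or answer from the retrieved articles alone.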

7

Streaming response

The final answer streams token-by-token over Server-Sent Events so the response feels instant. Source articles are attached at the end. Conversation history is persisted in PostgreSQL and rehydrated on return visits via a 90-day session cookie.
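The Server-Sent Events wire format is simple enough to sketch directly: each event is a block of `field: value` lines terminated by a blank line. The event names (`sources`, `done`) and payload shapes below are assumptions for illustration, not the app's actual protocol.

```python
import json
from typing import Iterable, Iterator, Optional


def sse_event(data: dict, event: Optional[str] = None) -> str:
    """Format one SSE frame; JSON encoding keeps the payload on a single line."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"


def stream_answer(tokens: Iterable[str], sources: list[dict]) -> Iterator[str]:
    """Yield one frame per token, then the sources, then an end marker."""
    for token in tokens:
        yield sse_event({"token": token})
    yield sse_event({"sources": sources}, event="sources")
    yield sse_event({}, event="done")
```

In FastAPI, a generator like this would typically be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`.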

Personalised digest

8

Preference capture & visit tracking

On first visit, users choose which of the 20 Premier League clubs they follow. Every page load is logged so the app knows exactly when you last visited.

9

Digest generation & caching

On return visits, articles published since your last session are retrieved for your chosen clubs and summarised by GPT-4o-mini into 3–5 digest items, each tagged with a category (result, transfer, injury, news) and a suggested follow-up prompt. Results are cached for 3 hours to avoid redundant LLM calls.
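The 3-hour cache can be illustrated with a small time-to-live store. This sketch keeps entries in memory for simplicity, whereas the stack listing places the digest cache in PostgreSQL; the class name, key format, and injectable clock are all hypothetical.

```python
import time
from typing import Any, Callable, Optional

CACHE_TTL_SECONDS = 3 * 60 * 60  # 3 hours, as stated above


class DigestCache:
    """In-memory stand-in for the digest cache with a fixed TTL."""

    def __init__(self, ttl: float = CACHE_TTL_SECONDS, clock: Callable[[], float] = time.time):
        self._ttl = ttl
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: dict[str, tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        """Return the cached digest, or None if missing or older than the TTL."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[key]
            return None
        return value

    def put(self, key: str, value: Any) -> None:
        """Store a digest with the current timestamp."""
        self._entries[key] = (self._clock(), value)
```

A caller would check `get` before invoking the LLM and call `put` with the fresh digest on a miss, keyed for example on the user and their chosen clubs.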

Stack

API

FastAPI

Async Python API with Server-Sent Events for streaming

Vector DB

Weaviate

HNSW index for cosine similarity search over article embeddings

Embeddings

MiniLM-L6-v2

Local sentence-transformers model, 384-dim vectors

LLM

GPT-4o-mini

OpenAI chat completions with tool-calling support

Database

PostgreSQL

Conversation history, preferences, digest cache

Observability

Arize Phoenix

OpenTelemetry tracing for every retrieval, tool call, and LLM span