AI · 2025

DocLens.

Hybrid retrieval over enterprise docs — FAISS dense vectors fused with BM25, then re-ranked.

Role

Solo Engineer

Team

Solo project

Stack

Python · FAISS · rank-bm25 · sentence-transformers · FastAPI

Year

2025

01 / The problem

Why this needed building.

Pure vector search misses literal-term queries; pure BM25 misses paraphrased intent. Enterprise documentation queries shift between both styles within a single session: the name of an internal system on one query, a conceptual question on the next. The retrieval layer has to handle both without per-query hand-tuning.

02 / Approach

How I broke it down.

  1. Chunk documents with a semantic-aware splitter (token budget, with a paragraph-boundary fallback) so neither index over-fragments a coherent argument.

  2. Build a dense index in FAISS from sentence-transformers MPNet embeddings, and a sparse index with rank-bm25 over the same chunk set, so both retrievers rank the same chunk IDs.

  3. Fuse at query time with reciprocal rank fusion (RRF). It uses ranks only, so I don't need to normalize scores across two very different distributions.

  4. Re-rank the fused top-k with a cross-encoder for the final list; this stays cheap because k is small after fusion.
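The chunking step can be sketched as follows. This is one plausible reading of "token budget plus paragraph boundaries", not the DocLens splitter itself: whole paragraphs are packed into a chunk until the budget is reached, and only a paragraph that exceeds the budget on its own is split on word boundaries. The names `count_tokens` and `chunk_document`, the default budget of 200, and the use of whitespace words as a stand-in for model tokens are all assumptions made for illustration.

```python
from typing import List


def count_tokens(text: str) -> int:
    """Hypothetical token counter: whitespace words stand in for model tokens."""
    return len(text.split())


def chunk_document(text: str, max_tokens: int = 200) -> List[str]:
    """Pack whole paragraphs into chunks of at most max_tokens tokens.

    A paragraph that exceeds the budget on its own is split on word
    boundaries as a last resort.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: List[str] = []
    current: List[str] = []
    used = 0

    for para in paragraphs:
        n = count_tokens(para)
        if n > max_tokens:
            # Flush the open chunk, then hard-split the oversized paragraph.
            if current:
                chunks.append("\n\n".join(current))
                current, used = [], 0
            words = para.split()
            for i in range(0, len(words), max_tokens):
                chunks.append(" ".join(words[i:i + max_tokens]))
            continue
        if current and used + n > max_tokens:
            # Adding this paragraph would overflow: close the chunk first.
            chunks.append("\n\n".join(current))
            current, used = [], 0
        current.append(para)
        used += n

    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

In a real pipeline the counter would more likely be the embedding model's own tokenizer, so that the budget matches what the encoder actually sees.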

03 / System

The pipeline, stage by stage.

Each stage is small on its own; what matters is the composition.

  1. STAGE / 01

    Chunk.

    Semantic-aware splitter respects token budget while keeping paragraph boundaries intact. Coherent arguments stay together.

  2. STAGE / 02

    Dense index.

    FAISS over MPNet embeddings. Captures paraphrased and conceptual intent; weak on literal-term lookups.

  3. STAGE / 03

    Sparse index.

    rank-bm25 over the same chunk set so the two retrievers stay aligned. Captures literal names, acronyms, IDs.

  4. STAGE / 04

    Fuse.

    Reciprocal rank fusion at query time. Score-agnostic — no normalization needed across two very different distributions.

  5. STAGE / 05

    Re-rank.

    Cross-encoder over the top-k. Small k keeps the cost bounded; this is where the final ordering is earned.
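The fusion stage is the easiest one to show in code. A minimal sketch of reciprocal rank fusion, assuming each retriever returns an ordered list of chunk IDs: every list contributes `weight / (k + rank)` to a chunk's fused score, with 1-based ranks, so raw FAISS distances and BM25 scores never need to be compared. The function name `rrf_fuse`, the constant `k = 60` (a commonly used default, not a value taken from this project), and the chunk IDs are illustrative assumptions. Note that this `k` is RRF's smoothing constant, not the top-k passed to the re-ranker.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence


def rrf_fuse(
    rankings: Sequence[Sequence[str]],
    k: int = 60,
    weights: Optional[Sequence[float]] = None,
) -> List[str]:
    """Fuse ranked lists of chunk IDs with (optionally weighted) RRF.

    Each list adds weight / (k + rank) to a chunk's fused score, where
    rank is 1-based. Raw retriever scores are never used.
    """
    if not rankings:
        return []
    if weights is None:
        # Equal weight per retriever, e.g. 0.5 / 0.5 for two lists.
        weights = [1.0 / len(rankings)] * len(rankings)
    if len(weights) != len(rankings):
        raise ValueError("need exactly one weight per ranking")

    fused: Dict[str, float] = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, chunk_id in enumerate(ranking, start=1):
            fused[chunk_id] += weight / (k + rank)

    # Highest fused score first; ties broken by ID for determinism.
    return sorted(fused, key=lambda cid: (-fused[cid], cid))


# Hypothetical result lists from the dense and sparse retrievers.
dense_hits = ["c3", "c1", "c7"]
sparse_hits = ["c1", "c9", "c3"]
fused_ids = rrf_fuse([dense_hits, sparse_hits])
```

Chunks that appear in both lists (`c1`, `c3`) rise above chunks found by only one retriever, which is the behaviour the hybrid setup relies on. The fused list would then be truncated and handed to the cross-encoder.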

04 / Outcomes

What it ended up being good at.

  • Recall@10 improved over the dense-only baseline on a hand-built eval set of 80 queries spanning both literal and conceptual styles.

  • p95 latency stayed under 350 ms end-to-end on a single machine; the cross-encoder dominates the cost, not the fusion.

  • Currently exploring learned fusion weights in place of the static 0.5/0.5 weighting of the two retrievers in RRF; treating this as DocLens v2.
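For reference, the Recall@10 metric can be computed as sketched below. The page does not say how recall was defined for the eval set, so this assumes the usual definition: the fraction of a query's relevant chunk IDs that appear in the top k results, averaged over queries. The function names and the two toy queries are hypothetical and are not DocLens data or results.

```python
from typing import Dict, Sequence, Set


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the relevant chunk IDs that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def mean_recall_at_k(
    runs: Dict[str, Sequence[str]],
    gold: Dict[str, Set[str]],
    k: int = 10,
) -> float:
    """Average recall@k over every query that has gold labels."""
    if not gold:
        return 0.0
    total = sum(recall_at_k(runs.get(q, []), rel, k) for q, rel in gold.items())
    return total / len(gold)


# Toy example: one literal-style and one conceptual-style query.
gold = {"q_literal": {"c1", "c4"}, "q_concept": {"c2"}}
runs = {"q_literal": ["c1", "c9", "c3"], "q_concept": ["c2", "c5"]}
score = mean_recall_at_k(runs, gold, k=10)
```

Running the same function over a dense-only run and a fused run on the same gold labels gives the kind of baseline comparison described in the first bullet.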