AI · 2025

DocLens.

Hybrid retrieval over enterprise docs — FAISS dense vectors fused with BM25, then re-ranked.

Role

Solo Engineer

Team

Solo project

Stack

Python · FAISS · rank-bm25 · sentence-transformers · FastAPI

Year

2025

01 / The problem

Why this needed building.

Pure vector search misses literal-term queries; pure BM25 misses paraphrased intent. Enterprise documentation queries shift between both styles inside a single session — names of internal systems on one query, conceptual questions the next. The retrieval layer has to handle both without me hand-tuning per query.

02 / Approach

How I broke it down.

Chunk documents with a semantic-aware splitter (token budget plus paragraph-boundary fallback) so neither index over-fragments a coherent argument.

Build the dense index in FAISS with sentence-transformers MPNet embeddings and the sparse index with rank-bm25 on the same chunk set, so the two ranked lists refer to the same chunks.

Fuse at query time using reciprocal rank fusion (RRF), which looks only at rank positions, so I don't need to normalize scores across two very different distributions.

Re-rank the fused top-k with a cross-encoder to produce the final list; this stays cheap because k is small after fusion.
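A minimal sketch of the chunking step at the top of this list, under one possible reading of "token budget plus paragraph-boundary fallback": whole paragraphs are packed greedily into a chunk until the budget would be exceeded, and only a paragraph that is too large on its own gets hard-split. The names `count_tokens`, `chunk_document`, and `max_tokens` are hypothetical, whitespace token counting stands in for the embedding model's real tokenizer, and the semantic-aware part of the splitter is not modeled here.

```python
def count_tokens(text: str) -> int:
    # Assumption: whitespace tokens approximate the real tokenizer's count.
    return len(text.split())


def chunk_document(text: str, max_tokens: int = 256) -> list[str]:
    """Pack whole paragraphs into chunks of at most max_tokens tokens."""
    chunks: list[str] = []
    current: list[str] = []  # paragraphs in the chunk being built
    current_tokens = 0

    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        n = count_tokens(para)

        if n > max_tokens:
            # Oversized paragraph: flush the open chunk, then hard-split
            # the paragraph itself on the token budget.
            if current:
                chunks.append("\n\n".join(current))
                current, current_tokens = [], 0
            words = para.split()
            for i in range(0, len(words), max_tokens):
                chunks.append(" ".join(words[i:i + max_tokens]))
            continue

        if current_tokens + n > max_tokens:
            # Adding this paragraph would exceed the budget: close the chunk.
            chunks.append("\n\n".join(current))
            current, current_tokens = [], 0

        current.append(para)
        current_tokens += n

    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Because both indexes are built from the same chunk list, each chunk's position in that list could serve as the shared ID when the dense and sparse results are fused later.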

03 / System

The pipeline, stage by stage.

Each stage is small on its own; what matters is the composition.

STAGE / 01
Chunk.
Semantic-aware splitter respects token budget while keeping paragraph boundaries intact. Coherent arguments stay together.
STAGE / 02
Dense index.
FAISS over MPNet embeddings. Captures paraphrased and conceptual intent; weak on literal-term lookups.
STAGE / 03
Sparse index.
rank-bm25 over the same chunk set so the two retrievers stay aligned. Captures literal names, acronyms, IDs.
STAGE / 04
Fuse.
Reciprocal rank fusion at query time. Score-agnostic — no normalization needed across two very different distributions.
STAGE / 05
Re-rank.
Cross-encoder over the top-k. Small k keeps the cost bounded; this is where the final ordering is earned.
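The fuse stage (Stage 04) can be sketched in a few lines, since RRF needs only rank positions. This is an illustrative sketch, not the project's code: the function name `rrf_fuse`, the chunk IDs, and the smoothing constant `k = 60` (a commonly used default for RRF) are assumptions, and the optional `weights` argument defaults to equal weighting across retrievers.

```python
from collections import defaultdict
from typing import Optional, Sequence


def rrf_fuse(
    rankings: Sequence[Sequence[str]],
    weights: Optional[Sequence[float]] = None,
    k: int = 60,
    top_n: Optional[int] = None,
) -> list[str]:
    """Fuse ranked lists of chunk IDs with (optionally weighted) RRF."""
    if not rankings:
        return []
    if weights is None:
        # Equal weighting, e.g. 0.5/0.5 for two retrievers.
        weights = [1.0 / len(rankings)] * len(rankings)
    if len(weights) != len(rankings):
        raise ValueError("need exactly one weight per ranking")

    scores: dict[str, float] = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        # Ranks are 1-based; raw retriever scores are never consulted.
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += weight / (k + rank)

    # Highest fused score first; ties keep first-seen order.
    fused = sorted(scores, key=lambda cid: scores[cid], reverse=True)
    return fused if top_n is None else fused[:top_n]


dense_hits = ["c1", "c2", "c3"]   # hypothetical dense (FAISS) ranking
sparse_hits = ["c3", "c1", "c4"]  # hypothetical sparse (BM25) ranking
fused = rrf_fuse([dense_hits, sparse_hits])
# fused == ["c1", "c3", "c2", "c4"]
```

Chunks that appear in both lists (`c1`, `c3`) rise above chunks found by only one retriever, which is the behavior the fusion stage relies on before the cross-encoder sees the shortlist.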

04 / Outcomes

What it ended up being good at.

Recall@10 improved over the dense-only baseline on a hand-built eval set of 80 queries spanning both literal and conceptual styles.

p95 latency stayed under 350ms end-to-end on a single machine — the cross-encoder is the cost, not the fusion.

Currently exploring learned fusion weights to replace the static 0.5/0.5 weighting in RRF; treating this as DocLens v2.
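For reference, the Recall@10 metric mentioned above could be computed along these lines. This is a sketch under an assumed definition (the fraction of a query's relevant chunks that appear in the top k results, averaged over queries); the write-up does not state which definition the eval set used, and the function names and sample data are hypothetical.

```python
from typing import Sequence


def recall_at_k(retrieved: Sequence[str], relevant: set[str], k: int = 10) -> float:
    """Fraction of the relevant chunk IDs found in the top-k retrieved IDs."""
    if not relevant:
        raise ValueError("each eval query needs at least one relevant chunk")
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def mean_recall_at_k(
    runs: Sequence[tuple[Sequence[str], set[str]]], k: int = 10
) -> float:
    """Average recall@k over (retrieved, relevant) pairs, one pair per query."""
    if not runs:
        raise ValueError("no eval queries given")
    return sum(recall_at_k(ret, rel, k) for ret, rel in runs) / len(runs)


# Hypothetical two-query eval set: (ranked chunk IDs, labeled relevant IDs).
runs = [
    (["c1", "c7", "c3"], {"c1"}),        # the one relevant chunk is found
    (["c9", "c2"], {"c2", "c5"}),        # one of two relevant chunks is found
]
score = mean_recall_at_k(runs, k=10)
# score == 0.75
```

Running the same function over the dense-only ranking and the fused ranking for each query gives two directly comparable numbers.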

More work

Keep reading.

AI · Full-stack · Product

Penny.

Local-first finance tracker that keeps the LLM out of the hot path.

AI · Full-stack · Product

Goodle.

Pet adoption app that turns photos into personality profiles.
