
Live — 41.5M passages indexed

82d Firehose: Wikipedia

The entire English Wikipedia — 41.5 million passages — searchable in 82 dimensions. Download a 125KB matrix. Send 328 bytes. Get answers in 25ms.

41.5M passages
25ms search latency
99.1% cross-model R@1
328 B per query
125 KB W matrix

The Smallest Search Protocol

1. Download one small file (125 KB) — one-time, cached forever
2. Turn text into numbers (local) — runs on your laptop, nothing leaves
3. Compress to 82 numbers (0.14ms) — instant, single operation
4. Send a tiny query (328 B) — smaller than this sentence
5. Get answers (25ms) — searched 41.5M passages on our GPU
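
A quick back-of-envelope check of those sizes (a sketch assuming the MiniLM W matrix is a 384x82 float32 array and a query is 82 float32 values; the real .npy file adds a small header):

FLOAT32_BYTES = 4

w_matrix_bytes = 384 * 82 * FLOAT32_BYTES   # 125,952 bytes -- the ~125 KB one-time download
query_bytes = 82 * FLOAT32_BYTES            # 328 bytes per query

print(w_matrix_bytes, query_bytes)          # 125952 328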

The Wikipedia corpus was embedded by Cohere embed-v3 (1024D). Your query is embedded by MiniLM-L6-v2 (384D). Two neural networks that have never met — different architectures, different training data, different dimensions — agreeing on what words mean, in 82 dimensions.
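
A hedged sketch of that agreement, as if you held both per-model W matrices locally ("W_cohere.npy" is an assumed filename; with Managed Query the Cohere side lives on the server). It needs a Cohere API key for the corpus-side embedding:

import numpy as np
import cohere
from sentence_transformers import SentenceTransformer

def to_consensus(vec, W):
    # Project a native embedding into the 82D consensus space and L2-normalize.
    v = np.asarray(vec) @ W
    return v / np.linalg.norm(v)

W_minilm = np.load("W_minilm.npy")    # 384 -> 82
W_cohere = np.load("W_cohere.npy")    # 1024 -> 82 (assumed filename)

# Query side: MiniLM, run locally.
q = SentenceTransformer("all-MiniLM-L6-v2").encode("history of the Roman Empire")
q82 = to_consensus(q, W_minilm)

# Corpus side: Cohere embed-v3, the model the hosted index was built from.
co = cohere.Client("YOUR_COHERE_API_KEY")
passage = "The Roman Empire was the post-Republican period of ancient Rome."
p = co.embed(texts=[passage], model="embed-english-v3.0",
             input_type="search_document").embeddings[0]
p82 = to_consensus(p, W_cohere)

# Cosine similarity in the shared 82D space: different models, same meaning.
print(float(q82 @ p82))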

That's Managed Query. You don't run the index. You don't store 87GB of embeddings. You don't manage a GPU. You send 328 bytes and get answers from the entire English Wikipedia.

How It Works

01

The Corpus

41.5 million Wikipedia passages, pre-embedded by Cohere embed-v3 (1024D), then projected to 82D using our trained W matrix. 13 GB on a single A100.
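
The footprint checks out with simple arithmetic (a sketch assuming float32 vectors and ignoring passage text and metadata):

num_passages = 41_500_000
dims = 82
index_gb = num_passages * dims * 4 / 1e9    # float32 vectors only
print(f"{index_gb:.1f} GB")                 # ~13.6 GB -- fits on a single A100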

02

The Projection

Not a random matrix — trained to preserve both intra-model geometry and cross-model consensus. >99% R@1 across models.
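
One way to read that figure: embed the same texts with two different models, project each through its own W, and count how often a text's nearest cross-model neighbour is itself. A minimal sketch, using two local sentence-transformers models as stand-ins and an assumed "W_mpnet.npy" filename (the published 99.1% pairs Cohere embed-v3 with MiniLM):

import numpy as np
from sentence_transformers import SentenceTransformer

texts = ["history of the Roman Empire",
         "quantum entanglement",
         "photosynthesis in plants",
         "the French Revolution"]

model_a = SentenceTransformer("all-MiniLM-L6-v2")    # 384D
model_b = SentenceTransformer("all-mpnet-base-v2")   # 768D, stand-in for Cohere

W_a = np.load("W_minilm.npy")   # 384 -> 82
W_b = np.load("W_mpnet.npy")    # 768 -> 82 (assumed filename)

def project(X, W):
    P = X @ W
    return P / np.linalg.norm(P, axis=1, keepdims=True)

A = project(model_a.encode(texts), W_a)   # "queries" in consensus space
B = project(model_b.encode(texts), W_b)   # "corpus" in consensus space

# R@1: how often is a text's nearest cross-model neighbour the text itself?
r_at_1 = (np.argmax(A @ B.T, axis=1) == np.arange(len(texts))).mean()
print(f"cross-model R@1: {r_at_1:.1%}")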

03

The Query Path

You embed locally with any supported model. Multiply by W. Send 82 floats (328 bytes) to the Firehose API. GPU cosine search returns top-k results in ~25ms.
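
Server-side, that search reduces to one matrix-vector product plus a top-k over the 41.5M x 82 index, which is why ~25ms on a GPU is plausible. A hedged sketch of the idea, with random vectors standing in for the real index:

import torch
import torch.nn.functional as F

# Stand-in for the real index: 41.5M unit-normalized 82D vectors (~13.6 GB in fp32).
index = F.normalize(torch.randn(41_500_000, 82, device="cuda"), dim=1)

def search(query_82d: torch.Tensor, top_k: int = 5):
    q = F.normalize(query_82d.to(device="cuda", dtype=torch.float32), dim=0)
    scores = index @ q                       # cosine similarity in one matmul
    vals, idx = torch.topk(scores, top_k)    # best-scoring passages
    return idx.tolist(), vals.tolist()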

04

Cross-Model Magic

Cohere 1024D corpus, queried by MiniLM 384D. Different models, same 82D consensus space. No re-embedding. No model lock-in. Query any index with any model.

05

What You Don't Run

No GPU. No 87 GB of embeddings. No FAISS cluster. No embedding API costs. Download 125 KB, embed on your laptop, search the world.

06

Managed Query

Query our indices. Don't run your own. Wikipedia is the first dataset. More coming — arXiv, Common Crawl, Stack Overflow. One W matrix per model.
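
On the client, that one-matrix-per-model mapping can be a small fetch-and-cache helper (the endpoint path follows the curl examples below; the base-URL placeholder and cache directory are assumptions):

import os
import numpy as np
import requests

API_URL = "https://<your-firehose-endpoint>"   # placeholder base URL

def load_w(model_name: str, cache_dir: str = ".firehose_cache") -> np.ndarray:
    """Download the per-model W matrix once (125 KB) and reuse it from disk."""
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, f"W_{model_name}.npy")
    if not os.path.exists(path):
        resp = requests.get(f"{API_URL}/w/{model_name}", timeout=30)
        resp.raise_for_status()
        with open(path, "wb") as f:
            f.write(resp.content)
    return np.load(path)

W = load_w("minilm")    # e.g. 384 x 82 for MiniLM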

Get Started

Three ways to search Wikipedia in 82 dimensions.

# pip install numpy requests sentence-transformers
import numpy as np, requests
from sentence_transformers import SentenceTransformer

API = "https://<your-firehose-endpoint>"   # Firehose API base URL (placeholder)

# 1. Download W (125 KB, cached forever)
W = np.load("W_minilm.npy")   # or: GET /w/minilm

# 2. Embed locally
model = SentenceTransformer("all-MiniLM-L6-v2")
emb = model.encode("history of the Roman Empire")

# 3. Project to 82D (0.14ms)
vec_82d = emb @ W
vec_82d /= np.linalg.norm(vec_82d)

# 4. Search 41.5M passages (328 bytes out, results back)
r = requests.post(API + "/search_vector", json={
    "vector": vec_82d.tolist(),
    "top_k": 5
})
for hit in r.json()["results"]:
    print(f"{hit['title']}: {hit['score']:.4f}")
# Text search (server embeds + projects for you)
curl -X POST API_URL/search \
  -H "Content-Type: application/json" \
  -d '{"query": "quantum entanglement", "top_k": 5}'

# Download W matrix (125 KB)
curl -o W_minilm.npy API_URL/w/minilm

# List available W matrices
curl API_URL/w

# Check system health
curl API_URL/health
# Already have 82D vectors? Send them directly.
# No embedding model needed on your side.
curl -X POST API_URL/search_vector \
  -H "Content-Type: application/json" \
  -d '{
    "vector": [0.123, -0.456, 0.789, ...],
    "top_k": 10
  }'

# 82 floats = 328 bytes
# GPU searches 41.5M passages in ~25ms
# Returns title, text, URL, similarity score