
Pull fresh public maps into your private RAG.

We maintain 59.3 million vectors across Wikipedia, arXiv, SEC EDGAR, and Podcasts — always up to date, always in 82D.
Import the maps into your own gear. Search your private vectors + the public maps together. Zero infrastructure, zero lock-in.

What Is an Embedding? The concept nobody explains right — what vectors actually are, why distance means similarity, and how 82 dimensions beats 1,024. Read the explainer →

You don't build the maps. You import them.

We crawl, chunk, embed, and project massive public datasets into 82D — Wikipedia, arXiv, SEC EDGAR, podcast transcripts. We keep them fresh. You pull the maps into your private RAG and search everything together: your internal data + the public maps, in one query.

The maps work with any embedding model. The Primer (W matrix) translates your model's vectors into the same 82D coordinate system. Two models that have never met, agreeing on what words mean. That's cross-model search — and it's why the maps are portable.
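A minimal sketch of what cross-model agreement looks like in code, using random stand-ins: the model dimensions (384 and 768) and the per-model W matrices are hypothetical placeholders, not the real Primer files. The point is the mechanics: each model gets its own W, and after projection both vectors live in the same 82D space and can be compared with a dot product.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for two embedding models that have never met:
# a 384-dim vector and a 768-dim vector for the same sentence.
emb_a = rng.standard_normal(384)
emb_b = rng.standard_normal(768)

# Each model ships with its own Primer matrix into the shared 82D space.
# (Random placeholders here; in practice these would be downloaded W files.)
W_a = rng.standard_normal((384, 82))
W_b = rng.standard_normal((768, 82))

def to_82d(emb, W):
    """Project a native embedding into the shared 82D coordinate system."""
    v = emb @ W
    return v / np.linalg.norm(v)

va = to_82d(emb_a, W_a)
vb = to_82d(emb_b, W_b)
print(va.shape, vb.shape)  # (82,) (82,)
print(float(va @ vb))      # cosine similarity across models
```

With real Primer matrices, that final dot product is what lets two different models agree on meaning.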

How it works in your private RAG

One import. Your private vectors + our public maps = one unified search.

```python
from slarty_agent import Firehose

# One line: import the latest public maps
fh = Firehose()

# Your private vectors + our maps = one unified search
results = fh.search(
    "AI regulation compliance impact on Q3 filings",
    private_collection="my_company_rag"  # your gear
)

# Generate a report that mixes your internal data with public maps
report = results.generate_report()
```

The agent searches Wikipedia, arXiv, EDGAR, and Podcasts in the same query as your private data. You keep control. We keep the maps fresh.

Run it on your gear

Pipeline diagram:

```mermaid
flowchart LR
    Q(["query"]) --> EMB["your model"] --> P{{"Primer"}} --> A(["82D agent"])
    A -.->|"native"| RAG[("your RAG")]
    A ==>|"328 bytes"| V82(["82D search"])
    V82 -.->|"demo"| REF(["references\n↗ title · URL · ID"])
    V82 ==>|"managed"| FEED(["structured passages\ntext · metadata · context"])
    A ==>|"local"| LOCAL[("our index\non your disk")]
```

Local first. Managed if you want it. Demo to try it.

Every tier sends the same 328-byte query. The difference is where the search runs.
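The 328-byte figure is just the vector size: 82 dimensions at 4 bytes per float32. A one-liner to check the arithmetic:

```python
import numpy as np

# An 82-dimensional query vector in float32: 82 x 4 bytes.
vec_82d = np.zeros(82, dtype=np.float32)
payload = vec_82d.tobytes()
print(len(payload))  # 328
```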

| | Local | Managed | Demo (free) |
|---|---|---|---|
| You get back | You search your own copy | Structured passages: text + metadata + context | References: title, URL, ID |
| Your agent has to | Run FAISS locally | Nothing — data arrives ready | Fetch text from source |
| Pricing | Flat monthly | $2 / GB transferred | Free, rate-limited |
| Data leaves your network | Never | 328 bytes per query | 328 bytes per query |
| Custom silos | Yes | Coming soon | |
| GPU required | No | No | No |

Primer included at every tier. No GPU required.

Where the change happens

One line. After your embedding call, before your search call. vec_82d = emb @ W — a single matrix multiply that takes 0.14ms on a laptop CPU. Everything upstream and downstream stays untouched. The Primer is the only new part.
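A sketch of where that line sits in an existing pipeline. Here `embed()`, `search()`, and `W` are placeholders for your own model call, your own vector store, and the downloaded Primer matrix; only the projection line is new.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((384, 82))  # stands in for np.load("W_minilm.npy")

def embed(text):
    """Placeholder for your existing embedding call (untouched)."""
    return rng.standard_normal(384)

def search(vec, top_k=5):
    """Placeholder for your existing search call (untouched)."""
    return [("placeholder hit", 0.0)] * top_k

emb = embed("history of the Roman Empire")
vec_82d = emb @ W                    # <-- the only new line
vec_82d /= np.linalg.norm(vec_82d)
results = search(vec_82d)
```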

What it costs

Local: flat monthly. Nothing leaves your network. Full index on your disk, your queries, your data.

Managed: $2 per GB transferred. Structured passages, metadata, context. One round trip.

Demo: free, rate-limited. References only.

Why local is the way

Your RAG is locked to one embedding model. Swap models and you re-embed everything — weeks of compute, thousands of dollars, zero uptime. With local 82D, the index lives on your disk and never changes. Swap your embedding model, download the new Primer, keep searching. No API calls. No round trips. No data leaving your network.
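A toy sketch of the model swap, under stated assumptions: the index and both W matrices are random stand-ins, and the model dimensions (384 old, 1024 new) are illustrative. The mechanics are the point: the 82D index never changes; only the Primer does.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stands in for your on-disk 82D index: it is never re-embedded.
index_82d = rng.standard_normal((1000, 82))
index_82d /= np.linalg.norm(index_82d, axis=1, keepdims=True)

def top_k(query_82d, k=3):
    """Cosine search against the fixed 82D index."""
    scores = index_82d @ query_82d
    return np.argsort(scores)[::-1][:k]

# Old model: 384-dim, with its Primer.
W_old = rng.standard_normal((384, 82))
q_old = rng.standard_normal(384) @ W_old
q_old /= np.linalg.norm(q_old)

# New model: 1024-dim. Download its Primer, keep searching the same index.
W_new = rng.standard_normal((1024, 82))
q_new = rng.standard_normal(1024) @ W_new
q_new /= np.linalg.norm(q_new)

print(top_k(q_old), top_k(q_new))  # one index, two models, zero re-embedding
```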

Get Started

Three ways to search Wikipedia in 82 dimensions.

```python
# pip install numpy requests sentence-transformers
import numpy as np
import requests
from sentence_transformers import SentenceTransformer

API = "https://..."  # base URL of the search API

# 1. Download W (125 KB, cached forever)
W = np.load("W_minilm.npy")  # or: GET /w/minilm

# 2. Embed locally
model = SentenceTransformer("all-MiniLM-L6-v2")
emb = model.encode("history of the Roman Empire")

# 3. Project to 82D (0.14ms)
vec_82d = emb @ W
vec_82d /= np.linalg.norm(vec_82d)

# 4. Search 41.5M passages (328 bytes out, results back)
r = requests.post(API + "/search_vector", json={
    "vector": vec_82d.tolist(),
    "top_k": 5
})
for hit in r.json()["results"]:
    print(f"{hit['title']}: {hit['score']:.4f}")
```
```shell
# Text search (server embeds + projects for you)
curl -X POST API_URL/search \
  -H "Content-Type: application/json" \
  -d '{"query": "quantum entanglement", "top_k": 5}'

# Download W matrix (125 KB)
curl -o W_minilm.npy API_URL/w/minilm

# List available W matrices
curl API_URL/w

# Check system health
curl API_URL/health
```
```shell
# Already have 82D vectors? Send them directly.
# No embedding model needed on your side.
curl -X POST API_URL/search_vector \
  -H "Content-Type: application/json" \
  -d '{
    "vector": [0.123, -0.456, 0.789, ...],
    "top_k": 10
  }'

# 82 floats = 328 bytes
# FAISS + BM25 + cross-encoder rerank
# Returns title, text, URL, similarity score
```