Company Intelligence

Many agents, one grounded brain.

Your whole company's knowledge, answered in seconds: grounded in your real docs, cited so you can verify it, and able to take the next step instead of just replying.

Scope this build ↳ Free agentic audit · zero obligation

Company brain · 642 queries today

Ingest

Notion · Slack · tickets

running

Retrieve

Hybrid search · re-rank

queued

Answer

Grounded · cited

queued

Complete

—

09:41:00 retrieve → 8 sources

09:40:59 rerank · top 4 kept

09:40:58 grounded ✓ · cited

09:40:57 notion.indexed · 1.2k docs

↳ Built on the stack that ships

Claude Code · Agent SDK · n8n · Railway · Vercel · Supabase

[ 01 ] What it is

The capability, defined.

This is the deep end of the ladder: a system, not a chatbot. A RAG layer that answers from your verified documents, tickets, and history instead of guessing - with citations you can click - and, above it, multiple agents coordinating like a team: an orchestrator planning, specialists executing in parallel. It's how you move from one helpful reply to a system that runs the whole workflow, grounded in what's actually true at your company.

Not that · ✓ this

Not a ChatGPT wrapper. Not a model with a prompt that guesses when it doesn't know. It's a grounded RAG layer over your verified sources with click-through citations, and above it a team of agents - an orchestrator planning, specialists executing in parallel.

[ 02 ] The status quo

What this costs you today.

The answer exists somewhere at your company - in a doc, a ticket, someone's head - and finding it still takes a Slack message and a half-day wait.

Knowledge lives in one person's head or a buried Notion page, and new hires ask the same five people for months.

The chatbot you tried hallucinates confidently - it makes things up when it doesn't know, so nobody trusts it for anything that matters.

There are no citations, so even a right answer can't be verified - and one wrong answer in front of a customer poisons the whole tool.

Knowledge rots the day a doc goes stale, because nothing re-indexes when your business changes.

[ 03 ] What we build

The anatomy of the system.

Most RAG failures are retrieval failures, not model failures. So we engineer the knowledge layer to a real standard first, then put agents on top of it - and we measure groundedness the whole way.

Ingestion

Your SOPs, Notion, tickets, Slack, and docs are pulled in with metadata and freshness tracking, and re-indexed continuously - so the brain knows what it knows and when it last changed.
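
One way to picture freshness tracking is a content fingerprint stored per document, so a re-index run only touches what actually changed. This is a minimal sketch under assumptions: `IndexedDoc`, `fingerprint`, and `docs_to_reindex` are hypothetical names for illustration, not part of any specific pipeline.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class IndexedDoc:
    doc_id: str
    content_hash: str  # fingerprint of the text at the last indexing run
    indexed_at: float  # Unix timestamp of the last indexing run


def fingerprint(text: str) -> str:
    """Stable hash of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def docs_to_reindex(current: dict[str, str], index: dict[str, IndexedDoc]) -> list[str]:
    """Return IDs of documents that are new or whose content has changed."""
    stale = []
    for doc_id, text in current.items():
        record = index.get(doc_id)
        if record is None or record.content_hash != fingerprint(text):
            stale.append(doc_id)
    return sorted(stale)


# Hypothetical example: one edited SOP and one brand-new doc.
index = {"sop-1": IndexedDoc("sop-1", fingerprint("Refunds take 5 days."), 0.0)}
current = {"sop-1": "Refunds take 3 days.", "sop-2": "New onboarding guide."}
print(docs_to_reindex(current, index))  # ['sop-1', 'sop-2']
```

Hashing content rather than trusting modification timestamps avoids re-embedding documents that were merely touched without being edited.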

Chunking + embeddings

Content is split semantically (not by blind character count) and embedded into a vector store. The chunking strategy decides whether retrieval ever finds the right passage.
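
To make the contrast with blind character counts concrete, here is a small sketch that splits on paragraph boundaries and packs whole paragraphs into chunks under a size budget. Paragraph breaks are used as a simple stand-in for semantic boundaries; the function name and the 500-character default are assumptions for illustration.

```python
def chunk_by_paragraph(text: str, max_chars: int = 500) -> list[str]:
    """Pack whole paragraphs into chunks; never cut inside a paragraph."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        # An oversized paragraph is kept whole rather than split mid-thought.
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


doc = "A" * 300 + "\n\n" + "B" * 300 + "\n\n" + "C" * 100
chunks = chunk_by_paragraph(doc, max_chars=500)
print(len(chunks))  # 2
```

A fixed-width splitter would have cut the second paragraph at character 500; here each chunk ends where a paragraph ends.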

Hybrid retrieval + re-ranker

Dense vector search and sparse keyword (BM25) run together, then a cross-encoder re-ranker keeps only the few passages that truly answer the question - the step that kills 'close but wrong' results.
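
The merge step between the two searches can be sketched with reciprocal rank fusion (RRF). RRF is one common fusion method and an assumption here, since the text above does not name one; the chunk IDs are hypothetical, and the cross-encoder re-ranker that would score the fused candidates afterwards is not shown.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of chunk IDs into a single ranking.

    Each chunk earns 1 / (k + rank) from every list it appears in,
    so chunks that both retrievers rank highly rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by score, highest first; break ties by ID for stable output.
    return sorted(scores, key=lambda c: (-scores[c], c))


dense = ["c1", "c2", "c3"]   # hypothetical vector-search results
sparse = ["c3", "c1", "c4"]  # hypothetical BM25 results
fused = reciprocal_rank_fusion([dense, sparse])
print(fused)  # ['c1', 'c3', 'c2', 'c4']
```

RRF works on ranks rather than raw scores, which sidesteps the problem that cosine similarities and BM25 scores live on different scales.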

Grounding + Citations

The model answers only from retrieved text and attaches a source to every claim via the Citations API - unsupported sentences get stripped or sent back to retrieval, and 'I don't know' beats a guess.
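
The "strip unsupported sentences" rule can be illustrated as a post-processing filter. This is a sketch under assumptions: `Claim` and `keep_grounded` are hypothetical names, and the data shape is illustrative rather than the response format of the Citations API.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    text: str
    source_ids: list[str] = field(default_factory=list)  # cited chunk IDs


def keep_grounded(claims: list[Claim], retrieved_ids: set[str]) -> str:
    """Keep only claims whose citations all point at retrieved chunks."""
    grounded = [
        c for c in claims
        if c.source_ids and all(s in retrieved_ids for s in c.source_ids)
    ]
    if not grounded:
        # Deferring beats guessing when nothing is supported.
        return "I don't know."
    return " ".join(f"{c.text} [{', '.join(c.source_ids)}]" for c in grounded)


claims = [
    Claim("Refunds take 3 days.", ["sop-1#2"]),
    Claim("Refunds are always free."),  # no citation, so it is dropped
]
print(keep_grounded(claims, {"sop-1#2"}))  # Refunds take 3 days. [sop-1#2]
```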

Orchestrator-worker agents

A lead agent plans and delegates to specialists running in parallel, then synthesizes - the same pattern Anthropic runs for its own research system - so the system executes a workflow, not just a reply.
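
The plan, delegate, synthesize shape can be sketched with `asyncio`. Everything here is a stand-in: `specialist` fakes a model or tool call, and `plan` returns a fixed list where a real lead agent would decide the subtasks at runtime.

```python
import asyncio


async def specialist(role: str, task: str) -> str:
    """Hypothetical worker; a real one would call a model or a tool."""
    await asyncio.sleep(0)  # placeholder for I/O-bound work
    return f"{role}: findings for '{task}'"


def plan(question: str) -> list[tuple[str, str]]:
    """Hypothetical fixed plan: one subtask per knowledge source."""
    return [("docs", question), ("tickets", question), ("slack", question)]


async def orchestrate(question: str) -> str:
    subtasks = plan(question)
    # Workers run concurrently; gather returns results in input order.
    results = await asyncio.gather(*(specialist(r, t) for r, t in subtasks))
    # Synthesis is a plain join here; a lead agent would reconcile findings.
    return "\n".join(results)


report = asyncio.run(orchestrate("refund policy"))
print(report)
```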

Evals + groundedness

Faithfulness, context-precision, and hallucination are scored continuously with an LLM-as-judge against a golden dataset - quality is measured on real traffic, not asserted at launch.
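
As a rough illustration of the scoring loop, faithfulness can be computed as the share of an answer's claims that a judge finds supported by the retrieved context. In this sketch the judge is a plain substring check standing in for an LLM-as-judge, and the golden rows are hypothetical.

```python
from typing import Callable

Judge = Callable[[str, str], bool]  # (claim, context) -> supported?


def substring_judge(claim: str, context: str) -> bool:
    """Toy stand-in for an LLM judge: case-insensitive containment."""
    return claim.lower() in context.lower()


def faithfulness(claims: list[str], context: str, judge: Judge) -> float:
    """Fraction of claims the judge finds supported by the context."""
    if not claims:
        return 0.0  # design choice: empty answers cannot inflate the average
    supported = sum(1 for claim in claims if judge(claim, context))
    return supported / len(claims)


def score_dataset(rows: list[dict], judge: Judge) -> float:
    """Mean faithfulness over a golden dataset."""
    scores = [faithfulness(r["claims"], r["context"], judge) for r in rows]
    return sum(scores) / len(scores) if scores else 0.0


context = "Refunds take 3 days. Shipping is free over $50."
golden = [
    {"claims": ["Refunds take 3 days", "Returns are free"], "context": context},
    {"claims": ["Shipping is free over $50"], "context": context},
]
print(score_dataset(golden, substring_judge))  # 0.75
```

Passing the judge in as a function keeps the metric independent of whichever model does the judging.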

[ 04 ] How it works

Engineered, not prompted.

We follow Anthropic's own complexity ladder - start simple, add agents only where they earn their keep - and build on Claude Code, the Claude Agent SDK, n8n, Railway, Vercel, Cloudflare, and Supabase.

Ingest

Your SOPs, Notion, tickets, Slack, and docs get chunked, embedded, and indexed - with metadata and freshness tracking so the brain knows what it knows and when it last changed.

Retrieve

A hybrid search (semantic + keyword) pulls the few passages that actually answer the question, re-ranks them, and hands the model only verified source text - not the open internet.

Answer / Act

The system responds grounded in those sources with citations you can click through - or, when the job calls for it, an agent takes the next step across your tools instead of just replying.

How we engineer it

Ground it first

Most failures are retrieval failures. We build the knowledge layer - chunking, hybrid search, and evaluation - before any agent touches it, so the system answers from facts, not vibes.

Add agents where judgment is needed

Deterministic steps stay deterministic. We only hand control to an agent where the path genuinely has to be decided at runtime - that's the Anthropic line between a workflow and an agent.

Orchestrate like a team

A lead agent plans and delegates to specialists running in parallel, then synthesizes - the same orchestrator-worker pattern Anthropic uses for its own research system.

Make it observable

Every retrieval, decision, and tool call is logged and evaluated. You see why the system did what it did - no black box.
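
A minimal sketch of what "logged" can mean in practice: one trace per query, with an ordered event for every retrieval, decision, and tool call. `Trace` and `TraceEvent` are hypothetical names, not the schema of any particular observability tool.

```python
import json
import time
from dataclasses import asdict, dataclass, field


@dataclass
class TraceEvent:
    kind: str     # "retrieval", "decision", or "tool_call"
    detail: dict  # JSON-serializable payload describing the step
    ts: float = field(default_factory=time.time)


@dataclass
class Trace:
    query: str
    events: list[TraceEvent] = field(default_factory=list)

    def record(self, kind: str, **detail) -> None:
        """Append one step to the trace, in the order it happened."""
        self.events.append(TraceEvent(kind, detail))

    def to_json(self) -> str:
        """Serialize the whole trace for storage or later evaluation."""
        return json.dumps(asdict(self))


trace = Trace("refund policy")
trace.record("retrieval", chunk_ids=["sop-1#2"], top_k=4)
trace.record("tool_call", tool="search_tickets")
print(trace.to_json())
```

Keeping retrieval events separate from tool calls in the same trace makes it possible to ask later whether a bad answer came from the wrong context or from a bad step taken on the right one.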

[ 05 ] Example builds

What this looks like in the wild.

Company Brain

A grounded brain over your SOPs, Notion, Slack, and tickets - ask it anything in plain language and it answers with clickable sources, or executes the task across your tools.

Research & analysis system

An orchestrator spawns specialist agents to research, cross-check, and synthesize - turning a vague question into a sourced, structured answer no single context window could hold.

Customer-facing support brain

A retrieval-grounded assistant over your help center and ticket history that resolves or drafts cited replies - safe to put in front of customers because it cites and defers.

Internal ops copilot

A grounded layer over your policies and systems that reads context, decides, and acts across tools - the multi-step work that used to need a person watching every step.

[ 06 ] By the numbers

The reliability that ships.

Retrieval-first

Where most RAG quality is won or lost - hybrid search plus a re-ranker is the 2026 production default precisely because naive single-vector retrieval misses the right passage too often.

Per-claim citations

The grounding bar that separates a demo from production - every sentence carries a chunk ID, and uncited claims get stripped or rewritten before the answer ships.

Continuous evals

Groundedness scored on live traffic, not just at launch - and only ~15% of GenAI deployments instrument this today (Gartner), which is exactly why most company chatbots quietly drift.

↳ Industry benchmarks and engineering standards, not Anfloy client metrics - we report your real numbers once you're live.

[ 07 ] The stack

Named tools, and why.

The model is fungible - the system is the moat. Here's what we build it on, and the reason each earns its place.

Claude (Anthropic API)

Grounded generation plus the native Citations API, so every answer carries verifiable, click-through sources instead of unsupported confidence.

Vector DB (pgvector / Pinecone / Qdrant)

The embedding store behind dense retrieval - on Supabase Postgres when you want one fewer system to run, dedicated when scale demands it.

Hybrid search + BM25

Dense and sparse retrieval together catch both meaning and exact terms - the combination that single-vector RAG keeps missing.

Re-ranker (Cohere / cross-encoder)

Re-scores the candidate passages so the model sees only the few that actually answer - the single highest-leverage fix for 'close but wrong' answers.

LangGraph

Orchestrates the lead-and-specialist agent team with persistent state and parallel workers - the orchestrator-worker pattern, in production form.

MCP (Model Context Protocol)

Standard, portable connectors to your data sources and tools, so the brain reads from and acts in your real systems without bespoke glue.

Langfuse / Ragas

Continuous groundedness, faithfulness, and context-precision evals with LLM-as-judge - so hallucination is caught and measured, not discovered by a customer.

[ 08 ] The architectural difference

Why not just a ChatGPT wrapper?

A naive chatbot is a model with a prompt - it answers from its training data and guesses when it doesn't know. A grounded RAG system answers from your verified sources, cites them, and is built to say 'I don't know' instead of inventing. The gap is the difference between a demo and something you'd put in front of a customer.

Dimension · Naive chatbot · Anfloy custom

Source of truth
Naive chatbot: The model's training data - frozen, generic.
Anfloy custom: Your live docs, tickets, and history - retrieved per query.

Hallucination
Naive chatbot: Guesses confidently when it doesn't know.
Anfloy custom: Grounded in retrieved text; defers when there's no match.

Citations
Naive chatbot: None - you can't verify the answer.
Anfloy custom: Click-through sources on every response (Citations API).

Freshness
Naive chatbot: Stale the moment your business changes.
Anfloy custom: Re-indexed continuously as your knowledge updates.

Quality control
Naive chatbot: Vibes. Nobody catches the wrong answers.
Anfloy custom: Continuous groundedness evals - failures get measured, not missed.

Doing the work
Naive chatbot: Answers, then stops.
Anfloy custom: Answers, or executes the task across your tools.

[ 09 ] Who it's for

The honest fit check.

Build this if

Your knowledge is scattered across docs, tickets, and tools - in support, ops, sales enablement, or a product that needs answers grounded in proprietary data - and you need answers people can actually trust and verify.

Skip it if

One well-scoped agent or a single retrieval step already does the job. In that case you don't need a multi-agent system, and we'll build the simpler thing. And if your knowledge base is thin, messy, or has no stable source of truth, the honest first move is fixing the data, not wrapping a model around it.

[ 10 ] Questions

The honest answers.

Q01

When do I actually need multi-agent instead of one agent?

Only when the work is genuinely parallel or too broad for one context window - research across many sources, or several distinct specialist roles working at once. We follow Anthropic's guidance: start with the simplest thing that works and add agents only when simpler patterns fall short. Plenty of 'company brain' projects are a strong single-agent RAG system, and we'll tell you honestly when that's all you need - it's cheaper and more reliable.

Q02

How do you stop it from hallucinating?

Retrieval grounding plus continuous evaluation. The system answers only from your verified sources, attaches a citation to every claim via the Citations API, and is built to defer to a human rather than guess when there's no good match. We score groundedness, faithfulness, and context-precision continuously with an LLM-as-judge against a golden dataset - so unsupported answers get caught and measured on real traffic, not just hoped away at launch.

Q03

Who owns the knowledge base and the agents?

You do, entirely. The RAG layer, the vector store, the agents, and your data all live on your infrastructure and accounts. Built once, yours forever - it keeps running with or without us, with no Anfloy platform in the middle. Your proprietary knowledge never trains a public model and never leaves your perimeter unless you want it to.

Q04

What happens when it breaks or a source is wrong?

Because every answer is cited, a wrong answer is traceable to its source - you click through and see exactly which passage it came from, so you fix the doc, not chase a black box. Retrieval and generation are monitored separately, so we can tell whether the system pulled the wrong context or reasoned poorly over the right one. Continuous evals catch quality drift before users do, and the ingestion pipeline re-indexes as your sources change so stale knowledge gets refreshed automatically.

Q05

How long does it take to ship, and does it run on our infra?

A grounded single-source brain can ship in a couple of weeks; a full multi-agent system over many sources takes longer, and we ship it in increments rather than one big launch. It runs entirely on your infrastructure - the vector store on your Supabase or cloud, the agents in your account, your keys - and for sensitive data we self-host the whole pipeline so nothing leaves your perimeter.

Q06

Our docs are messy - does that kill the project?

It's the most common starting point, not a blocker, but it's also where most of the real work is. Good retrieval depends on good chunking and clean, current sources, so part of what we build is the ingestion and freshness layer that keeps your knowledge usable. We'll be straight with you about where the data needs cleanup first - because grounding a model on a mess just produces confident, well-cited wrong answers.

[ 11 ] Keep going up the ladder

Build · Full-Stack AI Builds: Ship a real AI product - MVP to production, in your repo.

Run · Infrastructure & Hosting: We don't just build it - we host and operate it on real infra.

Run · Maintenance & Evolution: Systems that get sharper every month, not stale.

Let's scope your company intelligence.

Scope this build