Ledio · Founder · Fullstack · In production (beta) · In production with real customers

Ledio

The architecture carries the intelligence, not the model: a Brain, a lean RAG and a post-generation hallucination checker let one cheap LLM serve many businesses without inventing prices.

~49/day · conversations handled in production
multi-tenant · one agent serves many businesses
dogfooding · runs Batameta's own support
team · Founder, solo engineering

Founder · solo build (API + Web) · 5 months. Ledio is in production handling ~49 conversations a day and currently runs Batameta's own customer support (dogfooding). Small businesses lose WhatsApp sales because nobody answers fast enough, especially at night, and a generic bot makes it worse with wrong answers and invented prices. Ledio is a SaaS where any business can spin up an AI agent that handles WhatsApp 24/7 and is trained through conversation, with no code. The bet: the architecture carries the intelligence, so a deliberately cheap model produces answers good enough that the owner trusts it with real conversations.

01

What it solves

Small businesses lose sales on WhatsApp for one reason: nobody answers fast enough, especially at night. A generic chatbot doesn't fix it - it answers wrong, invents prices, and annoys the customer. The bar isn't 'a bot,' it's 'an agent good enough that the owner trusts it with real conversations.'

02

The architecture carries the intelligence

Instead of reaching for the biggest, most expensive LLM, Ledio defaults to a cheap model: the quality comes from how each turn is assembled, not from raw model power. A Brain service decides whether to respond at all, runs engagement analysis, and builds a situational prompt (triage, follow-up, sales, support) before a single token is generated. The expensive model is reserved for genuinely heavy lifting.
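
A minimal sketch of that pre-generation step, under stated assumptions: every name here (`planTurn`, `TurnInput`, `GUIDANCE`, the keyword patterns, the 600-character threshold) is hypothetical and chosen for illustration, and the keyword rules are a crude stand-in for whatever engagement analysis the real Brain runs. It only shows the shape of the idea: decide whether to answer, pick a situation, pick a model tier, and assemble the prompt before any model is called.

```typescript
// Hypothetical types and rules; none of these names come from Ledio's codebase.
type Situation = "triage" | "follow-up" | "sales" | "support";
type ModelTier = "cheap" | "expensive";

interface TurnInput {
  businessName: string;
  message: string;
  humanIsHandling: boolean;    // a human agent has taken over the chat
  hoursSinceLastReply: number; // crude stand-in for engagement analysis
}

type TurnPlan =
  | { respond: false; reason: string }
  | { respond: true; situation: Situation; tier: ModelTier; systemPrompt: string };

const SALES_HINTS = /\b(price|cost|buy|order|discount)\b/i;
const SUPPORT_HINTS = /\b(broken|refund|problem|not working|help)\b/i;

// One line of situational guidance per prompt type named in the text.
const GUIDANCE: Record<Situation, string> = {
  triage: "Find out what the customer needs before offering anything.",
  "follow-up": "Re-open the conversation politely and reference the last topic.",
  sales: "Answer with facts from the knowledge base only; never guess a price.",
  support: "Acknowledge the problem, then walk through the documented fix.",
};

function classify(input: TurnInput): Situation {
  if (SALES_HINTS.test(input.message)) return "sales";
  if (SUPPORT_HINTS.test(input.message)) return "support";
  if (input.hoursSinceLastReply >= 24) return "follow-up";
  return "triage";
}

function planTurn(input: TurnInput): TurnPlan {
  // First decision: should the agent answer at all?
  if (input.humanIsHandling) {
    return { respond: false, reason: "a human is handling this conversation" };
  }
  if (input.message.trim() === "") {
    return { respond: false, reason: "empty message" };
  }

  const situation = classify(input);
  // Assumed escalation rule: only unusually long messages go to the expensive tier.
  const tier: ModelTier = input.message.length > 600 ? "expensive" : "cheap";
  const systemPrompt = [
    `You are the WhatsApp assistant for ${input.businessName}.`,
    `Situation: ${situation}.`,
    GUIDANCE[situation],
  ].join("\n");

  return { respond: true, situation, tier, systemPrompt };
}
```

The point of the sketch is that the model only ever sees a prompt that already encodes the situation, so a small model has less to infer on its own.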

03

Lean RAG + anti-hallucination

A document learner ingests PDFs, docs and URLs through a multi-pass pipeline into a typed memory (facts, scripts, objection answers) and injects the top-K relevant memories into each turn - a lean RAG. After generation, a hallucination checker runs rule-based guardrails: any price, URL, percentage or time the agent emits that isn't backed by the business's knowledge base gets flagged and routed to a human, instead of letting the agent make something up.
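
The post-generation check can be sketched roughly as follows. This is an illustration under assumptions, not Ledio's implementation: the regular expressions, the `checkReply` name and the normalization rules are all hypothetical, and the knowledge base is reduced to a list of plain strings. The idea it demonstrates is the one described above: extract every price, URL, percentage and time from the generated reply, and flag any that do not also appear in the business's own material.

```typescript
// All names below are hypothetical; the patterns are deliberately simple.
type FactKind = "price" | "url" | "percentage" | "time";

interface Claim {
  kind: FactKind;
  value: string; // normalized form used for comparison
}

const PATTERNS: Record<FactKind, RegExp> = {
  price: /(?:R\$|US\$|\$|€)\s?\d+(?:[.,]\d+)*/g,
  url: /https?:\/\/[^\s)]+/g,
  percentage: /\d+(?:[.,]\d+)?\s?%/g,
  time: /\b\d{1,2}:\d{2}\b/g,
};

// Drop whitespace and trailing punctuation so "R$ 49,90" and "R$49,90." compare equal.
function normalize(raw: string): string {
  return raw.replace(/\s+/g, "").replace(/[.,;!?]+$/, "").toLowerCase();
}

function extractClaims(text: string): Claim[] {
  const claims: Claim[] = [];
  for (const kind of Object.keys(PATTERNS) as FactKind[]) {
    for (const match of text.match(PATTERNS[kind]) || []) {
      claims.push({ kind, value: normalize(match) });
    }
  }
  return claims;
}

function checkReply(
  reply: string,
  knowledgeBase: string[],
): { flagged: Claim[]; handoff: boolean } {
  // Every fact the business has actually stated, extracted with the same rules.
  const backed = new Set<string>();
  for (const entry of knowledgeBase) {
    for (const claim of extractClaims(entry)) {
      backed.add(`${claim.kind}:${claim.value}`);
    }
  }
  // Anything the agent emitted that the knowledge base does not contain.
  const flagged = extractClaims(reply).filter(
    (claim) => !backed.has(`${claim.kind}:${claim.value}`),
  );
  return { flagged, handoff: flagged.length > 0 };
}
```

Comparing extracted facts for exact equality, rather than searching for substrings, avoids a `$9` in the reply being accepted because the knowledge base mentions `$99`.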

04

Multi-tenant without context bleed

One trained agent serves many businesses at once. Tenant isolation is enforced through a businessId carried from request decorators down through every repository and service boundary, with business-scoped Socket.IO rooms for the live dashboard. A conversation for business A can never leak into business B's context.
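
One way to picture that isolation, as a framework-free sketch: the repository below cannot be constructed without a `businessId`, so every read and write is scoped by construction. The class, the in-memory array and the `business:<id>` room naming are assumptions made for illustration; the source only states that the `businessId` flows from request decorators to repositories and that Socket.IO rooms are business-scoped.

```typescript
// Hypothetical shapes; an in-memory array stands in for the real database.
interface Conversation {
  id: string;
  businessId: string;
  customer: string;
  lastMessage: string;
}

class ScopedConversationRepo {
  // The tenant is fixed at construction time, so no method can forget the filter.
  constructor(
    private readonly rows: Conversation[],
    private readonly businessId: string,
  ) {}

  list(): Conversation[] {
    return this.rows.filter((row) => row.businessId === this.businessId);
  }

  findById(id: string): Conversation | undefined {
    // Goes through list(), so another tenant's row is invisible even with a valid id.
    return this.list().find((row) => row.id === id);
  }

  add(customer: string, lastMessage: string): Conversation {
    const row: Conversation = {
      id: `${this.businessId}-${this.rows.length + 1}`,
      businessId: this.businessId, // always stamped by the repo, never by the caller
      customer,
      lastMessage,
    };
    this.rows.push(row);
    return row;
  }
}

// Assumed naming scheme for the business-scoped dashboard room.
function dashboardRoom(businessId: string): string {
  return `business:${businessId}`;
}
```

In a request handler, the scoped repository would be built from the `businessId` resolved for that request, and dashboard events would be emitted only to `dashboardRoom(businessId)`.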

05

The rest of the machine

Audio messages are transcribed with Whisper. A visual flow builder (React Flow) lets non-developers draw deterministic bot flows that the backend executes as a stateful engine, escalating to a human on repeated failure. BullMQ + Redis handle async work (broadcasts, emails, training) with retries and dead-letter handling. WhatsApp connectivity is abstracted behind one interface with two implementations - WAHA (QR sessions) and the official Cloud API - switchable per business. A prompt test suite even uses a second LLM as an 'AI judge' to score prompt-building quality.
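
The stateful flow execution with escalation could look something like the sketch below. It is an assumption-laden illustration: the node shape, the `step` function and the threshold of two failed inputs are hypothetical, and the real engine executes flows drawn in React Flow rather than a hand-written map.

```typescript
// Hypothetical flow model: each node shows a prompt and maps accepted inputs to the next node.
interface FlowNode {
  id: string;
  prompt: string;
  options: Map<string, string>; // accepted input -> next node id; empty means terminal
}

interface FlowState {
  nodeId: string;
  failures: number; // consecutive unrecognized inputs at the current node
}

type StepResult =
  | { kind: "reply"; text: string; state: FlowState }
  | { kind: "done"; text: string }
  | { kind: "escalate"; reason: string };

const MAX_FAILURES = 2; // assumed threshold for "repeated failure"

function step(flow: Map<string, FlowNode>, state: FlowState, input: string): StepResult {
  const node = flow.get(state.nodeId);
  if (!node) return { kind: "escalate", reason: `unknown node ${state.nodeId}` };

  const nextId = node.options.get(input.trim().toLowerCase());
  if (nextId === undefined) {
    // Unrecognized input: repeat the prompt, or hand off once the limit is reached.
    const failures = state.failures + 1;
    if (failures >= MAX_FAILURES) {
      return { kind: "escalate", reason: "repeated unrecognized input" };
    }
    return {
      kind: "reply",
      text: `Sorry, I did not get that. ${node.prompt}`,
      state: { nodeId: node.id, failures },
    };
  }

  const next = flow.get(nextId);
  if (!next) return { kind: "escalate", reason: `missing node ${nextId}` };
  if (next.options.size === 0) return { kind: "done", text: next.prompt };
  // Moving forward resets the failure counter.
  return { kind: "reply", text: next.prompt, state: { nodeId: next.id, failures: 0 } };
}
```

Because the state is a plain value passed in and returned, it can be stored per conversation between messages, which is what makes the engine deterministic and resumable.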

decisions & tradeoffs
  • Why a deliberately cheap model by default?
    Orchestration over model size
    With the right orchestration - a Brain that builds the prompt, a lean RAG, and post-generation guardrails - a small model produces great answers at a fraction of the cost. The expensive model is escalation, not the default.
  • Why a post-generation hallucination checker?
    Rule-based fact guardrails + handoff
    In sales, a made-up price or link is worse than no answer. Checking the generated message for any fact not backed by the knowledge base - and handing those off to a human - is what makes the agent safe to trust with real customers.
  • Why businessId-scoped multi-tenancy?
    Isolation at every layer
    One agent, many clients, zero context bleed. The businessId is enforced from the request decorator down to every repository and Socket.IO room, so isolation is a property of the architecture, not a query you might forget to filter.
  • Why abstract WhatsApp behind one interface?
    WAHA + Cloud API, swappable
    WAHA (QR sessions) and the official Cloud API have different trade-offs in cost, reliability and onboarding. One contract with two implementations lets each business pick the right rail without touching the agent core.
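
The "one contract, two implementations" decision can be sketched as below. Everything here is hypothetical: the `WhatsAppGateway` interface, its single `sendText` method and the per-business lookup are illustrative names, and both implementations are stubs that record what they would send instead of calling WAHA or the Cloud API, whose real request formats are not shown in this case study.

```typescript
// Hypothetical contract; the stubs below do not call any real WhatsApp API.
type Provider = "waha" | "cloud-api";

interface OutboundText {
  to: string;
  body: string;
}

interface WhatsAppGateway {
  readonly provider: Provider;
  sendText(message: OutboundText): Promise<string>; // resolves to a message id
}

const sentLog: string[] = []; // stands in for the network in this sketch

class WahaGateway implements WhatsAppGateway {
  readonly provider: Provider = "waha";
  sendText(message: OutboundText): Promise<string> {
    sentLog.push(`waha -> ${message.to}: ${message.body}`);
    return Promise.resolve(`waha-${sentLog.length}`);
  }
}

class CloudApiGateway implements WhatsAppGateway {
  readonly provider: Provider = "cloud-api";
  sendText(message: OutboundText): Promise<string> {
    sentLog.push(`cloud-api -> ${message.to}: ${message.body}`);
    return Promise.resolve(`cloud-${sentLog.length}`);
  }
}

// Assumed per-business setting; in practice this would come from tenant configuration.
const providerByBusiness = new Map<string, Provider>([
  ["biz-a", "waha"],
  ["biz-b", "cloud-api"],
]);

function gatewayFor(businessId: string): WhatsAppGateway {
  const provider = providerByBusiness.get(businessId);
  if (provider === undefined) throw new Error(`no WhatsApp provider for ${businessId}`);
  return provider === "waha" ? new WahaGateway() : new CloudApiGateway();
}

// The agent core depends only on the interface, never on a concrete provider.
function replyTo(businessId: string, to: string, body: string): Promise<string> {
  return gatewayFor(businessId).sendText({ to, body });
}
```

Switching a business from one rail to the other then means changing its entry in the configuration, with no change to `replyTo` or anything upstream of it.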

In production handling ~49 conversations a day, today running Batameta's own customer support - built solo in 5 months: a multi-tenant agent running on a cost-efficient model, anti-hallucination guardrails that hand unbacked facts to a human, a lean RAG document learner, a React Flow visual flow builder, and a prompt test suite scored by an AI judge.