Kwasi — Personal Assistant

Kwasi is a personal AI assistant available via Telegram (primary), WhatsApp, and CLI. It handles scheduling, email, notes, tasks, reminders, web search, weather, GitHub, browser automation, and more — all through natural conversation.

This documentation covers how the system actually works: its architecture, data flows, memory model, tools, and deployment setup.

At a Glance

Concern	Choice
Framework	Pydantic AI — agent loop, tool registration, streaming
LLM	Three model roles: `MODEL_NAME` (primary), `MINI_MODEL_NAME` (skill synthesis), `REFLECTION_MODEL_NAME` (nightly reflection) — supports Gemini, Claude, GPT-4
Primary interface	Telegram (long-polling via `python-telegram-bot`)
Secondary interface	WhatsApp webhook
Web framework	FastAPI — health check, reflection endpoint, WhatsApp webhook, dashboard
Storage	SQLite (local dev) / PostgreSQL (Railway production)
Email & calendar	Gmail (personal + work) + Outlook via MCP tool wrappers
Multimodal	Gemini Vision/STT — images, PDFs, voice messages
Voice output	Microsoft Neural TTS via `edge-tts` — voice-in → voice-out
Browser	Playwright (headless Chromium) — JS-rendered pages + form submission
GitHub	PyGitHub — PRs, issues, notifications, repo summaries
Jira	Jira Cloud REST API — issues, projects, JQL search, create/update tickets
Google Drive	Drive API — search, read, list recent, meeting transcripts (personal + work)
Slack	Slack SDK — read channels, unreads, post messages, search history
Intent routing	Keyword classifier → 12-domain agent dispatch (email, calendar, memory, github, news, slack, jira, drive, meetings, diagnostics, health, utility); context-aware follow-up inheritance; semantic fallback
Plugin system	File-drop skill registry: drop a `.py` file into `app/skills/`. Built-ins: Content Curator, Travel Briefing, CV, Deep Research, Meeting Intelligence
Personality	Explicit directives in system prompt: direct voice, Ghanaian cultural context, banned apology openers, retry-first error handling, situational tone calibration
Semantic search	Gemini `gemini-embedding-001` (3072 dims) — vector embeddings stored in SQLite (JSON) / PostgreSQL (`pgvector`) for meaning-based search across notes, history, and saved articles
Wearable / health data (Spec 010)	Sideloaded Kotlin Android bridge (`bridge-android/`) reads Google Health Connect every ~15 min and POSTs to `POST /health/ingest`. Stored as a single normalised `health_samples` table; queried by a dedicated `health_agent`
Deployment	Railway (Dockerfile)

Key Design Principles

Single `AgentDeps` instance. All interfaces (Telegram, WhatsApp, CLI) use the same `build_deps()` factory and share one `AgentDeps` object per process. Each request gets a shallow per-request copy (via `dataclasses.replace`) with the correct `telegram_chat_id` and `active_categories`, so the shared singleton is never mutated.
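The per-request copy can be sketched as follows. This is a minimal illustration, not the project's actual code: the `storage` field, the module-level `_shared` cache, and the `deps_for_request` helper are assumptions made for the example; only `AgentDeps`, `build_deps()`, `telegram_chat_id`, `active_categories`, and the use of `dataclasses.replace` come from the description above.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass
class AgentDeps:
    storage: object                               # placeholder for the shared storage adapter (assumed field)
    telegram_chat_id: Optional[int] = None
    active_categories: Tuple[str, ...] = ()


_shared: Optional[AgentDeps] = None               # hypothetical process-wide cache


def build_deps() -> AgentDeps:
    """Return the one shared AgentDeps object, creating it on first use."""
    global _shared
    if _shared is None:
        _shared = AgentDeps(storage=object())
    return _shared


def deps_for_request(chat_id: int, categories: Tuple[str, ...]) -> AgentDeps:
    """Hypothetical helper: shallow copy with per-request fields filled in."""
    return replace(build_deps(), telegram_chat_id=chat_id, active_categories=categories)
```

Because `dataclasses.replace` builds a new object and copies field references, the request copy shares the same `storage` instance while the singleton's own `telegram_chat_id` stays unset.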

Fail-safe everything. Storage errors never surface to the user. Tool failures are reported gracefully. The system prompt context fetch is wrapped in `try`/`except` so a broken DB never kills the agent.
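A guarded context fetch along these lines might look like the sketch below. The function names, the `get_user_facts` method, and the choice to fall back to an empty string are all assumptions for illustration; the source only states that the fetch is wrapped in `try`/`except`.

```python
import logging

logger = logging.getLogger("kwasi.sketch")        # hypothetical logger name


def fetch_context(storage) -> str:
    """Hypothetical helper: load stored facts for the system prompt."""
    return "\n".join(storage.get_user_facts())


def safe_context(storage) -> str:
    """Return prompt context, or an empty string if storage is broken."""
    try:
        return fetch_context(storage)
    except Exception:
        # Log for operators, but let the agent run without extra context.
        logger.exception("context fetch failed; continuing without it")
        return ""


class BrokenStorage:
    """Stand-in for a database that is down."""

    def get_user_facts(self):
        raise RuntimeError("database unavailable")


class WorkingStorage:
    """Stand-in for a healthy database."""

    def get_user_facts(self):
        return ["prefers morning meetings", "lives in Accra"]
```

With this shape, `safe_context(BrokenStorage())` yields an empty string instead of raising, so the agent run can proceed with a leaner prompt.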

Storage is pluggable. A `StoragePort` protocol defines the interface, and `SqliteAdapter` and `PostgresAdapter` are drop-in implementations. Switching backends is a single env var change (`STORAGE_BACKEND=postgres`). Both adapters also implement vector similarity search: cosine similarity computed in Python for SQLite, and a `pgvector` HNSW index in Postgres.
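The SQLite side of that design (embeddings stored as JSON, similarity computed in Python) can be sketched as below. The protocol method `search_similar`, the `notes` table layout, and the `SqliteSketchAdapter` class are hypothetical; the real `StoragePort` interface is not shown in this page.

```python
import json
import math
import sqlite3
from typing import List, Protocol, Tuple


class StoragePort(Protocol):
    # Assumed method name and signature, for illustration only.
    def search_similar(self, query: List[float], limit: int) -> List[Tuple[str, float]]: ...


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SqliteSketchAdapter:
    """Toy adapter: embeddings live in a TEXT column as JSON arrays."""

    def __init__(self) -> None:
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE notes (id TEXT PRIMARY KEY, embedding TEXT)")

    def add(self, note_id: str, embedding: List[float]) -> None:
        self.db.execute("INSERT INTO notes VALUES (?, ?)", (note_id, json.dumps(embedding)))

    def search_similar(self, query: List[float], limit: int) -> List[Tuple[str, float]]:
        rows = self.db.execute("SELECT id, embedding FROM notes").fetchall()
        scored = [(note_id, cosine(query, json.loads(raw))) for note_id, raw in rows]
        scored.sort(key=lambda pair: pair[1], reverse=True)   # best match first
        return scored[:limit]
```

A full scan like this is linear in the number of rows, which is usually acceptable for a single-user local database; the Postgres path described above delegates the same query to an index instead.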

Three-tier memory pipeline. Facts stated in conversation are extracted within seconds (post-message mini-model pass). Sessions are summarised ~30 minutes after going quiet (session-close mini-model pass). A nightly Reflection Engine distils the full history into a narrative profile, intentions, and behavioural learnings. All tiers feed the same `user_facts` and `notes` tables, so the agent always has the freshest possible context. Before each agent run, up to three semantic retrieval layers (notes, summaries, saved articles) inject relevant context into the user turn. The layers share a 1,000-token budget, and each is wrapped in XML tags so the model can distinguish retrieved memory from instructions.
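The shared budget and XML tagging could work roughly as in this sketch. The tag names, the four-characters-per-token estimate, and the first-come allocation order are assumptions; the source specifies only the three layers, the 1,000-token budget, and the use of XML tags.

```python
from typing import List, Sequence, Tuple


def estimate_tokens(text: str) -> int:
    """Crude token estimate (about 4 characters per token); an assumption."""
    return max(1, len(text) // 4)


def build_user_turn(
    message: str,
    layers: Sequence[Tuple[str, List[str]]],
    budget: int = 1000,
) -> str:
    """Prefix the user message with tagged retrieval layers within one budget."""
    parts: List[str] = []
    remaining = budget
    for tag, snippets in layers:
        kept: List[str] = []
        for snippet in snippets:
            cost = estimate_tokens(snippet)
            if cost > remaining:
                break                      # this layer gets no more room
            kept.append(snippet)
            remaining -= cost
        if kept:
            parts.append(f"<{tag}>\n" + "\n".join(kept) + f"\n</{tag}>")
    parts.append(message)                  # the actual user message goes last
    return "\n\n".join(parts)
```

Layers listed earlier are served first here, so the ordering of `layers` doubles as a priority; a real implementation might instead rank all snippets by similarity before spending the budget.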

Background loops run inside FastAPI lifespan. The reminder loop, daily briefing, scheduled tasks loop, alert loop, and nightly reflection are `asyncio` tasks created at startup and cancelled on shutdown, with no separate workers or queues.
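A lifespan handler of this kind can be sketched with the standard library alone. The `reminder_loop` body and its interval are placeholders; in a FastAPI app the context manager would be passed as `FastAPI(lifespan=lifespan)`, which is omitted here so the example runs without third-party packages.

```python
import asyncio
from contextlib import asynccontextmanager


async def reminder_loop(interval: float = 60.0) -> None:
    """Placeholder loop; a real one would check for due reminders each pass."""
    while True:
        await asyncio.sleep(interval)


@asynccontextmanager
async def lifespan(app):
    # Startup: launch background loops as tasks on the running event loop.
    tasks = [asyncio.create_task(reminder_loop(), name="reminder_loop")]
    try:
        yield                              # the application serves requests here
    finally:
        # Shutdown: cancel every loop and wait for the cancellations to land.
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
```

Keeping the loops in-process trades resilience for simplicity: there is nothing extra to deploy, but the loops stop whenever the web process does.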

Extensible by file-drop. Adding a new skill requires only creating a file in `app/skills/`; no changes to core agent code are needed.
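One way such discovery can work is sketched below. The `SKILL` module attribute and the `load_skills` function are hypothetical conventions chosen for the example; how the real registry identifies a skill inside a dropped file is not described here.

```python
import importlib.util
from pathlib import Path
from typing import Any, Dict


def load_skills(skills_dir: Path) -> Dict[str, Any]:
    """Import every public .py file in a directory and collect its SKILL object."""
    registry: Dict[str, Any] = {}
    for path in sorted(skills_dir.glob("*.py")):
        if path.name.startswith("_"):
            continue                        # skip __init__.py and private helpers
        spec = importlib.util.spec_from_file_location(f"skills_sketch.{path.stem}", path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)     # runs the file's top-level code
        skill = getattr(module, "SKILL", None)   # assumed convention
        if skill is not None:
            registry[path.stem] = skill
    return registry
```

Calling `load_skills(Path("app/skills"))` at startup would pick up any newly added file on the next restart, which is what makes the core agent code independent of the skill list.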

Intent-routed for efficiency. Every message (Telegram and WhatsApp) is classified by a keyword matcher in microseconds. The message is then handled by a focused domain agent carrying only its relevant tools — reducing token usage per call and improving model accuracy. The full agent is always the fallback for ambiguous or cross-domain requests.
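The routing idea can be illustrated with a small keyword matcher. The keyword sets and the `"full"` label are invented for the example, and only three of the twelve domains are shown; follow-up inheritance and the semantic fallback mentioned in the table above are not modelled.

```python
import re
from typing import Dict, FrozenSet

# Hypothetical keyword sets; the real classifier's vocabulary is not shown here.
DOMAIN_KEYWORDS: Dict[str, FrozenSet[str]] = {
    "email": frozenset({"email", "inbox", "mail"}),
    "calendar": frozenset({"calendar", "schedule", "appointment"}),
    "github": frozenset({"github", "repo", "issue"}),
}

FULL_AGENT = "full"


def classify(message: str) -> str:
    """Return one domain if exactly one matches, else fall back to the full agent."""
    words = set(re.findall(r"[a-z0-9]+", message.lower()))
    matches = [domain for domain, keywords in DOMAIN_KEYWORDS.items() if words & keywords]
    if len(matches) == 1:
        return matches[0]
    return FULL_AGENT                      # no match, or a cross-domain request
```

Matching on whole words rather than substrings avoids false hits such as "mail" inside "mailbox", at the cost of missing inflected forms; a production matcher would likely need stemming or a larger vocabulary.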

Quick Navigation

Curriculum: pedagogic walkthrough of the whole codebase in 8 sessions (start here if you want to understand the system)
Architecture: component diagram and how the pieces fit together
Message Lifecycle: sequence diagrams for text, voice, image, and document messages
Memory & Reflection: how short-term history and long-term context work
Tools Reference: every tool available to the agent (native, skills, MCP, GitHub, browser)
Background Loops: reminders, briefing, scheduled tasks, nightly reflection
Storage Layer: schema, adapters, and the storage protocol
Deployment: Railway setup and full environment variable reference
