Kwasi — Personal Assistant
Kwasi is a personal AI assistant available via Telegram (primary), WhatsApp, and CLI. It handles scheduling, email, notes, tasks, reminders, web search, weather, GitHub, browser automation, and more — all through natural conversation.
This documentation covers how the system actually works: its architecture, data flows, memory model, tools, and deployment setup.
At a Glance
| Concern | Choice |
|---|---|
| Framework | Pydantic AI — agent loop, tool registration, streaming |
| LLM | Three model roles: MODEL_NAME (primary), MINI_MODEL_NAME (skill synthesis), REFLECTION_MODEL_NAME (nightly reflection) — supports Gemini, Claude, GPT-4 |
| Primary interface | Telegram (long-polling via python-telegram-bot) |
| Secondary interface | WhatsApp webhook |
| Web framework | FastAPI — health check, reflection endpoint, WhatsApp webhook, dashboard |
| Storage | SQLite (local dev) / PostgreSQL (Railway production) |
| Email & calendar | Gmail (personal + work) + Outlook via MCP tool wrappers |
| Multimodal | Gemini Vision/STT — images, PDFs, voice messages |
| Voice output | Microsoft Neural TTS via edge-tts — voice-in → voice-out |
| Browser | Playwright (headless Chromium) — JS-rendered pages + form submission |
| GitHub | PyGithub — PRs, issues, notifications, repo summaries |
| Jira | Jira Cloud REST API — issues, projects, JQL search, create/update tickets |
| Google Drive | Drive API — search, read, list recent, meeting transcripts (personal + work) |
| Slack | Slack SDK — read channels, unreads, post messages, search history |
| Intent routing | Keyword classifier → 12-domain agent dispatch (email, calendar, memory, github, news, slack, jira, drive, meetings, diagnostics, health, utility); context-aware follow-up inheritance; semantic fallback |
| Plugin system | File-drop skill registry — drop a .py into app/skills/ — built-ins: Content Curator, Travel Briefing, CV, Deep Research, Meeting Intelligence |
| Personality | Explicit directives in system prompt: direct voice, Ghanaian cultural context, banned apology openers, retry-first error handling, situational tone calibration |
| Semantic search | Gemini gemini-embedding-001 (3072 dims) — vector embeddings stored in SQLite (JSON) / PostgreSQL (pgvector) for meaning-based search across notes, history, and saved articles |
| Wearable / health data (Spec 010) | Sideloaded Kotlin Android bridge (bridge-android/) reads Google Health Connect every ~15 min and sends the readings to POST /health/ingest. Samples are stored in a single normalised health_samples table and queried by a dedicated health_agent |
| Deployment | Railway (Dockerfile) |
Key Design Principles
Single AgentDeps instance. All interfaces (Telegram, WhatsApp, CLI) use the same build_deps() factory and share one AgentDeps object per process. Each request gets a shallow per-request copy (via dataclasses.replace) with the correct telegram_chat_id and active_categories — the shared singleton is never mutated.
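A minimal sketch of the per-request copy, assuming AgentDeps is a dataclass. Only build_deps, telegram_chat_id, active_categories, and dataclasses.replace come from the description above; the storage field and the deps_for_request helper are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class AgentDeps:
    # "storage" is a hypothetical field standing in for the shared resources.
    storage: object
    telegram_chat_id: Optional[int] = None
    active_categories: frozenset = frozenset()


def build_deps() -> AgentDeps:
    """Stand-in for the real factory: build the process-wide instance once."""
    return AgentDeps(storage={"kind": "sqlite"})


SHARED_DEPS = build_deps()


def deps_for_request(chat_id: int, categories: set) -> AgentDeps:
    # replace() returns a new object; SHARED_DEPS itself is left untouched.
    return replace(
        SHARED_DEPS,
        telegram_chat_id=chat_id,
        active_categories=frozenset(categories),
    )


request_deps = deps_for_request(12345, {"email", "calendar"})
```

Because the copy is shallow, request_deps.storage is the same object as SHARED_DEPS.storage, so heavy resources such as connections are shared while the per-request fields differ.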
Fail-safe everything. Storage errors never surface to the user. Tool failures are reported gracefully. The system prompt context fetch is wrapped in try/except so a broken DB never kills the agent.
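The guarded context fetch might look roughly like the sketch below. All names here (fetch_user_context, safe_user_context, get_user_facts, and the two storage classes) are hypothetical; the source only states that the fetch is wrapped in try/except.

```python
import logging

logger = logging.getLogger("kwasi.context")


def fetch_user_context(storage) -> str:
    """Hypothetical helper: read facts from storage and format them for the prompt."""
    facts = storage.get_user_facts()
    return "\n".join(f"- {fact}" for fact in facts)


def safe_user_context(storage) -> str:
    """Return prompt context, or an empty string if storage is unavailable."""
    try:
        return fetch_user_context(storage)
    except Exception:
        # Log for the operator; the agent still runs, just without memory context.
        logger.exception("context fetch failed; continuing without it")
        return ""


class BrokenStorage:
    def get_user_facts(self):
        raise ConnectionError("database unreachable")


class WorkingStorage:
    def get_user_facts(self):
        return ["prefers morning meetings"]
```

With this shape, a database outage degrades the answer quality instead of aborting the agent run.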
Storage is pluggable. A StoragePort protocol defines the interface. SqliteAdapter and PostgresAdapter are drop-in implementations. Switching is a single env var change (STORAGE_BACKEND=postgres). Both adapters also implement vector similarity search — Python cosine in SQLite, pgvector HNSW in Postgres.
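To illustrate the protocol idea, here is a sketch of a port with one in-memory adapter that scores vectors with plain-Python cosine similarity, in the spirit of the SQLite path. The method names add_embedding and search_similar and the InMemoryAdapter class are assumptions; the source names only StoragePort, SqliteAdapter, and PostgresAdapter.

```python
import math
from typing import List, Protocol, Sequence, Tuple


class StoragePort(Protocol):
    # Hypothetical method names; the real protocol is likely much larger.
    def add_embedding(self, item_id: str, vector: Sequence[float]) -> None: ...

    def search_similar(
        self, query: Sequence[float], limit: int
    ) -> List[Tuple[str, float]]: ...


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryAdapter:
    """Stand-in adapter: stores vectors in a dict and ranks them in Python."""

    def __init__(self) -> None:
        self._vectors = {}

    def add_embedding(self, item_id: str, vector: Sequence[float]) -> None:
        self._vectors[item_id] = list(vector)

    def search_similar(
        self, query: Sequence[float], limit: int
    ) -> List[Tuple[str, float]]:
        scored = [
            (item_id, cosine_similarity(query, vec))
            for item_id, vec in self._vectors.items()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:limit]


store: StoragePort = InMemoryAdapter()
store.add_embedding("note-a", [1.0, 0.0])
store.add_embedding("note-b", [0.0, 1.0])
top_hits = store.search_similar([1.0, 0.1], limit=1)
```

Callers depend only on StoragePort, so an adapter backed by an index (such as pgvector HNSW) can replace the linear scan without touching agent code.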
Three-tier memory pipeline. Facts stated in conversation are extracted within seconds (post-message mini-model pass). Sessions are summarised ~30 minutes after going quiet (session-close mini-model pass). A nightly Reflection Engine distils the full history into a narrative profile, intentions, and behavioral learnings. All tiers feed the same user_facts and notes tables — the agent always has the freshest possible context. Before each agent run, up to three semantic retrieval layers (notes, summaries, saved articles) inject relevant context into the user turn, sharing a 1,000-token budget with XML tagging so the model can distinguish retrieved memory from instructions.
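The shared retrieval budget could be enforced along the lines below. This is a sketch under stated assumptions: the tag names, the characters-per-token heuristic, and the build_retrieved_context helper are all illustrative, and the real system presumably uses a proper tokenizer.

```python
from xml.sax.saxutils import escape

TOKEN_BUDGET = 1000


def estimate_tokens(text: str) -> int:
    # Crude heuristic (about 4 characters per token); an assumption, not the real tokenizer.
    return max(1, len(text) // 4)


def build_retrieved_context(layers: dict, budget: int = TOKEN_BUDGET) -> str:
    """Wrap snippets from each layer in XML tags until the shared budget runs out."""
    remaining = budget
    blocks = []
    for tag, snippets in layers.items():
        kept = []
        for snippet in snippets:
            cost = estimate_tokens(snippet)
            if cost > remaining:
                break  # this layer gets nothing more; later layers may still fit
            # Escape so retrieved text cannot close the tag it is wrapped in.
            kept.append(escape(snippet))
            remaining -= cost
        if kept:
            body = "\n".join(kept)
            blocks.append(f"<{tag}>\n{body}\n</{tag}>")
    return "\n".join(blocks)


context = build_retrieved_context(
    {
        "relevant_notes": ["Dentist appointment moved to Friday."],
        "session_summaries": ["Discussed travel plans for Accra in March."],
        "saved_articles": ["x" * 8000],  # exceeds the remaining budget, so it is dropped
    }
)
```

Layers are filled in dictionary order, so earlier layers get first claim on the budget; a different priority scheme would only change the iteration order.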
Background loops run inside FastAPI lifespan. The reminder loop, daily briefing, scheduled tasks loop, alert loop, and nightly reflection are asyncio tasks started on startup and cancelled on shutdown — no separate workers or queues.
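The start-on-startup, cancel-on-shutdown pattern can be sketched with the standard library alone. An async context manager of this shape is what FastAPI accepts through its lifespan argument, as in FastAPI(lifespan=lifespan); the loop names, intervals, and the run_loop helper here are illustrative stand-ins for the real loops.

```python
import asyncio
from contextlib import asynccontextmanager

ticks = []          # records which loops have run, for demonstration only
started_tasks = []  # kept so the demo can inspect task state afterwards


async def run_loop(name: str, interval: float) -> None:
    """Generic periodic loop; a real loop would check reminders, alerts, and so on."""
    while True:
        ticks.append(name)
        await asyncio.sleep(interval)


@asynccontextmanager
async def lifespan(app):
    # Startup: one asyncio task per background loop.
    tasks = [
        asyncio.create_task(run_loop(name, 0.01))
        for name in ("reminders", "briefing", "alerts")
    ]
    started_tasks.extend(tasks)
    try:
        yield
    finally:
        # Shutdown: cancel every loop and wait for the cancellations to land.
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)


async def demo() -> None:
    # Simulates an application that stays up briefly and then shuts down.
    async with lifespan(app=None):
        await asyncio.sleep(0.05)


if __name__ == "__main__":
    asyncio.run(demo())
```

The trade-off of this design is that the loops share the web process: there is nothing extra to deploy, but a restart of the process also restarts every loop.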
Extensible by file-drop. Adding a new skill requires only creating a file in app/skills/ — no changes to core agent code.
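One way a file-drop registry can work is to import every module in the skills directory at startup. The sketch below assumes a convention in which each skill file exposes a SKILL_NAME string and a run function; that convention and the load_skills helper are hypothetical, not the project's actual interface.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_skills(skills_dir: Path) -> dict:
    """Import each .py file in skills_dir and register modules that define a skill."""
    registry = {}
    for path in sorted(skills_dir.glob("*.py")):
        if path.name.startswith("_"):
            continue  # skip __init__.py and private helpers
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        try:
            spec.loader.exec_module(module)
        except Exception:
            continue  # one broken skill file should not stop the others loading
        name = getattr(module, "SKILL_NAME", None)
        run = getattr(module, "run", None)
        if name and callable(run):
            registry[name] = run
    return registry


# Demonstration: "drop" a skill file into a temporary directory and load it.
with tempfile.TemporaryDirectory() as tmp:
    skill_file = Path(tmp) / "echo_skill.py"
    skill_file.write_text(
        'SKILL_NAME = "echo"\n\ndef run(text):\n    return text.upper()\n'
    )
    skills = load_skills(Path(tmp))
    result = skills["echo"]("hello")
```

The core code never names an individual skill, which is what makes adding one a matter of creating a file.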
Intent-routed for efficiency. Every message (Telegram and WhatsApp) is classified by a keyword matcher in microseconds. The message is then handled by a focused domain agent carrying only its relevant tools — reducing token usage per call and improving model accuracy. The full agent is always the fallback for ambiguous or cross-domain requests.
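A keyword classifier with a full-agent fallback can be sketched as follows. The keyword sets and the classify function are illustrative; only the domain names come from the table above, and the follow-up inheritance and semantic fallback mentioned there are left out.

```python
import re

# Illustrative keyword sets for three of the twelve domains.
DOMAIN_KEYWORDS = {
    "email": {"email", "inbox", "gmail", "outlook"},
    "calendar": {"calendar", "schedule", "appointment"},
    "github": {"github", "pr", "repo"},
}
FALLBACK = "full"  # hypothetical label for the full agent


def classify(message: str) -> str:
    """Return the single matching domain, or FALLBACK when none or several match."""
    words = set(re.findall(r"[a-z0-9]+", message.lower()))
    matches = [
        domain for domain, keywords in DOMAIN_KEYWORDS.items() if words & keywords
    ]
    return matches[0] if len(matches) == 1 else FALLBACK


single_domain = classify("Any unread email from Ama?")
cross_domain = classify("Email the repo link and add it to my calendar")
no_match = classify("Tell me a joke")
```

Set intersection on tokenised words keeps classification cheap, and sending multi-domain matches to the full agent avoids guessing which focused agent should own a cross-domain request.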
Quick Navigation
- Curriculum — Pedagogic walkthrough of the whole codebase in 8 sessions (start here if you want to understand the system)
- Architecture — Component diagram and how the pieces fit together
- Message Lifecycle — Sequence diagrams for text, voice, image, and document messages
- Memory & Reflection — How short-term history and long-term context work
- Tools Reference — Every tool available to the agent (native, skills, MCP, GitHub, browser)
- Background Loops — Reminders, briefing, scheduled tasks, nightly reflection
- Storage Layer — Schema, adapters, and the storage protocol
- Deployment — Railway setup and full environment variable reference