
Tools Reference

Tool registration moved (2026-06 cleanup)

Each tool's @agent.tool wrapper now lives in app/agent_tools/<domain>.py (the module exposes register(agent); agent.py calls it and re-exports the function). The underlying implementations are still in app/tools/. So "the <x> tool" = agent_tools/<domain>.py (wrapper) → app/tools/<x>.py (logic).

This page lists every tool available to the agent. Before each request, the intent router (app/tools/router.py) classifies the message and routes it to the appropriate domain agent: an Agent instance pre-built with only the relevant tool subset. The LLM then decides autonomously which tools to call within that subset.

For multi-step messages on Telegram, a planning gate runs before intent routing — see Architecture → Planning Layer and the Multi-step plan flow sequence diagram. A simple message bypasses the gate at zero LLM cost; a complex one is decomposed, previewed for approval, and executed step-by-step with each step routed independently through the same classify_intent + select_agent pair documented below.

Intent Routing

classify_intent(message, context_hint=None) → set[str] — three-stage classifier:

  1. Keyword match (microseconds, zero LLM cost): checks per-domain keyword lists. Logs keyword_hits to Langfuse.
  2. Semantic fallback with confidence bands: when keywords find nothing, embeds the message and compares against per-domain anchor embeddings:
     - ≥ 0.65 → single best domain (high confidence)
     - 0.45 – 0.65 → top-K within 0.10 of best, capped at 3 → composed agent so the LLM picks across candidates
     - < 0.45 → no confident match
     Logs semantic_top3: [{domain, score}] per turn for empirical tuning.
  3. Context-hint inheritance: when both miss, an optional context_hint (single non-utility domain from the previous turn) is inherited so short follow-ups like "anything else?" stay in the same domain agent.
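The confidence bands in stage 2 can be sketched as a small pure function. This is an illustration only, not the code in app/tools/router.py: the name pick_domains and the constant names are hypothetical, and it assumes the per-domain similarity scores have already been computed. Whether mid-band candidates must themselves score at least 0.45 is not stated above, so the sketch applies only the "within 0.10 of best" rule.

```python
from typing import Mapping

HIGH = 0.65    # at or above: single best domain
LOW = 0.45     # below: no confident match
MARGIN = 0.10  # mid band: keep domains within this distance of the best score
TOP_K = 3      # mid band: cap on the number of candidate domains


def pick_domains(scores: Mapping[str, float]) -> set[str]:
    """Map per-domain similarity scores to a set of candidate domains."""
    if not scores:
        return set()
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best_domain, best = ranked[0]
    if best >= HIGH:
        return {best_domain}
    if best < LOW:
        return set()
    # Mid band: several close candidates, so let the LLM pick across them.
    close = [domain for domain, score in ranked if best - score <= MARGIN]
    return set(close[:TOP_K])
```

An empty result here corresponds to the point where stage 3 (context-hint inheritance) would take over.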

The three-band design replaced an earlier binary ≥ 0.60 cutoff that produced misroutes when borderline-scoring queries fell through to a wrong agent (or to the full agent without the right tools), causing the model to claim "I don't have a tool" for capabilities that existed.

select_agent(categories) → Agent:

Detected categories Agent used
{} (conversational) full_agent
{"utility"} utility_agent
{"email", "utility"} email_agent
{"calendar", "utility"} calendar_agent
{"memory", "utility"} memory_agent
{"github", "utility"} github_agent
{"news", "utility"} news_agent
{"slack", "utility"} slack_agent
{"jira", "utility"} jira_agent
{"drive", "utility"} drive_agent
{"meetings", "utility"} meeting_agent
{"diagnostics", "utility"} diagnostics_agent
{"health", "utility"} health_agent
Two or more non-utility domains composed_agent (union of relevant tool sets, cached by frozenset)

The full_agent (defined in app/agent.py) is always the safe fallback.
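The mapping above might be expressed roughly as follows. This is a hedged sketch: agents are represented by plain strings rather than real Agent instances, and DOMAIN_AGENTS, _compose, and select_agent_sketch are hypothetical names. It only illustrates the rule that utility never counts as a domain on its own and that composed agents are cached by frozenset.

```python
from functools import lru_cache

# Stand-in registry: real code would hold Agent instances, not strings.
DOMAIN_AGENTS = {
    "email": "email_agent",
    "calendar": "calendar_agent",
    "memory": "memory_agent",
    "github": "github_agent",
    "news": "news_agent",
    "slack": "slack_agent",
    "jira": "jira_agent",
    "drive": "drive_agent",
    "meetings": "meeting_agent",
    "diagnostics": "diagnostics_agent",
    "health": "health_agent",
}


@lru_cache(maxsize=None)
def _compose(domains: frozenset[str]) -> str:
    # frozenset is hashable, so each domain combination is built only once.
    return "composed_agent(" + "+".join(sorted(domains)) + ")"


def select_agent_sketch(categories: set[str]) -> str:
    if not categories:
        return "full_agent"  # conversational turn
    domains = frozenset(categories) - {"utility"}
    if not domains:
        return "utility_agent"
    if len(domains) == 1:
        (domain,) = domains
        return DOMAIN_AGENTS.get(domain, "full_agent")  # safe fallback
    return _compose(domains)
```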

Follow-up action keywords: the memory domain also matches short follow-up phrases like "delete it", "remove that", "edit it", "mark it done" — ensuring these reliably route to the memory agent even without an explicit subject noun.



Native Tools

Each tool's @agent.tool wrapper lives in its app/agent_tools/<domain>.py module (via that module's register(agent)); app/agent.py calls them and re-exports the functions. Implementations are in app/tools/.

Web & Information

Tool Function Requires
search_web Web search via Tavily API TAVILY_API_KEY
check_weather Current weather + forecast for a location WEATHERAPI_KEY
get_datetime Current date, time, day, timezone info
summarize_url Fetch and extract readable content from a URL (lightweight HTTP, no JS)
Tool Function Requires
find_everything Search all sources simultaneously — notes, tasks, history, Gmail, Outlook — in a single parallel call. Also runs a semantic (meaning-based) search in parallel and appends any new matches not already found by keywords. Returns results grouped by source. Use when the source is unknown ("find anything about X").
semantic_search Search notes, conversation history, and saved articles by meaning rather than keywords. Use for conceptual queries — "anything about my career", "conversations where I mentioned burnout". Requires GOOGLE_API_KEY. Falls back gracefully if unavailable. GOOGLE_API_KEY

Maps

Powered by the Google Maps Platform (Places, Directions, Geocoding APIs). Requires GOOGLE_MAPS_API_KEY. All three tools degrade gracefully with a clear message when the key is absent. Travel modes support natural-language aliases (e.g. car → driving, metro → transit, bike → bicycling).

Tool Function
search_places Text-search for places near an optional location. Returns name, address, rating, and open/closed status for up to 10 results.
get_directions Step-by-step directions between two points. Returns duration, distance, route summary, and first 6 steps (HTML stripped).
get_travel_time Concise duration + distance for a single mode — use this when you only need the ETA, not full turn-by-turn directions.

Paris Transit

Real-time traffic status for Île-de-France public transport via the IDFM Prim' SIRI general-message API. Requires IDFM_API_KEY (free registration at prim.iledefrance-mobilites.fr). Gracefully degrades if the key is absent.

Tool Function
check_transit_status Get live disruption status for one or more lines, or all major lines serving a location.

Two modes:

By line: lines=["RER A", "Metro 14", "Bus 95", "Transilien J"]
- Hardcoded registry for RER (A–E), all 16 Métro lines, and 8 Transilien lines (single API call).
- Any bus, tram, or Noctilien line resolved dynamically via filter=line.code=X (2 API calls; prefers RATP when multiple operators share a number, otherwise shows all matching networks).

By location: location="Sartrouville"
- Searches IDFM places API for the nearest stop area.
- Returns full disruption detail for RER/Transilien/Métro lines serving that area.
- Lists local bus/tram lines in a summary without fetching individual status.

Output format: Active disruptions (⚠️) are shown separately from upcoming planned works (📅). "✅ Running normally" when no active issues.


Browser Automation

Powered by Playwright (headless Chromium). Use for JavaScript-rendered pages where summarize_url fails. Respects BROWSER_ALLOWED_DOMAINS if set. Gracefully degrades to an error message if Playwright is not installed.

Tool Function
browse_web Navigate to a URL and extract text content. Targets semantic containers (<main>, <article>) and extracts h1/h2/h3 headlines.
browser_submit_form Navigate to a URL, fill form fields by CSS selector, submit, and return the result page text. Useful for search forms and filter UIs.

GitHub

Requires GITHUB_TOKEN (Personal Access Token with repo + notifications scope). All tools degrade gracefully with a clear message if the token is not set.

Tool Function
github_my_prs List your own PRs across all repos (filter by state: open/closed/all)
github_repo_prs List PRs for a specific repo (owner/repo)
github_my_issues List issues assigned to you across all repos
github_repo_issues List issues for a specific repo
github_create_issue Create a new issue with optional body and labels
github_merge_pr Merge a PR (squash) — approval-gated; refuses repos outside the allowlisted owner
github_pr_details Get diff stats, review status, and description for a specific PR
github_notifications List unread GitHub notifications (mentions, PR reviews, CI failures)
github_repo_summary Stars, forks, open PRs, open issues, language for a repo
github_review_pr Run a structured PR review for a single URL or owner/repo#N (manual entry to the same pipeline as the daily PR-review loop). Read-only — never writes back to GitHub
github_pending_pr_reviews List PRs awaiting Lawrence's review across allowlisted orgs (Gmail + GitHub API discovery, deduped)

Slack

Requires SLACK_BOT_TOKEN (Bot User OAuth Token, xoxb-...). All tools degrade gracefully with a clear message if the token is not set. The bot must be invited to any channel it needs to read or post in.

Tool Function
slack_list_channels List all public channels the bot can see
slack_read_channel Read recent messages from a channel (by name or ID)
slack_get_unreads Get unread messages across all channels
slack_post_message Post a message to a channel (always confirm before posting)
slack_search_messages Full-text search across all Slack message history

Jira

Requires JIRA_BASE_URL, JIRA_EMAIL, and JIRA_API_TOKEN (API token from id.atlassian.com/manage-api-tokens). All tools degrade gracefully if any credential is unset. Routed via the jira intent domain — keywords include "jira", "ticket", "sprint", "KHA-", "backlog".

Tool Function
jira_my_issues List open issues assigned to you — use first when asked "what's on my plate in Jira"
jira_search_issues Arbitrary JQL query — use for sprint, project, or status filters
jira_get_issue Full details for one issue (e.g. KHA-123) including recent comments
jira_create_issue Create a new ticket — always confirm project_key, summary, and type before calling
jira_update_issue Update summary/description or transition status ("In Progress", "Done")
jira_list_projects List all accessible projects — use when unsure of the project key

Notes

Full CRUD on the notes table.

Tool What it does
save_note Create a new note with title + content
get_notes List all notes (newest first)
search_notes Case-insensitive search on title or content
edit_note Update title and content by ID (supports 8-char prefix)
delete_note Delete by ID

Tasks (Kwasi native)

Full CRUD on Kwasi's own tasks table. Use these for reminders and Kwasi-managed automations. For tasks you want visible in the Microsoft To Do app, use the todo_* MCP tools instead (see below).

Tool What it does
create_task Create with title and optional due_date (YYYY-MM-DD)
list_tasks List all tasks, optionally filtered by status ("todo"/"done")
search_tasks Case-insensitive search on title
complete_task Mark as done by ID
edit_task Update title and/or due_date by ID
delete_task Delete by ID

Reminders

Tool What it does
set_reminder Create a reminder — remind_at accepts natural language ("in 2 hours", "tomorrow at 9am") or ISO 8601
list_reminders Show pending reminders
cancel_reminder Cancel by ID

Scheduled Tasks

Persistent user-defined cron jobs. Evaluated every 60 seconds by _user_scheduled_tasks_loop. Requires BRIEFING_CHAT_ID to deliver results.

Tool What it does
create_scheduled_task Create a recurring task with a name, cron expression, and prompt to run
list_scheduled_tasks Show all scheduled tasks with their cron schedule and last run time
update_scheduled_task Change the name, prompt, cron expression, or enabled state of a task
delete_scheduled_task Remove a scheduled task permanently

Example: "Every Monday at 9am, summarise my open GitHub issues and tasks" → create_scheduled_task with cron 0 9 * * 1.
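To show how an expression like 0 9 * * 1 could be checked on each 60-second tick, here is a deliberately minimal matcher. It is a sketch under stated assumptions, not the logic in _user_scheduled_tasks_loop: it supports only * and single integers per field (no ranges, lists, or steps), and cron_matches is a hypothetical name.

```python
from datetime import datetime


def _field_matches(field: str, value: int) -> bool:
    # Only "*" and a single integer are supported in this simplified sketch.
    return field == "*" or int(field) == value


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if `when` falls on a minute matched by a 5-field cron expression."""
    minute, hour, day_of_month, month, day_of_week = expr.split()
    # Python: Monday=0 .. Sunday=6. Cron: Sunday=0, Monday=1 .. Saturday=6.
    cron_dow = (when.weekday() + 1) % 7
    return (
        _field_matches(minute, when.minute)
        and _field_matches(hour, when.hour)
        and _field_matches(day_of_month, when.day)
        and _field_matches(month, when.month)
        and _field_matches(day_of_week, cron_dow)
    )
```

Full cron syntax has more rules (for example, day-of-month and day-of-week combine differently when both are restricted), so a production loop would normally rely on a dedicated cron library.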

Alert Rules

Proactive alert rules stored in the alert_rules table. Evaluated every 5 minutes by _alert_loop. Requires BRIEFING_CHAT_ID. Two system defaults are seeded on first run ("Tasks due today" at 11am, "Overdue tasks" daily). Each rule has a cooldown_hours field to prevent repeat notifications.

Supported trigger types: task_due_today (fires at a configured local hour when To Do tasks are due today), task_overdue (fires when tasks are past due by N+ days). Phase 2 triggers (meeting_soon, email_arrived) are specced but deferred.

Alert messages are always informational — the agent informs and may suggest an action, never auto-acts.

Tool What it does
create_alert_rule Create a proactive alert rule with a trigger type and conditions JSON
list_alert_rules List all alert rules with enabled/disabled state and last fired time
update_alert_rule Change the name, conditions, cooldown, or enabled state of a rule
delete_alert_rule Remove an alert rule permanently
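The cooldown_hours check that prevents repeat notifications could look like the following. This is a minimal sketch: the function name and the idea of passing last_fired as an optional datetime are assumptions for illustration, not the schema of the alert_rules table.

```python
from datetime import datetime, timedelta
from typing import Optional


def cooldown_elapsed(
    last_fired: Optional[datetime], cooldown_hours: float, now: datetime
) -> bool:
    """Return True when a rule is allowed to fire again."""
    if last_fired is None:
        return True  # the rule has never fired
    return now - last_fired >= timedelta(hours=cooldown_hours)
```

A loop that runs every 5 minutes would evaluate the trigger condition first and call a check like this before sending anything.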

News

Personalised news feed powered by Tavily's news search mode. Stories are deduped against a 7-day rolling seen-URL store.

Tool What it does
follow_topic Add a topic to your followed list (stored lowercase)
unfollow_topic Remove a followed topic by name
list_topics Show all currently followed topics
get_news Fetch latest stories for all followed topics (or pass a topic for a one-off query). Filters already-seen URLs automatically.

History

Tool What it does
search_history Case-insensitive search over past user_message + agent_response — returns 5 most recent matches

Semantic fallback: when search_history returns nothing or misses the intent, the agent will automatically try semantic_search(sources=["interactions"]) — e.g. "find where I mentioned feeling burnt out" works even if those exact words weren't used.

Journal

Private journal stored in the journal_entries table. Entries are included in the nightly Reflection Engine context so they inform the long-term narrative profile and intention extraction. A weekly digest fires automatically on JOURNAL_DIGEST_DAY at JOURNAL_DIGEST_TIME (default Sunday 19:00 local).

Tool What it does
save_journal_entry(content, title) Save a journal entry. Gated by the inline approval flow on Telegram.
get_journal_digest(days=7) Return all journal entries from the last N days, newest first.

Intentions

Tracks soft personal commitments extracted from conversation by the nightly Reflection Engine. Follow-ups are sent proactively via Telegram (and WhatsApp if BRIEFING_WHATSAPP_NUMBER is set). Available on every domain agent via _UTILITY_FNS.

Tool What it does
list_intentions List tracked intentions, grouped by status (pending / snoozed). Shows age since mentioned.
resolve_intention Mark an intention as done by ID (supports 8-char prefix).
dismiss_intention Dismiss an intention — not going to act on it.
snooze_intention Snooze a follow-up for N days (1–30).

Lifecycle: pending → snoozed → resolved / dismissed. Resolved and dismissed intentions older than 90 days are excluded from list_intentions.

Daily cap: At most 2 follow-up messages per day to avoid being annoying.

Permanent User Facts

Stored in the user_facts table. Injected into every system prompt so Kwasi always knows them — no retrieval step required. Available on every domain agent (not just memory_agent).

Tool What it does
remember_fact(key, value, category) Upsert a permanent fact by key. Overwrites silently if the key exists. Called proactively by the agent when you share personal information, or explicitly when asked.
recall_facts(query="") List all facts grouped by category (empty query), or search keys and values.
forget_fact(key) Delete a fact by key — use when information has changed (moved house, changed job, etc.).

Key naming: snake_case, descriptive. Examples: home_address, workplace, partner_name, preferred_transport, dietary_restrictions, manager_name, birthday.

Categories: location, personal, preference, work, health, general.

Health Data (Spec 010)

Read-only access to wearable / Samsung Watch data ingested by the sideloaded bridge-android/ app. Data lives in the health_samples table (see Storage). The agent never writes — writes happen via POST /health/ingest from the bridge.

Routed via the health intent domain. Keywords include "how did i sleep", "my hrv", "resting heart rate", "step count", "my recovery", "my watch". Available on health_agent only (not on every domain agent — keeps health context out of unrelated prompts).

Tool What it does
get_recent_health(metric_type, days=7) Plain-text dump of recent samples for one metric type (e.g. "heart_rate", "steps"). Compact for LLM context — last 30 lines if more.
get_sleep_summary(days=7) Summary of recent sleep sessions: bedtime, wake time, duration, score (when available). Includes average duration + score over the window.
get_hrv_trend(days=30) HRV (RMSSD) trend with a rolling baseline and most-recent reading; reports the delta vs baseline.
get_health_snapshot() One-shot snapshot for briefings / "how am I doing today?" — last night's sleep + HRV vs baseline + resting HR + 24 h step count. Used by briefing_agent once Phase 3 ships.

Metric types read from Health Connect by the bridge: steps, sleep_session, heart_rate, hrv_rmssd, spo2, resting_hr, respiratory_rate, exercise, body_fat, weight, blood_pressure.

Phase 3 (planned, not yet shipped): wire get_health_snapshot into briefing_agent, inject a 7-day health block into the nightly reflection prompt, and extend the alert engine with a health_metric trigger_type so users can create rules like "alert me when 24h-mean HRV drops below 35" via the existing create_alert_rule tool.

Source Introspection

Read-only access to the deployed source tree. Lets agents verify what code is actually running before declining a request — e.g. grepping for a tool name to check whether a capability exists. REPO_ROOT auto-detects via pyproject.toml walk-up so paths resolve identically in local dev, CLI, and Docker. In production the tests/, specs/, docs/, .env* paths are excluded by .dockerignore. Registered in _UTILITY_FNS — available on every domain agent.

Tool What it does
read_source_file(path, offset=0, limit=500) Read up to limit lines of a file under repo root. Refuses paths that escape (.., absolute paths).
grep_source(pattern, path_glob="app/**/*.py", max_results=50) Python regex applied per line. Returns path:line:content matches. Use to verify a tool/function exists before saying it doesn't.
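The path-escape refusal in read_source_file can be illustrated with pathlib. The sketch below is an assumption about how such a guard might be written, not the deployed implementation; resolve_under_root is a hypothetical helper and it requires Python 3.9+ for Path.is_relative_to.

```python
from pathlib import Path


def resolve_under_root(repo_root: Path, user_path: str) -> Path:
    """Resolve user_path inside repo_root, refusing anything that escapes it."""
    candidate = Path(user_path)
    if candidate.is_absolute():
        raise ValueError("absolute paths are not allowed")
    root = repo_root.resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    resolved = (root / candidate).resolve()
    if not resolved.is_relative_to(root):
        raise ValueError("path escapes the repository root")
    return resolved
```

Resolving before comparing matters: a plain string prefix check would accept something like app/../../etc/passwd.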

Capability Gaps

Post-turn detection of declined requests, classified by whether the underlying capability exists in the deployed source. Pipeline runs fire-and-forget after every Telegram turn: regex prefilter → mini-model decline classifier → grep_source verification → row in capability_gaps. See Architecture → Self-Improvement Loop for the full flow.

Available on the diagnostics_agent.

Tool What it does
get_capability_gap_digest(days=7, status="open", classification=None, limit=30) List recent capability gaps grouped by classification (gap = real hole / candidate for a new skill; available = misroute or hallucination — capability existed but was declined; unknown = inconclusive). Filters by status/classification/days.

Skill Proposer

Drafts new skills from a description (or a logged capability gap), validates them statically, persists drafts in DB until approved, then on activation writes to app/skills/ and hot-reloads onto every skill-bearing agent. Activation is approval-gated on Telegram. See Architecture → Self-Improvement Loop.

Available on the diagnostics_agent.

Tool What it does
propose_skill(description, gap_id?=None, name?=None) Drafts a complete .py file via the primary model, runs static validation (compile + AST forbidden-pattern check + structure check), persists as a ProposedSkill row. Code never touches disk at this stage.
list_proposed_skills(status="draft", limit=20) List drafts. status: draft | activated | rejected | None for all.
view_proposed_skill(skill_id) Show full code + validation report so the user can review before activation.
activate_proposed_skill(skill_id) Approval-gated. On Confirm: re-validates, writes app/skills/{name}.py (refusing to overwrite), hot-reloads via hot_reload_new_skill. Returns an "ephemeral until committed to git" warning — the file lives on the running container's disk only and disappears on next Railway redeploy. PyGitHub PR persistence is queued as PR 3.
reject_proposed_skill(skill_id) Mark a draft rejected without activating.

Validator deny-list (AST-level): imports of subprocess, shutil, ctypes, marshal, pickle, socket, smtplib, ftplib, pty; calls of os.system, os.popen, os.exec*, os.fork, os.spawn; the eval() and exec() builtins. Plus structure checks: exactly one @skill-decorated async function, ctx as first parameter, non-empty docstring, name matches expected slug.
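An AST-level deny-list of this kind can be sketched with the standard ast module. The function below is illustrative only: find_violations is a hypothetical name, it covers the imports, os.* calls, and builtins listed above but not the structure checks, and it matches os attributes by prefix, which is slightly stricter than the list as written.

```python
import ast

FORBIDDEN_IMPORTS = {
    "subprocess", "shutil", "ctypes", "marshal", "pickle",
    "socket", "smtplib", "ftplib", "pty",
}
FORBIDDEN_BUILTINS = {"eval", "exec"}
# Prefix match, so os.execv, os.spawnl, etc. are caught as well.
FORBIDDEN_OS_PREFIXES = ("system", "popen", "exec", "fork", "spawn")


def find_violations(source: str) -> list[str]:
    """Return a description of every forbidden import or call in source."""
    problems: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            names = []
        for name in names:
            if name.split(".")[0] in FORBIDDEN_IMPORTS:
                problems.append(f"import {name}")
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name) and func.id in FORBIDDEN_BUILTINS:
                problems.append(f"call {func.id}()")
            elif (
                isinstance(func, ast.Attribute)
                and isinstance(func.value, ast.Name)
                and func.value.id == "os"
                and func.attr.startswith(FORBIDDEN_OS_PREFIXES)
            ):
                problems.append(f"call os.{func.attr}()")
    return problems
```

Because the check works on the parsed tree rather than on text, the drafted code never has to be imported or executed to be rejected.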

Raw Database Ops

Two meta tools for inspecting and (carefully) mutating Kwasi's own state. Registered in _UTILITY_FNS so they ship on every domain agent — the database routing category exists for focused-attention mode, not exclusive access. Schema grounding is live: describe_schema() (on StoragePort + both adapters) introspects every table/column from information_schema / sqlite_master and is injected into the system prompt as a "## Live database schema" section, so the model uses real names that can never drift from the DB (it replaced a hand-maintained 19-table cheat-sheet). Cached for process lifetime; fails safe to an empty section.

Tool What it does
db_query(sql) Read-only. Tool-level prefix check + Postgres transaction(readonly=True) so CTE-write tricks (WITH x AS (DELETE …) SELECT …) are rejected at the DB.
db_execute(sql, reason) Approval-gated. Requires a reason arg. Runs raw_sql_dryrun() first (transactional execute + rollback) so the approval card shows "Will affect N rows" before Confirm. Blocks DDL (DROP/TRUNCATE/ALTER/CREATE) and refuses any SQL touching audit_log / pending_actions (the tables that underpin the approval mechanism itself).
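The dry-run idea behind the "Will affect N rows" preview (execute inside a transaction, read the row count, roll back) can be demonstrated with the standard sqlite3 module. This is a sketch of the technique, not raw_sql_dryrun() itself; dry_run_rowcount is a hypothetical name and the Postgres adapter will differ in detail.

```python
import sqlite3


def dry_run_rowcount(conn: sqlite3.Connection, sql: str) -> int:
    """Execute a write statement, report affected rows, then undo it."""
    cur = conn.cursor()
    try:
        # sqlite3 opens a transaction implicitly before INSERT/UPDATE/DELETE.
        cur.execute(sql)
        return cur.rowcount
    finally:
        # Always roll back so the preview never changes stored data.
        conn.rollback()
```

Only after the user confirms would the same statement be run again and committed.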

Self-Management

Kwasi can change its own configuration and manage its own Railway deployment. All tools are in _UTILITY_FNS (universal on every domain agent), because "switch your model" / "redeploy" often arrives from a non-config-framed request. The set of changeable settings is a single source of truth — CONFIG_REGISTRY in app/tools/config_registry.py — from which both the runtime-override allowlist and the Railway env allowlist are derived. Secret-shaped variable names (*_TOKEN/_KEY/_SECRET, DATABASE_URL, …) are blocked by an independent deny-substring gate, so the agent can never set its own credentials.

Tool What it does Gate
set_runtime_config(key, value) Change a setting instantly, no restart (e.g. model_name, a *_time schedule, a numeric limit). Mutates settings in memory and persists to the context KV (system:config:<field>) so it survives a redeploy. Validates the value first. approval-gated (config)
get_runtime_config(key) Show the current value of one setting + whether it's a runtime override. read-only
list_runtime_config() List all runtime-configurable settings, current values, and which are overridden. read-only
clear_runtime_config(key) Revert an override to its boot value (from boot_settings_snapshot). approval-gated (config)
railway_set_env(name, value, redeploy=True) Set an allowlisted Railway env var and (by default) redeploy — bakes the change into the deployment. Validates the value via the field's registry validator before writing, so a bad MODEL_NAME can't crash-loop the redeploy. approval-gated (deployment)
railway_redeploy() Trigger a redeploy/restart with no config change. approval-gated (deployment)
railway_deployment_status() Latest Railway deployment id + status (SUCCESS/BUILDING/FAILED/CRASHED) + created time. read-only
railway_deploy_logs(limit=50) Build logs of the latest deployment — for diagnosing a failed self-redeploy. read-only
diagnose_self() One-shot health snapshot: effective config (with override marks), recent Logfire exceptions, background-loop heartbeats, Railway deploy status, and any unconfirmed self-redeploy. read-only

Two control planes, when to use which: prefer set_runtime_config for "change my model / a schedule right now" (instant, no downtime). Use railway_set_env only to bake a change permanently into the deployment, or for a setting read only at boot (e.g. WEAVE_ENABLED). After a railway_set_env(redeploy=True) or railway_redeploy, Kwasi restarts and then proactively posts a "✅ back up" (or ⚠️) message once it returns — see Architecture → Self-Management Subsystem. Requires the four RAILWAY_* env vars (an account/team token with deploy perms).

Example questions that route here: "switch to gemini-2.5-flash", "move the morning briefing to 07:30", "redeploy yourself", "what's your deploy status?", "how are you doing / any problems?"
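The independent deny-substring gate for secret-shaped variable names might be sketched as below. The substring list contains only the patterns named in this section (the source list is longer but elided), and is_secret_shaped and may_set_env are hypothetical names rather than functions from app/tools/config_registry.py.

```python
# Only the patterns named in this section; the real list is longer.
DENY_SUBSTRINGS = ("_TOKEN", "_KEY", "_SECRET", "DATABASE_URL")


def is_secret_shaped(name: str) -> bool:
    """Return True if an env var name looks like it holds a credential."""
    upper = name.upper()
    return any(part in upper for part in DENY_SUBSTRINGS)


def may_set_env(name: str, allowlist: set[str]) -> bool:
    # Both gates must pass: the registry-derived allowlist AND the deny gate.
    return name in allowlist and not is_secret_shaped(name)
```

Keeping the deny gate separate from the allowlist means a mistaken registry entry still cannot expose a credential.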

Sandboxed Code Execution

Tool What it does Requires
execute_python(code) Runs Kwasi-authored Python in an ephemeral E2B sandbox VM. Approval-gated on Telegram — the user sees the code before it runs. Matplotlib chart artifacts are embedded as [CHART_PNG:<b64>] markers; bot.py strips the marker and sends them as Telegram photos. E2B_API_KEY
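Stripping [CHART_PNG:<b64>] markers from a tool result and recovering the image bytes could be done along these lines. This is an illustrative sketch, not the code in bot.py; split_charts is a hypothetical name and the regex assumes standard base64 characters inside the marker.

```python
import base64
import re

CHART_RE = re.compile(r"\[CHART_PNG:([A-Za-z0-9+/=]+)\]")


def split_charts(text: str) -> tuple[str, list[bytes]]:
    """Return the text without chart markers, plus the decoded PNG payloads."""
    images = [base64.b64decode(payload) for payload in CHART_RE.findall(text)]
    cleaned = CHART_RE.sub("", text).strip()
    return cleaned, images
```

The cleaned text would be sent as a normal message and each decoded payload as a photo.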

Task Delegation

Tool What it does Requires
delegate_to_coding_agent(task, backend?, max_minutes?) Spawns a coding agent inside an E2B sandbox for complex tasks no existing tool covers. backend is "opencode" (default; Gemini, cheap, plain-text stdout) or "claude_code" (Anthropic, more capable; stream-json telemetry, mid-run cost-cap). Approval-gated; runs in the background; result + cost/tokens delivered as a follow-up Telegram message. Daily rate-limit pre-check, wall-clock cap, soft cost cap (Claude Code path). Each delegation persists a Note tagged #delegation for searchable history. E2B_API_KEY; backend="claude_code" requires ANTHROPIC_API_KEY (metered — subscription OAuth is not used, for compliance); GEMINI_API_KEY for OpenCode
delegate_web_task(task, credential?, max_minutes?) Third delegation backend — DOM-driven browser automation (browser-use + Gemini in an E2B sandbox) for JS/SPA scraping + multi-step form-fills. A dedicated tool (not a backend= value) so the model routes browser intents correctly. credential="<name>" references an enrolled credential-vault login by name only (the LLM never sees the secret; decrypted at run time into domain-scoped sensitive_data). Same approval gate / caps / run_delegation pipeline. E2B_API_KEY; WEB_TASK_TEMPLATE (prebuilt browser-use E2B template) + WEB_TASK_MODEL; VAULT_MASTER_KEY for credential= logins

Verse of the Day

Tool What it does Requires
get_verse_of_the_day Returns today's verse reference + verse text from YouVersion. Surfaces in step 9 of the morning briefing prompt, and works ad-hoc (e.g. "today's verse"). Returns "not configured" when the keys are unset, so the feature degrades silently. Registered in _UTILITY_FNS — every domain agent + the briefing agent has it. YOUVERSION_APP_KEY, YOUVERSION_BIBLE_ID

Behavioral Learnings

Corrections the agent has learned from past feedback. When you tell the agent "don't do that", "stop doing X", or "you shouldn't Y", the nightly Reflection Engine extracts these as AgentLearning records. Each learning starts as a "candidate" and is auto-promoted to "active" after appearing in two separate reflection cycles. Active learnings are injected into every system prompt as a "Behavioral Guidelines" section — the agent follows them without being reminded.

Available on the memory_agent and full_agent.

Tool What it does
list_learnings Show all active and candidate learnings with IDs and recurrence count
dismiss_learning(learning_id) Delete a learning by ID (8-char prefix). Use when a rule is no longer relevant.
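The promotion rule (candidate to active after two separate reflection cycles) can be sketched with a small dataclass. The field names and record_cycle helper are assumptions for illustration and do not reflect the actual AgentLearning model.

```python
from dataclasses import dataclass

PROMOTE_AFTER = 2  # reflection cycles needed before a learning becomes active


@dataclass
class Learning:
    rule: str
    recurrence_count: int = 0
    status: str = "candidate"


def record_cycle(learning: Learning) -> Learning:
    """Register that the learning appeared in one more reflection cycle."""
    learning.recurrence_count += 1
    if learning.status == "candidate" and learning.recurrence_count >= PROMOTE_AFTER:
        learning.status = "active"
    return learning
```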

Langfuse-managed Prompts (Spec 009)

Nine system prompts are managed in Langfuse with code-constant fallback so they can be tuned and version-pinned without redeploying. They are not agent tools — the agent never sees these names — but they shape every interactive and scheduled response.

Prompt name Where it's used
persona Voice, banned openers, cultural context — injected into every system prompt
tone_calibration Situational tone matching guidelines
morning_briefing Briefing template for the daily morning recap loop
evening_recap Evening recap template
weekly_recap Friday weekly lookback template
weekly_prep Sunday week-ahead prep template
journal_digest Sunday journal digest template
email_intel Daily proactive email triage template
reflection Nightly Reflection Engine prompt

app/prompts.get_prompt(name, fallback) returns the production-labeled Langfuse version when reachable; otherwise the code constant for the name (defined in app/agent.py and app/memory/reflection.py). First fallback hit per name logs WARN, subsequent hits log DEBUG so logs don't spam during a Langfuse outage.

Drift detection. prompts.lock.json (repo root) pins each constant's sha256 + last-known Langfuse version. check_drift() runs at startup and warns per drifted prompt. Use scripts/sync_prompts.py to keep code and Langfuse aligned:

uv run python scripts/sync_prompts.py --check    # CI gate; exits 1 on drift
uv run python scripts/sync_prompts.py --push     # code → Langfuse, bumps lock
uv run python scripts/sync_prompts.py --pull     # Langfuse → code (rewrites _NAME constant), bumps lock
uv run python scripts/sync_prompts.py --pull --dry-run

The sync script is the only path that mutates Langfuse or the lock file — never edit prompts.lock.json by hand.
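The hash comparison at the heart of drift detection can be sketched with hashlib. The lock-file layout used here (a mapping from prompt name to a dict with a sha256 key) is an assumption for illustration; the real prompts.lock.json format and check_drift() may differ.

```python
import hashlib


def sha256_of(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def drifted_prompts(constants: dict[str, str], lock: dict[str, dict]) -> list[str]:
    """Return the names whose code constant no longer matches the pinned hash."""
    return sorted(
        name
        for name, text in constants.items()
        if lock.get(name, {}).get("sha256") != sha256_of(text)
    )
```

A prompt missing from the lock is reported as drifted too, which is the conservative choice for a CI gate.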


Skills (File-Drop Plugins)

Skills live in app/skills/. Any .py file dropped there and decorated with @skill is automatically registered as an agent tool on startup — no changes to core code required.

Skill Tool(s) What it does
read_later.py save_to_read_later, list_read_later, delete_read_later, get_read_later_digest Save articles for later with auto-summarisation and weekly digest.
travel_briefing.py get_travel_briefing On-demand travel summary for a destination: weather, maps/transit data, and an LLM-synthesised briefing paragraph.
cv.py store_cv, get_cv Parse and store a CV as structured user facts (cv_skills, cv_experience, cv_education, cv_achievements, cv_summary). Used for job application workflows.
research.py deep_research Multi-step web research: generates sub-questions, searches and fetches sources in parallel, synthesises a structured brief, and saves it as a note titled "Research: <topic>".
meeting_notes.py get_meeting_insights, list_recent_meetings Extract structured insights from meeting transcripts via the mini model; list recent meetings without reading each file.

To add a new skill: create app/skills/my_skill.py, add from app.skills import skill, and decorate your async function with @skill.

Deep Research

Multi-step research workflow powered by Tavily search and Playwright/httpx fetching. Requires TAVILY_API_KEY.

Pipeline (fixed, not iterative):
1. LLM generates depth focused sub-questions from the topic (default 3, max 5)
2. Each sub-question is searched in parallel via Tavily
3. Unique source URLs are collected (up to 6), fetched in parallel for full-page content
4. Falls back to Tavily snippets if URL fetches fail
5. Single LLM synthesis call produces a structured brief (Overview / Key findings / Nuances & caveats / Sources)
6. Result is saved directly to the notes table as "Research: <topic>", with no approval gate (user explicitly requested it)

Saving convention: All research briefs are titled "Research: <topic>". Use search_notes("Research:") to list all past briefs.

Tool What it does
deep_research(topic, depth=3) Research a topic in depth and save the result as a note. depth controls how many sub-questions are explored (1–5).

Example user flows:
- "Research the EU AI Act" → brief delivered in Telegram, note saved as "Research: the EU AI Act"
- "Deep dive on remote work trends with depth 5" → 5 sub-questions, up to 6 sources
- "Find my research on OpenAI" → search_notes("Research: OpenAI") surfaces the saved brief


Meeting Intelligence

Extracts structured insights from meeting transcripts stored in Google Drive. Routed via the meetings intent domain — keywords include "recap the meeting", "notes from the", "what came out of", "standup notes", "last meeting with".

The skill uses a source registry pattern (_SOURCE_REGISTRY / _LIST_SOURCE_REGISTRY) that maps source names to async fetcher functions — so Teams, Notion, or other note sources can be wired in later without changing the tool interface.

Tool What it does
get_meeting_insights(query, source="drive") Find a meeting transcript by query, extract decisions/actions/key points via the mini model, and save the result as a note titled "Meeting: <title>".
list_recent_meetings(days=14, source="drive") List recent meeting transcripts without reading each file — returns title, date, and a brief description.

Output structure from get_meeting_insights:
- Key decisions: explicit choices made or confirmed
- Action items: specific tasks with owner and deadline if mentioned
- Key discussion points: themes and topics discussed
- Next steps: follow-ups mentioned
- Insight note saved as "Meeting: <transcript title>" for later retrieval via search_notes
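The source registry pattern mentioned above (source names mapped to async fetcher functions) can be sketched like this. The registry, decorator, and fetcher here are hypothetical stand-ins, not the contents of _SOURCE_REGISTRY, and the drive fetcher returns a placeholder string instead of calling Google Drive.

```python
import asyncio
from typing import Awaitable, Callable

Fetcher = Callable[[str], Awaitable[str]]
SOURCE_REGISTRY: dict[str, Fetcher] = {}


def register_source(name: str) -> Callable[[Fetcher], Fetcher]:
    """Decorator that adds an async fetcher to the registry under `name`."""
    def decorator(fn: Fetcher) -> Fetcher:
        SOURCE_REGISTRY[name] = fn
        return fn
    return decorator


@register_source("drive")
async def fetch_from_drive(query: str) -> str:
    # Placeholder: a real fetcher would search Drive for a transcript.
    return f"[drive transcript matching {query!r}]"


async def fetch_transcript(query: str, source: str = "drive") -> str:
    fetcher = SOURCE_REGISTRY.get(source)
    if fetcher is None:
        known = ", ".join(sorted(SOURCE_REGISTRY))
        raise ValueError(f"unknown source {source!r}; known sources: {known}")
    return await fetcher(query)
```

Adding another source then means registering one more fetcher; the tool signature with its source argument stays the same.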


Content Curator

Saves URLs to a read_later table, with a summary generated automatically at save time. The weekly digest fires every Saturday at 09:00 local time via _read_later_digest_loop.

Each saved item has a tags field: a comma-separated list of up to 10 frequency-ranked keywords that _extract_tags() extracts from the title and summary at save time (stop-word filtered, no extra LLM call). find_relevant_read_later() in app/utils/message_utils.py uses these tags: when a user message overlaps with saved article tags, up to 3 matching articles are added as context at the top of the agent input. Intent classification and interaction logging still use the original message text; only the agent input carries the injected context block.
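
As a rough illustration of frequency-ranked tagging and tag overlap, consider the sketch below. The stop-word list, the tokenising regex, and the function names extract_tags and tag_overlap are all assumptions; the real _extract_tags() and find_relevant_read_later() may differ in detail.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; the real list is not shown on this page.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "on", "is", "how", "with", "can"}
MAX_TAGS = 10
WORD_RE = re.compile(r"[a-z][a-z0-9'-]+")


def extract_tags(title: str, summary: str) -> str:
    """Return up to MAX_TAGS keywords, most frequent first, comma-separated."""
    words = WORD_RE.findall(f"{title} {summary}".lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return ",".join(word for word, _ in counts.most_common(MAX_TAGS))


def tag_overlap(tags: str, message: str) -> set[str]:
    """Tags that also appear as words in the user message."""
    message_words = set(WORD_RE.findall(message.lower()))
    return set(tags.split(",")) & message_words


tags = extract_tags(
    "Burnout in remote teams",
    "How burnout spreads in remote teams and how managers can spot burnout early.",
)
matched = tag_overlap(tags, "any tips on burnout?")
```

A non-empty overlap is what would qualify an article for injection as context.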

get_read_later_digest curates output: annotated items (those with a personal note) surface first, then by recency, capped at 10.
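
The ordering rule can be expressed as a single sort. The ReadLaterItem class and its field names below are illustrative and not the actual table schema.

```python
from dataclasses import dataclass
from datetime import datetime

DIGEST_LIMIT = 10


@dataclass
class ReadLaterItem:
    # Field names are illustrative, not the real schema.
    title: str
    saved_at: datetime
    note: str = ""


def curate_digest(items: list[ReadLaterItem]) -> list[ReadLaterItem]:
    """Annotated items first, newest first within each group, capped at 10."""
    ranked = sorted(
        items,
        key=lambda item: (bool(item.note.strip()), item.saved_at),
        reverse=True,
    )
    return ranked[:DIGEST_LIMIT]


digest = curate_digest([
    ReadLaterItem("old annotated", datetime(2026, 1, 1), "read before Friday"),
    ReadLaterItem("new plain", datetime(2026, 3, 1)),
    ReadLaterItem("newer annotated", datetime(2026, 2, 1), "for the team"),
])
titles = [item.title for item in digest]
```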

Tool What it does
save_to_read_later Fetch, summarise, and save a URL. Extracts tags automatically. Optional personal note.
list_read_later List all saved articles with title, date, and URL.
delete_read_later Remove an article by ID (8-char prefix).
get_read_later_digest Compile saved articles into a digest. Annotated items first, then by recency, capped at 10. Called automatically by the weekly loop; also available on demand.

Finding a specific article: there is no keyword search for saved articles. When asked for an article by topic or description (e.g. "find that article I saved about burnout"), the agent uses semantic_search(sources=["read_later"]) directly. list_read_later is only for showing the full list.


MCP Tools

Loaded via get_mcp_tools() in app/interfaces/mcp/client.py. These wrap synchronous Google/Microsoft APIs with asyncio.to_thread().
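
The wrapping pattern looks roughly like this. blocking_unread_count is a hypothetical synchronous client call standing in for a Google or Microsoft SDK method; asyncio.to_thread is the standard-library function named above.

```python
import asyncio
import time


def blocking_unread_count(account: str) -> int:
    """Hypothetical synchronous client call; sleep simulates network latency."""
    time.sleep(0.1)
    return {"personal": 4, "work": 7}[account]


async def unread_count_wrapper(account: str) -> int:
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(blocking_unread_count, account)


async def main() -> list[int]:
    # Both calls overlap because each runs in its own thread.
    return await asyncio.gather(
        unread_count_wrapper("personal"),
        unread_count_wrapper("work"),
    )


counts = asyncio.run(main())
```

Without the thread hop, each blocking call would stall every other coroutine on the loop for its full duration.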

Gmail (Personal)

Requires GMAIL_REFRESH_TOKEN. All tools return snippets only unless noted.

Tool What it does
search_emails_wrapper Search inbox with Gmail query syntax (supports from:, subject:, has:attachment, etc.) — returns snippet only
read_thread_wrapper Read a full email thread by thread ID
gmail_read_email_wrapper Read the full body of a specific email by message ID. Use after search_emails_wrapper to fetch the complete text (up to 20,000 chars). Decodes multipart MIME, prefers text/plain, strips HTML if only text/html is available.
draft_email_wrapper Create a draft email (to, subject, body)
send_email_wrapper Send an email immediately
get_unread_count_wrapper Total unread count across the inbox
get_unread_summary_wrapper Preview of the most recent unread messages — returns snippet only
get_todays_emails_wrapper All emails received today — do not construct a manual date query
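
The body-decoding behaviour described for gmail_read_email_wrapper (prefer text/plain, strip HTML otherwise, cap at 20,000 chars) can be illustrated with the standard-library email package. This is a sketch under assumptions: it parses a raw RFC 5322 message, whereas the real tool presumably works on Gmail API payloads, and extract_body is a hypothetical name.

```python
from email import message_from_string, policy
from email.message import EmailMessage
from html.parser import HTMLParser

MAX_BODY_CHARS = 20_000


class _TextExtractor(HTMLParser):
    """Collects text nodes and drops all tags."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def extract_body(raw_message: str) -> str:
    msg = message_from_string(raw_message, policy=policy.default)
    # get_body walks multipart containers and honours the preference order.
    part = msg.get_body(preferencelist=("plain", "html"))
    if part is None:
        return ""
    text = part.get_content()
    if part.get_content_type() == "text/html":
        parser = _TextExtractor()
        parser.feed(text)
        # Join text nodes and collapse runs of whitespace.
        text = " ".join(" ".join(parser.chunks).split())
    return text.strip()[:MAX_BODY_CHARS]


html_only = EmailMessage()
html_only["Subject"] = "Hi"
html_only.set_content("<p>Hello <b>world</b></p>", subtype="html")
body = extract_body(html_only.as_string())
```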

Gmail (Work)

Requires GMAIL_WORK_REFRESH_TOKEN. Identical tool set to personal Gmail but operates on the work account. The agent checks both accounts when email is requested and the user doesn't specify which.

Tool What it does
search_work_emails_wrapper Search work Gmail inbox
gmail_read_work_email_wrapper Read full body of a work email by message ID (up to 20,000 chars)
draft_work_email_wrapper Create a draft in the work account
send_work_email_wrapper Send from the work account
get_work_unread_count_wrapper Unread count for the work inbox

Outlook Email

Tool What it does
outlook_search_wrapper Search Outlook inbox by query string
outlook_read_wrapper Read a specific email by ID
outlook_draft_wrapper Create a draft email
outlook_send_wrapper Send an email immediately
outlook_unread_count_wrapper Total unread count
outlook_unread_wrapper Preview of recent unread messages
outlook_todays_emails_wrapper All emails received today

Google Calendar

Both read tools query all accessible calendars — primary, shared, and subscribed — not just the primary calendar. Each event is prefixed with [Calendar Name] in the output so you can see the source.

Tool What it does
google_get_todays_schedule_wrapper Today's events across all accessible Google Calendars
google_get_calendar_events_wrapper Events in a date range across all accessible calendars (ISO 8601 datetimes)
google_create_calendar_event_wrapper Create an event (writes to primary calendar)
google_update_calendar_event_wrapper Update an existing event by ID
google_delete_calendar_event_wrapper Delete an event by ID (confirm before calling)

Outlook Calendar

Tool What it does
outlook_get_todays_schedule_wrapper Today's events
outlook_get_calendar_events_wrapper Events in a date range
outlook_create_calendar_event_wrapper Create an event
outlook_update_calendar_event_wrapper Update an existing event by ID
outlook_delete_calendar_event_wrapper Delete an event by ID (confirm before calling)

Microsoft To Do (todo_*)

Requires OUTLOOK_REFRESH_TOKEN with Tasks.ReadWrite scope. Tasks created here sync to the Microsoft To Do app on your phone/desktop. Delete and complete tools resolve abbreviated IDs automatically and fall back to searching all lists if the specified list name is wrong.
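
One way such ID resolution could work is sketched below. It assumes abbreviated IDs are prefixes of the full ID, and it also searches other lists when the named list exists but holds no match; neither detail is confirmed by this page, and resolve_task is a hypothetical helper.

```python
def resolve_task(lists: dict[str, list[str]], list_name: str, short_id: str) -> tuple[str, str]:
    """Return (list_name, full_task_id) for an abbreviated task ID."""
    # Search the named list first (if it exists), then every other list.
    search_order = [list_name] if list_name in lists else []
    search_order += [name for name in lists if name != list_name]

    for name in search_order:
        matches = [task_id for task_id in lists[name] if task_id.startswith(short_id)]
        if len(matches) == 1:
            return name, matches[0]
        if len(matches) > 1:
            # Refuse to guess: deleting the wrong task is not recoverable.
            raise ValueError(f"{short_id!r} is ambiguous in list {name!r}")
    raise LookupError(f"no task starts with {short_id!r}")


todo_lists = {
    "Tasks": ["AAMkAD-001", "AAMkAD-002"],
    "Groceries": ["BBQxYz-777"],
}
resolved = resolve_task(todo_lists, "Shopping", "BBQ")
```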

Tool What it does
todo_list_lists_wrapper List all your To Do lists
todo_list_tasks_wrapper List tasks in a named list (default: "Tasks"). Pass include_completed=true to see done items. Renders each task's notes/body and any checklist items (sub-tasks), not just the title.
todo_create_task_wrapper Create a task in a named list with optional due_date (YYYY-MM-DD) and notes
todo_complete_task_wrapper Mark a task as completed by ID
todo_delete_task_wrapper Delete a task permanently by ID

Google Drive

Requires GOOGLE_DRIVE_REFRESH_TOKEN (personal Drive) and/or GOOGLE_DRIVE_WORK_REFRESH_TOKEN (work/Khaya Drive). Routed via the drive intent domain. Pass work=True for any work account file. Gracefully degrades if neither token is set.

Tool What it does
search_drive_files_wrapper Full-text search across Drive — use for "find my doc on X"
read_drive_file_wrapper Read the content of a file by ID (supports Google Docs, Sheets, Slides, and plain text)
list_recent_drive_files_wrapper Most recently modified files — use when the user says "my recent files"
search_meeting_transcripts_wrapper Search for Google Meet transcript/recording files by name and content
get_drive_file_metadata_wrapper File name, type, owner, and sharing link — use before reading large files

Logfire Self-Diagnosis

Requires LOGFIRE_READ_TOKEN (read-only Logfire token). Calls the external Logfire MCP HTTP server at https://logfire-eu.pydantic.dev/mcp. Routed via the diagnostics intent category. Use these to ask Kwasi about its own behaviour and performance.

Tool What it does
logfire_schema_wrapper Get the schema of available columns in Logfire trace data. Call this first before writing SQL.
logfire_query_wrapper Run arbitrary SQL against Logfire trace data (spans, tool calls, loop events). E.g. slowest tools, message counts, error rates.
logfire_exceptions_wrapper Find recent exceptions from a given file path prefix. Defaults to "app/".

Example questions that route here: "what errors did you have this week?", "which of your tools is slowest?", "did the briefing run today?"

Langfuse Self-Diagnosis

Native tools (not MCP) registered in _UTILITY_FNS so every domain agent can inspect Kwasi's own LLM behaviour. Both require LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY; when the keys are absent, they fail safe by returning a "not configured" string.

Tool What it does
langfuse_recent_traces_wrapper(name=None, since_minutes=60, limit=20) List recent LLM traces by name (e.g. telegram.turn, research, system.briefing) with timestamp, latency, cost, tags.
langfuse_trace_details_wrapper(trace_id) Full contents of one trace — prompts, completions, token usage, scores, and the observation tree (tool calls nested under generations). Pair with langfuse_recent_traces_wrapper to drill in.

Example questions that route here: "why did the last call take so long?", "inspect that trace", "how much did today's briefing cost?"


Tool Decision Logic

The agent (LLM) decides autonomously. The system prompt guides some decisions:

  • Email not specified → check both Gmail and Outlook, present unified view
  • Calendar not specified → check both Google and Outlook
  • Link shared → call summarize_url without being asked; use browse_web if that fails
  • Web search → last resort, not first choice; prefer specific tools
  • Email action → default to draft unless user says "send"
  • Unit conversion → answer directly from knowledge; no dedicated skill
  • GitHub request → use GitHub tools if GITHUB_TOKEN is configured; no search fallback
  • Maps / directions / places → use get_directions / search_places / get_travel_time if GOOGLE_MAPS_API_KEY is set; fall back to search_web otherwise
  • Transit / RER / Métro / bus status → use check_transit_status if IDFM_API_KEY is set; use location= for area-wide overview, lines= for specific lines
  • User asks about themselves (address, workplace, partner name, etc.) → check permanent facts in system prompt first; call recall_facts if not immediately visible; call search_history as last resort; never say "I don't know" without checking all sources
  • User shares personal information → call remember_fact proactively without asking permission; confirm briefly ("Got it, I'll remember that")
  • User says something is done / completed → check list_intentions for a matching pending intention and call resolve_intention automatically
  • User mentions a personal commitment ("I should...", "I need to...", "I want to...") → the Reflection Engine will extract it nightly; no need to act immediately unless the user asks
  • Searching for a specific note/task by description → try search_notes / search_tasks first; if nothing found, fall back to semantic_search before saying "not found"
  • Searching for a saved article by topic → use semantic_search(sources=["read_later"]) directly — there is no keyword search for saved articles
  • Finding a past conversation by theme → try search_history first; fall back to semantic_search(sources=["interactions"]) if keyword search misses it
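
In practice the LLM makes these choices from the system prompt rather than from code, but the keyword-then-semantic ordering in the search rules above can be written out explicitly. The function find_with_fallback and the two stub searches are hypothetical and exist only to show the order of attempts.

```python
from typing import Callable

Search = Callable[[str], list[str]]


def find_with_fallback(query: str, keyword_search: Search, semantic_search: Search) -> tuple[str, list[str]]:
    """Try the cheap keyword search first, then the semantic search."""
    hits = keyword_search(query)
    if hits:
        return "keyword", hits
    hits = semantic_search(query)
    if hits:
        return "semantic", hits
    # Only report "not found" after both searches came back empty.
    return "not_found", []


# Stub searches over two notes; the real tools query storage and embeddings.
notes = ["Dentist appointment on Friday", "Ideas for the garden"]


def keyword_stub(query: str) -> list[str]:
    return [n for n in notes if query.lower() in n.lower()]


def semantic_stub(query: str) -> list[str]:
    # Pretend "teeth" is semantically close to the dentist note.
    return [notes[0]] if "teeth" in query.lower() else []


result = find_with_fallback("teeth checkup", keyword_stub, semantic_stub)
```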

Shared Utilities (app/utils/message_utils.py)

Function What it does
build_message_history(interactions) Converts stored Interaction rows into Pydantic AI ModelRequest/ModelResponse pairs for use as message_history in agent runs
fetch_message_history(*, storage, user_id, message, settings) Orchestrator called by every interface to load the right interactions for a turn. Default: chronological last 10. With ENABLE_SEMANTIC_HISTORY=true: last 3 verbatim + top 3 semantic from older history (recency-boosted). Decorated with @observe(name="message_history_retrieval") so the retrieval step shows up under each turn in Langfuse. Falls back to chronological on any failure.
find_relevant_interactions(*, storage, message, api_key, excluded_ids, threshold=0.6, max_results=3) Semantic search over past interactions, hydrated to full Interaction rows via get_interactions_by_ids. Used internally by fetch_message_history.
extract_tool_calls(messages) Extracts tool call + result pairs from agent output messages for audit logging
find_relevant_read_later(items, message, max_results=3) Finds saved read-later articles whose tags overlap the current message; up to 3 matches are prepended as context to the agent input
find_relevant_summaries(storage, message, api_key) Embeds the current message and semantic-searches "Summary:" notes; returns top 2 matches with similarity ≥ 0.6 for context injection
find_relevant_notes(storage, message, api_key) Embeds the current message and semantic-searches regular notes (excludes "Summary: " and "Research: " prefixes); returns top 2 matches with similarity ≥ 0.6. Injected as [You have related notes on this topic: ...] before every Telegram and WhatsApp agent run.
find_relevant_research(storage, message, api_key) Embeds the current message and semantic-searches "Research: " notes; returns top 2 matches with similarity ≥ 0.75 (kept higher because deep-research surfacing should be conservative). Used by the deep_research skill to surface prior research before starting — the synthesis prompt instructs the agent to build on existing work rather than repeat it.
strip_markdown(text) Removes Markdown formatting (bold, italic, strikethrough, inline code, headings) from text for plain-text delivery. Used before TTS synthesis in both Telegram voice paths and by the WhatsApp client before sending messages.
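
Several of the functions above share the same selection step: score candidates against the message embedding, drop anything below a threshold, and keep the top few. A minimal sketch follows, assuming cosine similarity and in-memory vectors; select_relevant is a hypothetical name, and the recency boost mentioned for fetch_message_history is left out.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_relevant(
    query_vec: list[float],
    candidates: dict[int, list[float]],
    excluded_ids: set[int],
    threshold: float = 0.6,
    max_results: int = 3,
) -> list[int]:
    """Return candidate IDs scoring at or above the threshold, best first."""
    scored = [
        (cosine(query_vec, vec), item_id)
        for item_id, vec in candidates.items()
        if item_id not in excluded_ids  # e.g. turns already included verbatim
    ]
    kept = sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)
    return [item_id for _, item_id in kept[:max_results]]


picked = select_relevant(
    [1.0, 0.0],
    {1: [1.0, 0.0], 2: [0.8, 0.6], 3: [0.0, 1.0], 4: [0.9, 0.1]},
    excluded_ids={1},
)
```

Raising the threshold to 0.75, as find_relevant_research does, would drop every candidate here except the closest one.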

Multimodal Inputs (not agent tools)

These run before the agent, transforming media into text that becomes the user message.

Input type Handler Processing
Voice message handle_voice_message transcribe_audio() via Gemini STT; reply is sent as a voice note via TTS if ≤500 words
Photo handle_photo_message analyze_image() via Gemini Vision
PDF handle_document_message analyze_image() via Gemini Vision
Text file handle_document_message UTF-8 decode, truncate at 50k chars
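
The two numeric limits in the table can be sketched as small helpers. The function names are hypothetical, and decoding with errors="replace" is an assumption, since the table only says "UTF-8 decode".

```python
MAX_TEXT_CHARS = 50_000
MAX_VOICE_REPLY_WORDS = 500


def text_file_to_message(data: bytes) -> str:
    """Decode an uploaded text file and truncate it to the character cap."""
    # errors="replace" is an assumption; invalid bytes become U+FFFD.
    return data.decode("utf-8", errors="replace")[:MAX_TEXT_CHARS]


def should_reply_with_voice(reply: str) -> bool:
    """Voice replies are only sent when the text is at most 500 words."""
    return len(reply.split()) <= MAX_VOICE_REPLY_WORDS


message = text_file_to_message(b"x" * 60_000)
```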