# claudedo Voice control for [Claude Code](https://claude.com/claude-code) on **WSL2**. `claudedo` listens on your mic, runs **local** speech-to-text, recognizes a wake phrase plus a small command grammar, and injects the matching keystrokes into your active Claude Code tmux session via `tmux send-keys`. You answer Claude Code's prompts ("yes", "option one", "approve") and dictate prompts **by voice** — including hands-free while another window (a game) is focused. It exists because Claude Code's native `/voice` is hardcoded-blocked in WSL (it assumes WSL has no audio). Modern WSL2 + WSLg *does* have working mic input via PulseAudio/RDP. `claudedo` captures the mic itself, transcribes on-device, and drives Claude Code over tmux — fully local and private. You run it in a terminal you watch. ## How it works ``` mic (WSLg/PulseAudio RDPSource) -> sounddevice capture -> faster-whisper (local STT, on-device) -> wake gate: utterance must start with a wake phrase, else DISCARD locally -> grammar match (yes/no/one..four/approve/deny/send/type/space/backspace/erase/ mode/set/target/unset/list/context/reload/system/cancel) -> resolve target session (one-shot > sticky ~/.claude-active > auto/none) -> tmux send-keys -t "" -> log the action to the watched terminal ([session]/[SYSTEM]/[VOICE], colored) ``` **Privacy by construction.** STT runs on-device. In listen mode, any speech that doesn't start with a wake phrase is dropped the instant it's transcribed — never stored, never sent anywhere. That's what makes always-listening acceptable while you're on voice comms in a game. **Injection is PTY-only.** `claudedo` only ever calls `tmux send-keys`. It never uses OS-level keyboard input and installs no system-wide keyboard hook. Keystrokes are text into a Linux pseudo-terminal — they work regardless of which window is focused and never touch Windows input or a game/anticheat's view. ## Install ```bash git clone claudedo && cd claudedo ./install.sh ``` `install.sh` is idempotent. It installs the WSL audio deps, writes the `~/.asoundrc` Pulse shim, verifies the mic path, pip-installs the package, primes the Whisper model, and installs the **cc kit** (`~/.config/claudedo/cc.sh`, sourced from every `~/.zshrc`/`~/.bashrc` you have). It also checks the two Windows-side bits it can't automate and tells you to fix them: - **WSLg present** (`/mnt/wslg/PulseServer`). If missing: `wsl --update` in Windows, then `wsl --shutdown`, then re-run. - **Mic permission**: Windows Settings → Privacy & security → Microphone → enable *"Let desktop apps access your microphone"*. Required. Verify the riskiest piece (mic capture) first: ```bash claudedo test-audio ``` ## Usage **Run it in a terminal you watch — that's the product.** You launch `claudedo start` and it drops into a visible listen loop (pass `--check` to run a mic check first). Each utterance prints a timestamped, colored line — `HH:MM:SS [claude-libs] heard "…" → typed 'fix'` (green for injected, red for drops, `[SYSTEM]`/`[VOICE]` for state and recognition). That terminal is your recognition/action console; you attach to the `claude-` session in another pane to watch the keystrokes land. It runs in the foreground by design — the console is the point — though `claudedo stop` can signal a stray instance. ```bash claudedo start # the visible listen loop (listen mode default; no mic check) claudedo start --check # run a mic check before listening claudedo start --mode ptt # push-to-talk instead (desk-only — see Modes) claudedo status # running? mode? target session? claudedo stop # stop a running daemon claudedo reload # reload config.toml + contexts.toml in a running daemon claudedo set # set the sticky target -> claude- (alias: switch) claudedo unset # clear the sticky target claudedo list # list running claude-* sessions claudedo test-audio # verify the mic capture path claudedo test-tone # play each earcon (verify the audio-OUT path) ``` ### Modes - **listen (default)** — continuous capture; only acts on utterances that **start with a wake phrase**; all other speech is transcribed locally and discarded instantly. This is the hands-free path and works while a game is focused, because the trigger is your voice over the mic bridge — not a keyboard hook. - **ptt** — push-to-talk. **Desk-only:** it captures only while the daemon's own terminal window is focused. There is deliberately **no global hotkey** — a system-wide keyboard hook is the keylogger/cheat silhouette anticheats watch for, and `claudedo` refuses to install one. For hands-free-while-gaming, use listen mode. (Terminals don't deliver key-up events, so PTT is press-to-start / press-to-stop in the daemon window, not literal hold.) Switch at runtime by voice: "claudedo mode listen" / "claudedo mode ptt". ## Command grammar Wake phrases (listen mode), fuzzy-matched. The default list is **"claudedo"**, **"claude do"**, **"hey claude"**, **"ok claude"**, **"okay claude"** — Whisper has no token for the coined word "claudedo" and renders it as real words ("claude do"), so that spelling is listed explicitly. Matching is lenient (case/space-insensitive). Add the spellings you actually see (turn on `print_heard` to find them). In PTT mode the wake phrase is optional. When a command's wake phrase matched loosely (e.g. you said "okay clouds"), the heard line notes which phrase it assumed — `heard "okay clouds list" -> LIST (wake: okay claude)`. | Say | Does | |---|---| | `yes` / `no` | answer a yes/no prompt | | `one` / `two` / `three` / `four` | pick numbered option 1–4 | | `approve` / `deny` | allow / deny a permission prompt | | `send` / `enter` | submit (Enter) | | `type ` | insert literal text, **no** submit (read-before-send; say "send") | | `space []` (also `add [a] space`, `insert spaces`) | insert n spaces (default 1) | | `backspace []` (alias `delete`) | delete n chars (default 1), capped at the last submit boundary | | `erase` (alias `clear`/`wipe`) | delete everything typed since the last submit/boundary | | `debug ` (alias `echo`) | just print what you said to the console (test wake/STT; injects nothing) | | `mode ptt` / `mode listen` | switch input mode | | `set ` (alias `sticky`/`switch`) | set the **sticky** target → `claude-` (persists) | | `target ` | **one-shot** override: run that command on `claude-` for this utterance only; sticky default unchanged | | `unset` (alias `unsticky`) | clear the sticky target | | `list` | list running `claude-*` sessions to the daemon console | | `context ` (alias `prepare`) | inject a `contexts.toml` blurb as a preamble + the dictated instruction, then **wait** (no submit — say "send") | | `reload` | re-read `config.toml` + `contexts.toml` live (no daemon restart, model stays loaded) | | `system status` | print mode / target / model / context count to the console (daemon-control; never injects) | | `system reload [config\|contexts]` | reload one or both config files | | `commands` (alias `help`/`menu`) | print the voice-command menu to the console | | `customs` (alias `custom`) | list the loaded context names | | `version` | print the claudedo version to the console | | `cancel` / `escape` | back out of a prompt | Optional filler (`select` / `use` / `choose`) may precede any command and is ignored: `select yes` and `use yes` behave like `yes`. (`select 1` is still the select command.) When no sticky target is set, a bare command does nothing and asks you to `set` one (the default). Set `auto_target = true` to instead auto-use the single running `claude-*` session when there's exactly one; with several running it always does nothing and asks you to `set` one. Number words are normalized to digits before matching ("one"/"won" → 1). ## Targeting `~/.claude-active` holds the **sticky** target session name (e.g. `claude-rethink-public`). The **cc kit** writes this file when you attach, and `claudedo set ` (alias `sticky`/`switch`) overwrites it; `unset` clears it. A `target ` voice command is a **one-shot** that does NOT touch the sticky default — it routes a single command and the next bare command reverts to sticky. Resolution order (one place — `target.resolve()`): one-shot if present → sticky if set and the session exists → else, only if `auto_target = true`, the single running `claude-*` session → else (default, or zero/several sessions) do nothing and say so. It never guesses, and never injects into a nonexistent session. Every name maps to `claude-` through one helper (`target.session_name()`), and the cc kit mirrors it exactly — so `cc libs` (shell) and `set libs` (voice) refer to the same session `claude-libs`. The name is your **stable, speakable handle**: because the kit forces an explicit name (no basename guessing), you always know the exact word to say. The cc kit lives in `~/.config/claudedo/cc.sh` (sourced from your rc; works under bash and zsh). Every command **requires an explicit name**: ```bash cc # attach/create claude-; writes ~/.claude-active ccr # re-attach an existing claude- only ccl # list claude-* sessions cck # kill claude- cckl # kill all claude-* sessions ``` ## Contexts (named reference blurbs) `contexts.toml` holds named reference snippets you can inject ahead of a dictated instruction with the **`context `** voice command (alias `prepare`). It lives next to `config.toml` (`$CLAUDEDO_CONTEXTS` → `~/.config/claudedo/contexts.toml` → `./contexts.toml`); a missing file just means no contexts (the feature is opt-in). ```toml [contexts] webhooks = "discord webhooks — test: (safe to spam), live: (real, careful)" testing = "use the test/staging resources only, never touch prod" ``` Saying `context webhooks send a test message` injects the `webhooks` blurb as a preamble, then the dictated instruction, and **waits** — nothing is auto-submitted. You say `send` to submit (**read-before-send**; Claude's own permission prompt is the backstop for anything consequential). A bare `context webhooks` injects just the blurb. One context per command (no stacking yet); an unknown name announces and injects nothing. Names are **spoken and fuzzy-matched**, so keep them simple and distinct — they're looked up on a despaced/lowercased key, so `web hooks` / `web-hooks` / `webhooks` all resolve the same block. Assembly is config-gated: `behavior.context_multiline` (default `true`) puts the blurb and instruction on separate lines via a Shift+Enter soft newline; set it `false` to flatten onto one line with `context_separator` (default `" — "`) if Shift+Enter is unreliable in your terminal. Edit `contexts.toml`, then say **`reload`** (or run `claudedo reload`) — it re-reads `config.toml` and `contexts.toml` live without restarting the daemon or reloading the Whisper model. The **`system`** namespace gives daemon-control by voice without touching Claude: `system status` (mode / target / model / context count) and `system reload [config|contexts]`. ## Earcons (audio feedback tones) Short confirmation tones play on key events so you get **eyes-free feedback** — "did it hear me?" — without watching the terminal. They're tones, not speech (not TTS): a bright blip when a command is accepted/injected, a low buzz when nothing matched, a rising chime on submit, and an optional blip on wake. Tones are short (<300ms) and quiet, and they're **additive** to the console feed — mute them and read at the desk, or hear them eyes-free. Verify the audio-OUT path (the reverse of `test-audio`, and the less-tested direction on WSLg) with: ```bash claudedo test-tone # plays each tone through WSLg — the audio-out gate ``` Tones play through WSLg's PulseAudio sink, **paplay-first** (a separate process, so it doesn't contend with the sounddevice mic stream), falling back to in-process sounddevice, then `powershell.exe` on the Windows host. Playback is **fire-and-forget**: a dead speaker or a missing tone file logs once and is ignored — audio-out can never block or break a command (`claudedo yes` injects whether or not the speaker works). Configure under `[sound]`: `enabled` (master, default on), per-event `on_wake` (default **off** — a blip right before you speak can bleed into the command capture, and it's chatty), `on_accept` / `on_no_match` / `on_submit` (default on), and `volume` (0.0–1.0, best-effort — scaled for sounddevice, `--volume` for paplay, ignored by the PowerShell fallback). A `[sound.files]` table can point any event at your own `.wav`. The shipped tones live in the package (`claudedo/sounds/*.wav`); `claudedo/sounds/generate.py` is a synthetic-beep fallback that can regenerate a placeholder set (it does **not** reproduce the shipped tones — running it overwrites them with plain beeps). ## The confirmed Claude Code keymap The keystrokes in [`keys.py`](src/claudedo/keys.py) were confirmed **empirically** against a live `claude` v2.1.191 session (not assumed): - Numbered prompts (trust prompt, permission prompt): pressing the **bare digit** selects **and confirms immediately** — **no trailing Enter**. - Arrow keys move the highlight without acting; Enter then confirms (modeled as an alternative sequence). - Permission prompt is `1. Yes / 2. Yes, and don't ask again / 3. No`; Escape cancels. - Literal text goes in via `send-keys -l` (no submit); a bare Enter submits. If Claude Code changes its prompt UI, re-confirm against a live session and update `keys.py` — it is the single source of truth. ## Config Everything tunable lives in [`config.toml`](config.toml): wake phrases, mode + PTT key, Whisper model/language/device, `[vad]` endpointing, and `[behavior]` (`type_autosend`, fuzzy thresholds, `filler_words`, `auto_target`, `print_heard`). The default model is **`small.en`** (the English-only small model — ~1s/command on a strong CPU, more accurate on English than multilingual `small` at the same speed); `medium`/`medium.en` are more accurate but ~3× slower (noticeable lag), `base.en` is snappier/less accurate, `large-v3` most accurate/slowest. Every `heard` line shows the STT latency as `(/