# claudedo

Voice control for [Claude Code](https://claude.com/claude-code) on **WSL2**.

`claudedo` listens on your mic, runs **local** speech-to-text, recognizes a wake phrase plus a small command grammar, and injects the matching keystrokes into your active Claude Code tmux session via `tmux send-keys`. You answer Claude Code's prompts ("yes", "option one", "approve") and dictate prompts **by voice**, including hands-free while another window (a game) is focused.

It exists because Claude Code's native `/voice` is hardcoded-blocked in WSL (it assumes WSL has no audio). Modern WSL2 + WSLg *does* have working mic input via PulseAudio/RDP. `claudedo` captures the mic itself, transcribes on-device, and drives Claude Code over tmux, fully local and private. You run it in a terminal you watch.

## How it works

```
mic (WSLg/PulseAudio RDPSource)
  -> sounddevice capture
  -> faster-whisper (local STT, on-device)
  -> wake gate: utterance must start with a wake phrase, else DISCARD locally
  -> grammar match (yes/no/one..four/approve/deny/send/type/space/backspace/erase/
                    mode/set/target/unset/list/cancel)
  -> resolve target session (one-shot > sticky ~/.claude-active > auto/none)
  -> tmux send-keys -t "<session>" <keys>
  -> log the action to the watched terminal ([session]/[SYSTEM]/[VOICE], colored)
```

**Privacy by construction.** STT runs on-device. In listen mode, any speech that doesn't start with a wake phrase is dropped the instant it's transcribed: never stored, never sent anywhere. That's what makes always-listening acceptable while you're on voice comms in a game.

**Injection is PTY-only.** `claudedo` only ever calls `tmux send-keys`. It never uses OS-level keyboard input and installs no system-wide keyboard hook. Keystrokes are text into a Linux pseudo-terminal, so they work regardless of which window is focused and never touch Windows input or a game/anticheat's view.

## Install

```bash
git clone <repo-url> claudedo && cd claudedo
./install.sh
```

`install.sh` is idempotent.
It installs the WSL audio deps, writes the `~/.asoundrc` Pulse shim, verifies the mic path, pip-installs the package, primes the Whisper model, and installs the **cc kit** (`~/.config/claudedo/cc.sh`, sourced from every `~/.zshrc`/`~/.bashrc` you have). It also checks the two Windows-side bits it can't automate and tells you to fix them:

- **WSLg present** (`/mnt/wslg/PulseServer`). If missing: `wsl --update` in Windows, then `wsl --shutdown`, then re-run.
- **Mic permission**: Windows Settings → Privacy & security → Microphone → enable *"Let desktop apps access your microphone"*. Required.

Verify the riskiest piece (mic capture) first:

```bash
claudedo test-audio
```

## Usage

**Run it in a terminal you watch: that's the product.** You launch `claudedo start`, it does a quick mic check, then drops into a visible listen loop. Each utterance prints a timestamped, colored line: `HH:MM:SS [claude-libs] heard "…" → typed 'fix'` (green for injected, red for drops, `[SYSTEM]`/`[VOICE]` for state and recognition). That terminal is your recognition/action console; you attach to the `claude-<name>` session in another pane to watch the keystrokes land. It runs in the foreground by design (the console is the point), though `claudedo stop` can signal a stray instance.

```bash
claudedo start                     # mic-check, then the visible listen loop (listen mode default)
claudedo start --mode ptt          # push-to-talk instead (desk-only; see Modes)
claudedo start --skip-audio-check  # skip the pre-listen mic check
claudedo status                    # running? mode? target session?
claudedo stop                      # stop a running daemon
claudedo set <name>                # set the sticky target -> claude-<name> (alias: switch)
claudedo unset                     # clear the sticky target
claudedo list                      # list running claude-* sessions
claudedo test-audio                # verify the mic capture path
```

### Modes

- **listen (default)**: continuous capture; only acts on utterances that **start with a wake phrase**; all other speech is transcribed locally and discarded instantly.
  This is the hands-free path and works while a game is focused, because the trigger is your voice over the mic bridge, not a keyboard hook.
- **ptt**: push-to-talk. **Desk-only:** it captures only while the daemon's own terminal window is focused. There is deliberately **no global hotkey**: a system-wide keyboard hook is the keylogger/cheat silhouette anticheats watch for, and `claudedo` refuses to install one. For hands-free-while-gaming, use listen mode. (Terminals don't deliver key-up events, so PTT is press-to-start / press-to-stop in the daemon window, not literal hold.)

Switch at runtime by voice: "claudedo mode listen" / "claudedo mode ptt".

## Command grammar

Wake phrases (listen mode), fuzzy-matched. The default list is **"claudedo"**, **"claude do"**, **"hey claude"**, **"ok claude"**, **"okay claude"**. Whisper has no token for the coined word "claudedo" and renders it as real words ("claude do"), so that spelling is listed explicitly. Matching is lenient (case/space-insensitive). Add the spellings you actually see (turn on `print_heard` to find them). In PTT mode the wake phrase is optional.

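The lenient wake matching described above might look roughly like the following sketch. It is not claudedo's actual matcher: `strip_wake` and `_squash` are hypothetical names, and reading "lenient" as "ignore case, spaces, and punctuation, but only match at word boundaries" is an assumption.

```python
import re

# Default wake phrases as listed in this README.
WAKE_PHRASES = ["claudedo", "claude do", "hey claude", "ok claude", "okay claude"]


def _squash(text):
    """Lowercase and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def strip_wake(transcript, phrases=WAKE_PHRASES):
    """Return the text after a leading wake phrase, or None if there is none.

    Hypothetical helper: a None result corresponds to "discard locally".
    """
    words = transcript.split()
    # Try longer phrases first so the longest spelling wins.
    for phrase in sorted(phrases, key=len, reverse=True):
        target = _squash(phrase)
        # Grow a word-by-word prefix until it equals the squashed phrase.
        for n in range(1, len(words) + 1):
            head = _squash("".join(words[:n]))
            if head == target:
                return " ".join(words[n:])
            if not target.startswith(head):
                break  # this phrase cannot match; try the next one
    return None
```

With this sketch, `strip_wake("Claude do, yes")` and `strip_wake("Claudedo yes")` both yield `"yes"`, while a transcript such as `"push left now"` yields `None` and would be dropped.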
| Say | Does |
|---|---|
| `yes` / `no` | answer a yes/no prompt |
| `one` / `two` / `three` / `four` | pick numbered option 1–4 |
| `approve` / `deny` | allow / deny a permission prompt |
| `send` / `enter` | submit (Enter) |
| `type <text>` | insert literal text, **no** submit (read-before-send; say "send") |
| `space [<n>]` (also `add [a] space`, `insert <n> spaces`) | insert n spaces (default 1) |
| `backspace [<n>]` (alias `delete`) | delete n chars (default 1), capped at the last submit boundary |
| `erase` (alias `clear`/`wipe`) | delete everything typed since the last submit/boundary |
| `mode ptt` / `mode listen` | switch input mode |
| `set <name>` (alias `sticky`/`switch`) | set the **sticky** target → `claude-<name>` (persists) |
| `target <name> <command>` | **one-shot** override: run that command on `claude-<name>` for this utterance only; sticky default unchanged |
| `unset` (alias `unsticky`) | clear the sticky target |
| `list` | list running `claude-*` sessions to the daemon console |
| `cancel` / `escape` | back out of a prompt |

Optional filler (`select` / `use` / `choose`) may precede any command and is ignored: `select yes` and `use yes` behave like `yes`. (`select 1` is still the select command.)

When no sticky target is set, a bare command does nothing and asks you to `set` one (the default). Set `auto_target = true` to instead auto-use the single running `claude-*` session when there's exactly one; with several running it always does nothing and asks you to `set` one.

Number words are normalized to digits before matching ("one"/"won" → 1).

## Targeting

`~/.claude-active` holds the **sticky** target session name (e.g. `claude-rethink-public`). The **cc kit** writes this file when you attach, and `claudedo set <name>` (alias `sticky`/`switch`) overwrites it; `unset` clears it. A `target <name> <command>` voice command is a **one-shot** that does NOT touch the sticky default: it routes a single command and the next bare command reverts to sticky.

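The sticky-target file handling described above could be sketched as follows. This is an illustration under stated assumptions, not claudedo's code: the function names are hypothetical, and the trailing newline, the "already prefixed" shortcut in `session_name`, and treating an empty file as "unset" are all guesses.

```python
from pathlib import Path

# Sticky target file named in this README.
ACTIVE_FILE = Path.home() / ".claude-active"


def session_name(name):
    """Map a spoken or typed name to a tmux session name (assumed rule)."""
    name = name.strip()
    return name if name.startswith("claude-") else f"claude-{name}"


def set_sticky(name, path=ACTIVE_FILE):
    """Overwrite the sticky file with the session for `name`."""
    session = session_name(name)
    path.write_text(session + "\n")
    return session


def get_sticky(path=ACTIVE_FILE):
    """Return the sticky session name, or None if unset or empty."""
    try:
        value = path.read_text().strip()
    except FileNotFoundError:
        return None
    return value or None


def unset_sticky(path=ACTIVE_FILE):
    """Clear the sticky target; a missing file is not an error."""
    path.unlink(missing_ok=True)
```

A one-shot `target` command would simply bypass `get_sticky()` for a single utterance and never call `set_sticky()`, which is why the sticky default survives it.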
Resolution order (implemented in one place, `target.resolve()`): one-shot if present → sticky if set and the session exists → else, only if `auto_target = true`, the single running `claude-*` session → else (default, or zero/several sessions) do nothing and say so. It never guesses, and never injects into a nonexistent session.

Every name maps to `claude-<name>` through one helper (`target.session_name()`), and the cc kit mirrors it exactly, so `cc libs` (shell) and `set libs` (voice) refer to the same session `claude-libs`. The name is your **stable, speakable handle**: because the kit forces an explicit name (no basename guessing), you always know the exact word to say.

The cc kit lives in `~/.config/claudedo/cc.sh` (sourced from your rc; works under bash and zsh). Every command **requires an explicit name**:

```bash
cc <name>     # attach/create claude-<name>; writes ~/.claude-active
ccr <name>    # re-attach an existing claude-<name> only
ccl           # list claude-* sessions
cck <name>    # kill claude-<name>
cckl          # kill all claude-* sessions
```

## The confirmed Claude Code keymap

The keystrokes in [`keys.py`](src/claudedo/keys.py) were confirmed **empirically** against a live `claude` v2.1.191 session (not assumed):

- Numbered prompts (trust prompt, permission prompt): pressing the **bare digit** selects **and confirms immediately**, with **no trailing Enter**.
- Arrow keys move the highlight without acting; Enter then confirms (modeled as an alternative sequence).
- Permission prompt is `1. Yes / 2. Yes, and don't ask again / 3. No`; Escape cancels.
- Literal text goes in via `send-keys -l` (no submit); a bare Enter submits.

If Claude Code changes its prompt UI, re-confirm against a live session and update `keys.py`; it is the single source of truth.

## Config

Everything tunable lives in [`config.toml`](config.toml): wake phrases, mode + PTT key, Whisper model/language/device, audio segmentation thresholds, and `[behavior]` (`type_autosend`, `filler_words`, `auto_target`, `print_heard`).
The default model is `small`; bump to `medium` if the coined wake word is recognized poorly. `claudedo -c ...` points at a specific config; otherwise it searches `$CLAUDEDO_CONFIG`, `~/.config/claudedo/config.toml`, then `./config.toml`.

- **`auto_target`** (default `false`): with no sticky target set and exactly one `claude-*` session running, `false` makes a bare command do nothing and ask you to `set` one; `true` auto-targets that single session.
- **`print_heard`** (default `false`, debug): prints non-wake transcripts to the console so you can see how Whisper renders your wake word. Turn it on to debug detection, then off. Whisper has no token for "claudedo"; it commonly emits "claude do", which is in the default wake list.

## Requirements

Windows 11 + WSL2 (Ubuntu) with WSLg, Python 3.10+, tmux, the `claude` CLI, and either bash or zsh (the cc kit supports both).
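As a rough way to check these requirements before installing, a preflight along the following lines may help. It is a sketch, not part of claudedo: `preflight` is a hypothetical name, the socket path is the one `install.sh` is described as checking, and the Windows-side microphone permission cannot be verified from Python this way.

```python
import shutil
import sys
from pathlib import Path


def preflight(pulse_socket="/mnt/wslg/PulseServer"):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    # Python version named under Requirements.
    if sys.version_info < (3, 10):
        problems.append("Python 3.10+ is required")
    # Command-line tools that must be on PATH.
    for tool in ("tmux", "claude"):
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    # WSLg exposes its PulseAudio server at this path.
    if not Path(pulse_socket).exists():
        problems.append(f"WSLg PulseAudio socket missing: {pulse_socket}")
    return problems


if __name__ == "__main__":
    for line in preflight() or ["all requirements look satisfied"]:
        print(line)
```

Run it inside the WSL distribution you plan to use; each printed line names one thing to fix.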