init: claude, with ears

Signed-off-by: disqualifier <dev@disqualifier.me>
2026-06-25 17:54:49 -04:00 · 2026-06-25 17:54:49 -04:00 · 98a6f0283f
commit 98a6f0283f
2 changed files with 190 additions and 0 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,9 @@
 CLAUDE.md
 __pycache__/
 *.pyc
 *.egg-info/
 build/
 dist/
 .venv/
 venv/
--- a/README.md
+++ b/README.md
@@ -0,0 +1,181 @@
 # claudedo
 Voice control for [Claude Code](https://claude.com/claude-code) on **WSL2**.
 `claudedo` listens on your mic, runs **local** speech-to-text, recognizes a wake
 phrase plus a small command grammar, and injects the matching keystrokes into your
 active Claude Code tmux session via `tmux send-keys`. You answer Claude Code's
 prompts ("yes", "option one", "approve") and dictate prompts **by voice** — including
 hands-free while another window (a game) is focused.
 It exists because Claude Code's native `/voice` is hard-blocked under WSL (Claude Code
 assumes WSL has no audio). Modern WSL2 + WSLg *does* have working mic input via
 PulseAudio over RDP. `claudedo` captures the mic itself, transcribes on-device, and
 drives Claude Code over tmux: fully local, private, and able to run in the background.
 ## How it works
 ```
 mic (WSLg/PulseAudio RDPSource)
  -> sounddevice capture
  -> faster-whisper (local STT, on-device)
  -> wake gate: utterance must start with a wake phrase, else DISCARD locally
  -> grammar match (yes/no/one..four/approve/deny/send/type/mode/switch/cancel)
  -> resolve target session (~/.claude-active)
  -> tmux send-keys -t <session> "<keys>"
 ```
 **Privacy by construction.** STT runs on-device. In listen mode, any speech that
 doesn't start with a wake phrase is dropped the instant it's transcribed — never
 stored, never sent anywhere. That's what makes always-listening acceptable while
 you're on voice comms in a game.
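A minimal sketch of the wake gate's discard rule, assuming a simple normalize-then-prefix check (the real matcher is fuzzy, as described under Command grammar). The names `normalize` and `wake_gate` are hypothetical, not `claudedo`'s API. The property it illustrates: non-wake speech yields `None`, and nothing is kept.

```python
import re
from typing import Optional

# Wake phrases as listed in the Command grammar section.
WAKE_PHRASES = ("claudedo", "hey claude")


def normalize(text: str) -> str:
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def wake_gate(transcript: str, wake_phrases=WAKE_PHRASES) -> Optional[str]:
    """Return the command text after a wake phrase, or None to discard.

    Nothing is logged or stored here: if the utterance does not start with
    a wake phrase, the caller simply drops the transcript.
    """
    words = normalize(transcript)
    for phrase in wake_phrases:
        if words == phrase:
            return ""  # wake phrase alone, no command yet
        if words.startswith(phrase + " "):
            return words[len(phrase) + 1:]
    return None
```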
 **Injection is PTY-only.** `claudedo` only ever calls `tmux send-keys`. It never uses
 OS-level keyboard input and installs no system-wide keyboard hook. Keystrokes are
 written as text into a Linux pseudo-terminal, so they work regardless of which window
 is focused and never pass through Windows input, where a game or its anticheat could see them.
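The injection step can be sketched as building a `tmux send-keys` argument list and running it without a shell, so transcribed text is never interpreted by bash. `send-keys -t`, the `-l` (literal) flag and key names such as `Enter` are real tmux features; the helpers `send_keys_argv` and `inject` are hypothetical.

```python
import subprocess
from typing import List, Sequence


def send_keys_argv(session: str, keys: Sequence[str], literal: bool = False) -> List[str]:
    """Build the argv for `tmux send-keys` targeting one session.

    With literal=True, tmux's -l flag sends the text verbatim instead of
    looking up key names, so a dictated word "Enter" is typed, not pressed.
    """
    argv = ["tmux", "send-keys", "-t", session]
    if literal:
        argv.append("-l")
    argv.extend(keys)
    return argv


def inject(session: str, keys: Sequence[str], literal: bool = False) -> None:
    # A list argv with no shell=True: the transcript is never shell-expanded.
    subprocess.run(send_keys_argv(session, keys, literal), check=True)
```

The same helper covers both key presses (`["Enter"]`) and dictated text (`literal=True`); only the `-l` flag differs.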
 ## Install
 ```bash
 git clone <repo> claudedo && cd claudedo
 ./install.sh
 ```
 `install.sh` is idempotent. It installs the WSL audio dependencies, writes the
 `~/.asoundrc` Pulse shim, verifies the mic path, pip-installs the package, primes the
 Whisper model, and installs the **cc kit** (`~/.config/claudedo/cc.sh`, sourced from
 whichever of `~/.zshrc` and `~/.bashrc` exist). It also checks the two Windows-side
 prerequisites it can't automate and tells you how to fix them:
 - **WSLg present** (`/mnt/wslg/PulseServer` exists). If it's missing, run `wsl --update`
  from Windows, then `wsl --shutdown`, then re-run `install.sh`.
 - **Mic permission**: Windows Settings → Privacy & security → Microphone → enable
  *"Let desktop apps access your microphone"*. Required.
 Verify the riskiest piece (mic capture) first:
 ```bash
 claudedo test-audio
 ```
 ## Usage
 ```bash
 claudedo start            # run the daemon (foreground; listen mode by default)
 claudedo start --mode ptt # push-to-talk instead (desk-only — see Modes)
 claudedo status           # running? mode? target session?
 claudedo stop             # stop a running daemon
 claudedo switch <name>    # retarget to claude-<name>
 claudedo test-audio       # verify the mic capture path
 ```
 Background it in its own tmux session:
 ```bash
 tmux new-session -d -s claudedo 'claudedo start'
 ```
 ### Autostart
 WSL has no conventional boot sequence, so autostart is rc-based and **opt-in**.
 `install.sh` ships `~/.config/claudedo/autostart.sh`, which starts the daemon in a
 `claudedo-daemon` tmux session once per WSL session, but only when
 `CLAUDEDO_AUTOSTART=1` is set. Enable it by uncommenting the
 `export CLAUDEDO_AUTOSTART=1` line in the cc-kit marker block of your rc file; disable
 it by commenting that line out again (or by deleting `autostart.sh`). Watch the
 daemon's output with `tmux attach -t claudedo-daemon`.
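The opt-in, once-per-session behaviour can be sketched as follows. This is a Python rendering of the logic, not the shipped `autostart.sh`; it assumes "once per WSL session" means "skip if the `claudedo-daemon` tmux session already exists" (`tmux has-session` exits non-zero when it doesn't). All function names are hypothetical.

```python
import os
import subprocess
from typing import Callable, Mapping

DAEMON_SESSION = "claudedo-daemon"


def tmux_session_exists(name: str) -> bool:
    # `tmux has-session` exits 0 only if the named session exists.
    result = subprocess.run(
        ["tmux", "has-session", "-t", name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def should_autostart(env: Mapping[str, str], session_exists: Callable[[str], bool]) -> bool:
    """Start only when opted in and no daemon session is running yet."""
    if env.get("CLAUDEDO_AUTOSTART") != "1":
        return False
    return not session_exists(DAEMON_SESSION)


def autostart() -> None:
    if should_autostart(os.environ, tmux_session_exists):
        # Detached session, same shape as the README's manual background command.
        subprocess.run(
            ["tmux", "new-session", "-d", "-s", DAEMON_SESSION, "claudedo start"],
            check=True,
        )
```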
 If your WSL runs systemd (`systemd=true` under `[boot]` in `/etc/wsl.conf`), `install.sh`
 also installs an optional systemd user unit; enable that instead with:
 ```bash
 systemctl --user enable --now claudedo
 ```
 ### Modes
 - **listen (default)**: continuous capture; acts only on utterances that **start
  with a wake phrase**. All other speech is transcribed locally and discarded
  instantly. This is the hands-free path, and it works while a game is focused because
  the trigger is your voice arriving over the mic bridge, not a keyboard hook.
 - **ptt**: push-to-talk. **Desk-only:** it captures only while the daemon's own
  terminal window is focused. There is deliberately **no global hotkey**: a
  system-wide keyboard hook looks exactly like the keyloggers and cheats that anticheats
  watch for, and `claudedo` refuses to install one. For hands-free use while gaming, use
  listen mode. (Terminals don't deliver key-up events, so PTT is press-to-start /
  press-to-stop in the daemon window rather than hold-to-talk.)
 Switch at runtime by voice: "claudedo mode listen" / "claudedo mode ptt".
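How the two modes differ can be sketched as a small state holder: in listen mode every transcript must pass the wake gate, while in PTT mode capture is toggled by a key press in the daemon's own terminal and the wake phrase is optional. `Mode`, `InputState` and its methods are hypothetical; `strip_wake` stands in for the real fuzzy wake matcher.

```python
from enum import Enum
from typing import Callable, Optional


class Mode(Enum):
    LISTEN = "listen"
    PTT = "ptt"


class InputState:
    def __init__(self, strip_wake: Callable[[str], Optional[str]], mode: Mode = Mode.LISTEN):
        self.strip_wake = strip_wake  # returns command text, or None if no wake phrase
        self.set_mode(mode)

    def set_mode(self, mode: Mode) -> None:
        self.mode = mode
        # Listen mode captures continuously; PTT waits for a key press.
        self.capturing = mode is Mode.LISTEN

    def on_ptt_key(self) -> None:
        # Terminals report presses but not releases, so PTT toggles.
        if self.mode is Mode.PTT:
            self.capturing = not self.capturing

    def accept(self, transcript: str) -> Optional[str]:
        """Return command text to act on, or None to discard."""
        if not self.capturing:
            return None
        command = self.strip_wake(transcript)
        if command is not None:
            return command
        # The wake phrase is required in listen mode, optional in PTT mode.
        return transcript if self.mode is Mode.PTT else None
```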
 ## Command grammar
 Wake phrases (listen mode), fuzzy-matched: **"claudedo"**, **"hey claude"**.
 "claudedo" is a coined word, so the matcher is lenient (accepts "claude do",
 "clauddo", "cloud do", …). In PTT mode the wake phrase is optional.
 | Say | Does |
 |---|---|
 | `yes` / `no` | answer a yes/no prompt |
 | `one` / `two` / `three` / `four` | pick numbered option 1–4 |
 | `approve` / `deny` | allow / deny a permission prompt |
 | `send` / `enter` | submit (Enter) |
 | `type <phrase>` | insert literal text **without** submitting (review it, then say "send") |
 | `mode ptt` / `mode listen` | switch input mode |
 | `switch <name>` / `target <name>` | retarget to `claude-<name>` |
 | `cancel` / `escape` | back out of a prompt |
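The lenient wake matching described at the top of this section can be sketched by comparing the utterance's first one to three words, joined without spaces, against each wake phrase with a similarity ratio, so a split transcription like "claude do" still matches "claudedo". This uses the standard library's `difflib.SequenceMatcher`; the 0.75 threshold and the name `match_wake` are assumptions, not `claudedo`'s actual values.

```python
import re
from difflib import SequenceMatcher
from typing import Optional, Sequence

WAKE_PHRASES = ("claudedo", "hey claude")
THRESHOLD = 0.75  # assumed cut-off; tune against real transcripts


def match_wake(transcript: str, phrases: Sequence[str] = WAKE_PHRASES,
               threshold: float = THRESHOLD) -> Optional[str]:
    """Return the words after a fuzzily matched wake phrase, or None."""
    words = re.findall(r"[a-z0-9']+", transcript.lower())
    best_score, best_rest = 0.0, None
    for phrase in phrases:
        target = phrase.replace(" ", "")  # "hey claude" -> "heyclaude"
        # Try the first 1-3 words as the candidate wake phrase.
        for n in range(1, min(3, len(words)) + 1):
            candidate = "".join(words[:n])
            score = SequenceMatcher(None, candidate, target).ratio()
            if score > best_score:
                best_score, best_rest = score, " ".join(words[n:])
    return best_rest if best_score >= threshold else None
```

A loose threshold trades false accepts for tolerance: with these assumed values an utterance starting with plain "claude" (similarity about 0.86 to "claudedo") passes the gate, so the grammar match that follows acts as the second filter.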
 Number words are normalized to digits before matching ("one"/"won" → 1).
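The grammar is small enough to sketch as a lookup after number-word normalization. The command words come from the table above; the homophone list, the `Command` tuple and `parse_command` are illustrative assumptions, not `claudedo`'s parser.

```python
import re
from typing import NamedTuple, Optional

# Assumed homophones Whisper may emit for the number words.
NUMBER_WORDS = {
    "one": "1", "won": "1",
    "two": "2", "to": "2", "too": "2",
    "three": "3",
    "four": "4", "for": "4", "fore": "4",
}

# Synonyms from the grammar table, each mapped to one action name.
SIMPLE = {
    "yes": "yes", "no": "no",
    "approve": "approve", "deny": "deny",
    "send": "send", "enter": "send",
    "cancel": "cancel", "escape": "cancel",
}


class Command(NamedTuple):
    action: str
    arg: str = ""


def parse_command(text: str) -> Optional[Command]:
    """Map the text after the wake phrase to a Command, or None."""
    # "type <phrase>": keep the dictated text's original casing.
    m = re.match(r"\s*type\b[\s,:]*(.*)", text, re.IGNORECASE)
    if m and m.group(1).strip():
        return Command("type", m.group(1).strip())
    words = re.findall(r"[a-z0-9'-]+", text.lower())
    if not words:
        return None
    head, rest = words[0], words[1:]
    if head in ("switch", "target") and len(rest) == 1:
        return Command("switch", rest[0])
    if head == "mode" and len(rest) == 1 and rest[0] in ("ptt", "listen"):
        return Command("mode", rest[0])
    if not rest:
        digit = NUMBER_WORDS.get(head, head)  # also accepts a bare "1".."4"
        if digit in ("1", "2", "3", "4"):
            return Command("option", digit)
        if head in SIMPLE:
            return Command(SIMPLE[head])
    return None
```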
 ## Targeting
 `~/.claude-active` holds the target session name (e.g. `claude-rethink-public`). The
 **cc kit** writes this file when you attach, so the target is "the project you most
 recently attached to". `claudedo switch <name>` / `target <name>` overwrites it. If
 the file is missing or the session no longer exists, `claudedo` injects nothing and
 logs a warning (it never guesses a target).
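Target resolution, as described, never guesses: a missing file, an empty file, or a dead session all yield no target plus a warning. This sketch takes a `session_exists` callable (for example one that runs `tmux has-session -t <name>`) as a parameter so the logic can run without tmux; `resolve_target` is a hypothetical name.

```python
import logging
from pathlib import Path
from typing import Callable, Optional

log = logging.getLogger("claudedo")
ACTIVE_FILE = Path.home() / ".claude-active"


def resolve_target(session_exists: Callable[[str], bool],
                   active_file: Path = ACTIVE_FILE) -> Optional[str]:
    """Return the session named in ~/.claude-active, or None (never a guess)."""
    try:
        name = active_file.read_text().strip()
    except FileNotFoundError:
        log.warning("%s not found; injecting nothing", active_file)
        return None
    if not name:
        log.warning("%s is empty; injecting nothing", active_file)
        return None
    if not session_exists(name):
        log.warning("session %r no longer exists; injecting nothing", name)
        return None
    return name
```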
 Every name maps to `claude-<name>` through one helper (`target.session_name()`), and
 the cc kit mirrors it exactly, so `cc libs` (shell) and `target libs` (voice) refer
 to the same session, `claude-libs`. The name is your **stable, speakable handle**:
 because the kit requires an explicit name (it never guesses one from a directory
 basename), you always know the exact word to say.
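The one-helper rule can be sketched like this: a single function maps a handle to its session, and `switch`/`target` write the result to `~/.claude-active`. Only the `claude-<name>` mapping comes from this README; the validation rule and the `switch_target` helper are assumptions.

```python
import re
from pathlib import Path

PREFIX = "claude-"
# Assumed rule: letters, digits, '-' and '_' only. tmux gives ':' and '.'
# special meaning in target names, so rejecting them keeps lookups exact.
_VALID = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]*")


def session_name(name: str) -> str:
    """Map a handle like 'libs' to its tmux session name, 'claude-libs'."""
    name = name.strip()
    if not _VALID.fullmatch(name):
        raise ValueError(f"invalid session handle: {name!r}")
    return PREFIX + name


def switch_target(name: str, active_file: Path = Path.home() / ".claude-active") -> str:
    """Retarget claudedo by overwriting the active-session file."""
    session = session_name(name)
    active_file.write_text(session + "\n")
    return session
```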
 The cc kit lives in `~/.config/claudedo/cc.sh` (sourced from your rc; works under
 bash and zsh). Every command **requires an explicit name**:
 ```bash
 cc <name>    # attach/create claude-<name>; writes ~/.claude-active
 ccr <name>   # re-attach an existing claude-<name> only
 ccl          # list claude-* sessions
 cck <name>   # kill claude-<name>
 cckl         # kill all claude-* sessions
 ```
 ## The confirmed Claude Code keymap
 The keystrokes in [`keys.py`](src/claudedo/keys.py) were confirmed **empirically**
 against a live `claude` v2.1.191 session (not assumed):
 - Numbered prompts (trust prompt, permission prompt): pressing the **bare digit**
  selects **and confirms immediately** — **no trailing Enter**.
 - Arrow keys move the highlight without acting; Enter then confirms (modeled as an
  alternative sequence).
 - Permission prompt is `1. Yes / 2. Yes, and don't ask again / 3. No`; Escape cancels.
 - Literal text goes in via `send-keys -l` (no submit); a bare Enter submits.
 If Claude Code changes its prompt UI, re-confirm against a live session and update
 `keys.py` — it is the single source of truth.
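The confirmed behaviour above fits in a small table from grammar actions to `send-keys` arguments. Digits, `Enter`, `Escape` and the `-l` literal path come from the list above. Mapping `approve` to `1` and `deny` to `3` follows the permission prompt's layout but is an assumption, and `yes`/`no` are omitted because this README does not say which keys they send. `keys_for` is hypothetical, not the contents of `keys.py`.

```python
from typing import List, Optional, Tuple

Keys = Tuple[List[str], bool]  # (send-keys arguments, use -l literal mode)

_FIXED = {
    "approve": (["1"], False),      # assumed: "1. Yes" on the permission prompt
    "deny": (["3"], False),         # assumed: "3. No" on the permission prompt
    "send": (["Enter"], False),     # a bare Enter submits
    "cancel": (["Escape"], False),  # Escape backs out of a prompt
}


def keys_for(action: str, arg: str = "") -> Optional[Keys]:
    """Translate a parsed command into tmux send-keys arguments."""
    if action == "option" and arg in ("1", "2", "3", "4"):
        # Bare digit selects AND confirms: no trailing Enter.
        return ([arg], False)
    if action == "type":
        # Literal text via -l, deliberately without Enter (say "send").
        return ([arg], True)
    return _FIXED.get(action)
```

If `type_autosend` were enabled, the Enter would need a second, non-literal `send-keys` call: under `-l`, the word `Enter` is typed verbatim rather than pressed.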
 ## Config
 Everything tunable lives in [`config.toml`](config.toml): wake phrases, mode and PTT
 key, Whisper model/language/device, audio segmentation thresholds, and
 `type_autosend` (default `false`). The default model is `small`; switch to `medium` if
 the coined wake word is recognized poorly. `claudedo -c <path> ...` points at a
 specific config; otherwise it searches, in order, `$CLAUDEDO_CONFIG`,
 `~/.config/claudedo/config.toml`, then `./config.toml`.
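The search order can be expressed as a short resolver: an explicit `-c` path wins, then `$CLAUDEDO_CONFIG`, then the per-user file, then `./config.toml`. This README doesn't say what happens when a `-c` path is missing; this sketch assumes it is returned as-is so the loader can report the error. `find_config` is a hypothetical name. (On Python 3.11+ the result can be parsed with the standard `tomllib`; on 3.10 a third-party parser such as `tomli` is needed.)

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_config(cli_path: Optional[str] = None,
                env: Mapping[str, str] = os.environ,
                home: Optional[Path] = None,
                cwd: Optional[Path] = None) -> Optional[Path]:
    """Return the first config file in the documented search order."""
    if cli_path:
        return Path(cli_path)  # explicit -c always wins
    home = home or Path.home()
    cwd = cwd or Path.cwd()
    candidates = []
    if env.get("CLAUDEDO_CONFIG"):
        candidates.append(Path(env["CLAUDEDO_CONFIG"]))
    candidates.append(home / ".config" / "claudedo" / "config.toml")
    candidates.append(cwd / "config.toml")
    for path in candidates:
        if path.is_file():
            return path
    return None  # caller falls back to built-in defaults or errors out
```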
 ## Requirements
 Windows 11 + WSL2 (Ubuntu) with WSLg, Python 3.10+, tmux, the `claude` CLI, and
 either bash or zsh (the cc kit supports both).