From 98a6f0283ff692b32ebfe2e3462af2803926a4a5 Mon Sep 17 00:00:00 2001
From: disqualifier
Date: Thu, 25 Jun 2026 17:54:49 -0400
Subject: [PATCH] init: claude, with ears

Signed-off-by: disqualifier
---
 .gitignore |   9 +++
 README.md  | 181 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 190 insertions(+)
 create mode 100644 .gitignore
 create mode 100644 README.md

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..5a3a9be
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,9 @@
+CLAUDE.md
+
+__pycache__/
+*.pyc
+*.egg-info/
+build/
+dist/
+.venv/
+venv/
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..5ea5566
--- /dev/null
+++ b/README.md
@@ -0,0 +1,181 @@
# claudedo

Voice control for [Claude Code](https://claude.com/claude-code) on **WSL2**.

`claudedo` listens on your mic, runs **local** speech-to-text, recognizes a wake
phrase plus a small command grammar, and injects the matching keystrokes into your
active Claude Code tmux session via `tmux send-keys`. You answer Claude Code's
prompts ("yes", "option one", "approve") and dictate prompts **by voice**, including
hands-free while another window (such as a game) is focused.

It exists because Claude Code's native `/voice` is blocked by a hardcoded check in WSL
(it assumes WSL has no audio). Modern WSL2 + WSLg *does* have working mic input via
PulseAudio/RDP. `claudedo` captures the mic itself, transcribes on-device, and drives
Claude Code over tmux: fully local, private, and able to run in the background.

## How it works

```
mic (WSLg/PulseAudio RDPSource)
  -> sounddevice capture
  -> faster-whisper (local STT, on-device)
  -> wake gate: utterance must start with a wake phrase, else DISCARD locally
  -> grammar match (yes/no/one..four/approve/deny/send/type/mode/switch/cancel)
  -> resolve target session (~/.claude-active)
  -> tmux send-keys -t "<session>" <keys>
```

**Privacy by construction.** STT runs on-device.
In listen mode, any speech that
doesn't start with a wake phrase is dropped the instant it's transcribed: never
stored, never sent anywhere. That's what makes always-listening acceptable while
you're on voice comms in a game.

**Injection is PTY-only.** `claudedo` only ever calls `tmux send-keys`. It never uses
OS-level keyboard input and installs no system-wide keyboard hook. Keystrokes are
written as text into a Linux pseudo-terminal, so they work regardless of which window
is focused and never touch Windows input or a game/anticheat's view.

## Install

```bash
git clone <repo-url> claudedo && cd claudedo
./install.sh
```

`install.sh` is idempotent. It installs the WSL audio deps, writes the `~/.asoundrc`
Pulse shim, verifies the mic path, pip-installs the package, primes the Whisper
model, and installs the **cc kit** (`~/.config/claudedo/cc.sh`, sourced from every
`~/.zshrc`/`~/.bashrc` you have). It also checks the two Windows-side settings it
can't automate and tells you how to fix them:

- **WSLg present** (`/mnt/wslg/PulseServer`). If missing: run `wsl --update` in Windows,
  then `wsl --shutdown`, then re-run `./install.sh`.
- **Mic permission**: Windows Settings → Privacy & security → Microphone → enable
  *"Let desktop apps access your microphone"*. Required.

Verify the riskiest piece (mic capture) first:

```bash
claudedo test-audio
```

## Usage

```bash
claudedo start                 # run the daemon (foreground; listen mode by default)
claudedo start --mode ptt      # push-to-talk instead (desk-only; see Modes)
claudedo status                # running? mode? target session?
claudedo stop                  # stop a running daemon
claudedo switch <name>         # retarget to claude-<name>
claudedo test-audio            # verify the mic capture path
```

Background it in its own tmux session:

```bash
tmux new-session -d -s claudedo 'claudedo start'
```

### Autostart

WSL has no real boot, so autostart is rc-based and **opt-in**.
`install.sh` ships
`~/.config/claudedo/autostart.sh`, which starts the daemon in a `claudedo-daemon`
tmux session once per WSL session, but only when `CLAUDEDO_AUTOSTART=1` is set.
Enable it by uncommenting the `export CLAUDEDO_AUTOSTART=1` line in the cc-kit marker
block of your rc; disable it by commenting that line out again (or by deleting
`~/.config/claudedo/autostart.sh`). Watch its logs with `tmux attach -t claudedo-daemon`.

If your WSL runs systemd (`systemd=true` under `[boot]` in `/etc/wsl.conf`),
`install.sh` also installs an optional user unit. To use it instead, enable it with:

```bash
systemctl --user enable --now claudedo
```

### Modes

- **listen (default)**: continuous capture; only acts on utterances that **start
  with a wake phrase**; all other speech is transcribed locally and discarded
  instantly. This is the hands-free path and works while a game is focused, because
  the trigger is your voice over the mic bridge, not a keyboard hook.
- **ptt**: push-to-talk. **Desk-only:** it captures only while the daemon's own
  terminal window is focused. There is deliberately **no global hotkey**: a
  system-wide keyboard hook is exactly the keylogger/cheat profile that anticheat
  software watches for, and `claudedo` refuses to install one. For hands-free use
  while gaming, use listen mode. (Terminals don't deliver key-up events, so PTT is
  press-to-start / press-to-stop in the daemon window, not a literal hold.)

Switch at runtime by voice: "claudedo mode listen" / "claudedo mode ptt".

## Command grammar

Wake phrases (listen mode), fuzzy-matched: **"claudedo"**, **"hey claude"**.
"claudedo" is a coined word, so the matcher is lenient (accepts "claude do",
"clauddo", "cloud do", …). In PTT mode the wake phrase is optional.
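The lenient wake-phrase matching described above can be approximated with a plain string-similarity ratio. Below is a minimal sketch, assuming Python's standard-library `difflib` and a hypothetical threshold of 0.75; `normalize`, `strip_wake`, `WAKE_PHRASES`, and `THRESHOLD` are illustrative names, not `claudedo`'s actual matcher.

```python
from __future__ import annotations

import difflib
import re

# Hypothetical values for illustration; the real wake phrases (and any
# threshold) come from config.toml and claudedo's own matcher.
WAKE_PHRASES = ("claudedo", "hey claude")
THRESHOLD = 0.75


def normalize(text: str) -> list[str]:
    """Lowercase, replace punctuation with spaces, and split into words."""
    return re.sub(r"[^a-z0-9\s]", " ", text.lower()).split()


def strip_wake(transcript: str) -> str | None:
    """Return the command after a wake phrase, or None to discard the utterance."""
    words = normalize(transcript)
    best_ratio, best_n = 0.0, 0
    for phrase in WAKE_PHRASES:
        target = phrase.replace(" ", "")
        # Compare the first 1..3 words joined without spaces, so split or
        # misheard forms ("claude do", "cloud do") can still match.
        for n in range(1, min(3, len(words)) + 1):
            head = "".join(words[:n])
            ratio = difflib.SequenceMatcher(None, head, target).ratio()
            if ratio > best_ratio:
                best_ratio, best_n = ratio, n
    if best_ratio < THRESHOLD:
        return None  # no wake phrase: drop the transcript immediately
    return " ".join(words[best_n:])
```

Picking the best-scoring prefix length (rather than the first one over the threshold) is what keeps "claude do option one" from leaving a stray "do" in the command. Raising the threshold makes the gate stricter at the cost of rejecting more mis-hearings.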

| Say | Does |
|---|---|
| `yes` / `no` | answer a yes/no prompt |
| `one` / `two` / `three` / `four` | pick numbered option 1–4 |
| `approve` / `deny` | allow / deny a permission prompt |
| `send` / `enter` | submit (Enter) |
| `type <text>` | insert literal text, **no** submit (read before sending; say "send") |
| `mode ptt` / `mode listen` | switch input mode |
| `switch <name>` / `target <name>` | retarget to `claude-<name>` |
| `cancel` / `escape` | back out of a prompt |

Number words are normalized to digits before matching ("one"/"won" → 1).

## Targeting

`~/.claude-active` holds the target session name (e.g. `claude-rethink-public`). The
**cc kit** writes this file when you attach, so the target is "the project you most
recently attached to". `claudedo switch <name>` (or saying `target <name>`) overwrites
it. If the file is missing or the session no longer exists, `claudedo` injects nothing
and logs a warning (it never guesses a target).

Every name maps to `claude-<name>` through one helper (`target.session_name()`), and
the cc kit mirrors it exactly, so `cc libs` (shell) and `target libs` (voice) refer
to the same session, `claude-libs`. The name is your **stable, speakable handle**:
because the kit forces an explicit name (no basename guessing), you always know the
exact word to say.

The cc kit lives in `~/.config/claudedo/cc.sh` (sourced from your rc; works under
bash and zsh). Every command that acts on a single session **requires an explicit name**:

```bash
cc <name>     # attach to or create claude-<name>; writes ~/.claude-active
ccr <name>    # re-attach an existing claude-<name> only
ccl           # list claude-* sessions
cck <name>    # kill claude-<name>
cckl          # kill all claude-* sessions
```

## The confirmed Claude Code keymap

The keystrokes in [`keys.py`](src/claudedo/keys.py) were confirmed **empirically**
against a live `claude` v2.1.191 session (not assumed):

- Numbered prompts (trust prompt, permission prompt): pressing the **bare digit**
  selects **and confirms immediately**, with **no trailing Enter**.
- Arrow keys move the highlight without acting; Enter then confirms (modeled as an
  alternative sequence).
- Permission prompt is `1. Yes / 2. Yes, and don't ask again / 3. No`; Escape cancels.
- Literal text goes in via `send-keys -l` (no submit); a bare Enter submits.

If Claude Code changes its prompt UI, re-confirm against a live session and update
`keys.py`, which is the single source of truth.

## Config

Everything tunable lives in [`config.toml`](config.toml): wake phrases, mode and PTT
key, Whisper model/language/device, audio segmentation thresholds, and
`type_autosend` (default `false`, so `type` never submits on its own). The default
model is `small`; bump it to `medium` if the coined wake word is recognized poorly.
`claudedo -c ...` points at a specific config; otherwise it searches, in order,
`$CLAUDEDO_CONFIG`, `~/.config/claudedo/config.toml`, then `./config.toml`.

## Requirements

Windows 11 + WSL2 (Ubuntu) with WSLg, Python 3.10+, tmux, the `claude` CLI, and
either bash or zsh (the cc kit supports both).
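To tie the Command grammar, Targeting, and keymap sections together, here is a rough sketch of the last pipeline step: turning an already wake-stripped command into `tmux send-keys` arguments. It assumes only what this README states (bare digit with no Enter, `send-keys -l` for literal text, Enter to submit, Escape to cancel, `claude-<name>` sessions). `NUMBER_WORDS`, `session_name`, `keys_for`, and `inject` are hypothetical stand-ins, not the contents of `keys.py` or the real `target.session_name()`. The yes/no and approve/deny mappings are deliberately left out because their keystrokes are not documented here.

```python
from __future__ import annotations

import subprocess

# Hypothetical table mirroring the README's grammar; claudedo's real
# source of truth for keystrokes is src/claudedo/keys.py.
NUMBER_WORDS = {"one": "1", "won": "1", "two": "2", "three": "3", "four": "4"}


def session_name(name: str) -> str:
    """Illustrative stand-in for target.session_name(): <name> -> claude-<name>."""
    return f"claude-{name}"


def keys_for(command: str) -> list[str] | None:
    """Map a wake-stripped command to tmux send-keys key arguments."""
    parts = command.strip().split(maxsplit=1)
    if not parts:
        return None
    verb = parts[0].lower()
    digit = NUMBER_WORDS.get(verb, verb)
    if digit in {"1", "2", "3", "4"}:
        # Numbered prompt: the bare digit selects and confirms, no trailing Enter.
        return [digit]
    if verb in {"send", "enter"}:
        return ["Enter"]  # submit
    if verb in {"cancel", "escape"}:
        return ["Escape"]  # back out of a prompt
    if verb == "type" and len(parts) == 2:
        # -l sends the text literally; it is not submitted until "send".
        return ["-l", parts[1]]
    return None  # yes/no, approve/deny, mode, switch: not modeled here


def inject(session: str, command: str) -> bool:
    """Send the mapped keys to `session`; inject nothing if anything is off."""
    keys = keys_for(command)
    if keys is None:
        return False
    # Never guess a target: skip if the session no longer exists.
    probe = subprocess.run(["tmux", "has-session", "-t", session], capture_output=True)
    if probe.returncode != 0:
        return False
    subprocess.run(["tmux", "send-keys", "-t", session, *keys], check=True)
    return True
```

For example, `inject(session_name("libs"), "two")` would run `tmux send-keys -t claude-libs 2` if that session exists. The `tmux has-session` probe is one way to honor the "never guess a target" rule, because it fails closed when the session is gone.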