claudedo/README.md
disqualifier 66b08d290c docs: lead how-to-run with the terminal-run model
state terminal-run as the product (the claudedo start terminal is the
recognition/action console) and frame backgrounding/autostart/systemd as
optional extras, not the default.

Signed-off-by: disqualifier <dev@disqualifier.me>
2026-06-25 18:42:22 -04:00


# claudedo
Voice control for [Claude Code](https://claude.com/claude-code) on **WSL2**.
`claudedo` listens on your mic, runs **local** speech-to-text, recognizes a wake
phrase plus a small command grammar, and injects the matching keystrokes into your
active Claude Code tmux session via `tmux send-keys`. You answer Claude Code's
prompts ("yes", "option one", "approve") and dictate prompts **by voice** — including
hands-free while another window (a game) is focused.
It exists because Claude Code's native `/voice` is hard-coded to refuse to run under
WSL (it assumes WSL has no audio). Modern WSL2 + WSLg *does* have working mic input via
PulseAudio/RDP. `claudedo` captures the mic itself, transcribes on-device, and drives
Claude Code over tmux — fully local, private, backgroundable.
## How it works
```
mic (WSLg/PulseAudio RDPSource)
  -> sounddevice capture
  -> faster-whisper (local STT, on-device)
  -> wake gate: utterance must start with a wake phrase, else DISCARD locally
  -> grammar match (yes/no/one..four/approve/deny/send/type/mode/switch/cancel)
  -> resolve target session (~/.claude-active)
  -> tmux send-keys -t <session> "<keys>"
```
**Privacy by construction.** STT runs on-device. In listen mode, any speech that
doesn't start with a wake phrase is dropped the instant it's transcribed — never
stored, never sent anywhere. That's what makes always-listening acceptable while
you're on voice comms in a game.
**Injection is PTY-only.** `claudedo` only ever calls `tmux send-keys`. It never uses
OS-level keyboard input and installs no system-wide keyboard hook. Keystrokes are
text into a Linux pseudo-terminal — they work regardless of which window is focused
and never touch Windows input or a game/anticheat's view.
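
The PTY-only rule can be pictured as a thin wrapper around `tmux send-keys`. The sketch below is illustrative and not taken from the project's source: the names `send_keys_argv` and `inject` are hypothetical, and the `run` parameter exists only so the call can be exercised without a live tmux server.

```python
import subprocess
from typing import Callable, List


def send_keys_argv(session: str, keys: str, literal: bool = False) -> List[str]:
    """Build the tmux command line for one injection (hypothetical helper)."""
    argv = ["tmux", "send-keys", "-t", session]
    if literal:
        # -l makes tmux treat the argument as literal text, not a key name.
        argv.append("-l")
    argv.append(keys)
    return argv


def inject(session: str, keys: str, literal: bool = False,
           run: Callable = subprocess.run) -> None:
    """Send keys into the session's pseudo-terminal; no OS-level input is used."""
    run(send_keys_argv(session, keys, literal), check=True)
```

Calling `inject("claude-libs", "Enter")` would run `tmux send-keys -t claude-libs Enter`, which reaches the pane whichever window has focus.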
## Install
```bash
git clone <repo> claudedo && cd claudedo
./install.sh
```
`install.sh` is idempotent. It installs the WSL audio deps, writes the `~/.asoundrc`
Pulse shim, verifies the mic path, pip-installs the package, primes the Whisper
model, and installs the **cc kit** (`~/.config/claudedo/cc.sh`, sourced from whichever
of `~/.zshrc` and `~/.bashrc` exist). It also checks the two Windows-side settings it
can't automate and tells you how to fix them:
- **WSLg present** (`/mnt/wslg/PulseServer`). If missing: `wsl --update` in Windows,
then `wsl --shutdown`, then re-run.
- **Mic permission**: Windows Settings → Privacy & security → Microphone → enable
*"Let desktop apps access your microphone"*. Required.
Verify the riskiest piece (mic capture) first:
```bash
claudedo test-audio
```
## Usage
**Run it in a terminal you watch** — that's the product. The `claudedo start`
terminal is your recognition/action console (it logs what it heard, what it matched,
and what it injected); you attach to the `claude-<name>` session in another pane to
watch the keystrokes land. Backgrounding (tmux/autostart/systemd, below) is an
optional extra, not the default — it hides the console you'd otherwise read.
```bash
claudedo start # run the daemon (foreground; listen mode by default)
claudedo start --mode ptt # push-to-talk instead (desk-only — see Modes)
claudedo status # running? mode? target session?
claudedo stop # stop a running daemon
claudedo switch <name> # retarget to claude-<name>
claudedo test-audio # verify the mic capture path
```
If you do want it backgrounded (optional — you lose the live console), run it in its
own tmux session:
```bash
tmux new-session -d -s claudedo 'claudedo start'
```
### Autostart
WSL has no real boot, so autostart is rc-based and **opt-in**. `install.sh` ships
`~/.config/claudedo/autostart.sh`, which starts the daemon in a `claudedo-daemon`
tmux session once per WSL session — but only when `CLAUDEDO_AUTOSTART=1` is set.
Enable it by uncommenting the `export CLAUDEDO_AUTOSTART=1` line in the cc-kit marker
block of your rc; disable it by re-commenting (or deleting the file). Watch its logs
with `tmux attach -t claudedo-daemon`.
If your WSL runs systemd (`systemd=true` in `/etc/wsl.conf`), `install.sh` also
installs an optional user unit — enable it instead with:
```bash
systemctl --user enable --now claudedo
```
### Modes
- **listen (default)** — continuous capture; only acts on utterances that **start
with a wake phrase**; all other speech is transcribed locally and discarded
instantly. This is the hands-free path and works while a game is focused, because
the trigger is your voice over the mic bridge — not a keyboard hook.
- **ptt** — push-to-talk. **Desk-only:** it captures only while the daemon's own
terminal window is focused. There is deliberately **no global hotkey** — a
system-wide keyboard hook is the keylogger/cheat silhouette anticheats watch for,
and `claudedo` refuses to install one. For hands-free-while-gaming, use listen
mode. (Terminals don't deliver key-up events, so PTT is press-to-start /
press-to-stop in the daemon window, not literal hold.)
Switch at runtime by voice: "claudedo mode listen" / "claudedo mode ptt".
## Command grammar
Wake phrases (listen mode), fuzzy-matched: **"claudedo"**, **"hey claude"**.
"claudedo" is a coined word, so the matcher is lenient (accepts "claude do",
"clauddo", "cloud do", …). In PTT mode the wake phrase is optional.
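
One way to get this kind of leniency is to compare the first few words of the transcript, joined together, against each wake phrase with a similarity ratio. This is a sketch only: the real matcher may work differently, `strip_wake` is a hypothetical name, and the `0.75` threshold is an assumed value rather than the project's setting.

```python
import re
from difflib import SequenceMatcher
from typing import Optional, Sequence

WAKE_PHRASES = ("claudedo", "hey claude")
THRESHOLD = 0.75  # assumed cut-off, not the project's actual value


def strip_wake(utterance: str,
               phrases: Sequence[str] = WAKE_PHRASES,
               threshold: float = THRESHOLD) -> Optional[str]:
    """Return the text after the wake phrase, or None if the gate rejects it."""
    words = re.findall(r"[a-z0-9']+", utterance.lower())
    best = None  # (score, number of leading words consumed)
    for phrase in phrases:
        target = phrase.replace(" ", "")
        # Allow one extra word so "claude do" can stand in for "claudedo".
        max_words = len(phrase.split()) + 1
        for n in range(1, min(max_words, len(words)) + 1):
            candidate = "".join(words[:n])
            score = SequenceMatcher(None, candidate, target).ratio()
            if score >= threshold and (best is None or score > best[0]):
                best = (score, n)
    if best is None:
        return None  # no wake phrase: the caller discards the utterance
    return " ".join(words[best[1]:])


# strip_wake("claude do option one")  -> "option one"
# strip_wake("cloud do yes")          -> "yes"
# strip_wake("what time is the raid") -> None
```
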
| Say | Does |
|---|---|
| `yes` / `no` | answer a yes/no prompt |
| `one` / `two` / `three` / `four` | pick numbered option 1 to 4 |
| `approve` / `deny` | allow / deny a permission prompt |
| `send` / `enter` | submit (Enter) |
| `type <phrase>` | insert literal text, **no** submit (read-before-send; say "send") |
| `mode ptt` / `mode listen` | switch input mode |
| `switch <name>` / `target <name>` | retarget to `claude-<name>` |
| `cancel` / `escape` | back out of a prompt |
Number words are normalized to digits before matching ("one"/"won" → 1).
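
The table can be read as a small parser that runs after the wake phrase has been removed. The following is a minimal sketch, not the project's grammar module: `parse_command` and the intent names are hypothetical, only the "won" homophone is taken from this README, and accepting a leading "option" is an assumption based on the "option one" example in the introduction.

```python
from typing import Optional, Tuple

NUMBER_WORDS = {"one": "1", "won": "1", "two": "2", "three": "3", "four": "4"}
DIGITS = {"1", "2", "3", "4"}
SIMPLE = {
    "yes": "yes", "no": "no",
    "approve": "approve", "deny": "deny",
    "send": "submit", "enter": "submit",
    "cancel": "cancel", "escape": "cancel",
}


def parse_command(text: str) -> Optional[Tuple[str, str]]:
    """Map a wake-stripped utterance to (intent, argument), or None if no match."""
    raw = text.split()
    if not raw:
        return None
    head, rest = raw[0].lower(), raw[1:]
    if head == "type":
        # Keep the dictated phrase as spoken; it is inserted without submitting.
        return ("type", " ".join(rest)) if rest else None
    rest = [w.lower() for w in rest]
    if head in ("switch", "target"):
        return ("switch", rest[0]) if len(rest) == 1 else None
    if head == "mode":
        return ("mode", rest[0]) if rest in (["ptt"], ["listen"]) else None
    if head == "option":
        if len(rest) != 1:
            return None
        digit = NUMBER_WORDS.get(rest[0], rest[0])
        return ("option", digit) if digit in DIGITS else None
    if rest:
        return None  # single-word commands take no trailing words
    digit = NUMBER_WORDS.get(head, head)  # number words become digits first
    if digit in DIGITS:
        return ("option", digit)
    if head in SIMPLE:
        return (SIMPLE[head], "")
    return None
```

Anything that does not match returns `None`, so unrecognized speech after a wake phrase results in no keystrokes in this sketch.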
## Targeting
`~/.claude-active` holds the target session name (e.g. `claude-rethink-public`). The
**cc kit** writes this file when you attach, so the target is "the project you most
recently attached to". `claudedo switch <name>` / `target <name>` overwrites it. If
the file is missing or the session no longer exists, `claudedo` injects nothing and
logs a warning (it never guesses a target).
Every name maps to `claude-<name>` through one helper (`target.session_name()`), and
the cc kit mirrors it exactly — so `cc libs` (shell) and `target libs` (voice) refer
to the same session `claude-libs`. The name is your **stable, speakable handle**:
because the kit forces an explicit name (no basename guessing), you always know the
exact word to say.
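
The naming and lookup rules above might look roughly like this in code. It is a guess at the shape of `target.session_name()` and the surrounding lookup, not a copy of them: `resolve_target` and `tmux_has_session` are hypothetical names, and the `exists` parameter is there so the logic can be tried without tmux.

```python
import subprocess
from pathlib import Path
from typing import Callable, Optional


def session_name(name: str) -> str:
    """Map a spoken or typed name to its tmux session: libs -> claude-libs."""
    return f"claude-{name.strip()}"


def tmux_has_session(session: str) -> bool:
    """True if tmux reports that the session exists."""
    result = subprocess.run(
        ["tmux", "has-session", "-t", session],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def resolve_target(active_file: Path = Path.home() / ".claude-active",
                   exists: Callable[[str], bool] = tmux_has_session) -> Optional[str]:
    """Return the active session, or None so the caller injects nothing."""
    try:
        session = active_file.read_text().strip()
    except FileNotFoundError:
        return None
    if not session or not exists(session):
        return None  # never guess a target
    return session
```

Returning `None` instead of falling back to some other session keeps a stale `~/.claude-active` from sending keystrokes to the wrong project.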
The cc kit lives in `~/.config/claudedo/cc.sh` (sourced from your rc; works under
bash and zsh). Every command **requires an explicit name**:
```bash
cc <name> # attach/create claude-<name>; writes ~/.claude-active
ccr <name> # re-attach an existing claude-<name> only
ccl # list claude-* sessions
cck <name> # kill claude-<name>
cckl # kill all claude-* sessions
```
## The confirmed Claude Code keymap
The keystrokes in [`keys.py`](src/claudedo/keys.py) were confirmed **empirically**
against a live `claude` v2.1.191 session (not assumed):
- Numbered prompts (trust prompt, permission prompt): pressing the **bare digit**
selects **and confirms immediately**, with **no trailing Enter**.
- Arrow keys move the highlight without acting; Enter then confirms (modeled as an
alternative sequence).
- Permission prompt is `1. Yes / 2. Yes, and don't ask again / 3. No`; Escape cancels.
- Literal text goes in via `send-keys -l` (no submit); a bare Enter submits.
If Claude Code changes its prompt UI, re-confirm against a live session and update
`keys.py` — it is the single source of truth.
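
For orientation, the confirmed behaviors could be encoded as a small lookup like the one below. This is not the contents of `keys.py`: the function names are hypothetical, mapping `approve` to option 1 and `deny` to option 3 is an assumption drawn from the permission prompt layout above, and the arrow-key variant assumes the highlight starts on option 1. Each entry is a `(keys, literal)` pair, where `literal` means the text would be sent with `send-keys -l`.

```python
from typing import List, Tuple

KeySeq = List[Tuple[str, bool]]  # (tmux key name or text, send literally?)


def keys_for(intent: str, arg: str = "") -> KeySeq:
    """Translate an intent into the key sequence to send (illustrative only)."""
    if intent == "option":
        return [(arg, False)]       # bare digit selects and confirms, no Enter
    if intent == "approve":
        return [("1", False)]       # assumption: "1. Yes" on the permission prompt
    if intent == "deny":
        return [("3", False)]       # assumption: "3. No" on the permission prompt
    if intent == "type":
        return [(arg, True)]        # literal text, no submit
    if intent == "submit":
        return [("Enter", False)]   # a bare Enter submits
    if intent == "cancel":
        return [("Escape", False)]  # Escape cancels
    raise ValueError(f"no key sequence for intent {intent!r}")


def option_via_arrows(n: int) -> KeySeq:
    """Alternative sequence: move the highlight down, then confirm with Enter."""
    return [("Down", False)] * (n - 1) + [("Enter", False)]
```
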
## Config
Everything tunable lives in [`config.toml`](config.toml): wake phrases, mode + PTT
key, Whisper model/language/device, audio segmentation thresholds, and
`type_autosend = false`. The default model is `small`; bump to `medium` if the coined
wake word is recognized poorly. `claudedo -c <path> ...` points at a specific config;
otherwise it searches `$CLAUDEDO_CONFIG`, `~/.config/claudedo/config.toml`, then
`./config.toml`.
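
The search order can be sketched as follows. The function name `find_config` is hypothetical, and two behaviors here are assumptions rather than documented facts: an explicit `-c` path is returned without checking the other locations, and a `$CLAUDEDO_CONFIG` that points at a missing file falls through to the next candidate.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Union

PathLike = Union[str, Path]


def find_config(explicit: Optional[PathLike] = None,
                env: Optional[Mapping[str, str]] = None,
                home: Optional[PathLike] = None,
                cwd: Optional[PathLike] = None) -> Optional[Path]:
    """Return the first config file found, following the documented order."""
    if explicit is not None:
        return Path(explicit)  # -c <path> wins outright (assumed: no fallback)
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    cwd = Path.cwd() if cwd is None else Path(cwd)

    candidates = []
    if env.get("CLAUDEDO_CONFIG"):
        candidates.append(Path(env["CLAUDEDO_CONFIG"]))
    candidates.append(home / ".config" / "claudedo" / "config.toml")
    candidates.append(cwd / "config.toml")

    for path in candidates:
        if path.is_file():
            return path
    return None
```

The `env`, `home`, and `cwd` parameters default to the real environment and exist only to make the lookup easy to test.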
## Requirements
Windows 11 + WSL2 (Ubuntu) with WSLg, Python 3.10+, tmux, the `claude` CLI, and
either bash or zsh (the cc kit supports both).