disqualifier d96dc3898f feat: backspace/space/erase editing commands + colored prefixed console
voice editing: 'space [<n>]' inserts spaces, 'backspace [<n>]' (alias delete)
deletes chars, 'erase' (alias clear/wipe) wipes the current input. the daemon
tracks a per-session uncommitted-input char count so backspace is capped at the
last submit boundary and erase clears exactly back to it; submit/set reset it.
keys.py gains BSpace/space; grammar gains a count parser (digits + number words).

new console.py renders every daemon line as 'HH:MM:SS [prefix] message' with
color: [<session>] for injected lines (green), [SYSTEM] for state, [VOICE] for
recognition/drops (red/dim). bump to 0.1.2.

Signed-off-by: disqualifier <dev@disqualifier.me>
2026-06-26 01:17:22 -04:00


# claudedo
Voice control for [Claude Code](https://claude.com/claude-code) on **WSL2**.

`claudedo` listens on your mic, runs **local** speech-to-text, recognizes a wake
phrase plus a small command grammar, and injects the matching keystrokes into your
active Claude Code tmux session via `tmux send-keys`. You answer Claude Code's
prompts ("yes", "option one", "approve") and dictate prompts **by voice** — including
hands-free while another window (a game) is focused.

It exists because Claude Code's native `/voice` is blocked by a hardcoded check in WSL
(it assumes WSL has no audio). Modern WSL2 + WSLg *does* have working mic input via
PulseAudio/RDP. `claudedo` captures the mic itself, transcribes on-device, and drives
Claude Code over tmux: fully local, private, and usable while other windows have focus.
## How it works
```
mic (WSLg/PulseAudio RDPSource)
-> sounddevice capture
-> faster-whisper (local STT, on-device)
-> wake gate: utterance must start with a wake phrase, else DISCARD locally
-> grammar match (yes/no/one..four/approve/deny/send/type/space/backspace/erase/mode/set/target/unset/list/cancel)
-> resolve target session (~/.claude-active)
-> tmux send-keys -t <session> "<keys>"
```
**Privacy by construction.** STT runs on-device. In listen mode, any speech that
doesn't start with a wake phrase is dropped the instant it's transcribed — never
stored, never sent anywhere. That's what makes always-listening acceptable while
you're on voice comms in a game.
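
A minimal sketch of such a wake gate, assuming a `difflib` similarity score stands in for claudedo's fuzzy matcher. The function name `wake_gate`, the 0.75 threshold, and the wake list are illustrative, not claudedo's code. It compares the first one to three words, with spaces removed, against each wake phrase, keeps the best match, and returns the remaining command, or `None` when the utterance should be dropped. For example, `"Claude do, yes."` yields `"yes"`, while `"so anyway, claude do yes"` is discarded because it doesn't *start* with a wake phrase:

```python
import difflib
import re
from typing import Optional

# Hypothetical default wake list; claudedo's real list lives in config.toml.
WAKE_PHRASES = ("claudedo", "claude do", "claude due", "hey claude")


def wake_gate(transcript: str, wake_phrases=WAKE_PHRASES,
              threshold: float = 0.75) -> Optional[str]:
    """Return the command after a fuzzy-matched wake phrase, or None to drop it."""
    words = re.findall(r"[a-z0-9']+", transcript.lower())
    # Compare with spaces removed so "claude do" and "claudedo" line up.
    targets = [p.replace(" ", "") for p in wake_phrases]
    best_score, best_k = 0.0, 0
    # Try 1-3 leading words as the wake phrase, always leaving a command behind.
    for k in range(1, min(3, len(words) - 1) + 1):
        head = "".join(words[:k])
        for target in targets:
            score = difflib.SequenceMatcher(None, head, target).ratio()
            if score > best_score:
                best_score, best_k = score, k
    if best_score < threshold:
        return None  # not addressed to claudedo: discard, never store
    return " ".join(words[best_k:])
```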

**Injection is PTY-only.** `claudedo` only ever calls `tmux send-keys`. It never uses
OS-level keyboard input and installs no system-wide keyboard hook. Keystrokes are
written as text into a Linux pseudo-terminal, so they work regardless of which window is focused
and never touch Windows input or a game/anticheat's view.
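
In practice every injection is one `tmux send-keys` invocation run without a shell. A minimal sketch, where `send_keys_argv` and `inject` are hypothetical helpers (not claudedo's API) and `-l` (send literally, no key-name lookup) is used only for dictated text:

```python
import subprocess
from typing import List


def send_keys_argv(session: str, keys: List[str], literal: bool = False) -> List[str]:
    """Build a `tmux send-keys` argument list for one injection."""
    argv = ["tmux", "send-keys", "-t", session]
    if literal:
        argv.append("-l")  # treat the keys as literal text, not key names
    return argv + keys


def inject(session: str, keys: List[str], literal: bool = False) -> None:
    """Run send-keys directly (no shell), so dictated text is never interpreted."""
    subprocess.run(send_keys_argv(session, keys, literal), check=True)
```

Passing an argument list rather than a shell string means dictated text such as `don't; rm *` reaches the pane verbatim and is never seen by a shell.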
## Install
```bash
git clone <repo> claudedo && cd claudedo
./install.sh
```
`install.sh` is idempotent. It installs the WSL audio deps, writes the `~/.asoundrc`
Pulse shim, verifies the mic path, pip-installs the package, primes the Whisper
model, and installs the **cc kit** (`~/.config/claudedo/cc.sh`, sourced from every
`~/.zshrc`/`~/.bashrc` you have). It also checks the two Windows-side bits it can't
automate and tells you to fix them:
- **WSLg present** (`/mnt/wslg/PulseServer`). If missing: `wsl --update` in Windows,
then `wsl --shutdown`, then re-run.
- **Mic permission**: Windows Settings → Privacy & security → Microphone → enable
*"Let desktop apps access your microphone"*. Required.

Verify the riskiest piece (mic capture) first:
```bash
claudedo test-audio
```
## Usage
**Run it in a terminal you watch; that's the product.** You launch `claudedo start`,
it does a quick mic check, then drops into a visible listen loop that prints
`heard → matched → sent` for every utterance. That terminal is your
recognition/action console; you attach to the `claude-<name>` session in another pane
to watch the keystrokes land. There is no detached background mode (the "daemon" this
README mentions is that foreground process); the whole point is the console you read.
```bash
claudedo start # mic-check, then the visible listen loop (listen mode default)
claudedo start --mode ptt # push-to-talk instead (desk-only — see Modes)
claudedo start --skip-audio-check # skip the pre-listen mic check
claudedo status # running? mode? target session?
claudedo stop # stop a running daemon
claudedo set <name> # set the sticky target -> claude-<name> (alias: switch)
claudedo unset # clear the sticky target
claudedo list # list running claude-* sessions
claudedo test-audio # verify the mic capture path
```
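
As of 0.1.2 each console line is rendered as `HH:MM:SS [prefix] message`, with `[<session>]` for injected lines (green), `[SYSTEM]` for state, and `[VOICE]` for recognition and drops (red/dim). A minimal sketch of that format, assuming standard ANSI escape codes and that only the bracketed prefix is colored; `format_line` is a hypothetical name, not the `console.py` API:

```python
from datetime import datetime
from typing import Optional

# Standard ANSI SGR codes; which part of the line claudedo colors is assumed.
ANSI = {"green": "\033[32m", "red": "\033[31m", "dim": "\033[2m"}
RESET = "\033[0m"


def format_line(prefix: str, message: str, color: Optional[str] = None,
                now: Optional[datetime] = None) -> str:
    """Render one console line as 'HH:MM:SS [prefix] message'."""
    stamp = (now or datetime.now()).strftime("%H:%M:%S")
    tag = f"[{prefix}]"
    if color:
        tag = f"{ANSI[color]}{tag}{RESET}"  # color only the bracketed prefix
    return f"{stamp} {tag} {message}"
```

For example, `format_line("SYSTEM", "mode: listen")` yields something like `01:17:22 [SYSTEM] mode: listen`.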
### Modes
- **listen (default)** — continuous capture; only acts on utterances that **start
with a wake phrase**; all other speech is transcribed locally and discarded
instantly. This is the hands-free path and works while a game is focused, because
the trigger is your voice over the mic bridge — not a keyboard hook.
- **ptt** — push-to-talk. **Desk-only:** it captures only while the daemon's own
terminal window is focused. There is deliberately **no global hotkey** — a
system-wide keyboard hook is the keylogger/cheat silhouette anticheats watch for,
and `claudedo` refuses to install one. For hands-free-while-gaming, use listen
mode. (Terminals don't deliver key-up events, so PTT is press-to-start /
press-to-stop in the daemon window, not literal hold.)

Switch at runtime by voice: "claudedo mode listen" / "claudedo mode ptt".
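
Because terminals report key presses but not releases, PTT has to be a toggle. A minimal sketch of that state machine, assuming audio arrives as discrete byte frames (the `PushToTalk` class and its methods are hypothetical, not claudedo's implementation):

```python
from typing import List, Optional


class PushToTalk:
    """Hypothetical press-to-start / press-to-stop capture toggle."""

    def __init__(self) -> None:
        self.capturing = False
        self.frames: List[bytes] = []

    def on_key(self) -> Optional[List[bytes]]:
        """Handle one key press: start a take, or end it and return its frames."""
        self.capturing = not self.capturing
        if self.capturing:
            self.frames = []  # a new take starts empty
            return None
        take, self.frames = self.frames, []
        return take  # the finished utterance goes on to STT

    def on_audio(self, frame: bytes) -> None:
        """Called for every captured frame; kept only while capturing."""
        if self.capturing:
            self.frames.append(frame)
```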
## Command grammar
Wake phrases (listen mode), fuzzy-matched: **"claudedo"**, **"hey claude"**.
"claudedo" is a coined word, so the matcher is lenient (accepts "claude do",
"clauddo", "cloud do", …). In PTT mode the wake phrase is optional.

| Say | Does |
|---|---|
| `yes` / `no` | answer a yes/no prompt |
| `one` / `two` / `three` / `four` | pick numbered option 1–4 |
| `approve` / `deny` | allow / deny a permission prompt |
| `send` / `enter` | submit (Enter) |
| `type <phrase>` | insert literal text, **no** submit (read-before-send; say "send") |
| `space [<n>]` | insert n spaces (default 1) |
| `backspace [<n>]` (alias `delete`) | delete n chars (default 1), capped at the last submit boundary |
| `erase` (alias `clear`/`wipe`) | delete everything typed since the last submit/boundary |
| `mode ptt` / `mode listen` | switch input mode |
| `set <name>` (alias `sticky`/`switch`) | set the **sticky** target → `claude-<name>` (persists) |
| `target <name> <command>` | **one-shot** override: run that command on `claude-<name>` for this utterance only; sticky default unchanged |
| `unset` (alias `unsticky`) | clear the sticky target |
| `list` | print running `claude-*` sessions to the daemon console |
| `cancel` / `escape` | back out of a prompt |
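
The backspace cap and `erase` rely on the daemon counting, per session, the characters typed since the last submit boundary; submitting or running `set` resets that count. A minimal sketch of that bookkeeping, assuming deletions are sent with tmux's `BSpace` key name (the `InputTracker` class is hypothetical). For example, after typing `hello`, `backspace 10` sends only five `BSpace` keys, and a second `backspace` sends none:

```python
from collections import defaultdict
from typing import Dict, List


class InputTracker:
    """Hypothetical per-session count of chars typed since the last submit."""

    def __init__(self) -> None:
        self.pending: Dict[str, int] = defaultdict(int)

    def typed(self, session: str, text: str) -> None:
        """Record text injected by `type <phrase>` or `space [<n>]`."""
        self.pending[session] += len(text)

    def backspace(self, session: str, n: int = 1) -> List[str]:
        """Return BSpace keys, never deleting past the last submit boundary."""
        count = min(n, self.pending[session])
        self.pending[session] -= count
        return ["BSpace"] * count

    def erase(self, session: str) -> List[str]:
        """Delete exactly back to the boundary."""
        return self.backspace(session, self.pending[session])

    def reset(self, session: str) -> None:
        """Called on submit (Enter) and on `set`: the boundary moves here."""
        self.pending[session] = 0
```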

Optional filler (`select` / `use` / `choose`) may precede any command and is ignored:
`select yes` and `use yes` behave like `yes`, and `select 1` still picks option 1.

When no sticky target is set and exactly one `claude-*` session is running, a bare
command targets it only if `auto_target = true` (see Config); with the default
`false`, or with several sessions running, it does nothing and asks you to `set` one.

Number words are normalized to digits before matching ("one"/"won" → 1).
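
A minimal sketch of the filler stripping, number-word normalization, and `space`/`backspace` count parsing described above. The word tables, the choice to leave `type` payloads untouched (so `type I won` stays literal), and the function names are assumptions for illustration:

```python
from typing import List, Optional, Tuple

# Hypothetical word tables; claudedo's real lists may differ.
NUMBER_WORDS = {"one": 1, "won": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}
FILLER = {"select", "use", "choose"}


def normalize(command: str) -> List[str]:
    """Lowercase, drop one leading filler word, map number words to digits."""
    words = command.lower().split()
    if len(words) > 1 and words[0] in FILLER:
        words = words[1:]  # "select yes" -> "yes"; "select one" -> "1"
    if words and words[0] == "type":
        return words  # leave dictated text alone
    return [str(NUMBER_WORDS.get(w, w)) for w in words]


def parse_edit(command: str) -> Optional[Tuple[str, int]]:
    """Parse 'space [<n>]' or 'backspace|delete [<n>]' into (verb, count)."""
    words = normalize(command)
    if not words or words[0] not in ("space", "backspace", "delete"):
        return None
    verb = "backspace" if words[0] == "delete" else words[0]
    if len(words) == 1:
        return verb, 1  # the count defaults to 1
    if len(words) == 2 and words[1].isdigit() and int(words[1]) > 0:
        return verb, int(words[1])
    return None  # unparseable count: do nothing rather than guess
```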
## Targeting
`~/.claude-active` holds the **sticky** target session name (e.g.
`claude-rethink-public`). The **cc kit** writes this file when you attach, and
`claudedo set <name>` (alias `sticky`/`switch`) overwrites it; `unset` clears it.

A `target <name>` voice command is a **one-shot** that does NOT touch the sticky
default: it routes a single command, and the next bare command reverts to sticky.

Resolution order (one place, `target.resolve()`): one-shot if present →
sticky if set and the session exists → else the only running `claude-*` session,
if `auto_target` is enabled → else (zero or several) do nothing and say so. It
never guesses, and never injects
into a nonexistent session.
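
A sketch of that resolution order, assuming a one-shot name that doesn't match a running session aborts rather than falling back to the sticky target, and that the single-session fallback is gated by `auto_target` as described under Config. This `resolve` is illustrative, not `target.resolve()` itself:

```python
from typing import Iterable, Optional


def resolve(one_shot: Optional[str], sticky: Optional[str],
            running: Iterable[str], auto_target: bool = False) -> Optional[str]:
    """Pick the tmux session to inject into, or None to do nothing."""
    live = [s for s in running if s.startswith("claude-")]
    if one_shot:
        session = f"claude-{one_shot}"  # same mapping as the name helper
        return session if session in live else None  # assumed: no fallback
    if sticky and sticky in live:  # sticky holds a full session name
        return sticky
    if auto_target and len(live) == 1:
        return live[0]
    return None  # zero or several candidates: say so instead of guessing
```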

Every name maps to `claude-<name>` through one helper (`target.session_name()`), and
the cc kit mirrors it exactly — so `cc libs` (shell) and `set libs` (voice) refer
to the same session `claude-libs`. The name is your **stable, speakable handle**:
because the kit forces an explicit name (no basename guessing), you always know the
exact word to say.
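
A minimal sketch of the name helper and the sticky file, assuming the file holds one session name on one line and that `unset` deletes it. Lowercasing the spoken name and not double-prefixing an already-prefixed one are also assumptions; `read_sticky`, `write_sticky` and `clear_sticky` are hypothetical names:

```python
from pathlib import Path
from typing import Optional

ACTIVE_FILE = Path.home() / ".claude-active"


def session_name(name: str) -> str:
    """Map a speakable handle to its tmux session: 'libs' -> 'claude-libs'."""
    name = name.strip().lower()  # assumed: tolerate Whisper capitalization
    return name if name.startswith("claude-") else f"claude-{name}"


def write_sticky(name: str, path: Path = ACTIVE_FILE) -> str:
    """`set <name>`: persist the sticky target and return the session name."""
    session = session_name(name)
    path.write_text(session + "\n")
    return session


def read_sticky(path: Path = ACTIVE_FILE) -> Optional[str]:
    """Return the sticky session name, or None when none is set."""
    try:
        text = path.read_text().strip()
    except FileNotFoundError:
        return None
    return text or None


def clear_sticky(path: Path = ACTIVE_FILE) -> None:
    """`unset`: remove the sticky target (assumed to delete the file)."""
    path.unlink(missing_ok=True)
```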

The cc kit lives in `~/.config/claudedo/cc.sh` (sourced from your rc; works under
bash and zsh). Every command **requires an explicit name**:
```bash
cc <name> # attach/create claude-<name>; writes ~/.claude-active
ccr <name> # re-attach an existing claude-<name> only
ccl # list claude-* sessions
cck <name> # kill claude-<name>
cckl # kill all claude-* sessions
```
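
On the daemon side, the equivalent of `ccl` can be sketched with tmux's own session listing. This assumes sessions are enumerated via `tmux list-sessions -F '#{session_name}'`, a real tmux invocation but not necessarily what claudedo runs; the function names are hypothetical:

```python
import subprocess
from typing import List


def claude_sessions(output: str) -> List[str]:
    """Filter `list-sessions -F '#{session_name}'` output to claude-* names."""
    return sorted(line.strip() for line in output.splitlines()
                  if line.strip().startswith("claude-"))


def list_claude_sessions() -> List[str]:
    """Ask tmux for session names; no server (or no tmux) means no sessions."""
    try:
        result = subprocess.run(["tmux", "list-sessions", "-F", "#{session_name}"],
                                capture_output=True, text=True)
    except FileNotFoundError:  # tmux not installed
        return []
    if result.returncode != 0:  # e.g. no tmux server running
        return []
    return claude_sessions(result.stdout)
```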
## The confirmed Claude Code keymap
The keystrokes in [`keys.py`](src/claudedo/keys.py) were confirmed **empirically**
against a live `claude` v2.1.191 session (not assumed):
- Numbered prompts (trust prompt, permission prompt): pressing the **bare digit**
selects **and confirms immediately**, with **no trailing Enter**.
- Arrow keys move the highlight without acting; Enter then confirms (modeled as an
alternative sequence).
- Permission prompt is `1. Yes / 2. Yes, and don't ask again / 3. No`; Escape cancels.
- Literal text goes in via `send-keys -l` (no submit); a bare Enter submits.

If Claude Code changes its prompt UI, re-confirm against a live session and update
`keys.py` — it is the single source of truth.
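
As an illustration only (`keys.py` stays the source of truth), the confirmed behaviors above could be expressed as a lookup returning the keys to send plus whether `-l` is needed. `keys_for` and `KEYMAP` are hypothetical names, and commands whose keys aren't listed above, such as `yes` or `approve`, are deliberately left out:

```python
from typing import List, Tuple

# Only sequences confirmed above; tmux key names Enter and Escape.
KEYMAP = {
    "send": ["Enter"],     # a bare Enter submits
    "enter": ["Enter"],
    "cancel": ["Escape"],  # Escape backs out of a prompt
    "escape": ["Escape"],
}


def keys_for(command: str, arg: str = "") -> Tuple[List[str], bool]:
    """Return (keys, literal) for tmux send-keys; literal means pass -l."""
    if command.isdigit():
        return [command], False  # bare digit selects AND confirms: no Enter
    if command == "type":
        return [arg], True  # literal text, no submit
    if command in KEYMAP:
        return KEYMAP[command], False
    raise ValueError(f"no confirmed keys for {command!r}")
```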
## Config
Everything tunable lives in [`config.toml`](config.toml): wake phrases, mode + PTT
key, Whisper model/language/device, audio segmentation thresholds, and `[behavior]`
(`type_autosend`, `filler_words`, `auto_target`, `print_heard`). The default model is
`small`; bump to `medium` if the coined wake word is recognized poorly. `claudedo -c
<path> ...` points at a specific config; otherwise it searches `$CLAUDEDO_CONFIG`,
`~/.config/claudedo/config.toml`, then `./config.toml`.
- **`auto_target`** (default `false`): with no sticky target set and exactly one
`claude-*` session running, `false` makes a bare command do nothing and ask you to
`set` one; `true` auto-targets that single session.
- **`print_heard`** (default `false`, debug): prints non-wake transcripts to the
console so you can see how Whisper renders your wake word. Turn it on to debug
detection, then off. Whisper has no token for "claudedo" — it commonly emits
"claude do" or "claude due", both of which are in the default wake list.
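
The config search order above can be sketched as follows, assuming `-c` always wins and that a `$CLAUDEDO_CONFIG` pointing at a missing file simply falls through to the next candidate (both assumptions; `find_config` is a hypothetical name):

```python
import os
from pathlib import Path
from typing import Optional


def find_config(explicit: Optional[str] = None) -> Optional[Path]:
    """Return the config file to load, following the documented search order."""
    if explicit:  # `claudedo -c <path>`
        return Path(explicit).expanduser()
    candidates = []
    env = os.environ.get("CLAUDEDO_CONFIG")
    if env:
        candidates.append(Path(env).expanduser())
    candidates.append(Path.home() / ".config" / "claudedo" / "config.toml")
    candidates.append(Path("config.toml"))  # relative to the working directory
    for path in candidates:
        if path.is_file():
            return path
    return None  # nothing found: fall back to built-in defaults
```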
## Requirements
Windows 11 + WSL2 (Ubuntu) with WSLg, Python 3.10+, tmux, the `claude` CLI, and
either bash or zsh (the cc kit supports both).