claudedo/README.md
disqualifier 17db65858e feat: terminal-run only — drop systemd/autostart, start does mic-check + visible loop
terminal-run is the product, so remove all backgrounding: delete the
claudedo.service unit and autostart.sh, strip the systemd step and the
autostart source-line from install.sh (rc block now sources cc.sh only).

claudedo start now runs a mic check first (warm-up + brief capture, aborts with
guidance if silent; --skip-audio-check to bypass) then drops into a visible
listen loop printing the recognition/action log: a startup banner, then
heard -> matched -> target / injected per utterance, target/mode state changes,
and (listen mode) non-wake speech dropped WITHOUT the transcript per the privacy
invariant.

Signed-off-by: disqualifier <dev@disqualifier.me>
2026-06-25 19:30:36 -04:00

claudedo

Voice control for Claude Code on WSL2.

claudedo listens on your mic, runs local speech-to-text, recognizes a wake phrase plus a small command grammar, and injects the matching keystrokes into your active Claude Code tmux session via tmux send-keys. You answer Claude Code's prompts ("yes", "option one", "approve") and dictate prompts by voice — including hands-free while another window (a game) is focused.

It exists because Claude Code's native /voice is hard-blocked in WSL (it assumes WSL has no audio). Modern WSL2 + WSLg does have working mic input via PulseAudio/RDP. claudedo captures the mic itself, transcribes on-device, and drives Claude Code over tmux. Everything stays local and private, and it runs in a terminal you watch.

How it works

mic (WSLg/PulseAudio RDPSource)
  -> sounddevice capture
  -> faster-whisper (local STT, on-device)
  -> wake gate: utterance must start with a wake phrase, else DISCARD locally
  -> grammar match (yes/no/one..four/approve/deny/send/type/mode/switch/cancel)
  -> resolve target session (~/.claude-active)
  -> tmux send-keys -t <session> "<keys>"

Privacy by construction. STT runs on-device. In listen mode, any speech that doesn't start with a wake phrase is dropped the instant it's transcribed — never stored, never sent anywhere. That's what makes always-listening acceptable while you're on voice comms in a game.

Injection is PTY-only. claudedo only ever calls tmux send-keys. It never uses OS-level keyboard input and installs no system-wide keyboard hook. Keystrokes are text into a Linux pseudo-terminal — they work regardless of which window is focused and never touch Windows input or a game/anticheat's view.
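
As a rough illustration of the PTY-only injection described above, the sketch below builds the tmux send-keys command line as an argument list and runs it without a shell. The function names send_keys_argv and inject are hypothetical and not taken from the claudedo source; only the tmux send-keys command and its -t and -l flags come from tmux itself.

```python
import subprocess


def send_keys_argv(session, keys, literal=False):
    """Build the argv for `tmux send-keys` (hypothetical helper, not claudedo's API).

    literal=True adds -l so tmux inserts the text as-is instead of
    interpreting it as a key name such as Enter or Escape.
    """
    argv = ["tmux", "send-keys", "-t", session]
    if literal:
        argv.append("-l")
    argv.append(keys)
    return argv


def inject(session, keys, literal=False):
    """Run tmux with an argv list: no shell, so no quoting or expansion issues."""
    subprocess.run(send_keys_argv(session, keys, literal), check=True)
```

Calling inject("claude-libs", "Enter") would press Enter in that session, and inject("claude-libs", "fix the tests", literal=True) would insert the text without submitting it. Nothing here touches OS-level input, which is the property the paragraph above relies on.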

Install

git clone <repo> claudedo && cd claudedo
./install.sh

install.sh is idempotent. It installs the WSL audio deps, writes the ~/.asoundrc Pulse shim, verifies the mic path, pip-installs the package, primes the Whisper model, and installs the cc kit (~/.config/claudedo/cc.sh, sourced from whichever of ~/.zshrc and ~/.bashrc exist). It also checks the two Windows-side bits it can't automate and tells you how to fix them:

  • WSLg present (/mnt/wslg/PulseServer). If missing: wsl --update in Windows, then wsl --shutdown, then re-run.
  • Mic permission: Windows Settings → Privacy & security → Microphone → enable "Let desktop apps access your microphone". Required.

Verify the riskiest piece (mic capture) first:

claudedo test-audio

Usage

Run it in a terminal you watch — that's the product. You launch claudedo start, it does a quick mic check, then drops into a visible listen loop that prints heard → matched → sent for every utterance. That terminal is your recognition/action console; you attach to the claude-<name> session in another pane to watch the keystrokes land. There is no backgrounding/daemon mode — the whole point is the console you read.

claudedo start            # mic-check, then the visible listen loop (listen mode default)
claudedo start --mode ptt # push-to-talk instead (desk-only — see Modes)
claudedo start --skip-audio-check  # skip the pre-listen mic check
claudedo status           # running? mode? target session?
claudedo stop             # stop a running claudedo instance
claudedo switch <name>    # retarget to claude-<name>
claudedo test-audio       # verify the mic capture path

Modes

  • listen (default): continuous capture. It only acts on utterances that start with a wake phrase; all other speech is transcribed locally and discarded instantly. This is the hands-free path and works while a game is focused, because the trigger is your voice over the mic bridge, not a keyboard hook.
  • ptt: push-to-talk. Desk-only: it captures only while claudedo's own terminal window is focused. There is deliberately no global hotkey. A system-wide keyboard hook is the keylogger/cheat silhouette anticheats watch for, and claudedo refuses to install one. For hands-free-while-gaming, use listen mode. (Terminals don't deliver key-up events, so PTT is press-to-start / press-to-stop in the claudedo terminal, not a literal hold.)

Switch at runtime by voice: "claudedo mode listen" / "claudedo mode ptt".

Command grammar

Wake phrases (listen mode), fuzzy-matched: "claudedo", "hey claude". "claudedo" is a coined word, so the matcher is lenient (accepts "claude do", "clauddo", "cloud do", …). In PTT mode the wake phrase is optional.
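
One way to picture the lenient wake gate is the sketch below, which uses the standard library's difflib to compare the first few words of a transcript against each wake phrase. It is an assumption about the approach, not claudedo's actual matcher: the name strip_wake, the 0.75 threshold, and the three-word lookahead are all illustrative.

```python
import re
from difflib import SequenceMatcher

WAKE_PHRASES = ("claudedo", "hey claude")  # the two phrases listed above
THRESHOLD = 0.75  # assumed similarity cutoff, not a documented value


def strip_wake(transcript, phrases=WAKE_PHRASES, threshold=THRESHOLD):
    """Return the text after a fuzzy wake phrase, or None if there is none.

    The first 1..3 words are joined without spaces and compared to each
    phrase, so "claude do" and "clauddo" both land close to "claudedo".
    """
    words = re.findall(r"[a-z0-9']+", transcript.lower())
    best_score, best_n = 0.0, 0
    for phrase in phrases:
        target = phrase.replace(" ", "")
        for n in range(1, min(3, len(words)) + 1):
            score = SequenceMatcher(None, "".join(words[:n]), target).ratio()
            if score > best_score:  # keep the best-scoring prefix length
                best_score, best_n = score, n
    if best_score < threshold:
        return None  # caller drops the utterance; the transcript is never logged
    return " ".join(words[best_n:])
```

Picking the best-scoring prefix rather than the first acceptable one matters here: "claude" alone is already fairly close to "claudedo", so a first-match rule would leave a stray "do" in front of the command.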

| Say | Does |
| --- | --- |
| yes / no | answer a yes/no prompt |
| one / two / three / four | pick numbered option 1–4 |
| approve / deny | allow / deny a permission prompt |
| send / enter | submit (Enter) |
| type <phrase> | insert literal text, no submit (read-before-send; say "send") |
| mode ptt / mode listen | switch input mode |
| switch <name> / target <name> | retarget to claude-<name> |
| cancel / escape | back out of a prompt |

Number words are normalized to digits before matching ("one"/"won" → 1).
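
The grammar above is small enough to sketch as a lookup plus a few prefix rules. The following is a minimal illustration under stated assumptions, not the project's parser: parse_command and the action names are hypothetical, the optional leading "option" (as in "option one") is accepted as a guess, and "won" is the only homophone taken from the README.

```python
# "one"/"won" -> 1 comes from the README; the rest are the obvious number words.
NUMBER_WORDS = {"one": "1", "won": "1", "two": "2", "three": "3", "four": "4"}
SIMPLE = {"yes": "yes", "no": "no", "approve": "approve", "deny": "deny",
          "send": "send", "enter": "send", "cancel": "cancel", "escape": "cancel"}
PREFIXED = {"type": "type", "mode": "mode", "switch": "switch", "target": "switch"}
MODES = {"ptt", "listen"}


def parse_command(command):
    """Map the text after the wake phrase to (action, argument), or None."""
    words = command.strip().split()
    if not words:
        return None
    head, rest = words[0].lower(), " ".join(words[1:])
    if head in PREFIXED:
        action = PREFIXED[head]
        if not rest:
            return None  # these commands need an argument
        if action == "type":
            return (action, rest)  # literal text is passed through untouched
        arg = rest.lower()
        if action == "mode" and arg not in MODES:
            return None
        return (action, arg)
    if head == "option" and len(words) == 2:  # assumed: "option one" == "one"
        head, rest = words[1].lower(), ""
    if rest:
        return None  # simple commands take no argument
    digit = NUMBER_WORDS.get(head, head)
    if digit in {"1", "2", "3", "4"}:
        return ("option", digit)
    if head in SIMPLE:
        return (SIMPLE[head], None)
    return None
```

Returning None for anything outside the grammar keeps the failure mode quiet: an unrecognized utterance results in no keystrokes rather than a best guess.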

Targeting

~/.claude-active holds the target session name (e.g. claude-rethink-public). The cc kit writes this file when you attach, so the target is "the project you most recently attached to". claudedo switch <name> / target <name> overwrites it. If the file is missing or the session no longer exists, claudedo injects nothing and logs a warning (it never guesses a target).
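
The "never guess a target" rule can be sketched as a resolver that returns a name only when the file is readable and tmux confirms the session. This is an illustration, not claudedo's code: resolve_target and tmux_has_session are hypothetical names, and the existence check assumes tmux has-session, which exits 0 when the session is found.

```python
import subprocess
from pathlib import Path

ACTIVE_FILE = Path.home() / ".claude-active"


def tmux_has_session(name):
    """True if tmux reports the session; False if it is absent or tmux is missing."""
    try:
        result = subprocess.run(["tmux", "has-session", "-t", name],
                                capture_output=True)
    except FileNotFoundError:
        return False
    return result.returncode == 0


def resolve_target(active_file=ACTIVE_FILE, exists=tmux_has_session):
    """Return the target session name, or None when nothing should be injected."""
    try:
        name = Path(active_file).read_text().strip()
    except OSError:
        return None  # missing or unreadable file: inject nothing
    if not name or not exists(name):
        return None  # empty file or stale session: inject nothing
    return name
```

A caller would log a warning and skip injection whenever resolve_target() returns None. The exists parameter is there so the logic can be exercised without a running tmux server.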

Every name maps to claude-<name> through one helper (target.session_name()), and the cc kit mirrors it exactly — so cc libs (shell) and target libs (voice) refer to the same session claude-libs. The name is your stable, speakable handle: because the kit forces an explicit name (no basename guessing), you always know the exact word to say.
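
The README names target.session_name() but does not show its body, so the following is only a guess at what such a helper might look like. The slug rule (lowercase, runs of non-alphanumerics collapsed to a hyphen) is an assumption, chosen so that a spoken "rethink public" would reach claude-rethink-public.

```python
import re

PREFIX = "claude-"


def session_name(name):
    """Map a spoken or typed project name to its tmux session name (sketch)."""
    # Assumed normalization: lowercase, collapse anything non-alphanumeric to "-".
    slug = re.sub(r"[^a-z0-9]+", "-", name.strip().lower()).strip("-")
    if not slug:
        raise ValueError("a session name is required")
    return PREFIX + slug
```

Routing both the shell kit and the voice path through a single function like this is what keeps cc libs and "target libs" pointing at the same session.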

The cc kit lives in ~/.config/claudedo/cc.sh (sourced from your rc; works under bash and zsh). Every command requires an explicit name:

cc <name>    # attach/create claude-<name>; writes ~/.claude-active
ccr <name>   # re-attach an existing claude-<name> only
ccl          # list claude-* sessions
cck <name>   # kill claude-<name>
cckl         # kill all claude-* sessions

The confirmed Claude Code keymap

The keystrokes in keys.py were confirmed empirically against a live claude v2.1.191 session (not assumed):

  • Numbered prompts (trust prompt, permission prompt): pressing the bare digit selects and confirms immediately, with no trailing Enter.
  • Arrow keys move the highlight without acting; Enter then confirms (modeled as an alternative sequence).
  • Permission prompt is 1. Yes / 2. Yes, and don't ask again / 3. No; Escape cancels.
  • Literal text goes in via send-keys -l (no submit); a bare Enter submits.

If Claude Code changes its prompt UI, re-confirm against a live session and update keys.py — it is the single source of truth.
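
To make the keymap concrete, here is a hedged sketch of how parsed actions might translate into key steps. It is not the content of keys.py: keys_for and the ("key" | "literal", value) step format are hypothetical, and mapping approve to 1 and deny to 3 is an inference from the permission prompt layout listed above. The README does not state which keys "yes" and "no" send, so the sketch leaves them unmapped.

```python
# Fixed actions; Enter and Escape are tmux key names.
FIXED_KEYS = {
    "approve": "1",   # inferred: "1. Yes" on the permission prompt
    "deny": "3",      # inferred: "3. No" on the permission prompt
    "send": "Enter",
    "cancel": "Escape",
}


def keys_for(action, arg=None, type_autosend=False):
    """Return a list of ("key" | "literal", value) steps for one action."""
    if action == "option":
        return [("key", arg)]  # bare digit selects and confirms, no Enter
    if action == "type":
        steps = [("literal", arg)]  # would be sent with send-keys -l
        if type_autosend:
            steps.append(("key", "Enter"))
        return steps
    if action in FIXED_KEYS:
        return [("key", FIXED_KEYS[action])]
    return []  # unmapped action: send nothing
```

With type_autosend left at False, dictated text is inserted but not submitted, which matches the read-before-send behavior described in the command grammar.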

Config

Everything tunable lives in config.toml: wake phrases, mode + PTT key, Whisper model/language/device, audio segmentation thresholds, and type_autosend = false. The default model is small; bump to medium if the coined wake word is recognized poorly. claudedo -c <path> ... points at a specific config; otherwise it searches $CLAUDEDO_CONFIG, ~/.config/claudedo/config.toml, then ./config.toml.
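
The search order above could be implemented roughly as follows. This is a sketch with assumptions: the function names are hypothetical, and it assumes a missing file in one location falls through to the next, which the README does not spell out. It only locates the file and leaves TOML parsing aside.

```python
import os
from pathlib import Path


def candidate_paths(explicit=None, env=None, home=None, cwd=None):
    """List config locations in priority order; -c <path> short-circuits the search."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    cwd = Path.cwd() if cwd is None else Path(cwd)
    if explicit:
        return [Path(explicit)]
    paths = []
    if env.get("CLAUDEDO_CONFIG"):
        paths.append(Path(env["CLAUDEDO_CONFIG"]))
    paths.append(home / ".config" / "claudedo" / "config.toml")
    paths.append(cwd / "config.toml")
    return paths


def find_config(explicit=None, env=None, home=None, cwd=None):
    """Return the first candidate that exists as a file, else None."""
    for path in candidate_paths(explicit, env, home, cwd):
        if path.is_file():
            return path
    return None
```

The env, home, and cwd parameters exist only so the lookup can be tested against temporary directories; a real caller would pass just the -c value, if any.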

Requirements

Windows 11 + WSL2 (Ubuntu) with WSLg, Python 3.10+, tmux, the claude CLI, and either bash or zsh (the cc kit supports both).