Hands-free voice control for Claude Code. Wake-word activated, local speech-to-text, injects into your tmux sessions.


disqualifier 5f05a01423 feat: v0.1.4 — HELP menu, 15s cap, wake 0.65, small.en default + docs sync commands menu now prints under a single [HELP] header with bare indented rows (brightblue usage) instead of 15 repeated [SYSTEM] tags. raise [vad].max_seconds 10 -> 15 for long dictation. wake_fuzzy_threshold 0.6 -> 0.65 (slightly fewer false wakes; note short spellings 'ok/okay claude' still admit some). carries the prior small.en default, [vad].silence_ms 700, lighter (brightblue) command color, lean injection lines, .en model variants in the validator. README/CLAUDE.md synced. Signed-off-by: disqualifier <dev@disqualifier.me>		2026-06-26 03:52:19 -04:00
shell	feat: terminal-run only — drop systemd/autostart, start does mic-check + visible loop	2026-06-25 19:30:36 -04:00
src/claudedo	feat: v0.1.4 — HELP menu, 15s cap, wake 0.65, small.en default + docs sync	2026-06-26 03:52:19 -04:00
.gitignore	fix: install.sh installs config.toml to ~/.config/claudedo	2026-06-26 01:53:25 -04:00
config.toml	feat: v0.1.4 — HELP menu, 15s cap, wake 0.65, small.en default + docs sync	2026-06-26 03:52:19 -04:00
install.sh	fix: install.sh installs config.toml to ~/.config/claudedo	2026-06-26 01:53:25 -04:00
pyproject.toml	feat: v0.1.4 — HELP menu, 15s cap, wake 0.65, small.en default + docs sync	2026-06-26 03:52:19 -04:00
README.md	feat: v0.1.4 — HELP menu, 15s cap, wake 0.65, small.en default + docs sync	2026-06-26 03:52:19 -04:00


claudedo

Voice control for Claude Code on WSL2.

claudedo listens on your mic, runs local speech-to-text, recognizes a wake phrase plus a small command grammar, and injects the matching keystrokes into your active Claude Code tmux session via tmux send-keys. You answer Claude Code's prompts ("yes", "option one", "approve") and dictate prompts by voice — including hands-free while another window (a game) is focused.

It exists because Claude Code's native /voice is hard-blocked in WSL (it assumes WSL has no audio). Modern WSL2 + WSLg does have working mic input via PulseAudio/RDP. claudedo captures the mic itself, transcribes on-device, and drives Claude Code over tmux, fully local and private. You run it in a terminal you watch.

How it works

mic (WSLg/PulseAudio RDPSource)
  -> sounddevice capture
  -> faster-whisper (local STT, on-device)
  -> wake gate: utterance must start with a wake phrase, else DISCARD locally
  -> grammar match (yes/no/one..four/approve/deny/send/type/space/backspace/erase/
                    mode/set/target/unset/list/cancel)
  -> resolve target session (one-shot > sticky ~/.claude-active > auto/none)
  -> tmux send-keys -t <session> "<keys>"
  -> log the action to the watched terminal ([session]/[SYSTEM]/[VOICE], colored)

Privacy by construction. STT runs on-device. In listen mode, any speech that doesn't start with a wake phrase is dropped the instant it's transcribed — never stored, never sent anywhere. That's what makes always-listening acceptable while you're on voice comms in a game.

Injection is PTY-only. claudedo only ever calls tmux send-keys. It never uses OS-level keyboard input and installs no system-wide keyboard hook. Keystrokes are text into a Linux pseudo-terminal — they work regardless of which window is focused and never touch Windows input or a game/anticheat's view.
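A minimal sketch of that injection step, assuming claudedo shells out to the tmux binary with an argument list rather than through a shell. The helper names literal_argv, keys_argv, and inject are hypothetical; only the send-keys flags (-t for the target, -l for literal text) come from tmux itself.

```python
import subprocess
from typing import Sequence


def literal_argv(session: str, text: str) -> list[str]:
    """argv that types `text` into `session` verbatim, without submitting."""
    # -l tells tmux to send the string as literal characters, so a dictated
    # word like "Enter" is typed as five letters instead of pressing Enter.
    return ["tmux", "send-keys", "-t", session, "-l", text]


def keys_argv(session: str, keys: Sequence[str]) -> list[str]:
    """argv that presses named tmux keys (e.g. "Enter", "Escape", "1")."""
    return ["tmux", "send-keys", "-t", session, *keys]


def inject(argv: list[str]) -> None:
    """Run tmux; a list argv (no shell=True) means text is never shell-parsed."""
    subprocess.run(argv, check=True)


# Usage (hypothetical session name):
# inject(literal_argv("claude-libs", "fix the flaky test"))
# inject(keys_argv("claude-libs", ["Enter"]))
```

Building argv as a list keeps dictated text from ever being interpreted by a shell; it only ever lands in the pane as characters.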

Install

git clone <repo> claudedo && cd claudedo
./install.sh

install.sh is idempotent. It installs the WSL audio deps, writes the ~/.asoundrc Pulse shim, verifies the mic path, pip-installs the package, primes the Whisper model, and installs the cc kit (~/.config/claudedo/cc.sh, sourced from every ~/.zshrc/~/.bashrc you have). It also checks the two Windows-side bits it can't automate and tells you to fix them:

WSLg present (/mnt/wslg/PulseServer). If missing: wsl --update in Windows, then wsl --shutdown, then re-run.
Mic permission: Windows Settings → Privacy & security → Microphone → enable "Let desktop apps access your microphone". Required.

Verify the riskiest piece (mic capture) first:

claudedo test-audio
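How claudedo test-audio checks the path isn't spelled out here. As a rough stand-in, assuming the sounddevice package from the pipeline and that the default input device is the WSLg source, a level readout like this sketch tells digital silence apart from a live mic. rms_dbfs and record_level are hypothetical names.

```python
import math
from typing import Sequence


def rms_dbfs(samples: Sequence[float]) -> float:
    """RMS level of float samples in [-1.0, 1.0], in dBFS (0 = full scale)."""
    if len(samples) == 0:
        return float("-inf")
    mean_square = sum(float(s) * float(s) for s in samples) / len(samples)
    rms = math.sqrt(mean_square)
    return 20.0 * math.log10(rms) if rms > 0 else float("-inf")


def record_level(seconds: float = 2.0, samplerate: int = 16_000) -> float:
    """Record a short mono clip from the default input and return its level."""
    import sounddevice as sd  # imported here so rms_dbfs works without audio deps

    frames = sd.rec(int(seconds * samplerate), samplerate=samplerate,
                    channels=1, dtype="float32")
    sd.wait()  # block until the recording finishes
    return rms_dbfs(frames[:, 0])  # first (only) channel
```

A level stuck at -inf while you talk usually points at the capture path (for example the Windows mic permission) rather than at speech-to-text.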

Usage

Run it in a terminal you watch — that's the product. You launch claudedo start and it drops into a visible listen loop (pass --check to run a mic check first). Each utterance prints a timestamped, colored line — HH:MM:SS [claude-libs] heard "…" → typed 'fix' (green for injected, red for drops, [SYSTEM]/[VOICE] for state and recognition). That terminal is your recognition/action console; you attach to the claude-<name> session in another pane to watch the keystrokes land. It runs in the foreground by design — the console is the point — though claudedo stop can signal a stray instance.

claudedo start            # the visible listen loop (listen mode default; no mic check)
claudedo start --check    # run a mic check before listening
claudedo start --mode ptt # push-to-talk instead (desk-only — see Modes)
claudedo status           # running? mode? target session?
claudedo stop             # stop a running daemon
claudedo set <name>       # set the sticky target -> claude-<name> (alias: switch)
claudedo unset            # clear the sticky target
claudedo list             # list running claude-* sessions
claudedo test-audio       # verify the mic capture path

Modes

listen (default) — continuous capture; only acts on utterances that start with a wake phrase; all other speech is transcribed locally and discarded instantly. This is the hands-free path and works while a game is focused, because the trigger is your voice over the mic bridge — not a keyboard hook.
ptt — push-to-talk. Desk-only: it captures only while the daemon's own terminal window is focused. There is deliberately no global hotkey — a system-wide keyboard hook is the keylogger/cheat silhouette anticheats watch for, and claudedo refuses to install one. For hands-free-while-gaming, use listen mode. (Terminals don't deliver key-up events, so PTT is press-to-start / press-to-stop in the daemon window, not literal hold.)

Switch at runtime by voice: "claudedo mode listen" / "claudedo mode ptt".

Command grammar

Wake phrases (listen mode), fuzzy-matched. The default list is "claudedo", "claude do", "hey claude", "ok claude", "okay claude". Whisper doesn't know the coined word "claudedo" and renders it as real words ("claude do"), so that spelling is listed explicitly. Matching is lenient (case/space-insensitive). Add the spellings you actually see (turn on print_heard to find them). In PTT mode the wake phrase is optional.

Say	Does
`yes` / `no`	answer a yes/no prompt
`one` / `two` / `three` / `four`	pick numbered option 1–4
`approve` / `deny`	allow / deny a permission prompt
`send` / `enter`	submit (Enter)
`type <phrase>`	insert literal text, no submit (read-before-send; say "send")
`space [<n>]` (also `add [a] space`, `insert <n> spaces`)	insert n spaces (default 1)
`backspace [<n>]` (alias `delete`)	delete n chars (default 1), capped at the last submit boundary
`erase` (alias `clear`/`wipe`)	delete everything typed since the last submit/boundary
`debug <text>` (alias `echo`)	just print what you said to the console (test wake/STT; injects nothing)
`mode ptt` / `mode listen`	switch input mode
`set <name>` (alias `sticky`/`switch`)	set the sticky target → `claude-<name>` (persists)
`target <name> <command>`	one-shot override: run that command on `claude-<name>` for this utterance only; sticky default unchanged
`unset` (alias `unsticky`)	clear the sticky target
`list`	list running `claude-*` sessions to the daemon console
`commands` (alias `help`/`menu`)	print the voice-command menu to the console
`customs` (alias `custom`)	custom commands — arriving in v0.2.0 (stub for now)
`cancel` / `escape`	back out of a prompt
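Both the backspace cap and erase need to know what was typed since the last submit. One way to get that, assuming claudedo counts the characters it injected rather than reading the pane back, is a small counter like this hypothetical TypedBuffer. BSpace is tmux's key name for Backspace.

```python
from dataclasses import dataclass


@dataclass
class TypedBuffer:
    """Counts characters claudedo itself typed since the last submit (hypothetical)."""

    pending: int = 0

    def typed(self, text: str) -> None:
        self.pending += len(text)  # `type <phrase>` and `space [<n>]` add here

    def submitted(self) -> None:
        self.pending = 0  # `send` / `enter` opens a new boundary

    def backspace(self, n: int = 1) -> list[str]:
        """tmux keys for `backspace [<n>]`, never reaching past the boundary."""
        count = max(0, min(n, self.pending))
        self.pending -= count
        return ["BSpace"] * count

    def erase(self) -> list[str]:
        """tmux keys for `erase`: everything typed since the boundary."""
        return self.backspace(self.pending)
```

A counter like this only knows about its own keystrokes; anything typed by hand in the pane is invisible to it, which is what makes the cap conservative.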

Optional filler (select / use / choose) may precede any command and is ignored: select yes and use yes behave like yes. (select 1 is still the select command.)

With no sticky target set, a bare command does nothing and asks you to set one; this is the default. Set auto_target = true to instead auto-use the single running claude-* session when exactly one is running; with several running, it still does nothing and asks you to set one.

Number words are normalized to digits before matching ("one"/"won" → 1).
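A minimal sketch of that normalization using whole-word regex replacement. Only "one"/"won" → 1 comes from this README; the rest of the map and the name normalize_numbers are hypothetical.

```python
import re

# In this sketch, homophones beyond "won" are left out on purpose: mapping
# "to" or "for" would also rewrite ordinary dictated words.
NUMBER_WORDS = {"one": "1", "won": "1", "two": "2", "three": "3", "four": "4"}
_NUMBER_RE = re.compile(r"\b(" + "|".join(NUMBER_WORDS) + r")\b", re.IGNORECASE)


def normalize_numbers(text: str) -> str:
    """Replace whole-word number words with digits: 'option One' -> 'option 1'."""
    return _NUMBER_RE.sub(lambda m: NUMBER_WORDS[m.group(1).lower()], text)
```

The \b word boundaries keep words like "someone" intact.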

Targeting

~/.claude-active holds the sticky target session name (e.g. claude-rethink-public). The cc kit writes this file when you attach, and claudedo set <name> (alias sticky/switch) overwrites it; unset clears it. A target <name> voice command is a one-shot that does NOT touch the sticky default — it routes a single command and the next bare command reverts to sticky.

Resolution order (one place — target.resolve()): one-shot if present → sticky if set and the session exists → else, only if auto_target = true, the single running claude-* session → else (default, or zero/several sessions) do nothing and say so. It never guesses, and never injects into a nonexistent session.

Every name maps to claude-<name> through one helper (target.session_name()), and the cc kit mirrors it exactly — so cc libs (shell) and set libs (voice) refer to the same session claude-libs. The name is your stable, speakable handle: because the kit forces an explicit name (no basename guessing), you always know the exact word to say.

The cc kit lives in ~/.config/claudedo/cc.sh (sourced from your rc; works under bash and zsh). Every command requires an explicit name:

cc <name>    # attach/create claude-<name>; writes ~/.claude-active
ccr <name>   # re-attach an existing claude-<name> only
ccl          # list claude-* sessions
cck <name>   # kill claude-<name>
cckl         # kill all claude-* sessions

The confirmed Claude Code keymap

The keystrokes in keys.py were confirmed empirically against a live claude v2.1.191 session (not assumed):

Numbered prompts (trust prompt, permission prompt): pressing the bare digit selects and confirms immediately — no trailing Enter.
Arrow keys move the highlight without acting; Enter then confirms (modeled as an alternative sequence).
Permission prompt is 1. Yes / 2. Yes, and don't ask again / 3. No; Escape cancels.
Literal text goes in via send-keys -l (no submit); a bare Enter submits.
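keys.py itself isn't reproduced here. As a sketch of how the confirmed facts translate into tmux key-name lists ready for send-keys, with every name hypothetical: "Down", "Enter", and "Escape" are tmux key names, and the arrow sequence assumes the highlight starts on option 1. It deliberately doesn't map approve/deny, since the README doesn't say which option each presses.

```python
# Hypothetical shape for keys.py, encoding only the behavior confirmed above.
SUBMIT = ["Enter"]   # a bare Enter submits typed text
CANCEL = ["Escape"]  # Escape backs out of a prompt


def option_keys(n: int) -> list[str]:
    """Pick numbered option n: the bare digit selects and confirms, no Enter."""
    if not 1 <= n <= 9:
        raise ValueError(f"option must be a single digit 1-9, got {n}")
    return [str(n)]


def option_keys_via_arrows(n: int) -> list[str]:
    """Alternative sequence: move the highlight, then confirm with Enter.

    Assumes the highlight starts on option 1 (not confirmed above).
    """
    if n < 1:
        raise ValueError(f"option must be >= 1, got {n}")
    return ["Down"] * (n - 1) + ["Enter"]


PERMISSION_YES = option_keys(1)         # 1. Yes
PERMISSION_YES_ALWAYS = option_keys(2)  # 2. Yes, and don't ask again
PERMISSION_NO = option_keys(3)          # 3. No
```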

If Claude Code changes its prompt UI, re-confirm against a live session and update keys.py — it is the single source of truth.

Config

Everything tunable lives in config.toml: wake phrases, mode + PTT key, Whisper model/language/device, [vad] endpointing, and [behavior] (type_autosend, fuzzy thresholds, filler_words, auto_target, print_heard).

The default model is small.en, the English-only small model: ~1s/command on a strong CPU, and more accurate on English than multilingual small at the same speed. medium/medium.en are more accurate but ~3× slower (noticeable lag), base.en is snappier but less accurate, and large-v3 is the most accurate and slowest. Every heard line shows the STT latency as (<ms>/<audio>s) so you can see what a model change costs.

VAD endpointing ends a capture after [vad].silence_ms (700) of trailing silence, capped at max_seconds (15).

claudedo -c <path> ... points at a specific config; otherwise it searches $CLAUDEDO_CONFIG, ~/.config/claudedo/config.toml, then ./config.toml.

STT biasing. The transcriber is seeded with an initial_prompt built from the configured wake phrases + command vocabulary (one source — grammar.vocabulary()), so Whisper is conditioned to expect "claudedo" and the command words.
Split fuzzy thresholds. wake_fuzzy_threshold (default 0.65, lenient) vs command_fuzzy_threshold (default 0.8, tight). The asymmetry is deliberate: a false wake is cheap (it wakes, finds no command, does nothing), but a false command fires the wrong action. Prefer expanding command synonyms over loosening the command threshold.
[vad] endpointing. Capture starts on speech and ends after silence_ms (default 700) of trailing silence (Alexa-style record-until-pause), capped at max_seconds (default 15). The pause both ends a command and separates it from following chatter (the chatter is a separate capture the wake gate discards).
auto_target (default false): with no sticky target and exactly one session running, false does nothing and asks you to set a target; true auto-uses that session.
print_heard (default false, debug): prints non-wake transcripts so you can see how Whisper renders your wake word, then tune the wake list/threshold.
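The [vad] item describes a start-on-speech, stop-on-pause state machine. A minimal sketch, assuming some per-frame speech detector supplies the booleans: the README doesn't say which VAD claudedo uses, frame_ms is a made-up value, and Endpointer is a hypothetical name.

```python
from dataclasses import dataclass, field


@dataclass
class Endpointer:
    """Record-until-pause endpointing over per-frame speech/no-speech decisions."""

    frame_ms: int = 30          # hypothetical frame size
    silence_ms: int = 700       # [vad].silence_ms
    max_seconds: float = 15.0   # [vad].max_seconds
    _active: bool = field(default=False, init=False)
    _elapsed_ms: int = field(default=0, init=False)
    _silent_ms: int = field(default=0, init=False)

    def push(self, is_speech: bool) -> str:
        """Feed one frame; returns "idle", "recording", or "end"."""
        if not self._active:
            if not is_speech:
                return "idle"  # waiting for speech to start a capture
            self._active = True
            self._elapsed_ms = self._silent_ms = 0
        self._elapsed_ms += self.frame_ms
        self._silent_ms = 0 if is_speech else self._silent_ms + self.frame_ms
        if self._silent_ms >= self.silence_ms or self._elapsed_ms >= self.max_seconds * 1000:
            self._active = False  # next speech frame starts a fresh capture
            return "end"
        return "recording"
```

With these defaults a command ends 700 ms after you stop talking, and a long dictation is cut at 15 s even without a pause; whatever you say next becomes its own capture for the wake gate to judge.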

Requirements

Windows 11 + WSL2 (Ubuntu) with WSLg, Python 3.10+, tmux, the claude CLI, and either bash or zsh (the cc kit supports both).
