voice editing: 'space [<n>]' inserts spaces, 'backspace [<n>]' (alias delete) deletes chars, 'erase' (alias clear/wipe) wipes the current input. the daemon tracks a per-session uncommitted-input char count so backspace is capped at the last submit boundary and erase clears exactly back to it; submit/set reset it. keys.py gains BSpace/space; grammar gains a count parser (digits + number words). new console.py renders every daemon line as 'HH:MM:SS [prefix] message' with color: [<session>] for injected lines (green), [SYSTEM] for state, [VOICE] for recognition/drops (red/dim). bump to 0.1.2. Signed-off-by: disqualifier <dev@disqualifier.me> |
||
|---|---|---|
| shell | ||
| src/claudedo | ||
| .gitignore | ||
| config.toml | ||
| install.sh | ||
| pyproject.toml | ||
| README.md | ||
claudedo
Voice control for Claude Code on WSL2.
claudedo listens on your mic, runs local speech-to-text, recognizes a wake
phrase plus a small command grammar, and injects the matching keystrokes into your
active Claude Code tmux session via tmux send-keys. You answer Claude Code's
prompts ("yes", "option one", "approve") and dictate prompts by voice — including
hands-free while another window (a game) is focused.
It exists because Claude Code's native /voice is hardcoded-blocked in WSL (it
assumes WSL has no audio). Modern WSL2 + WSLg does have working mic input via
PulseAudio/RDP. claudedo captures the mic itself, transcribes on-device, and drives
Claude Code over tmux — fully local, private, backgroundable.
How it works
mic (WSLg/PulseAudio RDPSource)
-> sounddevice capture
-> faster-whisper (local STT, on-device)
-> wake gate: utterance must start with a wake phrase, else DISCARD locally
-> grammar match (yes/no/one..four/approve/deny/send/type/mode/set/target/cancel)
-> resolve target session (~/.claude-active)
-> tmux send-keys -t <session> "<keys>"
Privacy by construction. STT runs on-device. In listen mode, any speech that doesn't start with a wake phrase is dropped the instant it's transcribed — never stored, never sent anywhere. That's what makes always-listening acceptable while you're on voice comms in a game.
Injection is PTY-only. claudedo only ever calls tmux send-keys. It never uses
OS-level keyboard input and installs no system-wide keyboard hook. Keystrokes are
text into a Linux pseudo-terminal — they work regardless of which window is focused
and never touch Windows input or a game/anticheat's view.
Install
git clone <repo> claudedo && cd claudedo
./install.sh
install.sh is idempotent. It installs the WSL audio deps, writes the ~/.asoundrc
Pulse shim, verifies the mic path, pip-installs the package, primes the Whisper
model, and installs the cc kit (~/.config/claudedo/cc.sh, sourced from every
~/.zshrc/~/.bashrc you have). It also checks the two Windows-side bits it can't
automate and tells you to fix them:
- WSLg present (
/mnt/wslg/PulseServer). If missing:wsl --updatein Windows, thenwsl --shutdown, then re-run. - Mic permission: Windows Settings → Privacy & security → Microphone → enable "Let desktop apps access your microphone". Required.
Verify the riskiest piece (mic capture) first:
claudedo test-audio
Usage
Run it in a terminal you watch — that's the product. You launch claudedo start, it does a quick mic check, then drops into a visible listen loop that prints
heard → matched → sent for every utterance. That terminal is your
recognition/action console; you attach to the claude-<name> session in another pane
to watch the keystrokes land. There is no backgrounding/daemon mode — the whole point
is the console you read.
claudedo start # mic-check, then the visible listen loop (listen mode default)
claudedo start --mode ptt # push-to-talk instead (desk-only — see Modes)
claudedo start --skip-audio-check # skip the pre-listen mic check
claudedo status # running? mode? target session?
claudedo stop # stop a running daemon
claudedo set <name> # set the sticky target -> claude-<name> (alias: switch)
claudedo unset # clear the sticky target
claudedo list # list running claude-* sessions
claudedo test-audio # verify the mic capture path
Modes
- listen (default) — continuous capture; only acts on utterances that start with a wake phrase; all other speech is transcribed locally and discarded instantly. This is the hands-free path and works while a game is focused, because the trigger is your voice over the mic bridge — not a keyboard hook.
- ptt — push-to-talk. Desk-only: it captures only while the daemon's own
terminal window is focused. There is deliberately no global hotkey — a
system-wide keyboard hook is the keylogger/cheat silhouette anticheats watch for,
and
claudedorefuses to install one. For hands-free-while-gaming, use listen mode. (Terminals don't deliver key-up events, so PTT is press-to-start / press-to-stop in the daemon window, not literal hold.)
Switch at runtime by voice: "claudedo mode listen" / "claudedo mode ptt".
Command grammar
Wake phrases (listen mode), fuzzy-matched: "claudedo", "hey claude". "claudedo" is a coined word, so the matcher is lenient (accepts "claude do", "clauddo", "cloud do", …). In PTT mode the wake phrase is optional.
| Say | Does |
|---|---|
yes / no |
answer a yes/no prompt |
one / two / three / four |
pick numbered option 1–4 |
approve / deny |
allow / deny a permission prompt |
send / enter |
submit (Enter) |
type <phrase> |
insert literal text, no submit (read-before-send; say "send") |
space [<n>] |
insert n spaces (default 1) |
backspace [<n>] (alias delete) |
delete n chars (default 1), capped at the last submit boundary |
erase (alias clear/wipe) |
delete everything typed since the last submit/boundary |
mode ptt / mode listen |
switch input mode |
set <name> (alias sticky/switch) |
set the sticky target → claude-<name> (persists) |
target <name> <command> |
one-shot override: run that command on claude-<name> for this utterance only; sticky default unchanged |
unset (alias unsticky) |
clear the sticky target |
list |
list running claude-* sessions to the daemon console |
cancel / escape |
back out of a prompt |
Optional filler (select / use / choose) may precede any command and is ignored:
select yes and use yes behave like yes. (select 1 is still the select command.)
When no sticky target is set, a bare command auto-targets the only running
claude-* session; if several are running it does nothing and asks you to set one.
Number words are normalized to digits before matching ("one"/"won" → 1).
Targeting
~/.claude-active holds the sticky target session name (e.g.
claude-rethink-public). The cc kit writes this file when you attach, and
claudedo set <name> (alias sticky/switch) overwrites it; unset clears it.
A target <name> voice command is a one-shot that does NOT touch the sticky
default — it routes a single command and the next bare command reverts to sticky.
Resolution order (one place — target.resolve()): one-shot if present →
sticky if set and the session exists → else the only running claude-* session →
else (zero or several) do nothing and say so. It never guesses, and never injects
into a nonexistent session.
Every name maps to claude-<name> through one helper (target.session_name()), and
the cc kit mirrors it exactly — so cc libs (shell) and set libs (voice) refer
to the same session claude-libs. The name is your stable, speakable handle:
because the kit forces an explicit name (no basename guessing), you always know the
exact word to say.
The cc kit lives in ~/.config/claudedo/cc.sh (sourced from your rc; works under
bash and zsh). Every command requires an explicit name:
cc <name> # attach/create claude-<name>; writes ~/.claude-active
ccr <name> # re-attach an existing claude-<name> only
ccl # list claude-* sessions
cck <name> # kill claude-<name>
cckl # kill all claude-* sessions
The confirmed Claude Code keymap
The keystrokes in keys.py were confirmed empirically
against a live claude v2.1.191 session (not assumed):
- Numbered prompts (trust prompt, permission prompt): pressing the bare digit selects and confirms immediately — no trailing Enter.
- Arrow keys move the highlight without acting; Enter then confirms (modeled as an alternative sequence).
- Permission prompt is
1. Yes / 2. Yes, and don't ask again / 3. No; Escape cancels. - Literal text goes in via
send-keys -l(no submit); a bare Enter submits.
If Claude Code changes its prompt UI, re-confirm against a live session and update
keys.py — it is the single source of truth.
Config
Everything tunable lives in config.toml: wake phrases, mode + PTT
key, Whisper model/language/device, audio segmentation thresholds, and [behavior]
(type_autosend, filler_words, auto_target, print_heard). The default model is
small; bump to medium if the coined wake word is recognized poorly. claudedo -c <path> ... points at a specific config; otherwise it searches $CLAUDEDO_CONFIG,
~/.config/claudedo/config.toml, then ./config.toml.
auto_target(defaultfalse): with no sticky target set and exactly oneclaude-*session running,falsemakes a bare command do nothing and ask you tosetone;trueauto-targets that single session.print_heard(defaultfalse, debug): prints non-wake transcripts to the console so you can see how Whisper renders your wake word. Turn it on to debug detection, then off. Whisper has no token for "claudedo" — it commonly emits "claude do" or "claude due", both of which are in the default wake list.
Requirements
Windows 11 + WSL2 (Ubuntu) with WSLg, Python 3.10+, tmux, the claude CLI, and
either bash or zsh (the cc kit supports both).