add a detached-only session cleanup in BOTH surfaces — the cc shell kit and the
claudedo daemon — so stale detached claude-* sessions can be cleared from either.
- cc.sh: ccclean kills DETACHED claude-* sessions only (tmux #{session_attached}==0),
never attached; reports 'killed X, Y (2 detached); kept Z (attached)' or 'nothing to
clean'. complements cckl (kill ALL incl attached), which stays the deliberate typed
nuke. header updated; sources clean under bash + zsh.
- target.py: cleanup_detached() kills detached claude-* and returns (killed, kept)
lists. it and list_sessions() now share ONE tmux query, _claude_sessions(), which
returns (name, attached) pairs — single source for session enumeration.
- grammar: cleanup command (aliases detached/detach) routes to Action('system',
'cleanup') — daemon-control, never injects. bare 'cleanup' and 'system cleanup' both
accepted. 'clean'/'wipe' deliberately NOT used as aliases — they fuzzy-collide with
erase's 'clear'/'wipe' (0.8 ratio); 'detached' is distinct. confirm command added for
the opt-in confirm flow.
- daemon: system 'cleanup' -> _do_cleanup -> target.cleanup_detached, reports
'[SYSTEM] cleanup: killed ...; kept ... (attached)'. behavior.cleanup_confirm
(default false) announces and waits for a following 'confirm' before killing.
- CLI: 'claudedo cleanup' (self-contained tmux op, no running daemon needed).
safety model: detached-only means a misheard voice cleanup can NEVER kill the active
(attached) session. the only kill-attached path remains the shell cckl.
Signed-off-by: disqualifier <dev@disqualifier.me>
claudedo
Voice control for Claude Code on WSL2.
claudedo listens on your mic, runs local speech-to-text, recognizes a wake
phrase plus a small command grammar, and injects the matching keystrokes into your
active Claude Code tmux session via tmux send-keys. You answer Claude Code's
prompts ("yes", "option one", "approve") and dictate prompts by voice — including
hands-free while another window (a game) is focused.
It exists because Claude Code's native /voice is hardcoded-blocked in WSL (it
assumes WSL has no audio). Modern WSL2 + WSLg does have working mic input via
PulseAudio/RDP. claudedo captures the mic itself, transcribes on-device, and drives
Claude Code over tmux — fully local and private. You run it in a terminal you watch.
How it works
mic (WSLg/PulseAudio RDPSource)
-> sounddevice capture
-> faster-whisper (local STT, on-device)
-> wake gate: utterance must start with a wake phrase, else DISCARD locally
-> grammar match (yes/no/one..four/approve/deny/send/type/space/backspace/erase/
mode/set/target/unset/list/context/reload/system/cancel)
-> resolve target session (one-shot > sticky ~/.claude-active > auto/none)
-> tmux send-keys -t <session> "<keys>"
-> log the action to the watched terminal ([session]/[SYSTEM]/[VOICE], colored)
Privacy by construction. STT runs on-device. In listen mode, any speech that doesn't start with a wake phrase is dropped the instant it's transcribed — never stored, never sent anywhere. That's what makes always-listening acceptable while you're on voice comms in a game.
Injection is PTY-only. claudedo only ever calls tmux send-keys. It never uses
OS-level keyboard input and installs no system-wide keyboard hook. Keystrokes are
text into a Linux pseudo-terminal — they work regardless of which window is focused
and never touch Windows input or a game/anticheat's view.
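The PTY-only injection path can be sketched as a thin wrapper around tmux send-keys. This is a minimal illustration, not claudedo's actual code: send_keys_argv and inject are hypothetical names. The only tmux features it relies on are send-keys, -t, -l (literal text), and "--" to end option parsing.

```python
import subprocess


def send_keys_argv(session: str, keys: str, literal: bool = False) -> list[str]:
    """Build the argv for a single `tmux send-keys` call (hypothetical helper)."""
    argv = ["tmux", "send-keys", "-t", session]
    if literal:
        # -l sends the string as literal characters instead of key names.
        argv.append("-l")
    # "--" ends option parsing, so dictated text starting with "-" is not read as a flag.
    argv += ["--", keys]
    return argv


def inject(session: str, keys: str, literal: bool = False) -> None:
    """Run the call; raises CalledProcessError if tmux rejects it."""
    subprocess.run(send_keys_argv(session, keys, literal), check=True)
```

Nothing here touches OS-level input: the only side effect is one tmux subprocess writing into the session's pseudo-terminal.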
Install
git clone <repo> claudedo && cd claudedo
./install.sh
install.sh is idempotent. It installs the WSL audio deps, writes the ~/.asoundrc
Pulse shim, verifies the mic path, pip-installs the package, primes the Whisper
model, and installs the cc kit (~/.config/claudedo/cc.sh, sourced from every
~/.zshrc/~/.bashrc you have). It also checks the two Windows-side bits it can't
automate and tells you to fix them:
- WSLg present (/mnt/wslg/PulseServer). If missing: wsl --update in Windows, then wsl --shutdown, then re-run.
- Mic permission: Windows Settings → Privacy & security → Microphone → enable "Let desktop apps access your microphone". Required.
Verify the riskiest piece (mic capture) first:
claudedo test-audio
Usage
Run it in a terminal you watch — that's the product. You launch claudedo start and it drops into a visible listen loop (pass --check to run a mic check
first). Each utterance prints a timestamped, colored line — HH:MM:SS [claude-libs] heard "…" → typed 'fix' (green for injected, red for drops, [SYSTEM]/[VOICE] for state and
recognition). That terminal is your recognition/action console; you attach to the
claude-<name> session in another pane to watch the keystrokes land. It runs in the
foreground by design — the console is the point — though claudedo stop can signal a
stray instance.
claudedo start # the visible listen loop (listen mode default; no mic check)
claudedo start --check # run a mic check before listening
claudedo start --mode ptt # push-to-talk instead (desk-only — see Modes)
claudedo status # running? mode? target session?
claudedo stop # stop a running daemon
claudedo reload # reload config.toml + contexts.toml in a running daemon
claudedo set <name> # set the sticky target -> claude-<name> (alias: switch)
claudedo unset # clear the sticky target
claudedo list # list running claude-* sessions
claudedo cleanup # kill DETACHED claude-* sessions (never attached)
claudedo test-audio # verify the mic capture path
claudedo test-tone # play each earcon (verify the audio-OUT path)
Modes
- listen (default) — continuous capture; only acts on utterances that start with a wake phrase; all other speech is transcribed locally and discarded instantly. This is the hands-free path and works while a game is focused, because the trigger is your voice over the mic bridge — not a keyboard hook.
- ptt — push-to-talk. Desk-only: it captures only while the daemon's own
terminal window is focused. There is deliberately no global hotkey — a
system-wide keyboard hook is the keylogger/cheat silhouette anticheats watch for,
and claudedo refuses to install one. For hands-free-while-gaming, use listen mode. (Terminals don't deliver key-up events, so PTT is press-to-start / press-to-stop in the daemon window, not literal hold.)
Switch at runtime by voice: "claudedo mode listen" / "claudedo mode ptt".
Command grammar
Wake phrases (listen mode), fuzzy-matched. The default list is "claudedo",
"claude do", "hey claude", "ok claude", "okay claude" — Whisper has
no token for the coined word "claudedo" and renders it as real words ("claude do"),
so that spelling is listed explicitly. Matching is lenient (case/space-insensitive).
Add the spellings you actually see (turn on print_heard to find them). In PTT mode
the wake phrase is optional. When a command's wake phrase matched loosely (e.g. you
said "okay clouds"), the heard line notes which phrase it assumed —
heard "okay clouds list" -> LIST (wake: okay claude).
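The lenient wake match described above could look roughly like this. It is a sketch under assumptions: it uses difflib.SequenceMatcher as the similarity measure and compares the first one to three words of the utterance against each wake phrase, which may not be how claudedo's grammar module does it. wake_gate and _squash are hypothetical names; the phrase list and the 0.65 threshold are the documented defaults.

```python
from difflib import SequenceMatcher
from typing import Optional, Sequence, Tuple

WAKE_PHRASES = ["claudedo", "claude do", "hey claude", "ok claude", "okay claude"]


def _squash(text: str) -> str:
    """Lowercase and drop everything but letters/digits (case/space-insensitive)."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def wake_gate(
    utterance: str,
    phrases: Sequence[str] = WAKE_PHRASES,
    threshold: float = 0.65,
) -> Optional[Tuple[str, str]]:
    """Return (matched wake phrase, remaining command text), or None to discard."""
    words = utterance.split()
    best = None  # (score, phrase, rest)
    # Try the first 1..3 words as the candidate wake phrase.
    for n in range(1, min(3, len(words)) + 1):
        head = _squash(" ".join(words[:n]))
        for phrase in phrases:
            score = SequenceMatcher(None, head, _squash(phrase)).ratio()
            if score >= threshold and (best is None or score > best[0]):
                best = (score, phrase, " ".join(words[n:]))
    if best is None:
        return None  # no wake phrase: the transcript is dropped
    return best[1], best[2]
```

Returning the assumed phrase alongside the remainder is what lets the heard line report which wake phrase a loose match was taken for.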
| Say | Does |
|---|---|
| yes / no | answer a yes/no prompt |
| one / two / three / four | pick numbered option 1–4 |
| approve / deny | allow / deny a permission prompt |
| send / enter | submit (Enter) |
| type <phrase> | insert literal text, no submit (read-before-send; say "send") |
| space [<n>] (also add [a] space, insert <n> spaces) | insert n spaces (default 1) |
| backspace [<n>] (alias delete) | delete n chars (default 1), capped at the last submit boundary |
| erase (alias clear/wipe) | delete everything typed since the last submit/boundary |
| debug <text> (alias echo) | just print what you said to the console (test wake/STT; injects nothing) |
| mode ptt / mode listen | switch input mode |
| set <name> (alias sticky/switch) | set the sticky target → claude-<name> (persists) |
| target <name> <command> | one-shot override: run that command on claude-<name> for this utterance only; sticky default unchanged |
| unset (alias unsticky) | clear the sticky target |
| list | list running claude-* sessions to the daemon console |
| context <name> <instruction> (alias prepare) | inject a contexts.toml blurb as a preamble + the dictated instruction, then wait (no submit; say "send") |
| reload | re-read config.toml + contexts.toml live (no daemon restart, model stays loaded) |
| system status | print mode / target / model / context count to the console (daemon-control; never injects) |
| system reload [config\|contexts] | reload one or both config files |
| cleanup (alias detached/detach, also system cleanup) | kill detached claude-* sessions only; never an attached one |
| commands (alias help/menu) | print the voice-command menu to the console |
| customs (alias custom) | list the loaded context names |
| version | print the claudedo version to the console |
| cancel / escape | back out of a prompt |
Optional filler (select / use / choose) may precede any command and is ignored:
select yes and use yes behave like yes. (select 1 is still the select command.)
When no sticky target is set, a bare command does nothing and asks you to set one
(the default). Set auto_target = true to instead auto-use the single running
claude-* session when there's exactly one; with several running it always does
nothing and asks you to set one.
Number words are normalized to digits before matching ("one"/"won" → 1).
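A minimal sketch of that normalization step, assuming a whole-word lookup table. Only the "one"/"won" → 1 mapping is documented above; the other entries and the normalize_numbers name are illustrative assumptions.

```python
import re

# "one"/"won" come from the README; the remaining entries are assumed for illustration.
NUMBER_WORDS = {"one": "1", "won": "1", "two": "2", "three": "3", "four": "4"}
_WORD = re.compile(r"[a-z]+")


def normalize_numbers(text: str) -> str:
    """Lowercase the text and replace whole number words with digits."""
    return _WORD.sub(lambda m: NUMBER_WORDS.get(m.group(0), m.group(0)), text.lower())
```

Matching whole words only keeps "wonder" or "phone" from being rewritten by accident.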
Targeting
~/.claude-active holds the sticky target session name (e.g.
claude-rethink-public). The cc kit writes this file when you attach, and
claudedo set <name> (alias sticky/switch) overwrites it; unset clears it.
A target <name> voice command is a one-shot that does NOT touch the sticky
default — it routes a single command and the next bare command reverts to sticky.
Resolution order (one place — target.resolve()): one-shot if present →
sticky if set and the session exists → else, only if auto_target = true, the single
running claude-* session → else (default, or zero/several sessions) do nothing and
say so. It never guesses, and never injects into a nonexistent session.
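The resolution order reads naturally as one small pure function. The sketch below is an assumption about shape only: the real target.resolve() signature is not shown in this README, and treating an already-prefixed name as-is in session_name is a guess.

```python
from typing import Optional, Sequence

PREFIX = "claude-"


def session_name(name: str) -> str:
    """Map a spoken name to claude-<name> (already-prefixed names pass through; assumed)."""
    return name if name.startswith(PREFIX) else PREFIX + name


def resolve(
    one_shot: Optional[str],
    sticky: Optional[str],
    running: Sequence[str],
    auto_target: bool = False,
) -> Optional[str]:
    """Return the session to inject into, or None for 'do nothing and say so'."""
    if one_shot:
        target = session_name(one_shot)
        # A one-shot wins, but never into a session that does not exist.
        return target if target in running else None
    if sticky and session_name(sticky) in running:
        return session_name(sticky)
    if auto_target and len(running) == 1:
        return running[0]
    return None  # default, or zero/several sessions: never guess
```

Keeping the function free of tmux calls (the caller passes in the running sessions) makes each branch of the order easy to test.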
Every name maps to claude-<name> through one helper (target.session_name()), and
the cc kit mirrors it exactly — so cc libs (shell) and set libs (voice) refer
to the same session claude-libs. The name is your stable, speakable handle:
because the kit forces an explicit name (no basename guessing), you always know the
exact word to say.
The cc kit lives in ~/.config/claudedo/cc.sh (sourced from your rc; works under
bash and zsh). Every command requires an explicit name:
cc <name> # attach/create claude-<name>; writes ~/.claude-active
ccr <name> # re-attach an existing claude-<name> only
ccl # list claude-* sessions
cck <name> # kill claude-<name>
ccclean # kill DETACHED claude-* sessions only (never attached) — safe cleanup
cckl # kill ALL claude-* sessions (including attached)
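The detached-only rule behind ccclean and claudedo cleanup can be sketched in Python as follows. This is not the shipped cleanup_detached(): the function names and the "name count" output format are assumptions. The tmux pieces it uses are real (list-sessions -F, the #{session_name} and #{session_attached} format variables, kill-session, and the "=" prefix for an exact target match).

```python
import subprocess

LIST_FORMAT = "#{session_name} #{session_attached}"


def parse_sessions(output: str) -> list[tuple[str, bool]]:
    """Turn '<name> <attached-count>' lines into (name, attached) pairs for claude-* only."""
    pairs = []
    for line in output.splitlines():
        name, _, count = line.rpartition(" ")
        if name.startswith("claude-") and count.isdigit():
            pairs.append((name, int(count) > 0))
    return pairs


def plan_cleanup(pairs: list[tuple[str, bool]]) -> tuple[list[str], list[str]]:
    """Split sessions into (to kill, to keep): detached are killed, attached are kept."""
    killed = [name for name, attached in pairs if not attached]
    kept = [name for name, attached in pairs if attached]
    return killed, kept


def cleanup_detached() -> tuple[list[str], list[str]]:
    """Kill detached claude-* sessions; an attached session is never touched."""
    result = subprocess.run(
        ["tmux", "list-sessions", "-F", LIST_FORMAT],
        capture_output=True, text=True,
    )
    killed, kept = plan_cleanup(parse_sessions(result.stdout))
    for name in killed:
        # "=" asks tmux for an exact name match rather than a prefix match.
        subprocess.run(["tmux", "kill-session", "-t", f"={name}"], check=False)
    return killed, kept
```

If no tmux server is running, list-sessions prints nothing to stdout, so both lists come back empty, which corresponds to the 'nothing to clean' case.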
Contexts (named reference blurbs)
contexts.toml holds named reference snippets you can inject ahead of a dictated
instruction with the context <name> <instruction> voice command (alias
prepare). It lives next to config.toml
($CLAUDEDO_CONTEXTS → ~/.config/claudedo/contexts.toml → ./contexts.toml); a
missing file just means no contexts (the feature is opt-in).
[contexts]
webhooks = "discord webhooks — test: <url> (safe to spam), live: <url> (real, careful)"
testing = "use the test/staging resources only, never touch prod"
Saying context webhooks send a test message injects the webhooks blurb as a
preamble, then the dictated instruction, and waits — nothing is auto-submitted. You
say send to submit (read-before-send; Claude's own permission prompt is the
backstop for anything consequential). A bare context webhooks injects just the blurb.
One context per command (no stacking yet); an unknown name announces and injects
nothing.
Names are spoken and fuzzy-matched, so keep them simple and distinct — they're
looked up on a despaced/lowercased key, so web hooks / web-hooks / webhooks all
resolve the same block. Assembly is config-gated: behavior.context_multiline (default
true) puts the blurb and instruction on separate lines via a Shift+Enter soft newline;
set it false to flatten onto one line with context_separator (default " — ") if
Shift+Enter is unreliable in your terminal.
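The lookup and assembly rules above can be illustrated with a short sketch. The names context_key and assemble, and the ("text", ...) / ("soft_newline", "") step tuples, are hypothetical; the real daemon also fuzzy-matches names, which this exact-key version leaves out.

```python
from typing import Optional

SOFT_NEWLINE = ("soft_newline", "")  # stands in for the Shift+Enter keystroke


def context_key(name: str) -> str:
    """Despaced/lowercased key: 'web hooks', 'web-hooks', 'webhooks' all collapse together."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def assemble(
    contexts: dict[str, str],
    name: str,
    instruction: str = "",
    multiline: bool = True,
    separator: str = " \u2014 ",  # documented default of context_separator, written as an escape
) -> Optional[list[tuple[str, str]]]:
    """Return the injection steps, or None for an unknown context (inject nothing)."""
    table = {context_key(k): v for k, v in contexts.items()}
    blurb = table.get(context_key(name))
    if blurb is None:
        return None
    if not instruction:
        return [("text", blurb)]  # bare context: just the blurb
    if multiline:
        return [("text", blurb), SOFT_NEWLINE, ("text", instruction)]
    return [("text", blurb + separator + instruction)]
```

No step in the returned list is a submit, matching the read-before-send behavior: the caller injects the steps and then waits for "send".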
Edit contexts.toml, then say reload (or run claudedo reload) — it re-reads
config.toml and contexts.toml live without restarting the daemon or reloading the
Whisper model. The system namespace gives daemon-control by voice without touching
Claude: system status (mode / target / model / context count) and system reload [config|contexts].
Earcons (audio feedback tones)
Short confirmation tones play on key events so you get eyes-free feedback — "did it hear me?" — without watching the terminal. They're tones, not speech (not TTS): a bright blip when a command is accepted/injected, a low buzz when nothing matched, a rising chime on submit, and an optional blip on wake. Tones are short (<300ms) and quiet, and they're additive to the console feed — mute them and read at the desk, or hear them eyes-free.
Verify the audio-OUT path (the reverse of test-audio, and the less-tested direction on
WSLg) with:
claudedo test-tone # plays each tone through WSLg — the audio-out gate
Tones play through WSLg's PulseAudio sink, paplay-first (a separate process, so it
doesn't contend with the sounddevice mic stream), falling back to in-process sounddevice,
then powershell.exe on the Windows host. Playback is fire-and-forget: a dead speaker
or a missing tone file logs once and is ignored — audio-out can never block or break a
command (claudedo yes injects whether or not the speaker works).
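The fallback chain and fire-and-forget behavior can be sketched generically. Everything named here is an assumption for illustration: backends are plain callables supplied by the caller (standing in for paplay, sounddevice, and powershell.exe), and warning once per backend is one possible reading of "logs once".

```python
import logging
import threading
from typing import Callable, Optional, Sequence, Tuple

log = logging.getLogger("earcon-sketch")
_warned: set = set()

Backend = Tuple[str, Callable[[str], None]]


def play_with_fallback(path: str, backends: Sequence[Backend]) -> Optional[str]:
    """Try each backend in order; return the name of the one that played, else None."""
    for name, play in backends:
        try:
            play(path)
            return name
        except Exception as exc:  # audio-out must never break a command
            if name not in _warned:
                _warned.add(name)
                log.warning("earcon backend %s failed: %s", name, exc)
    return None


def play_async(path: str, backends: Sequence[Backend]) -> threading.Thread:
    """Fire-and-forget: playback runs on a daemon thread and the caller never waits."""
    thread = threading.Thread(target=play_with_fallback, args=(path, backends), daemon=True)
    thread.start()
    return thread
```

Because every failure is swallowed inside play_with_fallback, the injection path can call play_async unconditionally.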
Configure under [sound]: enabled (master, default on), per-event on_wake (default
off — a blip right before you speak can bleed into the command capture, and it's
chatty), on_accept / on_no_match / on_submit (default on), and volume (0.0–1.0,
best-effort — scaled for sounddevice, --volume for paplay, ignored by the PowerShell
fallback). A [sound.files] table can point any event at your own .wav. The shipped
tones live in the package (claudedo/sounds/*.wav); claudedo/sounds/generate.py is a
synthetic-beep fallback that can regenerate a placeholder set (it does not reproduce
the shipped tones — running it overwrites them with plain beeps).
The confirmed Claude Code keymap
The keystrokes in keys.py were confirmed empirically
against a live claude v2.1.191 session (not assumed):
- Numbered prompts (trust prompt, permission prompt): pressing the bare digit selects and confirms immediately — no trailing Enter.
- Arrow keys move the highlight without acting; Enter then confirms (modeled as an alternative sequence).
- Permission prompt is 1. Yes / 2. Yes, and don't ask again / 3. No; Escape cancels.
- Literal text goes in via send-keys -l (no submit); a bare Enter submits.
If Claude Code changes its prompt UI, re-confirm against a live session and update
keys.py — it is the single source of truth.
Config
Everything tunable lives in config.toml: wake phrases, mode + PTT
key, Whisper model/language/device, [vad] endpointing, and [behavior]
(type_autosend, fuzzy thresholds, filler_words, auto_target, print_heard).
The default model is small.en (the English-only small model — ~1s/command on a
strong CPU, more accurate on English than multilingual small at the same speed);
medium/medium.en are more accurate but ~3× slower (noticeable lag), base.en is
snappier/less accurate, large-v3 most accurate/slowest. Every heard line shows the
STT latency as (<ms>/<audio>s) so you can see what a model change costs. VAD
endpointing ends a capture after [vad].silence_ms (700) of trailing silence, capped
at max_seconds (15). claudedo -c <path> ... points at a specific config; otherwise
it searches
$CLAUDEDO_CONFIG, ~/.config/claudedo/config.toml, then ./config.toml.
- STT biasing. The transcriber is seeded with an initial_prompt built from the configured wake phrases + command vocabulary (one source: grammar.vocabulary()), so Whisper is conditioned to expect "claudedo" and the command words.
- Split fuzzy thresholds. wake_fuzzy_threshold (default 0.65, lenient) vs command_fuzzy_threshold (default 0.8, tight). The asymmetry is deliberate: a false wake is cheap (it wakes, finds no command, does nothing), but a false command fires the wrong action. Prefer expanding command synonyms over loosening the command threshold.
- [vad] endpointing. Capture starts on speech and ends after silence_ms (default 700) of trailing silence (Alexa-style record-until-pause), capped at max_seconds (default 15). The pause both ends a command and separates it from following chatter (the chatter is a separate capture the wake gate discards).
- auto_target (default false): with no sticky target and one session running, false does nothing and asks you to set; true auto-uses that session.
- print_heard (default false, debug): prints non-wake transcripts so you can see how Whisper renders your wake word, then tune the wake list/threshold.
- context_multiline (default true) / context_separator (default " — "): how the context command assembles the blurb and instruction, with a Shift+Enter soft newline between them, or (when false) flattened onto one line with the separator.
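The silence_ms / max_seconds endpointing rule can be shown on a stream of per-frame speech flags. This is a sketch only: the endpoint function, the frame_ms parameter, and the assumption that an upstream VAD yields one boolean per fixed-length frame are illustrative, not claudedo's capture code.

```python
from typing import Iterable, Optional, Tuple


def endpoint(
    frames: Iterable[bool],
    frame_ms: int = 30,
    silence_ms: int = 700,
    max_seconds: float = 15.0,
) -> Optional[Tuple[int, int]]:
    """Return (start, end) frame indices of one capture (end exclusive), or None if no speech."""
    need = -(-silence_ms // frame_ms)  # trailing silent frames required (ceiling division)
    max_frames = int(max_seconds * 1000 / frame_ms)
    start = None
    silent_run = 0
    for i, is_speech in enumerate(frames):
        if start is None:
            if is_speech:
                start = i  # capture starts on the first speech frame
            continue
        silent_run = 0 if is_speech else silent_run + 1
        if silent_run >= need or (i + 1 - start) >= max_frames:
            return start, i + 1  # ended by a pause or by the length cap
    if start is None:
        return None
    return start, i + 1  # stream ended mid-capture
```

Lowering silence_ms makes commands end sooner but risks cutting off a speaker who pauses mid-phrase; max_seconds bounds the worst case when the room never goes quiet.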
Requirements
Windows 11 + WSL2 (Ubuntu) with WSLg, Python 3.10+, tmux, the claude CLI, and
either bash or zsh (the cc kit supports both).