diff --git a/README.md b/README.md index 482ff8f..5264cd7 100644 --- a/README.md +++ b/README.md @@ -11,7 +11,7 @@ hands-free while another window (a game) is focused. It exists because Claude Code's native `/voice` is hardcoded-blocked in WSL (it assumes WSL has no audio). Modern WSL2 + WSLg *does* have working mic input via PulseAudio/RDP. `claudedo` captures the mic itself, transcribes on-device, and drives -Claude Code over tmux — fully local, private, backgroundable. +Claude Code over tmux — fully local and private. You run it in a terminal you watch. ## How it works @@ -20,9 +20,11 @@ mic (WSLg/PulseAudio RDPSource) -> sounddevice capture -> faster-whisper (local STT, on-device) -> wake gate: utterance must start with a wake phrase, else DISCARD locally - -> grammar match (yes/no/one..four/approve/deny/send/type/mode/set/target/cancel) - -> resolve target session (~/.claude-active) + -> grammar match (yes/no/one..four/approve/deny/send/type/space/backspace/erase/ + mode/set/target/unset/list/cancel) + -> resolve target session (one-shot > sticky ~/.claude-active > auto/none) -> tmux send-keys -t "" + -> log the action to the watched terminal ([session]/[SYSTEM]/[VOICE], colored) ``` **Privacy by construction.** STT runs on-device. In listen mode, any speech that @@ -62,11 +64,13 @@ claudedo test-audio ## Usage **Run it in a terminal you watch — that's the product.** You launch `claudedo -start`, it does a quick mic check, then drops into a visible listen loop that prints -`heard → matched → sent` for every utterance. That terminal is your -recognition/action console; you attach to the `claude-` session in another pane -to watch the keystrokes land. There is no backgrounding/daemon mode — the whole point -is the console you read. +start`, it does a quick mic check, then drops into a visible listen loop. Each +utterance prints a timestamped, colored line — `HH:MM:SS [claude-libs] heard "…" → +typed 'fix'` (green for injected, red for drops, `[SYSTEM]`/`[VOICE]` for state and +recognition). That terminal is your recognition/action console; you attach to the +`claude-` session in another pane to watch the keystrokes land. It runs in the +foreground by design — the console is the point — though `claudedo stop` can signal a +stray instance. ```bash claudedo start # mic-check, then the visible listen loop (listen mode default) @@ -97,9 +101,12 @@ Switch at runtime by voice: "claudedo mode listen" / "claudedo mode ptt". ## Command grammar -Wake phrases (listen mode), fuzzy-matched: **"claudedo"**, **"hey claude"**. -"claudedo" is a coined word, so the matcher is lenient (accepts "claude do", -"clauddo", "cloud do", …). In PTT mode the wake phrase is optional. +Wake phrases (listen mode), fuzzy-matched. The default list is **"claudedo"**, +**"claude do"**, **"claude due"**, **"hey claude"**, **"ok claude"**, **"okay +claude"** — Whisper has no token for the coined word "claudedo" and renders it as +real words ("claude do"/"claude due"), so those spellings are listed explicitly. +Matching is lenient (case/space-insensitive). Add the spellings you actually see +(turn on `print_heard` to find them). In PTT mode the wake phrase is optional. | Say | Does | |---|---| @@ -121,8 +128,10 @@ Wake phrases (listen mode), fuzzy-matched: **"claudedo"**, **"hey claude"**. Optional filler (`select` / `use` / `choose`) may precede any command and is ignored: `select yes` and `use yes` behave like `yes`. (`select 1` is still the select command.) -When no sticky target is set, a bare command auto-targets the **only** running -`claude-*` session; if several are running it does nothing and asks you to `set` one. +When no sticky target is set, a bare command does nothing and asks you to `set` one +(the default). Set `auto_target = true` to instead auto-use the single running +`claude-*` session when there's exactly one; with several running it always does +nothing and asks you to `set` one. Number words are normalized to digits before matching ("one"/"won" → 1). @@ -135,9 +144,9 @@ A `target ` voice command is a **one-shot** that does NOT touch the sticky default — it routes a single command and the next bare command reverts to sticky. Resolution order (one place — `target.resolve()`): one-shot if present → -sticky if set and the session exists → else the only running `claude-*` session → -else (zero or several) do nothing and say so. It never guesses, and never injects -into a nonexistent session. +sticky if set and the session exists → else, only if `auto_target = true`, the single +running `claude-*` session → else (default, or zero/several sessions) do nothing and +say so. It never guesses, and never injects into a nonexistent session. Every name maps to `claude-` through one helper (`target.session_name()`), and the cc kit mirrors it exactly — so `cc libs` (shell) and `set libs` (voice) refer