Compare commits

..

No commits in common. "main" and "v0.1.1" have entirely different histories.
main ... v0.1.1

26 changed files with 132 additions and 1589 deletions

1
.gitignore vendored
View File

@ -1,5 +1,4 @@
CLAUDE.md
COMPACT.md
__pycache__/
*.pyc

166
README.md
View File

@ -11,7 +11,7 @@ hands-free while another window (a game) is focused.
It exists because Claude Code's native `/voice` is hardcoded-blocked in WSL (it
assumes WSL has no audio). Modern WSL2 + WSLg *does* have working mic input via
PulseAudio/RDP. `claudedo` captures the mic itself, transcribes on-device, and drives
Claude Code over tmux — fully local and private. You run it in a terminal you watch.
Claude Code over tmux — fully local, private, backgroundable.
## How it works
@ -20,11 +20,9 @@ mic (WSLg/PulseAudio RDPSource)
-> sounddevice capture
-> faster-whisper (local STT, on-device)
-> wake gate: utterance must start with a wake phrase, else DISCARD locally
-> grammar match (yes/no/one..four/approve/deny/send/type/space/backspace/erase/
mode/set/target/unset/list/context/reload/system/cancel)
-> resolve target session (one-shot > sticky ~/.claude-active > auto/none)
-> grammar match (yes/no/one..four/approve/deny/send/type/mode/set/target/cancel)
-> resolve target session (~/.claude-active)
-> tmux send-keys -t <session> "<keys>"
-> log the action to the watched terminal ([session]/[SYSTEM]/[VOICE], colored)
```
**Privacy by construction.** STT runs on-device. In listen mode, any speech that
@ -64,28 +62,22 @@ claudedo test-audio
## Usage
**Run it in a terminal you watch — that's the product.** You launch `claudedo
start` and it drops into a visible listen loop (pass `--check` to run a mic check
first). Each utterance prints a timestamped, colored line — `HH:MM:SS [claude-libs]
heard "…" →
typed 'fix'` (green for injected, red for drops, `[SYSTEM]`/`[VOICE]` for state and
recognition). That terminal is your recognition/action console; you attach to the
`claude-<name>` session in another pane to watch the keystrokes land. It runs in the
foreground by design — the console is the point — though `claudedo stop` can signal a
stray instance.
start`, it does a quick mic check, then drops into a visible listen loop that prints
`heard → matched → sent` for every utterance. That terminal is your
recognition/action console; you attach to the `claude-<name>` session in another pane
to watch the keystrokes land. There is no backgrounding/daemon mode — the whole point
is the console you read.
```bash
claudedo start # the visible listen loop (listen mode default; no mic check)
claudedo start --check # run a mic check before listening
claudedo start # mic-check, then the visible listen loop (listen mode default)
claudedo start --mode ptt # push-to-talk instead (desk-only — see Modes)
claudedo start --skip-audio-check # skip the pre-listen mic check
claudedo status # running? mode? target session?
claudedo stop # stop a running daemon
claudedo reload # reload config.toml + contexts.toml in a running daemon
claudedo set <name> # set the sticky target -> claude-<name> (alias: switch)
claudedo unset # clear the sticky target
claudedo list # list running claude-* sessions
claudedo cleanup # kill DETACHED claude-* sessions (never attached)
claudedo test-audio # verify the mic capture path
claudedo test-tone # play each earcon (verify the audio-OUT path)
```
### Modes
@ -105,14 +97,9 @@ Switch at runtime by voice: "claudedo mode listen" / "claudedo mode ptt".
## Command grammar
Wake phrases (listen mode), fuzzy-matched. The default list is **"claudedo"**,
**"claude do"**, **"hey claude"**, **"ok claude"**, **"okay claude"** — Whisper has
no token for the coined word "claudedo" and renders it as real words ("claude do"),
so that spelling is listed explicitly. Matching is lenient (case/space-insensitive).
Add the spellings you actually see (turn on `print_heard` to find them). In PTT mode
the wake phrase is optional. When a command's wake phrase matched loosely (e.g. you
said "okay clouds"), the heard line notes which phrase it assumed —
`heard "okay clouds list" -> LIST (wake: okay claude)`.
Wake phrases (listen mode), fuzzy-matched: **"claudedo"**, **"hey claude"**.
"claudedo" is a coined word, so the matcher is lenient (accepts "claude do",
"clauddo", "cloud do", …). In PTT mode the wake phrase is optional.
| Say | Does |
|---|---|
@ -121,32 +108,18 @@ said "okay clouds"), the heard line notes which phrase it assumed —
| `approve` / `deny` | allow / deny a permission prompt |
| `send` / `enter` | submit (Enter) |
| `type <phrase>` | insert literal text, **no** submit (read-before-send; say "send") |
| `space [<n>]` (also `add [a] space`, `insert <n> spaces`) | insert n spaces (default 1) |
| `backspace [<n>]` (alias `delete`) | delete n chars (default 1), capped at the last submit boundary |
| `erase` (alias `clear`/`wipe`) | delete everything typed since the last submit/boundary |
| `debug <text>` (alias `echo`) | just print what you said to the console (test wake/STT; injects nothing) |
| `mode ptt` / `mode listen` | switch input mode |
| `set <name>` (alias `sticky`/`switch`) | set the **sticky** target → `claude-<name>` (persists) |
| `target <name> <command>` | **one-shot** override: run that command on `claude-<name>` for this utterance only; sticky default unchanged |
| `unset` (alias `unsticky`) | clear the sticky target |
| `list` | list running `claude-*` sessions to the daemon console |
| `context <name> <instruction>` (alias `prepare`) | inject a `contexts.toml` blurb as a preamble + the dictated instruction, then **wait** (no submit — say "send") |
| `reload` | re-read `config.toml` + `contexts.toml` live (no daemon restart, model stays loaded) |
| `system status` | print mode / target / model / context count to the console (daemon-control; never injects) |
| `system reload [config\|contexts]` | reload one or both config files |
| `cleanup` (alias `detached`/`detach`, also `system cleanup`) | kill **detached** `claude-*` sessions only — never an attached one |
| `commands` (alias `help`/`menu`) | print the voice-command menu to the console |
| `customs` (alias `custom`) | list the loaded context names |
| `version` | print the claudedo version to the console |
| `cancel` / `escape` | back out of a prompt |
Optional filler (`select` / `use` / `choose`) may precede any command and is ignored:
`select yes` and `use yes` behave like `yes`. (`select 1` is still the select command.)
When no sticky target is set, a bare command does nothing and asks you to `set` one
(the default). Set `auto_target = true` to instead auto-use the single running
`claude-*` session when there's exactly one; with several running it always does
nothing and asks you to `set` one.
When no sticky target is set, a bare command auto-targets the **only** running
`claude-*` session; if several are running it does nothing and asks you to `set` one.
Number words are normalized to digits before matching ("one"/"won" → 1).
@ -159,9 +132,9 @@ A `target <name>` voice command is a **one-shot** that does NOT touch the sticky
default — it routes a single command and the next bare command reverts to sticky.
Resolution order (one place — `target.resolve()`): one-shot if present →
sticky if set and the session exists → else, only if `auto_target = true`, the single
running `claude-*` session → else (default, or zero/several sessions) do nothing and
say so. It never guesses, and never injects into a nonexistent session.
sticky if set and the session exists → else the only running `claude-*` session →
else (zero or several) do nothing and say so. It never guesses, and never injects
into a nonexistent session.
Every name maps to `claude-<name>` through one helper (`target.session_name()`), and
the cc kit mirrors it exactly — so `cc libs` (shell) and `set libs` (voice) refer
@ -177,74 +150,9 @@ cc <name> # attach/create claude-<name>; writes ~/.claude-active
ccr <name> # re-attach an existing claude-<name> only
ccl # list claude-* sessions
cck <name> # kill claude-<name>
ccclean # kill DETACHED claude-* sessions only (never attached) — safe cleanup
cckl # kill ALL claude-* sessions (including attached)
cckl # kill all claude-* sessions
```
## Contexts (named reference blurbs)
`contexts.toml` holds named reference snippets you can inject ahead of a dictated
instruction with the **`context <name> <instruction>`** voice command (alias
`prepare`). It lives next to `config.toml`
(`$CLAUDEDO_CONTEXTS` → `~/.config/claudedo/contexts.toml``./contexts.toml`); a
missing file just means no contexts (the feature is opt-in).
```toml
[contexts]
webhooks = "discord webhooks — test: <url> (safe to spam), live: <url> (real, careful)"
testing = "use the test/staging resources only, never touch prod"
```
Saying `context webhooks send a test message` injects the `webhooks` blurb as a
preamble, then the dictated instruction, and **waits** — nothing is auto-submitted. You
say `send` to submit (**read-before-send**; Claude's own permission prompt is the
backstop for anything consequential). A bare `context webhooks` injects just the blurb.
One context per command (no stacking yet); an unknown name announces and injects
nothing.
Names are **spoken and fuzzy-matched**, so keep them simple and distinct — they're
looked up on a despaced/lowercased key, so `web hooks` / `web-hooks` / `webhooks` all
resolve the same block. Assembly is config-gated: `behavior.context_multiline` (default
`true`) puts the blurb and instruction on separate lines via a Shift+Enter soft newline;
set it `false` to flatten onto one line with `context_separator` (default `" — "`) if
Shift+Enter is unreliable in your terminal.
Edit `contexts.toml`, then say **`reload`** (or run `claudedo reload`) — it re-reads
`config.toml` and `contexts.toml` live without restarting the daemon or reloading the
Whisper model. The **`system`** namespace gives daemon-control by voice without touching
Claude: `system status` (mode / target / model / context count) and `system reload
[config|contexts]`.
## Earcons (audio feedback tones)
Short confirmation tones play on key events so you get **eyes-free feedback** — "did it
hear me?" — without watching the terminal. They're tones, not speech (not TTS): a bright
blip when a command is accepted/injected, a low buzz when nothing matched, a rising chime
on submit, and an optional blip on wake. Tones are short (<300ms) and quiet, and they're
**additive** to the console feed — mute them and read at the desk, or hear them eyes-free.
Verify the audio-OUT path (the reverse of `test-audio`, and the less-tested direction on
WSLg) with:
```bash
claudedo test-tone # plays each tone through WSLg — the audio-out gate
```
Tones play through WSLg's PulseAudio sink, **paplay-first** (a separate process, so it
doesn't contend with the sounddevice mic stream), falling back to in-process sounddevice,
then `powershell.exe` on the Windows host. Playback is **fire-and-forget**: a dead speaker
or a missing tone file logs once and is ignored — audio-out can never block or break a
command (`claudedo yes` injects whether or not the speaker works).
Configure under `[sound]`: `enabled` (master, default on), per-event `on_wake` (default
**off** — a blip right before you speak can bleed into the command capture, and it's
chatty), `on_accept` / `on_no_match` / `on_submit` (default on), and `volume` (0.01.0,
best-effort — scaled for sounddevice, `--volume` for paplay, ignored by the PowerShell
fallback). A `[sound.files]` table can point any event at your own `.wav`. The shipped
tones live in the package (`claudedo/sounds/*.wav`); `claudedo/sounds/generate.py` is a
synthetic-beep fallback that can regenerate a placeholder set (it does **not** reproduce
the shipped tones — running it overwrites them with plain beeps).
## The confirmed Claude Code keymap
The keystrokes in [`keys.py`](src/claudedo/keys.py) were confirmed **empirically**
@ -263,37 +171,11 @@ If Claude Code changes its prompt UI, re-confirm against a live session and upda
## Config
Everything tunable lives in [`config.toml`](config.toml): wake phrases, mode + PTT
key, Whisper model/language/device, `[vad]` endpointing, and `[behavior]`
(`type_autosend`, fuzzy thresholds, `filler_words`, `auto_target`, `print_heard`).
The default model is **`small.en`** (the English-only small model — ~1s/command on a
strong CPU, more accurate on English than multilingual `small` at the same speed);
`medium`/`medium.en` are more accurate but ~3× slower (noticeable lag), `base.en` is
snappier/less accurate, `large-v3` most accurate/slowest. Every `heard` line shows the
STT latency as `(<ms>/<audio>s)` so you can see what a model change costs. VAD
endpointing ends a capture after `[vad].silence_ms` (700) of trailing silence, capped
at `max_seconds` (15). `claudedo -c <path> ...` points at a specific config; otherwise
it searches
`$CLAUDEDO_CONFIG`, `~/.config/claudedo/config.toml`, then `./config.toml`.
- **STT biasing.** The transcriber is seeded with an `initial_prompt` built from the
configured wake phrases + command vocabulary (one source — `grammar.vocabulary()`),
so Whisper is conditioned to expect "claudedo" and the command words.
- **Split fuzzy thresholds.** `wake_fuzzy_threshold` (default `0.65`, lenient) vs
`command_fuzzy_threshold` (default `0.8`, tight). The asymmetry is deliberate: a
false *wake* is cheap (it wakes, finds no command, does nothing), but a false
*command* fires the wrong action. Prefer expanding command synonyms over loosening
the command threshold.
- **`[vad]` endpointing.** Capture starts on speech and ends after `silence_ms`
(default 700) of trailing silence — Alexa-style record-until-pause — capped at
`max_seconds` (default 15). The pause both ends a command and separates it from
following chatter (the chatter is a separate capture the wake gate discards).
- **`auto_target`** (default `false`): with no sticky target and one session running,
`false` does nothing and asks you to `set`; `true` auto-uses that session.
- **`print_heard`** (default `false`, debug): prints non-wake transcripts so you can
see how Whisper renders your wake word, then tune the wake list/threshold.
- **`context_multiline`** (default `true`) / **`context_separator`** (default `" — "`):
how the `context` command assembles the blurb and instruction — a Shift+Enter soft
newline between them, or (when `false`) flattened onto one line with the separator.
key, Whisper model/language/device, audio segmentation thresholds, and
`type_autosend = false`. The default model is `small`; bump to `medium` if the coined
wake word is recognized poorly. `claudedo -c <path> ...` points at a specific config;
otherwise it searches `$CLAUDEDO_CONFIG`, `~/.config/claudedo/config.toml`, then
`./config.toml`.
## Requirements

View File

@ -5,7 +5,7 @@
# wake phrases for listen mode. fuzzy-matched: case/space-insensitive, lenient on
# the coined word "claudedo" (whisper renders it inconsistently). number words are
# normalized to digits before command matching.
phrases = ["claudedo", "claude do", "hey claude", "ok claude", "okay claude"]
phrases = ["claudedo", "hey claude"]
[input]
# "listen" (default): continuous capture; only acts on utterances that start with a
@ -21,12 +21,10 @@ mode = "listen"
ptt_key = "space"
[stt]
# faster-whisper model size. "small.en" is the default — the English-only small model
# (~1s/command on a strong cpu, more accurate on english than multilingual "small" at
# the same speed). "medium"/"medium.en" are more accurate but ~3x slower (noticeable
# lag); "large-v3" is most accurate and slowest. drop to "base.en" for max snappiness
# (less accurate). bump only if recognition is poor.
model = "small.en"
# faster-whisper model size. "small" is a good accuracy/latency balance for the
# short command grammar (~sub-second per chunk on a strong cpu). if the coined wake
# word "claudedo" is recognized poorly, bump to "medium" (slower per chunk).
model = "small"
language = "en"
# mic device: "auto", or a sounddevice device index (integer) / substring of a
# device name. run `claudedo test-audio` to list devices.
@ -38,82 +36,22 @@ compute = "auto"
# capture parameters. 16 kHz mono is what whisper expects.
samplerate = 16000
channels = 1
# rms energy below this counts as silence (the VAD onset/endpoint floor).
# listen-mode silence segmentation: an utterance ends after this many seconds below
# the rms threshold. keeps latency low without streaming.
silence_threshold = 0.012
silence_duration = 0.8
# ignore utterances shorter than this (clicks, coughs).
min_utterance = 0.3
[vad]
# Alexa-style record-until-pause endpointing (listen mode). capture starts on speech
# onset and ends after this much trailing silence — the natural end of an utterance.
# a real pause both ends the command AND separates it from following chatter (the
# chatter becomes a separate capture that the wake gate then discards).
silence_ms = 700
# hard cap so continuous noise can't record forever (also the ceiling for a long
# dictated `type` phrase).
max_seconds = 15.0
# hard cap on a single utterance so a stuck stream can't grow unbounded.
max_utterance = 15.0
[behavior]
# dictation never auto-submits: "type <phrase>" inserts literal text only; you say
# "send" separately to submit (read-before-send).
type_autosend = false
# fuzzy match ratios (0..1). the asymmetry is deliberate: a false WAKE is cheap (it
# wakes, finds no command, does nothing), so wake is lenient; a false COMMAND fires
# the WRONG action, so commands stay tight. lower = more lenient = more matches.
# prefer expanding command synonyms over loosening command_fuzzy_threshold.
wake_fuzzy_threshold = 0.65
command_fuzzy_threshold = 0.8
# fuzzy match ratio (0..1) required to accept a wake phrase / command token.
match_threshold = 0.8
# optional filler words that may precede a command and are ignored for matching:
# "select yes" / "use yes" behave like "yes". (a filler word followed by a digit is
# the select command, e.g. "select 1", and is not dropped.)
filler_words = ["select", "use", "choose"]
# when no sticky target is set and exactly ONE claude-* session is running:
# false (default) -> require an explicit `set <name>` or one-shot `target <name>`;
# a bare command does nothing and tells you to set one.
# true -> auto-target that single session (convenience).
auto_target = false
# DEBUG ONLY — relaxes the privacy invariant. when true, the daemon console prints
# the raw transcript of EVERY utterance, including non-wake speech it would otherwise
# drop silently (shown as `heard (dropped): "<transcript>"`). use it to see exactly
# how Whisper renders your wake word, then turn it OFF. default false: non-wake speech
# is discarded without ever printing the transcript.
print_heard = false
# how the `context <name> <dictation>` command assembles the blurb + instruction.
# true (default): blurb, a soft newline (Shift+Enter — needs the extended-keys tmux
# settings install.sh appends), then the instruction. if Shift+Enter is at all flaky
# in your terminal (it submits or does nothing), set false to flatten onto one line
# with context_separator between blurb and instruction — the blank line is cosmetic,
# not worth a submit risk. either way the assembled text is NEVER auto-submitted.
context_multiline = true
# separator inserted between blurb and instruction when context_multiline = false.
context_separator = " — "
# the `cleanup` / `detached` command kills DETACHED claude-* sessions only (never an
# attached one — a misheard cleanup can't nuke the active session). default false:
# kill immediately (it's detached-only, so it's safe). set true to announce the
# detached set and wait for a following `confirm` before killing.
cleanup_confirm = false
[sound]
# earcons — short confirmation tones on daemon events so you get eyes-free feedback
# ("did it hear me?") without watching the terminal. tones are SHORT (<300ms) and quiet;
# they play OUT through WSLg's PulseAudio sink (paplay-first, sounddevice fallback, then
# powershell.exe). additive to the console feed — mute these and read at the desk, or
# hear them eyes-free. a dead speaker never blocks/breaks a command (fire-and-forget).
enabled = true
# blip when a wake phrase is recognized. OFF by default: a blip right before you speak
# the command can bleed into its capture, and it's chatty. turn on only if you want it.
on_wake = false
# positive blip when a command is recognized/injected.
on_accept = true
# distinct lower buzz when nothing matched or the target was missing (did nothing).
on_no_match = true
# rising chime when a send/submit is injected.
on_submit = true
# best-effort 0.0-1.0 (scaled for sounddevice, --volume for paplay; ignored by the
# powershell fallback, which has no volume control).
volume = 0.5
# optional per-event overrides to swap in your own .wav files, e.g.:
# [sound.files]
# accept = "~/sounds/my_accept.wav"
[sound.files]

View File

@ -1,18 +0,0 @@
# claudedo contexts — named reference blurbs you can inject ahead of a dictated
# instruction with the `context <name> <instruction>` voice command (alias `prepare`).
#
# the named blurb is injected as a preamble, then your dictated instruction, and the
# daemon WAITS — nothing is auto-submitted. you say "send" to submit (read-before-send;
# claude's own permission prompt is the backstop for anything consequential).
#
# names are SPOKEN and fuzzy-matched, so keep them simple, distinct, single words
# (a-z, 0-9; spaces/hyphens/underscores are stripped for matching, so "web hooks",
# "web-hooks" and "webhooks" all resolve the same block). values are free-form text.
#
# edit this file, then say "reload" (or run `claudedo reload`) — no daemon restart,
# the whisper model is not reloaded.
[contexts]
webhooks = "discord webhooks — test: <url> (safe to spam), live: <url> (real, careful)"
testing = "use the test/staging resources only, never touch prod"
discord = "discord.py 2.x; bot token in .env as BOT_TOKEN; guild id 12345"

View File

@ -57,14 +57,9 @@ say "verifying audio path"
if pactl info >/dev/null 2>&1; then
DEFAULT_SRC="$(pactl info | sed -n 's/^Default Source: //p')"
echo " Default Source: ${DEFAULT_SRC:-<none>}"
DEFAULT_SINK="$(pactl info | sed -n 's/^Default Sink: //p')"
echo " Default Sink: ${DEFAULT_SINK:-<none>}"
if ! pactl list sources short 2>/dev/null | grep -q RDPSource; then
warn "RDPSource not listed by pactl — mic may not be bridged. check Windows mic permission."
fi
if ! pactl list sinks short 2>/dev/null | grep -q RDPSink; then
warn "RDPSink not listed by pactl — earcon/TTS audio-OUT may not play. run 'claudedo test-tone' to check."
fi
else
warn "pactl info failed — pulseaudio-utils installed but no server reachable yet."
fi
@ -99,30 +94,6 @@ mkdir -p "$CONF_DIR"
install -m 0644 "$REPO_DIR/shell/cc.sh" "$CONF_DIR/cc.sh"
echo " wrote $CONF_DIR/cc.sh"
# install config.toml to the standard location so the daemon finds it from any dir.
# never clobber an edited user config: copy only if absent, else drop a .new to diff.
if [ ! -f "$CONF_DIR/config.toml" ]; then
install -m 0644 "$REPO_DIR/config.toml" "$CONF_DIR/config.toml"
echo " wrote $CONF_DIR/config.toml"
elif ! cmp -s "$REPO_DIR/config.toml" "$CONF_DIR/config.toml"; then
install -m 0644 "$REPO_DIR/config.toml" "$CONF_DIR/config.toml.new"
echo " kept your $CONF_DIR/config.toml; new default written to config.toml.new (diff to merge)"
else
echo " $CONF_DIR/config.toml already current"
fi
# install the contexts.toml template (named blurbs for the `context` voice command).
# same policy: copy only if absent, else drop a .new — never clobber edited contexts.
if [ ! -f "$CONF_DIR/contexts.toml" ]; then
install -m 0644 "$REPO_DIR/contexts.toml" "$CONF_DIR/contexts.toml"
echo " wrote $CONF_DIR/contexts.toml"
elif ! cmp -s "$REPO_DIR/contexts.toml" "$CONF_DIR/contexts.toml"; then
install -m 0644 "$REPO_DIR/contexts.toml" "$CONF_DIR/contexts.toml.new"
echo " kept your $CONF_DIR/contexts.toml; new default written to contexts.toml.new (diff to merge)"
else
echo " $CONF_DIR/contexts.toml already current"
fi
# wire EVERY rc that exists (the user may have both zsh and bash).
wired_any=0
for RC in "$HOME/.zshrc" "$HOME/.bashrc"; do

View File

@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
[project]
name = "claudedo"
version = "0.2.2"
version = "0.1.1"
description = "voice-control daemon for claude code (local STT -> tmux send-keys)"
readme = "README.md"
requires-python = ">=3.10"
@ -23,9 +23,6 @@ claudedo = "claudedo.__main__:main"
[tool.setuptools]
package-dir = { "" = "src" }
[tool.setuptools.package-data]
"claudedo.sounds" = ["*.wav"]
[tool.setuptools.packages.find]
where = ["src"]

View File

@ -11,8 +11,7 @@
# ccr <name> reattach only (error if it doesn't exist); writes ~/.claude-active
# ccl list running claude- sessions
# cck <name> kill claude-<name>
# ccclean kill DETACHED claude- sessions only (never attached) — safe cleanup
# cckl kill ALL claude- sessions (including attached)
# cckl kill ALL claude- sessions
cc() {
if [ -z "$1" ]; then
@ -61,34 +60,6 @@ cck() {
fi
}
ccclean() {
killed=""
kept=""
while read -r name attached; do
case "$name" in
claude-*) ;;
*) continue ;;
esac
if [ "$attached" = "0" ]; then
if tmux kill-session -t "$name" 2>/dev/null; then
killed="${killed:+$killed, }$name"
fi
else
kept="${kept:+$kept, }$name"
fi
done <<EOF
$(tmux list-sessions -F '#{session_name} #{session_attached}' 2>/dev/null)
EOF
if [ -z "$killed" ]; then
echo "nothing to clean (no detached sessions)"
else
n=$(printf '%s' "$killed" | awk -F', ' '{print NF}')
msg="killed $killed ($n detached)"
[ -n "$kept" ] && msg="$msg; kept $kept (attached)"
echo "$msg"
fi
}
cckl() {
tmux ls 2>/dev/null | grep '^claude-' | cut -d: -f1 | while read -r s; do
tmux kill-session -t "$s" && echo "killed $s"

View File

@ -1,3 +1,3 @@
"""claudedo — voice-control daemon for claude code (local STT -> tmux send-keys)"""
__version__ = "0.2.2"
__version__ = "0.1.1"

View File

@ -33,12 +33,12 @@ def cmd_start(args: argparse.Namespace) -> int:
config = _load_or_die(args.config)
if args.mode:
config.mode = args.mode
if args.check:
if not args.skip_audio_check:
print("checking mic before listening (speak briefly) ...")
peak = _probe_mic(config, seconds=2.0, verbose=False)
if peak is None or peak < 0.02:
print("mic check failed — no usable input.", file=sys.stderr)
print("run `claudedo test-audio` to debug, or `claudedo start` to skip the check",
print("run `claudedo test-audio` to debug; or `claudedo start --skip-audio-check`",
file=sys.stderr)
return 1
print(f"mic OK (peak {peak:.3f}).")
@ -97,56 +97,6 @@ def cmd_stop(_args: argparse.Namespace) -> int:
return 1
def cmd_test_tone(args: argparse.Namespace) -> int:
config = _load_or_die(args.config)
from . import audio_out, sound
print("== claudedo test-tone ==")
if not audio_out.available():
print("no audio-out backend found (paplay / powershell.exe).", file=sys.stderr)
print("install pulseaudio-utils (run install.sh) for paplay.", file=sys.stderr)
return 1
earcons = sound.Earcons(config)
print(f"playing each tone via WSLg audio-out (volume {config.sound_volume}) — listen ...")
ok = True
for event in sound.event_names():
path = earcons.tone_path(event)
if path is None or not Path(path).is_file():
print(f" {event:9} MISSING ({path})")
ok = False
continue
print(f" {event:9} {path.name} ...", flush=True)
played = audio_out.play_blocking(path, volume=config.sound_volume)
if not played:
print(f" {event:9} FAILED to play", file=sys.stderr)
ok = False
if not ok:
print("some tones did not play — audio-out may be unavailable.", file=sys.stderr)
return 1
print("audio-out OK (all tones played).")
return 0
def cmd_cleanup(_args: argparse.Namespace) -> int:
killed, kept = target.cleanup_detached()
if not killed:
print("nothing to clean (no detached sessions)")
return 0
msg = f"killed {', '.join(killed)}"
if kept:
msg += f"; kept {', '.join(kept)} (attached)"
print(msg)
return 0
def cmd_reload(_args: argparse.Namespace) -> int:
if daemon.reload_running():
print("signalled claudedo to reload config + contexts")
return 0
print("claudedo is not running")
return 1
def cmd_status(_args: argparse.Namespace) -> int:
pid = daemon.read_pid()
if pid is None:
@ -267,22 +217,16 @@ def build_parser() -> argparse.ArgumentParser:
sp = sub.add_parser("start", help="run the daemon (foreground)")
sp.add_argument("--mode", choices=("listen", "ptt"), help="override input mode")
sp.add_argument("--check", action="store_true",
help="run a mic check before listening (off by default)")
sp.add_argument("--skip-audio-check", action="store_true",
help="skip the pre-listen mic check")
sp.set_defaults(func=cmd_start)
sub.add_parser("stop", help="stop a running daemon").set_defaults(func=cmd_stop)
sub.add_parser("reload", help="reload config + contexts in a running daemon"
).set_defaults(func=cmd_reload)
sub.add_parser("status", help="show daemon status").set_defaults(func=cmd_status)
sub.add_parser("test-audio", help="verify the mic capture path").set_defaults(func=cmd_test_audio)
sub.add_parser("test-tone", help="play each earcon (verify the audio-out path)"
).set_defaults(func=cmd_test_tone)
sub.add_parser("install", help="re-run the bootstrap (install.sh)").set_defaults(func=cmd_install)
sub.add_parser("unset", help="clear the sticky target session").set_defaults(func=cmd_unset)
sub.add_parser("list", help="list running claude-* sessions").set_defaults(func=cmd_list)
sub.add_parser("cleanup", help="kill detached claude-* sessions (never attached)"
).set_defaults(func=cmd_cleanup)
for verb in ("set", "switch"):
sp_set = sub.add_parser(verb, help="set the sticky target session")

View File

@ -1,158 +0,0 @@
"""audio output — play short .wav files through the WSLg/PulseAudio sink (RDPSink).
the reverse direction of audio.py's mic capture, and the less-tested path on WSLg. a
three-tier player picks the first backend that works and remembers it:
1. paplay (pulseaudio-utils) a SEPARATE process hitting PulseAudio directly. this
is the primary on purpose: the daemon captures with sounddevice (an open input
stream in listen mode), so keeping OUTPUT in a separate process avoids stacking
input+output in one lib on a bridge known to be duplex-flaky.
2. sounddevice sd.play() in-process fallback if paplay is absent.
3. powershell.exe SoundPlayer last resort via the Windows host (no volume control).
both earcons (sound.py) and future v0.3 TTS readback play through this module keep it
generic (it plays a wav path, it knows nothing about events). playback is fire-and-
forget on a worker thread: a missing file or a dead speaker logs once and is swallowed,
never raised, so audio-out can NEVER block or break the inject path.
"""
from __future__ import annotations
import logging
import shutil
import subprocess
import threading
import wave
from pathlib import Path
log = logging.getLogger(__name__)
_PAPLAY = "paplay"
_POWERSHELL = "powershell.exe"
_backend_lock = threading.Lock()
_chosen_backend: str | None = None
_warned = False
def _have(cmd: str) -> bool:
return shutil.which(cmd) is not None
def _clamp_volume(volume: float) -> float:
return max(0.0, min(1.0, float(volume)))
def _play_paplay(path: Path, volume: float) -> bool:
"""play via paplay; volume scaled through --volume (0-65536 linear)"""
vol = int(_clamp_volume(volume) * 65536)
proc = subprocess.run(
[_PAPLAY, f"--volume={vol}", str(path)],
stdout=subprocess.DEVNULL, stderr=subprocess.PIPE,
)
if proc.returncode != 0:
log.debug("paplay failed: %s", proc.stderr.decode("utf-8", "replace").strip())
return False
return True
def _play_sounddevice(path: Path, volume: float) -> bool:
"""play via sounddevice (in-process fallback); volume scales the samples"""
try:
import numpy as np
import sounddevice as sd
except Exception as exc:
log.debug("sounddevice unavailable: %s", exc)
return False
try:
with wave.open(str(path), "rb") as wf:
sr = wf.getframerate()
frames = wf.readframes(wf.getnframes())
data = np.frombuffer(frames, dtype="<i2").astype(np.float32) / 32768.0
data = data * _clamp_volume(volume)
sd.play(data, sr)
sd.wait()
return True
except Exception as exc:
log.debug("sounddevice playback failed: %s", exc)
return False
def _play_powershell(path: Path, _volume: float) -> bool:
"""play via the Windows host (last resort). SoundPlayer has no volume control,
so volume is ignored on this backend (documented best-effort)."""
if not _have(_POWERSHELL):
return False
try:
win = subprocess.run(["wslpath", "-w", str(path)], stdout=subprocess.PIPE,
stderr=subprocess.DEVNULL)
winpath = win.stdout.decode("utf-8", "replace").strip() if win.returncode == 0 else str(path)
script = f"(New-Object Media.SoundPlayer '{winpath}').PlaySync()"
proc = subprocess.run([_POWERSHELL, "-NoProfile", "-Command", script],
stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
return proc.returncode == 0
except Exception as exc:
log.debug("powershell playback failed: %s", exc)
return False
_BACKENDS = {
"paplay": _play_paplay,
"sounddevice": _play_sounddevice,
"powershell": _play_powershell,
}
_ORDER = ("paplay", "sounddevice", "powershell")
def _play_sync(path: Path, volume: float) -> bool:
"""play a wav synchronously, choosing/remembering a working backend. returns
whether playback succeeded; never raises."""
global _chosen_backend, _warned
if not path.is_file():
log.debug("tone file missing: %s", path)
return False
with _backend_lock:
order = (_chosen_backend,) + _ORDER if _chosen_backend else _ORDER
tried = []
for name in order:
if name in tried:
continue
tried.append(name)
if name == "paplay" and not _have(_PAPLAY):
continue
if _BACKENDS[name](path, volume):
with _backend_lock:
_chosen_backend = name
return True
with _backend_lock:
if not _warned:
_warned = True
log.warning("audio-out unavailable (tried %s) — continuing silently; "
"tones disabled for this run", ", ".join(tried))
return False
def play(path: str | Path, volume: float = 1.0, blocking: bool = False) -> None:
"""play a wav file. fire-and-forget by default (a worker thread), so a slow or
dead speaker never delays the caller. set blocking=True only for test-tone, where
we want to play tones in sequence and report the result.
failures are swallowed (logged once) audio-out must never break a command.
"""
p = Path(path)
if blocking:
_play_sync(p, volume)
return
threading.Thread(target=_play_sync, args=(p, volume), daemon=True).start()
def play_blocking(path: str | Path, volume: float = 1.0) -> bool:
"""synchronous play that returns success — for test-tone's audio-out gate"""
return _play_sync(Path(path), volume)
def available() -> bool:
"""true if any audio-out backend is present (best-effort, paplay/powershell)"""
return _have(_PAPLAY) or _have(_POWERSHELL)

View File

@ -17,10 +17,7 @@ except ModuleNotFoundError:
log = logging.getLogger(__name__)
_VALID_MODES = ("listen", "ptt")
_VALID_MODELS = (
"tiny", "base", "small", "medium", "large-v1", "large-v2", "large-v3",
"tiny.en", "base.en", "small.en", "medium.en",
)
_VALID_MODELS = ("tiny", "base", "small", "medium", "large-v2", "large-v3")
DEFAULT_CONFIG_PATHS = (
Path(os.environ.get("CLAUDEDO_CONFIG", "")) if os.environ.get("CLAUDEDO_CONFIG") else None,
@ -47,25 +44,12 @@ class Config:
samplerate: int
channels: int
silence_threshold: float
vad_silence_ms: int
vad_max_seconds: float
silence_duration: float
min_utterance: float
max_utterance: float
type_autosend: bool
wake_fuzzy_threshold: float
command_fuzzy_threshold: float
match_threshold: float
filler_words: tuple[str, ...]
auto_target: bool
print_heard: bool
context_multiline: bool
context_separator: str
cleanup_confirm: bool
sound_enabled: bool
sound_on_wake: bool
sound_on_accept: bool
sound_on_no_match: bool
sound_on_submit: bool
sound_volume: float
sound_files: dict[str, str]
source_path: Path | None = field(default=None)
@ -112,7 +96,7 @@ def load_config(explicit: str | os.PathLike | None = None) -> Config:
if mode not in _VALID_MODES:
raise ConfigError(f"[input].mode must be one of {_VALID_MODES}, got {mode!r}")
model = _require(raw, "stt", "model", (str,), "small.en")
model = _require(raw, "stt", "model", (str,), "small")
if model not in _VALID_MODELS:
log.warning("unknown stt model %r — passing through to faster-whisper", model)
@ -127,37 +111,17 @@ def load_config(explicit: str | os.PathLike | None = None) -> Config:
samplerate=int(_require(raw, "audio", "samplerate", (int,), 16000)),
channels=int(_require(raw, "audio", "channels", (int,), 1)),
silence_threshold=float(_require(raw, "audio", "silence_threshold", (int, float), 0.012)),
vad_silence_ms=int(_require(raw, "vad", "silence_ms", (int,), 700)),
vad_max_seconds=float(_require(raw, "vad", "max_seconds", (int, float), 15.0)),
silence_duration=float(_require(raw, "audio", "silence_duration", (int, float), 0.8)),
min_utterance=float(_require(raw, "audio", "min_utterance", (int, float), 0.3)),
max_utterance=float(_require(raw, "audio", "max_utterance", (int, float), 15.0)),
type_autosend=bool(_require(raw, "behavior", "type_autosend", (bool,), False)),
wake_fuzzy_threshold=float(_require(raw, "behavior", "wake_fuzzy_threshold", (int, float), 0.65)),
command_fuzzy_threshold=float(_require(raw, "behavior", "command_fuzzy_threshold",
(int, float), 0.8)),
match_threshold=float(_require(raw, "behavior", "match_threshold", (int, float), 0.8)),
filler_words=tuple(_require(raw, "behavior", "filler_words", (list,),
["select", "use", "choose"])),
auto_target=bool(_require(raw, "behavior", "auto_target", (bool,), False)),
print_heard=bool(_require(raw, "behavior", "print_heard", (bool,), False)),
context_multiline=bool(_require(raw, "behavior", "context_multiline", (bool,), True)),
context_separator=str(_require(raw, "behavior", "context_separator", (str,), "")),
cleanup_confirm=bool(_require(raw, "behavior", "cleanup_confirm", (bool,), False)),
sound_enabled=bool(_require(raw, "sound", "enabled", (bool,), True)),
sound_on_wake=bool(_require(raw, "sound", "on_wake", (bool,), False)),
sound_on_accept=bool(_require(raw, "sound", "on_accept", (bool,), True)),
sound_on_no_match=bool(_require(raw, "sound", "on_no_match", (bool,), True)),
sound_on_submit=bool(_require(raw, "sound", "on_submit", (bool,), True)),
sound_volume=float(_require(raw, "sound", "volume", (int, float), 0.5)),
sound_files=dict(_require(raw, "sound", "files", (dict,), {})),
source_path=path,
)
for label, val in (("wake_fuzzy_threshold", cfg.wake_fuzzy_threshold),
("command_fuzzy_threshold", cfg.command_fuzzy_threshold)):
if not 0.0 < val <= 1.0:
raise ConfigError(f"[behavior].{label} must be in (0, 1]")
if not 0.0 <= cfg.sound_volume <= 1.0:
raise ConfigError("[sound].volume must be in [0, 1]")
if cfg.vad_silence_ms <= 0 or cfg.vad_max_seconds <= 0:
raise ConfigError("[vad].silence_ms and max_seconds must be positive")
if not 0.0 < cfg.match_threshold <= 1.0:
raise ConfigError("[behavior].match_threshold must be in (0, 1]")
if cfg.samplerate <= 0 or cfg.channels <= 0:
raise ConfigError("[audio].samplerate and channels must be positive")
return cfg

View File

@ -1,65 +0,0 @@
"""colored, prefixed console output for the daemon's recognition/action feed.
every line is ``HH:MM:SS [PREFIX] message``. prefixes group the source: a session
name (e.g. ``[claude-libs]``) for anything injected into a tmux session, ``[SYSTEM]``
for daemon-control/state lines, and ``[VOICE]`` for STT/recognition lines. color is
opt-in via tty detection (or forced): green for successful injections, red for
drops/errors, dim for routine. falls back to plain text when stdout is not a tty.
"""
from __future__ import annotations
import sys
import time
RESET = "\033[0m"
_COLORS = {
"green": "\033[32m",
"red": "\033[31m",
"yellow": "\033[33m",
"cyan": "\033[36m",
"blue": "\033[34m",
"brightblue": "\033[94m",
"magenta": "\033[35m",
"dim": "\033[2m",
"bold": "\033[1m",
}
SYSTEM = "SYSTEM"
VOICE = "VOICE"
HELP = "HELP"
class Console:
"""formats and prints daemon log lines with timestamp, prefix, and color"""
def __init__(self, color: bool | None = None, stream=None, clock=None) -> None:
self.stream = stream if stream is not None else sys.stdout
self._clock = clock or time.localtime
if color is None:
color = hasattr(self.stream, "isatty") and self.stream.isatty()
self.color = bool(color)
def _stamp(self) -> str:
t = self._clock()
return f"{t.tm_hour:02d}:{t.tm_min:02d}:{t.tm_sec:02d}"
def _paint(self, text: str, color: str | None) -> str:
if not self.color or not color or color not in _COLORS:
return text
return f"{_COLORS[color]}{text}{RESET}"
def paint(self, text: str, color: str | None) -> str:
"""public colorizer for pre-coloring a fragment of a message (e.g. a command
word) before passing it to emit() with color=None"""
return self._paint(text, color)
def emit(self, prefix: str, message: str, color: str | None = None) -> None:
"""print one line: ``HH:MM:SS [prefix] message`` (message optionally colored)"""
line = f"{self._stamp()} {self._paint(f'[{prefix}]', 'dim')} {self._paint(message, color)}"
print(line, file=self.stream, flush=True)
def line(self, message: str, color: str | None = None) -> None:
"""print a bare continuation line (no timestamp/prefix) — for multi-row blocks
like the help menu, indented under a preceding header"""
print(self._paint(message, color), file=self.stream, flush=True)

View File

@ -1,108 +0,0 @@
"""load named context blocks from contexts.toml into a typed lookup.
contexts are user-edited reference blurbs (claude.md-style snippets) keyed by simple
spoken names. the ``context``/``prepare`` voice command injects a named blurb ahead of
a dictated instruction (read-before-send: never auto-submitted). mirrors config.py's
load/validate pattern; a missing file is an empty set, not an error.
"""
from __future__ import annotations
import logging
import os
import re
from dataclasses import dataclass, field
from pathlib import Path
try:
import tomllib as _toml
except ModuleNotFoundError:
import tomli as _toml
log = logging.getLogger(__name__)
_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9 _-]*$")
DEFAULT_CONTEXTS_PATHS = (
Path(os.environ.get("CLAUDEDO_CONTEXTS", "")) if os.environ.get("CLAUDEDO_CONTEXTS") else None,
Path.home() / ".config" / "claudedo" / "contexts.toml",
Path.cwd() / "contexts.toml",
)
class ContextsError(Exception):
"""raised on an unparseable or invalid contexts.toml"""
@dataclass
class Contexts:
"""validated named context blocks (name -> blurb), normalized for spoken lookup"""
blocks: dict[str, str] = field(default_factory=dict)
source_path: Path | None = field(default=None)
def __len__(self) -> int:
return len(self.blocks)
def names(self) -> list[str]:
"""the context names, sorted (for status / listing)"""
return sorted(self.blocks)
def get(self, name: str) -> str | None:
"""look up a blurb by its normalized (lowercased, despaced) name, or None.
names are matched on a lowercase, space/underscore/hyphen-stripped key so a
spoken "web hooks" resolves the configured ``webhooks``/``web-hooks`` block.
"""
return self.blocks.get(_key(name))
def _key(name: str) -> str:
return re.sub(r"[ _-]+", "", name.strip().lower())
def find_contexts_path(explicit: str | os.PathLike | None = None) -> Path | None:
"""resolve the contexts.toml path, or None if no file exists (not an error)"""
candidates: list[Path] = []
if explicit:
candidates.append(Path(explicit))
candidates.extend(p for p in DEFAULT_CONTEXTS_PATHS if p)
for path in candidates:
if path.is_file():
return path
return None
def load_contexts(explicit: str | os.PathLike | None = None) -> Contexts:
"""load contexts.toml from the first existing default path (or an explicit one).
a missing file yields an empty Contexts (the feature is opt-in). names must be
simple words (matchable) and values must be non-empty strings; a bad entry raises
ContextsError so the user sees a clear message rather than a silent drop.
"""
path = find_contexts_path(explicit)
if path is None:
return Contexts(blocks={}, source_path=None)
try:
with open(path, "rb") as fh:
raw = _toml.load(fh)
except _toml.TOMLDecodeError as exc:
raise ContextsError(f"could not parse {path}: {exc}") from exc
table = raw.get("contexts", {})
if not isinstance(table, dict):
raise ContextsError("[contexts] must be a table of name = \"blurb\" entries")
blocks: dict[str, str] = {}
for name, value in table.items():
if not isinstance(name, str) or not _NAME_RE.match(name.lower()):
raise ContextsError(f"context name {name!r} must be simple words (a-z, 0-9, space/-/_)")
if not isinstance(value, str) or not value.strip():
raise ContextsError(f"context {name!r} must be a non-empty string")
key = _key(name)
if key in blocks:
raise ContextsError(f"context {name!r} collides with another name on the spoken key {key!r}")
blocks[key] = value.strip()
return Contexts(blocks=blocks, source_path=path)

View File

@ -16,11 +16,8 @@ import sys
import time
from pathlib import Path
from . import __version__, audio, grammar, inject, keys, target
from .config import Config, ConfigError, load_config
from .console import HELP, SYSTEM, VOICE, Console
from .contexts import Contexts, ContextsError, load_contexts
from .sound import Earcons
from . import audio, grammar, inject, target
from .config import Config
from .stt import Transcriber
log = logging.getLogger(__name__)
@ -78,16 +75,6 @@ def stop_running() -> bool:
return True
def reload_running() -> bool:
"""signal a running daemon (SIGHUP) to reload config + contexts. returns whether
one was found. no-op on platforms without SIGHUP."""
pid = read_pid()
if pid is None or not hasattr(signal, "SIGHUP"):
return False
os.kill(pid, signal.SIGHUP)
return True
class _PTTKey:
"""desk-only push-to-talk: 'held' while the configured key is down in the
daemon's own terminal. there is deliberately NO global hotkey — a system-wide
@ -124,36 +111,18 @@ class Daemon:
self.config = config
self.mode = config.mode
self._stop = False
self._reload_pending = False
self._cleanup_pending = False
self._transcriber: Transcriber | None = None
self._device: int | None = None
self._ptt = _PTTKey()
self._pending: dict[str, int] = {}
self._console = Console()
self._contexts = Contexts()
self._earcons = Earcons(config)
self._last_stt_ms = 0.0
self._last_audio_s = 0.0
def _install_signals(self) -> None:
signal.signal(signal.SIGTERM, self._on_signal)
signal.signal(signal.SIGINT, self._on_signal)
if hasattr(signal, "SIGHUP"):
signal.signal(signal.SIGHUP, self._on_reload_signal)
def _on_signal(self, _signum, _frame) -> None:
log.info("stop requested")
self._stop = True
def _on_reload_signal(self, _signum, _frame) -> None:
"""SIGHUP from `claudedo reload` -> reload both config files on the next tick.
the actual reload runs in the loop (not the handler) so it never races a
capture/transcribe; the handler only sets the flag.
"""
self._reload_pending = True
def stopped(self) -> bool:
return self._stop
@ -164,23 +133,12 @@ class Daemon:
model=cfg.stt_model, language=cfg.stt_language,
device=cfg.stt_compute if cfg.stt_compute in ("cpu", "cuda") else "auto",
compute_type="auto",
initial_prompt=grammar.initial_prompt(cfg.wake_phrases),
)
self._load_contexts()
if audio.warm_up(cfg.samplerate, cfg.channels, self._device):
log.info("mic warmed up (source live)")
else:
log.warning("mic warm-up saw only silence — check mic permission / RDPSource")
def _load_contexts(self) -> None:
"""(re)load contexts.toml, leaving the loaded model untouched. a parse error is
logged and leaves the previous set in place rather than crashing the loop."""
try:
self._contexts = load_contexts()
except ContextsError as exc:
log.warning("contexts.toml invalid, keeping previous set: %s", exc)
self._console.emit(SYSTEM, f"contexts.toml error (kept previous): {exc}", "red")
def _capture(self):
cfg = self.config
if self.mode == "ptt":
@ -190,289 +148,71 @@ class Daemon:
return audio.record_while(
cfg.samplerate, cfg.channels, self._device,
held=lambda: not self._ptt.wait_press(self.stopped),
max_utterance=cfg.vad_max_seconds, min_utterance=cfg.min_utterance,
max_utterance=cfg.max_utterance, min_utterance=cfg.min_utterance,
)
return audio.record_until_silence(
cfg.samplerate, cfg.channels, self._device,
silence_threshold=cfg.silence_threshold, silence_duration=cfg.vad_silence_ms / 1000.0,
min_utterance=cfg.min_utterance, max_utterance=cfg.vad_max_seconds,
silence_threshold=cfg.silence_threshold, silence_duration=cfg.silence_duration,
min_utterance=cfg.min_utterance, max_utterance=cfg.max_utterance,
stop=self.stopped,
)
def _handle(self, transcript: str) -> None:
cfg = self.config
require_wake = self.mode == "listen"
parsed = grammar.parse(transcript, cfg.wake_phrases, cfg.wake_fuzzy_threshold,
cfg.command_fuzzy_threshold, require_wake, filler=cfg.filler_words)
parsed = grammar.parse(transcript, cfg.wake_phrases, cfg.match_threshold, require_wake,
filler=cfg.filler_words)
if parsed is None or parsed.action is None:
if parsed is not None:
self._earcons.play("wake")
self._console.emit(VOICE, f'heard "{transcript}" -> no command matched {self._timing()}',
"yellow")
self._earcons.play("no_match")
self._emit(f'heard: "{transcript}" -> no command matched')
return
action = parsed.action
self._earcons.play("wake")
# a command was recognized — echo what we heard (green) before acting. note the
# matched wake phrase (magenta) when the transcript didn't literally contain it
# (so a loose match like "okay clouds" -> "okay claude" is visible).
head = self._console.paint(f'heard "{transcript}" -> {self._describe(action)}', "green")
note = ""
if parsed.wake and parsed.wake.replace(" ", "") not in transcript.lower().replace(" ", ""):
note = (self._console.paint(" (wake: ", "green")
+ self._console.paint(parsed.wake, "magenta")
+ self._console.paint(")", "green"))
tail = self._console.paint(f" {self._timing()}", "green")
self._console.emit(VOICE, f"{head}{note}{tail}")
def blue(s):
return self._console.paint(s, "brightblue")
if action.name == "mode":
new_mode = str(action.arg)
if new_mode != self.mode:
self.mode = new_mode
self._console.emit(SYSTEM, f"{blue('mode')} -> {new_mode}")
self._emit(f"mode -> {new_mode}")
self._refresh_state()
return
if action.name == "set":
session = target.set_target(str(action.arg))
self._pending.pop(session, None)
self._console.emit(SYSTEM, f"{blue('set sticky')} -> {session}")
self._emit(f"set sticky -> {session}")
self._refresh_state()
return
if action.name == "unset":
target.unset_target()
self._console.emit(SYSTEM, f"{blue('unset')} (cleared)")
self._emit("unset (cleared)")
self._refresh_state()
return
if action.name == "list":
sessions = target.list_sessions()
self._console.emit(SYSTEM, f"{blue('list')} -> "
+ (", ".join(sessions) if sessions else "(none running)"))
self._emit("list -> " + (", ".join(sessions) if sessions else "(none running)"))
return
if action.name == "commands":
self._console.emit(HELP, "voice commands:")
for usage, desc in grammar.command_menu():
self._console.line(f" {self._console.paint(f'{usage:<26}', 'brightblue')} {desc}")
return
if action.name == "customs":
names = self._contexts.names()
listed = ", ".join(names) if names else "(none — edit contexts.toml)"
self._console.emit(SYSTEM, f"contexts: {listed}")
return
if action.name == "version":
self._console.emit(SYSTEM, f"claudedo {__version__}")
return
if action.name == "debug":
self._console.emit(VOICE, f'debug: "{action.arg}"', "yellow")
return
if action.name == "reload":
self._do_reload(str(action.arg))
return
if action.name == "system":
self._do_system(action.arg)
return
if action.name == "context":
name = str(action.arg[0])
if self._contexts.get(name) is None:
self._console.emit(VOICE, f"no context named '{name}' -> did nothing", "red")
self._earcons.play("no_match")
return
session, reason = target.resolve(parsed.one_shot, auto_target=cfg.auto_target)
session, reason = target.resolve(parsed.one_shot)
if session is None:
self._console.emit(VOICE, f'heard "{transcript}" -> {reason} -> '
f'{self._describe(action)} did nothing', "red")
self._earcons.play("no_match")
self._emit(f'heard: "{transcript}" -> {reason} -> matched {self._describe(action)} '
f'-> did nothing')
return
if action.name == "context":
self._inject_context(session, action)
prefix = f'heard: "{transcript}" -> {reason} -> matched {self._describe(action)}'
if action.name == "type" and not cfg.type_autosend:
inject.send_literal(session, str(action.arg))
self._emit(f"{prefix} -> injected literal {str(action.arg)!r} -> {session}")
return
self._inject(session, action)
def _inject(self, session: str, action) -> None:
"""run a resolved command against `session`, tracking the uncommitted-input
buffer so backspace/erase delete only back to the last submit boundary.
the 'heard ...' echo is already printed by _handle and the [session] prefix
names the target, so these lines just report the keystrokes injected. the
earcon fires here (a real injection): submit chimes the submit tone, every
other injected command the accept tone.
"""
name = action.name
self._earcons.play("submit" if name == "submit" else "accept")
if name == "type":
text = str(action.arg)
inject.send_literal(session, text)
self._pending[session] = self._pending.get(session, 0) + len(text)
if self.config.type_autosend:
inject.send_named(session, keys.SUBMIT)
self._pending[session] = 0
self._console.emit(session, f"typed {text!r}"
+ (" + send" if self.config.type_autosend else ""), "green")
return
if name == "space":
n = int(action.arg)
inject.perform(session, action)
self._pending[session] = self._pending.get(session, 0) + n
self._console.emit(session, f"space x{n}", "green")
return
if name == "backspace":
n = int(action.arg)
if n:
inject.perform(session, action)
self._pending[session] = max(0, self._pending.get(session, 0) - n)
self._console.emit(session, f"backspace x{n}", "green")
return
if name == "erase":
n = self._pending.get(session, 0)
if n:
inject.perform(session, grammar.Action("erase", n))
self._pending[session] = 0
self._console.emit(session, f"erase x{n} (to last boundary)", "green")
return
inject.perform(session, action)
if name == "submit":
self._pending[session] = 0
self._console.emit(session, f"injected {self._describe(action)}", "green")
def _inject_context(self, session: str, action) -> None:
"""inject a named context blurb ahead of the dictated instruction, then WAIT.
read-before-send: never auto-submits the user says ``send`` separately, and
claude's own permission prompt is the backstop for anything consequential.
routes through inject.send_literal (the same path as ``type``) and tracks the
uncommitted-input buffer so backspace/erase still bound to the last boundary.
assembly (config behavior.context_multiline): true -> blurb, a soft Shift+Enter
newline, then the instruction; false -> blurb + context_separator + instruction
flattened onto one line. a bare ``context <name>`` (no dictation) injects just
the blurb. the soft newline does not count toward the editable-char buffer.
"""
cfg = self.config
name, dictation = str(action.arg[0]), str(action.arg[1])
blurb = self._contexts.get(name) or ""
self._earcons.play("accept")
inject.send_literal(session, blurb)
chars = len(blurb)
if dictation:
if cfg.context_multiline:
inject.send_named(session, keys.NEWLINE)
else:
inject.send_literal(session, cfg.context_separator)
chars += len(cfg.context_separator)
inject.send_literal(session, dictation)
chars += len(dictation)
self._pending[session] = self._pending.get(session, 0) + chars
shape = "blurb" if not dictation else "blurb + dictation"
self._console.emit(session, f"context '{name}' -> {shape} (waiting for send)", "green")
def _do_reload(self, scope: str) -> None:
"""re-read config.toml and/or contexts.toml live without reinitializing the
loaded whisper model (the slow part). scope: all|config|contexts."""
did = []
if scope in ("all", "config"):
try:
new_cfg = load_config()
self._apply_config(new_cfg)
did.append("config")
except ConfigError as exc:
self._console.emit(SYSTEM, f"config reload failed (kept previous): {exc}", "red")
if scope in ("all", "contexts"):
self._load_contexts()
did.append("contexts")
what = " + ".join(did) if did else "nothing"
blue = self._console.paint("reloaded", "brightblue")
self._console.emit(SYSTEM, f"{blue} {what} ({len(self._contexts)} contexts)")
def _apply_config(self, new_cfg: Config) -> None:
"""swap in a reloaded config, preserving the runtime mode the user may have
toggled by voice and leaving the already-loaded transcriber untouched."""
new_cfg.mode = self.mode
self.config = new_cfg
self._earcons.update(new_cfg)
def _do_system(self, arg) -> None:
"""daemon-control namespace (never injects to claude): status / reload."""
if isinstance(arg, tuple) and arg and arg[0] == "reload":
self._do_reload(str(arg[1]))
return
if isinstance(arg, tuple) and arg and arg[0] == "unknown":
self._console.emit(SYSTEM, f"unknown system command '{arg[1]}'", "red")
return
if arg == "status":
cfg = self.config
sticky = target.read_active() or "(none)"
blue = self._console.paint("status", "brightblue")
self._console.emit(SYSTEM, f"{blue}: mode {self.mode}, sticky {sticky}, "
f"model {cfg.stt_model}, {len(self._contexts)} contexts")
return
if arg == "cleanup":
self._do_cleanup()
return
if arg == "confirm":
blue = self._console.paint("cleanup", "brightblue")
if self._cleanup_pending:
self._run_cleanup(blue)
else:
self._console.emit(SYSTEM, f"{blue}: nothing pending to confirm")
return
self._console.emit(SYSTEM, f"unknown system command {arg!r}", "red")
def _do_cleanup(self) -> None:
"""kill detached claude-* sessions (never attached), report killed + kept.
detached-only is the safety model: a misheard voice cleanup cannot nuke the
active (attached) session. with behavior.cleanup_confirm the daemon announces
the detached set and waits for a following ``confirm`` instead of killing now.
"""
blue = self._console.paint("cleanup", "brightblue")
if self.config.cleanup_confirm:
pending = [n for n, attached in target._claude_sessions() if not attached]
if not pending:
self._console.emit(SYSTEM, f"{blue}: nothing to clean (no detached sessions)")
return
self._cleanup_pending = True
self._console.emit(SYSTEM, f"{blue}: would kill {', '.join(sorted(pending))} "
f"— say 'confirm' to proceed")
return
self._run_cleanup(blue)
def _run_cleanup(self, blue: str) -> None:
killed, kept = target.cleanup_detached()
self._cleanup_pending = False
if not killed:
self._console.emit(SYSTEM, f"{blue}: nothing to clean (no detached sessions)")
return
msg = f"{blue}: killed {', '.join(killed)}"
if kept:
msg += f"; kept {', '.join(kept)} (attached)"
self._console.emit(SYSTEM, msg)
def _timing(self) -> str:
"""compact STT latency suffix for heard lines (transcribe ms on audio secs)"""
return f"({self._last_stt_ms:.0f}ms/{self._last_audio_s:.1f}s)"
self._emit(f"{prefix} -> injected {self._describe(action)} -> {session}")
@staticmethod
def _describe(action) -> str:
if action.name == "context":
name, dictation = action.arg
tail = " + dictation" if dictation else ""
return f"CONTEXT('{name}'{tail})"
if action.name == "system":
arg = action.arg
if isinstance(arg, tuple):
return f"SYSTEM({arg[0]} {arg[1]})"
return f"SYSTEM({arg})"
if action.arg is None:
return action.name.upper()
return f"{action.name.upper()}({action.arg})"
@staticmethod
def _emit(line: str) -> None:
"""print a recognition/action line to the watched terminal"""
print(line, flush=True)
def _has_wake(self, transcript: str) -> bool:
"""true if the utterance starts with a wake phrase (listen-mode gate).
@ -480,18 +220,20 @@ class Daemon:
invariant: non-command speech is discarded, never recorded.
"""
cfg = self.config
return grammar.strip_wake(transcript, cfg.wake_phrases,
cfg.wake_fuzzy_threshold, True) is not None
return grammar.strip_wake(transcript, cfg.wake_phrases, cfg.match_threshold, True) is not None
def _print_startup(self) -> None:
cfg = self.config
dev = cfg.stt_device if cfg.stt_device != "auto" else "default"
target_now = target.read_active() or "(none — run cc / set <name>)"
self._console.emit(SYSTEM, f"claudedo {self.mode} mode — Ctrl-C to stop", "bold")
self._console.emit(SYSTEM, f"model {cfg.stt_model} ({cfg.stt_language}) · mic {dev} · "
f"target {target_now} · {len(self._contexts)} contexts")
wakes = ", ".join(self._console.paint(p, "magenta") for p in cfg.wake_phrases)
self._console.emit(SYSTEM, f"wake: {wakes}")
target_now = target.read_active() or "(none — run cc to attach)"
self._emit("── claudedo ─────────────────────────────────")
self._emit(f" model: {cfg.stt_model} ({cfg.stt_language})")
self._emit(f" mic: {dev}")
self._emit(f" mode: {self.mode}")
self._emit(f" target: {target_now}")
self._emit(f" wake: {', '.join(cfg.wake_phrases)}")
self._emit(" Ctrl-C to stop")
self._emit("─────────────────────────────────────────────")
def _refresh_state(self) -> None:
write_state(os.getpid(), self.mode, target.read_active())
@ -506,25 +248,16 @@ class Daemon:
self._refresh_state()
self._print_startup()
while not self._stop:
if self._reload_pending:
self._reload_pending = False
self._do_reload("all")
audio_chunk = self._capture()
if self._stop:
break
if audio_chunk is None:
continue
t0 = time.monotonic()
transcript = self._transcriber.transcribe(audio_chunk, self.config.samplerate)
self._last_stt_ms = (time.monotonic() - t0) * 1000.0
self._last_audio_s = audio_chunk.size / self.config.samplerate
if not transcript:
continue
if self.mode == "listen" and not self._has_wake(transcript):
if self.config.print_heard:
self._console.emit(VOICE, f'heard (dropped) "{transcript}" {self._timing()}', "red")
else:
self._console.emit(VOICE, "dropped: non-wake speech (not recorded)", "dim")
self._emit("dropped: non-wake speech (not recorded)")
continue
self._handle(transcript)
finally:

View File

@ -27,50 +27,11 @@ _NUMBER_WORDS = {
_INDEX_WORDS = {"1": 1, "2": 2, "3": 3, "4": 4}
_COUNT_WORDS = {
"five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
"eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14, "fifteen": 15,
"sixteen": 16, "seventeen": 17, "eighteen": 18, "nineteen": 19, "twenty": 20,
}
_YES_VERBS = ("yes", "yeah", "yep", "yup")
_NO_VERBS = ("no", "nope", "nah")
_APPROVE_VERBS = ("approve", "allow")
_DENY_VERBS = ("deny", "reject")
_SUBMIT_VERBS = ("send", "enter", "submit")
_CANCEL_VERBS = ("cancel", "escape")
_TYPE_VERBS = ("type", "dictate", "write")
_BACKSPACE_VERBS = ("backspace", "delete")
_SPACE_VERBS = ("space", "spacebar")
_ADD_VERBS = ("add", "insert")
_ERASE_VERBS = ("erase", "clear", "wipe")
_DEBUG_VERBS = ("debug", "echo")
_MODE_VERBS = ("mode",)
_STICKY_VERBS = ("set", "sticky", "switch")
_ONESHOT_VERBS = ("target",)
_UNSET_VERBS = ("unset", "unsticky")
_LIST_VERBS = ("list", "sessions")
_COMMANDS_VERBS = ("commands", "help", "menu")
_CUSTOMS_VERBS = ("customs", "custom")
_VERSION_VERBS = ("version",)
_SELECT_VERBS = ("select", "option", "choose", "number")
_CONTEXT_VERBS = ("context", "prepare")
_RELOAD_VERBS = ("reload",)
_SYSTEM_VERBS = ("system",)
_RELOAD_SCOPES = ("config", "contexts")
_CLEANUP_VERBS = ("detached", "detach", "cleanup")
_CONFIRM_VERBS = ("confirm",)
# every command/synonym word, for biasing the STT toward the vocabulary we expect.
_COMMAND_WORDS = (
_YES_VERBS + _NO_VERBS + _APPROVE_VERBS + _DENY_VERBS + _SUBMIT_VERBS
+ _CANCEL_VERBS + _TYPE_VERBS + _BACKSPACE_VERBS + _SPACE_VERBS + _ADD_VERBS
+ _ERASE_VERBS + _DEBUG_VERBS + _MODE_VERBS + _STICKY_VERBS + _ONESHOT_VERBS + _UNSET_VERBS
+ _LIST_VERBS + _COMMANDS_VERBS + _CUSTOMS_VERBS + _VERSION_VERBS
+ _CONTEXT_VERBS + _RELOAD_VERBS + _SYSTEM_VERBS + _RELOAD_SCOPES + _CLEANUP_VERBS
+ _CONFIRM_VERBS + _SELECT_VERBS + ("ptt", "listen")
+ ("one", "two", "three", "four")
)
DEFAULT_FILLER = ("select", "use", "choose")
@ -78,13 +39,9 @@ DEFAULT_FILLER = ("select", "use", "choose")
class Action:
"""a matched command: a name plus an optional argument.
names: yes, no, select, approve, deny, submit, type, space, backspace, erase,
cancel, mode, set, unset, list, context, reload, system. arg carries the select
index (int), the literal text for ``type``, the count for ``space``/``backspace``
(int), the mode for ``mode``, the session short-name for ``set``, a
``(name, dictation)`` tuple for ``context``, the scope string for ``reload``
(``"all"``/``"config"``/``"contexts"``), or the system control for ``system``
(``"status"`` or a ``("reload", scope)`` tuple).
names: yes, no, select, approve, deny, submit, type, cancel, mode, set, unset,
list. arg carries the select index (int), the literal text for ``type``, the mode
for ``mode``, or the session short-name for ``set``.
"""
name: str
@ -97,14 +54,11 @@ class ParsedCommand:
one_shot is the session short-name from a leading ``target <name>`` (this command
only; does not change the sticky default), or None. action is the command to run,
or None if nothing matched after the wake phrase / one-shot / filler. wake is the
configured wake phrase that matched (e.g. "okay claude" for a heard "okay clouds"),
or None.
or None if nothing matched after the wake phrase / one-shot / filler.
"""
one_shot: str | None
action: Action | None
wake: str | None = None
def normalize(text: str) -> str:
@ -118,57 +72,6 @@ def normalize(text: str) -> str:
return " ".join(tokens)
def vocabulary(wake_phrases: list[str]) -> list[str]:
"""the wake + command vocabulary, deduped in first-seen order.
single source for biasing the STT: the same wake phrases the matcher uses plus
every command/synonym word in _COMMAND_WORDS. no separate hardcoded copy.
"""
seen: dict[str, None] = {}
for word in list(wake_phrases) + list(_COMMAND_WORDS):
key = word.strip()
if key and key not in seen:
seen[key] = None
return list(seen)
def initial_prompt(wake_phrases: list[str]) -> str:
"""a comma-joined vocabulary string to pass faster-whisper as initial_prompt,
conditioning transcription toward the words we expect (esp. the coined wake)"""
return ", ".join(vocabulary(wake_phrases))
def command_menu() -> list[tuple[str, str]]:
"""the voice command menu as (usage, description) rows, for the `commands` cmd.
a small curated list keyed off the verb groups the speakable command surface,
NOT the cc shell kit.
"""
return [
("yes / no", "answer a yes/no prompt"),
("one..four", "pick numbered option 1-4"),
("approve / deny", "allow / deny a permission prompt"),
("send", "submit (Enter)"),
("cancel", "back out (Escape)"),
("type <text>", "insert literal text (no submit)"),
("space [n] / add a space", "insert n spaces"),
("backspace [n]", "delete n chars (to last submit)"),
("erase", "wipe the current input"),
("debug <text>", "echo to console (no inject)"),
("set <name>", "sticky target -> claude-<name>"),
("target <name> <cmd>", "one-shot to another session"),
("unset / list", "clear sticky / list sessions"),
("mode ptt|listen", "switch input mode"),
("context <name> <text>", "inject a contexts.toml blurb + dictation (no submit)"),
("reload", "re-read config.toml + contexts.toml live"),
("system status", "print mode/target/model/contexts to the console"),
("system reload [config|contexts]", "reload one or both config files"),
("cleanup / detached", "kill detached claude-* sessions (never attached)"),
("commands / customs", "this menu / list loaded contexts"),
("version", "print the claudedo version"),
]
def _ratio(a: str, b: str) -> float:
return SequenceMatcher(None, a, b).ratio()
@ -179,15 +82,13 @@ def _wake_variants(phrase: str) -> set[str]:
return {norm, norm.replace(" ", "")}
def strip_wake_match(transcript: str, wake_phrases: list[str], threshold: float,
require_wake: bool) -> tuple[str | None, str | None]:
"""return (command remainder, matched wake phrase).
def strip_wake(transcript: str, wake_phrases: list[str], threshold: float,
require_wake: bool) -> str | None:
"""return the command remainder after the wake phrase.
if ``require_wake`` (listen mode) and no wake phrase is found at the start, the
remainder is None so the daemon discards the utterance. if not required (ptt
mode), a leading wake phrase is stripped when present but its absence is fine.
the matched phrase is the configured wake phrase that best matched (e.g. "okay
claude" for a heard "okay clouds"), or None when none matched.
if ``require_wake`` (listen mode) and no wake phrase is found at the start,
return None so the daemon discards the utterance. if not required (ptt mode),
a leading wake phrase is stripped when present but its absence is fine.
matches leniently on a despaced prefix (whisper splits/joins the coined word
inconsistently) but always slices the remainder on a WORD boundary of the
@ -195,11 +96,10 @@ def strip_wake_match(transcript: str, wake_phrases: list[str], threshold: float,
"""
norm = normalize(transcript)
if not norm:
return (None, None) if require_wake else ("", None)
return None if require_wake else ""
words = norm.split(" ")
best_remainder: str | None = None
best_phrase: str | None = None
best_score = 0.0
for phrase in wake_phrases:
variants = _wake_variants(phrase)
@ -213,75 +113,16 @@ def strip_wake_match(transcript: str, wake_phrases: list[str], threshold: float,
if score >= threshold and score > best_score:
best_score = score
best_remainder = " ".join(words[take:]).strip()
best_phrase = phrase
if best_remainder is not None:
return best_remainder, best_phrase
return (None, None) if require_wake else (norm, None)
def strip_wake(transcript: str, wake_phrases: list[str], threshold: float,
require_wake: bool) -> str | None:
"""return the command remainder after the wake phrase (None if no wake in listen
mode). thin wrapper over strip_wake_match for callers that don't need the phrase"""
return strip_wake_match(transcript, wake_phrases, threshold, require_wake)[0]
return best_remainder
return None if require_wake else norm
def _fuzzy_in(token: str, options: tuple[str, ...], threshold: float) -> bool:
return any(_ratio(token, opt) >= threshold for opt in options)
def _leading_count(rest: list[str], default: int = 1) -> int:
"""read a count from the first token (digit or number word), else the default.
'backspace 3' -> 3, 'backspace ten' -> 10 (normalize maps small words to digits;
larger words come from _COUNT_WORDS), 'backspace' -> default.
"""
if not rest:
return default
tok = rest[0]
if tok.isdigit():
return max(0, int(tok))
if tok in _COUNT_WORDS:
return _COUNT_WORDS[tok]
return default
def _match_reload(rest: list[str], threshold: float, bare_default: str) -> Action | None:
"""map the tokens after a ``reload`` verb to a reload Action.
bare reload -> the caller's default scope ("all" for the bare command, the
``("reload", scope)`` tuple for ``system reload``). a trailing ``config``/
``contexts`` narrows the scope; an unrecognized scope falls back to the default.
"""
scope = bare_default
if rest and _fuzzy_in(rest[0], ("config", "configuration"), threshold):
scope = "config"
elif rest and _fuzzy_in(rest[0], ("contexts", "context"), threshold):
scope = "contexts"
return Action("reload", scope)
def _match_system(rest: list[str], threshold: float) -> Action | None:
"""map the tokens after the reserved ``system`` word to a daemon-control Action.
the ``system`` namespace never injects into claude. v0.2.0 scope: ``status`` and
``reload [config|contexts]``. unknown controls return a ``system`` Action with an
``("unknown", word)`` arg so the daemon can report it rather than silently drop.
"""
if not rest:
return Action("system", "status")
head = rest[0]
if _fuzzy_in(head, _RELOAD_VERBS, threshold):
inner = _match_reload(rest[1:], threshold, bare_default="all")
return Action("system", ("reload", inner.arg))
if _fuzzy_in(head, ("status", "state"), threshold):
return Action("system", "status")
if _fuzzy_in(head, _CLEANUP_VERBS, threshold):
return Action("system", "cleanup")
return Action("system", ("unknown", head))
def match_command(remainder: str, threshold: float) -> Action | None:
"""map a normalized command remainder to an Action, or None if unrecognized.
@ -296,58 +137,30 @@ def match_command(remainder: str, threshold: float) -> Action | None:
head = tokens[0]
rest = tokens[1:]
if _fuzzy_in(head, _SYSTEM_VERBS, threshold):
return _match_system(rest, threshold)
if _fuzzy_in(head, _CLEANUP_VERBS, threshold):
return Action("system", "cleanup")
if _fuzzy_in(head, _CONFIRM_VERBS, threshold):
return Action("system", "confirm")
if _fuzzy_in(head, _RELOAD_VERBS, threshold):
return _match_reload(rest, threshold, bare_default="all")
if _fuzzy_in(head, _CONTEXT_VERBS, threshold) and rest:
name = rest[0]
dictation = " ".join(rest[1:]).strip()
return Action("context", (name, dictation))
if head in _INDEX_WORDS:
return Action("select", _INDEX_WORDS[head])
if _fuzzy_in(head, _YES_VERBS, threshold):
if _fuzzy_in(head, ("yes", "yeah", "yep", "yup"), threshold):
return Action("yes")
if _fuzzy_in(head, _NO_VERBS, threshold):
if _fuzzy_in(head, ("no", "nope", "nah"), threshold):
return Action("no")
if _fuzzy_in(head, _APPROVE_VERBS, threshold):
if _fuzzy_in(head, ("approve", "allow"), threshold):
return Action("approve")
if _fuzzy_in(head, _DENY_VERBS, threshold):
if _fuzzy_in(head, ("deny", "reject"), threshold):
return Action("deny")
if _fuzzy_in(head, _SUBMIT_VERBS, threshold):
if _fuzzy_in(head, ("send", "enter", "submit"), threshold):
return Action("submit")
if _fuzzy_in(head, _CANCEL_VERBS, threshold):
if _fuzzy_in(head, ("cancel", "escape"), threshold):
return Action("cancel")
if _fuzzy_in(head, _SELECT_VERBS, threshold) and rest and rest[0] in _INDEX_WORDS:
return Action("select", _INDEX_WORDS[rest[0]])
if _fuzzy_in(head, _TYPE_VERBS, threshold):
if _fuzzy_in(head, ("type", "dictate", "write"), threshold):
text = " ".join(rest).strip()
return Action("type", text) if text else None
if _fuzzy_in(head, _BACKSPACE_VERBS, threshold):
return Action("backspace", _leading_count(rest, default=1))
if _fuzzy_in(head, _SPACE_VERBS, threshold):
return Action("space", _leading_count(rest, default=1))
if _fuzzy_in(head, _ADD_VERBS, threshold) and rest:
tail = [t for t in rest if t not in ("a", "an")]
if any(_fuzzy_in(t, ("space", "spaces"), threshold) for t in tail):
count = next((int(t) for t in tail if t.isdigit()),
next((_COUNT_WORDS[t] for t in tail if t in _COUNT_WORDS), 1))
return Action("space", count)
if _fuzzy_in(head, _ERASE_VERBS, threshold):
return Action("erase")
if _fuzzy_in(head, _DEBUG_VERBS, threshold):
return Action("debug", " ".join(rest).strip())
if _fuzzy_in(head, _MODE_VERBS, threshold) and rest:
if _fuzzy_in(head, ("mode",), threshold) and rest:
if _fuzzy_in(rest[0], ("ptt",), threshold) or "push" in rest[0]:
return Action("mode", "ptt")
if _fuzzy_in(rest[0], ("listen",), threshold):
@ -359,14 +172,8 @@ def match_command(remainder: str, threshold: float) -> Action | None:
return Action("set", name) if name else None
if _fuzzy_in(head, _UNSET_VERBS, threshold) and not rest:
return Action("unset")
if _fuzzy_in(head, _CUSTOMS_VERBS, threshold):
return Action("customs")
if _fuzzy_in(head, _COMMANDS_VERBS, threshold):
return Action("commands")
if _fuzzy_in(head, _LIST_VERBS, threshold):
return Action("list")
if _fuzzy_in(head, _VERSION_VERBS, threshold):
return Action("version")
return None
@ -384,28 +191,24 @@ def _strip_filler(tokens: list[str], filler: tuple[str, ...], threshold: float)
return tokens
def parse(transcript: str, wake_phrases: list[str], wake_threshold: float,
command_threshold: float, require_wake: bool,
filler: tuple[str, ...] = DEFAULT_FILLER) -> ParsedCommand | None:
def parse(transcript: str, wake_phrases: list[str], threshold: float,
require_wake: bool, filler: tuple[str, ...] = DEFAULT_FILLER) -> ParsedCommand | None:
"""full parse: wake gate -> optional one-shot target -> filler -> command.
wake_threshold gates the wake phrase (lenient a false wake is cheap, it just
finds no command); command_threshold gates the command words (stricter a false
command fires the wrong action). returns a ParsedCommand (one_shot, action), or
None if the wake gate dropped the utterance (listen mode, no wake phrase). a
ParsedCommand with action=None means a wake phrase was present but no command
matched.
returns a ParsedCommand (one_shot, action), or None if the wake gate dropped the
utterance (listen mode, no wake phrase). a ParsedCommand with action=None means a
wake phrase was present but no command matched.
"""
remainder, wake = strip_wake_match(transcript, wake_phrases, wake_threshold, require_wake)
remainder = strip_wake(transcript, wake_phrases, threshold, require_wake)
if remainder is None:
return None
tokens = remainder.split(" ") if remainder else []
one_shot: str | None = None
if tokens and _fuzzy_in(tokens[0], _ONESHOT_VERBS, command_threshold) and len(tokens) >= 2:
if tokens and _fuzzy_in(tokens[0], _ONESHOT_VERBS, threshold) and len(tokens) >= 2:
one_shot = tokens[1]
tokens = tokens[2:]
tokens = _strip_filler(tokens, filler, command_threshold)
action = match_command(" ".join(tokens), command_threshold)
return ParsedCommand(one_shot=one_shot, action=action, wake=wake)
tokens = _strip_filler(tokens, filler, threshold)
action = match_command(" ".join(tokens), threshold)
return ParsedCommand(one_shot=one_shot, action=action)

View File

@ -45,18 +45,11 @@ class OutputHandler(ABC):
def send_literal(self, session: str, text: str) -> None:
"""emit literal text into the input box without submitting (``type``)"""
def send_repeat(self, session: str, token: str, count: int) -> None:
"""emit a named key `count` times (e.g. BSpace x n). default impl loops."""
if count <= 0:
return
self.send_named(session, [token] * count)
def perform(self, session: str, action) -> bool:
"""resolve a grammar.Action to keystrokes and emit them. returns acted?.
``switch``/``set``/``mode`` etc. are handled by the daemon (they change daemon
state, not the claude session), so they are ignored here. ``erase`` arrives
with action.arg already set to the count the daemon wants backspaced.
``switch`` and ``mode`` are handled by the daemon (they change daemon state,
not the claude session), so they are ignored here.
"""
name = action.name
if name == "yes":
@ -79,10 +72,6 @@ class OutputHandler(ABC):
self.send_named(session, seq)
elif name == "type":
self.send_literal(session, str(action.arg))
elif name == "space":
self.send_literal(session, " " * int(action.arg))
elif name in ("backspace", "erase"):
self.send_repeat(session, keys.BACKSPACE[0], int(action.arg))
else:
return False
return True

View File

@ -37,19 +37,6 @@ DENY = ["3"]
SUBMIT = ["Enter"]
CANCEL = ["Escape"]
# NEWLINE is a soft newline inside the input box that does NOT submit — Shift+Enter,
# which tmux names ``S-Enter`` (requires the extended-keys / xterm extkeys tmux
# settings install.sh appends). used to separate a context blurb from the dictated
# instruction in multiline assembly; if it proves flaky the daemon flattens to one
# line with a separator instead (behavior.context_multiline = false).
NEWLINE = ["S-Enter"]
# BACKSPACE deletes one char left; SPACE inserts one literal space. both are emitted
# repeatedly for `backspace <n>` / `space <n>` and for `erase` (n = the daemon's
# tracked uncommitted-input count). BSpace is tmux's name for the backspace key.
BACKSPACE = ["BSpace"]
SPACE = [" "]
SELECT_BY_INDEX = {
1: SELECT_1,
2: SELECT_2,

View File

@ -1,91 +0,0 @@
"""earcons — short confirmation tones on daemon events, the eyes-free feedback layer.
the single place that maps an event name to its tone file and the per-event enable
flag. additive to the console feed (it does not replace the printed lines): at the desk
mute tones and read; eyes-free, hear them. playback goes through audio_out (paplay-first,
fire-and-forget) so a dead speaker never blocks or breaks a command.
events:
wake a wake phrase was recognized (off by default a blip right before you
speak the command can bleed into its capture; keep it off unless wanted)
accept a command was recognized/injected
no_match nothing matched, or the target was missing (did nothing)
submit a send/submit was injected
tone files live in the packaged sounds/ dir; a per-event config override may point at a
user file instead. a missing file is swallowed by audio_out (logged once), never raised.
"""
from __future__ import annotations
import logging
from pathlib import Path
from . import audio_out
from .config import Config
log = logging.getLogger(__name__)
_SOUNDS_DIR = Path(__file__).resolve().parent / "sounds"
_EVENT_FILES = {
"wake": "wake.wav",
"accept": "accepted.wav",
"no_match": "no_match.wav",
"submit": "sent.wav",
}
_EVENT_FLAGS = {
"wake": "on_wake",
"accept": "on_accept",
"no_match": "on_no_match",
"submit": "on_submit",
}
class Earcons:
"""resolves daemon events to tones and plays them per the [sound] config"""
def __init__(self, config: Config) -> None:
self._apply(config)
def update(self, config: Config) -> None:
"""re-read the [sound] config after a live reload"""
self._apply(config)
def _apply(self, config: Config) -> None:
self.enabled = config.sound_enabled
self.volume = config.sound_volume
self._flags = {
"wake": config.sound_on_wake,
"accept": config.sound_on_accept,
"no_match": config.sound_on_no_match,
"submit": config.sound_on_submit,
}
self._overrides = dict(config.sound_files)
def _resolve(self, event: str) -> Path | None:
override = self._overrides.get(event) or self._overrides.get(_EVENT_FLAGS[event])
if override:
return Path(override).expanduser()
name = _EVENT_FILES.get(event)
return _SOUNDS_DIR / name if name else None
def play(self, event: str) -> None:
"""play the tone for an event if enabled (master + per-event). fire-and-forget;
unknown/disabled events and missing files are silently no-ops."""
if not self.enabled or not self._flags.get(event, False):
return
path = self._resolve(event)
if path is None:
return
audio_out.play(path, volume=self.volume, blocking=False)
def tone_path(self, event: str) -> Path | None:
"""the resolved tone path for an event (for test-tone), ignoring enable flags"""
return self._resolve(event)
def event_names() -> list[str]:
"""the earcon event names in a stable order (for test-tone iteration)"""
return ["wake", "accept", "no_match", "submit"]

View File

@ -1 +0,0 @@
"""earcon tone assets (committed .wav files) + their generator (generate.py)"""

Binary file not shown.

View File

@ -1,75 +0,0 @@
"""synthetic-beep FALLBACK generator for the earcon .wav tones.
WARNING: the shipped tones in this directory are now CUSTOM CURATED recordings
(edge-trimmed + loudness-normalized to ~-16 dB RMS with a -1 dBTP ceiling), NOT this
script's output. running this script OVERWRITES those real tones with plain synthetic
beeps only do so if you deliberately want to fall back to generated placeholders. it
is kept as a bootstrap fallback so the package can always self-generate a tone set (the
"a missing tone must never break a command" guarantee), not as the source of the
committed wavs.
run ``python -m claudedo.sounds.generate`` (or ``python generate.py`` from this dir) to
write placeholder beeps. each is a short, quiet, fade-enveloped sine/triangle at a
distinct pitch so the four events are ear-distinguishable:
wake soft single mid blip (off by default; least intrusive)
accepted bright single high note (heard you, sent it)
no_match low two-note falling buzz (heard you, but nothing matched / error)
sent two-note rising chime (submitted to claude)
kept SHORT (<300ms) and quiet (amplitude 0.4) confirmations, not alarms.
"""
from __future__ import annotations
import struct
import wave
from pathlib import Path
SAMPLE_RATE = 44100
AMPLITUDE = 0.4
HERE = Path(__file__).resolve().parent
def _tone(freq: float, dur: float) -> list[float]:
import math
n = int(SAMPLE_RATE * dur)
fade = max(1, int(SAMPLE_RATE * 0.01))
out = []
for i in range(n):
env = min(1.0, i / fade, (n - i) / fade)
out.append(math.sin(2.0 * math.pi * freq * (i / SAMPLE_RATE)) * env * AMPLITUDE)
return out
def _silence(dur: float) -> list[float]:
return [0.0] * int(SAMPLE_RATE * dur)
def _write(name: str, samples: list[float]) -> Path:
path = HERE / name
with wave.open(str(path), "wb") as wf:
wf.setnchannels(1)
wf.setsampwidth(2)
wf.setframerate(SAMPLE_RATE)
clipped = (max(-1.0, min(1.0, s)) for s in samples)
wf.writeframes(b"".join(struct.pack("<h", int(s * 32767)) for s in clipped))
return path
def generate() -> list[Path]:
"""(re)write all earcon wavs; return the written paths"""
tones = {
"wake.wav": _tone(660.0, 0.12),
"accepted.wav": _tone(988.0, 0.14),
"no_match.wav": _tone(330.0, 0.10) + _silence(0.03) + _tone(247.0, 0.12),
"sent.wav": _tone(784.0, 0.10) + _silence(0.02) + _tone(1175.0, 0.12),
}
return [_write(name, samples) for name, samples in tones.items()]
if __name__ == "__main__":
for p in generate():
print(f"wrote {p}")

Binary file not shown.

Binary file not shown.

Binary file not shown.

View File

@ -6,105 +6,31 @@ short in-memory chunk; nothing is written to disk or sent anywhere.
from __future__ import annotations
import contextlib
import logging
import os
import re
import sys
import numpy as np
log = logging.getLogger(__name__)
_NOISE = re.compile(r"GPU device discovery failed|device_discovery\.cc|DiscoverDevicesForPlatform")
def _quiet_backends() -> None:
"""quiet onnxruntime/ctranslate2 chatter and the faster_whisper INFO log.
faster-whisper's VAD loads an onnx model whose device discovery prints a noisy
'GPU device discovery failed' warning on headless/WSL hosts with no GPU sysfs.
the env var + logger severity stop most onnx logging; the warning itself is
emitted at C++ init and is filtered out of stderr by _filter_stderr().
"""
os.environ.setdefault("ORT_LOGGING_LEVEL", "3")
os.environ.setdefault("OMP_NUM_THREADS", os.environ.get("OMP_NUM_THREADS", "4"))
logging.getLogger("faster_whisper").setLevel(logging.WARNING)
try:
import onnxruntime
onnxruntime.set_default_logger_severity(3)
except Exception:
pass
@contextlib.contextmanager
def _filter_stderr():
"""drop onnxruntime's GPU-discovery warning lines from stderr for this block.
a pipe temporarily replaces fd 2; a pump thread forwards every line to the real
stderr EXCEPT the known GPU-discovery noise, so real errors still surface. the
original fd is always restored on exit.
"""
import threading
try:
stderr_fd = sys.stderr.fileno()
except (AttributeError, OSError):
yield
return
saved_fd = os.dup(stderr_fd)
read_fd, write_fd = os.pipe()
os.dup2(write_fd, stderr_fd)
os.close(write_fd)
def pump():
with os.fdopen(read_fd, "rb") as reader, os.fdopen(saved_fd, "wb", closefd=False) as out:
for line in reader:
if not _NOISE.search(line.decode("utf-8", "replace")):
out.write(line)
out.flush()
thread = threading.Thread(target=pump, daemon=True)
thread.start()
try:
yield
finally:
import time
time.sleep(0.05)
os.dup2(saved_fd, stderr_fd)
os.close(saved_fd)
thread.join(timeout=1.0)
class Transcriber:
"""a loaded faster-whisper model that transcribes float32 mono audio chunks"""
def __init__(self, model: str = "small", language: str = "en", device: str = "auto",
compute_type: str = "auto", initial_prompt: str | None = None) -> None:
compute_type: str = "auto") -> None:
self.language = language
self.initial_prompt = initial_prompt
self._model = self._load(model, device, compute_type)
self._warm()
@staticmethod
def _load(model: str, device: str, compute_type: str):
from faster_whisper import WhisperModel
if device == "auto":
device = "cpu"
if compute_type == "auto":
compute_type = "int8" if device == "cpu" else "float16"
log.info("loading faster-whisper model=%s device=%s compute=%s", model, device, compute_type)
with _filter_stderr():
_quiet_backends()
from faster_whisper import WhisperModel
return WhisperModel(model, device=device, compute_type=compute_type)
def _warm(self) -> None:
"""run one throwaway transcribe so the VAD onnx session inits now, under the
stderr filter the GPU-discovery warning fires here once, not in the loop"""
with _filter_stderr():
list(self._model.transcribe(np.zeros(1600, dtype=np.float32), vad_filter=True)[0])
return WhisperModel(model, device=device, compute_type=compute_type)
def transcribe(self, audio: np.ndarray, samplerate: int = 16000) -> str:
"""transcribe a mono float32 numpy array to a stripped text string.
@ -121,7 +47,6 @@ class Transcriber:
beam_size=1,
vad_filter=True,
condition_on_previous_text=False,
initial_prompt=self.initial_prompt,
)
text = " ".join(seg.text for seg in segments).strip()
return text

View File

@ -75,57 +75,17 @@ def session_exists(name: str) -> bool:
def list_sessions() -> list[str]:
"""return the names of all running claude-* tmux sessions (sorted)"""
return sorted(name for name, _attached in _claude_sessions())
def _claude_sessions() -> list[tuple[str, bool]]:
"""the single tmux query for claude-* sessions: (name, attached) pairs.
one source of truth for session enumeration list_sessions() and the detached
cleanup both build on this. attached is True when at least one client is attached
(tmux #{session_attached} > 0). returns [] if tmux isn't reachable.
"""
result = subprocess.run(
["tmux", "list-sessions", "-F", "#{session_name} #{session_attached}"],
["tmux", "list-sessions", "-F", "#{session_name}"],
stdout=subprocess.PIPE, stderr=subprocess.DEVNULL,
)
if result.returncode != 0:
return []
out: list[tuple[str, bool]] = []
for line in result.stdout.decode("utf-8", "replace").splitlines():
parts = line.rsplit(" ", 1)
if len(parts) != 2:
continue
name, attached = parts
if name.startswith(SESSION_PREFIX):
out.append((name, attached.strip() != "0"))
return out
names = result.stdout.decode("utf-8", "replace").splitlines()
return sorted(n for n in names if n.startswith(SESSION_PREFIX))
def cleanup_detached() -> tuple[list[str], list[str]]:
"""kill every DETACHED claude-* session, never an attached one. returns the
(killed, kept_attached) name lists (both sorted) for reporting.
detached-only is the safety model: a misheard voice ``cleanup`` cannot nuke the
active session, which is attached. the kill-including-attached path stays the shell
``cckl`` (deliberate, typed).
"""
killed: list[str] = []
kept: list[str] = []
for name, attached in _claude_sessions():
if attached:
kept.append(name)
continue
result = subprocess.run(
["tmux", "kill-session", "-t", name],
stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
)
if result.returncode == 0:
killed.append(name)
return sorted(killed), sorted(kept)
def resolve(one_shot: str | None = None, auto_target: bool = False) -> tuple[str | None, str]:
def resolve(one_shot: str | None = None) -> tuple[str | None, str]:
"""resolve the destination session and a short reason describing the choice.
single source of truth for targeting, used by both the voice and CLI paths.
@ -135,9 +95,7 @@ def resolve(one_shot: str | None = None, auto_target: bool = False) -> tuple[str
1. one-shot present -> claude-<name> for THIS command only; never falls through
to a different session if it doesn't exist (explicit beats convenience).
2. sticky set + exists -> use it.
3. nothing sticky, exactly one claude-* session:
auto_target=True -> auto-use it;
auto_target=False -> require an explicit set/target, do nothing.
3. nothing sticky, exactly one claude-* session -> auto-use it.
4. nothing sticky, multiple sessions -> ambiguous, do nothing.
5. nothing sticky, zero sessions -> do nothing.
"""
@ -155,9 +113,7 @@ def resolve(one_shot: str | None = None, auto_target: bool = False) -> tuple[str
sessions = list_sessions()
if len(sessions) == 1:
if auto_target:
return sessions[0], f"auto-target {sessions[0]} (only session)"
return None, f"no target set ({sessions[0]} running — set one)"
return sessions[0], f"auto-target {sessions[0]} (only session)"
if len(sessions) > 1:
return None, f"no target set, {len(sessions)} sessions (set one)"
return None, "no claude sessions"