Compare commits

..

No commits in common. "main" and "v0.1.3" have entirely different histories.
main ... v0.1.3

23 changed files with 71 additions and 1180 deletions

106
README.md
View File

@ -21,7 +21,7 @@ mic (WSLg/PulseAudio RDPSource)
-> faster-whisper (local STT, on-device)
-> wake gate: utterance must start with a wake phrase, else DISCARD locally
-> grammar match (yes/no/one..four/approve/deny/send/type/space/backspace/erase/
mode/set/target/unset/list/context/reload/system/cancel)
mode/set/target/unset/list/cancel)
-> resolve target session (one-shot > sticky ~/.claude-active > auto/none)
-> tmux send-keys -t <session> "<keys>"
-> log the action to the watched terminal ([session]/[SYSTEM]/[VOICE], colored)
@ -79,13 +79,10 @@ claudedo start --check # run a mic check before listening
claudedo start --mode ptt # push-to-talk instead (desk-only — see Modes)
claudedo status # running? mode? target session?
claudedo stop # stop a running daemon
claudedo reload # reload config.toml + contexts.toml in a running daemon
claudedo set <name> # set the sticky target -> claude-<name> (alias: switch)
claudedo unset # clear the sticky target
claudedo list # list running claude-* sessions
claudedo cleanup # kill DETACHED claude-* sessions (never attached)
claudedo test-audio # verify the mic capture path
claudedo test-tone # play each earcon (verify the audio-OUT path)
```
### Modes
@ -110,9 +107,7 @@ Wake phrases (listen mode), fuzzy-matched. The default list is **"claudedo"**,
no token for the coined word "claudedo" and renders it as real words ("claude do"),
so that spelling is listed explicitly. Matching is lenient (case/space-insensitive).
Add the spellings you actually see (turn on `print_heard` to find them). In PTT mode
the wake phrase is optional. When a command's wake phrase matched loosely (e.g. you
said "okay clouds"), the heard line notes which phrase it assumed —
`heard "okay clouds list" -> LIST (wake: okay claude)`.
the wake phrase is optional.
| Say | Does |
|---|---|
@ -130,14 +125,6 @@ said "okay clouds"), the heard line notes which phrase it assumed —
| `target <name> <command>` | **one-shot** override: run that command on `claude-<name>` for this utterance only; sticky default unchanged |
| `unset` (alias `unsticky`) | clear the sticky target |
| `list` | list running `claude-*` sessions to the daemon console |
| `context <name> <instruction>` (alias `prepare`) | inject a `contexts.toml` blurb as a preamble + the dictated instruction, then **wait** (no submit — say "send") |
| `reload` | re-read `config.toml` + `contexts.toml` live (no daemon restart, model stays loaded) |
| `system status` | print mode / target / model / context count to the console (daemon-control; never injects) |
| `system reload [config\|contexts]` | reload one or both config files |
| `cleanup` (alias `detached`/`detach`, also `system cleanup`) | kill **detached** `claude-*` sessions only — never an attached one |
| `commands` (alias `help`/`menu`) | print the voice-command menu to the console |
| `customs` (alias `custom`) | list the loaded context names |
| `version` | print the claudedo version to the console |
| `cancel` / `escape` | back out of a prompt |
Optional filler (`select` / `use` / `choose`) may precede any command and is ignored:
@ -177,74 +164,9 @@ cc <name> # attach/create claude-<name>; writes ~/.claude-active
ccr <name> # re-attach an existing claude-<name> only
ccl # list claude-* sessions
cck <name> # kill claude-<name>
ccclean # kill DETACHED claude-* sessions only (never attached) — safe cleanup
cckl # kill ALL claude-* sessions (including attached)
cckl # kill all claude-* sessions
```
## Contexts (named reference blurbs)
`contexts.toml` holds named reference snippets you can inject ahead of a dictated
instruction with the **`context <name> <instruction>`** voice command (alias
`prepare`). It lives next to `config.toml`
(`$CLAUDEDO_CONTEXTS` → `~/.config/claudedo/contexts.toml``./contexts.toml`); a
missing file just means no contexts (the feature is opt-in).
```toml
[contexts]
webhooks = "discord webhooks — test: <url> (safe to spam), live: <url> (real, careful)"
testing = "use the test/staging resources only, never touch prod"
```
Saying `context webhooks send a test message` injects the `webhooks` blurb as a
preamble, then the dictated instruction, and **waits** — nothing is auto-submitted. You
say `send` to submit (**read-before-send**; Claude's own permission prompt is the
backstop for anything consequential). A bare `context webhooks` injects just the blurb.
One context per command (no stacking yet); an unknown name announces and injects
nothing.
Names are **spoken and fuzzy-matched**, so keep them simple and distinct — they're
looked up on a despaced/lowercased key, so `web hooks` / `web-hooks` / `webhooks` all
resolve the same block. Assembly is config-gated: `behavior.context_multiline` (default
`true`) puts the blurb and instruction on separate lines via a Shift+Enter soft newline;
set it `false` to flatten onto one line with `context_separator` (default `" — "`) if
Shift+Enter is unreliable in your terminal.
Edit `contexts.toml`, then say **`reload`** (or run `claudedo reload`) — it re-reads
`config.toml` and `contexts.toml` live without restarting the daemon or reloading the
Whisper model. The **`system`** namespace gives daemon-control by voice without touching
Claude: `system status` (mode / target / model / context count) and `system reload
[config|contexts]`.
## Earcons (audio feedback tones)
Short confirmation tones play on key events so you get **eyes-free feedback** — "did it
hear me?" — without watching the terminal. They're tones, not speech (not TTS): a bright
blip when a command is accepted/injected, a low buzz when nothing matched, a rising chime
on submit, and an optional blip on wake. Tones are short (<300ms) and quiet, and they're
**additive** to the console feed — mute them and read at the desk, or hear them eyes-free.
Verify the audio-OUT path (the reverse of `test-audio`, and the less-tested direction on
WSLg) with:
```bash
claudedo test-tone # plays each tone through WSLg — the audio-out gate
```
Tones play through WSLg's PulseAudio sink, **paplay-first** (a separate process, so it
doesn't contend with the sounddevice mic stream), falling back to in-process sounddevice,
then `powershell.exe` on the Windows host. Playback is **fire-and-forget**: a dead speaker
or a missing tone file logs once and is ignored — audio-out can never block or break a
command (`claudedo yes` injects whether or not the speaker works).
Configure under `[sound]`: `enabled` (master, default on), per-event `on_wake` (default
**off** — a blip right before you speak can bleed into the command capture, and it's
chatty), `on_accept` / `on_no_match` / `on_submit` (default on), and `volume` (0.01.0,
best-effort — scaled for sounddevice, `--volume` for paplay, ignored by the PowerShell
fallback). A `[sound.files]` table can point any event at your own `.wav`. The shipped
tones live in the package (`claudedo/sounds/*.wav`); `claudedo/sounds/generate.py` is a
synthetic-beep fallback that can regenerate a placeholder set (it does **not** reproduce
the shipped tones — running it overwrites them with plain beeps).
## The confirmed Claude Code keymap
The keystrokes in [`keys.py`](src/claudedo/keys.py) were confirmed **empirically**
@ -265,35 +187,27 @@ If Claude Code changes its prompt UI, re-confirm against a live session and upda
Everything tunable lives in [`config.toml`](config.toml): wake phrases, mode + PTT
key, Whisper model/language/device, `[vad]` endpointing, and `[behavior]`
(`type_autosend`, fuzzy thresholds, `filler_words`, `auto_target`, `print_heard`).
The default model is **`small.en`** (the English-only small model — ~1s/command on a
strong CPU, more accurate on English than multilingual `small` at the same speed);
`medium`/`medium.en` are more accurate but ~3× slower (noticeable lag), `base.en` is
snappier/less accurate, `large-v3` most accurate/slowest. Every `heard` line shows the
STT latency as `(<ms>/<audio>s)` so you can see what a model change costs. VAD
endpointing ends a capture after `[vad].silence_ms` (700) of trailing silence, capped
at `max_seconds` (15). `claudedo -c <path> ...` points at a specific config; otherwise
it searches
`$CLAUDEDO_CONFIG`, `~/.config/claudedo/config.toml`, then `./config.toml`.
The default model is **`medium`** (best accuracy for the coined wake word on a strong
CPU); `small` is faster/less accurate, `large-v3` most accurate. `claudedo -c <path>
...` points at a specific config; otherwise it searches `$CLAUDEDO_CONFIG`,
`~/.config/claudedo/config.toml`, then `./config.toml`.
- **STT biasing.** The transcriber is seeded with an `initial_prompt` built from the
configured wake phrases + command vocabulary (one source — `grammar.vocabulary()`),
so Whisper is conditioned to expect "claudedo" and the command words.
- **Split fuzzy thresholds.** `wake_fuzzy_threshold` (default `0.65`, lenient) vs
- **Split fuzzy thresholds.** `wake_fuzzy_threshold` (default `0.6`, lenient) vs
`command_fuzzy_threshold` (default `0.8`, tight). The asymmetry is deliberate: a
false *wake* is cheap (it wakes, finds no command, does nothing), but a false
*command* fires the wrong action. Prefer expanding command synonyms over loosening
the command threshold.
- **`[vad]` endpointing.** Capture starts on speech and ends after `silence_ms`
(default 700) of trailing silence — Alexa-style record-until-pause — capped at
`max_seconds` (default 15). The pause both ends a command and separates it from
(default 800) of trailing silence — Alexa-style record-until-pause — capped at
`max_seconds` (default 10). The pause both ends a command and separates it from
following chatter (the chatter is a separate capture the wake gate discards).
- **`auto_target`** (default `false`): with no sticky target and one session running,
`false` does nothing and asks you to `set`; `true` auto-uses that session.
- **`print_heard`** (default `false`, debug): prints non-wake transcripts so you can
see how Whisper renders your wake word, then tune the wake list/threshold.
- **`context_multiline`** (default `true`) / **`context_separator`** (default `" — "`):
how the `context` command assembles the blurb and instruction — a Shift+Enter soft
newline between them, or (when `false`) flattened onto one line with the separator.
## Requirements

View File

@ -21,12 +21,10 @@ mode = "listen"
ptt_key = "space"
[stt]
# faster-whisper model size. "small.en" is the default — the English-only small model
# (~1s/command on a strong cpu, more accurate on english than multilingual "small" at
# the same speed). "medium"/"medium.en" are more accurate but ~3x slower (noticeable
# lag); "large-v3" is most accurate and slowest. drop to "base.en" for max snappiness
# (less accurate). bump only if recognition is poor.
model = "small.en"
# faster-whisper model size. "medium" is the default — biggest accuracy gain for the
# coined wake word ("claudedo" / "claude do") and fine on a strong cpu. "small" is
# faster but less accurate; "large-v3" is most accurate if medium still struggles.
model = "medium"
language = "en"
# mic device: "auto", or a sounddevice device index (integer) / substring of a
# device name. run `claudedo test-audio` to list devices.
@ -48,10 +46,9 @@ min_utterance = 0.3
# onset and ends after this much trailing silence — the natural end of an utterance.
# a real pause both ends the command AND separates it from following chatter (the
# chatter becomes a separate capture that the wake gate then discards).
silence_ms = 700
# hard cap so continuous noise can't record forever (also the ceiling for a long
# dictated `type` phrase).
max_seconds = 15.0
silence_ms = 800
# hard cap so continuous noise can't record forever.
max_seconds = 10.0
[behavior]
# dictation never auto-submits: "type <phrase>" inserts literal text only; you say
@ -61,7 +58,7 @@ type_autosend = false
# wakes, finds no command, does nothing), so wake is lenient; a false COMMAND fires
# the WRONG action, so commands stay tight. lower = more lenient = more matches.
# prefer expanding command synonyms over loosening command_fuzzy_threshold.
wake_fuzzy_threshold = 0.65
wake_fuzzy_threshold = 0.6
command_fuzzy_threshold = 0.8
# optional filler words that may precede a command and are ignored for matching:
# "select yes" / "use yes" behave like "yes". (a filler word followed by a digit is
@ -78,42 +75,3 @@ auto_target = false
# how Whisper renders your wake word, then turn it OFF. default false: non-wake speech
# is discarded without ever printing the transcript.
print_heard = false
# how the `context <name> <dictation>` command assembles the blurb + instruction.
# true (default): blurb, a soft newline (Shift+Enter — needs the extended-keys tmux
# settings install.sh appends), then the instruction. if Shift+Enter is at all flaky
# in your terminal (it submits or does nothing), set false to flatten onto one line
# with context_separator between blurb and instruction — the blank line is cosmetic,
# not worth a submit risk. either way the assembled text is NEVER auto-submitted.
context_multiline = true
# separator inserted between blurb and instruction when context_multiline = false.
context_separator = " — "
# the `cleanup` / `detached` command kills DETACHED claude-* sessions only (never an
# attached one — a misheard cleanup can't nuke the active session). default false:
# kill immediately (it's detached-only, so it's safe). set true to announce the
# detached set and wait for a following `confirm` before killing.
cleanup_confirm = false
[sound]
# earcons — short confirmation tones on daemon events so you get eyes-free feedback
# ("did it hear me?") without watching the terminal. tones are SHORT (<300ms) and quiet;
# they play OUT through WSLg's PulseAudio sink (paplay-first, sounddevice fallback, then
# powershell.exe). additive to the console feed — mute these and read at the desk, or
# hear them eyes-free. a dead speaker never blocks/breaks a command (fire-and-forget).
enabled = true
# blip when a wake phrase is recognized. OFF by default: a blip right before you speak
# the command can bleed into its capture, and it's chatty. turn on only if you want it.
on_wake = false
# positive blip when a command is recognized/injected.
on_accept = true
# distinct lower buzz when nothing matched or the target was missing (did nothing).
on_no_match = true
# rising chime when a send/submit is injected.
on_submit = true
# best-effort 0.0-1.0 (scaled for sounddevice, --volume for paplay; ignored by the
# powershell fallback, which has no volume control).
volume = 0.5
# optional per-event overrides to swap in your own .wav files, e.g.:
# [sound.files]
# accept = "~/sounds/my_accept.wav"
[sound.files]

View File

@ -1,18 +0,0 @@
# claudedo contexts — named reference blurbs you can inject ahead of a dictated
# instruction with the `context <name> <instruction>` voice command (alias `prepare`).
#
# the named blurb is injected as a preamble, then your dictated instruction, and the
# daemon WAITS — nothing is auto-submitted. you say "send" to submit (read-before-send;
# claude's own permission prompt is the backstop for anything consequential).
#
# names are SPOKEN and fuzzy-matched, so keep them simple, distinct, single words
# (a-z, 0-9; spaces/hyphens/underscores are stripped for matching, so "web hooks",
# "web-hooks" and "webhooks" all resolve the same block). values are free-form text.
#
# edit this file, then say "reload" (or run `claudedo reload`) — no daemon restart,
# the whisper model is not reloaded.
[contexts]
webhooks = "discord webhooks — test: <url> (safe to spam), live: <url> (real, careful)"
testing = "use the test/staging resources only, never touch prod"
discord = "discord.py 2.x; bot token in .env as BOT_TOKEN; guild id 12345"

View File

@ -57,14 +57,9 @@ say "verifying audio path"
if pactl info >/dev/null 2>&1; then
DEFAULT_SRC="$(pactl info | sed -n 's/^Default Source: //p')"
echo " Default Source: ${DEFAULT_SRC:-<none>}"
DEFAULT_SINK="$(pactl info | sed -n 's/^Default Sink: //p')"
echo " Default Sink: ${DEFAULT_SINK:-<none>}"
if ! pactl list sources short 2>/dev/null | grep -q RDPSource; then
warn "RDPSource not listed by pactl — mic may not be bridged. check Windows mic permission."
fi
if ! pactl list sinks short 2>/dev/null | grep -q RDPSink; then
warn "RDPSink not listed by pactl — earcon/TTS audio-OUT may not play. run 'claudedo test-tone' to check."
fi
else
warn "pactl info failed — pulseaudio-utils installed but no server reachable yet."
fi
@ -111,18 +106,6 @@ else
echo " $CONF_DIR/config.toml already current"
fi
# install the contexts.toml template (named blurbs for the `context` voice command).
# same policy: copy only if absent, else drop a .new — never clobber edited contexts.
if [ ! -f "$CONF_DIR/contexts.toml" ]; then
install -m 0644 "$REPO_DIR/contexts.toml" "$CONF_DIR/contexts.toml"
echo " wrote $CONF_DIR/contexts.toml"
elif ! cmp -s "$REPO_DIR/contexts.toml" "$CONF_DIR/contexts.toml"; then
install -m 0644 "$REPO_DIR/contexts.toml" "$CONF_DIR/contexts.toml.new"
echo " kept your $CONF_DIR/contexts.toml; new default written to contexts.toml.new (diff to merge)"
else
echo " $CONF_DIR/contexts.toml already current"
fi
# wire EVERY rc that exists (the user may have both zsh and bash).
wired_any=0
for RC in "$HOME/.zshrc" "$HOME/.bashrc"; do

View File

@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
[project]
name = "claudedo"
version = "0.2.2"
version = "0.1.3"
description = "voice-control daemon for claude code (local STT -> tmux send-keys)"
readme = "README.md"
requires-python = ">=3.10"
@ -23,9 +23,6 @@ claudedo = "claudedo.__main__:main"
[tool.setuptools]
package-dir = { "" = "src" }
[tool.setuptools.package-data]
"claudedo.sounds" = ["*.wav"]
[tool.setuptools.packages.find]
where = ["src"]

View File

@ -11,8 +11,7 @@
# ccr <name> reattach only (error if it doesn't exist); writes ~/.claude-active
# ccl list running claude- sessions
# cck <name> kill claude-<name>
# ccclean kill DETACHED claude- sessions only (never attached) — safe cleanup
# cckl kill ALL claude- sessions (including attached)
# cckl kill ALL claude- sessions
cc() {
if [ -z "$1" ]; then
@ -61,34 +60,6 @@ cck() {
fi
}
ccclean() {
killed=""
kept=""
while read -r name attached; do
case "$name" in
claude-*) ;;
*) continue ;;
esac
if [ "$attached" = "0" ]; then
if tmux kill-session -t "$name" 2>/dev/null; then
killed="${killed:+$killed, }$name"
fi
else
kept="${kept:+$kept, }$name"
fi
done <<EOF
$(tmux list-sessions -F '#{session_name} #{session_attached}' 2>/dev/null)
EOF
if [ -z "$killed" ]; then
echo "nothing to clean (no detached sessions)"
else
n=$(printf '%s' "$killed" | awk -F', ' '{print NF}')
msg="killed $killed ($n detached)"
[ -n "$kept" ] && msg="$msg; kept $kept (attached)"
echo "$msg"
fi
}
cckl() {
tmux ls 2>/dev/null | grep '^claude-' | cut -d: -f1 | while read -r s; do
tmux kill-session -t "$s" && echo "killed $s"

View File

@ -1,3 +1,3 @@
"""claudedo — voice-control daemon for claude code (local STT -> tmux send-keys)"""
__version__ = "0.2.2"
__version__ = "0.1.3"

View File

@ -97,56 +97,6 @@ def cmd_stop(_args: argparse.Namespace) -> int:
return 1
def cmd_test_tone(args: argparse.Namespace) -> int:
config = _load_or_die(args.config)
from . import audio_out, sound
print("== claudedo test-tone ==")
if not audio_out.available():
print("no audio-out backend found (paplay / powershell.exe).", file=sys.stderr)
print("install pulseaudio-utils (run install.sh) for paplay.", file=sys.stderr)
return 1
earcons = sound.Earcons(config)
print(f"playing each tone via WSLg audio-out (volume {config.sound_volume}) — listen ...")
ok = True
for event in sound.event_names():
path = earcons.tone_path(event)
if path is None or not Path(path).is_file():
print(f" {event:9} MISSING ({path})")
ok = False
continue
print(f" {event:9} {path.name} ...", flush=True)
played = audio_out.play_blocking(path, volume=config.sound_volume)
if not played:
print(f" {event:9} FAILED to play", file=sys.stderr)
ok = False
if not ok:
print("some tones did not play — audio-out may be unavailable.", file=sys.stderr)
return 1
print("audio-out OK (all tones played).")
return 0
def cmd_cleanup(_args: argparse.Namespace) -> int:
killed, kept = target.cleanup_detached()
if not killed:
print("nothing to clean (no detached sessions)")
return 0
msg = f"killed {', '.join(killed)}"
if kept:
msg += f"; kept {', '.join(kept)} (attached)"
print(msg)
return 0
def cmd_reload(_args: argparse.Namespace) -> int:
if daemon.reload_running():
print("signalled claudedo to reload config + contexts")
return 0
print("claudedo is not running")
return 1
def cmd_status(_args: argparse.Namespace) -> int:
pid = daemon.read_pid()
if pid is None:
@ -272,17 +222,11 @@ def build_parser() -> argparse.ArgumentParser:
sp.set_defaults(func=cmd_start)
sub.add_parser("stop", help="stop a running daemon").set_defaults(func=cmd_stop)
sub.add_parser("reload", help="reload config + contexts in a running daemon"
).set_defaults(func=cmd_reload)
sub.add_parser("status", help="show daemon status").set_defaults(func=cmd_status)
sub.add_parser("test-audio", help="verify the mic capture path").set_defaults(func=cmd_test_audio)
sub.add_parser("test-tone", help="play each earcon (verify the audio-out path)"
).set_defaults(func=cmd_test_tone)
sub.add_parser("install", help="re-run the bootstrap (install.sh)").set_defaults(func=cmd_install)
sub.add_parser("unset", help="clear the sticky target session").set_defaults(func=cmd_unset)
sub.add_parser("list", help="list running claude-* sessions").set_defaults(func=cmd_list)
sub.add_parser("cleanup", help="kill detached claude-* sessions (never attached)"
).set_defaults(func=cmd_cleanup)
for verb in ("set", "switch"):
sp_set = sub.add_parser(verb, help="set the sticky target session")

View File

@ -1,158 +0,0 @@
"""audio output — play short .wav files through the WSLg/PulseAudio sink (RDPSink).
the reverse direction of audio.py's mic capture, and the less-tested path on WSLg. a
three-tier player picks the first backend that works and remembers it:
1. paplay (pulseaudio-utils) a SEPARATE process hitting PulseAudio directly. this
is the primary on purpose: the daemon captures with sounddevice (an open input
stream in listen mode), so keeping OUTPUT in a separate process avoids stacking
input+output in one lib on a bridge known to be duplex-flaky.
2. sounddevice sd.play() in-process fallback if paplay is absent.
3. powershell.exe SoundPlayer last resort via the Windows host (no volume control).
both earcons (sound.py) and future v0.3 TTS readback play through this module keep it
generic (it plays a wav path, it knows nothing about events). playback is fire-and-
forget on a worker thread: a missing file or a dead speaker logs once and is swallowed,
never raised, so audio-out can NEVER block or break the inject path.
"""
from __future__ import annotations
import logging
import shutil
import subprocess
import threading
import wave
from pathlib import Path
log = logging.getLogger(__name__)
_PAPLAY = "paplay"
_POWERSHELL = "powershell.exe"
_backend_lock = threading.Lock()
_chosen_backend: str | None = None
_warned = False
def _have(cmd: str) -> bool:
return shutil.which(cmd) is not None
def _clamp_volume(volume: float) -> float:
return max(0.0, min(1.0, float(volume)))
def _play_paplay(path: Path, volume: float) -> bool:
"""play via paplay; volume scaled through --volume (0-65536 linear)"""
vol = int(_clamp_volume(volume) * 65536)
proc = subprocess.run(
[_PAPLAY, f"--volume={vol}", str(path)],
stdout=subprocess.DEVNULL, stderr=subprocess.PIPE,
)
if proc.returncode != 0:
log.debug("paplay failed: %s", proc.stderr.decode("utf-8", "replace").strip())
return False
return True
def _play_sounddevice(path: Path, volume: float) -> bool:
"""play via sounddevice (in-process fallback); volume scales the samples"""
try:
import numpy as np
import sounddevice as sd
except Exception as exc:
log.debug("sounddevice unavailable: %s", exc)
return False
try:
with wave.open(str(path), "rb") as wf:
sr = wf.getframerate()
frames = wf.readframes(wf.getnframes())
data = np.frombuffer(frames, dtype="<i2").astype(np.float32) / 32768.0
data = data * _clamp_volume(volume)
sd.play(data, sr)
sd.wait()
return True
except Exception as exc:
log.debug("sounddevice playback failed: %s", exc)
return False
def _play_powershell(path: Path, _volume: float) -> bool:
"""play via the Windows host (last resort). SoundPlayer has no volume control,
so volume is ignored on this backend (documented best-effort)."""
if not _have(_POWERSHELL):
return False
try:
win = subprocess.run(["wslpath", "-w", str(path)], stdout=subprocess.PIPE,
stderr=subprocess.DEVNULL)
winpath = win.stdout.decode("utf-8", "replace").strip() if win.returncode == 0 else str(path)
script = f"(New-Object Media.SoundPlayer '{winpath}').PlaySync()"
proc = subprocess.run([_POWERSHELL, "-NoProfile", "-Command", script],
stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
return proc.returncode == 0
except Exception as exc:
log.debug("powershell playback failed: %s", exc)
return False
_BACKENDS = {
"paplay": _play_paplay,
"sounddevice": _play_sounddevice,
"powershell": _play_powershell,
}
_ORDER = ("paplay", "sounddevice", "powershell")
def _play_sync(path: Path, volume: float) -> bool:
"""play a wav synchronously, choosing/remembering a working backend. returns
whether playback succeeded; never raises."""
global _chosen_backend, _warned
if not path.is_file():
log.debug("tone file missing: %s", path)
return False
with _backend_lock:
order = (_chosen_backend,) + _ORDER if _chosen_backend else _ORDER
tried = []
for name in order:
if name in tried:
continue
tried.append(name)
if name == "paplay" and not _have(_PAPLAY):
continue
if _BACKENDS[name](path, volume):
with _backend_lock:
_chosen_backend = name
return True
with _backend_lock:
if not _warned:
_warned = True
log.warning("audio-out unavailable (tried %s) — continuing silently; "
"tones disabled for this run", ", ".join(tried))
return False
def play(path: str | Path, volume: float = 1.0, blocking: bool = False) -> None:
"""play a wav file. fire-and-forget by default (a worker thread), so a slow or
dead speaker never delays the caller. set blocking=True only for test-tone, where
we want to play tones in sequence and report the result.
failures are swallowed (logged once) audio-out must never break a command.
"""
p = Path(path)
if blocking:
_play_sync(p, volume)
return
threading.Thread(target=_play_sync, args=(p, volume), daemon=True).start()
def play_blocking(path: str | Path, volume: float = 1.0) -> bool:
"""synchronous play that returns success — for test-tone's audio-out gate"""
return _play_sync(Path(path), volume)
def available() -> bool:
"""true if any audio-out backend is present (best-effort, paplay/powershell)"""
return _have(_PAPLAY) or _have(_POWERSHELL)

View File

@ -17,10 +17,7 @@ except ModuleNotFoundError:
log = logging.getLogger(__name__)
_VALID_MODES = ("listen", "ptt")
_VALID_MODELS = (
"tiny", "base", "small", "medium", "large-v1", "large-v2", "large-v3",
"tiny.en", "base.en", "small.en", "medium.en",
)
_VALID_MODELS = ("tiny", "base", "small", "medium", "large-v2", "large-v3")
DEFAULT_CONFIG_PATHS = (
Path(os.environ.get("CLAUDEDO_CONFIG", "")) if os.environ.get("CLAUDEDO_CONFIG") else None,
@ -56,16 +53,6 @@ class Config:
filler_words: tuple[str, ...]
auto_target: bool
print_heard: bool
context_multiline: bool
context_separator: str
cleanup_confirm: bool
sound_enabled: bool
sound_on_wake: bool
sound_on_accept: bool
sound_on_no_match: bool
sound_on_submit: bool
sound_volume: float
sound_files: dict[str, str]
source_path: Path | None = field(default=None)
@ -112,7 +99,7 @@ def load_config(explicit: str | os.PathLike | None = None) -> Config:
if mode not in _VALID_MODES:
raise ConfigError(f"[input].mode must be one of {_VALID_MODES}, got {mode!r}")
model = _require(raw, "stt", "model", (str,), "small.en")
model = _require(raw, "stt", "model", (str,), "medium")
if model not in _VALID_MODELS:
log.warning("unknown stt model %r — passing through to faster-whisper", model)
@ -127,35 +114,23 @@ def load_config(explicit: str | os.PathLike | None = None) -> Config:
samplerate=int(_require(raw, "audio", "samplerate", (int,), 16000)),
channels=int(_require(raw, "audio", "channels", (int,), 1)),
silence_threshold=float(_require(raw, "audio", "silence_threshold", (int, float), 0.012)),
vad_silence_ms=int(_require(raw, "vad", "silence_ms", (int,), 700)),
vad_max_seconds=float(_require(raw, "vad", "max_seconds", (int, float), 15.0)),
vad_silence_ms=int(_require(raw, "vad", "silence_ms", (int,), 800)),
vad_max_seconds=float(_require(raw, "vad", "max_seconds", (int, float), 10.0)),
min_utterance=float(_require(raw, "audio", "min_utterance", (int, float), 0.3)),
type_autosend=bool(_require(raw, "behavior", "type_autosend", (bool,), False)),
wake_fuzzy_threshold=float(_require(raw, "behavior", "wake_fuzzy_threshold", (int, float), 0.65)),
wake_fuzzy_threshold=float(_require(raw, "behavior", "wake_fuzzy_threshold", (int, float), 0.6)),
command_fuzzy_threshold=float(_require(raw, "behavior", "command_fuzzy_threshold",
(int, float), 0.8)),
filler_words=tuple(_require(raw, "behavior", "filler_words", (list,),
["select", "use", "choose"])),
auto_target=bool(_require(raw, "behavior", "auto_target", (bool,), False)),
print_heard=bool(_require(raw, "behavior", "print_heard", (bool,), False)),
context_multiline=bool(_require(raw, "behavior", "context_multiline", (bool,), True)),
context_separator=str(_require(raw, "behavior", "context_separator", (str,), "")),
cleanup_confirm=bool(_require(raw, "behavior", "cleanup_confirm", (bool,), False)),
sound_enabled=bool(_require(raw, "sound", "enabled", (bool,), True)),
sound_on_wake=bool(_require(raw, "sound", "on_wake", (bool,), False)),
sound_on_accept=bool(_require(raw, "sound", "on_accept", (bool,), True)),
sound_on_no_match=bool(_require(raw, "sound", "on_no_match", (bool,), True)),
sound_on_submit=bool(_require(raw, "sound", "on_submit", (bool,), True)),
sound_volume=float(_require(raw, "sound", "volume", (int, float), 0.5)),
sound_files=dict(_require(raw, "sound", "files", (dict,), {})),
source_path=path,
)
for label, val in (("wake_fuzzy_threshold", cfg.wake_fuzzy_threshold),
("command_fuzzy_threshold", cfg.command_fuzzy_threshold)):
if not 0.0 < val <= 1.0:
raise ConfigError(f"[behavior].{label} must be in (0, 1]")
if not 0.0 <= cfg.sound_volume <= 1.0:
raise ConfigError("[sound].volume must be in [0, 1]")
if cfg.vad_silence_ms <= 0 or cfg.vad_max_seconds <= 0:
raise ConfigError("[vad].silence_ms and max_seconds must be positive")
if cfg.samplerate <= 0 or cfg.channels <= 0:

View File

@ -18,16 +18,12 @@ _COLORS = {
"red": "\033[31m",
"yellow": "\033[33m",
"cyan": "\033[36m",
"blue": "\033[34m",
"brightblue": "\033[94m",
"magenta": "\033[35m",
"dim": "\033[2m",
"bold": "\033[1m",
}
SYSTEM = "SYSTEM"
VOICE = "VOICE"
HELP = "HELP"
class Console:
@ -49,17 +45,7 @@ class Console:
return text
return f"{_COLORS[color]}{text}{RESET}"
def paint(self, text: str, color: str | None) -> str:
"""public colorizer for pre-coloring a fragment of a message (e.g. a command
word) before passing it to emit() with color=None"""
return self._paint(text, color)
def emit(self, prefix: str, message: str, color: str | None = None) -> None:
"""print one line: ``HH:MM:SS [prefix] message`` (message optionally colored)"""
line = f"{self._stamp()} {self._paint(f'[{prefix}]', 'dim')} {self._paint(message, color)}"
print(line, file=self.stream, flush=True)
def line(self, message: str, color: str | None = None) -> None:
"""print a bare continuation line (no timestamp/prefix) — for multi-row blocks
like the help menu, indented under a preceding header"""
print(self._paint(message, color), file=self.stream, flush=True)

View File

@ -1,108 +0,0 @@
"""load named context blocks from contexts.toml into a typed lookup.
contexts are user-edited reference blurbs (claude.md-style snippets) keyed by simple
spoken names. the ``context``/``prepare`` voice command injects a named blurb ahead of
a dictated instruction (read-before-send: never auto-submitted). mirrors config.py's
load/validate pattern; a missing file is an empty set, not an error.
"""
from __future__ import annotations
import logging
import os
import re
from dataclasses import dataclass, field
from pathlib import Path
try:
import tomllib as _toml
except ModuleNotFoundError:
import tomli as _toml
log = logging.getLogger(__name__)
_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9 _-]*$")
DEFAULT_CONTEXTS_PATHS = (
Path(os.environ.get("CLAUDEDO_CONTEXTS", "")) if os.environ.get("CLAUDEDO_CONTEXTS") else None,
Path.home() / ".config" / "claudedo" / "contexts.toml",
Path.cwd() / "contexts.toml",
)
class ContextsError(Exception):
"""raised on an unparseable or invalid contexts.toml"""
@dataclass
class Contexts:
"""validated named context blocks (name -> blurb), normalized for spoken lookup"""
blocks: dict[str, str] = field(default_factory=dict)
source_path: Path | None = field(default=None)
def __len__(self) -> int:
return len(self.blocks)
def names(self) -> list[str]:
"""the context names, sorted (for status / listing)"""
return sorted(self.blocks)
def get(self, name: str) -> str | None:
"""look up a blurb by its normalized (lowercased, despaced) name, or None.
names are matched on a lowercase, space/underscore/hyphen-stripped key so a
spoken "web hooks" resolves the configured ``webhooks``/``web-hooks`` block.
"""
return self.blocks.get(_key(name))
def _key(name: str) -> str:
return re.sub(r"[ _-]+", "", name.strip().lower())
def find_contexts_path(explicit: str | os.PathLike | None = None) -> Path | None:
"""resolve the contexts.toml path, or None if no file exists (not an error)"""
candidates: list[Path] = []
if explicit:
candidates.append(Path(explicit))
candidates.extend(p for p in DEFAULT_CONTEXTS_PATHS if p)
for path in candidates:
if path.is_file():
return path
return None
def load_contexts(explicit: str | os.PathLike | None = None) -> Contexts:
"""load contexts.toml from the first existing default path (or an explicit one).
a missing file yields an empty Contexts (the feature is opt-in). names must be
simple words (matchable) and values must be non-empty strings; a bad entry raises
ContextsError so the user sees a clear message rather than a silent drop.
"""
path = find_contexts_path(explicit)
if path is None:
return Contexts(blocks={}, source_path=None)
try:
with open(path, "rb") as fh:
raw = _toml.load(fh)
except _toml.TOMLDecodeError as exc:
raise ContextsError(f"could not parse {path}: {exc}") from exc
table = raw.get("contexts", {})
if not isinstance(table, dict):
raise ContextsError("[contexts] must be a table of name = \"blurb\" entries")
blocks: dict[str, str] = {}
for name, value in table.items():
if not isinstance(name, str) or not _NAME_RE.match(name.lower()):
raise ContextsError(f"context name {name!r} must be simple words (a-z, 0-9, space/-/_)")
if not isinstance(value, str) or not value.strip():
raise ContextsError(f"context {name!r} must be a non-empty string")
key = _key(name)
if key in blocks:
raise ContextsError(f"context {name!r} collides with another name on the spoken key {key!r}")
blocks[key] = value.strip()
return Contexts(blocks=blocks, source_path=path)

View File

@ -16,11 +16,9 @@ import sys
import time
from pathlib import Path
from . import __version__, audio, grammar, inject, keys, target
from .config import Config, ConfigError, load_config
from .console import HELP, SYSTEM, VOICE, Console
from .contexts import Contexts, ContextsError, load_contexts
from .sound import Earcons
from . import audio, grammar, inject, target
from .config import Config
from .console import SYSTEM, VOICE, Console
from .stt import Transcriber
log = logging.getLogger(__name__)
@ -78,16 +76,6 @@ def stop_running() -> bool:
return True
def reload_running() -> bool:
"""signal a running daemon (SIGHUP) to reload config + contexts. returns whether
one was found. no-op on platforms without SIGHUP."""
pid = read_pid()
if pid is None or not hasattr(signal, "SIGHUP"):
return False
os.kill(pid, signal.SIGHUP)
return True
class _PTTKey:
"""desk-only push-to-talk: 'held' while the configured key is down in the
daemon's own terminal. there is deliberately NO global hotkey — a system-wide
@ -124,36 +112,20 @@ class Daemon:
self.config = config
self.mode = config.mode
self._stop = False
self._reload_pending = False
self._cleanup_pending = False
self._transcriber: Transcriber | None = None
self._device: int | None = None
self._ptt = _PTTKey()
self._pending: dict[str, int] = {}
self._console = Console()
self._contexts = Contexts()
self._earcons = Earcons(config)
self._last_stt_ms = 0.0
self._last_audio_s = 0.0
def _install_signals(self) -> None:
signal.signal(signal.SIGTERM, self._on_signal)
signal.signal(signal.SIGINT, self._on_signal)
if hasattr(signal, "SIGHUP"):
signal.signal(signal.SIGHUP, self._on_reload_signal)
def _on_signal(self, _signum, _frame) -> None:
log.info("stop requested")
self._stop = True
def _on_reload_signal(self, _signum, _frame) -> None:
"""SIGHUP from `claudedo reload` -> reload both config files on the next tick.
the actual reload runs in the loop (not the handler) so it never races a
capture/transcribe; the handler only sets the flag.
"""
self._reload_pending = True
def stopped(self) -> bool:
return self._stop
@ -166,21 +138,11 @@ class Daemon:
compute_type="auto",
initial_prompt=grammar.initial_prompt(cfg.wake_phrases),
)
self._load_contexts()
if audio.warm_up(cfg.samplerate, cfg.channels, self._device):
log.info("mic warmed up (source live)")
else:
log.warning("mic warm-up saw only silence — check mic permission / RDPSource")
def _load_contexts(self) -> None:
"""(re)load contexts.toml, leaving the loaded model untouched. a parse error is
logged and leaves the previous set in place rather than crashing the loop."""
try:
self._contexts = load_contexts()
except ContextsError as exc:
log.warning("contexts.toml invalid, keeping previous set: %s", exc)
self._console.emit(SYSTEM, f"contexts.toml error (kept previous): {exc}", "red")
def _capture(self):
cfg = self.config
if self.mode == "ptt":
@ -205,270 +167,89 @@ class Daemon:
parsed = grammar.parse(transcript, cfg.wake_phrases, cfg.wake_fuzzy_threshold,
cfg.command_fuzzy_threshold, require_wake, filler=cfg.filler_words)
if parsed is None or parsed.action is None:
if parsed is not None:
self._earcons.play("wake")
self._console.emit(VOICE, f'heard "{transcript}" -> no command matched {self._timing()}',
"yellow")
self._earcons.play("no_match")
self._console.emit(VOICE, f'heard "{transcript}" -> no command matched', "yellow")
return
action = parsed.action
self._earcons.play("wake")
# a command was recognized — echo what we heard (green) before acting. note the
# matched wake phrase (magenta) when the transcript didn't literally contain it
# (so a loose match like "okay clouds" -> "okay claude" is visible).
head = self._console.paint(f'heard "{transcript}" -> {self._describe(action)}', "green")
note = ""
if parsed.wake and parsed.wake.replace(" ", "") not in transcript.lower().replace(" ", ""):
note = (self._console.paint(" (wake: ", "green")
+ self._console.paint(parsed.wake, "magenta")
+ self._console.paint(")", "green"))
tail = self._console.paint(f" {self._timing()}", "green")
self._console.emit(VOICE, f"{head}{note}{tail}")
def blue(s):
return self._console.paint(s, "brightblue")
if action.name == "mode":
new_mode = str(action.arg)
if new_mode != self.mode:
self.mode = new_mode
self._console.emit(SYSTEM, f"{blue('mode')} -> {new_mode}")
self._console.emit(SYSTEM, f"mode -> {new_mode}", "cyan")
self._refresh_state()
return
if action.name == "set":
session = target.set_target(str(action.arg))
self._pending.pop(session, None)
self._console.emit(SYSTEM, f"{blue('set sticky')} -> {session}")
self._console.emit(SYSTEM, f"set sticky -> {session}", "cyan")
self._refresh_state()
return
if action.name == "unset":
target.unset_target()
self._console.emit(SYSTEM, f"{blue('unset')} (cleared)")
self._console.emit(SYSTEM, "unset (cleared)", "cyan")
self._refresh_state()
return
if action.name == "list":
sessions = target.list_sessions()
self._console.emit(SYSTEM, f"{blue('list')} -> "
+ (", ".join(sessions) if sessions else "(none running)"))
return
if action.name == "commands":
self._console.emit(HELP, "voice commands:")
for usage, desc in grammar.command_menu():
self._console.line(f" {self._console.paint(f'{usage:<26}', 'brightblue')} {desc}")
return
if action.name == "customs":
names = self._contexts.names()
listed = ", ".join(names) if names else "(none — edit contexts.toml)"
self._console.emit(SYSTEM, f"contexts: {listed}")
return
if action.name == "version":
self._console.emit(SYSTEM, f"claudedo {__version__}")
self._console.emit(SYSTEM, "list -> " + (", ".join(sessions) if sessions else "(none running)"))
return
if action.name == "debug":
self._console.emit(VOICE, f'debug: "{action.arg}"', "yellow")
return
if action.name == "reload":
self._do_reload(str(action.arg))
return
if action.name == "system":
self._do_system(action.arg)
return
if action.name == "context":
name = str(action.arg[0])
if self._contexts.get(name) is None:
self._console.emit(VOICE, f"no context named '{name}' -> did nothing", "red")
self._earcons.play("no_match")
return
session, reason = target.resolve(parsed.one_shot, auto_target=cfg.auto_target)
if session is None:
self._console.emit(VOICE, f'heard "{transcript}" -> {reason} -> '
f'{self._describe(action)} did nothing', "red")
self._earcons.play("no_match")
return
if action.name == "context":
self._inject_context(session, action)
return
self._inject(session, action)
self._inject(session, transcript, reason, action)
def _inject(self, session: str, action) -> None:
def _inject(self, session: str, transcript: str, reason: str, action) -> None:
"""run a resolved command against `session`, tracking the uncommitted-input
buffer so backspace/erase delete only back to the last submit boundary.
the 'heard ...' echo is already printed by _handle and the [session] prefix
names the target, so these lines just report the keystrokes injected. the
earcon fires here (a real injection): submit chimes the submit tone, every
other injected command the accept tone.
"""
buffer so backspace/erase delete only back to the last submit boundary"""
heard = f'heard "{transcript}" ({reason})'
name = action.name
self._earcons.play("submit" if name == "submit" else "accept")
if name == "type":
text = str(action.arg)
inject.send_literal(session, text)
self._pending[session] = self._pending.get(session, 0) + len(text)
if self.config.type_autosend:
inject.send_named(session, keys.SUBMIT)
inject.send_named(session, inject.keys.SUBMIT)
self._pending[session] = 0
self._console.emit(session, f"typed {text!r}"
self._console.emit(session, f"{heard} -> typed {text!r}"
+ (" + send" if self.config.type_autosend else ""), "green")
return
if name == "space":
n = int(action.arg)
inject.perform(session, action)
self._pending[session] = self._pending.get(session, 0) + n
self._console.emit(session, f"space x{n}", "green")
self._console.emit(session, f"{heard} -> space x{n}", "green")
return
if name == "backspace":
n = int(action.arg)
have = self._pending.get(session, 0)
n = min(int(action.arg), have)
if n:
inject.perform(session, action)
self._pending[session] = max(0, self._pending.get(session, 0) - n)
self._console.emit(session, f"backspace x{n}", "green")
inject.perform(session, grammar.Action("backspace", n))
self._pending[session] = have - n
self._console.emit(session, f"{heard} -> backspace x{n}"
+ ("" if n == int(action.arg) else " (capped at boundary)"), "green")
return
if name == "erase":
n = self._pending.get(session, 0)
if n:
inject.perform(session, grammar.Action("erase", n))
self._pending[session] = 0
self._console.emit(session, f"erase x{n} (to last boundary)", "green")
self._console.emit(session, f"{heard} -> erase x{n} (to last boundary)", "green")
return
inject.perform(session, action)
if name == "submit":
self._pending[session] = 0
self._console.emit(session, f"injected {self._describe(action)}", "green")
def _inject_context(self, session: str, action) -> None:
"""inject a named context blurb ahead of the dictated instruction, then WAIT.
read-before-send: never auto-submits the user says ``send`` separately, and
claude's own permission prompt is the backstop for anything consequential.
routes through inject.send_literal (the same path as ``type``) and tracks the
uncommitted-input buffer so backspace/erase still bound to the last boundary.
assembly (config behavior.context_multiline): true -> blurb, a soft Shift+Enter
newline, then the instruction; false -> blurb + context_separator + instruction
flattened onto one line. a bare ``context <name>`` (no dictation) injects just
the blurb. the soft newline does not count toward the editable-char buffer.
"""
cfg = self.config
name, dictation = str(action.arg[0]), str(action.arg[1])
blurb = self._contexts.get(name) or ""
self._earcons.play("accept")
inject.send_literal(session, blurb)
chars = len(blurb)
if dictation:
if cfg.context_multiline:
inject.send_named(session, keys.NEWLINE)
else:
inject.send_literal(session, cfg.context_separator)
chars += len(cfg.context_separator)
inject.send_literal(session, dictation)
chars += len(dictation)
self._pending[session] = self._pending.get(session, 0) + chars
shape = "blurb" if not dictation else "blurb + dictation"
self._console.emit(session, f"context '{name}' -> {shape} (waiting for send)", "green")
def _do_reload(self, scope: str) -> None:
"""re-read config.toml and/or contexts.toml live without reinitializing the
loaded whisper model (the slow part). scope: all|config|contexts."""
did = []
if scope in ("all", "config"):
try:
new_cfg = load_config()
self._apply_config(new_cfg)
did.append("config")
except ConfigError as exc:
self._console.emit(SYSTEM, f"config reload failed (kept previous): {exc}", "red")
if scope in ("all", "contexts"):
self._load_contexts()
did.append("contexts")
what = " + ".join(did) if did else "nothing"
blue = self._console.paint("reloaded", "brightblue")
self._console.emit(SYSTEM, f"{blue} {what} ({len(self._contexts)} contexts)")
def _apply_config(self, new_cfg: Config) -> None:
"""swap in a reloaded config, preserving the runtime mode the user may have
toggled by voice and leaving the already-loaded transcriber untouched."""
new_cfg.mode = self.mode
self.config = new_cfg
self._earcons.update(new_cfg)
def _do_system(self, arg) -> None:
"""daemon-control namespace (never injects to claude): status / reload."""
if isinstance(arg, tuple) and arg and arg[0] == "reload":
self._do_reload(str(arg[1]))
return
if isinstance(arg, tuple) and arg and arg[0] == "unknown":
self._console.emit(SYSTEM, f"unknown system command '{arg[1]}'", "red")
return
if arg == "status":
cfg = self.config
sticky = target.read_active() or "(none)"
blue = self._console.paint("status", "brightblue")
self._console.emit(SYSTEM, f"{blue}: mode {self.mode}, sticky {sticky}, "
f"model {cfg.stt_model}, {len(self._contexts)} contexts")
return
if arg == "cleanup":
self._do_cleanup()
return
if arg == "confirm":
blue = self._console.paint("cleanup", "brightblue")
if self._cleanup_pending:
self._run_cleanup(blue)
else:
self._console.emit(SYSTEM, f"{blue}: nothing pending to confirm")
return
self._console.emit(SYSTEM, f"unknown system command {arg!r}", "red")
def _do_cleanup(self) -> None:
"""kill detached claude-* sessions (never attached), report killed + kept.
detached-only is the safety model: a misheard voice cleanup cannot nuke the
active (attached) session. with behavior.cleanup_confirm the daemon announces
the detached set and waits for a following ``confirm`` instead of killing now.
"""
blue = self._console.paint("cleanup", "brightblue")
if self.config.cleanup_confirm:
pending = [n for n, attached in target._claude_sessions() if not attached]
if not pending:
self._console.emit(SYSTEM, f"{blue}: nothing to clean (no detached sessions)")
return
self._cleanup_pending = True
self._console.emit(SYSTEM, f"{blue}: would kill {', '.join(sorted(pending))} "
f"— say 'confirm' to proceed")
return
self._run_cleanup(blue)
def _run_cleanup(self, blue: str) -> None:
killed, kept = target.cleanup_detached()
self._cleanup_pending = False
if not killed:
self._console.emit(SYSTEM, f"{blue}: nothing to clean (no detached sessions)")
return
msg = f"{blue}: killed {', '.join(killed)}"
if kept:
msg += f"; kept {', '.join(kept)} (attached)"
self._console.emit(SYSTEM, msg)
def _timing(self) -> str:
"""compact STT latency suffix for heard lines (transcribe ms on audio secs)"""
return f"({self._last_stt_ms:.0f}ms/{self._last_audio_s:.1f}s)"
self._console.emit(session, f"{heard} -> {self._describe(action)}", "green")
@staticmethod
def _describe(action) -> str:
if action.name == "context":
name, dictation = action.arg
tail = " + dictation" if dictation else ""
return f"CONTEXT('{name}'{tail})"
if action.name == "system":
arg = action.arg
if isinstance(arg, tuple):
return f"SYSTEM({arg[0]} {arg[1]})"
return f"SYSTEM({arg})"
if action.arg is None:
return action.name.upper()
return f"{action.name.upper()}({action.arg})"
@ -489,9 +270,8 @@ class Daemon:
target_now = target.read_active() or "(none — run cc / set <name>)"
self._console.emit(SYSTEM, f"claudedo {self.mode} mode — Ctrl-C to stop", "bold")
self._console.emit(SYSTEM, f"model {cfg.stt_model} ({cfg.stt_language}) · mic {dev} · "
f"target {target_now} · {len(self._contexts)} contexts")
wakes = ", ".join(self._console.paint(p, "magenta") for p in cfg.wake_phrases)
self._console.emit(SYSTEM, f"wake: {wakes}")
f"target {target_now}")
self._console.emit(SYSTEM, "wake: " + ", ".join(cfg.wake_phrases))
def _refresh_state(self) -> None:
write_state(os.getpid(), self.mode, target.read_active())
@ -506,23 +286,17 @@ class Daemon:
self._refresh_state()
self._print_startup()
while not self._stop:
if self._reload_pending:
self._reload_pending = False
self._do_reload("all")
audio_chunk = self._capture()
if self._stop:
break
if audio_chunk is None:
continue
t0 = time.monotonic()
transcript = self._transcriber.transcribe(audio_chunk, self.config.samplerate)
self._last_stt_ms = (time.monotonic() - t0) * 1000.0
self._last_audio_s = audio_chunk.size / self.config.samplerate
if not transcript:
continue
if self.mode == "listen" and not self._has_wake(transcript):
if self.config.print_heard:
self._console.emit(VOICE, f'heard (dropped) "{transcript}" {self._timing()}', "red")
self._console.emit(VOICE, f'heard (dropped) "{transcript}"', "red")
else:
self._console.emit(VOICE, "dropped: non-wake speech (not recorded)", "dim")
continue

View File

@ -50,25 +50,14 @@ _STICKY_VERBS = ("set", "sticky", "switch")
_ONESHOT_VERBS = ("target",)
_UNSET_VERBS = ("unset", "unsticky")
_LIST_VERBS = ("list", "sessions")
_COMMANDS_VERBS = ("commands", "help", "menu")
_CUSTOMS_VERBS = ("customs", "custom")
_VERSION_VERBS = ("version",)
_SELECT_VERBS = ("select", "option", "choose", "number")
_CONTEXT_VERBS = ("context", "prepare")
_RELOAD_VERBS = ("reload",)
_SYSTEM_VERBS = ("system",)
_RELOAD_SCOPES = ("config", "contexts")
_CLEANUP_VERBS = ("detached", "detach", "cleanup")
_CONFIRM_VERBS = ("confirm",)
# every command/synonym word, for biasing the STT toward the vocabulary we expect.
_COMMAND_WORDS = (
_YES_VERBS + _NO_VERBS + _APPROVE_VERBS + _DENY_VERBS + _SUBMIT_VERBS
+ _CANCEL_VERBS + _TYPE_VERBS + _BACKSPACE_VERBS + _SPACE_VERBS + _ADD_VERBS
+ _ERASE_VERBS + _DEBUG_VERBS + _MODE_VERBS + _STICKY_VERBS + _ONESHOT_VERBS + _UNSET_VERBS
+ _LIST_VERBS + _COMMANDS_VERBS + _CUSTOMS_VERBS + _VERSION_VERBS
+ _CONTEXT_VERBS + _RELOAD_VERBS + _SYSTEM_VERBS + _RELOAD_SCOPES + _CLEANUP_VERBS
+ _CONFIRM_VERBS + _SELECT_VERBS + ("ptt", "listen")
+ _LIST_VERBS + _SELECT_VERBS + ("ptt", "listen")
+ ("one", "two", "three", "four")
)
DEFAULT_FILLER = ("select", "use", "choose")
@ -79,12 +68,9 @@ class Action:
"""a matched command: a name plus an optional argument.
names: yes, no, select, approve, deny, submit, type, space, backspace, erase,
cancel, mode, set, unset, list, context, reload, system. arg carries the select
index (int), the literal text for ``type``, the count for ``space``/``backspace``
(int), the mode for ``mode``, the session short-name for ``set``, a
``(name, dictation)`` tuple for ``context``, the scope string for ``reload``
(``"all"``/``"config"``/``"contexts"``), or the system control for ``system``
(``"status"`` or a ``("reload", scope)`` tuple).
cancel, mode, set, unset, list. arg carries the select index (int), the literal
text for ``type``, the count for ``space``/``backspace`` (int), the mode for
``mode``, or the session short-name for ``set``.
"""
name: str
@ -97,14 +83,11 @@ class ParsedCommand:
one_shot is the session short-name from a leading ``target <name>`` (this command
only; does not change the sticky default), or None. action is the command to run,
or None if nothing matched after the wake phrase / one-shot / filler. wake is the
configured wake phrase that matched (e.g. "okay claude" for a heard "okay clouds"),
or None.
or None if nothing matched after the wake phrase / one-shot / filler.
"""
one_shot: str | None
action: Action | None
wake: str | None = None
def normalize(text: str) -> str:
@ -138,37 +121,6 @@ def initial_prompt(wake_phrases: list[str]) -> str:
return ", ".join(vocabulary(wake_phrases))
def command_menu() -> list[tuple[str, str]]:
"""the voice command menu as (usage, description) rows, for the `commands` cmd.
a small curated list keyed off the verb groups the speakable command surface,
NOT the cc shell kit.
"""
return [
("yes / no", "answer a yes/no prompt"),
("one..four", "pick numbered option 1-4"),
("approve / deny", "allow / deny a permission prompt"),
("send", "submit (Enter)"),
("cancel", "back out (Escape)"),
("type <text>", "insert literal text (no submit)"),
("space [n] / add a space", "insert n spaces"),
("backspace [n]", "delete n chars (to last submit)"),
("erase", "wipe the current input"),
("debug <text>", "echo to console (no inject)"),
("set <name>", "sticky target -> claude-<name>"),
("target <name> <cmd>", "one-shot to another session"),
("unset / list", "clear sticky / list sessions"),
("mode ptt|listen", "switch input mode"),
("context <name> <text>", "inject a contexts.toml blurb + dictation (no submit)"),
("reload", "re-read config.toml + contexts.toml live"),
("system status", "print mode/target/model/contexts to the console"),
("system reload [config|contexts]", "reload one or both config files"),
("cleanup / detached", "kill detached claude-* sessions (never attached)"),
("commands / customs", "this menu / list loaded contexts"),
("version", "print the claudedo version"),
]
def _ratio(a: str, b: str) -> float:
return SequenceMatcher(None, a, b).ratio()
@ -179,15 +131,13 @@ def _wake_variants(phrase: str) -> set[str]:
return {norm, norm.replace(" ", "")}
def strip_wake_match(transcript: str, wake_phrases: list[str], threshold: float,
require_wake: bool) -> tuple[str | None, str | None]:
"""return (command remainder, matched wake phrase).
def strip_wake(transcript: str, wake_phrases: list[str], threshold: float,
require_wake: bool) -> str | None:
"""return the command remainder after the wake phrase.
if ``require_wake`` (listen mode) and no wake phrase is found at the start, the
remainder is None so the daemon discards the utterance. if not required (ptt
mode), a leading wake phrase is stripped when present but its absence is fine.
the matched phrase is the configured wake phrase that best matched (e.g. "okay
claude" for a heard "okay clouds"), or None when none matched.
if ``require_wake`` (listen mode) and no wake phrase is found at the start,
return None so the daemon discards the utterance. if not required (ptt mode),
a leading wake phrase is stripped when present but its absence is fine.
matches leniently on a despaced prefix (whisper splits/joins the coined word
inconsistently) but always slices the remainder on a WORD boundary of the
@ -195,11 +145,10 @@ def strip_wake_match(transcript: str, wake_phrases: list[str], threshold: float,
"""
norm = normalize(transcript)
if not norm:
return (None, None) if require_wake else ("", None)
return None if require_wake else ""
words = norm.split(" ")
best_remainder: str | None = None
best_phrase: str | None = None
best_score = 0.0
for phrase in wake_phrases:
variants = _wake_variants(phrase)
@ -213,18 +162,10 @@ def strip_wake_match(transcript: str, wake_phrases: list[str], threshold: float,
if score >= threshold and score > best_score:
best_score = score
best_remainder = " ".join(words[take:]).strip()
best_phrase = phrase
if best_remainder is not None:
return best_remainder, best_phrase
return (None, None) if require_wake else (norm, None)
def strip_wake(transcript: str, wake_phrases: list[str], threshold: float,
require_wake: bool) -> str | None:
"""return the command remainder after the wake phrase (None if no wake in listen
mode). thin wrapper over strip_wake_match for callers that don't need the phrase"""
return strip_wake_match(transcript, wake_phrases, threshold, require_wake)[0]
return best_remainder
return None if require_wake else norm
def _fuzzy_in(token: str, options: tuple[str, ...], threshold: float) -> bool:
@ -247,41 +188,6 @@ def _leading_count(rest: list[str], default: int = 1) -> int:
return default
def _match_reload(rest: list[str], threshold: float, bare_default: str) -> Action | None:
"""map the tokens after a ``reload`` verb to a reload Action.
bare reload -> the caller's default scope ("all" for the bare command, the
``("reload", scope)`` tuple for ``system reload``). a trailing ``config``/
``contexts`` narrows the scope; an unrecognized scope falls back to the default.
"""
scope = bare_default
if rest and _fuzzy_in(rest[0], ("config", "configuration"), threshold):
scope = "config"
elif rest and _fuzzy_in(rest[0], ("contexts", "context"), threshold):
scope = "contexts"
return Action("reload", scope)
def _match_system(rest: list[str], threshold: float) -> Action | None:
"""map the tokens after the reserved ``system`` word to a daemon-control Action.
the ``system`` namespace never injects into claude. v0.2.0 scope: ``status`` and
``reload [config|contexts]``. unknown controls return a ``system`` Action with an
``("unknown", word)`` arg so the daemon can report it rather than silently drop.
"""
if not rest:
return Action("system", "status")
head = rest[0]
if _fuzzy_in(head, _RELOAD_VERBS, threshold):
inner = _match_reload(rest[1:], threshold, bare_default="all")
return Action("system", ("reload", inner.arg))
if _fuzzy_in(head, ("status", "state"), threshold):
return Action("system", "status")
if _fuzzy_in(head, _CLEANUP_VERBS, threshold):
return Action("system", "cleanup")
return Action("system", ("unknown", head))
def match_command(remainder: str, threshold: float) -> Action | None:
"""map a normalized command remainder to an Action, or None if unrecognized.
@ -296,19 +202,6 @@ def match_command(remainder: str, threshold: float) -> Action | None:
head = tokens[0]
rest = tokens[1:]
if _fuzzy_in(head, _SYSTEM_VERBS, threshold):
return _match_system(rest, threshold)
if _fuzzy_in(head, _CLEANUP_VERBS, threshold):
return Action("system", "cleanup")
if _fuzzy_in(head, _CONFIRM_VERBS, threshold):
return Action("system", "confirm")
if _fuzzy_in(head, _RELOAD_VERBS, threshold):
return _match_reload(rest, threshold, bare_default="all")
if _fuzzy_in(head, _CONTEXT_VERBS, threshold) and rest:
name = rest[0]
dictation = " ".join(rest[1:]).strip()
return Action("context", (name, dictation))
if head in _INDEX_WORDS:
return Action("select", _INDEX_WORDS[head])
@ -359,14 +252,8 @@ def match_command(remainder: str, threshold: float) -> Action | None:
return Action("set", name) if name else None
if _fuzzy_in(head, _UNSET_VERBS, threshold) and not rest:
return Action("unset")
if _fuzzy_in(head, _CUSTOMS_VERBS, threshold):
return Action("customs")
if _fuzzy_in(head, _COMMANDS_VERBS, threshold):
return Action("commands")
if _fuzzy_in(head, _LIST_VERBS, threshold):
return Action("list")
if _fuzzy_in(head, _VERSION_VERBS, threshold):
return Action("version")
return None
@ -396,7 +283,7 @@ def parse(transcript: str, wake_phrases: list[str], wake_threshold: float,
ParsedCommand with action=None means a wake phrase was present but no command
matched.
"""
remainder, wake = strip_wake_match(transcript, wake_phrases, wake_threshold, require_wake)
remainder = strip_wake(transcript, wake_phrases, wake_threshold, require_wake)
if remainder is None:
return None
@ -408,4 +295,4 @@ def parse(transcript: str, wake_phrases: list[str], wake_threshold: float,
tokens = _strip_filler(tokens, filler, command_threshold)
action = match_command(" ".join(tokens), command_threshold)
return ParsedCommand(one_shot=one_shot, action=action, wake=wake)
return ParsedCommand(one_shot=one_shot, action=action)

View File

@ -37,13 +37,6 @@ DENY = ["3"]
SUBMIT = ["Enter"]
CANCEL = ["Escape"]
# NEWLINE is a soft newline inside the input box that does NOT submit — Shift+Enter,
# which tmux names ``S-Enter`` (requires the extended-keys / xterm extkeys tmux
# settings install.sh appends). used to separate a context blurb from the dictated
# instruction in multiline assembly; if it proves flaky the daemon flattens to one
# line with a separator instead (behavior.context_multiline = false).
NEWLINE = ["S-Enter"]
# BACKSPACE deletes one char left; SPACE inserts one literal space. both are emitted
# repeatedly for `backspace <n>` / `space <n>` and for `erase` (n = the daemon's
# tracked uncommitted-input count). BSpace is tmux's name for the backspace key.

View File

@ -1,91 +0,0 @@
"""earcons — short confirmation tones on daemon events, the eyes-free feedback layer.
the single place that maps an event name to its tone file and the per-event enable
flag. additive to the console feed (it does not replace the printed lines): at the desk
mute tones and read; eyes-free, hear them. playback goes through audio_out (paplay-first,
fire-and-forget) so a dead speaker never blocks or breaks a command.
events:
wake a wake phrase was recognized (off by default a blip right before you
speak the command can bleed into its capture; keep it off unless wanted)
accept a command was recognized/injected
no_match nothing matched, or the target was missing (did nothing)
submit a send/submit was injected
tone files live in the packaged sounds/ dir; a per-event config override may point at a
user file instead. a missing file is swallowed by audio_out (logged once), never raised.
"""
from __future__ import annotations
import logging
from pathlib import Path
from . import audio_out
from .config import Config
log = logging.getLogger(__name__)
_SOUNDS_DIR = Path(__file__).resolve().parent / "sounds"
_EVENT_FILES = {
"wake": "wake.wav",
"accept": "accepted.wav",
"no_match": "no_match.wav",
"submit": "sent.wav",
}
_EVENT_FLAGS = {
"wake": "on_wake",
"accept": "on_accept",
"no_match": "on_no_match",
"submit": "on_submit",
}
class Earcons:
"""resolves daemon events to tones and plays them per the [sound] config"""
def __init__(self, config: Config) -> None:
self._apply(config)
def update(self, config: Config) -> None:
"""re-read the [sound] config after a live reload"""
self._apply(config)
def _apply(self, config: Config) -> None:
self.enabled = config.sound_enabled
self.volume = config.sound_volume
self._flags = {
"wake": config.sound_on_wake,
"accept": config.sound_on_accept,
"no_match": config.sound_on_no_match,
"submit": config.sound_on_submit,
}
self._overrides = dict(config.sound_files)
def _resolve(self, event: str) -> Path | None:
override = self._overrides.get(event) or self._overrides.get(_EVENT_FLAGS[event])
if override:
return Path(override).expanduser()
name = _EVENT_FILES.get(event)
return _SOUNDS_DIR / name if name else None
def play(self, event: str) -> None:
"""play the tone for an event if enabled (master + per-event). fire-and-forget;
unknown/disabled events and missing files are silently no-ops."""
if not self.enabled or not self._flags.get(event, False):
return
path = self._resolve(event)
if path is None:
return
audio_out.play(path, volume=self.volume, blocking=False)
def tone_path(self, event: str) -> Path | None:
"""the resolved tone path for an event (for test-tone), ignoring enable flags"""
return self._resolve(event)
def event_names() -> list[str]:
"""the earcon event names in a stable order (for test-tone iteration)"""
return ["wake", "accept", "no_match", "submit"]

View File

@ -1 +0,0 @@
"""earcon tone assets (committed .wav files) + their generator (generate.py)"""

Binary file not shown.

View File

@ -1,75 +0,0 @@
"""synthetic-beep FALLBACK generator for the earcon .wav tones.
WARNING: the shipped tones in this directory are now CUSTOM CURATED recordings
(edge-trimmed + loudness-normalized to ~-16 dB RMS with a -1 dBTP ceiling), NOT this
script's output. running this script OVERWRITES those real tones with plain synthetic
beeps only do so if you deliberately want to fall back to generated placeholders. it
is kept as a bootstrap fallback so the package can always self-generate a tone set (the
"a missing tone must never break a command" guarantee), not as the source of the
committed wavs.
run ``python -m claudedo.sounds.generate`` (or ``python generate.py`` from this dir) to
write placeholder beeps. each is a short, quiet, fade-enveloped sine/triangle at a
distinct pitch so the four events are ear-distinguishable:
wake soft single mid blip (off by default; least intrusive)
accepted bright single high note (heard you, sent it)
no_match low two-note falling buzz (heard you, but nothing matched / error)
sent two-note rising chime (submitted to claude)
kept SHORT (<300ms) and quiet (amplitude 0.4) confirmations, not alarms.
"""
from __future__ import annotations
import struct
import wave
from pathlib import Path
SAMPLE_RATE = 44100
AMPLITUDE = 0.4
HERE = Path(__file__).resolve().parent
def _tone(freq: float, dur: float) -> list[float]:
import math
n = int(SAMPLE_RATE * dur)
fade = max(1, int(SAMPLE_RATE * 0.01))
out = []
for i in range(n):
env = min(1.0, i / fade, (n - i) / fade)
out.append(math.sin(2.0 * math.pi * freq * (i / SAMPLE_RATE)) * env * AMPLITUDE)
return out
def _silence(dur: float) -> list[float]:
return [0.0] * int(SAMPLE_RATE * dur)
def _write(name: str, samples: list[float]) -> Path:
path = HERE / name
with wave.open(str(path), "wb") as wf:
wf.setnchannels(1)
wf.setsampwidth(2)
wf.setframerate(SAMPLE_RATE)
clipped = (max(-1.0, min(1.0, s)) for s in samples)
wf.writeframes(b"".join(struct.pack("<h", int(s * 32767)) for s in clipped))
return path
def generate() -> list[Path]:
"""(re)write all earcon wavs; return the written paths"""
tones = {
"wake.wav": _tone(660.0, 0.12),
"accepted.wav": _tone(988.0, 0.14),
"no_match.wav": _tone(330.0, 0.10) + _silence(0.03) + _tone(247.0, 0.12),
"sent.wav": _tone(784.0, 0.10) + _silence(0.02) + _tone(1175.0, 0.12),
}
return [_write(name, samples) for name, samples in tones.items()]
if __name__ == "__main__":
for p in generate():
print(f"wrote {p}")

Binary file not shown.

Binary file not shown.

Binary file not shown.

View File

@ -75,54 +75,14 @@ def session_exists(name: str) -> bool:
def list_sessions() -> list[str]:
"""return the names of all running claude-* tmux sessions (sorted)"""
return sorted(name for name, _attached in _claude_sessions())
def _claude_sessions() -> list[tuple[str, bool]]:
"""the single tmux query for claude-* sessions: (name, attached) pairs.
one source of truth for session enumeration list_sessions() and the detached
cleanup both build on this. attached is True when at least one client is attached
(tmux #{session_attached} > 0). returns [] if tmux isn't reachable.
"""
result = subprocess.run(
["tmux", "list-sessions", "-F", "#{session_name} #{session_attached}"],
["tmux", "list-sessions", "-F", "#{session_name}"],
stdout=subprocess.PIPE, stderr=subprocess.DEVNULL,
)
if result.returncode != 0:
return []
out: list[tuple[str, bool]] = []
for line in result.stdout.decode("utf-8", "replace").splitlines():
parts = line.rsplit(" ", 1)
if len(parts) != 2:
continue
name, attached = parts
if name.startswith(SESSION_PREFIX):
out.append((name, attached.strip() != "0"))
return out
def cleanup_detached() -> tuple[list[str], list[str]]:
"""kill every DETACHED claude-* session, never an attached one. returns the
(killed, kept_attached) name lists (both sorted) for reporting.
detached-only is the safety model: a misheard voice ``cleanup`` cannot nuke the
active session, which is attached. the kill-including-attached path stays the shell
``cckl`` (deliberate, typed).
"""
killed: list[str] = []
kept: list[str] = []
for name, attached in _claude_sessions():
if attached:
kept.append(name)
continue
result = subprocess.run(
["tmux", "kill-session", "-t", name],
stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
)
if result.returncode == 0:
killed.append(name)
return sorted(killed), sorted(kept)
names = result.stdout.decode("utf-8", "replace").splitlines()
return sorted(n for n in names if n.startswith(SESSION_PREFIX))
def resolve(one_shot: str | None = None, auto_target: bool = False) -> tuple[str | None, str]: