Compare commits

..

8 Commits

Author SHA1 Message Date
f177b46a4b docs: fix stale README config defaults (wake 0.65, vad 700/15)
the lower [vad]/threshold bullets still said 0.6 / 800ms / max 10; sync to the real
defaults (wake_fuzzy_threshold 0.65, silence_ms 700, max_seconds 15). CLAUDE.md and
COMPACT.md (git-ignored) corrected on disk too (model small.en, same numbers).

Signed-off-by: disqualifier <dev@disqualifier.me>
2026-06-26 04:07:27 -04:00
252385fb67 feat: highlight wake phrases in magenta (startup banner + wake note)
add a magenta color; paint wake phrases magenta in the startup 'wake:' list and in
the loose-match '(wake: <phrase>)' note (the rest of that green heard line stays
green around the magenta phrase). makes the wake vocabulary visually distinct from
green heard-text and brightblue command words.

Signed-off-by: disqualifier <dev@disqualifier.me>
2026-06-26 04:02:15 -04:00
97591eb24d feat: version voice command + matched-wake note on loose matches
add 'version' (prints claudedo <ver> to console; in vocab + menu). when a command's
wake phrase matched loosely (the transcript didn't contain it literally), the green
heard line appends '(wake: <phrase>)' so e.g. 'okay clouds' -> 'okay claude' is
visible. grammar.parse() now returns the matched phrase on ParsedCommand.wake (via a
new strip_wake_match; strip_wake kept as a thin wrapper).

Signed-off-by: disqualifier <dev@disqualifier.me>
2026-06-26 03:59:52 -04:00
5f05a01423 feat: v0.1.4 — HELP menu, 15s cap, wake 0.65, small.en default + docs sync
commands menu now prints under a single [HELP] header with bare indented rows
(brightblue usage) instead of 15 repeated [SYSTEM] tags. raise [vad].max_seconds
10 -> 15 for long dictation. wake_fuzzy_threshold 0.6 -> 0.65 (slightly fewer false
wakes; note short spellings 'ok/okay claude' still admit some). carries the prior
small.en default, [vad].silence_ms 700, lighter (brightblue) command color, lean
injection lines, .en model variants in the validator. README/CLAUDE.md synced.

Signed-off-by: disqualifier <dev@disqualifier.me>
2026-06-26 03:52:19 -04:00
e84ef91e7b tune: small.en default, vad 700ms, lighter command color, lean inject lines
default model -> small.en (english-only small; better english accuracy, same ~1s
latency; .en variants added to the validator). raise [vad].silence_ms 500 -> 700
(500 cut off too early). command words now brightblue (lighter/cyan-ish) instead of
dark blue. drop the redundant target from injection lines — the [session] prefix
already names it, so e.g. '[claude-testing] typed ...' not '... sticky claude-testing
-> typed ...'.

Signed-off-by: disqualifier <dev@disqualifier.me>
2026-06-26 03:41:46 -04:00
2cbbabfaa1 feat: unbounded backspace + blue command words in console
backspace now sends exactly n BSpace with no boundary cap (buffer floored at 0 so a
later erase stays correct); erase remains bound to the uncommitted-input buffer. add
a blue color and Console.paint(); paint the command word blue on SYSTEM lines
(list/set/unset/mode -> ...) so the action stands out.

Signed-off-by: disqualifier <dev@disqualifier.me>
2026-06-26 03:11:42 -04:00
4357b14fad perf: default back to small model; show per-command STT latency
medium added ~3s/command lag (measured ~1.2s small vs ~3s medium on a 7950X3D), so
default model -> small; lean on initial_prompt + lenient wake for the coined word.
every heard line now shows STT latency as (<ms>/<audio>s) — always on, not just
print_heard — so a model change's cost is visible. snappier vad (silence_ms 500)
from the prior commit stands.

Signed-off-by: disqualifier <dev@disqualifier.me>
2026-06-26 02:57:52 -04:00
8e20b7eb0b feat: commands/customs menu, green heard-echo, snappier VAD
add voice 'commands' (alias help/menu) printing the command menu and 'customs'
(alias custom) stubbed for v0.2.0. echo every recognized command as a green
'heard "..." -> ACTION' line before acting, so you see what landed; the result line
then reports target + keystrokes. lower [vad].silence_ms default 800 -> 500 for a
snappier endpoint after you stop talking.

Signed-off-by: disqualifier <dev@disqualifier.me>
2026-06-26 02:32:28 -04:00
8 changed files with 177 additions and 59 deletions

View File

@ -107,7 +107,9 @@ Wake phrases (listen mode), fuzzy-matched. The default list is **"claudedo"**,
no token for the coined word "claudedo" and renders it as real words ("claude do"), no token for the coined word "claudedo" and renders it as real words ("claude do"),
so that spelling is listed explicitly. Matching is lenient (case/space-insensitive). so that spelling is listed explicitly. Matching is lenient (case/space-insensitive).
Add the spellings you actually see (turn on `print_heard` to find them). In PTT mode Add the spellings you actually see (turn on `print_heard` to find them). In PTT mode
the wake phrase is optional. the wake phrase is optional. When a command's wake phrase matched loosely (e.g. you
said "okay clouds"), the heard line notes which phrase it assumed —
`heard "okay clouds list" -> LIST (wake: okay claude)`.
| Say | Does | | Say | Does |
|---|---| |---|---|
@ -125,6 +127,9 @@ the wake phrase is optional.
| `target <name> <command>` | **one-shot** override: run that command on `claude-<name>` for this utterance only; sticky default unchanged | | `target <name> <command>` | **one-shot** override: run that command on `claude-<name>` for this utterance only; sticky default unchanged |
| `unset` (alias `unsticky`) | clear the sticky target | | `unset` (alias `unsticky`) | clear the sticky target |
| `list` | list running `claude-*` sessions to the daemon console | | `list` | list running `claude-*` sessions to the daemon console |
| `commands` (alias `help`/`menu`) | print the voice-command menu to the console |
| `customs` (alias `custom`) | custom commands — arriving in v0.2.0 (stub for now) |
| `version` | print the claudedo version to the console |
| `cancel` / `escape` | back out of a prompt | | `cancel` / `escape` | back out of a prompt |
Optional filler (`select` / `use` / `choose`) may precede any command and is ignored: Optional filler (`select` / `use` / `choose`) may precede any command and is ignored:
@ -187,22 +192,27 @@ If Claude Code changes its prompt UI, re-confirm against a live session and upda
Everything tunable lives in [`config.toml`](config.toml): wake phrases, mode + PTT Everything tunable lives in [`config.toml`](config.toml): wake phrases, mode + PTT
key, Whisper model/language/device, `[vad]` endpointing, and `[behavior]` key, Whisper model/language/device, `[vad]` endpointing, and `[behavior]`
(`type_autosend`, fuzzy thresholds, `filler_words`, `auto_target`, `print_heard`). (`type_autosend`, fuzzy thresholds, `filler_words`, `auto_target`, `print_heard`).
The default model is **`medium`** (best accuracy for the coined wake word on a strong The default model is **`small.en`** (the English-only small model — ~1s/command on a
CPU); `small` is faster/less accurate, `large-v3` most accurate. `claudedo -c <path> strong CPU, more accurate on English than multilingual `small` at the same speed);
...` points at a specific config; otherwise it searches `$CLAUDEDO_CONFIG`, `medium`/`medium.en` are more accurate but ~3× slower (noticeable lag), `base.en` is
`~/.config/claudedo/config.toml`, then `./config.toml`. snappier/less accurate, `large-v3` most accurate/slowest. Every `heard` line shows the
STT latency as `(<ms>/<audio>s)` so you can see what a model change costs. VAD
endpointing ends a capture after `[vad].silence_ms` (700) of trailing silence, capped
at `max_seconds` (15). `claudedo -c <path> ...` points at a specific config; otherwise
it searches
`$CLAUDEDO_CONFIG`, `~/.config/claudedo/config.toml`, then `./config.toml`.
- **STT biasing.** The transcriber is seeded with an `initial_prompt` built from the - **STT biasing.** The transcriber is seeded with an `initial_prompt` built from the
configured wake phrases + command vocabulary (one source — `grammar.vocabulary()`), configured wake phrases + command vocabulary (one source — `grammar.vocabulary()`),
so Whisper is conditioned to expect "claudedo" and the command words. so Whisper is conditioned to expect "claudedo" and the command words.
- **Split fuzzy thresholds.** `wake_fuzzy_threshold` (default `0.6`, lenient) vs - **Split fuzzy thresholds.** `wake_fuzzy_threshold` (default `0.65`, lenient) vs
`command_fuzzy_threshold` (default `0.8`, tight). The asymmetry is deliberate: a `command_fuzzy_threshold` (default `0.8`, tight). The asymmetry is deliberate: a
false *wake* is cheap (it wakes, finds no command, does nothing), but a false false *wake* is cheap (it wakes, finds no command, does nothing), but a false
*command* fires the wrong action. Prefer expanding command synonyms over loosening *command* fires the wrong action. Prefer expanding command synonyms over loosening
the command threshold. the command threshold.
- **`[vad]` endpointing.** Capture starts on speech and ends after `silence_ms` - **`[vad]` endpointing.** Capture starts on speech and ends after `silence_ms`
(default 800) of trailing silence — Alexa-style record-until-pause — capped at (default 700) of trailing silence — Alexa-style record-until-pause — capped at
`max_seconds` (default 10). The pause both ends a command and separates it from `max_seconds` (default 15). The pause both ends a command and separates it from
following chatter (the chatter is a separate capture the wake gate discards). following chatter (the chatter is a separate capture the wake gate discards).
- **`auto_target`** (default `false`): with no sticky target and one session running, - **`auto_target`** (default `false`): with no sticky target and one session running,
`false` does nothing and asks you to `set`; `true` auto-uses that session. `false` does nothing and asks you to `set`; `true` auto-uses that session.

View File

@ -21,10 +21,12 @@ mode = "listen"
ptt_key = "space" ptt_key = "space"
[stt] [stt]
# faster-whisper model size. "medium" is the default — biggest accuracy gain for the # faster-whisper model size. "small.en" is the default — the English-only small model
# coined wake word ("claudedo" / "claude do") and fine on a strong cpu. "small" is # (~1s/command on a strong cpu, more accurate on english than multilingual "small" at
# faster but less accurate; "large-v3" is most accurate if medium still struggles. # the same speed). "medium"/"medium.en" are more accurate but ~3x slower (noticeable
model = "medium" # lag); "large-v3" is most accurate and slowest. drop to "base.en" for max snappiness
# (less accurate). bump only if recognition is poor.
model = "small.en"
language = "en" language = "en"
# mic device: "auto", or a sounddevice device index (integer) / substring of a # mic device: "auto", or a sounddevice device index (integer) / substring of a
# device name. run `claudedo test-audio` to list devices. # device name. run `claudedo test-audio` to list devices.
@ -46,9 +48,10 @@ min_utterance = 0.3
# onset and ends after this much trailing silence — the natural end of an utterance. # onset and ends after this much trailing silence — the natural end of an utterance.
# a real pause both ends the command AND separates it from following chatter (the # a real pause both ends the command AND separates it from following chatter (the
# chatter becomes a separate capture that the wake gate then discards). # chatter becomes a separate capture that the wake gate then discards).
silence_ms = 800 silence_ms = 700
# hard cap so continuous noise can't record forever. # hard cap so continuous noise can't record forever (also the ceiling for a long
max_seconds = 10.0 # dictated `type` phrase).
max_seconds = 15.0
[behavior] [behavior]
# dictation never auto-submits: "type <phrase>" inserts literal text only; you say # dictation never auto-submits: "type <phrase>" inserts literal text only; you say
@ -58,7 +61,7 @@ type_autosend = false
# wakes, finds no command, does nothing), so wake is lenient; a false COMMAND fires # wakes, finds no command, does nothing), so wake is lenient; a false COMMAND fires
# the WRONG action, so commands stay tight. lower = more lenient = more matches. # the WRONG action, so commands stay tight. lower = more lenient = more matches.
# prefer expanding command synonyms over loosening command_fuzzy_threshold. # prefer expanding command synonyms over loosening command_fuzzy_threshold.
wake_fuzzy_threshold = 0.6 wake_fuzzy_threshold = 0.65
command_fuzzy_threshold = 0.8 command_fuzzy_threshold = 0.8
# optional filler words that may precede a command and are ignored for matching: # optional filler words that may precede a command and are ignored for matching:
# "select yes" / "use yes" behave like "yes". (a filler word followed by a digit is # "select yes" / "use yes" behave like "yes". (a filler word followed by a digit is

View File

@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
[project] [project]
name = "claudedo" name = "claudedo"
version = "0.1.3" version = "0.1.4"
description = "voice-control daemon for claude code (local STT -> tmux send-keys)" description = "voice-control daemon for claude code (local STT -> tmux send-keys)"
readme = "README.md" readme = "README.md"
requires-python = ">=3.10" requires-python = ">=3.10"

View File

@ -1,3 +1,3 @@
"""claudedo — voice-control daemon for claude code (local STT -> tmux send-keys)""" """claudedo — voice-control daemon for claude code (local STT -> tmux send-keys)"""
__version__ = "0.1.3" __version__ = "0.1.4"

View File

@ -17,7 +17,10 @@ except ModuleNotFoundError:
log = logging.getLogger(__name__) log = logging.getLogger(__name__)
_VALID_MODES = ("listen", "ptt") _VALID_MODES = ("listen", "ptt")
_VALID_MODELS = ("tiny", "base", "small", "medium", "large-v2", "large-v3") _VALID_MODELS = (
"tiny", "base", "small", "medium", "large-v1", "large-v2", "large-v3",
"tiny.en", "base.en", "small.en", "medium.en",
)
DEFAULT_CONFIG_PATHS = ( DEFAULT_CONFIG_PATHS = (
Path(os.environ.get("CLAUDEDO_CONFIG", "")) if os.environ.get("CLAUDEDO_CONFIG") else None, Path(os.environ.get("CLAUDEDO_CONFIG", "")) if os.environ.get("CLAUDEDO_CONFIG") else None,
@ -99,7 +102,7 @@ def load_config(explicit: str | os.PathLike | None = None) -> Config:
if mode not in _VALID_MODES: if mode not in _VALID_MODES:
raise ConfigError(f"[input].mode must be one of {_VALID_MODES}, got {mode!r}") raise ConfigError(f"[input].mode must be one of {_VALID_MODES}, got {mode!r}")
model = _require(raw, "stt", "model", (str,), "medium") model = _require(raw, "stt", "model", (str,), "small.en")
if model not in _VALID_MODELS: if model not in _VALID_MODELS:
log.warning("unknown stt model %r — passing through to faster-whisper", model) log.warning("unknown stt model %r — passing through to faster-whisper", model)
@ -114,11 +117,11 @@ def load_config(explicit: str | os.PathLike | None = None) -> Config:
samplerate=int(_require(raw, "audio", "samplerate", (int,), 16000)), samplerate=int(_require(raw, "audio", "samplerate", (int,), 16000)),
channels=int(_require(raw, "audio", "channels", (int,), 1)), channels=int(_require(raw, "audio", "channels", (int,), 1)),
silence_threshold=float(_require(raw, "audio", "silence_threshold", (int, float), 0.012)), silence_threshold=float(_require(raw, "audio", "silence_threshold", (int, float), 0.012)),
vad_silence_ms=int(_require(raw, "vad", "silence_ms", (int,), 800)), vad_silence_ms=int(_require(raw, "vad", "silence_ms", (int,), 700)),
vad_max_seconds=float(_require(raw, "vad", "max_seconds", (int, float), 10.0)), vad_max_seconds=float(_require(raw, "vad", "max_seconds", (int, float), 15.0)),
min_utterance=float(_require(raw, "audio", "min_utterance", (int, float), 0.3)), min_utterance=float(_require(raw, "audio", "min_utterance", (int, float), 0.3)),
type_autosend=bool(_require(raw, "behavior", "type_autosend", (bool,), False)), type_autosend=bool(_require(raw, "behavior", "type_autosend", (bool,), False)),
wake_fuzzy_threshold=float(_require(raw, "behavior", "wake_fuzzy_threshold", (int, float), 0.6)), wake_fuzzy_threshold=float(_require(raw, "behavior", "wake_fuzzy_threshold", (int, float), 0.65)),
command_fuzzy_threshold=float(_require(raw, "behavior", "command_fuzzy_threshold", command_fuzzy_threshold=float(_require(raw, "behavior", "command_fuzzy_threshold",
(int, float), 0.8)), (int, float), 0.8)),
filler_words=tuple(_require(raw, "behavior", "filler_words", (list,), filler_words=tuple(_require(raw, "behavior", "filler_words", (list,),

View File

@ -18,12 +18,16 @@ _COLORS = {
"red": "\033[31m", "red": "\033[31m",
"yellow": "\033[33m", "yellow": "\033[33m",
"cyan": "\033[36m", "cyan": "\033[36m",
"blue": "\033[34m",
"brightblue": "\033[94m",
"magenta": "\033[35m",
"dim": "\033[2m", "dim": "\033[2m",
"bold": "\033[1m", "bold": "\033[1m",
} }
SYSTEM = "SYSTEM" SYSTEM = "SYSTEM"
VOICE = "VOICE" VOICE = "VOICE"
HELP = "HELP"
class Console: class Console:
@ -45,7 +49,17 @@ class Console:
return text return text
return f"{_COLORS[color]}{text}{RESET}" return f"{_COLORS[color]}{text}{RESET}"
def paint(self, text: str, color: str | None) -> str:
"""public colorizer for pre-coloring a fragment of a message (e.g. a command
word) before passing it to emit() with color=None"""
return self._paint(text, color)
def emit(self, prefix: str, message: str, color: str | None = None) -> None: def emit(self, prefix: str, message: str, color: str | None = None) -> None:
"""print one line: ``HH:MM:SS [prefix] message`` (message optionally colored)""" """print one line: ``HH:MM:SS [prefix] message`` (message optionally colored)"""
line = f"{self._stamp()} {self._paint(f'[{prefix}]', 'dim')} {self._paint(message, color)}" line = f"{self._stamp()} {self._paint(f'[{prefix}]', 'dim')} {self._paint(message, color)}"
print(line, file=self.stream, flush=True) print(line, file=self.stream, flush=True)
def line(self, message: str, color: str | None = None) -> None:
"""print a bare continuation line (no timestamp/prefix) — for multi-row blocks
like the help menu, indented under a preceding header"""
print(self._paint(message, color), file=self.stream, flush=True)

View File

@ -16,9 +16,9 @@ import sys
import time import time
from pathlib import Path from pathlib import Path
from . import audio, grammar, inject, target from . import __version__, audio, grammar, inject, target
from .config import Config from .config import Config
from .console import SYSTEM, VOICE, Console from .console import HELP, SYSTEM, VOICE, Console
from .stt import Transcriber from .stt import Transcriber
log = logging.getLogger(__name__) log = logging.getLogger(__name__)
@ -117,6 +117,8 @@ class Daemon:
self._ptt = _PTTKey() self._ptt = _PTTKey()
self._pending: dict[str, int] = {} self._pending: dict[str, int] = {}
self._console = Console() self._console = Console()
self._last_stt_ms = 0.0
self._last_audio_s = 0.0
def _install_signals(self) -> None: def _install_signals(self) -> None:
signal.signal(signal.SIGTERM, self._on_signal) signal.signal(signal.SIGTERM, self._on_signal)
@ -167,31 +169,58 @@ class Daemon:
parsed = grammar.parse(transcript, cfg.wake_phrases, cfg.wake_fuzzy_threshold, parsed = grammar.parse(transcript, cfg.wake_phrases, cfg.wake_fuzzy_threshold,
cfg.command_fuzzy_threshold, require_wake, filler=cfg.filler_words) cfg.command_fuzzy_threshold, require_wake, filler=cfg.filler_words)
if parsed is None or parsed.action is None: if parsed is None or parsed.action is None:
self._console.emit(VOICE, f'heard "{transcript}" -> no command matched', "yellow") self._console.emit(VOICE, f'heard "{transcript}" -> no command matched {self._timing()}',
"yellow")
return return
action = parsed.action action = parsed.action
# a command was recognized — echo what we heard (green) before acting. note the
# matched wake phrase (magenta) when the transcript didn't literally contain it
# (so a loose match like "okay clouds" -> "okay claude" is visible).
head = self._console.paint(f'heard "{transcript}" -> {self._describe(action)}', "green")
note = ""
if parsed.wake and parsed.wake.replace(" ", "") not in transcript.lower().replace(" ", ""):
note = (self._console.paint(" (wake: ", "green")
+ self._console.paint(parsed.wake, "magenta")
+ self._console.paint(")", "green"))
tail = self._console.paint(f" {self._timing()}", "green")
self._console.emit(VOICE, f"{head}{note}{tail}")
def blue(s):
return self._console.paint(s, "brightblue")
if action.name == "mode": if action.name == "mode":
new_mode = str(action.arg) new_mode = str(action.arg)
if new_mode != self.mode: if new_mode != self.mode:
self.mode = new_mode self.mode = new_mode
self._console.emit(SYSTEM, f"mode -> {new_mode}", "cyan") self._console.emit(SYSTEM, f"{blue('mode')} -> {new_mode}")
self._refresh_state() self._refresh_state()
return return
if action.name == "set": if action.name == "set":
session = target.set_target(str(action.arg)) session = target.set_target(str(action.arg))
self._pending.pop(session, None) self._pending.pop(session, None)
self._console.emit(SYSTEM, f"set sticky -> {session}", "cyan") self._console.emit(SYSTEM, f"{blue('set sticky')} -> {session}")
self._refresh_state() self._refresh_state()
return return
if action.name == "unset": if action.name == "unset":
target.unset_target() target.unset_target()
self._console.emit(SYSTEM, "unset (cleared)", "cyan") self._console.emit(SYSTEM, f"{blue('unset')} (cleared)")
self._refresh_state() self._refresh_state()
return return
if action.name == "list": if action.name == "list":
sessions = target.list_sessions() sessions = target.list_sessions()
self._console.emit(SYSTEM, "list -> " + (", ".join(sessions) if sessions else "(none running)")) self._console.emit(SYSTEM, f"{blue('list')} -> "
+ (", ".join(sessions) if sessions else "(none running)"))
return
if action.name == "commands":
self._console.emit(HELP, "voice commands:")
for usage, desc in grammar.command_menu():
self._console.line(f" {self._console.paint(f'{usage:<26}', 'brightblue')} {desc}")
return
if action.name == "customs":
self._console.emit(SYSTEM, "custom commands arrive in v0.2.0 (contexts.toml)")
return
if action.name == "version":
self._console.emit(SYSTEM, f"claudedo {__version__}")
return return
if action.name == "debug": if action.name == "debug":
self._console.emit(VOICE, f'debug: "{action.arg}"', "yellow") self._console.emit(VOICE, f'debug: "{action.arg}"', "yellow")
@ -202,12 +231,15 @@ class Daemon:
self._console.emit(VOICE, f'heard "{transcript}" -> {reason} -> ' self._console.emit(VOICE, f'heard "{transcript}" -> {reason} -> '
f'{self._describe(action)} did nothing', "red") f'{self._describe(action)} did nothing', "red")
return return
self._inject(session, transcript, reason, action) self._inject(session, action)
def _inject(self, session: str, transcript: str, reason: str, action) -> None: def _inject(self, session: str, action) -> None:
"""run a resolved command against `session`, tracking the uncommitted-input """run a resolved command against `session`, tracking the uncommitted-input
buffer so backspace/erase delete only back to the last submit boundary""" buffer so backspace/erase delete only back to the last submit boundary.
heard = f'heard "{transcript}" ({reason})'
the 'heard ...' echo is already printed by _handle and the [session] prefix
names the target, so these lines just report the keystrokes injected.
"""
name = action.name name = action.name
if name == "type": if name == "type":
@ -217,36 +249,38 @@ class Daemon:
if self.config.type_autosend: if self.config.type_autosend:
inject.send_named(session, inject.keys.SUBMIT) inject.send_named(session, inject.keys.SUBMIT)
self._pending[session] = 0 self._pending[session] = 0
self._console.emit(session, f"{heard} -> typed {text!r}" self._console.emit(session, f"typed {text!r}"
+ (" + send" if self.config.type_autosend else ""), "green") + (" + send" if self.config.type_autosend else ""), "green")
return return
if name == "space": if name == "space":
n = int(action.arg) n = int(action.arg)
inject.perform(session, action) inject.perform(session, action)
self._pending[session] = self._pending.get(session, 0) + n self._pending[session] = self._pending.get(session, 0) + n
self._console.emit(session, f"{heard} -> space x{n}", "green") self._console.emit(session, f"space x{n}", "green")
return return
if name == "backspace": if name == "backspace":
have = self._pending.get(session, 0) n = int(action.arg)
n = min(int(action.arg), have)
if n: if n:
inject.perform(session, grammar.Action("backspace", n)) inject.perform(session, action)
self._pending[session] = have - n self._pending[session] = max(0, self._pending.get(session, 0) - n)
self._console.emit(session, f"{heard} -> backspace x{n}" self._console.emit(session, f"backspace x{n}", "green")
+ ("" if n == int(action.arg) else " (capped at boundary)"), "green")
return return
if name == "erase": if name == "erase":
n = self._pending.get(session, 0) n = self._pending.get(session, 0)
if n: if n:
inject.perform(session, grammar.Action("erase", n)) inject.perform(session, grammar.Action("erase", n))
self._pending[session] = 0 self._pending[session] = 0
self._console.emit(session, f"{heard} -> erase x{n} (to last boundary)", "green") self._console.emit(session, f"erase x{n} (to last boundary)", "green")
return return
inject.perform(session, action) inject.perform(session, action)
if name == "submit": if name == "submit":
self._pending[session] = 0 self._pending[session] = 0
self._console.emit(session, f"{heard} -> {self._describe(action)}", "green") self._console.emit(session, f"injected {self._describe(action)}", "green")
def _timing(self) -> str:
"""compact STT latency suffix for heard lines (transcribe ms on audio secs)"""
return f"({self._last_stt_ms:.0f}ms/{self._last_audio_s:.1f}s)"
@staticmethod @staticmethod
def _describe(action) -> str: def _describe(action) -> str:
@ -271,7 +305,8 @@ class Daemon:
self._console.emit(SYSTEM, f"claudedo {self.mode} mode — Ctrl-C to stop", "bold") self._console.emit(SYSTEM, f"claudedo {self.mode} mode — Ctrl-C to stop", "bold")
self._console.emit(SYSTEM, f"model {cfg.stt_model} ({cfg.stt_language}) · mic {dev} · " self._console.emit(SYSTEM, f"model {cfg.stt_model} ({cfg.stt_language}) · mic {dev} · "
f"target {target_now}") f"target {target_now}")
self._console.emit(SYSTEM, "wake: " + ", ".join(cfg.wake_phrases)) wakes = ", ".join(self._console.paint(p, "magenta") for p in cfg.wake_phrases)
self._console.emit(SYSTEM, f"wake: {wakes}")
def _refresh_state(self) -> None: def _refresh_state(self) -> None:
write_state(os.getpid(), self.mode, target.read_active()) write_state(os.getpid(), self.mode, target.read_active())
@ -291,12 +326,15 @@ class Daemon:
break break
if audio_chunk is None: if audio_chunk is None:
continue continue
t0 = time.monotonic()
transcript = self._transcriber.transcribe(audio_chunk, self.config.samplerate) transcript = self._transcriber.transcribe(audio_chunk, self.config.samplerate)
self._last_stt_ms = (time.monotonic() - t0) * 1000.0
self._last_audio_s = audio_chunk.size / self.config.samplerate
if not transcript: if not transcript:
continue continue
if self.mode == "listen" and not self._has_wake(transcript): if self.mode == "listen" and not self._has_wake(transcript):
if self.config.print_heard: if self.config.print_heard:
self._console.emit(VOICE, f'heard (dropped) "{transcript}"', "red") self._console.emit(VOICE, f'heard (dropped) "{transcript}" {self._timing()}', "red")
else: else:
self._console.emit(VOICE, "dropped: non-wake speech (not recorded)", "dim") self._console.emit(VOICE, "dropped: non-wake speech (not recorded)", "dim")
continue continue

View File

@ -50,6 +50,9 @@ _STICKY_VERBS = ("set", "sticky", "switch")
_ONESHOT_VERBS = ("target",) _ONESHOT_VERBS = ("target",)
_UNSET_VERBS = ("unset", "unsticky") _UNSET_VERBS = ("unset", "unsticky")
_LIST_VERBS = ("list", "sessions") _LIST_VERBS = ("list", "sessions")
_COMMANDS_VERBS = ("commands", "help", "menu")
_CUSTOMS_VERBS = ("customs", "custom")
_VERSION_VERBS = ("version",)
_SELECT_VERBS = ("select", "option", "choose", "number") _SELECT_VERBS = ("select", "option", "choose", "number")
# every command/synonym word, for biasing the STT toward the vocabulary we expect. # every command/synonym word, for biasing the STT toward the vocabulary we expect.
@ -57,7 +60,8 @@ _COMMAND_WORDS = (
_YES_VERBS + _NO_VERBS + _APPROVE_VERBS + _DENY_VERBS + _SUBMIT_VERBS _YES_VERBS + _NO_VERBS + _APPROVE_VERBS + _DENY_VERBS + _SUBMIT_VERBS
+ _CANCEL_VERBS + _TYPE_VERBS + _BACKSPACE_VERBS + _SPACE_VERBS + _ADD_VERBS + _CANCEL_VERBS + _TYPE_VERBS + _BACKSPACE_VERBS + _SPACE_VERBS + _ADD_VERBS
+ _ERASE_VERBS + _DEBUG_VERBS + _MODE_VERBS + _STICKY_VERBS + _ONESHOT_VERBS + _UNSET_VERBS + _ERASE_VERBS + _DEBUG_VERBS + _MODE_VERBS + _STICKY_VERBS + _ONESHOT_VERBS + _UNSET_VERBS
+ _LIST_VERBS + _SELECT_VERBS + ("ptt", "listen") + _LIST_VERBS + _COMMANDS_VERBS + _CUSTOMS_VERBS + _VERSION_VERBS
+ _SELECT_VERBS + ("ptt", "listen")
+ ("one", "two", "three", "four") + ("one", "two", "three", "four")
) )
DEFAULT_FILLER = ("select", "use", "choose") DEFAULT_FILLER = ("select", "use", "choose")
@ -83,11 +87,14 @@ class ParsedCommand:
one_shot is the session short-name from a leading ``target <name>`` (this command one_shot is the session short-name from a leading ``target <name>`` (this command
only; does not change the sticky default), or None. action is the command to run, only; does not change the sticky default), or None. action is the command to run,
or None if nothing matched after the wake phrase / one-shot / filler. or None if nothing matched after the wake phrase / one-shot / filler. wake is the
configured wake phrase that matched (e.g. "okay claude" for a heard "okay clouds"),
or None.
""" """
one_shot: str | None one_shot: str | None
action: Action | None action: Action | None
wake: str | None = None
def normalize(text: str) -> str: def normalize(text: str) -> str:
@ -121,6 +128,32 @@ def initial_prompt(wake_phrases: list[str]) -> str:
return ", ".join(vocabulary(wake_phrases)) return ", ".join(vocabulary(wake_phrases))
def command_menu() -> list[tuple[str, str]]:
"""the voice command menu as (usage, description) rows, for the `commands` cmd.
a small curated list keyed off the verb groups the speakable command surface,
NOT the cc shell kit.
"""
return [
("yes / no", "answer a yes/no prompt"),
("one..four", "pick numbered option 1-4"),
("approve / deny", "allow / deny a permission prompt"),
("send", "submit (Enter)"),
("cancel", "back out (Escape)"),
("type <text>", "insert literal text (no submit)"),
("space [n] / add a space", "insert n spaces"),
("backspace [n]", "delete n chars (to last submit)"),
("erase", "wipe the current input"),
("debug <text>", "echo to console (no inject)"),
("set <name>", "sticky target -> claude-<name>"),
("target <name> <cmd>", "one-shot to another session"),
("unset / list", "clear sticky / list sessions"),
("mode ptt|listen", "switch input mode"),
("commands / customs", "this menu / custom commands (v0.2.0)"),
("version", "print the claudedo version"),
]
def _ratio(a: str, b: str) -> float: def _ratio(a: str, b: str) -> float:
return SequenceMatcher(None, a, b).ratio() return SequenceMatcher(None, a, b).ratio()
@ -131,13 +164,15 @@ def _wake_variants(phrase: str) -> set[str]:
return {norm, norm.replace(" ", "")} return {norm, norm.replace(" ", "")}
def strip_wake(transcript: str, wake_phrases: list[str], threshold: float, def strip_wake_match(transcript: str, wake_phrases: list[str], threshold: float,
require_wake: bool) -> str | None: require_wake: bool) -> tuple[str | None, str | None]:
"""return the command remainder after the wake phrase. """return (command remainder, matched wake phrase).
if ``require_wake`` (listen mode) and no wake phrase is found at the start, if ``require_wake`` (listen mode) and no wake phrase is found at the start, the
return None so the daemon discards the utterance. if not required (ptt mode), remainder is None so the daemon discards the utterance. if not required (ptt
a leading wake phrase is stripped when present but its absence is fine. mode), a leading wake phrase is stripped when present but its absence is fine.
the matched phrase is the configured wake phrase that best matched (e.g. "okay
claude" for a heard "okay clouds"), or None when none matched.
matches leniently on a despaced prefix (whisper splits/joins the coined word matches leniently on a despaced prefix (whisper splits/joins the coined word
inconsistently) but always slices the remainder on a WORD boundary of the inconsistently) but always slices the remainder on a WORD boundary of the
@ -145,10 +180,11 @@ def strip_wake(transcript: str, wake_phrases: list[str], threshold: float,
""" """
norm = normalize(transcript) norm = normalize(transcript)
if not norm: if not norm:
return None if require_wake else "" return (None, None) if require_wake else ("", None)
words = norm.split(" ") words = norm.split(" ")
best_remainder: str | None = None best_remainder: str | None = None
best_phrase: str | None = None
best_score = 0.0 best_score = 0.0
for phrase in wake_phrases: for phrase in wake_phrases:
variants = _wake_variants(phrase) variants = _wake_variants(phrase)
@ -162,10 +198,18 @@ def strip_wake(transcript: str, wake_phrases: list[str], threshold: float,
if score >= threshold and score > best_score: if score >= threshold and score > best_score:
best_score = score best_score = score
best_remainder = " ".join(words[take:]).strip() best_remainder = " ".join(words[take:]).strip()
best_phrase = phrase
if best_remainder is not None: if best_remainder is not None:
return best_remainder return best_remainder, best_phrase
return None if require_wake else norm return (None, None) if require_wake else (norm, None)
def strip_wake(transcript: str, wake_phrases: list[str], threshold: float,
require_wake: bool) -> str | None:
"""return the command remainder after the wake phrase (None if no wake in listen
mode). thin wrapper over strip_wake_match for callers that don't need the phrase"""
return strip_wake_match(transcript, wake_phrases, threshold, require_wake)[0]
def _fuzzy_in(token: str, options: tuple[str, ...], threshold: float) -> bool: def _fuzzy_in(token: str, options: tuple[str, ...], threshold: float) -> bool:
@ -252,8 +296,14 @@ def match_command(remainder: str, threshold: float) -> Action | None:
return Action("set", name) if name else None return Action("set", name) if name else None
if _fuzzy_in(head, _UNSET_VERBS, threshold) and not rest: if _fuzzy_in(head, _UNSET_VERBS, threshold) and not rest:
return Action("unset") return Action("unset")
if _fuzzy_in(head, _CUSTOMS_VERBS, threshold):
return Action("customs")
if _fuzzy_in(head, _COMMANDS_VERBS, threshold):
return Action("commands")
if _fuzzy_in(head, _LIST_VERBS, threshold): if _fuzzy_in(head, _LIST_VERBS, threshold):
return Action("list") return Action("list")
if _fuzzy_in(head, _VERSION_VERBS, threshold):
return Action("version")
return None return None
@ -283,7 +333,7 @@ def parse(transcript: str, wake_phrases: list[str], wake_threshold: float,
ParsedCommand with action=None means a wake phrase was present but no command ParsedCommand with action=None means a wake phrase was present but no command
matched. matched.
""" """
remainder = strip_wake(transcript, wake_phrases, wake_threshold, require_wake) remainder, wake = strip_wake_match(transcript, wake_phrases, wake_threshold, require_wake)
if remainder is None: if remainder is None:
return None return None
@ -295,4 +345,4 @@ def parse(transcript: str, wake_phrases: list[str], wake_threshold: float,
tokens = _strip_filler(tokens, filler, command_threshold) tokens = _strip_filler(tokens, filler, command_threshold)
action = match_command(" ".join(tokens), command_threshold) action = match_command(" ".join(tokens), command_threshold)
return ParsedCommand(one_shot=one_shot, action=action) return ParsedCommand(one_shot=one_shot, action=action, wake=wake)