Compare commits

..

No commits in common. "f177b46a4b57e7c3a7a6bb2567d31733901418d5" and "4abdfd56bc69eb14bbc4620a52095c45004bf0b8" have entirely different histories.

8 changed files with 59 additions and 177 deletions

View File

@ -107,9 +107,7 @@ Wake phrases (listen mode), fuzzy-matched. The default list is **"claudedo"**,
no token for the coined word "claudedo" and renders it as real words ("claude do"), no token for the coined word "claudedo" and renders it as real words ("claude do"),
so that spelling is listed explicitly. Matching is lenient (case/space-insensitive). so that spelling is listed explicitly. Matching is lenient (case/space-insensitive).
Add the spellings you actually see (turn on `print_heard` to find them). In PTT mode Add the spellings you actually see (turn on `print_heard` to find them). In PTT mode
the wake phrase is optional. When a command's wake phrase matched loosely (e.g. you the wake phrase is optional.
said "okay clouds"), the heard line notes which phrase it assumed —
`heard "okay clouds list" -> LIST (wake: okay claude)`.
| Say | Does | | Say | Does |
|---|---| |---|---|
@ -127,9 +125,6 @@ said "okay clouds"), the heard line notes which phrase it assumed —
| `target <name> <command>` | **one-shot** override: run that command on `claude-<name>` for this utterance only; sticky default unchanged | | `target <name> <command>` | **one-shot** override: run that command on `claude-<name>` for this utterance only; sticky default unchanged |
| `unset` (alias `unsticky`) | clear the sticky target | | `unset` (alias `unsticky`) | clear the sticky target |
| `list` | list running `claude-*` sessions to the daemon console | | `list` | list running `claude-*` sessions to the daemon console |
| `commands` (alias `help`/`menu`) | print the voice-command menu to the console |
| `customs` (alias `custom`) | custom commands — arriving in v0.2.0 (stub for now) |
| `version` | print the claudedo version to the console |
| `cancel` / `escape` | back out of a prompt | | `cancel` / `escape` | back out of a prompt |
Optional filler (`select` / `use` / `choose`) may precede any command and is ignored: Optional filler (`select` / `use` / `choose`) may precede any command and is ignored:
@ -192,27 +187,22 @@ If Claude Code changes its prompt UI, re-confirm against a live session and upda
Everything tunable lives in [`config.toml`](config.toml): wake phrases, mode + PTT Everything tunable lives in [`config.toml`](config.toml): wake phrases, mode + PTT
key, Whisper model/language/device, `[vad]` endpointing, and `[behavior]` key, Whisper model/language/device, `[vad]` endpointing, and `[behavior]`
(`type_autosend`, fuzzy thresholds, `filler_words`, `auto_target`, `print_heard`). (`type_autosend`, fuzzy thresholds, `filler_words`, `auto_target`, `print_heard`).
The default model is **`small.en`** (the English-only small model — ~1s/command on a The default model is **`medium`** (best accuracy for the coined wake word on a strong
strong CPU, more accurate on English than multilingual `small` at the same speed); CPU); `small` is faster/less accurate, `large-v3` most accurate. `claudedo -c <path>
`medium`/`medium.en` are more accurate but ~3× slower (noticeable lag), `base.en` is ...` points at a specific config; otherwise it searches `$CLAUDEDO_CONFIG`,
snappier/less accurate, `large-v3` most accurate/slowest. Every `heard` line shows the `~/.config/claudedo/config.toml`, then `./config.toml`.
STT latency as `(<ms>/<audio>s)` so you can see what a model change costs. VAD
endpointing ends a capture after `[vad].silence_ms` (700) of trailing silence, capped
at `max_seconds` (15). `claudedo -c <path> ...` points at a specific config; otherwise
it searches
`$CLAUDEDO_CONFIG`, `~/.config/claudedo/config.toml`, then `./config.toml`.
- **STT biasing.** The transcriber is seeded with an `initial_prompt` built from the - **STT biasing.** The transcriber is seeded with an `initial_prompt` built from the
configured wake phrases + command vocabulary (one source — `grammar.vocabulary()`), configured wake phrases + command vocabulary (one source — `grammar.vocabulary()`),
so Whisper is conditioned to expect "claudedo" and the command words. so Whisper is conditioned to expect "claudedo" and the command words.
- **Split fuzzy thresholds.** `wake_fuzzy_threshold` (default `0.65`, lenient) vs - **Split fuzzy thresholds.** `wake_fuzzy_threshold` (default `0.6`, lenient) vs
`command_fuzzy_threshold` (default `0.8`, tight). The asymmetry is deliberate: a `command_fuzzy_threshold` (default `0.8`, tight). The asymmetry is deliberate: a
false *wake* is cheap (it wakes, finds no command, does nothing), but a false false *wake* is cheap (it wakes, finds no command, does nothing), but a false
*command* fires the wrong action. Prefer expanding command synonyms over loosening *command* fires the wrong action. Prefer expanding command synonyms over loosening
the command threshold. the command threshold.
- **`[vad]` endpointing.** Capture starts on speech and ends after `silence_ms` - **`[vad]` endpointing.** Capture starts on speech and ends after `silence_ms`
(default 700) of trailing silence — Alexa-style record-until-pause — capped at (default 800) of trailing silence — Alexa-style record-until-pause — capped at
`max_seconds` (default 15). The pause both ends a command and separates it from `max_seconds` (default 10). The pause both ends a command and separates it from
following chatter (the chatter is a separate capture the wake gate discards). following chatter (the chatter is a separate capture the wake gate discards).
- **`auto_target`** (default `false`): with no sticky target and one session running, - **`auto_target`** (default `false`): with no sticky target and one session running,
`false` does nothing and asks you to `set`; `true` auto-uses that session. `false` does nothing and asks you to `set`; `true` auto-uses that session.

View File

@ -21,12 +21,10 @@ mode = "listen"
ptt_key = "space" ptt_key = "space"
[stt] [stt]
# faster-whisper model size. "small.en" is the default — the English-only small model # faster-whisper model size. "medium" is the default — biggest accuracy gain for the
# (~1s/command on a strong cpu, more accurate on english than multilingual "small" at # coined wake word ("claudedo" / "claude do") and fine on a strong cpu. "small" is
# the same speed). "medium"/"medium.en" are more accurate but ~3x slower (noticeable # faster but less accurate; "large-v3" is most accurate if medium still struggles.
# lag); "large-v3" is most accurate and slowest. drop to "base.en" for max snappiness model = "medium"
# (less accurate). bump only if recognition is poor.
model = "small.en"
language = "en" language = "en"
# mic device: "auto", or a sounddevice device index (integer) / substring of a # mic device: "auto", or a sounddevice device index (integer) / substring of a
# device name. run `claudedo test-audio` to list devices. # device name. run `claudedo test-audio` to list devices.
@ -48,10 +46,9 @@ min_utterance = 0.3
# onset and ends after this much trailing silence — the natural end of an utterance. # onset and ends after this much trailing silence — the natural end of an utterance.
# a real pause both ends the command AND separates it from following chatter (the # a real pause both ends the command AND separates it from following chatter (the
# chatter becomes a separate capture that the wake gate then discards). # chatter becomes a separate capture that the wake gate then discards).
silence_ms = 700 silence_ms = 800
# hard cap so continuous noise can't record forever (also the ceiling for a long # hard cap so continuous noise can't record forever.
# dictated `type` phrase). max_seconds = 10.0
max_seconds = 15.0
[behavior] [behavior]
# dictation never auto-submits: "type <phrase>" inserts literal text only; you say # dictation never auto-submits: "type <phrase>" inserts literal text only; you say
@ -61,7 +58,7 @@ type_autosend = false
# wakes, finds no command, does nothing), so wake is lenient; a false COMMAND fires # wakes, finds no command, does nothing), so wake is lenient; a false COMMAND fires
# the WRONG action, so commands stay tight. lower = more lenient = more matches. # the WRONG action, so commands stay tight. lower = more lenient = more matches.
# prefer expanding command synonyms over loosening command_fuzzy_threshold. # prefer expanding command synonyms over loosening command_fuzzy_threshold.
wake_fuzzy_threshold = 0.65 wake_fuzzy_threshold = 0.6
command_fuzzy_threshold = 0.8 command_fuzzy_threshold = 0.8
# optional filler words that may precede a command and are ignored for matching: # optional filler words that may precede a command and are ignored for matching:
# "select yes" / "use yes" behave like "yes". (a filler word followed by a digit is # "select yes" / "use yes" behave like "yes". (a filler word followed by a digit is

View File

@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
[project] [project]
name = "claudedo" name = "claudedo"
version = "0.1.4" version = "0.1.3"
description = "voice-control daemon for claude code (local STT -> tmux send-keys)" description = "voice-control daemon for claude code (local STT -> tmux send-keys)"
readme = "README.md" readme = "README.md"
requires-python = ">=3.10" requires-python = ">=3.10"

View File

@ -1,3 +1,3 @@
"""claudedo — voice-control daemon for claude code (local STT -> tmux send-keys)""" """claudedo — voice-control daemon for claude code (local STT -> tmux send-keys)"""
__version__ = "0.1.4" __version__ = "0.1.3"

View File

@ -17,10 +17,7 @@ except ModuleNotFoundError:
log = logging.getLogger(__name__) log = logging.getLogger(__name__)
_VALID_MODES = ("listen", "ptt") _VALID_MODES = ("listen", "ptt")
_VALID_MODELS = ( _VALID_MODELS = ("tiny", "base", "small", "medium", "large-v2", "large-v3")
"tiny", "base", "small", "medium", "large-v1", "large-v2", "large-v3",
"tiny.en", "base.en", "small.en", "medium.en",
)
DEFAULT_CONFIG_PATHS = ( DEFAULT_CONFIG_PATHS = (
Path(os.environ.get("CLAUDEDO_CONFIG", "")) if os.environ.get("CLAUDEDO_CONFIG") else None, Path(os.environ.get("CLAUDEDO_CONFIG", "")) if os.environ.get("CLAUDEDO_CONFIG") else None,
@ -102,7 +99,7 @@ def load_config(explicit: str | os.PathLike | None = None) -> Config:
if mode not in _VALID_MODES: if mode not in _VALID_MODES:
raise ConfigError(f"[input].mode must be one of {_VALID_MODES}, got {mode!r}") raise ConfigError(f"[input].mode must be one of {_VALID_MODES}, got {mode!r}")
model = _require(raw, "stt", "model", (str,), "small.en") model = _require(raw, "stt", "model", (str,), "medium")
if model not in _VALID_MODELS: if model not in _VALID_MODELS:
log.warning("unknown stt model %r — passing through to faster-whisper", model) log.warning("unknown stt model %r — passing through to faster-whisper", model)
@ -117,11 +114,11 @@ def load_config(explicit: str | os.PathLike | None = None) -> Config:
samplerate=int(_require(raw, "audio", "samplerate", (int,), 16000)), samplerate=int(_require(raw, "audio", "samplerate", (int,), 16000)),
channels=int(_require(raw, "audio", "channels", (int,), 1)), channels=int(_require(raw, "audio", "channels", (int,), 1)),
silence_threshold=float(_require(raw, "audio", "silence_threshold", (int, float), 0.012)), silence_threshold=float(_require(raw, "audio", "silence_threshold", (int, float), 0.012)),
vad_silence_ms=int(_require(raw, "vad", "silence_ms", (int,), 700)), vad_silence_ms=int(_require(raw, "vad", "silence_ms", (int,), 800)),
vad_max_seconds=float(_require(raw, "vad", "max_seconds", (int, float), 15.0)), vad_max_seconds=float(_require(raw, "vad", "max_seconds", (int, float), 10.0)),
min_utterance=float(_require(raw, "audio", "min_utterance", (int, float), 0.3)), min_utterance=float(_require(raw, "audio", "min_utterance", (int, float), 0.3)),
type_autosend=bool(_require(raw, "behavior", "type_autosend", (bool,), False)), type_autosend=bool(_require(raw, "behavior", "type_autosend", (bool,), False)),
wake_fuzzy_threshold=float(_require(raw, "behavior", "wake_fuzzy_threshold", (int, float), 0.65)), wake_fuzzy_threshold=float(_require(raw, "behavior", "wake_fuzzy_threshold", (int, float), 0.6)),
command_fuzzy_threshold=float(_require(raw, "behavior", "command_fuzzy_threshold", command_fuzzy_threshold=float(_require(raw, "behavior", "command_fuzzy_threshold",
(int, float), 0.8)), (int, float), 0.8)),
filler_words=tuple(_require(raw, "behavior", "filler_words", (list,), filler_words=tuple(_require(raw, "behavior", "filler_words", (list,),

View File

@ -18,16 +18,12 @@ _COLORS = {
"red": "\033[31m", "red": "\033[31m",
"yellow": "\033[33m", "yellow": "\033[33m",
"cyan": "\033[36m", "cyan": "\033[36m",
"blue": "\033[34m",
"brightblue": "\033[94m",
"magenta": "\033[35m",
"dim": "\033[2m", "dim": "\033[2m",
"bold": "\033[1m", "bold": "\033[1m",
} }
SYSTEM = "SYSTEM" SYSTEM = "SYSTEM"
VOICE = "VOICE" VOICE = "VOICE"
HELP = "HELP"
class Console: class Console:
@ -49,17 +45,7 @@ class Console:
return text return text
return f"{_COLORS[color]}{text}{RESET}" return f"{_COLORS[color]}{text}{RESET}"
def paint(self, text: str, color: str | None) -> str:
"""public colorizer for pre-coloring a fragment of a message (e.g. a command
word) before passing it to emit() with color=None"""
return self._paint(text, color)
def emit(self, prefix: str, message: str, color: str | None = None) -> None: def emit(self, prefix: str, message: str, color: str | None = None) -> None:
"""print one line: ``HH:MM:SS [prefix] message`` (message optionally colored)""" """print one line: ``HH:MM:SS [prefix] message`` (message optionally colored)"""
line = f"{self._stamp()} {self._paint(f'[{prefix}]', 'dim')} {self._paint(message, color)}" line = f"{self._stamp()} {self._paint(f'[{prefix}]', 'dim')} {self._paint(message, color)}"
print(line, file=self.stream, flush=True) print(line, file=self.stream, flush=True)
def line(self, message: str, color: str | None = None) -> None:
"""print a bare continuation line (no timestamp/prefix) — for multi-row blocks
like the help menu, indented under a preceding header"""
print(self._paint(message, color), file=self.stream, flush=True)

View File

@ -16,9 +16,9 @@ import sys
import time import time
from pathlib import Path from pathlib import Path
from . import __version__, audio, grammar, inject, target from . import audio, grammar, inject, target
from .config import Config from .config import Config
from .console import HELP, SYSTEM, VOICE, Console from .console import SYSTEM, VOICE, Console
from .stt import Transcriber from .stt import Transcriber
log = logging.getLogger(__name__) log = logging.getLogger(__name__)
@ -117,8 +117,6 @@ class Daemon:
self._ptt = _PTTKey() self._ptt = _PTTKey()
self._pending: dict[str, int] = {} self._pending: dict[str, int] = {}
self._console = Console() self._console = Console()
self._last_stt_ms = 0.0
self._last_audio_s = 0.0
def _install_signals(self) -> None: def _install_signals(self) -> None:
signal.signal(signal.SIGTERM, self._on_signal) signal.signal(signal.SIGTERM, self._on_signal)
@ -169,58 +167,31 @@ class Daemon:
parsed = grammar.parse(transcript, cfg.wake_phrases, cfg.wake_fuzzy_threshold, parsed = grammar.parse(transcript, cfg.wake_phrases, cfg.wake_fuzzy_threshold,
cfg.command_fuzzy_threshold, require_wake, filler=cfg.filler_words) cfg.command_fuzzy_threshold, require_wake, filler=cfg.filler_words)
if parsed is None or parsed.action is None: if parsed is None or parsed.action is None:
self._console.emit(VOICE, f'heard "{transcript}" -> no command matched {self._timing()}', self._console.emit(VOICE, f'heard "{transcript}" -> no command matched', "yellow")
"yellow")
return return
action = parsed.action action = parsed.action
# a command was recognized — echo what we heard (green) before acting. note the
# matched wake phrase (magenta) when the transcript didn't literally contain it
# (so a loose match like "okay clouds" -> "okay claude" is visible).
head = self._console.paint(f'heard "{transcript}" -> {self._describe(action)}', "green")
note = ""
if parsed.wake and parsed.wake.replace(" ", "") not in transcript.lower().replace(" ", ""):
note = (self._console.paint(" (wake: ", "green")
+ self._console.paint(parsed.wake, "magenta")
+ self._console.paint(")", "green"))
tail = self._console.paint(f" {self._timing()}", "green")
self._console.emit(VOICE, f"{head}{note}{tail}")
def blue(s):
return self._console.paint(s, "brightblue")
if action.name == "mode": if action.name == "mode":
new_mode = str(action.arg) new_mode = str(action.arg)
if new_mode != self.mode: if new_mode != self.mode:
self.mode = new_mode self.mode = new_mode
self._console.emit(SYSTEM, f"{blue('mode')} -> {new_mode}") self._console.emit(SYSTEM, f"mode -> {new_mode}", "cyan")
self._refresh_state() self._refresh_state()
return return
if action.name == "set": if action.name == "set":
session = target.set_target(str(action.arg)) session = target.set_target(str(action.arg))
self._pending.pop(session, None) self._pending.pop(session, None)
self._console.emit(SYSTEM, f"{blue('set sticky')} -> {session}") self._console.emit(SYSTEM, f"set sticky -> {session}", "cyan")
self._refresh_state() self._refresh_state()
return return
if action.name == "unset": if action.name == "unset":
target.unset_target() target.unset_target()
self._console.emit(SYSTEM, f"{blue('unset')} (cleared)") self._console.emit(SYSTEM, "unset (cleared)", "cyan")
self._refresh_state() self._refresh_state()
return return
if action.name == "list": if action.name == "list":
sessions = target.list_sessions() sessions = target.list_sessions()
self._console.emit(SYSTEM, f"{blue('list')} -> " self._console.emit(SYSTEM, "list -> " + (", ".join(sessions) if sessions else "(none running)"))
+ (", ".join(sessions) if sessions else "(none running)"))
return
if action.name == "commands":
self._console.emit(HELP, "voice commands:")
for usage, desc in grammar.command_menu():
self._console.line(f" {self._console.paint(f'{usage:<26}', 'brightblue')} {desc}")
return
if action.name == "customs":
self._console.emit(SYSTEM, "custom commands arrive in v0.2.0 (contexts.toml)")
return
if action.name == "version":
self._console.emit(SYSTEM, f"claudedo {__version__}")
return return
if action.name == "debug": if action.name == "debug":
self._console.emit(VOICE, f'debug: "{action.arg}"', "yellow") self._console.emit(VOICE, f'debug: "{action.arg}"', "yellow")
@ -231,15 +202,12 @@ class Daemon:
self._console.emit(VOICE, f'heard "{transcript}" -> {reason} -> ' self._console.emit(VOICE, f'heard "{transcript}" -> {reason} -> '
f'{self._describe(action)} did nothing', "red") f'{self._describe(action)} did nothing', "red")
return return
self._inject(session, action) self._inject(session, transcript, reason, action)
def _inject(self, session: str, action) -> None: def _inject(self, session: str, transcript: str, reason: str, action) -> None:
"""run a resolved command against `session`, tracking the uncommitted-input """run a resolved command against `session`, tracking the uncommitted-input
buffer so backspace/erase delete only back to the last submit boundary. buffer so backspace/erase delete only back to the last submit boundary"""
heard = f'heard "{transcript}" ({reason})'
the 'heard ...' echo is already printed by _handle and the [session] prefix
names the target, so these lines just report the keystrokes injected.
"""
name = action.name name = action.name
if name == "type": if name == "type":
@ -249,38 +217,36 @@ class Daemon:
if self.config.type_autosend: if self.config.type_autosend:
inject.send_named(session, inject.keys.SUBMIT) inject.send_named(session, inject.keys.SUBMIT)
self._pending[session] = 0 self._pending[session] = 0
self._console.emit(session, f"typed {text!r}" self._console.emit(session, f"{heard} -> typed {text!r}"
+ (" + send" if self.config.type_autosend else ""), "green") + (" + send" if self.config.type_autosend else ""), "green")
return return
if name == "space": if name == "space":
n = int(action.arg) n = int(action.arg)
inject.perform(session, action) inject.perform(session, action)
self._pending[session] = self._pending.get(session, 0) + n self._pending[session] = self._pending.get(session, 0) + n
self._console.emit(session, f"space x{n}", "green") self._console.emit(session, f"{heard} -> space x{n}", "green")
return return
if name == "backspace": if name == "backspace":
n = int(action.arg) have = self._pending.get(session, 0)
n = min(int(action.arg), have)
if n: if n:
inject.perform(session, action) inject.perform(session, grammar.Action("backspace", n))
self._pending[session] = max(0, self._pending.get(session, 0) - n) self._pending[session] = have - n
self._console.emit(session, f"backspace x{n}", "green") self._console.emit(session, f"{heard} -> backspace x{n}"
+ ("" if n == int(action.arg) else " (capped at boundary)"), "green")
return return
if name == "erase": if name == "erase":
n = self._pending.get(session, 0) n = self._pending.get(session, 0)
if n: if n:
inject.perform(session, grammar.Action("erase", n)) inject.perform(session, grammar.Action("erase", n))
self._pending[session] = 0 self._pending[session] = 0
self._console.emit(session, f"erase x{n} (to last boundary)", "green") self._console.emit(session, f"{heard} -> erase x{n} (to last boundary)", "green")
return return
inject.perform(session, action) inject.perform(session, action)
if name == "submit": if name == "submit":
self._pending[session] = 0 self._pending[session] = 0
self._console.emit(session, f"injected {self._describe(action)}", "green") self._console.emit(session, f"{heard} -> {self._describe(action)}", "green")
def _timing(self) -> str:
"""compact STT latency suffix for heard lines (transcribe ms on audio secs)"""
return f"({self._last_stt_ms:.0f}ms/{self._last_audio_s:.1f}s)"
@staticmethod @staticmethod
def _describe(action) -> str: def _describe(action) -> str:
@ -305,8 +271,7 @@ class Daemon:
self._console.emit(SYSTEM, f"claudedo {self.mode} mode — Ctrl-C to stop", "bold") self._console.emit(SYSTEM, f"claudedo {self.mode} mode — Ctrl-C to stop", "bold")
self._console.emit(SYSTEM, f"model {cfg.stt_model} ({cfg.stt_language}) · mic {dev} · " self._console.emit(SYSTEM, f"model {cfg.stt_model} ({cfg.stt_language}) · mic {dev} · "
f"target {target_now}") f"target {target_now}")
wakes = ", ".join(self._console.paint(p, "magenta") for p in cfg.wake_phrases) self._console.emit(SYSTEM, "wake: " + ", ".join(cfg.wake_phrases))
self._console.emit(SYSTEM, f"wake: {wakes}")
def _refresh_state(self) -> None: def _refresh_state(self) -> None:
write_state(os.getpid(), self.mode, target.read_active()) write_state(os.getpid(), self.mode, target.read_active())
@ -326,15 +291,12 @@ class Daemon:
break break
if audio_chunk is None: if audio_chunk is None:
continue continue
t0 = time.monotonic()
transcript = self._transcriber.transcribe(audio_chunk, self.config.samplerate) transcript = self._transcriber.transcribe(audio_chunk, self.config.samplerate)
self._last_stt_ms = (time.monotonic() - t0) * 1000.0
self._last_audio_s = audio_chunk.size / self.config.samplerate
if not transcript: if not transcript:
continue continue
if self.mode == "listen" and not self._has_wake(transcript): if self.mode == "listen" and not self._has_wake(transcript):
if self.config.print_heard: if self.config.print_heard:
self._console.emit(VOICE, f'heard (dropped) "{transcript}" {self._timing()}', "red") self._console.emit(VOICE, f'heard (dropped) "{transcript}"', "red")
else: else:
self._console.emit(VOICE, "dropped: non-wake speech (not recorded)", "dim") self._console.emit(VOICE, "dropped: non-wake speech (not recorded)", "dim")
continue continue

View File

@ -50,9 +50,6 @@ _STICKY_VERBS = ("set", "sticky", "switch")
_ONESHOT_VERBS = ("target",) _ONESHOT_VERBS = ("target",)
_UNSET_VERBS = ("unset", "unsticky") _UNSET_VERBS = ("unset", "unsticky")
_LIST_VERBS = ("list", "sessions") _LIST_VERBS = ("list", "sessions")
_COMMANDS_VERBS = ("commands", "help", "menu")
_CUSTOMS_VERBS = ("customs", "custom")
_VERSION_VERBS = ("version",)
_SELECT_VERBS = ("select", "option", "choose", "number") _SELECT_VERBS = ("select", "option", "choose", "number")
# every command/synonym word, for biasing the STT toward the vocabulary we expect. # every command/synonym word, for biasing the STT toward the vocabulary we expect.
@ -60,8 +57,7 @@ _COMMAND_WORDS = (
_YES_VERBS + _NO_VERBS + _APPROVE_VERBS + _DENY_VERBS + _SUBMIT_VERBS _YES_VERBS + _NO_VERBS + _APPROVE_VERBS + _DENY_VERBS + _SUBMIT_VERBS
+ _CANCEL_VERBS + _TYPE_VERBS + _BACKSPACE_VERBS + _SPACE_VERBS + _ADD_VERBS + _CANCEL_VERBS + _TYPE_VERBS + _BACKSPACE_VERBS + _SPACE_VERBS + _ADD_VERBS
+ _ERASE_VERBS + _DEBUG_VERBS + _MODE_VERBS + _STICKY_VERBS + _ONESHOT_VERBS + _UNSET_VERBS + _ERASE_VERBS + _DEBUG_VERBS + _MODE_VERBS + _STICKY_VERBS + _ONESHOT_VERBS + _UNSET_VERBS
+ _LIST_VERBS + _COMMANDS_VERBS + _CUSTOMS_VERBS + _VERSION_VERBS + _LIST_VERBS + _SELECT_VERBS + ("ptt", "listen")
+ _SELECT_VERBS + ("ptt", "listen")
+ ("one", "two", "three", "four") + ("one", "two", "three", "four")
) )
DEFAULT_FILLER = ("select", "use", "choose") DEFAULT_FILLER = ("select", "use", "choose")
@ -87,14 +83,11 @@ class ParsedCommand:
one_shot is the session short-name from a leading ``target <name>`` (this command one_shot is the session short-name from a leading ``target <name>`` (this command
only; does not change the sticky default), or None. action is the command to run, only; does not change the sticky default), or None. action is the command to run,
or None if nothing matched after the wake phrase / one-shot / filler. wake is the or None if nothing matched after the wake phrase / one-shot / filler.
configured wake phrase that matched (e.g. "okay claude" for a heard "okay clouds"),
or None.
""" """
one_shot: str | None one_shot: str | None
action: Action | None action: Action | None
wake: str | None = None
def normalize(text: str) -> str: def normalize(text: str) -> str:
@ -128,32 +121,6 @@ def initial_prompt(wake_phrases: list[str]) -> str:
return ", ".join(vocabulary(wake_phrases)) return ", ".join(vocabulary(wake_phrases))
def command_menu() -> list[tuple[str, str]]:
"""the voice command menu as (usage, description) rows, for the `commands` cmd.
a small curated list keyed off the verb groups the speakable command surface,
NOT the cc shell kit.
"""
return [
("yes / no", "answer a yes/no prompt"),
("one..four", "pick numbered option 1-4"),
("approve / deny", "allow / deny a permission prompt"),
("send", "submit (Enter)"),
("cancel", "back out (Escape)"),
("type <text>", "insert literal text (no submit)"),
("space [n] / add a space", "insert n spaces"),
("backspace [n]", "delete n chars (to last submit)"),
("erase", "wipe the current input"),
("debug <text>", "echo to console (no inject)"),
("set <name>", "sticky target -> claude-<name>"),
("target <name> <cmd>", "one-shot to another session"),
("unset / list", "clear sticky / list sessions"),
("mode ptt|listen", "switch input mode"),
("commands / customs", "this menu / custom commands (v0.2.0)"),
("version", "print the claudedo version"),
]
def _ratio(a: str, b: str) -> float: def _ratio(a: str, b: str) -> float:
return SequenceMatcher(None, a, b).ratio() return SequenceMatcher(None, a, b).ratio()
@ -164,15 +131,13 @@ def _wake_variants(phrase: str) -> set[str]:
return {norm, norm.replace(" ", "")} return {norm, norm.replace(" ", "")}
def strip_wake_match(transcript: str, wake_phrases: list[str], threshold: float, def strip_wake(transcript: str, wake_phrases: list[str], threshold: float,
require_wake: bool) -> tuple[str | None, str | None]: require_wake: bool) -> str | None:
"""return (command remainder, matched wake phrase). """return the command remainder after the wake phrase.
if ``require_wake`` (listen mode) and no wake phrase is found at the start, the if ``require_wake`` (listen mode) and no wake phrase is found at the start,
remainder is None so the daemon discards the utterance. if not required (ptt return None so the daemon discards the utterance. if not required (ptt mode),
mode), a leading wake phrase is stripped when present but its absence is fine. a leading wake phrase is stripped when present but its absence is fine.
the matched phrase is the configured wake phrase that best matched (e.g. "okay
claude" for a heard "okay clouds"), or None when none matched.
matches leniently on a despaced prefix (whisper splits/joins the coined word matches leniently on a despaced prefix (whisper splits/joins the coined word
inconsistently) but always slices the remainder on a WORD boundary of the inconsistently) but always slices the remainder on a WORD boundary of the
@ -180,11 +145,10 @@ def strip_wake_match(transcript: str, wake_phrases: list[str], threshold: float,
""" """
norm = normalize(transcript) norm = normalize(transcript)
if not norm: if not norm:
return (None, None) if require_wake else ("", None) return None if require_wake else ""
words = norm.split(" ") words = norm.split(" ")
best_remainder: str | None = None best_remainder: str | None = None
best_phrase: str | None = None
best_score = 0.0 best_score = 0.0
for phrase in wake_phrases: for phrase in wake_phrases:
variants = _wake_variants(phrase) variants = _wake_variants(phrase)
@ -198,18 +162,10 @@ def strip_wake_match(transcript: str, wake_phrases: list[str], threshold: float,
if score >= threshold and score > best_score: if score >= threshold and score > best_score:
best_score = score best_score = score
best_remainder = " ".join(words[take:]).strip() best_remainder = " ".join(words[take:]).strip()
best_phrase = phrase
if best_remainder is not None: if best_remainder is not None:
return best_remainder, best_phrase return best_remainder
return (None, None) if require_wake else (norm, None) return None if require_wake else norm
def strip_wake(transcript: str, wake_phrases: list[str], threshold: float,
require_wake: bool) -> str | None:
"""return the command remainder after the wake phrase (None if no wake in listen
mode). thin wrapper over strip_wake_match for callers that don't need the phrase"""
return strip_wake_match(transcript, wake_phrases, threshold, require_wake)[0]
def _fuzzy_in(token: str, options: tuple[str, ...], threshold: float) -> bool: def _fuzzy_in(token: str, options: tuple[str, ...], threshold: float) -> bool:
@ -296,14 +252,8 @@ def match_command(remainder: str, threshold: float) -> Action | None:
return Action("set", name) if name else None return Action("set", name) if name else None
if _fuzzy_in(head, _UNSET_VERBS, threshold) and not rest: if _fuzzy_in(head, _UNSET_VERBS, threshold) and not rest:
return Action("unset") return Action("unset")
if _fuzzy_in(head, _CUSTOMS_VERBS, threshold):
return Action("customs")
if _fuzzy_in(head, _COMMANDS_VERBS, threshold):
return Action("commands")
if _fuzzy_in(head, _LIST_VERBS, threshold): if _fuzzy_in(head, _LIST_VERBS, threshold):
return Action("list") return Action("list")
if _fuzzy_in(head, _VERSION_VERBS, threshold):
return Action("version")
return None return None
@ -333,7 +283,7 @@ def parse(transcript: str, wake_phrases: list[str], wake_threshold: float,
ParsedCommand with action=None means a wake phrase was present but no command ParsedCommand with action=None means a wake phrase was present but no command
matched. matched.
""" """
remainder, wake = strip_wake_match(transcript, wake_phrases, wake_threshold, require_wake) remainder = strip_wake(transcript, wake_phrases, wake_threshold, require_wake)
if remainder is None: if remainder is None:
return None return None
@ -345,4 +295,4 @@ def parse(transcript: str, wake_phrases: list[str], wake_threshold: float,
tokens = _strip_filler(tokens, filler, command_threshold) tokens = _strip_filler(tokens, filler, command_threshold)
action = match_command(" ".join(tokens), command_threshold) action = match_command(" ".join(tokens), command_threshold)
return ParsedCommand(one_shot=one_shot, action=action, wake=wake) return ParsedCommand(one_shot=one_shot, action=action)