Compare commits
No commits in common. "f177b46a4b57e7c3a7a6bb2567d31733901418d5" and "4abdfd56bc69eb14bbc4620a52095c45004bf0b8" have entirely different histories.
f177b46a4b
...
4abdfd56bc
26
README.md
26
README.md
@ -107,9 +107,7 @@ Wake phrases (listen mode), fuzzy-matched. The default list is **"claudedo"**,
|
|||||||
no token for the coined word "claudedo" and renders it as real words ("claude do"),
|
no token for the coined word "claudedo" and renders it as real words ("claude do"),
|
||||||
so that spelling is listed explicitly. Matching is lenient (case/space-insensitive).
|
so that spelling is listed explicitly. Matching is lenient (case/space-insensitive).
|
||||||
Add the spellings you actually see (turn on `print_heard` to find them). In PTT mode
|
Add the spellings you actually see (turn on `print_heard` to find them). In PTT mode
|
||||||
the wake phrase is optional. When a command's wake phrase matched loosely (e.g. you
|
the wake phrase is optional.
|
||||||
said "okay clouds"), the heard line notes which phrase it assumed —
|
|
||||||
`heard "okay clouds list" -> LIST (wake: okay claude)`.
|
|
||||||
|
|
||||||
| Say | Does |
|
| Say | Does |
|
||||||
|---|---|
|
|---|---|
|
||||||
@ -127,9 +125,6 @@ said "okay clouds"), the heard line notes which phrase it assumed —
|
|||||||
| `target <name> <command>` | **one-shot** override: run that command on `claude-<name>` for this utterance only; sticky default unchanged |
|
| `target <name> <command>` | **one-shot** override: run that command on `claude-<name>` for this utterance only; sticky default unchanged |
|
||||||
| `unset` (alias `unsticky`) | clear the sticky target |
|
| `unset` (alias `unsticky`) | clear the sticky target |
|
||||||
| `list` | list running `claude-*` sessions to the daemon console |
|
| `list` | list running `claude-*` sessions to the daemon console |
|
||||||
| `commands` (alias `help`/`menu`) | print the voice-command menu to the console |
|
|
||||||
| `customs` (alias `custom`) | custom commands — arriving in v0.2.0 (stub for now) |
|
|
||||||
| `version` | print the claudedo version to the console |
|
|
||||||
| `cancel` / `escape` | back out of a prompt |
|
| `cancel` / `escape` | back out of a prompt |
|
||||||
|
|
||||||
Optional filler (`select` / `use` / `choose`) may precede any command and is ignored:
|
Optional filler (`select` / `use` / `choose`) may precede any command and is ignored:
|
||||||
@ -192,27 +187,22 @@ If Claude Code changes its prompt UI, re-confirm against a live session and upda
|
|||||||
Everything tunable lives in [`config.toml`](config.toml): wake phrases, mode + PTT
|
Everything tunable lives in [`config.toml`](config.toml): wake phrases, mode + PTT
|
||||||
key, Whisper model/language/device, `[vad]` endpointing, and `[behavior]`
|
key, Whisper model/language/device, `[vad]` endpointing, and `[behavior]`
|
||||||
(`type_autosend`, fuzzy thresholds, `filler_words`, `auto_target`, `print_heard`).
|
(`type_autosend`, fuzzy thresholds, `filler_words`, `auto_target`, `print_heard`).
|
||||||
The default model is **`small.en`** (the English-only small model — ~1s/command on a
|
The default model is **`medium`** (best accuracy for the coined wake word on a strong
|
||||||
strong CPU, more accurate on English than multilingual `small` at the same speed);
|
CPU); `small` is faster/less accurate, `large-v3` most accurate. `claudedo -c <path>
|
||||||
`medium`/`medium.en` are more accurate but ~3× slower (noticeable lag), `base.en` is
|
...` points at a specific config; otherwise it searches `$CLAUDEDO_CONFIG`,
|
||||||
snappier/less accurate, `large-v3` most accurate/slowest. Every `heard` line shows the
|
`~/.config/claudedo/config.toml`, then `./config.toml`.
|
||||||
STT latency as `(<ms>/<audio>s)` so you can see what a model change costs. VAD
|
|
||||||
endpointing ends a capture after `[vad].silence_ms` (700) of trailing silence, capped
|
|
||||||
at `max_seconds` (15). `claudedo -c <path> ...` points at a specific config; otherwise
|
|
||||||
it searches
|
|
||||||
`$CLAUDEDO_CONFIG`, `~/.config/claudedo/config.toml`, then `./config.toml`.
|
|
||||||
|
|
||||||
- **STT biasing.** The transcriber is seeded with an `initial_prompt` built from the
|
- **STT biasing.** The transcriber is seeded with an `initial_prompt` built from the
|
||||||
configured wake phrases + command vocabulary (one source — `grammar.vocabulary()`),
|
configured wake phrases + command vocabulary (one source — `grammar.vocabulary()`),
|
||||||
so Whisper is conditioned to expect "claudedo" and the command words.
|
so Whisper is conditioned to expect "claudedo" and the command words.
|
||||||
- **Split fuzzy thresholds.** `wake_fuzzy_threshold` (default `0.65`, lenient) vs
|
- **Split fuzzy thresholds.** `wake_fuzzy_threshold` (default `0.6`, lenient) vs
|
||||||
`command_fuzzy_threshold` (default `0.8`, tight). The asymmetry is deliberate: a
|
`command_fuzzy_threshold` (default `0.8`, tight). The asymmetry is deliberate: a
|
||||||
false *wake* is cheap (it wakes, finds no command, does nothing), but a false
|
false *wake* is cheap (it wakes, finds no command, does nothing), but a false
|
||||||
*command* fires the wrong action. Prefer expanding command synonyms over loosening
|
*command* fires the wrong action. Prefer expanding command synonyms over loosening
|
||||||
the command threshold.
|
the command threshold.
|
||||||
- **`[vad]` endpointing.** Capture starts on speech and ends after `silence_ms`
|
- **`[vad]` endpointing.** Capture starts on speech and ends after `silence_ms`
|
||||||
(default 700) of trailing silence — Alexa-style record-until-pause — capped at
|
(default 800) of trailing silence — Alexa-style record-until-pause — capped at
|
||||||
`max_seconds` (default 15). The pause both ends a command and separates it from
|
`max_seconds` (default 10). The pause both ends a command and separates it from
|
||||||
following chatter (the chatter is a separate capture the wake gate discards).
|
following chatter (the chatter is a separate capture the wake gate discards).
|
||||||
- **`auto_target`** (default `false`): with no sticky target and one session running,
|
- **`auto_target`** (default `false`): with no sticky target and one session running,
|
||||||
`false` does nothing and asks you to `set`; `true` auto-uses that session.
|
`false` does nothing and asks you to `set`; `true` auto-uses that session.
|
||||||
|
|||||||
19
config.toml
19
config.toml
@ -21,12 +21,10 @@ mode = "listen"
|
|||||||
ptt_key = "space"
|
ptt_key = "space"
|
||||||
|
|
||||||
[stt]
|
[stt]
|
||||||
# faster-whisper model size. "small.en" is the default — the English-only small model
|
# faster-whisper model size. "medium" is the default — biggest accuracy gain for the
|
||||||
# (~1s/command on a strong cpu, more accurate on english than multilingual "small" at
|
# coined wake word ("claudedo" / "claude do") and fine on a strong cpu. "small" is
|
||||||
# the same speed). "medium"/"medium.en" are more accurate but ~3x slower (noticeable
|
# faster but less accurate; "large-v3" is most accurate if medium still struggles.
|
||||||
# lag); "large-v3" is most accurate and slowest. drop to "base.en" for max snappiness
|
model = "medium"
|
||||||
# (less accurate). bump only if recognition is poor.
|
|
||||||
model = "small.en"
|
|
||||||
language = "en"
|
language = "en"
|
||||||
# mic device: "auto", or a sounddevice device index (integer) / substring of a
|
# mic device: "auto", or a sounddevice device index (integer) / substring of a
|
||||||
# device name. run `claudedo test-audio` to list devices.
|
# device name. run `claudedo test-audio` to list devices.
|
||||||
@ -48,10 +46,9 @@ min_utterance = 0.3
|
|||||||
# onset and ends after this much trailing silence — the natural end of an utterance.
|
# onset and ends after this much trailing silence — the natural end of an utterance.
|
||||||
# a real pause both ends the command AND separates it from following chatter (the
|
# a real pause both ends the command AND separates it from following chatter (the
|
||||||
# chatter becomes a separate capture that the wake gate then discards).
|
# chatter becomes a separate capture that the wake gate then discards).
|
||||||
silence_ms = 700
|
silence_ms = 800
|
||||||
# hard cap so continuous noise can't record forever (also the ceiling for a long
|
# hard cap so continuous noise can't record forever.
|
||||||
# dictated `type` phrase).
|
max_seconds = 10.0
|
||||||
max_seconds = 15.0
|
|
||||||
|
|
||||||
[behavior]
|
[behavior]
|
||||||
# dictation never auto-submits: "type <phrase>" inserts literal text only; you say
|
# dictation never auto-submits: "type <phrase>" inserts literal text only; you say
|
||||||
@ -61,7 +58,7 @@ type_autosend = false
|
|||||||
# wakes, finds no command, does nothing), so wake is lenient; a false COMMAND fires
|
# wakes, finds no command, does nothing), so wake is lenient; a false COMMAND fires
|
||||||
# the WRONG action, so commands stay tight. lower = more lenient = more matches.
|
# the WRONG action, so commands stay tight. lower = more lenient = more matches.
|
||||||
# prefer expanding command synonyms over loosening command_fuzzy_threshold.
|
# prefer expanding command synonyms over loosening command_fuzzy_threshold.
|
||||||
wake_fuzzy_threshold = 0.65
|
wake_fuzzy_threshold = 0.6
|
||||||
command_fuzzy_threshold = 0.8
|
command_fuzzy_threshold = 0.8
|
||||||
# optional filler words that may precede a command and are ignored for matching:
|
# optional filler words that may precede a command and are ignored for matching:
|
||||||
# "select yes" / "use yes" behave like "yes". (a filler word followed by a digit is
|
# "select yes" / "use yes" behave like "yes". (a filler word followed by a digit is
|
||||||
|
|||||||
@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
|
|||||||
|
|
||||||
[project]
|
[project]
|
||||||
name = "claudedo"
|
name = "claudedo"
|
||||||
version = "0.1.4"
|
version = "0.1.3"
|
||||||
description = "voice-control daemon for claude code (local STT -> tmux send-keys)"
|
description = "voice-control daemon for claude code (local STT -> tmux send-keys)"
|
||||||
readme = "README.md"
|
readme = "README.md"
|
||||||
requires-python = ">=3.10"
|
requires-python = ">=3.10"
|
||||||
|
|||||||
@ -1,3 +1,3 @@
|
|||||||
"""claudedo — voice-control daemon for claude code (local STT -> tmux send-keys)"""
|
"""claudedo — voice-control daemon for claude code (local STT -> tmux send-keys)"""
|
||||||
|
|
||||||
__version__ = "0.1.4"
|
__version__ = "0.1.3"
|
||||||
|
|||||||
@ -17,10 +17,7 @@ except ModuleNotFoundError:
|
|||||||
log = logging.getLogger(__name__)
|
log = logging.getLogger(__name__)
|
||||||
|
|
||||||
_VALID_MODES = ("listen", "ptt")
|
_VALID_MODES = ("listen", "ptt")
|
||||||
_VALID_MODELS = (
|
_VALID_MODELS = ("tiny", "base", "small", "medium", "large-v2", "large-v3")
|
||||||
"tiny", "base", "small", "medium", "large-v1", "large-v2", "large-v3",
|
|
||||||
"tiny.en", "base.en", "small.en", "medium.en",
|
|
||||||
)
|
|
||||||
|
|
||||||
DEFAULT_CONFIG_PATHS = (
|
DEFAULT_CONFIG_PATHS = (
|
||||||
Path(os.environ.get("CLAUDEDO_CONFIG", "")) if os.environ.get("CLAUDEDO_CONFIG") else None,
|
Path(os.environ.get("CLAUDEDO_CONFIG", "")) if os.environ.get("CLAUDEDO_CONFIG") else None,
|
||||||
@ -102,7 +99,7 @@ def load_config(explicit: str | os.PathLike | None = None) -> Config:
|
|||||||
if mode not in _VALID_MODES:
|
if mode not in _VALID_MODES:
|
||||||
raise ConfigError(f"[input].mode must be one of {_VALID_MODES}, got {mode!r}")
|
raise ConfigError(f"[input].mode must be one of {_VALID_MODES}, got {mode!r}")
|
||||||
|
|
||||||
model = _require(raw, "stt", "model", (str,), "small.en")
|
model = _require(raw, "stt", "model", (str,), "medium")
|
||||||
if model not in _VALID_MODELS:
|
if model not in _VALID_MODELS:
|
||||||
log.warning("unknown stt model %r — passing through to faster-whisper", model)
|
log.warning("unknown stt model %r — passing through to faster-whisper", model)
|
||||||
|
|
||||||
@ -117,11 +114,11 @@ def load_config(explicit: str | os.PathLike | None = None) -> Config:
|
|||||||
samplerate=int(_require(raw, "audio", "samplerate", (int,), 16000)),
|
samplerate=int(_require(raw, "audio", "samplerate", (int,), 16000)),
|
||||||
channels=int(_require(raw, "audio", "channels", (int,), 1)),
|
channels=int(_require(raw, "audio", "channels", (int,), 1)),
|
||||||
silence_threshold=float(_require(raw, "audio", "silence_threshold", (int, float), 0.012)),
|
silence_threshold=float(_require(raw, "audio", "silence_threshold", (int, float), 0.012)),
|
||||||
vad_silence_ms=int(_require(raw, "vad", "silence_ms", (int,), 700)),
|
vad_silence_ms=int(_require(raw, "vad", "silence_ms", (int,), 800)),
|
||||||
vad_max_seconds=float(_require(raw, "vad", "max_seconds", (int, float), 15.0)),
|
vad_max_seconds=float(_require(raw, "vad", "max_seconds", (int, float), 10.0)),
|
||||||
min_utterance=float(_require(raw, "audio", "min_utterance", (int, float), 0.3)),
|
min_utterance=float(_require(raw, "audio", "min_utterance", (int, float), 0.3)),
|
||||||
type_autosend=bool(_require(raw, "behavior", "type_autosend", (bool,), False)),
|
type_autosend=bool(_require(raw, "behavior", "type_autosend", (bool,), False)),
|
||||||
wake_fuzzy_threshold=float(_require(raw, "behavior", "wake_fuzzy_threshold", (int, float), 0.65)),
|
wake_fuzzy_threshold=float(_require(raw, "behavior", "wake_fuzzy_threshold", (int, float), 0.6)),
|
||||||
command_fuzzy_threshold=float(_require(raw, "behavior", "command_fuzzy_threshold",
|
command_fuzzy_threshold=float(_require(raw, "behavior", "command_fuzzy_threshold",
|
||||||
(int, float), 0.8)),
|
(int, float), 0.8)),
|
||||||
filler_words=tuple(_require(raw, "behavior", "filler_words", (list,),
|
filler_words=tuple(_require(raw, "behavior", "filler_words", (list,),
|
||||||
|
|||||||
@ -18,16 +18,12 @@ _COLORS = {
|
|||||||
"red": "\033[31m",
|
"red": "\033[31m",
|
||||||
"yellow": "\033[33m",
|
"yellow": "\033[33m",
|
||||||
"cyan": "\033[36m",
|
"cyan": "\033[36m",
|
||||||
"blue": "\033[34m",
|
|
||||||
"brightblue": "\033[94m",
|
|
||||||
"magenta": "\033[35m",
|
|
||||||
"dim": "\033[2m",
|
"dim": "\033[2m",
|
||||||
"bold": "\033[1m",
|
"bold": "\033[1m",
|
||||||
}
|
}
|
||||||
|
|
||||||
SYSTEM = "SYSTEM"
|
SYSTEM = "SYSTEM"
|
||||||
VOICE = "VOICE"
|
VOICE = "VOICE"
|
||||||
HELP = "HELP"
|
|
||||||
|
|
||||||
|
|
||||||
class Console:
|
class Console:
|
||||||
@ -49,17 +45,7 @@ class Console:
|
|||||||
return text
|
return text
|
||||||
return f"{_COLORS[color]}{text}{RESET}"
|
return f"{_COLORS[color]}{text}{RESET}"
|
||||||
|
|
||||||
def paint(self, text: str, color: str | None) -> str:
|
|
||||||
"""public colorizer for pre-coloring a fragment of a message (e.g. a command
|
|
||||||
word) before passing it to emit() with color=None"""
|
|
||||||
return self._paint(text, color)
|
|
||||||
|
|
||||||
def emit(self, prefix: str, message: str, color: str | None = None) -> None:
|
def emit(self, prefix: str, message: str, color: str | None = None) -> None:
|
||||||
"""print one line: ``HH:MM:SS [prefix] message`` (message optionally colored)"""
|
"""print one line: ``HH:MM:SS [prefix] message`` (message optionally colored)"""
|
||||||
line = f"{self._stamp()} {self._paint(f'[{prefix}]', 'dim')} {self._paint(message, color)}"
|
line = f"{self._stamp()} {self._paint(f'[{prefix}]', 'dim')} {self._paint(message, color)}"
|
||||||
print(line, file=self.stream, flush=True)
|
print(line, file=self.stream, flush=True)
|
||||||
|
|
||||||
def line(self, message: str, color: str | None = None) -> None:
|
|
||||||
"""print a bare continuation line (no timestamp/prefix) — for multi-row blocks
|
|
||||||
like the help menu, indented under a preceding header"""
|
|
||||||
print(self._paint(message, color), file=self.stream, flush=True)
|
|
||||||
|
|||||||
@ -16,9 +16,9 @@ import sys
|
|||||||
import time
|
import time
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
|
|
||||||
from . import __version__, audio, grammar, inject, target
|
from . import audio, grammar, inject, target
|
||||||
from .config import Config
|
from .config import Config
|
||||||
from .console import HELP, SYSTEM, VOICE, Console
|
from .console import SYSTEM, VOICE, Console
|
||||||
from .stt import Transcriber
|
from .stt import Transcriber
|
||||||
|
|
||||||
log = logging.getLogger(__name__)
|
log = logging.getLogger(__name__)
|
||||||
@ -117,8 +117,6 @@ class Daemon:
|
|||||||
self._ptt = _PTTKey()
|
self._ptt = _PTTKey()
|
||||||
self._pending: dict[str, int] = {}
|
self._pending: dict[str, int] = {}
|
||||||
self._console = Console()
|
self._console = Console()
|
||||||
self._last_stt_ms = 0.0
|
|
||||||
self._last_audio_s = 0.0
|
|
||||||
|
|
||||||
def _install_signals(self) -> None:
|
def _install_signals(self) -> None:
|
||||||
signal.signal(signal.SIGTERM, self._on_signal)
|
signal.signal(signal.SIGTERM, self._on_signal)
|
||||||
@ -169,58 +167,31 @@ class Daemon:
|
|||||||
parsed = grammar.parse(transcript, cfg.wake_phrases, cfg.wake_fuzzy_threshold,
|
parsed = grammar.parse(transcript, cfg.wake_phrases, cfg.wake_fuzzy_threshold,
|
||||||
cfg.command_fuzzy_threshold, require_wake, filler=cfg.filler_words)
|
cfg.command_fuzzy_threshold, require_wake, filler=cfg.filler_words)
|
||||||
if parsed is None or parsed.action is None:
|
if parsed is None or parsed.action is None:
|
||||||
self._console.emit(VOICE, f'heard "{transcript}" -> no command matched {self._timing()}',
|
self._console.emit(VOICE, f'heard "{transcript}" -> no command matched', "yellow")
|
||||||
"yellow")
|
|
||||||
return
|
return
|
||||||
action = parsed.action
|
action = parsed.action
|
||||||
|
|
||||||
# a command was recognized — echo what we heard (green) before acting. note the
|
|
||||||
# matched wake phrase (magenta) when the transcript didn't literally contain it
|
|
||||||
# (so a loose match like "okay clouds" -> "okay claude" is visible).
|
|
||||||
head = self._console.paint(f'heard "{transcript}" -> {self._describe(action)}', "green")
|
|
||||||
note = ""
|
|
||||||
if parsed.wake and parsed.wake.replace(" ", "") not in transcript.lower().replace(" ", ""):
|
|
||||||
note = (self._console.paint(" (wake: ", "green")
|
|
||||||
+ self._console.paint(parsed.wake, "magenta")
|
|
||||||
+ self._console.paint(")", "green"))
|
|
||||||
tail = self._console.paint(f" {self._timing()}", "green")
|
|
||||||
self._console.emit(VOICE, f"{head}{note}{tail}")
|
|
||||||
|
|
||||||
def blue(s):
|
|
||||||
return self._console.paint(s, "brightblue")
|
|
||||||
if action.name == "mode":
|
if action.name == "mode":
|
||||||
new_mode = str(action.arg)
|
new_mode = str(action.arg)
|
||||||
if new_mode != self.mode:
|
if new_mode != self.mode:
|
||||||
self.mode = new_mode
|
self.mode = new_mode
|
||||||
self._console.emit(SYSTEM, f"{blue('mode')} -> {new_mode}")
|
self._console.emit(SYSTEM, f"mode -> {new_mode}", "cyan")
|
||||||
self._refresh_state()
|
self._refresh_state()
|
||||||
return
|
return
|
||||||
if action.name == "set":
|
if action.name == "set":
|
||||||
session = target.set_target(str(action.arg))
|
session = target.set_target(str(action.arg))
|
||||||
self._pending.pop(session, None)
|
self._pending.pop(session, None)
|
||||||
self._console.emit(SYSTEM, f"{blue('set sticky')} -> {session}")
|
self._console.emit(SYSTEM, f"set sticky -> {session}", "cyan")
|
||||||
self._refresh_state()
|
self._refresh_state()
|
||||||
return
|
return
|
||||||
if action.name == "unset":
|
if action.name == "unset":
|
||||||
target.unset_target()
|
target.unset_target()
|
||||||
self._console.emit(SYSTEM, f"{blue('unset')} (cleared)")
|
self._console.emit(SYSTEM, "unset (cleared)", "cyan")
|
||||||
self._refresh_state()
|
self._refresh_state()
|
||||||
return
|
return
|
||||||
if action.name == "list":
|
if action.name == "list":
|
||||||
sessions = target.list_sessions()
|
sessions = target.list_sessions()
|
||||||
self._console.emit(SYSTEM, f"{blue('list')} -> "
|
self._console.emit(SYSTEM, "list -> " + (", ".join(sessions) if sessions else "(none running)"))
|
||||||
+ (", ".join(sessions) if sessions else "(none running)"))
|
|
||||||
return
|
|
||||||
if action.name == "commands":
|
|
||||||
self._console.emit(HELP, "voice commands:")
|
|
||||||
for usage, desc in grammar.command_menu():
|
|
||||||
self._console.line(f" {self._console.paint(f'{usage:<26}', 'brightblue')} {desc}")
|
|
||||||
return
|
|
||||||
if action.name == "customs":
|
|
||||||
self._console.emit(SYSTEM, "custom commands arrive in v0.2.0 (contexts.toml)")
|
|
||||||
return
|
|
||||||
if action.name == "version":
|
|
||||||
self._console.emit(SYSTEM, f"claudedo {__version__}")
|
|
||||||
return
|
return
|
||||||
if action.name == "debug":
|
if action.name == "debug":
|
||||||
self._console.emit(VOICE, f'debug: "{action.arg}"', "yellow")
|
self._console.emit(VOICE, f'debug: "{action.arg}"', "yellow")
|
||||||
@ -231,15 +202,12 @@ class Daemon:
|
|||||||
self._console.emit(VOICE, f'heard "{transcript}" -> {reason} -> '
|
self._console.emit(VOICE, f'heard "{transcript}" -> {reason} -> '
|
||||||
f'{self._describe(action)} did nothing', "red")
|
f'{self._describe(action)} did nothing', "red")
|
||||||
return
|
return
|
||||||
self._inject(session, action)
|
self._inject(session, transcript, reason, action)
|
||||||
|
|
||||||
def _inject(self, session: str, action) -> None:
|
def _inject(self, session: str, transcript: str, reason: str, action) -> None:
|
||||||
"""run a resolved command against `session`, tracking the uncommitted-input
|
"""run a resolved command against `session`, tracking the uncommitted-input
|
||||||
buffer so backspace/erase delete only back to the last submit boundary.
|
buffer so backspace/erase delete only back to the last submit boundary"""
|
||||||
|
heard = f'heard "{transcript}" ({reason})'
|
||||||
the 'heard ...' echo is already printed by _handle and the [session] prefix
|
|
||||||
names the target, so these lines just report the keystrokes injected.
|
|
||||||
"""
|
|
||||||
name = action.name
|
name = action.name
|
||||||
|
|
||||||
if name == "type":
|
if name == "type":
|
||||||
@ -249,38 +217,36 @@ class Daemon:
|
|||||||
if self.config.type_autosend:
|
if self.config.type_autosend:
|
||||||
inject.send_named(session, inject.keys.SUBMIT)
|
inject.send_named(session, inject.keys.SUBMIT)
|
||||||
self._pending[session] = 0
|
self._pending[session] = 0
|
||||||
self._console.emit(session, f"typed {text!r}"
|
self._console.emit(session, f"{heard} -> typed {text!r}"
|
||||||
+ (" + send" if self.config.type_autosend else ""), "green")
|
+ (" + send" if self.config.type_autosend else ""), "green")
|
||||||
return
|
return
|
||||||
if name == "space":
|
if name == "space":
|
||||||
n = int(action.arg)
|
n = int(action.arg)
|
||||||
inject.perform(session, action)
|
inject.perform(session, action)
|
||||||
self._pending[session] = self._pending.get(session, 0) + n
|
self._pending[session] = self._pending.get(session, 0) + n
|
||||||
self._console.emit(session, f"space x{n}", "green")
|
self._console.emit(session, f"{heard} -> space x{n}", "green")
|
||||||
return
|
return
|
||||||
if name == "backspace":
|
if name == "backspace":
|
||||||
n = int(action.arg)
|
have = self._pending.get(session, 0)
|
||||||
|
n = min(int(action.arg), have)
|
||||||
if n:
|
if n:
|
||||||
inject.perform(session, action)
|
inject.perform(session, grammar.Action("backspace", n))
|
||||||
self._pending[session] = max(0, self._pending.get(session, 0) - n)
|
self._pending[session] = have - n
|
||||||
self._console.emit(session, f"backspace x{n}", "green")
|
self._console.emit(session, f"{heard} -> backspace x{n}"
|
||||||
|
+ ("" if n == int(action.arg) else " (capped at boundary)"), "green")
|
||||||
return
|
return
|
||||||
if name == "erase":
|
if name == "erase":
|
||||||
n = self._pending.get(session, 0)
|
n = self._pending.get(session, 0)
|
||||||
if n:
|
if n:
|
||||||
inject.perform(session, grammar.Action("erase", n))
|
inject.perform(session, grammar.Action("erase", n))
|
||||||
self._pending[session] = 0
|
self._pending[session] = 0
|
||||||
self._console.emit(session, f"erase x{n} (to last boundary)", "green")
|
self._console.emit(session, f"{heard} -> erase x{n} (to last boundary)", "green")
|
||||||
return
|
return
|
||||||
|
|
||||||
inject.perform(session, action)
|
inject.perform(session, action)
|
||||||
if name == "submit":
|
if name == "submit":
|
||||||
self._pending[session] = 0
|
self._pending[session] = 0
|
||||||
self._console.emit(session, f"injected {self._describe(action)}", "green")
|
self._console.emit(session, f"{heard} -> {self._describe(action)}", "green")
|
||||||
|
|
||||||
def _timing(self) -> str:
|
|
||||||
"""compact STT latency suffix for heard lines (transcribe ms on audio secs)"""
|
|
||||||
return f"({self._last_stt_ms:.0f}ms/{self._last_audio_s:.1f}s)"
|
|
||||||
|
|
||||||
@staticmethod
|
@staticmethod
|
||||||
def _describe(action) -> str:
|
def _describe(action) -> str:
|
||||||
@ -305,8 +271,7 @@ class Daemon:
|
|||||||
self._console.emit(SYSTEM, f"claudedo {self.mode} mode — Ctrl-C to stop", "bold")
|
self._console.emit(SYSTEM, f"claudedo {self.mode} mode — Ctrl-C to stop", "bold")
|
||||||
self._console.emit(SYSTEM, f"model {cfg.stt_model} ({cfg.stt_language}) · mic {dev} · "
|
self._console.emit(SYSTEM, f"model {cfg.stt_model} ({cfg.stt_language}) · mic {dev} · "
|
||||||
f"target {target_now}")
|
f"target {target_now}")
|
||||||
wakes = ", ".join(self._console.paint(p, "magenta") for p in cfg.wake_phrases)
|
self._console.emit(SYSTEM, "wake: " + ", ".join(cfg.wake_phrases))
|
||||||
self._console.emit(SYSTEM, f"wake: {wakes}")
|
|
||||||
|
|
||||||
def _refresh_state(self) -> None:
|
def _refresh_state(self) -> None:
|
||||||
write_state(os.getpid(), self.mode, target.read_active())
|
write_state(os.getpid(), self.mode, target.read_active())
|
||||||
@ -326,15 +291,12 @@ class Daemon:
|
|||||||
break
|
break
|
||||||
if audio_chunk is None:
|
if audio_chunk is None:
|
||||||
continue
|
continue
|
||||||
t0 = time.monotonic()
|
|
||||||
transcript = self._transcriber.transcribe(audio_chunk, self.config.samplerate)
|
transcript = self._transcriber.transcribe(audio_chunk, self.config.samplerate)
|
||||||
self._last_stt_ms = (time.monotonic() - t0) * 1000.0
|
|
||||||
self._last_audio_s = audio_chunk.size / self.config.samplerate
|
|
||||||
if not transcript:
|
if not transcript:
|
||||||
continue
|
continue
|
||||||
if self.mode == "listen" and not self._has_wake(transcript):
|
if self.mode == "listen" and not self._has_wake(transcript):
|
||||||
if self.config.print_heard:
|
if self.config.print_heard:
|
||||||
self._console.emit(VOICE, f'heard (dropped) "{transcript}" {self._timing()}', "red")
|
self._console.emit(VOICE, f'heard (dropped) "{transcript}"', "red")
|
||||||
else:
|
else:
|
||||||
self._console.emit(VOICE, "dropped: non-wake speech (not recorded)", "dim")
|
self._console.emit(VOICE, "dropped: non-wake speech (not recorded)", "dim")
|
||||||
continue
|
continue
|
||||||
|
|||||||
@ -50,9 +50,6 @@ _STICKY_VERBS = ("set", "sticky", "switch")
|
|||||||
_ONESHOT_VERBS = ("target",)
|
_ONESHOT_VERBS = ("target",)
|
||||||
_UNSET_VERBS = ("unset", "unsticky")
|
_UNSET_VERBS = ("unset", "unsticky")
|
||||||
_LIST_VERBS = ("list", "sessions")
|
_LIST_VERBS = ("list", "sessions")
|
||||||
_COMMANDS_VERBS = ("commands", "help", "menu")
|
|
||||||
_CUSTOMS_VERBS = ("customs", "custom")
|
|
||||||
_VERSION_VERBS = ("version",)
|
|
||||||
_SELECT_VERBS = ("select", "option", "choose", "number")
|
_SELECT_VERBS = ("select", "option", "choose", "number")
|
||||||
|
|
||||||
# every command/synonym word, for biasing the STT toward the vocabulary we expect.
|
# every command/synonym word, for biasing the STT toward the vocabulary we expect.
|
||||||
@ -60,8 +57,7 @@ _COMMAND_WORDS = (
|
|||||||
_YES_VERBS + _NO_VERBS + _APPROVE_VERBS + _DENY_VERBS + _SUBMIT_VERBS
|
_YES_VERBS + _NO_VERBS + _APPROVE_VERBS + _DENY_VERBS + _SUBMIT_VERBS
|
||||||
+ _CANCEL_VERBS + _TYPE_VERBS + _BACKSPACE_VERBS + _SPACE_VERBS + _ADD_VERBS
|
+ _CANCEL_VERBS + _TYPE_VERBS + _BACKSPACE_VERBS + _SPACE_VERBS + _ADD_VERBS
|
||||||
+ _ERASE_VERBS + _DEBUG_VERBS + _MODE_VERBS + _STICKY_VERBS + _ONESHOT_VERBS + _UNSET_VERBS
|
+ _ERASE_VERBS + _DEBUG_VERBS + _MODE_VERBS + _STICKY_VERBS + _ONESHOT_VERBS + _UNSET_VERBS
|
||||||
+ _LIST_VERBS + _COMMANDS_VERBS + _CUSTOMS_VERBS + _VERSION_VERBS
|
+ _LIST_VERBS + _SELECT_VERBS + ("ptt", "listen")
|
||||||
+ _SELECT_VERBS + ("ptt", "listen")
|
|
||||||
+ ("one", "two", "three", "four")
|
+ ("one", "two", "three", "four")
|
||||||
)
|
)
|
||||||
DEFAULT_FILLER = ("select", "use", "choose")
|
DEFAULT_FILLER = ("select", "use", "choose")
|
||||||
@ -87,14 +83,11 @@ class ParsedCommand:
|
|||||||
|
|
||||||
one_shot is the session short-name from a leading ``target <name>`` (this command
|
one_shot is the session short-name from a leading ``target <name>`` (this command
|
||||||
only; does not change the sticky default), or None. action is the command to run,
|
only; does not change the sticky default), or None. action is the command to run,
|
||||||
or None if nothing matched after the wake phrase / one-shot / filler. wake is the
|
or None if nothing matched after the wake phrase / one-shot / filler.
|
||||||
configured wake phrase that matched (e.g. "okay claude" for a heard "okay clouds"),
|
|
||||||
or None.
|
|
||||||
"""
|
"""
|
||||||
|
|
||||||
one_shot: str | None
|
one_shot: str | None
|
||||||
action: Action | None
|
action: Action | None
|
||||||
wake: str | None = None
|
|
||||||
|
|
||||||
|
|
||||||
def normalize(text: str) -> str:
|
def normalize(text: str) -> str:
|
||||||
@ -128,32 +121,6 @@ def initial_prompt(wake_phrases: list[str]) -> str:
|
|||||||
return ", ".join(vocabulary(wake_phrases))
|
return ", ".join(vocabulary(wake_phrases))
|
||||||
|
|
||||||
|
|
||||||
def command_menu() -> list[tuple[str, str]]:
|
|
||||||
"""the voice command menu as (usage, description) rows, for the `commands` cmd.
|
|
||||||
|
|
||||||
a small curated list keyed off the verb groups — the speakable command surface,
|
|
||||||
NOT the cc shell kit.
|
|
||||||
"""
|
|
||||||
return [
|
|
||||||
("yes / no", "answer a yes/no prompt"),
|
|
||||||
("one..four", "pick numbered option 1-4"),
|
|
||||||
("approve / deny", "allow / deny a permission prompt"),
|
|
||||||
("send", "submit (Enter)"),
|
|
||||||
("cancel", "back out (Escape)"),
|
|
||||||
("type <text>", "insert literal text (no submit)"),
|
|
||||||
("space [n] / add a space", "insert n spaces"),
|
|
||||||
("backspace [n]", "delete n chars (to last submit)"),
|
|
||||||
("erase", "wipe the current input"),
|
|
||||||
("debug <text>", "echo to console (no inject)"),
|
|
||||||
("set <name>", "sticky target -> claude-<name>"),
|
|
||||||
("target <name> <cmd>", "one-shot to another session"),
|
|
||||||
("unset / list", "clear sticky / list sessions"),
|
|
||||||
("mode ptt|listen", "switch input mode"),
|
|
||||||
("commands / customs", "this menu / custom commands (v0.2.0)"),
|
|
||||||
("version", "print the claudedo version"),
|
|
||||||
]
|
|
||||||
|
|
||||||
|
|
||||||
def _ratio(a: str, b: str) -> float:
|
def _ratio(a: str, b: str) -> float:
|
||||||
return SequenceMatcher(None, a, b).ratio()
|
return SequenceMatcher(None, a, b).ratio()
|
||||||
|
|
||||||
@ -164,15 +131,13 @@ def _wake_variants(phrase: str) -> set[str]:
|
|||||||
return {norm, norm.replace(" ", "")}
|
return {norm, norm.replace(" ", "")}
|
||||||
|
|
||||||
|
|
||||||
def strip_wake_match(transcript: str, wake_phrases: list[str], threshold: float,
|
def strip_wake(transcript: str, wake_phrases: list[str], threshold: float,
|
||||||
require_wake: bool) -> tuple[str | None, str | None]:
|
require_wake: bool) -> str | None:
|
||||||
"""return (command remainder, matched wake phrase).
|
"""return the command remainder after the wake phrase.
|
||||||
|
|
||||||
if ``require_wake`` (listen mode) and no wake phrase is found at the start, the
|
if ``require_wake`` (listen mode) and no wake phrase is found at the start,
|
||||||
remainder is None so the daemon discards the utterance. if not required (ptt
|
return None so the daemon discards the utterance. if not required (ptt mode),
|
||||||
mode), a leading wake phrase is stripped when present but its absence is fine.
|
a leading wake phrase is stripped when present but its absence is fine.
|
||||||
the matched phrase is the configured wake phrase that best matched (e.g. "okay
|
|
||||||
claude" for a heard "okay clouds"), or None when none matched.
|
|
||||||
|
|
||||||
matches leniently on a despaced prefix (whisper splits/joins the coined word
|
matches leniently on a despaced prefix (whisper splits/joins the coined word
|
||||||
inconsistently) but always slices the remainder on a WORD boundary of the
|
inconsistently) but always slices the remainder on a WORD boundary of the
|
||||||
@ -180,11 +145,10 @@ def strip_wake_match(transcript: str, wake_phrases: list[str], threshold: float,
|
|||||||
"""
|
"""
|
||||||
norm = normalize(transcript)
|
norm = normalize(transcript)
|
||||||
if not norm:
|
if not norm:
|
||||||
return (None, None) if require_wake else ("", None)
|
return None if require_wake else ""
|
||||||
words = norm.split(" ")
|
words = norm.split(" ")
|
||||||
|
|
||||||
best_remainder: str | None = None
|
best_remainder: str | None = None
|
||||||
best_phrase: str | None = None
|
|
||||||
best_score = 0.0
|
best_score = 0.0
|
||||||
for phrase in wake_phrases:
|
for phrase in wake_phrases:
|
||||||
variants = _wake_variants(phrase)
|
variants = _wake_variants(phrase)
|
||||||
@ -198,18 +162,10 @@ def strip_wake_match(transcript: str, wake_phrases: list[str], threshold: float,
|
|||||||
if score >= threshold and score > best_score:
|
if score >= threshold and score > best_score:
|
||||||
best_score = score
|
best_score = score
|
||||||
best_remainder = " ".join(words[take:]).strip()
|
best_remainder = " ".join(words[take:]).strip()
|
||||||
best_phrase = phrase
|
|
||||||
|
|
||||||
if best_remainder is not None:
|
if best_remainder is not None:
|
||||||
return best_remainder, best_phrase
|
return best_remainder
|
||||||
return (None, None) if require_wake else (norm, None)
|
return None if require_wake else norm
|
||||||
|
|
||||||
|
|
||||||
def strip_wake(transcript: str, wake_phrases: list[str], threshold: float,
|
|
||||||
require_wake: bool) -> str | None:
|
|
||||||
"""return the command remainder after the wake phrase (None if no wake in listen
|
|
||||||
mode). thin wrapper over strip_wake_match for callers that don't need the phrase"""
|
|
||||||
return strip_wake_match(transcript, wake_phrases, threshold, require_wake)[0]
|
|
||||||
|
|
||||||
|
|
||||||
def _fuzzy_in(token: str, options: tuple[str, ...], threshold: float) -> bool:
|
def _fuzzy_in(token: str, options: tuple[str, ...], threshold: float) -> bool:
|
||||||
@ -296,14 +252,8 @@ def match_command(remainder: str, threshold: float) -> Action | None:
|
|||||||
return Action("set", name) if name else None
|
return Action("set", name) if name else None
|
||||||
if _fuzzy_in(head, _UNSET_VERBS, threshold) and not rest:
|
if _fuzzy_in(head, _UNSET_VERBS, threshold) and not rest:
|
||||||
return Action("unset")
|
return Action("unset")
|
||||||
if _fuzzy_in(head, _CUSTOMS_VERBS, threshold):
|
|
||||||
return Action("customs")
|
|
||||||
if _fuzzy_in(head, _COMMANDS_VERBS, threshold):
|
|
||||||
return Action("commands")
|
|
||||||
if _fuzzy_in(head, _LIST_VERBS, threshold):
|
if _fuzzy_in(head, _LIST_VERBS, threshold):
|
||||||
return Action("list")
|
return Action("list")
|
||||||
if _fuzzy_in(head, _VERSION_VERBS, threshold):
|
|
||||||
return Action("version")
|
|
||||||
|
|
||||||
return None
|
return None
|
||||||
|
|
||||||
@ -333,7 +283,7 @@ def parse(transcript: str, wake_phrases: list[str], wake_threshold: float,
|
|||||||
ParsedCommand with action=None means a wake phrase was present but no command
|
ParsedCommand with action=None means a wake phrase was present but no command
|
||||||
matched.
|
matched.
|
||||||
"""
|
"""
|
||||||
remainder, wake = strip_wake_match(transcript, wake_phrases, wake_threshold, require_wake)
|
remainder = strip_wake(transcript, wake_phrases, wake_threshold, require_wake)
|
||||||
if remainder is None:
|
if remainder is None:
|
||||||
return None
|
return None
|
||||||
|
|
||||||
@ -345,4 +295,4 @@ def parse(transcript: str, wake_phrases: list[str], wake_threshold: float,
|
|||||||
|
|
||||||
tokens = _strip_filler(tokens, filler, command_threshold)
|
tokens = _strip_filler(tokens, filler, command_threshold)
|
||||||
action = match_command(" ".join(tokens), command_threshold)
|
action = match_command(" ".join(tokens), command_threshold)
|
||||||
return ParsedCommand(one_shot=one_shot, action=action, wake=wake)
|
return ParsedCommand(one_shot=one_shot, action=action)
|
||||||
|
|||||||
Loading…
Reference in New Issue
Block a user