Compare commits
No commits in common. "17db65858e2e279e67b6a92281192e004f12706f" and "732bad4c8d4dfb43aa955951844e2bf635b49110" have entirely different histories.
17db65858e...732bad4c8d
32
README.md
@@ -61,23 +61,37 @@ claudedo test-audio

## Usage

**Run it in a terminal you watch; that's the product.** You launch `claudedo
start`, it does a quick mic check, then drops into a visible listen loop that prints
`heard → matched → sent` for every utterance. That terminal is your
recognition/action console; you attach to the `claude-<name>` session in another pane
to watch the keystrokes land. There is no backgrounding/daemon mode: the whole point
is the console you read.
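The daemon's matcher (`grammar.parse()`) is not part of this diff, so here is a minimal sketch of what the `matched` step has to decide. It assumes wake phrases are matched case-insensitively as a word prefix of the transcript after punctuation is stripped; `match_wake_phrase` is a hypothetical name. Anything that does not start with a wake phrase returns `None`, so the caller can drop the text on the spot.

```python
from __future__ import annotations

import re


def match_wake_phrase(transcript: str, wake_phrases: list[str]) -> str | None:
    """Return the command text after a leading wake phrase, or None.

    Hypothetical stand-in for grammar.parse(): it only checks whether the
    utterance *starts* with one of the configured wake phrases.
    """
    # normalise: lowercase, turn punctuation Whisper adds into spaces, split words
    words = re.sub(r"[^\w\s'-]", " ", transcript.lower()).split()
    for phrase in wake_phrases:
        wake = phrase.lower().split()
        if words[: len(wake)] == wake:
            rest = " ".join(words[len(wake):])
            return rest or None  # a bare wake phrase carries no command
    return None  # not addressed to claudedo: the caller discards the text


print(match_wake_phrase("Claudedo, switch libs.", ["claudedo"]))         # -> switch libs
print(match_wake_phrase("so I told him about claudedo", ["claudedo"]))   # -> None
```

Matching only at the start of the utterance is what keeps casual mentions of the wake word from ever being acted on.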

```bash
claudedo start                      # mic check, then the visible listen loop (listen mode by default)
claudedo start --mode ptt           # push-to-talk instead (desk-only; see Modes)
claudedo start --skip-audio-check   # skip the pre-listen mic check
claudedo status                     # running? mode? target session?
claudedo stop                       # stop a running daemon
claudedo switch <name>              # retarget to claude-<name>
claudedo test-audio                 # verify the mic capture path
```
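`switch <name>` and the shell kit both turn a short project name into a `claude-<name>` tmux session. The daemon's `target.py` is not in this diff; the sketch below is a hypothetical `session_name()` that assumes the `claude-` prefix is optional on input, as the `switch` help text says.

```python
from __future__ import annotations

PREFIX = "claude-"


def session_name(name: str) -> str:
    """Map a spoken or typed project name to its tmux session (hypothetical).

    'libs' -> 'claude-libs'; an already-prefixed 'claude-libs' is left alone.
    """
    name = name.strip()
    if not name:
        raise ValueError("project name is required")
    return name if name.startswith(PREFIX) else PREFIX + name


print(session_name("libs"))          # -> claude-libs
print(session_name("claude-libs"))   # -> claude-libs
```

The shell kit's `cc` always prepends the prefix, so `cc claude-libs` would create `claude-claude-libs`; passing bare names everywhere keeps both sides of the mapping in agreement.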

To keep the listen loop running without tying up your current terminal, start it in its own tmux session and read the console with `tmux attach -t claudedo`:

```bash
tmux new-session -d -s claudedo 'claudedo start'
```

### Autostart

WSL has no real boot, so autostart is rc-based and **opt-in**. `install.sh` ships
`~/.config/claudedo/autostart.sh`, which starts the daemon in a `claudedo-daemon`
tmux session once per WSL session, but only when `CLAUDEDO_AUTOSTART=1` is set.
Enable it by uncommenting the `export CLAUDEDO_AUTOSTART=1` line in the cc-kit marker
block of your rc; disable it by re-commenting that line (or by deleting
`~/.config/claudedo/autostart.sh`). Watch its logs with `tmux attach -t claudedo-daemon`.
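`autostart.sh` itself is not in this diff. As a rough model of the gate described above (opt-in via `CLAUDEDO_AUTOSTART=1`, at most one `claudedo-daemon` session per WSL session), here is a Python sketch; `should_autostart`, `tmux_session_exists`, and `autostart_command` are hypothetical names, and the shipped script is shell, not Python.

```python
from __future__ import annotations

import subprocess

SESSION = "claudedo-daemon"


def tmux_session_exists(name: str) -> bool:
    """True if `tmux has-session -t name` exits with status 0."""
    try:
        result = subprocess.run(["tmux", "has-session", "-t", name],
                                stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except FileNotFoundError:  # tmux not installed
        return False
    return result.returncode == 0


def should_autostart(env: dict[str, str], session_exists: bool) -> bool:
    """Opt-in gate: only CLAUDEDO_AUTOSTART=1 enables it, and a session left
    by an earlier shell means autostart already ran in this WSL session."""
    return env.get("CLAUDEDO_AUTOSTART") == "1" and not session_exists


def autostart_command() -> list[str]:
    # detached (-d) so opening a new shell never blocks on the listen loop
    return ["tmux", "new-session", "-d", "-s", SESSION, "claudedo start"]
```

A caller would combine them as `should_autostart(dict(os.environ), tmux_session_exists(SESSION))`. One assumption worth checking: started unattended, the pre-listen mic check in `claudedo start` can fail if nobody speaks, and `--skip-audio-check` exists to bypass it.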

If your WSL runs systemd (`systemd=true` in `/etc/wsl.conf`), `install.sh` also
installs an optional user unit; enable it instead with:

```bash
systemctl --user enable --now claudedo
```

### Modes

- **listen (default)** — continuous capture; only acts on utterances that **start
156
install.sh
@@ -1,156 +0,0 @@
#!/usr/bin/env bash
# claudedo bootstrap — does the system setup pip can't. idempotent: re-running is
# safe and won't duplicate the shell-rc cc kit. run from the repo root.
set -euo pipefail

REPO_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
ASOUNDRC="$HOME/.asoundrc"
MARKER_BEGIN="# >>> claudedo cc kit >>>"
MARKER_END="# <<< claudedo cc kit <<<"

say()  { printf '\n\033[1;36m==> %s\033[0m\n' "$*"; }
warn() { printf '\033[1;33m!! %s\033[0m\n' "$*" >&2; }
die()  { printf '\033[1;31mxx %s\033[0m\n' "$*" >&2; exit 1; }

# 1. windows-side checks (cannot automate — check and instruct) -----------------
say "checking WSLg audio bridge"
if [ ! -e /mnt/wslg/PulseServer ]; then
  die "WSLg PulseServer missing (/mnt/wslg/PulseServer). claudedo needs WSLg audio.
update WSL ('wsl --update' in Windows) or install WSL from the Microsoft Store,
then restart WSL ('wsl --shutdown') and re-run this script."
fi
echo " /mnt/wslg/PulseServer present"

cat <<'EOF'

MANUAL WINDOWS STEP (this script cannot do it for you):
Windows Settings -> Privacy & security -> Microphone ->
enable "Let desktop apps access your microphone".
Without this, the mic is silent inside WSL. Do it now if you haven't.

EOF

# 2. WSL audio deps (apt) -------------------------------------------------------
say "installing WSL audio dependencies (apt)"
sudo apt-get update
sudo apt-get install -y libportaudio2 libasound2t64 libasound2-plugins \
  alsa-utils pulseaudio-utils

# 3. ALSA -> Pulse routing ------------------------------------------------------
say "configuring ALSA -> Pulse routing (~/.asoundrc)"
if [ -f "$ASOUNDRC" ] && grep -q "type pulse" "$ASOUNDRC"; then
  echo " ~/.asoundrc already routes to pulse"
else
  {
    echo "pcm.!default { type pulse }"
    echo "ctl.!default { type pulse }"
  } >> "$ASOUNDRC"
  echo " wrote pulse default to ~/.asoundrc"
fi
if [ -z "${PULSE_SERVER:-}" ] && [ -e /mnt/wslg/PulseServer ]; then
  export PULSE_SERVER="unix:/mnt/wslg/PulseServer"
  echo " exported PULSE_SERVER=$PULSE_SERVER (WSLg usually sets this already)"
fi

# 4. verify audio (fail loudly with guidance) -----------------------------------
say "verifying audio path"
if pactl info >/dev/null 2>&1; then
  DEFAULT_SRC="$(pactl info | sed -n 's/^Default Source: //p')"
  echo " Default Source: ${DEFAULT_SRC:-<none>}"
  if ! pactl list sources short 2>/dev/null | grep -q RDPSource; then
    warn "RDPSource not listed by pactl — mic may not be bridged. check Windows mic permission."
  fi
else
  warn "pactl info failed — pulseaudio-utils installed but no server reachable yet."
fi

TESTWAV="/tmp/claudedo_test.wav"
if arecord -D default -f S16_LE -c 1 -r 16000 -d 2 "$TESTWAV" >/dev/null 2>&1 && [ -s "$TESTWAV" ]; then
  echo " arecord captured 2s -> $TESTWAV ($(stat -c%s "$TESTWAV") bytes)"
else
  warn "arecord could not capture. fix-chain: apt deps above + ~/.asoundrc + Windows mic permission.
debug anytime with: claudedo test-audio"
fi

# 5. python install + model prime -----------------------------------------------
say "installing the claudedo python package"
PIP="${PIP:-pip3}"
"$PIP" install -e "$REPO_DIR"

say "priming the faster-whisper model (so first run isn't slow)"
MODEL="$(sed -n 's/^model *= *"\(.*\)".*/\1/p' "$REPO_DIR/config.toml" | head -1)"
MODEL="${MODEL:-small}"
python3 - "$MODEL" <<'PY' || warn "model prime failed — first run will download it"
import sys
from faster_whisper import WhisperModel
WhisperModel(sys.argv[1], device="cpu", compute_type="int8")
print(" primed faster-whisper model:", sys.argv[1])
PY

# 6. cc kit as a sourced file + rc wiring (idempotent) --------------------------
say "installing the cc kit (~/.config/claudedo/cc.sh)"
CONF_DIR="$HOME/.config/claudedo"
mkdir -p "$CONF_DIR"
install -m 0644 "$REPO_DIR/shell/cc.sh" "$CONF_DIR/cc.sh"
echo " wrote $CONF_DIR/cc.sh"

# wire EVERY rc that exists (the user may have both zsh and bash).
wired_any=0
for RC in "$HOME/.zshrc" "$HOME/.bashrc"; do
  [ -f "$RC" ] || continue
  wired_any=1
  if grep -qF "$MARKER_BEGIN" "$RC"; then
    echo " cc kit marker already in $RC (not duplicating)"
    continue
  fi
  cp "$RC" "$RC.claudedo.bak"
  echo " backed up $RC -> $RC.claudedo.bak"
  cat >> "$RC" <<'CCKIT'

# >>> claudedo cc kit >>>
[ -f ~/.config/claudedo/cc.sh ] && source ~/.config/claudedo/cc.sh
# <<< claudedo cc kit <<<
CCKIT
  echo " wired source-line block into $RC (open a new shell or 'source $RC')"
done
[ "$wired_any" = 1 ] || warn "no ~/.zshrc or ~/.bashrc found — add the marker block from README.md manually."

# warn about any OLD loose cc defs outside our markers (do not auto-delete).
for RC in "$HOME/.zshrc" "$HOME/.bashrc"; do
  [ -f "$RC" ] || continue
  loose="$(grep -nE '^[[:space:]]*(cc|ccr|ccl|cck|cckl|_cc_name)[[:space:]]*\(\)' "$RC" \
           | grep -v 'claudedo' || true)"
  if [ -n "$loose" ]; then
    warn "old cc-function defs found in $RC (outside the claudedo markers):"
    echo "$loose" | sed 's/^/ /'
    echo " review and remove them by hand — the new sourced kit overrides them, but"
    echo " they are dead code. a backup is at $RC.claudedo.bak"
  fi
done

# 7. tmux settings for reliable send-keys (idempotent ~/.tmux.conf append) -------
say "configuring tmux for reliable send-keys (~/.tmux.conf)"
TMUX_CONF="$HOME/.tmux.conf"
TMUX_MARKER="# >>> claudedo tmux >>>"
touch "$TMUX_CONF"
if grep -qF "$TMUX_MARKER" "$TMUX_CONF"; then
  echo " claudedo tmux block already present (not duplicating)"
else
  cat >> "$TMUX_CONF" <<'TMUXCONF'

# >>> claudedo tmux >>>
# settings for reliable keystroke injection + notifications (do not edit inside the
# markers; re-run install.sh to refresh). escape-time 0 stops injected Escape from
# being misread; allow-passthrough + extended-keys let notifications and modified
# keys (Shift+Enter) reach the claude pane; the larger history-limit keeps scrollback.
set -g escape-time 0
set -g history-limit 50000
set -g allow-passthrough on
set -s extended-keys on
set -as terminal-features 'xterm*:extkeys'
# <<< claudedo tmux <<<
TMUXCONF
  echo " appended claudedo tmux settings to $TMUX_CONF (reload: tmux source-file ~/.tmux.conf)"
fi

say "done. next: 'claudedo test-audio' then 'claudedo start'"
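Steps 6 and 7 of `install.sh` rely on the same idempotency trick: append a fenced block only when its begin marker is not already in the file, and back the file up first. A Python sketch of that pattern, assuming the cc-kit marker strings from the script; `ensure_marker_block` is a hypothetical helper, not part of claudedo.

```python
from __future__ import annotations

from pathlib import Path

BEGIN = "# >>> claudedo cc kit >>>"
END = "# <<< claudedo cc kit <<<"


def ensure_marker_block(rc: Path, body: str) -> bool:
    """Append BEGIN/body/END to rc unless BEGIN is already present.

    Returns True if the file changed. Like install.sh, it skips rc files that
    don't exist, detects the block with a plain substring search (grep -qF),
    and writes rc.claudedo.bak before the first edit.
    """
    if not rc.is_file():
        return False
    text = rc.read_text(encoding="utf-8")
    if BEGIN in text:
        return False  # already wired: re-running is a no-op
    rc.with_name(rc.name + ".claudedo.bak").write_text(text, encoding="utf-8")
    with rc.open("a", encoding="utf-8") as fh:
        fh.write(f"\n{BEGIN}\n{body.rstrip()}\n{END}\n")
    return True
```

Keying on the begin marker alone means a user who edits lines inside the block keeps those edits on re-install; the trade-off is that a stale block is never refreshed automatically.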
67
shell/cc.sh
@@ -1,67 +0,0 @@
# claudedo cc kit — claude-code-in-tmux session helpers.
# POSIX sh; sources cleanly under bash and zsh. side-effect-free on source
# (function definitions only — nothing runs at source time).
#
# every command REQUIRES an explicit project name. the session is always
# "claude-<name>", a stable speakable handle: "cc libs" -> claude-libs, which the
# voice daemon targets with "claudedo target libs" / "switch libs". the name->session
# mapping here MUST match target.py's session_name() in the daemon.
#
#   cc <name>    start or reattach to claude-<name>; writes ~/.claude-active
#   ccr <name>   reattach only (error if it doesn't exist); writes ~/.claude-active
#   ccl          list running claude- sessions
#   cck <name>   kill claude-<name>
#   cckl         kill ALL claude- sessions

cc() {
  if [ -z "$1" ]; then
    echo "usage: cc <project-name>" >&2
    return 1
  fi
  session="claude-$1"
  echo "$session" > "$HOME/.claude-active"
  if tmux has-session -t "$session" 2>/dev/null; then
    tmux attach -t "$session"
  else
    tmux new-session -s "$session" "claude"
  fi
}

ccr() {
  if [ -z "$1" ]; then
    echo "usage: ccr <project-name>" >&2
    return 1
  fi
  session="claude-$1"
  if tmux has-session -t "$session" 2>/dev/null; then
    echo "$session" > "$HOME/.claude-active"
    tmux attach -t "$session"
  else
    echo "no session '$session' — run 'cc $1' to start one" >&2
    return 1
  fi
}

ccl() {
  tmux ls 2>/dev/null | grep '^claude-' || echo "no claude sessions running"
}

cck() {
  if [ -z "$1" ]; then
    echo "usage: cck <project-name>" >&2
    return 1
  fi
  session="claude-$1"
  if tmux kill-session -t "$session" 2>/dev/null; then
    echo "killed $session"
  else
    echo "no session '$session'" >&2
    return 1
  fi
}

cckl() {
  tmux ls 2>/dev/null | grep '^claude-' | cut -d: -f1 | while read -r s; do
    tmux kill-session -t "$s" && echo "killed $s"
  done
}
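`ccl` and `cckl` pick sessions out of `tmux ls` with `grep '^claude-' | cut -d: -f1`. If the same list is wanted from Python (for example, to offer valid `switch` targets), a sketch of the equivalent filter follows; `claude_sessions` and `running_claude_sessions` are hypothetical helpers. The trailing hyphen in the prefix matters: it keeps the `claudedo-daemon` autostart session out of the list.

```python
from __future__ import annotations

import subprocess


def claude_sessions(tmux_ls_output: str) -> list[str]:
    """Session names from `tmux ls` output that start with 'claude-'.

    Same filter as ccl/cckl: keep lines matching ^claude-, then take the text
    before the first ':' (what `cut -d: -f1` does).
    """
    return [line.split(":", 1)[0]
            for line in tmux_ls_output.splitlines()
            if line.startswith("claude-")]


def running_claude_sessions() -> list[str]:
    # `tmux ls` exits non-zero when no server is running; treat that as "none"
    try:
        out = subprocess.run(["tmux", "ls"], capture_output=True, text=True)
    except FileNotFoundError:  # tmux not installed
        return []
    return claude_sessions(out.stdout) if out.returncode == 0 else []
```

`tmux list-sessions -F '#{session_name}'` prints bare names and would avoid the colon split entirely; the sketch parses default `tmux ls` output to mirror the shell kit.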
@@ -1,3 +1,3 @@
-"""claudedo — voice-control daemon for claude code (local STT -> tmux send-keys)"""
+"""claudedo — voice-control daemon for claude code (local STT -> tmux send-keys)."""
 
 __version__ = "0.1.0"
@@ -1,226 +0,0 @@
"""claudedo CLI: start | stop | status | switch | test-audio | install"""

from __future__ import annotations

import argparse
import logging
import shutil
import subprocess
import sys
import wave
from pathlib import Path

from . import __version__, daemon, target
from .config import Config, ConfigError, load_config


def _setup_logging(verbose: bool) -> None:
    logging.basicConfig(
        level=logging.DEBUG if verbose else logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        datefmt="%H:%M:%S",
    )


def _load_or_die(path: str | None) -> Config:
    try:
        return load_config(path)
    except ConfigError as exc:
        print(f"config error: {exc}", file=sys.stderr)
        raise SystemExit(2)


def cmd_start(args: argparse.Namespace) -> int:
    config = _load_or_die(args.config)
    if args.mode:
        config.mode = args.mode
    if not args.skip_audio_check:
        print("checking mic before listening (speak briefly) ...")
        peak = _probe_mic(config, seconds=2.0, verbose=False)
        if peak is None or peak < 0.02:
            print("mic check failed — no usable input.", file=sys.stderr)
            print("run `claudedo test-audio` to debug; or `claudedo start --skip-audio-check`",
                  file=sys.stderr)
            return 1
        print(f"mic OK (peak {peak:.3f}).")
    try:
        daemon.run_daemon(config)
    except RuntimeError as exc:
        print(str(exc), file=sys.stderr)
        return 1
    return 0


def _probe_mic(config: Config, seconds: float, verbose: bool) -> float | None:
    """warm up the mic then capture for `seconds`; return peak amplitude or None.

    None signals a hard capture failure (no PortAudio / device error) with guidance
    already printed; a float (possibly ~0) is a successful capture whose level the
    caller judges. shared by `start`'s precheck and `test-audio`.
    """
    from . import audio as audio_mod

    try:
        device = audio_mod.resolve_device(config.stt_device)
        if verbose:
            print("priming mic (RDPSource resumes from suspend) ...")
        audio_mod.warm_up(config.samplerate, config.channels, device)
        if verbose:
            print(f"capturing {seconds:.0f}s from "
                  f"device={device if device is not None else 'default'} — speak now ...")
        chunk = audio_mod.record_while(
            config.samplerate, config.channels, device,
            held=_timed_hold(seconds), max_utterance=seconds + 1.0, min_utterance=0.0,
        )
    except Exception as exc:
        print(f"audio capture FAILED: {exc}", file=sys.stderr)
        print("fix-chain: install.sh apt deps + ~/.asoundrc pulse shim + Windows mic permission",
              file=sys.stderr)
        return None

    if chunk is None or chunk.size == 0:
        print("captured no audio — check mic permission + RDPSource", file=sys.stderr)
        return None

    peak = float(abs(chunk).max())
    if verbose:
        out = Path("/tmp/claudedo_test.wav")
        _write_wav(out, chunk, config.samplerate)
        print(f"captured {chunk.size / config.samplerate:.1f}s, peak amplitude {peak:.3f} -> {out}")
    return peak


def cmd_stop(_args: argparse.Namespace) -> int:
    if daemon.stop_running():
        print("sent stop signal to claudedo")
        return 0
    print("claudedo is not running")
    return 1


def cmd_status(_args: argparse.Namespace) -> int:
    pid = daemon.read_pid()
    if pid is None:
        print("claudedo: not running")
        return 1
    state = daemon.read_state() or {}
    print(f"claudedo: running (pid {pid})")
    print(f" mode: {state.get('mode', '?')}")
    print(f" target: {state.get('target') or '(none — run cc to attach)'}")
    return 0


def _check_audio_tools() -> None:
    # shutil.which avoids depending on a `which` binary being installed
    for tool in ("pactl", "arecord"):
        mark = "ok" if shutil.which(tool) else "MISSING (run install.sh)"
        print(f" {tool}: {mark}")


def cmd_test_audio(args: argparse.Namespace) -> int:
    config = _load_or_die(args.config)
    print("== claudedo test-audio ==")
    print("WSLg PulseServer:", "present" if Path("/mnt/wslg/PulseServer").exists() else "MISSING")
    _check_audio_tools()

    try:
        pactl = subprocess.run(["pactl", "info"], stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
        if pactl.returncode == 0:
            for line in pactl.stdout.decode("utf-8", "replace").splitlines():
                if line.startswith("Default Source"):
                    print(" ", line.strip())
    except FileNotFoundError:
        pass

    from . import audio as audio_mod
    print("\nsounddevice input devices:")
    try:
        for idx, dev in enumerate(audio_mod.list_devices()):
            if dev.get("max_input_channels", 0) > 0:
                print(f" [{idx}] {dev['name']} ({dev['max_input_channels']}ch)")
    except Exception as exc:
        print(f" could not list devices: {exc}", file=sys.stderr)

    peak = _probe_mic(config, seconds=3.0, verbose=True)
    if peak is None:
        return 1
    if peak < 0.02:
        print("WARNING: near-silent capture — is the mic muted / permission denied?")
        print("fix-chain: Windows mic permission for desktop apps + a non-Krisp default input;")
        print(" if still silent, `wsl --shutdown` then reopen to re-attach RDPSource.")
        return 1
    print("mic OK.")
    return 0


def _timed_hold(seconds: float):
    import time

    end = [None]

    def held() -> bool:
        now = time.monotonic()
        if end[0] is None:
            end[0] = now + seconds
        return now < end[0]

    return held


def _write_wav(path: Path, chunk, samplerate: int) -> None:
    import numpy as np

    pcm = (np.clip(chunk, -1.0, 1.0) * 32767).astype("<i2")
    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(samplerate)
        wf.writeframes(pcm.tobytes())


def cmd_install(_args: argparse.Namespace) -> int:
    script = Path(__file__).resolve().parents[2] / "install.sh"
    if not script.is_file():
        print(f"install.sh not found at {script}", file=sys.stderr)
        return 1
    return subprocess.call(["bash", str(script)])


def cmd_switch(args: argparse.Namespace) -> int:
    session = target.set_target(args.name)
    print(f"target -> {session}")
    return 0


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="claudedo", description="voice control for claude code")
    p.add_argument("--version", action="version", version=f"claudedo {__version__}")
    p.add_argument("-v", "--verbose", action="store_true", help="debug logging")
    p.add_argument("-c", "--config", help="path to config.toml")
    sub = p.add_subparsers(dest="command", required=True)

    sp = sub.add_parser("start", help="run the daemon (foreground)")
    sp.add_argument("--mode", choices=("listen", "ptt"), help="override input mode")
    sp.add_argument("--skip-audio-check", action="store_true",
                    help="skip the pre-listen mic check")
    sp.set_defaults(func=cmd_start)

    sub.add_parser("stop", help="stop a running daemon").set_defaults(func=cmd_stop)
    sub.add_parser("status", help="show daemon status").set_defaults(func=cmd_status)
    sub.add_parser("test-audio", help="verify the mic capture path").set_defaults(func=cmd_test_audio)
    sub.add_parser("install", help="re-run the bootstrap (install.sh)").set_defaults(func=cmd_install)

    sw = sub.add_parser("switch", help="set the active target session")
    sw.add_argument("name", help="project short-name (claude- prefix optional)")
    sw.set_defaults(func=cmd_switch)
    return p


def main(argv: list[str] | None = None) -> int:
    parser = build_parser()
    args = parser.parse_args(argv)
    _setup_logging(getattr(args, "verbose", False))
    return args.func(args)


if __name__ == "__main__":
    sys.exit(main())
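`test-audio` saves its capture to `/tmp/claudedo_test.wav` as mono 16-bit PCM via `_write_wav`. To re-check a saved capture against the same `0.02` peak threshold without recording again, a sketch like the following reads it back. `wav_peak` is hypothetical, and `write_wav` repeats `_write_wav`'s conversion only so the example is self-contained.

```python
from __future__ import annotations

import wave
from pathlib import Path

import numpy as np

SILENCE_PEAK = 0.02  # the threshold cmd_start and cmd_test_audio compare against


def write_wav(path: Path, chunk: np.ndarray, samplerate: int) -> None:
    # same conversion as _write_wav: clip to [-1, 1], scale, little-endian int16
    pcm = (np.clip(chunk, -1.0, 1.0) * 32767).astype("<i2")
    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(samplerate)
        wf.writeframes(pcm.tobytes())


def wav_peak(path: Path) -> float:
    """Peak amplitude of a mono 16-bit WAV, on the 0..1 scale _probe_mic reports."""
    with wave.open(str(path), "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM")
        pcm = np.frombuffer(wf.readframes(wf.getnframes()), dtype="<i2")
    if pcm.size == 0:
        return 0.0
    # widen before abs(): abs(-32768) does not fit in int16
    return float(np.abs(pcm.astype(np.float32)).max() / 32767)
```

For example, `wav_peak(Path("/tmp/claudedo_test.wav")) < SILENCE_PEAK` flags a near-silent capture the same way the CLI does.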
@@ -1,179 +0,0 @@
"""mic capture via sounddevice — the WSL-hard part.

device selection resolves config's stt.device ("auto" | index | name substring) to
a concrete sounddevice input device. two capture paths:
  - record_until_silence(): listen mode — stream until trailing silence segments the
    utterance (no streaming STT; chunk-on-silence is enough for commands).
  - record_while(predicate): ptt mode — capture while predicate() is true (key held).

the WSLg/PulseAudio path is verified separately by `claudedo test-audio`; if capture
fails here the fix-chain is the apt deps + ~/.asoundrc + Windows mic permission.
"""

from __future__ import annotations

import logging
import queue
import time
from typing import Callable

import numpy as np

log = logging.getLogger(__name__)


class AudioError(Exception):
    """raised when no usable input device is found or capture fails"""


def list_devices() -> list[dict]:
    """return sounddevice's device table (for test-audio / debugging)"""
    import sounddevice as sd

    return list(sd.query_devices())


def resolve_device(spec: str) -> int | None:
    """resolve a device spec to a sounddevice input index, or None for default.

    spec: "auto" -> default input; a digit string -> that index; otherwise a
    case-insensitive substring of a device name with input channels.
    """
    import sounddevice as sd

    if spec in ("", "auto", "default"):
        return None
    if spec.isdigit():
        return int(spec)
    spec_low = spec.lower()
    for idx, dev in enumerate(sd.query_devices()):
        if dev.get("max_input_channels", 0) > 0 and spec_low in dev["name"].lower():
            return idx
    raise AudioError(f"no input device matching {spec!r}")


def _rms(block: np.ndarray) -> float:
    if block.size == 0:
        return 0.0
    return float(np.sqrt(np.mean(np.square(block, dtype=np.float64))))


def warm_up(samplerate: int, channels: int, device: int | None,
            timeout: float = 3.0) -> bool:
    """open a short stream and read until the source produces audio.

    WSLg's RDPSource suspends when idle and emits ~1-2s of silence while it resumes
    on the next read. priming here means the first real capture isn't lost to that
    warm-up gap. returns whether any non-silent block arrived before timeout (still
    safe to proceed either way — a truly silent mic just returns False).
    """
    import sounddevice as sd

    block_dur = 0.05
    blocksize = int(samplerate * block_dur)
    deadline = time.monotonic() + timeout
    with sd.InputStream(samplerate=samplerate, channels=channels, device=device,
                        dtype="float32", blocksize=blocksize) as stream:
        while time.monotonic() < deadline:
            block, _overflowed = stream.read(blocksize)
            mono = block.reshape(-1) if channels == 1 else block.mean(axis=1)
            if _rms(mono) > 0.0:
                return True
    return False


def record_until_silence(samplerate: int, channels: int, device: int | None,
                         silence_threshold: float, silence_duration: float,
                         min_utterance: float, max_utterance: float,
                         stop: Callable[[], bool] | None = None) -> np.ndarray | None:
    """capture one utterance, ending after trailing silence. returns mono float32.

    blocks until speech is detected and then trailing silence segments it, or until
    stop() returns true (clean shutdown). returns None if stopped before any speech
    or if the captured utterance is shorter than min_utterance. max_utterance is
    measured from speech onset, not from when the stream was opened.
    """
    import sounddevice as sd

    block_dur = 0.05
    blocksize = int(samplerate * block_dur)
    q: "queue.Queue[np.ndarray]" = queue.Queue()

    def _cb(indata, _frames, _time, status):
        if status:
            log.debug("audio status: %s", status)
        q.put(indata.copy())

    collected: list[np.ndarray] = []
    speaking = False
    silence_run = 0.0
    speech_started_at = 0.0  # set on the first above-threshold block

    with sd.InputStream(samplerate=samplerate, channels=channels, device=device,
                        dtype="float32", blocksize=blocksize, callback=_cb):
        while True:
            if stop is not None and stop():
                break
            try:
                block = q.get(timeout=0.2)
            except queue.Empty:
                continue
            mono = block.reshape(-1) if channels == 1 else block.mean(axis=1)
            level = _rms(mono)
            if level >= silence_threshold:
                if not speaking:
                    # start the max_utterance clock at speech onset; timing from
                    # stream open cut speech short after any long quiet wait
                    speaking = True
                    speech_started_at = time.monotonic()
                silence_run = 0.0
                collected.append(mono)
            elif speaking:
                silence_run += block_dur
                collected.append(mono)
                if silence_run >= silence_duration:
                    break
            if speaking and (time.monotonic() - speech_started_at) > max_utterance:
                log.debug("utterance hit max_utterance cap")
                break

    if not collected:
        return None
    audio = np.concatenate(collected).astype(np.float32)
    if audio.size / samplerate < min_utterance:
        return None
    return audio


def record_while(samplerate: int, channels: int, device: int | None,
                 held: Callable[[], bool], max_utterance: float,
                 min_utterance: float) -> np.ndarray | None:
    """capture while held() is true (push-to-talk). returns mono float32 or None"""
    import sounddevice as sd

    block_dur = 0.05
    blocksize = int(samplerate * block_dur)
    q: "queue.Queue[np.ndarray]" = queue.Queue()

    def _cb(indata, _frames, _time, status):
        if status:
            log.debug("audio status: %s", status)
        q.put(indata.copy())

    collected: list[np.ndarray] = []
    started_at = time.monotonic()
    with sd.InputStream(samplerate=samplerate, channels=channels, device=device,
                        dtype="float32", blocksize=blocksize, callback=_cb):
        while held():
            try:
                block = q.get(timeout=0.1)
            except queue.Empty:
                continue
            mono = block.reshape(-1) if channels == 1 else block.mean(axis=1)
            collected.append(mono)
            if (time.monotonic() - started_at) > max_utterance:
                break

    if not collected:
        return None
    audio = np.concatenate(collected).astype(np.float32)
    if audio.size / samplerate < min_utterance:
        return None
    return audio
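Thresholds such as `silence_threshold` and `silence_duration` are easier to tune offline. The sketch below replays the same block-by-block RMS segmentation as `record_until_silence` over an array already in memory (for example a saved test capture), with no sounddevice stream. `segment_utterance` is a hypothetical helper; it counts the `max_utterance` cap in blocks from speech onset.

```python
from __future__ import annotations

import numpy as np


def segment_utterance(samples: np.ndarray, samplerate: int, silence_threshold: float,
                      silence_duration: float, min_utterance: float,
                      max_utterance: float, block_dur: float = 0.05) -> np.ndarray | None:
    """Offline replay of the listen-mode segmentation over a recorded array."""
    blocksize = int(samplerate * block_dur)
    collected: list[np.ndarray] = []
    speaking = False
    silence_run = 0.0  # seconds of below-threshold audio since the last speech
    spoken = 0.0       # seconds since speech onset (the max_utterance clock)
    for start in range(0, samples.size - blocksize + 1, blocksize):
        block = samples[start:start + blocksize]
        level = float(np.sqrt(np.mean(np.square(block, dtype=np.float64))))
        if level >= silence_threshold:
            speaking = True
            silence_run = 0.0
            collected.append(block)
        elif speaking:
            silence_run += block_dur
            collected.append(block)  # trailing silence is kept, as in the original
            if silence_run >= silence_duration:
                break
        if speaking:
            spoken += block_dur
            if spoken > max_utterance:
                break
    if not collected:
        return None  # never crossed the threshold
    audio = np.concatenate(collected).astype(np.float32)
    if audio.size / samplerate < min_utterance:
        return None  # too short to be a command
    return audio
```

Because trailing silence is part of the returned audio, `min_utterance` effectively includes up to `silence_duration` of quiet; a one-block blip still needs `min_utterance` above roughly `silence_duration + block_dur` to be rejected.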
@@ -1,4 +1,4 @@
-"""load and validate config.toml into a typed Config object with clear errors"""
+"""load and validate config.toml into a typed Config object with clear errors."""
 
 from __future__ import annotations
 
@@ -10,7 +10,7 @@ from pathlib import Path
 try:
     import tomllib as _toml
     _TOML_BINARY = True
-except ModuleNotFoundError:
+except ModuleNotFoundError:  # python < 3.11
     import tomli as _toml
     _TOML_BINARY = True
 
@@ -27,12 +27,12 @@ DEFAULT_CONFIG_PATHS = (
 
 
 class ConfigError(Exception):
-    """raised on a missing or invalid configuration value"""
+    """raised on a missing or invalid configuration value."""
 
 
 @dataclass
 class Config:
-    """validated claudedo configuration"""
+    """validated claudedo configuration."""
 
     wake_phrases: list[str]
     mode: str
@@ -53,7 +53,7 @@ class Config:
 
 
 def find_config_path(explicit: str | os.PathLike | None = None) -> Path:
-    """resolve the config file path, raising ConfigError if none is found"""
+    """resolve the config file path, raising ConfigError if none is found."""
     candidates: list[Path] = []
     if explicit:
         candidates.append(Path(explicit))
@@ -79,7 +79,7 @@ def _require(table: dict, section: str, key: str, types: tuple, default=None):
 
 
 def load_config(explicit: str | os.PathLike | None = None) -> Config:
-    """load config.toml from the first existing default path (or an explicit one)"""
+    """load config.toml from the first existing default path (or an explicit one)."""
     path = find_config_path(explicit)
     try:
         with open(path, "rb") as fh:
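Only the signature of `_require(table, section, key, types, default=None)` appears in this diff. The sketch below is a hypothetical `require` with the same parameters, assuming `default=None` means "this key is required"; the `[stt]`/`[audio]` section names in the demo are illustrative. It parses with `tomllib.loads` for brevity, whereas the real loader opens the file in `"rb"` mode because `tomllib.load` requires a binary file object.

```python
from __future__ import annotations

try:
    import tomllib
except ModuleNotFoundError:  # python < 3.11, same fallback as config.py
    import tomli as tomllib


class ConfigError(Exception):
    """raised on a missing or invalid configuration value."""


def require(table: dict, section: str, key: str, types: tuple, default=None):
    """Return table[section][key] after a type check (hypothetical sketch).

    default=None is treated as "required"; any other default is returned
    when the key is absent.
    """
    want = "/".join(t.__name__ for t in types)
    sect = table.get(section, {})
    if not isinstance(sect, dict):
        raise ConfigError(f"[{section}] must be a table")
    if key not in sect:
        if default is None:
            raise ConfigError(f"missing required key {section}.{key}")
        return default
    value = sect[key]
    # bool subclasses int, so `channels = true` would pass an (int,) check unguarded
    if isinstance(value, bool) and bool not in types:
        raise ConfigError(f"{section}.{key} must be {want}, got bool")
    if not isinstance(value, types):
        raise ConfigError(f"{section}.{key} must be {want}, got {type(value).__name__}")
    return value


cfg = tomllib.loads('[stt]\nmodel = "small"\n\n[audio]\nchannels = 1\n')
print(require(cfg, "stt", "model", (str,)))                    # -> small
print(require(cfg, "stt", "language", (str,), default="en"))   # -> en
```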
@ -1,262 +0,0 @@
|
||||
"""the capture -> stt -> match -> inject loop.
|
||||
|
||||
privacy invariant: in listen mode, any utterance that does not start with a wake
|
||||
phrase is discarded the instant grammar.parse() returns None — the transcript text
|
||||
is dropped and never stored or transmitted. nothing about non-command speech is
|
||||
persisted.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
import signal
|
||||
import sys
|
||||
import time
|
||||
from pathlib import Path
|
||||
|
||||
from . import audio, grammar, inject, target
|
||||
from .config import Config
|
||||
from .stt import Transcriber
|
||||
|
||||
log = logging.getLogger(__name__)
|
||||
|
||||
STATE_DIR = Path(os.environ.get("XDG_CACHE_HOME", str(Path.home() / ".cache"))) / "claudedo"
|
||||
PIDFILE = STATE_DIR / "claudedo.pid"
|
||||
STATEFILE = STATE_DIR / "state.json"
|
||||
|
||||
|
||||
def _ensure_state_dir() -> None:
|
||||
STATE_DIR.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
|
||||
def write_state(pid: int, mode: str, target_session: str | None) -> None:
|
||||
"""write the running daemon's status for `claudedo status` to read"""
|
||||
_ensure_state_dir()
|
||||
STATEFILE.write_text(json.dumps({
|
||||
"pid": pid,
|
||||
"mode": mode,
|
||||
"target": target_session,
|
||||
"since": time.time(),
|
||||
}), encoding="utf-8")
|
||||
|
||||
|
||||
def read_state() -> dict | None:
|
||||
"""read the daemon status file, or None if absent/unreadable"""
|
||||
try:
|
||||
return json.loads(STATEFILE.read_text(encoding="utf-8"))
|
||||
except (FileNotFoundError, json.JSONDecodeError, OSError):
|
||||
return None
|
||||
|
||||
|
||||
def read_pid() -> int | None:
|
||||
"""return the pid of a running daemon, or None (also clears stale pidfiles)"""
|
||||
try:
|
||||
pid = int(PIDFILE.read_text(encoding="utf-8").strip())
|
||||
except (FileNotFoundError, ValueError, OSError):
|
||||
return None
|
||||
try:
|
||||
os.kill(pid, 0)
|
||||
except ProcessLookupError:
|
||||
PIDFILE.unlink(missing_ok=True)
|
||||
return None
|
||||
except PermissionError:
|
||||
return pid
|
||||
return pid
|
||||
|
||||
|
||||
def stop_running() -> bool:
|
||||
"""signal a running daemon to stop. returns whether one was found"""
|
||||
pid = read_pid()
|
||||
if pid is None:
|
||||
return False
|
||||
os.kill(pid, signal.SIGTERM)
|
||||
return True


class _PTTKey:
    """desk-only push-to-talk: 'held' while the configured key is down in the
    daemon's own terminal. there is deliberately NO global hotkey — a system-wide
    keyboard hook is the keylogger/cheat silhouette claudedo refuses to install. for
    hands-free-while-gaming use listen mode (voice trigger over the mic bridge).

    implementation reads stdin in raw mode: press the key to start capture, press it
    again (or Enter) to stop. (terminals don't deliver key-up events, so true
    hold-to-talk isn't possible from a tty — this is press-toggle, documented.)
    """

    def __init__(self) -> None:
        self._tty = sys.stdin.isatty()

    def wait_press(self, stop) -> bool:
        import select

        if not self._tty:
            log.warning("ptt mode needs a tty; falling back to a 3s timed capture")
            time.sleep(0.2)
            return not stop()
        while not stop():
            r, _, _ = select.select([sys.stdin], [], [], 0.2)
            if r:
                sys.stdin.read(1)
                return True
        return False


class Daemon:
    """owns the capture/transcribe/inject loop and runtime mode switching"""

    def __init__(self, config: Config) -> None:
        self.config = config
        self.mode = config.mode
        self._stop = False
        self._transcriber: Transcriber | None = None
        self._device: int | None = None
        self._ptt = _PTTKey()

    def _install_signals(self) -> None:
        signal.signal(signal.SIGTERM, self._on_signal)
        signal.signal(signal.SIGINT, self._on_signal)

    def _on_signal(self, _signum, _frame) -> None:
        log.info("stop requested")
        self._stop = True

    def stopped(self) -> bool:
        return self._stop

    def _load(self) -> None:
        cfg = self.config
        self._device = audio.resolve_device(cfg.stt_device)
        self._transcriber = Transcriber(
            model=cfg.stt_model, language=cfg.stt_language,
            device=cfg.stt_compute if cfg.stt_compute in ("cpu", "cuda") else "auto",
            compute_type="auto",
        )
        if audio.warm_up(cfg.samplerate, cfg.channels, self._device):
            log.info("mic warmed up (source live)")
        else:
            log.warning("mic warm-up saw only silence — check mic permission / RDPSource")

    def _capture(self):
        cfg = self.config
        if self.mode == "ptt":
            print("[ptt] press the capture key in this terminal, speak, then press again to stop")
            if not self._ptt.wait_press(self.stopped):
                return None
            return audio.record_while(
                cfg.samplerate, cfg.channels, self._device,
                held=lambda: not self._ptt.wait_press(self.stopped),
                max_utterance=cfg.max_utterance, min_utterance=cfg.min_utterance,
            )
        return audio.record_until_silence(
            cfg.samplerate, cfg.channels, self._device,
            silence_threshold=cfg.silence_threshold, silence_duration=cfg.silence_duration,
            min_utterance=cfg.min_utterance, max_utterance=cfg.max_utterance,
            stop=self.stopped,
        )

    def _handle(self, transcript: str) -> None:
        cfg = self.config
        require_wake = self.mode == "listen"
        action = grammar.parse(transcript, cfg.wake_phrases, cfg.match_threshold, require_wake)
        if action is None:
            self._emit(f'heard: "{transcript}" -> no command matched')
            return

        if action.name == "mode":
            new_mode = str(action.arg)
            if new_mode != self.mode:
                self.mode = new_mode
                self._emit(f"mode -> {new_mode}")
                self._refresh_state()
            return
        if action.name == "switch":
            session = target.set_target(str(action.arg))
            self._emit(f"target -> {session}")
            self._refresh_state()
            return

        session = target.resolve_target()
        if session is None:
            self._emit(f'heard: "{transcript}" -> matched: {self._describe(action)} '
                       f'-> ERROR no target session (did nothing)')
            return
        self._emit(f'heard: "{transcript}" -> matched: {self._describe(action)} -> target {session}')
        if action.name == "type" and not cfg.type_autosend:
            inject.send_literal(session, str(action.arg))
            self._emit(f"injected: literal {str(action.arg)!r} -> {session}")
            return
        inject.perform(session, action)
        self._emit(f"injected: {self._describe(action)} -> {session}")

    @staticmethod
    def _describe(action) -> str:
        if action.arg is None:
            return action.name.upper()
        return f"{action.name.upper()}({action.arg})"

    @staticmethod
    def _emit(line: str) -> None:
        """print a recognition/action line to the watched terminal"""
        print(line, flush=True)

    def _has_wake(self, transcript: str) -> bool:
        """true if the utterance starts with a wake phrase (listen-mode gate).

        non-wake speech is dropped without ever printing the transcript — the privacy
        invariant: non-command speech is discarded, never recorded.
        """
        cfg = self.config
        return grammar.strip_wake(transcript, cfg.wake_phrases, cfg.match_threshold, True) is not None

    def _print_startup(self) -> None:
        cfg = self.config
        dev = cfg.stt_device if cfg.stt_device != "auto" else "default"
        target_now = target.read_active() or "(none — run cc to attach)"
        self._emit("── claudedo ─────────────────────────────────")
        self._emit(f" model: {cfg.stt_model} ({cfg.stt_language})")
        self._emit(f" mic: {dev}")
        self._emit(f" mode: {self.mode}")
        self._emit(f" target: {target_now}")
        self._emit(f" wake: {', '.join(cfg.wake_phrases)}")
        self._emit(" Ctrl-C to stop")
        self._emit("─────────────────────────────────────────────")

    def _refresh_state(self) -> None:
        write_state(os.getpid(), self.mode, target.read_active())

    def run(self) -> None:
        """run the daemon loop until a stop signal arrives"""
        _ensure_state_dir()
        PIDFILE.write_text(str(os.getpid()), encoding="utf-8")
        self._install_signals()
        try:
            self._load()
            self._refresh_state()
            self._print_startup()
            while not self._stop:
                audio_chunk = self._capture()
                if self._stop:
                    break
                if audio_chunk is None:
                    continue
                transcript = self._transcriber.transcribe(audio_chunk, self.config.samplerate)
                if not transcript:
                    continue
                if self.mode == "listen" and not self._has_wake(transcript):
                    self._emit("dropped: non-wake speech (not recorded)")
                    continue
                self._handle(transcript)
        finally:
            PIDFILE.unlink(missing_ok=True)
            STATEFILE.unlink(missing_ok=True)
            log.info("claudedo stopped")


def run_daemon(config: Config) -> None:
    """entry point used by the CLI ``start`` command"""
    if read_pid() is not None:
        raise RuntimeError("claudedo is already running (see `claudedo status`)")
    Daemon(config).run()
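`Daemon` does no real work inside its signal handler: `_on_signal` only sets `_stop`, and each blocking step (`wait_press`, `record_until_silence`) polls `stopped()`, so a SIGTERM from `claudedo stop` takes effect at the next poll and the `finally` block still removes the pidfile and state file. A minimal sketch of that flag pattern on POSIX; `StopFlag` and `work_until_stopped` are hypothetical names, and the self-sent SIGTERM merely stands in for `claudedo stop`.

```python
from __future__ import annotations

import os
import signal


class StopFlag:
    """records SIGTERM/SIGINT in a boolean; install() must run in the main thread"""

    def __init__(self) -> None:
        self.requested = False

    def install(self) -> None:
        signal.signal(signal.SIGTERM, self._on_signal)
        signal.signal(signal.SIGINT, self._on_signal)

    def _on_signal(self, _signum, _frame) -> None:
        self.requested = True  # no I/O or cleanup here, just record the request


def work_until_stopped(flag: StopFlag, limit: int = 1000) -> int:
    """do small units of work, checking the flag between them; returns units done"""
    done = 0
    while not flag.requested and done < limit:
        done += 1
        if done == 5:
            os.kill(os.getpid(), signal.SIGTERM)  # simulate `claudedo stop`
    return done
```

CPython runs Python-level handlers between bytecode instructions in the main thread, which is why the loop body must stay short or, like `_capture`, accept a `stop` callback.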
@@ -1,159 +0,0 @@
"""wake-phrase gate + command grammar matching (fuzzy, data-driven).

the matcher is lenient by design: whisper renders the coined word "claudedo"
inconsistently, so wake-phrase detection normalizes case, strips spaces/punctuation,
and accepts close variants. number words are normalized to digits before matching.

flow: transcript -> strip_wake() returns the command remainder (or None if no wake
phrase in listen mode) -> match_command() maps the remainder to an Action.
"""

from __future__ import annotations

import re
from dataclasses import dataclass
from difflib import SequenceMatcher

_PUNCT = re.compile(r"[^a-z0-9 ]+")
_WS = re.compile(r"\s+")

_NUMBER_WORDS = {
    "zero": "0", "oh": "0",
    "one": "1", "won": "1",
    "two": "2", "to": "2", "too": "2",
    "three": "3", "tree": "3",
    "four": "4", "for": "4", "fore": "4",
}

_INDEX_WORDS = {"1": 1, "2": 2, "3": 3, "4": 4}


@dataclass(frozen=True)
class Action:
    """a matched command: a name plus an optional argument.

    names: yes, no, select, approve, deny, submit, type, mode, switch, cancel.
    arg carries the select index (int), the literal text for ``type``, the mode for
    ``mode``, or the session short-name for ``switch``.
    """

    name: str
    arg: object = None


def normalize(text: str) -> str:
    """lowercase, strip punctuation, collapse whitespace, map number words to digits"""
    text = text.lower().strip()
    text = _PUNCT.sub(" ", text)
    text = _WS.sub(" ", text).strip()
    if not text:
        return ""
    tokens = [_NUMBER_WORDS.get(tok, tok) for tok in text.split(" ")]
    return " ".join(tokens)
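Because `normalize` runs over the whole transcript, the homophone entries ("to", "too", "for", "oh", ...) are also rewritten inside dictated `type` text and `switch` names: "type go to bed" reaches `match_command` as "type go 2 bed". A sketch of one alternative, assuming number words only matter where an index is expected: clean case and punctuation everywhere, but map number words only when resolving a select index. `normalize_text` and `index_from` are hypothetical names; the homophone table is copied from above.

```python
from __future__ import annotations

import re

_PUNCT = re.compile(r"[^a-z0-9 ]+")
_WS = re.compile(r"\s+")

# same homophone table as the grammar module
_NUMBER_WORDS = {
    "zero": "0", "oh": "0",
    "one": "1", "won": "1",
    "two": "2", "to": "2", "too": "2",
    "three": "3", "tree": "3",
    "four": "4", "for": "4", "fore": "4",
}
_INDEX = {"1": 1, "2": 2, "3": 3, "4": 4}


def normalize_text(text: str) -> str:
    """lowercase, strip punctuation, collapse whitespace; words stay as spoken"""
    text = _PUNCT.sub(" ", text.lower())
    return _WS.sub(" ", text).strip()


def index_from(token: str) -> int | None:
    """map one token ("2", "two", "too") to a select index, else None"""
    return _INDEX.get(_NUMBER_WORDS.get(token, token))


print(normalize_text("Type go to bed!"))  # dictated text keeps "to"
print(index_from("too"), index_from("bed"))
```

With this split, the bare-index and `select <n>` branches would call `index_from` on their token, while `type` and `switch` would receive the untouched words.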


def _ratio(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()


def _wake_variants(phrase: str) -> set[str]:
    """spaced and despaced forms of a wake phrase for lenient matching"""
    norm = normalize(phrase)
    return {norm, norm.replace(" ", "")}


def strip_wake(transcript: str, wake_phrases: list[str], threshold: float,
               require_wake: bool) -> str | None:
    """return the command remainder after the wake phrase.

    if ``require_wake`` (listen mode) and no wake phrase is found at the start,
    return None so the daemon discards the utterance. if not required (ptt mode),
    a leading wake phrase is stripped when present but its absence is fine.

    matches leniently on a despaced prefix (whisper splits/joins the coined word
    inconsistently) but always slices the remainder on a WORD boundary of the
    spaced, normalized transcript — so the command portion keeps its spaces.
    """
    norm = normalize(transcript)
    if not norm:
        return None if require_wake else ""
    words = norm.split(" ")

    best_remainder: str | None = None
    best_score = 0.0
    for phrase in wake_phrases:
        variants = _wake_variants(phrase)
        max_words = phrase.count(" ") + 2
        for take in range(1, min(max_words, len(words)) + 1):
            head_despaced = "".join(words[:take])
            for variant in variants:
                if not variant:
                    continue
                score = _ratio(head_despaced, variant)
                if score >= threshold and score > best_score:
                    best_score = score
                    best_remainder = " ".join(words[take:]).strip()

    if best_remainder is not None:
        return best_remainder
    return None if require_wake else norm
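The core of `strip_wake` joins each 1..k-word prefix without spaces, scores it against the despaced wake phrase with `SequenceMatcher.ratio()`, and slices the remainder at the word boundary of the best-scoring prefix. A trimmed sketch for a single wake phrase, assuming an illustrative threshold of 0.8 (the real value is the configured `match_threshold`) and dropping the spaced variant; `strip_wake_sketch` is a hypothetical name.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def strip_wake_sketch(words: list[str], wake: str, threshold: float = 0.8) -> str | None:
    """return the words after the best fuzzy wake-phrase prefix, or None.

    `words` is an already-normalized, space-split transcript and `wake` a
    normalized wake phrase. prefixes are compared despaced, so "claude do" and
    "claudedo" both match the wake word "claudedo".
    """
    target = wake.replace(" ", "")
    best_score, best_rest = 0.0, None
    # like the original: consider up to (words in the phrase + 1) leading words
    for take in range(1, min(wake.count(" ") + 2, len(words)) + 1):
        head = "".join(words[:take])
        score = SequenceMatcher(None, head, target).ratio()
        if score >= threshold and score > best_score:
            best_score, best_rest = score, " ".join(words[take:])
    return best_rest
```

For "claude do approve", the one-word prefix "claude" already scores about 0.86, but the two-word prefix "claudedo" scores 1.0 and wins, so the remainder is "approve" rather than "do approve". That is why the best score, not the first hit, decides where to cut.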


def _fuzzy_in(token: str, options: tuple[str, ...], threshold: float) -> bool:
    return any(_ratio(token, opt) >= threshold for opt in options)


def match_command(remainder: str, threshold: float) -> Action | None:
    """map a normalized command remainder to an Action, or None if unrecognized"""
    remainder = remainder.strip()
    if not remainder:
        return None
    tokens = remainder.split(" ")
    head = tokens[0]
    rest = tokens[1:]

    if head in _INDEX_WORDS:
        return Action("select", _INDEX_WORDS[head])

    if _fuzzy_in(head, ("yes", "yeah", "yep", "yup"), threshold):
        return Action("yes")
    if _fuzzy_in(head, ("no", "nope", "nah"), threshold):
        return Action("no")
    if _fuzzy_in(head, ("approve", "allow"), threshold):
        return Action("approve")
    if _fuzzy_in(head, ("deny", "reject"), threshold):
        return Action("deny")
    if _fuzzy_in(head, ("send", "enter", "submit"), threshold):
        return Action("submit")
    if _fuzzy_in(head, ("cancel", "escape", "stop"), threshold):
        return Action("cancel")

    if _fuzzy_in(head, ("select", "option", "choose", "number"), threshold) and rest:
        if rest[0] in _INDEX_WORDS:
            return Action("select", _INDEX_WORDS[rest[0]])

    if _fuzzy_in(head, ("type", "dictate", "write"), threshold):
        text = " ".join(rest).strip()
        return Action("type", text) if text else None

    if _fuzzy_in(head, ("mode",), threshold) and rest:
        if _fuzzy_in(rest[0], ("ptt",), threshold) or "push" in rest[0]:
            return Action("mode", "ptt")
        if _fuzzy_in(rest[0], ("listen",), threshold):
            return Action("mode", "listen")
        return None

    if _fuzzy_in(head, ("switch", "target"), threshold) and rest:
        name = "".join(rest)
        return Action("switch", name) if name else None

    return None


def parse(transcript: str, wake_phrases: list[str], threshold: float,
          require_wake: bool) -> Action | None:
    """full parse: wake gate then command match. None means discard"""
    remainder = strip_wake(transcript, wake_phrases, threshold, require_wake)
    if remainder is None:
        return None
    return match_command(remainder, threshold)
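The argument-free commands in `match_command` are one `_fuzzy_in` check each, in a fixed order. Since the module calls itself data-driven, the same dispatch can live in an ordered table; a sketch assuming the synonym lists above and an illustrative threshold of 0.8. `SYNONYMS` and `match_head` are hypothetical names, and as in the `if` chain, the first entry whose synonyms clear the threshold wins.

```python
from __future__ import annotations

from difflib import SequenceMatcher

# ordered: earlier entries win, mirroring the if-chain in match_command
SYNONYMS: list[tuple[str, tuple[str, ...]]] = [
    ("yes", ("yes", "yeah", "yep", "yup")),
    ("no", ("no", "nope", "nah")),
    ("approve", ("approve", "allow")),
    ("deny", ("deny", "reject")),
    ("submit", ("send", "enter", "submit")),
    ("cancel", ("cancel", "escape", "stop")),
]


def match_head(head: str, threshold: float = 0.8) -> str | None:
    """return the action name whose synonyms fuzzily match `head`, else None"""
    for name, options in SYNONYMS:
        if any(SequenceMatcher(None, head, opt).ratio() >= threshold for opt in options):
            return name
    return None
```

A misheard "alow" still maps to `approve` (its ratio against "allow" is about 0.89). The commands that take an argument (`select`, `type`, `mode`, `switch`) would stay as explicit branches, because each parses `rest` differently.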
@@ -1,24 +1,15 @@
-"""output handlers: resolve a grammar.Action to keystrokes and emit them.
+"""inject keystrokes into a tmux session via ``tmux send-keys``.

-the production handler (TmuxOutputHandler) injects via ``tmux send-keys`` — the ONLY
-mechanism by which claudedo affects claude code. PTY injection, never OS-level
-keyboard input: it works regardless of which window is focused and never touches
-Windows input or a game/anticheat's view (it is text into a linux pseudo-terminal).
-do not replace this with OS keystroke injection. this is also why claudedo is a
-standalone daemon and not an MCP server — MCP tools can only return content to claude,
-not inject into its input stream.
-
-StdoutOutputHandler prints what WOULD be injected instead of touching tmux, so the
-grammar + keymap can be exercised end-to-end without a live claude session — the
-deterministic test path. both implement the same OutputHandler seam and are
-interchangeable.
+this is the ONLY mechanism by which claudedo affects claude code — PTY injection,
+never OS-level keyboard input. it works regardless of which window is focused and
+never touches Windows input or a game/anticheat's view (it is text into a linux
+pseudo-terminal). do not replace this with OS keystroke injection.
 """

 from __future__ import annotations

 import logging
 import subprocess
-from abc import ABC, abstractmethod

 from . import keys, target

@@ -26,133 +17,68 @@ log = logging.getLogger(__name__)


 class InjectError(Exception):
-    """raised when a tmux send-keys call fails"""
+    """raised when a tmux send-keys call fails."""


-class OutputHandler(ABC):
-    """abstract sink for resolved keystrokes.
-
-    concretes implement send_named (a sequence of named tmux keys) and send_literal
-    (literal text, no submit). perform() maps a grammar.Action onto these and is shared
-    by all handlers.
-    """
-
-    @abstractmethod
-    def send_named(self, session: str, key_tokens: list[str]) -> None:
-        """emit a sequence of named keys (e.g. ['1'] or ['Down', 'Enter'])"""
-
-    @abstractmethod
-    def send_literal(self, session: str, text: str) -> None:
-        """emit literal text into the input box without submitting (``type``)"""
-
-    def perform(self, session: str, action) -> bool:
-        """resolve a grammar.Action to keystrokes and emit them. returns acted?.
-
-        ``switch`` and ``mode`` are handled by the daemon (they change daemon state,
-        not the claude session), so they are ignored here.
-        """
-        name = action.name
-        if name == "yes":
-            self.send_named(session, keys.YES)
-        elif name == "no":
-            self.send_named(session, keys.NO)
-        elif name == "approve":
-            self.send_named(session, keys.APPROVE)
-        elif name == "deny":
-            self.send_named(session, keys.DENY)
-        elif name == "submit":
-            self.send_named(session, keys.SUBMIT)
-        elif name == "cancel":
-            self.send_named(session, keys.CANCEL)
-        elif name == "select":
-            seq = keys.SELECT_BY_INDEX.get(int(action.arg))
-            if seq is None:
-                log.warning("no keymap for select index %r", action.arg)
-                return False
-            self.send_named(session, seq)
-        elif name == "type":
-            self.send_literal(session, str(action.arg))
-        else:
-            return False
-        return True
-
-
-class TmuxOutputHandler(OutputHandler):
-    """production handler — injects keystrokes into a tmux session via send-keys"""
-
-    @staticmethod
-    def _send_keys(session: str, args: list[str], literal: bool) -> None:
-        cmd = ["tmux", "send-keys", "-t", session]
-        if literal:
-            cmd.append("-l")
-        cmd.extend(args)
-        result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.PIPE)
-        if result.returncode != 0:
-            err = result.stderr.decode("utf-8", "replace").strip()
-            raise InjectError(f"tmux send-keys failed: {err}")
-
-    def send_named(self, session: str, key_tokens: list[str]) -> None:
-        if not target.session_exists(session):
-            log.warning("refusing to inject — session %r does not exist", session)
-            return
-        for token in key_tokens:
-            self._send_keys(session, [token], literal=False)
-        log.info("injected keys %s -> %s", key_tokens, session)
-
-    def send_literal(self, session: str, text: str) -> None:
-        if not text:
-            return
-        if not target.session_exists(session):
-            log.warning("refusing to inject — session %r does not exist", session)
-            return
-        self._send_keys(session, [text], literal=True)
-        log.info("injected literal text (%d chars) -> %s", len(text), session)
-
-
-class StdoutOutputHandler(OutputHandler):
-    """test handler — prints what would be injected instead of touching tmux.
-
-    no session existence check (there is no real session); lets grammar + keymap be
-    exercised end-to-end without a live claude session. records the last emission on
-    ``self.last`` for assertions.
-    """
-
-    def __init__(self, stream=None) -> None:
-        import sys
-
-        self.stream = stream if stream is not None else sys.stdout
-        self.last: tuple[str, object] | None = None
-
-    def send_named(self, session: str, key_tokens: list[str]) -> None:
-        self.last = ("named", list(key_tokens))
-        print(f"[stdout] keys {key_tokens} -> {session}", file=self.stream)
-
-    def send_literal(self, session: str, text: str) -> None:
-        if not text:
-            return
-        self.last = ("literal", text)
-        print(f"[stdout] literal {text!r} -> {session}", file=self.stream)
-
-
-_default_handler: OutputHandler = TmuxOutputHandler()
-
-
-def set_default_handler(handler: OutputHandler) -> None:
-    """swap the module-level handler the daemon drives (tmux in prod, stdout in tests)"""
-    global _default_handler
-    _default_handler = handler
+def _send_keys(session: str, args: list[str], literal: bool) -> None:
+    cmd = ["tmux", "send-keys", "-t", session]
+    if literal:
+        cmd.append("-l")
+    cmd.extend(args)
+    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.PIPE)
+    if result.returncode != 0:
+        err = result.stderr.decode("utf-8", "replace").strip()
+        raise InjectError(f"tmux send-keys failed: {err}")


 def send_named(session: str, key_tokens: list[str]) -> None:
-    """module-level shim delegating to the default handler"""
-    _default_handler.send_named(session, key_tokens)
+    """send a sequence of named tmux keys (e.g. ['1'] or ['Down', 'Enter'])."""
+    if not target.session_exists(session):
+        log.warning("refusing to inject — session %r does not exist", session)
+        return
+    for token in key_tokens:
+        _send_keys(session, [token], literal=False)
+    log.info("injected keys %s -> %s", key_tokens, session)


 def send_literal(session: str, text: str) -> None:
-    """module-level shim delegating to the default handler"""
-    _default_handler.send_literal(session, text)
+    """insert literal text into the input box without submitting (``type``)."""
+    if not text:
+        return
+    if not target.session_exists(session):
+        log.warning("refusing to inject — session %r does not exist", session)
+        return
+    _send_keys(session, [text], literal=True)
+    log.info("injected literal text (%d chars) -> %s", len(text), session)


 def perform(session: str, action) -> bool:
-    """module-level shim delegating to the default handler"""
-    return _default_handler.perform(session, action)
+    """resolve a grammar.Action to keystrokes and inject them. returns acted?.
+
+    ``switch`` and ``mode`` are handled by the daemon (they change daemon state, not
+    the claude session), so they are ignored here.
+    """
+    name = action.name
+    if name == "yes":
+        send_named(session, keys.YES)
+    elif name == "no":
+        send_named(session, keys.NO)
+    elif name == "approve":
+        send_named(session, keys.APPROVE)
+    elif name == "deny":
+        send_named(session, keys.DENY)
+    elif name == "submit":
+        send_named(session, keys.SUBMIT)
+    elif name == "cancel":
+        send_named(session, keys.CANCEL)
+    elif name == "select":
+        seq = keys.SELECT_BY_INDEX.get(int(action.arg))
+        if seq is None:
+            log.warning("no keymap for select index %r", action.arg)
+            return False
+        send_named(session, seq)
+    elif name == "type":
+        send_literal(session, str(action.arg))
+    else:
+        return False
+    return True
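Both sides of this diff build the same argv: one `tmux send-keys -t <session>` call per named key token, and a single call with `-l` for dictated text. The `-l` flag disables tmux's key-name lookup, so a dictated word such as "Enter" is typed as text instead of pressing Return. A sketch that only constructs the commands, so no tmux server is needed; `build_send_keys` and `plan_named` are hypothetical helpers whose argv layout follows `_send_keys` above, and "claude-api" is a made-up session name.

```python
from __future__ import annotations

import shlex


def build_send_keys(session: str, args: list[str], literal: bool) -> list[str]:
    """argv for one tmux send-keys call, laid out like _send_keys"""
    cmd = ["tmux", "send-keys", "-t", session]
    if literal:
        cmd.append("-l")  # disable key-name lookup: "Enter" is typed as text
    cmd.extend(args)
    return cmd


def plan_named(session: str, key_tokens: list[str]) -> list[list[str]]:
    """send_named issues one send-keys call per named token"""
    return [build_send_keys(session, [tok], literal=False) for tok in key_tokens]


for argv in plan_named("claude-api", ["Down", "Enter"]):
    print(shlex.join(argv))
print(shlex.join(build_send_keys("claude-api", ["fix the Enter handler"], literal=True)))
```

This prints `tmux send-keys -t claude-api Down`, `tmux send-keys -t claude-api Enter`, and `tmux send-keys -t claude-api -l 'fix the Enter handler'`. Passing an argv list to `subprocess.run`, as `_send_keys` does, avoids shell quoting entirely; `shlex.join` is used here only for display.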
@@ -1,52 +0,0 @@
"""faster-whisper wrapper: load a model once, transcribe audio chunks locally.

privacy invariant: transcription runs entirely on-device. audio handed here is a
short in-memory chunk; nothing is written to disk or sent anywhere.
"""

from __future__ import annotations

import logging

import numpy as np

log = logging.getLogger(__name__)


class Transcriber:
    """a loaded faster-whisper model that transcribes float32 mono audio chunks"""

    def __init__(self, model: str = "small", language: str = "en", device: str = "auto",
                 compute_type: str = "auto") -> None:
        self.language = language
        self._model = self._load(model, device, compute_type)

    @staticmethod
    def _load(model: str, device: str, compute_type: str):
        from faster_whisper import WhisperModel

        if device == "auto":
            device = "cpu"
        if compute_type == "auto":
            compute_type = "int8" if device == "cpu" else "float16"
        log.info("loading faster-whisper model=%s device=%s compute=%s", model, device, compute_type)
        return WhisperModel(model, device=device, compute_type=compute_type)

    def transcribe(self, audio: np.ndarray, samplerate: int = 16000) -> str:
        """transcribe a mono float32 numpy array to a stripped text string.

        the audio must be 16 kHz mono float32 in [-1, 1]; resample upstream if not.
        """
        if audio.dtype != np.float32:
            audio = audio.astype(np.float32)
        if audio.ndim > 1:
            audio = audio.reshape(-1)
        segments, _info = self._model.transcribe(
            audio,
            language=self.language,
            beam_size=1,
            vad_filter=True,
            condition_on_previous_text=False,
        )
        text = " ".join(seg.text for seg in segments).strip()
        return text
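`Transcriber._load` resolves `"auto"` to the CPU with int8 quantization and uses float16 for any other device, while the daemon's `_load` passes `"auto"` for every `stt_compute` value except `"cpu"` and `"cuda"`. Together that means CUDA is only used when configured explicitly; nothing probes for a GPU. The rule on its own, as a sketch (`resolve_backend` is a hypothetical name):

```python
from __future__ import annotations


def resolve_backend(device: str = "auto", compute_type: str = "auto") -> tuple[str, str]:
    """mirror Transcriber._load: 'auto' device means CPU (no GPU probing), and
    'auto' compute means int8 on CPU, float16 on any other device"""
    if device == "auto":
        device = "cpu"
    if compute_type == "auto":
        compute_type = "int8" if device == "cpu" else "float16"
    return device, compute_type


# the daemon maps its stt_compute setting to a device before constructing Transcriber
for setting in ("auto", "cpu", "cuda", "gpu"):
    device = setting if setting in ("cpu", "cuda") else "auto"
    print(setting, "->", resolve_backend(device))
```

An unrecognized setting such as "gpu" therefore falls back silently to CPU/int8.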
@@ -1,4 +1,4 @@
-"""resolve the active claude code tmux session from ~/.claude-active"""
+"""resolve the active claude code tmux session from ~/.claude-active."""

 from __future__ import annotations

@@ -26,7 +26,7 @@ def session_name(name: str) -> str:


 def read_active() -> str | None:
-    """return the target session name from ~/.claude-active, or None if unset"""
+    """return the target session name from ~/.claude-active, or None if unset."""
     try:
         name = ACTIVE_FILE.read_text(encoding="utf-8").strip()
     except FileNotFoundError:
@@ -38,7 +38,7 @@ def read_active() -> str | None:


 def write_active(name: str) -> None:
-    """overwrite ~/.claude-active with a session name (used by ``switch``)"""
+    """overwrite ~/.claude-active with a session name (used by ``switch``)."""
     ACTIVE_FILE.write_text(name + "\n", encoding="utf-8")


@@ -51,7 +51,7 @@ def set_target(name: str) -> str:


 def session_exists(name: str) -> bool:
-    """true if a tmux session with this name currently exists"""
+    """true if a tmux session with this name currently exists."""
     if not name:
         return False
     result = subprocess.run(
@@ -67,13 +67,6 @@ def resolve_target() -> str | None:

     never guesses a target: on a missing/empty ~/.claude-active or a stale session
     name, this logs a clear warning and returns None so the caller injects nothing.
-
-    TODO: most-recently-active targeting (preferred over attached). today the target
-    is the project most recently ATTACHED to (the cc kit writes ~/.claude-active on
-    attach); upgrade to the session claude most recently asked a question in, via
-    tmux session_activity timestamps (list-sessions -F '#{session_name}
-    #{session_activity}', pick the highest-activity claude-* session) or by scraping
-    panes (capture-pane) for a waiting-prompt UI.
     """
     name = read_active()
     if not name:
@@ -83,3 +76,13 @@ def resolve_target() -> str | None:
         log.warning("target session %r no longer exists — skipping injection", name)
         return None
     return name
+
+
+# TODO: most-recently-active targeting (preferred over attached). today the target
+# is "the project most recently ATTACHED to" (the cc kit writes ~/.claude-active on
+# attach). upgrade to "the session claude most recently asked a question / produced
+# output in" via tmux session_activity timestamps:
+# tmux list-sessions -F '#{session_name} #{session_activity}'
+# pick the highest-activity claude-* session; or scrape panes
+# (tmux capture-pane -p -t <s>) for a waiting-prompt UI and target the session whose
+# pane currently shows one.
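The TODO's first option can be sketched as a pure function over the text printed by `tmux list-sessions -F '#{session_name} #{session_activity}'`, where `session_activity` is the Unix time of the session's last activity. `pick_most_active` is hypothetical and only parses text, leaving the `subprocess` call to the caller; the `claude-` prefix follows the `claude-<name>` session naming.

```python
from __future__ import annotations


def pick_most_active(listing: str, prefix: str = "claude-") -> str | None:
    """return the claude-* session with the highest session_activity, or None.

    `listing` is the output of
        tmux list-sessions -F '#{session_name} #{session_activity}'
    """
    best_name, best_ts = None, -1
    for line in listing.splitlines():
        # split on the LAST space so names containing spaces still parse
        name, sep, ts = line.strip().rpartition(" ")
        if not sep or not name.startswith(prefix) or not ts.isdigit():
            continue  # skip blank/malformed lines and non-claude sessions
        if int(ts) > best_ts:
            best_name, best_ts = name, int(ts)
    return best_name


# illustrative listing, not captured from a real tmux server
sample = "claude-api 1718000100\nclaudedo-daemon 1718000900\nclaude-web 1718000500\n"
print(pick_most_active(sample))
```

The prefix check excludes the daemon's own `claudedo-daemon` session even though it is the most recently active one in the sample. Note that `session_activity` tracks any pane output, so "most recently produced output" is only an approximation of "most recently asked a question".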