Compare commits

...

10 Commits

Author SHA1 Message Date
17db65858e feat: terminal-run only — drop systemd/autostart, start does mic-check + visible loop
terminal-run is the product, so remove all backgrounding: delete the
claudedo.service unit and autostart.sh, strip the systemd step and the
autostart source-line from install.sh (rc block now sources cc.sh only).

claudedo start now runs a mic check first (warm-up + brief capture, aborts with
guidance if silent; --skip-audio-check to bypass) then drops into a visible
listen loop printing the recognition/action log: a startup banner, then
heard -> matched -> target / injected per utterance, target/mode state changes,
and (listen mode) non-wake speech dropped WITHOUT the transcript per the privacy
invariant.

Signed-off-by: disqualifier <dev@disqualifier.me>
2026-06-25 19:30:36 -04:00
eb587692e1 fix: prime mic to skip RDPSource resume gap
WSLg's RDPSource suspends when idle and emits ~1-2s of silence while it resumes
on the first read, so a short timed capture (test-audio) or the first utterance
after daemon start could be lost. add audio.warm_up() that opens a stream and
reads until a non-silent block arrives (or times out); call it at daemon startup
and before test-audio's capture. test-audio now primes then captures 3s.

Signed-off-by: disqualifier <dev@disqualifier.me>
2026-06-25 19:09:08 -04:00
84c74603e5 feat: output-handler seam with tmux and stdout handlers
extract an OutputHandler abstract base; TmuxOutputHandler is production
(send-keys, PTY-only), StdoutOutputHandler prints what would be injected so
grammar+keymap run end-to-end without a live claude session (the deterministic
test path). module-level shims default to tmux so the daemon is unchanged.

Signed-off-by: disqualifier <dev@disqualifier.me>
2026-06-25 18:42:34 -04:00
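The seam this commit describes can be pictured roughly as below. This is a minimal sketch, not the code from the commit: the class names and the use of `tmux send-keys` come from the message, while the `send` method name, its `session`/`keys` parameters, and the printed format are assumptions made for illustration.

```python
from __future__ import annotations

import subprocess
from abc import ABC, abstractmethod


class OutputHandler(ABC):
    """abstract seam: where a matched action's keys end up (method name assumed)"""

    @abstractmethod
    def send(self, session: str, keys: list[str]) -> None:
        ...


class TmuxOutputHandler(OutputHandler):
    """production path: hand the keys to `tmux send-keys` for the target session"""

    def send(self, session: str, keys: list[str]) -> None:
        subprocess.run(["tmux", "send-keys", "-t", session, *keys], check=True)


class StdoutOutputHandler(OutputHandler):
    """test path: print what would be injected instead of touching tmux"""

    def send(self, session: str, keys: list[str]) -> None:
        print(f"[would inject] {session}: {' '.join(keys)}")
```

With a shape like this, grammar and keymap tests can run against `StdoutOutputHandler` and assert on the printed text, with no tmux server or live session involved.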
d43004e4b9 feat: tmux send-keys settings in install.sh bootstrap
append escape-time 0, large history-limit, allow-passthrough, and extended-keys
to ~/.tmux.conf under an idempotent marker block (no clobber). required for
reliable keystroke injection and for notifications/modified-keys to reach the
claude pane.

Signed-off-by: disqualifier <dev@disqualifier.me>
2026-06-25 18:42:26 -04:00
66b08d290c docs: lead how-to-run with the terminal-run model
state terminal-run as the product (the claudedo start terminal is the
recognition/action console) and frame backgrounding/autostart/systemd as
optional extras, not the default.

Signed-off-by: disqualifier <dev@disqualifier.me>
2026-06-25 18:42:22 -04:00
7f4a6f6699 style: drop inline comments, trim docstring periods
remove inline comments (CLAUDE.md: docstrings only), strip trailing periods
from single-line docstrings, and fix a PulseArmy->PulseAudio typo. no behavior
change.

Signed-off-by: disqualifier <dev@disqualifier.me>
2026-06-25 18:42:17 -04:00
bf516143b5 install: shell cc kit, opt-in autostart, bootstrap
cc kit as a sourced ~/.config/claudedo/cc.sh (bash+zsh, forced explicit names).
opt-in rc autostart guarded by CLAUDEDO_AUTOSTART + an optional systemd user
unit. install.sh is idempotent: WSL audio deps, ~/.asoundrc pulse shim, audio
verify, model prime, and source-line rc wiring with backups.

Signed-off-by: disqualifier <dev@disqualifier.me>
2026-06-25 17:55:30 -04:00
7780a8d47c daemon: capture->stt->match->inject loop and CLI
daemon.py runs the loop with pidfile/state, runtime mode switching, and the
privacy invariant: in listen mode any non-wake utterance is dropped the instant
grammar.parse() returns None. __main__.py exposes start|stop|status|test-audio|
install|switch.

Signed-off-by: disqualifier <dev@disqualifier.me>
2026-06-25 17:55:25 -04:00
947b30c22e grammar: fuzzy wake gate and command matching
word-boundary wake stripping that's lenient on the coined word 'claudedo'
(despaced-prefix match) without swallowing the command's spaces. data-driven
phrase->action map; number words normalized to digits; 'target' aliases
'switch'.

Signed-off-by: disqualifier <dev@disqualifier.me>
2026-06-25 17:55:21 -04:00
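A rough illustration of the two ideas in this message, the despaced-prefix wake match and number-word normalization. grammar.py is not shown in this compare view, so everything here is an assumption: the function names, the exact-match rule (the real gate is described as fuzzy), and the number table are hypothetical.

```python
from __future__ import annotations

# hypothetical table; the real phrase->action map is data-driven and not shown here
NUMBER_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def strip_wake(transcript: str, wake: str = "claudedo") -> str | None:
    """return the command after the wake word, or None if the wake word is absent.

    STT may split the coined word ("claude do"), so leading tokens are joined
    without spaces until they spell the wake word. the rest keeps its spaces.
    """
    tokens = transcript.lower().replace(",", " ").split()
    joined = ""
    for i, tok in enumerate(tokens):
        joined += tok
        if joined == wake:
            return " ".join(tokens[i + 1:])
        if not wake.startswith(joined):
            return None
    return None


def normalize_numbers(command: str) -> str:
    """replace spelled-out digits with digit characters, token by token"""
    return " ".join(NUMBER_WORDS.get(tok, tok) for tok in command.split())
```

Matching on whole tokens keeps the word boundary intact: the wake word is consumed only when complete tokens spell it, so the command's own spacing is never touched.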
da7c39c4f2 audio: local STT and mic capture
stt.py wraps faster-whisper for fully on-device transcription. audio.py
captures via sounddevice with two paths: silence-segmented for listen mode
and held-key for ptt. resolves the input device from config (auto/index/name).

Signed-off-by: disqualifier <dev@disqualifier.me>
2026-06-25 17:55:17 -04:00
12 changed files with 1261 additions and 103 deletions


@ -61,37 +61,23 @@ claudedo test-audio
 ## Usage
+**Run it in a terminal you watch — that's the product.** You launch `claudedo
+start`, it does a quick mic check, then drops into a visible listen loop that prints
+`heard → matched → sent` for every utterance. That terminal is your
+recognition/action console; you attach to the `claude-<name>` session in another pane
+to watch the keystrokes land. There is no backgrounding/daemon mode — the whole point
+is the console you read.
 ```bash
-claudedo start # run the daemon (foreground; listen mode by default)
+claudedo start # mic-check, then the visible listen loop (listen mode default)
 claudedo start --mode ptt # push-to-talk instead (desk-only — see Modes)
+claudedo start --skip-audio-check # skip the pre-listen mic check
 claudedo status # running? mode? target session?
 claudedo stop # stop a running daemon
 claudedo switch <name> # retarget to claude-<name>
 claudedo test-audio # verify the mic capture path
 ```
-Background it in its own tmux session:
-```bash
-tmux new-session -d -s claudedo 'claudedo start'
-```
-### Autostart
-WSL has no real boot, so autostart is rc-based and **opt-in**. `install.sh` ships
-`~/.config/claudedo/autostart.sh`, which starts the daemon in a `claudedo-daemon`
-tmux session once per WSL session — but only when `CLAUDEDO_AUTOSTART=1` is set.
-Enable it by uncommenting the `export CLAUDEDO_AUTOSTART=1` line in the cc-kit marker
-block of your rc; disable it by re-commenting (or deleting the file). Watch its logs
-with `tmux attach -t claudedo-daemon`.
-If your WSL runs systemd (`systemd=true` in `/etc/wsl.conf`), `install.sh` also
-installs an optional user unit — enable it instead with:
-```bash
-systemctl --user enable --now claudedo
-```
 ### Modes
 - **listen (default)** — continuous capture; only acts on utterances that **start

156
install.sh Executable file

@ -0,0 +1,156 @@
#!/usr/bin/env bash
# claudedo bootstrap — does the system setup pip can't. idempotent: re-running is
# safe and won't duplicate the shell-rc cc kit. run from the repo root.
set -euo pipefail
REPO_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
ASOUNDRC="$HOME/.asoundrc"
MARKER_BEGIN="# >>> claudedo cc kit >>>"
MARKER_END="# <<< claudedo cc kit <<<"
say() { printf '\n\033[1;36m==> %s\033[0m\n' "$*"; }
warn() { printf '\033[1;33m!! %s\033[0m\n' "$*" >&2; }
die() { printf '\033[1;31mxx %s\033[0m\n' "$*" >&2; exit 1; }
# 1. windows-side checks (cannot automate — check and instruct) -----------------
say "checking WSLg audio bridge"
if [ ! -e /mnt/wslg/PulseServer ]; then
die "WSLg PulseServer missing (/mnt/wslg/PulseServer). claudedo needs WSLg audio.
update WSL ('wsl --update' in Windows) or install WSL from the Microsoft Store,
then restart WSL ('wsl --shutdown') and re-run this script."
fi
echo " /mnt/wslg/PulseServer present"
cat <<'EOF'
MANUAL WINDOWS STEP (this script cannot do it for you):
Windows Settings -> Privacy & security -> Microphone ->
enable "Let desktop apps access your microphone".
Without this, the mic is silent inside WSL. Do it now if you haven't.
EOF
# 2. WSL audio deps (apt) -------------------------------------------------------
say "installing WSL audio dependencies (apt)"
sudo apt-get update
sudo apt-get install -y libportaudio2 libasound2t64 libasound2-plugins \
alsa-utils pulseaudio-utils
# 3. ALSA -> Pulse routing ------------------------------------------------------
say "configuring ALSA -> Pulse routing (~/.asoundrc)"
if [ -f "$ASOUNDRC" ] && grep -q "type pulse" "$ASOUNDRC"; then
  echo " ~/.asoundrc already routes to pulse"
else
  {
    echo "pcm.!default { type pulse }"
    echo "ctl.!default { type pulse }"
  } >> "$ASOUNDRC"
  echo " wrote pulse default to ~/.asoundrc"
fi
if [ -z "${PULSE_SERVER:-}" ] && [ -e /mnt/wslg/PulseServer ]; then
  export PULSE_SERVER="unix:/mnt/wslg/PulseServer"
  echo " exported PULSE_SERVER=$PULSE_SERVER (WSLg usually sets this already)"
fi

# 4. verify audio (fail loudly with guidance) -----------------------------------
say "verifying audio path"
if pactl info >/dev/null 2>&1; then
  DEFAULT_SRC="$(pactl info | sed -n 's/^Default Source: //p')"
  echo " Default Source: ${DEFAULT_SRC:-<none>}"
  if ! pactl list sources short 2>/dev/null | grep -q RDPSource; then
    warn "RDPSource not listed by pactl — mic may not be bridged. check Windows mic permission."
  fi
else
  warn "pactl info failed — pulseaudio-utils installed but no server reachable yet."
fi
TESTWAV="/tmp/claudedo_test.wav"
if arecord -D default -f S16_LE -c 1 -r 16000 -d 2 "$TESTWAV" >/dev/null 2>&1 && [ -s "$TESTWAV" ]; then
  echo " arecord captured 2s -> $TESTWAV ($(stat -c%s "$TESTWAV") bytes)"
else
  warn "arecord could not capture. fix-chain: apt deps above + ~/.asoundrc + Windows mic permission.
debug anytime with: claudedo test-audio"
fi
# 5. python install + model prime -----------------------------------------------
say "installing the claudedo python package"
PIP="${PIP:-pip3}"
"$PIP" install -e "$REPO_DIR"
say "priming the faster-whisper model (so first run isn't slow)"
MODEL="$(sed -n 's/^model *= *"\(.*\)".*/\1/p' "$REPO_DIR/config.toml" | head -1)"
MODEL="${MODEL:-small}"
python3 - "$MODEL" <<'PY' || warn "model prime failed — first run will download it"
import sys
from faster_whisper import WhisperModel
WhisperModel(sys.argv[1], device="cpu", compute_type="int8")
print(" primed faster-whisper model:", sys.argv[1])
PY
# 6. cc kit as a sourced file + rc wiring (idempotent) --------------------------
say "installing the cc kit (~/.config/claudedo/cc.sh)"
CONF_DIR="$HOME/.config/claudedo"
mkdir -p "$CONF_DIR"
install -m 0644 "$REPO_DIR/shell/cc.sh" "$CONF_DIR/cc.sh"
echo " wrote $CONF_DIR/cc.sh"
# wire EVERY rc that exists (the user may have both zsh and bash).
wired_any=0
for RC in "$HOME/.zshrc" "$HOME/.bashrc"; do
  [ -f "$RC" ] || continue
  wired_any=1
  if grep -qF "$MARKER_BEGIN" "$RC"; then
    echo " cc kit marker already in $RC (not duplicating)"
    continue
  fi
  cp "$RC" "$RC.claudedo.bak"
  echo " backed up $RC -> $RC.claudedo.bak"
  cat >> "$RC" <<'CCKIT'
# >>> claudedo cc kit >>>
[ -f ~/.config/claudedo/cc.sh ] && source ~/.config/claudedo/cc.sh
# <<< claudedo cc kit <<<
CCKIT
  echo " wired source-line block into $RC (open a new shell or 'source $RC')"
done
[ "$wired_any" = 1 ] || warn "no ~/.zshrc or ~/.bashrc found — add the marker block from README.md manually."

# warn about any OLD loose cc defs outside our markers (do not auto-delete).
for RC in "$HOME/.zshrc" "$HOME/.bashrc"; do
  [ -f "$RC" ] || continue
  loose="$(grep -nE '^[[:space:]]*(cc|ccr|ccl|cck|cckl|_cc_name)[[:space:]]*\(\)' "$RC" \
    | grep -v 'claudedo' || true)"
  if [ -n "$loose" ]; then
    warn "old cc-function defs found in $RC (outside the claudedo markers):"
    echo "$loose" | sed 's/^/ /'
    echo " review and remove them by hand — the new sourced kit overrides them, but"
    echo " they are dead code. a backup is at $RC.claudedo.bak"
  fi
done
# 7. tmux settings for reliable send-keys (idempotent ~/.tmux.conf append) -------
say "configuring tmux for reliable send-keys (~/.tmux.conf)"
TMUX_CONF="$HOME/.tmux.conf"
TMUX_MARKER="# >>> claudedo tmux >>>"
touch "$TMUX_CONF"
if grep -qF "$TMUX_MARKER" "$TMUX_CONF"; then
  echo " claudedo tmux block already present (not duplicating)"
else
  cat >> "$TMUX_CONF" <<'TMUXCONF'
# >>> claudedo tmux >>>
# settings for reliable keystroke injection + notifications (do not edit inside the
# markers; re-run install.sh to refresh). escape-time 0 stops injected Escape from
# being misread; allow-passthrough + extended-keys let notifications and modified
# keys (Shift+Enter) reach the claude pane; the larger history-limit keeps scrollback.
set -g escape-time 0
set -g history-limit 50000
set -g allow-passthrough on
set -s extended-keys on
set -as terminal-features 'xterm*:extkeys'
# <<< claudedo tmux <<<
TMUXCONF
  echo " appended claudedo tmux settings to $TMUX_CONF (reload: tmux source-file ~/.tmux.conf)"
fi
say "done. next: 'claudedo test-audio' then 'claudedo start'"
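The rc and tmux wiring above both rely on the same trick: append a block wrapped in begin/end markers, and skip the append when the begin marker is already in the file. A minimal sketch of that pattern, written in Python so it can be exercised in isolation; `append_block_once` is a hypothetical helper and not part of install.sh.

```python
from __future__ import annotations

from pathlib import Path


def append_block_once(path: Path, begin: str, end: str, body: str) -> bool:
    """append begin/body/end to path unless begin is already there.

    returns True when the block was written, False when it was already present.
    existing content is never rewritten, only appended to.
    """
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    if begin in existing:
        return False
    # keep the block on its own lines even if the file lacks a trailing newline
    sep = "" if existing == "" or existing.endswith("\n") else "\n"
    with path.open("a", encoding="utf-8") as fh:
        fh.write(f"{sep}{begin}\n{body.rstrip()}\n{end}\n")
    return True
```

Checking only for the begin marker mirrors the `grep -qF` guard in the script: re-running is a no-op, at the cost of never refreshing a block whose contents have changed.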

67
shell/cc.sh Normal file

@ -0,0 +1,67 @@
# claudedo cc kit — claude-code-in-tmux session helpers.
# POSIX sh; sources cleanly under bash and zsh. side-effect-free on source
# (function definitions only — nothing runs at source time).
#
# every command REQUIRES an explicit project name. the session is always
# "claude-<name>", a stable speakable handle: "cc libs" -> claude-libs, which the
# voice daemon targets with "claudedo target libs" / "switch libs". the name->session
# mapping here MUST match target.py's session_name() in the daemon.
#
# cc <name> start or reattach to claude-<name>; writes ~/.claude-active
# ccr <name> reattach only (error if it doesn't exist); writes ~/.claude-active
# ccl list running claude- sessions
# cck <name> kill claude-<name>
# cckl kill ALL claude- sessions
cc() {
  if [ -z "$1" ]; then
    echo "usage: cc <project-name>" >&2
    return 1
  fi
  session="claude-$1"
  echo "$session" > "$HOME/.claude-active"
  if tmux has-session -t "$session" 2>/dev/null; then
    tmux attach -t "$session"
  else
    tmux new-session -s "$session" "claude"
  fi
}

ccr() {
  if [ -z "$1" ]; then
    echo "usage: ccr <project-name>" >&2
    return 1
  fi
  session="claude-$1"
  if tmux has-session -t "$session" 2>/dev/null; then
    echo "$session" > "$HOME/.claude-active"
    tmux attach -t "$session"
  else
    echo "no session '$session' — run 'cc $1' to start one" >&2
    return 1
  fi
}

ccl() {
  tmux ls 2>/dev/null | grep '^claude-' || echo "no claude sessions running"
}

cck() {
  if [ -z "$1" ]; then
    echo "usage: cck <project-name>" >&2
    return 1
  fi
  session="claude-$1"
  if tmux kill-session -t "$session" 2>/dev/null; then
    echo "killed $session"
  else
    echo "no session '$session'" >&2
    return 1
  fi
}

cckl() {
  tmux ls 2>/dev/null | grep '^claude-' | cut -d: -f1 | while read -r s; do
    tmux kill-session -t "$s" && echo "killed $s"
  done
}
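The header comment requires the name->session mapping to match `session_name()` in target.py, which is not part of this compare view. A guess at what that function could look like, under stated assumptions: the `claude-` prefix comes from the kit above, the optional-prefix handling follows the `switch` help text ("claude- prefix optional"), and the empty-name error is invented for illustration.

```python
PREFIX = "claude-"


def session_name(name: str) -> str:
    """map a spoken or typed project name to its tmux session name.

    assumption: a name that already carries the prefix is passed through, so
    "libs" and "claude-libs" both resolve to "claude-libs".
    """
    name = name.strip()
    if not name:
        raise ValueError("a project name is required")
    return name if name.startswith(PREFIX) else PREFIX + name
```

The shell helpers always prepend the prefix, so `cc claude-libs` would create `claude-claude-libs`. If the daemon side really accepts a prefixed name, the two only agree for short names such as `libs`.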


@ -1,3 +1,3 @@
-"""claudedo — voice-control daemon for claude code (local STT -> tmux send-keys)."""
+"""claudedo — voice-control daemon for claude code (local STT -> tmux send-keys)"""
 __version__ = "0.1.0"

226
src/claudedo/__main__.py Normal file

@ -0,0 +1,226 @@
"""claudedo CLI: start | stop | status | test-audio | install"""
from __future__ import annotations

import argparse
import logging
import subprocess
import sys
import wave
from pathlib import Path

from . import __version__, daemon, target
from .config import Config, ConfigError, load_config


def _setup_logging(verbose: bool) -> None:
    logging.basicConfig(
        level=logging.DEBUG if verbose else logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        datefmt="%H:%M:%S",
    )


def _load_or_die(path: str | None) -> Config:
    try:
        return load_config(path)
    except ConfigError as exc:
        print(f"config error: {exc}", file=sys.stderr)
        raise SystemExit(2)


def cmd_start(args: argparse.Namespace) -> int:
    config = _load_or_die(args.config)
    if args.mode:
        config.mode = args.mode
    if not args.skip_audio_check:
        print("checking mic before listening (speak briefly) ...")
        peak = _probe_mic(config, seconds=2.0, verbose=False)
        if peak is None or peak < 0.02:
            print("mic check failed — no usable input.", file=sys.stderr)
            print("run `claudedo test-audio` to debug; or `claudedo start --skip-audio-check`",
                  file=sys.stderr)
            return 1
        print(f"mic OK (peak {peak:.3f}).")
    try:
        daemon.run_daemon(config)
    except RuntimeError as exc:
        print(str(exc), file=sys.stderr)
        return 1
    return 0
def _probe_mic(config: Config, seconds: float, verbose: bool):
    """warm up the mic then capture for `seconds`; return peak amplitude or None.

    None signals a hard capture failure (no PortAudio / device error) with guidance
    already printed; a float (possibly ~0) is a successful capture whose level the
    caller judges. shared by `start`'s precheck and `test-audio`.
    """
    from . import audio as audio_mod
    try:
        device = audio_mod.resolve_device(config.stt_device)
        if verbose:
            print("priming mic (RDPSource resumes from suspend) ...")
        audio_mod.warm_up(config.samplerate, config.channels, device)
        if verbose:
            print(f"capturing {seconds:.0f}s from "
                  f"device={device if device is not None else 'default'} — speak now ...")
        chunk = audio_mod.record_while(
            config.samplerate, config.channels, device,
            held=_timed_hold(seconds), max_utterance=seconds + 1.0, min_utterance=0.0,
        )
    except Exception as exc:
        print(f"audio capture FAILED: {exc}", file=sys.stderr)
        print("fix-chain: install.sh apt deps + ~/.asoundrc pulse shim + Windows mic permission",
              file=sys.stderr)
        return None
    if chunk is None or chunk.size == 0:
        print("captured no audio — check mic permission + RDPSource", file=sys.stderr)
        return None
    peak = float(abs(chunk).max())
    if verbose:
        out = Path("/tmp/claudedo_test.wav")
        _write_wav(out, chunk, config.samplerate)
        print(f"captured {chunk.size / config.samplerate:.1f}s, peak amplitude {peak:.3f} -> {out}")
    return peak
def cmd_stop(_args: argparse.Namespace) -> int:
    if daemon.stop_running():
        print("sent stop signal to claudedo")
        return 0
    print("claudedo is not running")
    return 1


def cmd_status(_args: argparse.Namespace) -> int:
    pid = daemon.read_pid()
    if pid is None:
        print("claudedo: not running")
        return 1
    state = daemon.read_state() or {}
    print(f"claudedo: running (pid {pid})")
    print(f" mode: {state.get('mode', '?')}")
    print(f" target: {state.get('target') or '(none — run cc to attach)'}")
    return 0


def _check_audio_tools() -> None:
    for tool in ("pactl", "arecord"):
        path = subprocess.run(["which", tool], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        mark = "ok" if path.returncode == 0 else "MISSING (run install.sh)"
        print(f" {tool}: {mark}")


def cmd_test_audio(args: argparse.Namespace) -> int:
    config = _load_or_die(args.config)
    print("== claudedo test-audio ==")
    print("WSLg PulseServer:", "present" if Path("/mnt/wslg/PulseServer").exists() else "MISSING")
    _check_audio_tools()
    try:
        pactl = subprocess.run(["pactl", "info"], stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
        if pactl.returncode == 0:
            for line in pactl.stdout.decode("utf-8", "replace").splitlines():
                if line.startswith("Default Source"):
                    print(" ", line.strip())
    except FileNotFoundError:
        pass
    from . import audio as audio_mod
    print("\nsounddevice input devices:")
    try:
        for idx, dev in enumerate(audio_mod.list_devices()):
            if dev.get("max_input_channels", 0) > 0:
                print(f" [{idx}] {dev['name']} ({dev['max_input_channels']}ch)")
    except Exception as exc:
        print(f" could not list devices: {exc}", file=sys.stderr)
    peak = _probe_mic(config, seconds=3.0, verbose=True)
    if peak is None:
        return 1
    if peak < 0.02:
        print("WARNING: near-silent capture — is the mic muted / permission denied?")
        print("fix-chain: Windows mic permission for desktop apps + a non-Krisp default input;")
        print(" if still silent, `wsl --shutdown` then reopen to re-attach RDPSource.")
        return 1
    print("mic OK.")
    return 0
def _timed_hold(seconds: float):
    import time
    end = [None]

    def held() -> bool:
        now = time.monotonic()
        if end[0] is None:
            end[0] = now + seconds
        return now < end[0]

    return held


def _write_wav(path: Path, chunk, samplerate: int) -> None:
    import numpy as np
    pcm = (np.clip(chunk, -1.0, 1.0) * 32767).astype("<i2")
    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(samplerate)
        wf.writeframes(pcm.tobytes())


def cmd_install(_args: argparse.Namespace) -> int:
    script = Path(__file__).resolve().parents[2] / "install.sh"
    if not script.is_file():
        print(f"install.sh not found at {script}", file=sys.stderr)
        return 1
    return subprocess.call(["bash", str(script)])


def cmd_switch(args: argparse.Namespace) -> int:
    session = target.set_target(args.name)
    print(f"target -> {session}")
    return 0
def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="claudedo", description="voice control for claude code")
    p.add_argument("--version", action="version", version=f"claudedo {__version__}")
    p.add_argument("-v", "--verbose", action="store_true", help="debug logging")
    p.add_argument("-c", "--config", help="path to config.toml")
    sub = p.add_subparsers(dest="command", required=True)
    sp = sub.add_parser("start", help="run the daemon (foreground)")
    sp.add_argument("--mode", choices=("listen", "ptt"), help="override input mode")
    sp.add_argument("--skip-audio-check", action="store_true",
                    help="skip the pre-listen mic check")
    sp.set_defaults(func=cmd_start)
    sub.add_parser("stop", help="stop a running daemon").set_defaults(func=cmd_stop)
    sub.add_parser("status", help="show daemon status").set_defaults(func=cmd_status)
    sub.add_parser("test-audio", help="verify the mic capture path").set_defaults(func=cmd_test_audio)
    sub.add_parser("install", help="re-run the bootstrap (install.sh)").set_defaults(func=cmd_install)
    sw = sub.add_parser("switch", help="set the active target session")
    sw.add_argument("name", help="project short-name (claude- prefix optional)")
    sw.set_defaults(func=cmd_switch)
    return p


def main(argv: list[str] | None = None) -> int:
    parser = build_parser()
    args = parser.parse_args(argv)
    _setup_logging(getattr(args, "verbose", False))
    return args.func(args)


if __name__ == "__main__":
    sys.exit(main())
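One detail of `_timed_hold` is easy to miss: the window starts on the first call to `held()`, not when the closure is built, so time spent opening the stream does not count against the capture. A standalone sketch of the same idea with an injectable clock so the behavior can be checked without waiting. The `clock` parameter and the name `timed_hold` are additions for illustration and do not exist in the file above.

```python
from __future__ import annotations

import time
from typing import Callable


def timed_hold(seconds: float,
               clock: Callable[[], float] = time.monotonic) -> Callable[[], bool]:
    """return a predicate that stays true for `seconds` after its first call"""
    end: float | None = None

    def held() -> bool:
        nonlocal end
        now = clock()
        if end is None:
            # lazily start the window on the first poll
            end = now + seconds
        return now < end

    return held
```

Passing a fake clock, for example `lambda: next(times)` over a list of timestamps, makes the predicate deterministic in tests.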

179
src/claudedo/audio.py Normal file

@ -0,0 +1,179 @@
"""mic capture via sounddevice — the WSL-hard part.

device selection resolves config's stt.device ("auto" | index | name substring) to
a concrete sounddevice input device. two capture paths:
  - record_until_silence(): listen mode; stream until trailing silence segments the
    utterance (no streaming STT; chunk-on-silence is enough for commands).
  - record_while(predicate): ptt mode; capture while predicate() is true (key held).

the WSLg/PulseAudio path is verified separately by `claudedo test-audio`; if capture
fails here the fix-chain is the apt deps + ~/.asoundrc + Windows mic permission.
"""
from __future__ import annotations

import logging
import queue
import time
from typing import Callable

import numpy as np

log = logging.getLogger(__name__)


class AudioError(Exception):
    """raised when no usable input device is found or capture fails"""


def list_devices() -> list[dict]:
    """return sounddevice's device table (for test-audio / debugging)"""
    import sounddevice as sd
    return list(sd.query_devices())
def resolve_device(spec: str) -> int | None:
    """resolve a device spec to a sounddevice input index, or None for default.

    spec: "auto" -> default input; a digit string -> that index; otherwise a
    case-insensitive substring of a device name with input channels.
    """
    import sounddevice as sd
    if spec in ("", "auto", "default"):
        return None
    if spec.isdigit():
        return int(spec)
    spec_low = spec.lower()
    for idx, dev in enumerate(sd.query_devices()):
        if dev.get("max_input_channels", 0) > 0 and spec_low in dev["name"].lower():
            return idx
    raise AudioError(f"no input device matching {spec!r}")


def _rms(block: np.ndarray) -> float:
    if block.size == 0:
        return 0.0
    return float(np.sqrt(np.mean(np.square(block, dtype=np.float64))))
def warm_up(samplerate: int, channels: int, device: int | None,
            timeout: float = 3.0) -> bool:
    """open a short stream and read until the source produces audio.

    WSLg's RDPSource suspends when idle and emits ~1-2s of silence while it resumes
    on the next read. priming here means the first real capture isn't lost to that
    warm-up gap. returns whether any non-silent block arrived before timeout (still
    safe to proceed either way; a truly silent mic just returns False).
    """
    import sounddevice as sd
    block_dur = 0.05
    blocksize = int(samplerate * block_dur)
    deadline = time.monotonic() + timeout
    with sd.InputStream(samplerate=samplerate, channels=channels, device=device,
                        dtype="float32", blocksize=blocksize) as stream:
        while time.monotonic() < deadline:
            block, _overflowed = stream.read(blocksize)
            mono = block.reshape(-1) if channels == 1 else block.mean(axis=1)
            if _rms(mono) > 0.0:
                return True
    return False
def record_until_silence(samplerate: int, channels: int, device: int | None,
                         silence_threshold: float, silence_duration: float,
                         min_utterance: float, max_utterance: float,
                         stop: Callable[[], bool] | None = None) -> np.ndarray | None:
    """capture one utterance, ending after trailing silence. returns mono float32.

    blocks until speech is detected and then trailing silence segments it, or until
    stop() returns true (clean shutdown). returns None if stopped before any speech
    or if the captured utterance is shorter than min_utterance.
    """
    import sounddevice as sd
    block_dur = 0.05
    blocksize = int(samplerate * block_dur)
    q: "queue.Queue[np.ndarray]" = queue.Queue()

    def _cb(indata, _frames, _time, status):
        if status:
            log.debug("audio status: %s", status)
        q.put(indata.copy())

    collected: list[np.ndarray] = []
    speaking = False
    silence_run = 0.0
    started_at = time.monotonic()
    with sd.InputStream(samplerate=samplerate, channels=channels, device=device,
                        dtype="float32", blocksize=blocksize, callback=_cb):
        while True:
            if stop is not None and stop():
                break
            try:
                block = q.get(timeout=0.2)
            except queue.Empty:
                if not speaking and time.monotonic() - started_at > 600:
                    started_at = time.monotonic()
                continue
            mono = block.reshape(-1) if channels == 1 else block.mean(axis=1)
            level = _rms(mono)
            if level >= silence_threshold:
                speaking = True
                silence_run = 0.0
                collected.append(mono)
            elif speaking:
                silence_run += block_dur
                collected.append(mono)
                if silence_run >= silence_duration:
                    break
            if speaking and (time.monotonic() - started_at) > max_utterance:
                log.debug("utterance hit max_utterance cap")
                break
    if not collected:
        return None
    audio = np.concatenate(collected).astype(np.float32)
    if audio.size / samplerate < min_utterance:
        return None
    return audio
def record_while(samplerate: int, channels: int, device: int | None,
                 held: Callable[[], bool], max_utterance: float,
                 min_utterance: float) -> np.ndarray | None:
    """capture while held() is true (push-to-talk). returns mono float32 or None"""
    import sounddevice as sd
    block_dur = 0.05
    blocksize = int(samplerate * block_dur)
    q: "queue.Queue[np.ndarray]" = queue.Queue()

    def _cb(indata, _frames, _time, status):
        if status:
            log.debug("audio status: %s", status)
        q.put(indata.copy())

    collected: list[np.ndarray] = []
    started_at = time.monotonic()
    with sd.InputStream(samplerate=samplerate, channels=channels, device=device,
                        dtype="float32", blocksize=blocksize, callback=_cb):
        while held():
            try:
                block = q.get(timeout=0.1)
            except queue.Empty:
                continue
            mono = block.reshape(-1) if channels == 1 else block.mean(axis=1)
            collected.append(mono)
            if (time.monotonic() - started_at) > max_utterance:
                break
    if not collected:
        return None
    audio = np.concatenate(collected).astype(np.float32)
    if audio.size / samplerate < min_utterance:
        return None
    return audio
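The silence-segmentation rule in `record_until_silence` can be tried offline, without a microphone, by feeding it pre-made blocks. The sketch below restates only that rule over plain Python lists: it leaves out the stream, the queue, the stop callback, and the min/max utterance limits, and `segment_utterance` is a hypothetical name, not a function in audio.py.

```python
from __future__ import annotations

import math


def rms(block: list[float]) -> float:
    """root-mean-square level of one block; 0.0 for an empty block"""
    if not block:
        return 0.0
    return math.sqrt(sum(x * x for x in block) / len(block))


def segment_utterance(blocks: list[list[float]], block_dur: float,
                      silence_threshold: float,
                      silence_duration: float) -> list[list[float]]:
    """keep blocks from the first loud one until enough trailing silence has run"""
    collected: list[list[float]] = []
    speaking = False
    silence_run = 0.0
    for block in blocks:
        if rms(block) >= silence_threshold:
            # speech: start (or continue) the utterance and reset the silence timer
            speaking = True
            silence_run = 0.0
            collected.append(block)
        elif speaking:
            # quiet block inside an utterance: keep it, count it toward the cutoff
            silence_run += block_dur
            collected.append(block)
            if silence_run >= silence_duration:
                break
    return collected
```

Leading silence is discarded and trailing silence is kept up to the cutoff, which is why captured utterances end with roughly `silence_duration` of quiet.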


@ -1,4 +1,4 @@
-"""load and validate config.toml into a typed Config object with clear errors."""
+"""load and validate config.toml into a typed Config object with clear errors"""
 from __future__ import annotations
@ -10,7 +10,7 @@ from pathlib import Path
 try:
     import tomllib as _toml
     _TOML_BINARY = True
-except ModuleNotFoundError: # python < 3.11
+except ModuleNotFoundError:
     import tomli as _toml
     _TOML_BINARY = True
@ -27,12 +27,12 @@ DEFAULT_CONFIG_PATHS = (
 class ConfigError(Exception):
-    """raised on a missing or invalid configuration value."""
+    """raised on a missing or invalid configuration value"""
 @dataclass
 class Config:
-    """validated claudedo configuration."""
+    """validated claudedo configuration"""
     wake_phrases: list[str]
     mode: str
@ -53,7 +53,7 @@ class Config:
 def find_config_path(explicit: str | os.PathLike | None = None) -> Path:
-    """resolve the config file path, raising ConfigError if none is found."""
+    """resolve the config file path, raising ConfigError if none is found"""
     candidates: list[Path] = []
     if explicit:
         candidates.append(Path(explicit))
@ -79,7 +79,7 @@ def _require(table: dict, section: str, key: str, types: tuple, default=None):
 def load_config(explicit: str | os.PathLike | None = None) -> Config:
-    """load config.toml from the first existing default path (or an explicit one)."""
+    """load config.toml from the first existing default path (or an explicit one)"""
     path = find_config_path(explicit)
     try:
         with open(path, "rb") as fh:

262
src/claudedo/daemon.py Normal file

@ -0,0 +1,262 @@
"""the capture -> stt -> match -> inject loop.

privacy invariant: in listen mode, any utterance that does not start with a wake
phrase is discarded the instant grammar.parse() returns None: the transcript text
is dropped and never stored or transmitted. nothing about non-command speech is
persisted.
"""
from __future__ import annotations

import json
import logging
import os
import signal
import sys
import time
from pathlib import Path

from . import audio, grammar, inject, target
from .config import Config
from .stt import Transcriber

log = logging.getLogger(__name__)

STATE_DIR = Path(os.environ.get("XDG_CACHE_HOME", str(Path.home() / ".cache"))) / "claudedo"
PIDFILE = STATE_DIR / "claudedo.pid"
STATEFILE = STATE_DIR / "state.json"
def _ensure_state_dir() -> None:
    STATE_DIR.mkdir(parents=True, exist_ok=True)


def write_state(pid: int, mode: str, target_session: str | None) -> None:
    """write the running daemon's status for `claudedo status` to read"""
    _ensure_state_dir()
    STATEFILE.write_text(json.dumps({
        "pid": pid,
        "mode": mode,
        "target": target_session,
        "since": time.time(),
    }), encoding="utf-8")


def read_state() -> dict | None:
    """read the daemon status file, or None if absent/unreadable"""
    try:
        return json.loads(STATEFILE.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError, OSError):
        return None


def read_pid() -> int | None:
    """return the pid of a running daemon, or None (also clears stale pidfiles)"""
    try:
        pid = int(PIDFILE.read_text(encoding="utf-8").strip())
    except (FileNotFoundError, ValueError, OSError):
        return None
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        PIDFILE.unlink(missing_ok=True)
        return None
    except PermissionError:
        return pid
    return pid


def stop_running() -> bool:
    """signal a running daemon to stop. returns whether one was found"""
    pid = read_pid()
    if pid is None:
        return False
    os.kill(pid, signal.SIGTERM)
    return True
class _PTTKey:
    """desk-only push-to-talk: 'held' while the configured key is down in the
    daemon's own terminal. there is deliberately NO global hotkey — a system-wide
    keyboard hook is the keylogger/cheat silhouette claudedo refuses to install. for
    hands-free-while-gaming use listen mode (voice trigger over the mic bridge).

    implementation reads stdin in raw mode: press the key to start capture, press it
    again (or Enter) to stop. (terminals don't deliver key-up events, so true
    hold-to-talk isn't possible from a tty — this is press-toggle, documented.)
    """

    def __init__(self) -> None:
        self._tty = sys.stdin.isatty()

    def wait_press(self, stop) -> bool:
        import select
        if not self._tty:
            log.warning("ptt mode needs a tty; falling back to a 3s timed capture")
            time.sleep(0.2)
            return not stop()
        while not stop():
            r, _, _ = select.select([sys.stdin], [], [], 0.2)
            if r:
                sys.stdin.read(1)
                return True
        return False
class Daemon:
    """owns the capture/transcribe/inject loop and runtime mode switching"""

    def __init__(self, config: Config) -> None:
        self.config = config
        self.mode = config.mode
        self._stop = False
        self._transcriber: Transcriber | None = None
        self._device: int | None = None
        self._ptt = _PTTKey()

    def _install_signals(self) -> None:
        signal.signal(signal.SIGTERM, self._on_signal)
        signal.signal(signal.SIGINT, self._on_signal)

    def _on_signal(self, _signum, _frame) -> None:
        log.info("stop requested")
        self._stop = True

    def stopped(self) -> bool:
        return self._stop

    def _load(self) -> None:
        cfg = self.config
        self._device = audio.resolve_device(cfg.stt_device)
        self._transcriber = Transcriber(
            model=cfg.stt_model, language=cfg.stt_language,
            device=cfg.stt_compute if cfg.stt_compute in ("cpu", "cuda") else "auto",
            compute_type="auto",
        )
        if audio.warm_up(cfg.samplerate, cfg.channels, self._device):
            log.info("mic warmed up (source live)")
        else:
            log.warning("mic warm-up saw only silence — check mic permission / RDPSource")

    def _capture(self):
        cfg = self.config
        if self.mode == "ptt":
            print("[ptt] press the capture key in this terminal, speak, then press again to stop")
            if not self._ptt.wait_press(self.stopped):
                return None
            return audio.record_while(
                cfg.samplerate, cfg.channels, self._device,
                held=lambda: not self._ptt.wait_press(self.stopped),
                max_utterance=cfg.max_utterance, min_utterance=cfg.min_utterance,
            )
        return audio.record_until_silence(
            cfg.samplerate, cfg.channels, self._device,
            silence_threshold=cfg.silence_threshold, silence_duration=cfg.silence_duration,
            min_utterance=cfg.min_utterance, max_utterance=cfg.max_utterance,
            stop=self.stopped,
        )

    def _handle(self, transcript: str) -> None:
        cfg = self.config
        require_wake = self.mode == "listen"
        action = grammar.parse(transcript, cfg.wake_phrases, cfg.match_threshold, require_wake)
        if action is None:
            self._emit(f'heard: "{transcript}" -> no command matched')
            return
        # mode and switch change daemon state only; nothing is injected for them
        if action.name == "mode":
            new_mode = str(action.arg)
            if new_mode != self.mode:
                self.mode = new_mode
                self._emit(f"mode -> {new_mode}")
                self._refresh_state()
            return
        if action.name == "switch":
            session = target.set_target(str(action.arg))
            self._emit(f"target -> {session}")
            self._refresh_state()
            return
        session = target.resolve_target()
        if session is None:
            self._emit(f'heard: "{transcript}" -> matched: {self._describe(action)} '
                       f'-> ERROR no target session (did nothing)')
            return
        self._emit(f'heard: "{transcript}" -> matched: {self._describe(action)} -> target {session}')
        if action.name == "type" and not cfg.type_autosend:
            inject.send_literal(session, str(action.arg))
            self._emit(f"injected: literal {str(action.arg)!r} -> {session}")
            return
        inject.perform(session, action)
        self._emit(f"injected: {self._describe(action)} -> {session}")

    @staticmethod
    def _describe(action) -> str:
        if action.arg is None:
            return action.name.upper()
        return f"{action.name.upper()}({action.arg})"

    @staticmethod
    def _emit(line: str) -> None:
        """print a recognition/action line to the watched terminal"""
        print(line, flush=True)

    def _has_wake(self, transcript: str) -> bool:
        """true if the utterance starts with a wake phrase (listen-mode gate).

        non-wake speech is dropped without ever printing the transcript. this is the
        privacy invariant: non-command speech is discarded, never recorded.
        """
        cfg = self.config
        return grammar.strip_wake(transcript, cfg.wake_phrases, cfg.match_threshold, True) is not None

    def _print_startup(self) -> None:
        cfg = self.config
        dev = cfg.stt_device if cfg.stt_device != "auto" else "default"
        target_now = target.read_active() or "(none — run cc to attach)"
        self._emit("── claudedo ─────────────────────────────────")
        self._emit(f" model: {cfg.stt_model} ({cfg.stt_language})")
        self._emit(f" mic: {dev}")
        self._emit(f" mode: {self.mode}")
        self._emit(f" target: {target_now}")
        self._emit(f" wake: {', '.join(cfg.wake_phrases)}")
        self._emit(" Ctrl-C to stop")
        self._emit("─────────────────────────────────────────────")

    def _refresh_state(self) -> None:
        write_state(os.getpid(), self.mode, target.read_active())

    def run(self) -> None:
        """run the daemon loop until a stop signal arrives"""
        _ensure_state_dir()
        PIDFILE.write_text(str(os.getpid()), encoding="utf-8")
        self._install_signals()
        try:
            self._load()
            self._refresh_state()
            self._print_startup()
            while not self._stop:
                audio_chunk = self._capture()
                if self._stop:
                    break
                if audio_chunk is None:
                    continue
                transcript = self._transcriber.transcribe(audio_chunk, self.config.samplerate)
                if not transcript:
                    continue
                if self.mode == "listen" and not self._has_wake(transcript):
                    self._emit("dropped: non-wake speech (not recorded)")
                    continue
                self._handle(transcript)
        finally:
            PIDFILE.unlink(missing_ok=True)
            STATEFILE.unlink(missing_ok=True)
            log.info("claudedo stopped")


def run_daemon(config: Config) -> None:
    """entry point used by the CLI ``start`` command"""
    if read_pid() is not None:
        raise RuntimeError("claudedo is already running (see `claudedo status`)")
    Daemon(config).run()
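
The stale-pidfile check in read_pid() relies on signal 0, which performs the permission and existence checks without delivering anything to the process. A minimal standalone sketch of that probe, assuming a POSIX system; the helper name `pid_alive` is hypothetical and not part of the change:

```python
import os


def pid_alive(pid: int) -> bool:
    """Return True if a process with this pid currently exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0: error checking only, nothing is delivered
    except ProcessLookupError:
        return False  # no such process, so a pidfile naming it is stale
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


# the current interpreter is certainly alive
print(pid_alive(os.getpid()))
```

A PermissionError still counts as "alive" here, matching the branch in read_pid() that keeps the pid instead of clearing the pidfile.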

src/claudedo/grammar.py Normal file

@@ -0,0 +1,159 @@
"""wake-phrase gate + command grammar matching (fuzzy, data-driven).

the matcher is lenient by design: whisper renders the coined word "claudedo"
inconsistently, so wake-phrase detection normalizes case, strips spaces/punctuation,
and accepts close variants. number words are normalized to digits before matching.

flow: transcript -> strip_wake() returns the command remainder (or None if no wake
phrase in listen mode) -> match_command() maps the remainder to an Action.
"""

from __future__ import annotations

import re
from dataclasses import dataclass
from difflib import SequenceMatcher

_PUNCT = re.compile(r"[^a-z0-9 ]+")
_WS = re.compile(r"\s+")

_NUMBER_WORDS = {
    "zero": "0", "oh": "0",
    "one": "1", "won": "1",
    "two": "2", "to": "2", "too": "2",
    "three": "3", "tree": "3",
    "four": "4", "for": "4", "fore": "4",
}

_INDEX_WORDS = {"1": 1, "2": 2, "3": 3, "4": 4}


@dataclass(frozen=True)
class Action:
    """a matched command: a name plus an optional argument.

    names: yes, no, select, approve, deny, submit, type, mode, switch, cancel.
    arg carries the select index (int), the literal text for ``type``, the mode for
    ``mode``, or the session short-name for ``switch``.
    """

    name: str
    arg: object = None


def normalize(text: str) -> str:
    """lowercase, strip punctuation, collapse whitespace, map number words to digits"""
    text = text.lower().strip()
    text = _PUNCT.sub(" ", text)
    text = _WS.sub(" ", text).strip()
    if not text:
        return ""
    tokens = [_NUMBER_WORDS.get(tok, tok) for tok in text.split(" ")]
    return " ".join(tokens)


def _ratio(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()


def _wake_variants(phrase: str) -> set[str]:
    """spaced and despaced forms of a wake phrase for lenient matching"""
    norm = normalize(phrase)
    return {norm, norm.replace(" ", "")}

def strip_wake(transcript: str, wake_phrases: list[str], threshold: float,
               require_wake: bool) -> str | None:
    """return the command remainder after the wake phrase.

    if ``require_wake`` (listen mode) and no wake phrase is found at the start,
    return None so the daemon discards the utterance. if not required (ptt mode),
    a leading wake phrase is stripped when present but its absence is fine.

    matches leniently on a despaced prefix (whisper splits/joins the coined word
    inconsistently) but always slices the remainder on a WORD boundary of the
    spaced, normalized transcript so the command portion keeps its spaces.
    """
    norm = normalize(transcript)
    if not norm:
        return None if require_wake else ""
    words = norm.split(" ")
    best_remainder: str | None = None
    best_score = 0.0
    for phrase in wake_phrases:
        variants = _wake_variants(phrase)
        max_words = phrase.count(" ") + 2
        for take in range(1, min(max_words, len(words)) + 1):
            head_despaced = "".join(words[:take])
            for variant in variants:
                if not variant:
                    continue
                score = _ratio(head_despaced, variant)
                if score >= threshold and score > best_score:
                    best_score = score
                    best_remainder = " ".join(words[take:]).strip()
    if best_remainder is not None:
        return best_remainder
    return None if require_wake else norm

def _fuzzy_in(token: str, options: tuple[str, ...], threshold: float) -> bool:
    return any(_ratio(token, opt) >= threshold for opt in options)


def match_command(remainder: str, threshold: float) -> Action | None:
    """map a normalized command remainder to an Action, or None if unrecognized"""
    remainder = remainder.strip()
    if not remainder:
        return None
    tokens = remainder.split(" ")
    head = tokens[0]
    rest = tokens[1:]
    if head in _INDEX_WORDS:
        return Action("select", _INDEX_WORDS[head])
    if _fuzzy_in(head, ("yes", "yeah", "yep", "yup"), threshold):
        return Action("yes")
    if _fuzzy_in(head, ("no", "nope", "nah"), threshold):
        return Action("no")
    if _fuzzy_in(head, ("approve", "allow"), threshold):
        return Action("approve")
    if _fuzzy_in(head, ("deny", "reject"), threshold):
        return Action("deny")
    if _fuzzy_in(head, ("send", "enter", "submit"), threshold):
        return Action("submit")
    if _fuzzy_in(head, ("cancel", "escape", "stop"), threshold):
        return Action("cancel")
    if _fuzzy_in(head, ("select", "option", "choose", "number"), threshold) and rest:
        if rest[0] in _INDEX_WORDS:
            return Action("select", _INDEX_WORDS[rest[0]])
    if _fuzzy_in(head, ("type", "dictate", "write"), threshold):
        text = " ".join(rest).strip()
        return Action("type", text) if text else None
    if _fuzzy_in(head, ("mode",), threshold) and rest:
        if _fuzzy_in(rest[0], ("ptt",), threshold) or "push" in rest[0]:
            return Action("mode", "ptt")
        if _fuzzy_in(rest[0], ("listen",), threshold):
            return Action("mode", "listen")
        return None
    if _fuzzy_in(head, ("switch", "target"), threshold) and rest:
        name = "".join(rest)
        return Action("switch", name) if name else None
    return None


def parse(transcript: str, wake_phrases: list[str], threshold: float,
          require_wake: bool) -> Action | None:
    """full parse: wake gate then command match. None means discard"""
    remainder = strip_wake(transcript, wake_phrases, threshold, require_wake)
    if remainder is None:
        return None
    return match_command(remainder, threshold)
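
The despaced-prefix matching in strip_wake() can be seen in isolation with a small standalone sketch. It illustrates why the loop keeps the best-scoring prefix instead of the first one that clears the threshold. The helper names and the 0.8 threshold are assumptions made for the example, not values taken from the config:

```python
from difflib import SequenceMatcher


def despaced_prefix_score(words: list, take: int, wake: str) -> float:
    """similarity between the first `take` words joined together and the wake word"""
    head = "".join(words[:take])
    return SequenceMatcher(None, head, wake.replace(" ", "")).ratio()


def best_take(words: list, wake: str, threshold: float):
    """number of leading words that best matches the wake word, or None"""
    best, best_score = None, 0.0
    for take in range(1, len(words) + 1):
        score = despaced_prefix_score(words, take, wake)
        if score >= threshold and score > best_score:
            best, best_score = take, score
    return best


# whisper split the coined word in two: "claude do yes"
words = "claude do yes".split(" ")
take = best_take(words, "claudedo", threshold=0.8)  # 0.8 is an assumed threshold
print(take, words[take:])
```

With this input, "claude" alone already clears the assumed threshold, but "claude" + "do" scores higher, so two words are consumed and the remainder keeps only the command.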


@@ -1,15 +1,24 @@
-"""inject keystrokes into a tmux session via ``tmux send-keys``.
+"""output handlers: resolve a grammar.Action to keystrokes and emit them.

-this is the ONLY mechanism by which claudedo affects claude code: PTY injection,
-never OS-level keyboard input. it works regardless of which window is focused and
-never touches Windows input or a game/anticheat's view (it is text into a linux
-pseudo-terminal). do not replace this with OS keystroke injection.
+the production handler (TmuxOutputHandler) injects via ``tmux send-keys``, the ONLY
+mechanism by which claudedo affects claude code. PTY injection, never OS-level
+keyboard input: it works regardless of which window is focused and never touches
+Windows input or a game/anticheat's view (it is text into a linux pseudo-terminal).
+do not replace this with OS keystroke injection. this is also why claudedo is a
+standalone daemon and not an MCP server: MCP tools can only return content to claude,
+not inject into its input stream.
+
+StdoutOutputHandler prints what WOULD be injected instead of touching tmux, so the
+grammar + keymap can be exercised end-to-end without a live claude session (the
+deterministic test path). both implement the same OutputHandler seam and are
+interchangeable.
 """

 from __future__ import annotations

 import logging
 import subprocess
+from abc import ABC, abstractmethod

 from . import keys, target
@@ -17,10 +26,62 @@ log = logging.getLogger(__name__)


 class InjectError(Exception):
-    """raised when a tmux send-keys call fails."""
+    """raised when a tmux send-keys call fails"""


-def _send_keys(session: str, args: list[str], literal: bool) -> None:
+class OutputHandler(ABC):
+    """abstract sink for resolved keystrokes.
+
+    concretes implement send_named (a sequence of named tmux keys) and send_literal
+    (literal text, no submit). perform() maps a grammar.Action onto these and is shared
+    by all handlers.
+    """
+
+    @abstractmethod
+    def send_named(self, session: str, key_tokens: list[str]) -> None:
+        """emit a sequence of named keys (e.g. ['1'] or ['Down', 'Enter'])"""
+
+    @abstractmethod
+    def send_literal(self, session: str, text: str) -> None:
+        """emit literal text into the input box without submitting (``type``)"""
+
+    def perform(self, session: str, action) -> bool:
+        """resolve a grammar.Action to keystrokes and emit them. returns acted?.
+
+        ``switch`` and ``mode`` are handled by the daemon (they change daemon state,
+        not the claude session), so they are ignored here.
+        """
+        name = action.name
+        if name == "yes":
+            self.send_named(session, keys.YES)
+        elif name == "no":
+            self.send_named(session, keys.NO)
+        elif name == "approve":
+            self.send_named(session, keys.APPROVE)
+        elif name == "deny":
+            self.send_named(session, keys.DENY)
+        elif name == "submit":
+            self.send_named(session, keys.SUBMIT)
+        elif name == "cancel":
+            self.send_named(session, keys.CANCEL)
+        elif name == "select":
+            seq = keys.SELECT_BY_INDEX.get(int(action.arg))
+            if seq is None:
+                log.warning("no keymap for select index %r", action.arg)
+                return False
+            self.send_named(session, seq)
+        elif name == "type":
+            self.send_literal(session, str(action.arg))
+        else:
+            return False
+        return True
+
+
+class TmuxOutputHandler(OutputHandler):
+    """production handler: injects keystrokes into a tmux session via send-keys"""
+
+    @staticmethod
+    def _send_keys(session: str, args: list[str], literal: bool) -> None:
         cmd = ["tmux", "send-keys", "-t", session]
         if literal:
             cmd.append("-l")
@@ -30,55 +91,68 @@ def _send_keys(session: str, args: list[str], literal: bool) -> None:
             err = result.stderr.decode("utf-8", "replace").strip()
             raise InjectError(f"tmux send-keys failed: {err}")

-
-def send_named(session: str, key_tokens: list[str]) -> None:
-    """send a sequence of named tmux keys (e.g. ['1'] or ['Down', 'Enter'])."""
+    def send_named(self, session: str, key_tokens: list[str]) -> None:
         if not target.session_exists(session):
             log.warning("refusing to inject — session %r does not exist", session)
             return
         for token in key_tokens:
-        _send_keys(session, [token], literal=False)
+            self._send_keys(session, [token], literal=False)
         log.info("injected keys %s -> %s", key_tokens, session)

-
-def send_literal(session: str, text: str) -> None:
-    """insert literal text into the input box without submitting (``type``)."""
+    def send_literal(self, session: str, text: str) -> None:
         if not text:
             return
         if not target.session_exists(session):
             log.warning("refusing to inject — session %r does not exist", session)
             return
-    _send_keys(session, [text], literal=True)
+        self._send_keys(session, [text], literal=True)
         log.info("injected literal text (%d chars) -> %s", len(text), session)


-def perform(session: str, action) -> bool:
-    """resolve a grammar.Action to keystrokes and inject them. returns acted?.
+class StdoutOutputHandler(OutputHandler):
+    """test handler: prints what would be injected instead of touching tmux.

-    ``switch`` and ``mode`` are handled by the daemon (they change daemon state, not
-    the claude session), so they are ignored here.
+    no session existence check (there is no real session); lets grammar + keymap be
+    exercised end-to-end without a live claude session. records the last emission on
+    ``self.last`` for assertions.
     """
-    name = action.name
-    if name == "yes":
-        send_named(session, keys.YES)
-    elif name == "no":
-        send_named(session, keys.NO)
-    elif name == "approve":
-        send_named(session, keys.APPROVE)
-    elif name == "deny":
-        send_named(session, keys.DENY)
-    elif name == "submit":
-        send_named(session, keys.SUBMIT)
-    elif name == "cancel":
-        send_named(session, keys.CANCEL)
-    elif name == "select":
-        seq = keys.SELECT_BY_INDEX.get(int(action.arg))
-        if seq is None:
-            log.warning("no keymap for select index %r", action.arg)
-            return False
-        send_named(session, seq)
-    elif name == "type":
-        send_literal(session, str(action.arg))
-    else:
-        return False
-    return True
+
+    def __init__(self, stream=None) -> None:
+        import sys
+
+        self.stream = stream if stream is not None else sys.stdout
+        self.last: tuple[str, object] | None = None
+
+    def send_named(self, session: str, key_tokens: list[str]) -> None:
+        self.last = ("named", list(key_tokens))
+        print(f"[stdout] keys {key_tokens} -> {session}", file=self.stream)
+
+    def send_literal(self, session: str, text: str) -> None:
+        if not text:
+            return
+        self.last = ("literal", text)
+        print(f"[stdout] literal {text!r} -> {session}", file=self.stream)
+
+
+_default_handler: OutputHandler = TmuxOutputHandler()
+
+
+def set_default_handler(handler: OutputHandler) -> None:
+    """swap the module-level handler the daemon drives (tmux in prod, stdout in tests)"""
+    global _default_handler
+    _default_handler = handler
+
+
+def send_named(session: str, key_tokens: list[str]) -> None:
+    """module-level shim delegating to the default handler"""
+    _default_handler.send_named(session, key_tokens)
+
+
+def send_literal(session: str, text: str) -> None:
+    """module-level shim delegating to the default handler"""
+    _default_handler.send_literal(session, text)
+
+
+def perform(session: str, action) -> bool:
+    """module-level shim delegating to the default handler"""
+    return _default_handler.perform(session, action)
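
The handler seam is easiest to appreciate from the test side. Below is a self-contained sketch of the same pattern: shared dispatch in an abstract base plus a recording sink that prints instead of injecting. `Sink`, `RecordingSink`, and `DEMO_KEYMAP` are hypothetical stand-ins for illustration; the real key sequences live in the keys module, which is not shown in this diff:

```python
import io
import sys
from abc import ABC, abstractmethod
from typing import Optional

# hypothetical keymap for illustration only; not the project's real sequences
DEMO_KEYMAP = {"yes": ["1"], "submit": ["Enter"]}


class Sink(ABC):
    """minimal stand-in for the OutputHandler seam"""

    @abstractmethod
    def send_named(self, session: str, key_tokens: list) -> None:
        """emit a sequence of named keys"""

    def perform(self, session: str, name: str) -> bool:
        """shared dispatch: look up the key sequence and hand it to the sink"""
        seq = DEMO_KEYMAP.get(name)
        if seq is None:
            return False
        self.send_named(session, seq)
        return True


class RecordingSink(Sink):
    """prints instead of injecting and remembers the last emission"""

    def __init__(self, stream=None) -> None:
        self.stream = stream if stream is not None else sys.stdout
        self.last: Optional[list] = None

    def send_named(self, session: str, key_tokens: list) -> None:
        self.last = list(key_tokens)
        print(f"keys {key_tokens} -> {session}", file=self.stream)


buf = io.StringIO()
sink = RecordingSink(buf)
acted = sink.perform("claude-demo", "submit")
unknown = sink.perform("claude-demo", "dance")
```

Because the dispatch lives in the base class, a test can assert on `sink.last` or on the captured stream without tmux being installed at all.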

src/claudedo/stt.py Normal file

@@ -0,0 +1,52 @@
"""faster-whisper wrapper: load a model once, transcribe audio chunks locally.

privacy invariant: transcription runs entirely on-device. audio handed here is a
short in-memory chunk; nothing is written to disk or sent anywhere.
"""

from __future__ import annotations

import logging

import numpy as np

log = logging.getLogger(__name__)


class Transcriber:
    """a loaded faster-whisper model that transcribes float32 mono audio chunks"""

    def __init__(self, model: str = "small", language: str = "en", device: str = "auto",
                 compute_type: str = "auto") -> None:
        self.language = language
        self._model = self._load(model, device, compute_type)

    @staticmethod
    def _load(model: str, device: str, compute_type: str):
        from faster_whisper import WhisperModel

        if device == "auto":
            device = "cpu"
        if compute_type == "auto":
            compute_type = "int8" if device == "cpu" else "float16"
        log.info("loading faster-whisper model=%s device=%s compute=%s", model, device, compute_type)
        return WhisperModel(model, device=device, compute_type=compute_type)

    def transcribe(self, audio: np.ndarray, samplerate: int = 16000) -> str:
        """transcribe a mono float32 numpy array to a stripped text string.

        the audio must be 16 kHz mono float32 in [-1, 1]; resample upstream if not.
        """
        if audio.dtype != np.float32:
            audio = audio.astype(np.float32)
        if audio.ndim > 1:
            audio = audio.reshape(-1)
        segments, _info = self._model.transcribe(
            audio,
            language=self.language,
            beam_size=1,
            vad_filter=True,
            condition_on_previous_text=False,
        )
        text = " ".join(seg.text for seg in segments).strip()
        return text


@@ -1,4 +1,4 @@
-"""resolve the active claude code tmux session from ~/.claude-active."""
+"""resolve the active claude code tmux session from ~/.claude-active"""

 from __future__ import annotations

@@ -26,7 +26,7 @@ def session_name(name: str) -> str:


 def read_active() -> str | None:
-    """return the target session name from ~/.claude-active, or None if unset."""
+    """return the target session name from ~/.claude-active, or None if unset"""
     try:
         name = ACTIVE_FILE.read_text(encoding="utf-8").strip()
     except FileNotFoundError:
@@ -38,7 +38,7 @@ def read_active() -> str | None:


 def write_active(name: str) -> None:
-    """overwrite ~/.claude-active with a session name (used by ``switch``)."""
+    """overwrite ~/.claude-active with a session name (used by ``switch``)"""
     ACTIVE_FILE.write_text(name + "\n", encoding="utf-8")


@@ -51,7 +51,7 @@ def set_target(name: str) -> str:


 def session_exists(name: str) -> bool:
-    """true if a tmux session with this name currently exists."""
+    """true if a tmux session with this name currently exists"""
     if not name:
         return False
     result = subprocess.run(
@@ -67,6 +67,13 @@ def resolve_target() -> str | None:

     never guesses a target: on a missing/empty ~/.claude-active or a stale session
     name, this logs a clear warning and returns None so the caller injects nothing.
+
+    TODO: most-recently-active targeting (preferred over attached). today the target
+    is the project most recently ATTACHED to (the cc kit writes ~/.claude-active on
+    attach); upgrade to the session claude most recently asked a question in, via
+    tmux session_activity timestamps (list-sessions -F '#{session_name}
+    #{session_activity}', pick the highest-activity claude-* session) or by scraping
+    panes (capture-pane) for a waiting-prompt UI.
     """
     name = read_active()
     if not name:
@@ -76,13 +83,3 @@ def resolve_target() -> str | None:
         log.warning("target session %r no longer exists — skipping injection", name)
         return None
     return name
-
-
-# TODO: most-recently-active targeting (preferred over attached). today the target
-# is "the project most recently ATTACHED to" (the cc kit writes ~/.claude-active on
-# attach). upgrade to "the session claude most recently asked a question / produced
-# output in" via tmux session_activity timestamps:
-# tmux list-sessions -F '#{session_name} #{session_activity}'
-# pick the highest-activity claude-* session; or scrape panes
-# (tmux capture-pane -p -t <s>) for a waiting-prompt UI and target the session whose
-# pane currently shows one.
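
The TODO's first option, picking the claude-* session with the highest session_activity, could look roughly like the sketch below. It only parses text in the shape produced by the format string quoted in the TODO; the function name and the sample listing are made up for illustration, and nothing here is part of the change itself:

```python
from typing import Optional


def most_active_session(listing: str, prefix: str = "claude-") -> Optional[str]:
    """pick the session with the largest activity timestamp among prefix matches.

    `listing` is expected to hold one "<session_name> <session_activity>" pair per
    line, as printed by:
        tmux list-sessions -F '#{session_name} #{session_activity}'
    """
    best_name: Optional[str] = None
    best_ts = -1
    for line in listing.splitlines():
        # split on the LAST space so session names containing spaces survive
        name, _, ts = line.strip().rpartition(" ")
        if not name.startswith(prefix) or not ts.isdigit():
            continue
        if int(ts) > best_ts:
            best_name, best_ts = name, int(ts)
    return best_name


# made-up sample output; timestamps are seconds since the epoch
sample = "claude-api 1750000000\nclaude-web 1750000300\nscratch 1750000999\n"
print(most_active_session(sample))
```

Keeping the parser separate from the tmux call would let it be tested without a tmux server, in the same spirit as the stdout handler; returning None when nothing matches also preserves the "never guesses a target" rule.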