Hiro League spawns several background processes: the hirocli server, gateway instances, and channel plugin subprocesses. Getting this right on Windows, especially when running under a debugger, requires a specific set of conventions. This page documents the design and the problems it solves.
The problem
Background process management looks simple: subprocess.Popen, store the PID, check it later. In practice, three things go wrong on Windows.
1. pythonw.exe creates a child with a different PID
The original code used pythonw.exe to suppress the console window. The pythonw.exe in a virtual environment is a stub launcher: when you Popen it, the PID you get back belongs to the launcher, not to the Python interpreter that actually runs the code. The interpreter runs as a separate child process with its own PID.
This made the PID file unreliable: the parent stored the launcher PID, the child stored its own PID, and the two raced to write the same file.
2. sys.executable is unreliable under debuggers and entry-point scripts
When VS Code launches a program through debugpy, sys.executable is set to the program being debugged (in this case hirocli.exe, an entry-point stub) rather than to the Python interpreter. Spawning a child with sys.executable -m hirogateway.main then fails with "No module named hirogateway.main", because hirocli.exe is not a Python interpreter.
The same failure occurs when the parent process has a restricted PYTHONPATH. The VS Code launch config for the server sets a PYTHONPATH that includes hirocli/src but not gateway/src, so the gateway module cannot be imported even when the interpreter is correct.
3. A crashing child silently erased its own PID file
The original --foreground path in hirogateway.main wrapped _run_gateway() in a try/finally and called remove_pid() in the finally block. The intent was to tidy up after a normal shutdown. The effect was that any crash (a port conflict, an import error, a config problem) also deleted the PID file on the way out.
The caller’s immediate status check then found no PID file, reported the gateway as “stopped”, and showed no error. The failure was completely silent because stderr was also sent to DEVNULL.
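To make the failure mode concrete, here is a minimal, self-contained reproduction. It is not the original hirogateway.main code. write_pid, remove_pid and the run_foreground_* helpers are simplified, hypothetical stand-ins, and a raised OSError plays the part of a port conflict. The buggy variant loses the PID file when the gateway crashes. The variant that follows Rule 3 keeps it.

```python
import os
from pathlib import Path
from typing import Callable


def write_pid(directory: Path, filename: str) -> None:
    # Simplified stand-in: record this process's PID.
    (directory / filename).write_text(str(os.getpid()))


def remove_pid(directory: Path, filename: str) -> None:
    (directory / filename).unlink(missing_ok=True)


def run_foreground_buggy(directory: Path, filename: str, run: Callable[[], None]) -> None:
    write_pid(directory, filename)
    try:
        run()
    finally:
        # Runs on clean exits AND on crashes, so a crash erases the PID file
        # and the parent's status check finds nothing to report.
        remove_pid(directory, filename)


def run_foreground_fixed(directory: Path, filename: str, run: Callable[[], None]) -> None:
    write_pid(directory, filename)
    run()  # no cleanup here: only stop_process() removes the PID file


def crashing_gateway() -> None:
    raise OSError("address already in use")  # stand-in for a port conflict
```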
The solution
Three rules fix all three problems.
Rule 1 — Spawn via uv run --directory
Instead of resolving a Python interpreter path, use uv:
```python
cmd = ["uv", "run", "--directory", str(workspace_root), "python", "-m", "hirogateway.main", ...]
```
uv run --directory always uses the correct venv and resolves all workspace packages from pyproject.toml and uv.lock, regardless of how the parent process was launched. Channel plugin subprocesses are spawned the same way.
The workspace root is found by find_workspace_root() in hiro_commons.process, which walks up from the calling module’s file to find the pyproject.toml containing [tool.uv.workspace].
Rule 2 — Only the child writes the PID
The parent process never writes the PID file. It spawns the child and then polls for the PID file to appear:
```python
spawn_detached(cmd, stderr_log=instance_path / "stderr.log")
child_pid = wait_for_pid(instance_path, PID_FILENAME)  # polls up to 5s
```
wait_for_pid() loops until read_pid() returns a value and is_running(pid) confirms the process is alive. If the child crashes before writing its PID, wait_for_pid() raises a RuntimeError with the timeout and last known PID — so the caller gets an error instead of a false “started” response.
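The parent-side polling can be sketched as follows. This assumes the PID file holds a single decimal PID, that read_pid() treats a missing or incomplete file as "not yet", and that is_running() uses OpenProcess plus GetExitCodeProcess on Windows and signal 0 on POSIX. The real helpers in hiro_commons.process may be implemented differently. The interval parameter and the error text are illustrative.

```python
from __future__ import annotations

import os
import time
from pathlib import Path


def read_pid(directory: Path, filename: str) -> int | None:
    """Return the PID recorded in the file, or None if it is missing or incomplete."""
    try:
        pid = int((directory / filename).read_text().strip())
    except (FileNotFoundError, ValueError):
        return None
    return pid if pid > 0 else None


def is_running(pid: int) -> bool:
    """Return True if a process with this PID currently exists."""
    if pid <= 0:
        return False  # 0 and negative values address process groups on POSIX
    if os.name == "nt":
        import ctypes
        from ctypes import wintypes

        kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
        kernel32.OpenProcess.argtypes = [wintypes.DWORD, wintypes.BOOL, wintypes.DWORD]
        kernel32.OpenProcess.restype = wintypes.HANDLE
        kernel32.GetExitCodeProcess.argtypes = [wintypes.HANDLE, ctypes.POINTER(wintypes.DWORD)]
        kernel32.CloseHandle.argtypes = [wintypes.HANDLE]
        PROCESS_QUERY_LIMITED_INFORMATION = 0x1000
        STILL_ACTIVE = 259
        handle = kernel32.OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION, False, pid)
        if not handle:
            return False  # no such process (or we are not allowed to open it)
        try:
            code = wintypes.DWORD()
            if not kernel32.GetExitCodeProcess(handle, ctypes.byref(code)):
                return False
            return code.value == STILL_ACTIVE
        finally:
            kernel32.CloseHandle(handle)
    try:
        os.kill(pid, 0)  # signal 0 checks existence without delivering anything
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user
    return True


def wait_for_pid(directory: Path, filename: str, timeout: float = 5.0, interval: float = 0.1) -> int:
    """Poll until the child has written its PID and that process is alive."""
    deadline = time.monotonic() + timeout
    last_pid: int | None = None
    while True:
        pid = read_pid(directory, filename)
        if pid is not None:
            last_pid = pid
            if is_running(pid):
                return pid
        if time.monotonic() >= deadline:
            raise RuntimeError(
                f"no live PID in {directory / filename} after {timeout:.1f}s (last PID seen: {last_pid})"
            )
        time.sleep(interval)
```

Any stale PID file should be removed before spawning, as start_instance() does. Otherwise the loop could accept an old PID that the OS has since reused for an unrelated process.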
The child writes its own PID at startup:
```python
# In the --foreground entry point (gateway/main.py)
write_pid(instance_path, PID_FILENAME)  # writes os.getpid()
```
os.getpid() inside the child is always the child’s real PID, regardless of how the parent spawned it.
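The child-side write can be sketched as below, assuming the file holds one decimal PID. Writing to a temporary name and then renaming with os.replace() means a polling parent never reads a half-written number. The int return value and the .tmp suffix are illustrative choices, not necessarily what hiro_commons does.

```python
import os
from pathlib import Path


def write_pid(directory: Path, filename: str) -> int:
    """Record this process's own PID; the child calls this once at startup."""
    directory.mkdir(parents=True, exist_ok=True)
    pid = os.getpid()  # the real PID, however the parent launched this process
    target = directory / filename
    tmp = target.with_name(target.name + ".tmp")
    tmp.write_text(f"{pid}\n", encoding="ascii")
    # Rename in one step so a polling parent sees either no file or the complete PID.
    os.replace(tmp, target)
    return pid
```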
Rule 3 — Only stop_process() removes the PID file
The child never removes the PID file — not on clean exit, not in a finally block, not ever. The PID file is persistent state that says “this instance was started”. Only stop_process() removes it, after killing the process.
A stale PID file from a crashed process is handled correctly: is_running(pid) returns False, so get_status() reports the instance as stopped, and the next start_instance() call clears it and starts fresh.
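The two operations this rule relies on are sketched below, assuming the same one-PID-per-file format. stop_process() terminates the process and only then deletes the file. get_status() never deletes anything, so a stale file simply reads as "stopped". The signatures, return values and status strings are illustrative. Note that on Windows os.kill() with SIGTERM calls TerminateProcess, so this is a hard stop rather than a graceful shutdown.

```python
from __future__ import annotations

import os
import signal
from pathlib import Path


def read_pid(directory: Path, filename: str) -> int | None:
    try:
        pid = int((directory / filename).read_text().strip())
    except (FileNotFoundError, ValueError):
        return None
    return pid if pid > 0 else None


def is_running(pid: int) -> bool:
    if pid <= 0:
        return False
    if os.name == "nt":
        import ctypes
        from ctypes import wintypes

        k32 = ctypes.WinDLL("kernel32", use_last_error=True)
        k32.OpenProcess.argtypes = [wintypes.DWORD, wintypes.BOOL, wintypes.DWORD]
        k32.OpenProcess.restype = wintypes.HANDLE
        k32.GetExitCodeProcess.argtypes = [wintypes.HANDLE, ctypes.POINTER(wintypes.DWORD)]
        k32.CloseHandle.argtypes = [wintypes.HANDLE]
        handle = k32.OpenProcess(0x1000, False, pid)  # PROCESS_QUERY_LIMITED_INFORMATION
        if not handle:
            return False
        try:
            code = wintypes.DWORD()
            return bool(k32.GetExitCodeProcess(handle, ctypes.byref(code))) and code.value == 259  # STILL_ACTIVE
        finally:
            k32.CloseHandle(handle)
    try:
        os.kill(pid, 0)  # existence check only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True
    return True


def get_status(directory: Path, filename: str) -> str:
    pid = read_pid(directory, filename)
    # A stale file (dead PID) reads as "stopped" and is deliberately left in place.
    return "running" if pid is not None and is_running(pid) else "stopped"


def stop_process(directory: Path, filename: str) -> bool:
    """Kill the recorded process, then remove its PID file. Returns False if there was no PID."""
    pid = read_pid(directory, filename)
    if pid is None:
        return False
    try:
        os.kill(pid, signal.SIGTERM)  # TerminateProcess on Windows
    except ProcessLookupError:
        pass  # already gone (POSIX): the file was stale
    except OSError as exc:
        # Windows typically reports a vanished PID as a generic OSError; re-raise anything else.
        if os.name != "nt" or isinstance(exc, PermissionError):
            raise
    (directory / filename).unlink(missing_ok=True)  # the only place the PID file is removed
    return True
```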
stderr goes to a log file
spawn_detached() accepts an optional stderr_log path. Gateway instances pass instance_path / "stderr.log" and the hirocli server passes workspace_path / "stderr.log", so a crash leaves a readable log file instead of disappearing into DEVNULL.
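A minimal sketch of spawn_detached() follows. It rests on three assumptions. First, the function returns the Popen object, which the caller treats only as a handle and never as the source of the PID. Second, the log is opened in append mode so output from earlier crashes is kept. Third, the detaching options used are CREATE_NO_WINDOW and CREATE_NEW_PROCESS_GROUP on Windows and start_new_session on POSIX. These are standard subprocess options, but whether hiro_commons uses exactly these is an assumption.

```python
from __future__ import annotations

import os
import subprocess
from pathlib import Path


def spawn_detached(cmd: list[str], stderr_log: Path | None = None) -> subprocess.Popen:
    """Start cmd in the background with stdin/stdout discarded and stderr optionally logged."""
    kwargs: dict = {}
    if os.name == "nt":
        # Windows: run without a console window, in a new process group.
        kwargs["creationflags"] = subprocess.CREATE_NO_WINDOW | subprocess.CREATE_NEW_PROCESS_GROUP
    else:
        kwargs["start_new_session"] = True  # setsid(): leave the parent's session and terminal

    if stderr_log is None:
        return subprocess.Popen(cmd, stdin=subprocess.DEVNULL, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL, **kwargs)

    stderr_log.parent.mkdir(parents=True, exist_ok=True)
    # Append so output from earlier runs survives. The child receives its own copy of
    # the handle, so closing ours when the with block ends does not affect it.
    with open(stderr_log, "ab") as err:
        return subprocess.Popen(cmd, stdin=subprocess.DEVNULL, stdout=subprocess.DEVNULL,
                                stderr=err, **kwargs)
```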
Invariants
| Concern | Owner |
|---|---|
| Writes PID file | Child process (write_pid(instance_path, PID_FILENAME)) |
| Removes PID file | stop_process() only |
| Discovers real child PID | wait_for_pid() polling loop |
| Finds correct Python interpreter | uv run --directory <workspace_root> |
| Records crash output | stderr_log path passed to spawn_detached() |
Module responsibilities
| Module | Responsibility |
|---|---|
| hiro_commons/process.py | spawn_detached(), wait_for_pid(), find_workspace_root(), uv_python_cmd() |
| hirogateway/service.py | Calls spawn_detached + wait_for_pid for gateway instances |
| hirogateway/main.py | Child entry point; calls write_pid() at startup, does not remove PID on exit |
| hirocli/tools/server.py | Calls spawn_detached + wait_for_pid for the hirocli server |
| hirocli/runtime/server_process.py | Calls write_pid() at startup; _spawn_server() uses spawn_detached for self-restart |
Never write the PID from the parent process after calling spawn_detached. On Windows, launchers and stubs mean the PID returned by Popen is not necessarily the PID of the running Python interpreter.