| .idea | ||
| debugging | ||
| prompts | ||
| src/jayjaders_llm_harness | ||
| tests | ||
| .gitignore | ||
| LICENSE | ||
| pyproject.toml | ||
| README.md | ||
llm-agent-harness
A harness for running an LLM-based agent, suited to my needs:
- uses self-hosted LLMs behind an OpenAI-compatible API (explicitly supported:
ollama,lm-studio,lemonade) - terminal-first
- human access to and control over conversation/session messages
- bash command execution tool is sandboxed inside a container
Table of Contents
Quick Start
# Create and activate the virtualenv
python3 -m venv .venv
source .venv/bin/activate
# Install dependencies
pip install .
# Create the required container volume (if using podman)
podman volume create llm-harness-tool-workspace
# Launch in interactive mode
start-harness-jayjader
Installation
Python / the harness itself
The harness uses Python 3.14. You need to create a virtualenv and install the harness's dependencies inside of it. The simplest way to do this is to run the following shell commands from the root directory of the harness's source code:
python3 -m venv .venv
source .venv/bin/activate
pip install .
If you prefer to use uv, do the following:
uv venv .venv
source .venv/bin/activate
uv pip install .
Additional prerequisites
| Feature | Preconditions |
|---|---|
tmux integration |
- tmux installed- harness process started inside an existing tmux client session |
execute_sandboxed_command tool |
- container runtime engine installed (podman or docker)- a named volume created that can be mounted (ex: podman volume create llm-harness-tool-workspace) |
Usage
$ start-harness-jayjader -h
usage: start-harness-jayjader [-h] [--model-name MODEL_NAME] [--stream] [--session-name SESSION_NAME] [--resume-session RESUME_SESSION | --clone-session CLONE_SESSION]
[--interactive | --one-shot] [--test-connection | --refresh-models | --prompt PROMPT | --prompt-file PROMPT_FILE]
Jayjader's LLM Harness
options:
-h, --help show this help message and exit
--model-name, --model, -m MODEL_NAME
The name of the model to use for inference. If omitted, the last model used by the harness will be used.
--stream, -s Have the API stream responses (default: False)
--session-name, -n SESSION_NAME
Name for the new session
--resume-session, -r RESUME_SESSION
Resume an existing session by its ID
--clone-session, -c CLONE_SESSION
Clone an existing session and continue from it
--interactive Run in interactive mode (default when no mode specified)
--one-shot Run in one shot mode
--test-connection Test connection to inference provider API
--refresh-models Refresh the list of models the API provides
--prompt, -p PROMPT The exact prompt to be sent to the LLM as the "user message"
--prompt-file, -P PROMPT_FILE
The path to a file whose contents will be sent to the LLM as the "user message"
Example uses:
start-harness-jayjader
Starts the harness in interactive mode. If the list of available models is not known, the user will be asked to confirm the refresh of this list with the declared inference provider. The previous model (name stored in the harness's data directory) will be used. If no previous model is known, the user is prompted to choose from the list of known models. The session name will be the current date and time as a string.
start-harness-jayjader --interactive
Same as no args.
start-harness-jayjader --test-connection, start-harness-jayjader --interactive --test-connection
Tests the connection to the inference provider, then same as no args.
start-harness-jayjader --one-shot --test-connection
Tests the connection to the inference provider, then exits.
start-harness-jayjader --refresh-models, start-harness-jayjader --interactive --refresh-models
Refreshes the list of models available at the inference provider, then same as no args.
start-harness-jayjader --one-shot --refresh-models
Refreshes the list of models available at the inference provider, then exits.
start-harness-jayjader --one-shot --prompt <message>
Runs the agentic inference loop on "<message>", then exits. The previous model (stored in the harness's data directory) will be used. The session name will be the current date and time as a string. If the list of available models is not known or empty, or the previous used model is not known or not among the list of available models, a warning is printed to stderr. The harness then exits without running inference.
start-harness-jayjader --one-shot --prompt-file <file path>
Same as --one-shot --promt <message>, but the agentic inference loop is run on the contents of <file path>.
start-harness-jayjader --session-name <name>, start-harness-jayjader --interactive --session-name <name>
Same as no args, but the session name will be "<name>" instead of the current date and time.
start-harness-jayjader --one-shot --session-name <name> --prompt-file <file path>, start-harness-jayjader --one-shot --session-name <name> --prompt <message>
Same as --one-shot --prompt <message> or --one-shot --prompt-file <file path>, but the session name will be "<name>" instead of the current date and time.
start-harness-jayjader --resume-session, start-harness-jayjader --interactive --resume-session
Resumes an existing/prior session instead of starting a new (empty) session. Otherwise, same as no args.
start-harness-jayjader --one-shot --resume-session --prompt <message>, start-harness-jayjader --one-shot --resume-session --prompt-file <file path>
Resumes an existing/prior session instead of starting a new (empty) session. Otherwise, same as --one-shot --prompt <message> or --one-shot --prompt-file <file path>.
start-harness-jayjader --clone-session <name>, start-harness-jayjader --interactive --clone-session <name>
Clones an existing/prior session instead of starting a new (empty) session. The resulting session is named "<name>-clone". Otherwise, same as no args.
start-harness-jayjader --one-shot --clone-session <name> --prompt <message>, start-harness-jayjader --one-shot --clone-session <name> --prompt-file <file path>
Clones an existing/prior session instead of starting a new (empty) session. The resulting session is named "<name>-clone". Otherwise, same as --one-shot --prompt <message> or --one-shot --prompt-file <file path>.
start-harness-jayjader --clone-session <existing name> --session-name <clone name>, start-harness-jayjader --interactive --clone-session <existing name> --session-name <clone name>
Clones the existing/prior session named "<existing name>" instead of starting a new (empty) session. The resulting session is named "<clone name>". Otherwise, same as no args.
start-harness-jayjader --one-shot --clone-session <existing name> --session-name <clone name> --prompt <message>, start-harness-jayjader --one-shot --clone-session <existing name> --session-name <clone name> --prompt-file <file path>
Clones the existing/prior session named "<existing name>" instead of starting a new (empty) session. The resulting session is named "<clone name>". Otherwise, same as --one-shot --prompt <message> or --one-shot --prompt-file <file path>.
Interactive vs One-Shot
The harness runs the same "agentic loop" (inference → tool calls → more inference until a final answer) in both modes. The main difference is whether you stay inside the harness afterward. While in interactive mode you input the prompt once the harness has started, in one-shot mode you must provide the prompt up-front, as a command-line argument to the harness.
Interactive Mode (default)
Start in interactive mode by simply running the harness, or optionally with the --interactive flag. You will be greeted with a > prompt where you can type a message and run commands (e.g. /model, /refresh-models, /session new). After each assistant response, you are returned to the same prompt where you can run a command or continue the conversation with the LLM.
Arguments for Interactive Mode
| Argument | Behavior |
|---|---|
--model-name <name> |
Sets the initial model used for inference |
--stream |
Enables streaming responses by default during interactions |
--session-name <name> |
Sets a custom name for the session |
--resume-session <name> |
Chooses an existing/prior session to be resumed |
--clone-session <name> |
Chooses an existing/prior session to be cloned. The cloned session is then resumed, leaving the original session unaltered. |
--test-connection |
Tests the connection to the inference provider before entering interactive mode. If the test fails, the harness exits instead. |
--refresh-models |
Refreshes the list of models available at the inference provider before entering interactive mode |
Commands in Interactive Mode
| Command | Description | Autocomplete Available? | Requires Tmux? |
|---|---|---|---|
/model <name> |
Switch which model is used for the next generation(s) | ✅ | ❌ |
/session new |
Start a new session with zero message history. The session name will be the current date and time. | N/A | ❌ |
/session rename <name> |
Rename the current session | N/A | ❌ |
/session resume <name> |
Resume an existing session | ✅ | ❌ |
/session clone |
Clone the current session into a new one | N/A | ❌ |
/session show-messages |
Display all the messages in the current session | N/A | ❌ |
/session edit |
Open the session log file in nvim |
N/A | ✅ |
/stream |
Toggle streaming inference responses on/off | N/A | ❌ |
/continue |
Trigger inference with the current session state | N/A | ❌ |
/editor |
Edit the prompt in nvim before sending |
N/A | ✅ |
/refresh-models |
Refresh the list of available models from the current inference provider | N/A | ❌ |
/test-connection |
Test connectivity to the inference provider | N/A | ❌ |
/promptfile <path> |
Load the contents of a file in <root harness data dir>/prompts as the next prompt and start inference |
✅ | ❌ |
One-Shot Mode
Start in one-shot mode by specifying the --one-shot flag. In one-shot mode, the harness needs some work to do: you must either specify a command (--refresh-models, test-connection) or provide a prompt to trigger inference. You can either write your prompt inline (e.g. --prompt "What do you know about LLMs?") or give the harness a path to a file containing your prompt (e.g. --prompt-file ./my_prompt.txt). After the command is executed or the agentic loop finishes, the harness exits with shell code 0.
Arguments for One-Shot Mode
| Argument | Behavior |
|---|---|
--model-name <name> |
Sets the model used for inference |
--stream |
Enables streaming responses |
--session-name <name> |
Sets a custom name for the inference session |
--resume-session <name> |
Chooses an existing/prior session to be resumed for inference |
--clone-session <name> |
Chooses an existing/prior session to be cloned and resumed for inference. The original session is not modified. |
--test-connection |
Tests the connection and then exits |
--refresh-models |
Refreshes the list of models available at the inference provider and then exits |
--prompt <message> |
Sets the prompt used for inference to "<message>", runs the agentic inference loop, and then exits. |
--prompt-file <file path> |
Sets the prompt used for inference to the contents of <file path>, runs the agentic inference loop, and then exits. |
Differences between one-shot and interactive modes
| Situation | One-Shot Mode | Interactive Mode |
|---|---|---|
| No model specified as command-line argument and no previous model saved in state directory | The harness prints a warning message to stderr and then exits. |
The user is prompted to chose one of the known models to be used before they can input a prompt or run a command. |
The list of known models is empty and --refresh-models is not present as a command-line argument |
The harness prints a warning to stderr and then exits. |
The user is asked if the harness can refresh the model list. If they refuse, the harness exits. If they accept, the list is fetched from the inference provider. If the previous model is among the list and no model was chosen via command-line argument, that model is chosen by the harness. If the model chosen via command-line argument is among the list, that model is chosen by the harness. Otherwise, the harness prompts the user to chose from the list before allowing them to input a prompt or run commands. |
Many of the commands in interactive mode have a similar command-line flag that can be used in one-shot mode, though their syntax and behavior are not always exactly equivalent.
| Interactive Command | One-Shot Flag | Behavior Differences |
|---|---|---|
/test-connection |
--test-connection |
In interactive mode, the connection test is run and then the harness properly enters interactive mode. In one-shot mode, the connection test is run and then the harness exits. |
/refresh-models |
--refresh-models |
In interactive mode, the model list is refreshed and then the harness properly enters interactive mode. In one-shot mode, the model list is refreshed and then the harness exits. |
/stream |
--stream |
In interactive mode, this toggles between on and off. In one-shot mode, streaming is on if the flag is present, off otherwise. |
/promptfile <path> |
--prompt-file <path> |
In interactive mode, paths are considered relative to <root harness data dir>/prompts. In one-shot mode, paths are absolute or relative to the current working directory. |
/session resume <name> |
--resume-session <name> |
(None) |
/session clone |
--clone-session <name> |
In interactive mode, this clones the current session. In one-shot mode, this clones the given named session. |
The main difference in options is with renaming an existing session. In interactive mode, you can use /session rename <name> to change the name of the current session. One-shot mode has no direct equivalent; you cannot change the name of an existing session. The closest approximation is to combine --clone-session and --session-name flags. Note that this is equivalent to running the following two commands in interactive mode in succession: /session clone, then /session rename <name>.
Environment Variables
The following settings can be overridden by setting environment variables and/or preparing a .env file (python-dotenv is used to load the values found in any surrounding .env file):
| Env var name | Setting / Description | Expected value format | Default value |
|---|---|---|---|
HARNESS_ROOT_DIRS_NAME |
the name used for the harness state and data directories | any valid directory name | "jayjaders-llm-harness" |
XDG_STATE_HOME |
the parent directory in which the harness creates its state directory (HARNESS_ROOT_DIRS_NAME) to store state that is to be resumed on next run (like the current model) |
any valid path | ~/.local/state on Linux, ~/Library/Application Support on macOS |
XDG_DATA_HOME |
the parent directory in which the harness creates its data directory (HARNESS_ROOT_DIRS_NAME) to store longer-term harness data (like the session message logs and reusable prompts) |
any valid path | ~/.local/share on Linux, ~/Library/Application Support on macOS |
OLLAMA_API_URL |
host address of the inference provider used | any valid url | "http://localhost:11434" |
HARNESS_INFERENCE_PROVIDER |
the software running as inference provider | "ollama" or "lm-studio" or "lemonade" |
"ollama" |
SANDBOX_CONTAINER_RUNTIME_EXECUTABLE |
the container engine used on the host machine for sandboxing command execution | "podman" or "docker" |
"podman" |
SANDBOX_CONTAINER_IMAGE |
the container image used for sandboxing command execution | any non-empty string that resolves to a container image | "alpine" |
SANDBOX_CONTAINER_WORKSPACE_VOLUME_NAME |
the container volume used for receiving and persisting file writes caused by sandboxed command execution | any non-empty string that resolves to an existing container volume | "llm-harness-tool-workspace" |
Models
The harness will attempt to read the list of available models from the API on first startup. It saves this list in a file named models.jsonl and reads from it on subsequent startups. The harness will force you to choose a model if it cannot find the model that was used when it last ran. To force a refresh of the list of available models from within a running instance of the harness, use the /refresh-models command. To force a refresh at the start of interactive mode, pass the --refresh-models flag. To just refresh the list of models and then exit, pass the --refresh-models flag in one-shot mode.
The harness will attempt to determine which models support tool-calling when fetching/refreshing the list, depending on the declared type for the inference provider.
Local Files
The harness stores information in several files on-disk:
| Location / File Name | Information Stored |
|---|---|
$XDG_STATE_HOME/$HARNESS_ROOT_DIRS_NAME/models.jsonl |
The list of models available at the current inference provider |
$XDG_STATE_HOME/$HARNESS_ROOT_DIRS_NAME/previous_model_name |
The name of the last model used by the harness for inference |
$XDG_STATE_HOME/$HARNESS_ROOT_DIRS_NAME/logs/<YYYY-MM-DD_HH-MM-SS> |
The logs for harness runs, with the date time being when the harness is started |
$XDG_DATA_HOME/$HARNESS_ROOT_DIRS_NAME/sessions/session_<name>.jsonl |
The past messages for each session - these are the single source of truth for session contents as well as which sessions the harness knows about |
Development
You might find it useful to install the additional dev dependencies after creating the virtualenv, and to install the project in "editable" mode:
.venv/bin/pip install -e '.[dev]'
"Editable" mode allows you to test source code modifications without re-running the installation command, while the dev dependencies are required for:
- running tests (with
pytest) - (re)formatting source code (with
ruff)
Sandboxing
The harness relies on a docker or podman container to sandbox the arbitrary shell command execution tool that it proposes to LLMs. In theory, any OCI-compatible engine should work, but for now the only supported tools are those that allow running as if they were podman (the exact invocation the harness performs is along the lines of <engine> run -u <user id> -v <volume name>:<mountpoint inside container> -w <working directory> <image name> and can be found in ./tools.py). The harness also relies on a named volume being mounted inside the running container to provide a writable directory to the agent; not only allowing the agent to write to files and the like but, equally importantly, letting the user access any files created by the agent's tool calls once the sandboxing container shuts down.