Workflow Design

Enki's workflow model is a Rust-managed DAG runtime that sits beside the agent loop rather than being built on prompt-only orchestration. The design goal is to make long-running, multi-step work resumable, inspectable, and deterministic while still letting agents do the actual task execution.

Design goals

Keep workflow structure explicit instead of embedding control flow in prompts
Separate orchestration state from agent execution state
Make runs resumable through persisted state and event logs
Support both direct agent targeting and capability-based routing
Allow human gates, transforms, and branching without leaving the runtime
Expose the same core workflow model to Rust, Python, and JavaScript

Core architecture

The workflow system is split across three modules inside the Rust core:

types.rs: serializable definitions for tasks, nodes, edges, run state, events, and interventions
runtime.rs: the workflow builder, validation, run driver, node execution, branching, retries, and intervention handling
persistence.rs: on-disk storage for workflow definitions, run state snapshots, interventions, task workspaces, and event logs

This keeps the workflow model declarative while centralizing execution semantics in one runtime.

Workflow model

The runtime works with a few core concepts:

TaskDefinition: a reusable task template with a target, prompt, bindings, transforms, retry policy, and failure policy
WorkflowDefinition: a DAG of nodes and edges plus workflow-level retry and failure behavior
WorkflowNodeKind: node behavior, currently task, decision, human_gate, transform, and join
WorkflowEdgeTransition: edge activation rules, currently always, on_success, on_failure, and condition
WorkflowContext: the shared structured value map that carries workflow input and node outputs forward

This model lets Enki keep orchestration logic in typed data instead of re-deriving the next step from a model response.
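
To make the idea of orchestration as typed data concrete, here is a minimal Python sketch of the model. It is an illustration only, not Enki's Rust types or its Python binding: the class names `Node`, `Edge`, and `Workflow`, and the fields `task_id` and `condition`, are assumptions. Only the node kinds and edge transitions are taken from the list above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class NodeKind(Enum):
    TASK = "task"
    DECISION = "decision"
    HUMAN_GATE = "human_gate"
    TRANSFORM = "transform"
    JOIN = "join"


class Transition(Enum):
    ALWAYS = "always"
    ON_SUCCESS = "on_success"
    ON_FAILURE = "on_failure"
    CONDITION = "condition"


@dataclass
class Node:
    id: str
    kind: NodeKind
    task_id: Optional[str] = None      # hypothetical: which task template a task node uses
    output_key: Optional[str] = None   # where the node's output lands in the context


@dataclass
class Edge:
    source: str
    target: str
    transition: Transition = Transition.ALWAYS
    condition: Optional[str] = None    # hypothetical: expression for CONDITION edges


@dataclass
class Workflow:
    id: str
    nodes: List[Node] = field(default_factory=list)
    edges: List[Edge] = field(default_factory=list)


def outgoing(workflow: Workflow, node_id: str) -> List[Edge]:
    """The next step is a lookup in the graph, not a parse of model output."""
    return [edge for edge in workflow.edges if edge.source == node_id]
```

Because the graph is plain data, it can be serialized, validated, and inspected without involving an agent.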

Builder pattern

Workflows are assembled through WorkflowRuntimeBuilder:

add reusable tasks
add workflow definitions
register transforms
configure the workspace home
inject a task runner
optionally attach an event listener

The builder registers built-in transforms up front:

identity
extract_content

It also validates the full workflow set before the runtime becomes usable, which catches graph and reference issues early.
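
The kind of graph and reference checks described here can be sketched as follows. This is an assumption about what such validation typically covers (unknown node references and cycles), not a description of the checks Enki actually runs; `validate_graph` is a hypothetical helper.

```python
from collections import deque


def validate_graph(node_ids, edges):
    """Return a list of problems; an empty list means the graph is a valid DAG.

    node_ids: list of node id strings
    edges: list of (source, target) pairs
    """
    problems = []
    known = set(node_ids)
    if len(known) != len(node_ids):
        problems.append("duplicate node id")

    # Reference check: every edge endpoint must name a declared node.
    valid_edges = []
    for source, target in edges:
        missing = [n for n in (source, target) if n not in known]
        if missing:
            problems.append(f"edge {source}->{target} references unknown node(s): {missing}")
        else:
            valid_edges.append((source, target))

    # Cycle check with Kahn's algorithm: repeatedly remove nodes that have
    # no remaining incoming edges. Nodes left over must sit on a cycle.
    indegree = {n: 0 for n in known}
    for _, target in valid_edges:
        indegree[target] += 1
    queue = deque(n for n, degree in indegree.items() if degree == 0)
    visited = 0
    while queue:
        current = queue.popleft()
        visited += 1
        for source, target in valid_edges:
            if source == current:
                indegree[target] -= 1
                if indegree[target] == 0:
                    queue.append(target)
    if visited != len(known):
        problems.append("cycle detected")
    return problems
```

Running checks like these at build time means a bad edge surfaces before any agent is invoked.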

Execution model

A workflow run starts from a WorkflowRequest with a workflow_id and structured input. The runtime:

loads the workflow definition
initializes a run id and persisted run directory
seeds the workflow context with input
creates NodeRunState entries for every node
drives the DAG until the run is completed, paused, failed, or completed with failures

The run loop is stateful rather than prompt-driven. It resolves ready nodes, executes them, records outputs into context, evaluates outgoing edges, and advances downstream nodes based on transition rules.
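
A stripped-down version of such a loop might look like the sketch below. It is a simplification under stated assumptions: nodes run sequentially, a node runs when at least one incoming edge is active, and a node with incoming edges but no active one is skipped. The status strings and the `execute` callback are hypothetical, and retries, conditions, and pausing are left out.

```python
def edge_active(transition, source_status):
    """Decide whether an edge fires, given the terminal status of its source."""
    if source_status == "skipped":
        return False
    if transition == "always":
        return True
    if transition == "on_success":
        return source_status == "completed"
    if transition == "on_failure":
        return source_status == "failed"
    return False


def run_workflow(node_ids, edges, execute):
    """Drive a DAG until no pending node can make progress.

    edges: list of (source, target, transition) triples
    execute: callable(node_id, context) -> output; raising marks the node failed
    """
    status = {node: "pending" for node in node_ids}
    context = {}
    progressed = True
    while progressed:
        progressed = False
        for node in node_ids:
            if status[node] != "pending":
                continue
            incoming = [edge for edge in edges if edge[1] == node]
            # A node is ready only once every upstream node is terminal.
            if any(status[source] == "pending" for source, _, _ in incoming):
                continue
            active = [e for e in incoming if edge_active(e[2], status[e[0]])]
            if incoming and not active:
                status[node] = "skipped"
            else:
                try:
                    context[node] = execute(node, context)
                    status[node] = "completed"
                except Exception as exc:
                    context[node] = {"error": str(exc)}
                    status[node] = "failed"
            progressed = True
    return status, context
```

The loop never asks a model what to do next; the next node follows from recorded statuses and edge rules.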

Task execution boundary

The workflow runtime does not execute agent prompts directly. Instead, it delegates task nodes through a WorkflowTaskRunner.

That separation is important:

workflow orchestration owns graph control, retries, branching, and persistence
the task runner owns how a task is actually carried out by an agent or runtime

In practice, Enki wires this to the multi-agent runtime, which means workflow tasks can target:

a specific agent_id
a set of capabilities

This lets workflows route work declaratively while still reusing the same agent runtime primitives.
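
The boundary can be pictured as a small interface that the orchestrator calls without knowing how the work gets done. The sketch below is hypothetical: the `TaskRunner` protocol, the `RoutingRunner` class, and the first-match capability rule are assumptions for illustration, not the `WorkflowTaskRunner` contract itself.

```python
from typing import Protocol


class TaskRunner(Protocol):
    """What the orchestrator needs from a runner: hand over a task, get output."""

    def run(self, target: dict, prompt: str) -> str: ...


class RoutingRunner:
    """Resolves a target to an agent, either by id or by required capabilities."""

    def __init__(self, agents):
        # agents: {agent_id: (set_of_capabilities, handler_callable)}
        self.agents = agents

    def resolve(self, target):
        if "agent_id" in target:
            return target["agent_id"]
        wanted = set(target["capabilities"])
        # Assumption: pick the first agent whose capabilities cover the request.
        for agent_id, (capabilities, _) in self.agents.items():
            if wanted <= capabilities:
                return agent_id
        raise LookupError(f"no agent provides {sorted(wanted)}")

    def run(self, target, prompt):
        _, handler = self.agents[self.resolve(target)]
        return handler(prompt)
```

Swapping in a different runner (for tests, or for another agent backend) leaves graph control, retries, and persistence untouched.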

Context, bindings, and transforms

Workflows pass structured data forward through WorkflowContext.

The current implementation supports:

input_bindings to map workflow input or prior outputs into task prompt variables
input_transform to reshape task input before execution
output_transform to normalize task output before it is stored
node output_key values so downstream nodes can reference prior results

This pattern gives workflows dataflow semantics rather than forcing every step to parse raw model text.
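
The sketch below shows how these four pieces could fit together for a single task node. The transform names `identity` and `extract_content` come from the builder section; what `extract_content` does here (pulling a `content` field out of a structured result), the dotted-path binding syntax, and the dict-shaped node are all assumptions made for illustration.

```python
def extract_content(value):
    # Assumption: normalize a structured agent result down to its "content" field.
    if isinstance(value, dict) and "content" in value:
        return value["content"]
    return value


TRANSFORMS = {
    "identity": lambda value: value,
    "extract_content": extract_content,
}


def lookup(context, path):
    """Resolve a dotted path such as 'input.topic' against the context."""
    value = context
    for part in path.split("."):
        value = value[part]
    return value


def run_task_node(context, node, call_agent):
    """Bind inputs, apply transforms, call the agent, and store the output."""
    # input_bindings: prompt variable name -> path into the workflow context
    variables = {name: lookup(context, path) for name, path in node["input_bindings"].items()}
    variables = TRANSFORMS[node.get("input_transform", "identity")](variables)
    prompt = node["prompt"].format(**variables)
    raw_output = call_agent(prompt)
    # output_transform runs before the value is stored under output_key
    output = TRANSFORMS[node.get("output_transform", "identity")](raw_output)
    context[node["output_key"]] = output
    return context
```

Downstream nodes then bind to `notes` (or whatever `output_key` names) instead of re-parsing the agent's raw response.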

Failure and retry design

Both tasks and workflows can define:

RetryPolicy
WorkflowFailurePolicy

Current failure policies are:

continue_best_effort
fail_workflow
pause_for_intervention

This lets the runtime distinguish between recoverable node failure, terminal workflow failure, and runs that should stop for human input before continuing.
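
How a retry policy and a failure policy interact can be sketched in a few lines. The three policy names are the ones listed above; the function `run_with_policy`, its `max_attempts` parameter, and the returned status strings are hypothetical, and backoff delays are omitted for brevity.

```python
def run_with_policy(attempt, max_attempts, failure_policy):
    """Call `attempt` up to max_attempts times, then apply the failure policy.

    attempt: callable(attempt_number) -> output; raising counts as a failed try
    Returns (node_status, run_status, output_or_error).
    """
    last_error = None
    for attempt_number in range(1, max_attempts + 1):
        try:
            return "completed", "running", attempt(attempt_number)
        except Exception as exc:
            last_error = str(exc)  # remember why the latest try failed

    # Retries are exhausted: the policy decides what happens to the run.
    run_status = {
        "continue_best_effort": "running",   # node failed, run keeps going
        "fail_workflow": "failed",           # node failure is terminal for the run
        "pause_for_intervention": "paused",  # wait for a human before continuing
    }[failure_policy]
    return "failed", run_status, last_error
```

The node's status is the same in all three failure cases; only the effect on the surrounding run differs.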

Human intervention

Human gates are first-class workflow nodes, and failures can also escalate into interventions.

The runtime records each intervention, pending or resolved, with:

workflow id
run id
node id
prompt
reason
response
resolution timestamps

This gives Enki a resumable human-in-the-loop mechanism without breaking the workflow run into ad hoc external state.
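
As an illustration, an intervention record with the fields listed above could be modeled like this. The field names `requested_at` and `resolved_at` are assumptions standing in for the resolution timestamps, and the `resolve` helper is hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Intervention:
    workflow_id: str
    run_id: str
    node_id: str
    prompt: str                         # what the human is being asked
    reason: str                         # why the run stopped (gate or failure)
    response: Optional[str] = None      # filled in when a human answers
    requested_at: Optional[str] = None  # assumed timestamp field
    resolved_at: Optional[str] = None   # assumed timestamp field

    @property
    def pending(self) -> bool:
        return self.resolved_at is None


def resolve(intervention: Intervention, response: str) -> Intervention:
    """Record the human's answer so the paused run can pick up from this node."""
    intervention.response = response
    intervention.resolved_at = datetime.now(timezone.utc).isoformat()
    return intervention
```

Because the record carries the run id and node id, resuming needs no state beyond what the runtime already persisted.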

Persistence model

Workflow state is stored under the workflow workspace rooted at:

<workspace_home>/.atomiagent/workflows

Each run gets its own directory with persisted artifacts such as:

workflow.json for the snapshot of the definition used for that run
state.json for current run state
events.jsonl for the append-only event log
interventions.json for pending and resolved intervention records
tasks/<node_id>/ for per-node task workspace data

This storage model is what makes inspect, list_runs, and resume possible.
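
The following sketch shows why this layout supports inspection and resume: state is a snapshot that gets overwritten, while events only ever grow. It is an assumption-laden illustration, not Enki's persistence code. In particular, placing each run directory directly under the workflows root keyed by run id is a guess, and the helper names are hypothetical.

```python
import json
from pathlib import Path


def run_dir(workspace_home, run_id):
    # Assumption: one directory per run, directly under the workflows root.
    return Path(workspace_home) / ".atomiagent" / "workflows" / run_id


def save_state(directory, state):
    """Overwrite state.json via a temp file so readers never see a partial write."""
    directory.mkdir(parents=True, exist_ok=True)
    tmp = directory / "state.json.tmp"
    tmp.write_text(json.dumps(state), encoding="utf-8")
    tmp.replace(directory / "state.json")


def append_event(directory, event):
    """Append one JSON object per line; earlier lines are never rewritten."""
    directory.mkdir(parents=True, exist_ok=True)
    with open(directory / "events.jsonl", "a", encoding="utf-8") as log:
        log.write(json.dumps(event) + "\n")


def load_run(directory):
    """Read back the latest snapshot and the full event history."""
    state = json.loads((directory / "state.json").read_text(encoding="utf-8"))
    lines = (directory / "events.jsonl").read_text(encoding="utf-8").splitlines()
    return state, [json.loads(line) for line in lines if line]
```

A resume operation would start from the snapshot returned by `load_run`, while an inspect operation can replay the event list to show how the run got there.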

Observability

The runtime emits structured workflow events such as:

workflow started
node ready
node started
node completed
node failed
retry scheduled
node skipped
intervention requested
intervention resolved
workflow paused
workflow completed

These events are suitable for CLIs, SDK surfaces, and future monitoring UIs because they expose orchestration state directly rather than hiding it in free-form logs.
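
As a small example of what a consumer can do with structured events, the sketch below folds an event stream into the latest event per node. The event shape (a dict with `type` and an optional `node_id`) and the snake_case type names are assumptions; the actual event payloads are not specified here.

```python
def summarize(events):
    """Fold a stream of workflow events into the latest event type per node."""
    latest = {}
    for event in events:
        node_id = event.get("node_id")
        if node_id is not None:  # workflow-level events carry no node id
            latest[node_id] = event["type"]
    return latest
```

A CLI status view or a monitoring UI could be built on a fold like this without parsing any free-form log text.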

Relationship to the Builder CLI

The enki builder CLI is the main manifest-driven interface for local workflow use, but it is a client of the same runtime design rather than a separate implementation.

That means:

the CLI defines tasks and workflows in TOML
the Rust runtime validates and executes the DAG
persisted runs can be inspected and resumed through runtime-backed commands

The current CLI intentionally limits some features, but the underlying workflow runtime is broader and is also exposed through Rust, Python, and JavaScript bindings.

Language binding strategy

Workflow bindings mirror the same Rust workflow engine instead of reimplementing orchestration per language.

Today that means:

Rust exposes WorkflowRuntime directly
Python exposes EnkiWorkflowRuntime
JavaScript and TypeScript expose NativeWorkflowRuntime

All of them share the same runtime semantics for graph validation, persistence, branching, interventions, and resumed execution.

Example workflow shape

A common pattern is:

research node gathers structured notes
transform node extracts the normalized content
decision node checks whether the output is usable
writer node produces a final summary
join node merges converging paths

That structure is a good fit for workflows because orchestration decisions stay explicit in the graph while agents remain focused on individual tasks.
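
The pattern above can be written down as plain graph data. The node ids, the edge list, and especially the bypass edge from the decision node straight to the join are assumptions chosen to show converging paths; `execution_order` is a hypothetical helper built on the standard library's `graphlib`.

```python
from graphlib import TopologicalSorter

# node id -> node kind
NODES = {
    "research": "task",
    "extract": "transform",
    "check": "decision",
    "writer": "task",
    "merge": "join",
}

# (source, target, transition)
EDGES = [
    ("research", "extract", "on_success"),
    ("extract", "check", "always"),
    ("check", "writer", "condition"),  # taken when the output is usable
    ("check", "merge", "condition"),   # hypothetical bypass when it is not
    ("writer", "merge", "always"),
]


def execution_order(nodes, edges):
    """Return one valid order in which the nodes could be visited."""
    predecessors = {node: set() for node in nodes}
    for source, target, _ in edges:
        predecessors[target].add(source)
    return list(TopologicalSorter(predecessors).static_order())
```

Every branching decision in this shape is visible as an edge, so the path a run took can be read from the graph and its events rather than from agent transcripts.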
