Reasoning Loop Guide

Complete guide to the Symbiont agentic reasoning loop: a typestate-enforced Observe-Reason-Gate-Act (ORGA) cycle for autonomous agent behavior.

Overview
1. Design Principles
Quick Start
1. Minimal Example
2. With Tool Definitions
Phase System
1. Typestate Pattern
2. Phase Flow
Inference Providers
1. Cloud Provider (OpenRouter)
Policy Gate
1. Built-in Gates
2. Policy Denial Feedback
Action Execution
Knowledge-Reasoning Bridge
Conversation Management
1. Conversation Type
2. Token Budget Enforcement
Durable Journal
Configuration
1. LoopConfig
2. Recovery Strategies
Testing
Implementation Phases
Next Steps

Overview

The reasoning loop is the core execution engine for autonomous agents in Symbiont. It drives a multi-turn conversation between an LLM, a policy gate, and external tools through a structured cycle:

Observe — Collect results from previous tool executions
Reason — LLM produces proposed actions (tool calls or text responses)
Gate — Policy engine evaluates each proposed action
Act — Approved actions are dispatched to tool executors

The loop continues until the LLM produces a final text response, hits iteration/token limits, or times out.

Design Principles

Compile-time safety: Invalid phase transitions are caught at compile time via Rust’s type system
Opt-in complexity: The loop works with just a provider and policy gate; knowledge bridge, Cedar policies, and human-in-the-loop are all optional
Backward compatible: Adding new features (like the knowledge bridge) never breaks existing code
Observable: Every phase emits journal events and tracing spans

Quick Start

Minimal Example

use std::sync::Arc;
use symbi_runtime::reasoning::circuit_breaker::CircuitBreakerRegistry;
use symbi_runtime::reasoning::context_manager::DefaultContextManager;
use symbi_runtime::reasoning::conversation::{Conversation, ConversationMessage};
use symbi_runtime::reasoning::executor::DefaultActionExecutor;
use symbi_runtime::reasoning::loop_types::{BufferedJournal, LoopConfig};
use symbi_runtime::reasoning::policy_bridge::DefaultPolicyGate;
use symbi_runtime::reasoning::reasoning_loop::ReasoningLoopRunner;
use symbi_runtime::types::AgentId;

// Set up the runner with default components
let runner = ReasoningLoopRunner {
    provider: Arc::new(my_inference_provider),
    policy_gate: Arc::new(DefaultPolicyGate::permissive()),
    executor: Arc::new(DefaultActionExecutor::default()),
    context_manager: Arc::new(DefaultContextManager::default()),
    circuit_breakers: Arc::new(CircuitBreakerRegistry::default()),
    journal: Arc::new(BufferedJournal::new(1000)),
    knowledge_bridge: None,
};

// Build a conversation
let mut conv = Conversation::with_system("You are a helpful assistant.");
conv.push(ConversationMessage::user("What is 6 * 7?"));

// Run the loop
let result = runner.run(AgentId::new(), conv, LoopConfig::default()).await;

println!("Output: {}", result.output);
println!("Iterations: {}", result.iterations);
println!("Tokens used: {}", result.total_usage.total_tokens);

With Tool Definitions

use symbi_runtime::reasoning::inference::ToolDefinition;

let config = LoopConfig {
    max_iterations: 10,
    tool_definitions: vec![
        ToolDefinition {
            name: "web_search".into(),
            description: "Search the web for information".into(),
            parameters: serde_json::json!({
                "type": "object",
                "properties": {
                    "query": { "type": "string" }
                },
                "required": ["query"]
            }),
        },
    ],
    ..Default::default()
};

let result = runner.run(agent_id, conv, config).await;

Phase System

Typestate Pattern

The loop uses Rust’s type system to enforce valid phase transitions at compile time. Each phase is a zero-sized type marker:

pub struct Reasoning;      // LLM produces proposed actions
pub struct PolicyCheck;    // Each action evaluated by the gate
pub struct ToolDispatching; // Approved actions executed
pub struct Observing;      // Results collected for next iteration

The AgentLoop<Phase> struct carries the loop state and can only call methods appropriate to its current phase. For example, AgentLoop<Reasoning> only exposes produce_output(), which consumes self and returns AgentLoop<PolicyCheck>.

This means the following mistakes are compile errors, not runtime bugs:

Skipping the policy check
Dispatching tools without reasoning first
Observing results without dispatching

Phase Flow

                    ┌─────────────────────────────────────────┐
                    │                                         │
                    ▼                                         │
    ┌──────────────────────┐                                  │
    │  AgentLoop<Reasoning>│                                  │
    │  produce_output()    │                                  │
    └──────────┬───────────┘                                  │
               │                                              │
               ▼                                              │
    ┌──────────────────────┐                                  │
    │ AgentLoop<PolicyCheck>│                                 │
    │  check_policy()      │                                  │
    └──────────┬───────────┘                                  │
               │                                              │
               ▼                                              │
    ┌────────────────────────────┐                            │
    │ AgentLoop<ToolDispatching> │                            │
    │  dispatch_tools()          │                            │
    └──────────┬─────────────────┘                            │
               │                                              │
               ▼                                              │
    ┌──────────────────────┐     Continue    ┌───────────┐    │
    │ AgentLoop<Observing> │───────────────▶│ Reasoning  │────┘
    │  observe_results()   │                └───────────┘
    └──────────┬───────────┘
               │ Complete
               ▼
         ┌───────────┐
         │ LoopResult │
         └───────────┘

Inference Providers

The InferenceProvider trait abstracts over LLM backends:

#[async_trait]
pub trait InferenceProvider: Send + Sync {
    async fn complete(
        &self,
        conversation: &Conversation,
        options: &InferenceOptions,
    ) -> Result<InferenceResponse, InferenceError>;

    fn provider_name(&self) -> &str;
    fn default_model(&self) -> &str;
    fn supports_native_tools(&self) -> bool;
    fn supports_structured_output(&self) -> bool;
}

Cloud Provider (OpenRouter)

The CloudInferenceProvider connects to OpenRouter (or any OpenAI-compatible endpoint):

export OPENROUTER_API_KEY="sk-or-..."
export OPENROUTER_MODEL="google/gemini-2.0-flash-001"  # optional

use symbi_runtime::reasoning::providers::cloud::CloudInferenceProvider;

let provider = CloudInferenceProvider::from_env()
    .expect("OPENROUTER_API_KEY must be set");

Policy Gate

Every proposed action passes through the policy gate before execution:

#[async_trait]
pub trait ReasoningPolicyGate: Send + Sync {
    async fn evaluate_action(
        &self,
        agent_id: &AgentId,
        action: &ProposedAction,
        state: &LoopState,
    ) -> LoopDecision;
}

pub enum LoopDecision {
    Allow,
    Deny { reason: String },
    Modify { modified_action: Box<ProposedAction>, reason: String },
}

Built-in Gates

DefaultPolicyGate::permissive() — Allows all actions (development/testing)
DefaultPolicyGate::new() — Default policy rules
OpaPolicyGateBridge — Bridges to the OPA-based policy engine
CedarGate — Cedar policy language integration

Policy Denial Feedback

When an action is denied, the denial reason is fed back to the LLM as a policy feedback observation, allowing it to adjust its approach on the next iteration.

Action Execution

ActionExecutor Trait

#[async_trait]
pub trait ActionExecutor: Send + Sync {
    async fn execute_actions(
        &self,
        actions: &[ProposedAction],
        config: &LoopConfig,
        circuit_breakers: &CircuitBreakerRegistry,
    ) -> Vec<Observation>;
}

Built-in Executors

Executor	Description
`DefaultActionExecutor`	Parallel dispatch with per-tool timeouts
`EnforcedActionExecutor`	Delegates through `ToolInvocationEnforcer` → MCP pipeline
`KnowledgeAwareExecutor`	Intercepts knowledge tools, delegates rest to inner executor

Circuit Breakers

Each tool has an associated circuit breaker that tracks failures:

Closed (normal): Tool calls proceed normally
Open (tripped): Too many consecutive failures; calls rejected immediately
Half-open (probing): Limited calls allowed to test recovery

let circuit_breakers = CircuitBreakerRegistry::new(CircuitBreakerConfig {
    failure_threshold: 3,
    recovery_timeout: Duration::from_secs(60),
    half_open_max_calls: 1,
});

Knowledge-Reasoning Bridge

The KnowledgeBridge connects the agent’s knowledge store (hierarchical memory, knowledge base, vector search) to the reasoning loop.

Setup

use symbi_runtime::reasoning::knowledge_bridge::{KnowledgeBridge, KnowledgeConfig};

let bridge = Arc::new(KnowledgeBridge::new(
    context_manager.clone(),  // Arc<dyn context::ContextManager>
    KnowledgeConfig {
        max_context_items: 5,
        relevance_threshold: 0.3,
        auto_persist: true,
    },
));

let runner = ReasoningLoopRunner {
    // ... other fields ...
    knowledge_bridge: Some(bridge),
};

How It Works

Before each reasoning step:

Search terms are extracted from recent user/tool messages
query_context() and search_knowledge() retrieve relevant items
Results are formatted and injected as a system message (replacing the previous injection)

During tool dispatch: The KnowledgeAwareExecutor intercepts two special tools:

recall_knowledge — Searches the knowledge base and returns formatted results
```
{ "query": "capital of France", "limit": 5 }
```

store_knowledge — Stores a new fact as a subject-predicate-object triple

{ "subject": "Earth", "predicate": "has", "object": "one moon", "confidence": 0.95 }

All other tool calls are delegated to the inner executor unchanged.

After loop completion: If auto_persist is enabled, the bridge extracts assistant responses and stores them as working memory for future conversations.

Backward Compatibility

Setting knowledge_bridge: None makes the runner behave identically to before — no context injection, no knowledge tools, no persistence.

Conversation Management

Conversation Type

Conversation manages an ordered sequence of messages with serialization to both OpenAI and Anthropic API formats:

let mut conv = Conversation::with_system("You are a helpful assistant.");
conv.push(ConversationMessage::user("Hello"));
conv.push(ConversationMessage::assistant("Hi there!"));

// Serialize for API calls
let openai_msgs = conv.to_openai_messages();
let (system, anthropic_msgs) = conv.to_anthropic_messages();

Token Budget Enforcement

The in-loop ContextManager (not to be confused with the knowledge ContextManager) manages the conversation token budget:

Sliding Window: Remove oldest messages first
Observation Masking: Hide verbose tool results
Anchored Summary: Keep system message + N recent messages

Durable Journal

Every phase transition emits a JournalEntry to the configured JournalWriter:

pub struct JournalEntry {
    pub sequence: u64,
    pub timestamp: DateTime<Utc>,
    pub agent_id: AgentId,
    pub iteration: u32,
    pub event: LoopEvent,
}

pub enum LoopEvent {
    Started { agent_id, config },
    ReasoningComplete { iteration, actions, usage },
    PolicyEvaluated { iteration, action_count, denied_count },
    ToolsDispatched { iteration, tool_count, duration },
    ObservationsCollected { iteration, observation_count },
    Terminated { reason, iterations, total_usage, duration },
    RecoveryTriggered { iteration, tool_name, strategy, error },
}

The default BufferedJournal stores entries in memory. Production deployments can implement JournalWriter for persistent storage.

Configuration

LoopConfig

pub struct LoopConfig {
    pub max_iterations: u32,        // Default: 25
    pub max_total_tokens: u32,      // Default: 100,000
    pub timeout: Duration,          // Default: 5 minutes
    pub default_recovery: RecoveryStrategy,
    pub tool_timeout: Duration,     // Default: 30 seconds
    pub max_concurrent_tools: usize, // Default: 10
    pub context_token_budget: usize, // Default: 8,000
    pub tool_definitions: Vec<ToolDefinition>,
}

Recovery Strategies

When tool execution fails, the loop can apply different recovery strategies:

Strategy	Description
`Retry`	Retry with exponential backoff
`Fallback`	Try alternative tools
`CachedResult`	Use a cached result if fresh enough
`LlmRecovery`	Ask the LLM to find an alternative approach
`Escalate`	Route to a human operator queue
`DeadLetter`	Give up and log the failure

Testing

Unit Tests (No API Key Required)

cargo test -j2 -p symbi-runtime --lib -- reasoning::knowledge

Integration Tests with Mock Provider

cargo test -j2 -p symbi-runtime --test knowledge_reasoning_tests

Live Tests with Real LLM

OPENROUTER_API_KEY="sk-or-..." OPENROUTER_MODEL="google/gemini-2.0-flash-001" \
  cargo test -j2 -p symbi-runtime --features http-input --test reasoning_live_tests -- --nocapture

Implementation Phases

The reasoning loop was built in five phases, each adding capabilities:

Phase	Focus	Key Components
1	Core loop	`conversation`, `inference`, `phases`, `reasoning_loop`
2	Resilience	`circuit_breaker`, `executor`, `context_manager`, `policy_bridge`
3	DSL integration	`human_critic`, `pipeline_config`, REPL builtins
4	Multi-agent	`agent_registry`, `critic_audit`, `saga`
5	Observability	`cedar_gate`, `journal`, `metrics`, `scheduler`, `tracing_spans`
Bridge	Knowledge	`knowledge_bridge`, `knowledge_executor`

Next Steps

Runtime Architecture — Full system architecture overview
Security Model — Policy enforcement and audit trails
DSL Guide — Agent definition language
API Reference — Complete API documentation