Policy-Driven Router for SLM-First Architecture

Overview

This document describes the Policy-Driven Routing system for Symbiont’s SLM-first architecture, together with the configuration it relies on (model allow lists and sandbox profiles). By default the router sends requests to Small Language Models (SLMs). It falls back to Large Language Models (LLMs) when configurable policies, task classification, confidence monitoring, or capability matching indicate that an SLM is not a good fit.

Design Goals

Intelligent Routing: Automatic SLM-first routing with LLM fallback based on confidence and policies
Policy-Driven Decisions: Configurable rules for routing logic with multiple policy types
Task Classification: Automatic categorization of requests into task types for optimal model selection
Confidence Monitoring: Records per-model, per-task confidence alongside observed quality, so that routing thresholds can adapt over time
Thread-Safe Architecture: Async-first design; shared state such as the confidence history is held behind Arc<RwLock<...>>
Comprehensive Integration: Integration with the scheduler, tool invocation, and model catalog systems

System Architecture

Routing Engine Overview

graph TD
    A[RoutingEngine] --> B[TaskClassifier]
    A --> C[PolicyEvaluator]
    A --> D[ConfidenceMonitor]
    A --> E[ModelCatalog]
    
    B --> F[TaskType Classification]
    C --> G[Policy Rules Engine]
    D --> H[Confidence Tracking]
    E --> I[SLM Selection]
    
    A --> J[RouteDecision]
    J --> K[SLM Execution]
    J --> L[LLM Fallback]
    
    K --> M[Confidence Evaluation]
    M --> N[Success/Retry Logic]
    M --> D
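The control flow in the diagram above (SLM execution, confidence evaluation, retry, LLM fallback) can be sketched as a plain synchronous loop. This is a minimal sketch, not the RoutingEngine implementation: the Outcome enum, route_with_fallback, max_slm_attempts, and the closure-based model calls are hypothetical stand-ins, and the real engine is async.

```rust
/// Hypothetical outcome of routing one request.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// An SLM answer met the confidence threshold.
    Slm { answer: String, confidence: f64, attempts: u32 },
    /// No SLM attempt met the threshold, so the LLM answered.
    LlmFallback { answer: String, best_slm_confidence: f64 },
}

/// SLM-first routing: call the SLM up to `max_slm_attempts` times and fall back
/// to the LLM only if no attempt reaches `min_confidence`.
fn route_with_fallback<S, L>(
    prompt: &str,
    min_confidence: f64,
    max_slm_attempts: u32,
    mut call_slm: S,
    mut call_llm: L,
) -> Outcome
where
    S: FnMut(&str) -> (String, f64), // hypothetical: returns (answer, confidence score)
    L: FnMut(&str) -> String,
{
    let mut best_slm_confidence = 0.0_f64;
    for attempt in 1..=max_slm_attempts {
        let (answer, confidence) = call_slm(prompt);
        // Confidence evaluation: accept the first answer that clears the threshold.
        if confidence >= min_confidence {
            return Outcome::Slm { answer, confidence, attempts: attempt };
        }
        // Retry logic: remember the best score for the routing trace, then try again.
        best_slm_confidence = best_slm_confidence.max(confidence);
    }
    // LLM fallback once all SLM attempts are exhausted.
    Outcome::LlmFallback { answer: call_llm(prompt), best_slm_confidence }
}
```

For example, with min_confidence = 0.8 and two SLM attempts scoring 0.55 and then 0.82, the function returns Outcome::Slm with attempts: 2 and never calls the LLM.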

Core Components

graph LR
    A[RoutingContext] --> B[DefaultRoutingEngine]
    B --> C[evaluate_policies]
    B --> D[classify_task]
    B --> E[select_slm_model]
    B --> F[monitor_confidence]
    
    C --> G[PolicyRule Matching]
    D --> H[TaskType Assignment]
    E --> I[Model Selection]
    F --> J[Confidence Updates]
    
    B --> K[RouteDecision]
    K --> L[ModelSelection::SLM]
    K --> M[ModelSelection::LLM]

Implemented Rust Structures

Core Routing Configuration

/// Routing configuration for SLM-first architecture
#[derive(Debug, Clone, serde::Serialize, serde::Deserialize)]
pub struct RoutingConfig {
    /// Enable intelligent routing
    pub enabled: bool,
    /// Global routing policies
    pub policies: Vec<PolicyRule>,
    /// Confidence thresholds for routing decisions
    pub confidence_thresholds: ConfidenceConfig,
    /// Task classification settings
    pub classification: ClassificationConfig,
    /// SLM selection preferences
    pub slm_preferences: SlmPreferences,
    /// LLM fallback configuration
    pub llm_fallback: LlmFallbackConfig,
}

/// Policy rule for routing decisions
#[derive(Debug, Clone, serde::Serialize, serde::Deserialize)]
pub struct PolicyRule {
    pub id: String,
    pub description: String,
    pub task_types: Vec<TaskType>,
    pub conditions: Vec<PolicyCondition>,
    pub action: PolicyAction,
    pub priority: u8,
}

/// Task types for intelligent routing
#[derive(Debug, Clone, PartialEq, Eq, Hash, serde::Serialize, serde::Deserialize)]
pub enum TaskType {
    Intent,
    Extract,
    Template,
    BoilerplateCode,
    CodeGeneration,
    Reasoning,
    Analysis,
    Summarization,
    Translation,
    QA,
    Custom(String),
}
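The document does not specify how TaskClassifier assigns a TaskType. Purely as an illustration, a naive keyword heuristic could look like the sketch below. The keyword lists and the classify_task signature are assumptions; a production classifier would more likely use a model or richer features. The enum is trimmed to a subset of the variants above so that the example compiles on its own.

```rust
/// Subset of the TaskType variants above (serde derives omitted).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum TaskType {
    Intent,
    Extract,
    CodeGeneration,
    Reasoning,
    Summarization,
    Translation,
    QA,
}

/// Naive keyword classifier: rules are checked in order and the first match wins.
fn classify_task(content: &str) -> TaskType {
    let text = content.to_lowercase();
    // Hypothetical keyword lists; plain substring matching, so rule order matters.
    let rules: [(&[&str], TaskType); 5] = [
        (&["summarize", "summary", "tl;dr"], TaskType::Summarization),
        (&["translate", "translation"], TaskType::Translation),
        (&["write a function", "implement", "refactor"], TaskType::CodeGeneration),
        (&["extract", "parse", "pull out"], TaskType::Extract),
        (&["why", "prove", "step by step"], TaskType::Reasoning),
    ];
    for (keywords, task_type) in rules {
        if keywords.iter().any(|k| text.contains(k)) {
            return task_type;
        }
    }
    // No keyword matched: treat questions as QA and anything else as an intent.
    if text.trim_end().ends_with('?') {
        TaskType::QA
    } else {
        TaskType::Intent
    }
}
```

Here classify_task("Translate this to French") yields TaskType::Translation, while classify_task("What time is it?") matches no keyword and falls through to TaskType::QA.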

Routing Engine and Decision Types

use std::collections::HashMap;

/// Core routing engine trait
#[async_trait::async_trait]
pub trait RoutingEngine: Send + Sync {
    async fn route_request(&self, context: &RoutingContext) -> RoutingResult<RouteDecision>;
    async fn evaluate_confidence(&self, context: &RoutingContext, result: &ModelResponse) -> RoutingResult<f64>;
    /// Takes `&mut self`, so an engine shared behind `Arc` needs a lock
    /// (or a `&self` variant with interior mutability) to be reconfigured.
    fn update_config(&mut self, config: RoutingConfig) -> RoutingResult<()>;
}

/// Routing context for decision making
#[derive(Debug, Clone)]
pub struct RoutingContext {
    pub request_id: String,
    pub agent_id: AgentId,
    pub task_type: TaskType,
    pub content: String,
    pub metadata: HashMap<String, String>,
    pub timestamp: chrono::DateTime<chrono::Utc>,
}

/// Route decision output
#[derive(Debug, Clone)]
pub struct RouteDecision {
    pub selection: ModelSelection,
    pub confidence: f64,
    pub reasoning: String,
    pub policies_applied: Vec<String>,
    pub fallback_available: bool,
    pub metadata: HashMap<String, String>,
}

/// Model selection type
#[derive(Debug, Clone, PartialEq)]
pub enum ModelSelection {
    SLM { model_id: String, provider: String },
    LLM { provider_type: LlmProviderType },
    Skip { reason: String },
}

Confidence Monitoring and Policy Engine

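use std::collections::HashMap;
use std::sync::Arc;
use tokio::sync::RwLock; // assumed: async-aware lock, in line with the async-first design

/// Confidence monitoring system
pub struct ConfidenceMonitor {
    confidence_history: Arc<RwLock<Vec<ConfidenceEntry>>>,
    config: ConfidenceConfig,
}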

#[derive(Debug, Clone)]
pub struct ConfidenceEntry {
    pub model_id: String,
    pub task_type: TaskType,
    pub confidence: f64,
    pub actual_quality: Option<f64>,
    pub timestamp: chrono::DateTime<chrono::Utc>,
    pub metadata: HashMap<String, String>,
}

/// Policy evaluation engine
pub struct PolicyEvaluator {
    rules: Vec<PolicyRule>,
    config: PolicyConfig,
}

/// Policy conditions for rule matching
#[derive(Debug, Clone, serde::Serialize, serde::Deserialize)]
pub enum PolicyCondition {
    TaskTypeEquals(TaskType),
    TaskComplexityLevel(u8),
    ContentLengthRange { min: usize, max: usize },
    RequiredCapabilities(Vec<ModelCapability>),
    TimeOfDay { start: chrono::NaiveTime, end: chrono::NaiveTime },
    AgentIdMatches(String),
    Custom { key: String, value: String },
}

/// Policy actions for routing decisions
#[derive(Debug, Clone, serde::Serialize, serde::Deserialize)]
pub enum PolicyAction {
    PreferSLM { min_confidence: f64 },
    RequireLLM { reason: String },
    Skip { reason: String },
    Custom { action: String, parameters: HashMap<String, String> },
}
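A PolicyEvaluator has to match rules against a request and select a single action. The sketch below shows one plausible approach under stated assumptions: a rule applies when its task_types list is empty or contains the request's type and all of its conditions hold; a higher priority value wins; ties keep declaration order. These semantics, the trimmed-down types, the Request struct (a stand-in for RoutingContext), and evaluate_policies are hypothetical, and only two condition variants are modeled.

```rust
#[derive(Debug, Clone, PartialEq)]
enum TaskType { Extract, Reasoning }

#[derive(Debug, Clone)]
enum PolicyCondition {
    ContentLengthRange { min: usize, max: usize },
    AgentIdMatches(String),
}

#[derive(Debug, Clone, PartialEq)]
enum PolicyAction {
    PreferSLM { min_confidence: f64 },
    RequireLLM { reason: String },
}

#[derive(Debug, Clone)]
struct PolicyRule {
    id: String,
    task_types: Vec<TaskType>, // assumption: empty means "any task type"
    conditions: Vec<PolicyCondition>,
    action: PolicyAction,
    priority: u8, // assumption: higher value wins
}

/// Minimal request view used for matching (hypothetical stand-in for RoutingContext).
struct Request<'a> {
    agent_id: &'a str,
    task_type: TaskType,
    content: &'a str,
}

fn condition_holds(cond: &PolicyCondition, req: &Request) -> bool {
    match cond {
        PolicyCondition::ContentLengthRange { min, max } => (*min..=*max).contains(&req.content.len()),
        PolicyCondition::AgentIdMatches(id) => req.agent_id == id.as_str(),
    }
}

/// Returns the winning rule's id and action, or None if no rule matches.
fn evaluate_policies<'r>(rules: &'r [PolicyRule], req: &Request) -> Option<(&'r str, &'r PolicyAction)> {
    rules
        .iter()
        .filter(|r| r.task_types.is_empty() || r.task_types.contains(&req.task_type))
        .filter(|r| r.conditions.iter().all(|c| condition_holds(c, req)))
        // max_by_key returns the last maximum, so reverse first to let ties keep declaration order.
        .rev()
        .max_by_key(|r| r.priority)
        .map(|r| (r.id.as_str(), &r.action))
}
```

The Skip and Custom actions, and the remaining condition variants, would extend the same two match points.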

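Security and Validation

The router operates within the SLM-first configuration (SlmFirstConfig), which defines the model allow lists and the named sandbox profiles. Each sandbox profile carries the SecuritySettings shown below, and SlmFirstConfig::validate checks the configuration's cross-references before it is used.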

#[derive(Debug, Clone, serde::Serialize, serde::Deserialize)]
pub struct SecuritySettings {
    /// Enable additional syscall filtering
    pub strict_syscall_filtering: bool,
    /// Disable debugging interfaces
    pub disable_debugging: bool,
    /// Enable audit logging
    pub enable_audit_logging: bool,
    /// Encryption requirements
    pub require_encryption: bool,
}

impl SlmFirstConfig {
    /// Validate the SLM configuration
    pub fn validate(&self) -> Result<(), ConfigError> {
        // Validate default sandbox profile exists
        if !self.sandbox_profiles.contains_key(&self.default_sandbox_profile) {
            return Err(ConfigError::InvalidValue {
                key: "slm_first.default_sandbox_profile".to_string(),
                reason: format!("Profile '{}' not found in sandbox_profiles", 
                               self.default_sandbox_profile),
            });
        }

        // Validate model definitions have unique IDs
        let mut known_ids = std::collections::HashSet::new();
        for model in &self.model_allow_lists.global_models {
            if !known_ids.insert(model.id.as_str()) {
                return Err(ConfigError::InvalidValue {
                    key: "slm_first.model_allow_lists.global_models".to_string(),
                    reason: format!("Duplicate model ID: {}", model.id),
                });
            }
        }

        // Validate agent model mappings reference existing models,
        // reusing the ID set instead of rescanning global_models for every entry
        for (agent_id, mapped_ids) in &self.model_allow_lists.agent_model_maps {
            for model_id in mapped_ids {
                if !known_ids.contains(model_id.as_str()) {
                    return Err(ConfigError::InvalidValue {
                        key: format!("slm_first.model_allow_lists.agent_model_maps.{}", agent_id),
                        reason: format!("Model ID '{}' not found in global_models", model_id),
                    });
                }
            }
        }

        // Validate sandbox profiles
        for (profile_name, profile) in &self.sandbox_profiles {
            profile.validate()
                .map_err(|e| ConfigError::InvalidValue {
                    key: format!("slm_first.sandbox_profiles.{}", profile_name),
                    reason: e.to_string(),
                })?;
        }

        Ok(())
    }

    /// Get allowed models for a specific agent
    pub fn get_allowed_models(&self, agent_id: &str) -> Vec<&ModelDefinition> {
        // Check agent-specific mappings first
        if let Some(model_ids) = self.model_allow_lists.agent_model_maps.get(agent_id) {
            self.model_allow_lists.global_models
                .iter()
                .filter(|model| model_ids.contains(&model.id))
                .collect()
        } else {
            // Fall back to all global models if no specific mapping
            self.model_allow_lists.global_models.iter().collect()
        }
    }
}

impl SandboxProfile {
    /// Validate sandbox profile configuration
    pub fn validate(&self) -> Result<(), Box<dyn std::error::Error>> {
        // Validate resource constraints
        if self.resources.max_memory_mb == 0 {
            return Err("max_memory_mb must be > 0".into());
        }
        // Written as !(x > 0.0) so that NaN is rejected as well
        if !(self.resources.max_cpu_cores > 0.0) {
            return Err("max_cpu_cores must be > 0".into());
        }

        // Validate filesystem paths (both read and write lists)
        for path in self.filesystem.read_paths.iter().chain(&self.filesystem.write_paths) {
            if path.is_empty() {
                return Err("read_paths and write_paths cannot contain empty strings".into());
            }
        }

        // Validate process limits
        if self.process_limits.max_execution_time_seconds == 0 {
            return Err("max_execution_time_seconds must be > 0".into());
        }

        Ok(())
    }

    /// Create a secure default profile
    pub fn secure_default() -> Self {
        Self {
            resources: ResourceConstraints {
                max_memory_mb: 512,
                max_cpu_cores: 1.0,
                max_disk_mb: 100,
                gpu_access: GpuAccess::None,
                max_io_bandwidth_mbps: Some(10),
            },
            filesystem: FilesystemControls {
                read_paths: vec!["/tmp/sandbox/*".to_string()],
                write_paths: vec!["/tmp/sandbox/output/*".to_string()],
                denied_paths: vec!["/etc/*".to_string(), "/proc/*".to_string()],
                allow_temp_files: true,
                max_file_size_mb: 10,
            },
            process_limits: ProcessLimits {
                max_child_processes: 0,
                max_execution_time_seconds: 300,
                allowed_syscalls: vec!["read".to_string(), "write".to_string(), "open".to_string()],
                process_priority: 19,
            },
            network: NetworkPolicy {
                access_mode: NetworkAccessMode::None,
                allowed_destinations: vec![],
                max_bandwidth_mbps: None,
            },
            security: SecuritySettings {
                strict_syscall_filtering: true,
                disable_debugging: true,
                enable_audit_logging: true,
                require_encryption: true,
            },
        }
    }

    /// Create a standard default profile (less restrictive)
    pub fn standard_default() -> Self {
        Self {
            resources: ResourceConstraints {
                max_memory_mb: 1024,
                max_cpu_cores: 2.0,
                max_disk_mb: 500,
                gpu_access: GpuAccess::Shared { max_memory_mb: 1024 },
                max_io_bandwidth_mbps: Some(50),
            },
            filesystem: FilesystemControls {
                read_paths: vec!["/tmp/*".to_string(), "/home/sandbox/*".to_string()],
                write_paths: vec!["/tmp/*".to_string(), "/home/sandbox/*".to_string()],
                denied_paths: vec!["/etc/passwd".to_string(), "/etc/shadow".to_string()],
                allow_temp_files: true,
                max_file_size_mb: 100,
            },
            process_limits: ProcessLimits {
                max_child_processes: 5,
                max_execution_time_seconds: 600,
                allowed_syscalls: vec![], // Empty means allow all
                process_priority: 0,
            },
            network: NetworkPolicy {
                access_mode: NetworkAccessMode::Restricted,
                allowed_destinations: vec![
                    NetworkDestination {
                        host: "api.openai.com".to_string(),
                        port: Some(443),
                        protocol: Some(NetworkProtocol::HTTPS),
                    },
                ],
                max_bandwidth_mbps: Some(100),
            },
            security: SecuritySettings {
                strict_syscall_filtering: false,
                disable_debugging: false,
                enable_audit_logging: true,
                require_encryption: false,
            },
        }
    }
}

Integration with Existing Config

The SLM configuration integrates into the existing Config struct:

/// Updated main application configuration
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Config {
    /// API configuration
    pub api: ApiConfig,
    /// Database configuration
    pub database: DatabaseConfig,
    /// Logging configuration
    pub logging: LoggingConfig,
    /// Security configuration
    pub security: SecurityConfig,
    /// Storage configuration
    pub storage: StorageConfig,
    /// SLM-first configuration
    pub slm_first: SlmFirstConfig,
}

impl Default for Config {
    fn default() -> Self {
        Self {
            api: ApiConfig::default(),
            database: DatabaseConfig::default(),
            logging: LoggingConfig::default(),
            security: SecurityConfig::default(),
            storage: StorageConfig::default(),
            slm_first: SlmFirstConfig::default(),
        }
    }
}

impl Config {
    /// Enhanced validation including SLM config
    pub fn validate(&self) -> Result<(), ConfigError> {
        // Existing validation...
        
        // Validate SLM configuration if enabled
        if self.slm_first.enabled {
            self.slm_first.validate()?;
        }
        
        Ok(())
    }
}

Example TOML Configuration

# Example symbi.toml with SLM-first configuration

[api]
port = 8080
host = "127.0.0.1"
timeout_seconds = 60
max_body_size = 16777216

[database]
qdrant_url = "http://localhost:6333"
qdrant_collection = "agent_knowledge"
vector_dimension = 1536

[logging]
level = "info"
format = "Pretty"
structured = false

[security]
enable_compression = true
enable_backups = true
enable_safety_checks = true

[storage]
context_path = "./agent_storage"
git_clone_path = "./temp_repos"
backup_path = "./backups"
max_context_size_mb = 100

# SLM-First Configuration
[slm_first]
enabled = true
default_sandbox_profile = "secure"

# Global Model Definitions
[[slm_first.model_allow_lists.global_models]]
id = "llama2-7b"
name = "Llama 2 7B"
provider = { HuggingFace = { model_path = "meta-llama/Llama-2-7b-hf" } }
capabilities = ["TextGeneration", "Reasoning"]

[slm_first.model_allow_lists.global_models.resource_requirements]
min_memory_mb = 16384
preferred_cpu_cores = 4.0

[[slm_first.model_allow_lists.global_models]]
id = "codellama-7b"
name = "Code Llama 7B"
provider = { HuggingFace = { model_path = "codellama/CodeLlama-7b-hf" } }
capabilities = ["CodeGeneration", "TextGeneration"]

[slm_first.model_allow_lists.global_models.resource_requirements]
min_memory_mb = 16384
preferred_cpu_cores = 4.0
gpu_requirements = { min_vram_mb = 8192, compute_capability = "7.0" }

# Agent-Specific Model Mappings
[slm_first.model_allow_lists.agent_model_maps]
"security_scanner" = ["llama2-7b"]
"code_generator" = ["codellama-7b", "llama2-7b"]
"data_processor" = ["llama2-7b"]

[slm_first.model_allow_lists]
allow_runtime_overrides = false

# Sandbox Profile Definitions
[slm_first.sandbox_profiles.secure]
[slm_first.sandbox_profiles.secure.resources]
max_memory_mb = 512
max_cpu_cores = 1.0
max_disk_mb = 100
gpu_access = "None"
max_io_bandwidth_mbps = 10

[slm_first.sandbox_profiles.secure.filesystem]
read_paths = ["/tmp/sandbox/*"]
write_paths = ["/tmp/sandbox/output/*"]
denied_paths = ["/etc/*", "/proc/*", "/sys/*"]
allow_temp_files = true
max_file_size_mb = 10

[slm_first.sandbox_profiles.secure.process_limits]
max_child_processes = 0
max_execution_time_seconds = 300
allowed_syscalls = ["read", "write", "open", "close", "mmap", "munmap"]
process_priority = 19

[slm_first.sandbox_profiles.secure.network]
access_mode = "None"
allowed_destinations = []

[slm_first.sandbox_profiles.secure.security]
strict_syscall_filtering = true
disable_debugging = true
enable_audit_logging = true
require_encryption = true

# Standard Profile (Less Restrictive)
[slm_first.sandbox_profiles.standard]
[slm_first.sandbox_profiles.standard.resources]
max_memory_mb = 1024
max_cpu_cores = 2.0
max_disk_mb = 500
gpu_access = { Shared = { max_memory_mb = 1024 } }
max_io_bandwidth_mbps = 50

[slm_first.sandbox_profiles.standard.filesystem]
read_paths = ["/tmp/*", "/home/sandbox/*"]
write_paths = ["/tmp/*", "/home/sandbox/*"]
denied_paths = ["/etc/passwd", "/etc/shadow"]
allow_temp_files = true
max_file_size_mb = 100

[slm_first.sandbox_profiles.standard.process_limits]
max_child_processes = 5
max_execution_time_seconds = 600
allowed_syscalls = []  # Empty means allow all
process_priority = 0

[slm_first.sandbox_profiles.standard.network]
access_mode = "Restricted"
max_bandwidth_mbps = 100

[[slm_first.sandbox_profiles.standard.network.allowed_destinations]]
host = "api.openai.com"
port = 443
protocol = "HTTPS"

[[slm_first.sandbox_profiles.standard.network.allowed_destinations]]
host = "huggingface.co"
port = 443
protocol = "HTTPS"

[slm_first.sandbox_profiles.standard.security]
strict_syscall_filtering = false
disable_debugging = false
enable_audit_logging = true
require_encryption = false

Design Rationale

1. Hierarchical Model Allow Lists

Problem: Different agents need access to different models based on their function and security requirements.

Solution: Three-tier hierarchy:

Global Models: System-wide model definitions with capabilities and requirements
Agent-Specific Mappings: Per-agent model access control
Runtime Overrides: Optional reconfiguration through the runtime API, disabled unless allow_runtime_overrides is true

Benefits:

Centralized model management
Granular access control
Operational flexibility
Clear audit trail
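One way to read the three tiers is as a precedence order. The sketch below assumes that a runtime override, when present and permitted by allow_runtime_overrides, replaces the agent's mapping, and that an agent without any mapping may use every global model (the same fallback as get_allowed_models). Whichever tier supplies the list, the result is filtered against global_models, so an override cannot name an undefined model. AllowLists, runtime_overrides, and resolve_models are hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory view of slm_first.model_allow_lists.
struct AllowLists {
    global_models: Vec<String>,                      // tier 1: globally defined model IDs
    agent_model_maps: HashMap<String, Vec<String>>,  // tier 2: per-agent mappings
    runtime_overrides: HashMap<String, Vec<String>>, // tier 3: set through the runtime API
    allow_runtime_overrides: bool,
}

impl AllowLists {
    /// Resolves the model IDs an agent may use, consulting the highest tier first.
    fn resolve_models(&self, agent_id: &str) -> Vec<&str> {
        let requested = self
            .allow_runtime_overrides
            .then(|| self.runtime_overrides.get(agent_id))
            .flatten()
            .or_else(|| self.agent_model_maps.get(agent_id));
        self.global_models
            .iter()
            // No list at any tier means "all global models"; otherwise keep only listed IDs.
            .filter(|id| requested.map_or(true, |ids| ids.contains(*id)))
            .map(String::as_str)
            .collect()
    }
}
```

With allow_runtime_overrides = false, an override for security_scanner is ignored and the agent keeps its mapped llama2-7b; enabling the flag switches it to the override list, minus any IDs missing from global_models.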

2. Comprehensive Sandbox Profiles

Problem: SLM runners need strict resource and security constraints to prevent abuse.

Solution: Named profiles with full resource, filesystem, process, and network controls.

Benefits:

Reusable security configurations
Defense in depth
Resource predictability
Clear security boundaries
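Because models run inside a named profile, it may be worth checking during validation that a model's declared resource_requirements fit that profile's limits. This check is not part of the validation code above; it is a hypothetical addition, sketched with trimmed-down structs whose field types are assumed. Under such a check, the example TOML's llama2-7b (min_memory_mb = 16384) would not fit the default secure profile (max_memory_mb = 512).

```rust
/// Trimmed-down stand-ins for the configuration structs (field types assumed).
struct ResourceRequirements {
    min_memory_mb: u64,
    preferred_cpu_cores: f64,
}

struct ResourceConstraints {
    max_memory_mb: u64,
    max_cpu_cores: f64,
}

/// Hypothetical fit check. A memory shortfall is reported as an error; a CPU
/// preference above the profile limit only as a warning, on the assumption
/// that the model can still run, just more slowly.
fn fit_problems(
    model_id: &str,
    req: &ResourceRequirements,
    profile_name: &str,
    limits: &ResourceConstraints,
) -> Vec<String> {
    let mut problems = Vec::new();
    if req.min_memory_mb > limits.max_memory_mb {
        problems.push(format!(
            "error: {model_id} needs {} MB but profile '{profile_name}' allows {} MB",
            req.min_memory_mb, limits.max_memory_mb
        ));
    }
    if req.preferred_cpu_cores > limits.max_cpu_cores {
        problems.push(format!(
            "warning: {model_id} prefers {} cores but profile '{profile_name}' allows {}",
            req.preferred_cpu_cores, limits.max_cpu_cores
        ));
    }
    problems
}
```

With the example values, checking llama2-7b (16384 MB, 4.0 cores) against secure (512 MB, 1.0 core) yields one error and one warning.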

3. Configuration Integration

Problem: New features must integrate seamlessly with existing configuration patterns.

Solution: Follow established patterns:

Serde-based serialization
Environment variable overrides
Validation with descriptive errors
Secure handling of sensitive data

Benefits:

Consistent developer experience
Backward compatibility
Operational familiarity
Maintainability

Environment Variable Overrides

The configuration supports environment variable overrides following the existing pattern:

# Enable SLM-first mode
export SLM_FIRST_ENABLED=true

# Override default sandbox profile
export SLM_FIRST_DEFAULT_SANDBOX_PROFILE=standard

# Runtime override capability
export SLM_FIRST_ALLOW_RUNTIME_OVERRIDES=true
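A minimal sketch of applying these overrides, assuming they are read after symbi.toml is parsed and take precedence over it. The SlmFirstOverrides struct, parse_bool, apply_env_overrides, and the accepted boolean spellings are hypothetical. The environment lookup is passed in as a closure to keep the logic testable; in production it would be |key| std::env::var(key).ok().

```rust
/// Hypothetical subset of SlmFirstConfig that environment variables may override.
#[derive(Debug, Clone, PartialEq)]
struct SlmFirstOverrides {
    enabled: bool,
    default_sandbox_profile: String,
    allow_runtime_overrides: bool,
}

/// Accepts "true"/"false"/"1"/"0" (case-insensitive); anything else is an error.
fn parse_bool(key: &str, raw: &str) -> Result<bool, String> {
    match raw.trim().to_ascii_lowercase().as_str() {
        "true" | "1" => Ok(true),
        "false" | "0" => Ok(false),
        _ => Err(format!("{key}: expected a boolean, got '{raw}'")),
    }
}

/// Applies SLM_FIRST_* overrides on top of values already loaded from the TOML file.
fn apply_env_overrides<F>(mut cfg: SlmFirstOverrides, lookup: F) -> Result<SlmFirstOverrides, String>
where
    F: Fn(&str) -> Option<String>,
{
    if let Some(raw) = lookup("SLM_FIRST_ENABLED") {
        cfg.enabled = parse_bool("SLM_FIRST_ENABLED", &raw)?;
    }
    if let Some(raw) = lookup("SLM_FIRST_DEFAULT_SANDBOX_PROFILE") {
        cfg.default_sandbox_profile = raw;
    }
    if let Some(raw) = lookup("SLM_FIRST_ALLOW_RUNTIME_OVERRIDES") {
        cfg.allow_runtime_overrides = parse_bool("SLM_FIRST_ALLOW_RUNTIME_OVERRIDES", &raw)?;
    }
    Ok(cfg)
}
```

Applying the overrides before SlmFirstConfig::validate runs means a mistyped SLM_FIRST_DEFAULT_SANDBOX_PROFILE is still rejected by the existing sandbox-profile check.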

Implementation Considerations

Performance: Models should be loaded and sandboxes created lazily, on first use
Security: All filesystem paths should be canonicalized and validated
Monitoring: Resource usage should be tracked and reported
Extensibility: New model providers and sandbox features should be easy to add
Testing: Comprehensive unit tests for validation logic and edge cases
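Canonicalization matters because a plain string check is easy to bypass: /tmp/sandbox/../../etc/passwd begins with the allowed prefix /tmp/sandbox. The sketch below resolves paths with std::fs::canonicalize before matching. It assumes that a trailing /* in a profile pattern means anything beneath that directory and that denied_paths take precedence over allowed ones; pattern_root and is_path_allowed are hypothetical helpers. Because canonicalize fails for paths that do not exist yet, a write target would need its parent directory resolved instead.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Turns a profile pattern such as "/tmp/sandbox/*" into a canonical directory root.
/// Assumption: a trailing "/*" means "anything beneath this directory".
fn pattern_root(pattern: &str) -> PathBuf {
    let root = Path::new(pattern.strip_suffix("/*").unwrap_or(pattern));
    // Canonicalize the root as well, so symlinked roots (e.g. /tmp on macOS) compare correctly.
    std::fs::canonicalize(root).unwrap_or_else(|_| root.to_path_buf())
}

/// Resolves `requested` (following `..` and symlinks), then checks denied
/// patterns before allowed ones. Returns an error if the path does not exist.
fn is_path_allowed(requested: &Path, allowed: &[String], denied: &[String]) -> io::Result<bool> {
    let resolved = std::fs::canonicalize(requested)?;
    // Path::starts_with compares whole components, so "/tmp/sandboxes" does not match "/tmp/sandbox".
    let under = |pattern: &String| resolved.starts_with(pattern_root(pattern));
    if denied.iter().any(|p| under(p)) {
        return Ok(false);
    }
    Ok(allowed.iter().any(|p| under(p)))
}
```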

Migration Path

Add SlmFirstConfig to existing Config struct with default disabled
Implement validation logic with comprehensive error messages
Add environment variable support
Create example configurations and documentation
Implement runtime API endpoints for dynamic configuration
Add monitoring and logging for SLM operations

This design provides a robust, secure, and extensible foundation for SLM-first capabilities in Symbiont while maintaining consistency with existing patterns and practices.