Scheduling Guide
Overview
Symbiont's scheduling system provides production-grade cron-based task execution for AI agents. The system supports:
- Cron schedules: Traditional cron syntax for recurring tasks
- One-shot jobs: Run once at a specific time
- Heartbeat pattern: Continuous assessment-action-sleep cycles for monitoring agents
- Session isolation: Ephemeral, shared, or fully isolated agent contexts
- Delivery routing: Multiple output channels (Stdout, LogFile, Webhook, Slack, Email, Custom)
- Policy enforcement: Security and compliance checks before execution
- Production hardening: Jitter, concurrency limits, dead-letter queues, and AgentPin verification
Architecture
The scheduling system is built on three core components:
┌─────────────────────┐
│    CronScheduler    │  Background tick loop (1-second intervals)
│     (Tick Loop)     │  Job selection and execution orchestration
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│   SqliteJobStore    │  Persistent job storage
│    (Job Storage)    │  Transaction support, state management
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│DefaultAgentScheduler│  Agent execution runtime
│ (Execution Engine)  │  AgentContext lifecycle management
└─────────────────────┘
CronScheduler
The CronScheduler is the primary entry point. It manages:
- Background tick loop running at 1-second intervals
- Job selection based on next run time
- Concurrency control and jitter injection
- Metrics collection and health monitoring
- Graceful shutdown with in-flight job tracking
SqliteJobStore
The SqliteJobStore provides durable job persistence with:
- ACID transactions for job state updates
- Job lifecycle tracking (Active, Paused, Completed, Failed, DeadLetter)
- Run history with audit trail
- Query capabilities for filtering by status, agent ID, etc.
DefaultAgentScheduler
The DefaultAgentScheduler executes scheduled agents:
- Creates isolated or shared AgentContext instances
- Manages session lifecycle (create, execute, destroy)
- Routes delivery to configured channels
- Enforces policy gates before execution
DSL Syntax
Schedule Block Structure
Schedule blocks are defined in Symbiont DSL files:
schedule {
name: "daily-report"
agent: "reporter-agent"
cron: "0 9 * * *"
session_mode: "ephemeral_with_summary"
delivery: ["stdout", "log_file"]
policy {
require_approval: false
max_runtime: "5m"
}
}
Cron Syntax
Standard cron syntax with five fields:
┌───────────── minute (0-59)
│ ┌─────────── hour (0-23)
│ │ ┌───────── day of month (1-31)
│ │ │ ┌─────── month (1-12)
│ │ │ │ ┌───── day of week (0-6, Sunday = 0)
│ │ │ │ │
* * * * *
Examples:
# Every day at 9 AM
cron: "0 9 * * *"
# Every Monday at 6 PM
cron: "0 18 * * 1"
# Every 15 minutes
cron: "*/15 * * * *"
# First day of every month at midnight
cron: "0 0 1 * *"
One-Shot Jobs (At Syntax)
For jobs that run once at a specific time:
schedule {
name: "deployment-check"
agent: "health-checker"
at: "2026-02-15T14:30:00Z" # ISO 8601 timestamp
delivery: ["webhook"]
webhook_url: "https://ops.example.com/hooks/deployment"
}
Heartbeat Pattern
For continuous monitoring agents that assess → act → sleep:
schedule {
name: "system-monitor"
agent: "heartbeat-agent"
cron: "*/5 * * * *" # Wake every 5 minutes
heartbeat: {
enabled: true
context_mode: "ephemeral_with_summary"
max_iterations: 100 # Safety limit
}
}
The heartbeat agent follows this cycle (sketched in code after the list):
- Assessment: Evaluate system state (e.g., check metrics, logs)
- Action: Take corrective action if needed (e.g., restart service, alert ops)
- Sleep: Wait until next scheduled tick
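The following standalone Rust sketch illustrates the cycle in isolation. The check_metrics and take_corrective_action helpers are placeholders, not part of the Symbiont API, and in a real deployment the sleep step is simply the wait until the next cron tick wakes the agent.

use std::{thread, time::Duration};

fn main() {
    // Mirrors the `max_iterations` safety limit from the heartbeat block
    let max_iterations = 100;
    for i in 0..max_iterations {
        // 1. Assessment: evaluate system state (metrics, logs, ...)
        let needs_action = check_metrics();
        // 2. Action: take corrective action only when something is wrong
        if needs_action {
            take_corrective_action();
        }
        // 3. Sleep: in the scheduler this is the wait for the next cron tick
        println!("heartbeat iteration {i} done");
        thread::sleep(Duration::from_secs(5 * 60));
    }
}

// Placeholder: query CPU/memory/disk and recent error logs here
fn check_metrics() -> bool {
    false
}

// Placeholder: restart a service or alert the ops team here
fn take_corrective_action() {}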
CLI Commands
The symbi cron command provides full lifecycle management:
List Jobs
# List all jobs
symbi cron list
# Filter by status
symbi cron list --status active
symbi cron list --status paused
# Filter by agent
symbi cron list --agent "reporter-agent"
# JSON output
symbi cron list --format json
Add Job
# From DSL file
symbi cron add --file agent.symbi --schedule "daily-report"
# Inline definition (JSON)
symbi cron add --json '{
"name": "quick-task",
"agent_id": "agent-123",
"cron_expr": "0 * * * *"
}'
Remove Job
# By job ID
symbi cron remove <job-id>
# By name
symbi cron remove --name "daily-report"
# Force remove (skip confirmation)
symbi cron remove <job-id> --force
Pause/Resume
# Pause job (stops scheduling, preserves state)
symbi cron pause <job-id>
# Resume paused job
symbi cron resume <job-id>
Status
# Job details with next run time
symbi cron status <job-id>
# Include last 10 run records
symbi cron status <job-id> --history 10
# Watch mode (auto-refresh every 5s)
symbi cron status <job-id> --watch
Run Now
# Trigger immediate execution (bypasses schedule)
symbi cron run <job-id>
# With custom input
symbi cron run <job-id> --input "Check production database"
History
# View run history for a job
symbi cron history <job-id>
# Last 20 runs
symbi cron history <job-id> --limit 20
# Filter by status
symbi cron history <job-id> --status failed
# Export to CSV
symbi cron history <job-id> --format csv > runs.csv
Heartbeat Pattern
HeartbeatContextMode
Controls how context persists across heartbeat iterations (a sketch illustrating the three modes follows their descriptions):
pub enum HeartbeatContextMode {
/// Fresh context each iteration, append summary to run history
EphemeralWithSummary,
/// Shared context across all iterations (memory accumulates)
SharedPersistent,
/// Fresh context each iteration, no summary (stateless)
FullyEphemeral,
}
EphemeralWithSummary (default):
- New AgentContext per iteration
- Summary of previous iteration appended to context
- Prevents unbounded memory growth
- Maintains continuity for related actions
SharedPersistent:
- Single AgentContext reused across all iterations
- Full conversation history preserved
- Higher memory usage
- Best for agents needing deep context (e.g., debugging sessions)
FullyEphemeral:
- New AgentContext per iteration, no carryover
- Lowest memory footprint
- Best for independent checks (e.g., API health probes)
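A minimal sketch of how a scheduler might apply these modes when preparing the next iteration's context. The AgentContext below is a simplified stand-in, not the real Symbiont type, and the enum is re-declared only to keep the snippet self-contained.

#[derive(Clone, Copy)]
pub enum HeartbeatContextMode {
    EphemeralWithSummary,
    SharedPersistent,
    FullyEphemeral,
}

#[derive(Default)]
pub struct AgentContext {
    pub history: Vec<String>, // simplified stand-in for the real context
}

pub fn next_context(
    mode: HeartbeatContextMode,
    previous: Option<AgentContext>,
    previous_summary: Option<String>,
) -> AgentContext {
    match mode {
        // Fresh context, seeded with a summary of the previous iteration
        HeartbeatContextMode::EphemeralWithSummary => {
            let mut ctx = AgentContext::default();
            if let Some(summary) = previous_summary {
                ctx.history.push(summary);
            }
            ctx
        }
        // Reuse the same context so the full history accumulates
        HeartbeatContextMode::SharedPersistent => previous.unwrap_or_default(),
        // Fresh, empty context every iteration
        HeartbeatContextMode::FullyEphemeral => AgentContext::default(),
    }
}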
Heartbeat Agent Example
agent heartbeat_monitor {
model: "claude-sonnet-4.5"
system_prompt: """
You are a system monitoring agent. On each heartbeat:
1. Check system metrics (CPU, memory, disk)
2. Review recent error logs
3. If issues detected, take action:
- Restart services if safe
- Alert ops team via Slack
- Log incident details
4. Summarize findings
5. Return 'sleep' when done
"""
}
schedule {
name: "heartbeat-monitor"
agent: "heartbeat_monitor"
cron: "*/10 * * * *" # Every 10 minutes
heartbeat: {
enabled: true
context_mode: "ephemeral_with_summary"
max_iterations: 50
}
delivery: ["log_file", "slack"]
slack_channel: "#ops-alerts"
}
Session Isolation
Session Modes
pub enum SessionIsolationMode {
/// Ephemeral context with summary carryover (default)
EphemeralWithSummary,
/// Shared persistent context across all runs
SharedPersistent,
/// Fully ephemeral, no state carryover
FullyEphemeral,
}
Configuration:
schedule {
name: "data-pipeline"
agent: "etl-agent"
cron: "0 2 * * *"
# Fresh context per run, summary of previous run included
session_mode: "ephemeral_with_summary"
}
Session Lifecycle
For each scheduled execution:
- Pre-execution: Check concurrency limits, apply jitter
- Session creation: Create AgentContext based on session_mode
- Policy gate: Evaluate policy conditions
- Execution: Run agent with input and context
- Delivery: Route output to configured channels
- Session cleanup: Destroy or persist context based on mode
- Post-execution: Update run record, collect metrics
Delivery Routing
Supported Channels
pub enum DeliveryChannel {
Stdout, // Print to console
LogFile, // Append to job-specific log file
Webhook, // HTTP POST to URL
Slack, // Slack webhook or API
Email, // SMTP email
Custom(String), // User-defined channel
}
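For orientation, the DSL's delivery strings line up with these variants roughly as sketched below. The exact parsing is internal to Symbiont, so treat this mapping as an assumption.

// Assumed mapping between DSL delivery names and DeliveryChannel variants;
// illustrative only, and it presumes DeliveryChannel is in scope.
fn parse_channel(name: &str) -> DeliveryChannel {
    match name {
        "stdout" => DeliveryChannel::Stdout,
        "log_file" => DeliveryChannel::LogFile,
        "webhook" => DeliveryChannel::Webhook,
        "slack" => DeliveryChannel::Slack,
        "email" => DeliveryChannel::Email,
        other => DeliveryChannel::Custom(other.to_string()),
    }
}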
Configuration Examples
Single channel:
schedule {
name: "backup"
agent: "backup-agent"
cron: "0 3 * * *"
delivery: ["log_file"]
}
Multiple channels:
schedule {
name: "security-scan"
agent: "scanner"
cron: "0 1 * * *"
delivery: ["log_file", "slack", "email"]
slack_channel: "#security"
email_recipients: ["ops@example.com", "security@example.com"]
}
Webhook delivery:
schedule {
name: "metrics-report"
agent: "metrics-agent"
cron: "*/30 * * * *"
delivery: ["webhook"]
webhook_url: "https://metrics.example.com/ingest"
webhook_headers: {
"Authorization": "Bearer ${METRICS_API_KEY}"
"Content-Type": "application/json"
}
}
DeliveryRouter Trait
Custom delivery channels implement:
#[async_trait]
pub trait DeliveryRouter: Send + Sync {
async fn route(
&self,
channel: &DeliveryChannel,
job: &CronJobDefinition,
run: &JobRunRecord,
output: &str,
) -> Result<(), SchedulerError>;
}
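As an illustration, a hypothetical custom channel might look like the sketch below. It assumes DeliveryRouter, DeliveryChannel, CronJobDefinition, JobRunRecord, and SchedulerError are imported from the scheduler crate and that the async_trait crate is available; the PagerDutyRouter struct and the "pagerduty" channel name are made up for this example.

use async_trait::async_trait;

pub struct PagerDutyRouter {
    routing_key: String,
}

#[async_trait]
impl DeliveryRouter for PagerDutyRouter {
    async fn route(
        &self,
        channel: &DeliveryChannel,
        job: &CronJobDefinition,
        run: &JobRunRecord,
        output: &str,
    ) -> Result<(), SchedulerError> {
        // Only handle our own Custom("pagerduty") channel; ignore everything else
        if !matches!(channel, DeliveryChannel::Custom(name) if name == "pagerduty") {
            return Ok(());
        }
        // Illustrative only: forward the agent output to an external alerting API
        println!("[pagerduty:{}] {}", self.routing_key, output);
        let _ = (job, run); // job/run metadata could enrich the alert payload
        Ok(())
    }
}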
Policy Enforcement
PolicyGate
The PolicyGate evaluates schedule-specific policies before execution:
pub struct PolicyGate {
policy_engine: Arc<RealPolicyParser>,
}
impl PolicyGate {
pub async fn evaluate(
&self,
job: &CronJobDefinition,
context: &AgentContext,
) -> Result<SchedulePolicyDecision, SchedulerError>;
}
Policy Conditions
schedule {
name: "production-deploy"
agent: "deploy-agent"
cron: "0 0 * * 0" # Sunday midnight
policy {
# Require human approval before execution
require_approval: true
# Maximum runtime before forced termination
max_runtime: "30m"
# Require specific capabilities
require_capabilities: ["deployment", "production_write"]
# Time window enforcement (UTC)
allowed_hours: {
start: "00:00"
end: "04:00"
}
# Environment restrictions
allowed_environments: ["staging", "production"]
# AgentPin verification required
require_agent_pin: true
}
}
SchedulePolicyDecision
pub enum SchedulePolicyDecision {
Allow,
Deny { reason: String },
RequireApproval { approvers: Vec<String> },
}
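A sketch of how a caller might handle the decision returned by PolicyGate::evaluate. It assumes the SchedulePolicyDecision enum above is in scope and that approval collection happens elsewhere.

// Returns true only when the run may proceed immediately
fn handle_decision(decision: SchedulePolicyDecision) -> bool {
    match decision {
        // Policy passed: the scheduler proceeds with execution
        SchedulePolicyDecision::Allow => true,
        // Policy failed: log the reason and skip this run
        SchedulePolicyDecision::Deny { reason } => {
            eprintln!("schedule blocked by policy: {reason}");
            false
        }
        // Execution is deferred until one of the listed approvers signs off
        SchedulePolicyDecision::RequireApproval { approvers } => {
            eprintln!("awaiting approval from: {}", approvers.join(", "));
            false
        }
    }
}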
Production Hardening
Jitter
Prevents thundering herd when multiple jobs share a schedule:
pub struct CronSchedulerConfig {
pub max_jitter_seconds: u64, // Random delay 0-N seconds
// ...
}
Example:
[scheduler]
max_jitter_seconds = 30 # Spread job starts across 30-second window
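A minimal sketch of the idea, assuming the rand crate: each run start is delayed by a uniformly random number of seconds in [0, max_jitter_seconds]. This is illustrative, not the scheduler's actual implementation.

use rand::Rng;
use std::time::Duration;

// Returns a random start delay so jobs sharing a schedule spread out their starts
fn jitter_delay(max_jitter_seconds: u64) -> Duration {
    if max_jitter_seconds == 0 {
        return Duration::ZERO;
    }
    let secs = rand::thread_rng().gen_range(0..=max_jitter_seconds);
    Duration::from_secs(secs)
}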
Per-Job Concurrency
Limit concurrent runs per job to prevent resource exhaustion:
schedule {
name: "data-sync"
agent: "sync-agent"
cron: "*/5 * * * *"
max_concurrent: 2 # Allow max 2 concurrent runs
}
If a job is already running at max concurrency, the scheduler skips the tick.
Dead-Letter Queue
Jobs exceeding max_retries move to DeadLetter status for manual review:
schedule {
name: "flaky-job"
agent: "unreliable-agent"
cron: "0 * * * *"
max_retries: 3 # After 3 failures, move to dead-letter
}
Recovery:
# List dead-lettered jobs
symbi cron list --status dead_letter
# Review failure reasons
symbi cron history <job-id> --status failed
# Reset job to active after fixing
symbi cron reset <job-id>
AgentPin Verification
Cryptographically verify agent identity before execution:
schedule {
name: "secure-task"
agent: "trusted-agent"
cron: "0 * * * *"
agent_pin_jwt: "${AGENT_PIN_JWT}" # ES256 JWT from agentpin-cli
policy {
require_agent_pin: true
}
}
The scheduler verifies:
- JWT signature using ES256 (ECDSA P-256)
- Agent ID matches the iss claim
- Domain anchor matches expected origin
- Expiry (exp) is valid
Failures trigger a SecurityEventType::AgentPinVerificationFailed audit event.
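The sketch below shows the same checks using the widely used jsonwebtoken crate. It is an illustration, not Symbiont's verification code, and the domain claim name is an assumption.

use jsonwebtoken::{decode, Algorithm, DecodingKey, Validation};
use serde::Deserialize;

#[derive(Deserialize)]
struct AgentPinClaims {
    iss: String,            // agent ID
    exp: usize,             // expiry; rejected automatically when in the past
    #[serde(default)]
    domain: Option<String>, // domain anchor (claim name is an assumption)
}

fn verify_agent_pin(jwt: &str, public_key_pem: &[u8], expected_agent_id: &str) -> bool {
    let Ok(key) = DecodingKey::from_ec_pem(public_key_pem) else {
        return false;
    };
    // ES256 = ECDSA over P-256 with SHA-256; `exp` is validated by default
    let validation = Validation::new(Algorithm::ES256);
    match decode::<AgentPinClaims>(jwt, &key, &validation) {
        Ok(data) => data.claims.iss == expected_agent_id,
        Err(_) => false,
    }
}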
HTTP API Endpoints
Schedule Management
POST /api/v1/schedule Create a new scheduled job.
curl -X POST http://localhost:8080/api/v1/schedule \
-H "Content-Type: application/json" \
-d '{
"name": "hourly-report",
"agent_id": "reporter",
"cron_expr": "0 * * * *",
"session_mode": "ephemeral_with_summary",
"delivery": ["stdout"]
}'
GET /api/v1/schedule List all jobs (filterable by status, agent ID).
curl "http://localhost:8080/api/v1/schedule?status=active&agent_id=reporter"
GET /api/v1/schedule/{job_id} Get job details.
curl http://localhost:8080/api/v1/schedule/job-123
PUT /api/v1/schedule/{job_id} Update job (cron expression, delivery, etc.).
curl -X PUT http://localhost:8080/api/v1/schedule/job-123 \
-H "Content-Type: application/json" \
-d '{"cron_expr": "0 */2 * * *"}'
DELETE /api/v1/schedule/{job_id} Delete job.
curl -X DELETE http://localhost:8080/api/v1/schedule/job-123
POST /api/v1/schedule/{job_id}/pause Pause job.
curl -X POST http://localhost:8080/api/v1/schedule/job-123/pause
POST /api/v1/schedule/{job_id}/resume Resume paused job.
curl -X POST http://localhost:8080/api/v1/schedule/job-123/resume
POST /api/v1/schedule/{job_id}/run Trigger immediate execution.
curl -X POST http://localhost:8080/api/v1/schedule/job-123/run \
-H "Content-Type: application/json" \
-d '{"input": "Run with custom input"}'
GET /api/v1/schedule/{job_id}/history Get run history.
curl "http://localhost:8080/api/v1/schedule/job-123/history?limit=20&status=failed"
GET /api/v1/schedule/{job_id}/next_run Get next scheduled run time.
curl http://localhost:8080/api/v1/schedule/job-123/next_run
Health Monitoring
GET /api/v1/health/scheduler Scheduler health and metrics.
curl http://localhost:8080/api/v1/health/scheduler
Response:
{
"status": "healthy",
"active_jobs": 15,
"paused_jobs": 3,
"in_flight_jobs": 2,
"metrics": {
"runs_total": 1234,
"runs_succeeded": 1180,
"runs_failed": 54,
"avg_execution_time_ms": 850
}
}
SDK Examples
JavaScript SDK
import { SymbiontClient } from '@symbiont/sdk-js';
const client = new SymbiontClient({
baseUrl: 'http://localhost:8080',
apiKey: process.env.SYMBI_API_KEY
});
// Create scheduled job
const job = await client.schedule.create({
name: 'daily-backup',
agentId: 'backup-agent',
cronExpr: '0 2 * * *',
sessionMode: 'ephemeral_with_summary',
delivery: ['webhook'],
webhookUrl: 'https://backup.example.com/notify'
});
console.log(`Created job: ${job.id}`);
// List active jobs
const activeJobs = await client.schedule.list({ status: 'active' });
console.log(`Active jobs: ${activeJobs.length}`);
// Get job status
const status = await client.schedule.getStatus(job.id);
console.log(`Next run: ${status.next_run}`);
// Trigger immediate run
await client.schedule.runNow(job.id, { input: 'Backup database' });
// Pause job
await client.schedule.pause(job.id);
// View history
const history = await client.schedule.getHistory(job.id, { limit: 10 });
history.forEach(run => {
console.log(`Run ${run.id}: ${run.status} (${run.execution_time_ms}ms)`);
});
// Resume job
await client.schedule.resume(job.id);
// Delete job
await client.schedule.delete(job.id);
Python SDK
from symbiont import SymbiontClient
client = SymbiontClient(
base_url='http://localhost:8080',
api_key=os.environ['SYMBI_API_KEY']
)
# Create scheduled job
job = client.schedule.create(
name='hourly-metrics',
agent_id='metrics-agent',
cron_expr='0 * * * *',
session_mode='ephemeral_with_summary',
delivery=['slack', 'log_file'],
slack_channel='#metrics'
)
print(f"Created job: {job.id}")
# List jobs for specific agent
jobs = client.schedule.list(agent_id='metrics-agent')
print(f"Found {len(jobs)} jobs for metrics-agent")
# Get job details
details = client.schedule.get(job.id)
print(f"Cron: {details.cron_expr}")
print(f"Next run: {details.next_run}")
# Update cron expression
client.schedule.update(job.id, cron_expr='*/30 * * * *')
# Trigger immediate run
run = client.schedule.run_now(job.id, input='Generate metrics report')
print(f"Run ID: {run.id}")
# Pause during maintenance
client.schedule.pause(job.id)
print("Job paused for maintenance")
# View recent failures
history = client.schedule.get_history(
job.id,
status='failed',
limit=5
)
for run in history:
print(f"Failed run {run.id}: {run.error_message}")
# Resume after maintenance
client.schedule.resume(job.id)
# Check scheduler health
health = client.schedule.health()
print(f"Scheduler status: {health.status}")
print(f"Active jobs: {health.active_jobs}")
print(f"In-flight jobs: {health.in_flight_jobs}")
Configuration
CronSchedulerConfig
pub struct CronSchedulerConfig {
/// Tick interval in seconds (default: 1)
pub tick_interval_seconds: u64,
/// Maximum jitter to prevent thundering herd (default: 0)
pub max_jitter_seconds: u64,
/// Global concurrency limit (default: 10)
pub max_concurrent_jobs: usize,
/// Enable metrics collection (default: true)
pub enable_metrics: bool,
/// Dead-letter retry threshold (default: 3)
pub default_max_retries: u32,
/// Graceful shutdown timeout (default: 30s)
pub shutdown_timeout_seconds: u64,
}
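For programmatic embedding, the struct can be populated directly. The values below mirror the TOML example that follows; since no Default impl is shown here, all fields are set explicitly, and the snippet assumes CronSchedulerConfig is imported from the scheduler crate.

fn scheduler_config() -> CronSchedulerConfig {
    CronSchedulerConfig {
        tick_interval_seconds: 1,
        max_jitter_seconds: 30, // spread starts across a 30-second window
        max_concurrent_jobs: 20,
        enable_metrics: true,
        default_max_retries: 3,
        shutdown_timeout_seconds: 60,
    }
}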
TOML Configuration
[scheduler]
tick_interval_seconds = 1
max_jitter_seconds = 30
max_concurrent_jobs = 20
enable_metrics = true
default_max_retries = 3
shutdown_timeout_seconds = 60
[scheduler.delivery]
# Webhook settings
webhook_timeout_seconds = 30
webhook_retry_attempts = 3
# Slack settings
slack_api_token = "${SLACK_API_TOKEN}"
slack_default_channel = "#ops"
# Email settings
smtp_host = "smtp.example.com"
smtp_port = 587
smtp_username = "${SMTP_USER}"
smtp_password = "${SMTP_PASS}"
email_from = "symbiont@example.com"
Environment Variables
# Scheduler settings
SYMBI_SCHEDULER_MAX_JITTER=30
SYMBI_SCHEDULER_MAX_CONCURRENT=20
# Delivery settings
SYMBI_SLACK_TOKEN=xoxb-...
SYMBI_WEBHOOK_AUTH_HEADER="Bearer secret-token"
# AgentPin verification
SYMBI_AGENTPIN_REQUIRED=true
SYMBI_AGENTPIN_DOMAIN=agent.example.com
Observability
Metrics (Prometheus-compatible)
# Total runs
symbiont_cron_runs_total{job_name="daily-report",status="succeeded"} 450
# Failed runs
symbiont_cron_runs_total{job_name="daily-report",status="failed"} 5
# Execution time histogram
symbiont_cron_execution_duration_seconds{job_name="daily-report"} 1.234
# In-flight jobs gauge
symbiont_cron_in_flight_jobs 3
# Dead-lettered jobs
symbiont_cron_dead_letter_total{job_name="flaky-job"} 2
Audit Events
All scheduler actions emit security events:
pub enum SecurityEventType {
CronJobCreated,
CronJobUpdated,
CronJobDeleted,
CronJobPaused,
CronJobResumed,
CronJobExecuted,
CronJobFailed,
CronJobDeadLettered,
AgentPinVerificationFailed,
}
Query audit log:
symbi audit query --type CronJobFailed --since "2026-02-01" --limit 50
Best Practices
- Use jitter for shared schedules: Prevent multiple jobs from starting simultaneously
- Set concurrency limits: Protect against resource exhaustion
- Monitor dead-letter queue: Review and fix failing jobs regularly
- Use EphemeralWithSummary: Prevents unbounded memory growth in long-running heartbeats
- Enable AgentPin verification: Cryptographically verify agent identity
- Configure delivery routing: Use appropriate channels for different job types
- Set policy gates: Enforce time windows, approvals, and capability checks
- Use heartbeat pattern for monitoring: Continuous assessment-action-sleep cycles
- Test schedules in staging: Validate cron expressions and job logic before production
- Export metrics: Integrate with Prometheus/Grafana for operational visibility
Troubleshooting
Job Not Running
- Check job status: symbi cron status <job-id>
- Verify cron expression: Use crontab.guru
- Check scheduler health: curl http://localhost:8080/api/v1/health/scheduler
- Review logs: symbi logs --filter scheduler --level debug
Job Failing Repeatedly
- View history: symbi cron history <job-id> --status failed
- Check error messages in run records
- Verify agent configuration and capabilities
- Test agent outside scheduler: symbi run <agent-id> --input "test"
- Check policy gates: Ensure time windows and capabilities match
Dead-Lettered Job
- List dead-letter jobs: symbi cron list --status dead_letter
- Review failure pattern: symbi cron history <job-id>
- Fix root cause (agent code, permissions, external dependencies)
- Reset job: symbi cron reset <job-id>
High Memory Usage
- Check session mode: Switch to ephemeral_with_summary or fully_ephemeral
- Reduce heartbeat iterations: Lower max_iterations
- Monitor context size: Review agent output verbosity
- Enable context archiving: Configure retention policies
Migration from v0.9.0
The v1.0.0 release adds production hardening features. Update your job definitions:
schedule {
name: "my-job"
agent: "my-agent"
cron: "0 * * * *"
+
+ # Add concurrency limit
+ max_concurrent: 2
+
+ # Add AgentPin for identity verification
+ agent_pin_jwt: "${AGENT_PIN_JWT}"
+
+ policy {
+ require_agent_pin: true
+ }
}
Update configuration:
[scheduler]
tick_interval_seconds = 1
+ max_jitter_seconds = 30
+ default_max_retries = 3
+ shutdown_timeout_seconds = 60
No breaking API changes. All v0.9.0 jobs continue to work.