Runbear Feature Request: Agent-to-Agent Invocation & KB Search Subagents
Title: Enable Subagent Creation for Knowledge Base Search Delegation
Priority: High
Use Case: Token Budget Management & Scalability
Submitted by: Minted Analytics Team (Ask_Sam Agent)
***
Problem Statement
Our Ask_Sam agent (ff1379fe-51d1-4bd4-a870-e5ef6dc11d88) frequently hits the 1M token context limit due to:
1. 144 active tools consuming ~2,400 tokens per turn just for tool definitions
2. runbear_file_search results returning 5,000-15,000 tokens per query
3. Long-term memory (LTM) accumulating ~30,000 tokens of historical learnings
4. Multi-turn conversations requiring full context retention
Current Impact:
• Conversations terminated prematurely with "token limit exceeded" errors
• User experience disrupted mid-analysis
• Complex requests cannot be completed
***
Requested Feature: Agent-to-Agent Invocation
Core Capability:
runbear_invoke_agent(
    agent_id: str,                # Specialized subagent ID
    task: str,                    # Scoped instruction
    max_context_return: int,      # Token limit for summary response
    pass_context: bool = False    # Whether to share parent context
)
Returns:
• Compressed summary (user-defined token limit)
• Full results remain in subagent's context
• Parent agent receives only the synthesized response
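Since `runbear_invoke_agent` does not exist yet, here is a minimal runnable sketch of the contract we have in mind; the stub body and the agent ID are illustrative only:

```python
# Stub of the proposed runbear_invoke_agent -- NOT a real Runbear API.
# It illustrates the intended contract: the subagent's raw work stays out
# of the parent's context, and the parent receives only a bounded summary.

def runbear_invoke_agent(agent_id: str, task: str,
                         max_context_return: int,
                         pass_context: bool = False) -> str:
    """A real implementation would run the subagent in an isolated
    context; here we fake the result and enforce the token budget."""
    raw_result = f"[subagent {agent_id} completed task: {task}]"
    tokens = raw_result.split()                   # crude whitespace tokenizer
    return " ".join(tokens[:max_context_return])  # enforce summary budget

summary = runbear_invoke_agent(
    agent_id="kb_search_specialist",  # hypothetical template ID
    task="Find KB references to bi_customers.mm_status",
    max_context_return=300,
)
```

The key property is that `summary` is all the parent ever sees, regardless of how many tokens the subagent consumed internally.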
***
Proposed Architecture
Pattern 1: Knowledge Base Search Specialist
Ask_Sam (Parent Agent)
↓ Invokes
KB_Search_Specialist (Subagent)
- Has runbear_file_search access
- Executes 3-5 searches
- Receives 15,000 tokens of raw results
- Summarizes to 300 tokens
- Returns: "Summary: [key findings]"
↓ Returns to
Ask_Sam (receives 300 tokens, not 15,000)
Pattern 2: Code Analysis Specialist
Ask_Sam
↓ Invokes
GitLab_Code_Agent (Subagent)
- Has GitLab MCP access
- Searches 5 repositories
- Analyzes data lineage
- Returns: "Field X defined in file Y:Z, depends on tables A, B"
↓ Returns concise answer
Ask_Sam
***
Implementation Options
Option A: Dedicated Subagent Templates (Recommended)
Runbear provides pre-built specialist agents:
• kb_search_specialist - KB search + summarization
• code_analysis_specialist - GitLab/GitHub code inspection
• data_query_specialist - SQL/Snowflake/Hex analysis
Option B: Generic Agent Invocation
Allow any Runbear agent to invoke any other agent in the same workspace, with:
• Configurable result compression
• Context isolation (subagent context doesn't pollute parent)
• Timeout controls
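The "timeout controls" bullet could be layered on with standard-library tooling even before a native feature ships; a sketch, where the subagent callable is a placeholder:

```python
# Sketch of a timeout guard around a (hypothetical) subagent invocation,
# using only the Python standard library.
import concurrent.futures

def invoke_with_timeout(fn, *args, timeout_s=30.0, **kwargs):
    """Run fn in a worker thread and give up after timeout_s seconds,
    leaving the parent agent's context untouched on failure."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fn, *args, **kwargs)
        try:
            return future.result(timeout=timeout_s)
        except concurrent.futures.TimeoutError:
            return "[subagent timed out; returning nothing to parent]"

# Placeholder subagent: a real one would be a Runbear agent invocation.
result = invoke_with_timeout(lambda q: f"Summary for {q}", "mm_status",
                             timeout_s=5.0)
```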
Option C: Built-in Search Compression
Enhance runbear_file_search itself:
runbear_file_search(
    query: List[str],
    max_num_results: int = 5,
    compress_results: bool = True,   # NEW
    compression_prompt: str = None   # NEW: "Summarize in <200 words"
)
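A simulation of how the two proposed parameters would change behavior; both flags are hypothetical and the function body here is a placeholder, not the real `runbear_file_search`:

```python
# Simulated semantics of the proposed flags. The real runbear_file_search
# has neither parameter today; the bodies below are placeholders.
from typing import List, Optional

def runbear_file_search(query: List[str], max_num_results: int = 5,
                        compress_results: bool = False,
                        compression_prompt: Optional[str] = None) -> List[str]:
    raw = [f"passage {i} for {q}" for q in query for i in range(max_num_results)]
    if not compress_results:
        return raw  # current behavior: raw passages, thousands of tokens
    # A real implementation would apply compression_prompt via an LLM.
    return [f"Compressed ({compression_prompt}): {len(raw)} passages matched"]

out = runbear_file_search(["mm_status"], compress_results=True,
                          compression_prompt="Summarize in <200 words")
```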
***
Expected Benefits
Token Savings:
• Current: 15,000 tokens per KB search
• With subagent: 300 tokens per delegated search
• Savings: ~14,700 tokens per search (~98% reduction)
Scalability:
• Parent agent can handle longer conversations (50+ turns vs 15-20 currently)
• Complex multi-step analysis becomes feasible
• Parallel specialist invocations possible
User Experience:
• No more "token limit exceeded" mid-conversation
• More sophisticated analysis without manual conversation splitting
• Better separation of concerns (parent = orchestration, subagents = execution)
***
Similar Patterns in Industry
• Hex Threads Agent: Uses create_thread / get_thread for analysis delegation
• Cursor Agent: Uses CURSOR_LAUNCH_AGENT for code tasks
• OpenAI Assistants API: Supports agent-to-agent tool calling
• LangChain: Multi-agent orchestration with context isolation
***
Proposed Pilot
Test Case: Minted Analytics Ask_Sam
Scenario: "Explain data lineage for bi_customers.mm_status"
Current Flow (45,000 tokens):
1. Search KB for "mm_status" → 8,000 tokens
2. Search GitLab refs → 10,000 tokens
3. Search Slack discussions → 9,000 tokens
4. Synthesize answer → 3,000 tokens
5. Tool defs (144 tools) → 2,400 tokens
6. LTM → 12,000 tokens
Total: ~45,000 tokens
With Subagents (~8,000 tokens):
1. Invoke KB_Search_Specialist("mm_status") → 300 tokens returned
2. Invoke GitLab_Code_Specialist("mm_status") → 400 tokens returned
3. Synthesize answer → 3,000 tokens
4. Tool defs (60 tools, reduced) → 1,000 tokens
5. LTM → 3,000 tokens
Total: ~8,000 tokens (82% reduction)
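The accounting above, as a quick arithmetic check; figures are the estimates from this request, not measurements:

```python
# Token estimates from the pilot scenario.
current = {"kb_search": 8_000, "gitlab": 10_000, "slack": 9_000,
           "synthesis": 3_000, "tool_defs": 2_400, "ltm": 12_000}
with_subagents = {"kb_specialist": 300, "gitlab_specialist": 400,
                  "synthesis": 3_000, "tool_defs": 1_000, "ltm": 3_000}

before = sum(current.values())        # 44,400 (~45,000 in the text)
after = sum(with_subagents.values())  # 7,700 (~8,000 in the text)
reduction = 1 - after / before        # ~0.83 on the exact sums
```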
***
Alternative Workarounds (If Feature Delayed)
1. Disable unused tools (144 → 80) - saves ~1,000 tokens/turn
2. Aggressive LTM pruning - archive entries older than 90 days
3. External MCP server - custom KB search compression service
4. Manual conversation splits - user restarts every 15 turns (poor UX)
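Workaround 1's savings estimate is consistent with the ~2,400-token tool-definition figure from the problem statement:

```python
# Per-tool definition cost implied by the numbers above.
tokens_per_tool = 2_400 / 144                      # ~16.7 tokens per tool
savings_per_turn = (144 - 80) * tokens_per_tool    # 64 tools removed
```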
***
Contact for Follow-up
Organization: Minted.com
Primary Contact: Patrick Codrington (patrick.codrington@minted.com)
Agent ID: ff1379fe-51d1-4bd4-a870-e5ef6dc11d88
Slack Workspace: minted.slack.com (proj_ant_nothing_to_see_here)
Willing to participate in beta testing: Yes
In Review · 💡 Feature Request · 7 days ago · Patrick Codrington