Advanced Features (not tested for now..)¶
The AI Imaging Agent includes several advanced features for power users and specialized use cases.
Control Tags¶
Control tags modify agent behavior using special syntax in your queries.
Exclude Tools¶
Filter out specific tools from results:
Syntax: [EXCLUDE:tool1|tool2|tool3]
Use cases:
- You've already tried certain tools
- Exclude tools you don't have access to
- Filter by licensing (exclude proprietary tools)
- Remove tools with specific limitations
Example:
You: Segment kidneys [EXCLUDE:totalsegmentator]
Agent: [Returns kidney segmentation tools except TotalSegmentator]
You: Find open-source options [EXCLUDE:proprietarytool1|proprietarytool2]
Agent: [Returns only open-source tools]
Disable Reranking¶
Skip the CrossEncoder reranking step for faster results:
Benefits:
- ✅ Faster retrieval (~2x speedup)
- ✅ Lower computational cost
- ✅ Good for broad exploratory queries
Trade-offs:
- ❌ Potentially less accurate ranking
- ❌ May include less relevant tools
- ❌ Semantic matching only (no cross-attention)
When to use:
- Quick exploration
- Very specific queries (already well-targeted)
- When speed matters more than precision
Force Refinement¶
Request clarification even with good results:
Results in:
- Agent asks clarifying questions
- More focused recommendations
- Opportunity to specify requirements
Example:
You: Analyze this CT scan [REFINE]
Agent: I can see this is a CT scan. What specific analysis are you interested in?
- Organ segmentation
- Tumor detection
- Bone analysis
- Vascular analysis
You: Tumor detection
Agent: [Provides tumor detection tools specifically]
Alternative Searches¶
Request the agent to search with different strategies.
Requesting Alternatives¶
Use natural language:
What happens:
- Agent formulates alternative query
- Uses semantic neighbors for expansion
- Searches with different emphasis
- Returns new set of recommendations
Limit: Up to 3 alternative searches per conversation
When to Use¶
- Initial results don't quite match
- Want to see different approaches
- Exploring the catalog
- Looking for specialized tools
Example conversation:
You: Segment lungs from this CT
Agent: [Provides general lung segmentation tools]
You: Can you search for alternatives?
Agent: [Searches with emphasis on "airway segmentation", "pulmonary analysis"]
You: Show me other options
Agent: [Searches with emphasis on "CT thorax processing", "respiratory imaging"]
Multi-Model Support¶
Selecting Different Models¶
The UI provides a model selector dropdown:
Available models (configurable in config.yaml):
- gpt-4o-mini: Faster, lower cost
- gpt-4o: Higher accuracy, multimodal
- gpt-5.1: Latest capabilities (if available)
- Custom endpoints: EPFL, local servers, etc.
Model Trade-offs¶
| Model | Speed | Cost | Accuracy | Vision |
|---|---|---|---|---|
| gpt-4o-mini | ⚡⚡⚡ | 💰 | ⭐⭐⭐ | ✅ |
| gpt-4o | ⚡⚡ | 💰💰 | ⭐⭐⭐⭐ | ✅✅ |
| gpt-5.1 | ⚡ | 💰💰💰 | ⭐⭐⭐⭐⭐ | ✅✅✅ |
When to Switch Models¶
Use gpt-4o-mini when:
- Doing quick explorations
- Cost is a concern
- Tasks are straightforward
- Query is well-specified
Use gpt-4o when:
- Complex visual analysis needed
- Accuracy is critical
- Ambiguous queries
- Multi-step reasoning required
Use gpt-5.1 when:
- Maximum accuracy needed
- Complex multi-modal tasks
- Research/publication work
Repository Info Tool¶
What It Does¶
The agent can fetch detailed information about GitHub repositories:
You: Tell me about TotalSegmentator
Agent: [Fetches repo info from GitHub via DeepWiki or repocards]
Repository: wasserth/TotalSegmentator
Description: Automated multi-organ segmentation in CT and MR images
Stars: 1.2k
Language: Python
Topics: segmentation, medical-imaging, deep-learning
Last Updated: 2024-03-15
License: Apache-2.0
Data Sources¶
- DeepWiki MCP (primary): Fast, pre-indexed repository documentation
- Repocards (fallback): Direct library-based fetch
Usage¶
Ask about tools naturally:
Conversation State Management¶
State Tracking¶
The agent maintains state across conversation:
- Uploaded files: All files in session
- Preview images: Converted images for VLM
- Excluded tools: Tools filtered via
[EXCLUDE:] - Conversation history: Previous messages and context
- Turn counter: Current conversation turn
Viewing State¶
In the sidebar (debug mode):
{
"conversation_turn": 3,
"uploaded_files": ["scan.dcm", "brain.nii"],
"excluded_tools": ["tool1", "tool2"],
"preview_images": ["/tmp/scan_preview.png"]
}
Resetting State¶
To start fresh: - Refresh the page - Clear uploaded files - Start new conversation
Query Expansion¶
How It Works¶
Your query is automatically expanded with semantic neighbors:
Original: "segment brain"
Expanded: "segment brain segmentation parcellation extraction
anatomy neuroimaging cranial"
Based on:
- BGE-M3 embeddings
- Catalog vocabulary
- Cosine similarity >0.75
- Top 10 neighbors
Benefits¶
- ✅ Finds tools using different terminology
- ✅ Broader coverage of catalog
- ✅ Handles synonyms automatically
- ✅ No manual synonym dictionaries
Customization¶
Expansion is automatic but considers: - Your exact query terms (boosted weight) - Semantically similar terms - Format tokens from uploaded files
Format-Aware Matching¶
Input Format Tokens¶
File uploads add format tokens to queries:
How It Helps¶
- Narrows results: Shows compatible tools first
- Boosts relevance: DICOM tools rank higher for DICOM
- Compatibility check: Agent verifies format support
Supported Formats¶
Tokens added for:
- File extension (.dcm, .nii, .png)
- Detected format (DICOM, NIfTI, TIFF)
- Modality for medical images (CT, MRI, XR)
- Dimensions (2D, 3D, 4D)
Iterative Retrieval¶
Auto-Retry on Low Results¶
If initial search returns <5 candidates:
- Retry #1: Alternative query with semantic expansion
- Retry #2: Further expansion with broader terms
- Max 2 retries: Then return best available
Why It Matters¶
- Handles rare/specialized queries
- Finds tools even with limited matches
- Automatic - no user action needed
Example¶
Query: "segment rare anatomical structure"
Initial: 2 candidates found
Retry 1: Expanded to "segment anatomy structure region organ"
Result: 7 candidates found ✓
Debug Features¶
Prompt Logging¶
Enable in .env:
Saves:
- VLM prompts sent to API
- Images included in prompts
- Response JSON
- Timestamp and metadata
Location: logs/prompts/YYYYMMDD_HHMMSS/
Contents:
logs/prompts/20240315_143022/
├── prompt.txt # Text prompt
├── image_0.png # Uploaded image
├── response.json # API response
└── metadata.json # Request metadata
Execution Traces¶
Always shown in chat (expandable):
Shows:
- Tool calls made
- Parameters used
- API responses
- Timing information
Catalog Synchronization¶
Auto-Refresh¶
Configured via .env:
Behavior:
- Background thread checks for catalog updates
- Reloads FAISS index if changed
- No UI interruption
- Logs refresh activity
Manual Sync¶
Force synchronization:
Updates:
- Software catalog
- Embeddings
- FAISS index
- Vocabulary for expansion
Advanced Configuration¶
Custom Catalog¶
Use your own tool catalog:
Format: JSONL with schema.org SoftwareSourceCode
API Endpoints¶
Configure custom OpenAI-compatible endpoints in config.yaml:
available_models:
- display_name: "Local LLM"
name: "llama-3.1"
base_url: "http://localhost:8000/v1"
api_key_env: "LOCAL_API_KEY"
Pipeline Parameters¶
Fine-tune retrieval:
TOP_K=8 # Candidates to retrieve
NUM_CHOICES=3 # Final recommendations
RERANK_TOP_N=20 # Candidates before reranking
Next Steps¶
- Dive into Architecture Overview
- Learn about Development and Contributing
- Check Environment Variables Reference