Skip to content

Advanced Features (not tested for now..)

The AI Imaging Agent includes several advanced features for power users and specialized use cases.

Control Tags

Control tags modify agent behavior using special syntax in your queries.

Exclude Tools

Filter out specific tools from results:

Find lung segmentation tools [EXCLUDE:totalsegmentator|medicalsam]

Syntax: [EXCLUDE:tool1|tool2|tool3]

Use cases:

  • You've already tried certain tools
  • Exclude tools you don't have access to
  • Filter by licensing (exclude proprietary tools)
  • Remove tools with specific limitations

Example:

You: Segment kidneys [EXCLUDE:totalsegmentator]
Agent: [Returns kidney segmentation tools except TotalSegmentator]

You: Find open-source options [EXCLUDE:proprietarytool1|proprietarytool2]
Agent: [Returns only open-source tools]

Notes

Only [EXCLUDE:...] is currently interpreted as a control tag for retrieval filtering.

Alternative Searches

Request the agent to search with different strategies.

Requesting Alternatives

Use natural language:

Can you search for alternatives?
Show me other options
Find different tools
What else is available?

What happens:

  • Agent formulates alternative query
  • Uses different phrasing/keywords for broader coverage
  • Searches with different emphasis
  • Returns new set of recommendations

Limit: Up to 3 alternative searches per conversation

When to Use

  • Initial results don't quite match
  • Want to see different approaches
  • Exploring the catalog
  • Looking for specialized tools

Example conversation:

You: Segment lungs from this CT
Agent: [Provides general lung segmentation tools]

You: Can you search for alternatives?
Agent: [Searches with emphasis on "airway segmentation", "pulmonary analysis"]

You: Show me other options
Agent: [Searches with emphasis on "CT thorax processing", "respiratory imaging"]

Multi-Model Support

Selecting Different Models

The UI provides a model selector dropdown:

Available models (configurable in config.yaml):

  • gpt-4o-mini: Faster, lower cost
  • gpt-4o: Higher accuracy, multimodal
  • gpt-5.1: Latest capabilities (if available)
  • Custom endpoints: EPFL, local servers, etc.

Model Trade-offs

Model Speed Cost Accuracy Vision
gpt-4o-mini โšกโšกโšก ๐Ÿ’ฐ โญโญโญ โœ…
gpt-4o โšกโšก ๐Ÿ’ฐ๐Ÿ’ฐ โญโญโญโญ โœ…โœ…
gpt-5.1 โšก ๐Ÿ’ฐ๐Ÿ’ฐ๐Ÿ’ฐ โญโญโญโญโญ โœ…โœ…โœ…

When to Switch Models

Use gpt-4o-mini when:

  • Doing quick explorations
  • Cost is a concern
  • Tasks are straightforward
  • Query is well-specified

Use gpt-4o when:

  • Complex visual analysis needed
  • Accuracy is critical
  • Ambiguous queries
  • Multi-step reasoning required

Use gpt-5.1 when:

  • Maximum accuracy needed
  • Complex multi-modal tasks
  • Research/publication work

Repository Info Tool

What It Does

The agent can fetch detailed information about GitHub repositories:

You: Tell me about TotalSegmentator
Agent: [Fetches repo info from GitHub via DeepWiki or repocards]

Repository: wasserth/TotalSegmentator
Description: Automated multi-organ segmentation in CT and MR images
Stars: 1.2k
Language: Python
Topics: segmentation, medical-imaging, deep-learning
Last Updated: 2024-03-15
License: Apache-2.0

Data Sources

  1. DeepWiki MCP (primary): Fast, pre-indexed repository documentation
  2. Repocards (fallback): Direct library-based fetch

Usage

Ask about tools naturally:

What is [tool name]?
Tell me more about [repository]
Show me details for [tool]

Conversation State Management

State Tracking

The agent maintains state across conversation:

  • Uploaded files: All files in session
  • Preview images: Converted images for VLM
  • Excluded tools: Tools filtered via [EXCLUDE:]
  • Conversation history: Previous messages and context
  • Turn counter: Current conversation turn

Viewing State

In the sidebar (debug mode):

{
  "conversation_turn": 3,
  "uploaded_files": ["scan.dcm", "brain.nii"],
  "excluded_tools": ["tool1", "tool2"],
  "preview_images": ["/tmp/scan_preview.png"]
}

Resetting State

To start fresh: - Refresh the page - Clear uploaded files - Start new conversation

Retrieval Query Behavior

How It Works

The retrieval pipeline currently does not add semantic neighbor terms. Instead, it builds retrieval queries from:

User text: "segment brain"
+ format hints from files: format:DICOM / format:NIfTI
+ compact image metadata: modality, anatomy, dimensions (when available)

Based on:

  • BGE-M3 embeddings
  • Format-aware hinting from uploaded files
  • Metadata-aware context from image inspection

Benefits

  • โœ… Stronger format compatibility matching
  • โœ… Better ranking for modality/dimension-specific tasks
  • โœ… More predictable retrieval behavior

Customization

If initial results are too sparse, the pipeline retries with a broader query formulation automatically.

Format-Aware Matching

Input Format Tokens

File uploads add format tokens to queries:

Uploaded: scan.dcm (DICOM)
Query enhancement: "segment lungs format:DICOM format:CT format:3D"

How It Helps

  • Narrows results: Shows compatible tools first
  • Boosts relevance: DICOM tools rank higher for DICOM
  • Compatibility check: Agent verifies format support

Supported Formats

Tokens added for: - File extension (.dcm, .nii, .png) - Detected format (DICOM, NIfTI, TIFF) - Modality for medical images (CT, MRI, XR) - Dimensions (2D, 3D, 4D)

Iterative Retrieval

Auto-Retry on Low Results

If initial search returns <5 candidates:

  1. Retry #1: Alternative query with semantic expansion
  2. Retry #2: Further expansion with broader terms
  3. Max 2 retries: Then return best available

Why It Matters

  • Handles rare/specialized queries
  • Finds tools even with limited matches
  • Automatic - no user action needed

Example

Query: "segment rare anatomical structure"
Initial: 2 candidates found
Retry 1: Expanded to "segment anatomy structure region organ"
Result: 7 candidates found โœ“

Debug Features

Prompt Logging

Enable in .env:

LOG_PROMPTS=1

Saves:

  • VLM prompts sent to API
  • Images included in prompts
  • Response JSON
  • Timestamp and metadata

Location: logs/prompts/YYYYMMDD_HHMMSS/

Contents:

logs/prompts/20240315_143022/
โ”œโ”€โ”€ prompt.txt          # Text prompt
โ”œโ”€โ”€ image_0.png         # Uploaded image
โ”œโ”€โ”€ response.json       # API response
โ””โ”€โ”€ metadata.json       # Request metadata

Execution Traces

Always shown in chat (expandable):

<details>
<summary>๐Ÿ”ง Execution Trace</summary>
...detailed logs...
</details>

Shows:

  • Tool calls made
  • Parameters used
  • API responses
  • Timing information

Catalog Synchronization

Auto-Refresh

Configured via .env:

SYNC_EVERY_HOURS=24

Behavior:

  • Background thread checks for catalog updates
  • Reloads FAISS index if changed
  • No UI interruption
  • Logs refresh activity

Manual Sync

Force synchronization:

ai_agent sync

Updates:

  • Software catalog
  • Embeddings
  • FAISS index
  • Vocabulary for expansion

Advanced Configuration

Custom Catalog

Use your own tool catalog:

SOFTWARE_CATALOG=/path/to/custom_catalog.jsonl

Format: JSONL with schema.org SoftwareSourceCode

API Endpoints

Configure custom OpenAI-compatible endpoints in config.yaml:

available_models:
  - display_name: "Local LLM"
    name: "llama-3.1"
    base_url: "http://localhost:8000/v1"
    api_key_env: "LOCAL_API_KEY"

Pipeline Parameters

Fine-tune retrieval:

TOP_K=8              # Candidates to retrieve
NUM_CHOICES=3        # Final recommendations
RERANK_TOP_N=20      # Candidates before reranking

Next Steps