Retrieval Pipeline

The retrieval stage is the first phase of the AI Imaging Agent's two-stage pipeline. It performs fast text-based search to find candidate tools from the software catalog.

Overview

Goal: Quickly narrow the software catalog down to the most relevant candidates

Characteristics:

  • ⚡ Fast (~100-300ms)
  • 🔢 Deterministic and reproducible
  • 🚫 No LLM calls
  • 💰 Low cost (no API fees)

Pipeline Stages

graph LR
    A[User Query] --> B[Query Enhancement]
    B --> C[Embedding]
    C --> D[FAISS Search]
    D --> E[CrossEncoder Rerank]
    E --> F[Top-K Candidates]

Step 1: Query Enhancement

Format Token Injection

When users upload files, format tokens are added to the query:

# User uploads: scan.dcm
# User query: "segment lungs"

# Enhanced query:
"segment lungs format:DICOM format:CT format:3D"

Format tokens added:

  • File extension (format:DICOM, format:NIfTI)
  • Image modality from metadata (format:CT, format:MRI)
  • Dimensionality (format:2D, format:3D, format:4D)

Why this helps:

  • Matches tools that support specific formats
  • Boosts DICOM-compatible tools for DICOM input
  • Ensures dimension compatibility (3D tools for volumes)
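The token-injection step above can be sketched as a small helper. The extension mapping and the `metadata_tokens` argument are illustrative assumptions, not the pipeline's actual API:

```python
from typing import List, Optional

# Hypothetical extension-to-token mapping; the real pipeline also derives
# modality and dimensionality tokens from file metadata (e.g. DICOM headers).
EXTENSION_TOKENS = {
    ".dcm": ["format:DICOM"],
    ".nii.gz": ["format:NIfTI", "format:3D"],  # longer suffix checked first
    ".nii": ["format:NIfTI", "format:3D"],
}

def enhance_query(query: str, filename: Optional[str] = None,
                  metadata_tokens: Optional[List[str]] = None) -> str:
    """Append format tokens for the uploaded file to the user query."""
    tokens: List[str] = []
    if filename:
        name = filename.lower()
        for ext, toks in EXTENSION_TOKENS.items():
            if name.endswith(ext):
                tokens.extend(toks)
                break
    if metadata_tokens:  # e.g. ["format:CT", "format:3D"] from image metadata
        tokens.extend(metadata_tokens)
    return " ".join([query, *tokens]) if tokens else query
```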

Control Tag Processing

Special tags are extracted and processed:

query = "segment lungs [EXCLUDE:tool1|tool2] [NO_RERANK]"

# Extracted:
clean_query = "segment lungs"
excluded_tools = ["tool1", "tool2"]
skip_rerank = True

Supported tags:

  • [EXCLUDE:tool1|tool2]: Filter tools from results
  • [NO_RERANK]: Skip CrossEncoder reranking
  • [REFINE]: Force clarification (handled by the agent, not retrieval)
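The tag extraction could be implemented with a regular expression along these lines (a minimal sketch using only the tag names listed above; the real parser may differ):

```python
import re
from typing import List, Tuple

def extract_control_tags(query: str) -> Tuple[str, List[str], bool]:
    """Split control tags out of the raw query string."""
    excluded: List[str] = []
    m = re.search(r"\[EXCLUDE:([^\]]+)\]", query)
    if m:
        excluded = m.group(1).split("|")
    skip_rerank = "[NO_RERANK]" in query
    # Strip all recognized tags, then collapse leftover whitespace
    clean = re.sub(r"\[(?:EXCLUDE:[^\]]+|NO_RERANK|REFINE)\]", "", query)
    return " ".join(clean.split()), excluded, skip_rerank
```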

Step 2: Query Expansion

Semantic Similarity-Based Expansion

Queries are expanded with semantically related terms from the catalog vocabulary:

query = "segment brain"

# Semantic neighbors (cosine similarity > 0.75):
expansion_terms = [
    "segmentation",     # 0.89
    "parcellation",     # 0.82
    "extraction",       # 0.79
    "anatomy",          # 0.78
    "neuroimaging"      # 0.76
]

# Expanded query:
"segment brain segmentation parcellation extraction anatomy neuroimaging"

How it works:

  1. Extract vocabulary from catalog (all tool names, descriptions, keywords)
  2. Embed vocabulary using BGE-M3
  3. At query time, find top-N nearest neighbors (cosine similarity)
  4. Add neighbors to query

Benefits:

  • Automatic synonym handling
  • No manual dictionaries needed
  • Adapts to catalog changes
  • Handles domain-specific terminology

Parameters:

  • Similarity threshold: 0.75
  • Max expansion terms: 10
  • Vocabulary updated on catalog sync
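Steps 3 and 4 of the expansion can be sketched with plain cosine similarity over pre-embedded vocabulary vectors (a toy illustration; in the pipeline the vectors come from BGE-M3, not hand-built tuples):

```python
from math import sqrt
from typing import Dict, Sequence

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def expand_query(query: str, query_vec: Sequence[float],
                 vocab_vecs: Dict[str, Sequence[float]],
                 threshold: float = 0.75, max_terms: int = 10) -> str:
    """Append vocabulary terms whose embeddings exceed the similarity threshold."""
    scored = [(term, cosine(query_vec, vec)) for term, vec in vocab_vecs.items()]
    neighbors = [t for t, s in sorted(scored, key=lambda x: -x[1])
                 if s > threshold][:max_terms]
    return " ".join([query, *neighbors])
```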

Alternative Query Generation

On retry (when initial results < 5 tools):

# Initial query
query1 = "segment rare structure"
# → returns 2 tools: too few!

# Retry 1: broader expansion
query2 = "segment structure anatomy region organ tissue"
# → returns 7 tools: better

# Retry 2: even broader (if still needed)
query3 = "segment analysis detection extraction processing"

Max retries: 2
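The retry loop can be sketched as a small driver that takes the search and broadening strategies as injected callables (`search` and `broaden` are placeholders for the pipeline's actual functions):

```python
from typing import Callable, List

def retrieve_with_retry(query: str,
                        search: Callable[[str], List],
                        broaden: Callable[[str, int], str],
                        min_results: int = 5,
                        max_retries: int = 2) -> List:
    """Retry retrieval with progressively broader queries.

    `search(q)` returns a candidate list; `broaden(q, attempt)` rewrites
    the query more loosely on each retry.
    """
    results = search(query)
    attempt = 0
    while len(results) < min_results and attempt < max_retries:
        attempt += 1
        query = broaden(query, attempt)
        results = search(query)
    return results
```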

Step 3: Embedding

BGE-M3 Model

Model: BAAI/bge-m3

Characteristics:

  • Multilingual (but used for English)
  • 1024-dimensional embeddings
  • Trained for retrieval tasks
  • Fast inference (~10ms per query)

Embedding process:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-m3")
query_vector = model.encode(
    query,
    normalize_embeddings=True  # L2 normalization for cosine similarity
)
# Returns: np.array of shape (1024,)

Catalog Embedding

Software tools are pre-embedded during indexing:

# For each tool in catalog:
tool_text = f"{tool.name} {tool.description} {' '.join(tool.keywords)}"
tool_vector = model.encode(tool_text, normalize_embeddings=True)

# Store in FAISS index (FAISS expects a 2D array of shape (n, dim))
faiss_index.add(tool_vector.reshape(1, -1))

Index structure:

  • FAISS IndexFlatIP (inner product = cosine similarity for normalized vectors)
  • ~150 tools in current catalog
  • Index size: ~600KB

Step 4: FAISS Search

FAISS performs fast similarity search:

import faiss

# Search for top 20 most similar tools
scores, indices = faiss_index.search(
    query_vector.reshape(1, -1),
    k=20
)

# Returns:
# scores: [0.85, 0.82, 0.79, ...]  # Cosine similarities
# indices: [42, 17, 89, ...]        # Tool IDs in catalog

Search algorithm:

  • IndexFlatIP: Exact search (brute force)
  • Fast for catalog size (~150 tools)
  • Could use IVF for larger catalogs (>10k tools)

Why top-20:

  • More candidates than needed (default final: 8)
  • Provides options for reranking
  • Balances recall vs. later stage cost

Candidate Retrieval

# search() returns arrays of shape (1, k); take the first row
candidates = [catalog[idx] for idx in indices[0]]
candidate_scores = scores[0].tolist()

# Example candidates:
[
    {
        "name": "TotalSegmentator",
        "score": 0.85,
        "description": "Automated multi-organ segmentation...",
        ...
    },
    ...
]

Step 5: CrossEncoder Reranking

Why Rerank?

BiEncoder (BGE-M3) limitations:

  • Encodes query and documents independently
  • No query-document interaction
  • Misses subtle relevance signals

CrossEncoder benefits:

  • Jointly encodes query + document
  • Cross-attention between query and doc
  • More accurate relevance scoring
  • Slower (not suitable for entire catalog)

Reranking Model

Model: BAAI/bge-reranker-v2-m3

Characteristics:

  • Built on the multilingual BGE-M3 backbone
  • Outputs a direct relevance score per pair (no embedding step)
  • Fast enough for a short candidate list (~50ms per pair)

Reranking process:

import numpy as np
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("BAAI/bge-reranker-v2-m3")

# Score each (query, candidate) pair
pairs = [(query, candidate.description) for candidate in candidates]
rerank_scores = reranker.predict(pairs)

# Re-sort by rerank scores (descending)
sorted_indices = np.argsort(rerank_scores)[::-1]
reranked_candidates = [candidates[i] for i in sorted_indices][:8]

Output: Top-8 candidates with refined ranking

Skip Reranking

With [NO_RERANK] tag:

if "[NO_RERANK]" in query:
    # Skip CrossEncoder, use FAISS scores directly
    final_candidates = candidates[:8]
else:
    # Apply reranking
    final_candidates = rerank(candidates)[:8]

Trade-off:

  • ✅ Faster (~200ms saved)
  • ❌ Potentially less accurate
  • Good for: Quick exploration, well-specified queries

Output Format

Candidate Schema

Each candidate passed to Stage 2:

{
    "name": "TotalSegmentator",
    "description": "Automated multi-organ segmentation for CT and MRI",
    "url": "https://github.com/wasserth/TotalSegmentator",
    "keywords": ["segmentation", "medical-imaging", "CT", "MRI"],
    "license": "Apache-2.0",
    "supporting_data": {
        "modalities": ["CT", "MRI"],
        "dimensions": ["3D"],
        "formats": ["DICOM", "NIfTI"],
        "demo_url": "https://huggingface.co/spaces/..."
    },
    "retrieval_score": 0.85,  # FAISS or rerank score
}

Fields used by VLM:

  • Essential for understanding tool capability
  • Formatted as table in VLM prompt
  • Enables comparative reasoning
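The schema above could be modeled as dataclasses along these lines (field names taken from the JSON example; the pipeline's actual types may differ):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SupportingData:
    modalities: List[str] = field(default_factory=list)
    dimensions: List[str] = field(default_factory=list)
    formats: List[str] = field(default_factory=list)
    demo_url: str = ""

@dataclass
class Candidate:
    name: str
    description: str
    url: str
    keywords: List[str]
    license: str
    supporting_data: SupportingData
    retrieval_score: float  # FAISS or rerank score
```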

Index Management

Building the Index

Done during catalog sync:

ai_agent sync

Process:

  1. Load catalog JSONL
  2. Embed each tool description
  3. Build FAISS index
  4. Save to disk: artifacts/rag_index/

Files:

artifacts/rag_index/
├── index.faiss          # FAISS binary index
└── meta.json            # Metadata (tool IDs, config)
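Writing and reading the meta.json sidecar might look like this (a sketch; the actual field layout of meta.json is not specified here, so the keys are assumptions):

```python
import json
from pathlib import Path

def save_index_metadata(out_dir: str, tool_ids, config: dict) -> None:
    """Write the meta.json sidecar next to index.faiss (illustrative keys)."""
    path = Path(out_dir)
    path.mkdir(parents=True, exist_ok=True)
    meta = {"tool_ids": list(tool_ids), "config": config}
    (path / "meta.json").write_text(json.dumps(meta, indent=2))

def load_index_metadata(out_dir: str) -> dict:
    """Read meta.json back; used at startup to map FAISS rows to tool IDs."""
    return json.loads((Path(out_dir) / "meta.json").read_text())
```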

Loading the Index

At startup:

from retriever.vector_index import VectorIndex

index = VectorIndex()
index.load("artifacts/rag_index")

# Ready for queries
results = index.search(query, k=20)

Updating the Index

When the catalog changes:

  1. Sync detects new/modified tools
  2. Re-embed entire catalog (fast, ~2 seconds)
  3. Rebuild FAISS index
  4. Reload in pipeline (no restart needed)

Performance Optimization

Caching

Model loading:

  • BGE-M3 and CrossEncoder loaded once at startup
  • Kept in memory for entire session
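Load-once caching can be done with a memoized factory; `ExpensiveModel` below is a stand-in for the SentenceTransformer/CrossEncoder constructors, to keep the sketch self-contained:

```python
from functools import lru_cache

class ExpensiveModel:
    """Stand-in for SentenceTransformer / CrossEncoder construction."""
    instances = 0

    def __init__(self, name: str):
        type(self).instances += 1  # count real constructions
        self.name = name

@lru_cache(maxsize=None)
def get_model(name: str) -> ExpensiveModel:
    # Constructed once per process; repeat calls return the cached instance.
    return ExpensiveModel(name)
```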

Index loading:

  • FAISS index loaded once
  • Small enough to keep entirely in memory (~600KB for the current catalog)

Batch Processing

For multiple queries (testing, batch mode):

# Batch embed multiple queries
query_vectors = model.encode(queries, batch_size=32)

# Batch FAISS search
scores, indices = index.search(query_vectors, k=20)

GPU Acceleration

Models can use GPU if available:

model = SentenceTransformer("BAAI/bge-m3", device="cuda")
reranker = CrossEncoder("...", device="cuda")

Retrieval Metrics

Monitored Metrics

During retrieval:

  • Number of candidates found: Should be ≥8 for good coverage
  • Average similarity score: Higher = better match
  • Reranking impact: Score change after reranking
  • Query expansion: Number of terms added
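The metrics above could be computed from the raw scores like this (a sketch; the metric names follow the list above but the function is not part of the documented API):

```python
from typing import Dict, List

def retrieval_metrics(faiss_scores: List[float],
                      rerank_scores: List[float],
                      expansion_terms: List[str]) -> Dict[str, float]:
    """Summarize one retrieval run for logging."""
    return {
        "num_candidates": len(faiss_scores),
        "avg_similarity": (sum(faiss_scores) / len(faiss_scores)
                           if faiss_scores else 0.0),
        "top_score_change": (max(rerank_scores) - max(faiss_scores)
                             if rerank_scores and faiss_scores else 0.0),
        "expansion_terms_added": len(expansion_terms),
    }
```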

Logging

Retrieval events logged:

INFO retriever.vector_index: FAISS search: query="segment lungs", results=20, top_score=0.85
INFO retriever.reranker: Reranking 20 candidates, top_score_change=+0.12
INFO retriever.pipeline: Final candidates: 8, avg_score=0.78

Limitations

Current Limitations

  1. English only: No multilingual support (though model is capable)
  2. Small catalog: ~150 tools (FAISS overkill, but scales)
  3. No filtering: Can't filter by license, modality in retrieval (done in Stage 2)
  4. Fixed vocabulary: Expansion vocabulary from catalog only

Future Enhancements

  • Hybrid search: Combine semantic + keyword (BM25)
  • Metadata filters: Pre-filter by modality, license, format
  • Personalization: User history, preferences
  • Dynamic expansion: Learn expansion terms from user behavior

Next Steps