Project Guide¶

This guide is a practical map of the entire repository for contributors and maintainers.

It focuses on: - What each folder is responsible for - Which Python environment and package workflow are the defaults - Which commands are currently valid - What to improve next in architecture, testing, performance, and developer experience

1) System Summary¶

AI Imaging Agent is a RAG plus VLM recommender for imaging software.

High-level flow: 1. User uploads file(s) and enters a task. 2. Retrieval stage finds candidate tools (BGE-M3 + FAISS + reranker). 3. Agent/VLM stage ranks candidates with image-aware reasoning. 4. UI renders ranked recommendations and optional demo links.

Primary orchestrator: src/ai_agent/api/pipeline.py

2) Default Python Environment And Packages (Dev Container Canonical)¶

Assume development is done inside the dev container.

Source of truth: - Dev container: .devcontainer/devcontainer.json - Package metadata and pinned dependencies: pyproject.toml - Secondary dependency list: requirements.txt

Default environment: - OS: Debian Bookworm (dev container) - Python: 3.12 - Environment manager: uv - Virtual environment path: .venv

Recommended commands:

uv venv
uv pip install -e .
uv pip install -e ".[dev]"

Run and test:

ai_agent chat
ai_agent sync
pytest tests/

Important note on command drift: - CLI officially supports chat and sync in src/ai_agent/cli.py. - justfile currently references ai_agent ui, which does not match current CLI modes. - Documentation in this guide follows the actual CLI implementation.

3) Repository Top-Level Map¶

.github/: automation and agent instructions
.devcontainer/: dev container build and editor defaults
docs/: MkDocs source pages
src/: application source code
tests/: test suite
data/: sample data assets
tools/: container/tooling helpers
CHANGELOG.md: release history
config.yaml: model/provider configuration
mkdocs.yml: docs site navigation and theme
pyproject.toml: package metadata, dependencies, entrypoints

4) Detailed Source Folder Responsibilities¶

Package root: src/ai_agent/

4.1 `src/ai_agent/agent/`¶

Purpose: conversational orchestration using PydanticAI.

Key files: - src/ai_agent/agent/agent.py: agent setup, tool wiring, response flow - src/ai_agent/agent/models.py: state/output models - src/ai_agent/agent/utils.py: helper utilities and guardrails - src/ai_agent/agent/tools/: concrete tool implementations - src/ai_agent/agent/tools/mcp/: MCP adapters

Boundary: - Should orchestrate tools and policy, not own retrieval internals.

4.2 `src/ai_agent/api/`¶

Purpose: pipeline orchestration between inputs, retrieval, and selection.

Key file: - src/ai_agent/api/pipeline.py

Responsibilities: - validate files - extract metadata - build retrieval query - call retrieval and selection stages - manage index refresh/reload behavior

Boundary: - Keep UI concerns out of this module.

4.3 `src/ai_agent/retriever/`¶

Purpose: deterministic retrieval stack (no LLM calls).

Key files: - src/ai_agent/retriever/text_embedder.py - src/ai_agent/retriever/vector_index.py - src/ai_agent/retriever/reranker.py - src/ai_agent/retriever/software_doc.py

Boundary: - Retrieval quality logic should stay here.

4.4 `src/ai_agent/generator/`¶

Purpose: selection schema and prompting primitives.

Key files: - src/ai_agent/generator/prompts.py - src/ai_agent/generator/schema.py

Boundary: - Keep this layer focused on schema and prompt contracts, not transport/UI concerns.

4.5 `src/ai_agent/ui/`¶

Purpose: Gradio app and interaction handling.

Key files: - src/ai_agent/ui/app.py - src/ai_agent/ui/handlers.py - src/ai_agent/ui/components.py - src/ai_agent/ui/formatters.py - src/ai_agent/ui/state.py - src/ai_agent/ui/visualizations.py

Boundary: - UI should call orchestrators, not reimplement retrieval/selection decisions.

4.6 `src/ai_agent/utils/`¶

Purpose: cross-cutting utility functions.

Key files: - src/ai_agent/utils/config.py - src/ai_agent/utils/file_validator.py - src/ai_agent/utils/image_meta.py - src/ai_agent/utils/image_io.py - src/ai_agent/utils/previews.py - src/ai_agent/utils/tags.py - src/ai_agent/utils/temp_file_manager.py

Boundary: - Keep utilities reusable and independent from UI-specific logic.

4.7 `src/ai_agent/catalog/`¶

Purpose: catalog synchronization and refresh helpers.

Key file: - src/ai_agent/catalog/sync.py

Boundary: - Catalog IO and sync logic should stay isolated from ranking logic.

4.8 `src/ai_agent/core/`¶

Purpose: shared core coordination such as pipeline registry.

Key file: - src/ai_agent/core/pipeline_registry.py

Boundary: - Keep core primitives minimal and dependency-light.

4.9 `src/ai_agent/queries/`¶

Purpose: query assets used by catalog sync/retrieval support.

Key file: - src/ai_agent/queries/get_relevant_software.rq

Boundary: - Keep query definitions versioned and testable.

4.10 `src/ai_agent/cli.py`¶

Purpose: command entry point and mode dispatch.

Current modes: - chat - sync

This is the command contract docs should follow.

5) Supporting Folders¶

5.1 `tests/`¶

Contains unit/integration tests and test fixtures under tests/data/.

Improvement target: - add more focused tests for UI handler edge cases and tool failure handling.

5.2 `tools/`¶

Container and deployment support assets.

Notable file: - tools/image/Dockerfile (uv + Python 3.12 baseline)

5.3 docs/¶

Documentation source for MkDocs.

Add new pages to mkdocs.yml nav to keep docs discoverable.

6) Known Inconsistencies To Track¶

justfile uses ai_agent ui, while src/ai_agent/cli.py defines chat and sync.
Installation docs often show pip-first flow, while dev container bootstrap is uv-first.
requirements.txt is looser than pyproject.toml, which contains current pinned/runtime dependencies.

7) Codebase Improvement Guidelines¶

7.1 Architecture And Modularity¶

Keep strict stage boundaries: retrieval logic in retriever, selection contracts in generator, orchestration in api.
Minimize cross-layer imports from ui to low-level modules.
Introduce lightweight interface contracts for tool adapters to reduce coupling in agent/tools.
Centralize shared constants/env defaults to reduce duplicated configuration behavior.

7.2 Testing And Quality Gates¶

Add regression tests for format-token query construction and retry broadening behavior.
Add failure-path tests for image preview generation and graceful degradation.
Add contract tests for agent tool outputs (search, alternative search, repo info).
Enforce formatting/lint/type checks in CI (ruff, black --check, mypy, pytest).

7.3 Performance And Retrieval Quality¶

Add benchmark fixtures for retrieval latency and reranker throughput.
Track retrieval quality with a small fixed evaluation set (top-k recall, MRR).
Cache expensive metadata extraction where safe for repeated files in a session.
Make index reload behavior observable with structured counters in logs.

7.4 Developer Experience And CI¶

Align just tasks with real CLI contract (chat/sync).
Add a docs link checker in CI to prevent markdown drift.
Document one canonical local workflow (dev container first, optional local pip fallback).
Add a short maintainer checklist for release prep and changelog updates.

8) Practical Contributor Checklist¶

Before opening a PR: 1. Install/update in editable mode in the active environment. 2. Run tests relevant to changed modules. 3. Validate docs links if docs were touched. 4. Update CHANGELOG.md for user-visible changes. 5. Confirm command and environment docs still match real behavior.

README.md
docs/index.md
docs/architecture/overview.md
docs/development/structure.md
AGENTS.md
.github/copilot-instructions.md

Project Guide¶

1) System Summary¶

2) Default Python Environment And Packages (Dev Container Canonical)¶

3) Repository Top-Level Map¶

4) Detailed Source Folder Responsibilities¶

4.1 src/ai_agent/agent/¶

4.2 src/ai_agent/api/¶

4.3 src/ai_agent/retriever/¶

4.4 src/ai_agent/generator/¶

4.5 src/ai_agent/ui/¶

4.6 src/ai_agent/utils/¶

4.7 src/ai_agent/catalog/¶

4.8 src/ai_agent/core/¶

4.9 src/ai_agent/queries/¶

4.10 src/ai_agent/cli.py¶