Project Guide¶
This guide is a practical map of the entire repository for contributors and maintainers.
It focuses on: - What each folder is responsible for - Which Python environment and package workflow are the defaults - Which commands are currently valid - What to improve next in architecture, testing, performance, and developer experience
1) System Summary¶
AI Imaging Agent is a RAG plus VLM recommender for imaging software.
High-level flow: 1. User uploads file(s) and enters a task. 2. Retrieval stage finds candidate tools (BGE-M3 + FAISS + reranker). 3. Agent/VLM stage ranks candidates with image-aware reasoning. 4. UI renders ranked recommendations and optional demo links.
Primary orchestrator: src/ai_agent/api/pipeline.py
2) Default Python Environment And Packages (Dev Container Canonical)¶
Assume development is done inside the dev container.
Source of truth:
- Dev container: .devcontainer/devcontainer.json
- Package metadata and pinned dependencies: pyproject.toml
- Secondary dependency list: requirements.txt
Default environment: - OS: Debian Bookworm (dev container) - Python: 3.12 - Environment manager: uv - Virtual environment path: .venv
Recommended commands:
Run and test:
Important note on command drift:
- CLI officially supports chat and sync in src/ai_agent/cli.py.
- justfile currently references ai_agent ui, which does not match current CLI modes.
- Documentation in this guide follows the actual CLI implementation.
3) Repository Top-Level Map¶
.github/: automation and agent instructions.devcontainer/: dev container build and editor defaults- docs/: MkDocs source pages
src/: application source codetests/: test suitedata/: sample data assetstools/: container/tooling helpersCHANGELOG.md: release historyconfig.yaml: model/provider configurationmkdocs.yml: docs site navigation and themepyproject.toml: package metadata, dependencies, entrypoints
4) Detailed Source Folder Responsibilities¶
Package root: src/ai_agent/
4.1 src/ai_agent/agent/¶
Purpose: conversational orchestration using PydanticAI.
Key files:
- src/ai_agent/agent/agent.py: agent setup, tool wiring, response flow
- src/ai_agent/agent/models.py: state/output models
- src/ai_agent/agent/utils.py: helper utilities and guardrails
- src/ai_agent/agent/tools/: concrete tool implementations
- src/ai_agent/agent/tools/mcp/: MCP adapters
Boundary: - Should orchestrate tools and policy, not own retrieval internals.
4.2 src/ai_agent/api/¶
Purpose: pipeline orchestration between inputs, retrieval, and selection.
Key file:
- src/ai_agent/api/pipeline.py
Responsibilities: - validate files - extract metadata - build retrieval query - call retrieval and selection stages - manage index refresh/reload behavior
Boundary: - Keep UI concerns out of this module.
4.3 src/ai_agent/retriever/¶
Purpose: deterministic retrieval stack (no LLM calls).
Key files:
- src/ai_agent/retriever/text_embedder.py
- src/ai_agent/retriever/vector_index.py
- src/ai_agent/retriever/reranker.py
- src/ai_agent/retriever/software_doc.py
Boundary: - Retrieval quality logic should stay here.
4.4 src/ai_agent/generator/¶
Purpose: selection schema and prompting primitives.
Key files:
- src/ai_agent/generator/prompts.py
- src/ai_agent/generator/schema.py
Boundary: - Keep this layer focused on schema and prompt contracts, not transport/UI concerns.
4.5 src/ai_agent/ui/¶
Purpose: Gradio app and interaction handling.
Key files:
- src/ai_agent/ui/app.py
- src/ai_agent/ui/handlers.py
- src/ai_agent/ui/components.py
- src/ai_agent/ui/formatters.py
- src/ai_agent/ui/state.py
- src/ai_agent/ui/visualizations.py
Boundary: - UI should call orchestrators, not reimplement retrieval/selection decisions.
4.6 src/ai_agent/utils/¶
Purpose: cross-cutting utility functions.
Key files:
- src/ai_agent/utils/config.py
- src/ai_agent/utils/file_validator.py
- src/ai_agent/utils/image_meta.py
- src/ai_agent/utils/image_io.py
- src/ai_agent/utils/previews.py
- src/ai_agent/utils/tags.py
- src/ai_agent/utils/temp_file_manager.py
Boundary: - Keep utilities reusable and independent from UI-specific logic.
4.7 src/ai_agent/catalog/¶
Purpose: catalog synchronization and refresh helpers.
Key file:
- src/ai_agent/catalog/sync.py
Boundary: - Catalog IO and sync logic should stay isolated from ranking logic.
4.8 src/ai_agent/core/¶
Purpose: shared core coordination such as pipeline registry.
Key file:
- src/ai_agent/core/pipeline_registry.py
Boundary: - Keep core primitives minimal and dependency-light.
4.9 src/ai_agent/queries/¶
Purpose: query assets used by catalog sync/retrieval support.
Key file:
- src/ai_agent/queries/get_relevant_software.rq
Boundary: - Keep query definitions versioned and testable.
4.10 src/ai_agent/cli.py¶
Purpose: command entry point and mode dispatch.
Current modes:
- chat
- sync
This is the command contract docs should follow.
5) Supporting Folders¶
5.1 tests/¶
Contains unit/integration tests and test fixtures under tests/data/.
Improvement target: - add more focused tests for UI handler edge cases and tool failure handling.
5.2 tools/¶
Container and deployment support assets.
Notable file:
- tools/image/Dockerfile (uv + Python 3.12 baseline)
5.3 docs/¶
Documentation source for MkDocs.
Add new pages to mkdocs.yml nav to keep docs discoverable.
6) Known Inconsistencies To Track¶
justfileusesai_agent ui, whilesrc/ai_agent/cli.pydefineschatandsync.- Installation docs often show pip-first flow, while dev container bootstrap is uv-first.
requirements.txtis looser thanpyproject.toml, which contains current pinned/runtime dependencies.
7) Codebase Improvement Guidelines¶
7.1 Architecture And Modularity¶
- Keep strict stage boundaries: retrieval logic in
retriever, selection contracts ingenerator, orchestration inapi. - Minimize cross-layer imports from
uito low-level modules. - Introduce lightweight interface contracts for tool adapters to reduce coupling in
agent/tools. - Centralize shared constants/env defaults to reduce duplicated configuration behavior.
7.2 Testing And Quality Gates¶
- Add regression tests for format-token query construction and retry broadening behavior.
- Add failure-path tests for image preview generation and graceful degradation.
- Add contract tests for agent tool outputs (search, alternative search, repo info).
- Enforce formatting/lint/type checks in CI (
ruff,black --check,mypy,pytest).
7.3 Performance And Retrieval Quality¶
- Add benchmark fixtures for retrieval latency and reranker throughput.
- Track retrieval quality with a small fixed evaluation set (top-k recall, MRR).
- Cache expensive metadata extraction where safe for repeated files in a session.
- Make index reload behavior observable with structured counters in logs.
7.4 Developer Experience And CI¶
- Align
justtasks with real CLI contract (chat/sync). - Add a docs link checker in CI to prevent markdown drift.
- Document one canonical local workflow (dev container first, optional local pip fallback).
- Add a short maintainer checklist for release prep and changelog updates.
8) Practical Contributor Checklist¶
Before opening a PR:
1. Install/update in editable mode in the active environment.
2. Run tests relevant to changed modules.
3. Validate docs links if docs were touched.
4. Update CHANGELOG.md for user-visible changes.
5. Confirm command and environment docs still match real behavior.
9) Related References¶
README.md- docs/index.md
- docs/architecture/overview.md
- docs/development/structure.md
AGENTS.md.github/copilot-instructions.md