This document describes the current Repository.run_analysis() behavior in src/analysis/repositories.py.

```mermaid
flowchart TD
    A[Input repository URL] --> B{Public repository}
    B -- No --> X[Stop with error]
    B -- Yes --> C{Cached and not force refresh}
    C -- Yes --> Z[Load cached result]
    C -- No --> D[Run GIMIE analysis]
    D --> E[Run atomic LLM pipeline]

    subgraph S1[Atomic LLM pipeline]
        E1[Prepare repository context]
        E2[Compile repository context]
        E3[Generate structured output]
        E4[Classify repository type and discipline]
        E5[Identify related organizations]
        E6[Build SoftwareSourceCode model]
    end

    E --> E1 --> E2 --> E3 --> E4 --> E5 --> E6
    E6 --> F[Run ORCID author enrichment]
    F --> G{Run user enrichment}
    G -- Yes --> H[Run user enrichment step]
    G -- No --> I[Skip user enrichment]
    H --> J{Run organization enrichment}
    I --> J
    J -- Yes --> K[Run organization enrichment step]
    J -- No --> L[Skip organization enrichment]
    K --> M[Run linked entities enrichment]
    L --> M
    M --> N{Run author linked entities}
    N -- Yes --> O[Run author linked entities step]
    N -- No --> P[Skip author linked entities]
    O --> Q[Run final EPFL assessment]
    P --> Q
    Q --> R[Run validation]
    R --> S[Save in cache]
    S --> T[Return output and usage stats]
```

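
The branching in the flowchart can be read as a plain ordering of steps. The sketch below is illustrative only: `AnalysisOptions`, `plan_steps`, the flag names, and the step names are hypothetical and are not taken from `src/analysis/repositories.py`. It only shows which steps would run, and in what order, for a given set of inputs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AnalysisOptions:
    """Hypothetical switches mirroring the decision nodes in the flowchart."""
    force_refresh: bool = False
    enrich_users: bool = False
    enrich_orgs: bool = False
    enrich_author_linked_entities: bool = False


# Sub-steps of the atomic LLM pipeline, in flowchart order (names are illustrative).
ATOMIC_LLM_PIPELINE = [
    "prepare_repository_context",
    "compile_repository_context",
    "generate_structured_output",
    "classify_repository_type_and_discipline",
    "identify_related_organizations",
    "build_software_source_code_model",
]


def plan_steps(is_public: bool, is_cached: bool, opts: AnalysisOptions) -> List[str]:
    """Return the ordered step names the flowchart would execute."""
    if not is_public:
        # "Stop with error" branch.
        raise ValueError("repository is not public")
    if is_cached and not opts.force_refresh:
        # Cache hit without a forced refresh short-circuits the whole analysis.
        return ["load_cached_result"]

    steps = ["gimie_analysis", *ATOMIC_LLM_PIPELINE, "orcid_author_enrichment"]
    if opts.enrich_users:
        steps.append("user_enrichment")
    if opts.enrich_orgs:
        steps.append("organization_enrichment")
    # Linked entities enrichment runs on both branches of the organization check.
    steps.append("linked_entities_enrichment")
    if opts.enrich_author_linked_entities:
        steps.append("author_linked_entities")
    steps += ["epfl_assessment", "validation", "save_in_cache", "return_output_and_stats"]
    return steps
```

For example, `plan_steps(True, True, AnalysisOptions())` yields only the cache load, while setting `force_refresh=True` on the same inputs yields the full sequence starting at the GIMIE analysis.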
The repository analysis aggregates both reported token counts (`input_tokens`, `output_tokens`) and estimated token counts (`estimated_input_tokens`, `estimated_output_tokens`). These values are returned in `APIOutput.stats`.
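
As a minimal sketch of how per-step usage could be summed into one record, the example below uses the four field names from the text. The `UsageStats` class and `aggregate_usage` function are hypothetical, and the actual shape of `APIOutput.stats` may differ.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class UsageStats:
    """Hypothetical container holding the four counters named in the text."""
    input_tokens: int = 0
    output_tokens: int = 0
    estimated_input_tokens: int = 0
    estimated_output_tokens: int = 0

    def __add__(self, other: "UsageStats") -> "UsageStats":
        # Field-wise sum; the two kinds of count are kept separate.
        return UsageStats(
            self.input_tokens + other.input_tokens,
            self.output_tokens + other.output_tokens,
            self.estimated_input_tokens + other.estimated_input_tokens,
            self.estimated_output_tokens + other.estimated_output_tokens,
        )


def aggregate_usage(step_stats: Iterable[UsageStats]) -> UsageStats:
    """Sum the usage of every pipeline step into a single record."""
    total = UsageStats()
    for stats in step_stats:
        total = total + stats
    return total
```

Keeping the estimated counters in their own fields means a step that only has an estimate still contributes to the totals without being mixed into the non-estimated figures.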