# Core Concepts

**Gunz-ML** is more than a logging utility; it is a research-centric SDK designed to manage the entire lifecycle of deep learning experiments in a distributed environment.

## 1. The Research SDK Philosophy

Most experiment trackers are passive sinks for data. Gunz-ML acts as a **bridge**. It allows you to:

* **Write:** Log high-frequency metrics and artifacts during training.
* **Read:** Query past results to inform the current HPO (Hyperparameter Optimization) loop.
* **Extract:** Programmatically download artifacts from Juno for downstream analysis in notebooks.

## 2. Tracking vs. Management

The library distinguishes between two levels of operation:

* **Tracking (`gunz_ml.integrations`):** Low-level logic that ensures metrics reach MLflow and Optuna without database locks or race conditions.
* **Management (`gunz_ml.management`):** High-level logic (e.g., `TrackingManager`) used to find the best runs, prune failed trials, and generate comparison reports across studies.

## 3. Distributed Safety

In a Slurm-based cluster environment, multiple workers often try to initialize the same study simultaneously. Gunz-ML implements an **Initialization-First Policy**:

* Studies are pre-scaffolded using the `gunz-ml init` CLI.
* Workers call `safe_set_experiment` to verify the environment is ready before starting, preventing lock errors and duplicate-initialization races against the MariaDB backend.

## 4. The Juno Ecosystem

Gunz-ML is designed to communicate with **Juno**, the unified experiment infrastructure:

* **MLflow:** Stores run metadata, parameters, and time-series metrics.
* **Optuna (MariaDB):** Stores the relational data for HPO trials.
* **MinIO (S3):** Stores large binary artifacts (model checkpoints, `.pt` files, and plots).

By standardizing on these backends, Gunz-ML ensures that your research is reproducible, queryable, and persistent.
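The initialization-first flow in Distributed Safety can be sketched with a stdlib-only stand-in. Everything here is illustrative, not the Gunz-ML API: `wait_until_ready` and its retry parameters are hypothetical names, and the `study_is_ready` probe stands in for whatever check `safe_set_experiment` performs against the backend.

```python
import time

def wait_until_ready(study_is_ready, retries=5, base_delay=0.01):
    """Initialization-first guard (hypothetical sketch): a worker polls
    until the pre-scaffolded study is visible, rather than racing other
    workers to create it and triggering backend lock errors."""
    for attempt in range(retries):
        if study_is_ready():
            return True
        # Back off exponentially between polls so a slow `gunz-ml init`
        # is not hammered by every Slurm worker at once.
        time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("study not initialized; run `gunz-ml init` first")
```

In practice the probe would be a cheap backend query (for instance, checking that the Optuna study row exists); the worker only proceeds to training once the probe succeeds.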
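The management level's "find the best runs, prune failed trials" behavior can likewise be illustrated with a small self-contained sketch. The function and the run-record shape below are assumptions for illustration only, not `TrackingManager`'s actual interface:

```python
def best_runs(runs, metric="val_loss", k=3, minimize=True):
    """Management-level query sketch (hypothetical): rank finished runs
    by a metric, silently skipping failed trials and runs that never
    logged the metric -- the kind of filtering a TrackingManager-style
    helper performs across a study."""
    finished = [
        r for r in runs
        if r["status"] == "FINISHED" and metric in r["metrics"]
    ]
    # Sort ascending for losses, descending for scores like accuracy.
    finished.sort(key=lambda r: r["metrics"][metric], reverse=not minimize)
    return finished[:k]
```

Keeping the query read-only like this is what makes the read side of the bridge safe to call from inside a live HPO loop: it informs the next trial without mutating tracking state.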