structuring deep learning projects
At some point a deep learning project leaps out of the single Jupyter notebook and into a more structured codebase on its way to deployment, either standalone or as part of a larger system. Over the years, through trial, error and sleuthing on github, I have settled on a basic, cookie-cutter structure that works pretty well for me for most low TRL academic projects. Here is a quick overview of the structure along with a brief description of each folder/file.
The Structure
my_deep_learning_project/
│├── src/
│ ├── __init__.py
│ ├── dataloading/
│ │ ├── __init__.py
│ │ ├── dataset.py
│ ├── models/
│ │ ├── __init__.py
│ │ ├── baselines.py
│ │ ├── proposed_models.py
│ ├── experiments/
│ │ ├── __init__.py
│ │ ├── train.py
| | ├── metrics.py
| | ├── evaluate.py
| |── notebooks/
│ | ├── EDA.ipynb
│ | ├── ResultsReview.ipynb
| ├── utils.py
│├── results/
│├── data/
|├── stdout/
│├── requirements.txt
│├── .gitignore
│├── README.md
The Breakdown
-
src/: This is the main source code directory containing all the code for data loading, model definitions, experiments, and utilities.-
dataloading/: Contains code related to datasets and preprocessing. Thedataset.pyfile typically defines custom dataset classes. A benchmarking suite can have multiple datasets each of which can be defined in dedicated<name>_dataset.pyfiles. -
models/: Contains implementations of the models.baselines.pyincludes standard models for comparison, whileproposed_models.pycontains the implementation of the new model(s) being proposed. Alternatively, each model can be defined in its own file viz.<model_name>.py. -
training/: Contains scripts for training and evaluating models.train.pyhandles the training loop,metrics.pydefines evaluation metrics. -
evaluation/: Containsevaluate.pyfor evaluating trained models on test datasets. -
notebooks/: Jupyter notebooks for exploratory data analysis (EDA), demos, reviewing results, etc. -
utils.py: A utility module for helper functions used across the project.
-
-
results/: Directory to store model outputs, logs, and visualizations. -
data/: Directory for raw and processed datasets. -
stdout/: Directory to store standard output logs from training and evaluation runs. -
requirements.txt: Lists the Python dependencies for the project. -
.gitignore: Specifies files and directories to be ignored by Git.
The Example
Check out this template at my repo cookiecutterdl4r.