Running the Pipeline: Production Execution
Source: vignettes/running_the_pipeline.Rmd
Note: This vignette is currently in development and subject to change.
Introduction
This vignette covers production-mode execution in Picard: running your pipeline for official analysis results.
Development and testing workflows are covered in Developing the Pipeline. Production mode adds rigorous validation, semantic versioning, and audit trails to ensure results are reproducible and suitable for publications or regulatory submissions.
The Production Pipeline
The official execution script is main.R in your project root. It:
- Validates your code state (git clean, all changes committed)
- Increments your study version (semantic versioning)
- Runs the complete pipeline with all validations
- Creates a release branch for reproducibility
- Generates PR metadata for code review
- Saves production-quality results in a versioned folder
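The version increment in the second step follows semantic versioning (major.minor.patch). As an illustrative sketch of what a bump does to a version string (this is not picard's actual implementation; `bumpVersion` is a hypothetical helper):

```r
# Illustrative sketch of a semantic-version bump (not picard's internal code)
bumpVersion <- function(version, updateType = c("patch", "minor", "major")) {
  updateType <- match.arg(updateType)
  # Split "1.0.0" into its integer components
  parts <- as.integer(strsplit(version, ".", fixed = TRUE)[[1]])
  switch(updateType,
    major = sprintf("%d.0.0", parts[1] + 1),
    minor = sprintf("%d.%d.0", parts[1], parts[2] + 1),
    patch = sprintf("%d.%d.%d", parts[1], parts[2], parts[3] + 1)
  )
}

bumpVersion("1.0.0", "patch")  # "1.0.1"
bumpVersion("1.0.1", "minor")  # "1.1.0"
bumpVersion("1.1.0", "major")  # "2.0.0"
```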
# Run production pipeline
source("main.R")
When to Run Production
Run main.R for:
- Formal analysis runs: Official results for publications or regulatory submissions
- Final results: When you’re confident in the code and ready for version history
- Multi-database comparisons: Ensures consistency across databases
- Code review: Results go through PR review before acceptance
Production mode places versioned results in exec/results/[database]/[version]/ (e.g., 1.0.0/).
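As a quick sketch, this versioned path can be composed with base R (the database name here is illustrative, not taken from any particular configuration):

```r
# Illustrative: compose the versioned results path described above
database <- "primaryDB"  # example database name (assumption)
version  <- "1.0.0"
resultsDir <- file.path("exec", "results", database, version)
resultsDir  # "exec/results/primaryDB/1.0.0"
```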
Running Production Mode
Prerequisites
Before running production mode:
- Commit all changes: git add . and git commit -m "..."
- Be on develop branch (or a feature branch): git checkout develop
- Pull latest changes: git pull
- Verify configuration: check config.yml for correctness
You can also use the saveWork() function, described in Developing the Pipeline, to save your work and prepare for production.
Basic Usage
# Navigate to study repository
setwd("~/studies/myStudy")
# Run production pipeline with patch version increment
source("main.R")
# When prompted, answer questions about version increment:
# What type of version change? [major/minor/patch]
# You typically choose: "patch" (bug fixes), "minor" (new analyses), "major" (breaking changes)
Programmatic Production Execution
library(picard)
# Run production pipeline directly
execStudyPipeline(
  configBlock = c("primaryDB", "secondaryDB"),
  updateType = "minor"  # Version increment type
)
Understanding the Pipeline Workflow
Production execution follows four main phases:
- Setup: Validate configuration, load execution settings, create output directories
- Generate Cohorts: Load cohort and concept set manifests, validate all definitions exist, generate cohorts in the database, retrieve cohort counts
- Run Analysis Tasks: For each task in analysis/tasks/, load configuration, execute task code, check for errors, record results
- Post-Processing: Generate version logs, create PR metadata, save PENDING_PR.md
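The task-execution phase can be pictured as a loop over the scripts in analysis/tasks/, run in alphabetical order, with errors recorded rather than halting the run. A minimal, self-contained sketch (illustrative only; picard's internal loop may differ):

```r
# Sketch of the task-execution phase: run every script in a task directory
# in alphabetical order, recording per-task success or the error message.
runTasks <- function(taskDir) {
  tasks <- sort(list.files(taskDir, pattern = "\\.R$", full.names = TRUE))
  results <- lapply(tasks, function(task) {
    tryCatch(
      { source(task, local = new.env()); "success" },
      error = function(e) conditionMessage(e)
    )
  })
  setNames(unlist(results), basename(tasks))
}

# Demo with throwaway task files in a temporary directory
dir <- file.path(tempdir(), "tasks")
dir.create(dir, showWarnings = FALSE)
writeLines("x <- 1 + 1", file.path(dir, "01_ok.R"))
writeLines("stop('boom')", file.path(dir, "02_fails.R"))
runTasks(dir)  # named vector: "01_ok.R" = "success", "02_fails.R" = "boom"
```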
Handling Errors and Failures
Production mode validates code state strictly. Common issues:
“Cannot run production pipeline on main branch!” - Solution: Switch to develop: git checkout develop
“Code state validation failed - uncommitted changes” - Solution: Commit all changes: git add . and git commit -m "..."
“Cohort manifest not found” - Solution: See Loading Inputs
Reviewing Results
After production mode, results are organized in versioned folders:
exec/results/[database]/1.1.0/ # Version 1.1.0
├── 00_buildCohorts/
├── 01_firstAnalysisTask/
├── 02_secondAnalysisTask/
└── picard_log_1.1.0_*.txt
Plus additional files for code review:
PENDING_PR.md # PR details for manual review
NEWS.md # Updated with version info
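One caveat when inspecting these versioned folders programmatically: plain string sorting misorders versions once a component passes 9, whereas base R's numeric_version() compares the components numerically (folder names below are illustrative):

```r
# String sort misorders "1.10.0"; numeric_version() compares parts numerically
versions <- c("1.2.0", "1.10.0", "1.0.0")
sort(versions)                              # "1.0.0" "1.10.0" "1.2.0" (wrong order)
versions[order(numeric_version(versions))]  # "1.0.0" "1.2.0" "1.10.0"
```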
Code Review Workflow
Production mode enables structured code review:
- Run pipeline: source("main.R") on develop branch
- Review PENDING_PR.md: Check the proposed version and the changes logged in NEWS.md
- Review code: Inspect changes on the release branch: git checkout release/1.1.0
- Create PR: Use details from PENDING_PR.md to create the PR in GitHub/Bitbucket
- Merge: After review and approval, merge to main
- Cleanup: Run clearPendingPR() to remove the metadata file
Integration with Git
Git Branches
Production mode:
- Creates a release branch: release/[version]
- Runs the pipeline on that branch
- Saves PR metadata pointing to main
- Expects manual PR creation and merge
main ←──── PR from release/1.1.0 ─── release/1.1.0
↑ ↑
│ └─ Production run here
│ (all commits included)
└────────────────────────────────────────── Merged after review
Monitoring Pipeline Execution
Log Files
Picard creates detailed execution logs in exec/logs/:
- Production run: picard_log_1.1.0_*.txt
Review logs to understand which tasks ran, their duration, and any warnings:
[14:32:01] Starting cohort generation...
[14:32:15] Cohort generation completed successfully!
[14:32:16] Executing task 1/3: 01_descriptiveStats.R
[14:33:42] ✓ Task completed successfully
[14:33:43] Executing task 2/3: 02_primaryAnalysis.R
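Given the [HH:MM:SS] prefixes in the log format above, per-step durations can be estimated with a short base-R sketch (the log lines here are the example entries, minus the check mark):

```r
# Sketch: estimate gaps between consecutive log entries from their timestamps
logLines <- c(
  "[14:32:01] Starting cohort generation...",
  "[14:32:15] Cohort generation completed successfully!",
  "[14:32:16] Executing task 1/3: 01_descriptiveStats.R",
  "[14:33:42] Task completed successfully"
)
# Pull out the HH:MM:SS prefix and parse it as a time of day
stamps <- sub("^\\[(\\d{2}:\\d{2}:\\d{2})\\].*", "\\1", logLines)
times  <- as.POSIXct(stamps, format = "%H:%M:%S")
diff(times)  # gaps between consecutive log entries
```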
Troubleshooting
“Tasks not running in expected order”
Picard runs tasks in alphabetical order. Ensure file names have numeric prefixes:
01_buildCohorts.R ✓ Runs first
02_descriptiveAnalysis.R ✓ Runs second
03_primaryAnalysis.R ✓ Runs third
analysis_task.R ✗ Runs last (no prefix)
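The ordering above is plain string sorting, where digit prefixes sort before letters, so unprefixed files land at the end:

```r
# Numeric prefixes control execution order under plain string sorting
taskFiles <- c("analysis_task.R", "02_descriptiveAnalysis.R",
               "03_primaryAnalysis.R", "01_buildCohorts.R")
sort(taskFiles)
# "01_buildCohorts.R" "02_descriptiveAnalysis.R" "03_primaryAnalysis.R" "analysis_task.R"
```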
“Results folder not created”
Manually create output folder:
# Ensure output structure exists
exec_path <- fs::path(here::here(), "exec/results/primary_db/1.0.0")
fs::dir_create(exec_path, recurse = TRUE)
“Previous version results disappeared”
Results are organized by version in exec/results/[database]/[version]/. Check different version folders:
# List all version folders
list.dirs("exec/results/primary_db", recursive = FALSE)
Next Steps
- Develop and test: Use Developing the Pipeline workflows
- Verify code quality: Ensure all tasks run successfully and produce expected results
- Run production: When ready for official results, use main.R
- Review and merge: Follow the code review workflow before accepting changes to the main branch
- Archive results: Use zipAndArchive() to preserve important results
See Also
- Developing the Pipeline - Testing and iteration during development
- The Picard Repository Structure - Where results are organized
- Launching a Study - Initial setup
- Loading Inputs - Cohort and concept set setup
- Post-Processing Steps - Working with results after execution