Orchestrate Pipeline Export with Merging and QC
Source: R/postProcess.R
orchestratePipelineExport.Rd
Orchestrates complete pipeline export process: merges results across all tasks for a specified pipeline version, generates reference files (cohortKey, databaseInfo, schema_review), runs QC validation on cohort completeness, and generates execution metadata.
Arguments
- pipelineVersion
Character. Pipeline version (e.g., "1.0.0")
- dbIds
Character vector of database configuration IDs from config.yml
- resultsPath
Character. Path to results root folder. Defaults to "exec/results"
- exportPath
Character. Path where combined results will be saved. Defaults to "dissemination/export/merge"
- cohortsFolderPath
Character. Path to cohorts folder for the CohortManifest. Defaults to "inputs/cohorts". If the path exists and contains a cohort manifest, generates a cohortKey reference file with id, label, and tags.
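A call might look like the sketch below. The argument names and default paths come from the list above; the database IDs are hypothetical placeholders, and the call is guarded so the snippet only invokes the function when the package providing it is loaded.

```r
# Argument values mirror the documented defaults; the dbIds are hypothetical.
exportArgs <- list(
  pipelineVersion   = "1.0.0",
  dbIds             = c("db_a", "db_b"),  # placeholders; use the IDs from your config.yml
  resultsPath       = "exec/results",
  exportPath        = "dissemination/export/merge",
  cohortsFolderPath = "inputs/cohorts"
)

# Only call the function when the package that provides it is loaded.
if (exists("orchestratePipelineExport", mode = "function")) {
  mergeSummary <- do.call(orchestratePipelineExport, exportArgs)
  print(mergeSummary)
}
```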
Value
Data frame summarizing all merged tasks with columns:
taskName: Name of the task
fileCount: Number of result files found for that task
totalRows: Total rows across all result files
filesExported: Comma-separated list of exported file names
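To illustrate the shape of the return value, the sketch below builds a data frame with the four documented columns and shows two common follow-up checks. The task names, counts, and file names are made up for illustration, and the exact separator used in filesExported is an assumption (the split pattern tolerates a comma with or without a trailing space).

```r
# Illustrative values only; the task and file names are made up.
mergeSummary <- data.frame(
  taskName         = c("taskA", "taskB"),
  fileCount        = c(2L, 1L),
  totalRows        = c(150L, 40L),
  filesExported    = c("a1.csv, a2.csv", "b1.csv"),
  stringsAsFactors = FALSE
)

# Split the comma-separated list back into individual file names per task.
exportedFiles <- strsplit(mergeSummary$filesExported, ",\\s*")
names(exportedFiles) <- mergeSummary$taskName

# Flag tasks for which no result files were found.
emptyTasks <- mergeSummary$taskName[mergeSummary$fileCount == 0]
```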
Details
The function orchestrates the complete pipeline export:
Validates code state (the git working tree must be clean, with no uncommitted changes)
Validates environment state and snapshots renv.lock
Discovers tasks for the specified pipeline version
Merges results across all databases for each task via importAndBind()
Generates reference files: cohortKey.csv, databaseInfo.csv
Reviews schema of exported files (schema_review.csv)
Validates cohort completeness (qc_cohortValidation.csv)
Generates execution metadata (qc_processMeta.csv)
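The first step in the list, the clean-code-state check, could be sketched as follows. This is a minimal illustration that assumes "clean" means `git status --porcelain` reports no changes; `isCleanStatus()` and `assertCleanRepo()` are hypothetical helper names, not functions of the package, and the actual implementation may check the repository differently.

```r
# TRUE when the output of `git status --porcelain` lists no changed files.
isCleanStatus <- function(porcelainLines) {
  length(porcelainLines[nzchar(porcelainLines)]) == 0
}

# Runs git in repoPath and stops if there are uncommitted changes.
assertCleanRepo <- function(repoPath = ".") {
  out <- suppressWarnings(
    system2("git", c("-C", shQuote(repoPath), "status", "--porcelain"),
            stdout = TRUE, stderr = FALSE)
  )
  status <- attr(out, "status")
  if (!is.null(status) && status != 0) {
    stop("git status failed with exit code ", status)
  }
  if (!isCleanStatus(out)) {
    stop("Uncommitted changes found; commit or stash them before exporting.")
  }
  invisible(TRUE)
}
```

Keeping the parsing in a separate pure function makes the check easy to test without a repository at hand.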
Output files created in version export folder:
Merged result CSVs (per task)
cohortKey.csv: Cohort reference with ids and metadata
databaseInfo.csv: Databases included in merge operation
schema_review.csv: Column-level inspection of all files
qc_cohortValidation.csv: Cohort completeness validation results
qc_processMeta.csv: Execution metadata and summary statistics, with columns:
  executionTimestamp: When the export ran
  pipelineVersion: Version being exported
  codeCommitSha: Git commit SHA of code at execution time
  lockfileHash: Hash of renv.lock for dependency reproducibility
  filesExported: Comma-separated list of exported file names
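As a rough sketch of how a metadata record with these fields could be assembled, the example below builds a one-row data frame. `buildProcessMeta()` is a hypothetical helper; the MD5 hash via `tools::md5sum()` and the UTC timestamp format are assumptions, since the source does not say which hash algorithm or timestamp format the function uses.

```r
# Hypothetical helper: builds one metadata row with the documented fields.
buildProcessMeta <- function(pipelineVersion, codeCommitSha, lockfilePath, filesExported) {
  data.frame(
    executionTimestamp = format(Sys.time(), "%Y-%m-%d %H:%M:%S", tz = "UTC"),
    pipelineVersion    = pipelineVersion,
    codeCommitSha      = codeCommitSha,
    lockfileHash       = unname(tools::md5sum(lockfilePath)),  # MD5 is an assumption
    filesExported      = paste(filesExported, collapse = ", "),
    stringsAsFactors   = FALSE
  )
}

# Usage with a throwaway lockfile so the example runs anywhere.
lockfile <- tempfile(fileext = ".lock")
writeLines('{"R": {"Version": "4.3.0"}}', lockfile)
meta <- buildProcessMeta("1.0.0", "<commit sha>", lockfile,
                         c("taskA.csv", "taskB.csv"))
```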
The function:
Scans the first database's version folder to discover available tasks
For each task found, calls importAndBind() to merge across databases
Tracks which files were successfully merged
Returns a summary data frame of the merge operation
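The discover-then-merge flow described above can be sketched in base R. Everything here is illustrative: the layout `<resultsPath>/<dbId>/<pipelineVersion>/<task>/*.csv` is an assumption made for the demo, `discoverTasks()` and `mergeTask()` are hypothetical stand-ins (the real merge goes through importAndBind(), whose signature is not shown here), and the added databaseId column is likewise assumed.

```r
# Assumed layout (hypothetical): <resultsPath>/<dbId>/<pipelineVersion>/<task>/*.csv
discoverTasks <- function(resultsPath, dbId, pipelineVersion) {
  versionDir <- file.path(resultsPath, dbId, pipelineVersion)
  basename(list.dirs(versionDir, full.names = TRUE, recursive = FALSE))
}

# Stand-in for the merge step: row-binds one task's CSVs across databases.
# rbind() requires every file to have the same columns.
mergeTask <- function(resultsPath, dbIds, pipelineVersion, task) {
  parts <- list()
  for (dbId in dbIds) {
    taskDir <- file.path(resultsPath, dbId, pipelineVersion, task)
    for (f in list.files(taskDir, pattern = "\\.csv$", full.names = TRUE)) {
      df <- read.csv(f, stringsAsFactors = FALSE)
      df$databaseId <- rep(dbId, nrow(df))  # record which database each row came from
      parts[[length(parts) + 1]] <- df
    }
  }
  if (length(parts) == 0) return(NULL)
  do.call(rbind, parts)
}

# Demo on a temporary folder tree with two databases and one task.
root <- file.path(tempdir(), "results_demo")
for (db in c("db_a", "db_b")) {
  taskDir <- file.path(root, db, "1.0.0", "taskA")
  dir.create(taskDir, recursive = TRUE, showWarnings = FALSE)
  write.csv(data.frame(cohortId = 1:2, n = c(10, 20)),
            file.path(taskDir, "counts.csv"), row.names = FALSE)
}

# Tasks are discovered from the first database only, then merged across all.
tasks <- discoverTasks(root, "db_a", "1.0.0")
merged <- lapply(tasks, function(task) mergeTask(root, c("db_a", "db_b"), "1.0.0", task))
names(merged) <- tasks
```

Because discovery looks only at the first database, a task that exists solely in a later database would be missed in this sketch; listing the most complete database first avoids that.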
Expected folder structure: