The Ulysses Standard Repository Structure
Source: vignettes/picard_repository_structure.Rmd

Introduction
Picard is built on the Ulysses framework, which follows the philosophy that real-world evidence (RWE) studies should be organized and versioned as software projects. Just as databases require a schema to organize data, RWE studies benefit from a standard directory structure to organize code, inputs, analyses, and outputs. This standardization enables:
- Reproducibility: Anyone can understand the project structure at a glance
- Collaboration: Multiple contributors follow the same conventions
- Automation: Consistent organization enables reliable workflows and tooling
- Version control: Clear separation of concerns makes git history more meaningful
Picard uses the Ulysses repository structure, adding specialized
directories and configuration for cohort-based studies, Evidence
Generation Plans, and results dissemination. This vignette describes the
standard Ulysses repository structure created when you initialize a
project using launchUlyssesRepo().
Pipeline Workflow and Folder Organization
The Ulysses repository organizes folders to match the flow of a real-world evidence study:
┌─────────────────────────────────────────────────────────────────┐
│ 1. META & CONFIG │
│ (config.yml, README.md, NEWS.md, main.R, test_main.R) │
│ ↓ │
│ 2. INPUTS │
│ inputs/cohorts/ + inputs/conceptSets/ │
│ (Define phenotypes, cohorts, covariates) │
│ ↓ │
│ 3. ANALYSIS │
│ analysis/tasks/ + analysis/src/ + analysis/migrations/ │
│ (Execute analyses, generate statistics) │
│ ↓ │
│ 4. EXECUTION OUTPUT │
│ exec/results/[database]/[version]/[task]/ │
│ (Raw results by task, database, version) │
│ ↓ │
│ 5. DISSEMINATION │
│ dissemination/export/ + dissemination/quarto/ │
│ (Format results, create Study Hub website) │
└─────────────────────────────────────────────────────────────────┘
Workflow sequence:

1. Initialize with metadata (config.yml defines databases and credentials)
2. Load or create inputs (cohorts and concept sets)
3. Execute analysis tasks (code in analysis/tasks runs using inputs)
4. Raw results written to exec/results, organized by database and version
5. Post-processing (migrations) and formatting (Excel, CSV)
6. Dissemination via the Study Hub (Quarto website) and formatted exports
Ulysses Repository Outline
A newly initialized Picard study has the following high-level structure:
study-repository/
├── analysis/ # Study analysis code and workflows
├── inputs/ # Cohort definitions and concept sets
├── dissemination/ # Results and evidence outputs
├── exec/ # Execution artifacts and logs
├── docs/ # Generated documentation (pkgdown)
├── extras/ # Reference scripts and development files
├── config.yml # Project configuration
├── main.R # Production pipeline execution script
├── README.md # Project overview
├── NEWS.md # Release notes and changelog
├── .gitignore # Git configuration
└── study.Rproj # RStudio project file
Elements of the Ulysses Repository
Vital Files
These files are essential to the project and should be maintained throughout its lifecycle.
study.Rproj
An RStudio project file that configures the working directory and development environment. The Ulysses structure uses this to ensure consistent behavior across team members. When you open this file in RStudio, the working directory is automatically set to the project root. If you use VS Code instead, Picard supports project detection through .code-workspace files and agent instructions.
.gitignore
Prevents sensitive files and intermediate outputs from being committed to version control. The standard .gitignore for a Ulysses repository includes:

- renv/ - Local package library snapshots (renv-specific files)
- exec/ - Execution results and temporary files
- .env - Environment variables and credentials
- *.log - Log files
- RStudio temporary files (.Rhistory, .RData, etc.)
This ensures that your git repository contains only source code and documentation, not generated outputs or sensitive credentials.
README.md
The README serves as the project’s front door, communicating key study information: description, objectives, key personnel, status, and links to vital resources. The README includes:
- Study metadata: Title, ID, start/end dates, study type, therapeutic area
- Status badges: Current version and project status
- Tags: Keywords for searching similar studies within your organization
- Links: References to ATLAS, protocols, publications, and repository
Ulysses auto-generates a README template when you launch a project. You should customize the Study Description section to clearly explain your research question and study design, following your organization’s documentation standards.
Example README.md structure (auto-generated):
# Cardiovascular Risk Assessment in Diabetes (Id: myStudyRepo)
<!-- badge: start -->


<!-- badge: end -->
## Study Information
- Study Id: myStudyRepo
- Study Title: Cardiovascular Risk Assessment in Diabetes
- Study Start Date: 2026-04-07
- Expected Study End Date: 2028-04-07
- Study Type: Cohort Study
- Therapeutic Area: Cardiovascular/Endocrinology
## Study Description
Add a short description about the study!
## Contributors
- Jane Doe, Institution Name
## Study Links
- [ATLAS Cohort Definitions](https://atlas.example.com/)
- [Study Repository](https://github.com/org/repo)

The README is auto-generated when you launch a study using launchUlyssesRepo(). Edit the Study Description section to explain your research question and study design, and update the badges and version as your study progresses.
NEWS.md
Tracks changes across study versions. When you run a production
pipeline with execStudyPipeline(), the Ulysses workflow
automatically updates NEWS.md with version information and change
summaries. This creates an audit trail of what changed in each
release.
The format follows semantic versioning conventions.
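As an illustration, a NEWS.md following this convention might look like the hypothetical sketch below (your generated file will differ):

```
# my_study 1.1.0

- Added sensitivity analysis task (03_sensitivityAnalysis.R)
- Updated the outcome cohort definition

# my_study 1.0.0

- Initial production release
```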
config.yml
Central configuration file specifying parameters needed to establish database connections. Uses YAML format with two section types:
- default: Universal study settings (project name, version)
- block headers: Database-specific configurations (dbms, credentials, schemas)
When you source a block header in a task file, the pipeline runs using only that block’s configuration, enabling multi-database studies.
Important: Connection details vary by database system. The codebase distinguishes between:

- Snowflake: Uses a connectionString field (JDBC connection string)
- PostgreSQL, SQL Server, MySQL, Oracle, Redshift: Use server and port fields
Protecting Credentials with !expr:
The !expr tag (from the config package) allows you to
evaluate R code within the config file. This is critical for security:
it enables pulling credentials from environment variables rather than
storing them as plain text in config.yml.
user: !expr Sys.getenv('dbUser')         # Evaluates R code: retrieves dbUser from the environment
password: !expr Sys.getenv('dbPassword') # Evaluates R code: retrieves dbPassword from the environment

You can use any R function wrapped in !expr to retrieve credentials, including:
- Environment variables: !expr Sys.getenv('VAR_NAME')
- Keyring package: !expr keyring::key_get(service = 'picard', username = 'atlasUser')
- Custom functions: !expr my_secure_fetch_credential('db_password')
Best practice: Always use !expr with a
secure credential storage system. Never store passwords or connection
strings as plain text in config.yml or commit them to git.
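To illustrate what !expr evaluates to, the base-R sketch below mimics the lookup the config package performs when it reads the file. The variable name dbUser matches the example above; the Sys.setenv() call only stands in for a real .Renviron entry and would never appear in study code:

```r
# Simulate what a .Renviron entry provides (illustration only --
# in practice the value comes from .Renviron, never from code)
Sys.setenv(dbUser = "analyst01")

# This is the expression that `!expr Sys.getenv('dbUser')` evaluates
# when the config file is read
user <- Sys.getenv("dbUser")
print(user)  # "analyst01"

# An unset variable returns an empty string -- a useful check before connecting
if (!nzchar(Sys.getenv("dbPassword"))) {
  message("dbPassword is not set; add it to .Renviron")
}
```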
Common credentials:

- dbms: Database type (snowflake, sql server, postgresql, mysql, oracle, redshift)
- user: Database username (from environment variable via !expr Sys.getenv())
- password: Database password (from environment variable)
- databaseName: Internal reference name (snake_case with database + snapshot date)
- databaseLabel: Pretty name for output formatting
- cdmDatabaseSchema: Schema containing OMOP CDM tables (format: schema or database.schema)
- vocabDatabaseSchema: Schema containing vocabulary tables (usually the same as cdmDatabaseSchema)
- workDatabaseSchema: Schema where the user has write access (for cohort tables and intermediary work)
- tempEmulationSchema: Optional schema for temp tables (snowflake, oracle)
- cohortTable: Name of the cohort table to create (default: {repoName}_{databaseName})
Example config.yml with Snowflake and PostgreSQL:
# Config File for my_study

default:
  projectName: my_study
  version: 1.0.0

# Snowflake: uses the connectionString format
snowflake_prod:
  dbms: snowflake
  connectionString: !expr Sys.getenv('dbConnectionString')
  user: !expr Sys.getenv('dbUser')
  password: !expr Sys.getenv('dbPassword')
  databaseName: snowflake_prod_20260101
  databaseLabel: Snowflake Production
  cdmDatabaseSchema: omop_schema
  vocabDatabaseSchema: omop_schema
  workDatabaseSchema: work_schema
  tempEmulationSchema: temp_schema
  cohortTable: my_study_cohorts

# PostgreSQL: uses the server/port format
postgres_local:
  dbms: postgresql
  server: localhost
  port: 5432
  user: !expr Sys.getenv('pgUser')
  password: !expr Sys.getenv('pgPassword')
  databaseName: postgres_local_20260101
  databaseLabel: PostgreSQL Local
  cdmDatabaseSchema: public
  vocabDatabaseSchema: public
  workDatabaseSchema: results
  cohortTable: my_study_pg_cohorts

Setting up environment variables:

In your .Renviron file (in the project or home directory):

dbUser=your_db_username
dbPassword=your_db_password
dbConnectionString=jdbc:snowflake://account.snowflakecomputing.com:443
pgUser=postgres_user
pgPassword=postgres_password

Load before running the pipeline:

readRenviron("~/.Renviron")
main.R
The primary execution script for running the study pipeline in production mode. This is the script team members run to execute the full study workflow. The Ulysses workflow generates this file based on your project configuration.
See Running the Pipeline for detailed information about main.R and the execution workflow.
Analysis Folder
Contains the study code organized into executable analysis tasks.
analysis/tasks/
Individual R scripts that perform analytical steps. Each task is a self-contained unit that:
- Loads necessary inputs (cohorts, concept sets, configuration)
- Performs a specific analytical step
- Saves results to a standardized output location
Note: Cohort generation is a built-in Picard feature handled automatically when you run the pipeline. You do not create a cohort generation task file. Tasks start after cohorts are generated. See Running the Pipeline for details on pipeline execution and cohort generation.
Tasks are named sequentially (01_, 02_, etc.) and executed in order:
analysis/tasks/
├── 01_descriptiveStats.R
├── 02_primaryAnalysis.R
├── 03_sensitivityAnalysis.R
Each task is independent and can be tested individually during
development using testStudyTask().
Task File Format and Validation:
Picard enforces a standardized task file structure to ensure
consistency and enable add-on modules to import external task files. Use
makeTaskFile() to create new tasks—it automatically
generates files in the correct format.
Required task file sections (validated by validateStudyTask()):

- A. Meta: Metadata about the task (title, author, description, purpose)
- B. Dependencies: Input files or objects required by the task
- C. Connection Settings: Configuration block name and pipeline version (uses template variables !||configBlock||! and !||pipelineVersion||!)
- D. Task Settings: Setup section where you:
  - Create executionSettings <- createExecutionSettingsFromConfig(configBlock = configBlock)
  - Create outputFolder <- setOutputFolder(executionSettings = executionSettings, pipelineVersion = pipelineVersion, taskName = "task_name")
- E. Script: Actual analysis code that performs the task
Validation Rules:
All task files must pass validation before execution:
- File must exist and be readable
- All five sections (A, B, C, D, E) must be present
- Template variables !||configBlock||! and !||pipelineVersion||! must be defined in section C
- Section D must create an executionSettings object via createExecutionSettingsFromConfig()
- Section D must create an outputFolder object via setOutputFolder()
- Section E must contain actual code (not just comments or template placeholders)
Why Validation Matters:
Standardized task files enable:
- Consistency: All tasks follow the same structure across your study
- Portability: Add-on modules and packages can import external tasks that conform to the format
- Reproducibility: Clear documentation of dependencies and configuration
- Automation: The pipeline can reliably execute tasks knowing they meet structural requirements
Example: If you have a task that follows the required format, you could import it from another Picard study or an add-on package rather than rewriting it.
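Putting the five sections together, a minimal task file might look like the sketch below. The function names and template variables come from the format described above; the metadata and analysis body are hypothetical placeholders. In practice, use makeTaskFile() to generate the real template:

```r
# A. Meta ---------------------------------------------------------
# Title: Descriptive Statistics
# Author: Jane Doe
# Description: Summarize demographics for the study cohorts

# B. Dependencies -------------------------------------------------
source(here::here("analysis/src/analysisHelpers.R"))

# C. Connection Settings ------------------------------------------
configBlock     <- "!||configBlock||!"
pipelineVersion <- "!||pipelineVersion||!"

# D. Task Settings ------------------------------------------------
executionSettings <- createExecutionSettingsFromConfig(configBlock = configBlock)
outputFolder <- setOutputFolder(
  executionSettings = executionSettings,
  pipelineVersion   = pipelineVersion,
  taskName          = "01_descriptiveStats"
)

# E. Script -------------------------------------------------------
# ... analysis code that writes its results into outputFolder ...
```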
analysis/src/
Reusable functions and helpers for custom logic specific to your
study. When external packages (like CohortPrevalence,
FeatureExtraction, SelfControlledCaseSeries)
provide the required functionality, use those directly—no need for
custom helpers. Use analysis/src/ only for functions you’ve
written that don’t have a package namespace.
Organize by functionality:
analysis/src/
├── cohortHelpers.R # Custom cohort manipulation or validation functions
├── analysisHelpers.R # Custom statistical or analysis utilities
├── outputHelpers.R # Custom table formatting or export functions
└── diagnostics.R # Custom validation or diagnostic functions
Examples:

- ✅ Use library functions: library(CohortPrevalence) provides computePrevalence()
- ✅ Write custom helper: You create a customPrevalence() function to run a bespoke prevalence tool; save it in prevHelpers.R
Important: Functions in src/ must be sourced in the task files that use them. In your task's section B (Dependencies), source the helper files:

# Section B. Dependencies
source(here::here("analysis/src/cohortHelpers.R"))
source(here::here("analysis/src/outputHelpers.R"))

This ensures all dependencies are explicit and documented, making task execution more transparent and reproducible.
analysis/migrations/
Post-processing scripts that clean and reshape pipeline results for
dissemination. Migrations are numbered to correspond with their source
analysis tasks (e.g., 02_migrate_surveillance.R cleans
output from task 02_surveillance.R).
Purpose:
After orchestratePipelineExport() binds raw results with
metadata, migrations handle data wrangling tasks that weren’t necessary
during execution:
- Aggregating: Combine results across subgroups or time periods
- Standardizing: Apply demographic weighting or statistical adjustments
- Pivoting: Reshape long format to wide for publication tables
- Deriving: Calculate new metrics (confidence intervals, effect size categories, standardized rates)
- Filtering: Remove rows below minimum cell counts or meeting exclusion criteria
Workflow:
Task 02: Pipeline Execution
↓
orchestratePipelineExport() → raw results to dissemination/export/merge/
↓
Migration 02: Data Wrangling (02_migrate_surveillance.R)
↓
dissemination/export/pretty/ → finalized, publication-ready results
Example Structure:
analysis/migrations/
├── 02_migrate_surveillance.R
│ # Aggregate surveillance counts by age/sex strata
│ # Apply census weighting for standardization
│ # Generate crude and standardized prevalence/incidence rates
├── 03_migrate_comparative.R
│ # Pivot comparative analysis results wide
│ # Calculate confidence intervals and p-values
└── 05_migrate_sensitivity.R
# Combine sensitivity analysis variants into summary table
Each migration reads from dissemination/export/merge/
(raw exported results), performs transformations, and writes cleaned
data to dissemination/export/pretty/ for final
dissemination.
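The shape of such a migration can be sketched in base R. The column names, file arguments, and minimum cell count of 5 below are hypothetical; a real migration reads from dissemination/export/merge/ and writes to dissemination/export/pretty/:

```r
# Hypothetical migration sketch: filter low cell counts, then pivot
# long-format results (metric, database, count) to wide format
migrate_results <- function(in_file, out_file, min_cell_count = 5) {
  raw <- read.csv(in_file, stringsAsFactors = FALSE)

  # Filtering: drop rows below the minimum cell count
  kept <- raw[raw$count >= min_cell_count, ]

  # Pivoting: one row per metric, one count column per database
  wide <- reshape(
    kept[, c("metric", "database", "count")],
    idvar = "metric", timevar = "database", direction = "wide"
  )

  write.csv(wide, out_file, row.names = FALSE)
  invisible(wide)
}
```

Aggregating, standardizing, and deriving steps would slot into the same function body between the read and the write.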
Inputs Folder
This folder stores cohort definitions and concept sets that define the study populations and other components for the analysis.
inputs/cohorts/
Cohort definitions defining the study populations. A cohort is a set of persons who satisfy one or more inclusion criteria for a duration of time. In OHDSI research, cohorts are the foundation of observational studies.
Cohort Definition Types:
- CIRCE-Based Definitions (JSON): Standard OHDSI approach
  - Uses the CIRCE-BE Java library for standardized representation
  - Stored as JSON files in the json/ folder
  - Ensures consistent serialization to SQL: the same definition always generates identical populations
  - Typically imported from ATLAS
- Custom SQL Definitions: For specialized logic
  - Manual SQL queries in the sql/ folder
  - Used when CIRCE cannot express the logic you need
- Dependency-Based Cohorts: Derived from existing cohorts
  - Subsets of existing cohorts (apply additional inclusion criteria)
  - Unions of multiple cohorts (combine populations)
  - Stored in the sql/ folder or as derivative definitions
Folder Structure:
inputs/cohorts/
├── json/
│ ├── 001_primaryPopulation.json # CIRCE-based from ATLAS
│ ├── 002_comparativeArm.json # CIRCE-based from ATLAS
│ └── 003_outcomeDefinition.json # CIRCE-based from ATLAS
├── sql/
│ ├── 004_primarySubset.sql # Subset of cohort 001
│ └── 005_combinedPopulation.sql # Union of cohorts 001 & 002
├── cohortsLoad.csv # Metadata index for cohort enrichment
└── cohortManifest.sqlite # Provenance & metadata tracking database
cohortsLoad.csv:
A CSV file with metadata for each cohort. When
loadCohortManifest() is called, this file is used to enrich
CohortDef objects by matching file_name with actual cohort
files. Used to track where cohorts came from and organize them with
tags.
Columns:

- atlasId: ATLAS cohort ID (integer, e.g., 1, 42)
- label: Display name (character, e.g., "Type 2 Diabetes patients")
- category: Broad grouping (character, e.g., "Disease Populations")
- subCategory: Sub-grouping (character, optional, e.g., "Endocrine")
- file_name: Relative path to the JSON file (character, e.g., "json/t2dm_patients.json")
Example:
atlasId,label,category,subCategory,file_name
1,Type 2 Diabetes,Disease Populations,Endocrine,json/t2dm_patients.json
2,Diabetes Complications,Disease Populations,Endocrine,json/diabetes_complications.json
When loaded, these metadata fields are converted to tags on each
cohort for later querying (e.g., $getCohortsByTag(),
$getCohortsByLabel()).
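This enrichment can be illustrated in base R: read the CSV and serialize the metadata columns into the tag format shown in the table below. This is a simplified sketch of the idea, not the actual loadCohortManifest() implementation:

```r
# Sketch: turn cohortsLoad.csv rows into tag strings such as
# "atlasId: 1 | category: Disease Populations | subCategory: Endocrine"
build_tags <- function(load_df) {
  paste0(
    "atlasId: ", load_df$atlasId,
    " | category: ", load_df$category,
    " | subCategory: ", load_df$subCategory
  )
}

load_df <- data.frame(
  atlasId = c(1, 2),
  label = c("Type 2 Diabetes", "Diabetes Complications"),
  category = "Disease Populations",
  subCategory = "Endocrine",
  file_name = c("json/t2dm_patients.json", "json/diabetes_complications.json")
)
build_tags(load_df)
```

Serialized tags like these are what make queries such as $getCohortsByTag("category: Disease Populations") possible later.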
cohortManifest.sqlite:
SQLite database created/managed by CohortManifest class.
Contains cohort_manifest table tracking all cohort
metadata:
| Column | Type | Purpose |
|---|---|---|
| id | INTEGER PRIMARY KEY | Sequential cohort ID assigned by Picard |
| label | TEXT NOT NULL | Cohort display name |
| tags | TEXT | Serialized tags (e.g., "atlasId: 1 \| category: Disease Populations \| subCategory: Endocrine") |
| filePath | TEXT NOT NULL | Full path to cohort definition file |
| hash | TEXT NOT NULL | MD5 hash of SQL for change detection |
| cohortType | TEXT DEFAULT 'circe' | Type: 'circe' (ATLAS JSON), 'sql' (custom), 'subset', 'union', 'complement' (dependency-based) |
| timestamp | DATETIME DEFAULT CURRENT_TIMESTAMP | When cohort was added to manifest |
| status | TEXT DEFAULT 'active' | Status tracking: 'active', 'missing', 'archived' |
| deleted_at | DATETIME | Soft-delete timestamp if cohort removed |
Workflow:
1. Create cohortsLoad.csv with metadata for your cohorts (use createBlankCohortsLoadFile())
2. Import ATLAS cohort JSON definitions to the json/ folder (use importAtlasCohorts())
3. Add custom SQL or dependency-based cohorts to sql/ (e.g., subsets, unions)
4. Call loadCohortManifest() to scan directories and enrich with cohortsLoad.csv metadata
5. The first load creates the cohortManifest.sqlite database; subsequent loads verify file hashes
6. Use CohortManifest methods to query cohorts: $getCohortById(1), $getCohortsByTag("category: Disease Populations")
See also: Loading Inputs for detailed guidance on creating, importing, and managing cohort manifests.
inputs/conceptSets/
Stores CIRCE-based concept set definitions for identifying cohorts and extracting covariates:
inputs/conceptSets/
├── json/
│ ├── exposure_antidiabetic.json # Drug exposure concept set
│ ├── outcome_mi.json # Condition outcome concept set
│ └── covariate_hypertension.json # Covariate measurement concept set
├── conceptSetsLoad.csv # Metadata index for concept set enrichment
└── conceptSetManifest.sqlite # Provenance & metadata tracking database
conceptSetsLoad.csv:
Similar to cohortsLoad.csv, this CSV provides metadata for enriching ConceptSetDef objects.

Columns:

- atlasId: ATLAS concept set ID (integer, e.g., 456, 789)
- label: Display name (character, e.g., "Antidiabetic medications")
- category: Broad grouping (character, e.g., "Medications" or "Diagnoses")
- subCategory: Optional sub-grouping (character, e.g., "Endocrine Drugs")
- domain: OMOP clinical domain (required, character):
  - drug_exposure - Medication/drug concept sets
  - condition_occurrence - Diagnosis concept sets
  - measurement - Lab/test result concept sets
  - procedure - Medical procedure concept sets
  - observation - Observation concept sets
  - device_exposure - Device/equipment concept sets
  - visit_occurrence - Visit type concept sets
  - init - Not yet classified (placeholder)
- sourceCode: Rarely used; TRUE if the concept set represents source codes instead of standard concepts (character: TRUE or FALSE)
- file_name: Relative path to the JSON file (character, e.g., "json/hypertension.json")
Example:
atlasId,label,category,subCategory,domain,sourceCode,file_name
456,Antidiabetic Medications,Medications,Endocrine,drug_exposure,FALSE,json/exposure_antidiabetic.json
789,Acute MI,Diagnoses,Cardiovascular,condition_occurrence,FALSE,json/outcome_mi.json
1001,Hypertension,Diagnoses,Cardiovascular,condition_occurrence,FALSE,json/covariate_hypertension.json
When loaded, metadata fields are converted to tags on each concept
set for querying (e.g., $getConceptSetsByTag(),
$getConceptSetsByLabel()).
conceptSetManifest.sqlite:
SQLite database created/managed by ConceptSetManifest
class. Contains concept_set_manifest table:
| Column | Type | Purpose |
|---|---|---|
| id | INTEGER PRIMARY KEY | Sequential concept set ID assigned by Picard |
| label | TEXT NOT NULL | Concept set display name |
| tags | TEXT | Serialized tags (e.g., "atlasId: 456 \| domain: drug_exposure \| category: Medications") |
| filePath | TEXT NOT NULL | Full path to concept set JSON file |
| hash | TEXT NOT NULL | MD5 hash of JSON for change detection |
| timestamp | DATETIME DEFAULT CURRENT_TIMESTAMP | When concept set added to manifest |
| status | TEXT DEFAULT 'active' | Status: 'active', 'missing', 'archived' |
| deleted_at | DATETIME | Soft-delete timestamp if removed |
Workflow:
1. Create or update conceptSetsLoad.csv with metadata (use createBlankConceptSetsLoadFile())
2. Import ATLAS concept set JSON definitions to the json/ folder (use importAtlasConceptSets())
3. Call loadConceptSetManifest() to scan directories and enrich with conceptSetsLoad.csv metadata
4. The first load creates the conceptSetManifest.sqlite database; subsequent loads verify file hashes
5. Use ConceptSetManifest methods to query: $getConceptSetById(456), $getConceptSetsByTag("domain: drug_exposure")
6. Extract source codes and dependencies using $extractSourceCodes() (requires ExecutionSettings)
See also: Loading Inputs for detailed guidance on creating, importing, and managing concept set manifests.
Dissemination Folder
This folder organizes results, evidence outputs, and documentation for sharing with stakeholders. Contains three main subdirectories.
dissemination/documents/
Static written reports and supplementary materials (PDFs, Word docs, etc.); anything that provides narrative context or detailed information for readers but isn’t generated directly from the pipeline. Examples include:
dissemination/documents/
├── mainReport.docx
├── supplementaryMaterial.pdf
└── analysisProtocol.md
dissemination/quarto/
In Picard, the Study Hub is the primary dissemination format for sharing results with stakeholders. A Study Hub is an interactive website that communicates study objectives, analytical assumptions, and final results in a unified, professional HTML format. Quarto is the tool used to construct these Study Hub websites, enabling you to weave together narrative text, analysis code, and results into linked HTML pages that are reproducible and automatically updated when data changes.
Quarto allows you to combine narrative text, R code, and results in documents that render to HTML, PDF, or Word. Quarto files (.qmd) contain markdown text interspersed with code chunks that execute when the document is rendered, automatically embedding results directly into the report. This ensures your documentation always reflects the latest data and findings.
Pre-formatted on initialization: When you initialize
a Picard project, the dissemination/quarto/ folder is
automatically set up with a standard Study Hub structure including
template files:
dissemination/quarto/
├── _quarto.yml # Quarto configuration for website
├── index.qmd # Landing page (created from README.md)
├── news.qmd # News/changelog page (created from NEWS.md)
├── egp.qmd # Evidence Generation Plan template
├── results.qmd # Results template for data integration
├── style.css # Custom CSS styling for the website
├── R/ # Helper functions directory
├── images/ # Images and figures directory
└── _site/ # Rendered HTML output (generated upon build)
Use buildStudyHub() to render your documentation into
the _site/ folder, which creates the final interactive
website.
dissemination/export/
Results exported from the pipeline via
orchestratePipelineExport() and processed into various
formats for dissemination:
dissemination/export/
├── merge/ # Raw merged results from orchestratePipelineExport()
│ ├── cohortKey.csv # Cohort definitions (generated)
│ ├── databaseInfo.csv # Database metadata (generated)
│ ├── schema_review.csv # Schema validation (generated)
│ ├── task_01_results.csv
│ ├── task_02_results.csv
│ └── task_03_results.csv
├── pretty/ # Formatted results from migration scripts (Excel, CSV, etc.)
│ ├── mainResults.xlsx
│ ├── sensitivity_analyses.xlsx
│ └── suppTable1_demographics.xlsx
└── studyHubOutput/ # Files sourced by the Study Hub for dynamic rendering
├── table1_demographics.csv
├── figure1_incidence.csv
└── results_summary.json
Workflow:
1. Merge phase: orchestratePipelineExport(pipelineVersion, dbIds) reads raw results from exec/results/ across all databases and tasks for a given version. It combines results into long-format CSV files in dissemination/export/merge/, along with reference files (cohortKey, databaseInfo, schema_review).
2. Format phase: Migration scripts (e.g., 02_migrate_surveillance.R) read from dissemination/export/merge/ and perform data wrangling, reshaping, and formatting. Output depends on the use case:
   - For publication/reports: Write formatted Excel/CSV files to dissemination/export/pretty/
   - For the Study Hub: Write data tables and figures to dissemination/export/studyHubOutput/
3. Dissemination:
   - Files in pretty/ are copied directly to publications or referenced by static Quarto reports
   - Files in studyHubOutput/ are sourced dynamically by Quarto files in dissemination/quarto/ to create interactive tables and figures in the Study Hub website
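Conceptually, the merge phase stacks per-database result files into one long-format table tagged with database metadata. The base-R sketch below illustrates the idea only; the file names and the databaseId column are hypothetical, and the real work is done by orchestratePipelineExport():

```r
# Sketch: bind per-database task result files into one long-format table,
# tagging each row with its source database
merge_task_results <- function(files, database_ids) {
  stopifnot(length(files) == length(database_ids))
  pieces <- Map(function(f, db) {
    df <- read.csv(f, stringsAsFactors = FALSE)
    df$databaseId <- db  # attach database metadata to each row
    df
  }, files, database_ids)
  do.call(rbind, pieces)
}
```

The long format keeps one row per result per database, which is what the migration scripts then reshape for publication.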
Exec Folder
This folder contains execution artifacts and logs from running the pipeline. Results are organized by database, pipeline version, and task to support parallel development and multi-database studies.
exec/results/
Raw results from each pipeline execution, organized hierarchically:
exec/results/
├── primary_db/ # Created per config.yml database ID
│ ├── 1.0.0/ # Versioned results (semantic versions)
│ │ ├── 00_buildCohorts/
│ │ │ ├── cohortCounts.csv
│ │ │ ├── cohortResults.csv
│ │ │ └── buildLog.txt
│ │ ├── 01_descriptiveStats/
│ │ │ ├── demographics.csv
│ │ │ └── flowchart.csv
│ │ ├── 02_surveillance/
│ │ │ └── incidenceCounts.csv
│ │ └── 03_primaryAnalysis/
│ │ └── modelResults.csv
│ ├── 1.1.0/ # Another versioned run
│ │ ├── 00_buildCohorts/
│ │ └── ...
│ └── dev/ # Development/test results (temporary)
│ ├── 00_buildCohorts/
│ └── ...
└── secondary_db/
├── 1.0.0/
└── dev/
How results are organized:
- Database folders: One folder per database configured in config.yml. The database folder name is the snake_case version of the databaseName from config (e.g., optum_dod → optum_dod/)
- Version folders: Within each database, results are organized by pipelineVersion (e.g., 1.0.0/, 1.0.1/). A special dev/ folder holds temporary test results
- Task folders: Within each version, one folder per task created by setOutputFolder() (e.g., 00_buildCohorts/, 01_descriptiveStats/)
Task file execution: Each task script in analysis/tasks/ creates an output folder via:

outputFolder <- setOutputFolder(
  executionSettings = executionSettings,
  pipelineVersion = pipelineVersion,
  taskName = "00_buildCohorts"
)

This creates exec/results/{databaseName}/{pipelineVersion}/{taskName}/, where task results (CSVs, logs, etc.) are written.

Development vs. Production:

- Development mode (test_main.R): Uses pipelineVersion = "dev" so results go to exec/results/{db}/dev/. Results are temporary and don't interfere with versioned runs.
- Production mode (main.R): Uses semantic versioning (e.g., pipelineVersion = "1.0.0") so results go to exec/results/{db}/1.0.0/. Results are retained for post-processing and archival.
Extras Folder
Reference scripts and development artifacts that support the study but aren’t part of the core pipeline.
extras/test_main.R
A development variant of main.R for rapid iteration. It uses testStudyPipeline() instead of execStudyPipeline(), skips validations, and places results in the dev/ output folder.
See Running the Pipeline for details on using test_main.R.
Next Steps
Now that you understand the repository structure, the next steps depend on where you are in your project:
Just initialized a project? See Launching a Study for how to configure and launch a Picard study.
Setting up cohorts and concept sets? See Loading Inputs for working with cohort and concept set manifests.
Ready to execute analyses? See Running the Pipeline to learn about test mode vs. production execution.
Processing results? See Post-Processing Steps for organizing and exporting results.