Design Type

Select the overarching structure of your study. You may refine this later based on data availability and the types of inference required.

Experimental – An intervention (e.g., gene knockout, compound exposure) is applied to samples or systems that are randomly assigned to treatment or control conditions. Randomization maximizes internal validity and supports causal inference.
- Example: Randomized CRISPR-based perturbation of transcription factors in iPSC-derived cardiomyocytes to assess regulatory cascade dynamics.
Quasi-experimental – Assignment is not fully randomized, but comparisons are still structured (e.g., pre/post treatment, matched controls, comparisons across existing batches). Common when studies are constrained by available patient groups, ethical considerations, or data reuse.
- Example: Pre/post longitudinal RNA-seq analysis of PBMCs from patients before and after anti-TNF therapy initiation, without random assignment.
Observational – Biological or clinical variation is measured without any applied intervention. This includes case-control, cohort, or cross-sectional studies using primary or secondary data (e.g., GTEx, TCGA, dbGaP).
- Example: Integrating public TCGA RNA-seq data to identify expression signatures associated with TP53 mutation status across cancer types.
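Random assignment is what separates the experimental design from the quasi-experimental and observational ones, and it is easiest to audit when the allocation is scripted and seeded. The following is a minimal Python sketch (standard library only) of stratified randomization, which balances treatment and control within each stratum, such as a donor or iPSC line. The function name `randomize`, the sample IDs, the stratum labels, and the seed are hypothetical placeholders.

```python
import random
from collections import defaultdict


def randomize(samples, strata, seed=2024):
    """Assign samples to 'treatment' or 'control', balanced within each stratum.

    samples: list of sample IDs
    strata:  dict mapping sample ID -> stratum label (e.g., donor or iPSC line)
    """
    rng = random.Random(seed)  # fixed seed so the allocation can be regenerated
    by_stratum = defaultdict(list)
    for s in samples:
        by_stratum[strata[s]].append(s)

    assignment = {}
    for stratum in sorted(by_stratum):      # sorted for a deterministic iteration order
        members = sorted(by_stratum[stratum])
        rng.shuffle(members)                # random order within the stratum
        half = len(members) // 2
        for s in members[:half]:
            assignment[s] = "treatment"
        for s in members[half:]:
            assignment[s] = "control"       # with an odd count, the extra sample goes to control
    return assignment


# Hypothetical layout: eight samples from two cell lines.
samples = [f"S{i}" for i in range(1, 9)]
strata = {s: ("lineA" if i < 4 else "lineB") for i, s in enumerate(samples)}
alloc = randomize(samples, strata)
for s in samples:
    print(s, strata[s], alloc[s])
```

Stratifying by line keeps line-to-line differences from being confounded with treatment, and storing the seed with the sample sheet lets the exact allocation be reproduced later.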

Design Rationale

Why is this design suitable for the biological or clinical system under study?
Does your study involve interventions (e.g., CRISPR edits, drug screens), or are you analyzing pre-existing omics datasets?
Are patient samples, tissues, or cell models pre-stratified by disease, genotype, or exposure?
- Example: Matched case-control study of patients with idiopathic pulmonary fibrosis vs. unaffected siblings.
Which components of causal inference are strengthened or constrained by your design?
- Example: Sampling before and after treatment in a pre/post study establishes temporal ordering, but the design still lacks randomization.
What assumptions (e.g., no unmeasured confounding, correct time-ordering, linearity) are critical to justify your conclusions?
- Example: Assuming no differential misclassification of exposure across outcome groups.
What tradeoffs are you accepting in exchange for feasibility, ethics, or biological realism?
- Example: Using existing tissue biobank samples limits control over experimental consistency but enables access to rare phenotypes.
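In a pre/post design, each subject acts as their own control, so the analysis typically works on within-subject differences. The minimal sketch below computes a paired t statistic for a single feature (e.g., log-expression of one gene) using only the Python standard library. The function name `paired_t` and the values are hypothetical; a genome-wide RNA-seq analysis would more typically fit a count model with subject as a blocking term, so this only illustrates the pairing logic.

```python
import math
import statistics


def paired_t(pre, post):
    """Paired t statistic for post - pre differences measured on the same subjects."""
    if len(pre) != len(post):
        raise ValueError("pre and post must be measured on the same subjects")
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)          # sample SD with an n - 1 denominator
    t = mean_d / (sd_d / math.sqrt(n))
    return mean_d, t, n - 1                 # mean change, t statistic, degrees of freedom


# Hypothetical log-expression of one gene in four patients, before and after treatment.
pre = [5.0, 6.0, 7.0, 8.0]
post = [6.0, 7.5, 7.5, 9.0]
mean_d, t, df = paired_t(pre, post)
print(f"mean change = {mean_d:.2f}, t = {t:.2f}, df = {df}")  # mean change = 1.00, t = 4.90, df = 3
```

Because differences are taken within each subject, the spread in baseline values across subjects (5 to 8 here) does not enter the standard error. Pairing controls for stable between-subject variation but not for time trends, which is why the design still lacks the protection of randomization.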

What are your negative and/or positive controls (e.g., vehicle-treated samples, isogenic wild-type lines, healthy donors)?
- Example: Untreated cell line replicates serve as negative controls; cells treated with a known apoptosis inducer serve as positive controls.
Are technical replicates, batch corrections, or normalization strategies in place to mitigate artifacts?
- Example: ComBat is used to adjust for batch effects across multiple sequencing runs.
What threats to validity are most relevant in your context (e.g., batch effects, immortal time bias, collider stratification)?
- Example: Case/control imbalance across sequencing lanes introduces potential confounding.
How will you quantify or detect confounding? Are you planning to use statistical adjustment (e.g., covariate inclusion, inverse probability weighting)?
- Example: Include sex, age, and batch as covariates in the DESeq2 design formula (e.g., ~ batch + sex + age + condition).
Are your comparison groups biologically or clinically meaningful, or are they constructed for statistical contrast?
- Example: Defining "resilient" vs. "affected" individuals with the same genotype to isolate protective transcriptomic signatures.
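One way to detect the case/control imbalance across sequencing lanes described above is to cross-tabulate group by lane and compute a Pearson chi-square statistic and Cramér's V before any modeling. The sketch below uses only the Python standard library; the function name `group_by_batch_chi2`, the lane labels, and the counts are hypothetical. Converting the statistic to a p-value needs a chi-square distribution, for example `scipy.stats.chi2_contingency` if SciPy is available, which is omitted here to keep the example dependency-free.

```python
import math
from collections import Counter


def group_by_batch_chi2(groups, batches):
    """Pearson chi-square statistic and Cramér's V for a group x batch table."""
    n = len(groups)
    cells = Counter(zip(groups, batches))   # observed counts per (group, batch) cell
    g_tot = Counter(groups)                 # row totals
    b_tot = Counter(batches)                # column totals
    chi2 = 0.0
    for g in g_tot:
        for b in b_tot:
            expected = g_tot[g] * b_tot[b] / n
            chi2 += (cells[(g, b)] - expected) ** 2 / expected
    k = min(len(g_tot), len(b_tot)) - 1
    v = math.sqrt(chi2 / (n * k)) if k > 0 else 0.0
    return chi2, v


# Hypothetical sample sheet: cases mostly on lane L1, controls mostly on lane L2.
groups = ["case"] * 6 + ["control"] * 6
lanes = ["L1"] * 5 + ["L2"] + ["L1"] + ["L2"] * 5
chi2, v = group_by_batch_chi2(groups, lanes)
print(f"chi-square = {chi2:.2f}, Cramér's V = {v:.2f}")  # chi-square = 5.33, Cramér's V = 0.67
```

A large Cramér's V, driven here by five of six cases sharing one lane, means lane and case status are strongly associated. When the two are perfectly confounded, no batch correction or covariate adjustment can separate them, so rebalancing samples across lanes at the design stage is usually the more reliable fix.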
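Inverse probability weighting, listed above alongside covariate inclusion, reweights each sample by the inverse of its estimated probability of receiving the exposure it actually received. The minimal sketch below assumes a single discrete confounder (a hypothetical batch label), so the propensity score can be estimated as the exposed fraction within each stratum; with continuous or multiple covariates, the propensity model would usually be a logistic regression instead. The function name `ipw_difference` and all data values are hypothetical.

```python
from collections import defaultdict


def ipw_difference(treated, outcome, stratum):
    """IPW (Hajek) estimate of the mean outcome difference, treated minus untreated.

    treated: list of 0/1 exposure indicators
    outcome: list of numeric outcomes (e.g., log-expression of one gene)
    stratum: list of discrete confounder labels (e.g., sequencing batch)
    """
    n_in = defaultdict(int)
    n_tr = defaultdict(int)
    for t, s in zip(treated, stratum):
        n_in[s] += 1
        n_tr[s] += t
    # Propensity score: fraction of samples exposed within each stratum.
    ps = {s: n_tr[s] / n_in[s] for s in n_in}
    if any(p in (0.0, 1.0) for p in ps.values()):
        raise ValueError("positivity violated: a stratum has only one exposure level")

    num1 = den1 = num0 = den0 = 0.0
    for t, y, s in zip(treated, outcome, stratum):
        if t:
            w = 1.0 / ps[s]                 # weight for exposed samples
            num1 += w * y
            den1 += w
        else:
            w = 1.0 / (1.0 - ps[s])         # weight for unexposed samples
            num0 += w * y
            den0 += w
    return num1 / den1 - num0 / den0


# Hypothetical data: batch A is enriched for treated samples and has a higher baseline.
treated = [1, 1, 1, 0, 1, 0, 0, 0]
outcome = [12, 12, 12, 10, 2, 0, 0, 0]
batch = ["A", "A", "A", "A", "B", "B", "B", "B"]
n_t = sum(treated)
naive = (sum(y for t, y in zip(treated, outcome) if t) / n_t
         - sum(y for t, y in zip(treated, outcome) if not t) / (len(treated) - n_t))
est = ipw_difference(treated, outcome, batch)
print(f"naive = {naive:.2f}, IPW = {est:.2f}")  # naive = 7.00, IPW = 2.00
```

On these toy values the naive difference in means is inflated by the batch imbalance, while the weighted estimate recovers the +2 shift built into the data within each batch. The positivity check reflects a real constraint: a batch containing only treated or only untreated samples carries no information for this comparison.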