Design Type

Select the overarching structure of your study. You may refine this later based on data availability and the types of inference required.

  • Experimental – An intervention (e.g., gene knockout, compound exposure) is applied to samples or systems that are randomly assigned to treatment or control conditions. This design maximizes internal validity and gives the strongest support for causal inference.

    • Example: Randomized CRISPR-based perturbation of transcription factors in iPSC-derived cardiomyocytes to assess regulatory cascade dynamics.
  • Quasi-experimental – Assignment is not fully randomized, but structured comparisons exist (e.g., pre/post treatment, matched controls). Common in studies constrained by patient groups, ethical considerations, or data reuse.

    • Example: Pre/post longitudinal RNA-seq analysis of PBMCs from patients before and after anti-TNF therapy initiation, without random assignment.
  • Observational – Biological or clinical variation is measured without any applied intervention. This includes case-control, cohort, or cross-sectional studies using primary or secondary data (e.g., GTEx, TCGA, dbGaP).

    • Example: Integrating public TCGA RNA-seq data to identify expression signatures associated with TP53 mutation status across cancer types.
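
The random assignment that defines the experimental option can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed protocol: the function name `assign_arms`, the well identifiers, and the seed are all hypothetical, and a real study may need blocked or stratified randomization instead.

```python
import random


def assign_arms(sample_ids, arms=("control", "treatment"), seed=0):
    """Randomly assign each sample to an arm, keeping arm sizes balanced.

    Hypothetical helper for illustration. The seed is fixed so the
    allocation can be reproduced and audited later.
    """
    rng = random.Random(seed)   # private generator, does not touch global state
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)       # random order removes any systematic pattern
    # Deal the shuffled samples round-robin so arm sizes differ by at most one.
    return {sid: arms[i % len(arms)] for i, sid in enumerate(shuffled)}


if __name__ == "__main__":
    wells = [f"well_{i:02d}" for i in range(12)]   # made-up sample identifiers
    for well, arm in sorted(assign_arms(wells, seed=42).items()):
        print(well, arm)
```

Recording the seed alongside the allocation table lets reviewers confirm that assignment was made before any outcome was measured.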

Design Rationale

  • Why is this design suitable for the biological or clinical system under study?
  • Does your study involve interventions (e.g., CRISPR edits, drug screens), or are you analyzing pre-existing omics datasets?
  • Are patient samples, tissues, or cell models pre-stratified by disease, genotype, or exposure?
    • Example: Matched case-control study of patients with idiopathic pulmonary fibrosis vs. unaffected siblings.
  • Which components of causal inference are strengthened or constrained by your design?
    • Example: A pre/post study establishes temporal ordering of exposure and outcome, but without randomization it cannot rule out confounding.
  • What assumptions (e.g., no unmeasured confounding, correct time-ordering, linearity) are critical to justify your conclusions?
    • Example: Assuming no differential misclassification of exposure across outcome groups.
  • What tradeoffs are you accepting in exchange for feasibility, ethics, or biological realism?
    • Example: Using existing tissue biobank samples limits control over experimental consistency but enables access to rare phenotypes.

Control Strategy

  • What are your negative and/or positive controls (e.g., vehicle-treated samples, isogenic wild-type lines, healthy donors)?
    • Example: Untreated cell line replicates serve as negative controls; cells treated with a known apoptosis inducer serve as positive controls.
  • Are technical replicates, batch corrections, or normalization strategies in place to mitigate artifacts?
    • Example: ComBat is used to adjust for batch effects across multiple sequencing runs.
  • What threats to validity are most relevant in your context (e.g., batch effects, immortal time bias, collider stratification)?
    • Example: Cases and controls are unevenly distributed across sequencing lanes, so lane effects are potentially confounded with disease status.
  • How will you quantify or detect confounding? Are you planning to use statistical adjustment (e.g., covariate inclusion, inverse probability weighting)?
    • Example: Include sex, age, and batch as covariates in the DESeq2 design formula (e.g., ~ sex + age + batch + condition, with the condition of interest last).
  • Are your comparison groups biologically or clinically meaningful, or are they constructed for statistical contrast?
    • Example: Defining "resilient" vs. "affected" individuals with the same genotype to isolate protective transcriptomic signatures.
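
To make the inverse probability weighting option mentioned above concrete, here is a minimal sketch in Python. It assumes a single categorical confounder (for instance, batch) and estimates the propensity score as the observed fraction of treated samples within each stratum. The record fields (`stratum`, `treated`, `outcome`), the function names, and the toy data are all hypothetical. A real analysis would usually fit a propensity model over several covariates and check overlap and weight stability.

```python
from collections import defaultdict


def naive_effect(records):
    """Unadjusted difference in mean outcome, treated minus control."""
    treated = [r["outcome"] for r in records if r["treated"]]
    control = [r["outcome"] for r in records if not r["treated"]]
    return sum(treated) / len(treated) - sum(control) / len(control)


def ipw_effect(records):
    """Weighted (normalized) IPW difference in mean outcome, treated minus control."""
    # Propensity per stratum = fraction of samples in that stratum that were treated.
    counts = defaultdict(lambda: [0, 0])   # stratum -> [n_treated, n_total]
    for r in records:
        counts[r["stratum"]][0] += int(bool(r["treated"]))
        counts[r["stratum"]][1] += 1

    weighted_sum = {True: 0.0, False: 0.0}
    weight_total = {True: 0.0, False: 0.0}
    for r in records:
        n_treated, n_total = counts[r["stratum"]]
        if n_treated in (0, n_total):
            # Positivity violation: the stratum contains only one group.
            raise ValueError(f"stratum {r['stratum']!r} has no overlap")
        p = n_treated / n_total
        is_treated = bool(r["treated"])
        w = 1 / p if is_treated else 1 / (1 - p)
        weighted_sum[is_treated] += w * r["outcome"]
        weight_total[is_treated] += w
    return (weighted_sum[True] / weight_total[True]
            - weighted_sum[False] / weight_total[False])


# Toy data (made up): batch A has a higher baseline and is mostly treated,
# batch B has a lower baseline and is mostly control. The built-in effect is +1.
samples = (
    [{"stratum": "A", "treated": True, "outcome": 11.0}] * 3
    + [{"stratum": "A", "treated": False, "outcome": 10.0}]
    + [{"stratum": "B", "treated": True, "outcome": 1.0}]
    + [{"stratum": "B", "treated": False, "outcome": 0.0}] * 3
)

if __name__ == "__main__":
    print("naive:", naive_effect(samples))
    print("IPW:  ", ipw_effect(samples))
```

On this toy data the unadjusted difference is 6.0 because batch and treatment are entangled, while the weighted estimate is close to the built-in effect of 1.0. The explicit error for strata containing only treated or only control samples reflects the positivity assumption that weighting relies on.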