Field Gaps¶

Use this section to articulate where the current body of research is limited, inconsistent, or incomplete. Focus on identifying precise methodological, mechanistic, or evidentiary shortcomings in the field you’re contributing to.

Underexplored or Overlooked Areas¶

What biological processes, model systems, or populations are understudied?
- Example: Limited transcriptomic profiling of pulmonary endothelial cells from Black or Indigenous populations.
Are there diseases, subtypes, or phenotypes that lack genomic or molecular characterization?
- Example: Rare pediatric-onset forms of systemic sclerosis lack genotype–phenotype correlation studies.
Are there assumptions in standard models that have not been empirically tested?
- Example: Assuming additive variant effects in polygenic risk scoring without modeling epistasis.

Replication and Reproducibility¶

Which published findings have not been independently validated?
- Example: Network-based biomarker signatures proposed in small discovery cohorts, lacking external validation.
Are there inconsistencies across studies or datasets that suggest technical or biological variability?
- Example: Discordant differential expression results across RNA-seq studies of the same disease due to batch effects or tissue sampling variability.
Are prior results based on small sample sizes, non-public data, or non-reproducible code?
- Example: Use of unpublished clinical cohorts with inaccessible phenotype definitions or restricted pipelines.

Mechanistic Gaps¶

What molecular or cellular mechanisms remain ambiguous or speculative?
- Example: Unclear downstream effects of non-coding variants in enhancers regulating immune genes in autoimmune disease.
Are there known associations (e.g., variant–phenotype, gene–pathway) lacking causal explanation?
- Example: Association between mitochondrial dysfunction and autism spectrum disorder without a mechanistic bridge from transcriptome to phenotype.
Where do data integration approaches fail to resolve underlying mechanisms?
- Example: Single-cell and bulk RNA-seq integration yielding inconsistent pathway enrichment due to lack of normalization for cell composition.

Computational and Technical Gaps¶

Are there limitations in current algorithms, tools, or pipelines (e.g., scalability, transparency, interpretability)?
- Example: Graph neural networks for drug repurposing often lack explainable outputs or uncertainty quantification.
Is there a lack of standardization across datasets, ontologies, or metadata formats?
- Example: Inconsistent phenotype naming across different rare disease consortia makes cross-cohort analysis difficult.
Are bias and generalizability concerns well-addressed in ML or statistical models?
- Example: Deep learning models trained on TCGA data may underperform on underrepresented populations due to sampling bias.

Opportunity for Contribution¶

What would a successful study add to the field?
- Example: A reproducible, phenotype-aware pipeline that integrates rare variant burden with tissue-specific co-expression networks.
How could your work fill one or more of these gaps with rigor?
- Example: Using matched unaffected carriers and cases to identify transcriptional signatures that define resilience, validated across independent cohorts.