
Long-Read Sequencing for Rare Disease Diagnosis: Closing the Diagnostic Gap

More than half of rare disease patients remain undiagnosed after standard short-read sequencing. Two 2025 studies demonstrate that long-read sequencing detects structural variants, repeat expansions, and epigenetic modifications invisible to conventional methods, solving 12-17% of previously intractable cases.

By ORAA Research
This blog summarizes research trends based on published paper abstracts. Specific numbers or findings may contain inaccuracies. For scholarly rigor, always consult the original papers cited in each post.

Rare diseases collectively affect approximately 300-400 million people worldwide, yet obtaining a molecular diagnosis remains frustratingly difficult. Standard clinical genomics workflows rely on short-read sequencing (SRS), typically Illumina-based whole-genome or whole-exome sequencing, which reads DNA fragments of 150-300 base pairs. This technology excels at detecting single nucleotide variants and small insertions/deletions but struggles with the structural complexity that underlies many unsolved rare diseases. Two studies published in early 2025 demonstrate that long-read sequencing (LRS) can resolve cases that SRS cannot.

Why Short Reads Miss Variants

The human genome is not a simple linear sequence. It contains vast repetitive regions, segmental duplications, tandem repeat expansions, and complex structural rearrangements that cannot be reconstructed from short sequencing reads. When 150-bp fragments are mapped to a reference genome, reads originating from repetitive regions map ambiguously to multiple locations, effectively rendering these regions invisible to variant calling.
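The mapping ambiguity described above can be illustrated with a toy example. This is only a sketch using exact string matching on an invented reference; real aligners are far more sophisticated, and the sequences and the helper name `find_all` are made up for illustration.

```python
def find_all(reference: str, read: str) -> list[int]:
    """Return every 0-based position where `read` matches `reference` exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)
    return positions


# Invented reference: unique flank, repeat copy, unique spacer, repeat copy, unique flank.
REPEAT = "ACGTTGCA" * 3
reference = "GATTACAGG" + REPEAT + "CCTAGGTCA" + REPEAT + "TTGACCGAT"

short_read = REPEAT[:16]     # lies entirely inside the repeat
long_read = reference[5:45]  # spans flank, first repeat copy, and spacer

short_hits = find_all(reference, short_read)
long_hits = find_all(reference, long_read)

print("short read placements:", short_hits)  # more than one candidate position
print("long read placements:", long_hits)    # a single position
```

The short read has several equally good placements, so a caller cannot tell which repeat copy it came from. The long read includes unique flanking sequence and therefore anchors to one location.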

This is not a minor gap. Structural variants (SVs), meaning deletions, duplications, inversions, and translocations larger than 50 bp, account for more nucleotide differences between any two human genomes than single nucleotide variants do. Short tandem repeat (STR) expansions cause well-known diseases (Huntington's, Fragile X, myotonic dystrophy) and likely contribute to many unsolved cases. Additionally, some Mendelian disease genes reside in regions that SRS cannot reliably cover.
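When a single read spans an entire tandem repeat, the number of repeat units can be counted directly from that read. The sketch below assumes a perfect, uninterrupted repeat and an error-free read, which real data rarely provide; the function name `longest_repeat_run` and the example sequence are invented for illustration.

```python
import re


def longest_repeat_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of `motif` in `sequence`."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(match.group(0)) // len(motif) for match in pattern.finditer(sequence)]
    return max(runs, default=0)


# Invented read: unique flank, 45 copies of CAG, unique flank.
read = "TTGACC" + "CAG" * 45 + "CCGCCA"
print(longest_repeat_run(read, "CAG"))
```

A 150-bp read could not contain this 135-bp repeat together with enough flanking sequence to place it confidently, which is one reason expansions are hard to size from short reads.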

The Negi et al. Study: Nanopore Sequencing for 41 Families

Negi et al. (2025), published in the American Journal of Human Genetics, sequenced 98 samples from 41 families with suspected rare monogenic diseases using Oxford Nanopore Technologies (ONT) sequencing. Their Napu pipeline generated haplotype-resolved genome assemblies, phased variant calls, and methylation profiles from single flow cells, achieving approximately 36x coverage with a read N50 of 32 kb.
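For readers unfamiliar with the read N50 metric: it is the length L such that reads of length L or longer contain at least half of all sequenced bases. A minimal sketch of the standard calculation follows; the function name `read_n50` and the example lengths are illustrative and not taken from the paper.

```python
def read_n50(lengths: list[int]) -> int:
    """Length L such that reads of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("need at least one read length")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest read down, accumulating bases until half is reached.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable: the cumulative sum always reaches half the total")


example_lengths = [2, 2, 2, 3, 3, 4, 8, 8]  # toy values, in arbitrary units
print(read_n50(example_lengths))
```

Because N50 is weighted by bases rather than by read count, a few very long reads raise it much more than many short reads lower it.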

Key findings include:

  • Coverage of previously inaccessible genes. On average, LRS covered coding exons in approximately 280 genes and 5 known Mendelian disease-associated genes that SRS could not cover. These are genes where short reads either fail to map or produce unreliable calls.
  • Detection of additional rare variants. LRS detected structural variants, tandem repeat expansions, and complex rearrangements that were absent from SRS variant call sets. These include variants in clinically relevant size ranges (50 bp to several megabases) that fall in the gap between SRS-detectable indels and cytogenetically visible rearrangements.
  • Comprehensive phasing. LRS completely phased 87% of protein-coding genes, enabling unambiguous determination of which variants sit on the same haplotype, which is critical for compound heterozygosity assessment in recessive disorders.
  • Diagnostic yield. The team established diagnostic variants in 11 probands, with causes including de novo variants, compound heterozygous variants, large-scale SVs, and epigenetic modifications. The diversity of causal mechanisms underscores that LRS captures variant classes that SRS systematically misses.
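To make the phasing point concrete: in a recessive disorder, two heterozygous variants in the same gene matter most when they sit on opposite haplotypes (in trans), so that neither copy of the gene is intact. The sketch below reads VCF-style genotype strings, where `0|1` is phased and `0/1` is unphased. It assumes both variants belong to the same phase block, and the function name `phase_relationship` is hypothetical rather than part of the Napu pipeline.

```python
def phase_relationship(gt_a: str, gt_b: str) -> str:
    """Classify two heterozygous calls as 'cis', 'trans', or 'unknown'.

    Assumes both genotypes come from the same phase block (an assumption of
    this sketch; a real workflow would also compare phase-set identifiers).
    """
    if "|" not in gt_a or "|" not in gt_b:
        return "unknown"  # at least one call is unphased
    alleles_a = gt_a.split("|")
    alleles_b = gt_b.split("|")
    # Record which haplotype (index 0 or 1) carries a non-reference allele.
    alt_haps_a = [i for i, allele in enumerate(alleles_a) if allele not in ("0", ".")]
    alt_haps_b = [i for i, allele in enumerate(alleles_b) if allele not in ("0", ".")]
    if len(alleles_a) != 2 or len(alleles_b) != 2:
        return "unknown"  # not a diploid call
    if len(alt_haps_a) != 1 or len(alt_haps_b) != 1:
        return "unknown"  # not a simple heterozygous call
    return "cis" if alt_haps_a == alt_haps_b else "trans"


print(phase_relationship("0|1", "1|0"))  # variants on opposite haplotypes
print(phase_relationship("0|1", "0|1"))  # variants on the same haplotype
print(phase_relationship("0/1", "0|1"))  # one call unphased
```

Without phase information, the first two cases look identical in a variant list, which is why unphased short-read calls often need parental testing to resolve.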

The Steyaert et al. Study: HiFi Sequencing for 114 Families

Steyaert et al. (2025), published in Genome Research as part of the Solve-RD pan-European program, applied PacBio HiFi long-read sequencing at 10x coverage to 293 individuals from 114 genetically undiagnosed rare disease families. Of these, 93 families had exhausted prior testing for neurological, neuromuscular, or epilepsy disorders, and 21 families had so-called "unsolvable" syndromes.

Key results:

  • 12 novel genetic diagnoses were established through LRS, involving de novo and rare inherited single nucleotide variants (SNVs), indels, SVs, and STR expansions. In addition, an MCF2/FGF13 fusion and a PSMA3 deletion were identified as candidate disease-causing variants in further families.
  • Disease-causing variants were found in 11.8% of previously unsolved families, with candidate variants in another 5.4%.
  • No common genetic cause was identified in the 21 "unsolvable" syndrome families, suggesting these conditions may involve non-genetic factors, extremely rare variants not yet catalogued, or complex oligogenic interactions.

The diagnostic yield of approximately 12-17% (combining definitive and candidate diagnoses) in families that had already undergone extensive prior genetic testing is significant. These are not easy cases: they represent the residuum after standard diagnostics failed.
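The 12-17% range is simple addition of the two percentages reported for Steyaert et al.: the lower bound counts definitive diagnoses only, and the upper bound adds the candidate findings. A short check, using only the figures quoted in this post:

```python
definitive_pct = 11.8  # families with a disease-causing variant
candidate_pct = 5.4    # additional families with a candidate variant

lower_bound = definitive_pct
upper_bound = round(definitive_pct + candidate_pct, 1)

print(f"definitive only: {lower_bound}%")
print(f"definitive plus candidate: {upper_bound}%")
```

Rounded to whole numbers, these bounds give the "approximately 12-17%" quoted in the text.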

Complementary Strengths

The two studies employ different LRS technologies with distinct characteristics:

Feature         | ONT (Negi et al.)      | HiFi (Steyaert et al.)
Read length     | Ultra-long (32 kb N50) | Long (10-20 kb typical)
Base accuracy   | ~95-99% (single read)  | >99.9% (consensus)
Methylation     | Native, simultaneous   | Requires separate analysis
Coverage        | ~36x (1 flow cell)     | ~10x
Cost per sample | Lower                  | Higher per base

ONT provides ultra-long reads and native methylation detection, enabling resolution of the most complex structural variants and direct epigenetic profiling. HiFi sequencing provides near-perfect per-read accuracy, reducing the need for high coverage and simplifying variant calling pipelines. The choice between platforms depends on the specific diagnostic question.

Critical Considerations

Cost remains a barrier. LRS is more expensive per sample than SRS, and clinical laboratories face budget constraints. The cost-effectiveness argument depends on where in the diagnostic odyssey LRS is deployed: as a first-tier test (replacing SRS) or as a second-tier test (for SRS-negative cases). Current evidence supports second-tier deployment, but declining sequencing costs may shift this calculation.

Analytical pipelines are maturing but not standardized. Unlike SRS, where GATK best practices provide a widely accepted workflow, LRS analysis lacks consensus pipelines. Different callers perform differently for different variant types, and benchmarking resources for LRS are less developed.

Interpretation challenges remain. Detecting more variants is only useful if those variants can be interpreted. Many SVs and repeat expansions identified by LRS are novel (absent from existing databases), making pathogenicity assessment difficult.

The unsolvable remain unsolved. The 21 "unsolvable" syndrome families in Steyaert et al. yielded no common genetic cause despite LRS, reminding us that sequencing technology alone cannot solve all rare diseases.

Open Questions

  • What is the optimal clinical pathway for integrating LRS into rare disease diagnostics: first-tier, second-tier, or triggered by specific phenotypic indicators?
  • Can LRS diagnostic yields improve with higher coverage, or are the remaining unsolved cases fundamentally different in nature?
  • How should clinical laboratories validate and report novel structural variants detected by LRS when database evidence is lacking?
  • Will adaptive sampling (the ability to selectively sequence regions of interest in real time on nanopore platforms) provide a cost-effective compromise between whole-genome LRS and targeted panels?

Closing Reflection

The rare disease diagnostic gap is not primarily a knowledge gap; it is a resolution gap. Standard sequencing reads are too short to capture the full spectrum of human genetic variation. These two studies, from complementary technological perspectives, demonstrate that long-read sequencing resolves a meaningful fraction of previously intractable cases. For families enduring years-long diagnostic odysseys, a 12-17% solve rate from a single additional test represents genuine clinical impact. As costs decrease and analytical methods mature, LRS is positioned to become a standard component of the rare disease diagnostic toolkit.


References

Negi, S., et al. (2025). Advancing long-read nanopore genome assembly and accurate variant calling for rare disease detection. American Journal of Human Genetics.
Steyaert, W., et al. (2025). Unraveling undiagnosed rare disease cases by HiFi long-read genome sequencing. Genome Research.
