Paper Review · Mathematics & Statistics · Machine/Deep Learning

Generating Graphs the Bayesian Way: Discrete Diffusion for Molecular and Network Design

Graphs are discrete, unordered structures, fundamentally different from the continuous data that standard diffusion models handle. Petersen et al. develop a Bayesian framework for discrete graph generation that combines diffusion and flow matching models with principled posterior inference.

By Sean K.S. Shin
This blog summarizes research trends based on published paper abstracts. Specific numbers or findings may contain inaccuracies. For scholarly rigor, always consult the original papers cited in each post.

Generating graph-structured data (molecules with specific properties, social networks with desired characteristics, knowledge graphs with correct relational structure) is a central challenge in AI. Graphs are fundamentally different from images or text: they are discrete (nodes and edges are categorical, not continuous), unordered (there is no canonical ordering of nodes), and variable-sized (different graphs have different numbers of nodes and edges).

These properties make standard generative models (VAEs, GANs, continuous diffusion) poorly suited for graphs. Continuous diffusion adds Gaussian noise to pixel values; you cannot meaningfully add Gaussian noise to a graph's adjacency matrix (the result is no longer a valid graph). Autoregressive generation produces nodes in sequence, but the generation order is arbitrary, and different orderings produce the same graph, creating a many-to-one redundancy.
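A minimal numerical sketch (my own illustration, not from the paper) makes the adjacency-matrix point concrete: Gaussian corruption immediately destroys the 0/1 structure, so the corrupted object is no longer an unweighted graph at all.

```python
import numpy as np

rng = np.random.default_rng(0)
adj = np.array([[0., 1., 0.],
                [1., 0., 1.],
                [0., 1., 0.]])  # adjacency matrix of a 3-node path graph

# Continuous-diffusion-style corruption: add Gaussian noise to every entry.
noisy = adj + rng.normal(scale=0.5, size=adj.shape)

# A valid unweighted graph needs every entry to be exactly 0 or 1;
# after adding continuous noise, none of them are.
still_valid = np.isin(noisy, [0.0, 1.0]).all()
```

After one noise step, `still_valid` is already false: there is no graph to denoise back toward in the continuous sense, which is why discrete corruption processes are needed.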

Petersen et al. develop a Bayesian framework for discrete graph generation that addresses these challenges by:

  • Working in the discrete domain natively, using discrete diffusion and flow matching that operate on categorical node and edge types
  • Performing posterior inference rather than just sampling, enabling conditional generation (graphs with specific properties) through Bayesian conditioning
  • Handling graph symmetry through permutation-invariant architectures
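The symmetry requirement in the last bullet can be stated as a simple test: relabeling the nodes of a graph must not change any graph-level output. A toy check, using a trivially invariant statistic as a stand-in for a learned permutation-invariant network:

```python
import numpy as np

def graph_score(adj):
    """A trivially permutation-invariant graph statistic: the edge count."""
    return adj.sum() / 2  # each undirected edge is counted twice in adj

adj = np.array([[0., 1., 1.],
                [1., 0., 0.],
                [1., 0., 0.]])  # star graph on 3 nodes, 2 edges

perm = np.array([2, 0, 1])              # an arbitrary relabeling of the nodes
adj_perm = adj[np.ix_(perm, perm)]      # permute rows and columns together
```

`graph_score(adj) == graph_score(adj_perm)` holds for any permutation; a permutation-invariant generative architecture guarantees the analogous property for its learned likelihoods, so the many adjacency matrices describing one graph are all scored identically.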

    The Discrete Diffusion Framework

    Continuous diffusion models corrupt data by adding Gaussian noise and then learn to reverse this corruption. Discrete diffusion models corrupt data by randomly replacing categorical values (node types, edge types) with random alternatives, then learn to reverse this corruption, recovering the original graph from a uniformly random graph.

    The forward process is simple: at each step, each node/edge type has a probability of being randomly reassigned. After many steps, the graph becomes uniformly random; all structural information is destroyed.

    The reverse process is the learned generative model: given a noisy graph, predict the original graph. By iteratively applying the reverse process from pure noise, the model generates new graphs that match the distribution of training data.
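The reverse loop can be sketched schematically. Here a stand-in "denoiser" deterministically prefers one fixed clean labeling; in the real method this would be a learned permutation-equivariant network, so everything below is an illustrative assumption:

```python
import numpy as np

def reverse_step(noisy_labels, predict_probs, rng):
    """Sample each label from the model's predicted categorical distribution."""
    probs = predict_probs(noisy_labels)          # shape (..., num_types)
    flat = probs.reshape(-1, probs.shape[-1])
    cum = np.cumsum(flat, axis=1)                # per-entry CDF over types
    u = rng.random((flat.shape[0], 1))
    samples = (u > cum).sum(axis=1)              # inverse-CDF sampling
    return samples.reshape(noisy_labels.shape)

rng = np.random.default_rng(0)
target = np.array([0, 1, 2, 1])                  # the "clean graph" labels

def predict_probs(_):
    # Toy denoiser: puts all mass on the target labels regardless of input.
    p = np.zeros((4, 3))
    p[np.arange(4), target] = 1.0
    return p

x = rng.integers(0, 3, size=4)                   # start from uniform noise
for _ in range(5):                               # iterate the reverse process
    x = reverse_step(x, predict_probs, rng)
```

Because the toy denoiser is deterministic, the chain collapses onto `target` in one step; a trained model instead sharpens gradually, with each reverse step removing a little of the corruption.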

    Petersen et al.'s Bayesian contribution is enabling conditional generation through posterior inference. Given a desired property (a molecule with specific binding affinity, a network with specific degree distribution), the posterior distribution over graphs conditioned on the property can be approximated by modifying the reverse diffusion process to favor graphs consistent with the condition.
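The core Bayesian move, reweighting the model's prediction by how consistent each outcome is with the condition, can be written in a few lines. This is a generic guided-sampling sketch under my own naming, not the paper's specific algorithm:

```python
import numpy as np

def condition(prior_probs, likelihood):
    """Posterior over categories: elementwise prior * likelihood, renormalized.
    prior_probs : model's unconditional categorical prediction p(x)
    likelihood  : property-consistency weights p(y | x) for each category
    """
    post = prior_probs * likelihood
    return post / post.sum(axis=-1, keepdims=True)

prior = np.array([0.5, 0.3, 0.2])   # unconditional denoiser prediction
lik = np.array([0.1, 0.8, 0.1])     # consistency of each type with the target property
post = condition(prior, lik)
```

Applying `condition` at every reverse step biases sampling toward graphs that satisfy the property while still respecting the learned distribution: here the unconditional model favors type 0, but the posterior shifts its mass to type 1.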

    Applications

    Molecular design: Generate molecules with target properties (solubility, binding affinity, toxicity) by conditioning the graph generation on property predictors.

    Knowledge graphs: Generate plausible knowledge graph completionsโ€”new edges that are consistent with the existing graph structure.

    Network synthesis: Generate synthetic networks with specific structural properties (clustering coefficient, degree distribution, community structure) for simulation and testing.

    Claims and Evidence

    Claim | Evidence | Verdict
    Discrete diffusion handles graph structure natively | Framework operates on categorical node/edge types | ✅ Supported
    Bayesian conditioning enables property-targeted generation | Posterior inference demonstrated for conditional generation | ✅ Supported
    Generated graphs match training distribution quality | Evaluation on molecular and network benchmarks | ✅ Supported
    The approach outperforms autoregressive graph generation | Competitive on benchmarks; advantages in symmetry handling | ⚠️ Competitive, not uniformly superior

    Open Questions

  • Scalability: Current demonstrations involve graphs with tens to hundreds of nodes. Can discrete diffusion scale to graphs with thousands of nodes (protein structures, large social networks)?
  • Validity constraints: Not all graphs are valid molecules (valence rules, ring strain). How do we incorporate domain-specific validity constraints into the generation process?
  • Multi-objective conditioning: Real molecular design involves multiple simultaneous objectives (potency AND selectivity AND solubility). How do we condition on multiple properties without generating Pareto-suboptimal compromises?
  • Evaluation metrics: How do we evaluate the quality of generated graphs? Distributional metrics (comparing generated vs. real graph distributions) are standard but may miss important structural properties.

    What This Means for Your Research

    For computational chemists, Bayesian graph generation provides a principled framework for molecular design that explicitly handles the discrete, unordered nature of molecular graphs, a more natural fit than continuous generative models that must be adapted.

    For graph ML researchers, the discrete Bayesian framework provides a theoretically grounded alternative to the ad hoc adaptations of continuous generative models to discrete graph data that have dominated the field.

    References (1)

    [1] Petersen, O., Kollovieh, M., & Lienen, M. (2025). Discrete Bayesian Sample Inference for Graph Generation. arXiv:2511.03015.
