benchmark — Structure Prediction Benchmarking

The synth_pdb.benchmark and synth_pdb.benchmark_metrics modules provide a complete suite for evaluating AI structure prediction models (AlphaFold, ESMFold, RoseTTAFold, etc.) against ground-truth synthetic structures.

!!! note "The key insight"
    Because synth-pdb controls the ground truth, the benchmark is perfectly objective: there is no ambiguity about which experimental structure is "correct" or whether the reference contains modelling errors. This makes it ideal for blind comparison of structure prediction models.


Quick Start

from synth_pdb.benchmark import run_benchmark

# Score 20 structures predicted by ESMFold
results = run_benchmark(n_structures=20, predictor="esmfold")

# Print formatted summary
print(results.summary())

# Export to CSV for further analysis
results.to_csv("benchmark_results.csv")

Or from the command line:

# Install dependencies
pip install "synth-pdb[gnn]" transformers accelerate

# Run benchmark (downloads ESMFold ~700 MB on first use)
python scripts/run_benchmark.py --n-structures 20 --output results.csv

run_benchmark()

def run_benchmark(
    n_structures: int = 20,
    lengths: list[int] | None = None,
    conformations: list[str] | None = None,
    predictor: str | Callable[[str], str] = "esmfold",
    *,
    compute_shifts: bool = True,
    compute_gnn: bool = True,
    random_state: int = 42,
) -> BenchmarkResults

Generate synthetic ground-truth structures, fold them from sequence using a structure predictor, then evaluate the predictions against the ground truth using a comprehensive set of structural metrics.

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `n_structures` | `int` | `20` | Number of test structures to generate and evaluate. |
| `lengths` | `list[int]` | `[20, 30, 50]` | Pool of chain lengths to sample from uniformly. |
| `conformations` | `list[str]` | `["alpha", "beta"]` | Pool of secondary structure types. Options: `"alpha"`, `"beta"`, `"random"`. |
| `predictor` | `str` or `Callable` | `"esmfold"` | `"esmfold"` for the built-in ESMFold backend, or any `predictor_fn(sequence: str) → pdb_str` callable. |
| `compute_shifts` | `bool` | `True` | Compute NMR chemical shift RMSD (requires `synth_pdb.chemical_shifts`). |
| `compute_gnn` | `bool` | `True` | Score both structures with the GNN pLDDT classifier. |
| `random_state` | `int` | `42` | RNG seed for reproducibility. |

Returns

A BenchmarkResults object.

Using a Custom Predictor

from synth_pdb.benchmark import run_benchmark


def my_predictor(sequence: str) -> str:
    """Return a PDB string for the given amino acid sequence."""
    # ... call your model here and store its PDB-format output in pdb_string ...
    return pdb_string

results = run_benchmark(n_structures=10, predictor=my_predictor)

The predictor argument accepts any callable with this signature: a wrapper around ColabFold or OmegaFold, a structure database lookup, or even a simple homology modelling pipeline.
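Because a predictor is just a function from sequence to PDB text, it can be wrapped like any other callable. The sketch below is one way to add caching and a basic sanity check around an expensive model. The name `cached_predictor` and the "must contain an ATOM record" check are assumptions made for this example and are not part of synth_pdb.

```python
from functools import lru_cache
from typing import Callable


def cached_predictor(
    predict: Callable[[str], str], maxsize: int = 256
) -> Callable[[str], str]:
    """Wrap a sequence -> PDB-string callable so each sequence is folded once.

    Hypothetical helper for illustration; not part of synth_pdb.
    """

    @lru_cache(maxsize=maxsize)
    def wrapper(sequence: str) -> str:
        pdb_text = predict(sequence)
        # Assumed sanity check: a usable PDB string has at least one ATOM record.
        if not isinstance(pdb_text, str) or "ATOM" not in pdb_text:
            raise ValueError(f"predictor returned no ATOM records for {sequence!r}")
        return pdb_text

    return wrapper
```

The wrapped callable keeps the `sequence: str -> str` shape, so it should be usable as `run_benchmark(n_structures=10, predictor=cached_predictor(my_predictor))`.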


BenchmarkResults

@dataclass
class BenchmarkResults:
    results:      list[StructureResult]
    predictor:    str
    n_structures: int
    n_success:    int

Methods

summary() → str

Returns a formatted multi-line summary report:

━━ Benchmark: ESMFold (18/20 structures) ━━
  TM-score   mean=0.723  std=0.142  min=0.421  max=0.891
  GDT-TS     mean=0.681  std=0.159
  lDDT       mean=0.714  std=0.128
  Cα-RMSD    mean=2.84 Å  std=1.21 Å
  Shift RMSD mean=0.412 ppm  std=0.193 ppm
  GNN pLDDT  mean=0.834 (predicted structures)

  Structures with TM-score > 0.5 (same fold): 16/18 (89%)

to_csv(path: str) → None

Write full per-structure results (all 12 fields) to a CSV file.

to_dataframe() → pd.DataFrame

Return results as a pandas DataFrame (requires pandas).


StructureResult

Per-structure result returned in BenchmarkResults.results.

| Field | Type | Description |
| --- | --- | --- |
| `sequence` | `str` | Amino acid sequence (single-letter code). |
| `length` | `int` | Number of residues. |
| `conformation` | `str` | Ground-truth secondary structure type. |
| `tm_score` | `float` | TM-score ∈ [0, 1]. Values > 0.5 indicate the same fold. |
| `gdt_ts` | `float` | GDT-TS ∈ [0, 1]. CASP standard metric. |
| `lddt_mean` | `float` | Mean per-residue lDDT ∈ [0, 1]. |
| `rmsd` | `float` | Cα-RMSD in Å after Kabsch superposition. |
| `shift_rmsd` | `float` | Weighted chemical shift RMSD in ppm. NaN if unavailable. |
| `gnn_score_ref` | `float` | GNN global quality score for the ground-truth structure. |
| `gnn_score_pred` | `float` | GNN global quality score for the predicted structure. |
| `predictor_time_s` | `float` | Wall-clock inference time in seconds. |
| `error` | `str` | Non-empty string if prediction failed; other fields are NaN. |
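Since failed predictions carry a non-empty `error` and NaN metrics, downstream analysis usually needs to filter them out first. The following is a minimal sketch of that step using only the standard library. It assumes each row is a mapping keyed by the field names above (for example, rows read back from the exported CSV, if its column headers match those names); the function `summarise` is hypothetical.

```python
import math
from statistics import mean
from typing import Iterable, Mapping


def summarise(rows: Iterable[Mapping[str, object]]) -> dict:
    """Summarise TM-scores over successful rows (hypothetical helper)."""
    rows = list(rows)
    # A row counts as successful when its error field is empty or missing.
    ok = [r for r in rows if not r.get("error")]
    # float() accepts both numbers and CSV strings such as "0.72" or "nan".
    tm = [float(r["tm_score"]) for r in ok]
    tm = [t for t in tm if not math.isnan(t)]
    return {
        "n_total": len(rows),
        "n_success": len(ok),
        "tm_mean": mean(tm) if tm else math.nan,
        # TM-score > 0.5 is the same-fold criterion used in this document.
        "same_fold_fraction": sum(t > 0.5 for t in tm) / len(tm) if tm else math.nan,
    }
```

With a CSV on disk, the rows could come from `csv.DictReader(open_file)`; with an in-memory `BenchmarkResults`, from `dataclasses.asdict(r)` for each entry in `results`.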

Benchmark Metrics Reference

All metric functions live in synth_pdb.benchmark_metrics, operate on NumPy arrays, and need no dependencies beyond NumPy.

tm_score(ca_pred, ca_ref)

def tm_score(
    ca_pred: np.ndarray,   # [N, 3] predicted Cα coordinates
    ca_ref:  np.ndarray,   # [N, 3] reference Cα coordinates
    *,
    normalise_by: int | None = None,
) -> float

Compute TM-score. Returns a value in (0, 1]. Two unrelated structures score ≈ 0.17; two structures with the same fold score > 0.5.

For the mathematical definition and interpretation, see the scientific background page.
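As a rough illustration of the quantity involved, the sketch below evaluates the standard TM-score sum for two coordinate sets that are already superposed. It is not the synth_pdb implementation: the name `tm_score_fixed_alignment`, the interpretation of `normalise_by` as the normalising length, and the 0.5 Å floor on `d0` for short chains are assumptions made here. A full TM-score also searches for the superposition that maximises the sum, which this sketch skips.

```python
import numpy as np


def tm_score_fixed_alignment(ca_pred, ca_ref, normalise_by=None):
    """TM-score sum for pre-superposed [N, 3] Cα arrays (illustrative only)."""
    ca_pred = np.asarray(ca_pred, dtype=float)
    ca_ref = np.asarray(ca_ref, dtype=float)
    if ca_pred.shape != ca_ref.shape:
        raise ValueError("coordinate arrays must have the same shape")
    l_norm = normalise_by or len(ca_ref)
    # Length-dependent distance scale; the 0.5 Å floor is an assumption
    # that keeps d0 positive for chains shorter than about 20 residues.
    d0 = max(1.24 * np.cbrt(l_norm - 15) - 1.8, 0.5)
    dist = np.linalg.norm(ca_pred - ca_ref, axis=1)
    return float(np.sum(1.0 / (1.0 + (dist / d0) ** 2)) / l_norm)
```

Identical coordinates give exactly 1.0, and each residue's contribution decays smoothly with its deviation, which is why a few badly placed residues lower the TM-score far less than they raise the RMSD.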

lddt(ca_pred, ca_ref)

def lddt(
    ca_pred:          np.ndarray,             # [N, 3]
    ca_ref:           np.ndarray,             # [N, 3]
    *,
    inclusion_radius: float = 15.0,           # Å
    thresholds:       tuple = (0.5, 1, 2, 4), # Å
) -> np.ndarray  # [N] per-residue lDDT ∈ [0, 1]

Compute per-residue lDDT. Because lDDT compares inter-residue distances within each structure, it does not require superposition. The global lDDT is float(np.mean(lddt(...))).
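To make the superposition-free property concrete, here is a minimal Cα-only lDDT sketch. It is an illustration under stated assumptions, not the synth_pdb code: the name `ca_lddt` is hypothetical, every pair of distinct residues within the inclusion radius in the reference is counted, and residues with no such neighbours score 0.

```python
import numpy as np


def ca_lddt(ca_pred, ca_ref, inclusion_radius=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Per-residue Cα lDDT from two [N, 3] arrays (illustrative sketch)."""
    ca_pred = np.asarray(ca_pred, dtype=float)
    ca_ref = np.asarray(ca_ref, dtype=float)
    n = len(ca_ref)
    # Pairwise distance matrices; these are invariant to rotation and translation.
    d_ref = np.linalg.norm(ca_ref[:, None] - ca_ref[None, :], axis=-1)
    d_pred = np.linalg.norm(ca_pred[:, None] - ca_pred[None, :], axis=-1)
    # Score only pairs of distinct residues that are close in the reference.
    mask = (d_ref < inclusion_radius) & ~np.eye(n, dtype=bool)
    diff = np.abs(d_pred - d_ref)
    # For each pair, the fraction of thresholds under which the distance is preserved.
    preserved = np.mean([diff < t for t in thresholds], axis=0)
    counts = mask.sum(axis=1)
    return (preserved * mask).sum(axis=1) / np.maximum(counts, 1)
```

Only the two distance matrices enter the score, so a rigidly rotated and translated copy of the reference scores 1.0 at every residue.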

gdt_ts(ca_pred, ca_ref)

def gdt_ts(
    ca_pred: np.ndarray,             # [N, 3]
    ca_ref:  np.ndarray,             # [N, 3]
    *,
    cutoffs: tuple = (1.0, 2.0, 4.0, 8.0),  # Å
) -> float

Compute GDT-TS: the fraction of Cα atoms within 1, 2, 4 and 8 Å of their reference positions after superposition, averaged over the four cutoffs.
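The averaging step can be sketched in a few lines for coordinates that have already been superposed. This is a simplified illustration: the name `gdt_ts_superposed` is hypothetical, and reference GDT implementations search over many superpositions to maximise each fraction, whereas this sketch uses a single fixed one.

```python
import numpy as np


def gdt_ts_superposed(ca_pred, ca_ref, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Average fraction of Cα atoms within each cutoff (pre-superposed input)."""
    ca_pred = np.asarray(ca_pred, dtype=float)
    ca_ref = np.asarray(ca_ref, dtype=float)
    # Per-residue deviation in Å between matched Cα atoms.
    dist = np.linalg.norm(ca_pred - ca_ref, axis=1)
    # One fraction per cutoff, then the mean over cutoffs.
    return float(np.mean([(dist <= c).mean() for c in cutoffs]))
```

For example, four residues deviating by 0.5, 1.5, 3.0 and 10.0 Å give fractions 1/4, 2/4, 3/4 and 3/4, for a score of 0.5625.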

superpose_kabsch(mobile, reference)

def superpose_kabsch(
    mobile:    np.ndarray,  # [N, 3]
    reference: np.ndarray,  # [N, 3]
) -> tuple[np.ndarray, float]  # (rotated_coords, rmsd)

Optimally superpose mobile onto reference using the Kabsch algorithm (SVD-based). Returns the rotated coordinate array and the Cα-RMSD in Å.
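For readers unfamiliar with the algorithm, the SVD-based procedure can be sketched as follows. This is a textbook version written for illustration, with the hypothetical name `kabsch_superpose`; it mirrors the documented return shape but is not taken from the synth_pdb source.

```python
import numpy as np


def kabsch_superpose(mobile, reference):
    """Rigidly fit mobile onto reference; return (aligned_coords, rmsd)."""
    mobile = np.asarray(mobile, dtype=float)
    reference = np.asarray(reference, dtype=float)
    ref_centroid = reference.mean(axis=0)
    # Centre both point sets so only a rotation remains to be found.
    p = mobile - mobile.mean(axis=0)
    q = reference - ref_centroid
    # SVD of the 3x3 covariance matrix between the two sets.
    u, _, vt = np.linalg.svd(p.T @ q)
    # Flip the last axis if needed so the result is a proper rotation (det = +1).
    sign = 1.0 if np.linalg.det(vt.T @ u.T) > 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, sign]) @ u.T
    # Rotate the centred mobile set and move it onto the reference centroid.
    aligned = p @ rotation.T + ref_centroid
    rmsd = float(np.sqrt(np.mean(np.sum((aligned - reference) ** 2, axis=1))))
    return aligned, rmsd
```

The determinant check matters: without it, the least-squares solution can be a reflection, which would superpose a structure onto its mirror image.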

shift_rmsd(pred_shifts, ref_shifts)

def shift_rmsd(
    pred_shifts:      dict[str, np.ndarray],  # nucleus → per-residue shifts
    ref_shifts:       dict[str, np.ndarray],
    *,
    nucleus_weights:  dict[str, float] | None = None,
) -> float  # weighted shift RMSD in ppm

Weighted chemical shift RMSD following SPARTA+ nucleus weights (H=1.0, C=0.25, N=0.1). NaN residues (missing assignments) are automatically excluded.

from synth_pdb.benchmark_metrics import shift_rmsd
import numpy as np

# Compare predicted and reference ¹H shifts for 10 residues
rmsd = shift_rmsd(
    {"H": np.array([8.1, 8.2, 8.3, 8.0, 7.9, 8.4, 8.1, 8.2, 8.0, 7.8])},
    {"H": np.array([8.0, 8.1, 8.4, 8.0, 7.8, 8.3, 8.2, 8.1, 8.0, 7.9])},
)
print(f"¹H shift RMSD: {rmsd:.4f} ppm")
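The exact weighting formula is not spelled out on this page, so the sketch below shows only one plausible reading: each nucleus's differences are scaled by its weight, NaN entries are dropped, and a single RMSD is taken over everything that remains. The function name `weighted_shift_rmsd`, the weight of 1.0 for unlisted nuclei, and the pooling scheme are all assumptions, and the library's own result may differ for multi-nucleus input.

```python
import numpy as np

# Nucleus weights quoted in the text above.
DEFAULT_WEIGHTS = {"H": 1.0, "C": 0.25, "N": 0.1}


def weighted_shift_rmsd(pred_shifts, ref_shifts, nucleus_weights=None):
    """Pooled RMSD of weight-scaled shift differences (assumed formula)."""
    weights = nucleus_weights or DEFAULT_WEIGHTS
    sum_sq, count = 0.0, 0
    # Only nuclei present in both dictionaries can be compared.
    for nucleus in pred_shifts.keys() & ref_shifts.keys():
        diff = np.asarray(pred_shifts[nucleus], dtype=float) - np.asarray(
            ref_shifts[nucleus], dtype=float
        )
        valid = ~np.isnan(diff)  # drop residues with a missing assignment
        w = weights.get(nucleus, 1.0)  # assumption: unlisted nuclei get weight 1.0
        sum_sq += float(np.sum((w * diff[valid]) ** 2))
        count += int(valid.sum())
    return float(np.sqrt(sum_sq / count)) if count else float("nan")
```

Down-weighting ¹³C and ¹⁵N reflects their much wider chemical shift ranges, so that a 1 ppm error on nitrogen does not swamp a 0.1 ppm error on a proton.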

extract_ca_coords(pdb_content)

def extract_ca_coords(pdb_content: str) -> np.ndarray  # [N, 3]

Lightweight, pure-Python PDB parser that extracts Cα coordinates in residue order. Handles duplicate residue numbers (keeps first occurrence per chain/residue pair).

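from pathlib import Path

from synth_pdb.benchmark_metrics import extract_ca_coords, tm_score

# Path.read_text() opens, reads and closes each file in one call
ca_ref  = extract_ca_coords(Path("reference.pdb").read_text())
ca_pred = extract_ca_coords(Path("predicted.pdb").read_text())

# tm_score expects equal-length [N, 3] arrays, so truncate to the shorter chain
# (this assumes both chains start at the same residue)
n = min(len(ca_ref), len(ca_pred))
score = tm_score(ca_pred[:n], ca_ref[:n])
print(f"TM-score: {score:.3f}")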
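For context on what such a parser involves, the sketch below reads Cα positions from the fixed-width columns of PDB ATOM records and keeps the first occurrence per chain and residue number, as described above. It is an independent illustration with the hypothetical name `ca_coords_from_pdb`; details such as ignoring HETATM records and skipping truncated lines are assumptions, not documented behaviour of `extract_ca_coords`.

```python
import numpy as np


def ca_coords_from_pdb(pdb_content: str) -> np.ndarray:
    """Return an [N, 3] array of Cα coordinates in file order (sketch)."""
    seen = set()
    coords = []
    for line in pdb_content.splitlines():
        # Columns 13-16 hold the atom name; coordinates end at column 54.
        if not line.startswith("ATOM") or len(line) < 54:
            continue
        if line[12:16].strip() != "CA":
            continue
        # Chain ID is column 22, residue number is columns 23-26.
        key = (line[21], line[22:26])
        if key in seen:  # e.g. a second alternate location for the same residue
            continue
        seen.add(key)
        coords.append([float(line[30:38]), float(line[38:46]), float(line[46:54])])
    # reshape keeps the [0, 3] shape when no Cα atoms are found.
    return np.array(coords, dtype=float).reshape(-1, 3)
```

Slicing by column rather than splitting on whitespace matters because adjacent PDB fields can run together without a separating space, for instance when a coordinate fills its full eight-character field.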

CLI Reference

python scripts/run_benchmark.py [OPTIONS]

Options:
  --predictor {esmfold}       Structure prediction backend (default: esmfold)
  --n-structures INT          Number of test structures (default: 20)
  --lengths L [L ...]         Chain lengths to sample (default: 20 30 50)
  --conformations {alpha,beta,random} [...]
                              Secondary structure types (default: alpha beta)
  --output PATH               Save CSV results to this path
  --no-shifts                 Skip chemical shift RMSD
  --no-gnn                    Skip GNN quality scoring
  --random-state INT          RNG seed (default: 42)
  -v, --verbose               Enable DEBUG logging

Example Runs

# Full benchmark with all metrics
python scripts/run_benchmark.py \
    --n-structures 50 \
    --lengths 20 30 50 \
    --output results/esmfold_benchmark.csv

# Fast geometry-only benchmark
python scripts/run_benchmark.py \
    --n-structures 100 \
    --no-shifts --no-gnn \
    --output results/fast_benchmark.csv

# Alpha-helix only
python scripts/run_benchmark.py \
    --conformations alpha \
    --n-structures 30 \
    --output results/helix_benchmark.csv

Full API Reference

::: synth_pdb.benchmark
    handler: python
    options:
      members:
        - run_benchmark
        - BenchmarkResults
        - StructureResult

::: synth_pdb.benchmark_metrics
    handler: python
    options:
      members:
        - tm_score
        - lddt
        - gdt_ts
        - superpose_kabsch
        - shift_rmsd
        - extract_ca_coords


See Also