score — GNN Quality Scoring API
The synth_pdb.score module provides a single-import, zero-configuration interface
for scoring protein structures with the bundled graph neural network (GNN) quality
classifier, a Graph Attention Network. It is the recommended entry point for all quality scoring tasks.
!!! note "Installation"
pip install synth-pdb[gnn] # installs torch + torch_geometric
The synth_pdb.score module can be imported without PyTorch installed —
the dependency is only checked when a scoring function is actually called.
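The deferred dependency check described in the note can be sketched as below. This is a minimal illustration of the general lazy-check pattern, not the module's actual code: `require_modules` and `score_something` are hypothetical names, and the error message wording is an assumption.

```python
import importlib.util


def require_modules(names):
    """Raise ImportError if any named top-level module is not installed.

    Hypothetical helper: it checks dependencies at call time rather than
    at import time.
    """
    missing = [n for n in names if importlib.util.find_spec(n) is None]
    if missing:
        raise ImportError(
            "Missing optional dependencies: " + ", ".join(missing)
            + ". Install them with: pip install synth-pdb[gnn]"
        )


def score_something(source):
    # The check runs only when scoring is requested, so importing the
    # enclosing module never requires torch.
    require_modules(["torch", "torch_geometric"])
    ...  # real scoring would happen here
```

With this pattern, importing the module is cheap and always succeeds; the ImportError surfaces only at the first scoring call.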
Quick Start
from synth_pdb.score import score_structure, score_batch
# Score a PDB file by path
result = score_structure("my_helix.pdb")
print(f"Global quality: {result.global_score:.3f} ({result.label})")
# Global quality: 0.999 (High Quality)
# Inspect per-residue pLDDT confidence
for i, (score, label) in enumerate(zip(result.per_residue, result.residue_labels)):
    print(f" Residue {i+1:3d}: {score:.3f} [{label}]")
# Residue 1: 0.958 [Very High]
# Residue 2: 0.960 [Very High]
# ...
# Score a batch efficiently (model loaded once)
results = score_batch(["helix.pdb", "strand.pdb", "decoy.pdb"])
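# Skip failed entries (label "Error", global_score NaN), which would make max() unreliable
best = max((r for r in results if r.label != "Error"), key=lambda r: r.global_score)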
print(f"Best structure: global_score={best.global_score:.3f}")
score_structure()
def score_structure(
    source: str | os.PathLike,
    *,
    model_path: str | None = None,
) -> QualityScore
Score a single protein structure and return a rich QualityScore object.
Parameters
| Parameter | Type | Description |
|---|---|---|
| source | str or path-like | A file path ending in .pdb, or a raw PDB-format string. The two are told apart by whether the string starts with a PDB record keyword (ATOM, REMARK, HEADER, MODEL); if it does, it is treated as PDB content. |
| model_path | str, optional | Path to a custom .pt checkpoint. Defaults to the bundled gnn_quality_v2.pt (with per-residue head). Falls back to gnn_quality_v1.pt if v2 is unavailable. |
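The keyword-based distinction between PDB content and file paths can be sketched as follows. This is an assumption-laden illustration, not the library's implementation: `looks_like_pdb_content` and `PDB_RECORD_KEYWORDS` are hypothetical names, and the check is a plain prefix test with no whitespace stripping.

```python
# Record keywords listed in the parameter description above.
PDB_RECORD_KEYWORDS = ("ATOM", "REMARK", "HEADER", "MODEL")


def looks_like_pdb_content(source: str) -> bool:
    """Return True if the string begins with one of the PDB record keywords."""
    # str.startswith accepts a tuple and matches any of its entries.
    return source.startswith(PDB_RECORD_KEYWORDS)


print(looks_like_pdb_content("HEADER    HYDROLASE"))  # True
print(looks_like_pdb_content("my_helix.pdb"))         # False
```

Under this simple sketch, a relative path whose name begins with a keyword (for example MODEL_1.pdb) would be classified as PDB content, so prefixing such paths with ./ or using an absolute path sidesteps the ambiguity.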
Returns
A QualityScore dataclass.
Raises
| Exception | Condition |
|---|---|
| FileNotFoundError | source looks like a file path but the file does not exist. |
| ImportError | torch or torch_geometric is not installed. |
| ValueError | The PDB contains fewer than 2 residues with Cα atoms. |
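The three documented exceptions can be handled separately. The sketch below shows one way to do so; `try_score` is a hypothetical wrapper, and the scoring function is passed in as a parameter so the example runs without synth_pdb installed (in practice you would pass score_structure).

```python
def try_score(scorer, source):
    """Call scorer(source) and return (result, error_message).

    `scorer` is expected to behave like score_structure(); exactly one of
    the two returned values is None.
    """
    try:
        return scorer(source), None
    except FileNotFoundError:
        return None, f"file not found: {source}"
    except ImportError:
        return None, "GNN dependencies missing; try: pip install synth-pdb[gnn]"
    except ValueError as exc:
        return None, f"structure could not be scored: {exc}"
```

Returning an error message instead of re-raising keeps a long-running pipeline alive; whether that is appropriate depends on the caller.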
Examples
from synth_pdb.score import score_structure
# From a file path
result = score_structure("/data/structures/ubiquitin.pdb")
# From an inline PDB string
with open("ubiquitin.pdb") as fh:
    pdb_string = fh.read()
result = score_structure(pdb_string)
# Using a custom checkpoint
result = score_structure("ubiquitin.pdb", model_path="my_retrained_gnn.pt")
score_batch()
def score_batch(
    sources: list[str | os.PathLike],
    *,
    model_path: str | None = None,
) -> list[QualityScore]
Score a list of structures efficiently. The GNN model is loaded once and
reused for all structures, which avoids the repeated model loading incurred by
calling score_structure() in a loop over a large collection.
If any individual structure fails (e.g. unparseable PDB, too few residues),
a sentinel QualityScore with global_score=NaN and label="Error" is
inserted at the corresponding index, so the output list always has the same
length as the input list.
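Because failed entries carry a NaN score, they need to be separated out before sorting or averaging. A minimal sketch, assuming only the two fields named above: `FakeScore` is a stand-in for QualityScore so the example runs on its own, and `split_batch` is a hypothetical helper.

```python
import math
from dataclasses import dataclass


@dataclass
class FakeScore:
    """Stand-in exposing only the two QualityScore fields used below."""
    global_score: float
    label: str


def split_batch(sources, results):
    """Pair each source with its result and separate failures from successes."""
    ok, failed = [], []
    for src, res in zip(sources, results):
        # NaN never compares equal to itself, so test with math.isnan, not ==.
        if res.label == "Error" or math.isnan(res.global_score):
            failed.append(src)
        else:
            ok.append((src, res))
    return ok, failed


results = [FakeScore(0.91, "High Quality"), FakeScore(float("nan"), "Error")]
ok, failed = split_batch(["helix.pdb", "broken.pdb"], results)
print(failed)  # ['broken.pdb']
```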
Parameters
| Parameter | Type | Description |
|---|---|---|
| sources | list[str \| PathLike] | Mixed list of file paths and/or PDB strings. |
| model_path | str, optional | Custom checkpoint path. |
Returns
list[QualityScore] — one result per input, in the same order.
Example
import glob
from synth_pdb.score import score_batch
pdb_files = sorted(glob.glob("alphafold_predictions/*.pdb"))
results = score_batch(pdb_files)
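# Rank by global quality score, skipping failed entries (label "Error", score NaN)
scored = [(p, r) for p, r in zip(pdb_files, results) if r.label != "Error"]
ranked = sorted(scored, key=lambda x: x[1].global_score, reverse=True)
for path, r in ranked[:5]:
    print(f"{path:50s} {r.global_score:.4f} {r.label}")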
QualityScore
@dataclass
class QualityScore:
    global_score: float
    label: str
    per_residue: list[float]
    residue_labels: list[str]
    features: dict[str, float]
    n_residues: int
Returned by score_structure(), score_batch(), and GNNQualityClassifier.score().
Fields
| Field | Type | Description |
|---|---|---|
| global_score | float ∈ [0, 1] | P(Good): probability that the structure is biophysically plausible. Values > 0.5 are classified as High Quality. NaN for failed entries returned by score_batch(). |
| label | str | "High Quality" or "Low Quality"; "Error" for failed entries returned by score_batch(). |
| per_residue | list[float] | Per-residue pLDDT-like confidence ∈ [0, 1]. Length equals n_residues. Analogous to AlphaFold's per-residue pLDDT. |
| residue_labels | list[str] | Human-readable confidence band for each residue. |
| features | dict[str, float] | Mean per-feature summary of the GNN input graph (useful for debugging). Keys: sin_phi, cos_phi, sin_psi, cos_psi, b_factor_norm, seq_position, is_n_terminus, is_c_terminus. |
| n_residues | int | Number of residues with Cα atoms in the PDB. |
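The relationships stated in the table (score range, label threshold, list lengths) can be checked programmatically. The sketch below is illustrative only: `ScoreLike` is a stand-in for QualityScore so the example runs without synth_pdb, `check_invariants` is a hypothetical helper, and the assumption that residue_labels has one entry per residue is inferred from its description.

```python
from dataclasses import dataclass, field


@dataclass
class ScoreLike:
    """Stand-in mirroring the documented QualityScore fields."""
    global_score: float
    label: str
    per_residue: list
    residue_labels: list
    n_residues: int
    features: dict = field(default_factory=dict)


def check_invariants(score):
    """Return a list of violated invariants (empty if all hold)."""
    problems = []
    if not 0.0 <= score.global_score <= 1.0:
        problems.append("global_score outside [0, 1]")
    expected = "High Quality" if score.global_score > 0.5 else "Low Quality"
    if score.label != expected:
        problems.append(f"label {score.label!r} does not match score")
    if len(score.per_residue) != score.n_residues:
        problems.append("per_residue length differs from n_residues")
    if len(score.residue_labels) != score.n_residues:
        problems.append("residue_labels length differs from n_residues")
    return problems
```

A NaN score fails the range check and an "Error" label fails the label check, so a failed score_batch() entry is reported as a violation rather than passing silently.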
pLDDT Confidence Bands
| Label | Score range | Interpretation (AlphaFold equivalent) |
|---|---|---|
| "Very High" | ≥ 0.90 | Backbone and side-chain likely accurate |
| "High" | 0.70–0.90 | Backbone likely accurate |
| "Uncertain" | 0.50–0.70 | Use with caution |
| "Low" | < 0.50 | Likely disordered or incorrect geometry |
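The band table can be written as a small lookup function. This is a sketch under one stated assumption: each range includes its lower bound, so 0.70 maps to "High" and 0.50 to "Uncertain". The table itself does not settle the boundary cases, and `confidence_band` is a hypothetical name.

```python
def confidence_band(score: float) -> str:
    """Map a per-residue confidence in [0, 1] to its band label.

    Assumption: lower bounds are inclusive (0.70 -> "High", 0.50 -> "Uncertain").
    """
    if score >= 0.90:
        return "Very High"
    if score >= 0.70:
        return "High"
    if score >= 0.50:
        return "Uncertain"
    return "Low"


print([confidence_band(s) for s in (0.958, 0.82, 0.61, 0.12)])
# ['Very High', 'High', 'Uncertain', 'Low']
```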
Example Usage
import numpy as np
import pandas as pd
from synth_pdb.score import score_structure

result = score_structure("my_protein.pdb")

# Find low-confidence regions
low_conf = [
    i + 1  # 1-indexed residue number
    for i, label in enumerate(result.residue_labels)
    if label in ("Uncertain", "Low")
]
print(f"Low-confidence residues: {low_conf}")

# Check mean pLDDT
mean_plddt = np.mean(result.per_residue)
print(f"Mean pLDDT: {mean_plddt:.3f}")

# Export to pandas for downstream analysis
df = pd.DataFrame({
    "residue": range(1, result.n_residues + 1),
    "plddt": result.per_residue,
    "band": result.residue_labels,
})
df.to_csv("plddt_per_residue.csv", index=False)
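A list of low-confidence residue numbers is often easier to read as contiguous segments. The helper below is a small, library-independent sketch; `group_segments` is a hypothetical name and expects residue numbers sorted in ascending order, as a list built by enumerating residue_labels would be.

```python
def group_segments(residue_numbers):
    """Collapse sorted residue numbers into inclusive (start, end) runs."""
    segments = []
    for num in residue_numbers:
        if segments and num == segments[-1][1] + 1:
            segments[-1] = (segments[-1][0], num)  # extend the current run
        else:
            segments.append((num, num))            # start a new run
    return segments


print(group_segments([3, 4, 5, 9, 10, 15]))
# [(3, 5), (9, 10), (15, 15)]
```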
GNNQualityClassifier
The lower-level class underlying score_structure(). Import it when you need
direct control over checkpoint loading or want to access the predict() method
for backward compatibility.
from synth_pdb.quality import GNNQualityClassifier
clf = GNNQualityClassifier() # auto-loads bundled weights
clf = GNNQualityClassifier(model_path="v2.pt") # explicit checkpoint
Methods
score(pdb_content: str) → QualityScore
The primary method. Equivalent to score_structure(pdb_content) but requires
a PDB string (not a file path).
predict(pdb_content: str) → (bool, float, dict)
Legacy method for backward compatibility with the ProteinQualityClassifier (RF)
API. Returns (is_good, probability, features_dict).
save(path: str) → None
Save model weights and architecture metadata to a .pt checkpoint.
load(path: str) → None
Load a checkpoint. The architecture (node features, hidden dim, etc.) is read from the checkpoint itself — no configuration file needed.
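Since score() takes PDB text rather than a path, scoring a file through the classifier means reading it first. A minimal sketch: `score_file` is a hypothetical helper, and the classifier is passed in as a parameter so the example runs without synth_pdb installed.

```python
from pathlib import Path


def score_file(clf, path):
    """Read a PDB file and pass its text to clf.score().

    `clf` is any object with a score(pdb_content: str) method, such as a
    GNNQualityClassifier instance.
    """
    pdb_content = Path(path).read_text()
    return clf.score(pdb_content)
```

For example, `score_file(GNNQualityClassifier(), "ubiquitin.pdb")` would reuse one loaded classifier across many files.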
Retraining the Model
To retrain gnn_quality_v2.pt from scratch (e.g. after modifying the architecture
or adding training data):
python scripts/train_gnn_quality_filter.py \
--n-samples 200 \
--epochs 50 \
--output synth_pdb/quality/models/gnn_quality_v2.pt
The training script generates 200 synthetic structures across four classes (Good / Random / Distorted / Clashing) and trains with a joint objective:
$$\mathcal{L} = \mathcal{L}_{\text{NLL}} + \lambda \cdot \mathcal{L}_{\text{MSE}}$$
where $\mathcal{L}_{\text{NLL}}$ is the global binary classification loss, $\mathcal{L}_{\text{MSE}}$ is the per-residue Ramachandran Z-score regression loss, and $\lambda = 0.3$ by default.
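The joint objective can be worked through numerically for a single structure. The sketch below uses plain Python rather than the training script's code; `joint_loss` is a hypothetical function, the per-residue targets are made-up values, and averaging the squared errors over residues is an assumption about how the MSE term is reduced.

```python
import math


def joint_loss(p_good, is_good, predicted, target, lam=0.3):
    """Return L_NLL + lam * L_MSE for a single structure.

    p_good:    predicted P(Good) in (0, 1)
    is_good:   True if the structure belongs to the Good class
    predicted: per-residue predictions
    target:    per-residue regression targets (hypothetical values here)
    """
    eps = 1e-12  # guards against log(0)
    p_true = p_good if is_good else 1.0 - p_good
    nll = -math.log(max(p_true, eps))
    mse = sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)
    return nll + lam * mse


loss = joint_loss(0.5, True, predicted=[0.0, 1.0], target=[0.0, 0.0])
print(f"{loss:.4f}")  # 0.8431  (ln 2 + 0.3 * 0.5)
```

With the default weight of 0.3, the regression term contributes 0.15 here against a classification term of about 0.69, so the global label dominates the gradient for this toy input.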
Full API Reference
::: synth_pdb.score
    handler: python
    options:
      members:
        - score_structure
        - score_batch
        - QualityScore

::: synth_pdb.quality.gnn.gnn_classifier
    handler: python
    options:
      members:
        - GNNQualityClassifier
        - QualityScore