Launching Soon
Project 02 · Structural Bioinformatics · Deep Learning

Ligand Binding Site Predictor

A deep learning model that predicts potential ligand binding pockets on protein 3D structures — aiming to accelerate early-stage drug discovery by identifying interaction sites automatically.

Python · PyTorch · BioPython · PyMOL · AlphaFold

Motivation

Identifying where a small molecule (ligand) binds on a protein is one of the first and most critical steps in drug development. Traditionally this relies on expensive wet-lab experiments or computationally heavy docking simulations. This project explores whether a deep learning approach can predict binding pockets directly from protein structure: quickly, cheaply, and at scale.

The intersection of structural biology and machine learning is exactly the kind of problem I find most exciting: it requires understanding the biology deeply enough to engineer the right features, then applying the right model architecture to extract patterns that aren't visible to the naked eye.

Approach

  • Parse PDB/mmCIF protein structure files with BioPython
  • Represent protein surface as a 3D point cloud or voxel grid
  • Train a neural network to classify residues as pocket / non-pocket
  • Evaluate against known binding sites from the sc-PDB database
  • Visualise predicted pockets in PyMOL
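Two of the steps above, the voxel-grid representation and the pocket / non-pocket labels, can be sketched in a few lines. This is a minimal illustration rather than the project's pipeline: it assumes atom coordinates have already been extracted from a structure file, and the helper names (`voxelize`, `label_pocket_residues`), the 2.0 Å voxel size, and the 4.0 Å ligand-distance cutoff are all hypothetical choices made for the example.

```python
import math
from collections import defaultdict


def voxelize(coords, voxel_size=2.0):
    """Count atoms per voxel; voxel_size (in Å) is an assumed value."""
    if not coords:
        return {}
    # Place the grid origin at the minimum corner of the bounding box.
    origin = tuple(min(c[i] for c in coords) for i in range(3))
    grid = defaultdict(int)
    for point in coords:
        index = tuple(
            int(math.floor((point[i] - origin[i]) / voxel_size)) for i in range(3)
        )
        grid[index] += 1
    return dict(grid)


def label_pocket_residues(residue_atoms, ligand_coords, cutoff=4.0):
    """Label a residue 1 if any of its atoms is within `cutoff` Å of a ligand atom.

    The distance-cutoff rule and its value are assumptions for this sketch.
    """
    labels = {}
    for residue_id, atoms in residue_atoms.items():
        near = any(
            math.dist(atom, lig) <= cutoff for atom in atoms for lig in ligand_coords
        )
        labels[residue_id] = 1 if near else 0
    return labels


# Toy coordinates (hypothetical, not taken from a real structure).
residue_atoms = {
    "A:10": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    "A:11": [(3.0, 0.0, 0.0)],
    "A:50": [(20.0, 20.0, 20.0)],
}
ligand_coords = [(3.0, 3.0, 0.0)]

all_atoms = [atom for atoms in residue_atoms.values() for atom in atoms]
grid = voxelize(all_atoms, voxel_size=2.0)
labels = label_pocket_residues(residue_atoms, ligand_coords, cutoff=4.0)
```

In a real run the coordinates would come from the parsed PDB/mmCIF file, and the occupancy counts would typically be replaced by per-voxel feature channels before being passed to a 3D CNN.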

Tech Stack

  • Language: Python 3.10+
  • Deep Learning: PyTorch (3D CNN / GNN)
  • Structure Parsing: BioPython, MDAnalysis
  • Structures: PDB, AlphaFold2 predictions
  • Visualisation: PyMOL, Matplotlib

Scientific Background

Proteins are large, folded molecules whose surfaces contain cavities and grooves. Some of these are functional binding sites where small molecules (drugs, cofactors, substrates) can bind and modulate the protein's activity. The geometry and chemical environment of these pockets determine binding specificity.

This model takes a 3D protein structure and outputs a probability score for each surface residue indicating how likely it is to belong to a binding pocket. The features fed to the network include local geometry (curvature, solvent accessibility), amino acid physicochemical properties, and neighbourhood context encoded through graph convolutions.
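To make the idea of neighbourhood context concrete, the sketch below builds a residue graph from centroid distances, runs one round of mean aggregation over neighbours, and turns the result into a score between 0 and 1. It is written in plain Python so it runs on its own; it is not the project's network. The 8.0 Å radius, the three toy features, and the hand-picked weights are assumptions, and a trained graph convolution would use learned weight matrices and non-linearities instead.

```python
import math


def build_neighbour_graph(centroids, radius=8.0):
    """Connect residues whose centroids are within `radius` Å (assumed cutoff)."""
    ids = list(centroids)
    graph = {i: [] for i in ids}
    for a in range(len(ids)):
        for b in range(a + 1, len(ids)):
            if math.dist(centroids[ids[a]], centroids[ids[b]]) <= radius:
                graph[ids[a]].append(ids[b])
                graph[ids[b]].append(ids[a])
    return graph


def aggregate(features, graph):
    """One message-passing round: average a residue's features with its neighbours'."""
    out = {}
    for node, neighbours in graph.items():
        group = [features[node]] + [features[n] for n in neighbours]
        out[node] = [sum(column) / len(group) for column in zip(*group)]
    return out


def pocket_probability(feature_vector, weights, bias=0.0):
    """Logistic score from a linear combination of features."""
    z = sum(w * x for w, x in zip(weights, feature_vector)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Toy inputs (hypothetical): [curvature, solvent accessibility, hydrophobicity].
centroids = {"r1": (0.0, 0.0, 0.0), "r2": (5.0, 0.0, 0.0), "r3": (30.0, 0.0, 0.0)}
features = {
    "r1": [1.0, 0.2, 0.5],
    "r2": [0.0, 0.4, 0.5],
    "r3": [0.5, 0.9, 0.1],
}
weights = [1.0, -1.0, 0.5]  # illustrative values, not trained parameters

graph = build_neighbour_graph(centroids)
smoothed = aggregate(features, graph)
scores = {r: pocket_probability(vec, weights) for r, vec in smoothed.items()}
```

Here `r1` and `r2` are neighbours, so their aggregated features become identical, while the isolated `r3` keeps its own values. Stacking several such rounds is what lets each residue's score depend on a wider spatial neighbourhood.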

Status & Next Steps

The project is currently in active development as part of my Master's research at UPF-UB. The data pipeline is complete and baseline models are being trained. Upcoming work includes:

  • Benchmarking against established tools (fpocket, SiteMap)
  • Exploring graph neural network architectures for better spatial reasoning
  • Testing on AlphaFold2-predicted structures to assess generalisation
  • Building a small web demo for interactive pocket visualisation
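Benchmarking different tools needs a shared metric. One simple option, sketched here as an assumption rather than the project's chosen protocol, is to compare each tool's predicted pocket residues with the reference binding-site residues using precision, recall, and intersection over union. The function name and the residue identifiers are hypothetical, and how each tool's output is converted into a residue set is left open.

```python
def residue_overlap_metrics(predicted, reference):
    """Compare two collections of residue identifiers."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    union = len(predicted | reference)
    return {
        # Fraction of predicted residues that are in the reference site.
        "precision": true_positives / len(predicted) if predicted else 0.0,
        # Fraction of reference residues that were recovered.
        "recall": true_positives / len(reference) if reference else 0.0,
        # Overlap relative to the combined set of residues.
        "iou": true_positives / union if union else 0.0,
    }


# Toy example with made-up residue identifiers.
metrics = residue_overlap_metrics(
    predicted={"A:10", "A:11", "A:12", "A:40"},
    reference={"A:10", "A:11", "A:13"},
)
```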