About the Centralized Variant Effect Prediction Tool

This platform is a centralized hub for variant effect prediction: a single place to assess the potential impact of genetic mutations on protein function. It compiles data on human genomic variants and integrates prediction scores from multiple tools so that they can be queried together.

Data Source and Coverage

The primary variant database was provided by the University of Mozambique. It covers all genomic positions with possible variants, and each entry carries the following annotations:

  • Genomic coordinates (chromosome and position)
  • Reference and alternative alleles
  • Amino acid substitutions (when applicable)
  • Gene and protein identifiers (Ensembl, UniProt)
  • Prediction scores from multiple tools
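
As an illustration of how one annotated entry could be represented in code, here is a minimal sketch. The class name, field names, and the 1-based coordinate convention are assumptions made for this example and are not taken from the platform's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class VariantRecord:
    """One annotated variant. All field names are hypothetical."""
    chromosome: str
    position: int                          # assumed to be a 1-based coordinate
    ref: str                               # reference allele
    alt: str                               # alternative allele
    aa_change: Optional[str] = None        # None when no substitution applies
    ensembl_id: Optional[str] = None
    uniprot_id: Optional[str] = None
    scores: Dict[str, float] = field(default_factory=dict)  # tool name -> score

    def key(self) -> str:
        """Build a compact identifier such as '17:1000:G>A'."""
        return f"{self.chromosome}:{self.position}:{self.ref}>{self.alt}"
```

The coordinates and alleles passed to such a record in any example are placeholders, not real variants.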

Speed Optimization

Because the full genome dataset is very large, the platform has been optimized for speed. By default, only variants related to three key proteins (collagen, keratin, and elastin) are preloaded. Queries on these proteins are served from the preloaded data and return results almost instantly.
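
The choice between the preloaded data and the full database can be sketched as a simple routing check. This is only an illustration: the function name, the "fast" and "extended" labels, and matching on a plain protein name are assumptions; the real platform may well match on gene or protein identifiers instead.

```python
# The three default proteins named above; matching by name is an assumption.
PRELOADED_PROTEINS = {"collagen", "keratin", "elastin"}


def route_query(protein: str) -> str:
    """Return 'fast' for preloaded proteins and 'extended' for everything else."""
    normalized = protein.strip().lower()
    return "fast" if normalized in PRELOADED_PROTEINS else "extended"
```

A query routed as "extended" would then go to the full genome database rather than the preloaded subset.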

Extended Genome Searches

Users can also search for variants outside the three default proteins. These extended searches query the full genome database and may take longer to complete, so they are handled asynchronously: the system runs the analysis in the background and emails the user once the results are ready.
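
One possible shape for this background handling is a job queue drained by a worker thread. The sketch below is not the platform's implementation: `start_worker`, `search`, and `notify` are hypothetical names, and `notify` stands in for whatever email mechanism is actually used.

```python
import queue
import threading
from typing import Callable, List


def start_worker(
    jobs: "queue.Queue",
    search: Callable[[str], List[str]],
    notify: Callable[[str, str], None],
) -> threading.Thread:
    """Start a background thread that runs queued searches and notifies users.

    Each job is an (email, query) tuple; putting None on the queue stops the worker.
    """

    def run() -> None:
        while True:
            job = jobs.get()
            if job is None:          # sentinel: shut the worker down
                jobs.task_done()
                break
            email, query = job
            try:
                results = search(query)
                notify(email, f"Search {query!r} finished with {len(results)} result(s).")
            except Exception as exc:  # report the failure instead of dying silently
                notify(email, f"Search {query!r} failed: {exc}")
            finally:
                jobs.task_done()

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker
```

The request handler would only call `jobs.put((email, query))` and return immediately, which is what keeps the interface responsive while the full-database search runs.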

Integrated Prediction Tools

For each variant, the platform provides prediction scores from established algorithms that estimate its functional impact. Current tools include:

  • SIFT
  • PolyPhen-2
  • AlphaMissense
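
The three scores use different scales and directions, so they are usually read against tool-specific cutoffs. The sketch below uses commonly cited cutoffs (SIFT at 0.05, the PolyPhen-2 HumVar cutoffs as reported with dbNSFP, and the AlphaMissense class boundaries); whether the platform applies these exact values is an assumption, which is why they are exposed as parameters. The function names are hypothetical.

```python
def classify_sift(score: float, cutoff: float = 0.05) -> str:
    """SIFT: lower scores mean the substitution is less likely to be tolerated."""
    return "deleterious" if score <= cutoff else "tolerated"


def classify_polyphen2(score: float, possibly: float = 0.447, probably: float = 0.909) -> str:
    """PolyPhen-2: higher scores mean more likely damaging (HumVar cutoffs assumed)."""
    if score >= probably:
        return "probably damaging"
    if score >= possibly:
        return "possibly damaging"
    return "benign"


def classify_alphamissense(score: float, benign_below: float = 0.34,
                           pathogenic_above: float = 0.564) -> str:
    """AlphaMissense: higher scores mean more likely pathogenic."""
    if score < benign_below:
        return "likely benign"
    if score > pathogenic_above:
        return "likely pathogenic"
    return "ambiguous"
```

Note that SIFT runs in the opposite direction from the other two tools, so the raw numbers should not be compared or averaged directly.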

Technical Architecture

The application backend uses structured tables to manage users, predictions, and variant results:

  • USERS: stores user credentials and session data
  • Prediction: links methods and scores to user queries
  • dbNSFP_Results: main variant data table with annotations and prediction outputs

All searches are recorded and linked to the user's account, ensuring traceability and personalized access to prior results.
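
To make the relationship between the three tables concrete, here is a minimal sketch using SQLite. Only the table names come from the description above; every column, the foreign keys, and the `history_for` helper are assumptions for illustration, and the real backend may use a different database engine and layout.

```python
import sqlite3

# Table names follow the text; all columns are hypothetical.
SCHEMA = """
CREATE TABLE USERS (
    id            INTEGER PRIMARY KEY,
    email         TEXT NOT NULL UNIQUE,
    password_hash TEXT NOT NULL
);
CREATE TABLE dbNSFP_Results (
    id         INTEGER PRIMARY KEY,
    chromosome TEXT NOT NULL,
    position   INTEGER NOT NULL,
    ref        TEXT NOT NULL,
    alt        TEXT NOT NULL
);
CREATE TABLE Prediction (
    id         INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES USERS(id),
    variant_id INTEGER NOT NULL REFERENCES dbNSFP_Results(id),
    method     TEXT NOT NULL,
    score      REAL
);
"""


def history_for(conn: sqlite3.Connection, email: str) -> list:
    """Return the variants and scores previously queried by one user, oldest first."""
    rows = conn.execute(
        """
        SELECT v.chromosome, v.position, v.ref, v.alt, p.method, p.score
        FROM Prediction AS p
        JOIN USERS AS u ON u.id = p.user_id
        JOIN dbNSFP_Results AS v ON v.id = p.variant_id
        WHERE u.email = ?
        ORDER BY p.id
        """,
        (email,),
    )
    return rows.fetchall()
```

Under this layout, a user's search history is recovered by joining Prediction to the other two tables, which is one way the traceability described above could be implemented.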