Installation¶
Required Software¶
Pypka depends on the following software:
Python3.6
Python2
gawk
For the moment DelPhi depends on the following software:
gcc
gfortran
libgfortran4
These can all be installed via apt in Debian based systems:
apt install gawk gcc gfortran libgfortran4 python2
Cross Platform¶
PypKa has been developed for Linux users and it does not support Windows or MacOS. However, we do provide a docker image for these users.
docker container run -it -v ${PWD}:/pypka pedrishi/pypka:latest
Basic Example¶
API¶
A simple example of how to use Pypka as an API is provided below. In this snippet we are estimating the pKa values for all titrable sites in the 4lzt.pdb structure downloaded from the Protein Data Bank.
from pypka.pypka import Titration
params = {
'structure' : '4lzt.pdb',
'ncpus' : -1,
'epsin' : 15,
'ionicstr' : 0.1,
'pbc_dimensions': 0
#Set the ffinput when using PDB files from simulations
#'ffinput': 'GROMOS' # options: GROMOS, AMBER, CHARMM
}
tit = Titration(params)
pH = 7.0
for site in tit:
state = site.getProtState(pH)[0]
print(site.res_name, site.res_number, site.pK, state)
You may also try it out on a online notebook.
CLI¶
The same calculation can be done via CLI by resorting to a input parameter file.
structure = 4lzt.pdb
epsin = 15
ionicstr = 0.1
pbc_dimensions = 0
ncpus = -1
sites = all
#Set the ffinput when using PDB files from simulations
#Options: GROMOS, AMBER, CHARMM
To execute pypka simply type one of the two:
pypka parameters.dat
python3 -m pypka parameters.dat
Mandatory Parameters¶
-
structure
- Type
str
the PDB filename
-
epsin
- Type
float
internal dielectric to be used in the PB calculations.
-
ionicstr
- Type
float
ionic strength of the medium
-
pbc_dimensions
- Type
int
number of dimensions with periodic boundaries. 0 for solvated proteins and 2 for lipidic systems
-
ncpus
- Type
int
number of CPUs to use in the calculations (-1 to use all available)
Main Features¶
Easy to Install¶
As seen in the installation page, PypKa is easilly installed from the command line.
Easy to Use¶
The basic example provided can be effortlessly adjusted to your system. Although many parameters can be modified by an advanced user, simple users can confidently use the default values.
API & CLI¶
Every interface has its pros and cons. With PypKa you can choose to run you calculations via a python API or from the command-line. In the future we will present a webserver interface as well.
Fast and accurate¶
As a Poisson-Boltzmann-based pKa predictor, PypKa provides an
interesting trade-off between speed and accuracy. You can also
manually set some input parameters such as convergence
,
gsize
or sites
, to adjust the balance between
accuracy and speed as you please. Further speed gains can be achieved
by taking advantage of the highly scalable multiprocessing
capabilities (ncpus
).
Flexibility of Input Structures¶
PypKa supports both the Protein Data Bank and GROMACS input format and
the most popular atomistic force-fields (AMBER, CHARMM & GROMOS) as
well as experimentally determined structures. These structures are
preprocessed using a modified version of PDB2PQR according to the
defined ffinput
.
Lipidic Systems Support¶
It is possible to run calculations on membrane proteins, and in the future, on lipids. Currently only some lipids are supported (DMPC, POPC, POPE and cholesterol) but more will be added. To use this feature the user is required to name the lipids in their structures according to the PypKa definition.
Example POPC file
Example POPE file
Example DMPC file
Main Methods¶
-
getTitrableSites
(pdb)¶ Gets the all titrable sites from the pdb
- Parameters
pdb (string) – The filename a PDB file
- Returns
All titrable sites residue numbers found in the pdb file
- Return type
list
-
class
Titration
¶ Main pypka class
-
getAverageProt
(self, site, pH)¶
Calculates the average protonation of a site at a given pH value
- Parameters
site (string) – Residue number of the site with the suffix ‘N’ or ‘C’ to indicate a N-ter or C-ter, respectively.
pH (float) – pH value
- Returns
Average protonation of the site at the selected pH value
- Return type
float
-
getProtStategetProtState
(self, site, pH)¶
Indicates the most probable protonation state of a site at a given pH value
- Parameters
site (str) – Residue number of the site with the suffix ‘N’ or ‘C’ to indicate a N-ter or C-ter, respectively.
pH (float) – pH value
- Returns
A tuple with two elements. Example: (‘protonated’, 0.9)
The first element is a string indicating the most probable state of the site. It can be either ‘protonated’, ‘deprotonated’ or ‘undefined’. The ‘undefined’ state is prompted if the average protonation state is between 0.1 and 0.9.
The second element is a float of the average protonation of the site
- Return type
tuple
-
getParameters
()¶ Get the parameters used in the calculations
- Returns
Current state of DelPhi parameters
- Return type
string
-
Input Parameters¶
General Parameters¶
-
structure (str)
- Optional
No
The input filename in Protein Data Bank or GROMACS formats
-
ncpus (int)
- Optional
No
Number of CPUs to use in the calculations
-
ionicstr (float)
- Optional
No
Ionic strength of the medium
-
ffinput='GROMOS' (str)
- Allowed values
(‘GROMOS’, ‘AMBER’, ‘CHARMM’)
Forcefield of the input
structure
. GROMOS is also valid for experimentally obtained strucutures
-
ffID='G54a7' (str)
- Allowed values
(‘G54A7’)
For the time being, only GROMOS54a7 based charged and radii are allowed. In the future more forcefields will be added.
-
sites='all' (str)
CLI Syntax: sites_A = all sites_A = 1N, 18, 35, 48, 66, 129C
API Syntax: sites = ‘all’ sites = {‘A’: (‘1N’, ‘18’, ‘35’, ‘48’, ‘66’, ‘129C’)} Titration(parameters, sites=sites)
-
clean_pdb=True (boolean)
The input
structure
files are preprocessed with PDB2PQR. To skip this step setclean_pdb
to False.CLI Syntax: yes or no
Poisson-Boltzmann Parameters¶
-
epsin (float)
- Optional
No
Internal dielectric to be used in the PB calculations.
-
pbc_dimensions (int)
- Optional
No
- Allowed values
(0, 2)
Number of dimensions with periodic boundaries. Use 0 for solvated proteins and 2 for lipidic systems.
-
gsize=81 (int)
DelPhi number of grid points per side of the cubic lattice.
-
epssol=80.0 (float)
Solvent dielectric to be used in the PB calculations.
-
relpar=0.75 (float)
DelPhi relaxation parameter in non-linear iteration convergence process.
-
relfac=0.75 (float)
DelPhi spectral radius.
-
nonit=0 (float)
DelPhi number of non-linear iterations.
-
nlit=500 (float)
DelPhi number of linear iterations.
-
convergence=0.01 (float)
DelPhi convergence threshold for maximum change of potential.
-
scaleM=4 (float)
DelPhi reciprocal of one grid spacing for the finer grid.
-
scaleP=1 (float)
DelPhi reciprocal of one grid spacing for the coarse grid used in the focusing step.
-
slice=0.05 (float)
Only used when
pbc_dimensions
is 2. Fraction of the exterior of the lipid bilayer to be replicated with periodic boundary conditions.
-
pbx=False (boolean)
DelPhi periodic boundary conditions for the x edge of the grid
-
pby=False (boolean)
DelPhi periodic boundary conditions for the y edge of the grid
-
bndcon=4 (int)
- Allowed values
(1, 2, 3, 4)
DelPhi type of boundary condition.
Zero
Dipolar
Focusing
Coulombic
-
precision='single' (str)
- Allowed values
(‘single’, ‘double’)
Precision of the compiled DelPhi version being used.
Monte Carlo Parameters¶
-
seed (int)
Seed for the MC random generator
-
pH='0,14' (str)
pH value of the calculation. It can be a single value or a range.
CLI syntax:
pH = 0, 14
API syntax:
‘pH’: ‘0, 14’
-
pHstep=0.25 (float)
In case a pH range was provided
pH
, the pH step of said range ispHstep
FF Extension¶
Adding extra residues¶
PypKa supports canonical amino acids and DNA bases, as well as some lipid molecules and ions. However, for many studies these are not sufficient and one might need to manually add these extra residues/molecules. PypKa supports out-of-the-box GROMOS54a7 and CHARMM36m FFs, but this recipe is transferable to other FFs as well.
For demonstration purposes, we will add an FMOC residue to the CHARMM36m FF for which the parameters have been kindly supplied by Alexander van Teijlingen. DataBaseT.crg and DataBaseT.siz are the main files where the charges and radii of atoms need to be added. The new DataBaseT files will be generated by extend_db.py (tmp.crg and tmp.siz) provided one follows the steps describe below. If you believe your residue could benefit other users please create a pull request on GitHub or send the new parameters to the developers directly.
Add residue block to aa.rtp file¶
The lines should adhere to GROMACS .rtp file logic. Please use unique atom types to describe your residues in order to avoid clashes with previously declared atoms. Adding the suffix _RESIDUE to the intended atom type should be enough.
# EXTRA RESIDUES
[ FMO ] ; - Alexander van Teijlingen 07-12-2020
[ atoms ]
C1 C1_FMO -0.11 0
H1 H1_FMO 0.135 1
C2 C2_FMO -0.11 2
H2 H2_FMO 0.135 3
Add atom types to ffnonbonded.itp¶
; EXTRA RESIDUES
; FMOC - Alexander van Teijlingen 07-12-2020
H1_FMO 1 1.0080 0.000 A 0.242003727796 0.12552
H2_FMO 1 1.0080 0.000 A 0.242003727796 0.12552
H3_FMO 1 1.0080 0.000 A 0.242003727796 0.12552
H4_FMO 1 1.0080 0.000 A 0.242003727796 0.12552
H7_FMO 1 1.0080 0.000 A 0.242003727796 0.12552
Add placeholder info on DataBaseT_old.crg and DataBaseT_old.siz¶
The X.XXX placeholder marks the atom as a new addition.
H1 FMO X.XXX
H2 FMO X.XXX
H3 FMO X.XXX
H4 FMO X.XXX
H7 FMO X.XXX
H8 FMO X.XXX
Run extend_db¶
python3 extend_db.py
New tmp.crg and tmp.siz have been generated. Please inspect them to check the new residue has been added at the bottom correctly. After the visual inspection, rename them DataBaseT.crg and DataBaseT.siz, respectively.
mv tmp.crg DataBaseT.crg
mv tmp.siz DataBaseT.siz
Contributing¶
Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.
You can contribute in many ways:
Types of Contributions¶
Report Bugs¶
Report bugs at https://github.com/mms-fcul/pypka/issues.
If you are reporting a bug, please include:
Your operating system name and version.
PypKa’s version and parameters used for the calculation
Origin of the input structure: Protein Data Bank or simulation (please specify the FF)
Fix Bugs¶
Look through the GitHub issues for bugs. Anything tagged with “bug” and “help wanted” is open to whoever wants to implement it.
Implement Features¶
Look through the GitHub issues for features. Anything tagged with “enhancement” and “help wanted” is open to whoever wants to implement it.
Write Documentation¶
PypKa could always use more documentation, whether as part of the official PypKa docs, in docstrings, or even on the web in blog posts, articles, and such.
Submit Feedback¶
The best way to send feedback is to file an issue at https://github.com/mms-fcul/pypka/issues.
If you are proposing a feature:
Explain in detail how it would work.
Keep the scope as narrow as possible, to make it easier to implement.
Get Started!¶
Ready to contribute? Here’s how to set up pypka for local development.
Fork the pypka repo on GitHub.
Clone your fork locally:
$ git clone git@github.com:your_name_here/pypka.git
Install your local copy into a virtualenv. Assuming you have pipenv installed, this is how you set up your fork for local development:
$ cd pypka/ $ pipenv install pypka
Create a branch for local development:
$ git checkout -b name-of-your-bugfix-or-feature
Now you can make your changes locally.
When you’re done making changes, check that your changes pass the tests:
$ make tests
Commit your changes and push your branch to GitHub:
$ git add . $ git commit -m "Your detailed description of your changes." $ git push origin name-of-your-bugfix-or-feature
Submit a pull request through the GitHub website.
Pull Request Guidelines¶
Before you submit a pull request, check that it meets these guidelines:
The pull request should include tests.
If the pull request adds functionality, the docs should be updated. Put your new functionality into a function with a docstring, and add the feature to the list in README.rst.
The pull request should work for Python 3.5, 3.6, 3.7 and 3.8, and for PyPy.
Deploying¶
A reminder for the maintainers on how to deploy. Make sure all your changes are committed. Then run:
$ bump2version patch # possible: major / minor / patch
$ git push
$ git push --tags
Future Work¶
We have built PypKa with a clear focus on three things:
Usability
Speed
Accuracy
Naturaly our development path also reflects these concerns.
Greater Usability¶
Provide a Webserver Interface and platform
More in-deepth user documentation
Improve code-level documentation
Create a series of tutorials
Implement a proper logging system to give the users more info
Expand our test suite
Faster Calculations¶
Allow the titration of a single site and all titrable residues within a cutoff radius
Implement an asynchronous algorithm to manage the Monte Carlo runs with a dynamic step and automatic pH range
Porting the code to python3
Accuracy Improvement¶
Support structures with multiple chainsImprove preprocessing guess of tautomer position guess
Extensive benchmark and further optimization of parametersSupport for CHARMM, AMBER and PARSE based charges and radii
Support more lipids
Support Nucleic Acids
API Reference¶
Titration¶
-
class
pypka.
Titration
(parameters, sites='all', debug=False, run='all')[source]¶ Main PypKa class
- Attributes:
params (Config): method parameters molecules (dict): titrating molecules ordered by chain
Molecule¶
-
class
molecule.
Molecule
(chain, sites)[source]¶ Molecule with more than one titrable sites
-
getArrayPosition
(atom_id)[source]¶ Returns the index of the atom in DelPhi position array
Disclamer: used only in testing
-
getAtomsList
()[source]¶ Returns a list of atoms details
Atoms details = (id, instance, atom index in DelPhi data structures) sorted by atom id number
-
getTautNAtoms
(taut_name)[source]¶ Return number of atoms of the tautomer named taut_name
Disclamer: it was used for testing
-
getTautomerInstance
(tautname, site_resnum)[source]¶ Return the tautomer instance named tautname
Tautomer instance must exist in the site of the residue number site_resnum
-
iterAtoms
()[source]¶ Generator that iterates through all atoms details (name, id, position) in the Site.
-
Site¶
-
class
titsite.
Titsite
(res_number, molecule)[source]¶ Titrable Site with more than one Tautomer
-
addReferenceTautomer
()[source]¶ Gets last tautomer from .sites file adds one and saves it as the reference tautomer
-
calc_interaction_between
(site2)[source]¶ Calculates the pairwise interaction energies related to the two sites
- Args:
site1 (Titsite) site2 (Titsite)
- Ensures:
to_write (str): site interaction energies formatted to be written in .dat file
-
getAverageProt
(pH)[source]¶ Calculates the average protonation of a site at a given pH value
- Args:
site (str): Residue number of the site with the suffix ‘N’ or ‘C’ to indicate a N-ter or C-ter, respectively.
pH (float): pH value
- Returns:
A float of the average protonation of the site at the selected pH value
-
getProtState
(pH)[source]¶ Indicates the most probable protonation state of a site at a given pH value
- Args:
site (str): Residue number of the site with the suffix ‘N’ or ‘C’ to indicate a N-ter or C-ter, respectively.
pH (float): pH value
Returns:
A tuple with two elements. Example: (‘protonated’, 0.9)
The first element is a string indicating the most probable state of the site. It can be either ‘protonated’, ‘deprotonated’ or ‘undefined’. The ‘undefined’ state is prompted if the average protonation state is between 0.1 and 0.9.
The second element is a float of the average protonation of the site
-
Tautomer¶
-
class
tautomer.
Tautomer
(name, site, molecule)[source]¶ Tautomers share the same site and atoms
Tautomers have different charge sets for the same atoms
-
CalcPotentialTautomer
()[source]¶ Run DelPhi simulation of single site tautomer
- Ensures:
self.esolvation (float): tautomer solvation energy self.p_sitpot (list): potential on site atoms
-
CalcPotentialTitratingMolecule
()[source]¶ Run DelPhi simulation of the site tautomer within the whole molecule
- Ensures:
self.esolvation (float): tautomer solvation energy self.p_sitpot (list): potential on site atoms
-
calcInteractionWith
(tautomer2, site_atom_list)[source]¶ Calculates the interaction energy between self tautomer and tautomer2
- Args:
tautomer2 (Tautomer) site_atom_list (list): atom numbers belonging to the site
-
loadChargeSet
(res_name, ref_tautomer)[source]¶ Reads .st file related to Tautomer with residue name res_name
Stores charge set related to both the Tautomer and the Reference Tautomer
- .st file named TYRtau1.st has charge set of TY0 and reference TY2
TYRtau2.st has charge set of TY1 and reference TY2
-
Pypka¶
A python module for flexible Poisson-Boltzmann based pKa calculations.
We have implemented a flexible tool to predict Poisson-Boltzmann-based pKa values of biomolecules. This is a free and open source project that provides a simple, reusable and extensible python API and CLI for pKa calculations with a valuable trade-off between fast and accurate predictions. With PypKa one can enable pKa calculations, including optional proton tautomerism, within existing protocols by adding a few extra lines of code. PypKa supports CPU parallel computing on anisotropic (membrane) and isotropic (protein) systems and allows the user to find a balance between accuracy and speed.
PypKa is written in Python and Cython and it is integrated with the Poisson-Boltzmann solver DelPhi Fortran77 via DelPhi4Py.
If you use PypKa in your research please cite the following paper:
Reis, P. B. P. S., Vila-Viçosa, D., Rocchia, W., & Machuqueiro, M. (2020). PypKa: A Flexible Python Module for Poisson–Boltzmann-Based pKa Calculations. Journal of Chemical Information and Modeling, 60(10), 4442–4448.
@article{reis2020jcim,
author = {Reis, Pedro B. P. S. and Vila-Viçosa, Diogo and Rocchia, Walter and Machuqueiro, Miguel},
title = {PypKa: A Flexible Python Module for Poisson–Boltzmann-Based pKa Calculations},
journal = {Journal of Chemical Information and Modeling},
volume = {60},
number = {10},
pages = {4442-4448},
year = {2020},
doi = {10.1021/acs.jcim.0c00718}
}
Availability¶
PypKa can be easily installed using the pip package manager.
Source code is freely available at GitHub under the LGPL-3.0 license. The package can be installed from PyPi.
Contacts¶
Please submit a github issue to report bugs and to request new features.
Alternatively you may find the developer here . Please visit our website for more information.