MD Simulation of Ubiquitine with AMBER

A simple tutorial to start your career in Molecular Dynamics

This tutorial aims to provide a comprehensive introduction to classical molecular dynamics simulations using the Amber software package. Amber is a widely used tool for simulating biomolecular systems at the atomic level, enabling researchers to investigate a wide range of phenomena, including protein folding, ligand-protein interactions, and enzyme catalysis.
This tutorial focuses on a small protein, Ubiquitin, which will be simulated for a time of 100 ns. Ubiquitin is a well-studied protein involved in various cellular processes, making it a suitable example for learning molecular dynamics simulations using Amber 2022.




Step 1: retrieve the structure

The Protein Data Bank (PDB) is a repository of three-dimensional structures of macromolecules, such as proteins and nucleic acids. To download the ubiquitin protein structure (PDB ID: 1UBI), follow these steps:

  • Access the PDB website: open a web browser and navigate to the PDB website;
  • Search for the ubiquitin structure;
  • Download the PDB file: on the structure page, locate the "Download" section. Click on the "PDB" option to download the PDB file for the ubiquitin structure;
  • Save the PDB file: choose a location on your computer to save the PDB file. The file will typically be named 1UBI.pdb.




Step 2: cleaning the pdb file for topology preparation

The PDB file must be prepared for simulation. It is recommended to keep a copy of the downloaded PDB file from the PDB as a reference. This is because you may need to refer back to the original PDB file if you encounter any problems during the preparation or simulation process.

  • Clean the file: CONNECT records define connectivity between atoms, but AMBER utilizes its own internal bonding information. Remove them from the PDB.
  • Add all necessary TER records, as well as END at the end of the PDB, if not present: The TER records mark the end of a chain, while the END record indicates the PDB file’s termination.
  • Strip all hydrogen atoms: It is recommended that AMBER be used to add hydrogen atoms to the structure being prepared. AMBER may sometimes have issues during topology creation. Therefore, it is recommended to remove them using a text editor (Atom, Gedit) or program like Pymol or VMD.
  • Always visualize the file using VMD: If there are no strange bonds and/or incorrect visualizations, the file can be used for the next step.




Step 3: building topolgy using TLeap module of AMBER (on local server)

Once you have the prepared pdb, you can pass it to TLeap module for topology construction. TLeap is a powerful tool within the Amber molecular simulation software suite specifically designed for preparing molecular systems for simulations. It offers a comprehensive set of functionalities to manipulate and modify molecular structures. For more information, visit this link.
For the purpose of this tutorial, it is recommended to open TLeap and type one command at a time to understand what is being done. The command is:

tleap

The workflow has a setting like this:

  • Load all the necessary force field (FF): this loading ALWAYS depend on the type of molecules present in the PDB you are using. If there are only proteins, it must be load only the protein FF (protein.ff19SB). If there are proteins and RNA, FFs of proteins and RNA(RNA.OL3) must be loaded. After the loading of all necessary FF, also the FF for water (water.TIP3P) must be loaded to solvate the complex.
    All force fields available for AMBER22 are at the following link.
    To load the FF for proteins, use the command:

    source leaprc.protein.ff19SB

  • Load the PDB. The PBD must be loaded using a variable name and the loadpdb command, as follows:

  • pdb = loadpdb yourpdb.pdb

  • Check the loaded PDB file for any errors using the check command:

    check

  • Check the charge of the loaded pdb using the charge command (keep in mind the charge of the complex under examination)

    charge

  • Save the AMBER processed PDB without water using the savepdb command:

    savepdb pdb yourfile_nowat.pdb

  • Save prmtop (topology file) and rst7 (coordinates file) without water using the saveamberparms command:

    saveamberparms pdb yourfile_nowat.prmtop yourfile_nowat.rst7

  • Solvate the system. The solvateOct command is used in the link below, but for now it is preferred a cubic box. Therefore, the command shown must be used:

    solvateBox pdb TIP3PBOX padding

    The padding is a number and represents the thickness of water to be inserted around the complex. A minimum padding of 10.0 angstroms is required.

  • Neutralize the system by adding counterions. Keeping track of the system's charge, use the command:

    addIons pdb ion 0 command

    For Ion use Na+/K+ or Cl- based on the system's charge. The 0 at the end of the command indicates that the total charge must be 0, therefore neutral. TLeap will replace water molecules with the necessary ions.

  • Save the hydrated and ionized PDB using the savepdb command

  • Save the hydrated and ionized prmtop and rst7 using the saveamberparms command

The prmtop is a common file format used to store protein topology information. This information includes the types of atoms in the protein, their positions, and their connections to each other. The rst7 is a less common file format that is also used to store protein topology information. It is typically used in conjunction with the prmtop file to provide additional information about the protein structure.
The topology refers to the arrangement of atoms in a molecule. This information is important for understanding how the molecule functions. The coordinates refer to the positions of the atoms in a molecule. This information is important for visualizing the molecule and understanding its structure.
For some specifications, please refer to the link.




Step 4: Energy Minimization

In MD simulations, Energy Minimization (EM) is the process of finding a structure of a molecular system that corresponds to a local minimum on its potential energy surface (PES). This is done by iteratively adjusting the positions of the atoms in the system until the force on each atom is approximately zero. The resulting structure is often referred to as the "optimized geometry" or "minimum energy structure" of the system.
EM is a common first step in MD simulations, as it provides a starting point for more complex simulations such as molecular dynamics trajectories. It can also be used to study the properties of molecules in their equilibrium states. EM is usually done in four steps, with decreasing constraints for the first 3 minimizations. It is recommended to run the minimizations step by step. It is done "in vacuum", so there are no parameters that indicate temperature, volume or pressure.
Four separate parameter files are required (which must have the .in extension for AMBER; example: em.in). The number of steps to be performed (maxcyc), how often to save coordinates (ntwx), etc. must be specified in these files. For all the parameters to specify in the parameter files, please refer to the AMBER tutorial.

Therefore, you need to prepare:

  • em.in, in which the costraints are at 20.0 kcal/mol
  • em2.in, costraints at 10.0 kcal/mol
  • em3.in, costraints at 5.0 kcal/mol
  • em4.in, no costraints

The constraints must be added at the end of each file.in, using the following wording (after backslash / sign ):

Hold the Ubiquitin fixed 20.0 RES x y END END

In this case x and y represent the first and last residue of the ubiquitin protein, so they would technically be 1 and 76. This wording is used to specify the range of residues within which to apply the constraints.

N.B.Compared to the parameters suggested in the tutorial, insert also:

  • ut = 9.0 ! non-bonded interactions cutoff
  • ncyc = 500 ! Initially do 500 steps of steepest descent minimization followed by 1500 steps (MAXCYC - NCYC) steps of conjugate gradient minimization.
  • ntb = 1 ! pbc for constant volume
  • ntpr = 1 ! print the progress of the minimization every ntpr step
  • iwrap = 1 ! enforcment of pbc conditions
  • ntwx = 100 ! frequency of writing coordinated to trajectory file

Once the files are prepared, the pmemd command is invoked to run the minimization. Example:

pmemd -O -i em.in -p complex.prmtop -c complex.rst7 –ref complex.rst7 -o mdout.em1 -inf mdinfo.em1 -r restrt.em1 -x mdcrd.em1

where mdinfo is the file with the run specifications, while mdout is the file where you can check how the run is proceeding (you can see the time and the various parameters of the specific step you are checking). The mdcrd file indicates the trajectory file.

IMPORTANT: from the second minimization onwards, you must switch to -c and -ref no longer complex.rst7, but the restart of the previous run. So in em2 it will be -c restrt.em2 -ref restrt.em2.

EM is a computational method that aims to find the most stable geometrical conformation of a molecular system. This process is based on the minimization of an energy function, which represents the total potential energy of the system. Different constraint forces can be applied to guide the minimization towards a specific configuration.
However, it is important to note that simple energy minimization does not involve writing an mdcrd trajectory file. This is because EM focuses exclusively on the geometrical configuration of the system, neglecting information about the movement and velocity of the molecules.
Despite specifying a value of ntwx=100 in the input file (.in), which indicates the number of minimization steps to be performed, the mdcrd file will not contain any frames describing the evolution of the system over time. The mdcrd trajectory is only generated if a dynamic simulation is performed, which simulates the motion of the molecules based on the laws of molecular mechanics.
In summary, EM is a static process that determines the most stable geometrical conformation of a system, while dynamic simulations generate trajectories that describe the motion of molecules over time. To obtain an mdcrd file containing the system trajectory, it is necessary to perform a dynamic simulation and not just an energy minimization.


At this point you'll probably reseamble one of these moods. Choose yours and go on!




Step 5: Equilibration (NVT)

After completing the four minimization steps, the system must be equilibrated first in temperature and then in pressure. An NVT simulation allows the molecules in the system to move around and interact with each other, gradually reaching a state of equilibrium where their properties like positions, velocities, and energies fluctuate around a stable average value.
In order to equilibrate the system, four NVT simulations with decreasing constraints are needed:

  • nvt1.in costraints at 0.5 kcal/mol
  • nvt2.in, costraints at 0.25 kcal/mol
  • nvt3.in, costraints at 0.10 kcal/mol
  • nvt4.in, costraints at 0.05 kcal/mol

The parameter file should be structured similarly to the previous minimization file. It should include the definition of the constraints and the range of residues to which they should be applied. For all the parameters to specify the parameter file, please refer to the link.
It is important to remember that the system is at a temperature of 0 K at the beginning of the first NVT simulation. Therefore, in NVT1, the system must be heated from 0 K to 300K (37°C) by adding two lines of parameters (only at the end of the file nvt1.in, after &end sign):

&wt type='TEMP0', istep1=0, istep2=2500000, value1=0.0,value2=300.0, &end &wt type='END', &end

In this case, the system is being forced to go from 0.0 K to 300.00 K over 2500000 steps, which is 5 ns.
In the other parameters files it would be sufficient fix the value1 to 300.0, such as follows:

&wt type='TEMP0', istep1=0, istep2=2500000, value1=300.0,value2=300.0, &end &wt type='END', &end

N.B. The following parameters should be used instead of those suggested in the manual (Heat.in):

  • cut = 9.0 ! non-bonded interactions cutoff
  • nstlim = 2500000 ! total number of steps of NVT; in this case 2500000 are 5 ps
  • dt = 0.002 ! timestep in ps; the timestep determines how muche the simulated time advances with each computational step
  • gamma_ln = 1.0 ! parameter related to Langevin thermostat, used to mantain constant temperature
  • ntpr = 500 ! frequency (in timesteps) at which pressure and temperature information are written
  • ntwx = 500 ! frequency (in timesteps) at which coordinates are writted
  • ntwr = 500 ! frequency (in timesteps) at which the system restart file is written


From NVT onwards (NVT, NPT, and MD), the commands run on the GPU. Proceed as follows:

  1. Check the GPU usage status using the command:
    nvidia-smi

    The command will display the details of the available GPUs, including their identification number (0/1) and whether or not they are being used (%). If the % value is 0, then the GPU is free.

  2. Run the command:
    export CUDA_VISIBLE_DEVICES=x

    where x is the number of the free GPU (0 or 1).

  3. Run the command pmemd.cuda: nohup pmemd.cuda -O -i nvt1.in -p complesso.prmtop –c restrt.em4 o mdout.nvt1 -inf mdinfo.nvt1 -r restrt.nvt1 -x mdcrd.nvt1 &

    Using nohup and & will run the command in the background. You can monitor the processe using:
    • htop to display the information about the CPU processes;
    • nvidia-smi o shown the GPU information; tail -f mdinfo file to see how long the run will take.





Step 6: Presurrization (NPT)

The NPT Ensemble is a simulation technique used to maintain a constant number of particles (N), pressure (P), and temperature (T) in a system. This is particularly useful for studying systems under pressure or in contact with a surrounding environment.
Performed the four NVT runs, a single NPT run is required, in which the system is relaxed (no costraints are present). The pressure is applied.
In this case, it is necessary to prepare a single npt.in file of parameters. The npt.in file should be similar to the previous nvt steps files. The costraints wording must be removed. For all the parameters to specify the parameter file, please refer to the link.

N.B.The following parameters should be used instead of those suggested in the manual:

  • ntx = 5 ! Number of timesteps for temperature equilibration
  • ntb = 2 ! This variable controls whether or not periodic boundaries are imposed on the system during the calculation of non-bonded interactions.
  • pres0 = 1.0 ! Target pressure in MPa
  • npt = 1 ! Type of NPT ensemble (1 for classical NPT)
  • taup = 2.0 ! Pressure relaxation time in picoseconds
  • ntr = 0 ! No restraints applied
  • temp0 = 300.0 ! Initial temperature in Kelvin
  • tempi = 300.0 ! Target temperature in Kelvin
  • gamma_ln = 1.0 ! Sphericity parameter for Langevin thermostat
  • nstlim = 2500000 ! Maximum number of timesteps
  • dt = 0.002 ! Timestep in picoseconds
  • ntpr = 500 ! Frequency (in timesteps) for printing pressure
  • ntwx = 500 ! Frequency (in timesteps) for writing trajectory data
  • ntwr = 500 ! Frequency (in timesteps) for writing restart data


Once the npt.in file has been appropriately prepared, the NPT can be launched as the previous NVTs were (using pmemd.cuda).




Step 7: Classical Molecular Dynamics (MD

In the production run, the actual data collection for the properties of interest occurs. This is the most time-consuming part of the simulation, as it requires a large number of steps to generate a statistically representative trajectory. The length of the production run will depend on the system being studied and the properties of interest. A general rule of thumb is to run the simulation for at least 10 times the relaxation time of the system. The relaxation time is the time it takes for the system to forget its initial state and reach equilibrium.
Once the system equilibration phase is complete, the system is ready for production.
A md.in parameter file is required, structured similarly to that of the previous steps! It will be sufficient to take the one from the npt and modify it by setting nstlim to the number of steps sufficient to run 100 ns of dynamics.

Remember that nstlim _ dt = time in ps and 1000 ps = 1 ns .

So if 2500000 steps were run in NPT (2500000 _ 0.002 = 5000 ps = 5 ns), how many nstlim are needed to run 100000 ps / 100 ns? The answer is simple: 50000000 (50000000 \* 0.002 = 100 ns).
Once the md.in file has been appropriately modified, the simulation can be launched as the previous NVT and NPT were (pmemd.cuda).
Of course the MD step will take a longer time to terminate! The run depends strictly from the number of atoms of the system.


Congrats, you've reached the end of this tutorial! Are you still sane? Go to the "Visualization of trajectory with VMD"


For all information about AMBER22, please visit the manual. Thanks to Alessia Pinto for beta-testing and debugging this tutorial!