Enter a sequence in the text box above. Example:

(ACTG_5A)_2 [C] A G_10 T

Contact

Prof. John H. Maddocks
MATH-FSB-EPFL, Station 8
Swiss Federal Institute of Technology
CH-1015 Lausanne
@: john (dot) maddocks (at) epfl (dot) ch


Credits

Code by L. de Bruin (debruin (at) lorentz (dot) leidenuniv (dot) nl), original concept by J. H. Maddocks, site designed and developed jointly. Thanks also to other members of the LCVMM, particularly A. Patelli and T. Zwahlen, and Masters students at the EPFL for testing and debugging the site.

Development of the web site was financially supported by the EPFL and the Swiss National Science Foundation under Award 200020 143613/1.

If you use the cgDNAweb site and find it helpful, please cite:

cgDNAweb: a web interface to the cgDNA sequence-dependent coarse-grain model of double-stranded DNA.
L. De Bruin, J.H. Maddocks
Nucleic Acids Research 46, issue W1 (2018), p. W5-W10
DOI:10.1093/nar/gky351

The cgDNA model has its own web page, which includes a much more detailed description of the mathematical model underlying the cgDNAweb page, and has appropriate citation information for use of the cgDNA model itself.

Getting started

cgDNAweb is an interface to allow interactive visualisation of the sequence-dependent ground state, or free energy minimizing, configuration of a DNA double helical fragment as predicted by the cgDNA model.

The cgDNA model

cgDNA is a sequence-dependent, coarse-grain, rigid-base model for double-stranded DNA. For a full description of the model see the cgDNA website and the publications described there. Briefly, given an input sequence in the standard alphabet A,T,C,G (along the reading, or Watson, strand in the 5'-3' direction) and one of the provided parameter sets (the recommended default is PS4), the cgDNA model predicts a Gaussian (or multivariate normal) probability density function (or PDF) for the dsDNA configuration expressed in standard internal dsDNA coordinates (more specifically a version of Curves+ helicoidal coordinates).

The cgDNAweb interface is primarily aimed at visualising the sequence-dependent minimum free energy, or ground state, configuration in the Gaussian that is predicted by the cgDNA model. The ground state can be visualised either in the 2D Plots pane, as graphs of the internal coordinates along the sequence, or in the 3D View pane as the 3D configuration at various levels of detail.

How to use cgDNAweb

Fill in a DNA sequence in the input form in the header at the top of the page (try starting by typing a 10-20 bp long fragment, or copy and paste a much longer sequence; up to 1Kbp or so is feasible on a contemporary laptop). The hard limits for the website to function are that an input sequence must be at least two basepairs long and less than 3K bp. The input format is either a string of nucleotides from the {A,C,G,T} alphabet (upper or lower case, which can be mixed, and with or without spaces), e.g. 'A atCG c', or, for tandem repeats, a nucleotide or a sequence between parentheses, followed by an underscore and a number, e.g. A_12 for poly(A) of length 12 or (AG)_100 for a poly(AG) molecule with 200 basepairs. Initially we recommend using only the default setting, the PS4 paramset, which we regard as the current state of the art.

The input characters are all assumed to be ASCII. In principle carriage returns, newlines, spaces, and tabs may be included in the input, and all are ignored. Take care when copying and pasting from PDF documents, which may include other hidden characters that cause the input to be rejected. The border of the input form turns red when the input is rejected, whether because of length constraints, unrecognised characters, or unmatched parentheses.

Click Go, and inspect the Output in either the 3D View or 2D Plots panes.

cgDNAweb can visualise superpositions of ground states for up to four different cases simultaneously, e.g. ground states of different sequences with the same parameter set can be compared, or differences for the same sequence with different parameter sets can be compared. To enter multiple sequences, use the arrow keys to switch to another input form, or click one of the white dots. A maximum of four sequences can be visualized at one time, but each of the four sequences can be modified and re-plotted individually. Specific base pair frames of different sequences will be aligned in the 3D viewer whenever a particular nucleotide is wrapped in square brackets in the input sequence, e.g. A_6 [T] A_6. In this case, the viewer will have the basepair with T at the center of the view. Note that there can be at most one set of brackets [ ] per sequence, and therefore this syntax cannot be used within tandem repeats. If the input sequence contains no explicit [ ], the first base pair frame is used for centering.
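
The input syntax described above (nucleotides, `_n` repeats, parenthesised tandem repeats, and a single `[ ]` marker) can be sketched as a small parser. This is a minimal illustration, not the code used by the site. The function name `expand`, the constant `MAX_BP = 3000` (one reading of "less than 3K bp"), the acceptance of nested parentheses, and the treatment of a missing `_n` as a single copy are all assumptions.

```python
import re

VALID = set("ACGT")
MIN_BP = 2
MAX_BP = 3000  # assumption: one reading of "less than 3K bp"


def expand(text):
    """Return (plain_sequence, center_index) for a shorthand input string."""
    s = re.sub(r"\s+", "", text).upper()  # whitespace ignored, case-insensitive
    pos = 0
    center = None

    def read_count():
        # Optional "_<number>" repeat suffix; defaults to a single copy.
        nonlocal pos
        if pos < len(s) and s[pos] == "_":
            m = re.match(r"[0-9]+", s[pos + 1:])
            if m is None:
                raise ValueError("expected a number after '_'")
            pos += 1 + m.end()
            return int(m.group())
        return 1

    def parse(depth):
        nonlocal pos, center
        out = ""
        while pos < len(s):
            ch = s[pos]
            if ch in VALID:
                pos += 1
                out += ch * read_count()
            elif ch == "(":
                pos += 1
                inner = parse(depth + 1)
                if pos >= len(s) or s[pos] != ")":
                    raise ValueError("unmatched '('")
                pos += 1
                out += inner * read_count()
            elif ch == ")":
                if depth == 0:
                    raise ValueError("unmatched ')'")
                break  # the caller consumes the closing parenthesis
            elif ch == "[":
                if depth > 0 or center is not None:
                    raise ValueError("at most one [ ], and not inside a repeat")
                if s[pos + 1:pos + 3] not in {b + "]" for b in VALID}:
                    raise ValueError("expected a single nucleotide in [ ]")
                center = len(out)  # 0-based index of the bracketed base pair
                out += s[pos + 1]
                pos += 3
            else:
                raise ValueError(f"unrecognised character {ch!r}")
        return out

    seq = parse(0)
    if not MIN_BP <= len(seq) < MAX_BP:
        raise ValueError(f"length {len(seq)} outside the allowed range")
    # With no explicit [ ], the first base pair is used for centering.
    return seq, (0 if center is None else center)
```

For instance, `expand("A_6 [T] A_6")` yields a 13 bp sequence with the centering index pointing at the T, and `expand("(AG)_100")` yields 200 basepairs centered on the first one.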

Strong and nonlocal sequence-dependence of the ground state shape is easily observed, especially in the 2D Plots pane. For example, try comparing the four 19 bp fragments A_9 A [A] A_8, A_9 T [A] A_8, A_9 C [A] A_8, and A_9 G [A] A_8. Note also that even for the uniform poly(A) sequence A_9 A [A] A_8 = A_19 (where the first input format allows centering of the 3D view on the 11th base pair) the local ground state configuration differs (as it should) between the interior and the regions close to the ends.

Strong intrinsic bends arise for some sequences, e.g. with phased A-tracts such as

(a_5 g_5 t_5 c_5 g)_2 a_5 g_5 t_5 c_5 [g](a_5 g_5 t_5 c_5 g)_2

The sometimes significant consequences of reconstructing the same sequence with parameter sets fit to MD simulations with different protocols can be observed by comparing reconstructions of this single sequence with two different paramsets, e.g. PS2 and PS4.

Some documentation

Given the input sequence, and a selected parameter set from the dropdown menu next to the sequence input, you can view the ground state of the DNA in the '3D View' tab or in the '2D Plots' tab. Once a sequence or sequences have been entered, the arrow on the right of the darker blue navigation bar can be used to show or hide the header panel for more or less viewing space.

Viewer controls

The controls for the 3D viewer are mainly mouse-based:
Rotate
left click and drag, or single finger drag on mobile.
Zoom
middle mouse click, scroll, or pinch on mobile.
Pan
shift+left click and drag, or triple finger drag on mobile.
Context menu
Accessed by right click or command+click on macOS, the viewer context menu gives access to:
Center on this base
Rotates and translates the DNA such that the base frame of the selected base coincides with the origin.
Center on this base pair
Rotates and translates the DNA such that the base pair frame of the selected base coincides with the origin.
Look at center/origin
Rotates the camera such that the origin is centred in the viewport.

The drop down menu to the right of the 3D view pane provides various viewing options. The cgDNA model assumes rigid bases, and each base can be visualised as its individual (heavy) atoms, or as a (red) frame (i.e. an origin and three orthonormal vectors embedded in the base according to Curves+ conventions), or as a box surrounding the base atoms, colour coded for sequence. Similarly the base pair, which is an appropriate average of the two associated rigid bases, can be visualized as either a (grey) frame or a surrounding bicolour box. When a base is selected with the mouse, an inset window reports the strand location and composition of that base. The 3D view is centred on the basepair frame selected by [ ] in the input sequence, and this frame is highlighted by a larger RGB frame. The backbones are simple numerical interpolations of the base frame positions.

Sufficiently long sequences eventually become unwieldy to manipulate interactively in the 3D viewer, particularly when visualizing individual atoms. However, nontrivial large-scale shapes in sequences of up to 1Kbp are easily observed, for example in the presence of phased A-tracts.

2D plots pane

In this panel the ground state values of the twelve Curves+ helicoidal internal coordinates (6 inter base-pair, or junction, coordinates and 6 intra base-pair coordinates, each set comprising three translational and three rotational degrees of freedom) are plotted along each input sequence. Translational coordinates are reported in Ångströms, and rotational coordinates (or components of the rotational Cayley vector) are reported either in degrees or in cgDNA internal units of 1/5 radian, or approximately 11.5 degrees. (This nonstandard rotational unit gives a good scaling between rotational and translational stiffnesses of DNA.) Placing the cursor over one of the plots returns the numerical values in the plot along with the local sequence context. For very long sequences the 2D plots become less useful.

Parameter sets

Parameters in the cgDNA model are estimated from large scale libraries of Molecular Dynamics (or MD) simulations. Differences in libraries can reflect different solvent conditions, such as temperature and ion species and concentration, or different MD simulation force fields and protocols, or different libraries of sequences in the training set. There are also differences in the choice of objective function and procedures for fitting cgDNA model parameters to given MD training set simulation data. All of these issues are discussed at length in the citations below. The four paramsets currently accessible through the cgDNAweb page can be briefly described as follows:

If in doubt, use parameter set 4 i.e. PS4.

Export pane

It is possible to export the configuration and other data in multiple ways, by selecting an entry in the 'Export' menu.

Atom coordinates as .pdb
This will download a .pdb format file with all of the heavy atoms in the DNA bases. These atoms have positions as specified by the Tsukuba convention. The phosphodiester backbones are not provided. Note that, due to the character length available for atom coordinates in the PDB format specification, the output generated directly here for long DNA molecules may be incorrect. A conservative estimate is that no problem will arise for molecules up to 300bp or so in length. Absolute coordinates of base and basepair frame origins are still available for longer molecules in the .fra output format, and from the .fra output the absolute PDB coordinates of each atom can be computed locally by suitably modifying the available cgDNA scripts.

Frames as .fra file
This will download a Curves+ .fra format file describing the configuration of coordinate frames in the molecule. The file contains one line per frame and each line contains the following columns: <no of strand> <no of basepair> <d1_1> <d1_2> <d1_3> <d2_1> <d2_2> <d2_3> <d3_1> <d3_2> <d3_3> <r_1> <r_2> <r_3> where the triples d1,d2,d3 are the (absolute, or laboratory, nondimensional) coordinates of the three vectors making up each orthonormal frame, and r is the (absolute) position of the frame (in Angstroms).

Basepair frames as .fra file
The same as above, but for the base-pair frames.
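
The 14-column line layout described above can be read with a few lines of Python. This is only a sketch under stated assumptions: the names `Frame`, `parse_fra` and `is_orthonormal` are hypothetical, the file is assumed to contain no header or comment lines, and the first two columns are assumed to be written as integers.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    strand: int
    basepair: int
    d1: tuple  # first frame vector (nondimensional)
    d2: tuple  # second frame vector
    d3: tuple  # third frame vector
    r: tuple   # frame origin, in Angstroms


def parse_fra(text):
    """Parse .fra text: one frame per line, 14 whitespace-separated columns."""
    frames = []
    for line in text.splitlines():
        cols = line.split()
        if not cols:
            continue  # skip blank lines
        if len(cols) != 14:
            raise ValueError(f"expected 14 columns, got {len(cols)}")
        v = [float(c) for c in cols[2:]]
        frames.append(Frame(int(cols[0]), int(cols[1]),
                            tuple(v[0:3]), tuple(v[3:6]),
                            tuple(v[6:9]), tuple(v[9:12])))
    return frames


def is_orthonormal(frame, tol=1e-3):
    """Check that d1, d2, d3 are mutually orthogonal unit vectors."""
    vecs = (frame.d1, frame.d2, frame.d3)
    for i in range(3):
        for j in range(3):
            dot = sum(a * b for a, b in zip(vecs[i], vecs[j]))
            if abs(dot - (1.0 if i == j else 0.0)) > tol:
                return False
    return True
```

The orthonormality check is a cheap sanity test that the columns were split in the intended order; the tolerance should be loosened if the file stores few decimal places.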

Shapes as .svg
Downloads a zipped graphics file with all the plots of the internal coordinates as they are shown in the '2D Plots' tab, but without the interactivity.

Shapes as .txt
Downloads the internal configuration of the molecule as it appears in the cgDNA model. That is, a column vector containing (x_1, y_1, x_2, ..., y_{n-1}, x_n), where both the x_i and y_i are real 6-vectors, and x_i contains the 6 intra coordinates, and y_i the 6 inter coordinates, with translations in Angstrom, and the rotations in rad/5.
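
The interleaved layout (x_1, y_1, x_2, ..., y_{n-1}, x_n) can be unpacked as sketched below. The function names are hypothetical, and the file is assumed to hold plain whitespace-separated numbers. The helper `rad5_to_deg` applies the linear unit conversion quoted on this page (one unit is 1/5 radian); deciding which entries of each 6-vector are rotations is left to the caller, since the ordering is not stated here.

```python
import math


def load_shape(text):
    """Read a shape vector from whitespace-separated numbers."""
    return [float(tok) for tok in text.split()]


def split_shape(values):
    """Split (x_1, y_1, ..., y_{n-1}, x_n) into intra and inter 6-vectors."""
    if len(values) < 6 or (len(values) + 6) % 12 != 0:
        raise ValueError("length must equal 12n-6 for some n >= 1")
    n = (len(values) + 6) // 12
    intras = [values[12 * i: 12 * i + 6] for i in range(n)]           # x_i
    inters = [values[12 * i + 6: 12 * i + 12] for i in range(n - 1)]  # y_i
    return intras, inters


def rad5_to_deg(x):
    """Convert a rotational coordinate from units of 1/5 radian to degrees."""
    return math.degrees(x / 5.0)
```

A sequence of n basepairs gives n intra vectors and n-1 inter vectors, so a vector of any other length indicates a truncated or misread file.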

Stiffness matrix in CSC format, stored in JSON
This will download a zipped file containing a JSON file with the stiffness matrices of all the input sequences in compressed sparse column (CSC) format. The file can be read with a standard JSON decoder, such as the one built into Python, and the resulting arrays can then be used with SciPy.
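
As an illustration of the CSC layout, the sketch below decodes a JSON object and expands it into a dense matrix using only the standard library. The key names `data`, `indices`, `indptr` and `shape` are hypothetical and must be checked against the downloaded file, which may also hold one such object per input sequence.

```python
import json


def csc_to_dense(data, indices, indptr, shape):
    """Expand compressed sparse column arrays into a dense list of rows."""
    nrows, ncols = shape
    dense = [[0.0] * ncols for _ in range(nrows)]
    for j in range(ncols):
        # Entries of column j occupy positions indptr[j] .. indptr[j+1]-1.
        for k in range(indptr[j], indptr[j + 1]):
            dense[indices[k]][j] = data[k]
    return dense


def load_stiffness(json_text):
    """Decode one CSC matrix from JSON text (hypothetical key names)."""
    obj = json.loads(json_text)
    return csc_to_dense(obj["data"], obj["indices"], obj["indptr"], obj["shape"])

# With SciPy installed, the same three arrays can instead be passed to
# scipy.sparse.csc_matrix((data, indices, indptr), shape=shape).
```

The dense expansion is convenient for small examples only; for long sequences the sparse form should be kept, since the stiffness matrix is banded.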

There are currently some issues with the download functions under the Safari web browser, so we recommend using Firefox or Chrome to access this functionality.

Stiffness Matrices

The cgDNA model predicts both a ground state configuration and a banded stiffness matrix for each input sequence. However, cgDNAweb focuses on visualising the ground state shape in various ways, and only provides the stiffness matrix via download of a file with its numerical entries. This data may then be post-processed as wished. One possibility is to make plots of the eigenvalues of the stiffness matrix, as indicated in the figure below (in the scaling with translation variables in Angstroms and rotations in 1/5th radians). Another is to download and use the .txt ground state shape and associated .csc/.json stiffness matrix as inputs to a Monte Carlo simulation code (such as cgDNAmc) for sampling the cgDNA Gaussian PDF in order to compute various expectations of interest, e.g. sequence-dependent persistence lengths as described in [6]. However, for extensive post-processing simulations it is probably more efficient to install and use the available cgDNA Matlab/Octave scripts directly rather than using the cgDNAweb interface.

Figure: Stiffness spectra. The cgDNA stiffness matrix for a sequence of length N has 12N-6 eigenvalues, all real and positive, which can be sorted in ascending order and plotted against their index (scaled to always lie in [0,1] to facilitate comparison of spectra for sequences of different lengths). Spectra are here compared for four sequences A_300 (blue), (AT)_150 (green), the Widom 601 positioning sequence (see article, purple) and the A-tract sequence (A_5 G_5 T_5 C_5 T)_15 (red). The inset in the top left is a magnification of the lower part of the spectra.
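
The index scaling used for such spectra can be sketched as follows, given eigenvalues already computed with a linear algebra package of your choice. The function names are hypothetical, and the particular scaling i/(m-1), which maps the smallest eigenvalue to 0 and the largest to 1, is an assumption about how the index is normalised.

```python
def n_eigenvalues(n_bp):
    """Number of eigenvalues of the stiffness matrix for n_bp basepairs."""
    return 12 * n_bp - 6


def scaled_spectrum(eigenvalues):
    """Sort eigenvalues ascending and pair each with an index in [0, 1]."""
    vals = sorted(eigenvalues)
    m = len(vals)
    if m < 2:
        raise ValueError("need at least two eigenvalues")
    return [(i / (m - 1), v) for i, v in enumerate(vals)]
```

Plotting the second element of each pair against the first puts spectra of sequences with different lengths on a common horizontal axis.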

Relevant citations

[1] cgDNAweb: a web interface to the cgDNA sequence-dependent coarse-grain model of double-stranded DNA.
L. De Bruin, J.H. Maddocks
Nucleic Acids Research 46, issue W1 (2018), p. W5-W10
DOI:10.1093/nar/gky351

[2] A sequence-dependent rigid-base model of DNA
O. Gonzalez, D. Petkevičiūtė, J.H. Maddocks
Journal of Chemical Physics 138, no. 5 (2013), p. 055122 1-28
DOI:10.1063/1.4789411

[3] cgDNA: a software package for the prediction of sequence-dependent coarse-grain free energies of B-form DNA
D. Petkevičiūtė, M. Pasi, O. Gonzalez, J.H. Maddocks
Nucleic Acids Research 42, no. 20 (2014), p. e153
DOI:10.1093/nar/gku825

[4] μABC: a systematic microsecond molecular dynamics study of tetranucleotide sequence effects in B-DNA
M. Pasi, J.H. Maddocks, D. Beveridge, T.C. Bishop, D.A. Case, T. Cheatham III, P.D. Dans, B. Jayaram, F. Lankas, C. Laughton, J. Mitchell, R. Osman, M. Orozco, A. Pérez, D. Petkevičiūtė, N. Spackova, J. Sponer, K. Zakrzewska, R. Lavery
Nucleic Acids Research 42, no. 19 (2014), p. 12272–12283
DOI:10.1093/nar/gku855

[5] Absolute versus relative entropy parameter estimation in a coarse-grain model of DNA
O. Gonzalez, M. Pasi, D. Petkevičiūtė, J. Glowacki, J.H. Maddocks
Multiscale Modeling and Simulation 15, no. 3 (2017), p. 1073 - 1107
DOI:10.1137/16M1086091

[6] Sequence-Dependent Persistence Lengths of DNA
J.S. Mitchell, J. Glowacki, A.E. Grandchamp, R.S. Manning, J.H. Maddocks
Journal of Chemical Theory and Computation 13 (2017), p. 1539-1555
DOI:10.1021/acs.jctc.6b00904

Browser support

This website has been tested and works on Chrome (v. 58), Firefox (v. 54) and Safari (v. 9.1.2), on macOS versions up to 10.12 (Sierra). Other browsers may not have full functionality. In particular Internet Explorer 8 and 9 are not supported.

The 3D viewing pane of cgDNAweb does not function on the Safari browser on macOS 10.13 (High Sierra), perhaps due to documented issues with the WebGL library. Chrome and Firefox do run cgDNAweb satisfactorily on the same version of macOS.

Resources used

This website uses three.js for visualisation of DNA. Plots are generated with the help of d3.js. Symbols shown are from Font Awesome. Zips are generated with zip.js. Files are saved with FileSaver.js. Linear algebra calculations are done with numeric.js.
