schrodinger.analysis.visanalysis package

Introduction

Molecular structures reside in a three-dimensional world and naturally enough give rise to three-dimensional data. Much of this data can be represented on a three-dimensional grid of points and visualised, either via isosurfaces, or more recently by advanced volumetric rendering options built into packages such as PyMol and Maestro. Currently however, the Schrodinger Python APIs do not provide an easy method of generating or handling such data. This library seeks to address this deficiency by providing a simple method of creating and handling three-dimensional ‘volumetric-data’ as well as code to allow such data to be exported to packages such as PyMol and Maestro for visualisation.

Handling Three-Dimensional Data

Three-dimensional arrays are trivially handled in most programming languages, Python is no exception to this, with the Numpy array class providing all of the basic functionality that we require in terms of low-level data allocation and manipulation. However, like most implementations, the Numpy array class only allows data points to reside on zero-based, integer indices, something which is inconvenient when we need to be able to manipulate data in terms of real-world coordinates, which will frequently involve fractional values. Consequently we need to generate the concept of world-coordinates and array-coordinates, with their associated coordinate-frames.

Array Coordinates

Array-coordinates are integer coordinates specifying a given offset in the array. The set of valid array coordinates is bounded by zero at the lower end and the size of the array at the upper end. The size of the array is denoted as N throughout the code.

World Coordinates

World-coordinates are floating-point coordinates representing a position in space. The set of valid world coordinates is not bound by anything, however, only a certain fraction of world-coordinates will correspond to a valid array-coordinate. In reality there are three classes of world-coordinate specification:

1. Directly mappable world-coordinate. A directly mappable world-coordinate corresponds precisely to a valid location in array-coordinates. The value of the data at this location can be found by reading the underlying array directly. 2. Interpolatable world-coordinate. An interpolatable world-coordinate corresponds to a location that is in the bounds of the array-coordinates, but which does not correspond to a valid array coordinate, i.e. it lies between valid array-coordinates. A value cannot be retrieved directly from the underlying array in this case, however a valid value may be found via interpolation. 3. Invalid world-coordinate. An invalid world-coordinate corresponds to a location that lies outside of the bounds of the array-coordinates. A value cannot be retrieved directly from the underlying array in this case, additionally no value can be found via interpolation. Such situations may be handled by returning a default value (frequently zero) or by raising an error condition.

Converting Between Coordinate-Frames

Naturally it will be necessary to convert between the array-coordinate frame and the world-coordinate frame (and back again). Fortunately this is very simple as naturally enough there is a linear relationship between them:

world = resolution * array + origin

array = ( world - origin ) / resolution

Of course in creating these conversions we have introduced the notion of resolution and the origin. These concepts are simple enough, the resolution specifies how quickly the world-coordinate changes for a change in the array-coordinate, while the origin moves the array-coordinate around the world-coordinate frame.

Code Examples

Format Conversion

Loading a .vis File for Access in Python

The VolumeDataIO module contains the required functionality. Two lines of code are required:

import volumedataio as vdio
volumeData = vdio.LoadVisFile( "filename.vis" )
# ...use the volumeData object

Conversion to CCP4 Format

Pymol cannot read Maestro .vis files natively, it can however manipulate .ccp4 file with ease. The conversion of a .vis file to a .ccp4 file can be accomplished as follows:

import volumedataio as vdio
volumeData = vdio.LoadVisFile( "filename.vis" )
vdio.SaveCCP4File( volumeData, "filename.ccp4" )

Accessing Basic VolumeData Properties

Basic properties of a VolumeData object can be retrieved as follows:

# Assume that we have an initialised VolumeData instance called
# volumeData
print "volumeData X-Axis Resolution: %f" %      volumeData.CoordinateFrame.X.resolution
print "volumeData Y-Axis Resolution: %f" %      volumeData.CoordinateFrame.Y.resolution
print "volumeData Z-Axis Resolution: %f" %      volumeData.CoordinateFrame.Z.resolution
print "volumeData X-Axis Origin: %f" %          volumeData.CoordinateFrame.X.origin
print "volumeData Y-Axis Origin: %f" %          volumeData.CoordinateFrame.Y.origin
print "volumeData Z-Axis Origin: %f" %          volumeData.CoordinateFrame.Z.origin

Accessing the Underlying Data:

# Assume that we have an initialised VolumeData instance called
# volumeData

# Access by valid array-coordinate (fast, but limited)
print "Array-coordinate access %f" %            volumeData.getData()[ x ][ y ][ z ]

# Access by array-coordinate (slow, but utterly safe, performs
# interpolation )
print "Array-coordinate access %f " %           volumeData.getAtArrayCoordinate( ( x, y, z ) )

# Access by world-coordinate (slow, but utterly safe, performs
# interpolation ).
print "World-coordinate access %f" %            volumeData.getAtWorldCoordinate( ( x, y, z ) )

# Setting data-points

# All at once, theNewDataArray must be a correctly sized 3D-array.
volumeData.setData( theNewDataArray )

# Individually.
volumeData.getData()[ x ][ y ][ z ] = value

Create a vdW-Mask Around a Molecule

A vdW-mask stores 1.0 for all positions within the vdW-surface of a molecule and 0.0 for all other positions. It is a useful concept when marking out regions of interest and for higher-level analyses and visualisations:

import volumedataio as vdio
import volumedatastructureutils as vdsutils
import schrodinger.structure as structure

st = structure.Structure.read( "test.mae" )

# The volumedatastructureutils module allows us to generate
# bounding-box information from a structure easily.
# NOTE: We have to specify the resolution, in this case 0.5A.
# We also extend the overall VolumeData object by 3.0A in all
# directions. This allows for the vdW-radii of the atoms.

N, resolution, origin = vdsutils.BoundingInformation(
                                        st,
                                        resolution = ( 0.5, 0.5, 0.5 ),
                                        extend = ( 3.0, 3.0, 3.0 ) )

mask = vdsutils.VDWMask( st.atom,
                                         N = N,
                                         resolution = resolution,
                                         origin = origin )

vdio.SaveVisFile( mask, "mask.vis" )

Analysing a Conformational-Ensemble

VolumeData objects can be used to visualise the actual volume occupied by a conformational-ensemble:

import volumedata as vd
import volumedataio as vdio
import volumedatastructureutils as vdsutils
import numpy as np
import schrodinger.structure as structure

# Find the bounding-box of the conformational-ensemble. We'll assume
# that there is some basic alignment of the structures in 3D-space.

N, resolution, origin =         vdsutils.BoundingInformationL(
                structure.StructureReader( "test-confs.mae" ),
                resolution = ( 0.5, 0.5, 0.5 ),
                extend = ( 3.0, 3.0, 3.0 ) )

# Create the accumulator volume in accordance with the calculated
# bounding information. This accumulator volume will show the
# regions of space occupied by the ensemble.

accumulator = vd.VolumeData( N = N,
                                resolution = resolution,
                                                origin = origin )

# Loop across the ensemble once more, calculate the quantities
# needed to evaluate Boltzmann fractions.
fractionAccumulator = 0.0
fractions = list()
for st in structure.StructureReader( "test-confs.mae" ):
        energy = st.property[ "r_mmod_Relative_Potential_Energy-OPLS3" ]
        fraction = np.exp( -energy / (0.0083144621 * 300 ) )
        fractionAccumulator += fraction
        fractions.append( fraction )

# Now perform the accumulation of the volumes.
for i, st in            enumerate( structure.StructureReader( "test-confs.mae" ) ):

        mask = vdsutils.VDWMask( st.atom,
                                                N = N,
                                                resolution = resolution,
                                                origin = origin )
        mask *= fractions[ i ] / fractionAccumulator
        accumulator += mask

vdio.SaveVisFile( accumulator, "test-confs.vis" )

Manipulating SiteMap Volumes

Differences between SiteMaps can be important for locating regions where proteins have varying requirements:

import volumedataio as vdio
import volumedatautils as vdutils
import numpy as np

# Load the hydrophobic maps for the two proteins (SiteMaps are made
# up of many maps types, but this will suffice for an example).
pdkPhob = vdio.LoadVisFile( "PDK1_site_1_phob.vis" )
gsk3Phob = vdio.LoadVisFile( "GSK3_site_1_phob.vis" )

# Make the two maps consistent.
pdkPhob, gsk3Phob =             vdutils.MakeConsistent(
                [ pdkPhob, gsk3Phob ],
                resolutionMode = "highest",
                interpolationOrder = 2,
                oobMethod = "constant",
                oobConstant = 0.0 )

# Subtract the maps, then form the absolute difference.
difference = vdutils.CreateLike( pdkPhob )
difference.setData( pdkPhob.getData() - gsk3Phob.getData() )
difference.setData( np.fabs( difference.getData() ) )

# Save the resulting difference map.
vdio.SaveVisFile( difference, "PDK-GSK3-phob-diff.vis" )