Structures¶
The Structure class is the fundamental class in our modules, and will
probably be used in all of the code you write. Structures can be single
molecules or groups of molecules. They provide access to atoms, bonds,
properties, and a number of substructure elements.
Like any other Python object, Structures can be stored in lists or
dictionaries, assigned to variables, and passed between functions. (However,
they cannot be pickled because they wrap an underlying C library.)
In principle, Structures can be created programmatically, by creating a
zero-atom structure, adding the desired atoms and connecting
them with bonds. However, this usage pattern is atypical. In most cases a
Structure will be loaded from a file or retrieved from the
Maestro Workspace or the Maestro Project Table.
Most Schrödinger calculations will produce a Maestro-format output file.
Creating a Structure object from one of these files will allow you to
investigate the properties and structure of the resulting molecule or
molecules.
Structure Class Organization¶
Structures expose many attributes as iterators, including atoms, bonds,
and substructure elements. Structures, atoms, and bonds each have general
dictionary-like property attributes that can
store properties associated with the specific object.
The Structure class itself doesn’t have many commonly used attributes
associated with it. See the API documentation for more detail on its
properties and methods.
Atoms and Bonds¶
All Structures have a list-like atom attribute that can be used to
iterate over all atoms or to access them by index. For example:
Note
In this example and those below, we use st as the standard variable
name for a Structure instance.
# Print the names and atomic numbers of all the atoms in the Structure.
for atom in st.atom:
    print("%s: %d" % (atom.name, atom.atomic_number))
# Print the name and atomic number of the first atom in the Structure.
# Indexing starts at 1.
print("%s: %d" % (st.atom[1].name, st.atom[1].atomic_number))
Each atom is represented by a _StructureAtom class. This class is
“private” (i.e. named with a leading underscore) because you won’t be
creating it directly. It isn’t possible for a _StructureAtom object to
exist independently of a Structure object, and so they can only be
accessed from an existing Structure.
Some attributes (actually Python properties) of the
_StructureAtom objects include name, atomic_number, formal_charge,
and the Cartesian coordinates in x, y, and z. See the _StructureAtom
properties for a full list.
Each atom also has a list-like bond attribute:
for atom in st.atom:
    print("atom %d is bonded to:" % atom.index)
    for bond in atom.bond:
        print(" atom %d" % bond.atom2.index)
Bonds are represented by the _StructureBond class. Important attributes
of the bond class include order, atom1, and atom2. See the
_StructureBond properties for full documentation.
Bonds within the structure are also accessible from a list-like attribute of
Structure called bond. This access is useful for cases where you want to
iterate over all bonds in a Structure exactly once.
# It's possible to iterate over all bonds in a structure:
for bond in st.bond:
    print("Bonded atoms: %d and %d" % (bond.atom1.index, bond.atom2.index))
Properties¶
Structures, atoms, and bonds each have the ability to store properties in a
dictionary-like attribute named property.
The property names in this property object must follow a pattern that is
required for storage in Maestro-format files. The required naming scheme is
type_author_property_name, where type is a data type prefix, author is a
source specification, and property_name is the actual name of the data. The
type prefix must be one of b for boolean, i for integer, r for real, or s
for string. The source specification is typically a Schrödinger program
abbreviation (e.g. m for Maestro and j for Jaguar); the appropriate source
specification for properties that you create yourself is user. (In
Maestro-format files, the Structure property names correspond to the
properties listed under the f_m_ct { line.)
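The naming scheme can be made concrete with a small parser. This is a minimal sketch in plain Python, not part of the Schrödinger API: the names TYPE_PREFIXES and split_property_name are hypothetical, and the sketch assumes that the author field itself never contains an underscore.

```python
# Hypothetical helper for illustration only; not part of the Schrodinger API.
# Maps each type prefix from the naming scheme to a Python type.
TYPE_PREFIXES = {"b": bool, "i": int, "r": float, "s": str}


def split_property_name(name):
    """Split 'type_author_property_name' into (python_type, author, name).

    Assumes the author field contains no underscore, so only the first two
    underscores act as separators.
    """
    parts = name.split("_", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected type_author_property_name, got %r" % name)
    type_prefix, author, property_name = parts
    if type_prefix not in TYPE_PREFIXES:
        raise ValueError("unknown type prefix %r in %r" % (type_prefix, name))
    return TYPE_PREFIXES[type_prefix], author, property_name


python_type, author, short_name = split_property_name("r_j_Gas_Phase_Energy")
print("%s property from author %r: %s" % (python_type.__name__, author, short_name))
```

A check like this can be useful before assigning a new key, since a name that does not follow the scheme cannot be stored in a Maestro-format file.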
This example shows how to access, set, and delete Structure properties:
# 'r_j_Gas_Phase_Energy' is a real property set by Jaguar.
gas_phase_energy = st.property['r_j_Gas_Phase_Energy']
# Properties stored by the user should use an "author" of 'user'.
st.property['r_user_Energy_Plus_Two'] = gas_phase_energy + 2.0
# Delete the new 'r_user_Energy_Plus_Two' property.
del st.property['r_user_Energy_Plus_Two']
Because the property objects are dictionary subclasses, the standard
dictionary methods like keys and items also work.
Properties of atoms work the same way. For example, the property
b_fragmol_attachment is set by fragment_molecule.py (available
on the Script Center). The property is True for atoms that were bonded in
the input structure but whose bond is broken in the output structures.
for atom in st.atom:
    if atom.property['b_fragmol_attachment']:
        print("Atom %s unattached." % (atom.name))
        print("Coordinates: %f, %f, %f" % (atom.x, atom.y, atom.z))
Bonds also have a property attribute for general property storage and
retrieval, although they don’t have commonly-used built-in properties.
Substructures¶
A number of “substructure iterators” are available from each Structure.
Each of these iterators returns an instance of a non-public class that is a
view on the substructure contained within the Structure instance. Each
substructure class has an extractStructure method that can be used to
create a new and independent Structure with the atoms in the substructure.
They also have getAtomList methods to return a list of atom indices
corresponding to the substructure.
- molecule
- Iterates over individual molecules. Returns a _Molecule instance.
- chain
- Iterates over protein chains in the Structure. Returns a _Chain instance.
- residue
- Iterates over protein residues in the Structure. Returns a _Residue instance.
- ring
- Iterates over all rings in the Structure, as found by SSSR. Returns a _Ring instance. (The Structure.find_rings method implements similar functionality but returns a list of lists of ints to identify the rings, with each int being an atom index.)
For example:
print("The structure has %d molecules." % len(st.molecule))
for mol in st.molecule:
    print("Molecule %d has %d atoms." % (mol.number, len(mol.atom)))
The _Molecule and _Chain instances also support their own residue
iterators. For example:
for chain in st.chain:
    residues = []
    for residue in chain.residue:
        residues.append(residue.getCode())
    print("chain %s: %s" % (chain.name, "".join(residues)))
Structure I/O¶
Reading a Structure from a File¶
The schrodinger.structure.StructureReader class creates Structure
objects from molecular data stored in a number of standard file
formats. Supported file types are Maestro, MDL SD, PDB, and Sybyl Mol2.
Because these files may contain multiple molecules, the StructureReader is
an iterator, and molecule files are presented as a sequence of Structure
objects.
from schrodinger import structure
# Input can be a .mae, .sdf, .sd, .pdb, or .mol2 file.
input_file = "input.mae"
for st in structure.StructureReader(input_file):
    # Do something with the Structure...
    result = process_structure(st)
# To read only the first structure from a file, use the built-in next().
st = next(structure.StructureReader(input_file))
SMILES format files and CSV files with SMILES data are also supported, but
because these have no structural data, resulting structures are
SmilesStructures, which have less functionality than standard Structures.
See the SmilesReader and SmilesCsvReader documentation.
Saving a Structure to a File¶
The StructureWriter class is the counterpart to the StructureReader.
This is an example of a typical read, process, and write script:
from schrodinger import structure
# Open the input file with a StructureReader.
reader = structure.StructureReader("input.mae")
# Open the output file with a StructureWriter.
writer = structure.StructureWriter("output.mae")
for st in reader:
    # Do the required processing.
    result_structure = do_processing(st)
    # Save the result to the output file.
    writer.append(result_structure)
# The files associated with the reader and writer will be closed
# automatically when they are garbage collected, but it is good practice
# to close them explicitly.
writer.close()
reader.close()
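Explicit close() calls are skipped if the processing step raises an exception. One way to guard against that is contextlib.closing from the standard library, which only requires that the wrapped object has a close() method. The sketch below uses a hypothetical stand-in class, RecordingWriter, so that it runs without a Schrödinger installation; whether the real reader and writer classes can be used directly in a with statement is not covered here.

```python
from contextlib import closing


class RecordingWriter(object):
    """Hypothetical stand-in for a writer: it only offers append() and close()."""

    def __init__(self):
        self.items = []
        self.closed = False

    def append(self, item):
        if self.closed:
            raise ValueError("writer is closed")
        self.items.append(item)

    def close(self):
        self.closed = True


writer = RecordingWriter()
try:
    # closing() calls writer.close() on exit, even when an exception escapes.
    with closing(writer):
        writer.append("first structure")
        raise RuntimeError("processing failed")
except RuntimeError:
    pass

print("closed after error: %s" % writer.closed)
```

With the real classes, the same pattern would presumably read `with closing(structure.StructureWriter("output.mae")) as writer:`, relying only on the close() method shown in the script above.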
Alternatively, if only a single structure is being written to a file, you can use the Structure.write method:
st.write("output.mae")
Structure Operations¶
In addition to the functionality provided in the schrodinger.structure module itself, much is provided in the schrodinger.structutils package.
This section lists some additional Structure features and a few highlights
of the structutils package.
Structure Minimization¶
Structures can be minimized using one of the OPLS_2005, PFF_2005, or OPLS_2001 force fields by using the minimize_structure function. This operation requires a valid product license from MacroModel, GLIDE, Impact, or PLOP. Note that minimization will not hold on to a license; a license is checked out to ensure that one is available, then immediately checked back in.
For example, to compare the energy of a molecule before and after minimization:
from schrodinger.structutils.minimize import minimize_structure
# Set the energy property name
energy_name = 'r_ff_Potential_Energy-OPLS_2005'
# Do a 0-step "minimization" to get the initial energy.
minimize_structure(st, max_steps=0)
original_energy = st.property[energy_name]
minimize_structure(st)
minimized_energy = st.property[energy_name]
print("The minimized energy is %f kcal/mol lower than the original." % (
    original_energy - minimized_energy))
Substructure Searching or Specification¶
Generate SMILES, SMARTS, or ASL strings based on a set of atom indices via the generate_smiles, generate_smarts, and generate_asl functions. Documentation on ASL can be found in the Maestro Command Reference Manual.
Evaluate SMARTS or ASL strings and return a list of matching atom indices via the evaluate_smarts and evaluate_asl functions.
This example finds the set of unique SMILES strings in a structure file:
from schrodinger.structutils.analyze import generate_smiles
# reader is assumed to be an open StructureReader (see Structure I/O).
unique_smiles = set()
for st in reader:
    # set.add() ignores values that are already present.
    unique_smiles.add(generate_smiles(st))
Structure Measurement¶
The schrodinger.structutils.measure module provides functions for measuring distances, angles, dihedral angles, and plane angles. It also offers the get_close_atoms function to find all pairs of atoms within a specified distance in less than O(N^2) time.
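To illustrate how a close-pair search can avoid comparing every atom with every other atom, here is a minimal sketch of a cell-list (grid) search in plain Python. It is not the implementation behind get_close_atoms, whose algorithm and signature are not described here. The function name close_pairs, the use of plain coordinate tuples, and the inclusive cutoff are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product
from math import floor


def close_pairs(coords, cutoff):
    """Return sorted (i, j) pairs of 1-based indices, i < j, within cutoff.

    coords is a list of (x, y, z) tuples. Points are binned into cubic cells
    of edge length `cutoff`, so any two points within the cutoff must lie in
    the same cell or in adjacent cells.
    """
    cells = defaultdict(list)
    for index, (x, y, z) in enumerate(coords, start=1):
        key = (floor(x / cutoff), floor(y / cutoff), floor(z / cutoff))
        cells[key].append(index)

    cutoff_sq = cutoff * cutoff
    pairs = []
    for (cx, cy, cz), members in cells.items():
        # Visit the cell itself and its 26 neighbors.
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            # .get() does not create missing cells in the defaultdict.
            neighbors = cells.get((cx + dx, cy + dy, cz + dz), ())
            for i in members:
                xi, yi, zi = coords[i - 1]
                for j in neighbors:
                    if i >= j:
                        continue  # report each pair once, as (smaller, larger)
                    xj, yj, zj = coords[j - 1]
                    dist_sq = (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2
                    if dist_sq <= cutoff_sq:
                        pairs.append((i, j))
    return sorted(pairs)


coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0),
          (5.5, 5.0, 5.0), (-0.5, 0.0, 0.0)]
print(close_pairs(coords, 1.2))
```

For atoms spread roughly evenly through space, each cell holds a bounded number of atoms, so the work grows approximately linearly with the number of atoms rather than quadratically.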
Structure Superimposition or Comparison¶
The in-place RMSD of two structures can be determined via the calculate_in_place_rmsd function. The ConformerRmsd class offers more complete RMSD comparison tools for conformers.
Two structures can be superimposed based on all atoms or a subset of atoms with the superimpose function.
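As a reminder of what an in-place RMSD measures, the following sketch computes the root-mean-square deviation between two equally sized coordinate lists without moving either one. It is plain Python for illustration and assumes the usual RMSD definition over paired atoms; the name in_place_rmsd is hypothetical, and the actual signature of calculate_in_place_rmsd is not shown in this document.

```python
from math import sqrt


def in_place_rmsd(coords_a, coords_b):
    """RMSD between paired (x, y, z) tuples, with no superposition applied."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    if not coords_a:
        raise ValueError("at least one atom is required")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return sqrt(total / len(coords_a))


reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
shifted = [(3.0, 4.0, 0.0), (4.0, 4.0, 0.0)]  # reference translated by (3, 4, 0)
print("RMSD: %.3f" % in_place_rmsd(reference, shifted))
```

Because no superposition is performed, a rigid translation of one structure shows up in full in the result. Superimposing the structures first removes that contribution.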
Conversion Between 1D/2D and 3D Structures¶
To convert a 3D structure to a 1D structure (SMILES or SMARTS), use the appropriate function from schrodinger.structutils.analyze:
from schrodinger.structutils import analyze
smiles_list = []
smarts_list = []
for st in reader:
    smiles_list.append(analyze.generate_smiles(st))
    smarts_list.append(analyze.generate_smarts(st))
To convert a file of 1D SMILES strings to 3D structures, use LigPrep. The following example also uses job control in order to ensure that the job fully completes before the code continues:
import schrodinger.job.jobcontrol as jc
from schrodinger import structure
cmd = ["ligprep", "-ismi", smiles_input, "-omae", ligprep_output]
job = jc.launch_job(cmd)
print("LigPrep job status: %s" % job.Status)
job.wait()
print("LigPrep job status: %s" % job.Status)
reader = structure.StructureReader(ligprep_output)
for st in reader:
    process(st)
To convert a 3D structure to a 2D structure, use the canvasConvert
utility from the command line:
$SCHRODINGER/utilities/canvasConvert -imae input.mae -2D -osd output.sd
The resulting SD file can then be read back in with the StructureReader class.
Modifying a Structure¶
Note
The >>> prefix in the examples that follow is the interactive
prompt. Examples without the prompt are snippets of scripts.
Atoms can be added via the Structure.addAtoms method.
Individual atoms can be deleted with standard Python list syntax:
>>> st_copy = st.copy()
>>> print(len(st.atom))
5
>>> del st.atom[5]
>>> print(len(st.atom))
4
Note
Deleting atoms changes the indices of the atoms remaining in the
Structure.
Because deleting atoms renumbers the remaining atoms, multiple atoms should be deleted via the Structure.deleteAtoms method.
>>> print(len(st.atom))
14
>>> st.deleteAtoms([1, 2, 3, 4])
>>> print(len(st.atom))
10
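The renumbering pitfall is easy to reproduce without a Structure. In this sketch a plain Python list of atom names stands in for the atom list, with 1-based indices mapped onto list positions. Both function names are hypothetical, and the second function only mimics the effect of deleting all atoms in a single call.

```python
def delete_one_at_a_time(atoms, indices):
    """Naive approach: delete each 1-based index in the order given."""
    atoms = list(atoms)
    for index in indices:
        # After each deletion the remaining atoms shift down by one, so the
        # later indices no longer refer to the atoms originally intended.
        del atoms[index - 1]
    return atoms


def delete_all_at_once(atoms, indices):
    """Resolve every index against the original numbering, then delete."""
    doomed = set(indices)
    return [name for i, name in enumerate(atoms, start=1) if i not in doomed]


atoms = ["C1", "C2", "O3", "H4", "H5", "H6"]
print(delete_one_at_a_time(atoms, [1, 2, 3]))  # removes C1, O3, and H5
print(delete_all_at_once(atoms, [1, 2, 3]))    # removes C1, C2, and O3
```

If atoms must be removed one at a time, deleting them in descending index order avoids the problem, because the atoms still to be deleted are never renumbered.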
Charges and atom identity can be modified by making assignments to the
proper _StructureAtom attributes:
>>> print(st.atom[1].element)
C
>>> print(st.atom[1].atomic_number)
6
>>> print(st.atom[1].formal_charge)
0
>>> st.atom[1].element = 'N'
>>> st.atom[1].formal_charge = 1
>>> print(st.atom[1].formal_charge)
1
>>> print(st.atom[1].atomic_number)
7
>>> st.atom[1].atomic_number = 6
>>> print(st.atom[1].element)
C
As can be seen from the above examples, changing either the atomic_number
or the element attribute automatically updates the other.
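Linked attributes of this kind are commonly built with Python properties that share one underlying value. The following sketch shows the general pattern with a hypothetical LinkedAtom class and a deliberately tiny element table. It is not how _StructureAtom is implemented, since that class wraps an underlying C library.

```python
# Deliberately small lookup tables, for illustration only.
SYMBOL_BY_NUMBER = {1: "H", 6: "C", 7: "N", 8: "O"}
NUMBER_BY_SYMBOL = {symbol: number for number, symbol in SYMBOL_BY_NUMBER.items()}


class LinkedAtom(object):
    """Hypothetical atom whose element and atomic_number share one value."""

    def __init__(self, atomic_number):
        self.atomic_number = atomic_number  # goes through the setter below

    @property
    def atomic_number(self):
        return self._atomic_number

    @atomic_number.setter
    def atomic_number(self, value):
        if value not in SYMBOL_BY_NUMBER:
            raise ValueError("unsupported atomic number: %r" % (value,))
        self._atomic_number = value

    @property
    def element(self):
        # Derived on every access, so it can never disagree with the number.
        return SYMBOL_BY_NUMBER[self._atomic_number]

    @element.setter
    def element(self, symbol):
        if symbol not in NUMBER_BY_SYMBOL:
            raise ValueError("unsupported element symbol: %r" % (symbol,))
        self._atomic_number = NUMBER_BY_SYMBOL[symbol]


atom = LinkedAtom(6)
atom.element = "N"
print("%s has atomic number %d" % (atom.element, atom.atomic_number))
```

Storing only one of the two values and deriving the other is what keeps them consistent. There is no second copy that could fall out of date.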
Bonds can be broken or created. For example:
# To avoid modifying the original structure, make a copy.
st = st_orig.copy()
# Break and re-join the first bond on the first atom.
bond = st.atom[1].bond[1]
atom1 = bond.atom1.index
atom2 = bond.atom2.index
order = bond.order
st.deleteBond(atom1, atom2) # Delete the bond.
st.addBond(atom1, atom2, order) # Recreate bond with same bond order.
Hydrogens can be added via the add_hydrogens function, or deleted via the delete_hydrogens function.
Note
Changing formal charge, atomic identity (via element, atomic_number, or
atom_type), breaking or forming bonds, or changing bond orders all require
retyping the atoms involved. This can be accomplished via the
Structure.retype method. Retyping can be an expensive operation, so it is
not invoked automatically.