schrodinger.protein.sequence module¶

Implementation of the ProteinSequence, Sequence, and StructureSequence classes.

StructureSequence allows iteration over all sequences in a given protein CT, and iteration over residues of each (in sequence order).

class schrodinger.protein.sequence.Inclusion¶

Bases: enum.Enum

Excluded = 1¶

FullyVisible = 4¶

NotVisible = 2¶

PartiallyVisible = 3¶

class schrodinger.protein.sequence.NucleicAcidSequence(elements, **kwargs)¶

Bases: schrodinger.protein.sequence.ProteinSequence

AnnotationClass¶: alias of NucleicAcidSequenceAnnotations

ElementClass¶: alias of Nucleotide

alphabet = {'TMP': NucleotideType('T', 'TMP', 'Thymine'), 'CTP': NucleotideType('C', 'CTP', 'Cytosine'), 'UMP': NucleotideType('U', 'UMP', 'Uracil'), 'DI': ResidueType('DI', 'DI', 'Unknown'), 'DG': NucleotideType('G', 'DG', 'Guanine'), 'DC': NucleotideType('C', 'DC', 'Cytosine'), 'DA': NucleotideType('A', 'DA', 'Adenine'), '2MG': NucleotideType('G', '2MG', 'Guanine'), 'ATP': NucleotideType('A', 'ATP', 'Adenine'), '1MA': NucleotideType('A', '1MA', 'Adenine'), 'DT': NucleotideType('T', 'DT', 'Thymine'), 'DU': NucleotideType('U', 'DU', 'Uracil'), '5MU': NucleotideType('U', '5MU', 'Uracil'), '1MG': NucleotideType('G', '1MG', 'Guanine'), 'CMP': NucleotideType('C', 'CMP', 'Cytosine'), 'G': NucleotideType('G', 'G', 'Guanine'), 'TTP': NucleotideType('T', 'TTP', 'Thymine'), 'AMP': NucleotideType('A', 'AMP', 'Adenine'), 'M2G': NucleotideType('G', 'M2G', 'Guanine'), '7MG': NucleotideType('G', '7MG', 'Guanine'), 'A': NucleotideType('A', 'A', 'Adenine'), 'GDP': NucleotideType('G', 'GDP', 'Guanine'), 'C': NucleotideType('C', 'C', 'Cytosine'), '5HC': NucleotideType('C', '5HC', 'Cytosine'), 'ADP': NucleotideType('A', 'ADP', 'Adenine'), 'I': ResidueType('I', 'I', 'Unknown'), '5FC': NucleotideType('C', '5FC', 'Cytosine'), '6MA': NucleotideType('A', '6MA', 'Adenine'), 'GTP': NucleotideType('G', 'GTP', 'Guanine'), 'UTP': NucleotideType('U', 'UTP', 'Uracil'), 'U': NucleotideType('U', 'U', 'Uracil'), 'GMP': NucleotideType('G', 'GMP', 'Guanine'), 'OMC': NucleotideType('C', 'OMC', 'Cytosine'), 'UDP': NucleotideType('U', 'UDP', 'Uracil'), 'OMG': NucleotideType('G', 'OMG', 'Guanine'), 'H2U': NucleotideType('U', 'H2U', 'Uracil'), '1CC': NucleotideType('C', '1CC', 'Cytosine'), 'YYG': ResidueType('X', 'YYG', 'Unknown'), 'CDP': NucleotideType('C', 'CDP', 'Cytosine'), '5MC': NucleotideType('C', '5MC', 'Cytosine'), 'TDP': NucleotideType('T', 'TDP', 'Thymine'), 'PSU': NucleotideType('Ψ', 'PSU', 'Uracil')}¶

class schrodinger.protein.sequence.ProteinSequence(elements, name='', origin=None, entry_id='', entry_name='', pdb_id='', chain='', title='')¶

Bases: schrodinger.protein.sequence.Sequence

AnnotationClass¶: alias of ProteinSequenceAnnotations

ElementClass¶: alias of Residue

class SSA_POS_TYPE¶

Bases: enum.Enum

END = 3¶

MIDDLE = 2¶

START = 1¶

alphabet = {'PAQ': ResidueType('Y', 'PAQ', 'Tyrosine'), 'AGM': ResidueType('R', 'AGM', 'Arginine'), 'PR3': ResidueType('C', 'PR3', 'Cysteine'), 'CCS': ResidueType('C', 'CCS', 'Cysteine'), 'GSC': ResidueType('G', 'GSC', 'Glycine'), 'ILE': ResidueType('I', 'ILE', 'Isoleucine'), 'TIH': ResidueType('A', 'TIH', 'Alanine'), 'C6C': ResidueType('C', 'C6C', 'Cysteine'), 'MIS': ResidueType('S', 'MIS', 'Serine'), 'FME': ResidueType('M', 'FME', 'Methionine'), 'LYM': ResidueType('K', 'LYM', 'Lysine'), 'HSD': ResidueType('H', 'HSD', 'Histidine'), 'LYS': ResidueType('K', 'LYS', 'Lysine'), 'SAC': ResidueType('S', 'SAC', 'Serine'), 'PRO': ResidueType('P', 'PRO', 'Proline'), 'LYZ': ResidueType('K', 'LYZ', 'Lysine'), 'HSP': ResidueType('H', 'HSP', 'Histidine'), 'DCY': ResidueType('X', 'DCY', 'Cysteine'), 'SAR': ResidueType('G', 'SAR', 'Glycine'), 'LYN': ResidueType('K', 'LYN', 'Lysine'), 'D': ResidueType('D', 'ASP', 'Aspartic acid'), 'VAL': ResidueType('V', 'VAL', 'Valine'), 'CHG': ResidueType('A', 'CHG', 'Alanine'), 'TPO': ResidueType('t', 'TPO', 'Threonine'), 'H': ResidueType('H', 'HIS', 'Histidine'), 'HAC': ResidueType('A', 'HAC', 'Alanine'), 'AYA': ResidueType('A', 'AYA', 'Alanine'), 'L': ResidueType('L', 'LEU', 'Leucine'), 'SVA': ResidueType('S', 'SVA', 'Serine'), 'THO': ResidueType('T', 'THO', 'Threonine'), 'P': ResidueType('P', 'PRO', 'Proline'), 'ALM': ResidueType('A', 'ALM', 'Alanine'), 'T': ResidueType('T', 'THR', 'Threonine'), 'TPQ': ResidueType('A', 'TPQ', 'Alanine'), 'HAR': ResidueType('R', 'HAR', 'Arginine'), 'ACE': ResidueType('X', 'ACE', 'Capping Group'), 'TYM': ResidueType('Y', 'TYM', 'Tyrosine'), 'PHI': ResidueType('F', 'PHI', 'Phenylalanine'), 'TYO': ResidueType('Y', 'TYO', 'Tyrosine'), 'PHL': ResidueType('F', 'PHL', 'Phenylalanine'), 'PHE': ResidueType('F', 'PHE', 'Phenylalanine'), 'PTR': ResidueType('y', 'PTR', 'Tyrosine'), 'MAA': ResidueType('A', 'MAA', 'Alanine'), 'NMA': ResidueType('X', 'NMA', 'Capping Group'), 'TYY': ResidueType('Y', 'TYY', 'Tyrosine'), 
'OAS': ResidueType('S', 'OAS', 'Serine'), 'CXM': ResidueType('M', 'CXM', 'Methionine'), 'TYR': ResidueType('Y', 'TYR', 'Tyrosine'), 'TYS': ResidueType('Y', 'TYS', 'Tyrosine'), 'CY3': ResidueType('C', 'CY3', 'Cysteine'), 'DTH': ResidueType('X', 'DTH', 'Threonine'), 'CME': ResidueType('C', 'CME', 'Cysteine'), 'C': ResidueType('C', 'CYS', 'Cysteine'), 'DTY': ResidueType('X', 'DTY', 'Tyrosine'), '2AS': ResidueType('D', '2AS', 'Aspartic acid'), 'FLA': ResidueType('A', 'FLA', 'Alanine'), 'DTR': ResidueType('X', 'DTR', 'Tryptophan'), 'OCS': ResidueType('C', 'OCS', 'Cysteine'), 'PCA': ResidueType('E', 'PCA', 'Glutamic acid'), 'NLP': ResidueType('L', 'NLP', 'Leucine'), 'LLY': ResidueType('K', 'LLY', 'Lysine'), 'G': ResidueType('G', 'GLY', 'Glycine'), 'CEA': ResidueType('C', 'CEA', 'Cysteine'), 'LLP': ResidueType('K', 'LLP', 'Lysine'), 'HMR': ResidueType('R', 'HMR', 'Arginine'), 'GLU': ResidueType('E', 'GLU', 'Glutamic acid'), 'SCY': ResidueType('C', 'SCY', 'Cysteine'), 'BMT': ResidueType('T', 'BMT', 'Threonine'), 'BUC': ResidueType('C', 'BUC', 'Cysteine'), 'PEC': ResidueType('C', 'PEC', 'Cysteine'), 'BUG': ResidueType('L', 'BUG', 'Leucine'), 'SCS': ResidueType('C', 'SCS', 'Cysteine'), 'NLN': ResidueType('L', 'NLN', 'Leucine'), 'SHR': ResidueType('K', 'SHR', 'Lysine'), 'TRO': ResidueType('W', 'TRO', 'Tryptophan'), 'CSD': ResidueType('A', 'CSD', 'Alanine'), 'K': ResidueType('K', 'LYS', 'Lysine'), 'ALY': ResidueType('K', 'ALY', 'Lysine'), 'TRG': ResidueType('K', 'TRG', 'Lysine'), 'DSN': ResidueType('X', 'DSN', 'Serine'), 'S': ResidueType('S', 'SER', 'Serine'), 'SHC': ResidueType('C', 'SHC', 'Cysteine'), 'DSP': ResidueType('D', 'DSP', 'Aspartic acid'), 'W': ResidueType('W', 'TRP', 'Tryptophan'), 'DSG': ResidueType('X', 'DSG', 'Asparagine'), 'DLY': ResidueType('X', 'DLY', 'Lysine'), 'EFC': ResidueType('C', 'EFC', 'Cysteine'), 'CSP': ResidueType('C', 'CSP', 'Cysteine'), 'CSS': ResidueType('C', 'CSS', 'Cysteine'), 'ANF': ResidueType('X', 'ANF', 'Capping Group'), 'MPA': 
ResidueType('X', 'MPA', 'Capping Group'), 'HSE': ResidueType('H', 'HSE', 'Histidine'), 'TYQ': ResidueType('Y', 'TYQ', 'Tyrosine'), 'FCO': ResidueType('X', 'FCO', 'Capping Group'), 'C5C': ResidueType('C', 'C5C', 'Cysteine'), 'HTR': ResidueType('W', 'HTR', 'Tryptophan'), 'MPQ': ResidueType('G', 'MPQ', 'Glycine'), 'CYG': ResidueType('C', 'CYG', 'Cysteine'), 'KCX': ResidueType('K', 'KCX', 'Lysine'), 'CSX': ResidueType('C', 'CSX', 'Cysteine'), 'GLH': ResidueType('E', 'GLH', 'Glutamic acid'), 'NEM': ResidueType('H', 'NEM', 'Histidine'), 'GLN': ResidueType('Q', 'GLN', 'Glutamine'), 'DVA': ResidueType('X', 'DVA', 'Valine'), 'ACL': ResidueType('R', 'ACL', 'Arginine'), 'GLY': ResidueType('G', 'GLY', 'Glycine'), 'GLZ': ResidueType('G', 'GLZ', 'Glycine'), 'TRP': ResidueType('W', 'TRP', 'Tryptophan'), 'SMC': ResidueType('C', 'SMC', 'Cysteine'), 'CSW': ResidueType('C', 'CSW', 'Cysteine'), 'NEP': ResidueType('H', 'NEP', 'Histidine'), 'BCS': ResidueType('X', 'BCS', 'Cysteine'), 'ASQ': ResidueType('D', 'ASQ', 'Aspartic acid'), 'ASP': ResidueType('D', 'ASP', 'Aspartic acid'), 'SER': ResidueType('S', 'SER', 'Serine'), 'SEP': ResidueType('X', 'SEP', 'Serine'), 'DGN': ResidueType('X', 'DGN', 'Glutamine'), 'DGL': ResidueType('X', 'DGL', 'Glutamic acid'), 'MHS': ResidueType('H', 'MHS', 'Histidine'), 'ASB': ResidueType('D', 'ASB', 'Aspartic acid'), 'ASA': ResidueType('D', 'ASA', 'Aspartic acid'), 'NLE': ResidueType('X', 'NLE', 'Leucine'), 'LEU': ResidueType('L', 'LEU', 'Leucine'), 'ASK': ResidueType('D', 'ASK', 'Aspartic acid'), 'GGL': ResidueType('E', 'GGL', 'Glutamic acid'), 'SEL': ResidueType('S', 'SEL', 'Serine'), 'CGU': ResidueType('E', 'CGU', 'Glutamic acid'), 'ASN': ResidueType('N', 'ASN', 'Asparagine'), 'ASL': ResidueType('D', 'ASL', 'Aspartic acid'), 'LTR': ResidueType('W', 'LTR', 'Tryptophan'), 'F': ResidueType('F', 'PHE', 'Phenylalanine'), 'CLE': ResidueType('L', 'CLE', 'Leucine'), 'GMA': ResidueType('E', 'GMA', 'Glutamic acid'), 'PRR': ResidueType('A', 'PRR', 'Alanine'), 
'5HP': ResidueType('E', '5HP', 'Glutamic acid'), 'N': ResidueType('N', 'ASN', 'Asparagine'), 'DLE': ResidueType('X', 'DLE', 'Leucine'), 'MVA': ResidueType('V', 'MVA', 'Valine'), 'R': ResidueType('R', 'ARG', 'Arginine'), 'DNP': ResidueType('A', 'DNP', 'Alanine'), 'V': ResidueType('V', 'VAL', 'Valine'), 'UNK': ResidueType('X', 'UNK', 'Unknown'), 'TOSG': ResidueType('X', 'TOSG', 'Capping Group'), 'ALO': ResidueType('T', 'ALO', 'Threonine'), 'ASH': ResidueType('D', 'ASH', 'Aspartic acid'), 'MEN': ResidueType('N', 'MEN', 'Asparagine'), 'ALA': ResidueType('A', 'ALA', 'Alanine'), 'MET': ResidueType('M', 'MET', 'Methionine'), 'MMO': ResidueType('R', 'MMO', 'Arginine'), 'NMC': ResidueType('G', 'NMC', 'Glycine'), 'OMT': ResidueType('M', 'OMT', 'Methionine'), 'SET': ResidueType('S', 'SET', 'Serine'), 'GL3': ResidueType('G', 'GL3', 'Glycine'), 'DIL': ResidueType('X', 'DIL', 'Isoleucine'), '3AH': ResidueType('H', '3AH', 'Histidine'), 'DPR': ResidueType('X', 'DPR', 'Proline'), 'HYP': ResidueType('X', 'HYP', 'Proline'), 'IYR': ResidueType('Y', 'IYR', 'Tyrosine'), 'CSO': ResidueType('C', 'CSO', 'Cysteine'), 'DPN': ResidueType('X', 'DPN', 'Phenylalanine'), 'MSE': ResidueType('M', 'MSE', 'Methionine'), 'DIV': ResidueType('V', 'DIV', 'Valine'), 'MSA': ResidueType('G', 'MSA', 'Glycine'), 'AIB': ResidueType('A', 'AIB', 'Alanine'), 'CYS': ResidueType('C', 'CYS', 'Cysteine'), 'SOC': ResidueType('C', 'SOC', 'Cysteine'), 'CYP': ResidueType('C', 'CYP', 'Cysteine'), 'DAL': ResidueType('X', 'DAL', 'Alanine'), 'CYX': ResidueType('C', 'CYX', 'Cysteine'), 'DAH': ResidueType('F', 'DAH', 'Phenylalanine'), 'HIC': ResidueType('H', 'HIC', 'Histidine'), 'HID': ResidueType('H', 'HID', 'Histidine'), 'HIE': ResidueType('H', 'HIE', 'Histidine'), 'DAR': ResidueType('X', 'DAR', 'Arginine'), 'DAS': ResidueType('X', 'DAS', 'Aspartic acid'), 'IIL': ResidueType('I', 'IIL', 'Isoleucine'), 'TYB': ResidueType('Y', 'TYB', 'Tyrosine'), 'CYM': ResidueType('C', 'CYM', 'Cysteine'), 'A': ResidueType('A', 'ALA', 
'Alanine'), 'HIP': ResidueType('H', 'HIP', 'Histidine'), 'CY1': ResidueType('C', 'CY1', 'Cysteine'), 'TPL': ResidueType('W', 'TPL', 'Tryptophan'), 'E': ResidueType('E', 'GLU', 'Glutamic acid'), 'DHI': ResidueType('X', 'DHI', 'Histidine'), 'MLE': ResidueType('L', 'MLE', 'Leucine'), 'I': ResidueType('I', 'ILE', 'Isoleucine'), 'HPQ': ResidueType('F', 'HPQ', 'Phenylalanine'), 'NCO': ResidueType('X', 'NCO', 'Capping Group'), 'M': ResidueType('M', 'MET', 'Methionine'), 'CYQ': ResidueType('C', 'CYQ', 'Cysteine'), 'DHA': ResidueType('A', 'DHA', 'Alanine'), 'THR': ResidueType('T', 'THR', 'Threonine'), 'Q': ResidueType('Q', 'GLN', 'Glutamine'), 'IND': ResidueType('X', 'IND', 'Capping Group'), 'HIS': ResidueType('H', 'HIS', 'Histidine'), 'NH2': ResidueType('X', 'NH2', 'Capping Group'), 'Y': ResidueType('Y', 'TYR', 'Tyrosine'), 'STY': ResidueType('Y', 'STY', 'Tyrosine'), 'SCH': ResidueType('C', 'SCH', 'Cysteine'), 'BHD': ResidueType('D', 'BHD', 'Aspartic acid'), 'ARG': ResidueType('R', 'ARG', 'Arginine'), 'ARM': ResidueType('R', 'ARM', 'Arginine'), 'ARN': ResidueType('R', 'ARN', 'Arginine'), 'BNN': ResidueType('A', 'BNN', 'Alanine')}¶

disulfide_bonds¶: Return a sorted tuple of the valid disulfide bonds

encodeForPatternSearch(with_ss=False, with_flex=False, with_asa=False)¶

Convert to sequence dict expected by find_generalized_pattern.

Parameters:
with_ss (bool) – Whether to include secondary structure information.
with_flex (bool) – Whether to include flexibility information.
with_asa (bool) – Whether to include accessible surface area information.
Return type:	dict
Returns:	dictionary of sequence data

getSSAPosType(index)¶

Return whether the residue at the specified index is at the start, middle or end of a secondary structure block. Note that while it is unrealistic for a residue to be both the start and end of a block, this may happen (e.g. due to deletion of other elements in an SSA block). In these cases, the residue will be identified as the start of a block by this function.

Parameters:	index (int) – Index of the residue to check
Returns:	One of `self.SSA_POS_TYPE.START`, `self.SSA_POS_TYPE.MIDDLE` or `self.SSA_POS_TYPE.END` or None if no residue is at the specified index.
Return type:	int or None
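The start/middle/end rule described above can be illustrated without the Schrödinger library. This is only a sketch under stated assumptions: `ss_labels` is a hypothetical list holding one secondary-structure label per element (None for gaps), and `SSAPosType` and `ssa_pos_type` are illustrative names, not part of the module.

```python
from enum import Enum


class SSAPosType(Enum):
    # Mirrors the values documented for ProteinSequence.SSA_POS_TYPE
    START = 1
    MIDDLE = 2
    END = 3


def ssa_pos_type(ss_labels, index):
    """Classify `index` within its run of identical labels (hypothetical helper)."""
    # No residue at this index: out of range or a gap
    if not 0 <= index < len(ss_labels) or ss_labels[index] is None:
        return None
    label = ss_labels[index]
    starts = index == 0 or ss_labels[index - 1] != label
    ends = index == len(ss_labels) - 1 or ss_labels[index + 1] != label
    if starts:
        # START wins when a one-residue block is both start and end
        return SSAPosType.START
    if ends:
        return SSAPosType.END
    return SSAPosType.MIDDLE


labels = ["H", "H", "H", None, "E", "C", "C"]
```

Here the lone "E" at index 4 is both the start and the end of its block, so the sketch reports START, matching the tie-break described in the text.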

getSecondaryStructureLimits(index)¶

Return the starting and ending indices of the secondary structure assigned to the residue at the specified index.

Parameters:	index (int) – Index of the residue whose secondary structure range is requested
Returns:	The starting and ending indices of the secondary structure
Return type:	(int,int)
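A rough sketch of how such limits could be found, assuming a hypothetical list `ss_labels` with one secondary-structure label per element and assuming the returned end index is inclusive (the documentation above does not say which). The function name is illustrative only.

```python
def secondary_structure_limits(ss_labels, index):
    """Return (start, end) of the run of identical labels containing `index`."""
    label = ss_labels[index]
    start = index
    # Walk left while the previous element carries the same label
    while start > 0 and ss_labels[start - 1] == label:
        start -= 1
    end = index
    # Walk right while the next element carries the same label
    while end < len(ss_labels) - 1 and ss_labels[end + 1] == label:
        end += 1
    return start, end


ss = list("CCHHHHCEE")
```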

classmethod isValid(elements)¶

Parameters:	elements (iterable(str) or str) – An iterable of string representations of elements making up the sequence
Returns:	Tuple indicating whether valid and a set of invalid characters, if any
Return type:	tuple(bool, set(str))

removeStructurelessResidues(start=0, end=None)¶: Remove any structureless residues

class schrodinger.protein.sequence.Sequence(elements, name='', origin=None, entry_id='', entry_name='', pdb_id='', chain='', title='')¶

Bases: PyQt5.QtCore.QObject

Base class for biological sequences

Note: Protein-specific functionality should go in ProteinSequence.

Variables:

ORIGIN (enum.Enum) – Possible sequence origins
AnnotationClass (annotation.SequenceAnnotations) – Class to use for annotations
ElementClass (residue.SequenceElement) – Class to use for elements
alphabet (dict(str, residue.ElementType)) – A mapping of string representations of elements to element types
_gap_chars (tuple(str)) – A tuple of permissible gap characters in the element list; the first item will be used for serialization.
_unknown_res_type (residue.ElementType) – The type for an unknown residue
residuesDeleted (QtCore.pyqtSignal) – A signal emitted when sequence residues are deleted. Emitted with the indices of the first and last deleted residues.
residuesChanged (QtCore.pyqtSignal) – A signal emitted when sequence residues are changed. Emitted with the indices of the first and last changed residues.
lengthAboutToChange (QtCore.pyqtSignal) – A signal emitted when the sequence length is about to change. Emitted with the old and new lengths.
lengthChanged (QtCore.pyqtSignal) – A signal emitted when the sequence length is changed. Emitted with the old and new lengths.
nameChanged (QtCore.pyqtSignal) – A signal emitted when the sequence name is changed.
visibilityChanged (QtCore.pyqtSignal) – A signal emitted when the visibility is changed.
structureChanged (QtCore.pyqtSignal) – A signal emitted when the structure changes.
annotationTitleChanged (QtCore.pyqtSignal) – A signal emitted when an annotation title is changed.
sequenceCopied (QtCore.pyqtSignal) –
A signal emitted when this sequence is copied. Emitted with
- The sequence being copied
- The newly created copy
This signal is used by the structure model to make sure that the newly created copy is kept in sync with the structure.

AnnotationClass = None¶

ElementClass¶: alias of SequenceElement

class ORIGIN¶

Bases: enum.Enum

MAESTRO = 1¶

PYMOL = 2¶

addGaps(gaps)¶

Add gaps to the sequence at the specified indices

Parameters:	gaps (list(int)) – A list of gap indices

alphabet = {}¶

annotationTitleChanged¶

annotation_types¶

Return type:	Enum
Returns:	Enum of all annotation types

append(element)¶

Appends an element to the sequence

Parameters:	element (self.ElementClass or basestring) – The element to append to this sequence

extend(elements)¶

Extends the sequence with elements from an iterable

Parameters:	elements (iterable(self.ElementClass) or iterable(str)) – The iterable containing elements with which to extend this sequence

fullname¶

Returns:	a formatted name + optional chain name for the sequence
Return type:	str

get(index, annotation=None)¶: Returns the item at the specified index. This is a residue or an annotation, if an annotation name has been supplied. A None value will be returned in the case of a gap.

getConservation(reference, consider_gaps=True)¶

Return a float scoring the homology conservation between the sequence and a reference sequence, assuming that they’re already aligned

The homology criterion is based on “side chain chemistry” descriptor matching.

Parameters:
reference (`schrodinger.protein.sequence.Sequence`) – A sequence to compare against
consider_gaps (bool) – Whether to include gaps in the calculation.
Return type:	float
Returns:	sequence conservation (between 0.0 and 1.0)

getGapCount()¶

Returns:	the number of gaps in the sequence

getGapIndicesByKeyFunc(gap_info, key_func)¶

Converts a gap_info list and key func into a list of gap indices

Parameters:
gap_info (list) – list of tuples
key_func (function) – callable that takes a residue and returns a key
Return type:	list of int
Returns:	A list of gaps

getGaps()¶

Return type:	list
Returns:	The indices of gaps in the sequence, if any

getGapsByKeyFunc(key_func)¶

Given a key function to reidentify residues, build a list of tuples with gap information.

Parameters:	key_func (function) – callable that takes a residue and returns a key
Return type:	list of (object, int)
Returns:	A list of tuples with (key, number of gaps preceding it)
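The relationship between getGapsByKeyFunc and getGapIndicesByKeyFunc can be pictured as a round trip: record how many gaps precede each residue, then rebuild gap indices from that record. The sketch below is an assumption about that behaviour, not the library's code. It models a sequence as a plain list with None for gaps, records only residues that have at least one preceding gap, and requires keys to be unique.

```python
def gaps_by_key(seq, key_func):
    """Build (key, number of gaps preceding the residue) tuples."""
    info = []
    pending = 0
    for element in seq:
        if element is None:
            pending += 1
        else:
            if pending:
                info.append((key_func(element), pending))
            pending = 0
    return info  # trailing gaps are not recorded in this sketch


def gap_indices_by_key(residues, gap_info, key_func):
    """Convert gap_info back to gap indices for an ungapped residue list."""
    gaps_before = dict(gap_info)
    indices = []
    pos = 0  # index in the gapped sequence being rebuilt
    for res in residues:
        n = gaps_before.get(key_func(res), 0)
        indices.extend(range(pos, pos + n))
        pos += n + 1  # skip the gaps plus the residue itself
    return indices


gapped = ["A", None, None, "C", "D", None, "E"]
info = gaps_by_key(gapped, key_func=lambda res: res)
ungapped = [el for el in gapped if el is not None]
```

In this example the identity function serves as the key; a real key function would need to identify a residue reliably after the sequence has been edited.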

getIdentity(reference, consider_gaps=True)¶

Return a float scoring the identity between the sequence and a reference sequence, assuming that they’re already aligned

Parameters:
reference (`schrodinger.protein.sequence.Sequence`) – A sequence to compare against
consider_gaps (bool) – Whether to include gaps in the calculation.
Return type:	float
Returns:	sequence identity (between 0.0 and 1.0)
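One plausible reading of an identity score over two pre-aligned sequences is the fraction of aligned positions holding the same residue. The sketch below assumes exactly that, and assumes that consider_gaps controls whether positions containing a gap count toward the denominator; the library's actual formula may differ. Sequences are modelled as lists with None for gaps.

```python
def identity(seq, reference, consider_gaps=True):
    """Fraction of aligned positions with identical residues (assumed formula)."""
    matches = 0
    total = 0
    for a, b in zip(seq, reference):
        has_gap = a is None or b is None
        if has_gap and not consider_gaps:
            continue  # ignore gapped positions entirely
        total += 1
        if not has_gap and a == b:
            matches += 1
    return matches / total if total else 0.0


seq = ["A", "C", None, "G", "T"]
ref = ["A", "C", "D", "G", "A"]
```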

getNextResidue(index)¶

Return the next residue in the sequence (ignoring gaps) or None if there is none

Parameters:	index (int) – The index of the residue
Return type:	`schrodinger.protein.residue.Residue`
Returns:	The next residue in the sequence

getNextResidueIndex(index)¶

Return the index of the next residue in the sequence (ignoring gaps) or None if there is none

Parameters:	index (int) – The index of the residue
Return type:	int or None
Returns:	The index of the next residue in the sequence

getNextResidueIndices(index, num_indices=1)¶

Return a list of indices of the next n residues in the sequence (ignoring gaps) or an empty list if there is none. May return fewer than n indices if the end of the sequence is reached.

Parameters:
index (int) – The index of the residue to start with
num_indices (int) – The number of indices to return
Return type:	list of ints or None
Returns:	List of the indices of the next residues in the sequence
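The gap-skipping lookup can be sketched in a few lines. This is an illustration only, assuming a sequence modelled as a list with None for gaps; `next_residue_indices` is a hypothetical stand-in, not the method itself.

```python
from itertools import islice


def next_residue_indices(seq, index, num_indices=1):
    """Indices of up to `num_indices` residues after `index`, skipping gaps."""
    candidates = (i for i in range(index + 1, len(seq)) if seq[i] is not None)
    # islice stops early if the end of the sequence is reached first
    return list(islice(candidates, num_indices))


row = ["A", None, "C", None, None, "D"]
```

Asking for three residues after index 2 yields only one index here, because the sequence ends first.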

getNumResidues()¶

Return the number of residues in the sequence, that is, the length of the sequence without gaps

Return type:	int
Returns:	The number of residues in the sequence

getOrigin()¶

Return type:	`Sequence.ORIGIN` or None
Returns:	A piece of metadata indicating where the sequence came from

getPreviousResidue(index)¶

Return the previous residue in the sequence (ignoring gaps) or None if there is none

Parameters:	index (int) – The index of the residue
Return type:	`schrodinger.protein.residue.Residue`
Returns:	The previous residue in the sequence

getPreviousResidueIndex(index)¶

Return the index of the previous residue in the sequence (ignoring gaps) or None if there is none

Parameters:	index (int) – The index of the residue
Return type:	int or None
Returns:	The index of the previous residue in the sequence

getPreviousResidueIndices(index, num_indices=1)¶

Return a list of indices of the previous n residues in the sequence (ignoring gaps) or an empty list if there are none. May return fewer than n indices if the beginning of the sequence is reached.

Parameters:
index (int) – The index of the residue to start with
num_indices (int) – The number of indices to return
Return type:	List of ints, or None
Returns:	The indices of the previous residues in the sequence

getResidueIndices()¶

Return type:	list
Returns:	The indices of residues, in the sequence, if any

getSimilarity(reference, consider_gaps=True)¶

Return a float score of the similarity count between the sequence and a reference sequence, assuming that they’re already aligned

Parameters:
reference (`schrodinger.protein.sequence.Sequence`) – A sequence to compare against
consider_gaps (bool) – Whether to include gaps in the calculation.
Return type:	float
Returns:	sequence similarity (between 0.0 and 1.0)

getSimilarityScore(reference)¶

Return the total score of similarity between the sequence and a reference sequence, assuming that they’re already aligned.

Since the similarity with a gap will always be 0.0, there is no need to consider gaps.

Parameters:	reference (`schrodinger.protein.sequence.Sequence`) – A sequence to compare against
Return type:	float
Returns:	sequence similarity

getStructure()¶: Return the associated structure, or None if there is no associated structure.
Return type:	schrodinger.structure.Structure or NoneType

getSubsequence(start, end)¶

Return a sequence containing a subset of the elements in this one

Parameters:
start (int) – The index at which the subsequence should start
end (int) – The index at which the subsequence should end
Return type:	`Sequence`
Returns:	A sequence

getSummary()¶

Returns a friendly, readable summary of the sequence

Return type:	basestring
Returns:	A summary of the sequence

getTerminalGaps()¶: Return indices of terminal gaps

hasEntryID()¶

Return whether or not this sequence has an associated Entry ID in the Project Table.

Returns:	Whether or not this sequence is associated with an entry ID.
Return type:	bool

hasStructure()¶: Whether this sequence has an associated structure.
Return type:	bool

index(res)¶

Returns the index of the specified residue

Parameters:	res (`schrodinger.structure._Residue`) – The residue to find
Raises:	ValueError – If the residue is not present or if res is None
Return type:	int
Returns:	The index of the residue

insert(index, elements)¶

Insert a list of elements or sequence element into this sequence

Parameters:
index (int) – The index at which to insert elements
elements (iterable(self.ElementClass) or iterable(str)) – A list of elements to insert

classmethod isValid(elements)¶

Parameters:	elements (iterable(str) or str) – An iterable of string representations of elements making up the sequence
Returns:	Tuple indicating whether valid and a set of invalid characters, if any
Return type:	tuple(bool, set(str))
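The (bool, set) return shape can be illustrated with a small stand-alone check. This sketch assumes validity means "every element is in the alphabet or is a gap character"; the gap characters shown are placeholders, and `is_valid` is a hypothetical helper rather than the classmethod itself.

```python
def is_valid(elements, alphabet, gap_chars=("-", ".")):
    """Return (all elements known?, set of unknown elements)."""
    invalid = {
        el for el in elements
        if el not in alphabet and el not in gap_chars
    }
    return (not invalid, invalid)


one_letter = {"A", "C", "G"}
```

Because a str is itself an iterable of characters, the same function accepts either a string of one-letter codes or a list of longer codes.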

iterNeighbors()¶

Return an iterable of three element tuples consisting of (prev_res, curr_res, next_res), ignoring gaps.

None is used for neighbors of first and last residues in the sequence, and does not indicate gaps here.

Returns:	Iterable of 3-tuples, each element of each tuple being either a `schrodinger.protein.residue.Residue` or None
Return type:	iter
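The (prev_res, curr_res, next_res) triples can be produced by dropping gaps and zipping a None-padded list against itself. The following is a sketch of that idea on a plain list (None marks a gap), not the library's implementation.

```python
def iter_neighbors(seq):
    """Yield (prev, curr, next) for each residue, ignoring gaps."""
    residues = [el for el in seq if el is not None]
    # Pad both ends so the first and last residues get None neighbours
    padded = [None] + residues + [None]
    return zip(padded, padded[1:], padded[2:])


triples = list(iter_neighbors(["A", None, "C", "D"]))
```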

lengthAboutToChange¶

lengthChanged¶

classmethod makeSeqElement(element)¶

Parameters:	element (str or cls.ElementClass) – A sequence element or string representation thereof
Returns:	sequence element
Return type:	cls.ElementClass
Raises:	ValueError – If an element is not in cls.alphabet and cls._unknown_res_type is not defined
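The lookup-with-fallback behaviour described above might look roughly like the following. Everything here is a stand-in: `ElementType` is a hypothetical namedtuple, the alphabet is a two-entry toy mapping, and the real classmethod returns an instance of cls.ElementClass rather than the type entry itself.

```python
from collections import namedtuple

# Hypothetical stand-in for an element type entry in an alphabet
ElementType = namedtuple("ElementType", ["short_code", "long_code"])


def make_seq_element(element, alphabet, unknown_type=None):
    """Resolve a string via the alphabet, falling back to unknown_type."""
    if not isinstance(element, str):
        return element  # already an element object; pass it through
    try:
        return alphabet[element]
    except KeyError:
        if unknown_type is None:
            raise ValueError(f"Unknown element: {element!r}")
        return unknown_type


toy_alphabet = {
    "A": ElementType("A", "ALA"),
    "ALA": ElementType("A", "ALA"),
}
unknown = ElementType("X", "UNK")
```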

mutate(start, end, elements)¶

Mutate sequence elements starting at the given index to the provided elements

Parameters:
start (int) – The index at which to start mutating
end (int) – The index of the last mutated element
elements (iterable(self.ElementClass) or iterable(str)) – The elements to which to mutate the sequence

name¶

nameChanged¶

onStructureChanged()¶

remove(start, end=None)¶

Removes elements from the sequence from the index start to the end.

This method is safe to call with invalid indices (as may happen when an alignment is iterating through sequences calling remove at a single index).

Parameters:
start (int) – The index at which to begin removing sequence elements
end (int) – The index at which to end removing sequence elements (inclusive). If end is None, elements will be removed until the end of the sequence.
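The inclusive end index and the tolerance for invalid indices are easy to get wrong, so here is a small sketch on a plain list. How the real method treats invalid indices is not specified above; this version assumes it silently does nothing when start is out of range and clamps end to the sequence length.

```python
def remove_range(elements, start, end=None):
    """Remove elements[start..end] in place; `end` is inclusive."""
    if start < 0 or start >= len(elements):
        return  # invalid start: nothing to remove
    # Convert the inclusive end to an exclusive slice bound, clamped to length
    stop = len(elements) if end is None else min(end + 1, len(elements))
    del elements[start:stop]


s = list("ABCDEF")
remove_range(s, 1, 2)   # removes "B" and "C"
after_first = list(s)
remove_range(s, 10)     # out of range: no effect
after_second = list(s)
remove_range(s, 2)      # end=None: removes through the end
```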

removeAllGaps()¶: Remove gaps from the sequence

removeFromSequence(filter_func, start=0, end=None)¶

Remove any residues matching the specified filter_func from the sequence

Parameters:
filter_func (callable) – A callable taking a residue and returning a bool indicating whether to keep it in the sequence
start (int) – The index at which to start filtering
end (int) – The index at which to end filtering
Raises:	ValueError – In the event that invalid indices are specified

removeGaps(gaps)¶

Removes the specified gaps from the sequence

Parameters:	gaps (list(int)) – A list of gap indices

removeTerminalGaps()¶: Remove gaps from the end of the sequence

replaceAllElements(elements)¶

Replace _sequence entirely with the supplied elements

Parameters:	elements (iterable(self.ElementClass) or iterable(str)) – Elements for the sequence

residuesChanged¶

residuesDeleted¶

sanitize(start=0, end=None)¶: Remove gaps and unknown sequence elements from sequence

sequenceCopied¶

setGaps(gaps)¶

Sets gaps on the sequence from a list of gap indices, relative to the ungapped sequence

Parameters:	gaps (list(int)) – A list of gap indices

setName(name)¶

Set the name on the instance and emit a notification

Parameters:	name (str) – The new name for the sequence
Raises:	ValueError – If name is not an instance of basestring

setOrigin(origin=None)¶

Parameters:	origin (`Sequence.ORIGIN` or None) – A piece of metadata indicating where the sequence came from

setStructure(struc)¶

Set the associated structure. Can only be used on sequences with an associated structure.

Parameters:	struc (schrodinger.structure.Structure) – The new structure for this sequence
Raises:	RuntimeError – If there’s no structure associated with this sequence object.

structureChanged¶

visibility¶

visibilityChanged¶

class schrodinger.protein.sequence.StrictProteinSequence(elements, name='', origin=None, entry_id='', entry_name='', pdb_id='', chain='', title='')¶

Bases: schrodinger.protein.sequence.ProteinSequence

A protein sequence where all elements must be known amino acids.

alphabet = {'ILE': ResidueType('I', 'ILE', 'Isoleucine'), 'GLN': ResidueType('Q', 'GLN', 'Glutamine'), 'NH2': ResidueType('X', 'NH2', 'Capping Group'), 'GLY': ResidueType('G', 'GLY', 'Glycine'), 'PHE': ResidueType('F', 'PHE', 'Phenylalanine'), 'GLU': ResidueType('E', 'GLU', 'Glutamic acid'), 'CYS': ResidueType('C', 'CYS', 'Cysteine'), 'ASP': ResidueType('D', 'ASP', 'Aspartic acid'), 'SER': ResidueType('S', 'SER', 'Serine'), 'LYS': ResidueType('K', 'LYS', 'Lysine'), 'PRO': ResidueType('P', 'PRO', 'Proline'), 'ASN': ResidueType('N', 'ASN', 'Asparagine'), 'FCO': ResidueType('X', 'FCO', 'Capping Group'), 'A': ResidueType('A', 'ALA', 'Alanine'), 'C': ResidueType('C', 'CYS', 'Cysteine'), 'E': ResidueType('E', 'GLU', 'Glutamic acid'), 'D': ResidueType('D', 'ASP', 'Aspartic acid'), 'G': ResidueType('G', 'GLY', 'Glycine'), 'F': ResidueType('F', 'PHE', 'Phenylalanine'), 'I': ResidueType('I', 'ILE', 'Isoleucine'), 'H': ResidueType('H', 'HIS', 'Histidine'), 'K': ResidueType('K', 'LYS', 'Lysine'), 'NMA': ResidueType('X', 'NMA', 'Capping Group'), 'M': ResidueType('M', 'MET', 'Methionine'), 'L': ResidueType('L', 'LEU', 'Leucine'), 'N': ResidueType('N', 'ASN', 'Asparagine'), 'Q': ResidueType('Q', 'GLN', 'Glutamine'), 'P': ResidueType('P', 'PRO', 'Proline'), 'S': ResidueType('S', 'SER', 'Serine'), 'R': ResidueType('R', 'ARG', 'Arginine'), 'T': ResidueType('T', 'THR', 'Threonine'), 'W': ResidueType('W', 'TRP', 'Tryptophan'), 'V': ResidueType('V', 'VAL', 'Valine'), 'Y': ResidueType('Y', 'TYR', 'Tyrosine'), 'TRP': ResidueType('W', 'TRP', 'Tryptophan'), 'TOSG': ResidueType('X', 'TOSG', 'Capping Group'), 'ANF': ResidueType('X', 'ANF', 'Capping Group'), 'MPA': ResidueType('X', 'MPA', 'Capping Group'), 'NCO': ResidueType('X', 'NCO', 'Capping Group'), 'HIS': ResidueType('H', 'HIS', 'Histidine'), 'THR': ResidueType('T', 'THR', 'Threonine'), 'ALA': ResidueType('A', 'ALA', 'Alanine'), 'MET': ResidueType('M', 'MET', 'Methionine'), 'ACE': ResidueType('X', 'ACE', 'Capping Group'), 'LEU': 
ResidueType('L', 'LEU', 'Leucine'), 'ARG': ResidueType('R', 'ARG', 'Arginine'), 'IND': ResidueType('X', 'IND', 'Capping Group'), 'VAL': ResidueType('V', 'VAL', 'Valine'), 'TYR': ResidueType('Y', 'TYR', 'Tyrosine')}¶

class schrodinger.protein.sequence.StructureSequence(st, atoms)¶

Bases: schrodinger.structure._AtomCollection

Class representing a sequence of protein residues.

residue¶: Returns residue iterator for all residues in the sequence

schrodinger.protein.sequence.align_alignment(aln, second_aln=None, method='clustalw')¶

Perform alignment from a ProteinAlignment object

Parameters:
aln (`ProteinAlignment`) – Alignment data
method (string) – Which method/program to use
Returns:	Aligned sequences
Return type:	`ProteinAlignment`

schrodinger.protein.sequence.align_from_chains(chains, method='clustalw')¶

Perform alignment on a series of chains

Parameters:
chains (iterable(structure._Chain)) – Chains to be aligned
method (string) – Which method/program to use (choices ‘muscle’, ‘clustalw’)
Returns:	Aligned sequences
Return type:	`ProteinAlignment`

schrodinger.protein.sequence.convert_structure_sequence_for_pattern_search(seq, sasa_by_atom=None)¶

Converts a StructureSequence object to the dictionary required by the find_generalized_pattern function. Because the conversion can be time consuming, it should be done once per sequence.

Optionally a list of atom SASAs for each atom in the CT can be specified. If it’s not specified, it will get calculated by calling analyze.calculate_sasa_by_atom().

Parameters:
seq (`StructureSequence`) – StructureSequence object
sasa_by_atom (list) – list of atom SASAs
Return type:	dict
Returns:	Dictionary of sequence information

schrodinger.protein.sequence.create_alignment_from_chains(chains)¶

Return a ProteinAlignment object composed of two chains

Parameters:	chains (iterable(structure._Chain)) – Chains to be aligned

schrodinger.protein.sequence.find_generalized_pattern(sequence_list, pattern, validate_pattern=False)¶

Finds a generalized sequence pattern within specified sequences. NOTE: The search is performed in the forward direction only.

Parameters:
sequence_list – list of sequence dictionaries to search.
pattern (str) – Pattern defined using extended PROSITE syntax.
- standard IUPAC one-letter codes are used for all amino acids
- each element in a pattern is separated using the ‘-’ symbol
- the symbol ‘x’ is used for a position where any amino acid is accepted
- ambiguities are listed using the acceptable amino acids between square brackets, e.g. [ACT] means Ala, Cys or Thr
- amino acids not accepted for a given position are indicated by listing them between curly brackets, e.g. {GP} means ‘not Gly and not Pro’
- repetition is indicated using parentheses, e.g. A(3) means Ala-Ala-Ala, x(2,4) means between 2 and 4 residues of any type
- the following lowercase characters can be used as additional flags:
- ‘x’ means any amino acid
- ‘a’ means acidic residue: [DE]
- ‘b’ means basic residue: [KR]
- ‘o’ means hydrophobic residue: [ACFILPWVY]
- ‘p’ means aromatic residue: [WYF]
- ‘s’ means solvent exposed residue
- ‘h’ means helical residue
- ‘e’ means extended residue
- ‘f’ means flexible residue
Each position can optionally be followed by an @<res_num> expression that will match the position with a given residue number.
The entire pattern can be followed by a :<index> expression that defines a ‘hotspot’ in the pattern. When the hotspot is defined, only a single residue corresponding to (pattern_match_start+index-1) will be returned as a match. The index is 1-based and can be used to place the hotspot outside of the pattern (can also be a negative number).
Pattern examples:
- N-{P}-[ST] : Asn-X-Ser or Thr (X != Pro)
- N[sf]-{P}[sf]-[ST][sf] : as above, but all residues flexible OR solvent exposed
- Nsf-{P}sf-[ST]sf : as above, but all residues flexible AND solvent exposed
- Ns{f} : Asn solvent exposed AND not in flexible region
- N[s{f}] : Asn solvent exposed OR not in flexible region
- [ab]{K}{s}f : acidic OR basic, with exception of Lys, flexible AND not solvent exposed
- Ahe : Ala helical AND extended - no match possible
- A[he] : Ala helical OR extended
- A{he} : Ala coiled chain conformation (not helical nor extended)
- [ST] : Ser OR Thr
- ST : Ser AND Thr - no match possible
validate_pattern (boolean) – If True, the function will validate the pattern without performing the search (the sequence_list parameter will be ignored) and return True if the pattern is valid, or False otherwise. The default is False.
Return type:	list of lists of integer tuples or False if the pattern is invalid
Returns:	False if the specified input pattern was incorrect. Otherwise, it returns a list of lists of matches for each input sequence. Each match is a (start, end) tuple where start and end are matching sequence positions.
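To make the basic PROSITE elements concrete, the sketch below translates a small subset of the syntax into a Python regular expression and searches a one-letter sequence string. It is not how find_generalized_pattern is implemented. It handles only single letters, x, [..], {..} and (n) or (n,m) repetition; the lowercase property flags, @<res_num> and :<index> hotspots are not supported; matches are non-overlapping; and positions are reported as 0-based string indices with an inclusive end, which is an assumption. `prosite_to_regex` and `find_matches` are hypothetical names.

```python
import re

# One pattern element: x, a letter, [letters] or {letters}, plus optional (n) / (n,m)
_ELEMENT = re.compile(
    r"^(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?$"
)


def prosite_to_regex(pattern):
    """Translate a basic PROSITE-style pattern into a compiled regex."""
    parts = []
    for element in pattern.split("-"):
        m = _ELEMENT.match(element)
        if m is None:
            raise ValueError(f"Unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            rx = "."                        # any amino acid
        elif core.startswith("{"):
            rx = "[^" + core[1:-1] + "]"    # excluded amino acids
        else:
            rx = core                       # single letter or [..] class
        if low is not None:
            rx += "{" + low + ("," + high if high else "") + "}"
        parts.append(rx)
    return re.compile("".join(parts))


def find_matches(sequence, pattern):
    """Return (start, end) index pairs, end inclusive, for each match."""
    rx = prosite_to_regex(pattern)
    return [(m.start(), m.end() - 1) for m in rx.finditer(sequence)]


glyco_sites = find_matches("MNGTANPSNAS", "N-{P}-[ST]")
```

In this example the N at index 5 is skipped because it is followed by Pro, which the {P} element excludes.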

schrodinger.protein.sequence.find_pattern(seq, pattern)¶

Find pattern matches in a specified StructureSequence object. Returns a list of matching positions.

Parameters:
seq (`StructureSequence`) – StructureSequence object
pattern (string) – Sequence pattern. The syntax is described in find_generalized_pattern.
Return type:	list of lists of integer tuples or None
Returns:	None if the specified input pattern was incorrect. Otherwise, it returns a list of lists of matches for each residue position in the input structure. Each match is a (start, end) tuple where start and end are matching sequence positions. If ‘hotspot’ is specified then start = end.

schrodinger.protein.sequence.get_pairwise_sequence_similarity(chain1, chain2, consider_gap=True, method='clustalw')¶

Given two single chain sequences, align them and return the sequence similarity between them.

Parameters:
chain1 (`structure._Chain`) – The first sequence chain.
chain2 (`structure._Chain`) – The second sequence chain.
consider_gap (bool) – Whether or not to consider gaps in the alignment; defaults to True.
method (string) – Which alignment method to use (‘muscle’ or ‘clustalw’)
Returns:	Sequence similarity of the alignment of the two.
Return type:	float, between 0.0 and 1.0

schrodinger.protein.sequence.get_structure_sequences(st)¶: Iterates over all sequences in the given structure.

schrodinger.protein.sequence.is_gap(res)¶

Utility function to check whether a residue is a gap (represented by None)

Parameters:	res (`schrodinger.structure._Residue` or None) – The residue to inspect

schrodinger.protein.sequence.is_not_gap(res)¶

Utility function to check whether a residue is not a gap (gaps are represented by None)

Parameters:	res (`schrodinger.structure._Residue` or None) – The residue to inspect
