schrodinger.protein.align module

exception schrodinger.protein.align.CantAlignException

Bases: Exception

Exception raised when an aligner cannot start, e.g. because there are not enough sequences.

__init__

Initialize self. See help(type(self)) for accurate signature.

args
with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
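
Callers will often want to treat this exception as a recoverable condition rather than a crash. The helper below is a minimal sketch of that pattern. `run_or_report` is a hypothetical name, and the exception class is passed in as an argument so the snippet stays self-contained; in real code it would presumably be `schrodinger.protein.align.CantAlignException`.

```python
def run_or_report(aligner, aln, cant_align_exc):
    """Call ``aligner.run(aln)`` and report whether the aligner could start.

    ``cant_align_exc`` is the exception class to interpret as "cannot align".
    Returns a ``(succeeded, message)`` tuple.
    """
    try:
        aligner.run(aln)
    except cant_align_exc as exc:
        # The aligner refused to start; pass the reason back to the caller.
        return False, str(exc)
    return True, ""
```

Any other exception type still propagates, so genuine bugs are not silently swallowed.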

class schrodinger.protein.align.AbstractAligner

Bases: object

Base class of objects that can perform an alignment

run(aln)

Aligns the sequences in an alignment using the parameters supplied on init

Subclasses need to override this default implementation.

Parameters:aln (schrodinger.protein.alignment.BaseAlignment) – The alignment to align
__init__

Initialize self. See help(type(self)) for accurate signature.
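
The override contract described for run() can be sketched with stand-in classes. Nothing here imports the real module: `AlignerBase` and `LeftJustifyAligner` are hypothetical names, the alignment is modeled as a plain list of strings, and both the NotImplementedError and the in-place update are assumptions about how such a base class might behave.

```python
class AlignerBase:
    """Stand-in for an abstract aligner (not the real AbstractAligner)."""

    def run(self, aln):
        # Assumed default behavior: refuse to run until overridden.
        raise NotImplementedError("subclasses must override run()")


class LeftJustifyAligner(AlignerBase):
    """Hypothetical subclass: removes gaps, then pads rows to equal length."""

    GAP = "~"

    def run(self, aln):
        # ``aln`` is a list of strings here and is modified in place.
        stripped = [row.replace(self.GAP, "") for row in aln]
        width = max(map(len, stripped), default=0)
        aln[:] = [row.ljust(width, self.GAP) for row in stripped]
```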

class schrodinger.protein.align.RescodeAligner

Bases: schrodinger.protein.align.AbstractAligner

Aligns sequences by rescode

run(aln)

Aligns the sequences in an alignment using the parameters supplied on init

Subclasses need to override this default implementation.

Parameters:aln (schrodinger.protein.alignment.BaseAlignment) – The alignment to align
__init__

Initialize self. See help(type(self)) for accurate signature.

class schrodinger.protein.align.AbstractPairwiseAligner(gap_open_penalty=1, gap_extend_penalty=0, sub_matrix=None, direct_scores=False, merge_all=False, ss_constraints=False)

Bases: schrodinger.protein.align.AbstractAligner

Variables:
  • CONSTRAINT_SCORE – Reward amount of keeping constrained residues aligned
  • RES_MATCH_BONUS – Reward amount for aligning matching residues. Used by default if a substitution matrix is not specified.
  • RES_MISMATCH_PENALTY – Penalty for aligning differing residues. Used by default if a substitution matrix is not specified.
Ctype CONSTRAINT_SCORE: float
Ctype RES_MATCH_BONUS: float
Ctype RES_MISMATCH_PENALTY: float

CONSTRAINT_SCORE = 10000
RES_MATCH_BONUS = 1.0
RES_MISMATCH_PENALTY = 1.0
__init__(gap_open_penalty=1, gap_extend_penalty=0, sub_matrix=None, direct_scores=False, merge_all=False, ss_constraints=False)
Parameters:
  • gap_open_penalty (float) – Penalty for opening a gap. Should be >=0.
  • gap_extend_penalty (float) – Penalty for extending a gap. Should be >=0.
  • sub_matrix (2D float array or dict mapping (char, char) to float) – Scoring matrix to be used for the alignment. If no matrix is specified, a residue identity measure is used.
  • direct_scores (bool) – Use the scoring matrix directly as an N x M matrix, where N and M are the lengths of the two sequences, rather than as the default 20x20 substitution matrix.
  • merge_all (bool) – Whether to merge the sequence with the whole alignment or only up to itself.
  • ss_constraints (bool) – Whether to constrain the alignment so that no gaps appear in the middle of a secondary structure.
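
How the identity measure and a dict-style sub_matrix could be turned into a per-pair score is sketched below. This is an illustration under stated assumptions, not the module's code: `score_pair` is a hypothetical helper, the mismatch penalty is assumed to be subtracted, codes are assumed to be compared case-insensitively, and the dict lookup is assumed to be symmetric.

```python
RES_MATCH_BONUS = 1.0
RES_MISMATCH_PENALTY = 1.0


def score_pair(a, b, sub_matrix=None):
    """Score aligning residue codes ``a`` and ``b`` (hypothetical helper)."""
    a, b = a.upper(), b.upper()
    if sub_matrix is None:
        # Identity measure: reward a match, penalize a mismatch.
        return RES_MATCH_BONUS if a == b else -RES_MISMATCH_PENALTY
    # Dict form keyed by (char, char); try both orders (assumed symmetric).
    if (a, b) in sub_matrix:
        return sub_matrix[(a, b)]
    return sub_matrix[(b, a)]
```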
run(aln, seqs_to_align=None, constraints=None)
Parameters:
  • aln (alignment.Alignment) – The alignment containing sequences to align.
  • seqs_to_align (list(sequence.Sequence)) – The sequences in aln to align against the reference sequence of aln. If None, defaults to the first non-reference sequence in aln (i.e. aln[1]).
  • constraints (list of (residue.Residue, residue.Residue)) – Optional list of (ref_res, res) pairwise residue constraints. Note that these constraints will be heavily favored but are not guaranteed. Some constraints are impossible to respect simultaneously [e.g. residues at indexes (1,1) and (0,2)]. The first residue of each pair should belong to the reference sequence of `aln`.
Raises:

CantAlignException – If seqs_to_align contains a sequence not found in aln.
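
Why a pair of constraints such as (1,1) and (0,2) cannot both hold: an alignment preserves residue order, so constrained pairs must not cross. The check below is a small sketch of that idea using plain integer indexes instead of Residue objects; `constraints_are_compatible` is a hypothetical helper, not part of the module.

```python
def constraints_are_compatible(pairs):
    """Return True if all (ref_index, other_index) pairs can hold at once.

    Compatible pairs must be strictly increasing in both indexes when
    ordered by reference index, i.e. no two constraints may cross.
    """
    ordered = sorted(set(pairs))
    for (ref_a, oth_a), (ref_b, oth_b) in zip(ordered, ordered[1:]):
        if ref_a == ref_b or oth_a >= oth_b:
            return False
    return True
```

For the pairs in the description, sorting by reference index gives (0,2) then (1,1); the second index decreases from 2 to 1, so the two constraints cross and cannot both be satisfied.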

class schrodinger.protein.align.SchrodingerPairwiseAligner(**kwargs)

Bases: schrodinger.protein.align.AbstractPairwiseAligner

Implementation of the Needleman-Wunsch global alignment algorithm for pairwise sequence alignment with affine gap penalties. Additional features:

  1. ability to merge new sequence with existing alignment,
  2. ability to penalize gaps in secondary structure elements,
  3. ability to use custom substitution matrix generated from a family of proteins or provided by the user.
Note:
Any residues with variant residue types will have their short codes uppercased. This means they will be treated identically to their standard variant. If a nonstandard residue type has a lowercase short code that doesn’t match its standard variant, or if we need special treatment for variant residues, _getMatrixValue will have to be changed.
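
For readers unfamiliar with the algorithm named in the class description, the following is a minimal, self-contained sketch of global alignment scoring with affine gap penalties (the three-matrix formulation usually attributed to Gotoh). It is not this class's implementation. The function name, the identity scoring, and the gap convention (a gap of length k costs gap_open + (k - 1) * gap_extend) are assumptions made for illustration.

```python
NEG_INF = float("-inf")


def global_align_score(seq_a, seq_b, gap_open=1.0, gap_extend=0.0,
                       match=1.0, mismatch=-1.0):
    """Return the best global alignment score of two strings."""
    n, m = len(seq_a), len(seq_b)
    # M: seq_a[i-1] is aligned to seq_b[j-1]
    # X: seq_a[i-1] is aligned to a gap
    # Y: seq_b[j-1] is aligned to a gap
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = -(gap_open + (i - 1) * gap_extend)
    for j in range(1, m + 1):
        Y[0][j] = -(gap_open + (j - 1) * gap_extend)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1],
                              Y[i - 1][j - 1])
            # Opening a gap costs gap_open; continuing one costs gap_extend.
            X[i][j] = max(M[i - 1][j] - gap_open,
                          X[i - 1][j] - gap_extend,
                          Y[i - 1][j] - gap_open)
            Y[i][j] = max(M[i][j - 1] - gap_open,
                          Y[i][j - 1] - gap_extend,
                          X[i][j - 1] - gap_open)
    # Global alignment: the answer is in the final cell.
    return max(M[n][m], X[n][m], Y[n][m])
```

With the default penalties, aligning "ACDE" against "ADE" yields three matches and one opened gap.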
__init__(**kwargs)
Parameters:
  • gap_open_penalty (float) – Penalty for opening a gap. Should be >=0.
  • gap_extend_penalty (float) – Penalty for extending a gap. Should be >=0.
  • sub_matrix (2D float array or dict mapping (char, char) to float) – Scoring matrix to be used for the alignment. If no matrix is specified, a residue identity measure is used.
  • direct_scores (bool) – Use the scoring matrix directly as an N x M matrix, where N and M are the lengths of the two sequences, rather than as the default 20x20 substitution matrix.
  • merge_all (bool) – Whether to merge the sequence with the whole alignment or only up to itself.
  • ss_constraints (bool) – Whether to constrain the alignment so that no gaps appear in the middle of a secondary structure.
getAlignmentScore()

Get the score of the alignment. Found by taking the highest value in the scoring matrix.

Returns:Score of the pairwise alignment.
Return type:float
CONSTRAINT_SCORE = 10000
RES_MATCH_BONUS = 1.0
RES_MISMATCH_PENALTY = 1.0
run(aln, seqs_to_align=None, constraints=None)
Parameters:
  • aln (alignment.Alignment) – The alignment containing sequences to align.
  • seqs_to_align (list(sequence.Sequence)) – The sequences in aln to align against the reference sequence of aln. If None, defaults to the first non-reference sequence in aln (i.e. aln[1]).
  • constraints (list of (residue.Residue, residue.Residue)) – Optional list of (ref_res, res) pairwise residue constraints. Note that these constraints will be heavily favored but are not guaranteed. Some constraints are impossible to respect simultaneously [e.g. residues at indexes (1,1) and (0,2)]. The first residue of each pair should belong to the reference sequence of `aln`.
Raises:

CantAlignException – If seqs_to_align contains a sequence not found in aln.

class schrodinger.protein.align.BiopythonPairwiseAligner(*args, **kwargs)

Bases: schrodinger.protein.align.AbstractPairwiseAligner

Pairwise alignment using Biopython.

Note:
Any residues with variant residue types will have their short codes uppercased. This means they will be treated identically to their standard variant. If a nonstandard residue type has a lowercase short code that doesn’t match its standard variant, or if we need special treatment for variant residues, _getMatrixValue will have to be changed.
__init__(*args, **kwargs)
Parameters:
  • gap_open_penalty (float) – Penalty for opening a gap. Should be >=0.
  • gap_extend_penalty (float) – Penalty for extending a gap. Should be >=0.
  • sub_matrix (2D float array or dict mapping (char, char) to float) – Scoring matrix to be used for the alignment. If no matrix is specified, a residue identity measure is used.
  • direct_scores (bool) – Use the scoring matrix directly as an N x M matrix, where N and M are the lengths of the two sequences, rather than as the default 20x20 substitution matrix.
  • merge_all (bool) – Whether to merge the sequence with the whole alignment or only up to itself.
  • ss_constraints (bool) – Whether to constrain the alignment so that no gaps appear in the middle of a secondary structure.
getAlignmentScore()

Get the score of the alignment. Found by taking the highest value in the scoring matrix.

Returns:Score of the pairwise alignment.
Return type:float
CONSTRAINT_SCORE = 10000
RES_MATCH_BONUS = 1.0
RES_MISMATCH_PENALTY = 1.0
run(aln, seqs_to_align=None, constraints=None)
Parameters:
  • aln (alignment.Alignment) – The alignment containing sequences to align.
  • seqs_to_align (list(sequence.Sequence)) – The sequences in aln to align against the reference sequence of aln. If None, defaults to the first non-reference sequence in aln (i.e. aln[1]).
  • constraints (list of (residue.Residue, residue.Residue)) – Optional list of (ref_res, res) pairwise residue constraints. Note that these constraints will be heavily favored but are not guaranteed. Some constraints are impossible to respect simultaneously [e.g. residues at indexes (1,1) and (0,2)]. The first residue of each pair should belong to the reference sequence of `aln`.
Raises:

CantAlignException – If seqs_to_align contains a sequence not found in aln.

class schrodinger.protein.align.ClustalAligner

Bases: schrodinger.protein.align.AbstractAligner

Aligns sequences using the Clustal alignment algorithm.

run(aln)

Aligns the sequences in an alignment

Parameters:aln (schrodinger.protein.alignment.BaseAlignment) – The alignment to align
__init__

Initialize self. See help(type(self)) for accurate signature.

class schrodinger.protein.align.SuperpositionAligner(gap_open_penalty=None, gap_extend_penalty=None)

Bases: schrodinger.protein.align.BiopythonPairwiseAligner

Align structured sequences based on their superposition.
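
One plausible way to turn a superposition into alignment scores is an N x M matrix with one score per residue pair, derived from the distance between superposed atoms; this is the kind of matrix the direct_scores option describes. The sketch below is purely illustrative: the function name, the linear scoring formula, and the cutoff value are hypothetical and say nothing about how this class actually scores pairs.

```python
import math


def distance_score_matrix(ref_coords, other_coords, cutoff=5.0):
    """Build an N x M score matrix from already-superposed coordinates.

    Each entry falls linearly from 1.0 (atoms coincide) to 0.0 (atoms
    at or beyond ``cutoff``). Coordinates are (x, y, z) tuples.
    """
    return [
        [max(0.0, 1.0 - math.dist(p, q) / cutoff) for q in other_coords]
        for p in ref_coords
    ]
```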

__init__(gap_open_penalty=None, gap_extend_penalty=None)
Parameters:
  • gap_open_penalty (float) – Penalty for opening a gap. Should be >=0.
  • gap_extend_penalty (float) – Penalty for extending a gap. Should be >=0.
  • sub_matrix (2D float array or dict mapping (char, char) to float) – Scoring matrix to be used for the alignment. If no matrix is specified, a residue identity measure is used.
  • direct_scores (bool) – Use the scoring matrix directly as an N x M matrix, where N and M are the lengths of the two sequences, rather than as the default 20x20 substitution matrix.
  • merge_all (bool) – Whether to merge the sequence with the whole alignment or only up to itself.
  • ss_constraints (bool) – Whether to constrain the alignment so that no gaps appear in the middle of a secondary structure.
run(aln, seqs_to_align=None)

Align sequences based on structure superposition to the reference.

Parameters:
  • aln (alignment.Alignment) – The alignment containing sequences to align.
  • seqs_to_align (list of sequence.Sequence or NoneType) – The sequences in aln to align against the reference sequence of aln. If None, defaults to the first non-reference sequence in aln (i.e. aln[1]).
Raises:
  • CantAlignException – If seqs_to_align contains a sequence not found in aln.
  • CantAlignException – If the reference sequence or any of seqs_to_align don’t have an associated structure.
CONSTRAINT_SCORE = 10000
RES_MATCH_BONUS = 1.0
RES_MISMATCH_PENALTY = 1.0
getAlignmentScore()

Get the score of the alignment. Found by taking the highest value in the scoring matrix.

Returns:Score of the pairwise alignment.
Return type:float
class schrodinger.protein.align.StructureAligner

Bases: schrodinger.protein.align.AbstractAligner

Run structure alignment and align sequences based on structural alignment

class entry(sequence, structure)

Bases: tuple

__contains__

Return key in self.

__init__

Initialize self. See help(type(self)) for accurate signature.

__len__

Return len(self).

count(value) → integer -- return number of occurrences of value
index(value[, start[, stop]]) → integer -- return first index of value.

Raises ValueError if the value is not present.

sequence

Alias for field number 0

structure

Alias for field number 1
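
The members listed for entry (a tuple base, count, index, and fields that are aliases for positions 0 and 1) match what `collections.namedtuple` generates. A stand-in with the same field names behaves as follows; `Entry` and the sample values are illustrative only.

```python
from collections import namedtuple

# Stand-in with the same field names as the documented ``entry`` tuple.
Entry = namedtuple("Entry", ["sequence", "structure"])

pair = Entry(sequence="ACDE", structure=None)
seq, struct = pair  # unpacks like any 2-tuple
```

Fields can be read by name (`pair.sequence`) or by position (`pair[0]`).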

run(aln, seqs_to_align=None)
Parameters:aln (alignment.Alignment) – Alignment to align
__init__

Initialize self. See help(type(self)) for accurate signature.

class schrodinger.protein.align.MaxIdentityAligner

Bases: schrodinger.protein.align.BiopythonPairwiseAligner

Pairwise aligner that maximizes the number of matching residues between two sequences. There are no penalties for mismatches or gaps.
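
When matches are rewarded and neither mismatches nor gaps cost anything, the best achievable number of matching residues equals the length of the longest common subsequence of the two sequences. The sketch below computes that number with the standard dynamic program. It illustrates the objective only; `max_identity` is a hypothetical helper, and the class itself delegates to Biopython.

```python
def max_identity(seq_a, seq_b):
    """Return the maximum number of identical residue pairs that any
    gapped alignment of the two strings can contain (the LCS length)."""
    prev = [0] * (len(seq_b) + 1)
    for a in seq_a:
        curr = [0]
        for j, b in enumerate(seq_b, start=1):
            if a == b:
                # Extend the best alignment of the two prefixes by a match.
                curr.append(prev[j - 1] + 1)
            else:
                # Skip a residue in one sequence or the other at no cost.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]
```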

__init__()
Parameters:
  • gap_open_penalty (float) – Penalty for opening a gap. Should be >=0.
  • gap_extend_penalty (float) – Penalty for extending a gap. Should be >=0.
  • sub_matrix (2D float array or dict mapping (char, char) to float) – Scoring matrix to be used for the alignment. If no matrix is specified, a residue identity measure is used.
  • direct_scores (bool) – Use the scoring matrix directly as an N x M matrix, where N and M are the lengths of the two sequences, rather than as the default 20x20 substitution matrix.
  • merge_all (bool) – Whether to merge the sequence with the whole alignment or only up to itself.
  • ss_constraints (bool) – Whether to constrain the alignment so that no gaps appear in the middle of a secondary structure.
run(aln)
Parameters:
  • aln (alignment.Alignment) – The alignment containing sequences to align.
  • seqs_to_align (list(sequence.Sequence)) – The sequences in aln to align against the reference sequence of aln. If None, defaults to the first non-reference sequence in aln (i.e. aln[1]).
  • constraints (list of (residue.Residue, residue.Residue)) – Optional list of (ref_res, res) pairwise residue constraints. Note that these constraints will be heavily favored but are not guaranteed. Some constraints are impossible to respect simultaneously [e.g. residues at indexes (1,1) and (0,2)]. The first residue of each pair should belong to the reference sequence of `aln`.
Raises:

CantAlignException – If seqs_to_align contains a sequence not found in aln.

CONSTRAINT_SCORE = 10000
RES_MATCH_BONUS = 1.0
RES_MISMATCH_PENALTY = 1.0
getAlignmentScore()

Get the score of the alignment. Found by taking the highest value in the scoring matrix.

Returns:Score of the pairwise alignment.
Return type:float
class schrodinger.protein.align.StructurelessGapAligner

Bases: schrodinger.protein.align.AbstractAligner

Align all structureless residues with gaps

For example, given the following alignment (where circled letters are structureless residues):

Resnum: 0 1 2 3 4 5
Seq1:   Ⓐ Ⓡ Ⓒ A D E
Seq2:   Ⓒ Ⓐ Ⓝ A D A

The result will be:

Resnum: 0 1 2 3 4 5 6 7 8
Seq1:   ~ ~ ~ Ⓐ Ⓡ Ⓒ A D E
Seq2:   Ⓒ Ⓐ Ⓝ ~ ~ ~ A D A
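
The example can be reproduced with a short sketch. It makes several simplifying assumptions that may not hold for the real aligner: residues are modeled as (code, has_structure) tuples, every sequence has the same length and the same run boundaries as the first one, and within a structureless run later sequences are placed first (chosen only because it matches the example). `stagger_structureless` is a hypothetical name.

```python
GAP = "~"


def stagger_structureless(seqs):
    """Pair every structureless residue with gaps in the other sequences.

    ``seqs`` is a list of equal-length lists of (code, has_structure)
    tuples. Returns one gapped string per input sequence.
    """
    n = len(seqs[0])
    out = [[] for _ in seqs]
    col = 0
    while col < n:
        structureless = not seqs[0][col][1]
        end = col
        # Find the end of the current run (boundaries taken from seqs[0]).
        while end < n and (not seqs[0][end][1]) == structureless:
            end += 1
        if structureless:
            # Give each sequence its own block of columns for this run.
            for owner in reversed(range(len(seqs))):
                for i, seq in enumerate(seqs):
                    for code, _ in seq[col:end]:
                        out[i].append(code if i == owner else GAP)
        else:
            # Structured columns are kept exactly as they are.
            for i, seq in enumerate(seqs):
                out[i].extend(code for code, _ in seq[col:end])
        col = end
    return ["".join(row) for row in out]
```

Plain letters stand in for the circled ones; the has_structure flag carries the same information.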

run(aln, seqs_to_align=None)

Aligns the sequences in an alignment using the parameters supplied on init

Subclasses need to override this default implementation.

Parameters:aln (schrodinger.protein.alignment.BaseAlignment) – The alignment to align
__init__

Initialize self. See help(type(self)) for accurate signature.