schrodinger.protein.align module
-
exception
schrodinger.protein.align.CantAlignException
Bases: Exception
Exception raised when an aligner cannot start, e.g. because there are not enough sequences.
-
__init__ Initialize self. See help(type(self)) for accurate signature.
-
args
-
with_traceback() Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
-
-
class
schrodinger.protein.align.AbstractAligner
Bases: object
Base class of objects that can perform an alignment.
-
run(aln) Aligns the sequences in an alignment using the parameters supplied on init.
Subclasses need to override this default implementation.
Parameters: aln (schrodinger.protein.alignment.BaseAlignment) – The alignment to align
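The subclassing contract can be sketched as follows. This is a minimal illustration under stated assumptions: `CantAlignException`, `AbstractAligner` and `PadToEqualLengthAligner` below are hypothetical stand-ins (not imports from `schrodinger.protein.align`), the alignment is modeled as a plain list of strings instead of a `BaseAlignment`, and `run()` is assumed to modify it in place.

```python
class CantAlignException(Exception):
    """Stand-in: raised when an aligner cannot start, e.g. too few sequences."""


class AbstractAligner:
    """Stand-in base class: subclasses must override run()."""

    def run(self, aln):
        raise NotImplementedError("subclasses must override run()")


class PadToEqualLengthAligner(AbstractAligner):
    """Hypothetical toy aligner: right-pads every sequence with gaps."""

    def __init__(self, gap="~"):
        # Parameters are supplied on init and consumed later by run().
        self.gap = gap

    def run(self, aln):
        # aln is modeled as a plain list of strings, modified in place.
        if len(aln) < 2:
            raise CantAlignException("need at least two sequences")
        width = max(len(seq) for seq in aln)
        aln[:] = [seq + self.gap * (width - len(seq)) for seq in aln]


aln = ["ACDE", "AC"]
PadToEqualLengthAligner().run(aln)
print(aln)  # ['ACDE', 'AC~~']
```

Parameters are captured in `__init__` and only used inside `run()`, which is the pattern the docstring above describes.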
-
__init__ Initialize self. See help(type(self)) for accurate signature.
-
-
class
schrodinger.protein.align.RescodeAligner
Bases: schrodinger.protein.align.AbstractAligner
Aligns sequences by rescode.
-
run(aln) Aligns the sequences in an alignment using the parameters supplied on init.
Subclasses need to override this default implementation.
Parameters: aln (schrodinger.protein.alignment.BaseAlignment) – The alignment to align
-
__init__ Initialize self. See help(type(self)) for accurate signature.
-
-
class
schrodinger.protein.align.AbstractPairwiseAligner(gap_open_penalty=1, gap_extend_penalty=0, sub_matrix=None, direct_scores=False, merge_all=False, ss_constraints=False)¶ Bases:
schrodinger.protein.align.AbstractAlignerVariables: - CONSTRAINT_SCORE – Reward amount of keeping constrained residues aligned
- RES_MATCH_BONUS – Reward amount for aligning matching residues. Used by default if a substitution matrix is not specified.
- RES_MISMATCH_PENALTY – Penalty for aligning differing residues. Used by default if a subtitution matrix is not specified
Ctype CONSTRAINT_SCORE: float
Ctype RES_MATCH_BONUS: float
Ctype RES_MISMATCH_PENALTY: float
-
CONSTRAINT_SCORE = 10000
-
RES_MATCH_BONUS = 1.0
-
RES_MISMATCH_PENALTY = 1.0
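How the default identity scoring might combine these constants, as a sketch: the sign convention (the mismatch penalty is subtracted from the score) is an assumption, and `identity_score` and `ungapped_score` are hypothetical helpers, not library functions.

```python
# Class-attribute values as documented for AbstractPairwiseAligner.
RES_MATCH_BONUS = 1.0
RES_MISMATCH_PENALTY = 1.0


def identity_score(res_a, res_b):
    """Score one aligned residue pair when no sub_matrix is given.

    Assumption: the mismatch penalty is subtracted, not added.
    """
    if res_a == res_b:
        return RES_MATCH_BONUS
    return -RES_MISMATCH_PENALTY


def ungapped_score(seq_a, seq_b):
    """Sum identity scores over the columns of two gap-free sequences."""
    return sum(identity_score(a, b) for a, b in zip(seq_a, seq_b))


print(ungapped_score("ACDE", "ACKE"))  # 3 matches, 1 mismatch -> 2.0
```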
-
__init__(gap_open_penalty=1, gap_extend_penalty=0, sub_matrix=None, direct_scores=False, merge_all=False, ss_constraints=False) Parameters: - gap_open_penalty (float) – Penalty for opening a gap. Should be >= 0.
- gap_extend_penalty (float) – Penalty for extending a gap. Should be >= 0.
- sub_matrix (2D float array or dict mapping (char, char) to float) – Scoring matrix to be used for the alignment. If no matrix is specified, a residue identity measure is used.
- direct_scores (bool) – Use the scoring matrix directly as an N x M matrix, where N and M are the lengths of the two sequences, rather than as the default 20 x 20 substitution matrix.
- merge_all (bool) – Whether to merge the sequence with the whole alignment or only up to itself.
- ss_constraints (bool) – Whether to constrain the alignment so that no gaps appear in the middle of a secondary structure element.
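The two accepted `sub_matrix` forms could be resolved into a single scoring function roughly as below. `make_scorer` is a hypothetical helper; the residue ordering used to index a 2D array and the reverse-order lookup for dict keys are assumptions, not documented behavior.

```python
def make_scorer(sub_matrix, alphabet="ACDEFGHIKLMNPQRSTVWY"):
    """Return score(a, b) for either accepted sub_matrix form.

    Assumptions: a 2D array is indexed in `alphabet` order (a hypothetical
    ordering), and a dict pair missing in one order is looked up reversed.
    """
    if isinstance(sub_matrix, dict):
        def score(a, b):
            if (a, b) in sub_matrix:
                return sub_matrix[(a, b)]
            return sub_matrix[(b, a)]  # KeyError if absent in both orders
        return score
    index = {code: i for i, code in enumerate(alphabet)}
    return lambda a, b: sub_matrix[index[a]][index[b]]


# A tiny, made-up matrix over three residue types, in both forms.
TOY_DICT = {
    ("A", "A"): 4.0, ("C", "C"): 9.0, ("S", "S"): 4.0,
    ("A", "C"): 0.0, ("A", "S"): 1.0, ("C", "S"): -1.0,
}
TOY_ARRAY = [[4.0, 0.0, 1.0],
             [0.0, 9.0, -1.0],
             [1.0, -1.0, 4.0]]

print(make_scorer(TOY_DICT)("S", "A"))          # 1.0 via reverse lookup
print(make_scorer(TOY_ARRAY, "ACS")("C", "S"))  # -1.0
```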
-
run(aln, seqs_to_align=None, constraints=None) Parameters: - aln (alignment.Alignment) – The alignment containing sequences to align.
- seqs_to_align (list(sequence.Sequence)) – The sequences in `aln` to align against the reference sequence of `aln`. If `None`, defaults to the first non-reference sequence in `aln` (i.e. `aln[1]`).
- constraints (list of (residue.Residue, residue.Residue)) – Optional list of (ref_res, res) pairwise residue constraints. These constraints are heavily favored but not guaranteed; some constraints cannot be satisfied simultaneously (e.g. residues at indexes (1,1) and (0,2)). The first residue of each pair should belong to the reference sequence of `aln`.
Raises: CantAlignException – If `seqs_to_align` contains a sequence not found in `aln`.
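Why some constraints are impossible together: an alignment preserves residue order in both sequences, so two (ref_index, index) constraints cannot both hold if they cross. The sketch below works on residue indexes rather than `residue.Residue` objects, and `conflicting_constraints` is a hypothetical helper, not part of the module.

```python
from itertools import combinations


def conflicting_constraints(constraints):
    """Return pairs of (ref_index, index) constraints that cannot both hold.

    Two constraints conflict when they cross (one is earlier in the
    reference but later in the other sequence), or when one residue is
    pinned to two different partners.
    """
    conflicts = []
    for (r1, s1), (r2, s2) in combinations(constraints, 2):
        crossing = (r1 - r2) * (s1 - s2) < 0
        same_residue_twice = (r1 == r2) != (s1 == s2)
        if crossing or same_residue_twice:
            conflicts.append(((r1, s1), (r2, s2)))
    return conflicts


# The example from the docstring: (1, 1) and (0, 2) cross.
print(conflicting_constraints([(1, 1), (0, 2)]))  # [((1, 1), (0, 2))]
```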
-
class
schrodinger.protein.align.SchrodingerPairwiseAligner(**kwargs)
Bases: schrodinger.protein.align.AbstractPairwiseAligner
Implementation of the Needleman-Wunsch global alignment algorithm for pairwise sequence alignment with affine gap penalties. Features:
- ability to merge a new sequence with an existing alignment,
- ability to penalize gaps in secondary structure elements,
- ability to use a custom substitution matrix, either generated from a family of proteins or provided by the user.
NOTE: Any residues with variant residue types will have their short codes uppercased, so they are treated identically to their standard residue type. If a nonstandard residue type has a lowercase short code that doesn't match its standard residue type, or if variant residues need special treatment, _getMatrixValue will have to be changed.
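A compact sketch of Needleman-Wunsch with affine gap penalties (Gotoh's three-matrix formulation) on plain strings. It returns only the optimal global score, without traceback, and assumes a gap of length k costs `gap_open + (k - 1) * gap_extend`; the real class's exact cost convention, merging, and secondary-structure handling are not reproduced here.

```python
NEG_INF = float("-inf")


def nw_affine_score(a, b, score, gap_open=1.0, gap_extend=0.0):
    """Optimal global alignment score of a and b with affine gaps.

    Assumed cost model: a gap of length k costs gap_open + (k - 1) * gap_extend.
    score(x, y) is the substitution score for residues x and y.
    """
    n, m = len(a), len(b)
    # M: ends in a residue pair; X: a[i-1] against a gap; Y: b[j-1] against a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = -(gap_open + (i - 1) * gap_extend)
    for j in range(1, m + 1):
        Y[0][j] = -(gap_open + (j - 1) * gap_extend)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = score(a[i - 1], b[j - 1]) + max(
                M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Open a new gap from M or Y, or extend an existing one in X.
            X[i][j] = max(M[i - 1][j] - gap_open,
                          Y[i - 1][j] - gap_open,
                          X[i - 1][j] - gap_extend)
            Y[i][j] = max(M[i][j - 1] - gap_open,
                          X[i][j - 1] - gap_open,
                          Y[i][j - 1] - gap_extend)
    return max(M[n][m], X[n][m], Y[n][m])


def ident(x, y):
    # Simple +1 / -1 identity measure for the example.
    return 1.0 if x == y else -1.0


print(nw_affine_score("ACDE", "AE", ident))  # A~~E: 2 matches, one gap -> 1.0
```

With the documented defaults (`gap_open_penalty=1`, `gap_extend_penalty=0`), a gap of any length costs 1 under this model, so the two-residue gap above is no more expensive than a one-residue gap.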
-
__init__(**kwargs) Parameters: - gap_open_penalty (float) – Penalty for opening a gap. Should be >= 0.
- gap_extend_penalty (float) – Penalty for extending a gap. Should be >= 0.
- sub_matrix (2D float array or dict mapping (char, char) to float) – Scoring matrix to be used for the alignment. If no matrix is specified, a residue identity measure is used.
- direct_scores (bool) – Use the scoring matrix directly as an N x M matrix, where N and M are the lengths of the two sequences, rather than as the default 20 x 20 substitution matrix.
- merge_all (bool) – Whether to merge the sequence with the whole alignment or only up to itself.
- ss_constraints (bool) – Whether to constrain the alignment so that no gaps appear in the middle of a secondary structure element.
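One way to realize `ss_constraints`, sketched under assumptions: secondary structure is given as a string of DSSP-like codes ("H" helix, "E" strand, anything else loop), and gaps inside an element are made prohibitively expensive rather than forbidden outright. `gap_open_profile` and the penalty values are hypothetical, not the class's documented mechanism.

```python
def gap_open_profile(ss, base_penalty=1.0, sse_penalty=1000.0):
    """Per-position gap-opening penalty for one sequence.

    Entry k is the penalty for opening a gap between residue k-1 and
    residue k (k = 0 .. len(ss)). A gap strictly inside a helix or strand,
    i.e. with both neighbors in the same element type, gets sse_penalty.
    """
    profile = []
    for k in range(len(ss) + 1):
        left = ss[k - 1] if k > 0 else None
        right = ss[k] if k < len(ss) else None
        inside_sse = left is not None and left == right and left in ("H", "E")
        profile.append(sse_penalty if inside_sse else base_penalty)
    return profile


print(gap_open_profile("CHHHC"))  # [1.0, 1.0, 1000.0, 1000.0, 1.0, 1.0]
```

Gaps at the ends of the helix stay cheap; only the two positions between helix residues are penalized.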
-
getAlignmentScore() Get the score of the alignment, found by taking the highest value in the scoring matrix.
Returns: Score of the pairwise alignment. Return type: float
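A sketch of what "the highest value in the scoring matrix" means, using a simple linear-gap Needleman-Wunsch matrix as an illustrative stand-in (the aligner itself uses affine gaps). `scoring_matrix` and `alignment_score` are hypothetical helpers. The maximum cell can exceed the bottom-right cell, which holds the strict global score, because it ignores any gaps needed after that cell.

```python
def scoring_matrix(a, b, score, gap=1.0):
    """Fill a linear-gap Needleman-Wunsch scoring matrix (illustrative only)."""
    H = [[0.0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        H[i][0] = -gap * i
    for j in range(1, len(b) + 1):
        H[0][j] = -gap * j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            H[i][j] = max(H[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                          H[i - 1][j] - gap,
                          H[i][j - 1] - gap)
    return H


def alignment_score(H):
    # As described for getAlignmentScore(): the highest value in the matrix.
    return max(max(row) for row in H)


def ident(x, y):
    return 1.0 if x == y else -1.0


H = scoring_matrix("ACD", "ACDKK", ident)
print(alignment_score(H), H[-1][-1])  # 3.0 1.0
```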
-
CONSTRAINT_SCORE = 10000
-
RES_MATCH_BONUS = 1.0
-
RES_MISMATCH_PENALTY = 1.0
-
run(aln, seqs_to_align=None, constraints=None) Parameters: - aln (alignment.Alignment) – The alignment containing sequences to align.
- seqs_to_align (list(sequence.Sequence)) – The sequences in `aln` to align against the reference sequence of `aln`. If `None`, defaults to the first non-reference sequence in `aln` (i.e. `aln[1]`).
- constraints (list of (residue.Residue, residue.Residue)) – Optional list of (ref_res, res) pairwise residue constraints. These constraints are heavily favored but not guaranteed; some constraints cannot be satisfied simultaneously (e.g. residues at indexes (1,1) and (0,2)). The first residue of each pair should belong to the reference sequence of `aln`.
Raises: CantAlignException – If `seqs_to_align` contains a sequence not found in `aln`.
-
class
schrodinger.protein.align.BiopythonPairwiseAligner(*args, **kwargs)
Bases: schrodinger.protein.align.AbstractPairwiseAligner
Pairwise alignment using Biopython.
NOTE: Any residues with variant residue types will have their short codes uppercased, so they are treated identically to their standard residue type. If a nonstandard residue type has a lowercase short code that doesn't match its standard residue type, or if variant residues need special treatment, _getMatrixValue will have to be changed.
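The normalization described in the note amounts to uppercasing both short codes before the substitution-matrix lookup. A sketch, with `normalized_code` and `lookup` as hypothetical helpers standing in for the private `_getMatrixValue`; the lowercase "c" below is a made-up variant code used only for illustration.

```python
def normalized_code(short_code):
    """Map a variant residue's short code onto its standard residue's code."""
    return short_code.upper()


def lookup(sub_matrix, code_a, code_b):
    # Normalize both codes before indexing the (char, char) -> float matrix.
    return sub_matrix[(normalized_code(code_a), normalized_code(code_b))]


matrix = {("C", "C"): 9.0, ("C", "S"): -1.0}
print(lookup(matrix, "c", "C"))  # the variant "c" scores like "C": 9.0
```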
-
__init__(*args, **kwargs) Parameters: - gap_open_penalty (float) – Penalty for opening a gap. Should be >= 0.
- gap_extend_penalty (float) – Penalty for extending a gap. Should be >= 0.
- sub_matrix (2D float array or dict mapping (char, char) to float) – Scoring matrix to be used for the alignment. If no matrix is specified, a residue identity measure is used.
- direct_scores (bool) – Use the scoring matrix directly as an N x M matrix, where N and M are the lengths of the two sequences, rather than as the default 20 x 20 substitution matrix.
- merge_all (bool) – Whether to merge the sequence with the whole alignment or only up to itself.
- ss_constraints (bool) – Whether to constrain the alignment so that no gaps appear in the middle of a secondary structure element.
-
getAlignmentScore() Get the score of the alignment, found by taking the highest value in the scoring matrix.
Returns: Score of the pairwise alignment. Return type: float
-
CONSTRAINT_SCORE = 10000
-
RES_MATCH_BONUS = 1.0
-
RES_MISMATCH_PENALTY = 1.0
-
run(aln, seqs_to_align=None, constraints=None) Parameters: - aln (alignment.Alignment) – The alignment containing sequences to align.
- seqs_to_align (list(sequence.Sequence)) – The sequences in `aln` to align against the reference sequence of `aln`. If `None`, defaults to the first non-reference sequence in `aln` (i.e. `aln[1]`).
- constraints (list of (residue.Residue, residue.Residue)) – Optional list of (ref_res, res) pairwise residue constraints. These constraints are heavily favored but not guaranteed; some constraints cannot be satisfied simultaneously (e.g. residues at indexes (1,1) and (0,2)). The first residue of each pair should belong to the reference sequence of `aln`.
Raises: CantAlignException – If `seqs_to_align` contains a sequence not found in `aln`.
-
class
schrodinger.protein.align.ClustalAligner
Bases: schrodinger.protein.align.AbstractAligner
Aligns sequences using the Clustal alignment algorithm.
-
run(aln) Aligns the sequences in an alignment.
Parameters: aln (schrodinger.protein.alignment.BaseAlignment) – The alignment to align
-
__init__ Initialize self. See help(type(self)) for accurate signature.
-
-
class
schrodinger.protein.align.SuperpositionAligner(gap_open_penalty=None, gap_extend_penalty=None)
Bases: schrodinger.protein.align.BiopythonPairwiseAligner
Aligns structured sequences based on their superposition.
-
__init__(gap_open_penalty=None, gap_extend_penalty=None) Parameters: - gap_open_penalty (float) – Penalty for opening a gap. Should be >= 0.
- gap_extend_penalty (float) – Penalty for extending a gap. Should be >= 0.
- sub_matrix (2D float array or dict mapping (char, char) to float) – Scoring matrix to be used for the alignment. If no matrix is specified, a residue identity measure is used.
- direct_scores (bool) – Use the scoring matrix directly as an N x M matrix, where N and M are the lengths of the two sequences, rather than as the default 20 x 20 substitution matrix.
- merge_all (bool) – Whether to merge the sequence with the whole alignment or only up to itself.
- ss_constraints (bool) – Whether to constrain the alignment so that no gaps appear in the middle of a secondary structure element.
-
run(aln, seqs_to_align=None) Aligns sequences based on their structural superposition onto the reference.
Parameters: - aln (alignment.Alignment) – The alignment containing sequences to align.
- seqs_to_align (list of sequence.Sequence or NoneType) – The sequences in `aln` to align against the reference sequence of `aln`. If `None`, defaults to the first non-reference sequence in `aln` (i.e. `aln[1]`).
Raises: - CantAlignException – If `seqs_to_align` contains a sequence not found in `aln`.
- CantAlignException – If the reference sequence or any sequence in `seqs_to_align` doesn't have an associated structure.
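One plausible way to turn a superposition into alignment scores, which also illustrates what `direct_scores=True` means: an N x M matrix scoring each residue pair of the two sequences directly, instead of a 20 x 20 residue-type matrix. The distance-based scoring and the 4.0 Å cutoff are assumptions, not SuperpositionAligner's documented method, and `superposition_scores` is a hypothetical helper. Coordinates are taken to be already superposed, one representative atom per residue.

```python
import math


def superposition_scores(coords_ref, coords_other, cutoff=4.0):
    """Build an N x M score matrix from already-superposed coordinates.

    Hypothetical scoring: residue pairs closer than `cutoff` angstroms score
    cutoff - distance; pairs farther apart score 0. coords_* are lists of
    (x, y, z) tuples, one per residue.
    """
    return [[max(0.0, cutoff - math.dist(p, q)) for q in coords_other]
            for p in coords_ref]


ref = [(0.0, 0.0, 0.0)]
other = [(0.0, 0.0, 0.0), (0.0, 0.0, 5.0)]
print(superposition_scores(ref, other))  # [[4.0, 0.0]]
```

A matrix like this, with N rows for the reference and M columns for the other sequence, is the shape `direct_scores` refers to.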
-
CONSTRAINT_SCORE = 10000
-
RES_MATCH_BONUS = 1.0
-
RES_MISMATCH_PENALTY = 1.0
-
getAlignmentScore() Get the score of the alignment, found by taking the highest value in the scoring matrix.
Returns: Score of the pairwise alignment. Return type: float
-
-
class
schrodinger.protein.align.StructureAligner
Bases: schrodinger.protein.align.AbstractAligner
Runs a structure alignment and aligns sequences based on the structural alignment.
-
class
entry(sequence, structure)
Bases: tuple
-
__contains__ Return key in self.
-
__init__ Initialize self. See help(type(self)) for accurate signature.
-
__len__ Return len(self).
-
count(value) → integer – return number of occurrences of value
-
index(value[, start[, stop]]) → integer – return first index of value. Raises ValueError if the value is not present.
-
sequence Alias for field number 0
-
structure Alias for field number 1
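The members listed above are those every `collections.namedtuple` type provides. An equivalent type can be built directly; the string and `None` used here are placeholders for the real sequence and structure objects.

```python
from collections import namedtuple

# Same field layout as StructureAligner.entry: (sequence, structure).
entry = namedtuple("entry", ["sequence", "structure"])

# Placeholder values; the real fields hold sequence and structure objects.
e = entry(sequence="ACDE", structure=None)

assert e.sequence is e[0]   # "Alias for field number 0"
assert e.structure is e[1]  # "Alias for field number 1"
assert len(e) == 2 and "ACDE" in e
assert e.count(None) == 1 and e.index(None) == 1
```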
-
-
run(aln, seqs_to_align=None) Parameters: aln (alignment.Alignment) – Alignment to align
-
__init__ Initialize self. See help(type(self)) for accurate signature.
-
class
schrodinger.protein.align.MaxIdentityAligner
Bases: schrodinger.protein.align.BiopythonPairwiseAligner
Pairwise aligner that maximizes the number of matching residues between two sequences. There are no penalties for mismatches or gaps.
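With a +1 reward per identical pair and no mismatch or gap penalties, the best achievable score is the length of the longest common subsequence of the two sequences. A minimal sketch of that quantity on plain residue-code strings; `max_identities` is a hypothetical helper, and MaxIdentityAligner's actual scoring values are not documented here.

```python
def max_identities(a, b):
    """Largest number of identical aligned pairs over all gapped alignments.

    With zero gap and mismatch penalties and +1 per match, this equals the
    length of the longest common subsequence (LCS) of a and b.
    """
    prev = [0] * (len(b) + 1)  # LCS lengths for the previous prefix of a
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


print(max_identities("ACDEFG", "AXDYG"))  # A, D and G line up: 3
```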
-
__init__() Parameters: - gap_open_penalty (float) – Penalty for opening a gap. Should be >= 0.
- gap_extend_penalty (float) – Penalty for extending a gap. Should be >= 0.
- sub_matrix (2D float array or dict mapping (char, char) to float) – Scoring matrix to be used for the alignment. If no matrix is specified, a residue identity measure is used.
- direct_scores (bool) – Use the scoring matrix directly as an N x M matrix, where N and M are the lengths of the two sequences, rather than as the default 20 x 20 substitution matrix.
- merge_all (bool) – Whether to merge the sequence with the whole alignment or only up to itself.
- ss_constraints (bool) – Whether to constrain the alignment so that no gaps appear in the middle of a secondary structure element.
-
run(aln) Parameters: - aln (alignment.Alignment) – The alignment containing sequences to align.
- seqs_to_align (list(sequence.Sequence)) – The sequences in `aln` to align against the reference sequence of `aln`. If `None`, defaults to the first non-reference sequence in `aln` (i.e. `aln[1]`).
- constraints (list of (residue.Residue, residue.Residue)) – Optional list of (ref_res, res) pairwise residue constraints. These constraints are heavily favored but not guaranteed; some constraints cannot be satisfied simultaneously (e.g. residues at indexes (1,1) and (0,2)). The first residue of each pair should belong to the reference sequence of `aln`.
Raises: CantAlignException – If `seqs_to_align` contains a sequence not found in `aln`.
-
CONSTRAINT_SCORE = 10000
-
RES_MATCH_BONUS = 1.0
-
RES_MISMATCH_PENALTY = 1.0
-
getAlignmentScore() Get the score of the alignment, found by taking the highest value in the scoring matrix.
Returns: Score of the pairwise alignment. Return type: float
-
-
class
schrodinger.protein.align.StructurelessGapAligner
Bases: schrodinger.protein.align.AbstractAligner
Aligns all structureless residues against gaps.
For example, given the following alignment (where circled letters are structureless residues):
Resnum: 0 1 2 3 4 5
Seq1:   Ⓐ Ⓡ Ⓒ A D E
Seq2:   Ⓒ Ⓐ Ⓝ A D A
The result will be:
Resnum: 0 1 2 3 4 5 6 7 8
Seq1:   ~ ~ ~ Ⓐ Ⓡ Ⓒ A D E
Seq2:   Ⓒ Ⓐ Ⓝ ~ ~ ~ A D A
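The example above can be reproduced with a small sketch that handles only the case it shows: a single leading block of structureless residues in each sequence. Residues are modeled as hypothetical (code, has_structure) tuples, "~" is the gap character from the example, and the structured remainders are passed through without being aligned; `gap_leading_structureless` is not a library function.

```python
GAP = "~"


def gap_leading_structureless(seq1, seq2):
    """Place each sequence's leading structureless residues opposite gaps.

    Following the documented example, seq2's structureless block comes
    first (opposite gaps in seq1), then seq1's block (opposite gaps in
    seq2), then the structured remainders unchanged.
    """
    def split(seq):
        k = 0
        while k < len(seq) and not seq[k][1]:
            k += 1
        return [c for c, _ in seq[:k]], [c for c, _ in seq[k:]]

    head1, rest1 = split(seq1)
    head2, rest2 = split(seq2)
    out1 = [GAP] * len(head2) + head1 + rest1
    out2 = head2 + [GAP] * len(head1) + rest2
    return out1, out2


seq1 = [("A", False), ("R", False), ("C", False), ("A", True), ("D", True), ("E", True)]
seq2 = [("C", False), ("A", False), ("N", False), ("A", True), ("D", True), ("A", True)]
out1, out2 = gap_leading_structureless(seq1, seq2)
print(" ".join(out1))  # ~ ~ ~ A R C A D E
print(" ".join(out2))  # C A N ~ ~ ~ A D A
```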
-
run(aln, seqs_to_align=None) Aligns the sequences in an alignment using the parameters supplied on init.
Subclasses need to override this default implementation.
Parameters: aln (schrodinger.protein.alignment.BaseAlignment) – The alignment to align
-
__init__ Initialize self. See help(type(self)) for accurate signature.
-