schrodinger.protein.align module¶
-
class
schrodinger.protein.align.
ASLResult
(ref_ok, other_ok, other_skips)¶ Bases:
tuple
-
__contains__
¶ Return key in self.
-
__init__
¶ Initialize self. See help(type(self)) for accurate signature.
-
__len__
¶ Return len(self).
-
count
()¶ Return number of occurrences of value.
-
index
()¶ Return first index of value.
Raises ValueError if the value is not present.
-
other_ok
¶ Alias for field number 1
-
other_skips
¶ Alias for field number 2
-
ref_ok
¶ Alias for field number 0
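Since ASLResult is a tuple with named fields, it can be read like a collections.namedtuple. The sketch below uses a local stand-in with the same field order rather than the real class, and the field values are made up for illustration (this page does not document their types).

```python
from collections import namedtuple

# Local stand-in with the documented field order.
# It is NOT imported from schrodinger.protein.align.
ASLResult = namedtuple("ASLResult", ["ref_ok", "other_ok", "other_skips"])

# Hypothetical values: the real field types are not documented on this page.
result = ASLResult(ref_ok=True, other_ok=False, other_skips=["seq_B"])

# Each field can be read by name or by its field number.
assert result.ref_ok is result[0]
assert result.other_ok is result[1]
assert result.other_skips is result[2]

# Ordinary tuple behaviour (unpacking, len) still applies.
ref_ok, other_ok, other_skips = result
print(len(result), ref_ok, other_ok, other_skips)
```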
-
-
exception
schrodinger.protein.align.
CantAlignException
¶ Bases:
Exception
Exception raised when an aligner cannot start, e.g. because there are not enough sequences.
-
__init__
¶ Initialize self. See help(type(self)) for accurate signature.
-
args
¶
-
with_traceback
()¶ Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
-
-
class
schrodinger.protein.align.
AbstractAligner
¶ Bases:
object
Base class of objects that can perform an alignment
-
run
(aln)¶ Aligns the sequences in an alignment using the parameters supplied on init
Subclasses need to override this default implementation.
Parameters: aln ( schrodinger.protein.alignment.BaseAlignment
) – The alignment to align
-
__init__
¶ Initialize self. See help(type(self)) for accurate signature.
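The override pattern described for run can be sketched without the Schrodinger libraries. Everything below is a local stand-in: the base class, the exception, the PadRightAligner subclass, and the use of a plain list of strings as the alignment are all assumptions made for illustration, and the real default run may behave differently.

```python
class CantAlignException(Exception):
    """Stand-in for schrodinger.protein.align.CantAlignException."""


class AbstractAligner:
    """Stand-in base class; not the real schrodinger implementation."""

    def run(self, aln):
        # Assumption: subclasses are expected to replace this method.
        raise NotImplementedError


class PadRightAligner(AbstractAligner):
    """Hypothetical aligner that pads every sequence to a common length."""

    def __init__(self, gap_char="~"):
        self.gap_char = gap_char

    def run(self, aln):
        # Refuse to start when there is nothing to align against.
        if len(aln) < 2:
            raise CantAlignException("Need at least two sequences to align")
        width = max(len(seq) for seq in aln)
        # Modify the alignment in place, as run() returns nothing here.
        aln[:] = [seq.ljust(width, self.gap_char) for seq in aln]


aln = ["ARCADE", "CAN"]
PadRightAligner().run(aln)
print(aln)
```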
-
-
class
schrodinger.protein.align.
RescodeAligner
¶ Bases:
schrodinger.protein.align.AbstractAligner
Aligns sequences by rescode
-
run
(aln)¶ Aligns the sequences in an alignment using the parameters supplied on init
Subclasses need to override this default implementation.
Parameters: aln ( schrodinger.protein.alignment.BaseAlignment
) – The alignment to align
-
__init__
¶ Initialize self. See help(type(self)) for accurate signature.
-
-
class
schrodinger.protein.align.
AbstractPairwiseAligner
(merge_all=False)¶ Bases:
schrodinger.protein.align.AbstractAligner
Abstract class for pairwise alignment where gaps can be merged into the entire alignment to preserve relative alignment of all non-reference sequences to the reference sequence.
Subclasses must implement _getPairwiseGaps to align one sequence to the ref seq. Subclasses may override _run to customize aligning (e.g. validation or setup of additional data needed by _getPairwiseGaps).
-
__init__
(merge_all=False)¶ Parameters: merge_all (bool) – Whether to merge the sequence with the whole alignment or only up to itself.
-
run
(aln, seqs_to_align=None, **kwargs)¶ kwargs are additional arguments that will be passed to _run.
Parameters: - aln (alignment.Alignment) – The alignment containing sequences to align.
- seqs_to_align (list(sequence.Sequence)) – The sequences in aln to align against the reference sequence of aln. If None, defaults to the first non-reference sequence in aln (i.e. aln[1]).
Raises: CantAlignException – If seqs_to_align contains a sequence not found in aln.
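The documented contract of run (default to aln[1], reject sequences that are not in aln, forward kwargs to _run) can be modelled as a small template-method sketch. This is a toy under stated assumptions: the alignment is a plain list of strings with the reference first, the signatures of _run and _getPairwiseGaps are guesses, and LengthDiffAligner is a hypothetical subclass.

```python
class CantAlignException(Exception):
    """Stand-in for the documented exception."""


class PairwiseAlignerSketch:
    """Toy model of the documented run() contract; aln is a list of strings."""

    def run(self, aln, seqs_to_align=None, **kwargs):
        if len(aln) < 2:
            # Assumption: too few sequences also counts as "cannot start".
            raise CantAlignException("Need a reference and one other sequence")
        if seqs_to_align is None:
            # Documented default: the first non-reference sequence.
            seqs_to_align = [aln[1]]
        for seq in seqs_to_align:
            if seq not in aln:
                raise CantAlignException("Sequence is not in the alignment")
        return self._run(aln, seqs_to_align, **kwargs)

    def _run(self, aln, seqs_to_align, **kwargs):
        # Assumed signature: align each sequence against the reference.
        ref = aln[0]
        return [self._getPairwiseGaps(ref, seq) for seq in seqs_to_align]

    def _getPairwiseGaps(self, ref, seq):
        # Assumed signature: subclasses supply the pairwise step.
        raise NotImplementedError


class LengthDiffAligner(PairwiseAlignerSketch):
    """Hypothetical subclass: report trailing gaps needed by each sequence."""

    def _getPairwiseGaps(self, ref, seq):
        width = max(len(ref), len(seq))
        return (width - len(ref), width - len(seq))


aln = ["ARCADE", "CAN", "ADAM"]
default_gaps = LengthDiffAligner().run(aln)
chosen_gaps = LengthDiffAligner().run(aln, seqs_to_align=["ADAM"])
print(default_gaps, chosen_gaps)
```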
-
-
class
schrodinger.protein.align.
AbstractNWPairwiseAligner
(merge_all=False, gap_open_penalty=1, gap_extend_penalty=0, sub_matrix=None, direct_scores=False, ss_constraints=False)¶ Bases:
schrodinger.protein.align.AbstractPairwiseAligner
Abstract class for the Needleman-Wunsch global alignment algorithm for pairwise sequence alignment with affine gap penalties.
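For orientation, a global alignment score with affine gap penalties can be computed with the standard three-matrix (Gotoh) recurrence. The sketch below is a generic textbook version, not the implementation in this module. It assumes identity scoring with the documented default values (match bonus 1.0, mismatch penalty 1.0, gap open 1, gap extend 0), that the mismatch penalty is subtracted, and that a gap of length k costs gap_open + (k - 1) * gap_extend; none of these conventions is confirmed by this page.

```python
NEG_INF = float("-inf")


def nw_affine_score(seq_a, seq_b, match=1.0, mismatch=1.0,
                    gap_open=1.0, gap_extend=0.0):
    """Return the best global alignment score (score only, no traceback)."""
    n, m = len(seq_a), len(seq_b)
    # M: seq_a[i-1] is aligned to seq_b[j-1]
    # X: seq_a[i-1] is aligned to a gap
    # Y: seq_b[j-1] is aligned to a gap
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    # Leading gaps: one opening charge plus extensions.
    for i in range(1, n + 1):
        X[i][0] = -gap_open - (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = -gap_open - (j - 1) * gap_extend

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if seq_a[i - 1] == seq_b[j - 1] else -mismatch
            M[i][j] = pair + max(M[i - 1][j - 1],
                                 X[i - 1][j - 1],
                                 Y[i - 1][j - 1])
            # Open a new gap from M or Y, or extend an existing X gap.
            X[i][j] = max(M[i - 1][j] - gap_open,
                          Y[i - 1][j] - gap_open,
                          X[i - 1][j] - gap_extend)
            # Open a new gap from M or X, or extend an existing Y gap.
            Y[i][j] = max(M[i][j - 1] - gap_open,
                          X[i][j - 1] - gap_open,
                          Y[i][j - 1] - gap_extend)
    return max(M[n][m], X[n][m], Y[n][m])


print(nw_affine_score("ACDE", "AE"))
print(nw_affine_score("ACDE", "AE", gap_extend=0.5))
```

With a zero extension penalty, one long gap costs the same as a one-residue gap, so the aligner prefers a single contiguous gap over several scattered ones.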
Variables: - CONSTRAINT_SCORE – Reward amount for keeping constrained residues aligned
- RES_MATCH_BONUS – Reward amount for aligning matching residues. Used by default if a substitution matrix is not specified.
- RES_MISMATCH_PENALTY – Penalty for aligning differing residues. Used by default if a substitution matrix is not specified.
Ctype CONSTRAINT_SCORE: float
Ctype RES_MATCH_BONUS: float
Ctype RES_MISMATCH_PENALTY: float
-
CONSTRAINT_SCORE
= 10000¶
-
RES_MATCH_BONUS
= 1.0¶
-
RES_MISMATCH_PENALTY
= 1.0¶
-
__init__
(merge_all=False, gap_open_penalty=1, gap_extend_penalty=0, sub_matrix=None, direct_scores=False, ss_constraints=False)¶ Parameters: - merge_all (bool) – Whether to merge the sequence with the whole alignment or only up to itself.
- gap_open_penalty (float) – Penalty for opening a gap. Should be >=0.
- gap_extend_penalty (float) – Penalty for extending a gap. Should be >=0.
- sub_matrix (2D float array or dict mapping (char, char) to float) – Scoring matrix to be used for the alignment. If no matrix is specified, this method uses residue identity measure.
- direct_scores (bool) – Use scoring matrix directly as (NxM) where N, M are lengths of both sequences rather than default 20x20 substitution matrix.
- ss_constraints (bool) – Whether to constrain the alignment so no gaps appear in middle of a secondary structure.
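The three scoring modes described by sub_matrix and direct_scores differ only in how a pair score is looked up. The sketch below illustrates that difference with hypothetical helper functions and made-up matrix values; it assumes the mismatch penalty is subtracted and is not the lookup code used by this class.

```python
def residue_score(res_a, res_b, sub_matrix=None,
                  match_bonus=1.0, mismatch_penalty=1.0):
    """Score a residue pair by type (hypothetical helper)."""
    if sub_matrix is None:
        # No matrix: fall back to a residue identity measure.
        return match_bonus if res_a == res_b else -mismatch_penalty
    # dict form of sub_matrix: keys are (char, char) pairs.
    return sub_matrix[(res_a, res_b)]


def position_score(i, j, scores):
    """direct_scores mode: look up by position, not by residue type."""
    return scores[i][j]


# Toy substitution matrix with made-up values.
toy_matrix = {
    ("A", "A"): 4.0, ("A", "S"): 1.0,
    ("S", "A"): 1.0, ("S", "S"): 4.0,
}

seq_a, seq_b = "AS", "SAS"
# N x M matrix of made-up scores, N = len(seq_a), M = len(seq_b).
direct = [
    [0.0, 2.0, 0.0],
    [1.5, 0.0, 3.0],
]

print(residue_score("A", "S"))               # identity measure
print(residue_score("A", "S", toy_matrix))   # substitution matrix
print(position_score(1, 2, direct))          # direct position score
```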
-
run
(aln, seqs_to_align=None, **kwargs)¶ kwargs are additional arguments that will be passed to _run.
Parameters: - aln (alignment.Alignment) – The alignment containing sequences to align.
- seqs_to_align (list(sequence.Sequence)) – The sequences in aln to align against the reference sequence of aln. If None, defaults to the first non-reference sequence in aln (i.e. aln[1]).
Raises: CantAlignException – If seqs_to_align contains a sequence not found in aln.
-
class
schrodinger.protein.align.
SchrodingerPairwiseAligner
(**kwargs)¶ Bases:
schrodinger.protein.align.AbstractNWPairwiseAligner
Implementation of the Needleman-Wunsch global alignment algorithm for pairwise sequence alignment with affine gap penalties.
- ability to merge new sequence with existing alignment,
- ability to penalize gaps in secondary structure elements,
- ability to use custom substitution matrix generated from a family of proteins or provided by the user.
Note: Any residues with variant residue types will have their short codes uppercased. This means they will be treated identically to their standard variant. If a nonstandard residue type has a lowercase short code that doesn’t match its standard variant, or if we need special treatment for variant residues, _getMatrixValue will have to be changed.
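The effect of uppercasing short codes before a matrix lookup can be shown with a toy function. The function name, the matrix values, and the lowercase code "h" standing for a variant of "H" are all hypothetical; the real _getMatrixValue is not shown on this page.

```python
# Toy matrix with made-up values, keyed by uppercase short codes only.
toy_matrix = {
    ("H", "H"): 8.0, ("H", "A"): -2.0,
    ("A", "H"): -2.0, ("A", "A"): 4.0,
}


def matrix_value_uppercased(matrix, code_a, code_b):
    """Hypothetical lookup that uppercases both short codes first."""
    return matrix[(code_a.upper(), code_b.upper())]


# A variant written as lowercase "h" scores exactly like standard "H".
standard = matrix_value_uppercased(toy_matrix, "H", "A")
variant = matrix_value_uppercased(toy_matrix, "h", "A")
print(standard, variant)
```

The same normalization is why a lowercase code that does not match its standard variant would be scored as the wrong residue type.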
-
__init__
(**kwargs)¶ Parameters: - merge_all (bool) – Whether to merge the sequence with the whole alignment or only up to itself.
- gap_open_penalty (float) – Penalty for opening a gap. Should be >=0.
- gap_extend_penalty (float) – Penalty for extending a gap. Should be >=0.
- sub_matrix (2D float array or dict mapping (char, char) to float) – Scoring matrix to be used for the alignment. If no matrix is specified, this method uses residue identity measure.
- direct_scores (bool) – Use scoring matrix directly as (NxM) where N, M are lengths of both sequences rather than default 20x20 substitution matrix.
- ss_constraints (bool) – Whether to constrain the alignment so no gaps appear in middle of a secondary structure.
-
getAlignmentScore
()¶ Get the score of the alignment. Found by taking the highest value in the scoring matrix.
Returns: Score of the pairwise alignment. Return type: float
-
CONSTRAINT_SCORE
= 10000¶
-
RES_MATCH_BONUS
= 1.0¶
-
RES_MISMATCH_PENALTY
= 1.0¶
-
run
(aln, seqs_to_align=None, **kwargs)¶ kwargs are additional arguments that will be passed to _run.
Parameters: - aln (alignment.Alignment) – The alignment containing sequences to align.
- seqs_to_align (list(sequence.Sequence)) – The sequences in aln to align against the reference sequence of aln. If None, defaults to the first non-reference sequence in aln (i.e. aln[1]).
Raises: CantAlignException – If seqs_to_align contains a sequence not found in aln.
-
class
schrodinger.protein.align.
BiopythonPairwiseAligner
(*args, **kwargs)¶ Bases:
schrodinger.protein.align.AbstractNWPairwiseAligner
Pairwise alignment using Biopython.
Note: Any residues with variant residue types will have their short codes uppercased. This means they will be treated identically to their standard variant. If a nonstandard residue type has a lowercase short code that doesn’t match its standard variant, or if we need special treatment for variant residues, _getMatrixValue will have to be changed.
-
__init__
(*args, **kwargs)¶ Parameters: - merge_all (bool) – Whether to merge the sequence with the whole alignment or only up to itself.
- gap_open_penalty (float) – Penalty for opening a gap. Should be >=0.
- gap_extend_penalty (float) – Penalty for extending a gap. Should be >=0.
- sub_matrix (2D float array or dict mapping (char, char) to float) – Scoring matrix to be used for the alignment. If no matrix is specified, this method uses residue identity measure.
- direct_scores (bool) – Use scoring matrix directly as (NxM) where N, M are lengths of both sequences rather than default 20x20 substitution matrix.
- ss_constraints (bool) – Whether to constrain the alignment so no gaps appear in middle of a secondary structure.
-
getAlignmentScore
()¶ Get the score of the alignment. Found by taking the highest value in the scoring matrix.
Returns: Score of the pairwise alignment. Return type: float
-
CONSTRAINT_SCORE
= 10000¶
-
RES_MATCH_BONUS
= 1.0¶
-
RES_MISMATCH_PENALTY
= 1.0¶
-
run
(aln, seqs_to_align=None, **kwargs)¶ kwargs are additional arguments that will be passed to _run.
Parameters: - aln (alignment.Alignment) – The alignment containing sequences to align.
- seqs_to_align (list(sequence.Sequence)) – The sequences in aln to align against the reference sequence of aln. If None, defaults to the first non-reference sequence in aln (i.e. aln[1]).
Raises: CantAlignException – If seqs_to_align contains a sequence not found in aln.
-
class
schrodinger.protein.align.
PrimeSTAAligner
(protein_family=None)¶ Bases:
schrodinger.protein.align.AbstractAligner
Sequence alignment using $SCHRODINGER/sta
-
__init__
(protein_family=None)¶ Parameters: protein_family (str or NoneType) – ‘GPCR’ for specialized alignment or None for default templates.
-
run
(aln, structured_seq=None, constraints=None)¶ Parameters: - aln (alignment.Alignment) – The alignment containing sequences to align.
- structured_seq (sequence.ProteinSequence or NoneType) – Structured sequence to use as reference. If None, the first non-reference seq will be aligned.
- constraints (list(tuple(residue.Residue, residue.Residue)) or NoneType) – Pairs of (reference_seq, structured_seq) residues to constrain
-
-
class
schrodinger.protein.align.
ClustalAligner
¶ Bases:
schrodinger.protein.align.AbstractAligner
Aligns sequences using the Clustal alignment algorithm.
-
run
(aln)¶ Aligns the sequences in an alignment
Parameters: aln ( schrodinger.protein.alignment.BaseAlignment
) – The alignment to align
-
__init__
¶ Initialize self. See help(type(self)) for accurate signature.
-
-
class
schrodinger.protein.align.
SuperpositionAligner
(gap_open_penalty=None, gap_extend_penalty=None)¶ Bases:
schrodinger.protein.align.BiopythonPairwiseAligner
Align structured sequences based on their superposition.
-
__init__
(gap_open_penalty=None, gap_extend_penalty=None)¶ Parameters: - merge_all (bool) – Whether to merge the sequence with the whole alignment or only up to itself.
- gap_open_penalty (float) – Penalty for opening a gap. Should be >=0.
- gap_extend_penalty (float) – Penalty for extending a gap. Should be >=0.
- sub_matrix (2D float array or dict mapping (char, char) to float) – Scoring matrix to be used for the alignment. If no matrix is specified, this method uses residue identity measure.
- direct_scores (bool) – Use scoring matrix directly as (NxM) where N, M are lengths of both sequences rather than default 20x20 substitution matrix.
- ss_constraints (bool) – Whether to constrain the alignment so no gaps appear in middle of a secondary structure.
-
CONSTRAINT_SCORE
= 10000¶
-
RES_MATCH_BONUS
= 1.0¶
-
RES_MISMATCH_PENALTY
= 1.0¶
-
getAlignmentScore
()¶ Get the score of the alignment. Found by taking the highest value in the scoring matrix.
Returns: Score of the pairwise alignment. Return type: float
-
run
(aln, seqs_to_align=None, **kwargs)¶ kwargs are additional arguments that will be passed to _run.
Parameters: - aln (alignment.Alignment) – The alignment containing sequences to align.
- seqs_to_align (list(sequence.Sequence)) – The sequences in aln to align against the reference sequence of aln. If None, defaults to the first non-reference sequence in aln (i.e. aln[1]).
Raises: CantAlignException – If seqs_to_align contains a sequence not found in aln.
-
-
class
schrodinger.protein.align.
AbstractStructureAligner
(keywords=None, **kwargs)¶ Bases:
schrodinger.protein.align.AbstractAligner
Subclasses must reimplement run:
- Call _setUpSeqs to set up instance attributes for the current alignment
- Call _setASLs to validate and store ASLs
- Call _getUniqueEidSeqs to get the sequences to align
- Call _runStructureAlignment to call the backend
-
class
Result
(ref_seq, other_seq, psd, rmsd)¶ Bases:
tuple
-
__contains__
¶ Return key in self.
-
__init__
¶ Initialize self. See help(type(self)) for accurate signature.
-
__len__
¶ Return len(self).
-
count
()¶ Return number of occurrences of value.
-
index
()¶ Return first index of value.
Raises ValueError if the value is not present.
-
other_seq
¶ Alias for field number 1
-
psd
¶ Alias for field number 2
-
ref_seq
¶ Alias for field number 0
-
rmsd
¶ Alias for field number 3
-
-
__init__
(keywords=None, **kwargs)¶ Parameters: keywords (dict) – Keywords to pass to the ska backend
-
getResultSeqs
()¶
-
run
(aln)¶ Aligns the sequences in an alignment using the parameters supplied on init
Subclasses need to override this default implementation.
Parameters: aln ( schrodinger.protein.alignment.BaseAlignment
) – The alignment to align
-
-
class
schrodinger.protein.align.
StructureAligner
(keywords=None, **kwargs)¶ Bases:
schrodinger.protein.align.AbstractStructureAligner
Run structure alignment using the specified sequences to create chain ASLs
-
run
(aln, seqs_to_align, **kwargs)¶ Aligns the sequences in an alignment using the parameters supplied on init
Subclasses need to override this default implementation.
Parameters: aln ( schrodinger.protein.alignment.BaseAlignment
) – The alignment to align
-
class
Result
(ref_seq, other_seq, psd, rmsd)¶ Bases:
tuple
-
__contains__
¶ Return key in self.
-
__init__
¶ Initialize self. See help(type(self)) for accurate signature.
-
__len__
¶ Return len(self).
-
count
()¶ Return number of occurrences of value.
-
index
()¶ Return first index of value.
Raises ValueError if the value is not present.
-
other_seq
¶ Alias for field number 1
-
psd
¶ Alias for field number 2
-
ref_seq
¶ Alias for field number 0
-
rmsd
¶ Alias for field number 3
-
-
__init__
(keywords=None, **kwargs)¶ Parameters: keywords (dict) – Keywords to pass to the ska backend
-
getResultSeqs
()¶
-
-
class
schrodinger.protein.align.
CustomASLStructureAligner
(keywords=None, ref_asl=None, other_asl=None)¶ Bases:
schrodinger.protein.align.AbstractStructureAligner
Run structure alignment using specified ASLs
-
SENTINEL
= <object object>¶
-
__init__
(keywords=None, ref_asl=None, other_asl=None)¶ Parameters: keywords (dict) – Keywords to pass to the ska backend
-
evaluateASLs
(aln, seqs_to_align)¶ Determine whether the ASLs match any atoms in the sequences’ structures
Parameters: - aln – Alignment
- seqs_to_align – Sequences to align
Return type:
-
run
(aln, seqs_to_align, **kwargs)¶ Aligns the sequences in an alignment using the parameters supplied on init
Subclasses need to override this default implementation.
Parameters: aln ( schrodinger.protein.alignment.BaseAlignment
) – The alignment to align
-
class
Result
(ref_seq, other_seq, psd, rmsd)¶ Bases:
tuple
-
__contains__
¶ Return key in self.
-
__init__
¶ Initialize self. See help(type(self)) for accurate signature.
-
__len__
¶ Return len(self).
-
count
()¶ Return number of occurrences of value.
-
index
()¶ Return first index of value.
Raises ValueError if the value is not present.
-
other_seq
¶ Alias for field number 1
-
psd
¶ Alias for field number 2
-
ref_seq
¶ Alias for field number 0
-
rmsd
¶ Alias for field number 3
-
-
getResultSeqs
()¶
-
-
class
schrodinger.protein.align.
MaxIdentityAligner
¶ Bases:
schrodinger.protein.align.BiopythonPairwiseAligner
Pairwise aligner that maximizes the number of matching residues between two sequences. There are no penalties for mismatches or gaps.
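Maximizing the number of matching residues with no cost for mismatches or gaps is equivalent to finding a longest common subsequence. The sketch below computes that count with the classic dynamic program. It is a generic illustration on plain strings, not the Biopython-based implementation of this class, and the function name is hypothetical.

```python
def max_identity(seq_a, seq_b):
    """Return the largest number of residues that can be matched in order."""
    n, m = len(seq_a), len(seq_b)
    # best[i][j]: most matches between seq_a[:i] and seq_b[:j]
    best = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if seq_a[i - 1] == seq_b[j - 1]:
                # Matching residues add one to the best shorter prefix pair.
                best[i][j] = best[i - 1][j - 1] + 1
            else:
                # Gaps are free, so skip a residue in either sequence.
                best[i][j] = max(best[i - 1][j], best[i][j - 1])
    return best[n][m]


print(max_identity("ARCADE", "CANADA"))
```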
-
__init__
()¶ Parameters: - merge_all (bool) – Whether to merge the sequence with the whole alignment or only up to itself.
- gap_open_penalty (float) – Penalty for opening a gap. Should be >=0.
- gap_extend_penalty (float) – Penalty for extending a gap. Should be >=0.
- sub_matrix (2D float array or dict mapping (char, char) to float) – Scoring matrix to be used for the alignment. If no matrix is specified, this method uses residue identity measure.
- direct_scores (bool) – Use scoring matrix directly as (NxM) where N, M are lengths of both sequences rather than default 20x20 substitution matrix.
- ss_constraints (bool) – Whether to constrain the alignment so no gaps appear in middle of a secondary structure.
-
run
(aln)¶ kwargs are additional arguments that will be passed to _run.
Parameters: - aln (alignment.Alignment) – The alignment containing sequences to align.
- seqs_to_align (list(sequence.Sequence)) – The sequences in aln to align against the reference sequence of aln. If None, defaults to the first non-reference sequence in aln (i.e. aln[1]).
Raises: CantAlignException – If seqs_to_align contains a sequence not found in aln.
-
CONSTRAINT_SCORE
= 10000¶
-
RES_MATCH_BONUS
= 1.0¶
-
RES_MISMATCH_PENALTY
= 1.0¶
-
getAlignmentScore
()¶ Get the score of the alignment. Found by taking the highest value in the scoring matrix.
Returns: Score of the pairwise alignment. Return type: float
-
-
class
schrodinger.protein.align.
StructurelessGapAligner
¶ Bases:
schrodinger.protein.align.AbstractAligner
Align all structureless residues with gaps
For example, given the following alignment (where circled letters are structureless residues):
Resnum: 0 1 2 3 4 5
Seq1:   Ⓐ Ⓡ Ⓒ A D E
Seq2:   Ⓒ Ⓐ Ⓝ A D A
The result will be:
Resnum: 0 1 2 3 4 5 6 7 8
Seq1:   ~ ~ ~ Ⓐ Ⓡ Ⓒ A D E
Seq2:   Ⓒ Ⓐ Ⓝ ~ ~ ~ A D A
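The example above can be reproduced with a small toy function. This is only a sketch under several assumptions: sequences are gap-free strings, lowercase letters stand for structureless (circled) residues, and within a run of structureless residues the later sequences are placed first, which is chosen to match the example rather than taken from the real implementation.

```python
GAP = "~"


def gap_structureless(seqs):
    """Give every structureless (lowercase) residue a column of its own."""
    positions = [0] * len(seqs)
    columns = []

    def unfinished():
        return any(pos < len(seq) for pos, seq in zip(positions, seqs))

    while unfinished():
        # Emit structureless residues one per column, gaps elsewhere.
        # Later sequences go first (assumption, matches the example).
        for idx in reversed(range(len(seqs))):
            seq = seqs[idx]
            while positions[idx] < len(seq) and seq[positions[idx]].islower():
                column = [GAP] * len(seqs)
                column[idx] = seq[positions[idx]]
                columns.append(column)
                positions[idx] += 1
        # Structured residues share a single column.
        if unfinished():
            column = []
            for idx, seq in enumerate(seqs):
                if positions[idx] < len(seq):
                    column.append(seq[positions[idx]])
                    positions[idx] += 1
                else:
                    column.append(GAP)
            columns.append(column)

    return ["".join(col[idx] for col in columns) for idx in range(len(seqs))]


aligned = gap_structureless(["arcADE", "canADA"])
print("\n".join(aligned))
```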
-
run
(aln, seqs_to_align=None)¶ Aligns the sequences in an alignment using the parameters supplied on init
Subclasses need to override this default implementation.
Parameters: aln ( schrodinger.protein.alignment.BaseAlignment
) – The alignment to align
-
__init__
¶ Initialize self. See help(type(self)) for accurate signature.
-