schrodinger.protein.align module¶
-
class
schrodinger.protein.align.
ASLResult
(ref_ok, other_ok, other_skips)¶ Bases:
tuple
-
__contains__
(key, /)¶ Return key in self.
-
__len__
()¶ Return len(self).
-
count
(value, /)¶ Return number of occurrences of value.
-
index
(value, start=0, stop=9223372036854775807, /)¶ Return first index of value.
Raises ValueError if the value is not present.
-
other_ok
¶ Alias for field number 1
-
other_skips
¶ Alias for field number 2
-
ref_ok
¶ Alias for field number 0
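Since ASLResult is a tuple subclass with named fields, it can be read by field name or by position. A minimal sketch using a stand-in type with the same field names; the values are made up for illustration, and the actual types of the fields are not stated in this reference:

```python
from collections import namedtuple

# Stand-in with the same field names as the documented ASLResult.
# The values used below are illustrative assumptions only.
ASLResult = namedtuple("ASLResult", ["ref_ok", "other_ok", "other_skips"])

result = ASLResult(ref_ok=True, other_ok=False, other_skips=["seq_B"])

# Fields are reachable by name or by position (field numbers 0, 1, 2).
assert result.ref_ok is result[0]
assert result.other_ok is result[1]
assert result.other_skips is result[2]

# Tuple unpacking follows the declared field order.
ref_ok, other_ok, other_skips = result
print(ref_ok, other_ok, other_skips)
```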
-
-
exception
schrodinger.protein.align.
CantAlignException
¶ Bases:
Exception
Exception raised when an aligner cannot start, e.g. because there are not enough sequences.
-
__init__
(*args, **kwargs)¶ Initialize self. See help(type(self)) for accurate signature.
-
args
¶
-
with_traceback
()¶ Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
-
-
class
schrodinger.protein.align.
AbstractAligner
¶ Bases:
object
Base class of objects that can perform an alignment
-
abstract
run
(aln)¶ Aligns the sequences in an alignment using the parameters supplied on init
Subclasses need to override this default implementation.
- Parameters
aln (
schrodinger.protein.alignment.BaseAlignment
) – The alignment to align
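The subclassing contract can be sketched with Python's abc module. This is a stand-in, not the library's code: whether the real base class uses abc is an assumption, PadAligner is a hypothetical toy subclass, and the alignment is modeled as a plain list of strings rather than a BaseAlignment.

```python
from abc import ABC, abstractmethod


class AbstractAligner(ABC):
    """Stand-in for the documented base class (assumed to follow the abc pattern)."""

    @abstractmethod
    def run(self, aln):
        """Align the sequences in ``aln``."""


class PadAligner(AbstractAligner):
    """Hypothetical toy subclass: pads every sequence with trailing gaps."""

    GAP = "~"

    def run(self, aln):
        # Bring all sequences to the length of the longest one.
        width = max(len(seq) for seq in aln)
        return [seq + self.GAP * (width - len(seq)) for seq in aln]


# The base class cannot be instantiated because run() is abstract.
try:
    AbstractAligner()
    base_instantiable = True
except TypeError:
    base_instantiable = False

aligned = PadAligner().run(["ACD", "EF"])
print(aligned)
```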
-
-
class
schrodinger.protein.align.
RescodeAligner
¶ Bases:
schrodinger.protein.align.AbstractAligner
Aligns sequences by rescode
-
run
(aln)¶ Aligns the sequences in an alignment using the parameters supplied on init
Subclasses need to override this default implementation.
- Parameters
aln (
schrodinger.protein.alignment.BaseAlignment
) – The alignment to align
-
-
class
schrodinger.protein.align.
AbstractPairwiseAligner
(merge_all=False)¶ Bases:
schrodinger.protein.align.AbstractAligner
Abstract class for pairwise alignment where gaps can be merged into the entire alignment to preserve relative alignment of all non-reference sequences to the reference sequence.
Subclasses must implement _getPairwiseGaps to align one sequence to the reference sequence. Subclasses may override _run to customize aligning (e.g. validation or setup of additional data needed by _getPairwiseGaps).
-
__init__
(merge_all=False)¶ - Parameters
merge_all (bool) – Whether to merge the sequence with the whole alignment or only up to itself.
-
run
(aln, seqs_to_align=None, **kwargs)¶ kwargs are additional arguments that will be passed to _run.
- Parameters
aln (alignment.Alignment) – The alignment containing sequences to align.
seqs_to_align (list(sequence.Sequence)) – The sequences in aln to align against the reference sequence of aln. If None, defaults to the first non-reference sequence in aln (i.e. aln[1]).
- Raises
CantAlignException – If seqs_to_align contains a sequence not found in aln.
-
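The defaulting and validation rules documented for run can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: the alignment is modeled as a list with the reference sequence first, resolve_seqs_to_align is a hypothetical helper name, and the minimum-length check is inferred from the CantAlignException description.

```python
class CantAlignException(Exception):
    """Stand-in for the documented exception."""


def resolve_seqs_to_align(aln, seqs_to_align=None):
    """Return the sequences to align against the reference ``aln[0]``.

    ``aln`` is modeled as a plain list, reference sequence first (assumption).
    """
    if len(aln) < 2:
        # Assumed check: a reference plus at least one other sequence is needed.
        raise CantAlignException("Not enough sequences to align")
    if seqs_to_align is None:
        # Documented default: the first non-reference sequence, aln[1].
        return [aln[1]]
    for seq in seqs_to_align:
        if seq not in aln:
            raise CantAlignException(f"Sequence {seq!r} is not in the alignment")
    return list(seqs_to_align)


toy_aln = ["REFSEQ", "SEQA", "SEQB"]
print(resolve_seqs_to_align(toy_aln))
print(resolve_seqs_to_align(toy_aln, ["SEQB"]))
```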
-
class
schrodinger.protein.align.
AbstractNWPairwiseAligner
(merge_all=False, gap_open_penalty=1, gap_extend_penalty=0, sub_matrix=None, direct_scores=False, ss_constraints=False)¶ Bases:
schrodinger.protein.align.AbstractPairwiseAligner
Abstract class for the Needleman-Wunsch global alignment algorithm for pairwise sequence alignment with affine gap penalties.
- Variables
CONSTRAINT_SCORE (float) – Reward amount for keeping constrained residues aligned.
RES_MATCH_BONUS (float) – Reward amount for aligning matching residues. Used by default if a substitution matrix is not specified.
RES_MISMATCH_PENALTY (float) – Penalty for aligning differing residues. Used by default if a substitution matrix is not specified.
-
CONSTRAINT_SCORE
= 10000¶
-
RES_MATCH_BONUS
= 1.0¶
-
RES_MISMATCH_PENALTY
= 1.0¶
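For readers unfamiliar with affine gap penalties, the scoring can be sketched with the standard three-matrix (Gotoh) recurrences. This is a generic textbook sketch, not the library's implementation. It assumes a gap of length k costs gap_open + (k - 1) * gap_extend, uses the residue-identity defaults listed above, and computes only the score (no traceback, constraints, or substitution matrix). The function name nw_affine_score is hypothetical.

```python
import math

RES_MATCH_BONUS = 1.0
RES_MISMATCH_PENALTY = 1.0


def nw_affine_score(a, b, gap_open=1.0, gap_extend=0.0):
    """Global alignment score of strings ``a`` and ``b`` with affine gaps."""
    neg = -math.inf
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] aligned to a gap;
    # Y: b[j-1] aligned to a gap.
    M = [[neg] * (m + 1) for _ in range(n + 1)]
    X = [[neg] * (m + 1) for _ in range(n + 1)]
    Y = [[neg] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = -(gap_open + (i - 1) * gap_extend)
    for j in range(1, m + 1):
        Y[0][j] = -(gap_open + (j - 1) * gap_extend)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                s = RES_MATCH_BONUS
            else:
                s = -RES_MISMATCH_PENALTY
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a new gap pays gap_open; continuing one pays gap_extend.
            X[i][j] = max(M[i - 1][j] - gap_open,
                          X[i - 1][j] - gap_extend,
                          Y[i - 1][j] - gap_open)
            Y[i][j] = max(M[i][j - 1] - gap_open,
                          Y[i][j - 1] - gap_extend,
                          X[i][j - 1] - gap_open)
    return max(M[n][m], X[n][m], Y[n][m])


print(nw_affine_score("ACDE", "ADE"))  # three matches minus one gap opening
```

With the default gap_extend of 0, a long gap costs the same as a short one, so the sketch prefers one contiguous gap over several scattered ones.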
-
__init__
(merge_all=False, gap_open_penalty=1, gap_extend_penalty=0, sub_matrix=None, direct_scores=False, ss_constraints=False)¶ - Parameters
merge_all (bool) – Whether to merge the sequence with the whole alignment or only up to itself.
gap_open_penalty (float) – Penalty for opening a gap. Should be >=0.
gap_extend_penalty (float) – Penalty for extending a gap. Should be >=0.
sub_matrix (2D float array or dict mapping (char, char) to float) – Scoring matrix to be used for the alignment. If no matrix is specified, a residue identity measure is used.
direct_scores (bool) – Use the scoring matrix directly as an N x M matrix, where N and M are the lengths of the two sequences, rather than as the default 20x20 substitution matrix.
ss_constraints (bool) – Whether to constrain the alignment so that no gaps appear in the middle of a secondary structure.
-
run
(aln, seqs_to_align=None, **kwargs)¶ kwargs are additional arguments that will be passed to _run.
- Parameters
aln (alignment.Alignment) – The alignment containing sequences to align.
seqs_to_align (list(sequence.Sequence)) – The sequences in aln to align against the reference sequence of aln. If None, defaults to the first non-reference sequence in aln (i.e. aln[1]).
- Raises
CantAlignException – If seqs_to_align contains a sequence not found in aln.
-
class
schrodinger.protein.align.
SchrodingerPairwiseAligner
(**kwargs)¶ Bases:
schrodinger.protein.align.AbstractNWPairwiseAligner
Implementation of the Needleman-Wunsch global alignment algorithm for pairwise sequence alignment with affine gap penalties. It provides the ability to:
merge a new sequence with an existing alignment,
penalize gaps in secondary structure elements,
use a custom substitution matrix generated from a family of proteins or provided by the user.
- Note
Any residues with variant residue types will have their short codes uppercased. This means they will be treated identically to their standard variant. If a nonstandard residue type has a lowercase short code that doesn’t match its standard variant, or if we need special treatment for variant residues, _getMatrixValue will have to be changed.
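The effect of uppercasing short codes before scoring can be sketched as follows. This is an assumption-laden illustration, not the body of _getMatrixValue: score_pair and toy_matrix are hypothetical names, the matrix is the dict form mapping (char, char) to float, and the fallback when no matrix is given uses the residue-identity defaults documented above.

```python
RES_MATCH_BONUS = 1.0
RES_MISMATCH_PENALTY = 1.0


def score_pair(code_a, code_b, sub_matrix=None):
    """Score two one-letter residue codes, ignoring case."""
    # Variant residues (lowercase codes) are treated as their standard form.
    a, b = code_a.upper(), code_b.upper()
    if sub_matrix is None:
        # Residue-identity scoring when no matrix is supplied.
        return RES_MATCH_BONUS if a == b else -RES_MISMATCH_PENALTY
    return sub_matrix[(a, b)]


# Made-up scores for illustration; not taken from any published matrix.
toy_matrix = {
    ("A", "A"): 4.0, ("A", "S"): 1.0,
    ("S", "A"): 1.0, ("S", "S"): 4.0,
}

print(score_pair("a", "A"))              # variant 'a' scores as standard 'A'
print(score_pair("s", "A", toy_matrix))  # looked up as ('S', 'A')
```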
-
__init__
(**kwargs)¶ - Parameters
merge_all (bool) – Whether to merge the sequence with the whole alignment or only up to itself.
gap_open_penalty (float) – Penalty for opening a gap. Should be >=0.
gap_extend_penalty (float) – Penalty for extending a gap. Should be >=0.
sub_matrix (2D float array or dict mapping (char, char) to float) – Scoring matrix to be used for the alignment. If no matrix is specified, a residue identity measure is used.
direct_scores (bool) – Use the scoring matrix directly as an N x M matrix, where N and M are the lengths of the two sequences, rather than as the default 20x20 substitution matrix.
ss_constraints (bool) – Whether to constrain the alignment so that no gaps appear in the middle of a secondary structure.
-
getAlignmentScore
()¶ Get the score of the alignment. Found by taking the highest value in the scoring matrix.
- Returns
Score of the pairwise alignment.
- Return type
float
-
CONSTRAINT_SCORE
= 10000¶
-
RES_MATCH_BONUS
= 1.0¶
-
RES_MISMATCH_PENALTY
= 1.0¶
-
run
(aln, seqs_to_align=None, **kwargs)¶ kwargs are additional arguments that will be passed to _run.
- Parameters
aln (alignment.Alignment) – The alignment containing sequences to align.
seqs_to_align (list(sequence.Sequence)) – The sequences in aln to align against the reference sequence of aln. If None, defaults to the first non-reference sequence in aln (i.e. aln[1]).
- Raises
CantAlignException – If seqs_to_align contains a sequence not found in aln.
-
class
schrodinger.protein.align.
BiopythonPairwiseAligner
(*args, **kwargs)¶ Bases:
schrodinger.protein.align.AbstractNWPairwiseAligner
Pairwise alignment using Biopython.
- Note
Any residues with variant residue types will have their short codes uppercased. This means they will be treated identically to their standard variant. If a nonstandard residue type has a lowercase short code that doesn’t match its standard variant, or if we need special treatment for variant residues, _getMatrixValue will have to be changed.
-
__init__
(*args, **kwargs)¶ - Parameters
merge_all (bool) – Whether to merge the sequence with the whole alignment or only up to itself.
gap_open_penalty (float) – Penalty for opening a gap. Should be >=0.
gap_extend_penalty (float) – Penalty for extending a gap. Should be >=0.
sub_matrix (2D float array or dict mapping (char, char) to float) – Scoring matrix to be used for the alignment. If no matrix is specified, a residue identity measure is used.
direct_scores (bool) – Use the scoring matrix directly as an N x M matrix, where N and M are the lengths of the two sequences, rather than as the default 20x20 substitution matrix.
ss_constraints (bool) – Whether to constrain the alignment so that no gaps appear in the middle of a secondary structure.
-
getAlignmentScore
()¶ Get the score of the alignment. Found by taking the highest value in the scoring matrix.
- Returns
Score of the pairwise alignment.
- Return type
float
-
CONSTRAINT_SCORE
= 10000¶
-
RES_MATCH_BONUS
= 1.0¶
-
RES_MISMATCH_PENALTY
= 1.0¶
-
run
(aln, seqs_to_align=None, **kwargs)¶ kwargs are additional arguments that will be passed to _run.
- Parameters
aln (alignment.Alignment) – The alignment containing sequences to align.
seqs_to_align (list(sequence.Sequence)) – The sequences in aln to align against the reference sequence of aln. If None, defaults to the first non-reference sequence in aln (i.e. aln[1]).
- Raises
CantAlignException – If seqs_to_align contains a sequence not found in aln.
-
class
schrodinger.protein.align.
PrimeSTAAligner
(protein_family=None)¶ Bases:
schrodinger.protein.align.AbstractAligner
Sequence alignment using $SCHRODINGER/sta
-
__init__
(protein_family=None)¶ - Parameters
protein_family (str or NoneType) – ‘GPCR’ for specialized alignment or None for default templates.
-
run
(aln, structured_seq=None, constraints=None)¶ - Parameters
aln (alignment.Alignment) – The alignment containing sequences to align.
structured_seq (sequence.ProteinSequence or NoneType) – Structured sequence to use as reference. If None, the first non-reference seq will be aligned.
constraints (list(tuple(residue.Residue, residue.Residue)) or NoneType) – Pairs of (reference_seq, structured_seq) residues to constrain
-
-
class
schrodinger.protein.align.
ClustalAligner
¶ Bases:
schrodinger.protein.align.AbstractAligner
Aligns sequences using the Clustal alignment algorithm.
-
run
(aln)¶ Aligns the sequences in an alignment
- Parameters
aln (
schrodinger.protein.alignment.BaseAlignment
) – The alignment to align
-
-
class
schrodinger.protein.align.
SuperpositionAligner
(gap_open_penalty=None, gap_extend_penalty=None)¶ Bases:
schrodinger.protein.align.BiopythonPairwiseAligner
Align structured sequences based on their superposition.
-
__init__
(gap_open_penalty=None, gap_extend_penalty=None)¶ - Parameters
merge_all (bool) – Whether to merge the sequence with the whole alignment or only up to itself.
gap_open_penalty (float) – Penalty for opening a gap. Should be >=0.
gap_extend_penalty (float) – Penalty for extending a gap. Should be >=0.
sub_matrix (2D float array or dict mapping (char, char) to float) – Scoring matrix to be used for the alignment. If no matrix is specified, a residue identity measure is used.
direct_scores (bool) – Use the scoring matrix directly as an N x M matrix, where N and M are the lengths of the two sequences, rather than as the default 20x20 substitution matrix.
ss_constraints (bool) – Whether to constrain the alignment so that no gaps appear in the middle of a secondary structure.
-
CONSTRAINT_SCORE
= 10000¶
-
RES_MATCH_BONUS
= 1.0¶
-
RES_MISMATCH_PENALTY
= 1.0¶
-
getAlignmentScore
()¶ Get the score of the alignment. Found by taking the highest value in the scoring matrix.
- Returns
Score of the pairwise alignment.
- Return type
float
-
run
(aln, seqs_to_align=None, **kwargs)¶ kwargs are additional arguments that will be passed to _run.
- Parameters
aln (alignment.Alignment) – The alignment containing sequences to align.
seqs_to_align (list(sequence.Sequence)) – The sequences in aln to align against the reference sequence of aln. If None, defaults to the first non-reference sequence in aln (i.e. aln[1]).
- Raises
CantAlignException – If seqs_to_align contains a sequence not found in aln.
-
-
class
schrodinger.protein.align.
AbstractStructureAligner
(keywords=None, **kwargs)¶ Bases:
schrodinger.protein.align.AbstractAligner
Subclasses must reimplement run:
- Call _setUpSeqs to set up instance attributes for the current alignment
- Call _setASLs to validate and store ASLs
- Call _getUniqueEidSeqs to get the sequences to align
- Call _runStructureAlignment to call the backend
-
class
Result
(ref_seq, other_seq, psd, rmsd)¶ Bases:
tuple
-
__contains__
(key, /)¶ Return key in self.
-
__len__
()¶ Return len(self).
-
count
(value, /)¶ Return number of occurrences of value.
-
index
(value, start=0, stop=9223372036854775807, /)¶ Return first index of value.
Raises ValueError if the value is not present.
-
other_seq
¶ Alias for field number 1
-
psd
¶ Alias for field number 2
-
ref_seq
¶ Alias for field number 0
-
rmsd
¶ Alias for field number 3
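Result is a tuple with four named fields, so a collection of results can be filtered or ranked by field name. A minimal sketch with a stand-in type; the sequence names and the psd and rmsd values below are made up for illustration:

```python
from collections import namedtuple

# Stand-in with the same field names as the documented Result tuple.
Result = namedtuple("Result", ["ref_seq", "other_seq", "psd", "rmsd"])

# Illustrative values only; real results come from the structure alignment backend.
results = [
    Result(ref_seq="ref", other_seq="seqA", psd=0.42, rmsd=1.8),
    Result(ref_seq="ref", other_seq="seqB", psd=0.15, rmsd=0.9),
]

# Pick the result with the lowest RMSD.
best = min(results, key=lambda r: r.rmsd)
print(best.other_seq, best.rmsd)
```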
-
-
__init__
(keywords=None, **kwargs)¶ - Parameters
keywords (dict) – Keywords to pass to the ska backend
-
getResultSeqs
()¶
-
abstract
run
(aln)¶ Aligns the sequences in an alignment using the parameters supplied on init
Subclasses need to override this default implementation.
- Parameters
aln (
schrodinger.protein.alignment.BaseAlignment
) – The alignment to align
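The order of helper calls that the AbstractStructureAligner description asks a run implementation to follow can be sketched as below. The helper names come from that description, but everything else is an assumption: the signatures, return values, and stub bodies are hypothetical and only record the call order.

```python
class SketchStructureAligner:
    """Hypothetical stand-in for an AbstractStructureAligner subclass.

    Helper signatures and return values are assumptions for illustration.
    """

    def __init__(self):
        self.calls = []  # records the order in which the steps ran

    def _setUpSeqs(self, aln, seqs_to_align):
        self.calls.append("_setUpSeqs")
        self._seqs = list(seqs_to_align)

    def _setASLs(self):
        self.calls.append("_setASLs")

    def _getUniqueEidSeqs(self):
        self.calls.append("_getUniqueEidSeqs")
        return self._seqs

    def _runStructureAlignment(self, seqs):
        self.calls.append("_runStructureAlignment")
        return [f"aligned:{seq}" for seq in seqs]

    def run(self, aln, seqs_to_align):
        # The four documented steps, in the documented order.
        self._setUpSeqs(aln, seqs_to_align)
        self._setASLs()
        seqs = self._getUniqueEidSeqs()
        return self._runStructureAlignment(seqs)


aligner = SketchStructureAligner()
output = aligner.run(aln=["ref", "seqA"], seqs_to_align=["seqA"])
print(aligner.calls)
```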
-
-
class
schrodinger.protein.align.
StructureAligner
(keywords=None, **kwargs)¶ Bases:
schrodinger.protein.align.AbstractStructureAligner
Run structure alignment using the specified sequences to create chain ASLs
-
run
(aln, seqs_to_align, **kwargs)¶ Aligns the sequences in an alignment using the parameters supplied on init
Subclasses need to override this default implementation.
- Parameters
aln (
schrodinger.protein.alignment.BaseAlignment
) – The alignment to align
-
class
Result
(ref_seq, other_seq, psd, rmsd)¶ Bases:
tuple
-
__contains__
(key, /)¶ Return key in self.
-
__len__
()¶ Return len(self).
-
count
(value, /)¶ Return number of occurrences of value.
-
index
(value, start=0, stop=9223372036854775807, /)¶ Return first index of value.
Raises ValueError if the value is not present.
-
other_seq
¶ Alias for field number 1
-
psd
¶ Alias for field number 2
-
ref_seq
¶ Alias for field number 0
-
rmsd
¶ Alias for field number 3
-
-
__init__
(keywords=None, **kwargs)¶ - Parameters
keywords (dict) – Keywords to pass to the ska backend
-
getResultSeqs
()¶
-
-
class
schrodinger.protein.align.
CustomASLStructureAligner
(keywords=None, ref_asl=None, other_asl=None)¶ Bases:
schrodinger.protein.align.AbstractStructureAligner
Run structure alignment using specified ASLs
-
SENTINEL
= <object object>¶
-
__init__
(keywords=None, ref_asl=None, other_asl=None)¶ - Parameters
keywords (dict) – Keywords to pass to the ska backend
-
evaluateASLs
(aln, seqs_to_align)¶ Determine whether the ASLs match any atoms in the sequences’ structures
- Parameters
aln – Alignment
seqs_to_align – Sequences to align
- Return type
-
run
(aln, seqs_to_align, **kwargs)¶ Aligns the sequences in an alignment using the parameters supplied on init
Subclasses need to override this default implementation.
- Parameters
aln (
schrodinger.protein.alignment.BaseAlignment
) – The alignment to align
-
class
Result
(ref_seq, other_seq, psd, rmsd)¶ Bases:
tuple
-
__contains__
(key, /)¶ Return key in self.
-
__len__
()¶ Return len(self).
-
count
(value, /)¶ Return number of occurrences of value.
-
index
(value, start=0, stop=9223372036854775807, /)¶ Return first index of value.
Raises ValueError if the value is not present.
-
other_seq
¶ Alias for field number 1
-
psd
¶ Alias for field number 2
-
ref_seq
¶ Alias for field number 0
-
rmsd
¶ Alias for field number 3
-
-
getResultSeqs
()¶
-
-
class
schrodinger.protein.align.
MaxIdentityAligner
¶ Bases:
schrodinger.protein.align.BiopythonPairwiseAligner
Pairwise aligner that maximizes the number of matching residues between two sequences. There are no penalties for mismatches or gaps.
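Maximizing the number of matching residues with no mismatch or gap penalties is equivalent to finding a longest common subsequence. The sketch below computes that maximum count with the classic dynamic program; it is a generic illustration, not how this class is implemented (the class is documented as building on the Biopython aligner), and max_identity_count is a hypothetical name.

```python
def max_identity_count(seq_a, seq_b):
    """Return the largest number of identical residue pairs any alignment can have."""
    # prev[j] holds the best count for the processed prefix of seq_a vs seq_b[:j].
    prev = [0] * (len(seq_b) + 1)
    for a in seq_a:
        curr = [0]
        for j, b in enumerate(seq_b, start=1):
            if a == b:
                # Pair the two residues and extend the best shorter alignment.
                curr.append(prev[j - 1] + 1)
            else:
                # Skip a residue in either sequence at no cost.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


print(max_identity_count("ACDE", "ADE"))
```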
-
__init__
()¶ - Parameters
merge_all (bool) – Whether to merge the sequence with the whole alignment or only up to itself.
gap_open_penalty (float) – Penalty for opening a gap. Should be >=0.
gap_extend_penalty (float) – Penalty for extending a gap. Should be >=0.
sub_matrix (2D float array or dict mapping (char, char) to float) – Scoring matrix to be used for the alignment. If no matrix is specified, a residue identity measure is used.
direct_scores (bool) – Use the scoring matrix directly as an N x M matrix, where N and M are the lengths of the two sequences, rather than as the default 20x20 substitution matrix.
ss_constraints (bool) – Whether to constrain the alignment so that no gaps appear in the middle of a secondary structure.
-
run
(aln)¶ kwargs are additional arguments that will be passed to _run.
- Parameters
aln (alignment.Alignment) – The alignment containing sequences to align.
seqs_to_align (list(sequence.Sequence)) – The sequences in aln to align against the reference sequence of aln. If None, defaults to the first non-reference sequence in aln (i.e. aln[1]).
- Raises
CantAlignException – If seqs_to_align contains a sequence not found in aln.
-
CONSTRAINT_SCORE
= 10000¶
-
RES_MATCH_BONUS
= 1.0¶
-
RES_MISMATCH_PENALTY
= 1.0¶
-
getAlignmentScore
()¶ Get the score of the alignment. Found by taking the highest value in the scoring matrix.
- Returns
Score of the pairwise alignment.
- Return type
float
-
-
class
schrodinger.protein.align.
StructurelessGapAligner
¶ Bases:
schrodinger.protein.align.AbstractAligner
Align all structureless residues with gaps
For example, given the following alignment (where circled letters are structureless residues):
    Resnum: 0 1 2 3 4 5
    Seq1:   Ⓐ Ⓡ Ⓒ A D E
    Seq2:   Ⓒ Ⓐ Ⓝ A D A
The result will be:
    Resnum: 0 1 2 3 4 5 6 7 8
    Seq1:   ~ ~ ~ Ⓐ Ⓡ Ⓒ A D E
    Seq2:   Ⓒ Ⓐ Ⓝ ~ ~ ~ A D A
-
run
(aln, seqs_to_align=None)¶ Aligns the sequences in an alignment using the parameters supplied on init
Subclasses need to override this default implementation.
- Parameters
aln (
schrodinger.protein.alignment.BaseAlignment
) – The alignment to align
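The example in the class description can be reproduced with a small plain-Python sketch. It is only one way to obtain that output and may differ from the real implementation: residues are modeled as (code, has_structure) tuples, input sequences are assumed to be ungapped, and the rule that later sequences place their structureless residues first is chosen solely to match the documented example.

```python
GAP = "~"


def align_structureless(seqs):
    """Give every structureless residue a column of its own.

    ``seqs`` is a list of sequences, each a list of (code, has_structure)
    tuples. Returns one gapped string per sequence.
    """
    pos = [0] * len(seqs)
    out = [[] for _ in seqs]
    while any(p < len(s) for p, s in zip(pos, seqs)):
        # Sequences whose next residue is structureless.
        pending = [k for k, s in enumerate(seqs)
                   if pos[k] < len(s) and not s[pos[k]][1]]
        if pending:
            # Assumed ordering: the last such sequence goes first.
            k = pending[-1]
            for j in range(len(seqs)):
                out[j].append(seqs[k][pos[k]][0] if j == k else GAP)
            pos[k] += 1
        else:
            # All next residues are structured: keep them in one column.
            for j, s in enumerate(seqs):
                if pos[j] < len(s):
                    out[j].append(s[pos[j]][0])
                    pos[j] += 1
                else:
                    out[j].append(GAP)
    return ["".join(row) for row in out]


# First three residues of each sequence are structureless, as in the example.
seq1 = [(c, i >= 3) for i, c in enumerate("ARCADE")]
seq2 = [(c, i >= 3) for i, c in enumerate("CANADA")]
aligned = align_structureless([seq1, seq2])
print(aligned)
```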
-