schrodinger.application.msv.gui.undoable_alignment module
-
class
schrodinger.application.msv.gui.undoable_alignment.
AlignmentSelectionModel
(aln)¶ Bases:
PyQt5.QtCore.QObject
A class that manages selection of residues in an undoable alignment. Because of limitations with Qt’s selection models, we store selection status in our own domain objects instead.
This class has an undo stack because selection is undoable in the MSV.
Variables: - _selection (weakref.WeakSet(schrodinger.protein.residue.Residue)) – The current selection state
- _old_selection (weakref.WeakSet(schrodinger.protein.residue.Residue)) – The selection state from last time selectionChanged was emitted
- selectionChanged (QtCore.pyqtSignal emitting set(schrodinger.protein.residue.Residue) and set(schrodinger.protein.residue.Residue)) – A signal emitted to notify listeners that the selection has changed, carrying the set of residues that have been selected and the set of residues that have been deselected. Note that the signal is emitted on a single-shot timer, so if the selection is modified several times in succession (e.g. by a “clear then select”), only one signal is emitted.
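The bookkeeping behind that single coalesced signal can be sketched in plain Python. This is a minimal illustration, not the module's implementation: `SelectionTracker`, its method names, and `flush()` (which stands in for the single-shot timer firing) are hypothetical, and residues are represented by strings.

```python
class SelectionTracker:
    """Plain-Python stand-in for the selection bookkeeping (hypothetical)."""

    def __init__(self):
        self._selection = set()      # current selection state
        self._old_selection = set()  # state at the last notification
        self.notifications = []      # recorded (selected, deselected) pairs

    def set_selection_state(self, residues, selected):
        if selected:
            self._selection.update(residues)
        else:
            self._selection.difference_update(residues)

    def clear_selection(self):
        self._selection.clear()

    def flush(self):
        """Stand-in for the timer firing: report the net change once."""
        selected = self._selection - self._old_selection
        deselected = self._old_selection - self._selection
        if selected or deselected:
            self.notifications.append((selected, deselected))
        self._old_selection = set(self._selection)


tracker = SelectionTracker()
tracker.set_selection_state(["A1", "A2"], True)
tracker.flush()
# A "clear then select" before the next flush produces one notification.
tracker.clear_selection()
tracker.set_selection_state(["A2", "A3"], True)
tracker.flush()
```

Because only the difference between the current and the previously reported state is emitted, the intermediate "everything deselected" state never reaches listeners.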
-
clearSelection
()¶ Unselect all residues.
-
getSelection
()¶ Returns: A set of currently selected residues Return type: set(schrodinger.protein.residue.Residue)
-
isSelected
(res)¶ Parameters: res (schrodinger.protein.residue.Residue) – The residue to determine the selection state of Returns: whether res is selected Return type: bool
-
onResiduesRemoved
(residue_selection)¶ When residues are about to be removed, deselect those residues.
-
onSequencesAboutToBeRemoved
(start, end)¶ When sequences are about to be removed, deselect all the residues in those sequences.
-
selectionChanged
-
setSelectionState
(residues, selected)¶ Set the selection state of the provided residues.
Parameters: - residues (iterable of schrodinger.protein.residue.Residue) – The residues to select/unselect
- selected (bool) – Whether to select or deselect the residues
-
setUndoStack
(undo_stack)¶ Parameters: undo_stack (QtWidgets.QUndoStack) – The undo stack to push commands onto
-
class
schrodinger.application.msv.gui.undoable_alignment.
ProteinAlignment
(sequences=None, is_workspace=False)¶ Bases:
object
A ProteinAlignment class that presents the same interface as a regular ProteinAlignment but optionally accomplishes mutating operations via a command stack.
If no command stack is set on the object, commands are executed but cannot be undone.
Undoable protein alignments have an AlignmentSelectionModel because they are intended for use in GUIs, whereas normal non-undoable alignments don’t have a selection model because their use cases (aligning sequences) don’t require a concept of selection.
-
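The "command stack if present, direct execution otherwise" behavior can be sketched without Qt. Everything below is a hypothetical stand-in: `SimpleUndoStack` imitates only the part of `QUndoStack` where `push()` calls `redo()` on the pushed command, and `UndoableList` is not the real `ProteinAlignment`.

```python
class AddSeqCommand:
    """Command that appends one item and can take it back."""

    def __init__(self, seqs, seq):
        self._seqs = seqs
        self._seq = seq

    def redo(self):
        self._seqs.append(self._seq)

    def undo(self):
        self._seqs.pop()


class SimpleUndoStack:
    """Minimal stand-in for an undo stack: push() executes the command."""

    def __init__(self):
        self._done = []

    def push(self, command):
        command.redo()
        self._done.append(command)

    def undo(self):
        if self._done:
            self._done.pop().undo()


class UndoableList:
    """Mutations go through the stack when one is set, else run directly."""

    def __init__(self):
        self.seqs = []
        self._undo_stack = None

    def set_undo_stack(self, undo_stack):
        self._undo_stack = undo_stack

    def add_seq(self, seq):
        command = AddSeqCommand(self.seqs, seq)
        if self._undo_stack is None:
            command.redo()  # executed, but cannot be undone
        else:
            self._undo_stack.push(command)


aln = UndoableList()
aln.add_seq("ACD")          # no stack yet: not undoable
stack = SimpleUndoStack()
aln.set_undo_stack(stack)
aln.add_seq("EFG")          # pushed onto the stack
stack.undo()                # removes "EFG"
stack.undo()                # nothing left to undo; "ACD" stays
```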
addDisulfideBond
(res1, res2)¶ Add a disulfide bond if both residues’ sequences are in the alignment
Parameters: - res1 (residue.Residue) – A residue to link with a disulfide bond
- res2 (residue.Residue) – Another residue to link with a disulfide bond
Raises: ValueError – if either sequence is not in the alignment
-
addGaps
(gap_indices)¶ Adds gaps to the alignment
Note: the length of the gap_indices list must match the number of sequences in the alignment. Parameters: gap_indices – A list of lists of gap indices, one for each sequence in the alignment.
-
addOrReplaceSeqs
(seqs, identifier_func)¶ Given seqs and an identifier_func, replaces seqs in the alignment matching the identifier_func and appends any additional seqs to the alignment
Parameters: - seqs (iterable of schrodinger.protein.sequence.Sequence) – The sequences to add to the alignment
- identifier_func (callable) – A key function to uniquely identify sequences
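The replace-or-append behavior can be illustrated on plain tuples. This is a sketch under assumptions: `add_or_replace` is a hypothetical helper that returns a new list rather than mutating an alignment, and the `(name, residues)` tuples stand in for sequence objects.

```python
def add_or_replace(existing, new_items, identifier_func):
    """Replace items whose identifier matches; append the rest."""
    result = list(existing)
    positions = {identifier_func(item): i for i, item in enumerate(result)}
    for item in new_items:
        key = identifier_func(item)
        if key in positions:
            result[positions[key]] = item   # replace in place
        else:
            positions[key] = len(result)
            result.append(item)             # no match: append
    return result


existing = [("1abc", "ACD"), ("2xyz", "EFG")]
incoming = [("2xyz", "EFGH"), ("3foo", "KLM")]
merged = add_or_replace(existing, incoming, identifier_func=lambda s: s[0])
```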
-
addResidues
(selection)¶ Adds the specified residues to the alignment
Parameters: selection (ResidueSelection) – A selection of residues
-
addSeq
(seq, index=None) Parameters: - seq (sequence.Sequence) – The sequence to add
- index (int) – The index at which to insert; if None, seq is appended
-
addSeqs
(seqs, index=None)¶ Add multiple sequences to the alignment
Parameters: - seqs (list of sequence.Sequence) – Sequences to add
- index (int) – The index at which to insert; if None, the sequences are appended
-
addSeqsByIndices
(seq_map) Insert sequences at the specified indices in the alignment. The sequences are added from lowest to highest index, which allows indices that are out of range of the current alignment until lower-indexed sequences have been added. Note that for indices that remain out of range, the corresponding sequence is simply appended to the end of the alignment.
Parameters: seq_map – Map of insertion indices to sequences to be added.
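The lowest-to-highest insertion order can be sketched with a plain list. `insert_by_indices` is a hypothetical helper, not the method itself; it relies on the fact that Python's `list.insert` appends when the index is past the end of the list.

```python
def insert_by_indices(seqs, seq_map):
    """Insert items at the requested indices, lowest index first."""
    result = list(seqs)
    for index in sorted(seq_map):
        # An index beyond the current length simply appends.
        result.insert(index, seq_map[index])
    return result


merged = insert_by_indices(["a", "b"], {3: "d", 1: "x", 10: "z"})
```

Index 3 is out of range for the original two-item list but becomes valid once the item at index 1 has been inserted; index 10 stays out of range, so its item lands at the end.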
-
alignmentLocked
(*args, **kwargs)¶ Whether every column in the alignment is locked
Return type: bool Returns: Whether the alignment is locked
-
all_annotations
-
annotations
-
appendSubalignment
(aln)¶ Append an alignment to this one
Parameters: aln (BaseAlignment or list of Sequence) – The alignment to append
-
calculateMatrix
(*args, **kwargs)¶ Calculates a substitution matrix based on the current alignment.
-
clear
(*args, **kwargs)¶ Remove all sequences and locked columns from the alignment.
-
columnHasAllSameResidues
(*args, **kwargs)¶ Return whether or not the column at a specified index has all the same residues (excluding gaps).
Note that if any unknown residues are present, the column will not be considered to be of all the same residue type.
Parameters: index (int) – Index to check for uniformity Returns: True if the column is of uniform identity, False otherwise. Return type: bool
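A minimal sketch of this check on plain strings follows. The gap character `~` matches the notation used elsewhere on this page, while `X` as the unknown-residue code and the choice to return False for a gap-only column are assumptions of the sketch.

```python
GAP = "~"
UNKNOWN = "X"  # assumed code for an unknown residue


def column_has_all_same_residues(rows, index):
    """True if all non-gap entries in the column are the same known residue."""
    column = [row[index] for row in rows if index < len(row)]
    residues = [c for c in column if c != GAP]
    if not residues or UNKNOWN in residues:
        return False
    return len(set(residues)) == 1


rows = ["ACD~", "A~DX", "AEDX"]
uniform = [column_has_all_same_residues(rows, i) for i in range(4)]
```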
-
columns
(*args, **kwargs)¶ Returns a range of alignment columns or all columns if indices are not specified.
Parameters: omit_gaps (bool) – Whether to omit gaps
-
connectSignals
()¶ Connect the signals in self.signals to the signals emitted by self._aln
-
disulfide_bonds
-
findPattern
(*args, **kwargs)¶ Finds a specified PROSITE pattern in all sequences.
Parameters: pattern (str) – PROSITE pattern to search in sequences. See protein.sequence.find_generalized_pattern for documentation. Returns: List of matching residues Return type: list of protein.residue.Residue
-
static
fromClustalFile
(file_name)¶ Returns alignment read from file in Clustal .aln format preserving order of sequences.
Parameters: file_name (str) – Source file name. Raises: IOError – If the input file cannot be read. Return type: ProteinAlignment
Returns: An alignment Note: The alignment can be empty if no sequence was present in the input file.
-
static
fromFastaFile
(file_name) Returns an alignment read from a file in FASTA format, preserving the order of sequences.
Raises: IOError – If the input file cannot be read. Return type: ProteinAlignment
Returns: Read alignment. The alignment can be empty if no sequence was present in the input file.
-
static
fromFastaString
(lines)¶ Read sequences from FASTA-formatted text, creates sequences and appends them to alignment. Splits sequence name from the FASTA header.
Parameters: lines (list of str) – list of strings representing FASTA file Return type: ProteinAlignment
Returns: The alignment
-
static
fromFastaStringList
(strings)¶ Return an alignment object created from an iterable of sequence strings
Parameters: strings (Iterable of strings) – Sequences as iterable of strings (1D codes)
-
getAlignedBlocks
(*args, **kwargs)¶ Returns the indices of aligned blocks (regions without gaps).
-
getAlignmentQualityByColumn
(*args, **kwargs)¶ Retrieve the alignment quality at a given column and update the cache if necessary.
Parameters: col_index (int) – Column of the residue
-
getColumn
(*args, **kwargs)¶ Returns single alignment column at index position. Optionally, filters out gaps if omit_gaps is True.
Parameters: - index (int) – The index in the alignment
- omit_gaps (bool) – Whether to omit the gaps
Return type: list
Returns: Single alignment column at index position.
-
getDiscontinuousSubalignment
(*args, **kwargs)¶ Given a list of indices, return a new alignment of sequences made up of the residues at those specified indices within this alignment.
Parameters: indices (list of (int, int)) – List of (seq index, residue index) tuples Returns: A new subalignment Return type: BaseAlignment
-
getEntropy
(*args, **kwargs)¶ Returns an alignment length array of residue entropy scores
-
getFrequencies
(*args, **kwargs) Returns a dict mapping residue types to their frequency in the alignment
Parameters: - exclude (list) – A list of sequences to exclude
- consider_gaps (bool) – Whether to consider gaps in calculating frequencies
-
getGapIndicesByKeyFunc
(*args, **kwargs)¶ Converts a gap_info list and key func into a list of gap indices
Gap information consists of (key for residue, number of gaps preceding it)
Parameters: - gap_info (list) – list of list of tuples
- key_func (function) – callable that takes a residue and returns a key
Return type: list of lists of int
Returns: A list of gaps for each sequence in the alignment
-
getGapOnlyColumns
(*args, **kwargs)¶ Returns a list of lists of indices for unlocked columns that contain only gaps
Return type: list Returns: List of list of indices
-
getGaps
(*args, **kwargs)¶ Returns a list of gap indices lists
Return type: list Returns: A list of lists of ints
-
getGapsByKeyFunc
(*args, **kwargs)¶ Given a key function to uniquely identify residues, build a list of lists with gap information for each sequence in the alignment
Gap information consists of (key for residue, number of gaps preceding it)
Parameters: key_func (function) – callable that takes a residue and returns a key Return type: list Returns: A list of lists with gaps information for each sequence in the alignment
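The gap-information format, and the conversion back to gap indices that getGapIndicesByKeyFunc describes, can be sketched for a single sequence. The representation is an assumption: gaps are `None`, residues are `(name, number)` tuples, and only residues preceded by at least one gap are recorded (the real methods may record every residue and handle trailing gaps differently).

```python
def gaps_by_key(seq, key_func):
    """Return [(key, n_gaps_preceding), ...] for one sequence."""
    info, pending = [], 0
    for element in seq:
        if element is None:      # a gap
            pending += 1
            continue
        if pending:
            info.append((key_func(element), pending))
        pending = 0
    return info


def gap_indices_from_info(residues, gap_info, key_func):
    """Rebuild gap indices from gap info for an ungapped residue list."""
    preceding = dict(gap_info)
    indices, position = [], 0
    for res in residues:
        n_gaps = preceding.get(key_func(res), 0)
        indices.extend(range(position, position + n_gaps))
        position += n_gaps + 1   # skip the gaps, then the residue itself
    return indices


def key(res):
    return res[1]                # hypothetical key: the residue number


gapped = [None, None, ("A", 1), ("C", 2), None, ("D", 3)]
info = gaps_by_key(gapped, key)
residues = [res for res in gapped if res is not None]
indices = gap_indices_from_info(residues, info, key)
```

Keying gaps to residues rather than to positions lets the same gap layout be reapplied after the sequence has been rebuilt, as long as the key still identifies each residue.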
-
getGlobalAnnotationData
(*args, **kwargs)¶ Returns column-level annotation data at an index in the alignment
Parameters: - index (int) – The index in the alignment
- annotation (enum.Enum) – An enum representing the requested annotation, if any
-
getHiddenSeqCount
(*args, **kwargs)¶ Return the number of sequences in the alignment that have an associated PT entry ID but are not currently visible in the Workspace.
Returns: number of hidden sequences Return type: int
-
getIdentities
(*args, **kwargs)¶ Returns an alignment-length list of bools indicating which columns have identical residues
Parameters: omit_gaps (bool) – Whether gaps should be excluded from a column.
-
getRedundantSequences
(*args, **kwargs)¶ Returns the indices of sequences below a specified identity threshold value.
Returns: The indices of sequences in the alignment below specified identity threshold Return type: list of int
-
getReferenceSeq
(*args, **kwargs)¶ Returns the sequence that has been set as reference sequence or None if there is no reference sequence.
Returns: The reference sequence or None Return type: Sequence or None
-
getResidueData
(*args, **kwargs)¶ Returns residue-level data for the specified sequence at the specified index in the alignment, or None if no data is available.
If annotation is specified, the residue-level information for the residue is returned. If not, the residue object itself is returned.
Parameters: - seqnum (int) – The index of the sequence in the alignment
- index (int) – The index of the residue in the sequence
- annotation (enum.Enum) – An enum representing the requested annotation, if any
-
getResidueIndices
(*args, **kwargs)¶ Returns the indices (in the alignment) of the specified residues
Parameters: residues – Return type: list of (sequence index, residue index) tuples Returns: A list of (int, int)
-
static
getReversedSequenceOrdering
(seq_indices)¶ Given a new ordering for sequences in an alignment, return an ordering that will restore the original order of sequences.
Given an alignment [a, b, c, d, e], an ordering of [3, 1, 4, 2, 0] will rearrange the sequences into [d, b, e, c, a]. We need an ordering of [4, 1, 3, 0, 2] to restore the original arrangement of [a, b, c, d, e]. This method is used in undo operations.
Parameters: seq_indices – A list with the new indices for sequences Type: list of int Return type: list of int Returns: An ordering list that will restore the original arrangement of sequences in the alignment
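The restoring order is the inverse permutation of the applied order. The sketch below reproduces the example above on a plain list; `reorder` and `reversed_ordering` are hypothetical stand-ins for reorderSequences and getReversedSequenceOrdering, not their actual code.

```python
def reorder(items, seq_indices):
    """New position i receives the item that was at seq_indices[i]."""
    if sorted(seq_indices) != list(range(len(items))):
        raise ValueError("seq_indices must be a permutation of the indices")
    return [items[i] for i in seq_indices]


def reversed_ordering(seq_indices):
    """Return the ordering that undoes seq_indices."""
    inverse = [0] * len(seq_indices)
    for new_pos, old_pos in enumerate(seq_indices):
        inverse[old_pos] = new_pos
    return inverse


original = list("abcde")
ordering = [3, 1, 4, 2, 0]
shuffled = reorder(original, ordering)
restore = reversed_ordering(ordering)
restored = reorder(shuffled, restore)
```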
-
getSeqIndex
(*args, **kwargs) Parameters: seq (sequence.Sequence) – The requested sequence Return type: int Returns: The index of the requested sequence
-
getSimilarityScore
(*args, **kwargs)¶ Returns a sequence length array of similarity scores against the reference sequence
Gaps in the sequences are coded as None values.
-
getSubalignment
(*args, **kwargs)¶ Return another alignment containing the elements within the specified start and end indices
Parameters: - start (int) – The index at which the subalignment should start
- end (int) – The index at which the subalignment should end
Return type: BaseAlignment
Returns: An alignment corresponding to the start and end point specified
-
getTerminalGaps
(*args, **kwargs)¶ Returns the indices of terminal gaps in all the sequences
Return type: list Returns: A list of lists of ints
-
getVisibleSeqCount
(*args, **kwargs)¶ Return the number of visible sequences in the alignment.
Returns: number of visible sequences Return type: int
-
global_annotations
-
insertSubalignment
(aln, start)¶ Insert an alignment into the current alignment at the specified index
Parameters: - aln (BaseAlignment) – The alignment to insert
- start (int) – The index at which to insert the alignment
-
isReferenceSeq
(*args, **kwargs)¶ Return whether or not a sequence is the reference sequence.
Parameters: seq (Sequence) – Sequence to check Returns: True if the sequence is the reference sequence, False otherwise. Return type: bool
-
isWorkspace
() Returns: Whether this alignment is controlled by the structure model and only includes sequences that are currently included in the workspace. Return type: bool
-
iterResidues
(*args, **kwargs)¶ Yields a sequence of schrodinger.protein.residue.Residue objects in the alignment, omitting gaps.
-
lockedColumns
(*args, **kwargs)¶ Returns a set with indices of locked columns.
Return type: set Returns: A set of indices. The set is a copy of our internal set, so modifying it has no effect on our private attribute.
-
makeResidueSelection
(*args, **kwargs)¶ Returns a residue selection object matching the specified residues
Parameters: residues (list) – A list of residues Return type: ResidueSelection Returns: An object containing selection information
-
max_length
-
mergePairwiseAlignments
(*args, **kwargs)¶ Merges several pairwise alignments into one flat alignment while preserving relative residue positions. The original sequences are modified. After executing this function, all reference sequences (first pair members) will be identical.
Example. Let’s assume we have three pairwise query/template alignments:
Q1: ACDEFGHI T1: ~~DEF~~~
Q2: ~~~ACDEFGHI T2: TTT~~DE~~H~
Q3: ACDEF~~GHI~ T3: ACD~~PPGH~Y
Note the reference sequence is identical in all cases, but it has gaps in different positions. After running mergePairwiseAlignments, the result is:
Q1: ~~~ACDEF~~GHI T1: ~~~~~DEF~~~~~
Q2: ~~~ACDEF~~GHI T2: TTT~~DE~~~~H~
Q3: ~~~ACDEF~~GHI~ T3: ~~~ACD~~PPGH~Y
Now the queries have gaps in identical positions, and aligned residues are in positions equivalent to those in the original alignments.
Parameters: sequence_pairs (list of list of sequences) – List of [query, template] pairs.
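One way to carry out such a merge on plain strings is sketched below. It is an illustration under assumptions, not the method's actual algorithm: `split_slots` and `merge_pairwise` are hypothetical names, all queries must contain the same residues once gaps are removed, padding is placed after a pair's own insertions, and new strings are returned instead of modifying sequences. Unlike the listing above, the sketch also pads the trailing position, so every output row has the same length.

```python
GAP = "~"


def split_slots(query, template):
    """Split a pair into query-residue columns and insertion slots.

    slots[i] holds the template characters aligned to query gaps that
    precede query residue i; the last slot holds trailing insertions.
    """
    residues, slots, current = [], [], []
    for q, t in zip(query, template):
        if q == GAP:
            current.append(t)
        else:
            slots.append(current)
            current = []
            residues.append((q, t))
    slots.append(current)
    return residues, slots


def merge_pairwise(pairs):
    """Give every query the same gaps while keeping each pairing intact."""
    parsed = [split_slots(q, t) for q, t in pairs]
    n_slots = len(parsed[0][1])
    widths = [max(len(slots[i]) for _, slots in parsed)
              for i in range(n_slots)]
    merged = []
    for residues, slots in parsed:
        q_out, t_out = [], []
        for i, width in enumerate(widths):
            q_out.append(GAP * width)
            t_out.append("".join(slots[i]) + GAP * (width - len(slots[i])))
            if i < len(residues):
                q_out.append(residues[i][0])
                t_out.append(residues[i][1])
        merged.append(("".join(q_out), "".join(t_out)))
    return merged


pairs = [
    ("ACDEFGHI", "~~DEF~~~"),
    ("~~~ACDEFGHI", "TTT~~DE~~H~"),
    ("ACDEF~~GHI~", "ACD~~PPGH~Y"),
]
merged = merge_pairwise(pairs)
```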
-
minimizeAlignment
()¶ Minimizes the alignment, i.e. removes all gaps from the gap-only columns.
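Dropping gap-only columns can be sketched on plain strings. `minimize` is a hypothetical helper that uses `~` as the gap character and ignores column locks, which the real alignment also tracks.

```python
def minimize(rows, gap="~"):
    """Remove every column in which all rows have a gap."""
    width = max((len(row) for row in rows), default=0)
    keep = [
        i for i in range(width)
        if any(i < len(row) and row[i] != gap for row in rows)
    ]
    return ["".join(row[i] for i in keep if i < len(row)) for row in rows]


minimized = minimize(["A~~CD", "A~E~D", "~~ECD"])
```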
-
mutateResidues
(mutations)¶ Mutate the residues at the specified locations in the alignment
Note that the individual sequences will emit a signal announcing the mutation
Parameters: mutations (list of tuples (seq_i, res_i, replacement)) –
-
static
padAlignment
(aln)¶ Insert gaps into an alignment so that it forms a rectangular block
Parameters: aln (schrodinger.protein.Alignment) – An alignment to pad
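Padding to a rectangular block can be sketched on plain strings. The assumption here is that gaps are appended at the end of shorter rows; `pad_rows` is a hypothetical helper and `~` is the gap character.

```python
def pad_rows(rows, gap="~"):
    """Append gaps so that every row is as long as the longest one."""
    width = max((len(row) for row in rows), default=0)
    return [row + gap * (width - len(row)) for row in rows]


padded = pad_rows(["ACDE", "AC", "A~D"])
```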
-
removeAllGaps
()¶ Removes all the gaps of the sequences in the alignment. This also unlocks all columns
-
removeAllSeqs
()¶ Clears the entire alignment of sequences
-
removeDisulfideBond
(res1, res2)¶ Remove a disulfide bond if both residues’ sequences are in the alignment
Parameters: - res1 (residue.Residue) – One residue of the disulfide bond to remove
- res2 (residue.Residue) – The other residue of the disulfide bond to remove
Raises: ValueError – if either sequence is not in the alignment
-
removeGaps
(gap_indices)¶ Parameters: gap_indices (list of list of ints) – Indices of gaps to remove
-
removeResidues
(residues)¶ Removes the specified residues from the alignment and emits the signals.residuesRemoved signal with the selection
Parameters: residues (list) – The residues to remove
-
removeSeq
(seq)¶ Remove a sequence from the alignment
Parameters: seq (sequence.Sequence) – The sequence to remove
-
removeSeqByIndex
(index)¶ Remove a Sequence from the alignment
Parameters: index (int) – The index of the sequence to remove
-
removeSeqs
(seqs)¶ Remove multiple sequences from the alignment
-
removeSubalignment
(start, end)¶ Remove a block of the subalignment from the start to end points, including column locks in that region
Parameters: - start (int) – The start index of the columns to remove
- end (int) – The end index of the columns to remove
-
removeTerminalGaps
()¶ Removes the gaps from the ends of every sequence in the alignment
-
reorderSequences
(seq_indices)¶ Reorder the sequences in the alignment using the specified list of indices
Parameters: seq_indices – A list with the new indices for sequences Type: list of int Raises: ValueError – In the event that the list of indices does not match the length of the alignment
-
replaceResiduesWithGaps
(residues)¶ Replaces the specified residues with gaps
Parameters: residues (list) – A list of residues to replace with gaps
-
replaceSeq
(seq, index)¶ Replace the sequence at the specified index with the elements in the specified sequence
Note that this leaves the original sequence itself intact so that it continues to be monitored
Parameters: - seq (iterable of schrodinger.protein.residue.Residue) – The sequence whose elements we use
- index (int) – The index of the sequence to replace
-
replaceSubalignment
(aln, start, end)¶ Replace a subsection of the alignment indicated by start and end indices with the specified alignment
Parameters: - aln (BaseAlignment) – The alignment to insert
- start (int) – The index at which to insert the alignment
-
resMatchesReferenceRes
(*args, **kwargs)¶ Return True if the residue of a sequence at a column in the alignment matches the reference residue.
Parameters: - row_index (int) – Index of the sequence containing the residue to check
- col_index (int) – Column of the residue to check
Returns: True if the residue at the specified index matches the reference, False otherwise.
Return type: bool
-
seq_annotations
-
setAllLocks
(lock=True)¶ Convenience method to set all the locks to the specified lock state at once
Parameters: lock (bool) – Whether to lock or unlock the specified columns
-
setGaps
(gap_indices)¶ Sets gaps on the alignment
Parameters: gap_indices – A list of lists of gap indices, one for each sequence in the alignment.
-
setLockedColumns
(columns, lock=True)¶ Sets the columns to the specified lock state
Parameters: - columns (iterable) – an iterable of columns to set, specified by index
- lock (bool) – Whether to lock or unlock columns
- reset (bool) – Whether to reset the locks or add to existing ones
-
setReferenceSeq
(seq)¶ Set the specified sequence as the reference sequence.
Parameters: seq (sequence) – Sequence to set as reference sequence
-
setUndoStack
(undo_stack) Set the undo stack on the object.
Parameters: undo_stack (QtWidgets.QUndoStack) – The undo stack on which to push commands
-
sort
(key, reverse=False)¶ Sort the alignment by the specified criteria.
NOTE: Query sequence is not included in the sort.
Parameters: - key (function) – A function that takes a sequence and returns a value to sort by for each sequence.
- reverse – Whether to sort in reverse (descending) order.
-
staticMetaObject
-
toClustalFile
(*args, **kwargs)¶ Writes aln to a Clustal alignment file.
Raises: IOError – If output file cannot be written.
Parameters: - file_name (str) – Destination file name.
- use_unique_names (bool) – If True, write unique name for each sequence.
-
toFastaFile
(*args, **kwargs)¶ Write self to specified FASTA file
Raises: IOError – If output file cannot be written.
-
toFastaString
(*args, **kwargs)¶ Convert ProteinAlignment object to list of sequence strings
Parameters: aln (ProteinAlignment) – Alignment data
-
toFastaStringList
(*args, **kwargs)¶ Convert self to list of fasta sequence strings
Return type: list Returns: list of str
-