schrodinger.application.msv.seqio module¶
-
class
schrodinger.application.msv.seqio.
FetchIDs
(pdb, entrez, uniprot)¶ Bases:
tuple
-
__contains__
(key, /)¶ Return key in self.
-
__len__
()¶ Return len(self).
-
count
(value, /)¶ Return number of occurrences of value.
-
entrez
¶ Alias for field number 1
-
index
(value, start=0, stop=9223372036854775807, /)¶ Return first index of value.
Raises ValueError if the value is not present.
-
pdb
¶ Alias for field number 0
-
uniprot
¶ Alias for field number 2
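Because FetchIDs is a tuple subclass with named fields, it should behave like a standard collections.namedtuple. The following is a minimal stand-in, assuming only the field order documented above; it redefines the type locally rather than importing it from seqio, and the example IDs are illustrative.

```python
from collections import namedtuple

# Local stand-in mirroring the documented field order (pdb, entrez, uniprot).
FetchIDs = namedtuple("FetchIDs", ["pdb", "entrez", "uniprot"])

ids = FetchIDs(pdb=["4hhb:A"], entrez=["NP_123456"], uniprot=[])

# Fields are reachable by name or by position ("Alias for field number 1").
same_field = ids.entrez is ids[1]

# Ordinary tuple unpacking also works.
pdb_ids, entrez_ids, uniprot_ids = ids
```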
-
-
exception
schrodinger.application.msv.seqio.
SequenceWarning
[source]¶ Bases:
UserWarning
Custom warning for problems loading sequences
-
__init__
(*args, **kwargs)¶ Initialize self. See help(type(self)) for accurate signature.
-
args
¶
-
with_traceback
()¶ Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
-
-
class
schrodinger.application.msv.seqio.
catch_sequence_warnings
(*args, **kwargs)[source]¶ Bases:
contextlib.ExitStack
Filter SequenceWarnings and store them on the instance
-
callback
(callback, /, *args, **kwds)¶ Registers an arbitrary callback and arguments.
Cannot suppress exceptions.
-
close
()¶ Immediately unwind the context stack.
-
enter_context
(cm)¶ Enters the supplied context manager.
If successful, also pushes its __exit__ method as a callback and returns the result of the __enter__ method.
-
pop_all
()¶ Preserve the context stack by transferring it to a new instance.
-
push
(exit)¶ Registers a callback with the standard __exit__ method signature.
Can suppress exceptions the same way __exit__ method can. Also accepts any object with an __exit__ method (registering a call to the method instead of the object itself).
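The attribute on which catch_sequence_warnings stores the collected warnings is not documented here. As a rough sketch of the general pattern only, an ExitStack subclass can enter warnings.catch_warnings(record=True) and keep the recorded list. The class names and the `messages` property below are hypothetical stand-ins, not the seqio API.

```python
import contextlib
import warnings


class SequenceWarning(UserWarning):
    """Local stand-in for seqio.SequenceWarning."""


class CatchSequenceWarnings(contextlib.ExitStack):
    """Hypothetical sketch: record SequenceWarnings raised inside the block."""

    def __enter__(self):
        super().__enter__()
        # catch_warnings(record=True) returns a list that collects every
        # warning that passes the active filters while the block is running.
        self._caught = self.enter_context(warnings.catch_warnings(record=True))
        warnings.simplefilter("always", SequenceWarning)
        return self

    @property
    def messages(self):
        # Keep only SequenceWarnings; other recorded warnings are ignored here.
        return [
            str(w.message)
            for w in self._caught
            if issubclass(w.category, SequenceWarning)
        ]


with CatchSequenceWarnings() as catcher:
    warnings.warn("unknown residue in chain A", SequenceWarning)
```

After the block exits, the previous warning filters are restored and `catcher.messages` still holds the recorded text.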
-
-
exception
schrodinger.application.msv.seqio.
GetSequencesException
[source]¶ Bases:
OSError
Custom Exception for problems retrieving sequences.
-
__init__
(*args, **kwargs)¶ Initialize self. See help(type(self)) for accurate signature.
-
args
¶
-
characters_written
¶
-
errno
¶ POSIX exception code
-
filename
¶ exception filename
-
filename2
¶ second exception filename
-
strerror
¶ exception strerror
-
with_traceback
()¶ Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
-
-
class
schrodinger.application.msv.seqio.
PdbParts
(pdbcode, pdbchain)¶ Bases:
tuple
-
__contains__
(key, /)¶ Return key in self.
-
__len__
()¶ Return len(self).
-
count
(value, /)¶ Return number of occurrences of value.
-
index
(value, start=0, stop=9223372036854775807, /)¶ Return first index of value.
Raises ValueError if the value is not present.
-
pdbchain
¶ Alias for field number 1
-
pdbcode
¶ Alias for field number 0
-
-
class
schrodinger.application.msv.seqio.
FastaParts
(name, long_name, chain, anno_type)¶ Bases:
tuple
-
__contains__
(key, /)¶ Return key in self.
-
__len__
()¶ Return len(self).
-
anno_type
¶ Alias for field number 3
-
chain
¶ Alias for field number 2
-
count
(value, /)¶ Return number of occurrences of value.
-
index
(value, start=0, stop=9223372036854775807, /)¶ Return first index of value.
Raises ValueError if the value is not present.
-
long_name
¶ Alias for field number 1
-
name
¶ Alias for field number 0
-
-
schrodinger.application.msv.seqio.
make_maestro_pdb_id
(pdb_id)[source]¶ Convert a PDB ID to “:”-separated PDB code and PDB chain (e.g. 4hhb if chain is blank or 4hhb:A)
- Parameters
pdb_id (str) – PDB ID with optional chain, e.g. 4hhb, 4hhbA, 4hhb:A, 4hhb_A
- Returns
PDB ID with “:” between PDB code and PDB chain
- Return type
str
-
schrodinger.application.msv.seqio.
parse_pdb_id
(pdb_id, permissive=False)[source]¶ Parse a PDB ID into a (pdb code, pdb chain) Named tuple.
- Parameters
pdb_id (str) – PDB ID with optional chain, e.g. 4hhb, 4hhbA, 4hhb:A, 4hhb_A
permissive (bool) – Whether to use permissive parsing. In strict mode, PDB ID must be 4 characters starting with a digit and single-letter chain is optional. In permissive mode, PDB ID can contain any non-whitespace characters but chain separator and single-letter chain are required.
- Returns
Named tuple of (pdbcode, pdbchain)
- Return type
PdbParts
- Raises
GetSequencesException – if pdb_id can’t be parsed
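The strict and permissive rules described above can be approximated with two regular expressions. This is a sketch based only on the documented behaviour, not the seqio implementation: the function names, the exact patterns, the accepted separators (":" and "_"), and the use of an empty string for a blank chain are all assumptions.

```python
import re
from collections import namedtuple

PdbParts = namedtuple("PdbParts", ["pdbcode", "pdbchain"])


class GetSequencesException(OSError):
    """Local stand-in for seqio.GetSequencesException."""


# Strict: 4 characters starting with a digit, optional separator and chain.
_STRICT = re.compile(r"(\d[A-Za-z0-9]{3})(?:[:_]?([A-Za-z]))?")
# Permissive: any non-whitespace code, but separator and chain are required.
_PERMISSIVE = re.compile(r"(\S+)[:_]([A-Za-z])")


def parse_pdb_id_sketch(pdb_id, permissive=False):
    """Split an ID such as 4hhb, 4hhbA, 4hhb:A or 4hhb_A into code and chain."""
    pattern = _PERMISSIVE if permissive else _STRICT
    match = pattern.fullmatch(pdb_id.strip())
    if match is None:
        raise GetSequencesException(f"Cannot parse PDB ID: {pdb_id!r}")
    code, chain = match.groups()
    return PdbParts(code, chain or "")


def make_maestro_pdb_id_sketch(pdb_id):
    """Return 'code:chain', or just 'code' when the chain is blank."""
    parts = parse_pdb_id_sketch(pdb_id)
    return f"{parts.pdbcode}:{parts.pdbchain}" if parts.pdbchain else parts.pdbcode
```

For example, `make_maestro_pdb_id_sketch("4hhb_A")` normalizes the separator to a colon, while a non-standard code such as `model_1:B` only parses with `permissive=True`.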
-
schrodinger.application.msv.seqio.
get_valid_pdb_id_map_for_seqs
(seqs, structureless_only=True)[source]¶ For a list of sequences return a map of valid PDB IDs to sequences.
- Parameters
seqs (list(sequence.Sequence)) – List of sequences to get the map for
structureless_only (bool) – Whether to only return structureless seqs
- Returns
Map of valid PDB IDs to their source sequence
- Return type
dict(str: sequence.Sequence)
-
schrodinger.application.msv.seqio.
valid_pdb_id
(pdb_id: str) → bool[source]¶ - Returns
Whether the ID appears to be a valid PDB ID
-
schrodinger.application.msv.seqio.
valid_entrez_id
(entrez_id: str) → bool[source]¶ Entrez ID may be:
1) NCBI Accession number: 9 or 12 characters starting with any letter, followed by “P_”, ending with 6 or 9 digits and an optional number following a period (e.g. NP_123456, XP_123456789.1)
2) NCBI GenInfo identifier: A single 9-digit number (e.g. 123456789).
- Returns
Whether the ID appears to be a valid Entrez ID
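The two Entrez forms described above translate fairly directly into a regular expression. The sketch below is one reading of that description, not the seqio implementation; in particular, it assumes the optional version suffix may have more than one digit.

```python
import re

# Accession: one letter, then "P_", then 6 or 9 digits, optional ".<number>".
_ACCESSION = re.compile(r"[A-Za-z]P_(?:\d{6}|\d{9})(?:\.\d+)?")
# GenInfo identifier: exactly nine digits.
_GENINFO = re.compile(r"\d{9}")


def looks_like_entrez_id(entrez_id):
    """Return True if the string matches either documented Entrez form."""
    return bool(_ACCESSION.fullmatch(entrez_id) or _GENINFO.fullmatch(entrez_id))
```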
-
schrodinger.application.msv.seqio.
valid_uniprot_id
(uniprot_id: str) → bool[source]¶ UniProt ID must be 6 characters or 10 characters starting with a letter
- Returns
Whether the ID appears to be a valid UniProt ID
-
schrodinger.application.msv.seqio.
valid_swiss_prot_name
(swiss_prot_name: str) → bool[source]¶ Swiss-Prot entry name must be of the form X_Y, where X and Y are at most 5 alphanumeric characters and the underscore serves as a separator.
We also require Y to be a minimum of 2 characters to avoid confusion with a PDB ID.
- Returns
Whether the name appears to be a valid Swiss-Prot entry name
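The UniProt and Swiss-Prot rules above can be sketched the same way. The patterns below assume that the non-leading characters of a UniProt ID are alphanumeric and that the X part of a Swiss-Prot name has at least one character; neither detail is stated in the descriptions, so treat this as an approximation rather than the seqio logic.

```python
import re

# UniProt: a letter followed by 5 or 9 more alphanumerics (6 or 10 in total).
_UNIPROT = re.compile(r"[A-Za-z][A-Za-z0-9]{5}(?:[A-Za-z0-9]{4})?")
# Swiss-Prot: X_Y with X of 1-5 and Y of 2-5 alphanumeric characters.
_SWISS_PROT = re.compile(r"[A-Za-z0-9]{1,5}_[A-Za-z0-9]{2,5}")


def looks_like_uniprot_id(uniprot_id):
    return bool(_UNIPROT.fullmatch(uniprot_id))


def looks_like_swiss_prot_name(name):
    return bool(_SWISS_PROT.fullmatch(name))
```

The two-character minimum on Y is what keeps a PDB ID with a chain, such as 4hhb_A, from being read as a Swiss-Prot entry name.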
-
schrodinger.application.msv.seqio.
process_fetch_ids
(ids, *, dialog_parent, allow_pdb=True)[source]¶ Convenience method to parse a list or comma-separated strings into valid sequence and/or structure identifiers. If any IDs can’t be identified, prompt the user to continue.
- Parameters
ids (str or list) – Database ID or IDs (comma-separated str or list)
dialog_parent (QtWidgets.QWidget) – Parent to show dialog box
allow_pdb (bool) – Whether to allow structure identifiers. If False, they will be treated as unidentified.
- Returns
Namedtuple of IDs identified as PDB, entrez, uniprot; or None if there are unidentified IDs and the user cancels.
- Return type
FetchIDs or NoneType
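Leaving the dialog aside, the sorting step can be pictured as splitting the input and placing each ID in the first bucket whose pattern matches. The sketch below is illustrative only: the patterns, the order in which they are tried, and the separate `unidentified` list (which the real function would presumably use to prompt the user) are assumptions, and `classify_ids_sketch` is a hypothetical name.

```python
import re
from collections import namedtuple

FetchIDs = namedtuple("FetchIDs", ["pdb", "entrez", "uniprot"])

# Approximate patterns derived from the validator descriptions in this module.
_PDB = re.compile(r"\d[A-Za-z0-9]{3}(?:[:_]?[A-Za-z])?")
_ENTREZ = re.compile(r"[A-Za-z]P_(?:\d{6}|\d{9})(?:\.\d+)?|\d{9}")
_UNIPROT = re.compile(r"[A-Za-z][A-Za-z0-9]{5}(?:[A-Za-z0-9]{4})?")


def classify_ids_sketch(ids, allow_pdb=True):
    """Return (FetchIDs, unidentified) for a list or comma-separated string."""
    if isinstance(ids, str):
        ids = ids.split(",")
    pdb, entrez, uniprot, unidentified = [], [], [], []
    for raw in ids:
        item = raw.strip()
        if not item:
            continue  # skip empty fragments such as a trailing comma
        if allow_pdb and _PDB.fullmatch(item):
            pdb.append(item)
        elif _ENTREZ.fullmatch(item):
            entrez.append(item)
        elif _UNIPROT.fullmatch(item):
            uniprot.append(item)
        else:
            unidentified.append(item)
    return FetchIDs(pdb, entrez, uniprot), unidentified
```

With `allow_pdb=False`, a structure identifier such as 4hhb:A falls through to the unidentified list, matching the documented behaviour of that flag.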
-
schrodinger.application.msv.seqio.
maestro_get_pdb
(maestro_pdb_id, pdb_dir=None, remote_ok=False)[source]¶ Download a PDB file. If specified, the chain will be split out into a separate file.
- Parameters
maestro_pdb_id (str) – 4-letter PDB code or code:chain (e.g. 4hhb or 4hhb:A)
pdb_dir (str) – directory to check for existing files and destination to download new files
remote_ok (bool) – whether it’s okay to make a remote query.
- Returns
downloaded PDB path
- Return type
str
- Raises
GetSequencesException – if pdb file can’t be downloaded
-
class
schrodinger.application.msv.seqio.
SeqDownloader
[source]¶ Bases:
object
-
ENTREZ_FORMAT_STR
= 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&rettype=fasta&id={ID}'¶
-
UNIPROT_FORMAT_STR
= 'https://www.uniprot.org/uniprot/{ID}.{EXT}'¶
-
classmethod
downloadPDB
(pdb_id, pdb_dir=None, remote_ok=False)[source]¶ Parse PDB ID string and download PDB file.
- Parameters
pdb_id (str) – PDB ID with optional chain (e.g. 4hhb, 4hhbA, 4hhb:A)
pdb_dir (str) – directory to check for existing files and destination to download new files
remote_ok (bool) – whether it’s okay to make a remote query.
- Returns
Full path to downloaded PDB path
- Return type
str
- Raises
GetSequencesException – if pdb file can’t be downloaded
-
classmethod
downloadEntrezSeq
(sequence_id, remote_ok)[source]¶ Download a sequence from Entrez database.
- Parameters
sequence_id (str) – Sequence ID in Entrez format.
remote_ok (bool) – whether it’s okay to make a remote query.
- Returns
Full path to downloaded fasta file
- Return type
str
-
classmethod
downloadUniprotSeq
(sequence_id, remote_ok, *, use_xml=False)[source]¶ Download a sequence from Uniprot database.
- Parameters
sequence_id (str) – Sequence ID in Uniprot format.
remote_ok (bool) – whether it’s okay to make a remote query.
use_xml (bool) – whether to get the xml file with the full UniProt annotation information (e.g. domains). Setting this to True will download the xml file instead of the FASTA file.
- Returns
Full path to downloaded fasta or xml file
- Return type
str
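The two format strings on SeqDownloader suggest how the request URLs are assembled. The sketch below copies those strings and fills in the placeholders; the helper names are hypothetical, and the "fasta" and "xml" extension values are assumptions inferred from the use_xml description rather than confirmed by the source.

```python
# Format strings as documented on SeqDownloader.
ENTREZ_FORMAT_STR = (
    "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
    "?db=protein&rettype=fasta&id={ID}"
)
UNIPROT_FORMAT_STR = "https://www.uniprot.org/uniprot/{ID}.{EXT}"


def entrez_url(sequence_id):
    """Build the Entrez efetch URL for one sequence ID."""
    return ENTREZ_FORMAT_STR.format(ID=sequence_id)


def uniprot_url(sequence_id, use_xml=False):
    """Build the UniProt URL; the extension values are assumed."""
    ext = "xml" if use_xml else "fasta"
    return UNIPROT_FORMAT_STR.format(ID=sequence_id, EXT=ext)
```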
-
-
schrodinger.application.msv.seqio.
read_sequences
(filename)[source]¶ Read sequences from the given file. The format is detected from the file extension.
Note that this function is only used for non-structure filetypes. For structure filetypes, see the StructureConverter class.
- Parameters
filename (str) – Path to sequence file
- Return type
list
- Returns
A list of sequences in the file
-
schrodinger.application.msv.seqio.
from_biopython
(biopy_seq)[source]¶ Convert a Biopython sequence to a ProteinSequence
- Parameters
biopy_seq (Bio.SeqRecord.SeqRecord) – A Biopython sequence to convert to a ProteinSequence
- Returns
The converted sequence
- Return type
schrodinger.protein.sequence.ProteinSequence
-
class
schrodinger.application.msv.seqio.
StructureConverter
(ct, eid=None)[source]¶ Bases:
object
Reads a structure and converts it to a list of sequences.
Note that this class produces sequences that are ordered based on residue number and insertion code, not connectivity. If that ever changes, structure_model.MaestroStructureModel._extractChains must also be updated.
-
__init__
(ct, eid=None)[source]¶ - Parameters
ct (schrodinger.structure.Structure) – A structure to convert to sequences.
eid (str) – The entry id to assign to the created sequences. If not given, the entry id from the structure will be used.
-
classmethod
convert
(ct, eid=None)[source]¶ Convert the provided structure into a list of sequences.
- Parameters
ct (schrodinger.structure.Structure) – A structure to convert to sequences.
eid (str) – The entry id to assign to the created sequences. If not given, the entry id from the structure will be used.
- Returns
A list of sequences, one per chain.
- Return type
list[sequence.Sequence]
-
makeSequences
()[source]¶ Note that disulfide bonds might be between chains, so they need to be calculated at the ct level.
- Returns
A list of sequences, one per chain.
- Return type
list[sequence.Sequence]
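The ordering rule stated for this class (residue number and insertion code, not connectivity) can be illustrated without any structure objects. In the sketch below, residues are plain (chain, resnum, inscode, one-letter code) tuples; that representation and the function name are assumptions made for the example and do not reflect how StructureConverter stores data.

```python
from collections import defaultdict


def sequences_by_chain(residues):
    """Group residue tuples by chain and order each chain by (resnum, inscode)."""
    chains = defaultdict(list)
    for chain, resnum, inscode, code in residues:
        chains[chain].append((resnum, inscode, code))
    # Sorting the tuples orders by residue number first, then insertion code;
    # a blank insertion code (" ") sorts before any letter.
    return {
        chain: "".join(code for _, _, code in sorted(items))
        for chain, items in chains.items()
    }


residues = [
    ("A", 2, " ", "L"),
    ("A", 1, " ", "V"),
    ("A", 1, "A", "H"),  # inserted residue 1A follows residue 1
    ("B", 5, " ", "G"),
]
per_chain = sequences_by_chain(residues)
```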
-
classmethod
convertStructResidue
(struct_res, make_res)[source]¶ Convert a structure._Residue into a residue.Residue.
- Parameters
struct_res (structure._Residue or residue.Residue) – A structure residue to convert. If this is a residue.Residue object, it will be returned unchanged.
make_res (callable) – A method to convert a string into a residue.Residue
- Returns
A newly created residue
- Return type
residue.Residue
-
-
class
schrodinger.application.msv.seqio.
MMSequenceConverter
[source]¶ Bases:
object
Converts sequence between mmseq and MSV sequence formats.
- Note
This class is intended to be used as a context manager in a ‘with’ statement.
-
classmethod
readSequences
(file_name, file_format=0)[source]¶ Reads all sequences from file specified by file_name.
- Parameters
file_name (str) – Name of input file.
file_format (int) – Format of the input file. By default, the format is MMSEQIO_ANY meaning file type is automatically recognized.
- Return type
- Returns
List of sequences read from the file.
- Raises
GetSequencesException – If the file could not be read.
-
classmethod
writeSequences
(sequences, file_name, file_format=1)[source]¶ Writes sequences to a file specified by file_name.
- Raises
mmcheck.MmException – If the file could not be opened for writing.
- Parameters
sequences – List of sequences to be written to file.
file_name (str) – Name of output file.
file_format (int) – Format of the output file. By default, the format is MMSEQIO_NATIVE.
-
class
schrodinger.application.msv.seqio.
BaseProteinAlignmentReader
[source]¶ Bases:
object
Base class for reading protein sequence alignments from files.
-
classmethod
read
(file_name, AlnCls=<class 'schrodinger.protein.alignment.ProteinAlignment'>)[source]¶ Returns alignment read from file
- Note
The alignment can be empty if no sequence was present in the input file.
- Parameters
file_name (str) – Source file name
AlnCls (type) – The type of the Alignment to return
- Returns
An alignment of the specified type
- Raises
IOError – If file cannot be read
-
-
class
schrodinger.application.msv.seqio.
ClustalAlignmentReader
[source]¶ Bases:
schrodinger.application.msv.seqio.BaseProteinAlignmentReader
Class for reading Clustal *.aln files.
-
class
schrodinger.application.msv.seqio.
SeqDReader
[source]¶ Bases:
object
-
REQUIRED_COLUMNS
= ('ResID', 'Chain', 'ResName')¶
-
-
class
schrodinger.application.msv.seqio.
FastaAlignmentReader
[source]¶ Bases:
object
-
classmethod
parseSSA
(seq)[source]¶ Parse an SSA sequence into a list of SSA values that can be assigned to residues’ secondary_structure property
- Parameters
seq (str) – the “sequence” from the FASTA file which encodes the SSA values
- Returns
a list of the SSA values. The SSA values come from schrodinger.structure. Returns None if any of the elements was invalid
- Return type
list(int) or NoneType
-
classmethod
read
(file_name, AlnClass=<class 'schrodinger.protein.alignment.ProteinAlignment'>)[source]¶ Loads a sequence file in FASTA format, creates sequences and appends them to alignment. Splits sequence name from the FASTA header.
- Parameters
file_name (str) – name of input FASTA file
AlnClass (type) – The class of the alignment object to return
- Returns
Read alignment.
- Return type
AlnClass
-
classmethod
readFromText
(lines, AlnClass=<class 'schrodinger.protein.alignment.ProteinAlignment'>)[source]¶ Read sequences from FASTA-formatted text, creates sequences and appends them to alignment. Splits sequence name from the FASTA header.
- Parameters
lines (list of str) – list of strings representing FASTA file
AlnClass (type) – The class of the alignment object to return
- Returns
The alignment
- Return type
AlnClass
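As a point of reference for what reading FASTA-formatted text involves, the sketch below collects (header, sequence) pairs from a list of lines. It is a generic FASTA reader written for illustration; it does not build alignment objects or split names from headers the way FastaAlignmentReader does, and the function name is hypothetical.

```python
def read_fasta_lines(lines):
    """Return a list of (header, sequence) pairs from FASTA-formatted lines."""
    records = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


records = read_fasta_lines([">1FSK:A", "MVLSP", "ADKTN", "", ">1FSK:B", "VHLTP"])
```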
-
classmethod
readFromStringList
(strings, AlnClass=<class 'schrodinger.protein.alignment.ProteinAlignment'>)[source]¶ Return an alignment object created from an iterable of sequence strings
- Parameters
strings (Iterable of strings) – Sequences as iterable of strings (1D codes)
AlnClass (type) – The class of the alignment object to return
- Returns
The alignment
- Return type
AlnClass
-
classmethod
-
schrodinger.application.msv.seqio.
to_biopython
(seq)[source]¶ Converts a sequence to a Biopython sequence
- Parameters
seq (schrodinger.protein.sequence.ProteinSequence) – A sequence to convert to a Biopython sequence
- Returns
The sequence converted to a Biopython SeqRecord
- Return type
Bio.SeqRecord.SeqRecord
-
class
schrodinger.application.msv.seqio.
BaseProteinAlignmentWriter
[source]¶ Bases:
object
Class for writing protein alignments to files.
-
class
schrodinger.application.msv.seqio.
FastaAlignmentWriter
[source]¶ Bases:
schrodinger.application.msv.seqio.BaseProteinAlignmentWriter
Class for writing FASTA .fasta files.
The format is described here: https://en.wikipedia.org/wiki/FASTA_format
-
HEADER_START
= '>'¶
-
HEADER_END
= ''¶
-
classmethod
toStringAndNames
(aln, use_unique_names=True, maxl=50, export_annotations=False, sim_ref_seq=None)[source]¶ Converts aln to FASTA string
- Parameters
aln (ProteinAlignment) – Structured sequences
use_unique_names (bool) – If True, write unique name for each sequence.
maxl (int) – Maximum length of a line
export_annotations (bool) – Whether annotations should be exported along with sequence information. If True, annotations listed in EXPORT_ANNOTATIONS will be exported.
sim_ref_seq (sequence.Sequence or None) – Reference sequence to calculate similarities for the sequences to be exported. If None, similarity will not be exported.
- Returns
FASTA string
- Return type
string
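The effect of maxl and the header prefix can be shown with a small generic writer. The sketch below takes (name, sequence) pairs instead of a ProteinAlignment and ignores unique naming, annotations and similarity export, so it illustrates only the line-wrapping behaviour under those simplifying assumptions. The function name is hypothetical.

```python
def to_fasta_string(named_seqs, maxl=50, header_start=">"):
    """Format (name, sequence) pairs as FASTA text wrapped at maxl characters."""
    lines = []
    for name, seq in named_seqs:
        lines.append(f"{header_start}{name}")
        # Slice the sequence into consecutive chunks of at most maxl characters.
        lines.extend(seq[i:i + maxl] for i in range(0, len(seq), maxl))
    return "\n".join(lines) + "\n"


text = to_fasta_string([("seq1", "ABCDEFG"), ("seq2", "HI")], maxl=3)
```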
-
classmethod
toStringList
(aln)[source]¶ Convert ProteinAlignment object to list of sequence strings
- Parameters
aln (ProteinAlignment) – Alignment data
- Return type
list of str
- Returns
A list of sequence strings representing the alignment
-
classmethod
write
(aln, file_name, use_unique_names=True, maxl=50, export_annotations=False, sim_ref_seq=None)[source]¶ Write aln to FASTA file
- Raises
IOError – If output file cannot be written.
- Parameters
aln (ProteinAlignment) – Structured sequences
use_unique_names (bool) – If True, write unique name for each sequence.
maxl (int) – Maximum length of a line
file_name (str) – Destination file name.
export_annotations (bool) – Whether annotations should be exported along with sequence information. If True, annotations listed in EXPORT_ANNOTATIONS will be exported.
sim_ref_seq (sequence.Sequence or None) – Reference sequence to calculate similarities for the sequences to be exported. If None, similarity will not be exported.
- Returns
output names of each sequence
- Return type
list of str
-
-
class
schrodinger.application.msv.seqio.
TextAlignmentWriter
[source]¶ Bases:
schrodinger.application.msv.seqio.FastaAlignmentWriter
Class for writing alignments in text format
-
HEADER_START
= ''¶
-
HEADER_END
= '. '¶
-
classmethod
toString
(aln, use_unique_names=True, maxl=50)¶
-
classmethod
toStringAndNames
(aln, use_unique_names=True, maxl=50, export_annotations=False, sim_ref_seq=None)¶ Converts aln to FASTA string
- Parameters
aln (ProteinAlignment) – Structured sequences
use_unique_names (bool) – If True, write unique name for each sequence.
maxl (int) – Maximum length of a line
export_annotations (bool) – Whether annotations should be exported along with sequence information. If True, annotations listed in EXPORT_ANNOTATIONS will be exported.
sim_ref_seq (sequence.Sequence or None) – Reference sequence to calculate similarities for the sequences to be exported. If None, similarity will not be exported.
- Returns
FASTA string
- Return type
string
-
classmethod
toStringList
(aln)¶ Convert ProteinAlignment object to list of sequence strings
- Parameters
aln (ProteinAlignment) – Alignment data
- Return type
list of str
- Returns
A list of sequence strings representing the alignment
-
classmethod
write
(aln, file_name, use_unique_names=True, maxl=50, export_annotations=False, sim_ref_seq=None)¶ Write aln to FASTA file
- Raises
IOError – If output file cannot be written.
- Parameters
aln (ProteinAlignment) – Structured sequences
use_unique_names (bool) – If True, write unique name for each sequence.
maxl (int) – Maximum length of a line
file_name (str) – Destination file name.
export_annotations (bool) – Whether annotations should be exported along with sequence information. If True, annotations listed in EXPORT_ANNOTATIONS will be exported.
sim_ref_seq (sequence.Sequence or None) – Reference sequence to calculate similarities for the sequences to be exported. If None, similarity will not be exported.
- Returns
output names of each sequence
- Return type
list of str
-
-
class
schrodinger.application.msv.seqio.
ClustalAlignmentWriter
[source]¶ Bases:
schrodinger.application.msv.seqio.BaseProteinAlignmentWriter
Class for writing Clustal *.aln files.
The format is described here:
http://meme-suite.org/doc/clustalw-format.html
-
classmethod
write
(aln, file_name, use_unique_names=True, **kwargs)[source]¶ Writes aln to a Clustal alignment file.
Note: **kwargs are ignored, to preserve signature of BaseProteinAlignmentWriter
- Raises
IOError – If output file cannot be written.
- Parameters
aln (BaseAlignment) – Alignment to be written to a file.
file_name (str) – Destination file name.
use_unique_names (bool) – If True, write unique name for each sequence.
- Return type
dict
- Returns
A mapping of names written to the clustal file and sequences
-
classmethod
-
schrodinger.application.msv.seqio.
is_inhouse_header
(fasta_header)[source]¶ Test that the given fasta header is of the in-house format. The in-house format is given by “>NAME:<long_name>|CHAIN:<chain>” with an optional “|<anno_type>” flag on the end.
- Example: >NAME:ABC|CHAIN:X|SSA
>NAME:A|B|C|CHAIN:X x
- Parameters
fasta_header (str) – The fasta header to check
- Returns
Whether the header is in the in-house format
- Return type
bool
-
schrodinger.application.msv.seqio.
parse_in_house_header
(fasta_header)[source]¶ Parse the given fasta header if it is of the in-house format. The in-house format is given by “>NAME:<long_name>|CHAIN:<chain>” with an optional “|<anno_type>” flag on the end.
- Example: >NAME:ABC LONG|CHAIN:X|SSA –> ABC LONG, X, secondary_structure
>NAME:A|B|C|CHAIN:X x –> A|B|C, X, None
- Parameters
fasta_header (str) – The fasta header to parse
- Returns
the long_name, chain and annotation type corresponding to the header
- Return type
tuple(str, str, PSAnno.ANNOTATION_TYPES or NoneType)
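The in-house header layout described above lends itself to a single regular expression. The sketch below is a guess at a compatible pattern, not the seqio implementation: it returns the raw annotation flag (for example "SSA") as a string instead of mapping it to an annotation type, and it returns None for headers that do not match. The function name is hypothetical.

```python
import re

# ">NAME:<long_name>|CHAIN:<chain>" with an optional "|<anno_type>" suffix.
# The greedy long_name group lets the name itself contain "|" characters.
_IN_HOUSE = re.compile(
    r">NAME:(?P<long_name>.*)\|CHAIN:(?P<chain>[^|]*)(?:\|(?P<anno>[^|]+))?"
)


def parse_in_house_header_sketch(fasta_header):
    """Return (long_name, chain, anno_flag), or None if the header does not match."""
    match = _IN_HOUSE.fullmatch(fasta_header.strip())
    if match is None:
        return None
    return match.group("long_name"), match.group("chain"), match.group("anno")
```

A membership test in the spirit of is_inhouse_header is then just `parse_in_house_header_sketch(header) is not None`.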
-
schrodinger.application.msv.seqio.
parse_fasta_header
(header, permissive=True)[source]¶ Parse a FASTA header into a (name, long_name, chain, anno_type) Named tuple.
- Parameters
header (str) – The header for a single entry in a FASTA file (including leading comment character)
permissive (bool) – Whether to use permissive parsing. See
parse_pdb_id
for documentation.
- Returns
Named tuple of (name, long_name, chain, anno_type)
- Return type
FastaParts
-
schrodinger.application.msv.seqio.
parse_long_name
(long_name, permissive=True)[source]¶ Attempt to parse a long_name into a short name and a chain.
- Example: 1FSK:A –> 1FSK, A
2BJM.H VH CDR_LENGTH: 5 17 11 –> 2BJM, H
sp|accession|entry name –> accession, “”
- Parameters
long_name (str) – The long name to attempt to parse
permissive (bool) – Whether to use permissive parsing. See
parse_pdb_id
for documentation.
- Returns
A short name and a chain id
- Return type
-
schrodinger.application.msv.seqio.
reorder_fasta_alignment
(aln, orig_names)[source]¶ Reorder a FASTA alignment to match the order of names written to FASTA.
Intended for use after alignment methods that reorder the output.
Example usage:
    orig_names = seqio.FastaAlignmentWriter.write(orig_aln, input_filename)
    # run alignment method
    aln = seqio.FastaAlignmentReader.read(out_filename)
    reorder_fasta_alignment(aln, orig_names)
- Parameters
aln (alignment.BaseAlignment) – Alignment to reorder. Will be modified in place.
orig_names (list[str]) – Original order of sequence names as written to FASTA.
- Raises
ValueError – If the alignments have different lengths or mismatched names
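The reordering contract (in-place modification, ValueError on a length or name mismatch) can be sketched on a plain list of (name, sequence) pairs. This stand-in assumes names are unique and does not use alignment objects; the function name is hypothetical, and the real function may detect mismatches differently.

```python
def reorder_records(records, orig_names):
    """Reorder a list of (name, sequence) pairs in place to match orig_names."""
    if len(records) != len(orig_names):
        raise ValueError("Alignment and name list have different lengths")
    by_name = dict(records)
    # Duplicate names on either side, or a name that is missing, is a mismatch.
    if len(by_name) != len(records) or set(by_name) != set(orig_names):
        raise ValueError("Sequence names do not match the original names")
    # Slice assignment keeps the same list object, so callers see the change.
    records[:] = [(name, by_name[name]) for name in orig_names]


aligned = [("seq2", "HI--"), ("seq1", "ABCD")]
reorder_records(aligned, ["seq1", "seq2"])
```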