schrodinger.application.msv.seqio module¶
-
class
schrodinger.application.msv.seqio.FetchIDs(pdb, entrez, uniprot)¶ Bases:
tuple
-
__contains__(key, /)¶ Return key in self.
-
__len__()¶ Return len(self).
-
count(value, /)¶ Return number of occurrences of value.
-
entrez¶ Alias for field number 1
-
index(value, start=0, stop=9223372036854775807, /)¶ Return first index of value.
Raises ValueError if the value is not present.
-
pdb¶ Alias for field number 0
-
uniprot¶ Alias for field number 2
-
-
exception
schrodinger.application.msv.seqio.SequenceWarning[source]¶ Bases:
UserWarning
Custom warning for problems loading sequences.
-
__init__(*args, **kwargs)¶ Initialize self. See help(type(self)) for accurate signature.
-
args¶
-
with_traceback()¶ Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
-
-
class
schrodinger.application.msv.seqio.catch_sequence_warnings(*args, **kwargs)[source]¶ Bases:
contextlib.ExitStack
Filter SequenceWarnings and store them on the instance.
-
callback(callback, /, *args, **kwds)¶ Registers an arbitrary callback and arguments.
Cannot suppress exceptions.
-
close()¶ Immediately unwind the context stack.
-
enter_context(cm)¶ Enters the supplied context manager.
If successful, also pushes its __exit__ method as a callback and returns the result of the __enter__ method.
-
pop_all()¶ Preserve the context stack by transferring it to a new instance.
-
push(exit)¶ Registers a callback with the standard __exit__ method signature.
Can suppress exceptions the same way __exit__ method can. Also accepts any object with an __exit__ method (registering a call to the method instead of the object itself).
-
-
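catch_sequence_warnings is documented as an ExitStack subclass that filters SequenceWarnings and stores them on the instance. The sketch below shows one way to get that behavior with only the standard library. It is not the seqio implementation: the local SequenceWarning stand-in and the caught attribute name are assumptions for illustration.

```python
import contextlib
import warnings


class SequenceWarning(UserWarning):
    """Stand-in for seqio.SequenceWarning."""


class catch_sequence_warnings(contextlib.ExitStack):
    """Sketch: record SequenceWarnings raised in the block; re-emit all others."""

    def __enter__(self):
        super().__enter__()
        self.caught = []  # hypothetical attribute holding SequenceWarning instances
        # catch_warnings(record=True) returns the list that collects warnings.
        self._records = self.enter_context(warnings.catch_warnings(record=True))
        warnings.simplefilter("always")  # restored when catch_warnings exits
        return self

    def __exit__(self, *exc_info):
        # Unwind the stack first so the original warning filters are restored.
        suppress = super().__exit__(*exc_info)
        for rec in self._records:
            if issubclass(rec.category, SequenceWarning):
                self.caught.append(rec.message)
            else:
                # Warnings we are not meant to filter are passed on unchanged.
                warnings.warn_explicit(rec.message, rec.category,
                                       rec.filename, rec.lineno)
        return suppress
```

Building on ExitStack means any other context managers entered on the instance are unwound together with the warning filter.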
exception
schrodinger.application.msv.seqio.GetSequencesException[source]¶ Bases:
OSError
Custom exception for problems retrieving sequences.
-
__init__(*args, **kwargs)¶ Initialize self. See help(type(self)) for accurate signature.
-
args¶
-
characters_written¶
-
errno¶ POSIX exception code
-
filename¶ exception filename
-
filename2¶ second exception filename
-
strerror¶ exception strerror
-
with_traceback()¶ Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
-
-
class
schrodinger.application.msv.seqio.PdbParts(pdbcode, pdbchain)¶ Bases:
tuple
-
__contains__(key, /)¶ Return key in self.
-
__len__()¶ Return len(self).
-
count(value, /)¶ Return number of occurrences of value.
-
index(value, start=0, stop=9223372036854775807, /)¶ Return first index of value.
Raises ValueError if the value is not present.
-
pdbchain¶ Alias for field number 1
-
pdbcode¶ Alias for field number 0
-
-
class
schrodinger.application.msv.seqio.FastaParts(name, long_name, chain, anno_type)¶ Bases:
tuple
-
__contains__(key, /)¶ Return key in self.
-
__len__()¶ Return len(self).
-
anno_type¶ Alias for field number 3
-
chain¶ Alias for field number 2
-
count(value, /)¶ Return number of occurrences of value.
-
index(value, start=0, stop=9223372036854775807, /)¶ Return first index of value.
Raises ValueError if the value is not present.
-
long_name¶ Alias for field number 1
-
name¶ Alias for field number 0
-
-
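FetchIDs, PdbParts and FastaParts are named tuples, so each field can be read either by name or by its position (the "Alias for field number N" entries above). A sketch reproducing the documented field order with collections.namedtuple, assuming the classes add no behavior beyond what tuple provides; the sample values are illustrative only:

```python
from collections import namedtuple

# Field order taken from the class signatures above.
FetchIDs = namedtuple("FetchIDs", ["pdb", "entrez", "uniprot"])
PdbParts = namedtuple("PdbParts", ["pdbcode", "pdbchain"])
FastaParts = namedtuple("FastaParts", ["name", "long_name", "chain", "anno_type"])

parts = PdbParts("4hhb", "A")
# "pdbchain" is an alias for field number 1.
assert parts.pdbchain == parts[1] == "A"
code, chain = parts  # named tuples also unpack positionally
```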
schrodinger.application.msv.seqio.make_maestro_pdb_id(pdb_id)[source]¶ Convert a PDB ID to a “:”-separated PDB code and PDB chain (e.g. 4hhb if the chain is blank, or 4hhb:A).
- Parameters
pdb_id (str) – PDB ID with optional chain, e.g. 4hhb, 4hhbA, 4hhb:A, 4hhb_A
- Returns
PDB ID with “:” between PDB code and PDB chain
- Return type
str
-
schrodinger.application.msv.seqio.parse_pdb_id(pdb_id, permissive=False)[source]¶ Parse a PDB ID into a (pdb code, pdb chain) named tuple.
- Parameters
pdb_id (str) – PDB ID with optional chain, e.g. 4hhb, 4hhbA, 4hhb:A, 4hhb_A
permissive (bool) – Whether to use permissive parsing. In strict mode, the PDB ID must be 4 characters starting with a digit, and a single-letter chain is optional. In permissive mode, the PDB ID can contain any non-whitespace characters, but a chain separator and a single-letter chain are required.
- Returns
Named tuple of (pdbcode, pdbchain)
- Return type
PdbParts
- Raises
GetSequencesException – if pdb_id can’t be parsed
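The strict and permissive rules above can be expressed as two regular expressions. The sketch below is a hypothetical reimplementation, not the seqio code: the patterns, the accepted separators (“:” and “_”, taken from the examples) and the local GetSequencesException stand-in are assumptions. It also shows how make_maestro_pdb_id could build its “:”-separated result from the parsed parts.

```python
import re
from collections import namedtuple

PdbParts = namedtuple("PdbParts", ["pdbcode", "pdbchain"])

# Assumed patterns derived from the prose above, not copied from seqio.
_STRICT = re.compile(r"^(\d[A-Za-z0-9]{3})(?:[:_]?([A-Za-z0-9]))?$")
_PERMISSIVE = re.compile(r"^(\S+)[:_]([A-Za-z0-9])$")


class GetSequencesException(OSError):
    """Stand-in for seqio.GetSequencesException."""


def parse_pdb_id(pdb_id, permissive=False):
    pattern = _PERMISSIVE if permissive else _STRICT
    match = pattern.match(pdb_id.strip())
    if match is None:
        raise GetSequencesException(f"Can't parse PDB ID: {pdb_id!r}")
    code, chain = match.groups()
    return PdbParts(code, chain or "")  # blank chain when none was given


def make_maestro_pdb_id(pdb_id):
    code, chain = parse_pdb_id(pdb_id)
    # Omit the ":" entirely when the chain is blank, e.g. "4hhb".
    return f"{code}:{chain}" if chain else code
```

In permissive mode the greedy code group backtracks to the last separator, so an ID such as my_model_B splits into my_model and B.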
-
schrodinger.application.msv.seqio.get_valid_pdb_id_map_for_seqs(seqs, structureless_only=True)[source]¶ For a list of sequences, return a map of valid PDB IDs to sequences.
- Parameters
seqs (list(sequence.Sequence)) – List of sequences to get the map for
structureless_only (bool) – Whether to include only sequences that have no structure
- Returns
Map of valid PDB IDs to their source sequence
- Return type
dict(str: sequence.Sequence)
-
schrodinger.application.msv.seqio.valid_pdb_id(pdb_id: str) → bool[source]¶ - Returns
Whether the ID appears to be a valid PDB ID
-
schrodinger.application.msv.seqio.valid_entrez_id(entrez_id: str) → bool[source]¶ An Entrez ID may be either:
1) An NCBI accession number: 9 or 12 characters (not counting any version suffix), starting with any letter followed by “P_” and ending with 6 or 9 digits, optionally followed by a period and a version number (e.g. NP_123456, XP_123456789.1).
2) An NCBI GenInfo identifier: a single 9-digit number (e.g. 123456789).
- Returns
Whether the ID appears to be a valid Entrez ID
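The two Entrez ID forms described above map naturally onto two regular expressions. A minimal sketch, assuming the patterns below (reconstructed from the description, not taken from seqio):

```python
import re

# Accession: any letter, "P_", 6 or 9 digits, optional ".version".
_ACCESSION = re.compile(r"[A-Za-z]P_(?:\d{6}|\d{9})(?:\.\d+)?")
# GenInfo identifier: exactly 9 digits.
_GENINFO = re.compile(r"\d{9}")


def valid_entrez_id(entrez_id: str) -> bool:
    return bool(_ACCESSION.fullmatch(entrez_id) or _GENINFO.fullmatch(entrez_id))
```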
-
schrodinger.application.msv.seqio.valid_uniprot_id(uniprot_id: str) → bool[source]¶ A UniProt ID must be 6 or 10 characters long and start with a letter.
- Returns
Whether the ID appears to be a valid UniProt ID
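A sketch of the length rule stated above, assuming alphanumeric characters after the leading letter. Real UniProt accessions follow a stricter pattern, so this only approximates what the documented check accepts:

```python
import re

# A letter, 5 more alphanumerics, and optionally 4 more (6 or 10 in total).
_UNIPROT = re.compile(r"[A-Za-z][A-Za-z0-9]{5}(?:[A-Za-z0-9]{4})?")


def valid_uniprot_id(uniprot_id: str) -> bool:
    return _UNIPROT.fullmatch(uniprot_id) is not None
```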
-
schrodinger.application.msv.seqio.valid_swiss_prot_name(swiss_prot_name: str) → bool[source]¶ A Swiss-Prot entry name must have the form X_Y, where X and Y each consist of at most 5 alphanumeric characters and the underscore serves as a separator.
We also require Y to be at least 2 characters long to avoid confusion with a PDB ID such as 4hhb_A.
- Returns
Whether the name appears to be a valid Swiss-Prot entry name
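The X_Y rule, including the minimum length of 2 for Y, fits in one regular expression. A sketch under that assumption (not the seqio source):

```python
import re

# X: 1-5 alphanumerics; Y: 2-5 alphanumerics, so "4hhb_A" is rejected.
_SWISS_PROT = re.compile(r"[A-Za-z0-9]{1,5}_[A-Za-z0-9]{2,5}")


def valid_swiss_prot_name(swiss_prot_name: str) -> bool:
    return _SWISS_PROT.fullmatch(swiss_prot_name) is not None
```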
-
schrodinger.application.msv.seqio.process_fetch_ids(ids, *, dialog_parent, allow_pdb=True)[source]¶ Convenience function to parse a list or a comma-separated string of IDs into valid sequence and/or structure identifiers. If any IDs can’t be identified, the user is asked whether to continue.
- Parameters
ids (str or list) – Database ID or IDs (comma-separated str or list)
dialog_parent (QtWidgets.QWidget) – Parent to show dialog box
allow_pdb (bool) – Whether to allow structure identifiers. If False, they will be treated as unidentified.
- Returns
Named tuple of the IDs identified as PDB, Entrez, and UniProt IDs, or None if there are unidentified IDs and the user cancels.
- Return type
FetchIDs or NoneType
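process_fetch_ids needs a Qt dialog parent to ask the user about unidentified IDs. The sketch below is a hypothetical, dialog-free variant (split_fetch_ids is not part of seqio) that only illustrates the splitting and classification step: a comma-separated string is split, each ID is tested against simplified PDB, Entrez and UniProt patterns, and anything unmatched is returned separately instead of triggering a prompt. The patterns, the check order, and the use of lists for the FetchIDs fields are assumptions.

```python
import re
from collections import namedtuple

FetchIDs = namedtuple("FetchIDs", ["pdb", "entrez", "uniprot"])

# Simplified, assumed validators based on the valid_* descriptions above.
_PDB = re.compile(r"\d[A-Za-z0-9]{3}(?:[:_]?[A-Za-z0-9])?")
_ENTREZ = re.compile(r"[A-Za-z]P_(?:\d{6}|\d{9})(?:\.\d+)?|\d{9}")
_UNIPROT = re.compile(r"[A-Za-z][A-Za-z0-9]{5}(?:[A-Za-z0-9]{4})?")


def split_fetch_ids(ids, allow_pdb=True):
    """Hypothetical helper: return (FetchIDs, list of unidentified IDs)."""
    if isinstance(ids, str):
        ids = ids.split(",")
    pdb, entrez, uniprot, unknown = [], [], [], []
    for raw in ids:
        id_ = raw.strip()
        if not id_:
            continue  # skip empty entries such as a trailing comma
        if _PDB.fullmatch(id_):
            # Structure IDs count as unidentified when they are not allowed.
            (pdb if allow_pdb else unknown).append(id_)
        elif _ENTREZ.fullmatch(id_):
            entrez.append(id_)
        elif _UNIPROT.fullmatch(id_):
            uniprot.append(id_)
        else:
            unknown.append(id_)
    return FetchIDs(pdb, entrez, uniprot), unknown
```

A non-empty unknown list corresponds to the case in which process_fetch_ids would prompt the user.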
-
schrodinger.application.msv.seqio.maestro_get_pdb(maestro_pdb_id, pdb_dir=None, remote_ok=False)[source]¶ Download a PDB file. If a chain is specified, it will be split out into a separate file.
- Parameters
maestro_pdb_id (str) – 4-character PDB code or code:chain (e.g. 4hhb or 4hhb:A)
pdb_dir (str) – directory to check for existing files and to save newly downloaded files to
remote_ok (bool) – whether it’s okay to make a remote query.
- Returns
path to the downloaded PDB file
- Return type
str
- Raises
GetSequencesException – if the PDB file can’t be downloaded
-
class
schrodinger.application.msv.seqio.SeqDownloader[source]¶ Bases:
object
-
ENTREZ_FORMAT_STR= 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&rettype=fasta&id={ID}'¶
-
UNIPROT_FORMAT_STR= 'https://www.uniprot.org/uniprot/{ID}.{EXT}'¶
-
classmethod
downloadPDB(pdb_id, pdb_dir=None, remote_ok=False)[source]¶ Parse PDB ID string and download PDB file.
- Parameters
pdb_id (str) – PDB ID with optional chain (e.g. 4hhb, 4hhbA, 4hhb:A)
pdb_dir (str) – directory to check for existing files and to save newly downloaded files to
remote_ok (bool) – whether it’s okay to make a remote query.
- Returns
Full path to the downloaded PDB file
- Return type
str
- Raises
GetSequencesException – if the PDB file can’t be downloaded
-
classmethod
downloadEntrezSeq(sequence_id, remote_ok)[source]¶ Download a sequence from the Entrez database.
- Parameters
sequence_id (str) – Sequence ID in Entrez format.
remote_ok (bool) – whether it’s okay to make a remote query.
- Returns
Full path to the downloaded FASTA file
- Return type
str
-
classmethod
downloadUniprotSeq(sequence_id, remote_ok, *, use_xml=False)[source]¶ Download a sequence from the UniProt database.
- Parameters
sequence_id (str) – Sequence ID in UniProt format.
remote_ok (bool) – whether it’s okay to make a remote query.
use_xml (bool) – whether to get the XML file with the full UniProt annotation information (e.g. domains). Setting this to True will download the XML file instead of the FASTA file.
- Returns
Full path to the downloaded FASTA or XML file
- Return type
str
-
-
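ENTREZ_FORMAT_STR and UNIPROT_FORMAT_STR are ordinary str.format templates with {ID} and {EXT} placeholders. A sketch of filling them in offline (no request is made); the helper names are hypothetical, and the EXT values "fasta" and "xml" are assumed from the use_xml description of downloadUniprotSeq:

```python
ENTREZ_FORMAT_STR = ('https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi'
                     '?db=protein&rettype=fasta&id={ID}')
UNIPROT_FORMAT_STR = 'https://www.uniprot.org/uniprot/{ID}.{EXT}'


def entrez_url(sequence_id):
    # Hypothetical helper: the Entrez template only needs the ID.
    return ENTREZ_FORMAT_STR.format(ID=sequence_id)


def uniprot_url(sequence_id, use_xml=False):
    # Hypothetical helper; the EXT values are assumptions mirroring use_xml.
    return UNIPROT_FORMAT_STR.format(ID=sequence_id, EXT="xml" if use_xml else "fasta")
```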
schrodinger.application.msv.seqio.read_sequences(filename)[source]¶ Read sequences from the given file. The format is detected from the file extension.
Note that this function is only used for non-structure file types. For structure file types, see the StructureConverter class.
- Parameters
filename (str) – Path to sequence file
- Return type
list
- Returns
A list of sequences in the file
-
schrodinger.application.msv.seqio.from_biopython(biopy_seq)[source]¶ Convert a Biopython sequence to a ProteinSequence.
- Parameters
biopy_seq (Bio.SeqRecord.SeqRecord) – A Biopython sequence to convert to a ProteinSequence
- Returns
The converted sequence
- Return type
ProteinSequence
-
class
schrodinger.application.msv.seqio.StructureConverter(ct, eid=None)[source]¶ Bases:
object
Reads a structure and converts it to a list of sequences.
Note that this class produces sequences that are ordered based on residue number and insertion code, not connectivity. If that ever changes, structure_model.MaestroStructureModel._extractChains must also be updated.
-
__init__(ct, eid=None)[source]¶ - Parameters
ct (schrodinger.structure.Structure) – A structure to convert to sequences.
eid (str) – The entry id to assign to the created sequences. If not given, the entry id from the structure will be used.
-
classmethod
convert(ct, eid=None)[source]¶ Convert the provided structure into a list of sequences.
- Parameters
ct (schrodinger.structure.Structure) – A structure to convert to sequences.
eid (str) – The entry id to assign to the created sequences. If not given, the entry id from the structure will be used.
- Returns
A list of sequences, one per chain.
- Return type
list[sequence.Sequence]
-
makeSequences()[source]¶ Note that disulfide bonds might be between chains, so they need to be calculated at the ct level.
- Returns
A list of sequences, one per chain.
- Return type
list[sequence.Sequence]
-
classmethod
convertStructResidue(struct_res, make_res)[source]¶ Convert a structure._Residue into a residue.Residue.
- Parameters
struct_res (structure._Residue or residue.Residue) – A structure residue to convert. If this is a residue.Residue object, it will be returned unchanged.
make_res (callable) – A function to convert a string into a residue.Residue
- Returns
A newly created residue
- Return type
residue.Residue
-
-
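StructureConverter orders each chain's residues by residue number and insertion code rather than by connectivity. A minimal sketch of that ordering rule, using a hypothetical Res tuple in place of schrodinger.structure residues (a blank insertion code is assumed to be an empty string):

```python
from collections import defaultdict
from typing import NamedTuple


class Res(NamedTuple):
    """Hypothetical stand-in for a structure residue."""
    chain: str
    resnum: int
    inscode: str  # "" when there is no insertion code
    name: str


def order_residues_by_chain(residues):
    chains = defaultdict(list)
    for res in residues:
        chains[res.chain].append(res)
    # Sort by (residue number, insertion code), ignoring bond connectivity.
    return {chain: sorted(rs, key=lambda r: (r.resnum, r.inscode))
            for chain, rs in chains.items()}
```

Because the key ignores bonds, a residue inserted as 1B lands between 1 and 2 even if it is bonded elsewhere, which is why the note above ties this ordering to _extractChains.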
class
schrodinger.application.msv.seqio.MMSequenceConverter[source]¶ Bases:
object
Converts sequences between mmseq and MSV sequence formats.
- Note
This class is intended to be used as a context manager (in a with statement).
-
classmethod
readSequences(file_name, file_format=0)[source]¶ Reads all sequences from the file specified by file_name.
- Parameters
file_name (str) – Name of input file.
file_format (int) – Format of the input file. By default, the format is MMSEQIO_ANY, meaning the file type is recognized automatically.
- Return type
list
- Returns
List of sequences read from the file.
- Raises
GetSequencesException – If the file could not be read.
-
classmethod
writeSequences(sequences, file_name, file_format=1)[source]¶ Writes sequences to the file specified by file_name.
- Raises
mmcheck.MmException – If the file could not be opened for writing.
- Parameters
sequences – List of sequences to be written to the file.
file_name (str) – Name of output file.
file_format (int) – Format of the output file. By default, the format is MMSEQIO_NATIVE.
-
class
schrodinger.application.msv.seqio.BaseProteinAlignmentReader[source]¶ Bases:
objectBase class for reading protein sequence alignments from files.
-
classmethod
read(file_name, AlnCls=<class 'schrodinger.protein.alignment.ProteinAlignment'>)[source]¶ Return an alignment read from a file.
- Note
The alignment can be empty if no sequences are present in the input file.
- Parameters
file_name (str) – Source file name
AlnCls (type) – The type of the Alignment to return
- Returns
An alignment of the specified type
- Raises
IOError – If the file cannot be read
-
classmethod
-
class
schrodinger.application.msv.seqio.ClustalAlignmentReader[source]¶ Bases:
schrodinger.application.msv.seqio.BaseProteinAlignmentReader
Class for reading Clustal *.aln files.
-
class
schrodinger.application.msv.seqio.SeqDReader[source]¶ Bases:
object
-
REQUIRED_COLUMNS= ('ResID', 'Chain', 'ResName')¶
-
-
class
schrodinger.application.msv.seqio.FastaAlignmentReader[source]¶ Bases:
object
-
classmethod
parseSSA(seq)[source]¶ Parse an SSA sequence into a list of SSA values that can be assigned to residues’ secondary_structure property.
- Parameters
seq (str) – the “sequence” from the FASTA file that encodes the SSA values
- Returns
a list of the SSA values, which come from schrodinger.structure, or None if any of the elements was invalid
- Return type
list(int) or NoneType
-
classmethod
read(file_name, AlnClass=<class 'schrodinger.protein.alignment.ProteinAlignment'>)[source]¶ Load a sequence file in FASTA format, create sequences, and append them to an alignment. Sequence names are split from the FASTA headers.
- Parameters
file_name (str) – name of input FASTA file
AlnClass (type) – The class of the alignment object to return
- Returns
Read alignment.
- Return type
AlnClass
-
classmethod
readFromText(lines, AlnClass=<class 'schrodinger.protein.alignment.ProteinAlignment'>)[source]¶ Read sequences from FASTA-formatted text, create sequences, and append them to an alignment. Sequence names are split from the FASTA headers.
- Parameters
lines (list of str) – list of strings representing FASTA file
AlnClass (type) – The class of the alignment object to return
- Returns
The alignment
- Return type
AlnClass
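Reading FASTA text comes down to grouping lines under each “>” header and joining the sequence lines that follow it. A minimal sketch that returns plain (header, sequence) pairs instead of an alignment object; the function name is hypothetical, and ignoring sequence lines that appear before the first header is an assumption:

```python
def parse_fasta_lines(lines):
    """Hypothetical helper: return a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)  # wrapped sequence lines are concatenated
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```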
-
classmethod
readFromStringList(strings, AlnClass=<class 'schrodinger.protein.alignment.ProteinAlignment'>)[source]¶ Return an alignment object created from an iterable of sequence strings
- Parameters
strings (Iterable of strings) – Sequences as iterable of strings (1D codes)
AlnClass (type) – The class of the alignment object to return
- Returns
The alignment
- Return type
AlnClass
-
classmethod
-
schrodinger.application.msv.seqio.to_biopython(seq)[source]¶ Converts a sequence to a Biopython sequence
- Parameters
seq (schrodinger.protein.sequence.ProteinSequence) – A sequence to convert to a Biopython sequence
- Returns
The sequence converted to a Biopython SeqRecord
- Return type
Bio.SeqRecord.SeqRecord
-
class
schrodinger.application.msv.seqio.BaseProteinAlignmentWriter[source]¶ Bases:
object
Base class for writing protein alignments to files.
-
class
schrodinger.application.msv.seqio.FastaAlignmentWriter[source]¶ Bases:
schrodinger.application.msv.seqio.BaseProteinAlignmentWriter
Class for writing FASTA (.fasta) files.
The format is described on Wikipedia: https://en.wikipedia.org/wiki/FASTA_format
-
HEADER_START= '>'¶
-
HEADER_END= ''¶
-
classmethod
toStringAndNames(aln, use_unique_names=True, maxl=50, export_annotations=False, sim_ref_seq=None)[source]¶ Convert aln to a FASTA string.
- Parameters
aln (ProteinAlignment) – Structured sequences
use_unique_names (bool) – If True, write a unique name for each sequence.
maxl (int) – Maximum length of a line
export_annotations (bool) – Whether annotations should be exported along with sequence information. If True, annotations listed in EXPORT_ANNOTATIONS will be exported.
sim_ref_seq (sequence.Sequence or None) – Reference sequence used to calculate similarities for the exported sequences. If None, similarity will not be exported.
- Returns
FASTA string
- Return type
string
-
classmethod
toStringList(aln)[source]¶ Convert a ProteinAlignment object to a list of sequence strings.
- Parameters
aln (ProteinAlignment) – Alignment data
- Return type
list of str
- Returns
A list of sequence strings representing the alignment
-
classmethod
write(aln, file_name, use_unique_names=True, maxl=50, export_annotations=False, sim_ref_seq=None)[source]¶ Write aln to a FASTA file.
- Raises
IOError – If the output file cannot be written.
- Parameters
aln (ProteinAlignment) – Structured sequences
use_unique_names (bool) – If True, write a unique name for each sequence.
maxl (int) – Maximum length of a line
file_name (str) – Destination file name.
export_annotations (bool) – Whether annotations should be exported along with sequence information. If True, annotations listed in EXPORT_ANNOTATIONS will be exported.
sim_ref_seq (sequence.Sequence or None) – Reference sequence used to calculate similarities for the exported sequences. If None, similarity will not be exported.
- Returns
output names of each sequence
- Return type
list of str
-
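The maxl parameter limits the length of each sequence line, so long sequences are wrapped over several lines under a single header. A minimal sketch working on (name, sequence) pairs; the function name is hypothetical, and the header_start/header_end parameters mirror the HEADER_START and HEADER_END class attributes on the assumption that they simply bracket each name:

```python
def to_fasta_string(records, maxl=50, header_start=">", header_end=""):
    """Hypothetical helper: records is an iterable of (name, sequence) pairs."""
    out = []
    for name, seq in records:
        out.append(f"{header_start}{name}{header_end}")
        # Wrap the sequence so no line exceeds maxl characters.
        out.extend(seq[i:i + maxl] for i in range(0, len(seq), maxl))
    return "\n".join(out) + "\n"
```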
-
class
schrodinger.application.msv.seqio.TextAlignmentWriter[source]¶ Bases:
schrodinger.application.msv.seqio.FastaAlignmentWriter
Class for writing alignments in text format.
-
HEADER_START= ''¶
-
HEADER_END= '. '¶
-
classmethod
toString(aln, use_unique_names=True, maxl=50)¶
-
classmethod
toStringAndNames(aln, use_unique_names=True, maxl=50, export_annotations=False, sim_ref_seq=None)¶ Convert aln to a FASTA string.
- Parameters
aln (ProteinAlignment) – Structured sequences
use_unique_names (bool) – If True, write a unique name for each sequence.
maxl (int) – Maximum length of a line
export_annotations (bool) – Whether annotations should be exported along with sequence information. If True, annotations listed in EXPORT_ANNOTATIONS will be exported.
sim_ref_seq (sequence.Sequence or None) – Reference sequence used to calculate similarities for the exported sequences. If None, similarity will not be exported.
- Returns
FASTA string
- Return type
string
-
classmethod
toStringList(aln)¶ Convert a ProteinAlignment object to a list of sequence strings.
- Parameters
aln (ProteinAlignment) – Alignment data
- Return type
list of str
- Returns
A list of sequence strings representing the alignment
-
classmethod
write(aln, file_name, use_unique_names=True, maxl=50, export_annotations=False, sim_ref_seq=None)¶ Write aln to a FASTA file.
- Raises
IOError – If the output file cannot be written.
- Parameters
aln (ProteinAlignment) – Structured sequences
use_unique_names (bool) – If True, write a unique name for each sequence.
maxl (int) – Maximum length of a line
file_name (str) – Destination file name.
export_annotations (bool) – Whether annotations should be exported along with sequence information. If True, annotations listed in EXPORT_ANNOTATIONS will be exported.
sim_ref_seq (sequence.Sequence or None) – Reference sequence used to calculate similarities for the exported sequences. If None, similarity will not be exported.
- Returns
output names of each sequence
- Return type
list of str
-
-
class
schrodinger.application.msv.seqio.ClustalAlignmentWriter[source]¶ Bases:
schrodinger.application.msv.seqio.BaseProteinAlignmentWriter
Class for writing Clustal *.aln files.
The format is described here:
http://meme-suite.org/doc/clustalw-format.html
-
classmethod
write(aln, file_name, use_unique_names=True, **kwargs)[source]¶ Write aln to a Clustal alignment file.
Note: **kwargs are ignored; they are accepted only to preserve the signature of BaseProteinAlignmentWriter.
- Raises
IOError – If the output file cannot be written.
- Parameters
aln (BaseAlignment) – Alignment to be written to a file.
file_name (str) – Destination file name.
use_unique_names (bool) – If True, write a unique name for each sequence.
- Return type
dict
- Returns
A mapping of names written to the clustal file and sequences
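A Clustal file stores the alignment in interleaved blocks: a header line starting with CLUSTAL, a blank line, then blocks in which each sequence contributes one line of a name followed by a slice of its aligned sequence. A minimal sketch, assuming a block width of 60 residues and a fixed name column, and omitting the optional conservation line; the function name and spacing are hypothetical, not the writer's actual output:

```python
def to_clustal_string(records, width=60):
    """Hypothetical helper: records is a list of (name, aligned_sequence) pairs."""
    header = "CLUSTAL W multiple sequence alignment"
    if not records:
        return header + "\n"
    pad = max(len(name) for name, _ in records) + 6  # assumed column spacing
    length = max(len(seq) for _, seq in records)
    lines = [header, ""]
    for start in range(0, length, width):
        # One line per sequence in each block, then a blank separator line.
        for name, seq in records:
            lines.append(name.ljust(pad) + seq[start:start + width])
        lines.append("")
    return "\n".join(lines) + "\n"
```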
-
classmethod
-
schrodinger.application.msv.seqio.is_inhouse_header(fasta_header)[source]¶ Test whether the given FASTA header is in the in-house format. The in-house format is “>NAME:<long_name>|CHAIN:<chain>”, with an optional “|<anno_type>” flag at the end.
- Example: >NAME:ABC|CHAIN:X|SSA
>NAME:A|B|C|CHAIN:X x
- Parameters
fasta_header (str) – The fasta header to check
- Returns
Whether the header is in the in-house format
- Return type
bool
-
schrodinger.application.msv.seqio.parse_in_house_header(fasta_header)[source]¶ Parse a FASTA header in the in-house format. The in-house format is “>NAME:<long_name>|CHAIN:<chain>”, with an optional “|<anno_type>” flag at the end.
- Example: >NAME:ABC LONG|CHAIN:X|SSA –> ABC LONG, X, secondary_structure
>NAME:A|B|C|CHAIN:X x –> A|B|C, X, None
- Parameters
fasta_header (str) – The fasta header to parse
- Returns
the long_name, chain and annotation type corresponding to the header
- Return type
tuple(str, str, PSAnno.ANNOTATION_TYPES) or NoneType
-
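Both in-house header functions can be built on one regular expression: the long name may itself contain “|”, so the pattern anchors on the last “|CHAIN:” and treats a trailing “|<flag>” as the annotation type. A sketch under those assumptions; unlike the real parse_in_house_header, it returns the raw flag string (e.g. "SSA") rather than a PSAnno.ANNOTATION_TYPES value:

```python
import re

_INHOUSE = re.compile(
    r"^>NAME:(?P<long_name>.*)\|CHAIN:(?P<chain>[^|]*)(?:\|(?P<anno>[^|]+))?$")


def parse_in_house_header(fasta_header):
    match = _INHOUSE.match(fasta_header.strip())
    if match is None:
        return None
    # Greedy .* backtracks to the last "|CHAIN:", so "A|B|C" stays intact.
    return match.group("long_name"), match.group("chain"), match.group("anno")


def is_inhouse_header(fasta_header):
    return parse_in_house_header(fasta_header) is not None
```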
schrodinger.application.msv.seqio.parse_fasta_header(header, permissive=True)[source]¶ Parse a FASTA header into a (name, long_name, chain, anno_type) named tuple.
- Parameters
header (str) – The header for a single entry in a FASTA file (including the leading comment character)
permissive (bool) – Whether to use permissive parsing. See parse_pdb_id for documentation.
- Returns
Named tuple of (name, long_name, chain, anno_type)
- Return type
FastaParts
-
schrodinger.application.msv.seqio.parse_long_name(long_name, permissive=True)[source]¶ Attempt to parse a long_name into a short name and a chain.
- Example:
1FSK:A –> 1FSK, A
2BJM.H VH CDR_LENGTH: 5 17 11 –> 2BJM, H
sp|accession|entry name –> accession, “”
- Parameters
long_name (str) – The long name to attempt to parse
permissive (bool) – Whether to use permissive parsing. See parse_pdb_id for documentation.
- Returns
A short name and a chain ID
- Return type
tuple(str, str)
-
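The three examples above suggest a simple rule set: UniProt-style “db|accession|entry name” names yield the accession with a blank chain, and PDB-like names yield a 4-character code plus a one-character chain after “:”, “.” or “_”. The sketch below is a hypothetical reimplementation covering only those examples; the patterns, the sp/tr prefix check, the fallback to the first token, and the missing permissive parameter are all assumptions:

```python
import re

# Assumed PDB-like pattern: code, separator, one-character chain, then end or space.
_PDB_LIKE = re.compile(r"^([0-9][A-Za-z0-9]{3})[:._]([A-Za-z0-9])(?:\s|$)")


def parse_long_name(long_name):
    fields = long_name.split("|")
    if len(fields) >= 3 and fields[0] in ("sp", "tr"):
        # UniProt style "db|accession|entry name": keep the accession.
        return fields[1], ""
    match = _PDB_LIKE.match(long_name)
    if match:
        return match.group(1), match.group(2)
    # Assumed fallback: first whitespace-delimited token, no chain.
    return (long_name.split() or [""])[0], ""
```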
schrodinger.application.msv.seqio.reorder_fasta_alignment(aln, orig_names)[source]¶ Reorder a FASTA alignment to match the order of the names originally written to FASTA.
Intended for use after alignment methods that reorder their output.
Example usage:
    orig_names = seqio.FastaAlignmentWriter.write(orig_aln, input_filename)
    # run alignment method
    aln = seqio.FastaAlignmentReader.read(out_filename)
    seqio.reorder_fasta_alignment(aln, orig_names)
- Parameters
aln (alignment.BaseAlignment) – Alignment to reorder. Will be modified in place.
orig_names (list[str]) – Original order of sequence names as written to FASTA.
- Raises
ValueError – If the alignment and orig_names have different lengths or mismatched names
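The reordering step amounts to looking each original name up in the reordered output and rebuilding the sequence list in the original order, with the same two failure modes listed above. A minimal sketch that works on a plain list of (name, sequence) pairs instead of an alignment object; the function name is hypothetical, and it modifies the list in place to mirror the documented behavior:

```python
def reorder_records(records, orig_names):
    """Hypothetical helper: reorder (name, sequence) pairs in place."""
    if len(records) != len(orig_names):
        raise ValueError("Alignment and name list have different lengths")
    by_name = dict(records)
    # Duplicate names collapse in the dict, so they also count as a mismatch.
    if len(by_name) != len(records) or set(by_name) != set(orig_names):
        raise ValueError("Sequence names do not match the original names")
    records[:] = [(name, by_name[name]) for name in orig_names]
```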