schrodinger.application.msv.seqio module
-
class schrodinger.application.msv.seqio.BaseProteinAlignmentReader
Bases: object
Base class for reading protein sequence alignments from files.
-
classmethod read(file_name)
Returns the alignment read from a file in Clustal *.aln format, preserving the order of sequences.
Parameters: file_name (str) – Source file name.
Raises: IOError – If the input file cannot be read.
Return type: ProteinAlignment
Returns: An alignment
Note: The alignment can be empty if no sequence was present in the input file.
-
class schrodinger.application.msv.seqio.BaseProteinAlignmentWriter
Bases: object
Class for writing protein alignments to files.
-
class schrodinger.application.msv.seqio.ClustalAlignmentReader
Bases: schrodinger.application.msv.seqio.BaseProteinAlignmentReader
Class for reading Clustal *.aln files.
-
class schrodinger.application.msv.seqio.ClustalAlignmentWriter
Bases: schrodinger.application.msv.seqio.BaseProteinAlignmentWriter
Class for writing Clustal *.aln files.
The format is described here:
http://web.mit.edu/meme_v4.9.0/doc/clustalw-format.html
-
classmethod write(aln, file_name, use_unique_names=True)
Writes aln to a Clustal alignment file.
Raises: IOError – If the output file cannot be written.
Parameters: - aln (BaseAlignment) – Alignment to be written to a file.
- file_name (str) – Destination file name.
- use_unique_names (bool) – If True, write a unique name for each sequence.
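The Clustal layout linked above writes a "CLUSTAL" header line and then blocks of aligned rows, each with a sequence name followed by a segment of the gapped sequence. The sketch below shows that layout under stated assumptions. `format_clustal` is a hypothetical helper, not part of seqio. It takes plain (name, gapped_sequence) pairs instead of a `BaseAlignment`, uses a fixed 60-column block by default, and omits the optional conservation line and residue counts that real Clustal files may include.

```python
def format_clustal(records, block_size=60):
    """Format (name, gapped_sequence) pairs as simplified Clustal text.

    Hypothetical helper: it omits the optional conservation line and
    per-line residue counts that real Clustal files may contain.
    """
    if not records:
        raise ValueError("need at least one sequence")
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    aln_len = lengths.pop()
    # Pad names so every sequence segment starts in the same column.
    width = max(len(name) for name, _ in records) + 6
    lines = ["CLUSTAL W multiple sequence alignment"]
    for start in range(0, aln_len, block_size):
        lines.append("")  # a blank line separates consecutive blocks
        for name, seq in records:
            lines.append(name.ljust(width) + seq[start:start + block_size])
    return "\n".join(lines) + "\n"


print(format_clustal([("seqA", "MKV-LA"), ("seqB", "MKVQLA")], block_size=4))
```

Names are assumed to contain no whitespace, because Clustal readers split each row on the first run of spaces.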
-
class schrodinger.application.msv.seqio.FastaAlignmentReader
Bases: object
-
classmethod read(file_name)
Loads a sequence file in FASTA format, creates sequences, and appends them to an alignment. The sequence name is split from the FASTA header.
Parameters: file_name (str) – Name of the input FASTA file.
Return type: ProteinAlignment
Returns: The alignment read from the file.
-
classmethod readFromStringList(strings)
Returns an alignment object created from an iterable of sequence strings.
Parameters: strings (iterable of str) – Sequences as an iterable of strings (1D codes).
-
classmethod readFromText(lines, AlnClass=<class 'schrodinger.protein.alignment.ProteinAlignment'>)
Reads sequences from FASTA-formatted text, creates sequences, and appends them to an alignment. The sequence name is split from the FASTA header.
Parameters: lines (list of str) – List of strings representing a FASTA file.
Return type: ProteinAlignment
Returns: The alignment
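Reading FASTA text, as `read` and `readFromText` describe, amounts to starting a new record at each ">" header and concatenating the sequence lines that follow it. The sketch below is a hypothetical `parse_fasta_lines` helper, not the seqio implementation. It returns plain (name, sequence) pairs instead of a `ProteinAlignment`. It also assumes that "splitting the name from the header" means keeping the first whitespace-delimited token after ">".

```python
def parse_fasta_lines(lines):
    """Parse FASTA-formatted lines into a list of (name, sequence) pairs.

    Hypothetical sketch: the name is assumed to be the first
    whitespace-delimited token of the header line.
    """
    records = []
    name, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].strip()
            name = header.split()[0] if header else ""
            chunks = []
        else:
            if name is None:
                raise ValueError("sequence data before the first FASTA header")
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


print(parse_fasta_lines([">1abc_A chain A", "MKVL", "AG", ">seq2", "GG-A"]))
```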
-
class schrodinger.application.msv.seqio.FastaAlignmentWriter
Bases: schrodinger.application.msv.seqio.BaseProteinAlignmentWriter
Class for writing FASTA .fasta files.
The format is described here: https://en.wikipedia.org/wiki/FASTA_format
-
HEADER_END = ''
-
HEADER_START = '>'
-
classmethod toString(aln, use_unique_names=True, maxl=50)
-
classmethod toStringAndNames(aln, use_unique_names=True, maxl=50, export_annotations=False, sim_ref_seq=None)
Converts aln to a FASTA string.
Parameters: - aln (ProteinAlignment) – Structured sequences.
- use_unique_names (bool) – If True, write a unique name for each sequence.
- maxl (int) – Maximum length of a line.
- export_annotations (bool) – Whether annotations should be exported along with sequence information. If True, annotations listed in EXPORT_ANNOTATIONS will be exported.
- sim_ref_seq (sequence.Sequence or None) – Reference sequence used to calculate similarities for the sequences to be exported. If None, similarity will not be exported.
Returns: FASTA string
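The `maxl` parameter caps the length of each sequence line. The header line is built from `HEADER_START`, the sequence name, and `HEADER_END` ('>' and '' for this class). The sketch below shows that wrapping on plain (name, sequence) pairs. `to_fasta_string` is a hypothetical helper, not the seqio method. It does not cover annotation export, similarity output, or unique-name handling.

```python
# Header constants documented for FastaAlignmentWriter.
HEADER_START = ">"
HEADER_END = ""


def to_fasta_string(records, maxl=50):
    """Render (name, sequence) pairs as FASTA, wrapping at maxl characters.

    Hypothetical sketch; annotations and similarity are not handled.
    """
    out = []
    for name, seq in records:
        out.append(f"{HEADER_START}{name}{HEADER_END}")
        # Slice the sequence into consecutive chunks of at most maxl characters.
        out.extend(seq[i:i + maxl] for i in range(0, len(seq), maxl))
    return "\n".join(out) + "\n" if out else ""


print(to_fasta_string([("a", "ABCDEFG"), ("b", "XY")], maxl=3), end="")
```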
-
classmethod toStringList(aln)
Converts a ProteinAlignment object to a list of sequence strings.
Parameters: aln (ProteinAlignment) – Alignment data.
Return type: list of str
Returns: A list of sequence strings representing the alignment.
-
classmethod write(aln, file_name, use_unique_names=True, maxl=50, export_annotations=False, sim_ref_seq=None)
Writes aln to a FASTA file.
Raises: IOError – If the output file cannot be written.
Parameters: - aln (ProteinAlignment) – Structured sequences.
- use_unique_names (bool) – If True, write a unique name for each sequence.
- maxl (int) – Maximum length of a line.
- file_name (str) – Destination file name.
- export_annotations (bool) – Whether annotations should be exported along with sequence information. If True, annotations listed in EXPORT_ANNOTATIONS will be exported.
- sim_ref_seq (sequence.Sequence or None) – Reference sequence used to calculate similarities for the sequences to be exported. If None, similarity will not be exported.
Returns: Output names of each sequence
Return type: list of str
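`write` returns the output name of each sequence, and `use_unique_names=True` requests a unique name per sequence. This page does not say how names are made unique. The sketch below is therefore only one possible scheme: a hypothetical `make_unique_names` helper that appends `_2`, `_3`, and so on to repeated names. It is not the module's actual naming rule.

```python
def make_unique_names(names):
    """Return names with duplicates disambiguated by numeric suffixes.

    Hypothetical scheme: the second "x" becomes "x_2", the third "x_3",
    and a suffix is bumped further if it collides with an existing name.
    """
    seen = {}    # base name -> highest suffix number used so far
    used = set() # every name already emitted
    result = []
    for name in names:
        count = seen.get(name, 0) + 1
        candidate = name if count == 1 else f"{name}_{count}"
        while candidate in used:
            count += 1
            candidate = f"{name}_{count}"
        seen[name] = count
        used.add(candidate)
        result.append(candidate)
    return result


print(make_unique_names(["1abc", "1abc", "2xyz", "1abc"]))
```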
-
exception schrodinger.application.msv.seqio.GetSequencesException
Bases: exceptions.IOError
Custom exception for problems retrieving sequences.
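Because `GetSequencesException` derives from `IOError`, code that already catches `IOError` also catches it. In Python 3, `IOError` is an alias of `OSError`. The sketch below defines a local stand-in with the same base class and a hypothetical `read_sequence_file` loader that re-raises file errors as that exception. Neither name is imported from seqio, and the loader is only an illustration.

```python
class GetSequencesException(IOError):
    """Local stand-in mirroring the documented base class (IOError)."""


def read_sequence_file(path):
    """Hypothetical loader that reports read failures as GetSequencesException."""
    try:
        with open(path) as fh:
            return fh.read()
    except OSError as err:
        raise GetSequencesException(f"could not read {path!r}: {err}") from err


try:
    read_sequence_file("/nonexistent/input.fasta")
except IOError as exc:  # the subclass is caught by a plain IOError handler
    print(type(exc).__name__)
```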
-
class schrodinger.application.msv.seqio.MMSequenceConverter
Bases: object
Converts sequences between mmseq and MSV sequence formats.
Note: This is meant to be used as a context manager, in a ‘with’ statement.
-
classmethod readSequences(file_name, file_format=0)
Reads all sequences from the file specified by file_name.
Parameters: - file_name (str) – Name of the input file.
- file_format (int) – Format of the input file. By default, the format is MMSEQIO_ANY, meaning the file type is recognized automatically.
Returns: List of protein sequences read from the file.
Raises: GetSequencesException – If the file could not be read.
-
classmethod writeSequences(sequences, file_name, file_format=1)
Writes sequences to the file specified by file_name.
Raises: mmcheck.MmException – If the file could not be opened for writing.
Parameters: - sequences – List of sequences to be written to the file.
- file_name (str) – Name of the output file.
- file_format (int) – Format of the output file. By default, the format is MMSEQIO_NATIVE.
-
class schrodinger.application.msv.seqio.PdbParts(pdbcode, pdbchain)
Bases: tuple
-
pdbchain
Alias for field number 1
-
pdbcode
Alias for field number 0
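The "Alias for field number N" wording is what `collections.namedtuple` generates for its fields. So `PdbParts` behaves like a two-element tuple whose items can also be read by name. The sketch below builds a local stand-in with the same field order to show both access styles. It is an illustration and is not imported from seqio.

```python
from collections import namedtuple

# Local stand-in with the same field order as the documented PdbParts.
PdbParts = namedtuple("PdbParts", ["pdbcode", "pdbchain"])

parts = PdbParts("4hhb", "A")
code, chain = parts                 # unpacks like a plain tuple
print(parts.pdbcode, parts[0])      # field number 0
print(parts.pdbchain, parts[1])     # field number 1
```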
-
class schrodinger.application.msv.seqio.SeqDownloader
Bases: object
-
ENTREZ_FORMAT_STR = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&rettype=fasta&id=%s'
-
UNIPROT_FORMAT_STR = 'http://www.uniprot.org/uniprot/%s.fasta'
-
classmethod downloadEntrezSeq(sequence_id)
Downloads a sequence from the Entrez database.
Parameters: sequence_id (str) – Sequence ID in Entrez format.
Returns: Full path to the downloaded FASTA file
Return type: str
-
classmethod downloadPDB(pdb_id, pdb_dir=None, remote_ok=False)
Parses a PDB ID string and downloads the PDB file.
Returns: Full path to the downloaded PDB file
Return type: str
Raises: GetSequencesException – If the PDB file can’t be downloaded.
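`ENTREZ_FORMAT_STR` and `UNIPROT_FORMAT_STR` are printf-style templates with a single `%s` slot for the sequence ID. The sketch below fills them in to show the resulting request URLs, without performing a download. The hypothetical `build_fetch_url` helper and its URL-escaping of the ID are assumptions. This page does not say whether the class escapes IDs.

```python
from urllib.parse import quote

# Template strings copied from the SeqDownloader class attributes above.
ENTREZ_FORMAT_STR = ("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
                     "?db=protein&rettype=fasta&id=%s")
UNIPROT_FORMAT_STR = "http://www.uniprot.org/uniprot/%s.fasta"


def build_fetch_url(template, sequence_id):
    """Fill a %s-style template with a URL-escaped sequence ID (hypothetical)."""
    return template % quote(sequence_id, safe="")


print(build_fetch_url(UNIPROT_FORMAT_STR, "P69905"))
print(build_fetch_url(ENTREZ_FORMAT_STR, "NP_000549.1"))
```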
-
class schrodinger.application.msv.seqio.StructureConverter(ct, eid=None)
Bases: object
Reads a structure and converts it to a list of sequences.
-
classmethod convert(ct, eid=None)
Converts the provided structure into a list of sequences.
Parameters: - ct (schrodinger.structure.Structure) – A structure to convert to sequences.
- eid (str) – The entry id to assign to the created sequences. If not given, the entry id from the structure will be used.
Returns: A list of sequences, one per chain.
Return type: list[sequence.Sequence]
-
classmethod convertStructResidue(struct_res, make_res)
Converts a structure._Residue into a residue.SequenceElement.
Parameters: - struct_res (structure._Residue or residue.SequenceElement) – A structure residue to convert. If this is a residue.SequenceElement object, it will be returned unchanged.
- make_res (callable) – A method to convert a string into a residue.SequenceElement.
Returns: A newly created residue
Return type: residue.SequenceElement
-
static get_b_factor(struct_res)
Takes a _Residue object and returns its temperature (B) factor.
Parameters: struct_res (schrodinger.structure._Residue or None) – The residue to extract the B factor from.
Returns: The B factor of the residue
Return type: float or None
-
makeSequences()
Note: Disulfide bonds might be between chains, so they need to be calculated at the ct level.
Returns: A list of sequences, one per chain.
Return type: list[sequence.Sequence]
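`convert` and `makeSequences` both return one sequence per chain. The sketch below shows only that grouping step. It uses a hypothetical `group_by_chain` helper and represents residues as plain (chain_id, one_letter_code) pairs instead of `schrodinger.structure` objects. It ignores disulfide bonds, entry IDs, and B factors, so it is a simplification and not the converter's actual logic.

```python
def group_by_chain(residues):
    """Collect one-letter codes into one sequence string per chain.

    residues: iterable of (chain_id, one_letter_code) pairs, a hypothetical
    stand-in for the residues of a structure.
    """
    chains = {}
    for chain_id, code in residues:
        chains.setdefault(chain_id, []).append(code)
    # Dicts keep insertion order, so chains appear in first-seen order.
    return {chain_id: "".join(codes) for chain_id, codes in chains.items()}


print(group_by_chain([("A", "M"), ("A", "K"), ("B", "V"), ("A", "L")]))
```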
-
class schrodinger.application.msv.seqio.TextAlignmentWriter
Bases: schrodinger.application.msv.seqio.FastaAlignmentWriter
Class for writing alignments in text format.
-
HEADER_END = '. '
-
HEADER_START = ''
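`TextAlignmentWriter` subclasses `FastaAlignmentWriter` and overrides `HEADER_START` and `HEADER_END` as class attributes. A classmethod that reads `cls.HEADER_START` therefore picks up the subclass values automatically. The sketch below shows that pattern with hypothetical stand-in classes and a hypothetical `format_header` method. Only the constant values come from this page. How the real writers lay out the rest of each record is not shown.

```python
class FastaLikeWriter:
    """Hypothetical stand-in for FastaAlignmentWriter's header constants."""
    HEADER_START = ">"
    HEADER_END = ""

    @classmethod
    def format_header(cls, name):
        # cls is the class the method was called on, so subclass overrides apply.
        return f"{cls.HEADER_START}{name}{cls.HEADER_END}"


class TextLikeWriter(FastaLikeWriter):
    """Hypothetical stand-in for TextAlignmentWriter: only the constants change."""
    HEADER_START = ""
    HEADER_END = ". "


print(repr(FastaLikeWriter.format_header("1abc_A")))
print(repr(TextLikeWriter.format_header("1abc_A")))
```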
-
schrodinger.application.msv.seqio.maestro_get_pdb(maestro_pdb_id, pdb_dir=None, remote_ok=False)
Downloads a PDB file. If a chain is specified, it will be split out into a separate file.
Returns: Downloaded PDB path
Raises: GetSequencesException – If the PDB file can’t be downloaded.
-
schrodinger.application.msv.seqio.make_maestro_pdb_id(pdb_id)
Converts a PDB ID to a “:”-separated PDB code and PDB chain (e.g. 4hhb if the chain is blank, or 4hhb:A).
Parameters: pdb_id (str) – PDB ID with optional chain, e.g. 4hhb, 4hhbA, 4hhb:A, 4hhb_A
Returns: PDB ID with “:” between the PDB code and the PDB chain
Return type: str
-
schrodinger.application.msv.seqio.parse_fasta_header(header, permissive=True)
Parses a FASTA header into a (pdb code, pdb chain) named tuple.
Parameters: - header (str) – The header for a single entry in a FASTA file (including the leading comment character).
- permissive (bool) – Whether to use permissive parsing. See parse_pdb_id for documentation.
Returns: Named tuple of (pdbcode, pdbchain)
Return type: PdbParts (str, str)
-
schrodinger.application.msv.seqio.parse_pdb_id(pdb_id, permissive=False)
Parses a PDB ID into a (pdb code, pdb chain) named tuple.
Parameters: - pdb_id (str) – PDB ID with optional chain, e.g. 4hhb, 4hhbA, 4hhb:A, 4hhb_A
- permissive (bool) – Whether to use permissive parsing. In strict mode, the PDB ID must be 4 characters starting with a digit, and a single-letter chain is optional. In permissive mode, the PDB ID can contain any non-whitespace characters, but a chain separator and a single-letter chain are required.
Returns: Named tuple of (pdbcode, pdbchain)
Return type: PdbParts (str, str)
Raises: GetSequencesException – If pdb_id can’t be parsed.
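The strict and permissive rules above, and the ":"-joined form that `make_maestro_pdb_id` produces, can be expressed as two regular expressions. The sketch below is a hypothetical reimplementation built only from that prose. It is not the module's own code. Several details are assumptions: the patterns themselves, allowing ":" or "_" as the chain separator (as in the examples), accepting alphanumeric chain IDs, and returning an empty string for a missing chain.

```python
import re
from collections import namedtuple

# Local stand-ins for the documented PdbParts tuple and GetSequencesException.
PdbParts = namedtuple("PdbParts", ["pdbcode", "pdbchain"])


class GetSequencesException(IOError):
    pass


# Assumed patterns derived from the prose above, not the module's regexes.
# Strict: 4-character code starting with a digit, optionally followed by a
# single-character chain, either fused (4hhbA) or after ":" or "_".
_STRICT = re.compile(r"^(\d[A-Za-z0-9]{3})(?:[:_]?([A-Za-z0-9]))?$")
# Permissive: any non-whitespace code, then a required separator and chain.
_PERMISSIVE = re.compile(r"^(\S+)[:_]([A-Za-z0-9])$")


def parse_pdb_id_sketch(pdb_id, permissive=False):
    pattern = _PERMISSIVE if permissive else _STRICT
    match = pattern.match(pdb_id.strip())
    if match is None:
        raise GetSequencesException(f"cannot parse PDB ID {pdb_id!r}")
    return PdbParts(match.group(1), match.group(2) or "")


def make_maestro_pdb_id_sketch(pdb_id):
    # Join code and chain with ":", leaving the code alone when chain is blank.
    code, chain = parse_pdb_id_sketch(pdb_id)
    return f"{code}:{chain}" if chain else code


for example in ["4hhb", "4hhbA", "4hhb:A", "4hhb_A"]:
    print(example, "->", make_maestro_pdb_id_sketch(example))
```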
-
schrodinger.application.msv.seqio.to_biopython(seq)
Converts a sequence to a BioPython sequence.
Parameters: seq (schrodinger.protein.sequence.ProteinSequence) – A sequence to convert to a BioPython sequence.
Return type: BioPython.SeqRecord
Returns: The sequence converted to a BioPython SeqRecord