schrodinger.application.msv.seqio module

class schrodinger.application.msv.seqio.BaseProteinAlignmentReader

Bases: object

Base class for reading protein sequence alignments from files.

classmethod read(file_name)

Returns the alignment read from a file in Clustal *.aln format, preserving the order of sequences.

Parameters:
  • file_name (str) – Source file name.
Raises:

IOError – If the input file cannot be read.

Return type:

ProteinAlignment

Returns:

An alignment

Note:

The alignment can be empty if no sequence was present in the input file.

class schrodinger.application.msv.seqio.BaseProteinAlignmentWriter

Bases: object

Class for writing protein alignments to files.

classmethod write(aln, file_name)

Writes aln to a file.

Parameters:
  • aln (BaseAlignment) – Alignment to be written to a file.
  • file_name (str) – Destination file name.
class schrodinger.application.msv.seqio.ClustalAlignmentReader

Bases: schrodinger.application.msv.seqio.BaseProteinAlignmentReader

Class for reading Clustal *.aln files.

classmethod read(file_name)

Returns the alignment read from a file in Clustal *.aln format, preserving the order of sequences.

Raises: IOError – If the input file cannot be read.
Return type: ProteinAlignment
Returns: The alignment read from the file. It can be empty if no sequence was present in the input file.
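To illustrate the block layout that read() consumes, here is a minimal stdlib sketch. It is not the module's implementation: the function name parse_clustal_lines is hypothetical, and it returns plain (name, sequence) pairs rather than a ProteinAlignment. It skips the CLUSTAL header, blank lines and indented conservation lines, then concatenates each sequence's chunks in first-seen order.

```python
from typing import Dict, Iterable, List, Tuple


def parse_clustal_lines(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Hypothetical sketch: collect Clustal blocks into ordered (name, sequence) pairs."""
    seqs: Dict[str, List[str]] = {}  # dicts keep insertion order, so file order is preserved
    for line in lines:
        if not line.strip() or line.startswith("CLUSTAL"):
            continue  # skip the header line and blank separator lines
        if line[0].isspace():
            continue  # conservation lines ('*', ':', '.') are indented
        fields = line.split()
        if len(fields) < 2:
            continue  # malformed line without a sequence chunk
        name, chunk = fields[0], fields[1]  # an optional trailing residue count is ignored
        seqs.setdefault(name, []).append(chunk)
    return [(name, "".join(chunks)) for name, chunks in seqs.items()]
```

Each sequence name is assumed to appear once per block, which is how Clustal files interleave long alignments.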
class schrodinger.application.msv.seqio.ClustalAlignmentWriter

Bases: schrodinger.application.msv.seqio.BaseProteinAlignmentWriter

Class for writing Clustal *.aln files.

The format is described here: http://web.mit.edu/meme_v4.9.0/doc/clustalw-format.html

classmethod write(aln, file_name, use_unique_names=True)

Writes aln to a Clustal alignment file.

Raises:

IOError – If output file cannot be written.

Parameters:
  • aln (BaseAlignment) – Alignment to be written to a file.
  • file_name (str) – Destination file name.
  • use_unique_names (bool) – If True, write a unique name for each sequence.
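The general shape of a Clustal file (header line, then blocks of fixed width with a padded name column) can be sketched with the standard library. This is an illustration under stated assumptions, not the writer's code: format_clustal and make_unique are hypothetical names, the "_2, _3, ..." renaming is only one possible way to honor use_unique_names, and the conservation line that real Clustal output includes is omitted.

```python
from typing import List, Tuple


def make_unique(names: List[str]) -> List[str]:
    """Hypothetical renaming scheme: append _2, _3, ... until a name is unused."""
    used = set()
    out = []
    for name in names:
        candidate, n = name, 1
        while candidate in used:
            n += 1
            candidate = f"{name}_{n}"
        used.add(candidate)
        out.append(candidate)
    return out


def format_clustal(records: List[Tuple[str, str]], width: int = 60,
                   use_unique_names: bool = True) -> str:
    """Hypothetical sketch: render (name, gapped sequence) pairs as Clustal text."""
    names = [name for name, _ in records]
    if use_unique_names:
        names = make_unique(names)
    pad = max((len(n) for n in names), default=0) + 4  # width of the name column
    length = max((len(seq) for _, seq in records), default=0)
    lines = ["CLUSTAL W multiple sequence alignment", ""]
    for start in range(0, length, width):
        for name, (_, seq) in zip(names, records):
            lines.append(name.ljust(pad) + seq[start:start + width])
        lines.append("")  # blank line between blocks
    return "\n".join(lines) + "\n"
```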
class schrodinger.application.msv.seqio.FastaAlignmentReader

Bases: object

classmethod read(file_name)

Loads a sequence file in FASTA format, creates sequences, and appends them to an alignment. Sequence names are split from the FASTA headers.

Parameters: file_name (str) – Name of the input FASTA file.
Return type: ProteinAlignment
Returns: The alignment read from the file.
classmethod readFromStringList(strings)

Returns an alignment object created from an iterable of sequence strings.

Parameters: strings (Iterable of strings) – Sequences as an iterable of strings (1D codes).
classmethod readFromText(lines, AlnClass=<class 'schrodinger.protein.alignment.ProteinAlignment'>)

Reads sequences from FASTA-formatted text, creates sequences, and appends them to an alignment. Sequence names are split from the FASTA headers.

Parameters: lines (list of str) – List of strings representing a FASTA file.
Return type: ProteinAlignment
Returns: The alignment.
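A minimal stdlib sketch of how FASTA text can be turned into named sequences is shown below. It is not the module's parser: read_fasta_lines is a hypothetical name, it returns (name, sequence) pairs instead of a ProteinAlignment, and it assumes the sequence name is the first whitespace-delimited token of the header.

```python
from typing import Iterable, List, Tuple


def read_fasta_lines(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Hypothetical sketch: return (name, sequence) pairs in file order."""
    records: List[Tuple[str, List[str]]] = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # blank lines and ';' comment lines carry no sequence data
        if line.startswith(">"):
            header = line[1:].strip()
            # Assumption: the name is the first token; the rest is a description
            name = header.split()[0] if header else ""
            records.append((name, []))
        elif records:
            records[-1][1].append(line)  # a sequence may span several lines
        # sequence lines before the first header are ignored in this sketch
    return [(name, "".join(chunks)) for name, chunks in records]
```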
class schrodinger.application.msv.seqio.FastaAlignmentWriter

Bases: schrodinger.application.msv.seqio.BaseProteinAlignmentWriter

Class for writing FASTA .fasta files.

The format is described here: https://en.wikipedia.org/wiki/FASTA_format

HEADER_END = ''
HEADER_START = '>'
classmethod toString(aln, use_unique_names=True, maxl=50)
classmethod toStringAndNames(aln, use_unique_names=True, maxl=50, export_annotations=False, sim_ref_seq=None)

Converts aln to a FASTA string.

Parameters:
  • aln (ProteinAlignment) – Structured sequences
  • use_unique_names (bool) – If True, write a unique name for each sequence.
  • maxl (int) – Maximum length of a line
  • export_annotations (bool) – Whether annotations should be exported along with sequence information. If True, annotations listed in EXPORT_ANNOTATIONS will be exported.
  • sim_ref_seq (sequence.Sequence or None) – Reference sequence to calculate similarities for the sequences to be exported. If None, similarity will not be exported.
Returns:

FASTA string

Return type:

string
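The effect of maxl can be sketched with the standard library: each sequence is split into lines of at most maxl characters below its header. The helper to_fasta_string is hypothetical, and building the header as HEADER_START + name + HEADER_END is an assumption based on the class attributes listed above, not confirmed writer behavior.

```python
from typing import List, Tuple

HEADER_START = ">"  # value of FastaAlignmentWriter.HEADER_START
HEADER_END = ""     # value of FastaAlignmentWriter.HEADER_END


def to_fasta_string(records: List[Tuple[str, str]], maxl: int = 50) -> str:
    """Hypothetical sketch: wrap each sequence to at most maxl characters per line."""
    if maxl < 1:
        raise ValueError("maxl must be a positive integer")
    out = []
    for name, seq in records:
        out.append(f"{HEADER_START}{name}{HEADER_END}")  # assumed header layout
        # range() is empty for an empty sequence, so only the header is written
        out.extend(seq[i:i + maxl] for i in range(0, len(seq), maxl))
    return "\n".join(out) + "\n"
```

For example, a 7-residue sequence with maxl=3 is written as lines of 3, 3 and 1 characters.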

classmethod toStringList(aln)

Converts a ProteinAlignment object to a list of sequence strings.

Parameters: aln (ProteinAlignment) – Alignment data.
Return type: list of str
Returns: A list of sequence strings representing the alignment.
classmethod write(aln, file_name, use_unique_names=True, maxl=50, export_annotations=False, sim_ref_seq=None)

Writes aln to a FASTA file.

Raises:

IOError – If output file cannot be written.

Parameters:
  • aln (ProteinAlignment) – Structured sequences.
  • file_name (str) – Destination file name.
  • use_unique_names (bool) – If True, write a unique name for each sequence.
  • maxl (int) – Maximum length of a line.
  • export_annotations (bool) – Whether annotations should be exported along with sequence information. If True, annotations listed in EXPORT_ANNOTATIONS will be exported.
  • sim_ref_seq (sequence.Sequence or None) – Reference sequence to calculate similarities for the sequences to be exported. If None, similarity will not be exported.
Returns:

The output name of each sequence.

Return type:

list of str

exception schrodinger.application.msv.seqio.GetSequencesException

Bases: exceptions.IOError

Custom exception for problems retrieving sequences.
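Because the exception derives from IOError (an alias of OSError in Python 3), a generic I/O error handler also catches it. The sketch below uses a local stand-in class with the same base and a hypothetical fetch() function that always fails; it only demonstrates the exception hierarchy.

```python
class GetSequencesException(IOError):
    """Local stand-in mirroring the documented base class."""


def fetch(pdb_id: str) -> str:
    # Hypothetical failing call, used only to demonstrate catching the exception
    raise GetSequencesException(f"Could not retrieve {pdb_id}")


try:
    fetch("4hhb")
except IOError as err:  # also catches GetSequencesException
    message = str(err)
```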

class schrodinger.application.msv.seqio.MMSequenceConverter

Bases: object

Converts sequence between mmseq and MSV sequence formats.

Note: This is intended to be used with a ‘with’ context manager.
classmethod readSequences(file_name, file_format=0)

Reads all sequences from file specified by file_name.

Parameters:
  • file_name (str) – Name of input file.
  • file_format (int) – Format of the input file. By default, the format is MMSEQIO_ANY meaning file type is automatically recognized.
Return type:

List of schrodinger.protein.sequence.ProteinSequence.

Returns:

List of protein sequences read from the file.

Raises:

GetSequencesException – If the file could not be read.

classmethod writeSequences(sequences, file_name, file_format=1)

Writes sequences to a file specified by file_name.

Raises:

mmcheck.MmException – If the file could not be opened for writing.

Parameters:
  • sequences – List of sequences to be written to the file.
  • file_name (str) – Name of the output file.
  • file_format (int) – Format of the output file. By default, the format is MMSEQIO_NATIVE.
class schrodinger.application.msv.seqio.PdbParts(pdbcode, pdbchain)

Bases: tuple

pdbchain

Alias for field number 1

pdbcode

Alias for field number 0
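PdbParts is a named tuple, so its fields can be read by name or by position. The stand-in below is built with collections.namedtuple using the documented field order; it is a local re-creation for illustration, not an import of the module's class.

```python
from collections import namedtuple

# Local stand-in with the same field order as the documented PdbParts
PdbParts = namedtuple("PdbParts", ["pdbcode", "pdbchain"])

parts = PdbParts("4hhb", "A")
assert parts.pdbcode == parts[0]   # field number 0
assert parts.pdbchain == parts[1]  # field number 1
code, chain = parts                # tuples unpack positionally
```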

class schrodinger.application.msv.seqio.SeqDownloader

Bases: object

ENTREZ_FORMAT_STR = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&rettype=fasta&id=%s'
UNIPROT_FORMAT_STR = 'http://www.uniprot.org/uniprot/%s.fasta'
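Both templates are old-style %-format strings, so a download URL is presumably obtained by substituting the sequence ID for %s. The stdlib sketch below shows only that substitution and makes no network request; the build_url helper and the percent-encoding step are assumptions added for illustration, not part of SeqDownloader.

```python
from urllib.parse import quote

# Template strings copied from the SeqDownloader attributes documented above
ENTREZ_FORMAT_STR = ("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
                     "?db=protein&rettype=fasta&id=%s")
UNIPROT_FORMAT_STR = "http://www.uniprot.org/uniprot/%s.fasta"


def build_url(template: str, sequence_id: str) -> str:
    """Hypothetical helper: percent-encode the ID, then fill the %s placeholder."""
    return template % quote(sequence_id.strip(), safe="")
```

Usage: build_url(UNIPROT_FORMAT_STR, "P69905") yields http://www.uniprot.org/uniprot/P69905.fasta.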
classmethod downloadEntrezSeq(sequence_id)

Downloads a sequence from the Entrez database.

Parameters: sequence_id (str) – Sequence ID in Entrez format.
Returns: Full path to the downloaded FASTA file.
Return type: str
classmethod downloadPDB(pdb_id, pdb_dir=None, remote_ok=False)

Parses a PDB ID string and downloads the PDB file.

Parameters:
  • pdb_id (str) – PDB ID with optional chain (e.g. 4hhb, 4hhbA, 4hhb:A)
  • pdb_dir (str) – directory to check for existing files and destination to download new files
  • remote_ok (bool) – whether it’s okay to make a remote query.
Returns:

Full path to the downloaded PDB file

Return type:

str

Raises:

GetSequencesException – If the PDB file can’t be downloaded.

classmethod downloadUniprotSeq(sequence_id)

Downloads a sequence from the UniProt database.

Parameters: sequence_id (str) – Sequence ID in UniProt format.
Returns: Full path to the downloaded FASTA file.
Return type: str
class schrodinger.application.msv.seqio.StructureConverter(ct, eid=None)

Bases: object

Reads a structure and converts it to a list of sequences.

classmethod convert(ct, eid=None)

Convert the provided structure into a list of sequences.

Parameters:
  • ct (schrodinger.structure.Structure) – A structure to convert to sequences.
  • eid (str) – The entry id to assign to the created sequences. If not given, the entry id from the structure will be used.
Returns:

A list of sequences, one per chain.

Return type:

list[sequence.Sequence]
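The "one sequence per chain" result can be illustrated with a stdlib sketch that groups residues by chain and maps three-letter names to one-letter codes. This is not how StructureConverter works internally (it operates on schrodinger.structure.Structure objects and builds sequence.Sequence objects); the function residues_to_chain_sequences and the (chain, residue name) input format are hypothetical, and unknown residues are mapped to 'X' as an assumption.

```python
from typing import Dict, Iterable, List, Tuple

# Standard three-letter to one-letter codes for the 20 common amino acids
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def residues_to_chain_sequences(residues: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Hypothetical sketch: group (chain, residue name) pairs into one string per chain."""
    chains: Dict[str, List[str]] = {}
    for chain, resname in residues:
        code = THREE_TO_ONE.get(resname.strip().upper(), "X")  # assumption: unknown -> 'X'
        chains.setdefault(chain, []).append(code)
    return {chain: "".join(codes) for chain, codes in chains.items()}
```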

classmethod convertStructResidue(struct_res, make_res)

Convert a structure._Residue into a residue.SequenceElement.

Parameters:
  • struct_res (structure._Residue or residue.SequenceElement) – A structure residue to convert. If this is a residue.SequenceElement object, it will be returned unchanged.
  • make_res (callable) – A method to convert a string into a residue.SequenceElement
Returns:

A newly created residue

Return type:

residue.SequenceElement

static get_b_factor(struct_res)

Takes a _Residue object and returns its temperature (B) factor.

Parameters: struct_res (schrodinger.structure._Residue or None) – The residue to extract the B factor from.
Returns: The B factor of the residue.
Return type: float or None
makeSequences()

Converts the structure into a list of sequences. Disulfide bonds can span chains, so they need to be calculated at the ct level.

Returns: A list of sequences, one per chain.
Return type: list[sequence.Sequence]
class schrodinger.application.msv.seqio.TextAlignmentWriter

Bases: schrodinger.application.msv.seqio.FastaAlignmentWriter

Class for writing alignments in text format

HEADER_END = '. '
HEADER_START = ''
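TextAlignmentWriter changes its output only by overriding the two header constants it inherits from FastaAlignmentWriter. The sketch below shows why that works for classmethods: cls resolves to the subclass, so the overridden values are picked up. The class names and the header() method are hypothetical, and assembling the header as HEADER_START + name + HEADER_END is an assumption.

```python
class FastaLikeWriter:
    """Hypothetical stand-in using the FastaAlignmentWriter constants."""
    HEADER_START = ">"
    HEADER_END = ""

    @classmethod
    def header(cls, name: str) -> str:
        # cls is the class the method was called on, so subclass overrides apply
        return f"{cls.HEADER_START}{name}{cls.HEADER_END}"


class TextLikeWriter(FastaLikeWriter):
    """Hypothetical stand-in using the TextAlignmentWriter constants."""
    HEADER_START = ""
    HEADER_END = ". "
```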
schrodinger.application.msv.seqio.maestro_get_pdb(maestro_pdb_id, pdb_dir=None, remote_ok=False)

Download a PDB file. If specified, the chain will be split out into a separate file.

Parameters:
  • maestro_pdb_id (str) – 4-letter PDB code or code:chain (e.g. 4hhb or 4hhb:A)
  • pdb_dir (str) – directory to check for existing files and destination to download new files
  • remote_ok (bool) – whether it’s okay to make a remote query.
Returns:

Path to the downloaded PDB file

Return type:

str

Raises:

GetSequencesException – If the PDB file can’t be downloaded.

schrodinger.application.msv.seqio.make_maestro_pdb_id(pdb_id)

Converts a PDB ID to a “:”-separated PDB code and PDB chain (e.g. 4hhb if the chain is blank, or 4hhb:A).

Parameters: pdb_id (str) – PDB ID with optional chain, e.g. 4hhb, 4hhbA, 4hhb:A, 4hhb_A.
Returns: PDB ID with “:” between the PDB code and the PDB chain.
Return type: str
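The normalization described above can be sketched with a regular expression covering the listed input forms. to_maestro_pdb_id is a hypothetical name; the sketch assumes a 4-character code starting with a digit and a single alphanumeric chain, and raises ValueError where the real module documents GetSequencesException.

```python
import re

# Assumed rule: digit + 3 alphanumerics, optional ':' or '_' separator, optional 1-char chain
_PDB_ID_RE = re.compile(r"(\d[A-Za-z0-9]{3})[:_]?([A-Za-z0-9]?)")


def to_maestro_pdb_id(pdb_id: str) -> str:
    """Hypothetical sketch: normalize 4hhb, 4hhbA, 4hhb:A or 4hhb_A to 'code[:chain]'."""
    match = _PDB_ID_RE.fullmatch(pdb_id.strip())
    if match is None:
        raise ValueError(f"Cannot parse PDB ID: {pdb_id!r}")
    code, chain = match.groups()
    return f"{code}:{chain}" if chain else code  # blank chain -> bare code
```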
schrodinger.application.msv.seqio.parse_fasta_header(header, permissive=True)

Parses a FASTA header into a (pdb code, pdb chain) named tuple.

Parameters:
  • header (str) – The header for a single entry in a FASTA file (including leading comment character)
  • permissive (bool) – Whether to use permissive parsing. See parse_pdb_id for documentation.
Returns:

Named tuple of (pdbcode, pdbchain)

Return type:

PdbParts (str, str)
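One plausible way to get from a header line to PdbParts is sketched below: strip the leading '>', take the first whitespace-delimited token, drop any '|'-separated fields, then apply the permissive rule from parse_pdb_id (any non-whitespace code, then a required separator and single-character chain). The token-extraction steps and the name parse_header_sketch are assumptions; the module's actual header handling is not documented here.

```python
import re
from collections import namedtuple

PdbParts = namedtuple("PdbParts", ["pdbcode", "pdbchain"])

# Permissive rule: non-whitespace code, required ':' or '_', required 1-char chain
_PERMISSIVE_RE = re.compile(r"(\S+)[:_]([A-Za-z0-9])")


def parse_header_sketch(header: str) -> PdbParts:
    """Hypothetical sketch of permissive FASTA header parsing."""
    fields = header.lstrip(">").split()
    if not fields:
        raise ValueError(f"Empty FASTA header: {header!r}")
    token = fields[0].split("|")[0]  # assumption: ID precedes any '|' fields
    match = _PERMISSIVE_RE.fullmatch(token)
    if match is None:
        raise ValueError(f"Cannot parse FASTA header: {header!r}")
    return PdbParts(*match.groups())
```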

schrodinger.application.msv.seqio.parse_pdb_id(pdb_id, permissive=False)

Parses a PDB ID into a (pdb code, pdb chain) named tuple.

Parameters:
  • pdb_id (str) – PDB ID with optional chain, e.g. 4hhb, 4hhbA, 4hhb:A, 4hhb_A
  • permissive (bool) – Whether to use permissive parsing. In strict mode, the PDB ID must be 4 characters starting with a digit, and a single-letter chain is optional. In permissive mode, the PDB ID can contain any non-whitespace characters, but the chain separator and a single-letter chain are required.
Returns:

Named tuple of (pdbcode, pdbchain)

Return type:

PdbParts (str, str)

Raises:

GetSequencesException – If pdb_id can’t be parsed.
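The strict and permissive rules described for the permissive parameter can be expressed as two regular expressions. This is a sketch under assumptions, not the module's code: parse_pdb_id_sketch is a hypothetical name, chains are accepted as a single alphanumeric character (slightly broader than "single-letter"), and ValueError stands in for GetSequencesException.

```python
import re
from collections import namedtuple

PdbParts = namedtuple("PdbParts", ["pdbcode", "pdbchain"])

# Strict: 4 characters starting with a digit; separator and chain are optional
_STRICT = re.compile(r"(\d[A-Za-z0-9]{3})[:_]?([A-Za-z0-9]?)")
# Permissive: any non-whitespace code; separator and single-character chain required
_PERMISSIVE = re.compile(r"(\S+)[:_]([A-Za-z0-9])")


def parse_pdb_id_sketch(pdb_id: str, permissive: bool = False) -> PdbParts:
    """Hypothetical sketch of the documented strict/permissive parsing rules."""
    pattern = _PERMISSIVE if permissive else _STRICT
    match = pattern.fullmatch(pdb_id.strip())
    if match is None:
        raise ValueError(f"Cannot parse PDB ID: {pdb_id!r}")
    return PdbParts(*match.groups())
```

The trade-off is visible in the rules themselves: strict mode accepts a bare code such as 4hhb, while permissive mode accepts arbitrary model names but only when a chain is present.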

schrodinger.application.msv.seqio.to_biopython(seq)

Converts a sequence to a BioPython sequence.

Parameters: seq (schrodinger.protein.sequence.ProteinSequence) – A sequence to convert to a BioPython sequence.
Return type: Bio.SeqRecord.SeqRecord
Returns: The sequence converted to a BioPython SeqRecord.