schrodinger.ui.sequencealignment.fileio module

File I/O handling routines for the multiple sequence viewer.

Copyright Schrodinger, LLC. All rights reserved.

schrodinger.ui.sequencealignment.fileio.partition_by_predicate(arr, pred)[source]

Utility function that groups a list into sublists, with each sublist beginning with an element that matches the supplied predicate.

Note that many file reading functions below would benefit from using this function.

Parameters
  • arr (list) – A list to split into sublists

  • pred (function) – A function that takes a list item and returns True if the item meets a criterion and False otherwise

This is not efficient, since we loop through the array twice, but it probably doesn’t matter.
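The grouping described above can be illustrated with a minimal, self-contained sketch. It is not the module's implementation: the name partition_by_predicate_sketch is hypothetical, it makes a single pass rather than two, and it assumes that items appearing before the first matching element are discarded, which the documentation does not specify.

```python
def partition_by_predicate_sketch(arr, pred):
    """Group items so that each sublist starts with an item matching pred.

    Assumption: items that precede the first match are dropped.
    """
    groups = []
    for item in arr:
        if pred(item):
            # A matching item opens a new sublist.
            groups.append([item])
        elif groups:
            # A non-matching item joins the most recent sublist.
            groups[-1].append(item)
    return groups


lines = [">seq1", "ACDE", "FGHI", ">seq2", "KLMN"]
records = partition_by_predicate_sketch(lines, lambda s: s.startswith(">"))
```

Here each FASTA-style header line (starting with ">") opens a new group, so records holds one sublist per sequence entry.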

schrodinger.ui.sequencealignment.fileio.load_fasta_file(sequence_group, file_name, text=None)[source]

Load a sequence file in FASTA format, create sequences and append them to the sequence group. Splits sequence name from the FASTA header.

Parameters
  • sequence_group (SequenceGroup) – sequence group to which the sequences will be added

  • file_name (str) – name of input FASTA file

  • text (list of str) – optional text in FASTA format used instead of the input file, split by newline char into a list of lines
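As a rough illustration of the text parameter, the sketch below splits FASTA text into a list of lines and separates each sequence name from the rest of its header. The helper parse_fasta_lines is hypothetical and independent of this module; splitting the header at the first space is an assumption, since the exact rule used by load_fasta_file is not documented here.

```python
def parse_fasta_lines(lines):
    """Return a list of (name, description, sequence) tuples.

    Assumption: the sequence name is the header text up to the first space.
    """
    records = []
    name, description, chunks = None, "", []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Flush the previous record before starting a new one.
            if name is not None:
                records.append((name, description, "".join(chunks)))
            header = line[1:].strip()
            name, _, description = header.partition(" ")
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, description, "".join(chunks)))
    return records


fasta_text = ">seq1 first example\nMKTAYIAK\nQRQISF\n>seq2\nGSHMLE\n"
text = fasta_text.split("\n")  # list of lines, the shape the text parameter expects
fasta_records = parse_fasta_lines(text)
```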

schrodinger.ui.sequencealignment.fileio.load_PIR_file(sequence_group, file_name)[source]

Load a sequence file in PIR format, create sequences and append them to the sequence group.

Parameters
  • sequence_group (SequenceGroup) – sequence group to which the sequences will be added

  • file_name (string) – name of input PIR file

schrodinger.ui.sequencealignment.fileio.load_GCG_file(sequence_group, file_name)[source]

Load a sequence file in GCG format, create sequences and append them to the sequence group.

Parameters
  • sequence_group (SequenceGroup) – sequence group to which the sequences will be added

  • file_name (string) – name of input GCG file

schrodinger.ui.sequencealignment.fileio.load_EMBL_file(sequence_group, file_name)[source]

Load a sequence file in EMBL format, create sequences and append them to the sequence group.

Parameters
  • sequence_group (SequenceGroup) – sequence group to which the sequences will be added

  • file_name (string) – name of input EMBL file

schrodinger.ui.sequencealignment.fileio.load_swissprot_file(sequence_group, file_name, text=None)[source]

Load a sequence file in SWISSPROT format, create sequences and append them to the sequence group. Tries to split sequence name from the

Parameters
  • sequence_group (SequenceGroup) – sequence group to which the sequences will be added

  • file_name (string) – name of input SWISSPROT file

  • text (string) – optional text in SWISSPROT format used instead of the input file

schrodinger.ui.sequencealignment.fileio.save_fasta_file(sequence_group, file_name, for_clustal=False, file=None, target_sequence=None, skip_gaps=False, save_annotations=False, selected_only=False, start=-1, end=-1, as_text=False, save_similarity=False)[source]

Writes the contents of a sequence group to a file.

Parameters
  • sequence_group (SequenceGroup) – Sequence group to be written to a file

  • file_name (string) – Name of the output file

  • for_clustal (bool) – Optional parameter indicating if the output file will be used for Clustal alignment

  • file (file) – Optional file handle, if not None, this handle will be used to write the sequences rather than creating a new file (file_name parameter would be ignored)

  • target_sequence (Sequence) – Optional sequence to be saved. If not specified, all sequences will be written to the output file.

  • skip_gaps (bool) – Optional parameter deciding if gaps should be written to the FASTA file.

  • save_annotations (bool) – Optional parameter for saving annotations (default is False).

  • save_similarity (bool) – Saves similarity when set to True.

  • start (int) – Optional starting position of the subset residues to save.

  • end (int) – Optional ending position of the subset residues to save.

  • selected_only (bool) – Save only (partially) selected columns if True
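To make the effect of a gap-skipping option concrete, here is a small standalone sketch that formats one FASTA record. It does not call save_fasta_file; the function format_fasta_record, the "-" gap character, and the line width are all assumptions chosen for illustration.

```python
def format_fasta_record(name, sequence, skip_gaps=False, width=60):
    """Return one FASTA record as text.

    Assumptions: gaps are written as "-" and lines wrap at `width` characters.
    """
    if skip_gaps:
        sequence = sequence.replace("-", "")
    wrapped = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(wrapped) + "\n"


with_gaps = format_fasta_record("seq1", "MK-TAY--IAK")
without_gaps = format_fasta_record("seq1", "MK-TAY--IAK", skip_gaps=True, width=4)
```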

schrodinger.ui.sequencealignment.fileio.save_clustal_file(sequence_group, file_name, file=None, start=-1, end=-1, ss_constraints=False, subset=None, ignore_selection=False)[source]

Writes the contents of a sequence group to a Clustal ALN file.

Parameters
  • sequence_group (SequenceGroup) – Sequence group to be written to a file

  • file_name (string) – Name of the output file

  • start (int) – Optional starting position of the subset residues to save.

  • end (int) – Optional ending position of the subset residues to save.

  • ss_constraints (bool) – Optional secondary structure constraints.

schrodinger.ui.sequencealignment.fileio.load_DND_tree(file_name, sequence_group)[source]

Load a Newick-formatted tree file output by a multiple sequence alignment program. The function was tested using outputs of ClustalW and T-Coffee.

Parameters
  • file_name (string) – name of the input file

  • sequence_group (SequenceGroup) – target sequence group

Returns

True if operation succeeded, False otherwise

Return type

boolean

schrodinger.ui.sequencealignment.fileio.parse_DND_string(dndstring, tree)[source]

Parse a DND-formatted string, generate a tree, and append its branches to a given tree.

Parameters
  • dndstring (string) – tree in DND format

  • tree (TreeNode) – target tree
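For readers unfamiliar with the DND (Newick) notation, the following sketch parses a Newick string into nested dictionaries. It is an independent illustration, not the module's parser: parse_newick and leaf_names are hypothetical names, the dictionary layout stands in for TreeNode (whose interface is not described here), and quoted labels and comments are not handled.

```python
def parse_newick(text):
    """Parse a Newick string into nested dicts with name, length, children."""
    # Drop all whitespace (DND files are often wrapped) and the final ";".
    text = "".join(text.split()).rstrip(";")
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            while True:
                children.append(parse_node())
                if pos >= len(text):
                    raise ValueError("unbalanced parentheses")
                if text[pos] == ",":
                    pos += 1
                elif text[pos] == ")":
                    pos += 1
                    break
                else:
                    raise ValueError("unexpected character at %d" % pos)
        # Optional node label, read up to the next delimiter.
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        name = text[start:pos]
        # Optional branch length after ":".
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            length = float(text[start:pos])
        return {"name": name, "length": length, "children": children}

    return parse_node()


def leaf_names(node):
    """Collect leaf labels in left-to-right order."""
    if not node["children"]:
        return [node["name"]]
    names = []
    for child in node["children"]:
        names.extend(leaf_names(child))
    return names


newick_tree = parse_newick("(A:0.1,\n(B:0.2,C:0.3):0.05);")
```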

schrodinger.ui.sequencealignment.fileio.load_clustal_file(sequence_group, file_name, replace=False, start=-1, end=-1)[source]

Load a sequence alignment in Clustal format. Add sequences to a specified sequence group. By default, this method doesn’t replace the old residues, but only introduces gaps according to the alignment. Thus, all residue meta-data (e.g. Maestro information) will be preserved after doing the alignment.

Parameters
  • sequence_group (SequenceGroup) – target sequence group

  • file_name (string) – input file name

  • replace (boolean (default=False)) – optional parameter, if True, replace existing sequences

Return type

bool

Returns

True on success, False otherwise
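The idea of introducing gaps without replacing residues can be sketched outside the module as follows. Both helpers are hypothetical: parse_clustal_lines reads the aligned rows of a Clustal ALN file, and apply_alignment_gaps inserts gap placeholders while reusing the original residue objects, so any metadata attached to them survives. Representing residues as dictionaries and gaps as None is an assumption made for the example.

```python
def parse_clustal_lines(lines):
    """Return {sequence name: aligned string} from Clustal ALN lines."""
    aligned = {}
    for line in lines:
        # Skip blank lines, the CLUSTAL header, and conservation rows,
        # which begin with whitespace.
        if not line.strip() or line.startswith("CLUSTAL") or line[0].isspace():
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        name, chunk = fields[0], fields[1]
        aligned[name] = aligned.get(name, "") + chunk
    return aligned


def apply_alignment_gaps(residues, aligned, gap="-"):
    """Insert None at gap positions, keeping the original residue objects."""
    if sum(1 for ch in aligned if ch != gap) != len(residues):
        raise ValueError("alignment does not match the residue count")
    remaining = iter(residues)
    return [None if ch == gap else next(remaining) for ch in aligned]


aln_lines = [
    "CLUSTAL W (1.83) multiple sequence alignment",
    "",
    "seq1      MK-TAY",
    "seq2      MKQTA-",
    "          ** ** ",
    "",
    "seq1      IAK",
    "seq2      I-K",
    "          * *",
]
alignment = parse_clustal_lines(aln_lines)
original = [{"code": c, "index": i} for i, c in enumerate("MKTAYIAK")]
gapped = apply_alignment_gaps(original, alignment["seq1"])
```

Because gapped holds the same objects as original, nothing stored on a residue is lost when the alignment is applied.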

schrodinger.ui.sequencealignment.fileio.pdb_create_sequence(pdb_id, chain_id, sequence_string)[source]

Creates a new sequence out of sequence string read from a PDB file.

Parameters
  • pdb_id (str) – PDB ID (4-letter code)

  • chain_id (str) – single-letter chain ID

  • sequence_string (str) – single-letter code amino acid string to be converted to the sequence

Return type

Sequence

Returns

Created sequence.

schrodinger.ui.sequencealignment.fileio.load_PDB_file(sequence_group, file_name, requested_chain_id=None, given_pdb_id=None, align_func=None)[source]

Reads a PDB file, extracts relevant data and creates the sequence and annotations.

Parameters
  • sequence_group (SequenceGroup) – target sequence group

  • file_name (str) – name of the file to be read

  • requested_chain_id (str) – Optional parameter. If specified, only the chain ID equal to this parameter will be read.

Return type

bool

Returns

True on success, False otherwise.

schrodinger.ui.sequencealignment.fileio.save_PDB_file(sequence, file_name)[source]

schrodinger.ui.sequencealignment.fileio.load_file(sequence_group, file_name, format=None, align_func=None)[source]

Loads a file. The file format can be inferred from the file name extension or given explicitly.

Parameters
  • file_name (string) – input file name

  • format (string) – format of the input file

Return type

bool

Returns

True if file successfully read; otherwise False
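Inferring a format from a file name extension might look like the sketch below. The function guess_format, the extension table, and the format names are all hypothetical; the extensions and format strings that load_file actually recognizes are not listed in this documentation.

```python
import os

# Hypothetical mapping; the real module may use different extensions and names.
EXTENSION_TO_FORMAT = {
    ".fasta": "fasta",
    ".fa": "fasta",
    ".aln": "clustal",
    ".pir": "pir",
    ".pdb": "pdb",
    ".mae": "maestro",
}


def guess_format(file_name, format=None):
    """Return the explicit format if given, else infer it from the extension.

    Returns None when the extension is not in the table.
    """
    if format is not None:
        return format
    extension = os.path.splitext(file_name)[1].lower()
    return EXTENSION_TO_FORMAT.get(extension)


inferred = guess_format("alignment.ALN")
explicit = guess_format("sequences.txt", format="fasta")
```

An explicit format takes precedence in this sketch, which mirrors the description above: the extension is only consulted when no format is supplied.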

schrodinger.ui.sequencealignment.fileio.load_maestro_file(sequence_group, file_name, align_func=None)[source]

Load a file in Maestro format.