schrodinger.application.pathfinder.molio module

PathFinder helper functions for reading and writing files using RDKit Mol objects.

class schrodinger.application.pathfinder.molio.MolWriter(filename, generate_coordinates=True, require_stereo=False)

Bases: schrodinger.structure.StructureWriter

Write Mol objects to a file using a StructureWriter-like API, optionally generating 3D coordinates.

__init__(filename, generate_coordinates=True, require_stereo=False)

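Create a structure writer object based on the format. The parameter descriptions below are inherited from StructureWriter.__init__; MolWriter’s own signature takes filename, generate_coordinates, and require_stereo.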

Parameters:
  • filename (str) – The filename to write to.
  • overwrite (bool) – If False, append to an existing file instead of overwriting it.
  • format (str) – The format of the file. The value should be one of the module-level constants MAESTRO, MOL2, SD, SMILES, or SMILESCSV. If the format is not specified explicitly, it is determined from the filename suffix. Multi-structure PDB files are not supported.
  • stereo (enum) –
    Use of the stereo option in the constructor is pending deprecation. Please use the setOption method instead.

    See the class docstring for documentation on the stereo options.

append(mol)

Append the provided molecule to the open file.

close()

Close the file.

extend(cts)

Append all provided structures to the open file.

setOption(option, value)

Set a single option for this writer. This method is meant for options that may not be supported for all writer formats. See the StructureWriter class documentation for details on the available options.

Raises an OptionError subclass (either UnsupportedOption or UnsupportedOptionValue) if unsuccessful.

Parameters:
  • option (str) – The name of the option to set.
  • value – The value for the option. The data type of this parameter depends on the option being set.

class schrodinger.application.pathfinder.molio.CsvMolWriter(filename)

Bases: object

Write a CSV file given Mol objects, using a StructureWriter-like API. The first two columns are the SMILES and title, and the rest are the properties of the molecule.

  • We don’t use structure.SmilesCsvWriter because it is too slow due to all the conversions: with it, the overall job takes four times as long, so writing the output file becomes the bottleneck.
  • We don’t use Chem.SmilesWriter because, although it can use a comma as the delimiter, it does not write valid CSV: it does not escape the delimiter when it appears inside a field.
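To illustrate the escaping problem, here is a minimal sketch that uses Python’s standard csv module instead of RDKit; the row values are made up. A naive comma join produces the wrong number of fields when a value contains a comma, while csv.writer quotes that field.

```python
import csv
import io

# Hypothetical row: SMILES, title, and one property; the title contains a comma.
row = ["CCO", "ethanol, absolute", "46.07"]

# Naive join: the comma inside the title looks like a field separator.
naive_line = ",".join(row)

# csv.writer (default QUOTE_MINIMAL) quotes any field containing the delimiter.
buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerow(row)
proper_line = buf.getvalue().rstrip("\n")

# Reading both lines back shows the difference in field count.
naive_fields = next(csv.reader([naive_line]))
proper_fields = next(csv.reader([proper_line]))
print(naive_line)    # CCO,ethanol, absolute,46.07
print(proper_line)   # CCO,"ethanol, absolute",46.07
print(len(naive_fields), len(proper_fields))  # 4 3
```

Any reader that follows CSV quoting rules recovers the original three fields only from the quoted line.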
__init__(filename)

append(mol)

Write a molecule to the file. The first time this is called, the header row is written based on mol’s properties. This assumes that all molecules have the same properties, or at least that the first molecule has all the properties of interest.

Parameters: mol (rdkit.Chem.rdchem.Mol) – molecule
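The header-on-first-append behaviour can be sketched with the standard library alone. In this sketch FakeMol is a hypothetical stand-in for an RDKit Mol (a real implementation would read the properties from the Mol itself, for example with GetPropsAsDict), and the "SMILES" and "Title" column names are assumptions. It is not the module’s actual code.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class FakeMol:
    """Hypothetical stand-in for an RDKit Mol: SMILES, title, and properties."""
    smiles: str
    title: str
    props: dict = field(default_factory=dict)


class CsvMolWriterSketch:
    """Illustrative writer: the header row comes from the first molecule."""

    def __init__(self, fh):
        self._writer = csv.writer(fh, lineterminator="\n")
        self._prop_names = None  # fixed on the first append()

    def append(self, mol):
        if self._prop_names is None:
            # First call: take the column order from this molecule's properties.
            self._prop_names = list(mol.props)
            self._writer.writerow(["SMILES", "Title"] + self._prop_names)
        # Properties not in the header are dropped; missing ones become "".
        values = [mol.props.get(name, "") for name in self._prop_names]
        self._writer.writerow([mol.smiles, mol.title] + values)


buf = io.StringIO()
writer = CsvMolWriterSketch(buf)
writer.append(FakeMol("CCO", "ethanol", {"MW": "46.07"}))
writer.append(FakeMol("C", "methane", {"MW": "16.04", "extra": "x"}))
print(buf.getvalue())
# SMILES,Title,MW
# CCO,ethanol,46.07
# C,methane,16.04
```

The second molecule’s "extra" property is silently dropped, which shows why the first molecule must carry every property of interest.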
close()

class schrodinger.application.pathfinder.molio.RdkitMolWriter(filename)

Bases: object

Write Mol objects to a file using the RDKit file-writing classes, but with a StructureWriter-like API. Supports SMILES and SDF.

__init__(filename)

append(mol)

close()

schrodinger.application.pathfinder.molio.get_mol_writer(filename, generate_coordinates=True, require_stereo=False)

Return a StructureWriter-like object for filename, based on the given arguments (typically taken from the command line).

schrodinger.application.pathfinder.molio.supported_output_format(filename)

Check whether we know how to write a file with a given name, but without actually opening a file. Used for argument validation.

Return type: bool
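A check like this can be done purely on the filename suffix. The following is a minimal sketch assuming a hypothetical extension set drawn from the formats named on this page; the module’s real list may differ. It also shows one way to use such a check for argument validation, as an argparse type callable.

```python
import argparse
import os

# Assumed set of writable suffixes (illustrative only).
ASSUMED_OUTPUT_EXTENSIONS = {".csv", ".smi", ".sdf", ".mae", ".mol2"}


def supported_output_format_sketch(filename):
    """Return True if the suffix is in the assumed set. No file is opened."""
    ext = os.path.splitext(filename)[1].lower()
    return ext in ASSUMED_OUTPUT_EXTENSIONS


def output_filename(value):
    """argparse 'type' callable that rejects unsupported output formats early."""
    if not supported_output_format_sketch(value):
        raise argparse.ArgumentTypeError(f"unsupported output format: {value}")
    return value


parser = argparse.ArgumentParser()
parser.add_argument("-o", "--output", type=output_filename, required=True)
args = parser.parse_args(["-o", "products.csv"])
print(args.output)  # products.csv
```

Validating at parse time means a typo in the output name fails immediately instead of after the job has run.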
schrodinger.application.pathfinder.molio.get_mol_reader(filename, skip_bad=True)

Return a Mol reader given a filename or a SMILES string. For .smi and .csv files, use the RDKit SmilesMolSupplier; for other formats, use StructureReader but convert Structure to Mol before yielding each molecule.

Whenever possible, the reader will be a Sequence. This is currently the case for .smi and .csv files when skip_bad is False, and for a SMILES string, which returns a list of size 1.

Parameters: skip_bad (bool) – if True, bad structures are skipped instead of being yielded as None (applies only to the SMILES and CSV formats).
Return type: Generator or Sequence of Mol
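One plausible way to read the skip_bad and Sequence behaviour: with skip_bad=False every input line maps to exactly one entry (None on failure), so the result can be a list indexed by input position; with skip_bad=True entries are dropped, so a lazy generator suffices. The sketch below assumes a hypothetical toy_parse that mimics RDKit returning None for a bad SMILES.

```python
def read_records(lines, parse, skip_bad=True):
    """Parse each line, mimicking the documented skip_bad behaviour."""
    if skip_bad:
        # Failures are dropped, so positions no longer match the input.
        return (rec for rec in map(parse, lines) if rec is not None)
    # One entry per input line, so a Sequence can be returned.
    return [parse(line) for line in lines]


def toy_parse(line):
    """Hypothetical parser: rejects SMILES with unbalanced parentheses."""
    fields = line.split()
    if not fields or fields[0].count("(") != fields[0].count(")"):
        return None  # stands in for RDKit returning None on a bad entry
    return fields[0]


lines = ["CCO ethanol", "CC(C broken", "c1ccccc1 benzene"]
print(list(read_records(lines, toy_parse)))            # ['CCO', 'c1ccccc1']
print(read_records(lines, toy_parse, skip_bad=False))  # ['CCO', None, 'c1ccccc1']
```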
schrodinger.application.pathfinder.molio.get_mol(target)

Read a Mol from a file or a SMILES string.

Parameters: target (str) – filename or SMILES
Return type: rdkit.Chem.Mol

schrodinger.application.pathfinder.molio.adapt_st_reader(reader)

Generate RDKit Mol objects given a StructureReader-like object. Structures which cause conversion errors are skipped but a warning is logged.

Parameters: reader (iterable of Structure) – source of structures to convert
Returns: converted RDKit molecule objects
Return type: generator of Mol
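The skip-and-warn pattern can be sketched as a generator over any iterable. Here the convert callable is a hypothetical stand-in for the Structure-to-Mol conversion, and catching ValueError is an assumption; the real function may catch a different exception type.

```python
import logging

logger = logging.getLogger(__name__)


def adapt_reader_sketch(reader, convert):
    """Yield convert(item) for each item; log and skip items that fail."""
    for index, item in enumerate(reader, start=1):
        try:
            converted = convert(item)
        except ValueError as exc:
            logger.warning("Skipping structure %d: %s", index, exc)
            continue
        yield converted


logging.basicConfig(level=logging.WARNING)
# float() stands in for the conversion; "not a number" raises ValueError.
print(list(adapt_reader_sketch(["1.5", "not a number", "2"], float)))
# [1.5, 2.0]  (a warning is logged for item 2)
```

Keeping the yield outside the try block means only conversion errors are swallowed, not exceptions raised by the consumer of the generator.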
schrodinger.application.pathfinder.molio.combine_output_files(outfiles, out, dedup=True)

Write the final output file by combining the subjob output files.

Parameters:
  • outfiles (list of str) – subjob output filenames
  • out (str) – output filename
  • dedup (bool) – skip duplicate products

schrodinger.application.pathfinder.molio.dedup_smiles_cat(source_filenames, dest_filename)

Concatenate the given SMILES files, with deduplication.

Parameters:
  • source_filenames (sequence of str) – input files
  • dest_filename (str) – destination file
Returns:

number of structures written

Return type:

int
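A minimal sketch of SMILES concatenation with deduplication, assuming the dedup key is the first whitespace-separated token on each line, compared verbatim. That assumption means only identically spelled SMILES (for example, canonical SMILES) are recognised as duplicates; the real function may use a different key.

```python
import os
import tempfile


def dedup_smiles_cat_sketch(source_filenames, dest_filename):
    """Concatenate SMILES files, keeping the first line seen for each SMILES."""
    seen = set()
    count = 0
    with open(dest_filename, "w") as dest:
        for filename in source_filenames:
            with open(filename) as src:
                for line in src:
                    fields = line.split()
                    if not fields:
                        continue  # ignore blank lines
                    smiles = fields[0]  # assumed dedup key
                    if smiles in seen:
                        continue  # duplicate: keep the earlier occurrence
                    seen.add(smiles)
                    dest.write(line if line.endswith("\n") else line + "\n")
                    count += 1
    return count


# Usage: two subjob outputs that share one product.
with tempfile.TemporaryDirectory() as tmp:
    part1 = os.path.join(tmp, "part1.smi")
    part2 = os.path.join(tmp, "part2.smi")
    merged = os.path.join(tmp, "merged.smi")
    with open(part1, "w") as fh:
        fh.write("CCO prod1\nCCN prod2\n")
    with open(part2, "w") as fh:
        fh.write("CCN prod3\nCCC prod4\n")
    written = dedup_smiles_cat_sketch([part1, part2], merged)
    with open(merged) as fh:
        merged_lines = fh.read().splitlines()

print(written)       # 3
print(merged_lines)  # ['CCO prod1', 'CCN prod2', 'CCC prod4']
```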

schrodinger.application.pathfinder.molio.dedup_st_cat(source_filenames, dest_filename)

Concatenate the given structure files, with deduplication. This uses StructureReader / StructureWriter so any file format supported by those classes may be used.

Parameters:
  • source_filenames (sequence of str) – input files
  • dest_filename (str) – destination file
Returns:

number of structures written

Return type:

int

schrodinger.application.pathfinder.molio.csvcat(source_filenames, dest_filename, dedup)

Concatenate the contents of the CSV source files, writing them to a destination CSV file, but making sure that the header row only appears once. Each input file is assumed to have a header row.

Parameters:
  • source_filenames (sequence of str) – input files
  • dest_filename (str) – destination file
  • dedup (bool) – skip duplicate products
Returns:

number of structures written

Return type:

int
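A minimal sketch of the header-once concatenation using the standard csv module. It assumes that every input has the same header row (not checked here) and that duplicates are detected on the first column (SMILES); both are illustrative assumptions, not the module’s documented behaviour.

```python
import csv
import os
import tempfile


def csvcat_sketch(source_filenames, dest_filename, dedup):
    """Concatenate CSV files, writing the header row only once."""
    seen = set()
    count = 0
    header_written = False
    with open(dest_filename, "w", newline="") as dest:
        writer = csv.writer(dest)
        for filename in source_filenames:
            with open(filename, newline="") as src:
                reader = csv.reader(src)
                header = next(reader, None)
                if header is None:
                    continue  # empty file: nothing to copy
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                for row in reader:
                    if not row:
                        continue  # skip blank lines
                    if dedup:
                        if row[0] in seen:
                            continue  # duplicate product
                        seen.add(row[0])
                    writer.writerow(row)
                    count += 1
    return count


with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i, body in enumerate(["SMILES,Title\nCCO,a\nCCN,b\n",
                              "SMILES,Title\nCCN,c\nCCC,d\n"]):
        path = os.path.join(tmp, f"part{i}.csv")
        with open(path, "w", newline="") as fh:
            fh.write(body)
        paths.append(path)
    out = os.path.join(tmp, "all.csv")
    written = csvcat_sketch(paths, out, dedup=True)
    with open(out, newline="") as fh:
        rows = list(csv.reader(fh))

print(written)  # 3
print(rows)     # [['SMILES', 'Title'], ['CCO', 'a'], ['CCN', 'b'], ['CCC', 'd']]
```

Because the rows pass through csv.reader and csv.writer rather than being copied as raw text, fields containing commas keep their quoting in the merged file.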