schrodinger.application.pathfinder.molio module

PathFinder helper functions for reading and writing files using RDKit Mol objects.

class schrodinger.application.pathfinder.molio.MolWriter(filename, generate_coordinates=True, require_stereo=False)

Bases: schrodinger.structure.StructureWriter

Write Mol objects to a file using a StructureWriter-like API, optionally generating 3D coordinates.

__init__(filename, generate_coordinates=True, require_stereo=False)

Create a structure writer class based on the format.

Parameters:
  • filename (str) – The filename to write to.
  • overwrite (bool) – If False, append to an existing file instead of overwriting it.
  • format (str) – The format of the file. Values should be specified by one of the module-level constants MAESTRO, MOL2, SD, SMILES, or SMILESCSV. If the format is not explicitly specified it will be determined from the suffix of the filename. Multi-structure PDB files are not supported.
  • stereo (enum) –
    Use of the stereo option in the constructor is pending deprecation. Please use the setOption method instead.

    See the class docstring for documentation on the stereo options.

append(mol)

Append the provided structure to the open file.

close()

Close the file.

extend(cts)

Append all provided structures to the open file.

setOption(option, value)

Set a single option for this writer. This method is meant for options that may not be supported for all writer formats. See the StructureWriter class documentation for details on the available options.

Raises an OptionError subclass (either UnsupportedOption or UnsupportedOptionValue) if unsuccessful.

Parameters:
  • option (str) – The name of the option to set.
  • value – The value for the option. The data type of this parameter depends on the option being set.
class schrodinger.application.pathfinder.molio.CsvMolReader(filename)

Bases: object

Read a SMILES CSV file, returning Mol objects.

This is similar to RDKit’s SmilesMolSupplier with delimiter=',', except that it uses the csv module instead of naively splitting on commas. This makes it possible to have field values containing commas, as long as they are quoted following the CSV convention. Note, however, that multi-line records are still not supported, for efficiency reasons.

Also, gzip-compressed files (identified by the filename ending in “gz”) are supported.

A CsvMolReader supports random access, like a list. Upon instantiation, the file is read in full and kept in memory. For a CSV file having only SMILES and an ID, this takes about 100 MB per million entries.
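The difference between csv-aware parsing and naive comma splitting can be seen in a minimal sketch that uses only the standard library. It does not use CsvMolReader itself; the file content and column names are made up for illustration.

```python
import csv
import io

# Hypothetical SMILES CSV content; the first record has a quoted comma in its name.
text = 'SMILES,NAME\nCCO,"ethanol, absolute"\nc1ccccc1,benzene\n'

# Naive splitting breaks the quoted field into two pieces.
naive_rows = [line.split(",") for line in text.splitlines()]

# The csv module follows the CSV quoting convention and keeps the field whole.
csv_rows = list(csv.reader(io.StringIO(text)))

# Holding every row in a list is what gives random access and a length.
print(naive_rows[1])
print(csv_rows[1])
print(len(csv_rows) - 1, "records after the header")
```

Keeping all rows in memory trades memory for list-like indexing, which matches the random-access behavior described above.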

__init__(filename)

Initialize self. See help(type(self)) for accurate signature.

__len__()
class schrodinger.application.pathfinder.molio.CsvMolWriter(filename, properties=None)

Bases: object

Write a CSV file given Mol objects, using a StructureWriter-like API. The first two columns are the SMILES and title, and the rest are the properties of the molecule.

  • We don’t use structure.SmilesCsvWriter because it is too slow due to all the conversions (the overall job takes 4 times as long, so the bottleneck clearly becomes the writing of the output file!).
  • We don’t use Chem.SmilesWriter because even though it can use comma as a delimiter, it doesn’t write proper CSV files because it doesn’t know how to escape the delimiter.

Also, gzip-compressed files (identified by the filename ending in “gz”) are supported.

__init__(filename, properties=None)
Parameters:
  • filename (str) – file to write
  • properties (list of str or None) – optional, list of names of properties to write to output file. If None, all the properties present on the first structure will be written (the assumption is that all molecules will have the same properties, or at least that the first molecule has all the properties that we care about).
append(mol)

Write a molecule to the file. The first time this is called, the header row is written based on mol’s properties or the properties passed to __init__, if any.

Parameters: mol (rdkit.Chem.rdchem.Mol) – molecule
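The deferred header behavior can be sketched with the standard library alone. This is not the CsvMolWriter implementation: the class name LazyHeaderCsvWriter, the column names SMILES and NAME, and the use of plain dicts in place of Mol properties are all assumptions made for illustration.

```python
import csv
import io


class LazyHeaderCsvWriter:
    """Illustrative stand-in that writes the header on the first append."""

    def __init__(self, fh, properties=None):
        self._writer = csv.writer(fh, lineterminator="\n")
        self._properties = properties
        self._header_written = False

    def append(self, smiles, title, props):
        if not self._header_written:
            if self._properties is None:
                # No explicit list given: take the columns from the first record.
                self._properties = list(props)
            # "SMILES" and "NAME" are assumed column names.
            self._writer.writerow(["SMILES", "NAME"] + self._properties)
            self._header_written = True
        values = [props.get(name, "") for name in self._properties]
        self._writer.writerow([smiles, title] + values)


buf = io.StringIO()
writer = LazyHeaderCsvWriter(buf)
writer.append("CCO", "ethanol, absolute", {"logp": -0.31})
# "extra" is not on the first record, so it is not written.
writer.append("c1ccccc1", "benzene", {"logp": 2.13, "extra": 1})
output = buf.getvalue()
print(output)
```

In this sketch, a property that appears only on later records is dropped, which is why passing an explicit properties list is the safer choice when records differ.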
close()
class schrodinger.application.pathfinder.molio.PfxMolReader(filename)

Bases: object

Reader for PFX (PathFinder reactants) files. These are really zip archives containing a CSV file and a metadata JSON file.

Like CsvMolReader, PfxMolReader supports random access, like a list. Upon instantiation, the file is read in full and kept in memory. For a file having only SMILES and an ID, this takes about 100 MB per million entries.
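The container layout described above (a zip archive holding a CSV file and a JSON metadata file) can be sketched with the standard library. The member names structures.csv and metadata.json and the "size" key are hypothetical; the actual names used inside PFX files are not documented here.

```python
import csv
import io
import json
import zipfile

CSV_MEMBER = "structures.csv"   # hypothetical member name
META_MEMBER = "metadata.json"   # hypothetical member name


def write_archive(fh, rows):
    """Store the rows as CSV, plus a small JSON metadata record."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    with zipfile.ZipFile(fh, "w") as zf:
        zf.writestr(CSV_MEMBER, buf.getvalue())
        # "size" is an assumed key: the record count, excluding the header.
        zf.writestr(META_MEMBER, json.dumps({"size": len(rows) - 1}))


def read_size(fh):
    """Read only the metadata, without parsing the CSV member."""
    with zipfile.ZipFile(fh) as zf:
        return json.loads(zf.read(META_MEMBER))["size"]


def read_rows(fh):
    """Load the whole CSV member into memory as a list of rows."""
    with zipfile.ZipFile(fh) as zf:
        text = zf.read(CSV_MEMBER).decode("utf-8")
    return list(csv.reader(io.StringIO(text)))


archive = io.BytesIO()
write_archive(archive, [["SMILES", "NAME"], ["CCO", "ethanol"], ["c1ccccc1", "benzene"]])
size = read_size(archive)
rows = read_rows(archive)
print(size, rows[1])
```

Keeping the count in a separate metadata member lets a caller learn the number of entries without reading the CSV data.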

__init__(filename)

Initialize self. See help(type(self)) for accurate signature.

__len__()
class schrodinger.application.pathfinder.molio.PfxMolWriter(filename, properties=None)

Bases: object

Writer for PFX (PathFinder reactants) files. These are really zip archives containing a CSV file and a metadata JSON file.

__init__(filename, properties=None)
Parameters:
  • filename (str) – file to write
  • properties (list of str or None) – optional, list of names of properties to write to output file. If None, all the properties present on the first structure will be written (the assumption is that all molecules will have the same properties, or at least that the first molecule has all the properties that we care about).
append(mol)

Write a molecule to the file.

Parameters: mol (rdkit.Chem.rdchem.Mol) – molecule
written_count
close()
class schrodinger.application.pathfinder.molio.RdkitMolWriter(filename, v3000=False)

Bases: object

Write Mol objects to a file using the RDKit file-writing classes, but with a StructureWriter-like API. Supports SMILES and SDF.

__init__(filename, v3000=False)
Parameters:
  • filename – filename to write
  • v3000 – when writing SD, force the use of the V3000 format
append(mol)
close()
schrodinger.application.pathfinder.molio.get_mol_writer(filename, generate_coordinates=True, require_stereo=False, v3000=False)

Return a StructureWriter-like object based on the command-line arguments. RDKit is used for non-Maestro formats.

Parameters:
  • filename – filename to write
  • generate_coordinates (bool) – generate 3D coordinates (non-SMILES formats)
  • require_stereo (bool) – when generating coordinates, fail when there’s unspecified stereochemistry, instead of producing an arbitrary isomer
  • v3000 – when writing SD, force the use of the V3000 format
schrodinger.application.pathfinder.molio.supported_output_format(filename)

Check whether we know how to write a file with a given name, but without actually opening a file. Used for argument validation.

Return type: bool
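A name-only check of this kind fits naturally into argument parsing. The sketch below does not call supported_output_format; has_known_suffix is a hypothetical stand-in, and the suffix list is an assumption rather than the set of formats the module actually accepts.

```python
import argparse

# Assumed suffix list, for illustration only.
ASSUMED_SUFFIXES = (".smi", ".csv", ".csv.gz", ".sdf", ".mae", ".pfx")


def has_known_suffix(filename):
    """Decide from the name alone; no file is opened or created."""
    return filename.lower().endswith(ASSUMED_SUFFIXES)


def output_file(value):
    """argparse type callback that rejects unknown output formats early."""
    if not has_known_suffix(value):
        raise argparse.ArgumentTypeError(f"unsupported output format: {value}")
    return value


parser = argparse.ArgumentParser()
parser.add_argument("-o", "--out", type=output_file)

args = parser.parse_args(["-o", "products.csv.gz"])
print(args.out)

# An unknown suffix makes argparse report the error and exit.
try:
    parser.parse_args(["-o", "products.xyz"])
    rejected = False
except SystemExit:
    rejected = True
print(rejected)
```

Validating before any work starts avoids running a long job only to fail when the output file is opened.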
schrodinger.application.pathfinder.molio.get_mol_reader(filename, skip_bad=True, implicitH=True)

Return a Mol reader given a filename or a SMILES string. For .smi and .csv files, use the RDKit SmilesMolSupplier; for other formats, use StructureReader but convert Structure to Mol before yielding each molecule.

Whenever possible, the reader will be a Sequence. This is currently the case for .smi and .csv files when skip_bad is False, and for a SMILES string, which returns a list of size 1.

Parameters:
  • skip_bad (bool) – if True, bad structures are skipped silently instead of being yielded as None (applies only to SMILES and CSV formats)
  • implicitH (bool) – use implicit hydrogens (only has an effect when reading Maestro files)
Return type:

Generator or Sequence of Mol

schrodinger.application.pathfinder.molio.get_mol(target, implicitH=True)

Read a Mol from a file or a SMILES string.

Parameters:
  • target (str) – filename or SMILES
  • implicitH (bool) – use implicit hydrogens (only has an effect when reading Maestro files)
Return type:

rdkit.Chem.Mol

schrodinger.application.pathfinder.molio.adapt_st_reader(reader, implicitH=True)

Generate RDKit Mol objects given a StructureReader-like object. Structures which cause conversion errors are skipped but a warning is logged.

Parameters:
  • reader (iterable of Structure) – source of structures to convert
  • implicitH (bool) – use implicit hydrogens
Returns:

converted RDKit molecule objects

Return type:

generator of Mol

schrodinger.application.pathfinder.molio.combine_output_files(outfiles, out, dedup=True)

Write the final output file.

Parameters:
  • outfiles (list of str) – subjob output filenames
  • out (str) – output filename
  • dedup (bool) – skip duplicate products
schrodinger.application.pathfinder.molio.dedup_smiles_cat(source_filenames, dest_filename)

Concatenate the given SMILES files, with deduplication.

Parameters:
  • source_filenames (sequence of str) – input files
  • dest_filename (str) – destination file
Returns:

number of structures written

Return type:

int

schrodinger.application.pathfinder.molio.dedup_st_cat(source_filenames, dest_filename)

Concatenate the given structure files, with deduplication. This uses StructureReader / StructureWriter so any file format supported by those classes may be used.

Parameters:
  • source_filenames (sequence of str) – input files
  • dest_filename (str) – destination file
Returns:

number of structures written

Return type:

int

schrodinger.application.pathfinder.molio.csvcat(source_filenames, dest_filename, dedup)

Concatenate the contents of the CSV source files, writing them to a destination CSV file, but making sure that the header row only appears once. Each input file is assumed to have a header row.

Parameters:
  • source_filenames (sequence of str) – input files
  • dest_filename (str) – destination file
  • dedup (bool) – skip duplicate products
Returns:

number of structures written

Return type:

int
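The header handling described for csvcat can be sketched as follows. This is not the module's implementation: concat_csv is a hypothetical name, and treating two rows as duplicates only when every field matches is an assumption, since the criterion used for deduplication is not stated here.

```python
import csv
import os
import tempfile


def concat_csv(source_filenames, dest_filename, dedup):
    """Concatenate CSV files, writing the header row only once."""
    seen = set()
    written = 0
    header_written = False
    with open(dest_filename, "w", newline="") as out:
        writer = csv.writer(out)
        for name in source_filenames:
            with open(name, newline="") as src:
                reader = csv.reader(src)
                header = next(reader, None)  # every input has a header row
                if header is None:
                    continue  # empty file
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                for row in reader:
                    key = tuple(row)  # assumed criterion: whole-row equality
                    if dedup and key in seen:
                        continue
                    seen.add(key)
                    writer.writerow(row)
                    written += 1
    return written


with tempfile.TemporaryDirectory() as tmpdir:
    first = os.path.join(tmpdir, "part1.csv")
    second = os.path.join(tmpdir, "part2.csv")
    merged = os.path.join(tmpdir, "merged.csv")
    with open(first, "w", newline="") as fh:
        fh.write("SMILES,NAME\nCCO,ethanol\nc1ccccc1,benzene\n")
    with open(second, "w", newline="") as fh:
        fh.write("SMILES,NAME\nCCO,ethanol\nCC,ethane\n")

    count_dedup = concat_csv([first, second], merged, dedup=True)
    with open(merged, newline="") as fh:
        merged_rows = list(csv.reader(fh))
    count_all = concat_csv([first, second], merged, dedup=False)

print(count_dedup, count_all)
```

The returned count covers data rows only, which mirrors the "number of structures written" return value documented above.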

schrodinger.application.pathfinder.molio.open_maybe_compressed(filename, *a, **d)

Open a file, using the gzip module if the filename ends in gz, or the builtin open otherwise. All arguments are passed through.
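A minimal sketch of the dispatch described above, written with the standard library. The name open_maybe_gzip is hypothetical and the function is an illustration of the idea, not the module's code.

```python
import gzip
import os
import tempfile


def open_maybe_gzip(filename, *args, **kwargs):
    """Use gzip.open when the name ends in "gz", the builtin open otherwise."""
    opener = gzip.open if filename.endswith("gz") else open
    return opener(filename, *args, **kwargs)


results = {}
with tempfile.TemporaryDirectory() as tmpdir:
    for name in ("plain.csv", "packed.csv.gz"):
        path = os.path.join(tmpdir, name)
        # The same calling code works for both kinds of file.
        with open_maybe_gzip(path, "wt") as fh:
            fh.write("SMILES,NAME\nCCO,ethanol\n")
        with open_maybe_gzip(path, "rt") as fh:
            text = fh.read()
        # Inspect the raw bytes to confirm which file was compressed.
        with open(path, "rb") as fh:
            magic = fh.read(2)
        results[name] = (text, magic)

print(results["packed.csv.gz"][1] == b"\x1f\x8b")
```

Because all arguments are passed through, text mode has to be requested explicitly ("rt" or "wt") when text is wanted: gzip.open defaults to binary mode, unlike the builtin open.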

schrodinger.application.pathfinder.molio.is_csvgz(filename)
schrodinger.application.pathfinder.molio.is_pfx(filename)
schrodinger.application.pathfinder.molio.get_pfx_size(filename)

Return the size from the metadata header of a .pfx file.