schrodinger.application.pathfinder.molio module
PathFinder helper functions for reading and writing files using RDKit Mol objects.
-
class
schrodinger.application.pathfinder.molio.
MolWriter
(filename, generate_coordinates=True, require_stereo=False) Bases:
schrodinger.structure._io.StructureWriter
Write Mol objects to a file using a StructureWriter-like API, optionally generating 3D coordinates.
-
__init__
(filename, generate_coordinates=True, require_stereo=False) Create a structure writer class based on the format.
- Parameters
filename (str or pathlib.Path) – The filename to write to.
overwrite (bool) – If False, append to an existing file instead of overwriting it.
format (str) – The format of the file. Values should be specified by one of the module-level constants MAESTRO, MOL2, SD, SMILES, or SMILESCSV. If the format is not explicitly specified it will be determined from the suffix of the filename. Multi-structure PDB files are not supported.
stereo (enum) –
Use of the stereo option in the constructor is pending deprecation. Please use the setOption method instead.
See the class docstring for documentation on the stereo options.
-
close
() Close the file.
-
extend
(cts) Append all provided structures to the open file.
-
setOption
(option, value) Set a single option for this writer. This method is meant for options that may not be supported for all writer formats. See the StructureWriter class documentation for details on the available options.
Raises an OptionError subclass (either UnsupportedOption or UnsupportedOptionValue) if unsuccessful.
- Parameters
option (str) – The name of the option to set.
value – The value for the option. The data type of this parameter depends on the option being set.
-
static
write
(st, filename) Writes the given Structure to the specified file, overwriting the file if it already exists.
- Parameters
st (structure.Structure) – structure object to write to file
filename (str or pathlib.Path) – filename to write to
-
-
class
schrodinger.application.pathfinder.molio.
CsvMolReader
(filename) Bases:
object
Read a SMILES CSV file, returning Mol objects.
This is similar to RDKit’s SmilesMolSupplier with delimiter=',', except that it uses the csv module instead of naively splitting on commas. This makes it possible to have field values containing commas, as long as they are quoted following the CSV convention. Note, however, that for efficiency reasons multi-line records are still not supported.
Also, gzip-compressed files (identified by the filename ending in “gz”) are supported.
A CsvMolReader supports random access, like a list. Upon instantiation, the file is read in full and kept in memory. For a CSV file having only SMILES and an ID, this takes about 100 MB per million entries.
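The difference between naive comma splitting and the csv module can be seen in a small sketch. It uses only the Python standard library rather than CsvMolReader itself, and the sample data and column names are made up for illustration.

```python
import csv
import io

# Hypothetical SMILES CSV content; the second title contains a quoted comma.
data = 'SMILES,title\nCCO,ethanol\nc1ccccc1,"benzene, aromatic"\n'

# Naive approach: split every line on commas, ignoring CSV quoting.
naive = [line.split(",") for line in data.splitlines()]

# csv module: quoted fields are kept together and the quotes are removed.
parsed = list(csv.reader(io.StringIO(data, newline="")))

print(naive[2])   # the quoted title is broken into two pieces
print(parsed[2])  # the quoted title survives as a single field
```

The naive split yields three fields for the last record, while csv.reader yields the expected two.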
-
class
schrodinger.application.pathfinder.molio.
CsvMolWriter
(filename, properties=None, cxsmiles=False) Bases:
object
Write a CSV file given Mol objects, using a StructureWriter-like API. The first two columns are the SMILES and title, and the rest are the properties of the molecule.
We don’t use structure.SmilesCsvWriter because it is too slow due to all the conversions (the overall job takes 4 times as long, so the bottleneck clearly becomes the writing of the output file!).
We don’t use Chem.SmilesWriter because even though it can use comma as a delimiter, it doesn’t write proper CSV files because it doesn’t know how to escape the delimiter.
Also, gzip-compressed files (identified by the filename ending in “gz”) are supported.
-
__init__
(filename, properties=None, cxsmiles=False)
- Parameters
filename (str or file-like object) – file to write
properties (list of str or None) – optional, list of names of properties to write to output file. If None, all the properties are written. (CAVEAT: if filename is a file object rather than an actual filename, only the properties present in the first molecule are written.)
cxsmiles (bool) – when writing SMILES, use CXSMILES extensions
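Two of the points made for this class, proper escaping of the delimiter and gzip selection by filename suffix, can be sketched with the standard library alone. The helper name open_text and the sample rows are hypothetical; this is an illustration of the idea, not CsvMolWriter's implementation.

```python
import csv
import gzip
import os
import tempfile


def open_text(filename, mode="rt"):
    """Open a text file, using gzip when the name ends in 'gz' (hypothetical helper)."""
    if str(filename).endswith("gz"):
        return gzip.open(filename, mode, encoding="utf-8", newline="")
    return open(filename, mode, encoding="utf-8", newline="")


# SMILES first, title second, then one property column (made-up values).
rows = [
    ("SMILES", "title", "source"),
    ("CCO", "ethanol, absolute", "vendor A"),
]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "products.csv.gz")
    with open_text(path, "wt") as fh:
        csv.writer(fh).writerows(rows)  # the comma in the title gets quoted
    with open_text(path, "rt") as fh:
        round_trip = [tuple(row) for row in csv.reader(fh)]

print(round_trip == rows)
```

Because csv.writer quotes the field containing a comma, reading the file back recovers the original rows.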
-
class
schrodinger.application.pathfinder.molio.
PfxMolReader
(filename) Bases:
object
Reader for PFX (PathFinder reactants) files. These are really zip archives containing a CSV file and a metadata JSON file.
Like CsvMolReader, PfxMolReader supports random access, like a list. Upon instantiation, the file is read in full and kept in memory. For a file having only SMILES and an ID, this takes about 100 MB per million entries.
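The container layout described above (a zip archive holding a CSV file and a JSON metadata file) can be sketched with the standard library. The member names structures.csv and metadata.json and the size key are assumptions made for illustration; the actual names used inside .pfx files are not documented here.

```python
import csv
import io
import json
import zipfile

CSV_NAME = "structures.csv"   # hypothetical member name
META_NAME = "metadata.json"   # hypothetical member name

rows = [["SMILES", "title"], ["CCO", "ethanol"], ["CC(=O)O", "acetic acid"]]

# Write: serialize the CSV to text, then store both members in one archive.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
    text = io.StringIO(newline="")
    csv.writer(text).writerows(rows)
    zf.writestr(CSV_NAME, text.getvalue())
    zf.writestr(META_NAME, json.dumps({"size": len(rows) - 1}))  # hypothetical key

# Read: the metadata can be inspected without parsing the CSV member.
with zipfile.ZipFile(archive) as zf:
    meta = json.loads(zf.read(META_NAME))
    with zf.open(CSV_NAME) as raw:
        reader = csv.DictReader(io.TextIOWrapper(raw, encoding="utf-8", newline=""))
        records = list(reader)

print(meta["size"], records[0]["SMILES"])
```

Keeping a count in a separate metadata member means a size query only has to read a small JSON file, which is one plausible motivation for this kind of layout.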
-
class
schrodinger.application.pathfinder.molio.
PfxMolWriter
(filename, properties=None) Bases:
object
Writer for PFX (PathFinder reactants) files. These are really zip archives containing a CSV file and a metadata JSON file.
-
__init__
(filename, properties=None)
- Parameters
filename (str) – file to write
properties (list of str or None) – optional, list of names of properties to write to output file. If None, all the properties present on the first structure will be written (the assumption is that all molecules will have the same properties, or at least that the first molecule has all the properties that we care about).
-
append
(mol) Write a molecule to the file.
- Parameters
mol (rdkit.Chem.rdchem.Mol) – molecule
-
property
written_count
-
-
class
schrodinger.application.pathfinder.molio.
RdkitMolWriter
(filename, v3000=False) Bases:
object
Write Mol objects to a file using the RDKit file-writing classes, but with a StructureWriter-like API. Supports SMILES and SDF.
-
__init__
(filename, v3000=False)
- Parameters
filename – filename to write
v3000 – when writing SD, force the use of the V3000 format
-
property
written_count
-
-
schrodinger.application.pathfinder.molio.
get_mol_writer
(filename, generate_coordinates=True, require_stereo=False, v3000=False, cxsmiles=False) Return a StructureWriter-like object based on the command-line arguments. RDKit is used for non-Maestro formats.
- Parameters
filename – filename to write
generate_coordinates (bool) – generate 3D coordinates (non-SMILES formats)
require_stereo (bool) – when generating coordinates, fail when there’s unspecified stereochemistry, instead of producing an arbitrary isomer
v3000 – when writing SD, force the use of the V3000 format
cxsmiles (bool) – when writing SMILES, use CXSMILES extensions
-
schrodinger.application.pathfinder.molio.
supported_output_format
(filename) Check whether we know how to write a file with a given name, but without actually opening a file. Used for argument validation.
- Return type
bool
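As a rough sketch of the argument-validation use case, a filename check of this kind can be plugged into argparse through the type= hook. The function looks_writable and its suffix set are hypothetical stand-ins, not the real supported_output_format or its list of formats.

```python
import argparse
from pathlib import Path

# Hypothetical suffix list, for illustration only.
KNOWN_SUFFIXES = {".mae", ".sdf", ".smi", ".csv"}


def looks_writable(filename):
    """Stand-in check that inspects only the name and never opens the file."""
    return Path(filename).suffix.lower() in KNOWN_SUFFIXES


def output_file(value):
    """argparse type= callback: reject unknown formats before any work starts."""
    if not looks_writable(value):
        raise argparse.ArgumentTypeError(f"unsupported output format: {value}")
    return value


parser = argparse.ArgumentParser()
parser.add_argument("-o", "--out", type=output_file, required=True)

args = parser.parse_args(["--out", "products.sdf"])
print(args.out)
```

Validating at parse time means a mistyped output name is reported immediately, before any structures are processed.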
-
schrodinger.application.pathfinder.molio.
get_mol_reader
(filename, skip_bad=True, implicitH=True) Return a Mol reader given a filename or a SMILES string. For .smi and .csv files, use the RDKit SmilesMolSupplier; for other formats, use StructureReader but convert Structure to Mol before yielding each molecule.
Whenever possible, the reader will be a Sequence. This is currently the case for .smi and .csv files when skip_bad is False, and for a SMILES string, which returns a list of size 1.
- Parameters
skip_bad (bool) – if True, bad structures are skipped implicitly, instead of being yielded as None (only applies to SMILES and CSV formats.)
implicitH (bool) – use implicit hydrogens (only has an effect when reading Maestro files)
- Return type
Generator or Sequence of Mol
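The effect of skip_bad can be illustrated with a generic generator. This is only a sketch of the semantics described above: iter_parsed is a hypothetical function, and int stands in for a real SMILES parser so that the example runs without RDKit.

```python
def iter_parsed(records, parse, skip_bad=True):
    """Yield parsed records; bad ones are dropped or yielded as None."""
    for record in records:
        try:
            item = parse(record)
        except ValueError:
            item = None  # mark the record as bad
        if item is None and skip_bad:
            continue     # silently skip the bad record
        yield item


records = ["1", "oops", "3"]  # the middle record cannot be parsed

skipped = list(iter_parsed(records, int, skip_bad=True))
aligned = list(iter_parsed(records, int, skip_bad=False))

print(skipped)
print(aligned)
```

With skip_bad=False the output keeps one entry per input record, so position i in the output still corresponds to record i in the input.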
-
schrodinger.application.pathfinder.molio.
get_mol
(target, implicitH=True) Read a Mol from a file or a SMILES string.
- Parameters
target (str) – filename or SMILES
implicitH (bool) – use implicit hydrogens (only has an effect when reading Maestro files)
- Return type
rdkit.Chem.Mol
-
schrodinger.application.pathfinder.molio.
adapt_st_reader
(reader, implicitH=True) Generate RDKit Mol objects given a StructureReader-like object. Structures which cause conversion errors are skipped but a warning is logged.
- Parameters
reader (iterable of Structure) – source of structures to convert
implicitH (bool) – use implicit hydrogens
- Returns
converted RDKit molecule objects
- Return type
generator of Mol
-
schrodinger.application.pathfinder.molio.
combine_output_files
(outfiles, out, dedup=True) Write the final output file.
- Parameters
outfiles (list of str) – subjob output filenames
out (str) – output filename
dedup (bool) – skip duplicate products
-
schrodinger.application.pathfinder.molio.
dedup_smiles_cat
(source_filenames, dest_filename) Concatenate the given SMILES files, with deduplication.
- Parameters
source_filenames (sequence of str) – input files
dest_filename (str) – destination file
- Returns
number of structures written
- Return type
int
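A minimal sketch of concatenation with deduplication, using only the standard library. It assumes that the deduplication key is the literal SMILES string in the first whitespace-separated field of each line and that the first occurrence wins; the key actually used by dedup_smiles_cat is not stated here, and dedup_cat is a hypothetical name.

```python
import os
import tempfile


def dedup_cat(source_filenames, dest_filename):
    """Concatenate SMILES files, keeping the first line seen for each SMILES."""
    seen = set()
    written = 0
    with open(dest_filename, "w", encoding="utf-8") as out:
        for name in source_filenames:
            with open(name, encoding="utf-8") as src:
                for line in src:
                    fields = line.split()
                    if not fields or fields[0] in seen:
                        continue  # blank line or duplicate SMILES
                    seen.add(fields[0])
                    out.write(line.rstrip("\n") + "\n")
                    written += 1
    return written


with tempfile.TemporaryDirectory() as tmpdir:
    first = os.path.join(tmpdir, "a.smi")
    second = os.path.join(tmpdir, "b.smi")
    dest = os.path.join(tmpdir, "all.smi")
    with open(first, "w", encoding="utf-8") as fh:
        fh.write("CCO ethanol\nCCN ethylamine\n")
    with open(second, "w", encoding="utf-8") as fh:
        fh.write("CCO ethanol-again\nc1ccccc1 benzene\n")
    count = dedup_cat([first, second], dest)
    with open(dest, encoding="utf-8") as fh:
        merged = fh.read().splitlines()

print(count, merged)
```

Note that a literal string comparison treats two different SMILES spellings of the same molecule as distinct entries.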
-
schrodinger.application.pathfinder.molio.
dedup_st_cat
(source_filenames, dest_filename) Concatenate the given structure files, with deduplication. This uses StructureReader / StructureWriter so any file format supported by those classes may be used.
- Parameters
source_filenames (sequence of str) – input files
dest_filename (str) – destination file
- Returns
number of structures written
- Return type
int
-
schrodinger.application.pathfinder.molio.
csvcat
(source_filenames, dest_filename, dedup) Concatenate the contents of the CSV source files, writing them to a destination CSV file, but making sure that the header row only appears once. Each input file is assumed to have a header row, and the headers must be consistent or an exception will be raised.
- Parameters
source_filenames (sequence of str) – input files
dest_filename (str) – destination file
dedup (bool) – skip duplicate products, using first column as key
- Returns
number of structures written
- Return type
int
- Raises
ValueError if headers are inconsistent
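The behavior described for csvcat (single header row, optional deduplication on the first column, ValueError on inconsistent headers) can be sketched as follows. The function cat_csv and the sample files are hypothetical, and details such as the handling of empty files are assumptions rather than documented behavior.

```python
import csv
import os
import tempfile


def cat_csv(source_filenames, dest_filename, dedup):
    """Concatenate CSV files that share one header; return the rows written."""
    header = None
    seen = set()
    written = 0
    with open(dest_filename, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for name in source_filenames:
            with open(name, newline="", encoding="utf-8") as src:
                reader = csv.reader(src)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # assumption: an empty file contributes nothing
                if header is None:
                    header = file_header
                    writer.writerow(header)  # the header is written only once
                elif file_header != header:
                    raise ValueError(f"inconsistent header in {name}")
                for row in reader:
                    if not row:
                        continue
                    if dedup:
                        if row[0] in seen:
                            continue  # first column is the dedup key
                        seen.add(row[0])
                    writer.writerow(row)
                    written += 1
    return written


def write_text(path, text):
    with open(path, "w", newline="", encoding="utf-8") as fh:
        fh.write(text)


with tempfile.TemporaryDirectory() as tmpdir:
    a, b, c, dest = (os.path.join(tmpdir, n) for n in ("a.csv", "b.csv", "c.csv", "out.csv"))
    write_text(a, "SMILES,title\nCCO,ethanol\n")
    write_text(b, "SMILES,title\nCCO,ethanol\nCCN,ethylamine\n")
    write_text(c, "SMILES,name\nCCC,propane\n")
    count = cat_csv([a, b], dest, dedup=True)
    with open(dest, newline="", encoding="utf-8") as fh:
        merged = list(csv.reader(fh))
    try:
        cat_csv([a, c], dest, dedup=True)
        header_error = False
    except ValueError:
        header_error = True

print(count, merged, header_error)
```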
-
schrodinger.application.pathfinder.molio.
csvcat_union
(source_filenames, dest_filename, dedup_field=None) Concatenate the contents of the CSV source files, writing them to a destination CSV file, but making sure that the header row only appears once. Each input file is assumed to have a header row. If the headers are inconsistent, the output file contains the union of the columns from the input files, with empty values in rows coming from a file which didn’t have a given column.
- Parameters
source_filenames (sequence of str) – input files
dest_filename (str) – destination file
dedup_field (str or NoneType) – skip duplicate products using the named field as key. If not truthy, no deduplication is performed.
- Returns
number of structures written
- Return type
int
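One way to get the "union of columns with empty values" behavior is csv.DictWriter with restval="". The sketch below works on in-memory CSV text instead of files to stay short; cat_union and the sample data are hypothetical, and this is not necessarily how csvcat_union is implemented.

```python
import csv
import io


def cat_union(sources, dest, dedup_field=None):
    """Merge CSV texts into dest using the union of their columns."""
    readers = [csv.DictReader(io.StringIO(text, newline="")) for text in sources]
    # Union of the headers, in first-seen order (dicts preserve insertion order).
    fieldnames = list(dict.fromkeys(
        name for reader in readers for name in (reader.fieldnames or [])))
    # restval fills in columns that a given input file did not have.
    writer = csv.DictWriter(dest, fieldnames=fieldnames, restval="")
    writer.writeheader()
    seen = set()
    written = 0
    for reader in readers:
        for row in reader:
            if dedup_field:
                key = row.get(dedup_field)
                if key in seen:
                    continue  # keep only the first row for each key
                seen.add(key)
            writer.writerow(row)
            written += 1
    return written


first = "SMILES,title\nCCO,ethanol\n"
second = "SMILES,title,score\nCCO,ethanol,1.5\nCCN,ethylamine,2.0\n"

out = io.StringIO()
count = cat_union([first, second], out, dedup_field="SMILES")
lines = out.getvalue().splitlines()
print(count, lines)
```

The row that came from the file without a score column ends with an empty field, and the duplicate CCO row from the second input is dropped.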
-
schrodinger.application.pathfinder.molio.
get_fieldnames
(filenames) Return a list with the union of the field names from all the given CSV files. The field names are listed in the order in which they were first seen. (First all the fields from file #1, then the “new” field names from file #2, etc.)
- Parameters
filenames ([str]) – list of CSV files
- Returns
list of field names
- Return type
[str]
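The first-seen ordering can be obtained with an ordinary dict, which preserves insertion order and keeps a key's original position when it is seen again. The sketch below is a standard-library illustration with a hypothetical function name and sample files, not the module's code.

```python
import csv
import os
import tempfile


def union_fieldnames(filenames):
    """Return the header names from all files, in first-seen order."""
    fields = {}
    for name in filenames:
        with open(name, newline="", encoding="utf-8") as fh:
            header = next(csv.reader(fh), [])  # only the header row is read
        fields.update(dict.fromkeys(header))   # existing keys keep their position
    return list(fields)


with tempfile.TemporaryDirectory() as tmpdir:
    first = os.path.join(tmpdir, "a.csv")
    second = os.path.join(tmpdir, "b.csv")
    with open(first, "w", newline="", encoding="utf-8") as fh:
        fh.write("SMILES,title,score\nCCO,ethanol,1.5\n")
    with open(second, "w", newline="", encoding="utf-8") as fh:
        fh.write("title,SMILES,source\nethylamine,CCN,vendor A\n")
    names = union_fieldnames([first, second])

print(names)
```

Only source is new in the second file, so it is appended after the three fields from the first file.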
-
schrodinger.application.pathfinder.molio.
get_pfx_size
(filename) Return the size from the metadata header of a .pfx file.
-
schrodinger.application.pathfinder.molio.
extract_structures
(filename, dest_file) Extract structures from .pfx file into a given file.