schrodinger.application.matsci.mlearn.features module

Classes and functions to deal with ML features.

Copyright Schrodinger, LLC. All rights reserved.

class schrodinger.application.matsci.mlearn.features.MomentData(flag, components, header, units)

Bases: tuple

__contains__(key, /)

Return key in self.

__len__()

Return len(self).

components

Alias for field number 1

count(value, /)

Return number of occurrences of value.

flag

Alias for field number 0

header

Alias for field number 2

index(value, start=0, stop=9223372036854775807, /)

Return first index of value.

Raises ValueError if the value is not present.

units

Alias for field number 3
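
Because MomentData is a tuple subclass with four named fields, it behaves like a standard named tuple. The sketch below rebuilds an equivalent stand-in with collections.namedtuple to show how the fields are read; the field values are invented for illustration and are not taken from the module.

```python
from collections import namedtuple

# Stand-in with the same field order as MomentData: flag, components, header, units.
MomentData = namedtuple("MomentData", ["flag", "components", "header", "units"])

# Hypothetical values, chosen only to show how the fields are accessed.
dipole = MomentData(flag="dipole", components=("x", "y", "z"),
                    header="Dipole moment", units="debye")

print(dipole.flag)            # access by name
print(dipole[1])              # access by position: field number 1 is `components`
print(len(dipole))            # number of fields
print(dipole.index("debye"))  # first index holding this value
```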

schrodinger.application.matsci.mlearn.features.DescriptorUtility

alias of schrodinger.application.matsci.mlearn.features.DescriptorUtilitity

schrodinger.application.matsci.mlearn.features.get_distance_cell(struct, cutoff)[source]

Create an infrastructure Distance Cell. Struct MUST have the Chorus box properties.

Parameters

Return type

schrodinger.structure.Structure, schrodinger.infra.structure.DistanceCell, schrodinger.infra.structure.PBC

Returns

Supercell, an infrastructure Distance Cell that accounts for the PBC, and the pbc used to create it.

Raises

ValueError if struct is missing PBCs

schrodinger.application.matsci.mlearn.features.elemental_generator(struct, element, is_equal=True)[source]

schrodinger.application.matsci.mlearn.features.get_anion(struct)[source]

Get the most electronegative element in the structure (anion).

Parameters

struct (schrodinger.structure.Structure) – Input structure

Return type

str, float, int

Returns

Element, its electronegativity, number of anions in the cell
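
As a rough illustration of the selection described above, the following sketch picks the most electronegative element from a list of element symbols. It works on plain strings rather than a Structure, and the small Pauling electronegativity table and the helper name most_electronegative are assumptions made for this example, not part of the module.

```python
from collections import Counter

# Pauling electronegativities for a few elements (illustrative subset).
PAULING_ENEG = {"Li": 0.98, "P": 2.19, "S": 2.58, "Cl": 3.16, "O": 3.44, "F": 3.98}


def most_electronegative(elements):
    """Return (element, electronegativity, count) for the most electronegative element."""
    counts = Counter(elements)
    anion = max(counts, key=PAULING_ENEG.__getitem__)
    return anion, PAULING_ENEG[anion], counts[anion]


# Element symbols of a hypothetical Li3PS4 formula unit.
cell = ["Li"] * 3 + ["P"] + ["S"] * 4
print(most_electronegative(cell))
```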

class schrodinger.application.matsci.mlearn.features.LatticeFeatures(features, element='Li', cutoff=4.0)[source]

Bases: schrodinger.application.matsci.mlearn.base.BaseFeaturizer

Class to generate lattice-based features.

FEATURES = {'anionFrameCoordination': 'Anion frame coordination', 'avgAnionAnionShortDistance': 'Average anion anion shortest distance', 'avgAtomicVol': 'Average atomic volume', 'avgElementAnionShortDistance': 'Average cation anion shortest distance', 'avgElementNeighborCount': 'Average cation count', 'avgNeighborCount': 'Average neighbor count', 'avgNeighborIon': 'Average neighbor ionicity', 'avgShortDistance': 'Average cation cation shortest distance', 'avgSublatticeEneg': 'Average sublattice electronegativity', 'avgSublatticeNeighborCount': 'Average sublattice neighbor count', 'avgSublatticeNeighborIon': 'Average sublattice neighbor ionicity', 'packingFraction': 'Crystal packing fraction', 'pathWidth': 'Average straight-line path width', 'pathWidthEneg': 'Average straight-line path electronegativity', 'ratioCount': 'Ratio of average cation to sublattice count', 'ratioIonicity': 'Ratio of average cation to sublattice electronegativity', 'stdNeighborCount': 'Standard deviation of neighbor count', 'stdNeighborIon': 'Standard deviation of neighbor ionicity', 'sublatticePackingFraction': 'Sublattice packing fraction', 'volPerAnion': 'Volume per anion'}
__init__(features, element='Li', cutoff=4.0)[source]

Initialize the object.

runFeature(feature)[source]

Get result from a feature.

Parameters

feature (str) – One of the feature names (keys) listed in FEATURES

Return type

int or float

Returns

Feature value
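
The keys of FEATURES match the names of the feature methods documented below, so one plausible way to run a feature by name is name-based dispatch. The sketch uses a hypothetical TinyFeaturizer class with two placeholder features and made-up inputs; it is not the module's implementation.

```python
class TinyFeaturizer:
    """Hypothetical featurizer that dispatches feature names to methods."""

    FEATURES = {
        "avgNeighborCount": "Average neighbor count",
        "volPerAnion": "Volume per anion",
    }

    def __init__(self, neighbor_counts, volume, n_anions):
        self.neighbor_counts = neighbor_counts
        self.volume = volume
        self.n_anions = n_anions

    def avgNeighborCount(self):
        return sum(self.neighbor_counts) / len(self.neighbor_counts)

    def volPerAnion(self):
        return self.volume / self.n_anions

    def runFeature(self, feature):
        # Reject names that are not registered in FEATURES.
        if feature not in self.FEATURES:
            raise ValueError(f"Unknown feature: {feature}")
        # Look up the method with the same name and call it.
        return getattr(self, feature)()


feat = TinyFeaturizer(neighbor_counts=[4, 6, 6], volume=120.0, n_anions=4)
print(feat.runFeature("volPerAnion"))
```

Validating against FEATURES before calling getattr keeps arbitrary attribute names from being invoked.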

transform(structs)[source]

Get numerical features from structures. Also sets feature names in self.labels. See parent class for more documentation.

Parameters

structs (list(schrodinger.structure.Structure)) – List of structures to be featurized

Return type

numpy array of shape [n_samples, n_features]

Returns

Transformed array

avgAtomicVol()[source]

Get average atomic volume.

Parameters

struct (schrodinger.structure.Structure) – Structure to be used for feature calculation

Return type

float

Returns

Average atomic volume (A^3)

avgNeighborCount()[source]

Get average neighbor count.

Return type

float

Returns

Average neighbor count

stdNeighborCount()[source]

Get standard deviation of neighbor count.

Return type

float

Returns

Standard deviation of neighbor count
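
For orientation, the average and the standard deviation of a set of per-atom neighbor counts can be computed as below. The counts are hypothetical, and the use of the population standard deviation is an assumption; the module may use the sample form instead.

```python
import statistics

# Hypothetical neighbor counts for five atoms.
neighbor_counts = [4, 4, 6, 6, 5]

avg = statistics.mean(neighbor_counts)
# Population standard deviation (divides by N); an assumption for this sketch.
std = statistics.pstdev(neighbor_counts)
print(avg, round(std, 3))
```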

avgSublatticeEneg()[source]

Get average sublattice electronegativity.

Return type

float

Returns

Average sublattice electronegativity

avgSublatticeNeighborCount()[source]

Get average sublattice neighbor count.

Return type

float

Returns

Average sublattice neighbor count

avgNeighborIon()[source]

Get average neighbor ionicity.

Return type

float

Returns

Average neighbor ionicity

stdNeighborIon()[source]

Get standard deviation of neighbor ionicity.

Return type

float

Returns

Standard deviation of neighbor ionicity

avgSublatticeNeighborIon()[source]

Get average sublattice neighbor ionicity.

Return type

float

Returns

Average sublattice neighbor ionicity

volPerAnion()[source]

Get volume per anion.

Return type

float

Returns

Volume per anion

packingFraction(skip_element=None)[source]

Get packing fraction of the crystal.

Parameters

skip_element (str) – Element to skip

Return type

float

Returns

Packing fraction
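
A packing fraction is commonly defined as the volume occupied by atomic spheres divided by the cell volume. The sketch below applies that common definition to plain (element, radius) pairs. The radii, the cell volume, the helper name, and the way skip_element is handled are all assumptions for illustration; which radii the module actually uses is not stated here.

```python
import math


def packing_fraction(atoms, cell_volume, skip_element=None):
    """Fraction of the cell volume filled by spheres of the given radii.

    atoms: list of (element, radius) pairs, radius in Angstrom.
    cell_volume: cell volume in Angstrom^3.
    skip_element: element symbol left out of the occupied volume.
    """
    occupied = sum(
        4.0 / 3.0 * math.pi * radius ** 3
        for element, radius in atoms
        if element != skip_element
    )
    return occupied / cell_volume


# Hypothetical radii and cell volume.
atoms = [("Li", 0.76), ("Li", 0.76), ("S", 1.84)]
print(packing_fraction(atoms, cell_volume=60.0))
print(packing_fraction(atoms, cell_volume=60.0, skip_element="Li"))
```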

effectiveRadius(atom)[source]

Get atom effective radius.

Parameters

atom (schrodinger.structure._StructureAtom) – Atom

Return type

float

Returns

Effective radius

sublatticePackingFraction()[source]

Get packing fraction of the sublattice crystal.

Return type

float

Returns

Packing fraction

avgElementNeighborCount()[source]

Get average element neighbor count.

Return type

float

Returns

Average number of bonds per element

avgAnionAnionShortDistance()[source]

Get average anion anion shortest distance.

Return type

float

Returns

Average anion anion shortest distance

avgElementAnionShortDistance()[source]

Get average element anion shortest distance.

Return type

float

Returns

Average element anion shortest distance

avgShortDistance()[source]

Get average element element shortest distance.

Return type

float

Returns

Average element element shortest distance

anionFrameCoordination()[source]

Get anion framework coordination.

Return type

float

Returns

Anion framework coordination

pathWidth(eval_eneg=False)[source]

Evaluate average straight line path width. See the reference in the constructor for more info.

Parameters

eval_eneg (bool) – If True, return average over electronegativity, instead of distance

Return type

float

Returns

Average path or electronegativity

pathWidthEneg()[source]

Evaluate average straight line path electronegativity.

Return type

float

Returns

Average electronegativity along the path

ratioIonicity()[source]

Get ratio ionicity.

Return type

float

Returns

Ratio ionicity

ratioCount()[source]

Get ratio neighbor count.

Return type

float

Returns

Ratio neighbor count

fit(data, data_y=None)

Fit and return self. Anything that evaluates properties related to the passed data should go here. For example, compute physical properties of a structure and save them as class properties, to be used in the transform method.

Parameters
  • data (numpy array of shape [n_samples, n_features]) – Training set

  • data_y (numpy array of shape [n_samples]) – Target values

Return type

BaseFeaturizer

Returns

self object with fitted data

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
  • X ({array-like, sparse matrix, dataframe} of shape (n_samples, n_features)) – Input samples

  • y (ndarray of shape (n_samples,), default=None) – Target values

  • **fit_params (dict) – Additional fit parameters

Return type

ndarray of shape (n_samples, n_features_new)

Returns

Transformed array
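
The fit-then-transform contract can be sketched without any dependencies. MeanCenterer below is a hypothetical transformer, not part of this module: it learns column means in fit and subtracts them in transform, and fit_transform simply chains the two. Plain nested lists stand in for arrays.

```python
class MeanCenterer:
    """Hypothetical transformer that subtracts per-column means learned in fit."""

    def fit(self, X, y=None, **fit_params):
        n_rows = len(X)
        # Column means are stored on the instance for use in transform.
        self.means_ = [sum(col) / n_rows for col in zip(*X)]
        return self

    def transform(self, X):
        return [[value - mean for value, mean in zip(row, self.means_)] for row in X]

    def fit_transform(self, X, y=None, **fit_params):
        # Equivalent to calling fit and then transform on the same data.
        return self.fit(X, y, **fit_params).transform(X)


X = [[1.0, 10.0], [3.0, 30.0]]
print(MeanCenterer().fit_transform(X))
```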

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Return type

mapping of string to any

Returns

Parameter names mapped to their values

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters

Return type

object

Returns

self, the estimator instance
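
To make the <component>__<parameter> naming convention concrete, the sketch below groups such keys by component. It illustrates the naming scheme only; the helper name and the example component names are hypothetical, and this is not how the estimator itself applies the parameters.

```python
def split_nested_params(params):
    """Group parameters of the form <component>__<parameter> by component.

    Keys without a double underscore are returned under the key None.
    """
    grouped = {}
    for key, value in params.items():
        # Split on the first double underscore only.
        component, sep, sub_key = key.partition("__")
        if sep:
            grouped.setdefault(component, {})[sub_key] = value
        else:
            grouped.setdefault(None, {})[key] = value
    return grouped


# Hypothetical nested object with components named "features" and "model".
print(split_nested_params({"features__cutoff": 5.0, "model__alpha": 0.1, "verbose": True}))
```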

class schrodinger.application.matsci.mlearn.features.Ligand(st, metal_atom, new_to_old, coordination_idxs)[source]

Bases: object

Manage a ligand.

__init__(st, metal_atom, new_to_old, coordination_idxs)[source]

Create an instance.

Parameters

getVec(point)[source]

Return a vector pointing from the metal atom to the given point.

Parameters

point (numpy.array) – the point in Ang.

Return type

numpy.array

Returns

the vector in Ang.

getCentroid(st, idxs)[source]

Return the centroid vector of the given coordination atom indices.

Parameters

Return type

numpy.array

Returns

the centroid vector in Ang.

getCoordinationVec(st, idxs)[source]

Return a coordination vector pointing from the metal atom to the centroid of the given coordination atom indices.

Parameters

Return type

numpy.array

Returns

the coordination vector in Ang.
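
The three vector helpers above compose naturally: the coordination vector is the vector from the metal to the centroid of the coordinating atoms. A minimal sketch of that geometry follows, using plain coordinate tuples in place of numpy arrays and structure objects; the function names and coordinates are illustrative assumptions.

```python
def get_vec(metal_xyz, point):
    """Vector from the metal atom to the given point (both in Angstrom)."""
    return tuple(p - m for p, m in zip(point, metal_xyz))


def get_centroid(coords):
    """Centroid of a list of (x, y, z) coordinates."""
    n = len(coords)
    return tuple(sum(axis) / n for axis in zip(*coords))


def get_coordination_vec(metal_xyz, coords):
    """Vector from the metal atom to the centroid of the coordinating atoms."""
    return get_vec(metal_xyz, get_centroid(coords))


# Hypothetical metal position and two coordinating atoms.
metal = (0.0, 0.0, 0.0)
donors = [(2.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
print(get_coordination_vec(metal, donors))
```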

getStoichiometry()[source]

Return the stoichiometry.

Return type

str

Returns

the stoichiometry

getDenticity()[source]

Return the denticity.

Return type

int

Returns

the denticity

getHapticity()[source]

Return the hapticity.

Return type

int

Returns

the hapticity

getHapticCharacter()[source]

Return the haptic character.

Return type

int

Returns

the haptic character

getBiteAngle()[source]

Return the bite angle in degrees.

Return type

float or None

Returns

the bite angle in degrees
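
The bite angle of a chelating ligand is conventionally the donor-metal-donor angle. The sketch below computes that angle from coordinates, assuming exactly two donor atoms and returning None otherwise. That behavior is only one possible reading of the "float or None" return type and is not confirmed by this listing; the function name and coordinates are hypothetical.

```python
import math


def bite_angle(metal_xyz, donor_coords):
    """Donor-metal-donor angle in degrees, or None unless there are two donors."""
    if len(donor_coords) != 2:
        return None
    v1 = [d - m for d, m in zip(donor_coords[0], metal_xyz)]
    v2 = [d - m for d, m in zip(donor_coords[1], metal_xyz)]
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    # Clamp to [-1, 1] to guard against rounding error before acos.
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))


print(bite_angle((0, 0, 0), [(2, 0, 0), (0, 2, 0)]))
```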

getAtomConeAngle(atom)[source]

Return the cone angle for the given atom in degrees.

Parameters

atom (schrodinger.structure._StructureAtom) – the atom

Return type

float

Returns

the cone angle for the given atom in degrees

getConeAngle()[source]

Return the cone angle in degrees.

Return type

float

Returns

the cone angle in degrees

getBondLength()[source]

Return the bond length in Ang.

Return type

float

Returns

the bond length in Ang.

getDescriptors()[source]

Return descriptors.

Return type

dict

Returns

(label, data) pairs

class schrodinger.application.matsci.mlearn.features.Complex(st, logger=None)[source]

Bases: object

Manage a complex.

__init__(st, logger=None)[source]

Create an instance.

Parameters

getMetalAtom()[source]

Return the metal atom.

Return type

schrodinger.structure._StructureAtom

Returns

the metal atom

getLigands()[source]

Return the ligands.

Return type

list

Returns

contains Ligand

getBondAngle()[source]

Return the bond angle in degrees.

Return type

float

Returns

the bond angle in degrees

getVDWSurfaceArea()[source]

Return the VDW surface area in Angstrom^2.

Return type

float

Returns

the VDW surface area in Angstrom^2

getVDWVolume(vdw_scale=1, buffer_len=2)[source]

Return the VDW volume in Angstrom^3.

Parameters
  • vdw_scale (float) – the VDW scale

  • buffer_len (float) – a shape buffer length in Angstrom

Return type

float

Returns

the VDW volume in Angstrom^3

getStructureContainingLargestLigands()[source]

Return a structure containing the largest ligand or multiple copies thereof if it is symmetric.

Return type

schrodinger.structure.Structure

Returns

the structure containing the largest ligand(s)

getBuriedVDWVolumePct(vdw_scale=1)[source]

Return the buried VDW volume percent.

Parameters

vdw_scale (float) – the VDW scale

Return type

float

Returns

the buried VDW volume percent

getVectorizedDescriptors(jaguar_out_file)[source]

Return vectorized descriptors. These are instance-specific descriptors that are vectorized, for example descriptors that depend on the molecule’s geometric orientation, atom indexing, etc.

Parameters

jaguar_out_file (str, None) – the name of a Jaguar *.out file from which descriptors will be extracted or None if there isn’t one

Return type

dict

Returns

(label, data) pairs

getDescriptors()[source]

Return descriptors.

Return type

dict

Returns

(label, data) pairs

schrodinger.application.matsci.mlearn.features.get_unique_titles(sts)[source]

Return a list of unique titles for the given structures.

Parameters

sts (list) – contains schrodinger.structure.Structure

Return type

list

Returns

the unique titles
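
How duplicate titles are resolved is not described here. One simple scheme, shown purely as an assumption, appends a running counter to any title that has already been used. The sketch operates on a list of title strings rather than on structures.

```python
def unique_titles(titles):
    """Return titles in order, adding a numeric suffix to any repeated title."""
    taken = set()
    unique = []
    for title in titles:
        candidate = title
        suffix = 1
        # Keep bumping the suffix until the candidate has not been used yet.
        while candidate in taken:
            suffix += 1
            candidate = f"{title}_{suffix}"
        taken.add(candidate)
        unique.append(candidate)
    return unique


print(unique_titles(["cplx", "cplx", "other", "cplx"]))
```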

class schrodinger.application.matsci.mlearn.features.ComplexFeatures(jaguar=False, jaguar_keywords={'basis': 'LACVP**', 'dftname': 'B3LYP', 'iuhf': '1'}, tpp=1, ligfilter=False, canvas=False, moldescriptors=False, include_vectorized=False, save_files=False, logger=None)[source]

Bases: schrodinger.application.matsci.mlearn.base.BaseFeaturizer

Class to generate features for metal complexes.

__init__(jaguar=False, jaguar_keywords={'basis': 'LACVP**', 'dftname': 'B3LYP', 'iuhf': '1'}, tpp=1, ligfilter=False, canvas=False, moldescriptors=False, include_vectorized=False, save_files=False, logger=None)[source]

Create an instance.

Parameters
  • jaguar (bool) – specify whether to calculate Jaguar features

  • jaguar_keywords (OrderedDict) – if Jaguar jobs must be run to calculate the Jaguar features then specify the Jaguar keywords here

  • tpp (int) – the number of threads for any Jaguar jobs

  • ligfilter (bool) – specify whether to calculate Ligfilter features

  • canvas (bool) – specify whether to calculate Canvas features

  • moldescriptors (bool or list) – specify whether to calculate Molecular Descriptors features. If it’s a list, it contains command line arguments for moldescriptors

  • include_vectorized (bool) – whether to include instance specific features that are vectorized, for example depending on the molecule’s geometric orientation, atom indexing, etc.

  • save_files (bool) – Whether to save subjob files or not

  • logger (logging.Logger or None) – output logger or None if there isn’t one

runJaguar()[source]

Run Jaguar on the given structures.

Return type

list

Returns

contains Jaguar *.out file names

getFeatures(structs, jaguar_out_files=None)[source]

Return the features dictionary for the given structures.

Parameters
  • structs (list(schrodinger.structure.Structure)) – list of structures to be featurized

  • jaguar_out_files (list or None) – if Jaguar features should be calculated using existing Jaguar *.out files then specify the files here using the same ordering as used for any given structures
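
Since existing *.out files must follow the same ordering as the structures, a caller may want to pair them up by position and check the lengths before passing them in. The helper below is a hypothetical pre-check on title strings and file names, not part of the module.

```python
def pair_structures_with_outputs(titles, jaguar_out_files=None):
    """Pair each structure title with its Jaguar *.out file by position."""
    if jaguar_out_files is None:
        # No existing outputs were supplied, so nothing can be paired yet.
        return {title: None for title in titles}
    if len(jaguar_out_files) != len(titles):
        raise ValueError("Expected one *.out file per structure")
    return dict(zip(titles, jaguar_out_files))


# Hypothetical titles and file names.
print(pair_structures_with_outputs(["cplx_a", "cplx_b"], ["cplx_a.out", "cplx_b.out"]))
```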

verifyJaguarOutfiles()[source]

Run Jaguar and get the *.out files if they have not been provided.

getComplexDescriptors()[source]

Create a Complex object for each structure and get their descriptors

Return type

dict

Returns

The descriptors from Complex for each structure

getJaguarDescriptors()[source]

Return Jaguar descriptors for all structures. Sets Jaguar atom descriptors on structures.

Return type

dict

Returns

The jaguar descriptors for all structures. Keys are structure titles and values are dictionaries containing descriptors

getUtilityDescriptors()[source]

Get the requested utility descriptors for all structures

Return type

dict

Returns

The descriptor utility descriptors for all structures. Keys are structure titles and values are dictionaries containing descriptors

getDescriptorUtilityJob(descriptor_utility)[source]

Get the job to run to generate the descriptors using the passed descriptor_utility for all structures

Parameters

descriptor_utility (DescriptorUtility) – The descriptor utility to run to get the descriptors

Return type

jaguarworkflows.RobustSubmissionJob

Returns

The job to run to generate the descriptors

processUtilityDescriptorOutputs(jobs_dict)[source]

Read the descriptors for all descriptor utilities that were run, and return them

Parameters

jobs_dict (dict) – Dictionary with `DescriptorUtility`s as keys and jobs as values

Return type

dict

Returns

The descriptor utility descriptors for all structures. Keys are structure titles and values are dictionaries containing descriptors

getMolecularDescriptorsJob()[source]

Get the job to run to generate molecular descriptors for all structures

Return type

jaguarworkflows.RobustSubmissionJob

Returns

The job to run to generate the descriptors

static writeFingerprintFiles(structs)[source]

Write fingerprint files for the given structures.

Parameters

structs (list(schrodinger.structure.Structure)) – list of structures to be fingerprinted

Return type

list

Returns

the fingerprint file names

log(msg, **kwargs)[source]

Add a message to the log file

Parameters

msg (str) – The message to log

Additional keyword arguments are passed to the textlogger.log_msg function

fit(data, data_y=None)

Fit and return self. Anything that evaluates properties related to the passed data should go here. For example, compute physical properties of a structure and save them as class properties, to be used in the transform method.

Parameters
  • data (numpy array of shape [n_samples, n_features]) – Training set

  • data_y (numpy array of shape [n_samples]) – Target values

Return type

BaseFeaturizer

Returns

self object with fitted data

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
  • X ({array-like, sparse matrix, dataframe} of shape (n_samples, n_features)) – Input samples

  • y (ndarray of shape (n_samples,), default=None) – Target values

  • **fit_params (dict) – Additional fit parameters

Return type

ndarray of shape (n_samples, n_features_new)

Returns

Transformed array

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Return type

mapping of string to any

Returns

Parameter names mapped to their values

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters

Return type

object

Returns

self, the estimator instance

transform(data)

Get numerical features. Must be implemented by a child class.

Parameters

data (numpy array of shape [n_samples, n_features]) – Training set

Return type

numpy array of shape [n_samples, n_features_new]

Returns

Transformed array

class schrodinger.application.matsci.mlearn.features.CrystalNNFeatures(preset='ops')[source]

Bases: object

Calculates CrystalNN structure fingerprints as implemented in pymatgen

OPS_PRESET = 'ops'
CN_PRESET = 'cn'
__init__(preset='ops')[source]

Create a structure featurizer

Parameters

preset (str) – One of OPS_PRESET or CN_PRESET class constants

featurize(struct)[source]

Get CrystalNN fingerprints for the passed structure

Parameters

struct (schrodinger.structure.Structure) – The structure to get features for

Return type

list

Returns

List of CrystalNN fingerprints for the structure