schrodinger.application.matsci.mlearn.features module

Classes and functions to deal with ML features.

Copyright Schrodinger, LLC. All rights reserved.

class schrodinger.application.matsci.mlearn.features.MomentData(flag, components, header, units)

Bases: tuple

__contains__

Return key in self.

__init__

Initialize self. See help(type(self)) for accurate signature.

__len__

Return len(self).

components

Alias for field number 1

count()

Return number of occurrences of value.

flag

Alias for field number 0

header

Alias for field number 2

index()

Return first index of value.

Raises ValueError if the value is not present.

units

Alias for field number 3
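Since MomentData derives from tuple and each field is an alias for a position, it can be read by name or by index. The sketch below uses a stand-in built with collections.namedtuple, not the real class, and the field values are invented for illustration.

```python
from collections import namedtuple

# Stand-in with the documented field order; not the real schrodinger class.
MomentData = namedtuple('MomentData', ['flag', 'components', 'header', 'units'])

# Hypothetical values chosen only for illustration.
moment = MomentData(flag='dipole', components=('x', 'y', 'z'),
                    header='Dipole moment', units='debye')

# Each field is reachable by name or by its position in the tuple.
assert moment.flag == moment[0]
assert moment.units == moment[3]
```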

schrodinger.application.matsci.mlearn.features.DescriptorUtility

alias of schrodinger.application.matsci.mlearn.features.DescriptorUtilitity

schrodinger.application.matsci.mlearn.features.get_distance_cell(struct, cutoff)

Create an infrastructure Distance Cell. Struct MUST have the Chorus box properties.

Parameters:
Return type:

schrodinger.structure.Structure, schrodinger.infra.structure.DistanceCell, schrodinger.infra.structure.PBC

Returns:

Supercell, an infrastructure Distance Cell that accounts for the PBC, and the pbc used to create it.

Raises:

ValueError if struct is missing PBCs

schrodinger.application.matsci.mlearn.features.elemental_generator(struct, element, is_equal=True)
schrodinger.application.matsci.mlearn.features.get_anion(struct)

Get the most electronegative element in the structure (anion).

Parameters:struct (schrodinger.structure.Structure) – Input structure
Return type:str, float, int
Returns:Element, its electronegativity, number of anions in the cell
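As a rough illustration of the selection described for get_anion, the sketch below picks the most electronegative element from a plain list of element symbols. It does not use a Structure object; the function name most_electronegative and the small Pauling table are assumptions made for this example, and the real function may use a different electronegativity source.

```python
from collections import Counter

# Pauling electronegativities for a few elements (illustrative subset).
PAULING_ENEG = {'Li': 0.98, 'P': 2.19, 'S': 2.58, 'Cl': 3.16, 'O': 3.44}


def most_electronegative(elements):
    """Return (element, electronegativity, count) for the most
    electronegative element in a list of element symbols."""
    counts = Counter(elements)
    anion = max(counts, key=PAULING_ENEG.__getitem__)
    return anion, PAULING_ENEG[anion], counts[anion]


# Element symbols of a hypothetical Li3PS4 formula unit.
cell = ['Li'] * 3 + ['P'] + ['S'] * 4
result = most_electronegative(cell)
```

The returned tuple mirrors the documented return type of str, float, int.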
class schrodinger.application.matsci.mlearn.features.LatticeFeatures(features, element='Li', cutoff=4.0)

Bases: schrodinger.application.matsci.mlearn.base.BaseFeaturizer

Class to generate lattice-based features.

FEATURES = {'anionFrameCoordination': 'Anion frame coordination', 'avgAnionAnionShortDistance': 'Average anion anion shortest distance', 'avgAtomicVol': 'Average atomic volume', 'avgElementAnionShortDistance': 'Average cation anion shortest distance', 'avgElementNeighborCount': 'Average cation count', 'avgNeighborCount': 'Average neighbor count', 'avgNeighborIon': 'Average neighbor ionicity', 'avgShortDistance': 'Average cation cation shortest distance', 'avgSublatticeEneg': 'Average sublattice electronegativity', 'avgSublatticeNeighborCount': 'Average sublattice neighbor count', 'avgSublatticeNeighborIon': 'Average sublattice neighbor ionicity', 'packingFraction': 'Crystal packing fraction', 'pathWidth': 'Average straight-line path width', 'pathWidthEneg': 'Average straight-line path electronegativity', 'ratioCount': 'Ratio of average cation to sublattice count', 'ratioIonicity': 'Ratio of average cation to sublattice electronegativity', 'stdNeighborCount': 'Standard deviation of neighbor count', 'stdNeighborIon': 'Standard deviation of neighbor ionicity', 'sublatticePackingFraction': 'Sublattice packing fraction', 'volPerAnion': 'Volume per anion'}
__init__(features, element='Li', cutoff=4.0)

Initialize the object.

runFeature(feature)

Get result from a feature.

Parameters:feature – One of the features listed in FEATURES.
Return type:int or float
Returns:Feature value
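The keys of FEATURES match the names of the feature methods on the class, which suggests name-based dispatch. The sketch below shows one way such a lookup could work; TinyFeaturizer is a hypothetical stand-in, the two formulas are assumptions, and the real runFeature may be implemented differently.

```python
class TinyFeaturizer:
    """Hypothetical stand-in, not the real LatticeFeatures class."""

    FEATURES = {
        'avgAtomicVol': 'Average atomic volume',
        'volPerAnion': 'Volume per anion',
    }

    def __init__(self, volume, n_atoms, n_anions):
        self.volume = volume
        self.n_atoms = n_atoms
        self.n_anions = n_anions

    def avgAtomicVol(self):
        # Assumed definition: cell volume divided by the number of atoms.
        return self.volume / self.n_atoms

    def volPerAnion(self):
        # Assumed definition: cell volume divided by the number of anions.
        return self.volume / self.n_anions

    def runFeature(self, feature):
        # Reject names that are not registered in FEATURES.
        if feature not in self.FEATURES:
            raise ValueError(f'Unknown feature: {feature}')
        # Look up the method with the same name as the key and call it.
        return getattr(self, feature)()


featurizer = TinyFeaturizer(volume=120.0, n_atoms=8, n_anions=4)
values = {name: featurizer.runFeature(name) for name in featurizer.FEATURES}
```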
transform(structs)

Get numerical features from structures. Also sets feature names in self.labels. See parent class for more documentation.

Parameters:structs (list(schrodinger.structure.Structure)) – List of structures to be featurized
Return type:numpy array of shape [n_samples, n_features]
Returns:Transformed array
avgAtomicVol()

Get average atomic volume.

Parameters:struct (schrodinger.structure.Structure) – Structure to be used for feature calculation
Return type:float
Returns:Average atomic volume (Angstrom^3)
avgNeighborCount()

Get average neighbor count.

Return type:float
Returns:Average neighbor count
stdNeighborCount()

Get standard deviation of neighbor count.

Return type:float
Returns:Standard deviation of neighbor count
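To make the neighbor-count statistics concrete, the sketch below counts neighbors within a cutoff and then takes the mean and standard deviation. It is a brute-force, non-periodic simplification: the real class works with a PBC-aware distance cell, and whether it uses the population or sample standard deviation is not stated here, so the population form is an assumption.

```python
import math
import statistics


def neighbor_counts(coords, cutoff):
    """Count, for each point, the other points within the cutoff distance.

    Brute-force and non-periodic, unlike a PBC-aware distance cell.
    """
    counts = []
    for i, center in enumerate(coords):
        count = sum(1 for j, other in enumerate(coords)
                    if i != j and math.dist(center, other) <= cutoff)
        counts.append(count)
    return counts


# Four hypothetical atoms on a line, 3 Angstrom apart.
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0), (9.0, 0.0, 0.0)]
counts = neighbor_counts(coords, cutoff=4.0)
avg_count = statistics.mean(counts)
std_count = statistics.pstdev(counts)  # population standard deviation
```

With a 4.0 Angstrom cutoff the two end atoms see one neighbor each and the two inner atoms see two.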
avgSublatticeEneg()

Get average sublattice electronegativity.

Return type:float
Returns:Average sublattice electronegativity
avgSublatticeNeighborCount()

Get average sublattice neighbor count.

Return type:float
Returns:Average sublattice neighbor count
avgNeighborIon()

Get average neighbor ionicity.

Return type:float
Returns:Average neighbor ionicity
stdNeighborIon()

Get standard deviation of neighbor ionicity.

Return type:float
Returns:Standard deviation of neighbor ionicity
avgSublatticeNeighborIon()

Get average sublattice neighbor ionicity.

Return type:float
Returns:Average sublattice neighbor ionicity
volPerAnion()

Get volume per anion.

Return type:float
Returns:Volume per anion
packingFraction(skip_element=None)

Get packing fraction of the crystal.

Parameters:skip_element (str) – Element to skip
Return type:float
Returns:Packing fraction
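A packing fraction is commonly defined as the volume of the atomic spheres divided by the cell volume. The sketch below applies that textbook definition to a list of radii; it is an assumption that the method follows the same definition, and the helper name packing_fraction is hypothetical.

```python
import math


def packing_fraction(radii, cell_volume):
    """Fraction of the cell volume occupied by spheres of the given radii."""
    sphere_volume = sum(4.0 / 3.0 * math.pi * r**3 for r in radii)
    return sphere_volume / cell_volume


# A simple cubic cell with edge 2r holds one sphere of radius r.
r = 1.0
fraction = packing_fraction([r], cell_volume=(2 * r)**3)
```

For the simple cubic case this reproduces the known value of pi/6.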
effectiveRadius(atom)

Get atom effective radius.

Parameters:atom (schrodinger.structure._StructureAtom) – Atom
Return type:float
Returns:Effective radius
sublatticePackingFraction()

Get packing fraction of the sublattice crystal.

Return type:float
Returns:Packing fraction
avgElementNeighborCount()

Get average element neighbor count.

Return type:float
Returns:Average number of bonds per element
avgAnionAnionShortDistance()

Get average anion anion shortest distance.

Return type:float
Returns:Average anion anion shortest distance
avgElementAnionShortDistance()

Get average element anion shortest distance.

Return type:float
Returns:Average element anion shortest distance
avgShortDistance()

Get average element element shortest distance.

Return type:float
Returns:Average element element shortest distance
anionFrameCoordination()

Get anion framework coordination.

Return type:float
Returns:Anion framework coordination
pathWidth(eval_eneg=False)

Evaluate average straight line path width. See the reference in the constructor for more info.

Parameters:eval_eneg (bool) – If True, return average over electronegativity, instead of distance
Return type:float
Returns:Average path width or electronegativity
pathWidthEneg()

Evaluate average straight line path electronegativity.

Return type:float
Returns:Average electronegativity along the path
ratioIonicity()

Get ratio ionicity.

Return type:float
Returns:Ratio ionicity
ratioCount()

Get ratio neighbor count.

Return type:float
Returns:Ratio neighbor count
fit(data, data_y=None)

Fit and return self. Anything that evaluates properties related to the passed data should go here. For example, compute physical properties of a structure and save them as class properties, to be used in the transform method.

Parameters:
  • data (numpy array of shape [n_samples, n_features]) – Training set
  • data_y (numpy array of shape [n_samples]) – Target values
Return type:

BaseFeaturizer

Returns:

self object with fitted data

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X (numpy array of shape [n_samples, n_features]) – Training set.
  • y (numpy array of shape [n_samples]) – Target values.
  • **fit_params (dict) – Additional fit parameters.
Return type:

numpy array of shape [n_samples, n_features_new]

Returns:

X_new, the transformed array.
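The relationship between fit, transform, and fit_transform can be sketched with a small stand-in transformer. MeanCenterer below is hypothetical and uses plain lists rather than numpy arrays; it only illustrates that fit_transform gives the same result as fit followed by transform.

```python
class MeanCenterer:
    """Hypothetical transformer, not part of the schrodinger package."""

    def fit(self, X, y=None):
        # Learn the per-column means from the training rows.
        n_rows = len(X)
        self.means_ = [sum(col) / n_rows for col in zip(*X)]
        return self

    def transform(self, X):
        # Subtract the learned means from every row.
        return [[v - m for v, m in zip(row, self.means_)] for row in X]

    def fit_transform(self, X, y=None, **fit_params):
        # Same result as calling fit followed by transform.
        return self.fit(X, y, **fit_params).transform(X)


X = [[1.0, 10.0], [3.0, 30.0]]
X_new = MeanCenterer().fit_transform(X)
```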
get_params(deep=True)

Get parameters for this estimator.

Parameters:deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Return type:mapping of string to any
Returns:Parameter names mapped to their values.
set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:**params (dict) – Estimator parameters.
Return type:object
Returns:Estimator instance (self).
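The <component>__<parameter> convention can be illustrated with a minimal stand-in. The Params class below is hypothetical and is not the estimator base class used by the featurizers; it only shows how a key containing a double underscore is forwarded to a nested object.

```python
class Params:
    """Hypothetical minimal object with get_params/set_params semantics."""

    def __init__(self, **params):
        self._params = dict(params)

    def get_params(self, deep=True):
        out = dict(self._params)
        if deep:
            # Expose nested parameters as <component>__<parameter>.
            for name, value in self._params.items():
                if isinstance(value, Params):
                    for sub, sub_value in value.get_params().items():
                        out[f'{name}__{sub}'] = sub_value
        return out

    def set_params(self, **params):
        for key, value in params.items():
            name, sep, sub = key.partition('__')
            if sep:
                # Forward the remainder of the key to the nested component.
                self._params[name].set_params(**{sub: value})
            else:
                self._params[name] = value
        return self


featurizer = Params(element='Li', cutoff=4.0)
pipeline = Params(features=featurizer, verbose=False)
pipeline.set_params(features__cutoff=5.0, verbose=True)
```

Returning self from set_params is what allows calls to be chained.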
class schrodinger.application.matsci.mlearn.features.Ligand(st, metal_atom, new_to_old, coordination_idxs)

Bases: object

Manage a ligand.

__init__(st, metal_atom, new_to_old, coordination_idxs)

Create an instance.

Parameters:
getVec(point)

Return a vector pointing from the metal atom to the given point.

Parameters:point (numpy.array) – the point in Ang.
Return type:numpy.array
Returns:the vector in Ang.
getCentroid(st, idxs)

Return the centroid vector of the given coordination atom indices.

Parameters:
Return type:

numpy.array

Returns:

the centroid vector in Ang.

getCoordinationVec(st, idxs)

Return a coordination vector pointing from the metal atom to the centroid of the given coordination atom indices.

Parameters:
Return type:

numpy.array

Returns:

the coordination vector in Ang.
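The geometry behind getVec, getCentroid, and getCoordinationVec can be sketched with plain tuples. The example assumes an unweighted centroid and uses invented coordinates; the helper names centroid and vector_from are hypothetical, and the real methods operate on a structure and atom indices instead.

```python
def centroid(points):
    """Return the arithmetic mean of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def vector_from(origin, point):
    """Return the vector pointing from origin to point."""
    return tuple(b - a for a, b in zip(origin, point))


# Hypothetical coordinates in Angstrom: a metal atom and two donor atoms.
metal = (0.0, 0.0, 0.0)
donors = [(2.0, 1.0, 0.0), (2.0, -1.0, 0.0)]

# Vector from the metal atom to the centroid of the coordinating atoms.
coordination_vec = vector_from(metal, centroid(donors))
```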

getStoichiometry()

Return the stoichiometry.

Return type:str
Returns:the stoichiometry
getDenticity()

Return the denticity.

Return type:int
Returns:the denticity
getHapticity()

Return the hapticity.

Return type:int
Returns:the hapticity
getHapticCharacter()

Return the haptic character.

Return type:int
Returns:the haptic character
getBiteAngle()

Return the bite angle in degrees.

Return type:float or None
Returns:the bite angle in degrees
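A bite angle is conventionally the donor-metal-donor angle of a chelating ligand. The sketch below computes such an angle from three points; the coordinates are invented, the helper name angle_deg is hypothetical, and how the real method selects the two donor atoms is not covered.

```python
import math


def angle_deg(center, a, b):
    """Return the a-center-b angle in degrees."""
    va = [x - c for x, c in zip(a, center)]
    vb = [x - c for x, c in zip(b, center)]
    dot = sum(x * y for x, y in zip(va, vb))
    cos_theta = dot / (math.hypot(*va) * math.hypot(*vb))
    # Clamp to guard against round-off slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


# Hypothetical metal at the origin with two donor atoms on the x and y axes.
metal = (0.0, 0.0, 0.0)
bite = angle_deg(metal, (2.0, 0.0, 0.0), (0.0, 2.0, 0.0))
```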
getAtomConeAngle(atom)

Return the cone angle for the given atom in degrees.

Parameters:atom (schrodinger.structure._StructureAtom) – the atom
Return type:float
Returns:the cone angle for the given atom in degrees
getConeAngle()

Return the cone angle in degrees.

Return type:float
Returns:the cone angle in degrees
getBondLength()

Return the bond length in Ang.

Return type:float
Returns:the bond length in Ang.
getDescriptors()

Return descriptors.

Return type:dict
Returns:(label, data) pairs
class schrodinger.application.matsci.mlearn.features.Complex(st, jaguar_out_file=None, ligfilter=False, canvas=False, moldescriptors=False, save_files=False, logger=None)

Bases: object

Manage a complex.

__init__(st, jaguar_out_file=None, ligfilter=False, canvas=False, moldescriptors=False, save_files=False, logger=None)

Create an instance.

Parameters:
  • st (schrodinger.structure.Structure) – the structure
  • jaguar_out_file (str, None) – the name of a Jaguar *.out file from which descriptors will be extracted or None if there isn’t one
  • ligfilter (bool) – specify whether to calculate Ligfilter features
  • canvas (bool) – specify whether to calculate Canvas features
  • moldescriptors (bool or list) – If a bool, specify whether to calculate Molecular Descriptors features. If a list, calculate Molecular Descriptors and use these flags on the command line for the moldescriptors utility to specify which descriptors to calculate.
  • save_files (bool) – Whether to save subjob files or not
  • logger (logging.Logger or None) – output logger or None if there isn’t one
getMetalAtom()

Return the metal atom.

Return type:schrodinger.structure._StructureAtom
Returns:the metal atom
getLigands()

Return the ligands.

Return type:list
Returns:contains Ligand
getBondAngle()

Return the bond angle in degrees.

Return type:float
Returns:the bond angle in degrees
getVDWSurfaceArea()

Return the VDW surface area in Angstrom^2.

Return type:float
Returns:the VDW surface area in Angstrom^2
getVDWVolume(vdw_scale=1, buffer_len=2)

Return the VDW volume in Angstrom^3.

Parameters:
  • vdw_scale (float) – the VDW scale
  • buffer_len (float) – the shape buffer length in Angstrom
Return type:

float

Returns:

the VDW volume in Angstrom^3

getStructureContainingLargestLigands()

Return a structure containing the largest ligand or multiple copies thereof if it is symmetric.

Return type:schrodinger.structure.Structure
Returns:the structure containing the largest ligand(s)
getBuriedVDWVolumePct(vdw_scale=1)

Return the buried VDW volume percent.

Parameters:vdw_scale (float) – the VDW scale
Return type:float
Returns:the buried VDW volume percent
getJaguarDescriptors()

Return Jaguar descriptors.

Return type:dict
Returns:(label, data) pairs
getDescriptorUtilityDescriptors(descriptor_utility)

Return descriptors for the given descriptor utility.

Parameters:descriptor_utility (DescriptorUtility) – the descriptor utility to use for obtaining descriptors
Return type:dict
Returns:(label, data) pairs
getLigfilterDescriptors()

Return Ligfilter descriptors.

Return type:dict
Returns:(label, data) pairs
getCanvasDescriptors()

Return Canvas descriptors.

Return type:dict
Returns:(label, data) pairs
getMolecularDescriptorsDescriptors()

Return Molecular Descriptors descriptors.

Return type:dict
Returns:(label, data) pairs
getVectorizedDescriptors()

Return vectorized descriptors which are instance specific descriptors that are vectorized, for example depending on the molecule’s geometric orientation, atom indexing, etc.

Return type:dict
Returns:(label, data) pairs
getDescriptors()

Return descriptors.

Return type:dict
Returns:(label, data) pairs
schrodinger.application.matsci.mlearn.features.get_unique_titles(sts)

Return a list of unique titles for the given structures.

Parameters:sts (list) – contains schrodinger.structure.Structure
Return type:list
Returns:the unique titles
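How duplicates are resolved is not documented here. One plausible approach, shown purely as an assumption, is to append a numeric suffix to repeated titles. The sketch works on plain strings rather than structures, and the helper name unique_titles is hypothetical.

```python
def unique_titles(titles):
    """Return titles made unique by suffixing repeats with _2, _3, ..."""
    used = set()
    unique = []
    for title in titles:
        candidate, n = title, 1
        # Bump the suffix until the candidate has not been used yet.
        while candidate in used:
            n += 1
            candidate = f'{title}_{n}'
        used.add(candidate)
        unique.append(candidate)
    return unique


titles = unique_titles(['a', 'b', 'a', 'a'])
```

Unique titles matter when results are stored in dictionaries keyed by structure title.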
class schrodinger.application.matsci.mlearn.features.ComplexFeatures(jaguar=False, jaguar_keywords={'basis': 'LACVP**', 'dftname': 'B3LYP'}, jaguar_out_files=None, tpp=1, ligfilter=False, canvas=False, moldescriptors=False, include_vectorized=False, save_files=False, logger=None)

Bases: schrodinger.application.matsci.mlearn.base.BaseFeaturizer

Class to generate features for metal complexes.

__init__(jaguar=False, jaguar_keywords={'basis': 'LACVP**', 'dftname': 'B3LYP'}, jaguar_out_files=None, tpp=1, ligfilter=False, canvas=False, moldescriptors=False, include_vectorized=False, save_files=False, logger=None)

Create an instance.

Parameters:
  • jaguar (bool) – specify whether to calculate Jaguar features
  • jaguar_keywords (OrderedDict) – if Jaguar jobs must be run to calculate the Jaguar features then specify the Jaguar keywords here
  • jaguar_out_files (list) – if Jaguar features should be calculated using existing Jaguar *.out files then specify the files here using the same ordering as used for any given structures
  • tpp (int) – the number of threads for any Jaguar jobs
  • ligfilter (bool) – specify whether to calculate Ligfilter features
  • canvas (bool) – specify whether to calculate Canvas features
  • moldescriptors (bool) – specify whether to calculate Molecular Descriptors features
  • include_vectorized (bool) – whether to include instance specific features that are vectorized, for example depending on the molecule’s geometric orientation, atom indexing, etc.
  • save_files (bool) – Whether to save subjob files or not
  • logger (logging.Logger or None) – output logger or None if there isn’t one
runJaguar(structs, logger=None)

Run Jaguar on the given structures.

Parameters:
  • structs (list(schrodinger.structure.Structure)) – list of structures to be featurized
  • logger (logging.Logger or None) – output logger or None if there isn’t one
Return type:

list

Returns:

contains Jaguar *.out file names

transform(structs)

Get numerical features from structures. Also sets feature names in self.labels. See parent class for more documentation.

Parameters:structs (list(schrodinger.structure.Structure)) – list of structures to be featurized
Return type:numpy array of shape [n_structs, n_features]
Returns:transformed array
vectorized_transform(structs)

Get vectorized features from structures. Also sets feature names in self.vectorized_labels. See parent class for more documentation.

Parameters:structs (list(schrodinger.structure.Structure)) – list of structures to be featurized
Return type:numpy array of shape [n_structs, n_features]
Returns:transformed array
static getFeatures(structs, jaguar=False, jaguar_keywords={'basis': 'LACVP**', 'dftname': 'B3LYP'}, jaguar_out_files=None, tpp=1, ligfilter=False, canvas=False, moldescriptors=False, include_vectorized=False, save_files=False, logger=None)

Return features and vectorized features dictionaries for the given structures.

Parameters:
  • structs (list(schrodinger.structure.Structure)) – list of structures to be featurized
  • jaguar (bool) – specify whether to calculate Jaguar features
  • jaguar_keywords (OrderedDict) – if Jaguar jobs must be run to calculate the Jaguar features then specify the Jaguar keywords here
  • jaguar_out_files (list) – if Jaguar features should be calculated using existing Jaguar *.out files then specify the files here using the same ordering as used for the given structures
  • tpp (int) – the number of threads for any Jaguar jobs
  • ligfilter (bool) – specify whether to calculate Ligfilter features
  • canvas (bool) – specify whether to calculate Canvas features
  • moldescriptors (bool) – specify whether to calculate Molecular Descriptors features
  • include_vectorized (bool) – whether to include instance specific features that are vectorized, for example depending on the molecule’s geometric orientation, atom indexing, etc.
  • save_files (bool) – whether to save files created by subjobs
  • logger (logging.Logger or None) – output logger or None if there isn’t one
Return type:

dict, dict

Returns:

features and vectorized features dictionaries where keys are structure titles and values are dicts of feature labels and values
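The returned dictionaries map structure titles to dicts of feature labels and values. A sketch of turning one such nested dict into a row-per-structure table follows; the titles, labels, and numbers are invented, and to_matrix is a hypothetical helper, not part of the module.

```python
def to_matrix(features):
    """Convert {title: {label: value}} into (titles, labels, rows)."""
    titles = list(features)
    # Use the sorted union of labels so that every row has the same columns.
    labels = sorted({label for row in features.values() for label in row})
    rows = [[features[t].get(label, float('nan')) for label in labels]
            for t in titles]
    return titles, labels, rows


# Hypothetical titles, labels, and values.
features = {
    'complex_1': {'bond_length': 2.05, 'cone_angle': 120.0},
    'complex_2': {'bond_length': 1.98, 'cone_angle': 135.5},
}
titles, labels, rows = to_matrix(features)
```

Labels missing for a given structure are filled with NaN so that all rows have equal length.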

static writeFingerprintFiles(structs)

Write fingerprint files for the given structures.

Parameters:structs (list(schrodinger.structure.Structure)) – list of structures to be fingerprinted
Return type:list
Returns:the fingerprint file names
fit(data, data_y=None)

Fit and return self. Anything that evaluates properties related to the passed data should go here. For example, compute physical properties of a structure and save them as class properties, to be used in the transform method.

Parameters:
  • data (numpy array of shape [n_samples, n_features]) – Training set
  • data_y (numpy array of shape [n_samples]) – Target values
Return type:

BaseFeaturizer

Returns:

self object with fitted data

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X (numpy array of shape [n_samples, n_features]) – Training set.
  • y (numpy array of shape [n_samples]) – Target values.
  • **fit_params (dict) – Additional fit parameters.
Return type:

numpy array of shape [n_samples, n_features_new]

Returns:

X_new, the transformed array.
get_params(deep=True)

Get parameters for this estimator.

Parameters:deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Return type:mapping of string to any
Returns:Parameter names mapped to their values.
set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:**params (dict) – Estimator parameters.
Return type:object
Returns:Estimator instance (self).
class schrodinger.application.matsci.mlearn.features.CrystalNNFeatures(preset='ops')

Bases: object

Calculates CrystalNN structure fingerprints as implemented in pymatgen

OPS_PRESET = 'ops'
CN_PRESET = 'cn'
__init__(preset='ops')

Create a structure featurizer

Parameters:preset (str) – One of OPS_PRESET or CN_PRESET class constants
featurize(struct)

Get CrystalNN fingerprints for the passed structure

Parameters:struct (schrodinger.structure.Structure) – The structure to get features for

Return type:list
Returns:List of CrystalNN fingerprints for the structure