schrodinger.application.matsci.mlearn.features module
Classes and functions to deal with ML features.
Copyright Schrodinger, LLC. All rights reserved.
-
class
schrodinger.application.matsci.mlearn.features.
MomentData
(flag, components, header, units)¶ Bases:
tuple
-
__contains__
(key, /)¶ Return key in self.
-
__len__
()¶ Return len(self).
-
components
¶ Alias for field number 1
-
count
(value, /)¶ Return number of occurrences of value.
-
flag
¶ Alias for field number 0
-
header
¶ Alias for field number 2
-
index
(value, start=0, stop=9223372036854775807, /)¶ Return first index of value.
Raises ValueError if the value is not present.
-
units
¶ Alias for field number 3
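Because MomentData derives from tuple and exposes flag, components, header, and units as aliases for fields 0 through 3, it behaves like a collections.namedtuple. The sketch below builds a stand-in with the same field order to show how the fields can be read. It assumes the real class is an ordinary namedtuple, and the field values are invented for illustration.

```python
from collections import namedtuple

# Stand-in with the documented field order (assumption: the real class is
# an ordinary namedtuple; all values below are made up for illustration).
MomentData = namedtuple("MomentData", ["flag", "components", "header", "units"])

moment = MomentData(
    flag="dipole",               # hypothetical flag value
    components=("x", "y", "z"),  # hypothetical component labels
    header="Dipole moment",      # hypothetical header text
    units="debye",               # hypothetical units string
)

# Named access and positional access refer to the same fields.
same_field = moment.components is moment[1]

# Tuple unpacking follows the field order.
flag, components, header, units = moment
```

Named access keeps calling code readable, while positional access and unpacking remain available because the object is still a tuple.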
-
-
schrodinger.application.matsci.mlearn.features.
DescriptorUtility
¶ alias of
schrodinger.application.matsci.mlearn.features.DescriptorUtilitity
-
schrodinger.application.matsci.mlearn.features.
get_distance_cell
(struct, cutoff)[source]¶ Create an infrastructure Distance Cell. Struct MUST have the Chorus box properties.
- Parameters
struct (
schrodinger.structure.Structure
) – Input structure
cutoff (float) – The cutoff for finding nearest neighbor atoms
- Return type
schrodinger.structure.Structure
, schrodinger.infra.structure.DistanceCell
, schrodinger.infra.structure.PBC
- Returns
Supercell, an infrastructure Distance Cell that accounts for the PBC, and the PBC used to create it.
- Raises
ValueError if struct is missing PBCs
-
schrodinger.application.matsci.mlearn.features.
elemental_generator
(struct, element, is_equal=True)[source]¶
-
schrodinger.application.matsci.mlearn.features.
get_anion
(struct)[source]¶ Get the most electronegative element in the structure (anion).
- Parameters
struct (
schrodinger.structure.Structure
) – Input structure
- Return type
str, float, int
- Returns
Element, its electronegativity, and the number of anions in the cell
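The selection described above can be sketched without the Schrodinger API. The helper below is hypothetical: it works on a plain list of element symbols rather than a Structure, and it uses a small table of Pauling electronegativities, which may differ from the scale the module uses.

```python
from collections import Counter

# Pauling electronegativities for a few elements (illustrative subset).
PAULING_ENEG = {"Li": 0.98, "P": 2.19, "S": 2.58, "Cl": 3.16, "O": 3.44, "F": 3.98}


def most_electronegative(elements):
    """Return (element, electronegativity, count) for the most
    electronegative element in a list of element symbols.

    Hypothetical helper: the real get_anion takes a Structure, not a list.
    """
    counts = Counter(elements)
    anion = max(counts, key=PAULING_ENEG.__getitem__)
    return anion, PAULING_ENEG[anion], counts[anion]


# One Li3PO4 formula unit: three Li, one P, four O.
result = most_electronegative(["Li"] * 3 + ["P"] + ["O"] * 4)
```

The returned tuple mirrors the documented return order: element symbol, electronegativity, and anion count.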
-
class
schrodinger.application.matsci.mlearn.features.
LatticeFeatures
(features, element='Li', cutoff=4.0)[source]¶ Bases:
schrodinger.application.matsci.mlearn.base.BaseFeaturizer
Class to generate lattice-based features.
-
FEATURES
= {'anionFrameCoordination': 'Anion frame coordination', 'avgAnionAnionShortDistance': 'Average anion anion shortest distance', 'avgAtomicVol': 'Average atomic volume', 'avgElementAnionShortDistance': 'Average cation anion shortest distance', 'avgElementNeighborCount': 'Average cation count', 'avgNeighborCount': 'Average neighbor count', 'avgNeighborIon': 'Average neighbor ionicity', 'avgShortDistance': 'Average cation cation shortest distance', 'avgSublatticeEneg': 'Average sublattice electronegativity', 'avgSublatticeNeighborCount': 'Average sublattice neighbor count', 'avgSublatticeNeighborIon': 'Average sublattice neighbor ionicity', 'packingFraction': 'Crystal packing fraction', 'pathWidth': 'Average straight-line path width', 'pathWidthEneg': 'Average straight-line path electronegativity', 'ratioCount': 'Ratio of average cation to sublattice count', 'ratioIonicity': 'Ratio of average cation to sublattice electronegativity', 'stdNeighborCount': 'Standard deviation of neighbor count', 'stdNeighborIon': 'Standard deviation of neighbor ionicity', 'sublatticePackingFraction': 'Sublattice packing fraction', 'volPerAnion': 'Volume per anion'}¶
-
runFeature
(feature)[source]¶ Get result from a feature.
- Parameters
feature (str) – One of the features listed in FEATURES.
- Return type
int or float
- Returns
Feature value
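Every key in FEATURES matches the name of a method on the class, which suggests that a feature can be looked up by name. The sketch below shows that dispatch pattern on a hypothetical stand-in class. Whether runFeature is implemented this way is an assumption, and TinyFeaturizer and its constructor arguments are invented for illustration.

```python
class TinyFeaturizer:
    """Hypothetical stand-in, not the real LatticeFeatures class."""

    # Maps method names to human-readable labels, mirroring FEATURES.
    FEATURES = {
        "avgNeighborCount": "Average neighbor count",
        "volPerAnion": "Volume per anion",
    }

    def __init__(self, neighbor_counts, volume, n_anions):
        self.neighbor_counts = neighbor_counts
        self.volume = volume
        self.n_anions = n_anions

    def avgNeighborCount(self):
        return sum(self.neighbor_counts) / len(self.neighbor_counts)

    def volPerAnion(self):
        return self.volume / self.n_anions

    def runFeature(self, feature):
        # Reject names that are not declared features, then call the
        # method that carries the same name.
        if feature not in self.FEATURES:
            raise ValueError(f"Unknown feature: {feature}")
        return getattr(self, feature)()


featurizer = TinyFeaturizer(neighbor_counts=[4, 6, 6, 8], volume=120.0, n_anions=4)
values = {name: featurizer.runFeature(name) for name in TinyFeaturizer.FEATURES}
```

Checking the name against FEATURES before calling getattr keeps arbitrary attributes from being invoked by mistake.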
-
transform
(structs)[source]¶ Get numerical features from structures. Also sets feature names in self.labels. See parent class for more documentation.
- Parameters
structs (list(
schrodinger.structure.Structure
)) – List of structures to be featurized
- Return type
numpy array of shape [n_samples, n_features]
- Returns
Transformed array
-
avgAtomicVol
()[source]¶ Get average atomic volume.
- Parameters
struct (
schrodinger.structure.Structure
) – Structure to be used for feature calculation
- Return type
float
- Returns
Average atomic volume (A^3)
-
avgNeighborCount
()[source]¶ Get average neighbor count.
- Return type
float
- Returns
Average neighbor count
-
stdNeighborCount
()[source]¶ Get standard deviation of neighbor count.
- Return type
float
- Returns
Standard deviation of neighbor count
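The two neighbor-count statistics reduce to a mean and a standard deviation over per-atom counts. The sketch below uses invented counts and the population form of the standard deviation. Which form the module uses is not documented here, so that choice is an assumption.

```python
import statistics

# Hypothetical per-atom neighbor counts found within the cutoff.
neighbor_counts = [4, 6, 6, 8]

avg_count = statistics.mean(neighbor_counts)

# Population standard deviation (divides by N); the module may use the
# sample form (divides by N - 1) instead.
std_count = statistics.pstdev(neighbor_counts)
```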
-
avgSublatticeEneg
()[source]¶ Get average sublattice electronegativity.
- Return type
float
- Returns
Average sublattice electronegativity
-
avgSublatticeNeighborCount
()[source]¶ Get average sublattice neighbor count.
- Return type
float
- Returns
Average sublattice neighbor count
-
avgNeighborIon
()[source]¶ Get average neighbor ionicity.
- Return type
float
- Returns
Average neighbor ionicity
-
stdNeighborIon
()[source]¶ Get standard deviation of neighbor ionicity.
- Return type
float
- Returns
Standard deviation of neighbor ionicity
-
avgSublatticeNeighborIon
()[source]¶ Get average sublattice neighbor ionicity.
- Return type
float
- Returns
Average sublattice neighbor ionicity
-
packingFraction
(skip_element=None)[source]¶ Get packing fraction of the crystal.
- Parameters
skip_element (str) – Element to skip
- Return type
float
- Returns
Packing fraction
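A packing fraction is commonly defined as the volume occupied by atoms, modeled as hard spheres, divided by the cell volume. The sketch below implements that textbook definition with a hypothetical helper. How the module chooses radii (see effectiveRadius) and applies skip_element is not documented here, so neither is modeled.

```python
import math


def packing_fraction(radii, cell_volume):
    """Fraction of the cell volume occupied by hard-sphere atoms.

    Hypothetical helper: radii are in Angstrom, cell_volume in Angstrom^3.
    """
    occupied = sum(4.0 / 3.0 * math.pi * r ** 3 for r in radii)
    return occupied / cell_volume


# Simple cubic cell with a 1 Angstrom edge and one sphere of radius 0.5
# that touches its periodic images.
simple_cubic = packing_fraction([0.5], cell_volume=1.0)
```

For this simple cubic case the result equals pi/6, the known packing fraction of that lattice, which makes it a convenient sanity check.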
-
effectiveRadius
(atom)[source]¶ Get atom effective radius.
- Parameters
atom (schrodinger.structure._StructureAtom) – Atom
- Return type
float
- Returns
Effective radius
-
sublatticePackingFraction
()[source]¶ Get packing fraction of the sublattice crystal.
- Return type
float
- Returns
Packing fraction
-
avgElementNeighborCount
()[source]¶ Get average element neighbor count.
- Return type
float
- Returns
Average number of bonds per element
-
avgAnionAnionShortDistance
()[source]¶ Get average anion anion shortest distance.
- Return type
float
- Returns
Average anion anion shortest distance
-
avgElementAnionShortDistance
()[source]¶ Get average element anion shortest distance.
- Return type
float
- Returns
Average element anion shortest distance
-
avgShortDistance
()[source]¶ Get average element element shortest distance.
- Return type
float
- Returns
Average element element shortest distance
-
anionFrameCoordination
()[source]¶ Get anion framework coordination.
- Return type
float
- Returns
Anion framework coordination
-
pathWidth
(eval_eneg=False)[source]¶ Evaluate average straight line path width. See the reference in the constructor for more info.
- Parameters
eval_eneg (bool) – If True, return average over electronegativity, instead of distance
- Return type
float
- Returns
Average path or electronegativity
-
pathWidthEneg
()[source]¶ Evaluate average straight line path electronegativity.
- Return type
float
- Returns
Average electronegativity along the path
-
fit
(data, data_y=None)¶ Fit and return self. Anything that evaluates properties related to the passed data should go here. For example, compute physical properties of a structure and save them as class properties, to be used in the transform method.
- Parameters
data (numpy array of shape [n_samples, n_features]) – Training set
data_y (numpy array of shape [n_samples]) – Target values
- Return type
- Returns
self object with fitted data
-
fit_transform
(X, y=None, **fit_params)¶ Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters
X ({array-like, sparse matrix, dataframe} of shape (n_samples, n_features)) – Input samples.
y (ndarray of shape (n_samples,), default=None) – Target values.
**fit_params (dict) – Additional fit parameters.
- Return type
ndarray of shape (n_samples, n_features_new)
- Returns
X_new, the transformed array.
-
get_params
(deep=True)¶ Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Return type
mapping of string to any
- Returns
params, the parameter names mapped to their values.
-
set_params
(**params)¶ Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form
<component>__<parameter>
so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Return type
object
- Returns
self, the estimator instance.
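The <component>__<parameter> naming convention can be illustrated with plain string handling. The helper below is hypothetical and only shows how such a name separates into its two parts. It does not reproduce the estimator's own set_params logic, and the step name "featurizer" is invented.

```python
def split_nested_param(name):
    """Split '<component>__<parameter>' into (component, parameter).

    Returns (None, name) for a plain, non-nested parameter name.
    Hypothetical helper for illustration only.
    """
    component, separator, parameter = name.partition("__")
    if not separator:
        return None, name
    return component, parameter


nested = split_nested_param("featurizer__cutoff")  # hypothetical step name
plain = split_nested_param("cutoff")
```

Only the first double underscore is treated as the separator, so deeper nesting stays in the parameter part and can be split again by the component that receives it.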
-
-
class
schrodinger.application.matsci.mlearn.features.
Ligand
(st, metal_atom, new_to_old, coordination_idxs)[source]¶ Bases:
object
Manage a ligand.
-
__init__
(st, metal_atom, new_to_old, coordination_idxs)[source]¶ Create an instance.
- Parameters
st (
schrodinger.structure.Structure
) – the structure
metal_atom (
schrodinger.structure._StructureAtom
) – the metal atom
new_to_old (dict) – the map of new indices (extracted ligand) to old indices (original structure)
coordination_idxs (list) – contains groups of indices (new indices) of coordinating atoms
-
getVec
(point)[source]¶ Return a vector pointing from the metal atom to the given point.
- Parameters
point (
numpy.array
) – the point in Ang.
- Return type
numpy.array
- Returns
the vector in Ang.
-
getCentroid
(st, idxs)[source]¶ Return the centroid vector of the given coordination atom indices.
- Parameters
st (
schrodinger.structure.Structure
) – the structure
idxs (list) – the coordination indices
- Return type
numpy.array
- Returns
the centroid vector in Ang.
-
getCoordinationVec
(st, idxs)[source]¶ Return a coordination vector pointing from the metal atom to the centroid of the given coordination atom indices.
- Parameters
st (
schrodinger.structure.Structure
) – the structure
idxs (list) – the coordination indices
- Return type
numpy.array
- Returns
the coordination vector in Ang.
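The geometry behind getVec, getCentroid, and getCoordinationVec can be sketched with plain tuples. The documented methods work with numpy.array and Structure objects. The hypothetical helpers below use only the standard library, and the coordinates are invented.

```python
def centroid(points):
    """Arithmetic mean of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def vector_from(origin, point):
    """Vector pointing from origin to point."""
    return tuple(p - o for o, p in zip(origin, point))


metal_xyz = (0.0, 0.0, 0.0)  # hypothetical metal position in Angstrom
coordinating_xyz = [         # hypothetical pair of coordinating atoms
    (1.0, 1.0, 0.0),
    (1.0, -1.0, 0.0),
]

# Coordination vector: from the metal to the centroid of the group.
coordination_vec = vector_from(metal_xyz, centroid(coordinating_xyz))
```

Using the centroid lets a group of several coordinating atoms be represented by a single direction from the metal.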
-
getHapticCharacter
()[source]¶ Return the haptic character.
- Return type
int
- Returns
the haptic character
-
getBiteAngle
()[source]¶ Return the bite angle in degrees.
- Return type
float or None
- Returns
the bite angle in degrees
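A bite angle is conventionally the angle at the metal between the two coordination directions of a chelating ligand. The sketch below computes the angle between two vectors with a hypothetical helper and invented vectors. The conditions under which the real method returns None are not documented here and are not modeled.

```python
import math


def angle_between(vec_a, vec_b):
    """Angle in degrees between two 3D vectors."""
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cos_theta))


# Hypothetical coordination vectors (metal to donor) of a bidentate ligand.
bite_angle = angle_between((2.0, 0.0, 0.0), (0.0, 2.0, 0.0))
```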
-
getAtomConeAngle
(atom)[source]¶ Return the cone angle for the given atom in degrees.
- Parameters
atom (
schrodinger.structure._StructureAtom
) – the atom
- Return type
float
- Returns
the cone angle for the given atom in degrees
-
getConeAngle
()[source]¶ Return the cone angle in degrees.
- Return type
float
- Returns
the cone angle in degrees
-
-
class
schrodinger.application.matsci.mlearn.features.
Complex
(st, logger=None)[source]¶ Bases:
object
Manage a complex.
-
__init__
(st, logger=None)[source]¶ Create an instance.
- Parameters
st (
schrodinger.structure.Structure
) – the structure
logger (logging.Logger or None) – output logger or None if there isn’t one
-
getBondAngle
()[source]¶ Return the bond angle in degrees.
- Return type
float
- Returns
the bond angle in degrees
-
getVDWSurfaceArea
()[source]¶ Return the VDW surface area in Angstrom^2.
- Return type
float
- Returns
the VDW surface area in Angstrom^2
-
getVDWVolume
(vdw_scale=1, buffer_len=2)[source]¶ Return the VDW volume in Angstrom^3.
- Parameters
vdw_scale (float) – the VDW scale
buffer_len (float) – the shape buffer length in Angstrom
- Return type
float
- Returns
the VDW volume in Angstrom^3
-
getStructureContainingLargestLigands
()[source]¶ Return a structure containing the largest ligand or multiple copies thereof if it is symmetric.
- Return type
- Returns
the structure containing the largest ligand(s)
-
getBuriedVDWVolumePct
(vdw_scale=1)[source]¶ Return the buried VDW volume percent.
- Parameters
vdw_scale (float) – the VDW scale
- Return type
float
- Returns
the buried VDW volume percent
-
-
schrodinger.application.matsci.mlearn.features.
get_unique_titles
(sts)[source]¶ Return a list of unique titles for the given structures.
- Parameters
sts (list) – contains
schrodinger.structure.Structure
- Return type
list
- Returns
the unique titles
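Descriptor dictionaries in this module are keyed by structure title, so repeated titles would collide. One way to make titles unique is to append a numeric suffix to repeats, as sketched below. The suffix scheme and the helper name are assumptions, and the helper takes plain strings rather than Structure objects.

```python
def unique_titles(titles):
    """Return titles in order, with repeats made unique by a suffix.

    Hypothetical helper: the '_<n>' suffix scheme is an assumption.
    """
    used = set()
    result = []
    for title in titles:
        candidate, suffix = title, 1
        # Bump the suffix until the candidate has not been used yet.
        while candidate in used:
            suffix += 1
            candidate = f"{title}_{suffix}"
        used.add(candidate)
        result.append(candidate)
    return result


titles = unique_titles(["cplx", "cplx", "other", "cplx"])
```

Checking each candidate against the set of names already issued guarantees uniqueness even when an input title happens to look like a generated one.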
-
class
schrodinger.application.matsci.mlearn.features.
ComplexFeatures
(jaguar=False, jaguar_keywords={'basis': 'LACVP**', 'dftname': 'B3LYP', 'iuhf': '1'}, tpp=1, ligfilter=False, canvas=False, moldescriptors=False, include_vectorized=False, save_files=False, logger=None)[source]¶ Bases:
schrodinger.application.matsci.mlearn.base.BaseFeaturizer
Class to generate features for metal complexes.
-
__init__
(jaguar=False, jaguar_keywords={'basis': 'LACVP**', 'dftname': 'B3LYP', 'iuhf': '1'}, tpp=1, ligfilter=False, canvas=False, moldescriptors=False, include_vectorized=False, save_files=False, logger=None)[source]¶ Create an instance.
- Parameters
jaguar (bool) – specify whether to calculate Jaguar features
jaguar_keywords (OrderedDict) – if Jaguar jobs must be run to calculate the Jaguar features then specify the Jaguar keywords here
tpp (int) – the number of threads for any Jaguar jobs
ligfilter (bool) – specify whether to calculate Ligfilter features
canvas (bool) – specify whether to calculate Canvas features
moldescriptors (bool or list) – specify whether to calculate Molecular Descriptors features. If it’s a list, it contains command line arguments for moldescriptors
include_vectorized (bool) – whether to include instance specific features that are vectorized, for example depending on the molecule’s geometric orientation, atom indexing, etc.
save_files (bool) – Whether to save subjob files or not
logger (logging.Logger or None) – output logger or None if there isn’t one
-
runJaguar
()[source]¶ Run Jaguar on the given structures.
- Return type
list
- Returns
contains Jaguar *.out file names
-
getFeatures
(structs, jaguar_out_files=None)[source]¶ Return features dictionary for the given structures
- Parameters
structs (list(
schrodinger.structure.Structure
)) – list of structures to be featurized
jaguar_out_files (list or None) – if Jaguar features should be calculated using existing Jaguar *.out files then specify the files here using the same ordering as used for any given structures
-
verifyJaguarOutfiles
()[source]¶ Run jaguar and get the out-files if the out-files have not been provided
-
getComplexDescriptors
()[source]¶ Create a
Complex
object for each structure and get their descriptors
- Return type
dict
- Returns
The descriptors from
Complex
for each structure
-
getJaguarDescriptors
()[source]¶ Return Jaguar descriptors for all structures. Sets Jaguar atom descriptors on structures.
- Return type
dict
- Returns
The jaguar descriptors for all structures. Keys are structure titles and values are dictionaries containing descriptors
-
getUtilityDescriptors
()[source]¶ Get the requested utility descriptors for all structures
- Return type
dict
- Returns
The descriptor utility descriptors for all structures. Keys are structure titles and values are dictionaries containing descriptors
-
getDescriptorUtilityJob
(descriptor_utility)[source]¶ Get the job to run to generate the descriptors using the passed descriptor_utility for all structures
- Parameters
descriptor_utility (DescriptorUtility) – The descriptor utility to run to get the descriptors
- Return type
jaguarworkflows.RobustSubmissionJob
- Returns
The job to run to generate the descriptors
-
processUtilityDescriptorOutputs
(jobs_dict)[source]¶ Read the descriptors for all descriptor utilities that were run, and return them
- Parameters
jobs_dict (dict) – Dictionary with `DescriptorUtility`s as keys and jobs as values
- Return type
dict
- Returns
The descriptor utility descriptors for all structures. Keys are structure titles and values are dictionaries containing descriptors
-
getMolecularDescriptorsJob
()[source]¶ Get the job to run to generate molecular descriptors for all structures
- Return type
jaguarworkflows.RobustSubmissionJob
- Returns
The job to run to generate the descriptors
-
static
writeFingerprintFiles
(structs)[source]¶ Write fingerprint files for the given structures.
- Parameters
structs (list(
schrodinger.structure.Structure
)) – list of structures to be fingerprinted
- Return type
list
- Returns
the fingerprint file names
-
log
(msg, **kwargs)[source]¶ Add a message to the log file
- Parameters
msg (str) – The message to log
Additional keyword arguments are passed to the textlogger.log_msg function
-
fit
(data, data_y=None)¶ Fit and return self. Anything that evaluates properties related to the passed data should go here. For example, compute physical properties of a structure and save them as class properties, to be used in the transform method.
- Parameters
data (numpy array of shape [n_samples, n_features]) – Training set
data_y (numpy array of shape [n_samples]) – Target values
- Return type
- Returns
self object with fitted data
-
fit_transform
(X, y=None, **fit_params)¶ Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters
X ({array-like, sparse matrix, dataframe} of shape (n_samples, n_features)) – Input samples.
y (ndarray of shape (n_samples,), default=None) – Target values.
**fit_params (dict) – Additional fit parameters.
- Return type
ndarray of shape (n_samples, n_features_new)
- Returns
X_new, the transformed array.
-
get_params
(deep=True)¶ Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Return type
mapping of string to any
- Returns
params, the parameter names mapped to their values.
-
set_params
(**params)¶ Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form
<component>__<parameter>
so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Return type
object
- Returns
self, the estimator instance.
-
transform
(data)¶ Get numerical features. Must be implemented by a child class.
- Parameters
data (numpy array of shape [n_samples, n_features]) – Training set
- Return type
numpy array of shape [n_samples, n_features_new]
- Returns
Transformed array
-
-
class
schrodinger.application.matsci.mlearn.features.
CrystalNNFeatures
(preset='ops')[source]¶ Bases:
object
Calculates CrystalNN structure fingerprints as implemented in pymatgen
-
OPS_PRESET
= 'ops'¶
-
CN_PRESET
= 'cn'¶
-
__init__
(preset='ops')[source]¶ Create a structure featurizer
- Parameters
preset (str) – One of
OPS_PRESET
or
CN_PRESET
class constants
-