schrodinger.application.matsci.mlearn.features module¶
Classes and functions to deal with ML features.
Copyright Schrodinger, LLC. All rights reserved.
-
class
schrodinger.application.matsci.mlearn.features.
MomentData
(flag, components, header, units)¶ Bases:
tuple
-
__contains__
(key, /)¶ Return key in self.
-
__len__
()¶ Return len(self).
-
components
¶ Alias for field number 1
-
count
(value, /)¶ Return number of occurrences of value.
-
flag
¶ Alias for field number 0
-
header
¶ Alias for field number 2
-
index
(value, start=0, stop=9223372036854775807, /)¶ Return first index of value.
Raises ValueError if the value is not present.
-
units
¶ Alias for field number 3
-
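MomentData subclasses tuple and exposes its four fields as positional aliases. That is the behavior of a collections.namedtuple. The sketch below re-creates an equivalent local type to show this behavior. It does not import the real class. The field values are hypothetical, because this page does not say what each field holds.

```python
from collections import namedtuple

# Local stand-in with the same fields, in the same order, as MomentData.
# This is an illustration only, not an import of the documented class.
MomentData = namedtuple('MomentData', ['flag', 'components', 'header', 'units'])

# Hypothetical values; their real meaning is not documented here.
moment = MomentData(flag='dipole', components=3, header='Dipole moment',
                    units='Debye')

# Each attribute is an alias for a tuple position ("Alias for field number N").
print(moment.flag == moment[0])    # True
print(moment.units == moment[3])   # True

# Inherited tuple methods behave as documented above.
print(len(moment))                 # 4
print(moment.index('Debye'))       # 3
print('Dipole moment' in moment)   # True
```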
-
schrodinger.application.matsci.mlearn.features.
DescriptorUtility
¶ alias of
schrodinger.application.matsci.mlearn.features.DescriptorUtilitity
-
schrodinger.application.matsci.mlearn.features.
get_distance_cell
(struct, cutoff)¶ Create an infrastructure Distance Cell. Struct MUST have the Chorus box properties.
- Parameters
struct (
schrodinger.structure.Structure
) – Input structure
cutoff (float) – The cutoff for finding nearest-neighbor atoms
- Return type
schrodinger.structure.Structure, schrodinger.infra.structure.DistanceCell, schrodinger.infra.structure.PBC
- Returns
The supercell, an infrastructure Distance Cell that accounts for the PBC, and the PBC used to create it.
- Raises
ValueError if struct is missing PBCs
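get_distance_cell relies on Schrodinger's infrastructure DistanceCell and PBC classes, which are not reproduced here. A periodic-aware neighbor search has to account for atoms that are close only across a cell boundary. The sketch below illustrates this with the minimum-image convention in pure Python. It makes two assumptions: the box is orthorhombic, given as three edge lengths, and the cutoff is no larger than half the shortest edge. The function name neighbors_within_cutoff is hypothetical. The documented function also returns a supercell, which this sketch does not build.

```python
import itertools
import math


def neighbors_within_cutoff(coords, box, cutoff):
    """Return (i, j, distance) for atom pairs within cutoff under PBC.

    Assumes an orthorhombic box given as (a, b, c) edge lengths in
    Angstrom and a cutoff of at most half the shortest edge, so the
    minimum image is the only periodic image that can be in range.
    """
    if cutoff > min(box) / 2:
        raise ValueError('cutoff must not exceed half the shortest box edge')
    pairs = []
    for i, j in itertools.combinations(range(len(coords)), 2):
        squared = 0.0
        for axis in range(3):
            delta = coords[j][axis] - coords[i][axis]
            # Minimum-image convention: shift delta by whole box lengths
            # so that it lies within half a box length of zero.
            delta -= box[axis] * round(delta / box[axis])
            squared += delta * delta
        dist = math.sqrt(squared)
        if dist <= cutoff:
            pairs.append((i, j, dist))
    return pairs


# Atoms 0 and 1 sit near opposite faces of a 10 Angstrom cube, so they
# are only 1 Angstrom apart through the periodic boundary.
coords = [(0.5, 5.0, 5.0), (9.5, 5.0, 5.0), (5.0, 5.0, 5.0)]
print(neighbors_within_cutoff(coords, (10.0, 10.0, 10.0), 4.0))
# [(0, 1, 1.0)]
```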
-
schrodinger.application.matsci.mlearn.features.
elemental_generator
(struct, element, is_equal=True)¶
-
schrodinger.application.matsci.mlearn.features.
get_anion
(struct)¶ Get the most electronegative element in the structure (anion).
- Parameters
struct (
schrodinger.structure.Structure
) – Input structure
- Return type
str, float, int
- Returns
Element, its electronegativity, and the number of anions in the cell
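The sketch below shows the selection that get_anion describes: pick the element with the highest electronegativity and count its atoms. The input is a plain list of element symbols rather than a Structure. The small Pauling electronegativity table is an assumption; the real function may use a different scale or data source.

```python
from collections import Counter

# Assumed subset of Pauling electronegativities, for illustration only.
PAULING = {'Li': 0.98, 'P': 2.19, 'S': 2.58, 'Cl': 3.16, 'O': 3.44, 'F': 3.98}


def most_electronegative(elements):
    """Return (element, electronegativity, count) for a list of symbols."""
    counts = Counter(elements)
    # The "anion" is the element with the largest electronegativity.
    anion = max(counts, key=lambda el: PAULING[el])
    return anion, PAULING[anion], counts[anion]


# One formula unit of Li3PS4.
cell = ['Li'] * 3 + ['P'] + ['S'] * 4
print(most_electronegative(cell))  # ('S', 2.58, 4)
```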
-
class
schrodinger.application.matsci.mlearn.features.
LatticeFeatures
(features, element='Li', cutoff=4.0)¶ Bases:
schrodinger.application.matsci.mlearn.base.BaseFeaturizer
Class to generate lattice-based features.
-
FEATURES
= {'anionFrameCoordination': 'Anion frame coordination', 'avgAnionAnionShortDistance': 'Average anion anion shortest distance', 'avgAtomicVol': 'Average atomic volume', 'avgElementAnionShortDistance': 'Average cation anion shortest distance', 'avgElementNeighborCount': 'Average cation count', 'avgNeighborCount': 'Average neighbor count', 'avgNeighborIon': 'Average neighbor ionicity', 'avgShortDistance': 'Average cation cation shortest distance', 'avgSublatticeEneg': 'Average sublattice electronegativity', 'avgSublatticeNeighborCount': 'Average sublattice neighbor count', 'avgSublatticeNeighborIon': 'Average sublattice neighbor ionicity', 'packingFraction': 'Crystal packing fraction', 'pathWidth': 'Average straight-line path width', 'pathWidthEneg': 'Average straight-line path electronegativity', 'ratioCount': 'Ratio of average cation to sublattice count', 'ratioIonicity': 'Ratio of average cation to sublattice electronegativity', 'stdNeighborCount': 'Standard deviation of neighbor count', 'stdNeighborIon': 'Standard deviation of neighbor ionicity', 'sublatticePackingFraction': 'Sublattice packing fraction', 'volPerAnion': 'Volume per anion'}¶
-
__init__
(features, element='Li', cutoff=4.0)¶ Initialize the object.
-
runFeature
(feature)¶ Get result from a feature.
- Parameters
feature (str) – One of the feature names (keys) listed in FEATURES.
- Return type
int or float
- Returns
Feature value
-
transform
(structs)¶ Get numerical features from structures. Also sets feature names in self.labels. See parent class for more documentation.
- Parameters
structs (list(
schrodinger.structure.Structure
)) – List of structures to be featurized
- Return type
numpy array of shape [n_samples, n_features]
- Returns
Transformed array
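Every key in FEATURES matches a method name on LatticeFeatures. This suggests that runFeature dispatches by name and that transform assembles one row per structure. The hypothetical ToyFeaturizer below mimics that pattern with two made-up features. It returns nested lists rather than a numpy array. This page does not say whether self.labels holds the keys or the descriptive FEATURES values in the real class, so using the descriptive labels is an assumption.

```python
class ToyFeaturizer:
    """Hypothetical featurizer mirroring the runFeature/transform pattern."""

    # Feature name -> human-readable label, like LatticeFeatures.FEATURES.
    FEATURES = {'atomCount': 'Number of atoms',
                'avgMass': 'Average atomic mass'}

    def __init__(self, features):
        unknown = set(features) - set(self.FEATURES)
        if unknown:
            raise ValueError(f'Unknown features: {sorted(unknown)}')
        self.features = list(features)
        self.labels = []
        self.masses = None  # per-structure state read by the feature methods

    def atomCount(self):
        return len(self.masses)

    def avgMass(self):
        return sum(self.masses) / len(self.masses)

    def runFeature(self, feature):
        # Dispatch by method name: each FEATURES key names a method.
        return getattr(self, feature)()

    def transform(self, structs):
        # Each toy "structure" is just a list of atomic masses.
        rows = []
        for masses in structs:
            self.masses = masses
            rows.append([self.runFeature(name) for name in self.features])
        self.labels = [self.FEATURES[name] for name in self.features]
        return rows


featurizer = ToyFeaturizer(['atomCount', 'avgMass'])
print(featurizer.transform([[1.0, 3.0], [2.0, 2.0, 5.0]]))  # [[2, 2.0], [3, 3.0]]
print(featurizer.labels)  # ['Number of atoms', 'Average atomic mass']
```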
-
avgAtomicVol
()¶ Get average atomic volume.
- Parameters
struct (
schrodinger.structure.Structure
) – Structure to be used for feature calculation
- Return type
float
- Returns
Average atomic volume (Angstrom^3)
-
avgNeighborCount
()¶ Get average neighbor count.
- Return type
float
- Returns
Average neighbor count
-
stdNeighborCount
()¶ Get standard deviation of neighbor count.
- Return type
float
- Returns
Standard deviation of neighbor count
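Neighbor-count statistics such as those returned by avgNeighborCount and stdNeighborCount can be sketched in two steps. First, count how many other atoms lie within the cutoff of each atom. Then take the mean and standard deviation of those counts. This version ignores periodic boundaries and uses the population standard deviation; both are simplifying assumptions. The coordinates are made up.

```python
import math
import statistics


def neighbor_counts(coords, cutoff):
    """Count, for each atom, the other atoms within cutoff (no PBC)."""
    counts = []
    for i, a in enumerate(coords):
        counts.append(sum(1 for j, b in enumerate(coords)
                          if j != i and math.dist(a, b) <= cutoff))
    return counts


# Four atoms on a line, 1.5 Angstrom apart; with a 2.0 Angstrom cutoff
# only directly adjacent atoms are neighbors.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (4.5, 0.0, 0.0)]
counts = neighbor_counts(coords, cutoff=2.0)
print(counts)                     # [1, 2, 2, 1]
print(statistics.mean(counts))    # 1.5
print(statistics.pstdev(counts))  # 0.5
```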
-
avgSublatticeEneg
()¶ Get average sublattice electronegativity.
- Return type
float
- Returns
Average sublattice electronegativity
-
avgSublatticeNeighborCount
()¶ Get average sublattice neighbor count.
- Return type
float
- Returns
Average sublattice neighbor count
-
avgNeighborIon
()¶ Get average neighbor ionicity.
- Return type
float
- Returns
Average neighbor ionicity
-
stdNeighborIon
()¶ Get standard deviation of neighbor ionicity.
- Return type
float
- Returns
Standard deviation of neighbor ionicity
-
avgSublatticeNeighborIon
()¶ Get average sublattice neighbor ionicity.
- Return type
float
- Returns
Average sublattice neighbor ionicity
-
volPerAnion
()¶ Get volume per anion.
- Return type
float
- Returns
Volume per anion
-
packingFraction
(skip_element=None)¶ Get packing fraction of the crystal.
- Parameters
skip_element (str) – Element to skip
- Return type
float
- Returns
Packing fraction
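Packing fraction is the volume occupied by atomic spheres divided by the cell volume. The sketch below computes it from (element, radius) pairs and a cell volume, with skip_element leaving one element out. It assumes non-overlapping spheres and takes radii as input. The real method presumably derives the radii itself, for example through effectiveRadius. The simple-cubic check reproduces the textbook value of pi/6.

```python
import math


def packing_fraction(atoms, cell_volume, skip_element=None):
    """Fraction of the cell volume filled by atomic spheres.

    atoms: list of (element, radius_in_angstrom) pairs.
    skip_element: element whose spheres are left out, or None.
    Assumes the spheres do not overlap.
    """
    filled = sum(4.0 / 3.0 * math.pi * radius ** 3
                 for element, radius in atoms
                 if element != skip_element)
    return filled / cell_volume


# Simple-cubic toy cell: one atom of radius a/2 in a cube of edge a.
a = 2.0
print(packing_fraction([('X', a / 2)], a ** 3))  # about 0.5236 (pi/6)
```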
-
effectiveRadius
(atom)¶ Get atom effective radius.
- Parameters
atom (schrodinger.structure._StructureAtom) – Atom
- Return type
float
- Returns
Effective radius
-
sublatticePackingFraction
()¶ Get packing fraction of the sublattice crystal.
- Return type
float
- Returns
Packing fraction
-
avgElementNeighborCount
()¶ Get average element neighbor count.
- Return type
float
- Returns
Average number of bonds per element
-
avgAnionAnionShortDistance
()¶ Get average anion-anion shortest distance.
- Return type
float
- Returns
Average anion-anion shortest distance
-
avgElementAnionShortDistance
()¶ Get average element-anion shortest distance.
- Return type
float
- Returns
Average element-anion shortest distance
-
avgShortDistance
()¶ Get average element-element shortest distance.
- Return type
float
- Returns
Average element-element shortest distance
-
anionFrameCoordination
()¶ Get anion framework coordination.
- Return type
float
- Returns
Anion framework coordination
-
pathWidth
(eval_eneg=False)¶ Evaluate average straight-line path width. See the reference in the constructor for more info.
- Parameters
eval_eneg (bool) – If True, return the average over electronegativity instead of distance
- Return type
float
- Returns
Average path width, or average electronegativity if eval_eneg is True
-
pathWidthEneg
()¶ Evaluate average straight-line path electronegativity.
- Return type
float
- Returns
Average electronegativity along the path
-
ratioIonicity
()¶ Get the ratio of average cation to sublattice electronegativity (ratio ionicity).
- Return type
float
- Returns
Ratio ionicity
-
ratioCount
()¶ Get the ratio of average cation to sublattice neighbor count.
- Return type
float
- Returns
Ratio neighbor count
-
fit
(data, data_y=None)¶ Fit and return self. Anything that evaluates properties related to the passed data should go here. For example, compute physical properties of a structure and save them as class properties, to be used in the transform method.
- Parameters
data (numpy array of shape [n_samples, n_features]) – Training set
data_y (numpy array of shape [n_samples]) – Target values
- Return type
LatticeFeatures
- Returns
self object with fitted data
-
fit_transform
(X, y=None, **fit_params)¶ Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters
X ({array-like, sparse matrix, dataframe} of shape (n_samples, n_features)) – Input samples.
y (ndarray of shape (n_samples,), default=None) – Target values.
**fit_params (dict) – Additional fit parameters.
- Returns
X_new (ndarray of shape (n_samples, n_features_new)) – Transformed array.
-
get_params
(deep=True)¶ Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params (mapping of string to any) – Parameter names mapped to their values.
-
set_params
(**params)¶ Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self (object) – Estimator instance.
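The <component>__<parameter> convention comes from scikit-learn, whose docstring this method repeats. The sketch below only shows how such keys split into a component name and a parameter name; it is not scikit-learn's implementation. For instance, a key like featurizer__cutoff would address the cutoff parameter of a pipeline step named featurizer. The step name is hypothetical.

```python
def split_nested_params(params):
    """Group flat estimator parameters by component name.

    'featurizer__cutoff' -> {'featurizer': {'cutoff': ...}}. Only the
    first '__' splits, so deeper nesting stays in the parameter name for
    the component to resolve. Keys without '__' go under ''.
    """
    grouped = {}
    for key, value in params.items():
        component, sep, name = key.partition('__')
        if not sep:
            component, name = '', key
        grouped.setdefault(component, {})[name] = value
    return grouped


print(split_nested_params({'featurizer__cutoff': 5.0,
                           'featurizer__element': 'Na',
                           'verbose': True}))
# {'featurizer': {'cutoff': 5.0, 'element': 'Na'}, '': {'verbose': True}}
```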
-
-
class
schrodinger.application.matsci.mlearn.features.
Ligand
(st, metal_atom, new_to_old, coordination_idxs)¶ Bases:
object
Manage a ligand.
-
__init__
(st, metal_atom, new_to_old, coordination_idxs)¶ Create an instance.
- Parameters
st (
schrodinger.structure.Structure
) – the structure
metal_atom (
schrodinger.structure._StructureAtom
) – the metal atom
new_to_old (dict) – the map of new indices (extracted ligand) to old indices (original structure)
coordination_idxs (list) – contains groups of indices (new indices) of coordinating atoms
-
getVec
(point)¶ Return a vector pointing from the metal atom to the given point.
- Parameters
point (
numpy.array
) – the point in Ang.
- Return type
numpy.array
- Returns
the vector in Ang.
-
getCentroid
(st, idxs)¶ Return the centroid vector of the given coordination atom indices.
- Parameters
st (
schrodinger.structure.Structure
) – the structure
idxs (list) – the coordination indices
- Return type
numpy.array
- Returns
the centroid vector in Ang.
-
getCoordinationVec
(st, idxs)¶ Return a coordination vector pointing from the metal atom to the centroid of the given coordination atom indices.
- Parameters
st (
schrodinger.structure.Structure
) – the structure
idxs (list) – the coordination indices
- Return type
numpy.array
- Returns
the coordination vector in Ang.
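The centroid and coordination-vector calculations above reduce to simple vector arithmetic. This sketch works on raw (x, y, z) coordinates in Ang. instead of a Structure and atom indices. It assumes an unweighted geometric centroid, and the function names are hypothetical.

```python
def centroid(points):
    """Unweighted mean position of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[axis] for p in points) / n for axis in range(3))


def coordination_vec(metal_xyz, coordinating_xyzs):
    """Vector from the metal to the centroid of a coordinating group."""
    c = centroid(coordinating_xyzs)
    return tuple(c[axis] - metal_xyz[axis] for axis in range(3))


# An eta-2 group: two carbons bound side-on to a metal at the origin.
metal = (0.0, 0.0, 0.0)
group = [(-0.7, 2.0, 0.0), (0.7, 2.0, 0.0)]
print(coordination_vec(metal, group))  # (0.0, 2.0, 0.0)
```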
-
getStoichiometry
()¶ Return the stoichiometry.
- Return type
str
- Returns
the stoichiometry
-
getDenticity
()¶ Return the denticity.
- Return type
int
- Returns
the denticity
-
getHapticity
()¶ Return the hapticity.
- Return type
int
- Returns
the hapticity
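coordination_idxs holds groups of coordinating atom indices. This suggests, though this page does not confirm it, that denticity counts the groups and hapticity reflects how many atoms bind within a group. The sketch encodes those assumed definitions using hypothetical index groups.

```python
def denticity(coordination_groups):
    """Number of separate coordinating groups (assumed definition)."""
    return len(coordination_groups)


def hapticity(coordination_groups):
    """Size of the largest coordinating group (assumed definition)."""
    return max(len(group) for group in coordination_groups)


# Hypothetical bidentate ligand binding through atoms 1 and 4.
bidentate = [[1], [4]]
# Hypothetical cyclopentadienyl ring bound through all five carbons.
cyclopentadienyl = [[1, 2, 3, 4, 5]]
print(denticity(bidentate), hapticity(bidentate))                # 2 1
print(denticity(cyclopentadienyl), hapticity(cyclopentadienyl))  # 1 5
```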
-
getHapticCharacter
()¶ Return the haptic character.
- Return type
int
- Returns
the haptic character
-
getBiteAngle
()¶ Return the bite angle in degrees.
- Return type
float or None
- Returns
the bite angle in degrees
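The bite angle of a chelating ligand is the donor-metal-donor angle. The sketch computes it from the metal position and two donor positions. The donor positions could equally be coordination-group centroids. It returns None when there are not exactly two donors. That None rule is an assumption about when the documented "float or None" return applies, and the function names are hypothetical.

```python
import math


def angle_deg(u, v):
    """Angle between two 3D vectors in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp so rounding error cannot push the cosine outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def bite_angle(metal_xyz, donor_xyzs):
    """Donor-metal-donor angle, or None unless exactly two donors (assumed)."""
    if len(donor_xyzs) != 2:
        return None
    u, v = ([d[k] - metal_xyz[k] for k in range(3)] for d in donor_xyzs)
    return angle_deg(u, v)


metal = (0.0, 0.0, 0.0)
print(bite_angle(metal, [(2.0, 0.0, 0.0), (0.0, 2.0, 0.0)]))  # 90.0
print(bite_angle(metal, [(2.0, 0.0, 0.0)]))                   # None
```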
-
getAtomConeAngle
(atom)¶ Return the cone angle for the given atom in degrees.
- Parameters
atom (
schrodinger.structure._StructureAtom
) – the atom
- Return type
float
- Returns
the cone angle for the given atom in degrees
-
getConeAngle
()¶ Return the cone angle in degrees.
- Return type
float
- Returns
the cone angle in degrees
-
getBondLength
()¶ Return the bond length in Ang.
- Return type
float
- Returns
the bond length in Ang.
-
getDescriptors
()¶ Return descriptors.
- Return type
dict
- Returns
(label, data) pairs
-
-
class
schrodinger.application.matsci.mlearn.features.
Complex
(st, jaguar_out_file=None, ligfilter=False, canvas=False, moldescriptors=False, save_files=False, logger=None)¶ Bases:
object
Manage a complex.
-
__init__
(st, jaguar_out_file=None, ligfilter=False, canvas=False, moldescriptors=False, save_files=False, logger=None)¶ Create an instance.
- Parameters
st (
schrodinger.structure.Structure
) – the structure
jaguar_out_file (str or None) – the name of a Jaguar *.out file from which descriptors will be extracted, or None if there isn’t one
ligfilter (bool) – specify whether to calculate Ligfilter features
canvas (bool) – specify whether to calculate Canvas features
moldescriptors (bool or list) – If a bool, specify whether to calculate Molecular Descriptors features. If a list, calculate Molecular Descriptors and use these flags on the command line for the moldescriptors utility to specify which descriptors to calculate.
save_files (bool) – Whether to save subjob files or not
logger (logging.Logger or None) – output logger or None if there isn’t one
-
getMetalAtom
()¶ Return the metal atom.
- Return type
schrodinger.structure._StructureAtom
- Returns
the metal atom
-
getBondAngle
()¶ Return the bond angle in degrees.
- Return type
float
- Returns
the bond angle in degrees
-
getVDWSurfaceArea
()¶ Return the VDW surface area in Angstrom^2.
- Return type
float
- Returns
the VDW surface area in Angstrom^2
-
getVDWVolume
(vdw_scale=1, buffer_len=2)¶ Return the VDW volume in Angstrom^3.
- Parameters
vdw_scale (float) – the VDW scale
buffer_len (float) – the shape buffer length in Angstrom
- Return type
float
- Returns
the VDW volume in Angstrom^3
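One common way to estimate a VDW volume is to sample a grid and count the points that fall inside any scaled VDW sphere. The sketch below does this in pure Python and treats buffer_len as padding around the atom centers. The grid approach, the spacing argument, and this reading of buffer_len are assumptions, not a description of getVDWVolume's actual algorithm. Under this reading, the padding must be at least the largest scaled radius, or spheres get clipped.

```python
import itertools
import math


def vdw_volume(atoms, vdw_scale=1.0, buffer_len=2.0, spacing=0.1):
    """Estimate the union volume of scaled VDW spheres on a cubic grid.

    atoms: list of ((x, y, z), radius) in Angstrom.
    buffer_len pads the bounding box of the atom centers; it must be at
    least the largest scaled radius so that no sphere is clipped.
    """
    spheres = [(xyz, radius * vdw_scale) for xyz, radius in atoms]
    lows = [min(xyz[k] for xyz, _ in spheres) - buffer_len for k in range(3)]
    highs = [max(xyz[k] for xyz, _ in spheres) + buffer_len for k in range(3)]
    steps = [int(math.ceil((hi - lo) / spacing)) for lo, hi in zip(lows, highs)]
    inside = 0
    for i, j, k in itertools.product(*(range(n) for n in steps)):
        # Test the center of each grid voxel against every sphere.
        point = (lows[0] + (i + 0.5) * spacing,
                 lows[1] + (j + 0.5) * spacing,
                 lows[2] + (k + 0.5) * spacing)
        if any(math.dist(point, xyz) <= r for xyz, r in spheres):
            inside += 1
    return inside * spacing ** 3


# A single unit sphere; the estimate should land close to 4/3 * pi.
print(vdw_volume([((0.0, 0.0, 0.0), 1.0)]))
```

A finer spacing trades run time for accuracy, since the error comes from voxels straddling sphere surfaces.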
-
getStructureContainingLargestLigands
()¶ Return a structure containing the largest ligand or multiple copies thereof if it is symmetric.
- Return type
schrodinger.structure.Structure
- Returns
the structure containing the largest ligand(s)
-
getBuriedVDWVolumePct
(vdw_scale=1)¶ Return the buried VDW volume percent.
- Parameters
vdw_scale (float) – the VDW scale
- Return type
float
- Returns
the buried VDW volume percent
-
getJaguarDescriptors
()¶ Return Jaguar descriptors.
- Return type
dict
- Returns
(label, data) pairs
-
getDescriptorUtilityDescriptors
(descriptor_utility)¶ Return descriptors for the given descriptor utility.
- Parameters
descriptor_utility (DescriptorUtility) – the descriptor utility to use for obtaining descriptors
- Return type
dict
- Returns
(label, data) pairs
-
getLigfilterDescriptors
()¶ Return Ligfilter descriptors.
- Return type
dict
- Returns
(label, data) pairs
-
getCanvasDescriptors
()¶ Return Canvas descriptors.
- Return type
dict
- Returns
(label, data) pairs
-
getMolecularDescriptorsDescriptors
()¶ Return Molecular Descriptors descriptors.
- Return type
dict
- Returns
(label, data) pairs
-
getVectorizedDescriptors
()¶ Return vectorized descriptors, i.e., instance-specific descriptors that depend on, for example, the molecule’s geometric orientation, atom indexing, etc.
- Return type
dict
- Returns
(label, data) pairs
-
getDescriptors
()¶ Return descriptors.
- Return type
dict
- Returns
(label, data) pairs
-
-
schrodinger.application.matsci.mlearn.features.
get_unique_titles
(sts)¶ Return a list of unique titles for the given structures.
- Parameters
sts (list) – contains
schrodinger.structure.Structure
- Return type
list
- Returns
the unique titles
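getFeatures returns dictionaries keyed by structure title, so titles must be distinct. The sketch below shows one way to enforce that. The _1, _2 suffix scheme is hypothetical and may differ from what get_unique_titles actually produces.

```python
from collections import Counter


def unique_titles(titles):
    """Make titles unique by appending _1, _2, ... to repeated ones.

    Titles that occur once are kept as-is. Generated names skip any
    candidate that already exists in the input or was generated earlier.
    """
    totals = Counter(titles)
    taken = set(titles)
    seen = Counter()
    result = []
    for title in titles:
        if totals[title] == 1:
            result.append(title)
            continue
        while True:
            seen[title] += 1
            candidate = f'{title}_{seen[title]}'
            if candidate not in taken:
                break
        taken.add(candidate)
        result.append(candidate)
    return result


print(unique_titles(['PdCl2', 'Ni(CO)4', 'PdCl2']))
# ['PdCl2_1', 'Ni(CO)4', 'PdCl2_2']
```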
-
class
schrodinger.application.matsci.mlearn.features.
ComplexFeatures
(jaguar=False, jaguar_keywords={'basis': 'LACVP**', 'dftname': 'B3LYP', 'iuhf': '1'}, jaguar_out_files=None, tpp=1, ligfilter=False, canvas=False, moldescriptors=False, include_vectorized=False, save_files=False, logger=None)¶ Bases:
schrodinger.application.matsci.mlearn.base.BaseFeaturizer
Class to generate features for metal complexes.
-
__init__
(jaguar=False, jaguar_keywords={'basis': 'LACVP**', 'dftname': 'B3LYP', 'iuhf': '1'}, jaguar_out_files=None, tpp=1, ligfilter=False, canvas=False, moldescriptors=False, include_vectorized=False, save_files=False, logger=None)¶ Create an instance.
- Parameters
jaguar (bool) – specify whether to calculate Jaguar features
jaguar_keywords (OrderedDict) – if Jaguar jobs must be run to calculate the Jaguar features then specify the Jaguar keywords here
jaguar_out_files (list) – if Jaguar features should be calculated using existing Jaguar *.out files then specify the files here using the same ordering as used for any given structures
tpp (int) – the number of threads for any Jaguar jobs
ligfilter (bool) – specify whether to calculate Ligfilter features
canvas (bool) – specify whether to calculate Canvas features
moldescriptors (bool) – specify whether to calculate Molecular Descriptors features
include_vectorized (bool) – whether to include instance-specific vectorized features, for example those depending on the molecule’s geometric orientation, atom indexing, etc.
save_files (bool) – Whether to save subjob files or not
logger (logging.Logger or None) – output logger or None if there isn’t one
-
runJaguar
(structs, logger=None)¶ Run Jaguar on the given structures.
- Parameters
structs (list(
schrodinger.structure.Structure
)) – list of structures to be featurized
logger (logging.Logger or None) – output logger or None if there isn’t one
- Return type
list
- Returns
contains Jaguar *.out file names
-
transform
(structs)¶ Get numerical features from structures. Also sets feature names in self.labels. See parent class for more documentation.
- Parameters
structs (list(
schrodinger.structure.Structure
)) – list of structures to be featurized
- Return type
numpy array of shape [n_structs, n_features]
- Returns
transformed array
-
vectorized_transform
(structs)¶ Get vectorized features from structures. Also sets feature names in self.vectorized_labels. See parent class for more documentation.
- Parameters
structs (list(
schrodinger.structure.Structure
)) – list of structures to be featurized
- Return type
numpy array of shape [n_structs, n_features]
- Returns
transformed array
-
static
getFeatures
(structs, jaguar=False, jaguar_keywords={'basis': 'LACVP**', 'dftname': 'B3LYP', 'iuhf': '1'}, jaguar_out_files=None, tpp=1, ligfilter=False, canvas=False, moldescriptors=False, include_vectorized=False, save_files=False, logger=None)¶ Return features and vectorized features dictionaries for the given structures.
- Parameters
structs (list(
schrodinger.structure.Structure
)) – list of structures to be featurized
jaguar (bool) – specify whether to calculate Jaguar features
jaguar_keywords (OrderedDict) – if Jaguar jobs must be run to calculate the Jaguar features then specify the Jaguar keywords here
jaguar_out_files (list) – if Jaguar features should be calculated using existing Jaguar *.out files then specify the files here using the same ordering as used for the given structures
tpp (int) – the number of threads for any Jaguar jobs
ligfilter (bool) – specify whether to calculate Ligfilter features
canvas (bool) – specify whether to calculate Canvas features
moldescriptors (bool) – specify whether to calculate Molecular Descriptors features
include_vectorized (bool) – whether to include instance-specific vectorized features, for example those depending on the molecule’s geometric orientation, atom indexing, etc.
save_files (bool) – Whether to save files created by subjobs
logger (logging.Logger or None) – output logger or None if there isn’t one
- Return type
dict, dict
- Returns
features and vectorized features dictionaries where keys are structure titles and values are dicts of feature labels and values
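Because getFeatures returns {title: {label: value}} dictionaries, a downstream model usually needs them converted into a rectangular table. The sketch below builds sorted column labels and one row per title, filling a missing feature with NaN. The NaN fill and the example labels are assumptions, not necessarily how transform handles missing values.

```python
import math


def features_to_matrix(features_by_title):
    """Turn {title: {label: value}} into (titles, labels, rows).

    Labels are the sorted union over all structures; a label missing for
    a structure is filled with NaN.
    """
    titles = list(features_by_title)
    labels = sorted({label for feats in features_by_title.values()
                     for label in feats})
    rows = [[features_by_title[t].get(label, math.nan) for label in labels]
            for t in titles]
    return titles, labels, rows


# Hypothetical feature labels and values.
features = {'complex_1': {'denticity': 2, 'bite_angle': 91.3},
            'complex_2': {'denticity': 1}}
titles, labels, rows = features_to_matrix(features)
print(labels)  # ['bite_angle', 'denticity']
print(rows)    # [[91.3, 2], [nan, 1]]
```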
-
static
writeFingerprintFiles
(structs)¶ Write fingerprint files for the given structures.
- Parameters
structs (list(
schrodinger.structure.Structure
)) – list of structures to be fingerprinted
- Return type
list
- Returns
the fingerprint file names
-
fit
(data, data_y=None)¶ Fit and return self. Anything that evaluates properties related to the passed data should go here. For example, compute physical properties of a structure and save them as class properties, to be used in the transform method.
- Parameters
data (numpy array of shape [n_samples, n_features]) – Training set
data_y (numpy array of shape [n_samples]) – Target values
- Return type
ComplexFeatures
- Returns
self object with fitted data
-
fit_transform
(X, y=None, **fit_params)¶ Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters
X ({array-like, sparse matrix, dataframe} of shape (n_samples, n_features)) – Input samples.
y (ndarray of shape (n_samples,), default=None) – Target values.
**fit_params (dict) – Additional fit parameters.
- Returns
X_new (ndarray of shape (n_samples, n_features_new)) – Transformed array.
-
get_params
(deep=True)¶ Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params (mapping of string to any) – Parameter names mapped to their values.
-
set_params
(**params)¶ Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self (object) – Estimator instance.
-
-
class
schrodinger.application.matsci.mlearn.features.
CrystalNNFeatures
(preset='ops')¶ Bases:
object
Calculates CrystalNN structure fingerprints as implemented in pymatgen.
-
OPS_PRESET
= 'ops'¶
-
CN_PRESET
= 'cn'¶
-
__init__
(preset='ops')¶ Create a structure featurizer
- Parameters
preset (str) – One of the OPS_PRESET or CN_PRESET class constants
-
featurize
(struct)¶ Get CrystalNN fingerprints for the passed structure
- Parameters
struct (
schrodinger.structure.Structure
) – The structure to get features for
- Return type
list
- Returns
List of CrystalNN fingerprints for the structure
-