schrodinger.application.matsci.mlearn.features module¶
Classes and functions to deal with ML features.
Copyright Schrodinger, LLC. All rights reserved.
-
class
schrodinger.application.matsci.mlearn.features.
MomentData
(flag, components, header, units)¶ Bases:
tuple
-
__contains__
¶ Return key in self.
-
__init__
¶ Initialize self. See help(type(self)) for accurate signature.
-
__len__
¶ Return len(self).
-
components
¶ Alias for field number 1
-
count
()¶ Return number of occurrences of value.
-
flag
¶ Alias for field number 0
-
header
¶ Alias for field number 2
-
index
()¶ Return first index of value.
Raises ValueError if the value is not present.
-
units
¶ Alias for field number 3
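Because MomentData derives from tuple with named fields, its values can be read by name or by position. The sketch below rebuilds an equivalent tuple with the standard library to show the field order documented above. The field values are invented for illustration, and in real use the class would be imported from this module instead.

```python
from collections import namedtuple

# Stand-in with the documented field order (assumption: real code imports
# MomentData from schrodinger.application.matsci.mlearn.features instead).
MomentData = namedtuple("MomentData", ["flag", "components", "header", "units"])

# Hypothetical values, for illustration only.
moment = MomentData(flag="dipole", components=("x", "y", "z"),
                    header="Dipole moment", units="debye")

# Each field is reachable by name or by its field number.
assert moment.flag == moment[0]
assert moment.units == moment[3]
print(moment._fields)
```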
-
-
schrodinger.application.matsci.mlearn.features.
DescriptorUtility
¶ alias of
schrodinger.application.matsci.mlearn.features.DescriptorUtilitity
-
schrodinger.application.matsci.mlearn.features.
get_distance_cell
(struct, cutoff)¶ Create an infrastructure Distance Cell. Struct MUST have the Chorus box properties.
Parameters: - struct (schrodinger.structure.Structure) – Input structure
- cutoff (float) – The cutoff for finding nearest neighbor atoms
Return type: schrodinger.structure.Structure, schrodinger.infra.structure.DistanceCell, schrodinger.infra.structure.PBC
Returns: Supercell, an infrastructure Distance Cell that accounts for the PBC, and the PBC used to create it.
Raises: ValueError if struct is missing PBCs
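The idea of a cutoff-based neighbor search under periodic boundary conditions can be sketched in plain Python. This is not the DistanceCell API: the helpers `minimum_image_distance` and `neighbors_within` are hypothetical, and the sketch assumes an orthorhombic box, whereas the infrastructure class handles the PBC of the structure itself.

```python
import math

def minimum_image_distance(a, b, box):
    """Distance between points a and b in an orthorhombic periodic box.

    Hypothetical helper: box holds the three edge lengths, and each
    coordinate difference is wrapped to the nearest periodic image.
    """
    total = 0.0
    for xa, xb, length in zip(a, b, box):
        delta = xa - xb
        delta -= length * round(delta / length)  # wrap into [-L/2, L/2]
        total += delta * delta
    return math.sqrt(total)

def neighbors_within(coords, index, cutoff, box):
    """Indices of all other points closer than cutoff to coords[index]."""
    return [
        j for j, xyz in enumerate(coords)
        if j != index
        and minimum_image_distance(coords[index], xyz, box) < cutoff
    ]

# Points 0 and 1 sit on opposite faces of the box, so they are 1.0 apart
# through the periodic boundary rather than 9.0 apart directly.
coords = [(0.5, 0.0, 0.0), (9.5, 0.0, 0.0), (5.0, 5.0, 5.0)]
box = (10.0, 10.0, 10.0)
print(neighbors_within(coords, 0, cutoff=4.0, box=box))
```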
-
schrodinger.application.matsci.mlearn.features.
elemental_generator
(struct, element, is_equal=True)¶
-
schrodinger.application.matsci.mlearn.features.
get_anion
(struct)¶ Get the most electronegative element in the structure (anion).
Parameters: struct (schrodinger.structure.Structure) – Input structure Return type: str, float, int Returns: Element, its electronegativity, number of anions in the cell
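The selection described above can be illustrated on a plain list of element symbols. This is only a sketch: `pick_anion` and the small `PAULING` table are hypothetical, and the electronegativity scale used by the module is not stated in this page.

```python
from collections import Counter

# Pauling electronegativities for a few elements (small illustrative table).
PAULING = {"Li": 0.98, "P": 2.19, "S": 2.58, "Cl": 3.16, "O": 3.44, "F": 3.98}

def pick_anion(elements):
    """Return (element, electronegativity, count) for the most
    electronegative element in a list of element symbols."""
    counts = Counter(elements)
    anion = max(counts, key=lambda symbol: PAULING[symbol])
    return anion, PAULING[anion], counts[anion]

# One Li3PS4 formula unit as a flat list of element symbols.
cell = ["Li"] * 3 + ["P"] + ["S"] * 4
print(pick_anion(cell))
```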
-
class
schrodinger.application.matsci.mlearn.features.
LatticeFeatures
(features, element='Li', cutoff=4.0)¶ Bases:
schrodinger.application.matsci.mlearn.base.BaseFeaturizer
Class to generate lattice-based features.
-
FEATURES
= {'anionFrameCoordination': 'Anion frame coordination', 'avgAnionAnionShortDistance': 'Average anion anion shortest distance', 'avgAtomicVol': 'Average atomic volume', 'avgElementAnionShortDistance': 'Average cation anion shortest distance', 'avgElementNeighborCount': 'Average cation count', 'avgNeighborCount': 'Average neighbor count', 'avgNeighborIon': 'Average neighbor ionicity', 'avgShortDistance': 'Average cation cation shortest distance', 'avgSublatticeEneg': 'Average sublattice electronegativity', 'avgSublatticeNeighborCount': 'Average sublattice neighbor count', 'avgSublatticeNeighborIon': 'Average sublattice neighbor ionicity', 'packingFraction': 'Crystal packing fraction', 'pathWidth': 'Average straight-line path width', 'pathWidthEneg': 'Average straight-line path electronegativity', 'ratioCount': 'Ratio of average cation to sublattice count', 'ratioIonicity': 'Ratio of average cation to sublattice electronegativity', 'stdNeighborCount': 'Standard deviation of neighbor count', 'stdNeighborIon': 'Standard deviation of neighbor ionicity', 'sublatticePackingFraction': 'Sublattice packing fraction', 'volPerAnion': 'Volume per anion'}¶
-
__init__
(features, element='Li', cutoff=4.0)¶ Initialize the object.
-
runFeature
(feature)¶ Get result from a feature.
Parameters: feature – One of the keys listed in FEATURES. Return type: int or float Returns: Feature value
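The keys of FEATURES match the names of the feature methods documented below, so one plausible way to resolve a feature is a lookup by name. The toy class here is a sketch of that pattern under that assumption. It is not the LatticeFeatures implementation, and the returned numbers are placeholders.

```python
class TinyFeaturizer:
    """Toy stand-in: FEATURES keys double as method names (assumption)."""

    FEATURES = {
        "avgNeighborCount": "Average neighbor count",
        "volPerAnion": "Volume per anion",
    }

    def avgNeighborCount(self):
        return 4.0  # placeholder value

    def volPerAnion(self):
        return 21.5  # placeholder value

    def runFeature(self, feature):
        """Look up a feature by key and call the method of the same name."""
        if feature not in self.FEATURES:
            raise ValueError(f"Unknown feature: {feature}")
        return getattr(self, feature)()

featurizer = TinyFeaturizer()
values = {name: featurizer.runFeature(name) for name in TinyFeaturizer.FEATURES}
print(values)
```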
-
transform
(structs)¶ Get numerical features from structures. Also sets feature names in self.labels. See parent class for more documentation.
Parameters: structs (list(schrodinger.structure.Structure)) – List of structures to be featurized Return type: numpy array of shape [n_samples, n_features] Returns: Transformed array
-
avgAtomicVol
()¶ Get average atomic volume.
Parameters: struct (schrodinger.structure.Structure) – Structure to be used for feature calculation Return type: float Returns: Average atomic volume (A^3)
-
avgNeighborCount
()¶ Get average neighbor count.
Return type: float Returns: Average neighbor count
-
stdNeighborCount
()¶ Get standard deviation of neighbor count.
Return type: float Returns: Standard deviation of neighbor count
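The two neighbor-count statistics reduce to a mean and a standard deviation over per-atom counts. A minimal sketch with invented counts follows. Whether the module uses the population or the sample form of the standard deviation is not documented, so the population form here is an assumption.

```python
import statistics

# Hypothetical per-atom neighbor counts within the cutoff.
neighbor_counts = [4, 4, 6, 6]

avg_count = statistics.mean(neighbor_counts)
# Population standard deviation (assumption: the module may instead use
# the sample form, which divides by n - 1).
std_count = statistics.pstdev(neighbor_counts)

print(avg_count, std_count)
```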
-
avgSublatticeEneg
()¶ Get average sublattice electronegativity.
Return type: float Returns: Average sublattice electronegativity
-
avgSublatticeNeighborCount
()¶ Get average sublattice neighbor count.
Return type: float Returns: Average sublattice neighbor count
-
avgNeighborIon
()¶ Get average neighbor ionicity.
Return type: float Returns: Average neighbor ionicity
-
stdNeighborIon
()¶ Get standard deviation of neighbor ionicity.
Return type: float Returns: Standard deviation of neighbor ionicity
-
avgSublatticeNeighborIon
()¶ Get average sublattice neighbor ionicity.
Return type: float Returns: Average sublattice neighbor ionicity
-
volPerAnion
()¶ Get volume per anion.
Return type: float Returns: Volume per anion
-
packingFraction
(skip_element=None)¶ Get packing fraction of the crystal.
Parameters: skip_element (str) – Element to skip Return type: float Returns: Packing fraction
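A packing fraction is conventionally the volume occupied by atomic spheres divided by the cell volume. The sketch below follows that textbook definition. The helper `packing_fraction` and its arguments are hypothetical, and how the module obtains radii (see effectiveRadius) is not shown here.

```python
import math

def packing_fraction(radii, cell_volume, elements=None, skip_element=None):
    """Sum of atomic sphere volumes divided by the cell volume.

    Hypothetical helper: radii are effective radii in Angstrom, and atoms
    whose symbol equals skip_element are left out of the sum.
    """
    if elements is None:
        elements = [None] * len(radii)
    occupied = sum(
        4.0 / 3.0 * math.pi * radius ** 3
        for radius, symbol in zip(radii, elements)
        if skip_element is None or symbol != skip_element
    )
    return occupied / cell_volume

# Simple cubic lattice: one sphere of radius a/2 in a cube of edge a = 1.
print(packing_fraction([0.5], cell_volume=1.0))  # pi / 6
```

Passing skip_element leaves one species out of the sum, which is one way a sublattice variant of the same quantity could be formed.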
-
effectiveRadius
(atom)¶ Get atom effective radius.
Parameters: atom (schrodinger.structure._StructureAtom) – Atom Return type: float Returns: Effective radius
-
sublatticePackingFraction
()¶ Get packing fraction of the sublattice crystal.
Return type: float Returns: Packing fraction
-
avgElementNeighborCount
()¶ Get average element neighbor count.
Return type: float Returns: Average number of bonds per element
-
avgAnionAnionShortDistance
()¶ Get average anion anion shortest distance.
Return type: float Returns: Average anion anion shortest distance
-
avgElementAnionShortDistance
()¶ Get average element anion shortest distance.
Return type: float Returns: Average element anion shortest distance
-
avgShortDistance
()¶ Get average element element shortest distance.
Return type: float Returns: Average element element shortest distance
-
anionFrameCoordination
()¶ Get anion framework coordination.
Return type: float Returns: Anion framework coordination
-
pathWidth
(eval_eneg=False)¶ Evaluate average straight line path width. See the reference in the constructor for more info.
Parameters: eval_eneg (bool) – If True, return average over electronegativity, instead of distance Return type: float Returns: Average path or electronegativity
-
pathWidthEneg
()¶ Evaluate average straight line path electronegativity.
Return type: float Returns: Average electronegativity along the path
-
ratioIonicity
()¶ Get ratio ionicity.
Return type: float Returns: Ratio ionicity
-
ratioCount
()¶ Get ratio neighbor count.
Return type: float Returns: Ratio neighbor count
-
fit
(data, data_y=None)¶ Fit and return self. Anything that evaluates properties related to the passed data should go here. For example, compute physical properties of a structure and save them as class properties, to be used in the transform method.
Parameters: - data (numpy array of shape [n_samples, n_features]) – Training set
- data_y (numpy array of shape [n_samples]) – Target values
Returns: self object with fitted data
-
fit_transform
(X, y=None, **fit_params)¶ Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters: - X (numpy array of shape [n_samples, n_features]) – Training set.
- y (numpy array of shape [n_samples]) – Target values.
- **fit_params (dict) – Additional fit parameters.
Return type: numpy array of shape [n_samples, n_features_new]
Returns: X_new, the transformed array.
-
get_params
(deep=True)¶ Get parameters for this estimator.
Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators. Return type: mapping of string to any Returns: params, parameter names mapped to their values.
-
set_params
(**params)¶ Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
Parameters: **params (dict) – Estimator parameters. Return type: object Returns: self, the estimator instance.
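The <component>__<parameter> form can be illustrated with a small stand-in. The classes below are toys written for this sketch, not scikit-learn's or this module's implementation. They only show how a double-underscore key is split into a component name and a parameter name.

```python
class Scaler:
    """Toy component with a single parameter."""

    def __init__(self, factor=1.0):
        self.factor = factor

class Pipeline:
    """Toy container; not scikit-learn's Pipeline."""

    def __init__(self, scaler, cutoff=4.0):
        self.scaler = scaler
        self.cutoff = cutoff

    def set_params(self, **params):
        for key, value in params.items():
            # "scaler__factor" -> component "scaler", parameter "factor"
            name, sep, sub_name = key.partition("__")
            if sep:
                setattr(getattr(self, name), sub_name, value)
            else:
                setattr(self, name, value)
        return self  # returning self allows call chaining

pipe = Pipeline(Scaler()).set_params(cutoff=5.0, scaler__factor=2.0)
print(pipe.cutoff, pipe.scaler.factor)
```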
-
-
class
schrodinger.application.matsci.mlearn.features.
Ligand
(st, metal_atom, new_to_old, coordination_idxs)¶ Bases:
object
Manage a ligand.
-
__init__
(st, metal_atom, new_to_old, coordination_idxs)¶ Create an instance.
Parameters: - st (schrodinger.structure.Structure) – the structure
- metal_atom (schrodinger.structure._StructureAtom) – the metal atom
- new_to_old (dict) – the map of new indices (extracted ligand) to old indices (original structure)
- coordination_idxs (list) – contains groups of indices (new indices) of coordinating atoms
-
getVec
(point)¶ Return a vector pointing from the metal atom to the given point.
Parameters: point (numpy.array) – the point in Ang. Return type: numpy.array Returns: the vector in Ang.
-
getCentroid
(st, idxs)¶ Return the centroid vector of the given coordination atom indices.
Parameters: - st (schrodinger.structure.Structure) – the structure
- idxs (list) – the coordination indices
Return type: numpy.array
Returns: the centroid vector in Ang.
-
getCoordinationVec
(st, idxs)¶ Return a coordination vector pointing from the metal atom to the centroid of the given coordination atom indices.
Parameters: - st (schrodinger.structure.Structure) – the structure
- idxs (list) – the coordination indices
Return type: numpy.array
Returns: the coordination vector in Ang.
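The geometry behind getVec, getCentroid and getCoordinationVec is a centroid followed by a vector difference. A minimal sketch on plain coordinate tuples is given below. The helpers `centroid` and `vector_from` and the coordinates are hypothetical, and the real methods work on structures and return numpy arrays.

```python
def centroid(points):
    """Arithmetic mean of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def vector_from(origin, point):
    """Vector pointing from origin to point."""
    return tuple(c - o for o, c in zip(origin, point))

metal_xyz = (0.0, 0.0, 0.0)
# Two hypothetical coordinating atoms that bind as one group.
donors = [(2.0, 1.0, 0.0), (2.0, -1.0, 0.0)]

# Coordination vector: from the metal atom to the centroid of the group.
coordination_vec = vector_from(metal_xyz, centroid(donors))
print(coordination_vec)
```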
-
getStoichiometry
()¶ Return the stoichiometry.
Return type: str Returns: the stoichiometry
-
getDenticity
()¶ Return the denticity.
Return type: int Returns: the denticity
-
getHapticity
()¶ Return the hapticity.
Return type: int Returns: the hapticity
-
getHapticCharacter
()¶ Return the haptic character.
Return type: int Returns: the haptic character
-
getBiteAngle
()¶ Return the bite angle in degrees.
Return type: float or None Returns: the bite angle in degrees
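A bite angle is commonly defined as the donor-metal-donor angle of a chelating ligand. The sketch below computes that angle from three points. The helper `bite_angle` is hypothetical, and the conditions under which the method above returns None are not documented here.

```python
import math

def bite_angle(metal, donor_a, donor_b):
    """Donor-metal-donor angle in degrees for three (x, y, z) points."""
    vec_a = [a - m for a, m in zip(donor_a, metal)]
    vec_b = [b - m for b, m in zip(donor_b, metal)]
    dot = sum(x * y for x, y in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(x * x for x in vec_a))
    norm_b = math.sqrt(sum(y * y for y in vec_b))
    # Clamp to guard against round-off just outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cosine))

# Two donors at right angles around a metal at the origin.
print(bite_angle((0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0)))
```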
-
getAtomConeAngle
(atom)¶ Return the cone angle for the given atom in degrees.
Parameters: atom (schrodinger.structure._StructureAtom) – the atom Return type: float Returns: the cone angle for the given atom in degrees
-
getConeAngle
()¶ Return the cone angle in degrees.
Return type: float Returns: the cone angle in degrees
-
getBondLength
()¶ Return the bond length in Ang.
Return type: float Returns: the bond length in Ang.
-
getDescriptors
()¶ Return descriptors.
Return type: dict Returns: (label, data) pairs
-
-
class
schrodinger.application.matsci.mlearn.features.
Complex
(st, jaguar_out_file=None, ligfilter=False, canvas=False, moldescriptors=False, save_files=False, logger=None)¶ Bases:
object
Manage a complex.
-
__init__
(st, jaguar_out_file=None, ligfilter=False, canvas=False, moldescriptors=False, save_files=False, logger=None)¶ Create an instance.
Parameters: - st (schrodinger.structure.Structure) – the structure
- jaguar_out_file (str, None) – the name of a Jaguar *.out file from which descriptors will be extracted or None if there isn’t one
- ligfilter (bool) – specify whether to calculate Ligfilter features
- canvas (bool) – specify whether to calculate Canvas features
- moldescriptors (bool or list) – If a bool, specify whether to calculate Molecular Descriptors features. If a list, calculate Molecular Descriptors and use these flags on the command line for the moldescriptors utility to specify which descriptors to calculate.
- save_files (bool) – Whether to save subjob files or not
- logger (logging.Logger or None) – output logger or None if there isn’t one
-
getMetalAtom
()¶ Return the metal atom.
Return type: schrodinger.structure._StructureAtom
Returns: the metal atom
-
getBondAngle
()¶ Return the bond angle in degrees.
Return type: float Returns: the bond angle in degrees
-
getVDWSurfaceArea
()¶ Return the VDW surface area in Angstrom^2.
Return type: float Returns: the VDW surface area in Angstrom^2
-
getVDWVolume
(vdw_scale=1, buffer_len=2)¶ Return the VDW volume in Angstrom^3.
Parameters: - vdw_scale (float) – the VDW scale
- buffer_len (float) – the shape buffer length in Angstrom
Return type: float
Returns: the VDW volume in Angstrom^3
-
getStructureContainingLargestLigands
()¶ Return a structure containing the largest ligand or multiple copies thereof if it is symmetric.
Return type: schrodinger.structure.Structure
Returns: the structure containing the largest ligand(s)
-
getBuriedVDWVolumePct
(vdw_scale=1)¶ Return the buried VDW volume percent.
Parameters: vdw_scale (float) – the VDW scale Return type: float Returns: the buried VDW volume percent
-
getJaguarDescriptors
()¶ Return Jaguar descriptors.
Return type: dict Returns: (label, data) pairs
-
getDescriptorUtilityDescriptors
(descriptor_utility)¶ Return descriptors for the given descriptor utility.
Parameters: descriptor_utility (DescriptorUtility) – the descriptor utility to use for obtaining descriptors Return type: dict Returns: (label, data) pairs
-
getLigfilterDescriptors
()¶ Return Ligfilter descriptors.
Return type: dict Returns: (label, data) pairs
-
getCanvasDescriptors
()¶ Return Canvas descriptors.
Return type: dict Returns: (label, data) pairs
-
getMolecularDescriptorsDescriptors
()¶ Return Molecular Descriptors descriptors.
Return type: dict Returns: (label, data) pairs
-
getVectorizedDescriptors
()¶ Return vectorized descriptors which are instance specific descriptors that are vectorized, for example depending on the molecule’s geometric orientation, atom indexing, etc.
Return type: dict Returns: (label, data) pairs
-
getDescriptors
()¶ Return descriptors.
Return type: dict Returns: (label, data) pairs
-
-
schrodinger.application.matsci.mlearn.features.
get_unique_titles
(sts)¶ Return a list of unique titles for the given structures.
Parameters: sts (list) – contains schrodinger.structure.Structure
Return type: list Returns: the unique titles
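How the function makes titles unique is not described in this page. One common scheme, shown purely as an assumption, keeps the first occurrence of a title and adds a numeric suffix to later repeats. The helper `unique_titles` operates on plain strings rather than structures.

```python
from collections import Counter

def unique_titles(titles):
    """Return titles with a numeric suffix added to repeats.

    Hypothetical scheme: the first occurrence is kept as-is, later ones
    become title_2, title_3, and so on. It does not guard against a
    suffixed name colliding with a title that already exists.
    """
    seen = Counter()
    result = []
    for title in titles:
        seen[title] += 1
        result.append(title if seen[title] == 1 else f"{title}_{seen[title]}")
    return result

print(unique_titles(["FeCp2", "FeCp2", "NiCl2"]))
```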
-
class
schrodinger.application.matsci.mlearn.features.
ComplexFeatures
(jaguar=False, jaguar_keywords={'basis': 'LACVP**', 'dftname': 'B3LYP'}, jaguar_out_files=None, tpp=1, ligfilter=False, canvas=False, moldescriptors=False, include_vectorized=False, save_files=False, logger=None)¶ Bases:
schrodinger.application.matsci.mlearn.base.BaseFeaturizer
Class to generate features for metal complexes.
-
__init__
(jaguar=False, jaguar_keywords={'basis': 'LACVP**', 'dftname': 'B3LYP'}, jaguar_out_files=None, tpp=1, ligfilter=False, canvas=False, moldescriptors=False, include_vectorized=False, save_files=False, logger=None)¶ Create an instance.
Parameters: - jaguar (bool) – specify whether to calculate Jaguar features
- jaguar_keywords (OrderedDict) – if Jaguar jobs must be run to calculate the Jaguar features then specify the Jaguar keywords here
- jaguar_out_files (list) – if Jaguar features should be calculated using existing Jaguar *.out files then specify the files here using the same ordering as used for any given structures
- tpp (int) – the number of threads for any Jaguar jobs
- ligfilter (bool) – specify whether to calculate Ligfilter features
- canvas (bool) – specify whether to calculate Canvas features
- moldescriptors (bool) – specify whether to calculate Molecular Descriptors features
- include_vectorized (bool) – whether to include instance specific features that are vectorized, for example depending on the molecule’s geometric orientation, atom indexing, etc.
- save_files (bool) – Whether to save subjob files or not
- logger (logging.Logger or None) – output logger or None if there isn’t one
-
runJaguar
(structs, logger=None)¶ Run Jaguar on the given structures.
Parameters: - structs (list(schrodinger.structure.Structure)) – list of structures to be featurized
- logger (logging.Logger or None) – output logger or None if there isn’t one
Return type: list
Returns: contains Jaguar *.out file names
-
transform
(structs)¶ Get numerical features from structures. Also sets feature names in self.labels. See parent class for more documentation.
Parameters: structs (list(schrodinger.structure.Structure)) – list of structures to be featurized Return type: numpy array of shape [n_structs, n_features] Returns: transformed array
-
vectorized_transform
(structs)¶ Get vectorized features from structures. Also sets feature names in self.vectorized_labels. See parent class for more documentation.
Parameters: structs (list(schrodinger.structure.Structure)) – list of structures to be featurized Return type: numpy array of shape [n_structs, n_features] Returns: transformed array
-
static
getFeatures
(structs, jaguar=False, jaguar_keywords={'basis': 'LACVP**', 'dftname': 'B3LYP'}, jaguar_out_files=None, tpp=1, ligfilter=False, canvas=False, moldescriptors=False, include_vectorized=False, save_files=False, logger=None)¶ Return features and vectorized features dictionaries for the given structures.
Parameters: - structs (list(schrodinger.structure.Structure)) – list of structures to be featurized
- jaguar (bool) – specify whether to calculate Jaguar features
- jaguar_keywords (OrderedDict) – if Jaguar jobs must be run to calculate the Jaguar features then specify the Jaguar keywords here
- jaguar_out_files (list) – if Jaguar features should be calculated using existing Jaguar *.out files then specify the files here using the same ordering as used for the given structures
- tpp (int) – the number of threads for any Jaguar jobs
- ligfilter (bool) – specify whether to calculate Ligfilter features
- canvas (bool) – specify whether to calculate Canvas features
- moldescriptors (bool) – specify whether to calculate Molecular Descriptors features
- include_vectorized (bool) – whether to include instance specific features that are vectorized, for example depending on the molecule’s geometric orientation, atom indexing, etc.
- save_files (bool) – whether to save files created by subjobs
- logger (logging.Logger or None) – output logger or None if there isn’t one
Return type: dict, dict
Returns: features and vectorized features dictionaries where keys are structure titles and values are dicts of feature labels and values
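The returned dictionaries are nested, with structure titles on the outside and feature labels on the inside. The sketch below shows one way to flatten such a dictionary into rows for a model. The helper `to_matrix` and the titles, labels and values are invented for illustration.

```python
def to_matrix(features):
    """Flatten {title: {label: value}} into (titles, labels, rows)."""
    titles = list(features)
    # Union of labels, sorted for a stable column order.
    labels = sorted({label for row in features.values() for label in row})
    # Missing labels become None so every row has the same length.
    rows = [[features[t].get(label) for label in labels] for t in titles]
    return titles, labels, rows

# Hypothetical titles, labels, and values.
features = {
    "complex_1": {"bond_length": 2.05, "cone_angle": 145.0},
    "complex_2": {"bond_length": 1.98, "cone_angle": 132.0},
}
titles, labels, rows = to_matrix(features)
print(labels)
print(rows)
```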
-
static
writeFingerprintFiles
(structs)¶ Write fingerprint files for the given structures.
Parameters: structs (list(schrodinger.structure.Structure)) – list of structures to be fingerprinted Return type: list Returns: the fingerprint file names
-
fit
(data, data_y=None)¶ Fit and return self. Anything that evaluates properties related to the passed data should go here. For example, compute physical properties of a structure and save them as class properties, to be used in the transform method.
Parameters: - data (numpy array of shape [n_samples, n_features]) – Training set
- data_y (numpy array of shape [n_samples]) – Target values
Returns: self object with fitted data
-
fit_transform
(X, y=None, **fit_params)¶ Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters: - X (numpy array of shape [n_samples, n_features]) – Training set.
- y (numpy array of shape [n_samples]) – Target values.
- **fit_params (dict) – Additional fit parameters.
Return type: numpy array of shape [n_samples, n_features_new]
Returns: X_new, the transformed array.
-
get_params
(deep=True)¶ Get parameters for this estimator.
Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators. Return type: mapping of string to any Returns: params, parameter names mapped to their values.
-
set_params
(**params)¶ Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
Parameters: **params (dict) – Estimator parameters. Return type: object Returns: self, the estimator instance.
-
-
class
schrodinger.application.matsci.mlearn.features.
CrystalNNFeatures
(preset='ops')¶ Bases:
object
Calculates CrystalNN structure fingerprints as implemented in pymatgen
-
OPS_PRESET
= 'ops'¶
-
CN_PRESET
= 'cn'¶
-
__init__
(preset='ops')¶ Create a structure featurizer
Parameters: preset (str) – One of the OPS_PRESET or CN_PRESET class constants
-
featurize
(struct)¶ Get CrystalNN fingerprints for the passed structure.
Parameters: struct (structure.Structure) – The structure to get features for Return type: list Returns: List of CrystalNN fingerprints for the structure
-