schrodinger.application.matsci.mlearn.features module¶
Classes and functions to deal with ML features.
Copyright Schrodinger, LLC. All rights reserved.
-
class
schrodinger.application.matsci.mlearn.features.
MomentData
(flag, components, header, units)¶ Bases:
tuple
-
__contains__
¶ Return key in self.
-
__init__
¶ Initialize self. See help(type(self)) for accurate signature.
-
__len__
¶ Return len(self).
-
components
¶ Alias for field number 1
-
count
(value) → integer -- return number of occurrences of value¶
-
flag
¶ Alias for field number 0
-
header
¶ Alias for field number 2
-
index
(value[, start[, stop]]) → integer -- return first index of value.¶ Raises ValueError if the value is not present.
-
units
¶ Alias for field number 3
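Since MomentData derives from tuple and exposes four positional fields, it behaves like a named tuple. The sketch below rebuilds an equivalent stand-in with collections.namedtuple instead of importing the real class; the field values are hypothetical and serve only to show named and positional access.

```python
from collections import namedtuple

# Stand-in with the documented field order; not the Schrodinger class itself.
MomentData = namedtuple("MomentData", ["flag", "components", "header", "units"])

# Hypothetical values, for illustration only.
moment = MomentData(
    flag="dipole",
    components=("x", "y", "z"),
    header="Dipole moment",
    units="debye",
)

# Each field is reachable by name or by its documented field number.
assert moment.flag == moment[0]
assert moment.units == moment[3]

# Inherited tuple methods operate on the field values.
print(len(moment))
print(moment.count("debye"))
print(moment.index("Dipole moment"))
```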
-
-
schrodinger.application.matsci.mlearn.features.
DescriptorUtility
¶ alias of
schrodinger.application.matsci.mlearn.features.DescriptorUtilitity
-
schrodinger.application.matsci.mlearn.features.
avg_atomic_vol
(struct)¶ Get average atomic volume.
Parameters: struct ( schrodinger.structure.Structure
) – Structure to be used for feature calculation
Return type: float
Returns: Average atomic volume (A^3)
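As a rough illustration of what an average atomic volume means, the sketch below divides the volume of a periodic cell by the number of atoms in it. This definition is an assumption, not the confirmed implementation of avg_atomic_vol, and the helper names and lattice vectors are hypothetical.

```python
def cell_volume(a, b, c):
    """Volume of the parallelepiped spanned by lattice vectors a, b, c."""
    cross = (
        b[1] * c[2] - b[2] * c[1],
        b[2] * c[0] - b[0] * c[2],
        b[0] * c[1] - b[1] * c[0],
    )
    return abs(a[0] * cross[0] + a[1] * cross[1] + a[2] * cross[2])


def average_atomic_volume(a, b, c, n_atoms):
    """Cell volume divided by atom count (assumed definition), in A^3."""
    if n_atoms <= 0:
        raise ValueError("n_atoms must be positive")
    return cell_volume(a, b, c) / n_atoms


# Hypothetical cubic cell with a 4 A edge holding 8 atoms.
volume_per_atom = average_atomic_volume(
    (4.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 4.0), 8
)
print(volume_per_atom)
```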
-
schrodinger.application.matsci.mlearn.features.
get_distance_cell
(struct, cutoff, cell, pbc)¶ Create an infrastructure Distance Cell. Struct MUST have the Chorus box properties.
Parameters: - struct (
schrodinger.structure.Structure
) – Input structure - cutoff (float) – The cutoff for finding nearest neighbor atoms
- cell (
schrodinger.infra.structure.DistanceCell
) – Distance cell object for the struct - pbc (
schrodinger.infra.structure.PBC
) – PBC object for the struct
Return type: schrodinger.structure.Structure, schrodinger.infra.structure.DistanceCell, schrodinger.infra.structure.PBC
Returns: Supercell, an infrastructure Distance Cell that accounts for the PBC, and the pbc used to create it.
Raises: ValueError if struct is missing PBCs
-
schrodinger.application.matsci.mlearn.features.
elemental_generator
(struct, element, is_equal=True)¶
-
schrodinger.application.matsci.mlearn.features.
avg_neighbor_count
(struct, element, cutoff, cell=None, pbc=None)¶ Get average neighbor count.
Parameters: - struct (
schrodinger.structure.Structure
) – Input structure - element (str) – Element for which to compute the count
- cutoff (float) – The cutoff for finding nearest neighbor atoms
- cell (
schrodinger.infra.structure.DistanceCell
) – Distance cell object for the struct - pbc (
schrodinger.infra.structure.PBC
) – PBC object for the struct
Return type: float
Returns: Average neighbor count
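The idea of a cutoff-based neighbor count can be sketched in plain Python. The example assumes that a neighbor is any other atom within the cutoff of an atom of the chosen element, ignores periodic boundary conditions, and uses the population standard deviation; none of these choices is confirmed by the documentation above, and the coordinates are hypothetical.

```python
import math
from statistics import mean, pstdev


def neighbor_counts(coords, elements, element, cutoff):
    """Count atoms within `cutoff` of each atom of `element` (no PBC)."""
    counts = []
    for i, (xyz, elem) in enumerate(zip(coords, elements)):
        if elem != element:
            continue
        count = sum(
            1
            for j, other in enumerate(coords)
            if j != i and math.dist(xyz, other) <= cutoff
        )
        counts.append(count)
    return counts


# Hypothetical four-atom fragment; coordinates in Angstrom.
coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (5.0, 0.0, 0.0)]
elements = ["Li", "O", "O", "Li"]

counts = neighbor_counts(coords, elements, "Li", cutoff=4.0)
print(mean(counts), pstdev(counts))
```

The real functions accept cell and pbc objects so that neighbors across periodic images are counted as well; this sketch deliberately leaves that part out.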
-
schrodinger.application.matsci.mlearn.features.
std_neighbor_count
(struct, element, cutoff, cell=None, pbc=None)¶ Get standard deviation of neighbor count.
Parameters: - struct (
schrodinger.structure.Structure
) – Input structure - element (str) – Element for which to compute the count
- cutoff (float) – The cutoff for finding nearest neighbor atoms
- cell (
schrodinger.infra.structure.DistanceCell
) – Distance cell object for the struct - pbc (
schrodinger.infra.structure.PBC
) – PBC object for the struct
Return type: float
Returns: Standard deviation of neighbor count
-
schrodinger.application.matsci.mlearn.features.
avg_element_neighbor_count
(struct, cutoff, element, cell=None, pbc=None)¶ Get average element neighbor count.
Parameters: - struct (
schrodinger.structure.Structure
) – Input structure - cutoff (float) – The cutoff for finding nearest neighbor atoms
- element (str) – Element for which to compute the average
- cell (
schrodinger.infra.structure.DistanceCell
) – Distance cell object for the struct - pbc (
schrodinger.infra.structure.PBC
) – PBC object for the struct
Return type: float
Returns: Average number of bonds per element
-
schrodinger.application.matsci.mlearn.features.
avg_sublattice_neighbor_count
(struct, element, cutoff, cell=None, pbc=None)¶ Get average sublattice neighbor count.
Parameters: - struct (
schrodinger.structure.Structure
) – Input structure - element (str) – Element for which to compute the average
- cutoff (float) – The cutoff for finding nearest neighbor atoms
- cell (
schrodinger.infra.structure.DistanceCell
) – Distance cell object for the struct - pbc (
schrodinger.infra.structure.PBC
) – PBC object for the struct
Return type: float
Returns: Average sublattice neighbor count
-
schrodinger.application.matsci.mlearn.features.
avg_neighbor_ion
(struct, element, cutoff, cell=None, pbc=None)¶ Get average neighbor ionicity.
Parameters: - struct (
schrodinger.structure.Structure
) – Input structure - element (str) – Element for which to compute the average
- cutoff (float) – The cutoff for finding nearest neighbor atoms
- cell (
schrodinger.infra.structure.DistanceCell
) – Distance cell object for the struct - pbc (
schrodinger.infra.structure.PBC
) – PBC object for the struct
Return type: float
Returns: Average neighbor ionicity
-
schrodinger.application.matsci.mlearn.features.
std_neighbor_ion
(struct, element, cutoff, cell=None, pbc=None)¶ Get standard deviation of neighbor ionicity.
Parameters: - struct (
schrodinger.structure.Structure
) – Input structure - element (str) – Element for which to compute the average
- cutoff (float) – The cutoff for finding nearest neighbor atoms
- cell (
schrodinger.infra.structure.DistanceCell
) – Distance cell object for the struct - pbc (
schrodinger.infra.structure.PBC
) – PBC object for the struct
Return type: float
Returns: Standard deviation of neighbor ionicity
-
schrodinger.application.matsci.mlearn.features.
avg_sublattice_neighbor_ion
(struct, element, cutoff, cell=None, pbc=None)¶ Get average sublattice neighbor ionicity.
Parameters: - struct (
schrodinger.structure.Structure
) – Input structure - element (str) – Element for which to compute the average
- cutoff (float) – The cutoff for finding nearest neighbor atoms
- cell (
schrodinger.infra.structure.DistanceCell
) – Distance cell object for the struct - pbc (
schrodinger.infra.structure.PBC
) – PBC object for the struct
Return type: float
Returns: Average sublattice neighbor ionicity
-
schrodinger.application.matsci.mlearn.features.
get_anion
(struct)¶ Get the most electronegative element in the structure (anion).
Parameters: struct ( schrodinger.structure.Structure
) – Input structure
Return type: str, float, int
Returns: Element, its electronegativity, number of anions in the cell
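Selecting the anion as the most electronegative element can be sketched as follows. The example works on a plain list of element symbols and uses a small table of Pauling electronegativities; which scale the module actually uses is not stated here, so the table and the helper name find_anion are assumptions.

```python
from collections import Counter

# Pauling electronegativities for a few elements (illustrative subset).
PAULING = {
    "Li": 0.98, "Na": 0.93, "P": 2.19, "S": 2.58,
    "Cl": 3.16, "O": 3.44, "F": 3.98,
}


def find_anion(elements):
    """Return (element, electronegativity, count) for the most electronegative element."""
    counts = Counter(elements)
    anion = max(counts, key=lambda elem: PAULING[elem])
    return anion, PAULING[anion], counts[anion]


# Hypothetical cell contents for one formula unit of Li3PO4.
cell = ["Li"] * 3 + ["P"] + ["O"] * 4
print(find_anion(cell))
```

Dividing a cell volume by the returned count would give a volume per anion under the same assumptions.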
-
schrodinger.application.matsci.mlearn.features.
vol_per_anion
(struct)¶ Get volume per anion.
Parameters: struct ( schrodinger.structure.Structure
) – Input structure
Return type: float
Returns: Volume per anion
-
schrodinger.application.matsci.mlearn.features.
avg_anion_anion_short_distance
(struct, supercell)¶ Get average anion anion shortest distance.
Parameters: - struct (
schrodinger.structure.Structure
) – Input structure - supercell – Supercell input structure
Return type: float
Returns: Average anion anion shortest distance
-
schrodinger.application.matsci.mlearn.features.
avg_element_anion_short_distance
(struct, element, supercell)¶ Get average element anion shortest distance.
Parameters: - struct (
schrodinger.structure.Structure
) – Input structure - element (str) – Element for which to compute the average
- supercell – Supercell input structure
Return type: float
Returns: Average element anion shortest distance
-
schrodinger.application.matsci.mlearn.features.
avg_element_element_short_distance
(struct, element, supercell)¶ Get average element element shortest distance.
Parameters: - struct (
schrodinger.structure.Structure
) – Input structure - element (str) – Element for which to compute the average
- supercell – Supercell input structure
Return type: float
Returns: Average element element shortest distance
-
schrodinger.application.matsci.mlearn.features.
anion_frame_coordination
(struct, supercell)¶ Get anion framework coordination.
Parameters: - struct (
schrodinger.structure.Structure
) – Input structure - supercell – Supercell input structure
Return type: float
Returns: Anion framework coordination
-
schrodinger.application.matsci.mlearn.features.
avg_sublattice_eneg
(struct, element)¶ Get average sublattice electronegativity.
Parameters: - struct (schrodinger.structure.Structure) – Input structure
- element (str) – Element to exclude from the lattice
Return type: float
Returns: Average sublattice electronegativity
-
schrodinger.application.matsci.mlearn.features.
packing_fraction
(struct, cutoff, cell=None, pbc=None)¶ Get packing fraction of the crystal.
Parameters: - struct (schrodinger.structure.Structure) – Input structure
- cutoff (float) – The cutoff for finding nearest neighbor atoms
- cell (
schrodinger.infra.structure.DistanceCell
) – Distance cell object for the struct - pbc (
schrodinger.infra.structure.PBC
) – PBC object for the struct
Return type: float
Returns: Packing fraction
-
class
schrodinger.application.matsci.mlearn.features.
LatticeFeatures
(element='Li', cutoff=4.0)¶ Bases:
schrodinger.application.matsci.mlearn.base.BaseFeaturizer
Class to generate lattice-based features.
-
__init__
(element='Li', cutoff=4.0)¶ Initialize the object.
-
transform
(structs)¶ Get numerical features from structures. Also sets feature names in self.labels. See parent class for more documentation.
Parameters: structs (list( schrodinger.structure.Structure
)) – List of structures to be featurized
Return type: numpy array of shape [n_samples, n_features]
Returns: Transformed array
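The transform contract described above (one row per structure, with column names stored in self.labels) can be mimicked with a toy class. ToyFeaturizer is hypothetical, treats each structure as a plain list of element symbols, and returns nested lists instead of a numpy array; it is a sketch of the interface, not of LatticeFeatures itself.

```python
class ToyFeaturizer:
    """Hypothetical stand-in that mimics the documented fit/transform contract."""

    def __init__(self, element="Li", cutoff=4.0):
        self.element = element
        self.cutoff = cutoff  # kept only to mirror the documented signature
        self.labels = []

    def fit(self, data, data_y=None):
        # Nothing to learn in this toy example; return self as documented.
        return self

    def transform(self, structs):
        # Each "structure" here is just a list of element symbols.
        self.labels = ["n_atoms", "frac_%s" % self.element]
        rows = []
        for elements in structs:
            n_atoms = len(elements)
            frac = elements.count(self.element) / n_atoms
            rows.append([float(n_atoms), frac])
        return rows  # shape [n_samples, n_features] as nested lists


featurizer = ToyFeaturizer(element="Li")
features = featurizer.fit([]).transform([["Li", "Li", "O"], ["Na", "Cl"]])
print(featurizer.labels)
print(features)
```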
-
fit
(data, data_y=None)¶ Fit and return self. Anything that evaluates properties related to the passed data should go here. For example, compute physical properties of a structure and save them as class properties, to be used in the transform method.
Parameters: - data (numpy array of shape [n_samples, n_features]) – Training set
- data_y (numpy array of shape [n_samples]) – Target values
Returns: self object with fitted data
-
fit_transform
(X, y=None, **fit_params)¶ Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters: - X (numpy array of shape [n_samples, n_features]) – Training set.
- y (numpy array of shape [n_samples]) – Target values.
Return type: numpy array of shape [n_samples, n_features_new]
Returns: X_new, the transformed array.
-
get_params
(deep=True)¶ Get parameters for this estimator.
Parameters: deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Return type: mapping of string to any
Returns: params, the parameter names mapped to their values.
-
set_params
(**params)¶ Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form
<component>__<parameter>
so that it’s possible to update each component of a nested object.
Returns: self
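The `<component>__<parameter>` naming can be illustrated with a small helper that separates an estimator's own parameters from those addressed to nested components. This is a sketch of the naming convention only, not the estimator's actual code, and the parameter names in the usage lines are hypothetical.

```python
def split_nested_params(params):
    """Split {'a': 1, 'comp__b': 2} into own params and per-component params."""
    own, nested = {}, {}
    for key, value in params.items():
        name, delim, sub_key = key.partition("__")
        if delim:
            # Everything after the first "__" is passed on to the component.
            nested.setdefault(name, {})[sub_key] = value
        else:
            own[name] = value
    return own, nested


own, nested = split_nested_params(
    {"cutoff": 5.0, "featurizer__element": "Na", "featurizer__cutoff": 3.5}
)
print(own)
print(nested)
```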
-
-
class
schrodinger.application.matsci.mlearn.features.
Ligand
(st, metal_atom, new_to_old, coordination_idxs)¶ Bases:
object
Manage a ligand.
-
__init__
(st, metal_atom, new_to_old, coordination_idxs)¶ Create an instance.
Parameters: - st (
schrodinger.structure.Structure
) – the structure - metal_atom (
schrodinger.structure._StructureAtom
) – the metal atom - new_to_old (dict) – the map of new indices (extracted ligand) to old indices (original structure)
- coordination_idxs (list) – contains groups of indices (new indices) of coordinating atoms
-
getVec
(point)¶ Return a vector pointing from the metal atom to the given point.
Parameters: point ( numpy.array
) – the point in Ang.
Return type: numpy.array
Returns: the vector in Ang.
-
getCentroid
(st, idxs)¶ Return the centroid vector of the given coordination atom indices.
Parameters: - st (
schrodinger.structure.Structure
) – the structure - idxs (list) – the coordination indices
Return type: numpy.array
Returns: the centroid vector in Ang.
-
getCoordinationVec
(st, idxs)¶ Return a coordination vector pointing from the metal atom to the centroid of the given coordination atom indices.
Parameters: - st (
schrodinger.structure.Structure
) – the structure - idxs (list) – the coordination indices
Return type: numpy.array
Returns: the coordination vector in Ang.
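The geometry behind getVec, getCentroid and getCoordinationVec can be sketched with plain tuples: take the centroid of the coordinating atoms and subtract the metal position. The helper names and coordinates below are hypothetical, and the real methods work on structures and atom indices rather than raw points.

```python
def centroid(points):
    """Arithmetic mean of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def vector_from(origin, point):
    """Vector pointing from origin to point."""
    return tuple(point[k] - origin[k] for k in range(3))


# Hypothetical metal position and two coordinating atoms, in Angstrom.
metal = (1.0, 1.0, 0.0)
donors = [(3.0, 1.0, 0.0), (1.0, 3.0, 0.0)]

coordination_vec = vector_from(metal, centroid(donors))
print(coordination_vec)
```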
-
getStoichiometry
()¶ Return the stoichiometry.
Return type: str Returns: the stoichiometry
-
getDenticity
()¶ Return the denticity.
Return type: int Returns: the denticity
-
getHapticity
()¶ Return the hapticity.
Return type: int Returns: the hapticity
-
getHapticCharacter
()¶ Return the haptic character.
Return type: int Returns: the haptic character
-
getBiteAngle
()¶ Return the bite angle in degrees.
Return type: float or None Returns: the bite angle in degrees
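A bite angle is commonly taken as the donor-metal-donor angle of a chelating ligand. The sketch below computes that angle from two metal-to-donor vectors and returns None when there are not exactly two donors; that rule is an assumption made to match the documented "float or None" return type, not a confirmed description of getBiteAngle.

```python
import math


def angle_between(u, v):
    """Angle between two 3D vectors in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to guard against round-off pushing the cosine outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))


def bite_angle(metal_to_donor_vecs):
    """Donor-metal-donor angle for exactly two donors, else None (assumed rule)."""
    if len(metal_to_donor_vecs) != 2:
        return None
    return angle_between(*metal_to_donor_vecs)


# Hypothetical metal-to-donor vectors in Angstrom.
print(bite_angle([(2.0, 0.0, 0.0), (0.0, 2.0, 0.0)]))
print(bite_angle([(2.0, 0.0, 0.0)]))
```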
-
getAtomConeAngle
(atom)¶ Return the cone angle for the given atom in degrees.
Parameters: atom ( schrodinger.structure._StructureAtom
) – the atom
Return type: float
Returns: the cone angle for the given atom in degrees
-
getConeAngle
()¶ Return the cone angle in degrees.
Return type: float Returns: the cone angle in degrees
-
getBondLength
()¶ Return the bond length in Ang.
Return type: float Returns: the bond length in Ang.
-
getDescriptors
()¶ Return descriptors.
Return type: dict Returns: (label, data) pairs
-
-
class
schrodinger.application.matsci.mlearn.features.
Complex
(st, jaguar_out_file=None, ligfilter=False, canvas=False, moldescriptors=False)¶ Bases:
object
Manage a complex.
-
__init__
(st, jaguar_out_file=None, ligfilter=False, canvas=False, moldescriptors=False)¶ Create an instance.
Parameters: - st (
schrodinger.structure.Structure
) – the structure - jaguar_out_file (str, None) – the name of a Jaguar *.out file from which descriptors will be extracted or None if there isn’t one
- ligfilter (bool) – specify whether to calculate Ligfilter features
- canvas (bool) – specify whether to calculate Canvas features
- moldescriptors (bool) – specify whether to calculate Molecular Descriptors features
-
getMetalAtom
()¶ Return the metal atom.
Return type: schrodinger.structure._StructureAtom
Returns: the metal atom
-
getBondAngle
()¶ Return the bond angle in degrees.
Return type: float Returns: the bond angle in degrees
-
getVDWSurfaceArea
()¶ Return the VDW surface area in Angstrom^2.
Return type: float Returns: the VDW surface area in Angstrom^2
-
getVDWVolume
(vdw_scale=1)¶ Return the VDW volume in Angstrom^3.
Parameters: vdw_scale (float) – the VDW scale Return type: float Returns: the VDW volume in Angstrom^3
-
getStructureContainingLargestLigands
()¶ Return a structure containing the largest ligand or multiple copies thereof if it is symmetric.
Return type: schrodinger.structure.Structure
Returns: the structure containing the largest ligand(s)
-
getBuriedVDWVolumePct
(vdw_scale=1)¶ Return the buried VDW volume percent.
Parameters: vdw_scale (float) – the VDW scale Return type: float Returns: the buried VDW volume percent
-
getJaguarDescriptors
()¶ Return Jaguar descriptors.
Return type: dict Returns: (label, data) pairs
-
getDescriptorUtilityDescriptors
(descriptor_utility)¶ Return descriptors for the given descriptor utility.
Parameters: descriptor_utility (DescriptorUtility) – the descriptor utility to use for obtaining descriptors Return type: dict Returns: (label, data) pairs
-
getLigfilterDescriptors
()¶ Return Ligfilter descriptors.
Return type: dict Returns: (label, data) pairs
-
getCanvasDescriptors
()¶ Return Canvas descriptors.
Return type: dict Returns: (label, data) pairs
-
getMolecularDescriptorsDescriptors
()¶ Return Molecular Descriptors descriptors.
Return type: dict Returns: (label, data) pairs
-
getVectorizedDescriptors
()¶ Return vectorized descriptors which are instance specific descriptors that are vectorized, for example depending on the molecule’s geometric orientation, atom indexing, etc.
Return type: dict Returns: (label, data) pairs
-
getDescriptors
()¶ Return descriptors.
Return type: dict Returns: (label, data) pairs
-
-
schrodinger.application.matsci.mlearn.features.
get_unique_titles
(sts)¶ Return a list of unique titles for the given structures.
Parameters: sts (list) – contains schrodinger.structure.Structure
Return type: list Returns: the unique titles
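One plausible way to make titles unique is to append a running suffix to repeats. The scheme below is an assumption; the documentation does not say how get_unique_titles disambiguates duplicates, and the sketch works on title strings rather than structures.

```python
def unique_titles(titles):
    """Make titles unique by appending _2, _3, ... to repeats (assumed scheme)."""
    used = set()
    next_index = {}
    result = []
    for title in titles:
        candidate = title
        while candidate in used:
            # Advance the per-title counter until an unused name is found.
            next_index[title] = next_index.get(title, 1) + 1
            candidate = "%s_%d" % (title, next_index[title])
        used.add(candidate)
        result.append(candidate)
    return result


print(unique_titles(["complex", "complex", "ligand", "complex"]))
```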
-
class
schrodinger.application.matsci.mlearn.features.
ComplexFeatures
(jaguar=False, jaguar_keywords={'basis': 'LACVP**', 'dftname': 'B3LYP'}, jaguar_out_files=None, tpp=1, ligfilter=False, canvas=False, moldescriptors=False, include_vectorized=False)¶ Bases:
schrodinger.application.matsci.mlearn.base.BaseFeaturizer
Class to generate features for metal complexes.
-
__init__
(jaguar=False, jaguar_keywords={'basis': 'LACVP**', 'dftname': 'B3LYP'}, jaguar_out_files=None, tpp=1, ligfilter=False, canvas=False, moldescriptors=False, include_vectorized=False)¶ Create an instance.
Parameters: - jaguar (bool) – specify whether to calculate Jaguar features
- jaguar_keywords (OrderedDict) – if Jaguar jobs must be run to calculate the Jaguar features then specify the Jaguar keywords here
- jaguar_out_files (list) – if Jaguar features should be calculated using existing Jaguar *.out files then specify the files here using the same ordering as used for any given structures
- tpp (int) – the number of threads for any Jaguar jobs
- ligfilter (bool) – specify whether to calculate Ligfilter features
- canvas (bool) – specify whether to calculate Canvas features
- moldescriptors (bool) – specify whether to calculate Molecular Descriptors features
- include_vectorized (bool) – whether to include instance specific features that are vectorized, for example depending on the molecule’s geometric orientation, atom indexing, etc.
-
runJaguar
(structs)¶ Run Jaguar on the given structures.
Parameters: structs (list( schrodinger.structure.Structure
)) – list of structures to be featurized
Return type: list
Returns: contains Jaguar *.out file names
-
transform
(structs)¶ Get numerical features from structures. Also sets feature names in self.labels. See parent class for more documentation.
Parameters: structs (list( schrodinger.structure.Structure
)) – list of structures to be featurized
Return type: numpy array of shape [n_structs, n_features]
Returns: transformed array
-
vectorized_transform
(structs)¶ Get vectorized features from structures. Also sets feature names in self.vectorized_labels. See parent class for more documentation.
Parameters: structs (list( schrodinger.structure.Structure
)) – list of structures to be featurized
Return type: numpy array of shape [n_structs, n_features]
Returns: transformed array
-
static
getFeatures
(structs, jaguar=False, jaguar_keywords={'basis': 'LACVP**', 'dftname': 'B3LYP'}, jaguar_out_files=None, tpp=1, ligfilter=False, canvas=False, moldescriptors=False, include_vectorized=False)¶ Return features and vectorized features dictionaries for the given structures.
Parameters: - structs (list(
schrodinger.structure.Structure
)) – list of structures to be featurized - jaguar (bool) – specify whether to calculate Jaguar features
- jaguar_keywords (OrderedDict) – if Jaguar jobs must be run to calculate the Jaguar features then specify the Jaguar keywords here
- jaguar_out_files (list) – if Jaguar features should be calculated using existing Jaguar *.out files then specify the files here using the same ordering as used for the given structures
- tpp (int) – the number of threads for any Jaguar jobs
- ligfilter (bool) – specify whether to calculate Ligfilter features
- canvas (bool) – specify whether to calculate Canvas features
- moldescriptors (bool) – specify whether to calculate Molecular Descriptors features
- include_vectorized (bool) – whether to include instance specific features that are vectorized, for example depending on the molecule’s geometric orientation, atom indexing, etc.
Return type: dict, dict
Returns: features and vectorized features dictionaries where keys are structure titles and values are dicts of feature labels and values
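The returned dictionaries map structure titles to dicts of feature labels and values. A minimal sketch of flattening one such dictionary into rows with a shared label order follows; the helper name, the feature labels, and the choice to fill missing values with NaN are all assumptions.

```python
import math


def features_to_rows(features):
    """Flatten {title: {label: value}} into (titles, labels, rows)."""
    titles = list(features)
    labels = sorted({label for values in features.values() for label in values})
    rows = [
        [features[title].get(label, math.nan) for label in labels]
        for title in titles
    ]
    return titles, labels, rows


# Hypothetical feature values for two complexes.
features = {
    "complex_a": {"cone_angle": 145.0, "bond_length": 2.1},
    "complex_b": {"cone_angle": 160.0},
}
titles, labels, rows = features_to_rows(features)
print(labels)
print(rows)
```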
-
static
writeFingerprintFiles
(structs)¶ Write fingerprint files for the given structures.
Parameters: structs (list( schrodinger.structure.Structure
)) – list of structures to be fingerprinted
Return type: list
Returns: the fingerprint file names
-
fit
(data, data_y=None)¶ Fit and return self. Anything that evaluates properties related to the passed data should go here. For example, compute physical properties of a structure and save them as class properties, to be used in the transform method.
Parameters: - data (numpy array of shape [n_samples, n_features]) – Training set
- data_y (numpy array of shape [n_samples]) – Target values
Returns: self object with fitted data
-
fit_transform
(X, y=None, **fit_params)¶ Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters: - X (numpy array of shape [n_samples, n_features]) – Training set.
- y (numpy array of shape [n_samples]) – Target values.
Return type: numpy array of shape [n_samples, n_features_new]
Returns: X_new, the transformed array.
-
get_params
(deep=True)¶ Get parameters for this estimator.
Parameters: deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Return type: mapping of string to any
Returns: params, the parameter names mapped to their values.
-
set_params
(**params)¶ Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form
<component>__<parameter>
so that it’s possible to update each component of a nested object.
Returns: self
-