schrodinger.application.desmond.ana module¶
Desmond analyses
Copyright Schrodinger, LLC. All rights reserved.
-
class
schrodinger.application.desmond.ana.
DSC
¶ Bases:
schrodinger.application.desmond.constants.Constants
Data selection codes. See `select_data` below for its usage.
-
ANY_VALUE
= '<<any-value>>'¶
-
NO_VALUE
= '<<absence>>'¶
-
__init__
¶ Initialize self. See help(type(self)) for accurate signature.
-
-
schrodinger.application.desmond.ana.
calc_time_series
(requests, model_fname, traj_fname)¶ Returns: list of analysis results
-
schrodinger.application.desmond.ana.
calc_prob_profile
(data, bin_width, min, max, is_periodic=False)¶ FIXME: To be added.
-
class
schrodinger.application.desmond.ana.
ForEachDo
¶ Bases:
tuple
An advanced tuple container that forwards any method call made on it to each of its elements. For example:
    a = ForEachDo([" a ", "b", " c"])
    # Constructs a `ForEachDo` object with the three string elements.
    assert isinstance(a, tuple)
    # A `ForEachDo` instance is really a `tuple` instance.
    assert ("a", "b", "c") == a.strip()
    # `strip()` is applied to each element, and the results are aggregated
    # into a tuple.
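One way such forwarding could work is sketched below. `ForEachDoSketch` is a hypothetical stand-in written for illustration, not the Schrodinger implementation. It assumes the forwarding is done through `__getattr__`, which Python consults only when normal attribute lookup fails.

```python
class ForEachDoSketch(tuple):
    """Illustrative stand-in; not the real ForEachDo class."""

    def __getattr__(self, name):
        # Reached only when `tuple` itself has no attribute called `name`.
        def call_on_each(*args, **kwargs):
            # Call the named method on every element and collect the results.
            return ForEachDoSketch(
                getattr(item, name)(*args, **kwargs) for item in self)
        return call_on_each


stripped = ForEachDoSketch([" a ", "b", " c"]).strip()
```

Because `tuple` already defines `count` and `index`, those two calls act on the container itself in this sketch rather than being forwarded to the elements.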
-
__contains__
¶ Return key in self.
-
__init__
¶ Initialize self. See help(type(self)) for accurate signature.
-
__len__
¶ Return len(self).
-
count
(value) → integer -- return number of occurrences of value¶
-
index
(value[, start[, stop]]) → integer -- return first index of value.¶ Raises ValueError if the value is not present.
-
-
exception
schrodinger.application.desmond.ana.
CompositeKeySyntaxError
¶ Bases:
SyntaxError
-
__init__
¶ Initialize self. See help(type(self)) for accurate signature.
-
args
¶
-
filename
¶ exception filename
-
lineno
¶ exception lineno
-
msg
¶ exception msg
-
offset
¶ exception offset
-
print_file_and_line
¶ exception print_file_and_line
-
text
¶ exception text
-
with_traceback
()¶ Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
-
-
exception
schrodinger.application.desmond.ana.
ArkDbGetError
¶ Bases:
KeyError
-
__init__
¶ Initialize self. See help(type(self)) for accurate signature.
-
args
¶
-
with_traceback
()¶ Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
-
-
exception
schrodinger.application.desmond.ana.
ArkDbPutError
¶ Bases:
Exception
-
__init__
¶ Initialize self. See help(type(self)) for accurate signature.
-
args
¶
-
with_traceback
()¶ Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
-
-
exception
schrodinger.application.desmond.ana.
ArkDbDelError
¶ Bases:
Exception
-
__init__
¶ Initialize self. See help(type(self)) for accurate signature.
-
args
¶
-
with_traceback
()¶ Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
-
-
exception
schrodinger.application.desmond.ana.
SubtaskExecutionError
¶ Bases:
RuntimeError
-
__init__
¶ Initialize self. See help(type(self)) for accurate signature.
-
args
¶
-
with_traceback
()¶ Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
-
-
class
schrodinger.application.desmond.ana.
ArkDb
(fname=None, string=None, db=None)¶ Bases:
object
Abstracts the key-value database where analysis results are stored.
-
__init__
(fname=None, string=None, db=None)¶ Initialize self. See help(type(self)) for accurate signature.
-
val
¶
-
get
(key: str, default=<class 'schrodinger.application.desmond.ana.ArkDbGetError'>)¶ Gets a value keyed by `key`. Note that `None` is a normal return value and does NOT mean that the key was not found.
Raises:
- CompositeKeySyntaxError – if `key` has a syntax error. You normally should NOT catch this exception, because it means your code has a syntactical error.
- ArkDbGetError – if `key` is not found in the database. You can optionally return a default value instead of raising the exception by specifying the “default” argument.
Explanation of `key`’s value:
The value is generally a composite key like “a.b.c[1].d”, where “a”, “b”, “c”, “[1]”, and “d” are the subkeys or array indices at each hierarchical level.
For array indices, sometimes the exact number is unknown a priori, e.g., “ResultLambda0.Keywords[<number>].ProtLigInter”, where the <number> cannot be specified in the source code. For cases like this, we have to iterate over the “ResultLambda0.Keywords” list and find “ProtLigInter” by matching the keyword. Note that it’s possible (at least in principle) that there may be multiple matching elements.
In order to express the above indexing ideas, we introduce four new syntax components here:
- [i] Iterates over elements in the list and returns the first matching element. For getting, putting, finding, and deleting.
- [*] Iterates over elements in the list and returns a tuple of all matching elements. Only for getting, finding, and deleting.
- [$] Inserts at the end of the list. Only for putting.
- [@] Similar to [$] except that this is for insertion into an arbitrary position in the list. This is to be used with a number immediately following it, e.g., [@]123, and the number specifies the position in the list. Only for putting.
We may call these meta-indices. Examples:
- “ResultLambda0.Keywords[i].ProtLigInter”
  Gets the first “ProtLigInter” data.
- “ResultLambda0.Keywords[*].ProtLigInter”
  Gets all “ProtLigInter” data, and returns a tuple.
- “ResultLambda0.Keywords[@]0.ProtLigInter”
  Inserts a new “ProtLigInter” data at “ResultLambda0.Keywords[0]”. Note the difference from using “ResultLambda0.Keywords[0]”, which is to change the existing data.
- “ResultLambda0.Keywords[$].ProtLigInter”
  Appends a new “ProtLigInter” data to “ResultLambda0.Keywords”.
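To make the hierarchy in a composite key concrete, the following sketch splits a key into its subkeys, array indices, and meta-indices. The function name `split_composite_key` and the regular expression are assumptions made for this illustration. The sketch does no syntax validation and is not how `ArkDb` parses keys internally.

```python
import re

# Hypothetical token pattern, tried in this order:
#   "[@]<number>"                 insertion meta-index
#   "[<number>]", "[i]", "[*]", "[$]"   plain index or meta-index
#   any run of characters other than ".", "[", "]"   a subkey
_TOKEN = re.compile(r"\[@\]\d+|\[(?:\d+|[i*$])\]|[^.\[\]]+")


def split_composite_key(key):
    """Returns the hierarchical components of `key` as a list of strings."""
    return _TOKEN.findall(key)


parts = split_composite_key("a.b.c[1].d")
```

For “a.b.c[1].d” this yields the five components named in the explanation above: “a”, “b”, “c”, “[1]”, and “d”.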
-
put
(key: str, value)¶ Puts a value associated with the given key into this database. `value` can be of a scalar type, a `list`, an empty `dict` (`{}`), or a `sea.Sea`. `key` can be a composite key; see the docstring of `ArkDb.get` for details.
Raises:
- CompositeKeySyntaxError – if `key` has a syntax error. You normally should NOT catch this exception, because it means your code has a syntactical error.
- ArkDbPutError – if putting failed.
-
delete
(key: str, matches: Union[str, Iterable[str], None] = None, ignore_badkey=False)¶ Deletes a given `key` and the value from the database. If the `key` is not found, `ArkDbDelError` will be raised unless `ignore_badkey` is `True`.
`matches`, if specified, provides one or more key-value pairs for checking on the value. If and only if all key-value pairs are found in the value, the key and the value will be deleted from the database. Each key-value pair is a string in the format of “<key>=<value>”. Note that the key and the value are connected by a single “=” symbol, with no spaces allowed around it. Key is in the extended standard composite format (see the docstring of the `ArkDb` class above). Value is in the ARK format (note that spaces are allowed in the value). The value part is optional. When it’s missing, the “=” symbol should be absent as well, and this function will only look for the key in `db` and disregard the value.
Examples:
    db.delete("a.b.c")
    db.delete("a.b.d[i].e")
    db.delete("a.b.d[i]", matches="e")
    db.delete("a.b.d[i]", matches=("e=5", "h=10"))
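The “<key>=<value>” format of a `matches` entry can be illustrated with a short sketch. `split_match` is a hypothetical helper, not part of the module. It assumes the entry is split at the first “=” only, so that the value may itself contain spaces or further “=” characters.

```python
def split_match(spec):
    """Splits '<key>=<value>' at the first '='.

    Returns (key, value); value is None when the '=' part is absent,
    meaning only the presence of the key would be checked.
    """
    key, sep, value = spec.partition("=")
    return key, (value if sep else None)


pairs = [split_match(s) for s in ("e=5", "h=10", "e")]
```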
-
find
()¶ Finds the given `key` and returns the corresponding data as a `ForEachDo` object. The `ForEachDo` object allows us to iterate over the found data, each as a new `ArkDb` (or its subclass) object. It also allows us to chain operations on the found data.
Example:
    db.find("stage[*].simulate").put("ensemble", "NVT")
    # Resets all simulate stages' "ensemble" parameter's value to "NVT".
If the key is not found, this method will return `()` (i.e., an empty tuple).
Parameters: picker – This is to cherry-pick the found data. The following types or values are supported:
- None: All found data will be returned.
- int: Among the found data, a single datum as indexed by `picker` will be returned. The index is zero-based.
- List[int]: Among the found data, multiple data as indexed by `picker`’s elements will be returned. The indices are zero-based.
- Callable: `picker` will be called on each found datum, and the results will be `filter`-ed and returned.
Example:
    db.find("stage[*].task", picker=1).put("set_family.simulate.temperature", 300)
    # Mutates the second "task" stage.
    db.find("stage[*].simulate.restrain", picker=lambda x: x.parent()).put("temperature", 400)
    # For any simulate stages with "restrain" setting, resets temperature
    # to 400.
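The four `picker` behaviours can be summarised with a small sketch that works on a plain sequence. `pick` is a hypothetical function written for this illustration. In particular, it assumes that an `int` picker still yields a one-element tuple and that "`filter`-ed" means falsy results are dropped; neither detail is confirmed by the text above.

```python
def pick(found, picker=None):
    """Applies picker semantics to an already-found sequence of data."""
    if picker is None:
        return tuple(found)                      # keep everything
    if isinstance(picker, int):
        return (found[picker],)                  # one zero-based index
    if callable(picker):
        # Call the picker on each datum and drop falsy results.
        return tuple(filter(None, (picker(x) for x in found)))
    return tuple(found[i] for i in picker)       # several zero-based indices


stages = ["minimize", "", "simulate"]
picked_all = pick(stages)
picked_one = pick(stages, picker=2)
picked_some = pick(stages, picker=[0, 2])
picked_by_call = pick(stages, picker=str.upper)
```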
-
write
(fname: str)¶
-
-
class
schrodinger.application.desmond.ana.
Datum
(key: Optional[str], val=None)¶ Bases:
object
An instance of this class represents a particular datum in the database. A datum could be a scalar value, or a list/dict object. Each datum is assigned a key for identification in the database. The key can be accessed via the `key` public attribute. The actual value of the datum is obtained via the `val` public attribute.
N.B.: A limitation on `val`’s value: For putting, the value cannot be a `dict` object.
-
__init__
(key: Optional[str], val=None)¶ Creates a `Datum` object with the given `key` and the default value `val`. `key`’s value can be `None`, and in this case the `get_from` method will always return the default value `val`.
-
key
¶
-
get_from
(arkdb)¶ Gets the value of this datum from the database `arkdb`. The new value is used to update the public attribute `val` and is also returned.
Raises:
- ArkDbGetError – if the key is not found in the database.
- CompositeKeySyntaxError – if the key has a syntax error.
-
put_to
(arkdb)¶ Saves the value of this datum into the database `arkdb`.
Raises:
- ArkDbPutError – if saving the datum fails.
- CompositeKeySyntaxError – if the key has a syntax error.
-
del_from
(arkdb)¶ Deletes the key and the value of this datum from the database `arkdb`. No-op if the key is `None`.
Raises:
- ArkDbDelError – if the key is not found in the database.
- CompositeKeySyntaxError – if the key has a syntax error.
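The behaviour documented for `Datum`, including the special handling of a `None` key, can be mimicked with a plain `dict` as the database. `DatumSketch` is a hypothetical stand-in, not the real class: flat dictionary keys replace composite keys, and `KeyError` stands in for `ArkDbGetError` and `ArkDbDelError`.

```python
class DatumSketch:
    """Illustrative stand-in for Datum, backed by a plain dict."""

    def __init__(self, key, val=None):
        self.key = key
        self.val = val   # default value until get_from() succeeds

    def get_from(self, db):
        # With key=None the default value is always returned.
        if self.key is not None:
            self.val = db[self.key]
        return self.val

    def put_to(self, db):
        db[self.key] = self.val

    def del_from(self, db):
        # No-op when the key is None.
        if self.key is not None:
            del db[self.key]


db = {"Keywords.Jobname": "fep_job"}
jobname = DatumSketch("Keywords.Jobname")
fetched = jobname.get_from(db)
fallback = DatumSketch(None, val=200).get_from(db)
```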
-
-
class
schrodinger.application.desmond.ana.
Premise
(key)¶ Bases:
schrodinger.application.desmond.ana.Datum
A premise here is a datum that must be available for a task (see the definition below) to be successfully executed.
-
__init__
(key)¶ Creates a `Datum` object with the given `key` and the default value `val`. `key`’s value can be `None`, and in this case the `get_from` method will always return the default value `val`.
-
del_from
(arkdb)¶ Deletes the key and the value of this datum from the database `arkdb`. No-op if the key is `None`.
Raises:
- ArkDbDelError – if the key is not found in the database.
- CompositeKeySyntaxError – if the key has a syntax error.
-
get_from
(arkdb)¶ Gets the value of this datum from the database `arkdb`. The new value is used to update the public attribute `val` and is also returned.
Raises:
- ArkDbGetError – if the key is not found in the database.
- CompositeKeySyntaxError – if the key has a syntax error.
-
key
¶
-
put_to
(arkdb)¶ Saves the value of this datum into the database `arkdb`.
Raises:
- ArkDbPutError – if saving the datum fails.
- CompositeKeySyntaxError – if the key has a syntax error.
-
-
class
schrodinger.application.desmond.ana.
Option
(key: Optional[str], val=None)¶ Bases:
schrodinger.application.desmond.ana.Datum
An option here is a datum that does NOT have to be available for a task (see the definition below) to be successfully executed.
-
__init__
(key: Optional[str], val=None)¶ Creates a `Datum` object with the given `key` and the default value `val`. `key`’s value can be `None`, and in this case the `get_from` method will always return the default value `val`.
-
del_from
(arkdb)¶ Deletes the key and the value of this datum from the database `arkdb`. No-op if the key is `None`.
Raises:
- ArkDbDelError – if the key is not found in the database.
- CompositeKeySyntaxError – if the key has a syntax error.
-
get_from
(arkdb)¶ Gets the value of this datum from the database `arkdb`. The new value is used to update the public attribute `val` and is also returned.
Raises:
- ArkDbGetError – if the key is not found in the database.
- CompositeKeySyntaxError – if the key has a syntax error.
-
key
¶
-
put_to
(arkdb)¶ Saves the value of this datum into the database `arkdb`.
Raises:
- ArkDbPutError – if saving the datum fails.
- CompositeKeySyntaxError – if the key has a syntax error.
-
-
schrodinger.application.desmond.ana.
select_data
(data: Iterable[schrodinger.application.desmond.ana.Datum], **match) → List[schrodinger.application.desmond.ana.Datum]¶ The following data are from the real world:
    Keywords = [
       {RMSD = {
          ASL = "((protein and not (m.n 3) and backbone) and not (a.e H) )"
          Frame = 0
          Panel = pl_interact_survey
          Result = [8.57678438812e-15 0.837188833342 ]
          SelectionType = Backbone
          Tab = pl_rmsd_tab
          Type = ASL
          Unit = Angstrom
        }
       }
       {RMSD = {
          ASL = "m.n 1"
          FitBy = "protein and not (m.n 3)"
          Frame = 0
          Panel = pl_interact_survey
          Result = [3.54861302804e-15 1.36992917763]
          SelectionType = Ligand
          Tab = pl_rmsd_tab
          Type = Ligand
          Unit = Angstrom
          UseSymmetry = true
        }
       }
    ]
There are two dict data keyed by “RMSD”. If, for example, we want to select the one with “SelectionType” being “Ligand”, we can use this function for that:
    rmsds = arkdb.get("Keywords[*].RMSD")
    select_data(rmsds, SelectionType="Ligand")
Parameters: **match – Key-value pairs for matching `data`. `data`’s elements should be `dict` objects. All elements that have all key-value pairs specified by `match` are returned. Note that for floating-point numbers, if the relative or the absolute difference is less than 1E-7, the two numbers are considered the same.
See `DSC` above for special codes to be used in `match`’s values. This function returns an empty list if no matches are found.
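The matching rule described above can be sketched on plain dictionaries. `select_dicts` is a hypothetical re-implementation for illustration only. It assumes that `DSC.ANY_VALUE` means "the key is present with any value" and `DSC.NO_VALUE` means "the key is absent", which is inferred from the names of the codes rather than stated in the documentation. The float comparison uses `math.isclose`, which accepts a difference less than or equal to the tolerance.

```python
import math

ANY_VALUE = "<<any-value>>"   # same literal as DSC.ANY_VALUE
NO_VALUE = "<<absence>>"      # same literal as DSC.NO_VALUE


def _same(a, b, tol=1e-7):
    """Equality test with a 1E-7 relative/absolute tolerance for floats."""
    if isinstance(a, float) or isinstance(b, float):
        try:
            return math.isclose(a, b, rel_tol=tol, abs_tol=tol)
        except TypeError:
            return False   # e.g. a float compared against a string
    return a == b


def select_dicts(data, **match):
    """Returns the dicts in `data` that satisfy every pair in `match`."""
    selected = []
    for d in data:
        for key, expected in match.items():
            if expected == NO_VALUE:
                ok = key not in d
            elif expected == ANY_VALUE:
                ok = key in d
            else:
                ok = key in d and _same(d[key], expected)
            if not ok:
                break
        else:
            selected.append(d)
    return selected


rmsds = [
    {"SelectionType": "Backbone", "Frame": 0},
    {"SelectionType": "Ligand", "Frame": 0, "FitBy": "protein and not (m.n 3)"},
]
ligand = select_dicts(rmsds, SelectionType="Ligand")
```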
-
schrodinger.application.desmond.ana.
expect_single_datum
(data, exc, **match)¶ Similar to `select_data`, except that this function expects one and only one `dict` object that matches. If that’s not the case, an exception of the type `type(exc)` will be raised. The error message of `exc` is used to describe the `key` used to get `data`. On success, a single `dict` object is returned.
-
class
schrodinger.application.desmond.ana.
Task
(name: str, subtasks: Optional[List] = None)¶ Bases:
object
This is a base class. An instance of this class defines a concrete task to be executed. All subclasses are expected to implement the `__init__` and the `execute` methods. The `execute` should be either a public callable attribute or a public method. See `ParchTrajectoryForFepLambda` below for an example.
A task can be composed of one or more subtasks. The relationship among the premises of this task and its subtasks is the following:
- If this task’s premises are not met, no subtasks will be executed.
- Failure of one subtask will NOT affect other subtasks being executed.
Six public attributes/properties:
- name - An arbitrary name for the task. Useful for error logging.
- is_completed - A boolean value indicating if the particular task has been completed successfully.
- results - A list of `Datum` objects as the results of the execution of the task. The data will be automatically put into the database.
- log - A list of strings recording the error messages (if any) during the last execution of the task. The list is empty if there were no errors at all.
- premises - A list of lists of `Premise` objects. The first list holds the premises of this `Task` object, followed by those of the first subtask, then those of the second subtask, and so on. Each element list can be empty.
- options - Similar to `premises` except that the object type is `Option`.
-
__init__
(name: str, subtasks: Optional[List] = None)¶ Parameters: name – An arbitrary name. Useful for error logging.
-
premises
¶
-
options
¶
-
clear
()¶ Cleans the state of this object for a new execution.
-
execute
(db: schrodinger.application.desmond.ana.ArkDb)¶ Executes this task. This should only be called after all premises of this task are met. The premises of the subtasks are ignored until the subtask is executed. Subclasses should implement an `execute`, either as an instance method, or as an instance’s public callable attribute. After execution, all results desired to be put into the database should be saved as the `results` attribute.
The first argument of `execute` should always be for the database.
-
class
schrodinger.application.desmond.ana.
ParchTrajectoryForFepLambda
(name, fep_lambda: int, cms_fname_pattern: str, trj_fname_pattern: str, out_bname_pattern: str, num_solvent: int = 200)¶ Bases:
schrodinger.application.desmond.ana.Task
Task to parch the trajectory for the given FEP lambda state. The lambda state is represented by 0 and 1.
Results are all `Datum` objects:
- key = “ResultLambda{fep_lambda}.ParchedTrajectoryFileName”, where `{fep_lambda}` is the value of the lambda state.
  val = Name of the parched trajectory file
We leave this class here (1) to explain how the framework basically works and (2) to demonstrate how to create a concrete `Task` subclass.
Introduction
From the architectural point of view, one of the common and difficult issues in computation is perhaps data coupling: the current computation needs data produced by previous ones. It’s difficult because the coupling is implicit and spans multiple programming units/modules/files, which often results in bugs when a code change in one place implicitly breaks code somewhere else.
Taking this class as an example, the task is trivial when explained at the conceptual level: Call the `trj_parch.py` script with properly set options to generate a “parched” trajectory. But when we get to the details of incorporating this task in a workflow, it becomes very complicated, mostly because of the data coupling issue (which is the devil here). From the viewpoint of this task, we have to check the following data dependencies:
1. The input files (the output CMS file and the trajectory file) exist.
2. We identify the input files by file name patterns that depend on the current jobname, which is supposed to be stored in a (.sid) data file. So we have to ensure the jobname exists in the database. (Alternatively, we can pass the jobname through a series of function calls, but we won’t discuss the general issues of that approach.)
3. To call trj_parch.py, we must set the `-dew-asl` and `-fep-lambda` options correctly. The values for these options are either stored in the .sid data file or passed into this class via an argument of the `__init__` method.
Furthermore, when any of these conditions is not met, informative error messages must be logged. All of this used to force the developer to write a LOT of boilerplate code to get/put data from the database, to check these conditions, and to log all errors, for even the most conceptually trivial task. More often than not, such boring (and repeated) code is either incomplete or not in place at all. And we take the risk of doing computations without verifying the data dependencies, until some code changes break one of the conditions.
Four types of data
We must realize where the coupling comes into the architecture of our software. For this, it helps to categorize data into the following types in terms of the source of the data:
1. Hard-coded data
   This type of data is hard coded and rarely needs to be modified or customized. Example: `num_solvent=200`.
2. Arguments
   Data passed into the function by the caller code. Example: `fep_lambda`.
3. From the database
   Examples: jobname, ligand ASL, number of lambda windows.
4. Assumptions
   Assumptions are data generated by previous stages in a workflow but are out of the control of the task of interest. For example, we have to assume the CMS and trajectory files following certain naming patterns exist in the file system. In theory, the fewer assumptions, the more robust the code. But in practice, it is very difficult (if not impossible) to totally avoid assumptions.
Implicit data coupling happens for the types (3) and (4) data.
The task framework
The basic idea of this framework is to make the types (3) and (4) data more explicitly and easily defined in our code, which then makes it possible to automatically check their availability and log errors. For the type (3) data, we provide `Premise` and `Option` classes for getting the data. For the type (4) data, we have to rely on a convention to verify the assumptions. But utility functions are provided to make that easier and idiomatic. In both cases, when the data are unavailable, informative error messages will be automatically logged. The goal of this framework is to relieve the developer from writing a lot of boilerplate code and shift their attention to writing reusable tasks.
-
__init__
(name, fep_lambda: int, cms_fname_pattern: str, trj_fname_pattern: str, out_bname_pattern: str, num_solvent: int = 200)¶ The values of the arguments `cms_fname_pattern`, `trj_fname_pattern`, and `out_bname_pattern` are plain strings that specify f-string patterns yet to be evaluated to get the corresponding file names. For example, "{jobname}_replica_{index}-out.cms"; note that it’s a plain string and uses two f-string variables, {jobname} and {index}. The values of the f-string variables will be obtained on the fly when the task is executed. Currently, the following f-string variables are available for this task:
- {jobname} - The FEP job’s name.
- {fep_lambda} - Same value as that of the argument `fep_lambda`. It’s either 0 or 1.
- {index} - The index number of the replica corresponding to either the first lambda window or the last one, depending on the value of the `fep_lambda` argument.
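The idea of a pattern that is stored as a plain string and filled in later can be illustrated with `str.format`. This is only a sketch: the variable values below are made up, and whether the task itself uses `str.format` or another mechanism to evaluate the pattern is an assumption.

```python
# A plain string, not an f-string literal: nothing is substituted yet.
cms_fname_pattern = "{jobname}_replica_{index}-out.cms"

# Hypothetical values; the real task obtains them when it is executed.
variables = {"jobname": "my_fep_job", "fep_lambda": 1, "index": 11}

# Unused variables (here `fep_lambda`) are simply ignored by str.format.
cms_fname = cms_fname_pattern.format(**variables)
```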
-
execute
(db: schrodinger.application.desmond.ana.ArkDb)¶ Executes this task. This should only be called after all premises of this task are met. The premises of the subtasks are ignored until the subtask is executed. Subclasses should implement an `execute`, either as an instance method, or as an instance’s public callable attribute. After execution, all results desired to be put into the database should be saved as the `results` attribute.
The first argument of `execute` should always be for the database.
-
clear
()¶ Cleans the state of this object for a new execution.
-
options
¶
-
premises
¶
-
class
schrodinger.application.desmond.ana.
ParchTrajectoryForFep
(name, num_solvent=200)¶ Bases:
schrodinger.application.desmond.ana.Task
Task to generate parched trajectories for both FEP lambda states. The lambda state is represented by 0 and 1.
Results are all `Datum` objects:
- key = “ResultLambda0.ParchedCmsFname”
  val = Name of the parched CMS file for lambda state 0: “lambda0-out.cms”
- key = “ResultLambda1.ParchedCmsFname”
  val = Name of the parched CMS file for lambda state 1: “lambda1-out.cms”
- key = “ResultLambda0.ParchedTrjFname”
  val = Name of the parched trajectory file for lambda state 0: “lambda0{ext}”, where “{ext}” is the same extension as that of the input trajectory file name.
- key = “ResultLambda1.ParchedTrjFname”
  val = Name of the parched trajectory file for lambda state 1: “lambda0{ext}”, where “{ext}” is the same extension as that of the input trajectory file name.
We leave this class here to demonstrate how to define a concrete `Task` subclass by composition.
-
__init__
(name, num_solvent=200)¶ Parameters: name – An arbitrary name. Useful for error logging.
-
clear
()¶ Cleans the state of this object for a new execution.
-
execute
(db: schrodinger.application.desmond.ana.ArkDb)¶ Executes this task. This should only be called after all premises of this task are met. The premises of the subtasks are ignored until the subtask is executed. Subclasses should implement an `execute`, either as an instance method, or as an instance’s public callable attribute. After execution, all results desired to be put into the database should be saved as the `results` attribute.
The first argument of `execute` should always be for the database.
-
options
¶
-
premises
¶
-
schrodinger.application.desmond.ana.
execute
(arkdb: schrodinger.application.desmond.ana.ArkDb, tasks: Iterable[schrodinger.application.desmond.ana.Task]) → bool¶ Executes one or more tasks against the given database `arkdb`.
This function is guaranteed to do the following:
1. This function will examine each task’s premises against the database.
2. If the premises are NOT met, it skips the task; otherwise, it will proceed to check the task’s options against the database.
3. After getting the premises and options data, it will call the task’s `execute` callable object. If the execution of the task is completed without errors, it will set the task’s `is_completed` attribute to true.
4. During the above steps, errors (if any) will be logged in the task’s `log` list.
5. After doing the above for all tasks, this function will return `True` if all tasks are completed without errors, or `False` otherwise.
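The five guarantees above can be pictured with a heavily simplified loop. Everything in this sketch is a stand-in: `TaskSketch` and `execute_sketch` are hypothetical names, the database is a plain `dict`, premises and options are bare key names instead of `Premise` and `Option` objects, and subtasks are left out. It shows the control flow only, not the real implementation.

```python
class TaskSketch:
    """Stand-in for Task: premises and options are lists of dict keys."""

    def __init__(self, name, premises=(), options=(), action=None):
        self.name = name
        self.premises = list(premises)   # keys that must be in the database
        self.options = list(options)     # keys that may be in the database
        self.action = action             # callable(values) -> dict of results
        self.is_completed = False
        self.log = []


def execute_sketch(db, tasks):
    tasks = list(tasks)
    for task in tasks:
        task.is_completed = False
        task.log = []
        # Steps 1 and 2: check premises; skip the task if any is missing.
        missing = [key for key in task.premises if key not in db]
        if missing:
            task.log.extend(f"Premise not met: {key}" for key in missing)
            continue
        values = {key: db[key] for key in task.premises}
        values.update({key: db.get(key) for key in task.options})
        # Steps 3 and 4: run the task and record any error in its log.
        try:
            results = task.action(values)
        except Exception as err:
            task.log.append(str(err))
            continue
        db.update(results)               # results go back into the database
        task.is_completed = True
    # Step 5: overall success only if every task completed.
    return all(task.is_completed for task in tasks)


db = {"jobname": "fep"}
good = TaskSketch("good", premises=["jobname"], options=["num_solvent"],
                  action=lambda v: {"out_name": v["jobname"] + "-out"})
bad = TaskSketch("bad", premises=["ligand_asl"], action=lambda v: {})
all_ok = execute_sketch(db, [good, bad])
```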
-
schrodinger.application.desmond.ana.
collect_logs
(tasks: Iterable[schrodinger.application.desmond.ana.Task]) → List[str]¶ Iterates over the given `Task` objects, and aggregates the logs of uncompleted tasks into a list to return. The returned strings can be joined and printed out:
    print("\n".join(collect_logs(...)))
and the text will look like the following:
    task0: Task
      message
      another message
      another multiword message
    task1: ConcreteTaskForTesting
      message
      another arbitrary message
      another completely arbitrary message
Note that the above is just an example to demonstrate the format as explained further below. Do NOT take the error messages literally. All the error messages here are unrelated to each other, and any patterns you might see are unintended!
So for each uncompleted task, the name and the class name of the task will be printed out, followed by the error messages of the task, each on a separate line indented by 2 spaces.
Note that the purpose of returning a list of strings instead of a single string is to make it slightly easier to further indent the text. For example, if you want to indent the whole text by two spaces, you can do this:
    print("  %s" % "\n  ".join(collect_logs(...)))
which will look like the following:
      task0: Task
        message
        another message
        another multiword message
      task1: ConcreteTaskForTesting
        message
        another arbitrary message
        another completely arbitrary message
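The output format described above can be reproduced with a few lines. `collect_logs_sketch` and `StubTask` are hypothetical stand-ins written for this illustration. The sketch assumes that the returned list holds one string per output line, which the documentation does not spell out.

```python
class StubTask:
    """Minimal stand-in carrying only the attributes the sketch reads."""

    def __init__(self, name, log, is_completed=False):
        self.name = name
        self.log = list(log)
        self.is_completed = is_completed


class ConcreteTaskForTesting(StubTask):
    pass


def collect_logs_sketch(tasks):
    lines = []
    for task in tasks:
        if task.is_completed:
            continue                      # only uncompleted tasks are reported
        # Header line: task name, then the name of the task's class.
        lines.append(f"{task.name}: {type(task).__name__}")
        # One line per error message, indented by 2 spaces.
        lines.extend(f"  {message}" for message in task.log)
    return lines


tasks = [
    StubTask("task0", ["message", "another message"]),
    ConcreteTaskForTesting("task1", ["message"]),
    StubTask("task2", [], is_completed=True),
]
lines = collect_logs_sketch(tasks)
indented = "  %s" % "\n  ".join(lines)
```

Joining with a newline followed by two spaces, as in `indented`, shifts every line of the report to the right by the same amount.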