schrodinger.protein.getpdb module

Module for downloading PDB files from the web.

The data is retrieved from the RCSB. Current download URLs are documented at http://www.rcsb.org/pdb/static.do?p=download/http/index.html

Running this module is no different from using a web browser to access the site; it is simply a different type of web client. It should therefore cause no problems for the maintainers of that site and should fall within the terms and conditions of use.

Note that the module makes certain assumptions about the layout of the web site; future changes there may cause it to stop working.

Copyright Schrodinger, LLC. All rights reserved.

schrodinger.protein.getpdb.download_file(filename)

Download the given file from RCSB and save it under the same name in either the CWD or a temporary directory. The path to the written file is returned.

Parameters:filename (str) – File to download from the RCSB web site.
Raises:requests.HTTPError – if error in connection to RCSB.
schrodinger.protein.getpdb.download_sf(pdb_code)

Downloads the ENT file for the given PDB ID, converts it to CNS format, and returns the CNS file name. Raises a RuntimeError if either the download or the conversion fails.

Not every PDB entry has structure factor files deposited, and not every structure factor file will convert perfectly.

schrodinger.protein.getpdb.download_fasta(pdb_code, chain=None)

Attempts to download the FASTA file for the given PDB ID and chain.

Parameters:
  • pdb_code (str) – PDB ID of the file to download
  • chain (str or None) – The chain name to download. If None, the file will contain all chains
schrodinger.protein.getpdb.get_pdb(pdbid, source=0, caps_asis=False)

Attempts to get the specified PDB file from either the database or the web, depending on the source option. Default is AUTO, which attempts the database first, and then the web.

Parameters:
  • pdbid (str) – string of 4 characters
  • source – one of: AUTO, DATABASE, WEB
  • caps_asis (bool) – True if the capitalization of pdbid should be preserved, False (default) if it should be converted to lowercase.

Returns:

Path to the PDB file that was written (*.pdb or *.cif)

Return type:

str

Raises:
  • requests.HTTPError – if error in connection to RCSB
  • RuntimeError – for other errors retrieving the file
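The AUTO behavior described for get_pdb (database first, then the web) can be illustrated with a minimal sketch. This is not the module's implementation: the helper name get_with_fallback and the two lookup callables are hypothetical, and while AUTO = 0 matches the default in the signature, the values used for DATABASE and WEB are assumptions.

```python
from typing import Callable, Optional

# AUTO = 0 matches the default in the signature above; the values chosen for
# DATABASE and WEB are assumptions made for this sketch.
AUTO, DATABASE, WEB = 0, 1, 2


def get_with_fallback(
    pdbid: str,
    from_database: Callable[[str], Optional[str]],
    from_web: Callable[[str], str],
    source: int = AUTO,
) -> str:
    """Return a file path for pdbid, honoring the requested source."""
    if source in (AUTO, DATABASE):
        path = from_database(pdbid)
        if path is not None:
            return path
        if source == DATABASE:
            raise RuntimeError(f"{pdbid} not found in the local database")
    # Reached for WEB, or for AUTO after a database miss.
    return from_web(pdbid)


# Stand-in lookups so the sketch runs without network or database access.
local_files = {"1abc": "/repo/pdb1abc.ent.gz"}
print(get_with_fallback("1abc", local_files.get, lambda p: f"{p}.pdb"))
print(get_with_fallback("2xyz", local_files.get, lambda p: f"{p}.pdb"))
```

Passing the lookups in as callables keeps the fallback order visible and lets the example run offline.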
schrodinger.protein.getpdb.retrieve_pdb(pdbid, local_repos=None, verbose=False, caps_asis=False)

Attempt to retrieve the PDB file from the local repository.

First we look for current files ending in .gz or .Z, then obsolete files with the same endings. The file name we search for is:

pdbXXXX.ent.Y where XXXX is the PDB code and Y is either gz or Z

Parameters:
  • pdbid (str) – the PDB code of the desired file
  • local_repos (list of str) – the paths to the parent directories of each local repository.
  • caps_asis (bool) – True if the capitalization of pdbid should be preserved, False (default) if it should be converted to lowercase.
Return type:

str

Returns:

the name of the pdb file or None if a failure occurs
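As an illustration of the pdbXXXX.ent.Y naming pattern and the caps_asis option, the following sketch builds the candidate file names for a PDB code. The helper candidate_filenames is hypothetical, and trying .gz before .Z is an assumption based on the order in which the endings are listed.

```python
from typing import List


def candidate_filenames(pdbid: str, caps_asis: bool = False) -> List[str]:
    """Build the pdbXXXX.ent.Y names described above, .gz before .Z."""
    # Lowercase the code unless the caller asks to keep its capitalization.
    code = pdbid if caps_asis else pdbid.lower()
    return [f"pdb{code}.ent.{ext}" for ext in ("gz", "Z")]


print(candidate_filenames("1ABC"))
print(candidate_filenames("1ABC", caps_asis=True))
```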

schrodinger.protein.getpdb.find_local_repository(verbose=False)

Determine a directory list for local repositories.

Note: the location of the PDB directory can be specified via environment variables; the order of precedence is:
  • SCHRODINGER_PDB
  • SCHRODINGER_THIRDPARTY/database/pdb
  • SCHRODINGER/thirdparty/database/pdb (the default)

Parameters:verbose (bool) – True if debugging messages should be printed to the screen
Return type:list of str
Returns:the paths to the parent directories of each local repository. Returns an empty list if the local repository cannot be determined.
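The precedence among the environment variables can be sketched as follows. This is a simplified illustration, not the module's code: the helper pdb_directory is hypothetical, it returns a single directory rather than a list, and it does not check that the directory exists.

```python
import os
from typing import Mapping, Optional


def pdb_directory(env: Mapping[str, str]) -> Optional[str]:
    """Pick the PDB directory using the precedence listed above."""
    if env.get("SCHRODINGER_PDB"):
        return env["SCHRODINGER_PDB"]
    if env.get("SCHRODINGER_THIRDPARTY"):
        return os.path.join(env["SCHRODINGER_THIRDPARTY"], "database", "pdb")
    if env.get("SCHRODINGER"):
        return os.path.join(env["SCHRODINGER"], "thirdparty", "database", "pdb")
    return None  # no variable set, so no location can be determined


print(pdb_directory(os.environ))
```

Taking the environment as a mapping argument makes the precedence easy to check with plain dictionaries.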
schrodinger.protein.getpdb.find_local_pdb(pdbid, local_repos=None, verbose=False, caps_asis=False)

Check a series of local directories and filenames for the PDB files.

First we look for current files ending in .gz or .Z, then obsolete files with the same endings. The file name we search for is:

pdbXXXX.ent.Y where XXXX is the PDB code and Y is either gz or Z

Note: the location of the PDB directory can be specified via environment variables; the order of precedence is:
  • SCHRODINGER_PDB
  • SCHRODINGER_THIRDPARTY
  • SCHRODINGER/thirdparty (the default)

Parameters:
  • pdbid (str) – the PDB code of the desired file
  • local_repos (list of str) – the paths to the parent directories of each local repository.
  • verbose (bool) – True if debug messages should be printed out
  • caps_asis (bool) – True if the capitalization of pdbid should be preserved, False (default) if it should be converted to lowercase.
Return type:

str

Returns:

the path to an existing file with the desired PDB code

schrodinger.protein.getpdb.download_pdb(pdb_code, biological_unit=False, try_as_cif=True)

Download the PDB record from www.rcsb.org into the CWD. If the structure is too large to be downloaded as a *.pdb file, it will be saved as a *.cif file.

Parameters:
  • pdb_code (str) – Four character alphanumeric string for the PDB id.
  • biological_unit (bool) – If True, and the file needs to be downloaded, then download the file at the biological unit URL; otherwise use the typical record URL. Default is False (get the typical record). Note: this option is no longer used by PrepWizard, but is still used by getpdb_utility.py ($SCHRODINGER/utilities/getpdb).
  • try_as_cif (bool) – Whether to try downloading the file as CIF format if the structure is too large to be represented in PDB format.
Returns:

Path to the downloaded file.

Return type:

str

Raises:
  • requests.HTTPError – if error in connection to RCSB or pdb ID does not exist
  • RuntimeError – for other errors retrieving the file
schrodinger.protein.getpdb.requests_retry_session(max_retries=3, backoff_factor=0.3, status_forcelist=(500, 502, 503, 504), session=None)

Return a session for connecting to a web URL. In case of network failures, the session will retry the connection; the number of re-attempts allowed is specified by max_retries.

Parameters:
  • max_retries (int) – Total number of retries allowed
  • backoff_factor (float) – Backoff factor to apply between attempts after the second try. urllib3 will sleep for: {backoff factor} * (2 ** ({number of total retries} - 1)) seconds before making next attempt.
  • status_forcelist (iterable of int) – Http error status codes for which retry will happen
  • session (requests.Session) – A session object
Returns:

A session object

Return type:

requests.Session
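To make the backoff formula concrete, the sketch below evaluates it for each retry. The helper backoff_delays is hypothetical, and treating the first retry as having no delay is an assumption drawn from the phrase "after the second try"; the exact schedule may differ between urllib3 versions.

```python
from typing import List


def backoff_delays(max_retries: int = 3, backoff_factor: float = 0.3) -> List[float]:
    """Evaluate backoff_factor * (2 ** (n - 1)) for retries 1..max_retries."""
    delays = []
    for attempt in range(1, max_retries + 1):
        if attempt == 1:
            # Assumption: the first retry happens immediately, without sleeping.
            delays.append(0.0)
        else:
            delays.append(backoff_factor * (2 ** (attempt - 1)))
    return delays


print(backoff_delays())
```

With the default arguments, the delays double from one retry to the next once the backoff starts to apply.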

schrodinger.protein.getpdb.retrieve_ent(pdbid)

Retrieves the ENT file for the specified PDB ID from the third-party database and copies it to the CWD. File path is returned.

Raises RuntimeError on error.

schrodinger.protein.getpdb.download_ent(pdbid)

Downloads the ENT file for the specified PDB ID from the RCSB web site, and saves it to the CWD. File path is returned.

Raises:
  • requests.HTTPError – if error in connection to RCSB
  • RuntimeError – for other errors retrieving the file
schrodinger.protein.getpdb.get_ent(pdbid, source=0)

Attempts to get the specified ENT file from either the database or the web, depending on the source option. Default is AUTO, which attempts the database first, and then the web.

Parameters:
  • pdbid (str) – string of 4 characters
  • source – one of: AUTO, DATABASE, WEB

Raises:
  • requests.HTTPError – if error in connection to RCSB
  • RuntimeError – for other errors retrieving the file
schrodinger.protein.getpdb.open_filename(filename, mode)

Opens the given file, or a temporary file if filename is not writable. The name may therefore change; the actual name is accessible via the name attribute of the returned file object.
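The fallback to a temporary file might look like the following sketch. It is an illustration under assumptions, not the module's implementation: the helper open_or_temp is hypothetical, and catching OSError and preserving the file extension are choices made for this example.

```python
import os
import tempfile


def open_or_temp(filename: str, mode: str = "w"):
    """Open filename; if that fails, open a temporary file instead."""
    try:
        return open(filename, mode)
    except OSError:
        # Keep the original extension so the file type stays recognizable.
        suffix = os.path.splitext(filename)[1]
        return tempfile.NamedTemporaryFile(mode=mode, suffix=suffix, delete=False)


# The parent directory does not exist, so the temporary fallback is used.
target = os.path.join(tempfile.gettempdir(), "no_such_dir_getpdb_sketch", "1abc.pdb")
with open_or_temp(target) as handle:
    handle.write("HEADER\n")
    print(handle.name)  # the path actually used, which may differ from target
os.remove(handle.name)
```

Callers should read the name attribute of the returned object rather than assume the requested path was used.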

schrodinger.protein.getpdb.download_reflection_data(pdbid)

Attempt to download reflection data.

Parameters:pdbid (str) – PDB ID