schrodinger.protein.getpdb module

Module for downloading PDB files from the web.

The data is retrieved from the RCSB. Current download URLs are documented at http://www.rcsb.org/pdb/static.do?p=download/http/index.html

Running this module is no different from using a web browser to access the site; it’s just a different type of web client. It should therefore cause no problems for the maintainers of that site and fall within its terms and conditions of use.

Note that certain assumptions are made about the layout of the web site. Future changes there may cause this script to stop working.

Copyright Schrodinger, LLC. All rights reserved.

schrodinger.protein.getpdb.download_sf(pdb_code)

Downloads the ENT file for the given PDB ID, converts it to CNS format, and returns the CNS file name. Raises a RuntimeError if either the download or the conversion fails.

Not every PDB entry has structure factor files deposited, and not every structure factor file will convert perfectly.

schrodinger.protein.getpdb.download_fasta(pdb_code, chain=None)

Attempts to download the FASTA file for the given PDB ID and chain.

Parameters:
  • pdb_code (str) – PDB ID of the file to download
  • chain (str or None) – The chain name to download. If None, the file will contain all chains
schrodinger.protein.getpdb.get_pdb(pdbid, source=0, caps_asis=False)

Attempts to get the specified PDB file from either the database or the web, depending on the source option. Default is AUTO, which attempts the database first, and then the web.

Parameters:
  • pdbid – string of 4 characters
  • source – one of: AUTO, DATABASE, WEB
  • caps_asis (bool) – True if the capitalization of pdbid should be preserved, False (default) if it should be converted to lowercase.

Raises:
  • requests.HTTPError – if error in connection to RCSB
  • RuntimeError – for any other error retrieving the file
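The AUTO behavior described for get_pdb (database first, then web) can be sketched as below. This is a minimal illustration under stated assumptions, not the module's implementation: fetch_with_fallback, from_database, and from_web are hypothetical names, and the two callables stand in for whatever performs the local lookup and the download.

```python
from typing import Callable, Optional


def fetch_with_fallback(
    pdbid: str,
    from_database: Callable[[str], Optional[str]],
    from_web: Callable[[str], str],
) -> str:
    """Return a file path for pdbid, trying the local database before the web.

    Both callables are hypothetical stand-ins. from_database is assumed to
    return a path, or to return None or raise RuntimeError when the entry is
    not available locally. from_web is assumed to download the entry and
    return the path of the downloaded file.
    """
    try:
        path = from_database(pdbid)
    except RuntimeError:
        path = None
    if path is not None:
        return path
    # The local lookup found nothing, so fall back to the web source.
    return from_web(pdbid)
```

Errors raised by the web step are deliberately left to propagate, so a caller can still distinguish a connection problem from other failures.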
schrodinger.protein.getpdb.retrieve_pdb(pdbid, local_repos=None, verbose=False, caps_asis=False)

Attempts to retrieve the PDB file from the local repository.

First we look for current files ending in .gz or .Z, then obsolete files with the same endings. The file name we search for is:

pdbXXXX.ent.Y where XXXX is the PDB code and Y is either gz or Z

Parameters:
  • pdbid (str) – the PDB code of the desired file
  • local_repos (list of str) – the paths to the parent directories of each local repository.
  • verbose (bool) – True if debug messages should be printed out
  • caps_asis (bool) – True if the capitalization of pdbid should be preserved, False (default) if it should be converted to lowercase.
Return type:

str

Returns:

the name of the pdb file or None if a failure occurs
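The file name pattern above, together with the caps_asis option, can be illustrated with a short sketch. candidate_file_names is a hypothetical helper, not part of the module, and the order in which gz and Z are tried is an assumption. The directory layout for current and obsolete entries is not documented here, so only the file names are built.

```python
from typing import List


def candidate_file_names(pdbid: str, caps_asis: bool = False) -> List[str]:
    """Build the pdbXXXX.ent.Y names for a PDB code (hypothetical helper)."""
    # Lowercase the code unless the caller asks to keep its capitalization.
    code = pdbid if caps_asis else pdbid.lower()
    # Assumed order: try the .gz name before the .Z name.
    return [f"pdb{code}.ent.{suffix}" for suffix in ("gz", "Z")]
```

For example, candidate_file_names("1ABC") yields pdb1abc.ent.gz and pdb1abc.ent.Z, while passing caps_asis=True keeps the code as 1ABC.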

schrodinger.protein.getpdb.find_local_repository(verbose=False)

Determine a directory list for local repositories.

Note: the location of the PDB directory can be specified via environment variables; the order of precedence is:
  • SCHRODINGER_PDB
  • SCHRODINGER_THIRDPARTY/database/pdb
  • SCHRODINGER/thirdparty/database/pdb (the default)

Parameters:
  • verbose (bool) – True if debugging messages should be printed to the screen
Return type:

list of str

Returns:

the paths to the parent directories of each local repository. Returns an empty list if the local repository cannot be determined.
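The order of precedence among the environment variables can be sketched as follows. resolve_pdb_dir is a hypothetical helper written for illustration. Whether the real function also checks that the directories exist, or returns more than one of them, is not stated here, so the sketch only picks the highest-precedence location that is configured.

```python
import os
from typing import Mapping, Optional


def resolve_pdb_dir(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Pick the PDB directory by the documented precedence (hypothetical helper)."""
    if env is None:
        env = os.environ
    # 1. An explicit PDB location wins.
    if env.get("SCHRODINGER_PDB"):
        return env["SCHRODINGER_PDB"]
    # 2. Otherwise look under the third-party directory.
    if env.get("SCHRODINGER_THIRDPARTY"):
        return os.path.join(env["SCHRODINGER_THIRDPARTY"], "database", "pdb")
    # 3. Otherwise use the default location inside the installation.
    if env.get("SCHRODINGER"):
        return os.path.join(env["SCHRODINGER"], "thirdparty", "database", "pdb")
    return None
```

Passing the environment as a mapping keeps the sketch easy to exercise without modifying os.environ.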
schrodinger.protein.getpdb.find_local_pdb(pdbid, local_repos=None, verbose=False, caps_asis=False)

Check a series of local directories and filenames for the PDB files.

First we look for current files ending in .gz or .Z, then obsolete files with the same endings. The file name we search for is:

pdbXXXX.ent.Y where XXXX is the PDB code and Y is either gz or Z

Note: the location of the PDB directory can be specified via environment variables; the order of precedence is:
  • SCHRODINGER_PDB
  • SCHRODINGER_THIRDPARTY
  • SCHRODINGER/thirdparty (the default)

Parameters:
  • pdbid (str) – the PDB code of the desired file
  • local_repos (list of str) – the paths to the parent directories of each local repository.
  • verbose (bool) – True if debug messages should be printed out
  • caps_asis (bool) – True if the capitalization of pdbid should be preserved, False (default) if it should be converted to lowercase.
Return type:

str

Returns:

the path to an existing file with the desired PDB code

schrodinger.protein.getpdb.download_pdb(pdb_code, biological_unit=False)

Download the pdb record from www.rcsb.org into the cwd.

Parameters:
  • pdb_code (str) – Four character alphanumeric string for the PDB id.
  • biological_unit (bool) – If True, and the file needs to be downloaded, then download the file at the biological unit URL, otherwise use the typical record URL. Default is False, get the typical record.
Raises:
  • requests.HTTPError – if error in connection to RCSB or pdb ID does not exist
  • RuntimeError – for any other error retrieving the file
schrodinger.protein.getpdb.download_file(pdb_code, pdb_file, url=None, biological_unit=False)

Download a file from a given url into the working directory.

Parameters:
  • pdb_code (str) – Four character alphanumeric string for the PDB id.
  • pdb_file (str) – The file name to create
  • url (str or None) – The url to get the file from. If None, the pdb_file is downloaded from the default location.
  • biological_unit (bool) – If True, and the file needs to be downloaded, then download the file at the biological unit URL, otherwise use the typical record URL. Default is False, get the typical record.
Raises:
  • requests.HTTPError – if error in connection to RCSB
  • RuntimeError – for any other error retrieving the file
schrodinger.protein.getpdb.retrieve_ent(pdbid)

Retrieves the ENT file for the specified PDB ID from the third-party database and copies it to the CWD. File path is returned.

Raises RuntimeError on error.

schrodinger.protein.getpdb.download_ent(pdbid)

Downloads the ENT file for the specified PDB ID from the RCSB web site, and saves it to the CWD. File path is returned.

Raises:
  • requests.HTTPError – if error in connection to RCSB
  • RuntimeError – for any other error retrieving the file
schrodinger.protein.getpdb.get_ent(pdbid, source=0)

Attempts to get the specified ENT file from either the database or the web, depending on the source option. Default is AUTO, which attempts the database first, and then the web.

Parameters:
  • pdbid – string of 4 characters
  • source – one of: AUTO, DATABASE, WEB

Raises:
  • requests.HTTPError – if error in connection to RCSB
  • RuntimeError – for any other error retrieving the file
schrodinger.protein.getpdb.open_filename(filename, mode)

Opens filename, or a temporary file if filename is not writable. The name may therefore differ from the one requested; it is accessible via the name attribute of the returned file object.
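The fallback to a temporary file can be sketched with the standard library. This is an assumption about how such behavior might be written, not the module's code: open_with_fallback is a hypothetical name, and the sketch is intended for write modes only.

```python
import tempfile


def open_with_fallback(filename: str, mode: str = "w"):
    """Open filename, or a temporary file if filename cannot be opened."""
    try:
        return open(filename, mode)
    except OSError:
        # delete=False keeps the temporary file on disk after it is closed,
        # so the caller can still find it through the object's name attribute.
        return tempfile.NamedTemporaryFile(mode=mode, delete=False)
```

Callers should read the name attribute of the returned object rather than assume the requested path was used.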

schrodinger.protein.getpdb.download_reflection_data(pdbid)

Attempts to download reflection data.

Parameters:
  • pdbid (str) – PDB ID