File Formats & Support
RasMol v2.6 can now load and write most of the common coordinate file formats.
The description of PDB files from the v2.5 Manual has been updated for v2.6 with
five additional file formats and a section on machine-specific support.
Portions of these topics are also integrated into various sections of the Manual.
PDB File Formats
Brookhaven Protein Data Bank Files
RasMol Interpretation of PDB Fields
PDB Color Scheme Specification
Multiple NMR Models in PDB Files
MOPAC File Formats
Alchemy File Format
IRIS RGB Image File Format
MDL Mol File Output
Machine-Specific Support
Monochrome X Windows Support
Tcl/Tk 3.x and 4.x IPC support
UNIX sockets based IPC
Compiling RasWin with Borland
Brookhaven Protein Data Bank Files
If you do not have the Brookhaven documentation,
you may find the following summary of the PDB file format useful. Additional information
can be found at the PDB WWW Home Page. The
Protein Data Bank is a computer-based archival database for macromolecular structures. The
database was established in 1971 by the Brookhaven National Laboratory, New York, as a
public domain repository for resolved crystallographic structures [20]. The Bank uses a uniform format to store
atomic coordinates and partial bond connectivities as derived from crystallographic
studies.
PDB file entries consist of records of 80 characters each.
Using the punched card analogy, columns 1 to 6 contain a
record-type identifier and columns 7 to 70
contain data. Columns 71 to 80 are normally blank,
but may contain sequence information added by library management programs. The first four
characters of the record identifier are sufficient to identify the type of record
uniquely, and the syntax of each record is independent of the order of records within any
entry for a particular macromolecule.
The only record types that are of major interest to the RasMol program are
the ATOM and HETATM records, which describe the position
of each atom. ATOM/HETATM records contain standard atom names and residue
abbreviations, along with sequence identifiers, coordinates in Angstrom units, occupancies
and thermal motion factors. The exact details are given below as a FORTRAN format
statement:
FORMAT(6A1,I5,1X,A4,A1,A3,1X,A1,I4,A1,3X,3F8.3,2F6.2,1X,I3)

Column   Content
1-6      'ATOM' or 'HETATM'
7-11     Atom serial number (may have gaps)
13-16    Atom name, in IUPAC standard format
17       Alternate location indicator indicated by A, B or C
18-20    Residue name, in IUPAC standard format
23-26    Residue sequence number (ordered as below)
27       Code for insertions of residues (i.e. 66A & 66B)
31-38    X coordinate
39-46    Y coordinate
47-54    Z coordinate
55-60    Occupancy
61-66    Temperature factor
68-70    Footnote number
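The column layout above maps directly onto string slices. The following is a minimal Python sketch of such a reader, not RasMol's own code. The names AtomRecord and parse_atom_record are hypothetical, and treating blank numeric fields as 0.0 is an assumption. The chain field at column 22 comes from the A1 descriptor in the FORMAT statement, which the column table does not list.

```python
from typing import NamedTuple, Optional


class AtomRecord(NamedTuple):
    record: str       # 'ATOM' or 'HETATM' (columns 1-6)
    serial: int       # columns 7-11
    name: str         # columns 13-16
    altloc: str       # column 17
    resname: str      # columns 18-20
    chain: str        # column 22 (A1 field in the FORMAT statement)
    resseq: int       # columns 23-26
    icode: str        # column 27
    x: float          # columns 31-38
    y: float          # columns 39-46
    z: float          # columns 47-54
    occupancy: float  # columns 55-60
    tfactor: float    # columns 61-66


def _number(field: str, default: float = 0.0) -> float:
    """Convert a fixed-width numeric field; blank fields use default (assumption)."""
    field = field.strip()
    return float(field) if field else default


def parse_atom_record(line: str) -> Optional[AtomRecord]:
    """Parse one ATOM/HETATM line; return None for any other record type."""
    line = line.rstrip("\r\n").ljust(80)  # pad short lines to a full 80-column card
    record = line[0:6].strip()
    if record not in ("ATOM", "HETATM"):
        return None
    # Python slices are 0-based and end-exclusive, so columns 7-11 become [6:11].
    return AtomRecord(
        record=record,
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        altloc=line[16].strip(),
        resname=line[17:20].strip(),
        chain=line[21].strip(),
        resseq=int(line[22:26]),
        icode=line[26].strip(),
        x=_number(line[30:38]),
        y=_number(line[38:46]),
        z=_number(line[46:54]),
        occupancy=_number(line[54:60]),
        tfactor=_number(line[60:66]),
    )
```

Padding each line to 80 characters keeps the slices valid for files whose trailing blanks have been stripped.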
Residues occur in order of their sequence numbers, which always increase
starting from the N-terminal residue for proteins and 5'-terminus for nucleic acids. If
the residue sequence is known, certain atom serial numbers may be omitted to allow for
future insertion of any missing atoms. Within each residue, atoms are ordered in a
standard manner, starting with the backbone (N-Ca-C-O for proteins) and proceeding in
increasing remoteness from the alpha carbon, along the side chain.
HETATM records are used to define post-translational
modifications and cofactors associated with the main molecule. Optional TER
records are interpreted as breaks in the main molecule's backbone.
If present, RasMol also inspects HEADER, COMPND,
HELIX, SHEET, TURN, CONECT,
CRYST1, MODEL, ENDM and END
records. Information such as the name, Brookhaven code, revision date and
classification of the molecule is extracted from HEADER and COMPND
records, initial secondary structure assignments are taken from HELIX, SHEET
and TURN records, and the end of the file may be indicated by an END
record.
An annotated example of a PDB file
for the protein crambin (1crn.pdb) is shown on a separate page.
RasMol Interpretation of PDB Fields
Atoms located at 9999.000, 9999.000,
9999.000 are assumed to be Insight pseudo atoms and are ignored by
RasMol. Atom names beginning ' Q' are also assumed to be pseudo atoms or
position markers.
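The two pseudo-atom rules can be expressed as a single predicate on a raw ATOM/HETATM line. This is an illustrative Python sketch only: the function name is_pseudo_atom is hypothetical, and the slices assume the column positions given for ATOM/HETATM records (atom name in columns 13-16, coordinates in columns 31-54).

```python
def is_pseudo_atom(line: str) -> bool:
    """Return True if an ATOM/HETATM line looks like a pseudo atom or position marker."""
    line = line.rstrip("\r\n").ljust(80)
    # Rule 1: atom name field (columns 13-16) begins with a blank followed by 'Q'.
    if line[12:16].startswith(" Q"):
        return True
    # Rule 2: all three coordinates are exactly 9999.000.
    try:
        coords = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    except ValueError:
        return False  # unreadable coordinates: not classified here (assumption)
    return all(value == 9999.0 for value in coords)
```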
When a data file contains an NMR structure, multiple
conformations may be placed in a single PDB file, delimited by pairs of
MODEL and ENDM records. In this case, RasMol only displays the
first NMR model in the file.
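The effect of keeping only the first model can be sketched as a filter over the record lines. This is a hypothetical Python illustration of the behavior described, not RasMol's implementation. The name first_model_only is invented, and keeping records that fall outside any MODEL/ENDM pair (such as HEADER or END) is an assumption.

```python
from typing import Iterable, List


def first_model_only(lines: Iterable[str]) -> List[str]:
    """Drop every record that belongs to the second or a later MODEL block."""
    kept = []
    models_seen = 0   # number of MODEL records encountered so far
    inside = False    # True while between a MODEL record and its ENDM record
    for line in lines:
        tag = line[:4]  # the first four characters identify the record type
        if tag == "MODE":
            models_seen += 1
            inside = True
            continue
        if tag == "ENDM":
            inside = False
            continue
        if inside and models_seen > 1:
            continue  # atom of a later conformation: skipped
        kept.append(line)
    return kept
```

For a file with no MODEL records, the filter returns every line unchanged.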
Residue names "CSH", "CYH"
and "CSM" are considered pseudonyms for cysteine "CYS".
Residue names "WAT", "H20", "SOL"
and "TIP" are considered pseudonyms for water "HOH".
The residue name "D20" is considered to be heavy water "DOD".
The residue name "SUL" is considered a sulfate ion "SO4".
The residue name "CPR" is considered to be cis-proline and is
translated as "PRO". The residue name "TRY"
is considered a pseudonym for tryptophan "TRP".
RasMol uses the HETATM fields to define the sets hetero, water, solvent, and ligand. Any group
named "HOH", "DOD", "SO4"
or "PO4" (or aliased to one of these names by the preceding
rules) is considered a solvent and is treated as though it were defined by a HETATM
field.
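The aliasing rules amount to a lookup table applied before the solvent test. The Python sketch below is illustrative only: the names RESIDUE_ALIASES, SOLVENT_NAMES, canonical_residue and is_solvent are hypothetical, and the alias spellings (including "H20" and "D20" with a digit zero) are copied exactly as printed in the text.

```python
# Alias -> canonical residue name, copied from the rules in the text.
RESIDUE_ALIASES = {
    "CSH": "CYS", "CYH": "CYS", "CSM": "CYS",                # cysteine
    "WAT": "HOH", "H20": "HOH", "SOL": "HOH", "TIP": "HOH",  # water
    "D20": "DOD",                                            # heavy water
    "SUL": "SO4",                                            # sulfate ion
    "CPR": "PRO",                                            # cis-proline
    "TRY": "TRP",                                            # tryptophan
}

# Groups counted as solvent once aliases have been resolved.
SOLVENT_NAMES = {"HOH", "DOD", "SO4", "PO4"}


def canonical_residue(name: str) -> str:
    """Map a residue name to its canonical form; unknown names pass through."""
    name = name.strip().upper()
    return RESIDUE_ALIASES.get(name, name)


def is_solvent(name: str) -> bool:
    """True if the residue, after aliasing, is one of the solvent groups."""
    return canonical_residue(name) in SOLVENT_NAMES
```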
RasMol only respects CONECT connectivity records in PDB
files containing fewer than 256 atoms. This is explained in more detail in
the set bonds section on
determining molecule connectivity. CONECT records that define a bond more
than once are interpreted as specifying the bond order of that bond, i.e. a bond
specified twice is a double bond and a bond specified three (or more) times is a triple
bond.
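The bond-order rule is a counting exercise. The Python sketch below takes (source, target) atom serial pairs as already extracted from CONECT records and derives an order for each bond. The function name bond_orders is hypothetical. Treating a reciprocal listing (atom 1 lists 2, atom 2 lists 1) as a single definition rather than two is an assumption that the text does not settle.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def bond_orders(listed: Iterable[Tuple[int, int]]) -> Dict[Tuple[int, int], int]:
    """Map each bond (low serial, high serial) to an order of 1, 2 or 3."""
    directed = Counter(listed)  # how often each (source, target) pair was listed
    orders: Dict[Tuple[int, int], int] = {}
    for (src, dst), count in directed.items():
        key = (min(src, dst), max(src, dst))
        # Three or more repeats still mean a triple bond, so cap at 3.
        # Assumption: the two directions are compared, not summed.
        orders[key] = max(orders.get(key, 0), min(count, 3))
    return orders
```

With this convention, a bond listed once by each of its two atoms remains a single bond, while a bond repeated within one atom's listing is promoted.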
PDB Color Scheme Specification
RasMol also accepts the supplementary COLO
record type in the PDB files. This record format was introduced by David
Bacon's Raster3D program [4] for
specifying the color scheme to be used when rendering the molecule. This extension is not
currently supported by Brookhaven. The COLO record has the same basic
record layout as the ATOM and HETATM records described
above.
Colors are assigned to atoms using a matching process. The Mask field is used
in the matching process as follows. First RasMol reads in and remembers
all the ATOM, HETATM and COLO records
in input order. When the user-defined (User)
color scheme is selected, RasMol goes through each remembered ATOM/HETATM
record in turn, and searches for a COLO record that matches in all of
columns 7 through 30. The first such COLO
record to be found determines the color and radius of the atom.
Column   Content
1-6      'COLOR' or 'COLOUR'
7-30     Mask (described below)
31-38    Red component
39-46    Green component
47-54    Blue component
55-60    Sphere radius in Angstroms
61-70    Comments
Note that the Red, Green and Blue
components are in the same positions as the X, Y, and Z
components of an ATOM or HETATM record, and the van der
Waals radius goes in the place of the Occupancy. The Red,
Green and Blue components must all be in the range 0
to 1.
In order that one COLO record can provide color and radius
specifications for more than one atom (e.g. based on residue, atom type, or any
other criterion for which labels can be given somewhere in columns 7
through 30), a 'don't-care' character, the hash mark
"#" (number or sharp sign) is used. This character, when found in a COLO
record, matches any character in the corresponding column in an ATOM/HETATM
record. All other characters must match identically to count as a match. As an extension
to the specification, any atom that fails to match a COLO record is
displayed in white.
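The matching process can be sketched in a few lines of Python. This is an illustration of the rules as described, not RasMol's or Raster3D's code. The names mask_matches and user_color are hypothetical, and returning None as the radius of an unmatched atom is an assumption, since the text only specifies its color.

```python
from typing import Iterable, Optional, Tuple

RGB = Tuple[float, float, float]
WHITE: RGB = (1.0, 1.0, 1.0)


def mask_matches(colo_line: str, atom_line: str) -> bool:
    """Compare columns 7-30; '#' in the COLO record matches any character."""
    mask = colo_line.ljust(30)[6:30]
    target = atom_line.ljust(30)[6:30]
    return all(m == "#" or m == t for m, t in zip(mask, target))


def user_color(atom_line: str, colo_lines: Iterable[str]) -> Tuple[RGB, Optional[float]]:
    """Return (rgb, radius) from the first matching COLO record, in input order."""
    for colo in colo_lines:
        if mask_matches(colo, atom_line):
            colo = colo.ljust(60)
            rgb = (float(colo[30:38]), float(colo[38:46]), float(colo[46:54]))
            radius = float(colo[54:60])  # sphere radius in Angstroms
            return rgb, radius
    return WHITE, None  # no match: white, radius left undefined (assumption)
```

A mask of eleven '#' characters, a residue name, and ten more '#' characters would, for example, color every atom of that residue type, because the residue name occupies columns 18-20.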
Multiple NMR Models
RasMol may now load all of the NMR models
from a Brookhaven PDB file using the new command load nmrpdb <filename>.
The "nmrpdb" format specifier instructs the PDB reader to load all
the models from the PDB file, instead of just the first one as is the
behavior of the (default) "pdb" format specifier. If
the specified PDB file does not contain an NMR structure
the behavior of "nmrpdb" is identical to that of
"pdb". Once multiple NMR
conformations have been loaded they may be manipulated with the atom expression extensions
described in Primitive Expressions.
MOPAC File Formats
RasMol can now read MOPAC format
files. The new load mopac
<filename> command automatically distinguishes between MOPAC
input and output file types, and can read input files in both Cartesian and internal (z-matrix) formats. RasMol will also read the charge information in
MOPAC output files; however, it cannot read the output files of MOPAC
jobs specifying the NOXYZ keyword.
Alchemy File Format
The Alchemy file format reader
has been enhanced to allow hydrogen bonds to be explicitly represented in a file using the
keyword HYDROGEN, instead of the typical SINGLE, DOUBLE,
TRIPLE or AROMATIC.
IRIS RGB Image File Format
RasMol on all platforms now supports the
generation of images in IRIS RGB format files. This file format is often
used when running on Silicon Graphics workstations. The appropriate form of RGB
file is used by both 8-bit and 32-bit versions of RasMol.
These files may be created using the write iris <filename> command.
MDL Mol File Output
RasMol version 2.6 may now be used to
generate MDL Mol files. The new command save mdl <filename> saves the currently selected set of atoms to
the specified file in MDL file format.
Machine-Specific Support
Monochrome X Windows Support. RasMol v2.6
now supports the many monochrome UNIX workstations typically found in academia, such as
low-end SUN workstations and NCD X-terminals. The X11 version of RasMol (when compiled in
8 bit mode) now detects black & white X Windows displays and enables dithering
automatically. The use of run-time error diffusion dithering means that all display modes
of RasMol are available when in monochrome mode. For best results, users should experiment
with the set ambient
command to ensure the maximum contrast in resulting images.
Tcl/Tk 3.x and 4.x IPC support. The recently announced version 4 of the Tk
graphics library has changed the protocol used to communicate between Tk applications.
RasMol version 2.6 has been modified such that it can now communicate with both this new
protocol and the previous version 3 protocol supported by RasMol v2.5. Although Tcl/Tk 3.x
applications may only communicate with other 3.x applications and Tcl/Tk 4.x applications
with other 4.x applications, these changes allow RasMol v2.6 to communicate between
processes with both protocols (potentially concurrently).
UNIX sockets based IPC. The UNIX implementation of RasMol v2.6 now
supports BSD-style socket communication. An identical socket mechanism is also being
developed for VMS, Apple Macintosh and Microsoft Windows systems. This should allow RasMol
to interactively display results of a computation on a remote host. The current protocol
acts as a TCP/IP server on port 21069 that executes command lines until either the command
"exit" or the command "quit" is typed. The command exit disconnects the current
session from the RasMol server; the command quit
both disconnects the current session and terminates RasMol. This functionality may be
tested using the UNIX command "telnet <hostname> 21069".
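The same test can be scripted instead of typed into telnet. The Python sketch below opens a TCP connection to port 21069 and sends one command per line. The function names encode_commands and send_commands are hypothetical, and the newline terminator and ASCII encoding are assumptions. The text specifies only the port number and the exit and quit commands.

```python
import socket
from typing import Iterable

RASMOL_PORT = 21069  # port number given in the text


def encode_commands(commands: Iterable[str]) -> bytes:
    """Join commands into the bytes to send, one command per line (assumed framing)."""
    return b"".join(command.encode("ascii") + b"\n" for command in commands)


def send_commands(host: str, commands: Iterable[str],
                  port: int = RASMOL_PORT, timeout: float = 5.0) -> None:
    """Connect to a running RasMol server and send the given command lines."""
    payload = encode_commands(commands)
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(payload)


if __name__ == "__main__":
    # Ending with "exit" closes this session but leaves RasMol running;
    # "quit" would terminate RasMol as well.
    send_commands("localhost", ["select all", "exit"])
```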
Compiling RasWin with Borland. A number of changes have been
made to the source code to allow the Microsoft Windows version of RasMol to compile using
the Borland C/C++ compiler. These fixes include name changes for the standard library and
special code to avoid a bug in _fmemset.