Bundle Example: Read a New File Format
This example describes how to create a ChimeraX bundle that allows ChimeraX to open data files in XYZ format, which is a simple format containing only information about atomic types and coordinates.
The steps in implementing the bundle are:
1. Create a bundle_info.xml file containing information about the bundle,
2. Create a Python package that interfaces with ChimeraX and implements the file-reading functionality, and
3. Install and test the bundle in ChimeraX.
The final step builds a Python wheel that ChimeraX uses to install the bundle, so if the bundle passes testing, it is immediately available for sharing with other users.
Source Code Organization
The source code for this example may be downloaded as a zip-format file containing a folder named tut_read. Alternatively, one can start with an empty folder and create source files based on the samples below. The source folder may be arbitrarily named, as it is only used during installation; however, avoiding whitespace characters in the folder name bypasses the need to type quote characters in some steps.
Sample Files
The files in the tut_read folder are:
tut_read - bundle folder
    bundle_info.xml - bundle information read by ChimeraX
    src - source code to Python package for bundle
        __init__.py - package initializer and interface to ChimeraX
        io.py - source code to read XYZ format files
The file contents are shown below.
bundle_info.xml
bundle_info.xml is an eXtensible Markup Language format file whose tags are listed in Bundle Information XML Tags. While there are many tags defined, only a few are needed for bundles written completely in Python. The bundle_info.xml in this example is similar to the one from the Bundle Example: Add a Tool example, with the changes described after the listing. For explanations of the unchanged sections, please see Bundle Example: Hello World, Bundle Example: Add a Command and Bundle Example: Add a Tool.
 1  <!--
 2  ChimeraX bundle names must start with "ChimeraX-"
 3  to avoid clashes with package names in pypi.python.org.
 4  When uploaded to the ChimeraX toolshed, the bundle
 5  will be displayed without the ChimeraX- prefix.
 6  -->
 7
 8  <BundleInfo name="ChimeraX-TutorialReadFormat"
 9              version="0.1" package="chimerax.tut_read"
10              minSessionVersion="1" maxSessionVersion="1">
11
12    <!-- Additional information about bundle source -->
13    <Author>UCSF RBVI</Author>
14    <Email>chimerax@cgl.ucsf.edu</Email>
15    <URL>https://www.rbvi.ucsf.edu/chimerax/</URL>
16
17    <!-- Synopsis is a one-line description
18         Description is a full multi-line description -->
19    <Synopsis>Example for reading XYZ format files</Synopsis>
20    <Description>Example code for implementing ChimeraX bundle.
21
22  Implements capability for reading XYZ format files and creating
23  ChimeraX atomic structures.
24    </Description>
25
26    <!-- Categories is a list where this bundle should appear -->
27    <Categories>
28      <Category name="General"/>
29    </Categories>
30
31    <!-- Dependencies on other ChimeraX/Python packages -->
32    <Dependencies>
33      <Dependency name="ChimeraX-Core" version="~=1.1"/>
34    </Dependencies>
35
36    <!-- Register XYZ format as one of the supported input file formats -->
37    <Providers manager="data formats">
38      <Provider name="XYZ" suffixes=".xyz" category="Molecular structure"
39                reference_url="https://en.wikipedia.org/wiki/XYZ_file_format"
40                encoding="utf-8" mime_types="chemical/x-xyz" />
41    </Providers>
42
43    <Providers manager="open command">
44      <Provider name="XYZ" />
45    </Providers>
46
47    <Classifiers>
48      <!-- Development Status should be compatible with bundle version number -->
49      <PythonClassifier>Development Status :: 3 - Alpha</PythonClassifier>
50      <PythonClassifier>License :: Freeware</PythonClassifier>
51    </Classifiers>
52
53  </BundleInfo>
The BundleInfo, Synopsis and Description tags are changed to reflect the new bundle name and documentation (lines 8-10 and 17-24).
The Providers sections on lines 36 through 45 use the Manager/Provider protocol to inform the “data formats” manager about the XYZ format, and the “open command” manager that this bundle can open XYZ files.
The attributes usable with the “data formats” manager are described in detail in Defining a File/Data Format. Most formats have an official name longer than “XYZ”, so they typically also specify nicknames and synopsis attributes; neither is needed in this example.
Similarly, the “open command” attributes are described in detail in Opening Files. Typically, name is the only attribute specified.
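To check which attributes each manager will receive from the Providers sections, a short script using only Python's standard xml.etree.ElementTree module can list them. This is an inspection aid, not something ChimeraX requires: the function name providers_by_manager is hypothetical, and the embedded XML is an abbreviated copy of the file above. ChimeraX reads bundle_info.xml itself when building the bundle.

```python
import xml.etree.ElementTree as ET


def providers_by_manager(xml_text):
    """Return {manager name: [provider attribute dicts]} for bundle_info.xml text."""
    root = ET.fromstring(xml_text)
    result = {}
    # Each <Providers> element names its manager; each <Provider> child
    # carries the attributes that manager is told about.
    for providers in root.iter("Providers"):
        entries = result.setdefault(providers.get("manager"), [])
        for provider in providers.findall("Provider"):
            entries.append(dict(provider.attrib))
    return result


# Abbreviated copy of the bundle_info.xml listing, for illustration only.
SAMPLE = """<BundleInfo name="ChimeraX-TutorialReadFormat" version="0.1"
            package="chimerax.tut_read">
  <Providers manager="data formats">
    <Provider name="XYZ" suffixes=".xyz" category="Molecular structure"
              encoding="utf-8" mime_types="chemical/x-xyz" />
  </Providers>
  <Providers manager="open command">
    <Provider name="XYZ" />
  </Providers>
</BundleInfo>"""

for manager, providers in providers_by_manager(SAMPLE).items():
    for attrs in providers:
        print(manager, "->", attrs)
```

To inspect the real file, pass the text of bundle_info.xml (for example, open("bundle_info.xml").read()) instead of SAMPLE.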
src
src is the folder containing the source code for the Python package that implements the bundle functionality. The ChimeraX devel command, used for building and installing bundles, automatically includes all .py files in src as part of the bundle. (Additional files may also be included using bundle information tags such as DataFiles as shown in Bundle Example: Add a Tool.)
The only required file in src is __init__.py. Other .py files are typically arranged to implement different types of functionality. For example, cmd.py is used for command-line commands; tool.py or gui.py for graphical interfaces; io.py for reading and saving files, etc.
src/__init__.py
As described in Bundle Example: Hello World, __init__.py contains the initialization code that defines the bundle_api object that ChimeraX needs in order to invoke bundle functionality. ChimeraX expects bundle_api to be an instance of a class derived from chimerax.core.toolshed.BundleAPI, with methods overridden for registering commands, tools, etc.
# vim: set expandtab shiftwidth=4 softtabstop=4:

from chimerax.core.toolshed import BundleAPI


# Subclass from chimerax.core.toolshed.BundleAPI and
# override the method for opening files,
# inheriting all other methods from the base class.
class _MyAPI(BundleAPI):

    api_version = 1

    # Implement provider method for opening file
    @staticmethod
    def run_provider(session, name, mgr):
        # 'run_provider' is called by a manager to invoke the
        # functionality of the provider.  Since the "data formats"
        # manager never calls run_provider (all the info it needs
        # is in the Provider tag), we know that only the "open
        # command" manager will call this function, and customize
        # it accordingly.
        #
        # The 'name' arg will be the same as the 'name' attribute
        # of your Provider tag, and mgr will be the corresponding
        # Manager instance
        #
        # For the "open command" manager, this method must return
        # a chimerax.open_command.OpenerInfo subclass instance.
        from chimerax.open_command import OpenerInfo
        class XyzOpenerInfo(OpenerInfo):
            def open(self, session, data, file_name, **kw):
                # The 'open' method is called to open a file,
                # and must return a (list of models created,
                # status message) tuple.
                from .io import open_xyz
                return open_xyz(session, data)
        return XyzOpenerInfo()


# Create the ``bundle_api`` object that ChimeraX expects.
bundle_api = _MyAPI()
The run_provider() method is called by a ChimeraX manager when it needs additional information from a provider or needs a provider to execute a task. The session argument is a Session instance, the name argument is the same as the name attribute in your Provider tag, and the mgr argument is the manager instance. These arguments can be used to decide what to do when your bundle offers several Provider tags (possibly to several managers), but since the “data formats” manager never calls run_provider(), we can customize the routine specifically for the “open command” manager and don’t need to check the run_provider() arguments.
When called by the “open command” manager, run_provider() must return an instance of a subclass of chimerax.open_command.OpenerInfo. The methods of the class are thoroughly documented in its API reference, but briefly:
- The open() method is called to actually open/read the file and should return a (models, status message) tuple. The method’s data argument is normally an opened stream encoded as per the format’s encoding attribute (binary if omitted), but it can be a file path if certain Provider attributes were specified (most often, want_path="true").
- If there are format-specific keyword arguments that the open command should handle, then an open_args property should be implemented, which returns a dictionary mapping Python keyword names to Annotation subclasses. Such keywords will be passed to your open() method.
- So long as your open() method accepts a stream, opening compressed files of your format (e.g. with additional suffixes such as .gz, .bz2) will be handled automatically. For path-based openers, such files will result in an error before your opener is called.
- If for some reason the opened file should not appear in the file history, set in_file_history to False.
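The point about compressed files follows from the fact that code which only calls readline() on its stream cannot tell whether the text came from a plain file or from a decompressor. The sketch below demonstrates this with Python's gzip module; it does not show how ChimeraX itself chooses a decompressor, and count_xyz_blocks is a hypothetical helper, not part of the bundle.

```python
import gzip
import io


def count_xyz_blocks(stream):
    """Count XYZ blocks by reading each atom-count line, then skipping
    the comment line plus that many atom lines."""
    blocks = 0
    while True:
        count_line = stream.readline()
        if not count_line.strip():
            # EOF (or a trailing blank line) ends the file in this sketch.
            return blocks
        count = int(count_line)
        for _ in range(count + 1):  # comment line + atom lines
            stream.readline()
        blocks += 1


XYZ_TEXT = "3\nwater\nO 0.0 0.0 0.0\nH 0.957 0.0 0.0\nH -0.240 0.927 0.0\n"

# A plain text stream, like an uncompressed file opened with encoding="utf-8".
plain = io.StringIO(XYZ_TEXT)

# The same data gzip-compressed, then decompressed on the fly as text.
compressed = io.BytesIO(gzip.compress(XYZ_TEXT.encode("utf-8")))
unzipped = gzip.open(compressed, "rt", encoding="utf-8")

# The reader is identical in both cases; only the stream differs.
print(count_xyz_blocks(plain), count_xyz_blocks(unzipped))
```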
src/io.py
  1  # vim: set expandtab shiftwidth=4 softtabstop=4:
  2
  3
  4  def open_xyz(session, stream):
  5      """Read an XYZ file from a file-like object.
  6
  7      Returns the (structures, status message) 2-tuple that the
  8      "open command" provider's OpenerInfo.open() method must return.
  9      """
 10      structures = []
 11      line_number = 0
 12      atoms = 0
 13      bonds = 0
 14      while True:
 15          s, line_number = _read_block(session, stream, line_number)
 16          if s is None:
 17              break
 18          structures.append(s)
 19          atoms += s.num_atoms
 20          bonds += s.num_bonds
 21      status = ("Opened XYZ file containing %d structures (%d atoms, %d bonds)" %
 22                (len(structures), atoms, bonds))
 23      return structures, status
 24
 25
 26  def _read_block(session, stream, line_number):
 27      # XYZ files are stored in blocks, with each block representing
 28      # a set of atoms.  This function reads a single block
 29      # and builds a ChimeraX AtomicStructure instance containing
 30      # the atoms listed in the block.
 31
 32      # First line should be an integer count of the number of
 33      # atoms in the block.
 34      count_line = stream.readline()
 35      if not count_line:
 36          # Reached EOF, normal termination condition
 37          return None, line_number
 38      line_number += 1
 39      try:
 40          count = int(count_line)
 41      except ValueError:
 42          session.logger.error("line %d: atom count missing" % line_number)
 43          return None, line_number
 44
 45      # Create the AtomicStructure instance for atoms in this block.
 46      # All atoms in the structure are placed in one residue
 47      # since XYZ format does not partition atoms into groups.
 48      from chimerax.atomic import AtomicStructure
 49      from numpy import array, float64
 50      s = AtomicStructure(session)
 51      residue = s.new_residue("UNK", 'A', 1)
 52
 53      # XYZ format supplies the atom element type only, but
 54      # ChimeraX keeps track of both the element type and
 55      # a unique name for each atom.  To construct the unique
 56      # atom name, the 'element_count' dictionary is used
 57      # to track the number of atoms of each element type so far,
 58      # and the current count is used to build unique atom names.
 59      element_count = {}
 60
 61      # Next line is a comment line
 62      s.comment = stream.readline().strip()
 63      line_number += 1
 64
 65      # import convenience function for adding atoms
 66      from chimerax.atomic.struct_edit import add_atom
 67
 68      # There should be "count" lines of atoms.
 69      for _ in range(count):
 70          atom_line = stream.readline()
 71          if not atom_line:
 72              session.logger.error("line %d: atom data missing" % (line_number + 1))
 73              return None, line_number
 74          line_number += 1
 75
 76          # Extract available data
 77          parts = atom_line.split()
 78          if len(parts) != 4:
 79              session.logger.error("line %d: atom data malformatted"
 80                                   % line_number)
 81              return None, line_number
 82
 83          # Convert to required parameters for creating atom.
 84          # Since XYZ format only provides the atom element, we
 85          # create a unique atom name by putting a number after
 86          # the element name.
 87          xyz = [float(v) for v in parts[1:]]
 88          element = parts[0]
 89          index = element_count.get(element, 0) + 1
 90          name = element + str(index)
 91          element_count[element] = index
 92
 93          # Create atom in AtomicStructure instance 's',
 94          # set its coordinates, and add to residue
 95          add_atom(name, element, residue, array(xyz, dtype=float64))
 96
 97      # Use AtomicStructure method to add bonds based on interatomic distances
 98      s.connect_structure()
 99
100      # Return AtomicStructure instance and current line number
101      return s, line_number
The open_xyz() function is called from the open() method of the XyzOpenerInfo instance that run_provider() in __init__.py returns, to open an input file in XYZ format. The contents of such a file are a series of blocks, each representing a single molecular structure. Each block in an XYZ format file consists of:
- a line with the number of atoms in the structure,
- a comment line, and
- one line per atom, containing four space-separated fields: element type and x, y, and z coordinates.
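To see the block structure independently of ChimeraX, here is a minimal pure-Python sketch that splits XYZ text into (comment, atoms) blocks. It mirrors the checks in _read_block() below but raises ValueError instead of logging, and, unlike _read_block(), it tolerates blank lines between blocks. It is not part of the bundle; parse_xyz is a hypothetical name.

```python
def parse_xyz(text):
    """Split XYZ-format text into (comment, atoms) blocks.

    atoms is a list of (element, x, y, z) tuples.  Malformed input raises
    ValueError with a 1-based line number.
    """
    lines = text.splitlines()
    blocks = []
    i = 0
    while i < len(lines):
        if not lines[i].strip():
            # Skip blank lines between or after blocks.
            i += 1
            continue
        try:
            count = int(lines[i])
        except ValueError:
            raise ValueError("line %d: atom count missing" % (i + 1)) from None
        # The line after the count is a free-form comment.
        comment = lines[i + 1].strip() if i + 1 < len(lines) else ""
        atoms = []
        for j in range(i + 2, i + 2 + count):
            if j >= len(lines):
                raise ValueError("line %d: atom data missing" % (j + 1))
            parts = lines[j].split()
            if len(parts) != 4:
                raise ValueError("line %d: atom data malformatted" % (j + 1))
            x, y, z = (float(v) for v in parts[1:])
            atoms.append((parts[0], x, y, z))
        blocks.append((comment, atoms))
        i += 2 + count
    return blocks


XYZ_SAMPLE = """3
water
O   0.000  0.000  0.000
H   0.957  0.000  0.000
H  -0.240  0.927  0.000
2
hydrogen molecule
H 0.0 0.0 0.0
H 0.74 0.0 0.0
"""

for comment, atoms in parse_xyz(XYZ_SAMPLE):
    print(comment, len(atoms))
```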
The return value that ChimeraX expects from open_xyz() is a 2-tuple of a list of structures and a status message. The open_xyz() code simply initializes an empty list of structures (line 10) and repeatedly calls _read_block() until the entire file is read (lines 14-20). When _read_block() successfully reads a block, it returns an instance of chimerax.atomic.AtomicStructure, which is added to the structure list (line 18); otherwise, it returns None, which terminates the block-reading loop (lines 16-17). A status message is constructed from the total number of structures, atoms, and bonds (lines 21-22). The structure list and the status message are then returned to ChimeraX for display (line 23).
_read_block() reads and constructs an atomic structure in several steps:
1. Read the number of atoms in the block (lines 32-43).
2. Build an empty atomic structure to which atoms will be added (lines 45-51). The chimerax.atomic.AtomicStructure instance is created on line 50, and a chimerax.atomic.Residue instance is created on line 51. The latter is required because ChimeraX expects every atom in a structure to be part of exactly one residue in the same structure. Even though XYZ format does not support the concept of residues, a dummy one is created anyway.
3. Read the comment line and store it as the structure's comment (lines 61-63).
4. Loop over the expected number of atoms and add them to the structure (lines 68-95). The construction of a chimerax.atomic.Atom instance is somewhat elaborate (lines 83-95). First, the atom parameters are prepared: the atomic coordinates are extracted from the input (line 87), and the atom name is constructed from the element type and an element-specific running index (lines 88-91). The Atom instance is created on line 95, using the convenience function chimerax.atomic.struct_edit.add_atom(), which also adds it to the residue and sets its coordinates.
5. XYZ format files do not have connectivity information, so no bonds are created while processing input lines. Instead, the connect_structure() method of the structure is called to deduce connectivity from interatomic distances (line 98).
6. Return the structure and the current line number to open_xyz() (line 101); on errors, the earlier return statements pass back None instead, signaling failure.
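Two of these steps, unique atom naming (lines 88-91) and deducing connectivity from distances (line 98), can be illustrated without ChimeraX. In this simplified sketch, unique_atom_names follows the same naming scheme as io.py, but guess_bonds is a generic covalent-radius heuristic swapped in for illustration; it is not how connect_structure() is implemented, and the radii table, the tolerance, and both function names are assumptions.

```python
import math

# Approximate single-bond covalent radii in angstroms (Cordero et al., 2008).
# Only a few elements are listed; this table is an illustrative assumption.
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66, "S": 1.05}


def unique_atom_names(elements):
    """Name each atom element + per-element running index (O1, H1, H2, ...)."""
    element_count = {}
    names = []
    for element in elements:
        index = element_count.get(element, 0) + 1
        element_count[element] = index
        names.append(element + str(index))
    return names


def guess_bonds(atoms, tolerance=0.4):
    """Return (i, j) index pairs for atoms no farther apart than the sum of
    their covalent radii plus 'tolerance' angstroms.

    atoms is a list of (element, x, y, z) tuples.  The all-pairs loop is
    O(n**2), which is acceptable only for small molecules.
    """
    bonds = []
    for i, (elem_i, *pos_i) in enumerate(atoms):
        for j in range(i + 1, len(atoms)):
            elem_j, *pos_j = atoms[j]
            cutoff = COVALENT_RADII[elem_i] + COVALENT_RADII[elem_j] + tolerance
            if math.dist(pos_i, pos_j) <= cutoff:
                bonds.append((i, j))
    return bonds


water = [("O", 0.0, 0.0, 0.0), ("H", 0.957, 0.0, 0.0), ("H", -0.240, 0.927, 0.0)]
print(unique_atom_names([a[0] for a in water]))  # ['O1', 'H1', 'H2']
print(guess_bonds(water))                          # [(0, 1), (0, 2)]
```

The two hydrogens in water are about 1.5 Å apart, beyond their 1.02 Å cutoff, so only the two O-H bonds are reported.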
Building and Testing Bundles
To build a bundle, start ChimeraX and execute the command:
devel build PATH_TO_SOURCE_CODE_FOLDER
Python source code and other resource files are copied into a build sub-folder below the source code folder. C/C++ source files, if any, are compiled and also copied into the build folder. The files in build are then assembled into a Python wheel in the dist sub-folder. The file with the .whl extension in the dist folder is the ChimeraX bundle.
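Because a wheel is an ordinary zip archive, its contents can be checked with Python's standard zipfile module, for example to confirm that io.py was packaged. In this sketch the helper names are hypothetical, and picking the most recently modified .whl in dist is an assumption about how one might choose among several builds.

```python
import pathlib
import zipfile


def newest_wheel(source_folder):
    """Return the most recently modified .whl file in source_folder/dist."""
    dist = pathlib.Path(source_folder, "dist")
    wheels = sorted(dist.glob("*.whl"), key=lambda p: p.stat().st_mtime)
    if not wheels:
        raise FileNotFoundError("no .whl file found in %s" % dist)
    return wheels[-1]


def wheel_contents(wheel_path):
    """Return the member names of a wheel, which is a zip archive."""
    with zipfile.ZipFile(wheel_path) as archive:
        return archive.namelist()
```

For example, print(wheel_contents(newest_wheel("tut_read"))), run from the folder containing tut_read after devel build, would list the packaged files.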
To test the bundle, execute the ChimeraX command:
devel install PATH_TO_SOURCE_CODE_FOLDER
This will build the bundle, if necessary, and install the bundle in ChimeraX. Bundle functionality should be available immediately.
To remove temporary files created while building the bundle, execute the ChimeraX command:
devel clean PATH_TO_SOURCE_CODE_FOLDER
Some files, such as the bundle itself, may still remain and need to be removed manually.
Building bundles as part of a batch process is straightforward, as these ChimeraX commands may be invoked directly by using commands such as:
ChimeraX --nogui --exit --cmd 'devel install PATH_TO_SOURCE_CODE_FOLDER exit true'
This example executes the devel install command without displaying a graphics window (--nogui) and exits immediately after installation (exit true). The initial --exit flag guarantees that ChimeraX will exit even if installation fails for some reason.
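When scripting builds from Python rather than a shell, passing the arguments as a list to subprocess.run avoids the shell quoting in the command above. This is a sketch: the executable name ChimeraX is an assumption (the actual program name and path depend on the platform and installation), the helper names are hypothetical, and the returned exit status is passed through without assuming what ChimeraX reports on failure. Following the earlier advice about folder names, it simply rejects paths containing whitespace rather than attempting any quoting inside the ChimeraX command.

```python
import subprocess


def devel_install_argv(source_folder, chimerax="ChimeraX"):
    """Argument list equivalent to:
    ChimeraX --nogui --exit --cmd 'devel install FOLDER exit true'
    """
    folder = str(source_folder)
    if any(ch.isspace() for ch in folder):
        # Keep the sketch simple: no quoting inside the ChimeraX command.
        raise ValueError("source folder path contains whitespace: %r" % folder)
    return [chimerax, "--nogui", "--exit", "--cmd",
            "devel install %s exit true" % folder]


def devel_install(source_folder, chimerax="ChimeraX"):
    """Run the batch install and return the process exit status."""
    # A list (not a shell string) is passed directly to the program,
    # so no shell quoting is involved.
    result = subprocess.run(devel_install_argv(source_folder, chimerax))
    return result.returncode
```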
Distributing Bundles
Because ChimeraX bundles are packaged as standard Python wheel-format files, they can be distributed as plain files and installed using the ChimeraX toolshed install command. Thus, electronic mail, web sites and file sharing services can all be used to distribute ChimeraX bundles.
Private distributions are most useful during bundle development, when circulation may be limited to testers. When bundles are ready for public release, they can be published on the ChimeraX Toolshed, which is designed to help developers by eliminating the need for custom distribution channels, and to aid users by providing a central repository where bundles with a variety of different functionality may be found.
Customizable information for each bundle on the toolshed includes its description, screen captures, authors, citation instructions and license terms. Automatically maintained information includes release history and download statistics.
To submit a bundle for publication on the toolshed, you must first sign in. Currently, only Google sign-in is supported. Once signed in, use the Submit a Bundle link at the top of the page to initiate submission, and follow the instructions.
The first time a bundle is submitted to the toolshed, it is held for inspection by the ChimeraX team, which may contact the authors for more information. Once approved, all subsequent submissions of new versions of the bundle are posted immediately on the site.
What’s Next
Bundle Example: Add a Tool (previous topic)
Bundle Example: Read a New File Format (current topic)
Bundle Example: Save a New File Format (next topic)