Atom Specification (ChimeraX)

Command-Line Target Specification

Most commands require or allow specifying which items they should affect. Which types of items are accepted depends on the specific command:

Atomic models and their parts (atoms/bonds, pseudobonds, residues, chains) and associated molecular surfaces can be specified using:
- hierarchical specifiers – model number, chain ID, residue name or number, atom name
- built-in classifications – sel (current selection), protein, helix, strand, ligand, solvent, hbonds, element symbols, functional groups, etc.
- user-defined targets – named selections or other targets defined with name
- attribute names and values
- zones – by distance from other items
- combinations of the above

Nonatomic models such as nonmolecular surfaces or maps (volume models) can be specified in a more limited set of ways, namely:
- hierarchical specifiers – model number
- built-in classifications – sel (the current selection, which could include nonatomic models)

Specification strings may contain embedded spaces, and a blank specification (where allowed) means “all.” Specification in ChimeraX is generally similar to that in Chimera, but there are differences.


Hierarchical Specifiers

Symbol	Reference Level	Definition	Examples
#	model	model number assigned to the data in ChimeraX (hierarchical, with positive integers separated by dots: N, N.N, N.N.N, etc.)	#1 #1.3
/	chain	chain identifier (case-insensitive unless both upper- and lowercase chain IDs are present)	/A
:	residue	residue number OR residue name (case-insensitive)	:51 :glu
@	atom	atom name (case-insensitive)	@ca

Each set of atomic coordinates is a model with an associated model identification (ID) number. Three-dimensional datasets other than atomic coordinates are also assigned model ID numbers. A model can be read from a file, derived from another model, or created from scratch.

Model numbers are assigned automatically or, in some cases, specified by the user. They are hierarchical, with any number of levels (positive integers N, N.N, etc.); for example, #1.1, #1.2, ... #1.10 could be 10 structures in an NMR ensemble. A submodel (a model at a lower level) is still an entire model, but the hierarchy allows grouping: submodels at any level can be specified individually by their own numbers, or collectively by their parent model number. A parent model alone (without its submodels) can be specified with #! before the number, but this is needed only in a few specific situations, such as hiding an entire branch of the model hierarchy without changing the individual display settings of submodels within the branch.
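The parent/submodel matching described above can be pictured with a short Python sketch. It only illustrates the documented behavior and is not ChimeraX's implementation; the function names and the tuple representation of model numbers are assumptions made for this example.

```python
def parse_id(text):
    """Turn a dotted model number such as '1.2' into a tuple (1, 2)."""
    return tuple(int(part) for part in text.split("."))


def matches_model(spec, model_id, parent_only=False):
    """Return True if model_id is covered by the model number spec.

    spec        -- dotted number as typed after '#', e.g. '1' or '1.2'
    model_id    -- dotted number of an open model, e.g. '1.2.1'
    parent_only -- True models the '#!' form: the parent alone, no submodels
    """
    wanted, actual = parse_id(spec), parse_id(model_id)
    if parent_only:
        return actual == wanted
    # A parent number stands for itself and every model below it.
    return actual[:len(wanted)] == wanted


# Hypothetical set of open models.
open_models = ["1", "1.1", "1.2", "1.10", "2", "10"]
covered_by_1 = [m for m in open_models if matches_model("1", m)]
only_parent_1 = [m for m in open_models if matches_model("1", m, parent_only=True)]
```

Comparing tuples of integers rather than strings keeps model 10 from being mistaken for a submodel of model 1.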

An atomic model contains one or more chains, each chain contains one or more residues, and each residue contains one or more atoms. Thus, an atom can be specified by model number, chain identifier (ID), residue number, and atom name. The lack of a specifier is interpreted as all units at the associated level; for example, if no chain ID is given, the specification refers to all chains.

Specifying a set of atoms also specifies any bonds and pseudobonds between pairs of atoms in the set, unless the first character in the entire specification is an equals symbol (=). Starting the specification with = prevents implicitly including bonds and pseudobonds when only their endpoint atoms have been specified explicitly. A pseudobond model can also be specified directly by its own model number (not necessarily within the atomic-model hierarchy). Markers and links are basically the same as atoms and bonds and can be specified in the same ways.

Chain IDs, residue names and numbers, and atom names are read from the input file. In PDB format, a standard nomenclature is used for standard amino acid and nucleic acid residues. Asterisks (*) in PDB input atom names will be translated to prime symbols ('). Residue names containing at least one letter can be used directly with the colon symbol. However, to avoid interpretation as residue numbers, residue names containing numbers only should be specified as attribute values instead (e.g., ::name="276").

Capitalization

In the command line, capitalization of chain IDs, residue names, and atom names is not important, with one exception: when a model contains both uppercase and lowercase chain identifiers, case matters for chain specification in that model only.
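As a rough illustration of the chain-ID exception, the following Python sketch compares a typed chain ID with one read from a model. The function name and the way "both uppercase and lowercase" is detected (any uppercase letter and any lowercase letter among the model's chain IDs) are assumptions for this example, not a description of ChimeraX internals.

```python
def chain_matches(typed, chain_id, chain_ids_in_model):
    """Compare a typed chain ID with a model's chain ID per the rule above.

    Case matters only if the model has both upper- and lowercase chain IDs.
    """
    has_upper = any(c.isupper() for cid in chain_ids_in_model for c in cid)
    has_lower = any(c.islower() for cid in chain_ids_in_model for c in cid)
    if has_upper and has_lower:
        return typed == chain_id
    return typed.lower() == chain_id.lower()
```

With chains A and B only, typing /a would match chain A; in a model that has both chain A and chain a, /a would match only chain a.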

Lists and Ranges

Multiple model numbers or residue numbers can be entered as comma-separated lists of individual numbers and/or one or more ranges of the form start-end. There cannot be a space next to a hyphen. The word start or end can be substituted for the start or end value to extend the range to the first or last number possible, respectively. An asterisk (*) can be used in the place of either start or end, in addition to its other uses as a wild card.

Chain IDs, residue names, and atom names (all of which are typically non-numerical) can be entered as comma-separated lists. Ranges of chain IDs are also allowed, in which case ordering is alphabetical.

Examples:

#1

– all atoms in model 1, including any models lower in the hierarchy (1.N, 1.N.N, etc.)

#1/B-D,F

– chains B, C, D, and F in model 1

:start-40

– residue numbers up to 40 in all chains

#1,2:50,70-85@ca

– atoms named CA in residues 50 and 70-85 in models 1 and 2

/a,d-f:43-256

– residues 43-256 in chains A, D, E, and F

:12-25,48@ca,n

– atoms named CA and N in residues 12-25 and 48

:lys,arg

– lysine and arginine residues

:lys,arg@cb

– atoms named CB in lysine and arginine residues

/A@n,ca,c,o

– atoms named N, CA, C, O in chain A

/A:195,221@n,ca,c,o

– atoms named N, CA, C, O in residues 195 and 221 of chain A

#1.2-end

– all submodels of 1 except 1.1

#2.1-3,5

– models 2.1, 2.2, 2.3, and 2.5 (submodels 1-3 and 5 of model 2)

#5,2.1-3

– models 5.1, 5.2, 5.3, 2.1, 2.2, and 2.3 (submodels 1-3 of models 5 and 2)
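The list-and-range notation for numbers can be sketched in Python as follows. This is a simplified model that assumes non-negative numbers and no spaces; `expand_numbers` and its `first`/`last` parameters are hypothetical names, and the sketch does not check whether each number actually exists in a structure.

```python
def expand_numbers(text, first, last):
    """Expand a list such as '50,70-85' or 'start-40' into sorted integers.

    first, last -- the lowest and highest numbers available, substituted
                   when 'start', 'end', or '*' appears in a range.
    """
    numbers = set()
    for item in text.split(","):
        if "-" in item:
            low, high = item.split("-")
            low_value = first if low in ("start", "*") else int(low)
            high_value = last if high in ("end", "*") else int(high)
            numbers.update(range(low_value, high_value + 1))
        else:
            numbers.add(int(item))
    return sorted(numbers)
```

For example, `expand_numbers("12-14,48", 1, 300)` would give 12, 13, 14, and 48, mirroring the residue part of :12-25,48 on a smaller scale.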

Implicit Operations

When the hierarchical symbols are used in descending order (# / : @), each successive level only specifies items within the broader specification that came before it. The hierarchy can be “reset” for lower levels, however, by repeating or returning to a higher level. Examples:

:12,14@CA

– atoms named CA in residues 12 and 14 (successive narrowing, as in previous examples)

:12:14@CA

– all atoms of residue 12, CA atom of residue 14

/A/B:12-20@CA:14@N
- or -
/B:12-20@CA:14@N/A

– all atoms of chain A, CA atoms of residues 12-20 and N atom of residue 14 in chain B

/a:10-20,26/b:12-22,29@n,ca,c,o

– all atoms of chain A residues 10-20 and 26, plus atoms named N, CA, C, O in chain B residues 12-22 and 29

/a:10-20,26@n,ca,c,o/b:12-22,29@n,ca,c,o
- or -
/a:10-20,26/b:12-22,29 & @n,ca,c,o

– atoms named N, CA, C, O in chain A residues 10-20, 26 and in chain B residues 12-22, 29 (& means intersection, see combinations)

#1,2.1

– models 1.1 and 2.1

#1#2.1

– models 1 (including any submodels) and 2.1
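One way to picture the "reset" rule is to split a specifier into independent terms, each of which narrows from broad to specific. The Python sketch below is an illustrative model consistent with the examples above, not ChimeraX's parser; it ignores attribute symbols (@@, ::, //, ##), the #! and = prefixes, zones, and combination operators.

```python
import re

LEVELS = "#/:@"  # model, chain, residue, atom: broadest to narrowest


def split_terms(spec):
    """Split a hierarchical specifier into independent narrowing terms.

    Each term is a dict mapping a level symbol to the text typed after it.
    Repeating a level, or returning to a broader one, closes the current
    term and starts a new one that keeps only the broader levels.
    """
    terms, current = [], {}
    for symbol, value in re.findall(r"([#/:@])([^#/:@]*)", spec):
        level = LEVELS.index(symbol)
        if any(LEVELS.index(s) >= level for s in current):
            terms.append(current)
            current = {s: v for s, v in current.items()
                       if LEVELS.index(s) < level}
        current[symbol] = value
    if current:
        terms.append(current)
    return terms
```

Under this model, :12:14@CA yields two terms (residue 12 alone, then residue 14 narrowed to CA), and /A/B:12-20@CA:14@N yields three, with chain B carried over into the last term.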

Wild Cards

The wild card * matches parts of atom and/or residue names. Similarly, the single-character wild card ? matches single characters. Square brackets [ ] indicate a set of characters to substitute individually, and can also be used to “escape” a single character that would otherwise have a special meaning, i.e., to force interpreting that character literally. Examples:

@S*

– atoms with names starting with S

#2:G??

– residues in model 2 with three-letter names starting with G

@c[ab]

– atoms named CA and atoms named CB (unordered)

:[*][*][*]

– residues named ***

:fmn@?1

– atoms in residue FMN with two-letter names ending with 1

@h,h?,h??

– atoms with one-, two-, or three-letter names starting with H

The wild card * can also signify “all,” for example, all atoms in a residue or all residues in a model. Since blank indicates the same thing, this is really only needed in the middle of a specification where a blank or omitted character would not be accepted, for example:

#*.1-3

– submodels 1-3 of all models that have them
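The name-matching wild cards resemble shell-style globbing, so Python's standard fnmatch module can serve as a rough stand-in for experimenting with patterns. This is an analogy only: ChimeraX's own matching may differ in details, and fnmatch also accepts forms (such as [!x]) that are not described here. The helper name and the atom names are made up for the example.

```python
from fnmatch import fnmatchcase


def names_matching(pattern, names):
    """Return the names that match a wild-card pattern, ignoring case."""
    return [n for n in names if fnmatchcase(n.upper(), pattern.upper())]


# Hypothetical atom names.
atom_names = ["CA", "CB", "CG", "SG", "SD", "H", "HA", "HB2", "N1", "O1"]
starts_with_s = names_matching("S*", atom_names)
ca_or_cb = names_matching("c[ab]", atom_names)
two_letters_ending_1 = names_matching("?1", atom_names)
literal_stars = names_matching("[*][*][*]", ["***", "ALA"])
```

Both the names and the pattern are uppercased before comparison to mimic the case-insensitive matching of atom and residue names.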


Built-in Classifications

Built-in classifications include:

the mutually exclusive categories (membership rules given below): solvent, ions, ligand, main
biopolymer types protein, nucleic or nucleic-acid, and their subparts:
- sidechain – amino acid sidechain + CA (for connectivity to cartoon representations), nucleic acid base + ribose (see below), and any directly attached hydrogens
- sideonly – amino acid sidechain, nucleic acid base, and any directly attached hydrogens
- mainchain (or backbone) – the complement of sideonly, namely: peptide N, CA, C, O, nucleic acid phosphoribosyl backbone, and any directly attached hydrogens
- min-backbone – a continuous series of bonded atoms along a biopolymer chain (-[N-CA-C]_n- in peptides and -[O5'-C5'-C4'-C3'-O3'-P]_n- in nucleic acids)
- ribose – backbone ribose and deoxyribose moieties
protein/peptide secondary structure types: helix, strand, coil
element symbols: C, Fe, etc. as in the Select menu
ChimeraX atom types: Car, N3+, O2, etc. as in the Select menu (H is both an element symbol and an atom type, but element “wins”)
functional groups: amide, disulfide, etc. as in the Select menu, except in lowercase and with hyphens instead of spaces in multiword names (for example, aromatic-ring)
template-mismatch – standard biopolymer residues and a few other commonly encountered residue types (water and peptide-capping groups) that differ from their respective templates in the number of atoms of any element other than hydrogen; such a difference usually indicates missing atoms (e.g., a truncated sidechain), but occasionally extra ones (e.g., a 5' nucleic acid residue with a phosphate group)
sel – the current selection
sel-residues – residues with any selected atoms
pbonds – all pseudobonds
- pbondatoms – all pseudobonds and their endpoint atoms
hbonds – H-bond pseudobonds (those in a pseudobond model named hydrogen bonds)
- hbondatoms – H-bond pseudobonds and their endpoint atoms
last-opened – the top-level model most recently opened, not necessarily the highest model number
all – all data applicable to the specific command (can be used in certain commands where a blank specification is not accepted)

Built-in classifications cannot be redefined by the user. A full list can be shown in the Basic Actions tool or with the command: name list builtins true

Category	Membership rules, in order of application
solvent	of the following two candidate sets, the one with the greater number of residues: (1) “small solvent” candidate set: residues of up to 3 atoms named WAT, HOH, and DOD, plus singleton atoms (i.e., not covalently bonded to other atoms) of atomic number 6-8 in single-atom residues; (2) “other solvent” candidate set: excluding residues in the “small solvent” set, the most prevalent type of residue that is not covalently bonded to other residues, has ≤ 10 atoms per residue, and is present in at least 10 copies in the structure
ions	non-solvent singleton atoms other than noble gases, plus covalently bonded groups of ≤ 4 atoms (not counting hydrogens) in the same residues as those singletons
ligand	singleton atoms that are noble gases; single residues or bonded sequences of residues with < 10 residues per bonded sequence, < 250 atoms, and < 1/4 the number of atoms in the largest bonded sequence of residues in the model; residues bonded to a chain but not included in its main sequence (e.g., retinal in rhodopsin, glycosylations)
main	all remaining atoms

Examples:

helix & :arg,lys

– arginine and lysine residues in α-helices (using & for intersection, see combinations)

nucleic & backbone

– nucleic acid ribose-phosphate backbone

Car & :phe,tyr
- or -
aromatic-ring & :phe,tyr

– aromatic ring carbons of phenylalanine and tyrosine

H & ~HC

– polar hydrogens (those not bonded to carbon)

carboxylate

– atoms in carboxylate groups


User-Defined Targets

With the name command, users can assign a name to a selection or to a target specification string for easy use in later commands. For example:

name tm1 /a:34-64
name tm2 /a:70-101
color tm1 medium blue
color tm2 deep sky blue

Built-in classifications cannot be redefined by the user.


Attributes

Attributes are properties specified by name and value, indicated with symbols: @@ for atom attributes, :: for residue attributes, // for chain attributes, and ## for atomic-model attributes. Custom attributes can be created with setattr.

Attribute names are case-sensitive. Attribute values that are character strings or color names should be enclosed in quotation marks if they contain spaces or characters with special meanings in the command line (#, :, and others). In unquoted string values, * (wild card), ? (single-character wild card), and square brackets [ ] enclosing alternative single-character matches can be used as described above for atom and residue names.

Attribute-Test Operators
symbol	meaning
=	equals (or case-insensitive string match)
!=	does not equal, string does not match
==	string match, case-sensitive
!==	string does not match, case-sensitive
>	greater than
<	less than
>=	greater than or equal to
<=	less than or equal to
^	(before attribute name) attribute not assigned

The examples below use & for intersection and ~ for negation (see combinations):

@@display

– atoms that are displayed

~@@display

– atoms that are not displayed

@@num_alt_locs>1

– atoms with alternate locations

@ca & @@bfactor>40

– atoms named CA with B-factor values over 40

:asn & @@bfactor>40

– atoms with B-factor values over 40 in asparagine residues

@@bfactor>=20 & @@bfactor<=40

– atoms with B-factor values ranging from 20 to 40

C & ~ @@idatm_type=Car

– non-aromatic carbon atoms (see atom types)

::num_atoms>=10

– residues with 10 or more atoms

::^chi3

– residues without a chi3 angle

//identity="#1/A,B"

– chains with sequences the same as either chain A or chain B in model 1

##name="2gbp map 5"

– any model named “2gbp map 5”
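The attribute-test operators can be modeled in Python as a small lookup table. This sketch is not ChimeraX code: items are plain dictionaries, wild cards in string values are not handled, and two behaviors are assumptions for the example, namely that != is case-insensitive like = and that a bare attribute name tests whether the value is true.

```python
import operator


def _same_text(a, b):
    """Case-insensitive equality for strings, plain equality otherwise."""
    if isinstance(a, str) and isinstance(b, str):
        return a.lower() == b.lower()
    return a == b


TESTS = {
    "=": _same_text,
    "!=": lambda a, b: not _same_text(a, b),  # assumed case-insensitive
    "==": operator.eq,
    "!==": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
}


def attribute_test(item, name, op=None, value=None):
    """Apply one attribute test to item (a dict of attribute values).

    op=None models a bare name such as @@display (assumed: value is true);
    op='^' models a form such as ::^chi3 (attribute not assigned).
    """
    if op == "^":
        return item.get(name) is None
    if item.get(name) is None:
        return False
    if op is None:
        return bool(item[name])
    return TESTS[op](item[name], value)


# Hypothetical atom with three attributes.
atom = {"bfactor": 42.0, "idatm_type": "Car", "display": True}
```

For instance, `attribute_test(atom, "bfactor", ">", 40)` corresponds to @@bfactor>40, and `attribute_test(atom, "chi3", "^")` to the "not assigned" test.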


Zones

Atoms, residues or atomic models within or beyond some distance of a reference set of atoms can be specified in the command line as a zone. See also: select zone, surface zone, volume zone, zone, Select Contacts

A zone specification has the following parts:

reference-atom specification
zone operator, comprising both:
- a level symbol: @ (atom-based cutoff), : (residue-based), / (chain-based), or # (atomic-model-based)
- an inequality symbol: < (less than; for zones, also includes equal to) or > (greater than)
distance cutoff

Examples:

@nz @< 3.8

– atoms within 3.8 Å of atoms named NZ

#1:gtp :< 10.5

– residues with any atom within 10.5 Å of any atom in GTP residue(s) of model 1

#1:gtp :> 10.5

– residues farther than 10.5 Å from any atom in GTP residue(s) of model 1; the complement of the previous example

Zone specifications are also useful in combination with other types of specifications.
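The difference between atom-based and residue-based cutoffs can be seen in a small Python sketch. The coordinates, residue labels, and function names are invented for illustration, and the brute-force distance check is simply the most direct way to express the idea, not a statement about how ChimeraX computes zones.

```python
from math import dist

# Hypothetical atoms: (residue label, atom name, (x, y, z) in angstroms).
atoms = [
    ("GTP1", "PG", (0.0, 0.0, 0.0)),
    ("LYS16", "NZ", (3.0, 0.0, 0.0)),
    ("LYS16", "CA", (8.0, 0.0, 0.0)),
    ("ALA40", "CA", (20.0, 0.0, 0.0)),
]


def atom_zone(reference, candidates, cutoff):
    """Atoms within cutoff (inclusive) of any reference atom, like '@<'."""
    return [a for a in candidates
            if any(dist(a[2], r[2]) <= cutoff for r in reference)]


def residue_zone(reference, candidates, cutoff):
    """All atoms of residues having any atom within cutoff, like ':<'."""
    near = {a[0] for a in atom_zone(reference, candidates, cutoff)}
    return [a for a in candidates if a[0] in near]


gtp = [a for a in atoms if a[0] == "GTP1"]
within_atoms = atom_zone(gtp, atoms, 4.0)
within_residues = residue_zone(gtp, atoms, 4.0)
beyond_residues = [a for a in atoms if a not in within_residues]  # like ':>'
```

Here the atom-based zone picks up only the lysine NZ (plus the reference atom itself), whereas the residue-based zone takes the whole lysine, including its more distant CA.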


Combinations

Atom specifications can be combined with the operators:

& for intersection (AND)
| for union (OR)
~ for negation (NOT)

Parentheses can be used to indicate the desired order of multiple operations. Accordingly, at least one parenthesis of each pair should be adjacent to, or separated only by spaces from, a combination operator or zone operator. When & and | occur in the same list, & has higher priority (&-separated lists can be considered as grouped within parentheses).

Examples:

/A & protein

– chain A protein residues only

/A & ~ :hem

– chain A residues other than HEM

@ca & #1/B,E#2/A,D

– atoms named CA in chains B, E of model 1 and chains A, D of model 2

#1:asp,glu & (#2 :< 10)

– aspartate and glutamate residues in model 1 that are within 10 Å of model 2

(ligand | ions) @< 4.8

– atoms within 4.8 Å of ligand or ions

ligand | (ions @< 4.8)

– ligand plus atoms within 4.8 Å of ions

(ions @< 4) & ~ions

– atoms within 4 Å of ions, excluding the ions themselves

Ng+ | N3+

– guanidinium nitrogens and sp³-hybridized, formally positive nitrogens (see atom types)

:cys@sg & ~disulfide
- or -
:cys & S & ~disulfide

– cysteine sulfur atoms not participating in a disulfide bond

:phe,tyr & sidechain

– phenylalanine and tyrosine sidechains (including CA)

sidechain & ligand :<4

– sidechains (including peptide CA and nucleic acid ribose) of residues with any atom within 4 Å of ligand

sideonly & ligand @<4

– sidechain atoms within 4 Å of ligand
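The combination operators behave like set operations, and Python sets happen to use the same symbols with the same relative priority of & over |. The sketch below uses made-up atom labels as stand-ins for the results of individual specifications; `negate` is a hypothetical helper modeling ~ as the complement within all atoms.

```python
# Hypothetical atom labels standing in for the results of specifications.
everything = {"A:HEM@FE", "A:HIS@NE2", "A:CYS@SG", "B:ZN@ZN", "B:HOH@O"}
ligand = {"A:HEM@FE"}
ions = {"B:ZN@ZN"}
protein = {"A:HIS@NE2", "A:CYS@SG"}
near_ions = {"B:ZN@ZN", "B:HOH@O", "A:CYS@SG"}  # stands in for: ions @< 4.8


def negate(atoms):
    """Model '~': everything that is not in atoms."""
    return everything - atoms


# Without parentheses, '&' is applied before '|'.
ungrouped = ligand | near_ions & protein
and_first = ligand | (near_ions & protein)
or_first = (ligand | near_ions) & protein

# Like '(ions @< 4) & ~ions': the zone minus the reference atoms.
near_but_not_ions = near_ions & negate(ions)
```

The ungrouped expression gives the same result as `and_first` (the ligand plus the nearby protein atom), while `or_first` drops the ligand because it is not in the protein set.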

UCSF Resource for Biocomputing, Visualization, and Informatics / March 2024