A systematic approach to coding chemical, biological and pharmaceutical substances in a hierarchical manner, using 3-digit and/or 10-digit codes, was published in 2005 in a special issue of the Journal of Toxicology and Environmental Health, Part B (Volume 8, Numbers 3-5, pages 145-452), with Pierre R. Band and Daniel Krewski as Guest Editors.

This “hierarchical coding system” was developed by the Cancer Control Research Program of the British Columbia Cancer Agency in collaboration with the Department of Chemical and Biological Engineering at the University of British Columbia. The first level of coding uses 3-digit codes to classify a substance according to its use in an occupational environment (e.g., pesticide, catalyst). At the second level (level II), substances are coded with 10-digit codes that indicate their structure and composition. Substances that have no application in industry can be coded on structure and composition alone. The system provides codes for 99 general categories, each with up to 10 subcategories. The coding of these substances is in accordance with the nomenclature rules of the International Union of Pure and Applied Chemistry (IUPAC) and the International Union of Biochemistry (IUB). However, the coding of complex biomolecules (e.g., influenza vaccine) was based on plant or animal taxonomy.

This hierarchical coding system provides flexibility in analyzing substances according to their 3-digit codes, their 10-digit codes, or both. In addition, the system allows similar substances to be grouped for the purposes of analysis. The 3-digit codes, which are based on the industrial application of substances, are predominantly used in occupational epidemiology studies.
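As a rough illustration of the grouping described above, the following Python sketch groups substances by their 3-digit code, their 10-digit code, or both. The record layout, the function name group_by and every code value shown are hypothetical placeholders chosen for the example; none of them is taken from the published coding manual.

```python
from collections import defaultdict

# Hypothetical records: (substance label, 3-digit use code, 10-digit structure code).
# All code values are arbitrary placeholders, not codes from the published manual.
records = [
    ("substance A", "101", "1000000001"),
    ("substance B", "101", "2000000002"),
    ("substance C", "102", "1000000001"),
    ("substance D", None, "2000000002"),  # no industrial application, so no 3-digit code
]


def group_by(records, level):
    """Group substance labels by 3-digit code, 10-digit code, or both."""
    groups = defaultdict(list)
    for label, use_code, structure_code in records:
        if level == "use":
            key = use_code
        elif level == "structure":
            key = structure_code
        elif level == "both":
            key = (use_code, structure_code)
        else:
            raise ValueError(f"unknown level: {level!r}")
        groups[key].append(label)
    return dict(groups)


by_use = group_by(records, "use")
by_structure = group_by(records, "structure")
by_both = group_by(records, "both")
```

Grouping on the pair of codes keeps apart substances that share an industrial use but differ in structure, while grouping on a single code pools them.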

The first two digits of the 3-digit codes represent the general category of a substance (e.g., pesticide), while the third digit indicates the specific subcategory (e.g., fungicide, herbicide). The first digit of the 10-digit codes indicates the overall composition of a substance, differentiating between inorganic substances, organic compounds, pharmaceuticals, biomolecules (e.g., amino acids, enzymes) and other substances of biological origin (e.g., influenza vaccine). The coding strategy for digits 2 through 10 depends on the broad group of substances (i.e., on the first digit). For example, digits 2 through 10 of the 10-digit code for an inorganic substance were based on the properties of the principal elements and the secondary groups present in the substance. Similarly, organic compounds were coded based on the parent and functional groups of the substances. Pharmaceutical agents were coded based on biological origin, route of administration, drug description and identification, and the presence of metals and/or halogen atoms. Biomolecules were coded based on biological origin and the complexity of the substances (antibody, heterogeneous biomolecules or single biomolecules).
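The digit layout described above can be sketched in Python as follows. This is a minimal illustration that only splits a code into its positional parts; the class and function names are hypothetical, the example codes are arbitrary, and no meaning is assigned to any particular digit value because the text here does not list them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCode:
    category: str     # digits 1-2: general category (e.g., pesticide)
    subcategory: str  # digit 3: specific subcategory (e.g., fungicide)


@dataclass(frozen=True)
class StructureCode:
    composition: str  # digit 1: overall composition group
    detail: str       # digits 2-10: interpretation depends on the first digit


def _check(code: str, length: int) -> None:
    # Accept only strings made of exactly `length` ASCII digits.
    if len(code) != length or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected a {length}-digit code, got {code!r}")


def parse_use_code(code: str) -> UseCode:
    """Split a 3-digit code into general category and subcategory."""
    _check(code, 3)
    return UseCode(category=code[:2], subcategory=code[2])


def parse_structure_code(code: str) -> StructureCode:
    """Split a 10-digit code into the composition digit and the remaining nine."""
    _check(code, 10)
    return StructureCode(composition=code[0], detail=code[1:])


# Arbitrary example codes, not taken from the published manual.
use = parse_use_code("123")
structure = parse_structure_code("1234567890")
```

Codes are kept as strings rather than integers so that leading zeros are preserved and each digit position can be read directly.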

This system also allows coding of complex substances, such as inorganic acid or basic salts, substances containing both inorganic and organic compounds, and other complex biomolecules by building on the framework developed for their respective simpler substances. The coding manual also provides various examples for coding these substances and elaborates on the variations in codes between different groups.

This coding system, however, has some limitations. Organic substances with more than two functional groups, and inorganic substances with more than two cations and anions, cannot be coded with complete specificity. Isomers of organic compounds also cannot be differentiated. A university-level background in chemistry is required to use the system; however, it is fairly easy to learn for individuals who intend to use it for the purpose of analysis.

Keefe AR, Bert JL, Grace JR, Makaroff SJ, Lang BJ, Band PR. A hierarchical approach to coding chemical, biological and pharmaceutical substances. J Toxicol Environ Health Part B Crit Rev. 2005;8(3-5):145-452.

