Clinical Vocabularies: Essential to the Future of Health Information Management

Susan H. Fenton, MBA, RHIA


The field of clinical vocabularies is incredibly dynamic. In this paper you will find topics and research from the past that support the principle that a clinical vocabulary is essential. The presentation at the conference will include the latest information on the available vocabularies.

This presentation will give a very brief overview of the history of nomenclatures and classification systems, as well as the differences among nomenclatures, classification systems, and vocabularies. It will focus on the projected uses of vocabularies in electronic health records (EHRs) and provide an orientation to concepts and data formalization. If you would like to know more about the history of classification systems, see Huffman, 10th Edition, Chapter 9.

Brief History of Classification Systems

London Bills of Mortality and Florence Nightingale

It can sometimes be difficult to believe that the quest for classifying morbidity and mortality is quite old. It was 1532 when the London parishes began keeping death records. (Encyclopaedia Britannica [EB] Online)  In 1662, John Graunt, a merchant, wrote Natural and Political Observations...Made Upon the Bills of Mortality (1662). (EB Online)  His friend, Sir William Petty, was able to extrapolate from mortality rates an estimate of community economic loss caused by deaths. (EB Online)  Go to to see how this very old data is still used today.

It was two hundred years later (1863) when Florence Nightingale, in her Notes on Hospitals, wrote "In attempting to arrive at the truth, I have applied everywhere for information, but in scarcely an instance have I been able to obtain hospital records fit for any purposes of comparison. If they could be obtained ... they would show subscribers how their money was being spent, what amount of good was really being done with it, or whether the money was not doing mischief rather than good." (Barnett, p.1046)

What We Currently Use

International Classification of Diseases

William Farr and Jacques Bertillon were primarily responsible for developing the Bertillon Classification of Diseases in 1893. (World Health Organization [WHO] Web site) The French government convened the first international conference to update the Bertillon Classification in 1900. (WHO Web site) At that time, it was renamed the International Classification of Causes of Death. As we all know, this eventually became the International Classification of Diseases, 10th Revision (ICD-10). The WHO Web site has additional information about the original versions. Be cautioned that, beyond the first page, the documents are all in French. ICD-10, the WHO edition, is already being used in the US to report causes of death.

ICD-10-CM has been developed in the US by the National Center for Health Statistics. ICD-10-CM will include the disease codes only. ICD-10-PCS, the procedure coding system, is being developed by CMS. Although the National Committee on Vital and Health Statistics (NCVHS) has recommended both systems for adoption, their implementation continues to be delayed.

Current Procedural Terminology

CPT was first released by the American Medical Association in 1966. However, it was used sparingly until the early 1980s, when the Health Care Financing Administration (now CMS) mandated its use for reimbursement. It is important to keep in mind that CPT was developed for physicians by physicians.

Concepts and Data Formalization


What are concepts? Why are they important in clinical vocabularies? 

What are concepts? The International Organization for Standardization (ISO), in standard 1087, defines a concept as "a unit of thought constituted through abstraction on the basis of characteristics common to a set of objects." For Jim Cimino, a clinical vocabulary researcher, this means that every term must correspond to at least one meaning and to no more than one meaning, and that each meaning must correspond to no more than one term, though synonyms are allowed. In essence, a concept is a specific idea or thought. It may have more than one term, but it cannot have more than one meaning.

Why are they important in clinical vocabularies? Concepts are important because they are the base upon which the vocabulary is built. For example, let's take the concept "lung." The concept "lung" has one meaning: an organ used for breathing. As soon as you modify "lung" with "right," "left," or "lower lobe," you change the meaning, and therefore you have another concept. Concepts allow health information to attain the degree of detail necessary for accurate and complete understanding, as well as for use in computer applications.
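To make the concept principle concrete, here is a minimal Python sketch of a toy vocabulary. The identifiers (C0001 and so on) and the class and function names are hypothetical and do not come from any real terminology. The sketch enforces the rule described above: a term may name only one concept, while a concept may carry several synonyms, so "lung" and "right lung" resolve to different concepts.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Concept:
    """A single meaning, identified by an opaque (hypothetical) concept ID."""
    concept_id: str
    preferred_term: str
    synonyms: tuple[str, ...] = ()


def normalize(term: str) -> str:
    # Differences in case or spacing should not create a new term.
    return " ".join(term.lower().split())


class Vocabulary:
    """Maps every term to exactly one concept; a concept may have many terms."""

    def __init__(self) -> None:
        self._concepts: dict[str, Concept] = {}
        self._term_index: dict[str, str] = {}  # normalized term -> concept_id

    def add(self, concept: Concept) -> None:
        terms = [normalize(t) for t in (concept.preferred_term, *concept.synonyms)]
        # Check every term first so a rejected concept leaves the index untouched.
        for term in terms:
            owner = self._term_index.get(term)
            if owner is not None and owner != concept.concept_id:
                raise ValueError(f"{term!r} already names concept {owner}")
        for term in terms:
            self._term_index[term] = concept.concept_id
        self._concepts[concept.concept_id] = concept

    def lookup(self, term: str) -> Optional[Concept]:
        concept_id = self._term_index.get(normalize(term))
        return self._concepts.get(concept_id) if concept_id is not None else None


vocab = Vocabulary()
vocab.add(Concept("C0001", "lung"))
vocab.add(Concept("C0002", "right lung", ("lung, right",)))
vocab.add(Concept("C0003", "left lung", ("lung, left",)))

print(vocab.lookup("Lung, Right").concept_id)  # C0002: the synonym resolves to one concept
print(vocab.lookup("lung").concept_id)         # C0001: modifying "lung" produced another concept
```

Trying to add a second concept named "lung" raises an error, which is the code equivalent of saying a term cannot have more than one meaning.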

As an example, let's look at the world of banking. When we go to the bank and withdraw a dollar, we know what a dollar is and so does the bank. There is no ambiguity. This is because there is a standard vocabulary. One dollar equals 4 quarters, which equals 10 dimes, which equals 20 nickels, which equals 100 pennies. Further, because each country has standardized its money, we can exchange money and information about the money. This is what we are striving for in healthcare.

Data Formalization

What is meant by data formalization? Briefly, I would like you to consider all of the information/data found in a clinical information system. Now, consider all of the information that could potentially be in a clinical information system and the different media in which it could be stored: video, films, voice, and tracings, to name a few. At present we have no way to accurately formalize this data for storage and retrieval. It's an issue, a big issue.

Example: If an adverse incident occurred in the cardiac catheterization lab and the only definitive record of what happened was the cath film, how would you index and retrieve that film? How do cardiologists do studies now? (I'll tell you: they maintain their own databases.)

If HIM professionals are going to be leaders in "classification" in the future, we are going to have to broaden our horizons to "data formalization." How can we help the healthcare industry formalize all of its data so that it can be accessed and used?
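One way to picture data formalization for the cath film example is a minimal Python sketch, assuming that someone annotates each film with coded concepts from a controlled vocabulary and the time at which each finding appears. The codes (HYP-0101, HYP-0202), archive locations, and class names are all hypothetical. An inverted index then retrieves every film that shows a given finding, which is the kind of access that cardiologists' private databases provide today.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    concept_id: str        # hypothetical code from a controlled clinical vocabulary
    offset_seconds: float  # where in the recording the finding is visible


@dataclass(frozen=True)
class MediaObject:
    uri: str               # hypothetical archive location of the film, audio, or tracing
    media_type: str
    annotations: tuple[Annotation, ...]


class MediaIndex:
    """Inverted index from coded concepts to the media objects that show them."""

    def __init__(self) -> None:
        self._postings: defaultdict[str, list[tuple[str, float]]] = defaultdict(list)

    def add(self, obj: MediaObject) -> None:
        for ann in obj.annotations:
            self._postings[ann.concept_id].append((obj.uri, ann.offset_seconds))

    def find(self, concept_id: str) -> list[tuple[str, float]]:
        # .get() avoids creating empty entries for concepts that were never indexed.
        return list(self._postings.get(concept_id, []))


index = MediaIndex()
index.add(MediaObject(
    uri="archive://cath-lab/film-0001",
    media_type="cath film",
    # HYP-0101 and HYP-0202 stand in for two coded findings (hypothetical codes).
    annotations=(Annotation("HYP-0101", 95.0), Annotation("HYP-0202", 312.5)),
))
print(index.find("HYP-0202"))  # [('archive://cath-lab/film-0001', 312.5)]
```

The hard part in practice is not the index but the coding itself: deciding which concepts apply to a film and who assigns them, which is where HIM expertise fits.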

Vocabularies in Computer-Based Patient Records (CPRs)

The amount of clinical information to be handled is increasing almost exponentially. A recent study concluded that a general physician wanting to stay abreast of pertinent findings would have to read 19 articles a day, 365 days a year. (Balas) The level of detail we have been able to extract from our patient records is no longer acceptable. We need detailed data to provide clinical care, to support research and medical decision making, and to support data reporting to government and other authorized third parties.

Required characteristics

As various researchers have worked to develop a viable clinical vocabulary, it appears that many have come to the conclusion that what they initially need is some agreement about the major characteristics of this vocabulary. Cimino from Columbia University has put forth the following desired attributes for clinical vocabularies:

  • Content: To be useful, a vocabulary must have adequate content. This becomes especially difficult when the vocabulary must meet multiple needs.
  • Concept orientation: The International Organization for Standardization (ISO), in standard 1087, defines a concept as "a unit of thought constituted through abstraction on the basis of characteristics common to a set of objects." Cimino construes this to mean that terms must correspond to at least one meaning, have no more than one meaning, and that meanings correspond to no more than one term, though synonyms are allowed.
  • Permanence: A concept cannot be deleted, ever. It might be labeled as inactive or old, but it remains in the vocabulary forever.
  • Nonsemantic Identifiers: Semantics is the study of meaning. Nonsemantic identifiers therefore carry no embedded meaning, such as body system or position in a hierarchy. Identifiers that carry meaning limit expansion. Examine ICD-9-CM category 250 for an example of how meaning built into the code limits expansion and level of detail.
  • Polyhierarchy: This is the idea that different users may demand different, equally valid arrangements of the concepts in a vocabulary. For example, when doing research on viral pneumonias, the infectious disease clinician may want to classify the disease by the type of virus regardless of body part, while the pulmonologist will want to focus on diseases that manifest themselves in the lungs.
  • Formal Definitions: Obviously, to use the vocabulary, the concepts must be defined. For a clinical vocabulary, they must also have explicit relationships such as "is a," "caused by," "site," and "treated with," among others. If established correctly, these relationships can be manipulated symbolically, allowing for automated processing.
  • Reject "Not Elsewhere Classified": Not Elsewhere Classified (NEC) is not the same as Not Otherwise Specified (NOS). NOS means the documentation provides no further detail, so the concept cannot be refined to a more specific one. NEC means the condition is known and specified, but the classification has no specific entry for it, so it is placed in a residual category defined by excluding all other concepts. Using NEC harms the evolution of the data. An example is Legionnaires' disease, which did not receive its own ICD-9-CM code until the mid-1990s; longitudinal tracking of Legionnaires' disease before that time is impossible.
  • Multiple Granularities: The level of detail found in a vocabulary will often depend upon the purpose of the vocabulary. Given that most vocabularies are going to be multipurpose, multiple granularities are required.
  • Multiple Consistent Views: With a multipurpose vocabulary, different users will need and want to see views that support their needs; however, the information presented must be consistent across these different views.
  • Context Representation: A vocabulary is going to require a grammar or rules for manipulation. The vocabulary may be developed independently of these rules, but they will be required for it to be usable. An example is distinguishing when the clinician documents "The patient has a cold" versus "The patient is cold."
  • Graceful Evolution: Medical knowledge grows by leaps and bounds on a daily basis. A good clinical vocabulary is going to require clear, detailed descriptions of what changes occur, when and why.
  • Recognized Redundancy: Synonymy happens, especially with evolution. It can even be desirable. Vocabulary developers and custodians have to learn how to recognize it and use it to their advantage.
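Several of these attributes can be seen working together in one small Python sketch. It is a toy model under stated assumptions, not Cimino's design or any real terminology's data model: identifiers are hypothetical sequence numbers with no embedded meaning (nonsemantic identifiers), concepts are retired rather than deleted (permanence), relationships such as "is a" and "site" are explicit (formal definitions), and a concept may have several parents, so the infectious disease and pulmonology views from the polyhierarchy example both include viral pneumonia (multiple consistent views).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    concept_id: str      # nonsemantic: the ID says nothing about body system or hierarchy
    name: str
    active: bool = True  # permanence: concepts are retired, never deleted
    relations: dict[str, set[str]] = field(default_factory=dict)


class Terminology:
    def __init__(self) -> None:
        self.concepts: dict[str, Concept] = {}
        self._next = 1

    def add(self, name: str) -> str:
        # IDs are issued in sequence, so new concepts never run out of room in the code.
        concept_id = f"{self._next:06d}"
        self._next += 1
        self.concepts[concept_id] = Concept(concept_id, name)
        return concept_id

    def relate(self, source: str, relation: str, target: str) -> None:
        # Formal definitions: named relationships such as "is a", "site", "caused by".
        self.concepts[source].relations.setdefault(relation, set()).add(target)

    def retire(self, concept_id: str) -> None:
        self.concepts[concept_id].active = False

    def ancestors(self, concept_id: str) -> set[str]:
        """Every concept reachable over "is a" links; a concept may have several parents."""
        seen: set[str] = set()
        stack = [concept_id]
        while stack:
            for parent in self.concepts[stack.pop()].relations.get("is a", set()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def view(self, root: str) -> list[str]:
        """One user's view: names of active concepts that fall under `root`."""
        return sorted(c.name for c in self.concepts.values()
                      if c.active and root in self.ancestors(c.concept_id))


t = Terminology()
disease = t.add("disease")
viral = t.add("viral infection")
lung_disease = t.add("lung disease")
lung = t.add("lung")
pneumonia = t.add("viral pneumonia")
hepatitis = t.add("viral hepatitis")
asthma = t.add("asthma")
for child, parent in [(viral, disease), (lung_disease, disease), (pneumonia, viral),
                      (pneumonia, lung_disease), (hepatitis, viral), (asthma, lung_disease)]:
    t.relate(child, "is a", parent)
t.relate(lung_disease, "site", lung)

print(t.view(viral))         # ['viral hepatitis', 'viral pneumonia']  (infectious disease view)
print(t.view(lung_disease))  # ['asthma', 'viral pneumonia']  (pulmonology view)
```

Because the "is a" links are explicit, a program can answer "is viral pneumonia a kind of lung disease?" by following them, which is the symbolic manipulation the Formal Definitions attribute calls for.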

Campbell found that these attributes were not sufficient for his needs and added the following six items to a list of desired characteristics.

  • Copyrighted and licensed. Terminologies need to be licensed to prevent local modifications, which produce slightly altered meanings and local jargon and, hence, incomparable data.
  • CIS vendor neutral. All vendors should be able to access the terminology system, and it should not favor one vendor over others.
  • Scientifically valid. The terminology content has to be understandable, reproducible, and useful. All participants in healthcare should consider it to be valid.
  • Well maintained. There should be a central authority that provides new terms rapidly and from which terms can be requested. This will minimize local changes.
  • Self-sustaining. It would be optimal for a terminology to be publicly funded or supported by endowment funding. Failing that, the fees for a terminology should be proportional to the value it provides to its users.
  • Scalable infrastructure and process control. This is required for timely maintenance of the terminology regardless of the size of the organization.

Current and Future Standards

This section will briefly discuss several of the current and future standards. As with most issues in healthcare, this is a rapidly changing area.


The Systematized Nomenclature of Medicine (SNOMED), promulgated by the College of American Pathologists, is rapidly becoming a standard in electronic health records. It is a large health-related terminology. It has been endorsed by the National Committee on Vital and Health Statistics, as well as by the federal government's Consolidated Health Informatics (CHI) initiative. In May 2004, the National Library of Medicine concluded an agreement to make SNOMED publicly available and included it in its Unified Medical Language System. (SNOMED Web site)


The Unified Medical Language System (UMLS) is a publicly available resource designed to "facilitate the development of computer systems that behave as if they 'understand' the meaning of the language of biomedicine and health." The UMLS has three knowledge sources: the Metathesaurus, the Semantic Network, and the SPECIALIST Lexicon. They are not designed for a specific type of software; rather, developers use them within their own software. The UMLS can be downloaded from the National Library of Medicine. (UMLS Web site)
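As an illustration of how developers use the Metathesaurus, the minimal Python sketch below groups names from different source vocabularies under the UMLS concept (CUI) they share. It assumes the pipe-delimited MRCONSO.RRF file of the Rich Release Format, with the column order listed in the UMLS Reference Manual; anyone using a real release should confirm that order against the release's own documentation files. The sample rows are made up, with placeholder identifiers and source names.

```python
from collections import defaultdict

# Field order of MRCONSO.RRF as documented in the UMLS Reference Manual.
MRCONSO_COLUMNS = [
    "CUI", "LAT", "TS", "LUI", "STT", "SUI", "ISPREF", "AUI", "SAUI",
    "SCUI", "SDUI", "SAB", "TTY", "CODE", "STR", "SRL", "SUPPRESS", "CVF",
]


def strings_by_concept(lines, language="ENG"):
    """Group (source vocabulary, string) pairs under the concept (CUI) they share."""
    grouped = defaultdict(set)
    for line in lines:
        # Rows are pipe-delimited and end with a trailing "|"; zip() drops the empty tail.
        row = dict(zip(MRCONSO_COLUMNS, line.rstrip("\n").split("|")))
        if row.get("LAT") == language:
            grouped[row["CUI"]].add((row["SAB"], row["STR"]))
    return dict(grouped)


# Made-up rows with placeholder identifiers, for illustration only.
sample = [
    "C9999999|ENG|P|L0000001|PF|S0000001|Y|A0000001||X1||SRC_A|PT|X1|Lung|0|N||\n",
    "C9999999|ENG|S|L0000002|PF|S0000002|Y|A0000002||Y7||SRC_B|PT|Y7|Lung structure|0|N||\n",
    "C9999999|FRE|P|L0000003|PF|S0000003|Y|A0000003||Z3||SRC_C|PT|Z3|Poumon|0|N||\n",
]
print(strings_by_concept(sample))
```

With a downloaded release, the same function could be passed an open file handle (for example, `open("MRCONSO.RRF", encoding="utf-8")`) instead of the sample list.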


The Logical Observation Identifiers Names and Codes (LOINC) database resulted from a voluntary effort to standardize the exchange of laboratory information. It has been enormously successful. In May 2004, LOINC was adopted as a standard by the federal government's Consolidated Health Informatics initiative. Additional information is available from the Regenstrief Institute. (LOINC Web site)
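The value of a shared laboratory code can be sketched in a few lines of Python. In this minimal example, two laboratories report serum glucose under different local codes ("GLU" and "GLUC-S", both hypothetical, as are the lab names and values); once each local code is mapped to LOINC 2345-7 (Glucose [Mass/volume] in Serum or Plasma), the results can be pooled and compared. Any result whose local code has no mapping is set aside.

```python
from collections import defaultdict

# Hypothetical local test codes at two laboratories, each mapped to one LOINC code.
# 2345-7 is the LOINC code for Glucose [Mass/volume] in Serum or Plasma.
LOCAL_TO_LOINC = {
    ("lab_a", "GLU"): "2345-7",
    ("lab_b", "GLUC-S"): "2345-7",
}


def pool_results(results):
    """Group (lab, local_code, value) results by LOINC code; report unmapped codes."""
    pooled = defaultdict(list)
    unmapped = []
    for lab, local_code, value in results:
        loinc = LOCAL_TO_LOINC.get((lab, local_code))
        if loinc is None:
            unmapped.append((lab, local_code))  # cannot be compared until it is mapped
        else:
            pooled[loinc].append(value)
    return dict(pooled), unmapped


pooled, unmapped = pool_results(
    [("lab_a", "GLU", 98), ("lab_b", "GLUC-S", 105), ("lab_b", "K", 4.1)]
)
print(pooled)    # {'2345-7': [98, 105]}
print(unmapped)  # [('lab_b', 'K')]
```

A real mapping effort would also confirm that units and specimen types agree before pooling values; the sketch assumes they already do.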

The Future for HIM

In talking with experts and researchers in the field of clinical vocabularies and classification systems, the words "data formalization" are increasingly heard. In essence, what they are saying is that, as the computerization of healthcare data increases, so too will the need to be able to process it and use it. This need will exist not only for diagnostic and procedural textual information, but also for images, voice recordings, photographs, and as yet undiscovered types of data. This is a huge opportunity for HIM professionals who are the coding or data formalization experts.

What do we need to do?

  1. As a profession we need to recognize that ICD-9-CM (and ICD-10-CM, too) and CPT-4 do not provide the level of detail needed for decision making (patient care or administrative) in healthcare today.
  2. We need to train ourselves in these established systems, as well as in the new ones being developed all the time. Watch what is being done in the federal government's Consolidated Health Informatics initiative.
  3. We need to become better educated about natural language processing. AHIMA has issued several documents about NLP this summer.
  4. We need to become involved in the standards organizations that work on vocabulary issues (especially HL7 and ASTM) and in XML standards work.
  5. We need to lead the industry in finding "data formalization" strategies for the present and the future. This is not a static field; it will be dynamic.
  6. We need to become the recognized implementers and maintenance personnel of clinical vocabularies in healthcare organizations.


Balas, E.A., Boren, S.A. "Managing Clinical Knowledge for Healthcare Improvement." The Yearbook of Medical Informatics, 2000, pp. 65-70.

Barnett, G.O., et al. "The Computer-Based Clinical Record: Where Do We Stand?" Annals of Internal Medicine, 119(10), Nov 15, 1993, pp. 1046-1048.

Campbell, K.E., Hochhalter, B., Slaughter, J., Mattison, J. "Enterprise Issues Pertaining to Implementing Controlled Terminologies." IMIA WG 6 Conference, December 1999.

Cimino, J.J. "Desiderata for Controlled Medical Vocabularies in the Twenty-First Century." Methods of Information in Medicine, 37(4-5), Nov 1998, pp. 394-403.

Encyclopaedia Britannica Online.

Forrey, A.W., McDonald, C.J., DeMoor, G., et al. "Logical Observation Identifier Names and Codes (LOINC) Database: A Public Use Set of Codes and Names for Electronic Reporting of Clinical Laboratory Results." Clinical Chemistry, 42(1), 1996, pp. 81-90.

Health Level Seven (HL7) Web site, accessed May 14, 2003.

LOINC Web site, Regenstrief Institute for Health Care, last accessed June 16, 2004.

SNOMED Web site, College of American Pathologists, last accessed June 16, 2004.

Unified Medical Language System, National Library of Medicine, last accessed June 16, 2004.

World Health Organization Web site.

Source: 2004 IFHRO Congress & AHIMA Convention Proceedings, October 2004