Speech Recognition in the Electronic Health Record (2013 update)


Many technologies exist to assist with better clinical documentation, and speech recognition technology (SRT) is one solution. Because SRT relies on mathematical probability models, the technology can be complicated. The goals of its use in healthcare today, however, are clear: increase productivity, decrease costs, and improve documentation while placing it as close as possible to the point of care.

This practice brief focuses on increasing the understanding of how speech recognition works by highlighting the driving forces that are shaping current and future applications of this technology and its associated benefits and risks. It also provides tools, including an illustrated work flow, a tasks and skills list, and best practices, to assist in the efficient and effective application of SRT in day-to-day operations.

Driving Forces

Payment reform. Quality measures. Meaningful use. EHR adoption. Transition to ICD-10-CM/PCS. Health information exchange. Consumer engagement. Personal health records and patient portals. What do these areas have in common? They all point to the increasing need for accurate, complete, and accessible data in an environment that requires real-time documentation.

Documentation should improve the quality of the patient encounter and create efficiencies.

In the past, handwriting provided an immediate record of care. Legibility came to the forefront as an issue, and handwritten documentation was often not as comprehensive as a well-constructed dictated note. Today, speech recognition or speech-to-text is used in conjunction with the EHR and is considered a viable option in the face of the growing demand for near-real-time data.

Front-end speech recognition (FESR) and back-end speech recognition (BESR) technologies can help produce legible, comprehensive documents. They also serve as productivity tools that help lower costs, especially when compared to the manual labor required by traditional dictation and transcription.

Emerging Roles

In 2012, the Association for Healthcare Documentation Integrity (AHDI) coined the term "healthcare documentation specialist (HDS)," which it defines as "an umbrella term that includes medical transcriptionists, speech recognition editors and QA specialists."1

For the purposes of this practice brief, the term "HDS" will be used when referring to this role. Visit AHDI's website for more information on the role of the HDS (www.ahdi.org).

The use of medical scribes is increasing in various healthcare settings due to growing EHR implementations and the need for timely, detailed point-of-care documentation.

There are distinct differences between HDS professionals and scribes. HDS professionals work in a "one-way process" while scribes work in a "two-way process."2 HDS professionals perform their role by listening to the dictated voice in a passive sense, creating documents that have often been processed by a speech recognition engine, resulting in work that contains free text and templates. Scribes, on the other hand, work in a busy clinical environment, communicating with staff as well as patients and their families. Scribes make use of structured documentation, including templates and checklists, and are most likely to be present in the examining room with the provider. However, an emerging trend is utilizing scribes in a virtual environment, where the scribe sees and hears what takes place in the examination room via telephony and internet technologies.

Scribes may also perform various other duties, including assisting the provider in navigating the EHR, responding to messages as directed by the provider, and locating information for review such as lab results.

For more information on the role of scribes, see AHIMA's "Using Medical Scribes in a Physician Practice" practice brief.

Benefits, Risks, and Challenges

SRT can enhance clinical documentation in multiple ways. The demand for accurate and timely documentation is increasing to ensure optimal patient outcomes. Implementing SRT may be the key to supporting healthcare clinicians and HDS professionals as more productive participants in the documentation process, keeping pace with increased demands. There is clear interest in, and movement toward, using speech recognition combined with the EHR in the healthcare setting today.

Before developing a return on investment (ROI) analysis of speech recognition's potential to deliver documentation faster and more accurately while reducing cost, one of the first steps is to assess the readiness of the medical staff in terms of their receptiveness to a transition of this magnitude. If they are proponents of full application of the technology, meaning a commitment to learning the system and allocating resources to apply it in practice, the ROI can be structured around an objective analysis of both the benefits and the risks.

To garner the most benefit from SRT, consistent policies and procedures must be in place to address dictation best practices. In addition, a style guide and consistency when applying edits are critical to the success of the process.

The Benefits

In order to gain the most benefits from any speech recognition technology solution, organizations must determine their expectations from both the administrative and medical staff perspective. Organizational interoperability and integration with the facility's EHR system are key factors. A facility should consider a single department implementation; once proven successful, additional departments may be scheduled to follow. A phased-in approach often proves more successful than an enterprise-wide approach.

To achieve the benefits of back-end speech editing, a facility must consider the partnership between the author and the HDS. Physicians should be encouraged to utilize dictation best practices, providing complete information to the HDS staff to assure an accurate medical document. Adopting a style guide and using uniform approaches in documentation will give staff consistency in editing and will assure expected gains. For more information, refer to the AHDI Style Guide.

Improved Turnaround

Benchmarking environments and productivity during the vendor selection process is critical. Understanding how a facility's report turnaround time compares with that of others can be a determining factor in implementing process improvements.

Facilities often experience transcription delays, and when the information contained in reports influences treatment decisions, delayed dissemination can hinder clinical decision making. EHR implementations require real-time results for clinical decision making at or near the point of care.

Increased Efficiencies and Cost Savings

When HDS professionals serve as medical text editors for transcripts generated by server-based speech recognition, the expected cost reduction comes from productivity gains: the HDS is no longer required to manually transcribe the entire dictation. Instead, the HDS reviews the voice file against the draft text, edits for missing or incorrect content, and formats the document.

Prior to implementing BESR, organizations should evaluate current pay models and the impact BESR may have, whether they use in-house staff or outsource through a third-party vendor. It is important to recognize that medical language skill sets do not change when moving from traditional transcription to speech editing; BESR is merely a tool to increase productivity. Compensation changes should be made incrementally as overall staff productivity increases over time.

BESR differs from vendor to vendor, and what works in one setting may not work in another. Each organization should define its expectations to help drive the best fit in vendor and technology selection. Checking references with other organizations similar in size and type during the RFI/RFP process is a necessary step toward making a selection decision.

Upon implementation, providers must "qualify" for BESR prior to utilization, and their output should be validated to achieve the best results. Recognition quality will vary, and the amount of editing needed will differ from provider to provider.

Recognized drafts commonly arrive with appropriate formatting and punctuation already in place. Even so, server-based speech recognition transcripts will need to be edited, and productivity gains should be measured against these standards.

Any productivity increases will be directly proportional to factors including the quality of physician input, the SRT processor's recognition of that input, and any software applications used.

Contrary to popular thinking, use of the technology carries no guarantee that cost savings will automatically be realized. It is therefore important that organizations determine current costs and pay structure in order to benchmark and track future cost savings. HDS professionals may see a negative impact on their compensation when SRT is implemented.

Measuring savings is difficult when evaluating the physician's time spent with front-end speech recognition. The entire process of dictation, review, and approval of a traditionally transcribed document does not take place all in a single time block, as it does with FESR. While a physician may perceive that the FESR process takes longer, consideration must be given to the fact that the document can be signed and distributed after the author finishes dictating and reviewing the document. With traditional transcription, the time for this process is spent after the transcription is completed. Whether dictation is performed in the traditional way or with SRT, it is less time-consuming than handwriting. Electronically produced records are legible and usually more detailed and complete. SRT has the capability of enhancing physician productivity, leaving more time for direct patient care.

Timely Clinical Decision Making

The clinical decision-making process is optimized by, and in fact depends on, accurate, timely, and complete information. The use of speech recognition can reduce the amount of time it takes for information to be made available to other healthcare providers. What may have taken hours in a traditional dictation setting can often be accomplished in minutes or less. In serious trauma and critical care cases, prompt and accurate treatment determinations can not only save lives, they may substantially improve patient outcomes and reduce patient care days as well.

As healthcare evolves, providers and HDS professionals will both need to change along with the new approaches for documentation. As the transition from traditional documentation to editing through the use of speech recognition evolves, the goal is to deliver quality documentation at or closest to the point of care.

The Challenges

The challenges in implementing any new technology can be reduced through a thorough analysis of the balance between the technology and the downstream expectations of the application. There is ample evidence that speech recognition, when used in the right setting with the right quality practices, is extremely effective at saving time and cost. But as with any technology, it is not a solution if it is not properly managed.

Many newer speech recognition solutions work well in a wide variety of settings. Physicians with heavy accents or unique dictation habits can become "qualified" with the speech recognition engine. Newer technologies that go beyond recognition toward natural language understanding (NLU) and other highly sophisticated processes have enabled more providers to participate.

However, a fairly large percentage of work remains that is not ideally suited for speech recognition. When this work is not identified, the resultant documents may actually require more effort, and cost more, to rework. As W. Edwards Deming is often quoted as saying, "automating a process that produces junk just allows you to produce more junk faster."

This is where careful attention must be paid to the speech scores of potential participants. Forcing providers to use the speech recognition system before they have qualified with an established accuracy or identification compliance score will bring the unintended consequences of poor quality, increased costs, and other unacceptable downstream results.

Provider Engagement

Organizations have used a number of processes and practices to gain provider engagement when implementing speech recognition. These range from engaging providers in early project planning meetings to implementing the technology transparently, leaving physicians entirely unaware, so as not to impact their day-to-day practices.

Today's medical provider is extremely technologically aware. Organizations that are counting on the best return on their investment should allow providers to engage and fully participate in advancements such as speech recognition implementation.

Start with a physician champion who is involved with and aware of the potential cost and time savings. This individual can help communicate the benefits to the medical staff and get their buy-in. This is a critical component to the successful launch or even continuation of the program. Failure to identify a champion of change can have severe impacts that can linger well beyond the beginning of a new program.

Implementation Costs

SRT can be a costly investment. Before decisions are made regarding such capital expenditures, a facility will need to look at many options, consider varying technology solutions, identify how they will fit into the overall EHR organizational strategy, and explore future upgrades and maintenance costs associated with the technology.

Additionally, optional enhancements such as capturing discrete data elements through the use of NLP and NLU, mobile device usage, and the system's ability to integrate with the EHR and associated decision support systems will need to be evaluated.

Natural language processing (NLP) is widely recognized in computer-assisted coding (CAC) applications, but many healthcare organizations are also leveraging speech recognition combined with NLP to promote data capture, improve turnaround times, streamline workflow, and offer advanced analytics and reporting to all users of the health record.

Careful consideration must be given to how the new workflow will change current processes including handling electronic signatures, narrative dictation in the EHR, and archival and destruction policies and procedures.

Technology Mismatch

A technology that does not align with an organization's needs could be a catastrophic setback. Having direction and support from administration and information technology when adopting SRT is a determining factor in the success of the project. When planning to implement SRT, identify the users' expectations in terms of input, time of usage, and willingness to be trained. Obtain a commitment from all stakeholders to use the system until the output quality has reached expected levels. Selecting a technology that is scalable to the expectations and widespread usage envisioned in the initial ROI will be important in selecting and deploying the technology. Investing in a system that does not become fully operational may be worse than making the decision not to adopt the technology at all.

Completion Process

Every organization considering SRT implementation needs to fully review and research the options and their implications before investing in hardware, software, and training. One area that will need extensive evaluation is how the content will be reviewed, edited, and completed. The transition from transcriptionist to medical editor will initially affect turnaround times and delivery of the documentation to the end user(s).

Every organization must determine who will be responsible for editing and making corrections, and careful thought must be given to which system(s) are being integrated with the technology. SRT usage will not overcome disorganized dictation, poor grammar, or missing or overused punctuation, so editing and auditing of the content is recommended to ensure accuracy of the documentation.3 In addition, some section headings and formatting may be inadvertently omitted, creating missing elements in the report that may require additional dictation or addenda, depending upon where the document is within the completion process.

In some cases, physicians are willing to take on the task of creating and editing to fully own and manage the process from beginning to end and to have the ability to disseminate the document immediately. However, because the physician is usually one of the highest paid professionals within the organization, this decision requires careful consideration. Medical staff has to be willing to be engaged and take on this responsibility, especially knowing that editing time takes physicians away from providing patient care.4

Quality Control

Editing text, whether done by a physician or HDS, reduces content errors in patient reports—provided it is done meticulously and reviewed thoroughly prior to signing. In current transcription practices, many transcribed reports are not reviewed closely before a signature is applied by the physician. Standards for ensuring accuracy with all documents produced using SRT call for third-party editing. A facility may want to consider implementing a QA program for physician front-end speech created documentation in addition to their current transcription QA program.5 Quality programs should benefit providers and documentation specialists alike.
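
As a hypothetical sketch of how such a QA program might select work, the routine below pulls a random sample of signed documents for third-party review. The sample rate, seed, and document IDs are invented for illustration only:

```python
import random

def qa_sample(document_ids, rate=0.05, seed=None):
    """Randomly select a fraction of documents for quality review."""
    rng = random.Random(seed)
    # Review at least one document, even for very small batches.
    k = max(1, round(len(document_ids) * rate))
    return rng.sample(list(document_ids), k)

# Example: review 5 percent of 200 signed front-end speech documents.
signed_docs = [f"doc-{i}" for i in range(200)]
to_review = qa_sample(signed_docs, rate=0.05, seed=42)
```

A production QA program would also stratify the sample by provider and document type so that newly qualified dictators receive closer review.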

Physician training is critical to drive quality documentation:

  • Ensure any smart phrases, routine text, and templates are carefully reviewed and entered into the EHR prior to implementation
  • Follow the approved abbreviation list
  • Train providers on the review process prior to signing in order to identify inconsistencies

For more information on quality and best practices, refer to the "Healthcare Documentation Quality Assessment and Management Best Practices" report, updated in March 2011.

Editing Costs

Unless the recognition accuracy is very high and the software package has been enhanced to speed the process, the amount of time it takes to edit and format a document transcribed by server-based SRT could exceed the time it takes to transcribe manually. Current technology will not generate an acceptable level of accuracy for all users, which will require either continued manual transcription or combined use with a system that reduces the amount of free-text dictation (templates, EHR, etc.).

Using Speech Recognition

Front-End Speech Recognition (FESR)

FESR is the term generally used to describe a process where the provider speaks into a microphone or headset attached to a PC. The recognized words are displayed as they are recognized, and the provider then corrects misrecognitions.

The advantage is that the provider is in control of the entire process: the document is dictated, corrected, and authenticated all in one sitting. This is sometimes referred to as "once and done." When the document is complete, it is ready for distribution. Front-end speech recognition is also the most effective use of SRT with an EHR, enabling the dictator to respond to prompts from the EHR for more complete and accurate documentation.

FESR may affect a provider's billable activities, however. Training the speech recognition engine can be a time-consuming process that takes time away from patient care. In many FESR systems, the provider can choose to send the voice and text to a medical editor for a quality check instead of performing the editing duties themselves. While sending the document to a third-party editor may offset the speed advantage of distributing it immediately after authentication, many facilities find this added step of quality control a necessity when striving to maintain documentation integrity.

Back-End Speech Recognition (BESR)

Server-based speech recognition takes place after the dictator has created audio input in much the same way as usual; the processing then occurs at the server level, or on the "back end." Most speech recognition programs currently on the market include enrollment functionality. Providers should meet a certain recognition rate percentage (as deemed by the particular technology solution) in order to qualify for using speech recognition.

The end user could record audio and use the transcribing function of the application, then edit the final document. In most cases, server-based speech application refers to a speech recognition engine processing the audio to text, sending the draft text and a synchronized speech file to an editor for correction and formatting, and then inserting the document to continue the work flow.

The advantage of server-based speech recognition is that it does not affect the end user in terms of dictation habits or time; the author continues to dictate as always. It also has the potential to make HDS professionals more productive, requiring fewer people to generate more documents. The time commitment for training the speech recognition engine is most often shifted from the dictating provider to the HDS, who is specifically trained in medical language and speech technology. The captured audio file can be used to train and retrain the SRT engine for better recognition in a shorter time frame. The key to the speech engine's capacity to "learn" is based largely on how closely the captured voice matches the transcribed document. If the HDS must perform extensive editing for context, appropriate sentence structure, and intended meaning (what was meant, but perhaps fragmented during dictation), the speech engine cannot interpret these types of changes particularly well, which can result in limited improvement for those providers. One example is dictation created with heavy background noise or sidebar conversations taking place; another is when providers instruct the HDS to go back and add or delete content. Even the best SRT cannot distinguish intended from unintended voice capture.

While server-based SRT seems most attractive to physicians in terms of clinical documentation, it has some major disadvantages for others in the documentation chain. The first is that, without extremely good recognition accuracy and appropriate editing tools, documents produced may require more time to edit against the synchronized audio file than if they were simply transcribed.

Essential features that make an EHR attractive are also lost in server-based SRT. If documentation improvement is the goal, server-based speech recognition does nothing to move a dictator toward that goal. (See also Appendix A.)

Blended Approach

Templates, Macros, and Partial Dictation

Dictating "free text" lends itself to more errors. Taking advantage of available technology, end users can improve their recognition accuracy and effectively reduce dictation time.

A template is a standardized document outline that includes any number of elements. Some companies sell templates, value-added resellers (VAR) will develop templates for a user, or a user can write his or her own template. In speech recognition, a template includes fields that enable a user to skip from one field to the next using speech commands.

Macros are a series of keystrokes and/or commands that are executed on command. Speech programs make it easy to use macros to generate large amounts of text using only a few commands that are easily recognized. Radiology has adapted readily to speech technology because of the limited amount of terminology, but also because of the large number of "normal" results, which can be programmed as macros.
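
The "normal results" macro pattern can be sketched as a simple lookup from a spoken trigger phrase to stored boilerplate. The trigger names and report text below are hypothetical, not drawn from any particular product:

```python
# Hypothetical macro table: spoken trigger phrase -> stored report text.
NORMAL_MACROS = {
    "normal chest": (
        "The lungs are clear. The cardiomediastinal silhouette is within "
        "normal limits. No acute osseous abnormality."
    ),
    "normal abdomen": (
        "No free intraperitoneal air or abnormal calcifications. "
        "Bowel gas pattern is nonobstructive."
    ),
}

def expand_macros(recognized_text: str) -> str:
    """Replace any recognized trigger phrase with its full boilerplate text."""
    for trigger, boilerplate in NORMAL_MACROS.items():
        recognized_text = recognized_text.replace(trigger, boilerplate)
    return recognized_text

draft = expand_macros("Findings: normal chest")
```

A real speech product wires triggers into its command grammar rather than doing plain string replacement, but the lookup idea is the same.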

Partial dictation is growing in popularity especially when used in conjunction with the EHR to capture details of the patient encounter. One method to provide additional narrative is for the author to dictate directly into the structured fields within the EHR. Examples may include the history of present illness (HPI) or assessment and plan (A/P) part of a template.

Another method is using BESR to complete an EHR template whereby the narrative is dictated on a server-based dictation system using BESR through a series of prompts and then interfaced into a predetermined, specific location on the intended template.

The intelligent use of templates and macros facilitates end-user acceptance of speech recognition as a tool that frees more time for patient care while creating more complete documentation, faster. Use with an EHR that has been carefully selected with speech activation in mind accomplishes the same goal, allowing the clinician to document the record completely, accurately, and in a timely manner without detracting from the primary purpose of patient care.


Audio Input

There are specific audio input requirements for successful speech recognition. The best audio input takes place on the same sound card the recognition engine will use for transcription, but in most clinical settings this is not going to be the case. High-quality handheld microphones, headsets with attached boom, and array microphones for hands-free or headset-free dictation provide the best audio input for front-end speech recognition. Telephone dictation works well for back-end speech recognition, but caution should be used when evaluating cellular devices for dictating. Mobile devices such as hand-held digital recorders equipped with dictation modules and tablet PCs can generate acceptable digital audio files for speech recognition. All devices should be noise-canceling devices or the recognition accuracy will be degraded.

Processor, Memory, and Sound Card

Speech recognition is a memory-intensive process, whether it takes place on the server or on a dictator's PC. Not all processors and sound cards are alike, and most speech recognition companies have specific requirements for what works best with their product. If you are considering speech recognition, do not purchase computer hardware until you have consulted with an expert in speech recognition or the software company (see "Critical Success Factors" below). In addition, prior to implementing speech recognition across a wide area network (WAN), careful evaluation should be given to how the software will affect operations, including desktop support.

Definition of Accuracy

Documentation produced by speech recognition should be held to the same standards of accuracy as medical transcription. In practice, some clinicians are willing to accept certain upfront errors in exchange for the benefits speech recognition delivers. For example, providers may be willing to accept brief clipped phrases such as, "brought to the OR prepped and draped" instead of, "the patient was brought to the OR and was prepped and draped." In addition, commas, semi-colons and correct verb phrases may not be important to the dictating provider. Each facility should define acceptable standards of accuracy for all documentation, whether it is handwritten, checked off a form, dictated as free text, dictated for processing by speech recognition (front-end or server), or entered into an EHR by keyboard or speech commands.
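
One common way to quantify recognition accuracy against such a standard is word error rate (WER): the minimum number of word substitutions, insertions, and deletions needed to turn the recognized draft into the corrected reference, divided by the reference word count. The metric is standard; the sample strings below simply reuse the clipped-phrase example above:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein edit distance over words, via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

wer = word_error_rate(
    "the patient was brought to the OR and was prepped and draped",
    "brought to the OR prepped and draped",
)  # 5 deleted words against a 12-word reference
```

Facilities that adopt a metric like this can set qualification thresholds per provider rather than debating accuracy anecdotally.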

Critical Success Factors

The critical success factors outlined below contribute to the following benefits for facilities deploying speech recognition:

  • An improved level of success realized
  • Minimized risks associated with such a project
  • A smoother transition from the legacy system

Factors include:

  1. Define measurable objectives prior to implementation
  2. Establish a target return on investment (ROI), including time frame for achievement
  3. Secure executive sponsorship and appoint a physician champion
  4. Actively involve users from all levels throughout the project
  5. Designate both a technical and functional system administrator
  6. Identify key benefits for end users
  7. Align the HDS professionals' compensation with the new technology
  8. Develop an operational plan in advance
  9. Provide key stakeholder updates regularly during the project
  10. Establish benchmarks prior to deployment for post-deployment analysis and comparison
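
Factor 10 can be as simple as comparing average dictation-to-availability time before and after go-live. A minimal sketch follows; the timestamps and figures are invented for illustration:

```python
from datetime import datetime
from statistics import mean

def avg_turnaround_hours(reports):
    """Average hours from dictation to document availability."""
    fmt = "%Y-%m-%d %H:%M"
    return mean(
        (datetime.strptime(done, fmt) - datetime.strptime(dictated, fmt))
        .total_seconds() / 3600
        for dictated, done in reports
    )

# Illustrative samples: (dictated, available) timestamp pairs.
before = [("2013-03-01 08:00", "2013-03-02 08:00"),
          ("2013-03-01 09:30", "2013-03-01 21:30")]
after  = [("2013-06-01 08:00", "2013-06-01 08:45"),
          ("2013-06-01 09:30", "2013-06-01 10:00")]

improvement = avg_turnaround_hours(before) - avg_turnaround_hours(after)
```

Capturing the same two timestamps before and after deployment is the only data requirement, which makes this one of the easier benchmarks to establish in advance.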

(see also: Appendix B, "Best Practices for Using Speech Recognition")

Process Owners


Providers

Providers do not want to wait for a slow computer or deal with poorly maintained, aging hardware. They do not want to wait for a workstation any more than they want to hear a busy signal on the phone. Training must be focused, relevant, and efficient to capture and hold their interest. Otherwise, they will resist spending the time to learn a speech recognition program and edit their own dictation, which would enable faster turnaround times.

Healthcare Documentation Specialists (HDS)

Many HDS professionals who have served in a traditional role view speech recognition editing as boring and tedious. It is important to note that "there is a different eye/ear/brain coordination dynamic at work in SRT editing compared to transcribing, which often makes it more challenging to identify errors in an SRT-draft document."7 This should be recognized when transitioning a professional into the role. During the training period, planning must allow for no productivity gains, or even lost productivity. Consideration should be given to those compensated on production-based pay programs, and training is a key element of success.

Health Information Management (HIM)

HIM has the responsibility of developing and securing approval for the many policies and procedures surrounding medical transcription content, process, and requirements. Whether in the traditional setting or with the implementation of SRT, these policies must be clearly written and in current practice.

HIM routines and processes may remain very much the same with the implementation of SRT, and the impact on tasks and processes is anticipated to be minimal. Traditional tasks such as charting reports and deficiency analysis for both missing dictation and unsigned reports remain the same with either technology and will continue to need to be done. HIM staff will likely be involved with training physicians and other clinicians on the use of the new technology as it relates to record completion, and effective training will take time.

One potential positive effect of SRT is that there may be fewer unsigned reports if the dictator originates and completes the dictation at one time. The productivity gains of SRT may also mean fewer reports missing from the chart at the time of coding, leading to fewer bill-hold days, fewer medical records in an incomplete status, and fewer days that records remain incomplete.

Information Technology (IT)

Speech recognition is a technology-heavy application and requires excellent technical skills to implement the technology and support the hardware and configuration requirements. IT will be a key stakeholder in the use of SRT.

Medical Scribes

The use and popularity of scribes is increasing in healthcare due to the growing need for detailed documentation at the point of care. Scribes are key members of the documentation team who may be SRT users. For more information on scribes, refer to the "Emerging Roles" section above.

(see also: Appendix C, Tasks and Skills List)


1 Association for Healthcare Documentation Integrity. "Statement on the Roles and Value of Healthcare Documentation Specialists and Equitable Compensation Practices."

2 Gates, Mary Anne. "Medical Scribes Make Their Presence Felt." For The Record 24, no. 21 (2012): 14.

3 American Society for Testing and Materials International. "E 2364 Standard Guide to Speech Recognition Technology Products in Health Care." Viewed August 2013.

4 Basma, Sarah, et al. "Error Rates in Breast Imaging Reports: Comparison of Automatic Speech Recognition and Dictation Transcription." American Journal of Roentgenology 197, no. 4 (2011).

5 American Society for Testing and Materials. ASTM E31.22 Standard Guide to Speech Recognition Products in Health Care [draft].

6 Runyon, Barry. "Hype Cycle for Healthcare Provider Technologies and Standards." Gartner. 2011.

7 Association for Healthcare Documentation Integrity. "Speech Recognition Technology."

Appendix A: Work Flow Diagram

Traditional Dictation Processing

  • Voice input
  • Digital system storage
  • Pending work
  • Assigned to MT
  • Complete? Yes: report; no: editorial review
  • Complete? Yes: report; no: query physician
  • Patient record

SRT with MT Text Editors

  • Voice input
  • Text file created
  • MT text edit
  • Complete? Yes: report; no: query physician
  • Report
  • Sign
  • Patient record

SRT with Provider Text Edit

  • Voice input
  • Digital on-screen file
  • Provider text edit
  • Report
  • Sign
  • Patient record

Appendix B: Best Practices for Using Speech Recognition—Defining a Successful Project

Facilities deploying speech recognition technology measure success differently, based on each facility's unique objectives. Success can therefore be defined in many ways, but the definition must be grounded in the objectives originally established.

In general, most facilities will define success based on improvements in one or more of the following areas:

  • Decreased costs
  • Improved documentation and patient care
  • Increased customer (either patient or physician) satisfaction
  • Improved compliance with state, federal, or accreditation standards

Sites should follow a consistent set of practices during the implementation and go-live process. Attending to the critical success factors outlined below helps to:

  • Improve the level of success realized
  • Minimize the associated risks
  • Provide a smoother transition from the legacy transcription system

Critical Success Factors in Deploying Speech Recognition

  • Define measurable objectives prior to implementation
  • Establish a target return on investment (ROI), including time frame for achievement
  • Secure executive sponsorship and appoint a physician champion
  • Actively involve users from all levels of the project
  • Designate both a technical and functional system administrator
  • Identify key benefits for end users
  • Align the healthcare documentation specialist's (HDS) compensation with the new technology
  • Develop an operational plan in advance
  • Provide key stakeholder updates regularly during the project
  • Establish benchmarks prior to deployment for post-deployment analysis and comparison

Define Measurable Objectives Prior to Implementation

Clearly defined business objectives serve several purposes, including:

  • Maintaining project focus
  • Providing measurements for success
  • Dramatically improving the likelihood of achieving desired results

To succeed, the business objectives must be specific. Rather than "decrease turnaround time on reports," include target goals such as "attain a 30 percent decrease in turnaround time within 90 days of go-live." (Consider setting a "stretch" objective, with specific rewards for its accomplishment.) A specific objective makes it possible to define measurement criteria and map goals to that objective.

The next step is to examine the current methods and processes to identify specific areas where quantifiable gains can be realized. In the case of the above-stated objective, the measurement criteria and goal may be the usage of templates and macros, with the goal of "Analyzing the current usage of templates and macros and increasing this by 25 percent." Attainment of multiple goals of this nature would enable meeting the stated objective.
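As a simple illustration, the benchmark-versus-objective check described above amounts to a percentage comparison. The following sketch uses hypothetical figures (a 24-hour baseline turnaround benchmark and a 16-hour post-deployment measurement), not data from this brief:

```python
# Hypothetical sketch: checking a measurable objective against benchmarked data.
# The metric names and figures are illustrative, not from the practice brief.

def percent_change(baseline: float, current: float) -> float:
    """Return the percent change from the baseline to the current value."""
    return (current - baseline) / baseline * 100

# Objective: attain a 30 percent decrease in turnaround time within 90 days of go-live.
baseline_tat_hours = 24.0   # pre-deployment benchmark
current_tat_hours = 16.0    # measured 90 days after go-live

change = percent_change(baseline_tat_hours, current_tat_hours)
print(f"Turnaround time change: {change:.1f}%")   # -33.3%
print("Objective met:", change <= -30)            # True
```

The same comparison applies to any measurement criterion that was benchmarked before go-live, such as template and macro usage.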

Each measurement criterion will require benchmarking prior to system go-live. Doing so provides the comparative data needed for post-implementation analysis and serves as a baseline against which attainment of objectives is measured. In some cases, the realization of the stated objective is due only in part to the deployment of the technology, with the rest of the benefit realized from process re-engineering.

Summary Example

  • Objective: reduce report turnaround time by 30 percent within 90 days of go-live
  • First measurement criterion: use of macros and templates
  • Goal: increase usage by 25 percent

Establish a Target ROI, Including Time Frame for Achievement

Establishing a specific ROI goal, along with a timetable for attainment, is one of the strongest ways to ensure proper focus by all interested parties. In line with the principles outlined above, the ROI objective must be specific. An example of an acceptable objective would be "Reduce transcription costs by 25 percent annually on a pro-forma basis within six months of go-live." This is in comparison to a vague objective of "Reduce transcription expenses."

Specific areas where quantifiable gains can be realized are identified by mapping measurement criteria and goals to a specific objective. For example, one measurement criterion is outsourcing, with an overall goal of reducing this component of transcription by 50 percent within six months. A specific goal may state, "Analyze the current usage of outsourcing and reduce it by 50 percent within six months of go-live."

Summary Example

  • Objective: reduce transcription costs by 25 percent annually on a pro forma basis within six months of go-live
  • First measurement criterion: use of outsourcing services
  • Goal: decrease outsourcing usage by 50 percent within six months of speech recognition system go-live
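To show how a single measurement criterion maps onto the overall ROI objective, the arithmetic can be sketched as follows. All dollar figures are hypothetical assumptions chosen only to make the relationship concrete:

```python
# Hypothetical sketch: relating one measurement criterion (outsourcing)
# to a pro-forma ROI objective. Dollar figures are illustrative assumptions.

baseline_annual_cost = 400_000.0   # assumed pre-deployment transcription spend
target_reduction_pct = 25.0        # objective: reduce costs 25 percent annually

# Measurement criterion: outsourcing usage, to be cut by 50 percent.
baseline_outsourcing = 120_000.0
projected_outsourcing = baseline_outsourcing * 0.5

savings_needed = baseline_annual_cost * target_reduction_pct / 100
outsourcing_savings = baseline_outsourcing - projected_outsourcing

print(f"Annual savings needed: ${savings_needed:,.0f}")                 # $100,000
print(f"Savings from outsourcing goal: ${outsourcing_savings:,.0f}")    # $60,000
print(f"Gap to close with other goals: ${savings_needed - outsourcing_savings:,.0f}")  # $40,000
```

The gap that remains after one criterion is met shows why attainment of multiple goals is usually needed to satisfy the stated objective.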

Secure Executive Sponsorship and Appoint a Physician Champion

Deploying speech recognition is strategic in nature and therefore requires commitment from the executive level of the organization to succeed. The executive sponsor, at a minimum, reinforces the importance of the project to the key stakeholders. This person demonstrates senior management's long-term commitment to the project and serves to bring the project to a level of strategic importance to the organization. The executive sponsor can also secure additional resources if needed or participate in the timely resolution of unforeseen issues.

The speech recognition project is also clinically related, requiring the identification and appointment of a physician champion. This member of the medical staff believes in the project, acknowledges the challenges, provides political support, and educates and communicates with staff to help overcome any barriers to progress. The physician champion is an instrumental member of the project team.

Actively Involve Users at All Project Levels

The initial success of the system deployment depends greatly on the level of acceptance demonstrated by the users of the system. In the case of speech recognition, this would include the transcription professionals and physicians. These users should play a key part in selection and deployment planning for this type of system.

Key representatives from each of the two groups outlined above should be involved in the project planning and assist in identifying the specific benefits for their particular area. The goal is to develop these users into system "champions" so that advocates are in place at all appropriate levels within the facility.

Designate Both a Technical and Functional System Administrator

In addition to the resources required to plan for and implement the system, organizations must designate appropriate ongoing resources to ensure continued project success. The two primary resources are a technical and functional system administrator. These roles are typically filled by two separate individuals, as the skill set for each task varies considerably. The actual time commitment of each job role depends upon the overall size and scope of the system deployment.

Identify Key Benefits for End Users

Human nature is to resist change. Therefore, it is of critical importance that the value of the project for physicians and HDS professionals be clearly defined up front. If the physicians perceive the project as merely a cost-saving effort on the part of the organization, there may be a high level of resistance on their part.

To create a positive environment, it is imperative to:

  • Identify improvements that will be directly realized by the physicians and transcription professionals
  • Develop and document agreement(s) as to the importance of these improvements
  • Emphasize these improvements as key project objectives. This will ensure the continued support of the project from both of these key groups and develop system champions among them.

Align Healthcare Documentation Specialist Compensation with New Technology

For a speech recognition system to achieve the maximum level of success (in the shortest amount of time), those most affected must be compensated and rewarded appropriately. The introduction of this technology represents a change for the transcription professional—creating new activities and responsibilities.

Adjusting compensation will accelerate the adoption of the desired new behaviors, which in turn ensures attainment of ROI goals.

Develop an Operational Plan in Advance

An effective operational plan will consist of three primary areas:

  • Recommendations to improve the existing documentation process
    • The existing process must be as efficient as possible prior to the introduction of any new technology. If the manual process is inefficient, the automated process will likely be inefficient as well. The full potential of the new technology will never be realized by integrating it into an inefficient process. Recommendations made in this area are focused on improving the manual documentation process and are independent of the deployment of new technology.
  • Process changes required to accommodate the new technology
    • To fully benefit from the new technology, certain processes may need to be modified, and others may need to be introduced. Recommendations made in this area are exclusive to the introduction of the new technology and are designed to maximize the benefits that can be derived from deployment.
  • Optimal method and rate to integrate speech recognition into the documentation process
    • This component of the operational plan details the pace at which the new technology can be integrated successfully into the documentation process. The unique factors in place at the facility are incorporated into this section, and optional scenarios should be developed, along with the effect and potential consequences of each.

Provide Key Stakeholder Updates Regularly During the Project

End users often resist change, yet their acceptance of this technology is vital to the project's success. A clear strategy to address their concerns prior to go-live is important. A successful strategy to accomplish this goal includes the following components:

  • Provide informational sessions for HDS and physicians to demonstrate how the system meets the objectives of their respective areas
  • Conduct interactive updates with user representatives from all key stakeholders to keep them current on the progress of the project during the implementation phase
  • Once the system has gone live, provide regular communications to the end users emphasizing the project's benefits and future plans

Establish Benchmarks Prior to Deployment for Post-Deployment Analysis and Comparison

To accurately measure the success of the project, it is critical for pre-deployment benchmarks to be established. This provides the baseline for comparison once the system is installed, which will enable tracking of success levels for various components of the project.

Additional benefits of analyzing these benchmarks post-deployment include:

  • Justification for deploying the application in additional areas
  • Validation to end users of concrete benefits realized as a result of their cooperative participation in the project
  • Incentive to users or departments to adopt the technology

Appendix C: Tasks and Skills List

The tasks and skills below are listed by job role, with current tasks and skills followed by the new tasks and skills introduced with SRT.

Author
  • Operate audio capture equipment
    • Digital telephone device
    • Digital hand-held device
    • Analog hand-held device
  • Dictate patient encounter
    • Identify patient
    • Identify report type
    • Dictate details of encounter
  • Receive and review transcription
    • Revise incorrect or missing text
    • Resend for changes or approve and sign off

Back-end speech recognition (BESR)

  • In addition to the current tasks and skills listed, authors should be made aware that speech recognition is being used to transcribe their documents and should be trained to organize and modify their dictation slightly for better accuracy.

Author Continued


Front End Speech Recognition (FESR)

  • Physician is editor scenario
  • Operate speech recognition program
    • New user: enroll new speech user file
    • Pre-enrolled user: open speech user file
    • Open appropriate template
    • Begin dictating
    • Identify patient
    • Correct errors
    • Dictate formatting, including punctuation
    • Dictate details of encounter
    • Review
    • Revise incorrect or missing text
    • Approve
    • Sign off

Author Continued


Front End Speech Recognition (FESR)

  • Physician sends to third party editor scenario
  • Operate speech recognition program
    • New user: enroll new speech user file
    • Pre-enrolled user: open speech user file
    • Open appropriate template
    • Begin dictating
    • Identify patient
    • Correct errors
    • Dictate formatting, including punctuation
    • Dictate details of encounter
  • Transmit to third-party editor
  • Receive edited document
    • Review
    • Approve
    • Sign off

Healthcare Documentation Specialist (HDS)

  • Knowledge of grammar, punctuation, and style
  • Knowledge of English/American language usage
  • Knowledge of medical language, including all specialties and work types
  • Knowledge of computer word processing and performance enhancement applications
  • The bullets listed in the current tasks and skills column are still relevant with the adoption of SRT

Healthcare Documentation Specialist (HDS) Continued

  • Ability to listen, interpret what is heard, and keyboard entries to software application
    • Input cues from physicians, including ESL, rapid speech, and implied and direct instructions
    • Query for ambiguous or conflicting elements and/or instructions
    • Verify patient data and correct event
    • Verify geographic-specific information and proper nouns
    • Identify poor audio quality or other barriers to accuracy
    • Prompt dictator or other appropriate management regarding errors in dictation
  • Transmit document to quality assurance or originating dictator as appropriate
  • Ability to operate speech recognition transcription editing program
    • Listen to dictation carefully and compare to transcribed document
    • Make appropriate corrections and changes
    • Input cues from physicians, including implied and direct instructions
    • Verify patient data and correct event
    • Prompt dictator or other appropriate management regarding errors in dictation
  • Transmit document to quality assurance or originating dictator as appropriate
  • Automated program transmittal of recognition correction for continuous accuracy training of speech user file

Health Information Management


  • Establish policies and procedures surrounding dictation guidelines
  • Define required content of all reports, including section headings and reports that are required for specific admissions
  • Establish turnaround time frames for every work type
  • Establish policies and procedures regarding the use of standard (canned text) reports. Gain medical staff approval of contents of each canned text report and templates
  • Determine formatting requirements to meet the needs of the facility and medical staff
  • Establish policies and procedures for signature requirements (residents, medical students, PAs, NPs, etc.)
  • Establish policies and procedures for auto-faxing if necessary
  • Establish policies and procedures for copy distribution (referring physician, attending physician, consultants)
  • Establish policies and procedures for e-signature
  • Establish policies and procedures for addenda
  • Manage system for medical transcription, monitoring work in progress
  • Sort transcribed reports as received from print source and place in patient record
  • Check that all required reports are present in patient record
  • Flag any missing reports as chart deficiencies
  • Request the appropriate clinician to complete dictation
  • Approve
  • The bullets listed in the current tasks and skills column are still relevant with the adoption of SRT

Health Information Management

(HIM) Continued

  • Train new users of dictation equipment
  • Query originator and resolve for clarification of any remaining missing or incorrect information (blanks, patient demographics, conflicting elements)
  • Most physicians understand generic use of dictation equipment. SRT will require more sophisticated training. HIM staff must be well versed in training methodology and trained on the system itself to effectively train physicians and other clinicians.
  • FESR: Report is complete and signed when physician completes report
  • BESR: No Change in process
  • BESR with third-party editor: No change

Information Technology


  • Identify best hardware configuration to support software applications selected
  • Install speech recognition software
    • Macros (using content developed by clinicians and HIM)
    • Templates (using content developed by clinicians and HIM)
    • Components for partial dictation
    • Interface with information systems
  • Provide ongoing user support


Sherry Doggett
Julie A. Dooling, RHIA
Susan Lucci, RHIT, CHPS, CMT, AHDI-F


Cecelia A. Backman
Angie Comfort, RHIT, CDIP, CCS
Jennifer Gholson, RHIT
Mary Ellen Largess
Lori McNeil Tolley, M.Ed., RHIA
Angela Dinh Rose, MHA, RHIA, CHPS, FAHIMA
Marion V. Swaim, RHIA
Diana Warner, MS, RHIA, CHPS, FAHIMA
Lou Ann Wiedemann, MS, RHIA, CDIP, CPEHR, FAHIMA

The original version of this practice brief was developed by the following 2003 AHIMA e-HIM workgroup:

John Beats
Kathy Brouch, RHIA, CCS (staff)
Linda Bugdanowitz, RHIA, CHP
Mary Johnson, RHIT, CCS-P
Nancy Korn-Smith, RHIT
Susan Lucci, RHIT, CMT
Pamela Oachs, MA, RHIA
Sharon Rhodes, RHIT, CMT, CPC
Harry Rhodes, RHIA, CHP (staff)
Greg Schnitzer
David Sweet, MLS (staff)
Christine Taylor
Claudia Tessier
Joe Weber
Michelle Wieczorek, RN, RHIT
Julianne Weight

Article citation:
AHIMA. "Speech Recognition in the Electronic Health Record (2013 update)." Journal of AHIMA 84, no. 9 (September 2013): expanded web version.