MacSpeech

This is an old revision of this page, as edited by Thechuck2237 (talk | contribs) at 21:46, 24 March 2006.

MacSpeech is a developer of speech recognition software for Macintosh computers. The company was started in 1996 when founder and current CEO Andrew Taylor noticed that the Macintosh seemed to be falling behind when it came to speech recognition.

The Vision of Speech Everywhere

MacSpeech believes speech should be pervasive on the Macintosh. This vision originates with the company's founder, CEO, and the Chief Architect of its products, Andrew Taylor. Andy’s mild demeanor belies the persistence of his vision and his talents as a programmer. He has designed and implemented computer software and hardware solutions for clients in communications, imaging, speech synthesis and recognition.

MacSpeech began in late 1996 when Andy noticed the Macintosh platform was beginning to fall behind Windows in the area of Speech Recognition. Andy had previously helped design and build the industry's first dictation system, Power Secretary, while he was employed by Articulate Systems.

Company Profile

MacSpeech's engineering team is rounded out by engineers who have worked on award-winning Macintosh products. Their collective experience includes some of the most extensive sound input experience on the Macintosh. Their engineers came from companies like STF Technologies, Dragon Systems, and Wang, among others.

MacSpeech's technical support and marketing teams include former Apple employees who are well versed in the needs of the Macintosh user. Their contributions round out the company’s commitment to provide the Macintosh community with a speech recognition experience that is distinctly Mac-like.

Why Speech Recognition Is Important

The earliest accounts of human achievement were not written down. They were passed on from generation to generation using the only mechanism available: the human voice. The earliest attempts to record history took the form of pictures representing important events. Pictures became symbols, which turned into characters used to represent language.

Along with language came the tools used to record it: pen and paper, the printing press, typewriters, and eventually, computers. Today's most powerful computers are saddled with a significant limitation imposed on the original typewriter out of necessity: the QWERTY keyboard. Attempts to provide alternate layouts have met with only marginal success. Only recently have computers become powerful enough to reliably translate spoken words and phrases to written text. Harnessing this power means freeing people from the physical tether that separates what we can say from what we can read.

Speech Is Pervasive

MacSpeech believes speech should be a natural extension of the computer's existing interface. The computer should be able to not only translate what is said into written text, but should also be able to discern verbal commands and perform the appropriate actions when asked.

Talk Where You Can Type

While MacSpeech believes speech should be pervasive, recognizing it should not be intrusive. The hallmark of the Macintosh interface is that it does not get in the way of the user. MacSpeech thinks speech should work that way too. When you speak, the computer should listen without requiring you to work differently than if you used the keyboard or mouse.

This is the ultimate goal — to make technology invisible to the user.

Most people relate a computer's ability to translate speech to what they have seen on television and in movies. The ability for anyone anywhere to speak to a computer and have it respond appropriately is the ultimate goal of speech recognition. There are several obstacles that will remain for some time before this level of recognition is a reality.

Methodologies of Speech Recognition

There are several types of Speech Recognition. Each has its advantages and disadvantages. The computing power available to the average user is also a limiting factor when considering which type of speech recognition to employ.

Speaker Dependency

Speaker Dependency determines whether or not someone needs to train the computer to understand their voice in order to get the best results. Apple's English Speech Recognition (PlainTalk™) technology is an excellent example of a Speaker Independent speech recognition system. Using PlainTalk, virtually any user can achieve average to good results when issuing commands to the computer. All of the processing power used for speech recognition is dedicated to translating speech into commands regardless of the speaker. Unlike speaker dependent speech recognition, PlainTalk does not get any better at recognizing your speech with use. Instead, you adapt your speaking so that it understands you better.

The first product MacSpeech released was a PlainTalk enhancement called ListenDo! It allowed users to control a Macintosh using their voice. Since it used PlainTalk, it was speaker independent voice recognition.

A Speaker Dependent speech recognition system is better suited to translating speech to text since the system is tuned to one or more individual users. Speaker dependent systems require anyone using the system to “train” the software. In doing so, the software creates a voice profile for that user, essentially creating rules that are applied to recognition when that user speaks, thus improving accuracy. This method conserves the computer's resources to translating speech from one user at a time, but requires the user spend time training the computer before the best results can be obtained.
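The role of a voice profile described above can be sketched in a toy example. This is purely conceptual (the class and names are hypothetical, and real engines such as FreeSpeech adapt acoustic and language models during training rather than keeping simple correction counts), but it shows how per-user data steers recognition toward that user's habits:

```python
from collections import defaultdict

class VoiceProfile:
    """Toy per-user profile: remembers which word a given user meant
    when the recognizer was unsure, and prefers it next time.
    (Hypothetical sketch; not how a real engine stores a profile.)"""

    def __init__(self, user):
        self.user = user
        # corrections[heard][meant] = how often the user fixed heard -> meant
        self.corrections = defaultdict(lambda: defaultdict(int))

    def record_correction(self, heard, meant):
        # "Training": the user corrects a misrecognized word.
        self.corrections[heard][meant] += 1

    def resolve(self, heard, candidates):
        # Recognition: among acoustically similar candidates, prefer
        # the one this particular user has chosen before.
        scores = self.corrections[heard]
        return max(candidates, key=lambda w: scores.get(w, 0))

profile = VoiceProfile("andy")
profile.record_correction("write", "right")
profile.record_correction("write", "right")
print(profile.resolve("write", ["write", "right", "rite"]))  # "right"
```

A speaker-independent system, by contrast, would have to pick among the candidates the same way for every user, with no per-user history to draw on.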

iListen, MacSpeech’s first dictation product, is a speaker dependent product based on a proprietary implementation of the FreeSpeech engine from Philips Speech Processing.

Continuous vs. Discrete

Earlier dictation products required the user to speak in a stilted fashion, with… pauses… between… each… word. This was necessary because computers were not powerful enough to convert text as fast as the user could speak. While there are still situations in which discrete dictation is desirable, computers are now capable of translating speech to text nearly as fast as it can be spoken.

All of MacSpeech's general purpose dictation products feature continuous speech dictation.

Dealing with Noise

Another factor in achieving accurate speech recognition is the level of background noise present. In science fiction stories, the characters interact with the computer without first training it to recognize their voice, and regardless of how many explosions are occurring around them. Today’s computers, however, can't tell the difference between a person speaking to them and a radio playing in the background. It is all sound to the computer.

Someday it should be possible for the computer to discern one person's voice from another, as well as filter out background noise. MacSpeech will always be among the first to take advantage of these technologies as they become available. For now, however, accurate speech recognition is aided considerably by a quiet room as well as the use of a noise canceling microphone.

MacSpeech requires that any headsets it sells meet very strict quality requirements to ensure the user has the best speech recognition experience possible. There are several good noise canceling headsets available for the Macintosh.

Who Will Use Speech Recognition?

MacSpeech believes people most likely to use speech recognition will fall into one of three major categories: General Use, Professional, and Disabled.

General Use

This is by far the largest group, including nearly everyone without a professional focus or a disability. Most users in this group are either small business people or consumers. For them, the problem solved by dictation is compensating for a lack of typing skills. Having the computer translate what they say into text for emails, book reports, letters — even documents such as this one — will be a great time saver. As long as the user has a relatively normal speaking voice, a basic dictation product will meet their needs.

A subcategory of the general use market is those whose primary use of a Macintosh is for graphics or desktop publishing. Their interest in speech recognition will probably be focused on the command and control aspects of the technology. The ability of the human voice to act as a “third hand” for selecting tools or menu items is very appealing. MacSpeech's extensive command and control features are uniquely suited to this group of people.

Professional Markets

Doctors and lawyers have been using dictation for years. They are accustomed to speaking into a device and having an assistant transcribe the text later. Dictation software can assist in this process, freeing up the assistant’s time for more important tasks. The assistant would have the professional’s voice profile on their computer and make the appropriate corrections to the dictated text after loading it from a sound file or recording device. The individual disciplines within the legal and medical markets have their own sets of terms. Professional dictation software supports additional vocabularies containing the terms used by a particular discipline. These vocabularies typically sell for between $500 and $1000. MacSpeech hopes to provide professional vocabularies for its products at some point in the future.

Helping the Handicapped

MacSpeech has identified three sub-categories in the disabled market that will benefit from speech recognition: mobility impaired, visually impaired, and speech impaired.

Mobility Impaired

Anyone who has trouble typing due to a physical condition can be considered mobility impaired. This includes everyone from Carpal Tunnel Syndrome sufferers to quadriplegics — or any group distinguished by their need to control the computer by voice in addition to having it translate speech to text.

iListen allows users to control virtually every aspect of their computer’s interface using their voice. Any user (whether mobility impaired or not) will benefit from MacSpeech products that not only allow the user to translate their speech to text, but also respond to their verbal commands.

Visually Impaired

Those with visual disabilities need the computer to provide comprehensive aural feedback, since they cannot see how accurately the computer is translating their speech.

Conventional spell checkers are not helpful since speech recognition software never misspells a word (it simply recognizes the wrong word). Macintosh computers already have the ability to read back text. Speech recognition for the visually impaired should contain special commands for selecting and navigating text, as well as the ability to read choices for correction. Verbal commands should also provide audio feedback to inform the user that the command has been executed.

MacSpeech is keenly aware of these issues. In fact, one of its senior engineers is himself visually impaired, and its Technical Support Manager is mobility impaired. Future versions of MacSpeech software will contain command and control and dictation features that are optimized for those with visual disabilities.

Speech Impaired

One of the byproducts of speaker dependent speech recognition is that the user must create a profile of their voice to ensure the highest degree of accuracy. MacSpeech believes it will be possible to achieve better recognition for those with light to moderate speech impediments by optimizing the training process for those users. Eventually it may also be possible to address the needs of those with severe speech disabilities, but that is beyond the state of the technology today.

Mac-Only for Mac Users

MacSpeech is a Mac-only company. The software they produce is exclusively for Macintosh owners who are interested in one or more benefits provided by speech recognition. Being Mac-only means their engineers are unfettered by a corporate requirement to maintain a common code-base across platforms. Since they are uniquely Macintosh, their engineers can take advantage of all the Macintosh has to offer.

Their goal is “speech everywhere” on the Macintosh, for every user.

Controlling The Mac With Your Voice

The ability to have the computer convert speech to text has been the “Holy Grail” of speech recognition software for years. To say the least, it is exhilarating to have words you just uttered appear, as if by magic, on the computer’s screen.

But the thrill disappears quickly if all you can do is dictate. A truly robust implementation of speech recognition must include having the computer respond to commands as well as text.

This “third hand” is an invaluable tool for a variety of users.

Most of us are familiar with the famous phrase “Open the pod bay doors, HAL,” uttered by the doomed astronaut in Stanley Kubrick’s 2001: A Space Odyssey. This is an almost perfect example of controlling a computer using speech. Almost perfect because, as we know, HAL adamantly refused to comply.

Fortunately, today’s computers are not endowed with sufficient intelligence to refuse the instructions they are given. They will blindly carry them out regardless of the outcome. It is the software developer’s responsibility, therefore, to ensure the computer responds as the user expects. This is especially important when creating software that is designed to respond to one or more actions based on spoken commands.

The Macintosh Advantage

The Macintosh is well known for having an interface that does not get in the user’s way. Provided a developer follows Apple’s Human Interface Guidelines, menus, buttons, and keyboard commands all act in a predictable way. This consistency makes it easier to write software for manipulating Macintosh interface elements using speech. In addition, the Macintosh has a built-in command and control language, called AppleScript, that allows compliant applications a degree of control unobtainable on other platforms.

Controlling The Macintosh Interface With Speech

MacSpeech’s proprietary technology adds the ability to execute menu items, click and double-click the mouse, push buttons, and create text macros that can type up to 32,000 characters with one spoken command.

This is just one example of how MacSpeech optimizes the speech interface on the Macintosh to provide a consistent and accurate user experience.
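The text macro feature above can be sketched with a toy lookup table. This is an illustrative sketch only (the dictionary contents and function names are hypothetical; the real product injects the expanded text as keystrokes into the frontmost application):

```python
# Toy sketch of spoken text macros: a trigger phrase expands to a
# prepared block of text, as if typed. Names and contents here are
# hypothetical; only the 32,000-character limit comes from the text.

MAX_MACRO_CHARS = 32_000  # documented per-macro limit

macros = {
    "insert signature": "Best regards,\nAndrew Taylor\nMacSpeech, Inc.",
    "insert disclaimer": "This message is confidential. " * 10,
}

def expand(spoken_phrase):
    """Return the macro text for a spoken trigger, or None if the
    phrase is not a macro (and should be treated as dictation)."""
    text = macros.get(spoken_phrase.lower().strip())
    if text is None:
        return None
    return text[:MAX_MACRO_CHARS]  # enforce the per-macro cap

print(expand("Insert Signature").splitlines()[0])  # "Best regards,"
```

The point of the cap is simply that one short utterance can stand in for a very large block of typed text.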

AppleScript and Speech

AppleScript is a unique Macintosh advantage. It allows users to automate repetitive or complex tasks using an English-like programming language. These small programs, called scripts, give one or more applications on the Macintosh instructions much the same way a movie script tells actors what to do. In order to respond to a script, the target application must be AppleScript compliant. That is, the software developer must include support for AppleScript in their application. The extent to which a script can interoperate with a program is dependent on how well the program’s authors have implemented support for AppleScript.

With MacSpeech technology, any AppleScript can be triggered with a single voice command. MacSpeech also allows any of the interface items mentioned above to be incorporated into a script. What can be done with a spoken phrase is limited only by how well scripting is implemented in the target application(s) and the user’s imagination.
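The phrase-to-script mapping described above can be sketched as a small dispatcher. This is a hypothetical illustration, not MacSpeech's implementation: on a Macintosh the AppleScript source would be handed to the system's osascript tool, so the actual execution step is guarded off here and the example only performs the lookup:

```python
import subprocess

# Hypothetical mapping of recognized phrases to AppleScript sources.
# The scripts shown are ordinary AppleScript one-liners; the phrase
# names are invented for illustration.
commands = {
    "empty the trash": 'tell application "Finder" to empty trash',
    "open my mailbox": 'tell application "Mail" to activate',
}

def handle(phrase, run=False):
    """Look up a spoken phrase; optionally run it via osascript
    (only meaningful on an actual Macintosh, hence run=False here)."""
    script = commands.get(phrase.lower().strip())
    if script is None:
        return None  # unrecognized phrase: fall back to dictation
    if run:
        subprocess.run(["osascript", "-e", script], check=True)
    return script

print(handle("Empty the Trash"))  # the Finder one-liner above
```

The dispatcher itself is trivial; the power comes entirely from what the target applications expose to AppleScript, which is the point made above about scripting support varying by application.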

ScriptPaks™ — Enhancing the User Experience

Not everyone has the time or ability to create their own scripts. For this reason, MacSpeech has designed its products to incorporate plug-ins, called ScriptPaks™. Once installed, a ScriptPak adds a series of scripts to an individual application’s vocabulary file or command set. Each of these scripts can be activated by one spoken command. This extensibility means that the user can purchase a ScriptPak for those applications they use most often, thus increasing the functionality of their MacSpeech product.

One of the ScriptPaks released in 2004 was the MouseAnywhere™ ScriptPak. This innovative add-in for iListen allows users to completely control the on-screen cursor with their voice.

Who Will Use Speech Control?

While there are certainly times when everyone could benefit from having their Macintosh respond to verbal commands, there are two groups where this ability is particularly welcome: graphic artists and mobility impaired individuals.

Using the Voice as a Third Hand

Anyone who seems to run out of hands when using their Macintosh can benefit from speech control. Graphic artists typically have one hand on the keyboard and one on a drawing tool such as a stylus. The ability to switch tools in a graphics program without having to move their hands or take their eyes off the screen will be a tremendous time saver.

Speech Control for the Mobility Impaired

Speech control is particularly well suited to those with a mobility impairment. Using MacSpeech’s command and control technologies, individuals with paralysis can manipulate the Macintosh interface almost as quickly as users of the mouse or command key shortcuts.

Using Speech Control and Dictation Together

MacSpeech realizes there are many people who want the benefits of both dictation and speech control. MacSpeech technology seamlessly integrates these two modes, allowing the user to move between dictating text and issuing commands simply by speaking a command.