
2 LITERATURE SURVEY

2.1 Need Of System
Speech recognition, also known as automatic speech recognition, computer speech recognition, speech to text, or just STT, converts spoken words to text. The term "voice recognition" is sometimes used to refer to recognition systems that must be trained to a particular speaker, as is the case for most desktop recognition software. Recognizing the speaker can simplify the task of translating speech. Speech recognition is a broader solution that refers to technology that can recognize speech without being targeted at a single speaker, such as a call system that can recognize arbitrary voices.

Speech recognition applications include voice user interfaces such as voice dialing (e.g., "Call home"), call routing (e.g., "I would like to make a collect call"), domotic appliance control, search (e.g., find a podcast where particular words were spoken), simple data entry (e.g., entering a credit card number), preparation of structured documents (e.g., a radiology report), speech-to-text processing (e.g., word processors or emails), and aircraft (usually termed Direct Voice Input).

The voice recognition technologies market will grow at a compound annual growth rate (CAGR) of 8.8% between 2010 and 2015. The total market is valued at an estimated $38.4 billion in 2010 and is expected to reach $58.4 billion in 2015.

2.2 Market survey
Voice recognition software technologies need hardware to transmit the signals as well as to abate ambient noise. This sector of the market is worth an estimated $16.5 billion in 2010 and will grow at a 9.8% CAGR to reach $26.3 billion in 2015. Automatic speech recognition and text-to-speech software work together to voice-enable many applications. Software sales will increase at a CAGR of 6.8%, from $13.6 billion in 2010 to $18.9 billion in 2015.
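The growth figures above can be cross-checked with the standard CAGR formula, (end / start)^(1 / years) - 1. The following is a minimal Python sketch; the function name `cagr` and the `segments` dictionary are illustrative choices, and the dollar values are simply the ones quoted in the text.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.088 means 8.8%)."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures quoted in the text, in billions of US dollars (2010 -> 2015).
segments = {
    "total market": (38.4, 58.4),
    "hardware": (16.5, 26.3),
    "software": (13.6, 18.9),
}

for name, (value_2010, value_2015) in segments.items():
    rate = cagr(value_2010, value_2015, years=5)
    print(f"{name}: {rate * 100:.2f}% per year")
```

The recomputed rates should land within about a tenth of a percentage point of the quoted 8.8%, 9.8%, and 6.8%, which is consistent with rounding in the published figures.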

2.2.1 Introduction: reasons for the study and its importance
No longer narrowly associated with assistive and customer care applications, voice recognition technologies are becoming integral parts of products and services that span a much broader array of industries. With worldwide software revenues expected to reach $18.9 billion by 2015, this maturing industry owes much of its growth to advances from the critical triad of automatic speech recognition (ASR), text-to-speech (TTS), and speaker verification (SV) technologies.

In America and Europe, voice recognition providers are partnering with manufacturers who are loading their products with voice-activated multimodal options. These applications do everything from helping drivers navigate to their destinations and workers voice-pick warehouse inventory to helping doctors automate medical transcription and allowing Web users to browse by voice command. Marketers with a watchful eye are not only training their sights on the pent-up product demand of growing Asia-Pacific populations; they are also factoring the potential of the emerging middle class in Latin America into their marketing strategies.

2.2.2 Objectives of the study
Companies in the voice recognition space face challenges similar to those in other technology markets. Converging technologies offer the promise of new products and markets, but they also invite the disruption inherent in mergers and acquisitions, all occurring during the worst economic downturn of the new millennium.

Customers are applying the same measuring stick to voice-aided products and services as they do to other products: they value accuracy, speed, and efficiency. Whether obtaining stock quotes from their smartphones, getting wake-up calls from voice-enabled alarm clocks, or accessing voice-translated email, consumers are raising their expectations not only of content quality but also of the quality of the experience.

Choosing voice recognition solutions represents a significant information technology (IT) investment -- a fact not lost on companies that, in better economic times, focus primarily on strategic growth. Compelled to keep discretionary spending to a minimum, companies are more inclined to purchase products and services that can show a quantitative return on investment. Traditionally, call centers, with their highly developed statistical databases and a multitude of speech-enabled processes, provide some of the most compelling evidence that properly integrated voice recognition applications can help companies realize cost savings of as much as 80%.

This report analyzes in depth voice recognition technologies and the markets and applications they serve. It addresses such questions as: Who is using these technologies? What benefits do they accrue from using them? At what price points do they buy them? Which markets will reap the most benefits from their adoption? Which issues must be addressed to generate a successful

2.3.2 Scope of the report
This report analyzes voice recognition technologies and their markets. It recognizes that software and hardware technologies act in tandem, building the momentum needed for the industry's success. Additionally, tracking the growth of traditional and emerging voice-enabled devices is important, since these media will promote and extend voice recognition's reach.

An overview of the voice recognition industry precedes later chapters that review the main voice recognition categories, discuss top supplier market share, new technologies, and the unique challenges faced by each category in the future. Five-year forecasts follow, segmented by voice recognition categories as well as expansion into end markets.

Succeeding chapters discuss enabling technologies, corporate and national research and development funding, the organizational and economic makeup of the voice recognition industry, and the legislative, political, and environmental issues facing the industry. The changing dynamics of international market share also are addressed.

The appendices contain upcoming voice recognition industry-related conferences and recent patent grants, as well as a list of related mergers and acquisitions, licensing arrangements, and partnerships.

2.4 Comparison with other systems
2.4.1. Interactive voice response (IVR) is a technology that allows a computer to interact with humans through the use of voice and DTMF keypad inputs. In telecommunications, IVR allows customers to interact with a company's database via a telephone keypad or by speech recognition, after which they can service their own inquiries by following the IVR dialogue. IVR systems can respond with prerecorded or dynamically generated audio to further direct users on how to proceed. IVR applications can be used to control almost any function where the interface can be broken down into a series of simple interactions. IVR systems deployed in the network are sized to handle large call volumes. IVR technology is also being introduced into automobile systems for hands-free operation. Current deployment in automobiles revolves around satellite navigation, audio, and mobile phone systems.

It has become common in industries that have recently entered the telecommunications industry to refer to an automated attendant as an IVR. The terms, however, are distinct and mean different things to traditional telecommunications professionals, whereas emerging telephony and VoIP professionals often use the term IVR as a catch-all to signify any kind of telephony menu, even a basic automated attendant. The term voice response unit (VRU) is sometimes used as well.

2.4.2 Voice Verification
Voice biometrics works by digitizing a profile of a person's speech to produce a stored model voice print, or template. Biometric technology reduces each spoken word to segments composed of several dominant frequencies called formants. Each segment has several tones that can be captured in a digital format. The tones collectively identify the speaker's unique voice print. Voice prints are stored in databases in a manner similar to the storing of fingerprints or other biometric data.

To ensure a good-quality voice sample, a person usually recites some sort of text or pass phrase, which can be either a verbal phrase or a series of numbers. The phrase may be repeated several times before the sample is analyzed and accepted as a template in the database. When a user later attempts to gain access to the system and speaks the assigned pass phrase, certain words are extracted and compared with the stored template for that individual. Some voice recognition systems do not rely on a fixed set of enrolled pass phrases to verify a person's identity. Instead, these systems are trained to recognize similarities between the stored templates and a person's voice patterns when that person speaks unfamiliar phrases.
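The enrolment and comparison steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular product works: each utterance is assumed to have already been reduced to a short feature vector (here, three made-up formant-like frequencies in Hz), the template is taken to be the plain average of the enrolment vectors, and matching uses Euclidean distance against a hypothetical threshold. The names `enroll`, `distance`, and `verify` are illustrative.

```python
import math


def enroll(samples):
    """Average several feature vectors of the same pass phrase into one template."""
    count = len(samples)
    return [sum(values) / count for values in zip(*samples)]


def distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def verify(template, sample, threshold):
    """Accept the speaker when the new sample is close enough to the template."""
    return distance(template, sample) <= threshold


# Hypothetical feature vectors from three repetitions of the pass phrase.
enrollment = [
    [710.0, 1100.0, 2450.0],
    [690.0, 1120.0, 2400.0],
    [700.0, 1080.0, 2500.0],
]
template = enroll(enrollment)

same_speaker = [705.0, 1090.0, 2460.0]
other_speaker = [600.0, 1300.0, 2700.0]

print(verify(template, same_speaker, threshold=50.0))
print(verify(template, other_speaker, threshold=50.0))
```

Averaging several repetitions is one simple way to smooth out variation between utterances; real systems use far richer features and statistical models.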

A person's speech is subject to change depending on health and emotional state. Matching a voice print requires that the person speak in the normal voice that was used when the template was created at enrollment. If the person suffers from a physical ailment, such as a cold, or is unusually excited or depressed, the voice sample submitted may differ from the template and fail to match. Other factors also affect voice recognition results. Background noise and the quality of the input device (the microphone) can create additional challenges. If authentication is attempted remotely over the telephone, the use of a cell phone instead of a landline can affect the accuracy of the results. Voice recognition systems may also be vulnerable to replay attacks: if someone records the authorized user's phrase and replays it, that person may acquire the user's privileges. More sophisticated systems may use liveness testing to determine that a recording is not being used.

Voice verification systems can be used to verify a person's claimed identity or to identify a particular person. They are often used where voice is the only available biometric identifier, such as over the telephone. They may require minimal hardware investment, as most personal computers already contain a microphone. The downside of the technology is that, although advances have been made in recognizing the human voice, ambient temperature, stress, disease, medications, and other physical changes can negatively affect automated recognition.

Voice verification systems are different from voice recognition systems, although the two are often confused. Voice recognition is used to translate the spoken word into a specific response, while voice verification checks the vocal characteristics against those associated with the enrolled user. The goal of voice recognition systems is simply to understand the spoken word, not to establish the identity of the speaker.
A familiar example of a voice recognition system is an automated call center asking a user to "press the number one on his phone keypad or say the word 'one'." In this case, the system is not verifying the identity of the person who says the word "one"; it is merely checking that the word "one" was said instead of another option.

2.4.3 Biometric Fingerprint Recognition
Identification systems based on biometrics are capable of identifying persons on the basis of either physical or behavioural characteristics. Currently, there are over ten different techniques available to identify a person based on biometrics. The following techniques are applied within the two main categories:

Behavioural characteristics: keystroke dynamics, voice recognition, signature dynamics
Physical characteristics: iris recognition, retina recognition, vein pattern recognition, face recognition, recognition of hand or finger geometry, fingerprint recognition

Before a system is able to verify the specific biometrics of a person, it requires something to compare them with. Therefore, a profile or template containing the biometric properties is stored in the system. Recording the characteristics of a person is called enrolment. To obtain a profile that corresponds as closely as possible to reality, the biometric characteristics are scanned several times. In the case of fingerprint recognition, the finger is scanned three to four times to get a profile that is independent of variations that occur in practice, such as the angle at which the finger is placed on the scanner. Since storage capacity for the profiles in these systems is usually limited (for example, when used in combination with smart cards), it is common to compress the data before storing the profile. Storing profiles in tokens requires a combination of token and biometry for verification and therefore gives a higher level of security.

When a biometric verification is to occur, a scan of the person's biometrics is made and compared with the characteristics stored in the profile. In general, a certain margin of error is allowed between the observed and stored characteristics. If this margin is too small, the system will reject a legitimate person more often, while if it is too large, malicious persons will be accepted by the system. The probabilities that a legitimate person will be rejected and that a malicious person will be accepted are called the False Reject Rate (FRR) and the False Accept Rate (FAR) respectively. When using a biometric system, one would of course want to minimise both rates, but unfortunately they are not independent. An optimum trade-off between FRR and FAR has to be found with respect to the application.
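The trade-off between the two rates can be illustrated with a small Python sketch. It assumes that every comparison yields a distance between the observed and stored characteristics and that the allowed margin is a threshold on that distance; the score lists below are invented for illustration and the function name `far_frr` is hypothetical.

```python
def far_frr(genuine_distances, impostor_distances, margin):
    """Return (FAR, FRR) for a given allowed margin.

    A comparison is accepted when its distance is at most `margin`.
    """
    false_rejects = sum(1 for d in genuine_distances if d > margin)
    false_accepts = sum(1 for d in impostor_distances if d <= margin)
    far = false_accepts / len(impostor_distances)
    frr = false_rejects / len(genuine_distances)
    return far, frr


# Invented distances: legitimate users tend to score low, impostors high.
genuine = [5, 8, 12, 15, 22]
impostor = [18, 25, 30, 40, 55]

for margin in (10, 20, 30):
    far, frr = far_frr(genuine, impostor, margin)
    print(f"margin={margin}: FAR={far:.1f}, FRR={frr:.1f}")
```

With these made-up numbers, widening the margin lowers the FRR while raising the FAR, which is why the margin has to be chosen per application rather than minimising one rate in isolation.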

2.5 Types of Speech Recognition
There are two types of speech recognition: speaker-dependent and speaker-independent. Speaker-dependent software is commonly used for dictation, while speaker-independent software is more commonly found in telephone applications.

Speaker-dependent software works by learning the unique characteristics of a single person's voice, in a way similar to voice recognition. New users must first "train" the software by speaking to it, so that the computer can analyze how the person talks. This often means users have to read a few pages of text to the computer before they can use the speech recognition software.

Speaker-independent software is designed to recognize anyone's voice, so no training is involved. This makes it the only real option for applications such as interactive voice response systems, where businesses cannot ask callers to read pages of text before using the system. The downside is that speaker-independent software is generally less accurate than speaker-dependent software. Speaker-independent speech recognition engines generally deal with this by limiting the grammars they use. With a smaller list of recognized words, the speech engine is more likely to recognize correctly what a speaker said. This makes speaker-independent software ideal for most IVR systems and for any application where a large number of people will be using the same system. Speaker-dependent software is used more widely in dictation, where only one person will use the system and there is a need for a large grammar.

2.6 Speech recognition based password enabled switching device
The project aims to design a system capable of switching electrical devices ON and OFF based on spoken commands. The system adds a speech-based human-machine interface to the automation of electrical devices.

The modules in the project are: a speech recognition system capable of recognizing the speech commands given by the user, and relay and triac switches connected to the electrical appliances that are to be controlled.

Speech is the primary and most convenient means of communication between humans. Whether driven by technological curiosity to build machines that mimic humans or by the desire to automate work with machines, research in speech recognition is a first step towards human-machine communication. Speech recognition is the process of recognizing the spoken word in order to take the necessary action. The controlling device of the whole system is a microcontroller. The speech recognition module, along with the relay and triac, is interfaced to the microcontroller. Whenever the user speaks a predefined command, the speech recognition module recognizes it and feeds it as input to the microcontroller. The microcontroller processes this information and acts on the relay and triac switches according to the voice command. The system also provides a password feature for security: the first thing the user needs to do is speak the voice-command password to activate the system. The microcontroller is programmed in Embedded C. This project can be used to reduce deaths due to electric shocks in industry and to switch electrical devices on and off using the speech recognition module.
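The control flow described above (password first, then predefined commands acting on the switches) can be simulated on a desktop machine. The sketch below is in Python purely for illustration; the project's firmware is written in Embedded C and is not shown in this report. It assumes the speech recognition module hands the controller each recognized phrase as a text string, and the class name `SwitchController`, the pass phrase, and the command phrases are all hypothetical.

```python
class SwitchController:
    """Host-side simulation of the password-gated switching logic."""

    def __init__(self, password, commands):
        self._password = password
        self._commands = commands  # recognized phrase -> (device, desired state)
        self.unlocked = False
        # Every controlled device starts switched off.
        self.devices = {device: False for device, _ in commands.values()}

    def handle(self, phrase):
        """Process one recognized phrase and report what happened."""
        phrase = phrase.strip().lower()
        if not self.unlocked:
            # Until the password is spoken, every other phrase is refused.
            if phrase == self._password:
                self.unlocked = True
                return "unlocked"
            return "locked"
        action = self._commands.get(phrase)
        if action is None:
            return "ignored"  # not one of the predefined commands
        device, state = action
        self.devices[device] = state
        return f"{device} {'on' if state else 'off'}"


# Hypothetical pass phrase and command set.
controller = SwitchController(
    password="open sesame",
    commands={
        "light on": ("relay", True),
        "light off": ("relay", False),
        "fan on": ("triac", True),
        "fan off": ("triac", False),
    },
)

for spoken in ["light on", "open sesame", "light on", "dance", "fan on", "light off"]:
    print(spoken, "->", controller.handle(spoken))
```

Keeping the command set as a small lookup table mirrors the restricted grammar that speaker-independent recognizers rely on: anything outside the table is simply ignored.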
