comp.speech WWW site

Welcome to the comp.speech Frequently Asked Questions WWW site. This site provides a range of information on speech technology, including speech synthesis, speech recognition, speech coding, and related material. The information is regularly posted to the comp.speech newsgroup as the "comp.speech FAQ" posting. This site is mirrored at several other WWW sites around the world (Australia, UK, Japan and USA) and the information is also available in a plain text format.

There are 250 comp.speech WWW pages and they include over 500 hyperlinks to speech technology web sites, ftp servers, mailing lists, and newsgroups.

Contents

SpeechLinks: Speech Technology Hyperlinks Pages

Table Of Contents
List Of Software/Hardware/Resources
Update Times
Availability
Odds 'n Ends

FAQ Section 1: General Information on Speech Technology
FAQ Section 2: Signal Processing for Speech
FAQ Section 3: Speech Coding and Compression Technology
FAQ Section 4: Natural Language Processing for Speech
FAQ Section 5: Speech Synthesis
FAQ Section 6: Speech Recognition

Comp.Speech FTP Site

The comp.speech ftp site is an excellent repository of speech technology information, software and resources. It contains the following (see Question 1.2 for more detail):

Admin

Minor changes each month. Thanks to all the companies and individuals who send in information.

Acknowledgements

Hundreds of people and companies have made contributions to the comp.speech FAQ over the last few years - too many to name individually. Special thanks go to Tony Robinson and Kevin Lenzo who have provided a wide range of information and assistance. Tony Robinson also maintains the comp.speech ftp site which is an excellent resource for all people working with speech technology. I am grateful to the people at Sydney University, Cambridge University, ATR ITL and CMU for supporting the FAQ on their WWW sites.

Disclaimer

The comp.speech FAQ and WWW pages are provided as is without any express or implied warranties. While every effort has been taken to ensure the accuracy of the information presented here, the author assumes no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein. The comp.speech FAQ and WWW pages should not be construed as representing the views or products of my employer, Sun Microsystems, Inc.

Copyright and Reproduction

Copyright (c) 1993-6 by Andrew Hunt, all rights reserved. The comp.speech WWW pages may not be distributed for financial gain and may not be included in any collections or compilations without express permission from the author.

You may make links to the documents, but you may not make copies without permission of the author. Note: hyperlinks to the comp.speech WWW pages are encouraged.

Maintainer

The FAQ posting and the Comp.Speech WWW Site are maintained on a volunteer basis by

Andrew Hunt
Speech Applications Group, Sun Microsystems Laboratories
Two Elizabeth Drive, Chelmsford, MA, 01824-4195, USA
Ph: (508) 442 2681
andrew.hunt@east.sun.com

Last Revision: 18:40 05-Sep-1997

comp.speech FAQ/WWW Availability

The comp.speech FAQ is available in two forms: a text version, which is posted to newsgroups and available by ftp, and an HTML version for the WWW. The text version was the original; since September 1994 both the WWW and text versions have been supported. The WWW version is now the master version.

WWW Availability

The WWW version of the comp.speech FAQ is mirrored at a number of web sites.

Text Version on the WWW

The three parts of the text version are available on this WWW site:

Part 1 - General information
Part 2 - Signal Processing, Coding and NLP
Part 3 - Speech Synthesis and Speech Recognition

Text by Anonymous ftp

The text version is available by anonymous ftp from:

Text by email

Finally, the text version can be obtained by sending email to mail-server@rtfm.mit.edu with the following line in the body of the message:

send usenet/news.answers/comp-speech-faq/*

Last Revision: 03:10 01-Apr-1996

http://mi.eng.cam.ac.uk/comp.speech/text/FAQ-part1.txt

Newsgroups: comp.speech,comp.answers,news.answers
Subject: comp.speech Frequently Asked Questions - part 1/3
From: andrew.hunt@east.sun.com (Andrew Hunt)
Reply-To: andrew.hunt@east.sun.com (Andrew Hunt)
Followup-To: comp.speech
Organization: Speech Applications Group, Sun Microsystems Laboratories
Summary: Information on Speech Technology
Approved: news-answers-request@MIT.Edu

Archive-name: comp-speech-faq/part1
Last-modified: 1997/09/06
URL: http://www.speech.su.oz.au/comp.speech/

COMP.SPEECH FAQ POSTING - PART 1/3

[Note: this document has been automatically extracted from a WWW site:

http://www.speech.su.oz.au/comp.speech/ This may introduce some formatting errors.]

Comp.Speech Frequently Asked Questions

The Frequently Asked Questions (FAQ) is a regular posting to comp.speech which attempts to answer some of the regular questions in the comp.speech newsgroup. It covers speech synthesis, speech recognition, speech coding and a range of related material. It contains lists of speech technology software and hardware, including commercial products and public domain and freeware software, plus over 500 links to speech technology sites and software.

The FAQ is not meant to discuss any topic exhaustively. It will hopefully provide readers with pointers on where to find useful information, especially material available on the Internet.

If you have not already read the Usenet introductory material posted to news.announce.newusers, please do. For help with FTP (file transfer protocol) look for a regular posting of anonymous FTP FAQ in comp.misc, comp.archives.admin or news.answers.

This FAQ is posted every 4 weeks to comp.speech, comp.answers and news.answers.

It is also available on the World Wide Web:

* Australia: http://www.speech.su.oz.au/comp.speech/

* Britain: http://svr-www.eng.cam.ac.uk/comp.speech/

* Japan: http://www.itl.atr.co.jp/comp.speech/

* USA: http://www.speech.cs.cmu.edu/comp.speech/

Or by anonymous ftp from the comp.speech archive site:

* ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/FAQ-complete

Or from the news.answers ftp site (and its mirrors):

* ftp://rtfm.mit.edu/pub/usenet/comp.speech/*

Or by sending email to mail-server@rtfm.mit.edu with the following line in the body of the message:

* send usenet/news.answers/comp-speech-faq/*

If you only have email access to the internet, then I suggest you obtain the Internet-by-email guide. Send email to mail-server@rtfm.mit.edu with the following line in the body of the message:

* send usenet/news.answers/internet-services/access-via-email
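The mail-server requests above are ordinary plain-text messages whose body is a single "send" line. As a minimal sketch (not part of the original posting), such a message could be composed with Python's standard email library. The sender address and the helper name build_request are hypothetical, and actually delivering the message would need an SMTP server, which is not shown here.

```python
from email.message import EmailMessage

MAIL_SERVER = "mail-server@rtfm.mit.edu"  # address given in the FAQ


def build_request(sender: str, path: str) -> EmailMessage:
    """Compose a mail-server request whose body is one 'send' line."""
    msg = EmailMessage()
    msg["From"] = sender  # hypothetical sender address
    msg["To"] = MAIL_SERVER
    msg.set_content(f"send {path}\n")
    return msg


# Request the three parts of the comp.speech FAQ.
request = build_request("you@example.org",
                        "usenet/news.answers/comp-speech-faq/*")
print(request.get_content())
```

The same helper covers the Internet-by-email guide by passing usenet/news.answers/internet-services/access-via-email as the path.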

Admin

Minor changes each month. Thanks to all the companies and individuals who send in information.

Acknowledgements

Hundreds of people and companies have made contributions to the comp.speech FAQ over the last few years - too many to name individually. Special thanks go to Tony Robinson and Kevin Lenzo who have provided a wide range of information and assistance. Tony Robinson also maintains the comp.speech ftp site which is an excellent resource for all people working with speech technology. I am grateful to the people at Sydney University, Cambridge University, ATR ITL and CMU for supporting the FAQ on their WWW sites.

Disclaimer

The comp.speech FAQ and WWW pages are provided as is without any express or implied warranties. While every effort has been taken to ensure the accuracy of the information presented here, the author assumes no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein. The comp.speech FAQ and WWW pages should not be construed as representing the views or products of my employer, Sun Microsystems, Inc.

Copyright and Reproduction

Copyright (c) 1994-6 by Andrew Hunt, all rights reserved. The comp.speech FAQ posting may not be distributed for financial gain.

The comp.speech FAQ posting may not be included in any collections or compilations without express permission from the author. The comp.speech FAQ posting may be posted to any USENET newsgroup, on-line service, or BBS provided that it is posted in its entirety with this copyright statement and that a current version is always maintained. [Note: hyperlinks to the comp.speech WWW pages are encouraged.]

Maintainer

The FAQ posting and the Comp.Speech WWW Site are maintained on a volunteer basis by

Andrew Hunt
Speech Applications Group, Sun Microsystems Laboratories
Two Elizabeth Drive, Chelmsford, MA, 01824-4195, USA
Ph: (508) 442 2681
Fax: (508) 250 5067
andrew.hunt@east.sun.com

comp.speech FAQ

Table of Contents

+ SpeechLinks: Speech Technology Hyperlinks Pages

* SpeechLinks: 500+ Speech Technology Links

* SpeechLinks: General Speech Technology Links

* SpeechLinks: Signal Processing for Speech

* SpeechLinks: Speech Coding

* SpeechLinks: Speech Synthesis

* SpeechLinks: Speech Recognition

+ List Of Software/Hardware

+ Update Times

+ Availability

+ Odds 'n Ends

+ FAQ Section 1: General Information on Speech Technology

* SpeechLinks: General

* Q1.1: What is comp.speech?

* Q1.2: comp.speech ftp site

* Q1.3: Common abbreviations and jargon

* Q1.4: Related newsgroups and mailing lists

* Q1.5: Associations, publications and conferences

* Q1.6: Handicap Aids

* Q1.7: Speech Databases

* Q1.8: Speech File Formats and Conversion

* Q1.9: Speech Laboratory Environments and Audio Editors

* Q1.10: Speech Research Sites

* Q1.11: Miscellaneous Software and Resources

+ FAQ Section 2: Signal Processing

* SpeechLinks: Signal Processing for Speech

* Q2.1: What sampling do I need for speech?

* Q2.2: Finding the pitch of a speech signal

* Q2.3: How do I find the start and end points of a speech signal?

* Q2.4: Where can I find FFT software?

* Q2.5: Signal processing in speech technology

* Q2.6: Speech sampling and signal processing hardware

* Q2.7: How do I convert to/from mu-law format?

* Q2.8: Signal Processing Software

+ FAQ Section 3: Speech Coding and Compression

* SpeechLinks: Speech Coding

* Q3.1: Speech compression techniques

* Q3.2: Information on speech coding and compression

* Q3.3: Speech Compression / Coding Software

+ FAQ Section 4: Natural Language Processing

* Q4.1: NLP References and Books

* Q4.2: NLP Software

+ FAQ Section 5: Speech Synthesis

* SpeechLinks: Speech Synthesis

* Q5.1: What is speech synthesis?

* Q5.2: How can speech synthesis be performed?

* Q5.3: References/Books on Synthesis

* Q5.4: Speech Synthesis on the WWW

* Q5.5: Speech Synthesis Software/Hardware

+ FAQ Section 6: Speech Recognition

* SpeechLinks: Speech Recognition

* Q6.1: What is speech recognition?

* Q6.2: How is speech recognition performed?

* Q6.3: How can I build a simple speech recogniser?

* Q6.4: References & books on speech recognition

* Q6.5: Speech Recognition Hardware/Software

* Q6.6: Speaker Recognition (Verification and Identification)

* Q6.7: Integrated Speech Products

List of Software/Hardware/Information

The comp.speech FAQ provides information on a range of software, hardware and resources.

Q1.6: Handicap Aids

* Man-Machine Interfacing

* SpeechViewer II

Q1.7: Speech Data

* Bavarian Archive for Speech Signals

* BUPT Spoken Digit Database (Chinese)

* Center for Spoken Language Understanding (CSLU)

* Examples of IPA Symbols

* Linguistic Data Consortium (LDC)

* NOISEX

* Oxford Acoustic Phonetic Database

* Phonemic Samples

* RELATOR project

* ShATR

* University of Victoria Phonetic Database

Q1.9: Speech Processing Environments

* CSRE: Computerized Speech Research Environment

* DADiSP from DSP Development Corporation

* Entropic Signal Processing System (ESPS) and Waves

* GoldWave

* Kay Elemetrics Computer Speech Lab

* Khoros

* Matlab plus Signal Processing Toolbox

* MacSpeech Lab II

* N!Power

* OGI Speech Tools

* Ptolemy

* Quadravox Speech Processing Products - Qbox

* Speech Filing System (SFS)

* Signalyze 3.0 from InfoSignal

* SoundScope

Q1.11: Miscellaneous Software and Resources

Speech Application Interfaces

* ASAPI: Advanced Speech API (AT&T)

* SAPI: Microsoft Windows Speech API

* SRAPI: Speech Recognition API

* TAPI: Microsoft Windows Telephony API

Network "Phone" Software

* CUSeeMe

* CyberPhone

* DigiPhone

* InterFACE from Hijinx

* FAQ: How can I use the Internet as a telephone?

* Nautilus: Secure Computer Telephony

* NEVOT (1.4v) from AT&T BL

* PGPfone

* Speak Freely

* Internet Phone from VocalTec

* WebPhone

* WebTalk

Audio Processing Software

* AF version AF3R1

* Voice E-Mail from Bonzi Software

* MicNotePad Recording Software for Macs

* MixViews

* Network Audio System Release 1.1

* NIST Software - SPHERE and SCORE

* Sound Processing Kit

* TCPplay

Human Audio Perception

* Auditory Modeller 1

* Auditory Modeller 2

* Auditory Toolbox for Matlab

* Human Audio Perception Document

Dictionaries and other Lexical Tools

* BEEP dictionary

* CMU dictionary

* CUVOALD dictionary (Oxford Dictionary)

* Comprehensive Word List

* EAT: Edinburgh Associative Thesaurus

* Homophone List

* Moby Lexical Resources

* MRC Psycholinguistic Database

* WordNet

* Dictionaries on the WWW

Phonetic Fonts and Phonetic Samples

* International Phonetic Alphabet

* WWW: Phonetic Fonts and Examples Online

* Summer Institute of Linguistics IPA Fonts

* Phonetic Fonts for TeX and LaTeX

* Yamada Language Center

Very Miscellaneous Software

* The vOICe

* The Learning Company's Language Training

* Wildfire - an Electronic Assistant

Q2.6: Audio Hardware

* Macintosh Audio Hardware

* PC Audio Hardware

* Unix Audio Hardware

Q2.8: Signal Processing Software

* SigLib from Numerix Ltd.

Q3.3: Compression Software and Hardware

* 32 kbps ADPCM

* Castleton Network Systems - G.729 Voice Coder

* CELP 3.2a & LPC-10

* 8 Kbit/s CELP on the TMS320C5x family of DSP chips

* CyberVoice

* Rockwell's DigiTalk

* File format conversion

* G.711/721/723 Compression

* G.728 LD-CELP vocoder

* G.728 Compression

* GSM 06.10 Compression

* Lernout & Hauspie Speech Coding (5 products)

* Lernout & Hauspie Speech Coding SDK

* MPEG Audio

* shorten - a lossless compressor for speech signals

* Sipro Lab Telecom Inc. Coding

* Sonarc: Digital Audio Compression

* StarAudio Compressor/Player

* TrueSpeech from DSP Group

* U.S.F.S. 1016 CELP vocoder for DSP56001

* ToolVox from Voxware

Q4.2: Natural Language Processing

* Natural Language Software Registry (NLSR) - NLP Tools

* Part of Speech Tagger

Q5.5: Speech Synthesis

_Apple Macintosh_

* BeSTspeech from Berkeley Speech Technologies, Inc., (BST)

* Infovox Product Range

* Macintosh Speech Output Applications

* Macintosh Speech Synthesis Manager

* MacYack Pro

* MBROLA: Free Speech Synthesis Project

* ProVoice Developer's Speech Toolkit from First Byte

* SENSYN speech synthesizer

* Sound Bytes Developer's Kit

_Windows (including 95, NT, 3.1)_

* AcuVoice

* AT&T Watson Speech Synthesis

* BeSTspeech from Berkeley Speech Technologies, Inc., (BST)

* Creative TextAssist and TextAssist API

* DECtalk: Text-to-Speech from Digital

* ETI-Eloquence

* HADIFIX

* Infovox Product Range

* IPOX: All Prosodic Speech Synthesis Architecture

* Lernout and Hauspie Text-To-Speech Windows SDK

* Listen2 Text Reader

* MBROLA: Free Speech Synthesis Project

* Monologue for Windows from First Byte

* PAM - A Text-To-Speech Application

* ProVerbe Speech Engine from ELAN Informatique

* ProVoice Developer's Speech Toolkit from First Byte

* SENSYN speech synthesizer

* Sound Bytes Developer's Kit

* Tinytalk

* TruVoice from Centigram

* WinSpeech

* ZMD Speech Synthesis

_DOS_

* CSRE: Computerized Speech Research Environment

* Infovox Product Range

* MBROLA: Free Speech Synthesis Project

* ProVoice Developer's Speech Toolkit from First Byte

* SENSYN speech synthesizer

* spchsyn.exe

* Tinytalk

* ZMD Speech Synthesis

_OS/2_

* ProVerbe Speech Engine from ELAN Informatique

* ProVoice Developer's Speech Toolkit from First Byte

* Sound Bytes Developer's Kit

_Unix_

* AcuVoice

* AsTeR

* BeSTspeech from Berkeley Speech Technologies, Inc., (BST)

* DECtalk: Text-to-Speech from Digital

* ETI-Eloquence

* Emacspeak - A Speech Output Subsystem For Emacs

* Festival Speech Synthesis System

* JSRU

* Klatt-style synthesiser

* KPE80 - A Klatt Synthesiser and Parameter Editor

* "learph": Trainable text-to-phoneme software by Antonio Lucca

* MBROLA: Free Speech Synthesis Project

* Orator from Bellcore

* ProVerbe Speech Engine from ELAN Informatique

* rsynth

* SENSYN speech synthesizer

* SGI Developers Toolbox Synthesiser

* Speak

* TrueTalk

* TruVoice from Centigram

_Integrated Circuits and Dedicated Hardware_

* Eurovocs

* Infovox Product Range

* ProVerbe Speech Engine from ELAN Informatique

* RC Systems V8600/V8601 Text to Speech synthesizers

_Other Platforms_

* BeSTspeech from Berkeley Speech Technologies, Inc., (BST)

* TheBigMouth (NeXT)

* MBROLA: Free Speech Synthesis Project

* Narrator Translator Library (Amiga)

* Narrator (Amiga)

* TextToSpeech Kit (NeXT)

* Orator from Bellcore

* SENSYN speech synthesizer

* WreadFiles: File reader for Commodore Amiga

_Unknown_

* Lernout and Hauspie Text-To-Speech (3 products)

* Lucent Technologies Bell Labs Text-to-Speech system

* SIMTEL

* Text to Phoneme Program 1

* Text to phoneme program 2

* Text to phoneme program 3

Q6.5: Speech Recognition

_Apple Macintosh_

* Digital Dreams Speech Recognition Plug-Ins

* Dragon Dictation Products

* Macintosh Speech Recognition Manager

* PowerSecretary

_Windows (including 95, NT, 3.1)_

* AT&T Watson Speech Recognition

* Cambridge Voice for Windows

* CustomVoice and CustomTelephone: A&G Graphics Interface Inc.

* DragonDictate for Windows

* Dragon Dictation Products

* Dragon Developer Tools

* Ficomp Interpreter 6000

* IBM VoiceType Dictation and Control

* IN CUBE

* Kurzweil Speech Recognition (2 products)

* Lernout & Hauspie ASR SDK

* Listen for Windows 2.0 from Verbex Voice Systems

* Microsoft Speech Recognition

* NCC Dictate

* Phonetic Engine 500 (PE500) from Speech Systems, Inc.

* Philips Speech Recognition (2 products)

* ProNotes Voice Tools

* PureSpeech

* smARTspeak from Advanced Recognition Technologies, Inc.

* Visual Voice from Stylus Innovation

* VoiceAssist for Windows from Creative Labs, Inc.

* VoiceServer for Windows

* Whisper

* WildCard Speech Products

_DOS_

* DATAVOX - French

* Dragon Developer Tools

* Ficomp Interpreter 6000

* Jialong He's Speech Recognition Research Tool

* smARTspeak from Advanced Recognition Technologies, Inc.

* Votan VPC2100 Voice Card and VSP 1010 Speech Processor

_OS/2_

* IBM VoiceType Dictation and Control

_Unix_

* AbbotDemo

* BBN Hark Telephony Recognizer

* EARS: Single Word Recognition Package

* Ficomp Interpreter 6000

* Hidden Markov Model Toolkit (HTK) from Entropic

* IN CUBE

* Jialong He's Speech Recognition Research Tool

* Lotec Speech Recognition Package

* Myers' Hidden Markov Model software

* NICO Artificial Neural Network Toolkit

* Nuance Speech Recognition System

* PureSpeech

* recnet

_Integrated Circuits and Dedicated Hardware_

* HM2007 - Speech Recognition Chip

* OKI VRP6679 - Speech Recognition Chip

* Sensory Inc. Integrated Circuits

* Speech Commander - Verbex Voice Systems

* Voice Control Systems Recognition

* VCS 2030 & 2060 Voice Dialer

_Other Platforms_

* Simon Says (NeXT)

* Voice Command Line Interface (Amiga)

* Visus SpeechKit

_Unknown_

* Berkeley Restaurant Project (BeRP)

* Lernout & Hauspie ASR (3 products)

* Voice-Trek 2.0

* Voicetek Corp.

* Voice Processing Corporation Speech Recognition Product Line

Q6.6: Speaker Verification and Identification

* ImagineNation: Voice Activated UnLock Technology

* Jialong He's Speaker Recognition (Identification) Tool

* Keyware Biometric Security Products

* SpeakerKey Voice Verifier from ITT

* SpeakEZ Voice Print Speaker Verification

* Voice Control Systems: Speaker Verification Technology

Q6.7: Integrated Speech Products

* SpeechWorks from Applied Language Technologies, Inc.

* Nortel Speech Technology Products

General Speech Technology

comp.speech FAQ Section 1

* SpeechLinks: General

* Q1.1: What is comp.speech?

* Q1.2: comp.speech ftp site

* Q1.3: Common abbreviations and jargon

* Q1.4: Related newsgroups and mailing lists

* Q1.5: Associations, publications and conferences

* Q1.6: Handicap Aids

* Q1.7: Speech Databases

* Q1.8: Speech File Formats and Conversion

* Q1.9: Speech Laboratory Environments and Audio Editors

* Q1.10: Speech Research Sites

* Q1.11: Miscellaneous Software and Resources

Q1.1: What is comp.speech?

Comp.speech is an unmoderated newsgroup for discussion of speech technology and speech science. It covers a wide range of issues from the application of speech technology, to research, to products and lots more. By its nature, speech technology is an inter-disciplinary field and the newsgroup reflects this. However, computer application is the basic theme of the group.

Note: If you don't know what a newsgroup is, then talk to your local system administrator about how to get access. A useful newsgroup for beginners is news.announce.newusers. You might also find the following documents useful.

ftp://rtfm.mit.edu/pub/usenet/news.announce.newusers/What_is_Usenet?

ftp://rtfm.mit.edu/pub/usenet/news.announce.newusers/Answers_to_Frequently_Asked_Questions_about_Usenet

ftp://rtfm.mit.edu/pub/usenet/news.announce.newusers/Rules_for_posting_to_Usenet

ftp://rtfm.mit.edu/pub/usenet/news.announce.newusers/FAQs_about_FAQs

The following is a list of some of the topics covered by comp.speech.

* Speech Recognition - discussion of methodologies, training, techniques, results and applications. This should cover the application of techniques including HMMs, neural-nets and so on to the field.

* Speech Synthesis - discussion concerning theoretical and practical issues associated with the design of speech synthesis systems.

* Speech Coding and Compression - both research and application matters.

* Phonetic/Linguistic Issues - coverage of linguistic and phonetic issues which are relevant to speech technology applications. Could cover parsing, natural language processing, phonology and prosodic work.

* Speech System Design - issues relating to the application of speech technology to real-world problems. Includes the design of user interfaces, the building of real-time systems and so on.

* Other matters - relevant conferences, jobs, books, software, hardware, and products.

Q1.2: comp.speech ftp site

Tony Robinson maintains the comp.speech ftp site. The ftp site is a comprehensive repository of software and information related to speech technology. The site is

* ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/
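As a rough sketch of how an anonymous ftp site like this can be browsed from a script, the following uses Python's standard ftplib and urllib.parse modules. The helper names split_ftp_url and list_directory are illustrative only, and the server named in this FAQ may no longer be reachable, so list_directory should be treated as untested against the real site.

```python
from ftplib import FTP
from urllib.parse import urlparse


def split_ftp_url(url: str) -> tuple[str, str]:
    """Split an ftp:// URL into (host, directory path)."""
    parts = urlparse(url)
    if parts.scheme != "ftp" or not parts.hostname:
        raise ValueError(f"not an ftp URL: {url!r}")
    return parts.hostname, parts.path or "/"


def list_directory(url: str, timeout: float = 30.0) -> list[str]:
    """Log in anonymously and return the names found in the directory."""
    host, path = split_ftp_url(url)
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()      # no arguments means an anonymous login
        ftp.cwd(path)
        return ftp.nlst()


host, path = split_ftp_url("ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/")
print(host, path)
```

Only split_ftp_url runs without a network connection; list_directory needs a live server.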

Comp.speech Archives

The comp.speech ftp site provides full archives of the comp.speech newsgroup dating back to the creation of the group in 1991. The postings are stored in the order in which they arrive. Each batch of 1000 articles is stored as a gzip'ed tar file. Matching files listing the subjects are also provided.

* ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/archive/
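Once one of the gzip'ed tar batches has been downloaded, it can be inspected without unpacking it to disk. The sketch below uses Python's standard tarfile module. The archive file name and the member names are assumptions, since this FAQ does not say how the batches or the articles inside them are named.

```python
import tarfile


def list_articles(archive_path: str) -> list[str]:
    """Return the names of the regular files in a gzip'ed tar archive."""
    with tarfile.open(archive_path, mode="r:gz") as archive:
        return [m.name for m in archive.getmembers() if m.isfile()]


def read_article(archive_path: str, name: str) -> str:
    """Return one article from the archive as text."""
    with tarfile.open(archive_path, mode="r:gz") as archive:
        handle = archive.extractfile(name)
        if handle is None:
            raise ValueError(f"{name!r} is not a regular file")
        # Old Usenet articles are not guaranteed to be UTF-8;
        # latin-1 decodes any byte sequence without raising.
        return handle.read().decode("latin-1")
```

For example, list_articles("batch.tar.gz") would return the article names in a downloaded batch, where batch.tar.gz is a hypothetical local file name.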

Software and Other Resources

The comp.speech ftp site includes a wide range of useful software and resources. Tony has arranged it into a series of sub-directories:

/analysis : Speech analysis software
    FFT code, a pitch tracker, RASTA code, and IEEE DSP code.

/auditory : Auditory model software
    AIM, Auditory Toolbox and Lutear.

/coding : Speech coding software
    ADPCM, CELP 3.2a, G711, G721, G723, GSM, LDCELP, LPC10, Shorten.

/data : Repository for (small) speech-related databases
    BEEP, CMUDict, Homophone list, hVd database, Peterson Barney database.

/dictionaries : Phonetic dictionaries
    BEEP, CMUDict, CUVOALD, Homophone list, MRC database.

/info : Key postings to comp.speech archives by subject
    Lots of interesting info!

/recognition : Speech recognition software
    AbbotDemo, Ears, Lotec, recnet, sound blaster recognition, whistle.

/simtel_sound : Mirror of the simtel/msdos/sound directory
    Range of useful software.

/simtel_voice : Mirror of the simtel/msdos/voice directory
    Another range of useful software.

/synthesis : Speech synthesis software
    Klatt synthesis software, Klatt parameter editor and rsynth.

/tools : Miscellaneous tools
    Part-of-speech tagger, OGI speech tools, sox audio file format conversion, SPHERE software and more.

Q1.3: Common abbreviations and jargon.

* ANN - Artificial Neural Network.

* ASR - Automatic Speech Recognition.

* ASSP - Acoustics, Speech, and Signal Processing

* AVIOS - American Voice I/O Society

* CELP - Code-book Excited Linear Prediction.

* COLING - COmputational LINGuistics

* DTW - Dynamic Time Warping.

* FAQ - Frequently Asked Questions.

* HMM - Hidden Markov Model.

* IEEE - Institute of Electrical and Electronics Engineers

* JASA - Journal of the Acoustical Society of America

* LPC - Linear Predictive Coding.

* LVQ - Learning Vector Quantisation.

* MFCC - Mel Frequency Cepstral Coefficients

* NLP - Natural Language Processing.

* NN - Neural Network.

* TIMIT - A speech corpus with phoneme labels - see Q1.7

* TTS - Text-To-Speech (i.e. speech synthesis).

* VQ - Vector Quantisation.

Q1.4: Related newsgroups and mailing lists.

Newsgroups

comp.ai - Artificial Intelligence newsgroup. Postings on general AI issues, language processing and AI techniques. The comp.ai FAQ covers NLP, NN and other AI information.

comp.ai.nat-lang - Natural Language Processing group. Postings regarding Natural Language Processing. Set up to cover a broad range of related issues and different viewpoints. A comp.ai.nat-lang FAQ posting is available.

comp.ai.nlang-know-rep - Natural Language Knowledge Representation. Moderated group.

comp.ai.neural-nets - discussion of Neural Networks and related issues. There are often postings on speech related matters - phonetic recognition, connectionist grammars and so on. A comp.ai.neural-nets FAQ posting is available.

comp.compression - occasional articles on compression of speech. The comp.compression FAQ has some info on audio compression standards.

comp.dcom.telecom - Telecommunications newsgroup. Has occasional articles on voice products.

comp.dsp - discussion of signal processing - hardware and algorithms and more. Has a good FAQ posting which is also available on the WWW and by ftp (addresses below). Has a regular posting of a comprehensive list of Audio File Formats.

+ http://www.bdti.com/faq/dsp_faq.htm

+ ftp://rtfm.mit.edu/pub/usenet/comp.dsp/

comp.multimedia - Multi-Media discussion group. Has occasional articles on voice I/O.

sci.lang - Language. Discussion about phonetics, phonology, grammar, etymology and lots more. A sci.lang FAQ is available.

alt.sci.physics.acoustics - Some discussion of speech production & perception.

alt.binaries.sounds.* - posting and discussion of sound samples.

Mailing Lists

Voice-Users Mailing List

For discussion of any aspect of using voice recognition systems.

+ Using such systems safely, without muscle or voice strain

+ Techniques for improving recognition accuracy

+ How to set up the physical voice workstation

+ Tips for effective use of voice interfaces

+ Configuration of specific systems, troubleshooting, etc

To subscribe, fill out the web-based subscription form. Posts to the list should go to:

voice-users@voicerecognition.com

Colibri

News about language, speech, logic and information.
Email: colibri@let.ruu.nl
WWW: http://colibri.let.ruu.nl/

ECTL - Electronic Communal Temporal Lobe

Founder & Moderator: David Leip. Moderated mailing list for researchers with interests in computer speech interfaces. This list serves a broad community including persons from signal processing, AI, linguistics and human factors. To subscribe, send your name, institute, department, daytime phone and email address to:

+ ectl-request@snowhite.cis.uoguelph.ca

The ECTL archive site is ftp://snowhite.cis.uoguelph.ca/pub/ectl

Prosody Mailing List

Unmoderated mailing list for discussion of prosody. The aim is to facilitate the spread of information relating to the research of prosody by creating a network of researchers in the field. If you want to participate, send the following one-line message to

+ listserv@msu.edu

+ subscribe prosody Your Name

foNETiks A moderated monthly newsletter distributed by e-mail. It carries job advertisements, notices of conferences, and other news of general interest to phoneticians, speech scientists and others. The editors are Linda Shockey and Gerry Docherty. To subscribe send the following 1 line message to

+ mailbase@mailbase.ac.uk

+ join fonetiks your_first_name your_second_name

Digital Mobile Radio Covers many areas, including some speech topics such as speech coding and speech compression. Mail Peter Decker dec@dfv.rwth-aachen.de to subscribe.

Q1.5: Associations, Journals and Conferences

[Note: Also see the list provided in Shikano's WWW site on Speech and Acoustics:

http://www.aist-nara.ac.jp/IS/Shikano-lab/database/internet-resource/e-www-site.html.]

Associations

Institute of Electrical and Electronics Engineers (IEEE)

* Publications: include IEEE Transactions on Signal Processing, IEEE Transactions on Speech and Audio (from Jan 93), IEEE Transactions on Acoustics, Speech, and Signal Processing (now obsolete), IEEE Signal Processing Magazine. (More information on the WWW:

http://www.ieee.org/sp/index.html).

* Speech-Related Conferences: ICASSP - Intl. Conf. Acoustics, Speech, and Signal Processing. IEEE also runs speech technology related workshops and many other conferences. (Does anyone have a list?)

* Contact: IEEE Service Center

445 Hoes Lane, PO Box 1331, Piscataway, NJ 08855, USA

Phone: 1-800-678-IEEE or (201) 981-0060

* WWW: IEEE: http://www.ieee.org/ IEEE Signal Processing Society http://www.ieee.org/sp/index.html

The Acoustical Society of America (ASA)

* Publications: Journal of the Acoustical Society of America (JASA)

* Conferences: ASA holds four meetings a year. Information is available on the WWW: http://asa.aip.org/meetings.html.

* Contact: ASA Office Manager,

500 Sunnyside Blvd, Woodbury, NY 11797-2999, USA

Ph: (516) 576-2360, FAX (516) 576-2377 Email: asa@aip.org

* WWW: http://asa.aip.org/

European Speech Communication Association (ESCA)

* Publications: Speech Communications

* Conferences: EUROSPEECH is held every two years. E'97 will take place in Patras, Greece, in September 1997. ESCA organises regular speech-related workshops: see their WWW pages for details.

* Contact: Secretariat ESCA ICP, Universite Stendhal, BP 25X, F38400 Grenoble Cedex 9, France Ph: (+33).76.82.43.36 Fax (+33).76.82.43.35

Email: esca@icp.grenet.fr

* WWW: http://ophale.icp.grenet.fr/esca/esca.html

Association for Computational Linguistics (ACL)

* Publications: Computational Linguistics

* SIGPHON: Special Interest Group for Computational Phonology. The home page is provided by the Centre for Cognitive Science at the University of Edinburgh. A special issue on Computational Phonology appeared in Vol 20, Num 3 of Computational Linguistics and included an Introduction to Computational Phonology by Steven Bird

* Conferences: COLING is held every two years. ACL also organises a range of workshops. See the WWW pages for details.

* Contact: P.O. Box 6090 Somerset, NJ 08875, USA Ph: (908) 873 3893 Email: acl@bellcore.com

* WWW: http://www.cs.columbia.edu:80/~acl/

American Voice Input/Output Society (AVIOS)

* Description: AVIOS is a not-for-profit organization, dedicated to disseminating information about applications using speech technology. It aims "to bridge the gap between emerging voice technology and its application, by providing an interactive forum for the technologists, students, system developers, business managers, and users actively involved in or with an interest in the field of voice processing."

* Publications: International Journal of Speech Technology (with Kluwer Academic Publishers) The Journal of the American Voice Input/Output Society was published from 1984 to 1994.

* Conferences: The International Voice Input/Output Applications Conference is held annually (since 1982): Sept 10-12, San Jose, CA.

* Contact: 4010 Moorpark Avenue, Suite 105M, San Jose, CA 95117, USA

Ph: +1-408-248-1353, Fax: +1-408-248-0251 Email: avios@pilot.net WWW: http://www.avios.com/

European Language Resources Association

* Description: The European Language Resources Association was established in Luxembourg in February, 1995, with the goal of creating an organization to promote the creation, verification, and distribution of language resources in Europe. A non-profit organization, ELRA aims to serve as a central focal point for information related to language resources in Europe. It will help users and developers of European language resources, as well as government agencies and other interested parties, exploit language resources for a wide variety of uses. It will also oversee the distribution of language resources via CD-ROM and other means and promote standards for such resources.

* More info: see the ELRA Home page for membership information, lists of resources etc.

* Contact: K. Choukri, Executive Director ELRA 87, Avenue d'Italie, 75013 Paris, FRANCE Ph: +33 1 45 86 53 00, Fax: +33 1 45 86 44 88 Email: elra@calvanet.calvacom.fr WWW: http://www.icp.grenet.fr/ELRA/home.html

ASSTA: Australian Speech Science and Technology Association

* Conference: SST, the Australian conference on Speech Science and Technology, is held every two years. SST-96 will be held in Adelaide.

* WWW: Home Page: http://cslab.anu.edu.au/~bruce/assta/ List of members: http://ciips.ee.uwa.edu.au/~roberto/assta-users/

SALT: UK Speech and Language Technology Club

* WWW home page: http://salt.essex.ac.uk/salt/

Linguistic Associations

* A comprehensive list of linguistic associations and linguistic WWW links is available at http://engserve.tamu.edu/files/linguistics/linguist/associations.html

Industry Publications

ASR News

* Description: Monthly newsletter covering developments in the speech recognition and speech synthesis marketplace.

* Note: Voice Information Associates also publish "Automatic Speech Recognition: A study of the world-wide market" (revised 1995) and "Text-to-Speech Technology Markets: 1995-2000" (revised 1995)

* Contact: Voice Information Associates, Inc. 14 Glen Road South, P.O. Box 625, Lexington, MA 02173, USA Ph: +1-617-861-6680, Fax: +1-617-863-8790 Email: asrnews@tiac.net WWW: http://www.tiac.net/users/asrnews/

Voice News

* Description: Monthly newsletter reporting on voice mail, voice response, speech recognition, speech synthesis, digital voice record/playback and related technologies, markets and company activities. Review copy available on request.

* Contact: Stoneridge Technical Services P.O. Box 1891, Rockville, MD, 20849, USA Ph: +1-301-424-0114, Fax: +1-301-424-8971 Email: info@stoneridgetech.com WWW: http://www.stoneridgetech.com/

Speech Recognition Update

* Description: Monthly news and analysis of speech recognition markets, applications and technology.

A free sample copy is available by contacting TMA Associates.

* Also: TMA Associates also publishes market studies, including The Advanced Speech Technology Market: Recognition, Synthesis and Compression (1996) and Voice ID (1996).

* Contact: TMA Associates 6021 Wish Avenue, Encino, CA 91316, USA Ph: +1-818-708-0962, Fax: +1-818-345-2980 Email: 72162.3172@compuserve.com WWW: http://www.tmaa.com/

Voice Technology and Services News

* Description: Follows integrated PC LAN messaging (voice, fax, mail, video) and speech technology. It follows the merging computer and telephone technologies, provides insights into business and marketing opportunities and offers executives timely information on industry trend analysis.

* Contact: Phillips Business Information 1201 Seven Locks Rd., Potomac, Maryland, 20854, USA Ph: 1-800-777-5006 OR +1-301-340-1520 Subscription FAX: +1-301-309-3847 Editorial FAX: +1-424-4297

Teleconnect

* Contact: +1-212-691-8215

Computer Telephony

* Contact: +1-212-691-8215

Voice Processing Magazine

* Contact: 1-800-854-3112

Speech Technology

* Description: No longer published

Technical and Research Publications

Computer Speech and Language

* Price: $US170 (Institutions), $US75 (Individuals), 4 issues per year.

* Publisher: Academic Press Limited 24-28 Oval Road, London NW1, England WWW: http://www.apnet.com/

Speech Communication

* Contact: ESCA (see above)

* Publisher: Elsevier Science B.V. P.O. Box 521, 1000 AM Amsterdam, The Netherlands. WWW: http://www.elsevier.com/

IEEE Transactions on Speech and Audio Processing,

IEEE Signal Processing Magazine,

IEEE Transactions on Acoustics, Speech, and Signal Processing: OBSOLETE

* Contact: IEEE (see above)

Free Speech Journal

* Description: A Web Journal dedicated to the state of the art in human language technology. Past volumes, editorial and submission information, and so on are

* Contact: Editor-In-Chief: Ron Cole: cole@cse.ogi.edu WWW: http://www.cse.ogi.edu/CSLU/fsj/html/masthead.html

Linguistics Abstracts Online

* Description: online access to all abstracts published in Linguistics Abstracts since 1985, plus all current material as it becomes available. Over 250 publications are indexed. Free trial available. http://www.blackwellpublishers.co.uk/labs/

Computational Linguistics

* Contact: Published by Computational Linguistics Assoc. (see above)

Journal of the Acoustical Society of America (JASA)

* Contact: Published by Acoustical Society of America (see above)

International Journal of Speech Technology (was the AVIOS Journal)

* Description: Focuses on speech technology and its applications, and promotes research and description of all aspects of speech input and output: applications, base technology, theory, approach, experiment, and testing.

* Publisher: Kluwer Academic Publishers 101 Philip Drive, Norwell, MA 02061, USA Ph: +1-617-871-6300, Fax: +1-617-871-0449

* Submissions to: International Journal of Speech Technology Journals Editorial Office, Ms. Kelly Riddle Kluwer Academic Publishers (Address, phone, fax as above) Email: krkluwer@world.std.com

Conferences

ICSLP: Intl. Conference on Spoken Language Processing. Next: 30 Nov to 4 Dec, 1998, Sydney, Australia. Held in even years.

ICASSP - Intl. Conf. Acoustics, Speech, and Signal Processing

Eurospeech

Computational Linguistics (COLING), held every two years

International Voice Input/Output Applications Conference

SST: Australian Speech Science and Technology Conference

Also see the following lists on the WWW:

Shikano's WWW site on Speech and Acoustics http://www.aist-nara.ac.jp/IS/Shikano-lab/database/internet-resource/e-www-site.html

Institute of Phonetic Sciences WWW list

http://fonsg3.let.uva.nl/Other_pages.html#Meetings

Q1.6: Handicap Aids

The following are products and companies which support users who can benefit from the use of speech technology in a user interface. Please feel free to submit information on relevant products, names of companies and links to useful information on the Internet (especially WWW sites). [Of course, most of the products listed in Q5.5 and Q6.5 are useful.]

* Man-Machine Interfacing

* SpeechViewer II

Man-Machine Interfacing

* Description: Offers a service designed for people with physical challenges. Can successfully implement a computerized voice controlled system adapted to unique needs. They have developed a free-standing microphone and signal processing system to compensate for speech/articulation distortions, and background noise produced by electronic devices such as wheelchairs and respirators.

* Contact: Man-Machine Interfacing P.O. Box 5371, Evanston, IL 60204 Ph: 1-888-425-2001, Fax: (847) 328-7975 Email: jwhite@mcs.com WWW: http://www.speechrec.com/

SpeechViewer II

* Platform: IBM Machines from Mod 25 on.

* Description: SpeechViewer II is a speech therapy tool. It provides graphical feedback on various speech features so that speech-impaired individuals can improve their speech. It works with an audio bandwidth of 7.3 kHz and thus allows the therapist to work with sustained vowels and fricatives. A wide range of graphics is used to provide adequate variability to hold client interest. An extensive set of statistics is gathered, which allows a therapist to do research or keep therapy records. The speech therapy modules are:

+ Awareness - Sound, Loudness, Pitch, Voicing Onset, Voicing

+ Skill Building - Pitch, Voicing, Phonology

+ Patterning - Pitch & Loudness - Waveform & Spectrogram, Spectra

+ Clinical Management - Profiles, Models, Client Data

A multilingual option is available which provides support for 12 languages: Danish, Dutch, Finnish, French, German, Icelandic, Italian, Norwegian, Portuguese, Spanish, Swedish, and UK English.

With the Multilingual Option, clinicians can use SpeechViewer II as a training tool for English as a second language and for foreign language training.

* Hardware: Requires an IBM M-ACPA (Multimedia-Audio Capture Playback Adapter). It has a TI TMS320C25 DSP chip. The input sampling rate is 44.1 kHz stereo, 88.2 kHz mono. This is a 16 bit card. It has the following jacks: mic in, stereo line in, stereo line out, speaker out. Note: This card is being replaced by Mwave technology. For more info on Mwave contact Texas Instruments.

* Price:

+ The software is $2130 list, $1491 educational, part number 92F2066.

+ The M-ACPA is $370 list, $222 educational, part number 92F3378.

+ The MicroChannel adapter part number is 92F3379 (same price).

* Contact: IBM Special Needs Information 1000 N. W. 51st Street, Internal Zip 5432, Boca Raton, Florida 33431, USA Ph: 1-800-426-4832, TDD: 1-800-426-4833, Fax: 1-407-982-6059 Email: IBM_SPEC_NEEDS_INFO@vnet.ibm.com

WWW: http://www.austin.ibm.com/pspinfo/snsspv2.html

Q1.7: Speech databases

A wide range of speech databases have been collected. These databases are primarily for the development of speech synthesis/recognition and for linguistic research.

Some databases are free but most are not. The databases normally require lots of storage space (100's of MBytes is not unusual). Do not expect to be able to ftp large amounts of speech data.
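As a rough illustration of why database sizes reach hundreds of MBytes, the following Python sketch estimates the storage needed for uncompressed PCM speech. The 16 kHz sampling rate, 16-bit samples and one-hour duration are assumed values chosen for illustration; they do not describe any particular database listed here, and the helper name pcm_bytes is hypothetical.

```python
def pcm_bytes(sample_rate_hz, bits_per_sample, seconds, channels=1):
    """Return the size in bytes of uncompressed PCM audio (no file header)."""
    # Samples are normally stored in whole bytes, so round the width up.
    bytes_per_sample = (bits_per_sample + 7) // 8
    return sample_rate_hz * bytes_per_sample * channels * seconds


# Assumed example: one hour of 16 kHz, 16-bit, mono speech.
size = pcm_bytes(16000, 16, 3600)
print(size / 1e6, "MBytes")
```

Under these assumptions a single hour of speech already exceeds 100 MBytes, so a corpus with many speakers grows quickly.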

In addition to the descriptions of speech databases and speech database providers below, information can be obtained from

LDC: Linguistic Data Consortium Provides a very wide range of speech and text data to research and commercial users: see below.

COCOSDA Home Page: http://www.itl.atr.co.jp/cocosda/ The International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques for Speech Input/Output.

Shikano's WWW site on Speech and Acoustics http://www.aist-nara.ac.jp/IS/Shikano-lab/database/internet-resource/e-www-site.html

RELATOR Project European resource initiative: see below.

The following speech data resources are described in the FAQ.

* Bavarian Archive for Speech Signals

* BUPT Spoken Digit Database (Chinese)

* Center for Spoken Language Understanding (CSLU)

* Examples of IPA Symbols

* Linguistic Data Consortium (LDC)

* NOISEX

* Oxford Acoustic Phonetic Database

* Phonemic Samples

* RELATOR project

* ShATR

* University of Victoria Phonetic Database

Bavarian Archive for Speech Signals

* Description: The Bavarian Archive for Speech Signals (BAS) was founded in January 1995 as an initiative of the Institute of Phonetics at the University of Munich, Germany. The BAS will develop, validate, administrate and disseminate corpora of spoken German to the speech community as well as to the speech engineering industry. Presently the following German speech corpora are available on ISO 9660 CDROM:

Siemens 1000 - SI1000

5 CDROMs, newspaper corpus, read speech, 10 speakers x 1000 utterances

Siemens 100 - SI100

7 CDROMs, read speech, 101 speakers x 100 sentences

PhonDat 1 - PD1

6 CDROMs, new edition in preparation, read speech, 201 speakers x 450+ sentences

PhonDat 2 - PD2

1 CDROM, read speech, 2nd edition, 16 speakers x 200 sentences, various labelled information

Verbmobil Spontaneous speech recorded in a dialog task (appointment scheduling). More information on the VERBMOBIL project:

http://www.dfki.uni-sb.de/verbmobil/

Corpora in Preparation

PhonDat I - PD1: 2nd extended edition (Jul 1995)

Strange Corpora - SC Reference Corpora that reflect certain well known problems in speech processing, like accents, repair, breaks, hesitations, repetitions, extreme F0, background noise, pathological speech, speaker adaptation. The first SC corpus (SC1 Accents) will be edited in Jul 1995.

BAS Edition of Verbmobil Corpora - VM: 2nd extended edition

Articulatory data - AD: EMA data of speakers of SI1000 corpus

ERBA: 10000 utterances from a train inquiry task

* Misc: BAS is currently developing tools for the automatic annotation and segmentation of very large speech corpora. This includes the automatic detection of variants of pronunciation, a statistically based alignment and a rule-based refinement of the outcome. The BAS seeks to cooperate with public institutions as well as with industrial partners to further develop new German speech databases. BAS can be a platform to re-distribute existing German speech.

* Contact and More Information: The BAS is located at the University of Munich, Germany. BAS c/o Institut fuer Phonetik Schellingstr. 3/II 80799 Muenchen, Germany Ph: +49-89-21802758, Fax: +49-89-2800362 Email: bas@sun1.phonetik.uni-muenchen.de WWW: http://www.phonetik.uni-muenchen.de/BASSeng.html

BUPT Spoken Digit Database (Chinese)

* Vocabulary : {0, 1/yi/, 2, 3, 4, 5, 6, 7, 8, 9, 1/yao/, /dui/, /cuo/ }, 13 words in total.

* Size: 1202 speakers in total, 789 Males and 413 Females. Each speaker utters each word 2 times. Total of 31252 utterances.

* Format: 8000 Hz, 14-bit sampling. One utterance per file.

* Contact:

GLuck Co. 195 Berlioz 1C, Nun's Island Verdun H3E 1C1, Canada e-mail: weigang@zaphod.math.mcgill.ca
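The quoted BUPT totals are consistent with one another, as this short Python check shows. It assumes, as stated above, that every speaker recorded each of the 13 words exactly twice.

```python
# Figures taken from the BUPT entry above.
males, females = 789, 413
words = 13
repetitions = 2

speakers = males + females
total = speakers * words * repetitions
print(speakers, total)
```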

Center for Spoken Language Understanding (CSLU)

* The ISOLET speech database of spoken letters of the English alphabet. The speech is high quality (16 kHz with a noise cancelling microphone). 150 speakers x 26 letters of the English alphabet twice in random order. The ISOLET data base can be purchased for $100 by sending an email request to vincew@cse.ogi.edu. (This covers handling, shipping and medium costs). The data base comes with a technical report describing the data.

* CSLU has a telephone speech corpus of 1000 English alphabets. Callers recite the alphabet with brief pauses between letters. This database is available to not-for-profit institutions for $100. The data base is described in the proceedings of the International Conference on Spoken Language Processing. Contact vincew@cse.ogi.edu if interested.

* CSLU has released for universities its Continuous English Speech Corpus. The corpus contains recorded speech from 690 different speakers, with label files at various levels - including word level and phonetic labels. The data were collected as part of the OGI Multi-language telephone corpus. CSLU provides speech corpora to all universities without charge. To order a corpus, print the license agreement/order form, complete it, and fax it to the CSLU. A description of the corpora and an order form are available:

http://www.cse.ogi.edu/CSLU/

ftp://speech.cse.ogi.edu/pub/releases

* Contact: Mike Noel: noel@cse.ogi.edu

Examples of IPA Symbols

UCLA Sounds of the World's Languages

* Description: The UCLA Sounds of the World's Languages are available for Macintosh users (no DOS based system currently available). The sounds are stored in a Hypercard database developed at the UCLA Phonetics Laboratory. The aim is to illustrate and teach about the range of sounds used in human languages with material on more than 80 languages. The set demonstrates particular highlights of the sound systems focusing especially on rarer sounds that students may not otherwise have a chance to hear from a native speaker. The recordings are based on the archives of recordings collected at UCLA, with additional contributions from outside collaborators. All the languages can be accessed from the list of language names, or by clicking on the language name in a set of maps. Support for part of this work was provided by NSF. The database currently includes examples of languages from Agul and Akan to Zulu.

* Availability: 15 DSDD disks, requiring about 35 meg of disk space when expanded. Available for $50 (individuals) or $100 (institutions). Prepayment in US dollars (checks or international money orders payable to "UC Regents") must accompany all orders.

* Contact: The UCLA Phonetics Laboratory Linguistics Department, UCLA, Los Angeles, CA 90095-1543 Tel: (310) 825-1254 E-mail: oldfogey@ucla.edu

John Esling's "IPA Labels"

* Description: A HyperCard stack which is available for free or a nominal fee.

* Contact: John Esling can be reached by email: pdb@uvvm.uvic.ca.

Linguistic Data Consortium (LDC)

The LDC was established to broaden the collection and distribution of speech and natural language data bases for the purposes of research and technology development in automatic speech recognition, natural language processing and other areas where large amounts of linguistic data are needed. Detailed information on the LDC is now available on the WWW: http://www.ldc.upenn.edu/. The LDC WWW server provides information on membership agreements, license agreements, and summaries of speech and text corpora available.

Speech Corpora

* TIMIT Acoustic-Phonetic Continuous Speech Corpora and NYNEX Telephone Version of TIMIT Corpus (NTIMIT)

* Resource Management Corpora

* Air Travel Information System (ATIS) Corpora (multiple)

* ARPA Continuous Speech Recognition Corpora (WSJ etc)

* Switchboard Corpus of Recorded Telephone Conversations and Switchboard Corpus Excerpts (Credit Card Conversations)

* Texas Instruments 46-Word Speaker-Dependent Isolated Word Corpus (TI46)

* Texas Instruments Speaker-Independent Connected-Digit Corpus (TIDIGITS)

* Road Rally Conversational Speech Corpus

* HCRC Map Task Corpus

* Air Traffic Control Corpus (ATC0)

* SPIDRE Speaker Identification Corpus

* YOHO Speaker Verification Corpus

* OGI Multi-Language Corpus and OGI Spelled and Spoken Telephone Corpus

* BRAMSHILL

* MACROPHONE

* King Corpus for Speaker Verification Research

* WSJCAM0: Cambridge Read News Corpus

* TRAINS Spoken dialog corpus

* NYNEX PhoneBook Database

* Frontiers in Speech Processing

Text Corpora

* Association for Computational Linguistics Data Collection Initiative (ACL/DCI)

* The Penn Treebank Project - Release 2

* TIPSTER Information Retrieval Text Research Collection

* United Nations Parallel Text Corpus (English, French, Spanish)

* Japanese Language Financial New

* European Corpus Initiative-1

Lexical Databases

* CELEX Lexical Database

* COMLEX : COMmon LEXical Database of English (English syntax and pronunciation)

Contact information:

Linguistic Data Consortium 3615 Market Street, Suite 200, Philadelphia, PA, 19104-2608, USA. Phone: +1 (215) 898-0464 Fax: +1 (215) 573-2175 e-mail: ldc@ldc.upenn.edu WWW: http://www.ldc.upenn.edu/

NOISEX-92

* Description: Database of recordings of various noises available on 2 CDROMs. Some material from the same source is available by anonymous ftp in the IEEE's Signal Processing Information Base. The samples include

+ Voice babble

+ Factory noise

+ HF radio channel noise, pink noise, white noise

+ Various military noises; fighter jets (Buccaneer, F16), destroyer noises (engine room, operations room), tank noise (Leopard, M109), machine gun

+ Volvo 340

* Availability 1: The cost of this database is 135 Pounds Sterling for the set of two CD-ROMs. Send payment with order to:

The Speech Research Unit, Ex1, DRA Malvern, St.Andrew's Road, Malvern, Worcestershire, WR14 3PS, UK Tel +44-684-894074 Fax +44-684-894384

Note: The supply of CD-ROMs is limited so please check that they are still available before placing an order. The only acceptable methods of payment are cheques (from the UK only) or bank drafts in Pounds Sterling drawn on a UK bank. They should be made payable to:

Public Sub Account HMG 4768.

* Availability 2: Information on how to obtain a copy of the NATO RSG.10 NOISE-ROM-0 can be obtained from the DRA Speech Research Unit (address above) or from:

Dr. Herman Steeneken, TNO Institute for Perception, P.O. Box 23, 3769 ZG Soesterberg, The Netherlands.

* Availability 3 (WWW): Examples of the NOISEX database are available on the Rice University Digital Signal Processing (DSP) group home page. (Note: the files are large, >20MB.) http://spib.rice.edu/spib/select_noise.html

Oxford Acoustic Phonetic Database

* Available on compact disc, from J. Pickering and B. Rosner. It contains data on vowel-consonant and consonant-vowel combinations in both stressed and unstressed locations. The languages covered include French, German, Hungarian, Italian, Japanese, British English, Spanish and English. For further information write to

Electronic Publishing, Oxford University Press, Walton Street, Oxford OX2 6DP, UK. The ISBN is 0-19-268086-2

* Contact:

Prof. B. Rosner Dept. of Experimental Psychology South Parks Rd, Oxford, OX1 3UD, UK email: burton.rosner@wolfson.ox.ac.uk

Phonemic Samples

* Some basic data. The following ftp sites have samples of English phonemes (American accent I believe) in Sun audio format files. See Question 1.8 for information on audio file formats.

ftp://sounds.sdsu.edu/.1/phonemes: This ftp site appears to be obsolete. Does anyone know a new address?

ftp://phloem.uoregon.edu/pub/Sun4/lib/phonemes: There appears to be some config problem with this ftp server.

ftp://sunsite.unc.edu/pub/multimedia/sun-sounds/phonemes
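Once a sample file has been fetched, its Sun audio header can be inspected with a few lines of Python. The sketch below relies on the commonly documented layout of the format: six big-endian 32-bit words giving the magic number ".snd", the data offset, the data size, the encoding code, the sample rate and the channel count. The function name read_au_header and the synthetic example header are illustrative assumptions, not part of any tool mentioned in this FAQ.

```python
import struct

AU_MAGIC = 0x2E736E64  # the four ASCII bytes ".snd"
# Only the most common encoding codes are named here.
ENCODINGS = {1: "8-bit mu-law", 2: "8-bit linear PCM", 3: "16-bit linear PCM"}


def read_au_header(data):
    """Parse the fixed 24-byte header at the start of a Sun audio file."""
    if len(data) < 24:
        raise ValueError("too short for a Sun audio header")
    magic, offset, size, encoding, rate, channels = struct.unpack(">6I", data[:24])
    if magic != AU_MAGIC:
        raise ValueError("not a Sun audio file")
    return {
        "data_offset": offset,
        "data_size": size,  # 0xFFFFFFFF means the size is unknown
        "encoding": ENCODINGS.get(encoding, "other (%d)" % encoding),
        "sample_rate": rate,
        "channels": channels,
    }


# Build a synthetic header instead of depending on a real file:
# 8 kHz, mono, 8-bit mu-law, unknown data size.
example = struct.pack(">6I", AU_MAGIC, 24, 0xFFFFFFFF, 1, 8000, 1)
info = read_au_header(example)
print(info)
```

To inspect a real file, pass the first 24 bytes read from it in binary mode to read_au_header.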

The RELATOR project

* Description: RELATOR is a European-wide consortium of researchers who, with the support of the European Commission, are striving to establish a European repository of linguistic resources. Linguistic resources comprise a variety of spoken and written language materials, including lexicons, grammars, corpora, and spoken language databases. RELATOR will ensure that the requirements of the European language processing community receive attention. The RELATOR WWW pages provide information on the consortium. The languages currently covered by the RELATOR consortium include Danish, Dutch, English, French, German, Greek, Italian, Portuguese, Spanish plus multilingual resources. The resources include both text and speech.

* WWW: http://cristal.icp.grenet.fr/Relator/homepage.html

ShATR

* Description: Multi-simultaneous-speaker corpus available on one CDROM. This specialised corpus is primarily intended to provide acoustic material for studies in auditory scene analysis. However many researchers in the speech sciences, ranging from acoustics to discourse analysis may find it a valuable source of information. The corpus has been transcribed and aligned at four different levels of analysis. An overlap analysis between the individual speaker channels and word counts are available. There is also a general tool for accessing concurrent events in transcribed multi-sound-source databases.

* Cost: 30 Pounds Sterling for one CD-ROM. Availability, licensing and ordering information is provided on ShATR's home page.

* Examples: Samples of the ShATR database are available on ShATR's home page and by anonymous ftp ftp://ftp.dcs.shef.ac.uk/share/spandh/ShATR/

* Contact: Speech and Hearing Research Group Department of Computer Science, University of Sheffield Regents Court, 211 Portobello Street, Sheffield S1 4DP, U.K.

WWW: http://www.dcs.shef.ac.uk/research/groups/spandh/pr/ShATR/ShATR.html

University of Victoria Phonetic Database

* Platform: Computerized Speech Lab CSL4300, MultiSpeech on Winxx or Win95 with any multimedia card, or a SoundBlaster16 option with support from the PDBAUDIO program.

* Description: Phonetic database consisting of proprietary format digitized speech samples from 45 world languages on CDROM. The CDROM is supported by hardcopy documentation containing the phonetic inventory of each language, transcriptions and orthography of each digitized speech sample. The PDB depicts and compares the sounds, symbols and conventions of transcription used by these languages. More information is available from the STR web site.

* Contact: Speech Technology Research Ltd., Suite B - 1623 McKenzie Avenue, Victoria, B.C. V8N 1A6, Canada Ph: +1-250-477-0544 Email: products@speechtech.com WWW: http://www.speechtech.com/home/speechtech/

Q1.8: Speech File Formats and Conversion

Q2.7 of this FAQ has information on mu-law coding.

A very good and very comprehensive list of audio file formats is prepared by Guido van Rossum. The list is posted regularly to comp.dsp and alt.binaries.sounds.misc, amongst others. It includes information on sampling rates, hardware, compression techniques, file format definitions, format conversion, standards, programming hints and lots more. It is also available by ftp from

WWW: ftp://ftp.cwi.nl/pub/audio/index.html

Text: ftp://ftp.cwi.nl/pub/audio/AudioFormats.part1,2

A useful source of software (Sox, ulaw conversion, SoundKit etc) is:

http://peace.wit.com/sounds/SoundConversion/
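For readers unfamiliar with the mu-law coding mentioned at the start of this question, the Python sketch below shows the continuous mu-law companding curve with mu = 255. It is a simplification for illustration only: telephone codecs use a piecewise-linear 8-bit approximation of this curve, which is not reproduced here, and the function names are hypothetical.

```python
import math

MU = 255.0  # value conventionally used for 8-bit telephone mu-law


def mulaw_compress(x):
    """Map a sample in [-1, 1] to a companded value in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mulaw_expand(y):
    """Inverse of mulaw_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


# Quiet samples are boosted before quantisation; expansion undoes this.
for x in (0.0, 0.01, 0.1, 1.0):
    y = mulaw_compress(x)
    print(x, round(y, 3), round(mulaw_expand(y), 6))
```

The curve gives small amplitudes a larger share of the output range, which is why 8 bits of mu-law can be adequate for telephone speech.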

Q1.9: Speech Laboratory Environments and Audio Editors

First, what is a Speech Laboratory Environment? A speech lab is a software package which provides the capability of recording, playing, analysing, processing, displaying and storing speech. Your computer will require audio input/output capability. The different packages vary greatly in features and capability - best to know what you want before you start looking around.

Most general purpose audio editing packages will be able to process speech but do not necessarily have some specialised capabilities for speech (e.g. formant analysis).

The following article provides a good survey.

* Read, C., Buder, E., & Kent, R. "Speech Analysis Systems: An Evaluation" Journal of Speech and Hearing Research, pp 314-332, April 1992.

The following is a list of the speech labs described in the FAQ.

* CSRE: Computerized Speech Research Environment

* DADiSP from DSP Development Corporation

* Entropic Signal Processing System (ESPS) and Waves

* GoldWave

* Kay Elemetrics Computer Speech Lab

* Khoros

* Matlab plus Signal Processing Toolbox

* MacSpeech Lab II

* N!Power

* OGI Speech Tools

* Ptolemy

* Quadravox Speech Processing Products - Qbox

* Speech Filing System (SFS)

* Signalyze 3.0 from InfoSignal

* SoundScope

CSRE: Computerized Speech Research Environment

* Platform: DOS

* Description: CSRE (pronounced "Caesar") is a speech processing system for the PC. It provides

+ Signal recording and playback

+ Signal editing

+ Pitch and spectral analysis and formant analysis

+ Speech synthesis with an implementation of the Klatt-1980 parametric speech synthesizer

* Requirements: PC compatible (80486DX), 1 Meg RAM (recommend 4M), DOS 3.2 (recommend 6.22), VGA graphics (640x480; 16 colors), 30 Meg of hard disk space (5 Meg for CSRE plus space for audio recordings), and a supported audio card.

* Cost: See AVAAZ WWW Pages

* Contact: AVAAZ Innovations Inc. P.O.Box 8040, 1225 Wonderland Rd. N, London, Ontario, CANADA, N6G 2B0 Ph: +1-519-472-7944, Fax: +1-519-472-7814 Email: info@avaaz.com WWW: http://www.icis.on.ca/homepages/avaaz/

* Note: See also the CSRE entry in Q5.5 on speech synthesisers.

DADiSP from DSP Development Corporation

* Platform: Windows and various Unix

* Description: DADiSP is designed for scientists and engineers to collect, analyze, and display scientific and technical data. Packages available include AdvDSP, Controls, DADiMP, Filters, GPIBLab, NeuralNet, and Stats.

A description of the application of DADiSP to speech processing is provided on the DSP Development Corporation WWW site.

Detailed product information is available on the DSP Development Corporation WWW site and by filling out a WWW form.

* Cost: Unknown

* Availability: See the DSP Development Corporation WWW site

A free, fully featured demo of DADiSP 4.0 is available from the DSP Development Corporation WWW site and can be mailed on floppy disk.

A special Student Edition of DADiSP is available for free.

* Contact: DSP Development Corporation One Kendall Square, Cambridge, MA 02139, USA

Ph: (617) 577-1133 Fax: (617) 577-8211 EMail: info@dadisp.com WWW: http://www.dadisp.com/

Entropic Signal Processing System (ESPS) and Waves

* Platform: Range of Unix platforms.

* Description: ESPS is a comprehensive set of speech analysis/processing tools for the UNIX environment. The package includes UNIX commands, and a comprehensive C library (which can be accessed from other languages). Waves is a graphical front-end for speech processing. Speech waveforms, spectrograms, pitch traces etc can be displayed, edited and processed in X windows and Openwindows (versions 2 & 3). Waves also includes a signal labelling utility which provides multiple feature labelling and useful features for fast labelling of large speech databases. Other Entropic products are HTK (see Q6.5) and TrueTalk (see Q5.5).

* Misc: A more detailed description is provided on the Entropic WWW pages (http://www.entropic.com/esps.html).

* Cost: On request.

* Contact: Entropic Research Laboratory, Washington Research Laboratory 600 Pennsylvania Ave, S.E. Suite 202, Washington, D.C. 20003 (202) 547-1420 email: info@entropic.com WWW: http://www.entropic.com/

GoldWave

* Platform: Windows

* Description: GoldWave is a digital audio editor for Microsoft Windows. It features realtime amplitude/spectrum oscilloscopes, large file editing, effects, and support for a wide variety of sound formats.

+ Editing of multiple waveforms and large waveforms

+ Realtime amplitude/spectrum oscilloscopes

+ Resizable device controls window for accessing audio devices

+ Realtime fast forward and rewind playback

+ Effects: distortion, Doppler, echo, filter, mechanize, offset, pan, volume shaping, invert, resample, transpose, etc

+ Multiple file formats and conversions: .WAV, .AU, .IFF, .VOC, .SND, .MAT, .AIFF, and raw data

+ CD-ROM controls window

More information is available on the GoldWave home page.

* Cost: Shareware

* Availability: Through the GoldWave home page:

http://web.cs.mun.ca/~chris3/goldwave/goldwave.html

* Contact: Chris Craig: chris3@cs.mun.ca

Kay Elemetrics CSL (Computer Speech Lab) 4300

* Platform: Minimum IBM PC-AT compatible with extended memory (min 2MB) with at least VGA graphics. More powerful machines preferable.

* Description: Speech analysis package, with optional separate LPC program for analysis/synthesis. Uses its own file format for data, but has some ability to export data as ascii. The main editing/analysis prog (but not the LPC part) has its own macro language, making it easy to perform repetitive tasks. Options - more information on the Kay Elemetrics Corp. WWW site:

+ Multi-Dimensional Voice Program (MDVP)

+ Voice Range Profile (Phonetograph)

+ Real-Time Spectrogram

+ Sona-Match

+ Palatometer Database

+ IPA Transcription Tutorial

+ Delayed Auditory Feedback (DAF)

+ Disordered Voice Database

+ Auditory Perception Program and Database

+ Motor Speech Profile Program

+ CSL-Pitch

+ Real-Time EGG Processing

+ Signal Enhancement in Noise Program

+ Synthesis Program

+ DAT Interface and Four Channel Input

+ Phonetic Database

+ Direct-to-Disk Program

+ Programmers Kit

+ Condenser Microphone

+ Multi-Speech

* Cost: Contact Kay Elemetrics Corp.

* Contact: Kay Elemetrics Corp. 2 Bridgewater Lane, Lincoln Park, NJ 07035, USA Ph: +1-201-628-6200, Fax: +1-201-628-6363 Toll free tel. 1-800-289-5297 [WWW: http://www.kayelemetrics.com/ - available soon]

Khoros

* Platform: Any Unix - source code available.

* Description: Khoros is a technical computing environment for image and signal processing, visual programming and software development.

* Price: On request.

* Availability: Khoral Research Inc. 6001 Indian School Rd. NE Suite 200, Albuquerque, NM 87110, USA Ph: (505)837-6500, Fax: (505) 881-3842 Email: info@khoral.com ftp: ftp://ftp.khoral.com/ WWW: http://www.khoral.com/

Matlab plus Signal Processing Toolbox

* Platform: Wide range

* Description: Matlab (MATrix LABoratory) is a technical computing environment for numerical computation and visualization based on a matrix oriented, interpreted programming language. The programming environment provides support for the development of customized operations, along with debugging facilities and a graphical user interface toolkit. Audio output is provided.

A specialised Signal Processing Toolbox is available which provides many functions which are useful for speech analysis. It includes filter design, spectral estimation, statistical signal processing, waveform generation, and signal and spectrogram display.

A specialised Auditory Toolbox is available which contains functions useful to people interested in auditory/cochlear models.

A more detailed description is given in Q1.10.

* Price: On request.

* Contact: The Math Works Inc. 24 Prime Park Way, Natick, MA 01760-1500 USA Ph: 1-508-653 1415 Fax: 1-508-653 6284 Email: info@mathworks.com ftp: ftp://ftp.mathworks.com WWW: http://www.mathworks.com/

MacSpeech Lab II (MSL II)

* Platform: Macintosh

* Description: A sound analysis and acquisition package for Macs. MSL II delivers the most common functions for speech analysis (FFTs, LPCs, f0 extraction, etc.) & produces grayscale spectrographic displays. Can be used for various speech technology and phonetic training tasks.

* Hardware: Requires MacADIOS ("Macintosh Analog/Digital Input/Output System") hardware for speech I/O at 12/16 bits.

* Misc: Software no longer updated by GW Instruments; MSL soft/hardware will not perform input/output on Quadras, for example, though analysis seems fine. Known to operate properly on systems as high as IIcx & IIfx.

* Availability: MSL has been replaced by SoundScope; see the SoundScope entry for more detail.

* Contact: GW Instruments 35 Medford Street, Somerville, MA 02143, USA Phone: (617) 625-4096 Fax: (617) 625-1322

N!Power

* Platform: SUN, DEC and HP workstations.

* Description: An object-oriented software package with a MOTIF GUI interface and a range of functionality for data analysis/editing, signal analysis, speech processing, real-time A/D and D/A, and 2D/3D interactive graphics. N!Power replaces ILS. N!Power can provide a Block Diagram user interface, menus, pop-ups, and a high-level IEEE standard symbolic scripting language. You can customize the blocks, menus and pop-ups with mouse point-and-click operations.

* Contact: Signal Technology, Inc. 104 W. Anapamu, Suite J, Santa Barbara, CA 93101-3126 Phone: +1-805-899-8300, Fax: +1-805-899-4344 Email: stisales@signal.com WWW: http://www.silcom.com/~stilarry/

OGI Speech Tools

* Developers from the Center for Spoken Language Understanding (CSLU) at the Oregon Graduate Institute of Science and Technology (Portland, Oregon)

* Platform: Unix

* Description: The OGI Speech Tools include:

+ An X windows display tool (LYRE) for displaying data in a time synchronous fashion for a. the speech signal, b. spectrograms, c. phoneme labels, and other information.

+ A Neural Network (NOPT) training package.

+ A set of C library routines (LIBNSPEECH) for the manipulation of speech data, including: a. PLP Analysis, b. Rasta PLP Analysis, c. Linear Predictive Coding, d. Mel Cepstrum Coding, e. Fast Fourier Transform.

+ A set of utilities for converting file formats such as ADC, NIST, mu-law, binary files, and ascii. Includes filtering.

+ A database utility (find_phone) to automate speech database related enquiries. It allows the user to specify a particular label or set of labels in a given context, display all occurrences of the label, and relabel the occurrences if desired.

+ A Vector-Quantizer based on the Linde Buzo and Gray (LBG) algorithm.

+ A set of PERL Scripts which have been used mainly to automate the use of the OGI Speech Tools.

+ MAN Pages for all routines and programs developed, as well as a User manual in both postscript and tex format.

* Misc: Software is written in ANSI C.

* Contact: Email: tools@cse.ogi.edu WWW: http://www.cse.ogi.edu/CSLU/ ftp: ftp://speech.cse.ogi.edu/pub/tools/
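For readers unfamiliar with the LBG algorithm mentioned above, here is a minimal sketch of the idea: start from the mean of the training vectors, repeatedly split every codeword in two, and refine after each split by nearest-neighbour assignment and centroid update. This is not the OGI code. The function names, the additive split offset `eps` and the fixed iteration count (used instead of a distortion threshold) are assumptions made for the example.

```python
def _dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def _nearest(v, codebook):
    """Index of the codeword closest to v."""
    return min(range(len(codebook)), key=lambda i: _dist2(v, codebook[i]))


def lbg(vectors, size, eps=0.01, iters=20):
    """Grow a codebook to `size` entries (a power of two) by splitting."""
    codebook = [_mean(vectors)]
    while len(codebook) < size:
        # Split every codeword into a pair offset by +eps and -eps.
        codebook = [[c + s for c in cw] for cw in codebook for s in (eps, -eps)]
        for _ in range(iters):
            # Assign each vector to its nearest codeword ...
            cells = [[] for _ in codebook]
            for v in vectors:
                cells[_nearest(v, codebook)].append(v)
            # ... then move each codeword to the centroid of its cell.
            # An empty cell keeps its previous codeword.
            codebook = [_mean(cell) if cell else cw
                        for cell, cw in zip(cells, codebook)]
    return codebook
```

In speech work the training vectors would typically be frames of spectral or cepstral coefficients; the toy version above works on any list of equal-length numeric lists.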

Ptolemy

* Platform: Sun SPARC, DecStation (MIPS), HP (hppa).

* Description: Ptolemy provides a highly flexible foundation for the specification, simulation, and rapid prototyping of systems. It is an object oriented framework within which diverse models of computation can co-exist and interact. Ptolemy can be used to model entire systems. Ptolemy has been used for a broad range of applications including signal processing, telecommunications, parallel processing, wireless communications, network design, radio astronomy, real time systems, and hardware/software co-design. Ptolemy has also been used as a lab for signal processing and communications courses. Ptolemy has been developed at UC Berkeley over the past 3 years. Further information, including papers and the complete release notes, is available from the FTP site.

* Cost: Free

* Availability: The source code, binaries, and documentation are available by anonymous ftp from

ftp://ptolemy.berkeley.edu/pub/README

Quadravox Speech Processing Products - Qbox

* Platform: Windows 3.1, Windows 95

* Description: Qbox comprises a Windows-based LPC-12 analysis and editing system and a parallel-port driven programmer for one-time-programmable TI TSP50P11 synthesis chips. The analysis software utilizes standard 11025 Hz, 16-bit monaural .wav files for input and allows graphical editing of the coded pitch, gain, and reflection coefficients. It can also be used to define concatenation sequences of individual phrases. Data rates depend on the original sound, but are typically below 2000 bits/sec. The processed data can then be merged with synthesis and control routines and programmed into the TI synthesizer. The Quadravox-developed synthesis routine accepts run-time modifications of pitch and frame-length (speed), as well as externally defined concatenation sequences. The synthesis chip interface can be defined as a matrixed-keyboard drive, a simple parallel control, or a serial bus control supporting up to 31 individually addressed devices and modules.

* Cost: $90-$150 depending on options selected.

* Contact: Quadravox, Inc. 1701 N. Greenville Ave., Suite 608, Richardson, TX, 75081 USA Ph: 214-669-4002 Email: info@quadravox.com WWW: http://www.quadravox.com/
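To put the Qbox figures in perspective, the short sketch below works out the raw bit rate of the 11025 Hz, 16-bit mono input and compares it with the quoted coded rate of about 2000 bits/sec. It also shows how the standard Python `wave` module could be used to check that a file has that input format. The function names are invented for this example and are not part of any Quadravox software.

```python
import wave


def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Bit rate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def compression_ratio(sample_rate_hz, bits_per_sample, coded_bps):
    """Ratio of the raw PCM bit rate to the coded bit rate."""
    return pcm_bit_rate(sample_rate_hz, bits_per_sample) / coded_bps


def matches_qbox_input(path):
    """True if the WAV file at `path` is 11025 Hz, 16-bit, mono."""
    with wave.open(path, "rb") as w:
        # getsampwidth() reports bytes per sample, so 16 bits is 2.
        return (w.getframerate(), w.getsampwidth(), w.getnchannels()) == (11025, 2, 1)
```

The raw input runs at 11025 x 16 = 176400 bits/sec, so a coded rate of 2000 bits/sec corresponds to a reduction of roughly 88 to 1.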

Speech Filing System (SFS)

* Platform: Unix and DOS

* Description: SFS provides a computing environment for conducting speech research. It comprises software tools, file and data formats, subroutine libraries, graphics, standards and special programming languages. It performs standard operations such as recording, replay, waveform editing and labelling, spectrographic and formant analysis and fundamental frequency estimation. For more information, see ftp://ftp.phon.ucl.ac.uk/pub/sfs/README

* Misc: SFS is copyrighted University College London, but is currently supplied free of charge to research establishments for non-profit use.

* Availability: SFS source code is available by anonymous FTP from:

ftp://ftp.phon.ucl.ac.uk/pub/sfs/

* Contact: Mark Huckvale University College London, Gower Street, London WC1E 6BT, UK Email: SFS@phonetics.ucl.ac.uk ftp: ftp://ftp.phon.ucl.ac.uk/pub/sfs/
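Fundamental frequency estimation, one of the standard operations listed above, can be sketched with a simple autocorrelation peak search. The code below is an illustration only and makes no claim about how SFS does it: the function names and the 60-400 Hz search range are assumptions, and real estimators add voicing decisions, windowing and smoothing.

```python
import math


def sine(freq_hz, sample_rate, n_samples):
    """Generate a test tone as a list of floats."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]


def estimate_f0(samples, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate f0 in Hz from the lag with the largest autocorrelation.

    The frame must be longer than sample_rate / fmin samples.
    """
    lo = int(sample_rate / fmax)   # shortest period considered
    hi = int(sample_rate / fmin)   # longest period considered
    best_lag, best_val = lo, float("-inf")
    for lag in range(lo, hi + 1):
        val = sum(samples[i] * samples[i + lag]
                  for i in range(len(samples) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return sample_rate / best_lag
```

Because the sum runs over fewer terms at longer lags, the unnormalised autocorrelation slightly favours the shortest matching period, which helps avoid octave errors on clean periodic input.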

Signalyze 3.0 from InfoSignal

* Platform: Macintosh

* Description: Signalyze is an interactive program for the analysis of speech and other acoustic material. Signalyze's basic concept revolves around the display of up to 100 signals in HyperCard fashion. The program offers a range of signal editing features, spectral analysis tools, manual scoring tools, pitch extraction routines, signal manipulation tools, and extensive input-output capacity. It also has a range of capabilities for creating, editing and manipulating label files with flexibility in labelling format. Signalyze handles the following file formats: Signalyze, MacSpeech Lab, AudioMedia, SoundDesigner II, SoundEdit/MacRecorder, SoundWave, sound resource formats, and ASCII-text. Sound I/O: Direct sound input from Apple 8- or 16-bit sound input. Sound output via Macintosh 8- or 16-bit sound.

* Compatibility: MacPlus and higher. Takes advantage of large screens, multiple screens and 16/256 color/grayscales. System 7.0 compatible. Runs in background with adjustable priority.

* Misc: Manuals and tutorials included (250 pp.). Program is switchable to English, French, and German. For more information and demo:

WWW: http://www.agoralang.com:2410/pubdirsoftware.html WWW: http://www.agoralang.com:2410/signalyze.html

Gopher: gopher://uldns1.unil.ch:70/11/unilgophers/gopher_lett/LAIP

* Cost: Individual licence US$450, departmental license US$750, organisational license US$1250, plus shipping. Upgrades from version 2.0 are available.

* Contact: The Americas: Network Technology Corporation

91 Baldwin St., Charlestown, MA 02129, USA

Phone: +1-617-241-9205, Fax: +1-617-241-5064

---

Elsewhere: InfoSignal Inc. C.P. 73, 1015 LAUSANNE, Switzerland, Fax: +41 21 691-1372,

Email: 76357.1213@COMPUSERVE.COM

SoundScope

* Platform: Macintosh: 68K and PowerPC native

* Description: The SoundScope product family is used primarily in speech teaching & research, with some applications in animal sounds, forensics, and general acoustic analysis. It can record, view, analyze, play, copy, paste, store and print sound waveforms. Analysis functions include spectrogram, fundamental frequency (Fo), Linear Predictive Coding (LPC) including formant tracking, LPC residual, jitter (pitch perturbation), shimmer (amplitude perturbation), HNR, frequency spectrum, spectral slice, envelope, energy and zero crossing. Includes limited built-in filtering, runs any filter created with WLFDAP. An integrated text editor stores notes and calculation results. SoundScope lets you design your own custom "instrument" screen, tasks (macros) and menus. Supplied instruments include 1 channel analyser (dual snap, dual time, spectrogram, spectrum), 2 channel analyser, segment analyser, multi-channel recorder, etc.

* Note: Supersedes MacSpeech Lab II.

* Price: $490 to $4990, less educational discount

* Availability: In North America, directly from GW Instruments. Contact the company for international distributors.

* Contact: GW Instruments

35 Medford Street, Somerville, MA 02143, USA

Ph: +1-617-625-4096, Fax: +1-617-625-1322

Email: info@gwinst.com
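Two of the simpler analysis functions named in the SoundScope entry, energy and zero crossing, are easy to sketch and give a feel for what a speech lab computes frame by frame. The code below is a generic illustration with invented function names; it is not taken from any of the packages listed here.

```python
import math


def frames(signal, size, hop):
    """Split a list of samples into frames of `size` samples, `hop` apart."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def energy_db(frame):
    """Frame energy in decibels (returns -inf for an all-zero frame)."""
    e = short_time_energy(frame)
    return 10 * math.log10(e) if e > 0 else float("-inf")


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)
```

High energy with a low zero-crossing rate is typical of voiced speech, while low energy with a high zero-crossing rate suggests unvoiced sounds or noise, which is why the two measures are often displayed together.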

Q1.10: Speech Research Sites

Rather than try to list the places round the world which perform speech research this FAQ lists sites on the WWW where other comprehensive lists are maintained. Try the following:

Shikano's WWW site on Speech and Acoustics

http://www.aist-nara.ac.jp/IS/Shikano-lab/database/internet-resource/e-www-site.html

Lists of speech research sites by country. Currently includes around 100 sites. The list of Japanese sites is particularly comprehensive.

Mambo Speech Research List

http://mambo.ucsc.edu/psl/speech.html

Lists about 50 speech research sites and related information sources. Very nice presentation!

ESCA: European Speech Communication Association

http://ophale.icp.grenet.fr/esca/labos.html

Links to around 15 European speech research sites and around 15 related sources of information.

Institute for Perception Research: Speech on the Web

http://www.tue.nl/ipo/hearing/webspeak.htm

Jan Roelof de Pijper at the Institute for Perception Research has a long list of research sites plus links to lots of other speech material on the WWW.

Russ Wilcox's list of Commercial Speech Recognition

http://www.tiac.net/users/rwilcox/speech.html

Links to information on speech technology vendors, speech research labs, speech resources, on-line demos and more.

Speech Groups List: Leeds University Cognitive Psychology Research Group

http://lethe.leeds.ac.uk/research/cogn/speechlab/other.html

List of about 25 research sites.

Institute of Phonetic Sciences, Amsterdam

http://fonsg3.let.uva.nl/Other_pages.html#Phonetics

Good list of European sites.

Speech and Hearing Research Group, University of Sheffield, UK

http://www.dcs.shef.ac.uk/research/groups/spandh/world/misclinks.html

Links to sites in the UK, USA, Europe and the rest of the world.

Duncan M. Forrest's Speech Recognition Resource List

http://www.skye.co.za/dmf/speech/

Most speech research sites have links to other speech research sites somewhere in their WWW pages.

Q1.11: Miscellaneous Software and Resources.

Speech Interface Standards: APIs etc

* ASAPI: Advanced Speech API (AT&T)

* SAPI: Microsoft Windows Speech API

* SRAPI: Speech Recognition API

* TAPI: Microsoft Windows Telephony API

Network "Phone" Software

* CUSeeMe

* CyberPhone

* DigiPhone

* InterFACE from Hijinx

* FAQ: How can I use the Internet as a telephone?

* Nautilus: Secure Computer Telephony

* NEVOT (1.4v) from AT&T BL

* PGPfone

* Speak Freely

* Internet Phone from VocalTec

* WebPhone

* WebTalk

Audio Processing Software

* AF version AF3R1

* Voice E-Mail from Bonzi Software

* MicNotePad Recording Software for Macs

* MixViews

* Network Audio System Release 1.1

* NIST Software - SPHERE and SCORE

* Sound Processing Kit

* TCPplay

Human Audio Perception

Other useful information on Auditory Modeling can be found in

Malcolm Slaney's home page http://www.interval.com/~malcolm/

Martin Cooke's home page Speech and Hearing Research Group, Dept of Computer Science, University of Sheffield, UK. http://www.dcs.shef.ac.uk/~martin/

* Auditory Modeller 1

* Auditory Modeller 2

* Auditory Toolbox for Matlab

* Human Audio Perception Document

Dictionaries and other Lexical Tools

* BEEP dictionary

* CMU dictionary

* CUVOLAD dictionary (Oxford Dictionary)

* Comprehensive Word List

* EAT: Edinburgh Associative Thesaurus

* Homophone List

* Moby Lexical Resources

* MRC Psycholinguistic Database

* WordNet

* Dictionaries on the WWW

Phonetic Fonts and Phonetic Samples

* International Phonetic Alphabet

* WWW: Phonetic Fonts and Examples Online

* Summer Institute of Linguistics IPA Fonts

* Phonetic Fonts for TeX and LaTeX

* Yamada Language Center

Subjective Evaluation of Speech Quality

Dynastat, Inc.

Speech Intelligibility Testing with Diagnostic Rhyme Test (DRT), Modified Rhyme Test (MRT), Phonetically Balanced Word Lists (PB), Diagnostic Medial Consonant Test (DMCT), Diagnostic Alliteration Test (DALT), ICAO Spelling Alphabet Test (SpAT).

Speech Quality (Acceptability) Evaluation with Diagnostic Acceptability Measure (DAM), Mean Opinion Score (MOS), Degradation Mean Opinion Score (DMOS).

Contact: Dynastat, Inc. 2704 Rio Grande, Suite 4, Austin, TX 78705, USA Ph: +1-512-476-4797, Fax: 512/472-2883 Email: sharpley@dynastat.com WWW: http://www.bga.com/dynastat/

ANSI S3.2-1989: American National Standard for Measuring the Intelligibility of Speech Over Communication Systems

Available from American National Standards Institute (ANSI) Ph: +1-212-642-4900, Fax: +1-212-398-0023 WWW: http://www.ansi.org/

Louis Pols' List of References on Synthesis Development And Assessment

700 references:

http://www.itl.atr.co.jp/cocosda/output/synth.refs

Very Miscellaneous

* The vOICe

* The Learning Company's Language Training

* Wildfire - an Electronic Assistant

ASAPI: Advanced Speech API (AT&T)

* Description: The AT&T ASAPI Specification is an open, cross-platform, easy-to-use speech API that can support speech engines from AT&T and other vendors. ASAPI does not replace the Microsoft Speech API, but it provides extensions and enhancements to the Microsoft SAPI Specification including support for SAPI-compatible applications. The ASAPI Specification defines two types of interfaces. The "ASAPI Extensions" interface provides extensions to the MS-SAPI interface as well as C++ class encapsulation of SAPI functionality. The "Visual ASAPI" interface provides an even higher-level abstraction of SAPI/ASAPI low-level functionality such that application developers can quickly and easily embed speech technology into existing or new applications. Special Purpose Recognizers are examples of Visual ASAPI interfaces which integrate lower-level functionality that an application developer can access via a simple interface.

* More information: Contact Jose Garcia at AT&T on (908) 957-5457 or by email: jrg@att.com. For more information on the WATSON Speech Engine which supports ASAPI and news about ASAPI please visit the AT&T Advanced Speech Products Group home page or call 1-800-5-WATSON.

SAPI: Microsoft Windows Speech API

* Platform: Windows 95 and Windows NT 3.51

* Description: The Microsoft Speech API provides applications with the ability to incorporate speech recognition (command & control or dictation) or text-to-speech, using either C/C++ or Visual Basic. SAPI follows the OLE Component Object Model (COM) architecture. It is supported by many major speech technology vendors. The major interfaces are

+ Voice Commands: high level speech recognition API for command and control.

+ Voice Text: simple high level text-to-speech API.

+ Speech Recognition: provides detailed control of a speech recognition engine for both command-and-control and dictation.

+ Text-to-Speech: provides detailed interface to a text-to-speech engine for control of playback, speaking style, voice quality etc.

+ Multimedia Audio Objects: audio I/O for microphones, headphones, speakers, telephone lines, files etc.

* Availability: Download Microsoft's latest speech technology, including the Microsoft Speech SDK, command and control recognition, the Microsoft dictation research demonstration and text-to-speech.

* More information: Email: MSSpeech@Microsoft.Com WWW: The Microsoft Speech API WWW: An Overview of the Microsoft Speech API Documentation included with the Microsoft SDK.

* See also: TAPI: Microsoft Windows Telephony API

SRAPI: Speech Recognition API

* Platform: Various

* Description: The SRAPI provides support for speech recognition, text-to-speech and other media playback. The SRAPI Committee is a nonprofit Utah corporation with the goal of providing solutions for interaction of speech technology with applications. Core members include: Novell, Inc., Dragon Systems, IBM, Kurzweil AI, Intel, and Philips Dictation Systems. Additional contributing members include Articulate Systems, DEC, Kolvox Communications, Lernout and Hauspie, Syracuse Language Systems, Voice Control Systems, Corel, Verbex and Voice Processing Corporation.

* More information: WWW: http://www.srapi.com/ Email: For more information on the SRAPI Developer CD, send email to srapi@srapi.com with Subject "SRAPI CD Info".

TAPI: Microsoft Windows Telephony API

* Description: TAPI allows applications to support telephone communication. TAPI facilities include:

+ Connecting directly to a telephone network.

+ Automatic phone dialing.

+ Transmission of data (files, faxes, electronic mail).

+ Access to data (news, information services).

+ Conference calling.

+ Voice mail.

+ Caller identification.

+ Control of a remote computer.

+ Collaborative computing over telephone lines.

Windows 95 comes with a telephony application, DIALER.EXE, that can dial voice calls, act as a proxy for applications making simple telephony requests, and maintain a call log.

* More information: The Win32 Software Development Kit (SDK) contains documentation, tools, and sample code for TAPI including the Microsoft Telephony Programmer's Reference and the Microsoft Telephony Service Provider Interface (TSPI) for Telephony. WWW: Tapping in TAPI, TAPI White Paper

* See also: SAPI: Microsoft Speech API

CUSeeMe

* Platform: Macintosh and Windows

* Description: Cornell University software for audio and video conferencing over the Internet.

* Requirements:

Macintosh to RECEIVE video:

+ Macintosh platform with a 68020 processor or higher

+ System 7 or higher operating system

+ Minimum 16-level-grayscale (e.g. color)

+ IP network connection and MacTCP

+ Apple's QuickTime, to receive slides with SlideWindow

Macintosh to SEND video:

+ All the above plus

+ Quicktime installed

+ video digitizer (with vdig software) and Camera

For Windows:

+ Video receive only 386SX, Video send & receive 386DX, Video receive w/Audio 486SX, Video send & receive w/Audio 486DX

+ Windows 3.1 or higher running in Enhanced Mode.

+ Winsock

+ 256 color (8 bit) video driver

+ Video camera and a video capture board that supports Microsoft Video For Windows

+ For audio: Windows Sound board that conforms to the Windows MultiMedia Specification, speakers and a microphone

* Availability: Mac: http://cu-seeme.cornell.edu/get_cuseeme.html Windows: http://cu-seeme.cornell.edu/PC.CU-SeeMeCurrent.html

* More information: http://cu-seeme.cornell.edu/

CyberPhone

* Platform: Sun Workstations running Solaris 2.x (SunOS 5.x)

* Description: Provides voice communications over the internet. Has a graphical user interface and requires no additional hardware. An optional centralized server system is available to make finding and connecting to other users easier.

* Availability: a free demonstration is available by anonymous ftp

ftp://magenta.com/pub/cyberphone

* Contact: Email: cyberphone@magenta.com. More information is available on the WWW: http://magenta.com/cyberphone/.

DigiPhone

* Platform: Macintosh, Windows 3.1 and Windows 95

* Description: DigiPhone provides two-way phone conversations by dialing direct and over the Internet. Includes encryption for privacy, caller ID, call screening, call timer, adjustable sound and compression quality, messaging, and access to the Global Directory providing a database of DigiPhone users.

+ DigiPhone v1.03: provides the standard features listed above. [ More information].

+ DigiPhone Deluxe: provides the standard features of DigiPhone v1.03 and adds conference calling, mute, speed dial, call recording and playback, voice effects, customizations, and internet tools. [ More information].

+ DigiPhone for Mac: provides the standard features listed above, plus cross-platform compatibility and mute. [ More information].

* Requirements: DigiPhone v1.03 requires 386DX/33 or faster, 4MB RAM, 9,600 bps modem, Sound Blaster 16 card (or any compatible half or full duplex card), and a local internet connection with SLIP or PPP. [Recommend 486DX/33 and 14,400 bps modem] DigiPhone Deluxe has the same requirements as v1.03 but requires 486DX/33 or faster. DigiPhone for Mac requires a 68030 33MHz, 68040 25MHz or Power PC, 4 MB RAM, System 7.x, 14,400 bps modem or better, Sound Manager 3.x for System 7, microphone and speakers, MacTCP or Open Transport and a local internet connection with SLIP or PPP.

* Price and Availability: Contact Third Planet Publishing for pricing. Trial software is available from Third Planet Publishing. Orders and Upgrades can be made on the Web. Also available through many retailers.

* Contact: Third Planet Publishing, Inc. 17770 Preston Rd, Dallas, Texas 75252, USA Ph: +1-972-733-3005, Fax: +1-972-380-8712 Email: 3pp@planeteers.com WWW: http://www.planeteers.com/

InterFACE from Hijinx

* Platform: Windows

* Description: InterFACE provides voice communication on the Internet through IRC (Internet Relay Chat) services.

* Requirements: Recommend a 486DX, 8 Meg RAM, Windows, VGA Monitor and a 16 bit sound card.

* Availability: Available on CD only for US$60.00, which includes postage and handling. Demo versions available from the HiJiNX WWW site.

* Contact: HiJiNX, Brisbane, Australia Email: jester@hijinx.com.au WWW: http://www.hijinx.com.au/

FAQ: How can I use the Internet as a telephone?

* Description: Kevin M. Savetz and Andrew Sears have prepared an FAQ document titled _FAQ: How can I use the Internet as a telephone?_ The current document has the following sections:

+ Can I use the Internet as a telephone?

+ What do I need to call others on the Internet?

+ How does it work?

+ How do I make calls using a modem?

+ Is the sound quality as good as a regular telephone?

+ Is there a noticeable delay in hearing the other user?

+ What is the difference between full duplex and half duplex?

+ What is multicasting?

+ Can I talk to users of other phone software?

+ What software is available?

The section on available software covers the following:

+ Mac: Maven, NetPhone, CU-Seeme, PGPfone

+ Windows: Speak Freely, CU-Seeme, Internet Phone, Digiphone, Internet Voice Chat, Internet Global Phone, Web Phone

+ UNIX: Speak Freely, nevot, vat, mtalk, ztalk

* Availability:

By Email

Mail voice-faq-request@northcoast.com with "Subject: archive" and "Body: send voice-faq"

FTP

ftp://rtfm.mit.edu/pub/usenet/alt.internet.services/FAQ:_How_can_I_use_the_Internet_as_a_telephone?

WWW:

http://rpcp.mit.edu/~asears/voice-faq.html

* Contact: Andrew Sears: asears@mit.edu Kevin Savetz: savetz@northcoast.com

Nautilus: Secure Computer Telephony

* Platform: DOS, Linux, SunOS, Solaris.

* Description: Nautilus is software which allows two users to hold a secure conversation over either ordinary phone lines or a computer network. Nautilus uses your computer's audio hardware to digitize and play back your speech using speech compression algorithms built into the program. It encrypts the compressed speech using your choice of the Blowfish, Triple DES, or IDEA block ciphers, and transmits the encrypted packets over the internet or your modem to another computer. At the other end, the process is reversed. Nautilus operates in half duplex mode like a speakerphone -- only one person can talk at a time. Either user can hit a key to switch between talking and listening. Audio quality ranges from fair to very good depending on which of the four speech coders is selected. The Nautilus WWW page provides more detailed information.

* Requirements: Nautilus runs on IBM PC-compatible computers (386DX25 or faster) under MSDOS or Linux as well as audio-capable Sun workstations running SunOS or Solaris. The MSDOS version of Nautilus requires a Soundblaster compatible sound card and currently only runs over ordinary phone lines with a modem. To use Nautilus over ordinary telephone lines, a modem capable of connecting at 4800 bps or faster is required.

* Availability: Nautilus is available in three different formats. As a DOS executable, it is available as an archive in zip format along with its associated documentation. In source format, it is available as either a zipped archive or a gzip-compressed tar archive. Nautilus is distributed freely (subject to US export restrictions) with full source code. This ensures that its security can be independently examined and verified. Follow the instructions in the following README files to obtain Nautilus.

+ ftp://ftp.csn.org/mpj/README

+ ftp://ripem.msu.edu/pub/crypt/README

* More information: WWW: http://www.lila.com/nautilus/

* Contacts: The Nautilus development team includes Bill Dorsey, Paul Rubin, Andy Fingerhut, Paul Kronenwetter, Bill Soley, and Pat Mullarky. To contact the developers, send email to nautilus@lila.com.

NEVOT (1.4v) from AT&T BL

* Platforms: Sun Sparc Station (SunOS 4.1.x) and Silicon Graphics

* Description: Audio-conferencing tool which supports both point-to-point and broadcasting of audio using multicast IP. Audio encoding:

+ PCM 64 kb/s: 8-bit u-law encoded 8 kHz PCM (G.711)

+ ADPCM 32 kb/s [Sun only] (G.721)

+ DVI ADPCM 32 kb/s

+ ADPCM 24 kb/s [Sun only] (G.723)

+ CELP 4.8 kb/s

+ LPC 2.4 kb/s

* Availability: by anonymous ftp from

ftp://gaia.cs.umass.edu/pub/hgschulz/nevot

* Contact: Henning Schulzrinne (hgs@researh.att.com)
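The bit rates in the NEVOT encoding list above follow directly from the sampling rate and the number of bits per sample for the waveform coders (PCM and ADPCM). The following is a minimal sketch of that arithmetic, assuming an 8 kHz sampling rate throughout; the function names are hypothetical and are not part of NEVOT. CELP and LPC are parametric coders, so their rates are not a simple bits-per-sample product.

```python
SAMPLE_RATE_HZ = 8000  # assumed telephone-band sampling rate


def bit_rate_kbps(bits_per_sample, sample_rate_hz=SAMPLE_RATE_HZ):
    """Bit rate of a waveform coder in kb/s (1 kb = 1000 bits)."""
    return bits_per_sample * sample_rate_hz / 1000.0


def bytes_for_duration(rate_kbps, seconds):
    """Number of bytes needed to carry `seconds` of audio at `rate_kbps`."""
    return int(rate_kbps * 1000 * seconds / 8)


# 8-bit u-law PCM, 4-bit ADPCM and 3-bit ADPCM at 8 kHz
for name, bits in [("PCM u-law", 8), ("ADPCM 4-bit", 4), ("ADPCM 3-bit", 3)]:
    print(name, bit_rate_kbps(bits), "kb/s")
```

At 64 kb/s, one minute of speech occupies 480,000 bytes; at 2.4 kb/s (LPC) the same minute needs 18,000 bytes.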

PGPfone

* Platform: Macintosh and Windows

* Description: Pretty Good Privacy Phone is free secure audio connection software for the internet. It uses speech compression and strong cryptography protocols to give you the ability to have a real-time secure telephone conversation via a modem-to-modem connection.

* Requirements (Mac): Fast modem: at least 14.4 Kbps V.32bis (28.8 Kbps V.34 recommended). An Apple Macintosh with at least a 25MHz 68LC040 processor (PowerPC recommended), running System 7.1 or above, Thread Manager 2.0.1, ThreadsLib 2.1.2, and Sound Manager 3.0. (These are available from Apple's FTP sites.)

* Requirements (Windows): Fast modem: at least 14.4 Kbps V.32bis (28.8 Kbps V.34 recommended). A multimedia PC running Windows 95 or NT, with at least a 66 MHz 486 CPU (Pentium recommended), sound card, microphone, and speakers or headphones.

* Contact: Jeffrey I. Schiller Email: jis@mit.edu WWW: http://web.mit.edu/network/pgpfone/

Speak Freely

* Platform: Windows and Unix

* Description: Free "Internet Phone" software supporting voice mail, multicasting, encryption and several coding methods. Includes 4 forms of data compression and encryption with DES, IDEA and PGP. The Windows and Unix versions are compatible. You can designate a bitmap file to be sent to users who connect so they can see who they're talking to. The Unix version does not have the graphical user interface of the Windows edition, but supports all its compression and encryption modes.

* More information:

http://www.fourmilab.ch/netfone/windows/speak_freely.html

Internet Phone from VocalTec

* Platforms: IBM Compatible

* Description: Supports real-time conversations with Internet users by compressing speech. Voice-activation feature and interactive display. Features a graphical interface and on-line help. Up-to-date listing of all on-line users running Internet Phone. Join or create topics for conversation with people from all over the globe. Supports private topics for private conversations with family or with business associates.

* Requirements: 486SX PC, 25 MHz; 8MB RAM (recommended); a Winsock 1.1 compatible TCP/IP Internet connection (minimum connection: a 14,400 bps modem SLIP/PPP connection); Windows 3.1; Windows-compatible sound card.

* Cost: $US59 + shipping. You can order on the internet:

http://www.vocaltec.com/order.html

* More Information: WWW: http://www.vocaltec.com/

* Availability:

Demo version:

ftp://ftp.vocaltec.com/pub/iphone09.exe

* Contact: VocalTec Inc.

157 Veterans Drive, Northvale, NJ 07647 Tel: 201-768-9400 Fax: 201-768-8893 E-mail: info@vocaltec.com

WebPhone

* Platform: Windows

* Description: WebPhone provides telephone quality, real-time, full duplex, encrypted, point-to-point voice communication over the Internet and other TCP/IP based networks. (More detail provided on the NetSpeak WWW pages).

* Requirements: 80486DX-33 MHz running Windows 3.1 or higher, 4 MB of RAM, MCI compliant sound card, Winsock 1.1 compliant stack, 14.4Kbps modem, VGA card capable of displaying 256 colors. Full duplex audio card required for full duplex.

* Price: $49.95 (US)

* Availability: via the WWW: http://www.netspeak.com/getphone.html

* Contact: NetSpeak Corporation 902 Clint Moore Rd., Boca Raton, Fl. 33487, USA Ph: +1-407-997-4001, Fax: +1-407-997-2401 Email: info@netspeak.com WWW: http://www.netspeak.com/

WebTalk

* Platform: Windows 3.1/95

* Description: Full-duplex or half duplex, telephone-quality voice, supports many commercial web browsers.

* Contact: Quarterdeck Corporation 13160 Mindanao Way, 3rd Floor, Marina Del Rey, CA 90292-9705, USA Ph: +1-310-309-3700, Fax: +1-310-309-4217 Email: info@quarterdeck.com WWW: http://www.quarterdeck.com/

AF version AF3R1

* Platforms: DEC workstations (Alpha and MIPS), SparcStation, SGI

* Description: The AF System is a device-independent network-transparent system including client applications and audio servers. With AF, multiple audio applications can run simultaneously, sharing access to the actual audio hardware. The AF3R1 distribution of AF includes server support for Digital RISC systems running Ultrix, Digital Alpha AXP systems running OSF/1, SGI Indigo running IRIX 4.0.5, Sun Microsystems SPARCstations running SunOS 4.1.3, and Sun Microsystems SPARCstations running Solaris 2.3. The servers support audio hardware ranging from the built-in CODEC audio on SPARCstations and Personal DECstations to 48 kHz stereo audio using the DECaudio TURBOchannel module or the SPARCstation DBRI interface.

* Availability: The source kit is distributed by anonymous ftp from

ftp://crl.dec.com/pub/DEC/AF

WWW:

http://www.research.digital.com/CRL/projects/AF/home.html

* Contact: af-request@crl.dec.com

Voice E-Mail from Bonzi Software

* Description: Voice E-Mail is an extension to regular e-mail which allows recorded voice messages to be transmitted in the same way as normal text messages. Voice E-Mail is available in several forms: Voice E-Mail 3.0 for WinCIM, Voice E-Mail 3.0 for America Online, Voice E-Mail 3.0 for Eudora, and Voice E-Mail 3.0 for Netscape. Voice E-Mail uses digital audio and image compression technology to compress messages before transferring them through CompuServe, America Online, and the Internet.

* Availability: Go to the Bonzi home page - http://www.bonzi.com/ - and follow the links to the Internet Shopping Network's "Downloadable Software Division."


* Further Information: Bonzi Software WWW: http://www.bonzi.com/ Email: info@bonzi.com Fax 805-238-5798

MicNotePad Recording Software for Macs

* Platforms: Macintosh

* Description: MicNotePad is an audio recording tool designed to improve dictation (a digital replacement for the old-style mechanical tape systems used by typists). It uses the built-in microphone or sound input port and the hard disk to record conversations or speech of arbitrary length. Speech compression techniques are used to reduce the disk space required. Once speech is recorded, single keystrokes control playback while you type in your word processor.

* Contact: Nirvana Research WWW: http://moof.com/nirvana/ Email: nirvana@got.net

MixViews

* Description: A Unix/X sound editor. Does waveform play/record, and cut/splice. Has various filters, handles native file formats, FFT, LPC and more

* Availability: by anonymous ftp including SunOS 4 and IRIX 5 binaries.

ftp://foxtrot.ccmrc.ucsb.edu/pub/MixViews

Network Audio System Release 1.1

* Platforms: Various (includes SunOS, Solaris, SGI)

* Description: A device-independent mechanism for transferring, playing and recording audio signals over a network. Has a range of features suited to networks.

* Cost: Free

* Availability: By anonymous ftp from

ftp://ftp.x.org:/contrib/audio/nas/netaudio-1.2.tar.gz

Also available in the same directory are document files and some sample sounds.

NIST SPeech HEader REsources Package (SPHERE)

* Description: Standard speech header software from the National Institute of Standards & Technology (NIST). SPHERE headers represent information about sample frequency, sample format, etc.

* Availability: By anonymous ftp from

Readme File ftp://jaguar.ncsl.nist.gov/pub/sphere.README

Source Code

ftp://jaguar.ncsl.nist.gov/pub/sphere_2.5.tar.Z
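As an illustration of what a SPHERE header carries, the following is a minimal sketch of a header reader. It assumes the commonly described layout: an ASCII header that starts with the line "NIST_1A", followed by a line giving the header size in bytes, then one "name -type value" line per field, ending with "end_head". The field names used in the example (sample_rate, channel_count, sample_coding) are for illustration only; the SPHERE README is the authoritative description, and this sketch is not the NIST code.

```python
def parse_sphere_header(data: bytes) -> dict:
    """Parse the ASCII header at the start of a SPHERE file (sketch)."""
    magic, size_line, _ = data.split(b"\n", 2)
    if magic.strip() != b"NIST_1A":
        raise ValueError("not a SPHERE file")
    header_size = int(size_line.strip())

    text = data[:header_size].decode("ascii", errors="replace")
    fields = {}
    for line in text.split("\n")[2:]:
        line = line.strip()
        if line == "end_head":
            break
        if not line or line.startswith(";"):
            continue  # skip blank lines and comments
        name, type_tag, value = line.split(None, 2)
        if type_tag == "-i":        # integer field
            fields[name] = int(value)
        elif type_tag == "-r":      # real-valued field
            fields[name] = float(value)
        else:                       # "-sN": string of N characters
            fields[name] = value
    fields["_header_size"] = header_size  # sample data starts at this offset
    return fields
```

The returned "_header_size" entry (a name invented for this sketch) gives the byte offset at which the sample data begins.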

NIST Speech Recognition Scoring Package (SCORE)

* Description: Software for scoring results of speech recognition systems from the National Institute of Standards & Technology (NIST).

* Availability: By anonymous ftp from

README File ftp://jaguar.ncsl.nist.gov/pub/score.README

Source Code

ftp://jaguar.ncsl.nist.gov/pub/score_3.6.2.tar.Z
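Scoring a recognizer usually means aligning its output against a reference transcription and counting substitutions, deletions and insertions. The sketch below computes a word error rate with a standard edit-distance alignment. It is offered only as an illustration of the general idea and makes no claim about how the NIST package itself aligns or reports results.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcription is empty")

    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, 1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from output
            insertion = cur[j - 1] + 1  # extra word in the output
            cur[j] = min(substitution, deletion, insertion)
        prev = cur
    return prev[-1] / len(ref)
```

For example, comparing the reference "the cat sat on the mat" with the output "the cat sat on mat" gives one deletion over six reference words.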

Sound Processing Kit

* Platforms: UNIX

* Description: Sound Processing Kit (SPKit) is an object-oriented class library for audio signal processing. SPKit includes classes for various signal processing tasks and a way of implementing sound processing algorithms in a simple object-oriented manner. Sound Processing Kit is implemented in C++ and is designed to be portable. The current version requires a bare-bones C++ 2.0 compatible compiler (templates and exceptions are not needed). ANSI C standard libraries are required. SPKit includes classes for:

+ Sound input and output

+ Basic signal processing

+ Dynamics processing (compressor, gating etc)

+ Filtering

+ Delay and reverberation

+ Distortion

+ Signal routing

* Availability:

Full documentation on the WWW:

http://www.music.helsinki.fi/research/spkit/documentation/SPKit.html

Software distribution:

http://www.music.helsinki.fi/research/spkit/distribution/spkit.tar.Z

* Contact: Kai Lassfolk University of Helsinki Music Research Laboratory Email: spkit@elisir.helsinki.fi

TCPplay

* Description: TCPPlay lets you use your Mac as an audio server for your Unix box. Provided with source code. Written by Bill Stafford, Rich Tsoi and Malcolm Slaney.

* Availability: Anonymous ftp from

ftp://ftp.apple.com/pub/malcolm/TcpPlay.sit.hqx

ftp://worldserver.com/pub/malcolm/TcpPlay.sit.hqx

Auditory Modeller 1

* Description: John Holdsworth's implementation of a gammatone filter bank and Roy Patterson's spiral model, in C (with X-window display).

* Availability: By anonymous ftp from

ftp://ftp.mrc-apu.cam.ac.uk/pub/aim

Auditory Modeller 2


* Description: Lowel O'Mard's implementation of peripheral filtering, Ray Meddis's hair cell model and other routines in C (as a library of routines).

* Availability: By anonymous ftp from

ftp://suna.lut.ac.uk/public/hulpo/lutear

Auditory Toolbox for Matlab

* Description: This toolbox provides extensions to Matlab which are useful to people interested in auditory/cochlear modeling. [Matlab is described in the previous section.] This toolbox has been tested on both Macintosh and Unix computers. It includes the following major models:

+ Lyon's Passive Long Wave Cochlear Model (our conventional model)

+ Patterson-Holdsworth ERB Filter bank with Meddis Hair cell

+ Seneff's Auditory Model (Stages I and II)

+ MFCC (Mel-scale frequency cepstral coefficients from the ASR world)

+ Spectrogram

+ Correlogram generation and pitch modeling

+ Simple vowel synthesis

* Availability: From Malcolm Slaney's home page and by anonymous FTP:

ftp://ftp.apple.com/pub/malcolm

The following files are available:

+ AuditoryToolbox.mif.Z

+ AuditoryToolbox.psc.Z

+ AuditoryToolbox.sea.hqx

+ AuditoryToolbox.tar

+ AuditoryToolbox.tar.Z

The ".mif.Z" file is a Unix compressed version of the FrameMaker documentation. The ".psc.Z" file is a Unix compressed version of the Postscript documentation. The ".tar" and ".tar.Z" files are Unix TAR archives containing all of the m-functions and C-MEX source code. Finally, the ".sea.hqx" file is a Macintosh self-extracting archive that has been encoded using BinHex. There is a precompiled version of the three MEX functions for the Macintosh.

* Misc: Our lawyers ask us to remind you that there is no warranty. We've done some testing but we undoubtedly missed things.

* Contact: Malcolm Slaney, Interval Research. Email: malcolm@interval.com WWW: http://www.interval.com/~malcolm/

Human Audio Perception Document

* Description: Document prepared by Argiris Kranidiotis on the human audio perception system. It lists a number of references, gives plenty of numbers and some equations.

* Availability: by anonymous ftp from the comp.speech archive site

ftp://svr-ftp.eng.cam.ac.uk/comp.speech/info/HumanAudioPerception

* Contact: Argiris A. Kranidiotis University Of Athens, Informatics Department email: akra@zeus.di.uoa.ariadne-t.gr

BEEP dictionary


* Description: Phonemic transcriptions of over 250,000 English words. (British English pronunciations)

* Availability: By anonymous ftp:

BEEP dictionary README file

svr-ftp.eng.cam.ac.uk/comp.speech/dictionaries/beep-0.7.README

BEEP Dictionary (1.1M)

svr-ftp.eng.cam.ac.uk/comp.speech/dictionaries/beep.tar.gz

CMU dictionary

* Description: Phonemic transcriptions of 100,000 words with American English pronunciation.

* Availability - WWW: http://www.speech.cs.cmu.edu/cgi-bin/cmudict

* Availability - ftp: By anonymous ftp from the directory

ftp://ftp.cs.cmu.edu/project/fgdata/dict/

with the files README, cmudict.0.2.Z, cmulex.0.1.Z, phoneset.0.1

CUVOALD dictionary (Oxford Dictionary)

* Description: Computer Usable Version of the Oxford Advanced Learner's Dictionary containing 70,000+ entries. Has British English pronunciations and parts of speech.

* Availability: Anonymous ftp

ftp://ota.ox.ac.uk/pub/ota/public/dicts/710/

Documentation:

ftp://ota.ox.ac.uk/pub/ota/public/dicts/710/text710.doc

Comprehensive Word List

* Description: A comprehensive word list which should contain most common American words, abbreviations, hyphenations, and even incorrect spellings. The word lists were compiled from a number of sources: commercial news services, UseNet news postings, existing dictionaries, name lists, company lists, UNIX man pages, project Gutenberg's E-texts, project Wordnet, received mailings, etc. The current size is 460,000 words.

* Availability: anonymous ftp ftp://wocket.vantage.gte.com/pub/standard_dictionary

Note 1: There seems to be some sort of network problem reaching the server.

Note 2: There is a README file which explains the file formats.

EAT: Edinburgh Associative Thesaurus

* Description: A set of word association norms showing the counts of word association as collected from subjects.

* Availability: Source and WWW interactive versions

Interactive version Provided by Computing and Information Systems Department (CISD) of Rutherford Appleton Laboratory, UK http://www.cis.rl.ac.uk/proj/psych/eat.html

Set of word association norms ftp directory. 6 MB http://www.cis.rl.ac.uk/proj/psych/eat/eat/

Homophone List

* A list of homophones in General American English is available by anonymous FTP from the comp.speech archive site:

ftp://svr-ftp.eng.cam.ac.uk/comp.speech/dictionaries/homophones-1.01.txt

Moby Lexical Resources

* Description: A set of lexical resources compiled by Grady Ward. 3449 Martha Ct., Arcata, CA 95521-4884, USA Email: grady@netcom.com OR grady@northcoast.com

* Availability: Mirrored by Malcolm Crawford (m.crawford@dcs.shef.ac.uk) at the Institute for Language Speech and Hearing, the University of Sheffield. WWW: http://www.dcs.shef.ac.uk/research/ilash/Moby/ FTP: ftp://ftp.dcs.shef.ac.uk/share/ilash/Moby/

* Contents:

Moby Hyphenator: mhyph.tar.Z 185,000 entries fully hyphenated. 980kB.

Moby Language: mlang.tar.Z Word lists in five major languages. 2.3MB.

Moby Part-of-Speech: mpos.tar.Z 230,000 entries with part(s) of speech listed in priority order. 1.2MB.

Moby Pronunciator: mpron.tar.Z 175,000 entries fully International Phonetic Alphabet coded. 3.1MB.

Moby Shakespeare: mshak.tar.Z The complete unabridged works of Shakespeare. 2.3MB.

Moby Thesaurus: mthes.tar.Z 30,000 root words, 2.5 million synonyms and related words. 12MB.

Moby Words: mwords.tar.Z 610,000+ words and phrases. 4.0MB.

MRC Psycholinguistic Database

* Description: A machine usable dictionary containing over 150,000 words with up to 26 linguistic and psycholinguistic attributes for each (e.g. pronunciation, part of speech, word frequency). The MRC Psycholinguistic Database was the basis for the "Oxford Psycholinguistic Database" available for Apple Macs from Oxford University Press.

* Availability: Several versions with different formats:

Interactive Version of MRC Psycholinguistic Database Produces lists of words meeting user-definable selection criteria. Provided by the Dept. of Psychology, University of Western Australia. http://www.psy.uwa.edu.au/uwa_mrc.htm


ftp'able MRC Psycholinguistic Database Approximately 12M of data.

ftp://ota.ox.ac.uk/pub/ota/public/dicts/1054/

README:

ftp://ota.ox.ac.uk/pub/ota/public/dicts/1054/readme.

Information: ftp://ota.ox.ac.uk/pub/ota/public/dicts/info

WordNet

* Description: WordNet is an on-line lexical reference system in which English nouns, verbs, adjectives and adverbs are organized into synonym sets, each representing one underlying lexical concept. Different relations link the synonym sets. WordNet was developed in the Cognitive Science Laboratory at Princeton University under the direction of Professor George Miller.

* Availability:

WWW Interface

http://www.cogsci.princeton.edu/~wn/w3wn.html

Source Distributions Unix (9.1MB), PC (5.8MB), Macintosh (7.5MB), Prolog (database only, 4.2MB). ftp://clarity.princeton.edu/pub/wordnet/

Extended interfaces developed by WordNet users (for X, Lisp etc) are listed in the WordNet home page.

* Further information:

Email: wordnet@princeton.edu

WWW: WordNet home page: http://www.cogsci.princeton.edu/~wn/

README: ftp://clarity.princeton.edu/pub/wordnet/README

Publications: ftp://clarity.princeton.edu/pub/wordnet/5papers.ps

Dictionaries on the WWW

For a while, there was a range of dictionaries and other lexical resources on the WWW and elsewhere on the Internet. However, for copyright reasons, fewer sites are now publishing dictionary information. When last checked, the following sites provided dictionaries or links to dictionaries on the net:

CMU Dictionary http://www.speech.cs.cmu.edu/cgi-bin/cmudict

Institute of Phonetic Sciences, Amsterdam Electronic dictionaries, including French, Norwegian, Swahili and English.

http://fonsg3.let.uva.nl/Other_pages.html

1913 Webster's Revised Unabridged Dictionary Available as a searchable HTML form at the University of Chicago ARTFL project site, and as a tagged working file and downloadable version (45MB) of the HTML at Project Gutenberg.

Martin Ramsch's Englisch-Worterbucher aller Art Lists of on-line dictionaries, translation dictionaries, technical dictionaries, etc. http://www.uni-passau.de/forwiss/mitarbeiter/freie/ramsch/englisch.html

Galaxy's list of dictionaries etc. A comprehensive list of dictionaries, acronym lists, translation resources, and a Thesaurus.

http://galaxy.einet.net/galaxy/Reference-and-Interdisciplinary-Information/Dictionaries-etc.html

Webster's dictionary online

http://c.gp.cs.cmu.edu:5103/prog/webster

International Phonetic Alphabet

* Description: The International Phonetic Association (http://www.arts.gla.ac.uk/IPA/ipa.html) defines the International Phonetic Alphabet. It is a standard set of symbols for transcribing the sounds of spoken languages. The full chart of IPA symbols is published on the International Phonetic Association WWW site. Also provided are charts for consonants, vowels, tones and accents, suprasegmentals, diacritics and other symbols. A cassette of sounds is available: see http://www.phon.ucl.ac.uk/home/wells/cassette.htm

WWW: Phonetic Fonts and Examples Online

George L. Dillon's list of phonetic resources [http://weber.u.washington.edu/~dillon/PhonResources.html]

Vowel sounds of American English Examples of standard American vowels along with the IPA phonetic symbols and links to recordings. http://weber.u.washington.edu/~dillon/vowels.html

Consonant sounds of English Examples of consonants along with the IPA phonetic symbols and links to recordings. http://weber.u.washington.edu/~dillon/consonants.html

Vowel Quadrilaterals for American and British English Charts and audio. http://weber.u.washington.edu/~dillon/newstart.html

IPA-ASCII A scheme for representing IPA transcriptions in ASCII for use in Usenet articles and email. http://weber.u.washington.edu/~dillon/ipaascii.html

Some things about studying Speech Information on speech physiology, acoustic phonetics, speech perception, speech recognition and voice recognition. http://www.ccp.uchicago.edu/grad/Francis_Alex/speech.html

Summer Institute of Linguistics IPA Fonts

* Platform: Apple Macintosh and Microsoft Windows

* Description: International Phonetic Alphabet (IPA) fonts are available as freeware from the Summer Institute of Linguistics (SIL). The SIL Encore IPA Fonts are a set of scalable IPA fonts containing the full International Phonetic Alphabet with 1990 Kiel revisions. Three typefaces are included: SIL Doulos (similar to Times), SIL Sophia (similar to Helvetica), and SIL Manuscript (monowidth). Each font contains all the standard IPA discrete characters and non-spacing diacritics as well as some suprasegmental and punctuation marks. Each font comes in both PostScript Type 1 and TrueType formats.

* Availability: Via the WWW and Gopher:

+ WWW: http://www.sil.org/


+ Gopher: gopher://gopher.sil.org/11/gopher_root/computing/software/fonts/

+ Ftp for Windows: ftp://ftp.sil.org/fonts/win/silip12a.exe

+ Ftp for Mac: ftp://ftp.sil.org/fonts/mac/silipa12.sea_hqx

Also available through the SIL email server. Send either of the following commands to MAILSERV@sil.org.

Windows:

SEND/MODE=BLOCK/ENCODING=UUENCODE [FTP.FONTS.WIN]SILIP12A.EXE

Mac:

SEND [FTP.FONTS.MAC]SILIPA12.SEA_HQX

Finally, they are available on diskette from the address below for $US5 to cover the cost of shipping.

* Contact: International Academic Bookstore Summer Institute of Linguistics 7500 W. Camp Wisdom Road, Dallas, TX 75236 U.S.A. Ph: 214-709-2404, Fax: 214-709-2433 e-mail: academic.books@sil.org WWW: http://www.sil.org/

Phonetic Fonts for TeX and LaTeX

Linguistics/Tex mailing list ling-tex@ifi.uio.no Subscription method unknown.

TIPA

Created by Rei Fukui: fkr@tooyoo1.l.u-tokyo.ac.jp.

Source: ftp://tooyoo.L.u-tokyo.ac.jp/pub/TeX/tipa/

Postscript manual: ftp://tooyoo.L.u-tokyo.ac.jp/pub/TeX/tipa/tipaman.ps

Compressed postscript manual: ftp://tooyoo.L.u-tokyo.ac.jp/pub/TeX/tipa/tipaman.ps

WSUIPA: Washington State University International Phonetic Alphabet fonts A basic WSUIPA font contains 128 phonetic characters and/or diacritics in five different point sizes (8, 9, 10, 11 and 12) and in three typefaces (roman, slanted and bold extended). Each size and typeface includes a TFM (TeX Font Metric) file and its related GF, PK or PXL file. A macro package and manual are provided. Apparently LaTeX 2.09 compatible - not LaTeX 2e compliant. Available from ftp://ftp.wustl.edu/packages/TeX/fonts/wsuipa/ OR from CTAN-ftp-archives: e.g. ftp://ftp.digital.com/pub/text/TeX/fonts/wsuipa/

Yamada Language Center

* Platform: Apple Macintosh and Microsoft Windows

* Description: The Yamada Language Center maintains an archive of fonts to assist users who wish to display or type non-English fonts on their computers. Their WWW and ftp sites include five International Phonetic Alphabet fonts (or near IPA). They also have fonts for over 40 languages (American Sign Language, Arabic, Armenian, Bengali, Burmese, Celtic, Cherokee, ...).

* Availability:

WWW Font List http://babel.uoregon.edu/yamada/fonts.html

Windows Fonts http://babel.uoregon.edu/yamada/winfonts.html

IPA Fonts http://babel.uoregon.edu/yamada/fonts/phonetic.html

ftp site

ftp://yftp@www-vms.uoregon.edu/fonts/

* Contact: Yamada Language Center, University of Oregon

The vOICe

* Description: Peter Meijer's Java applet/application for sound analysis and synthesis.

+ Platform: All (where Java VM available)

+ Interactive spectrographic synthesis: draw your own sound

+ Image sonification

+ Mathematical function sonification

+ Spectrographic sound analysis (Fourier, spectrogram)

+ Vision substitution research

* Contact: Peter Meijer

The Learning Company's Language Training

* Platform: Windows and Macintosh

* Description: Foreign-language training software for Spanish, French, German, Italian, Japanese, and English. In the Windows version for English, speech-recognition technology is used to help users improve accents.

* Contact: The Learning Company Ph: (800) 852-2255 Email: webmaster@learningco.com WWW: http://www.learningco.Inter.net/foreign.html

Wildfire - an Electronic Assistant

* Platform: ?

* Description: Wildfire is a phone-based electronic assistant. Functions include:

+ Screens, routes, and announces incoming calls.

+ Contact list with voicedialing.

+ Schedules and reminders for follow-up calls and action items.

+ Messaging and advanced voicemail features.

* Contact: Wildfire Communications, Inc. 20 Maguire Road, Lexington, MA 02173 USA Ph: +1-617-674-1500, Fax: 617-674-1501 Demo line: 1-800-WILDFIRE Email: info@wildfire.com WWW: http://www.wildfire.com/

Copyright (c) 1993-6 by Andrew Hunt, all rights reserved. This FAQ may be posted to any USENET newsgroup, on-line service, or BBS as long as it is posted in its entirety and includes this copyright statement. This FAQ may not be distributed for financial gain. This FAQ may not be included in any collections or compilations without express permission from the author.


---

Andrew Hunt
Speech Applications Group
Sun Microsystems Laboratories
2 Elizabeth Drive, MS UCHL03-207
Chelmsford, MA 01824, USA
Ph: (508) 442-2681 Fax: (508) 250-5067 Email: andrew.hunt@east.sun.com


http://mi.eng.cam.ac.uk/comp.speech/text/FAQ-part2.txt

Newsgroups: comp.speech,comp.answers,news.answers
Subject: comp.speech Frequently Asked Questions - part 2/3
From: andrew.hunt@east.sun.com (Andrew Hunt)
Reply-To: andrew.hunt@east.sun.com (Andrew Hunt)
Followup-To: comp.speech
Organization: Speech Applications Group, Sun Microsystems Laboratories
Summary: Information on Speech Technology
Approved: news-answers-request@MIT.Edu

Archive-name: comp-speech-faq/part2
Last-modified: 1997/09/06
URL: http://www.speech.su.oz.au/comp.speech/

COMP.SPEECH FAQ POSTING - PART 2/3

[Note: this document has been automatically extracted from a WWW site: http://www.speech.su.oz.au/comp.speech/ This may introduce some formatting errors.]

Signal Processing for Speech

comp.speech FAQ Section 2

* SpeechLinks: Signal Processing for Speech

* Q2.1: What sampling do I need for speech?

* Q2.2: Finding the pitch of a speech signal

* Q2.3: How do I find the start and end points of a speech signal?

* Q2.4: Where can I find FFT software?

* Q2.5: Signal processing in speech technology

* Q2.6: Speech sampling and signal processing hardware

* Q2.7: How do I convert to/from mu-law format?

* Q2.8: Signal Processing Software

Q2.1: What sampling do I need for speech?

For recorded speech to be understood by humans you need a sampling rate of 8kHz or more and at least 8 bits per sample. This produces poor quality speech, but it can be understood.

Improvements can be achieved by increasing the number of bits per sample to 12 or 16, or by using a non-linear encoding technique such as mu-law or A-law (see Q2.7). This improves the "signal-to-noise" ratio.
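As a rough guide to the effect of word length, an ideal linear quantizer gains about 6 dB of signal-to-noise ratio for each added bit. The sketch below uses the standard textbook approximation for a full-scale sine wave; this formula is an addition for illustration and is not a figure from this FAQ. Real speech rarely fills the full range of the converter, so practical figures are lower.

```python
def linear_pcm_snr_db(bits: int) -> float:
    """Approximate peak SNR (dB) of ideal linear PCM for a full-scale sine."""
    return 6.02 * bits + 1.76


for bits in (8, 12, 16):
    print(f"{bits:2d} bits: about {linear_pcm_snr_db(bits):.1f} dB")
```

Non-linear encodings such as mu-law do not raise this peak figure; they spread the available resolution so that quiet passages are quantized more finely than loud ones.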

Increasing the sampling rate above 8kHz, say to 10kHz, 16kHz or 20kHz, improves the frequency response: the higher the sampling frequency, the better the high frequency content will be. A 16kHz sampling rate is a reasonable target for high quality speech recording and playback.
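The trade-off between sampling rate, bandwidth and storage can be made concrete with a little arithmetic. In the sketch below (function names are hypothetical), the highest frequency that can be represented is half the sampling rate, and the storage cost of uncompressed audio grows in proportion to both the rate and the number of bits per sample.

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency representable at the given sampling rate."""
    return sample_rate_hz / 2


def bytes_per_second(sample_rate_hz: int, bits_per_sample: int,
                     channels: int = 1) -> int:
    """Storage needed for one second of uncompressed audio."""
    return sample_rate_hz * bits_per_sample * channels // 8


for rate, bits in [(8000, 8), (16000, 16)]:
    print(f"{rate} Hz, {bits} bits: bandwidth up to {nyquist_hz(rate):.0f} Hz, "
          f"{bytes_per_second(rate, bits)} bytes/s")
```

Moving from 8kHz, 8 bit sampling to 16kHz, 16 bit sampling doubles the usable bandwidth (4kHz to 8kHz) and quadruples the storage (8,000 to 32,000 bytes per second).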

When doing speech recognition you need to remember that your computer is not as good as your ear, so it will have trouble with poor quality sounds. The choice of an appropriate sampling setup depends very much on the speech recognition task and the amount of computer power available.

Q2.2: Finding the pitch of a speech signal

http://mi.eng.cam.ac.uk/comp.speech/text/FAQ-part2.txt (1 of 23) [10/31/2003 8:41:18 AM]

This topic comes up regularly in the comp.dsp newsgroup. Question 2.5 of the FAQ posting for comp.dsp gives a comprehensive list of references on the definition, perception and processing of pitch. The comp.dsp FAQ posting is posted regularly to the comp.dsp newsgroup, and is also available by ftp and on the WWW:

* http://www.bdti.com/faq/dsp_faq.htm

* ftp://rtfm.mit.edu/pub/usenet/comp.dsp/

The following provide pitch tracking software:

* Most of the speech processing environments listed in Q1.9 including CSRE, ESPS, Kay Elemetrics Computer Speech Lab, OGI Speech Tools, Speech Filing System, Signalyze, Soundscope.

Q2.3: Finding start and end points of a speech signal

End-point detection algorithms identify sections in an incoming audio signal that contain speech. Accurate end-pointing is a non-trivial task. However, reasonable behaviour can be obtained for inputs which contain only speech surrounded by silence (no other noises). Typical algorithms look at the energy or amplitude of the incoming signal and at the rate of "zero-crossings". A zero-crossing is where the audio signal changes from positive to negative or vice versa. When the energy and zero-crossings are at certain levels, it is reasonable to guess that there is speech. More detailed descriptions are provided in the papers cited below and in the documentation for the following software.

End-point detection software is available from:

* ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/tools/ep.1.0.tar.gz

* ftp://ftp.isip.msstate.edu/pub/software/signal_detector/sigd_v2.2.tar.gz

Plenty of research papers have been presented on end-pointing. Try the following:

* Rabiner LR, Sambur MR, "An Algorithm for Determining the Endpoints of Isolated Utterances", Bell System Technical Journal, Vol 54, No. 2, pp 297-315, 1975.

* Drago, P.G. et al. "Digital Dynamic Speech Detectors." IEEE Trans on Communications, Vol 26, No 1, Jan 78, pp. 140-145.

* Newman, W.C. "Detecting Speech with an Adaptive Neural Network." Electronic Design. 22 March 1990.

* Taboada, J. et al. "Explicit Estimation of Speech Boundaries" IEE Proc. Sci. Meas. Technol., Vol 141, No.3, May 1994, pp 153-159.
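The energy and zero-crossing idea described in this answer can be sketched in a few lines of C. This is a minimal illustration for speech surrounded by silence, not the algorithm of any paper or package listed here: the frame length, the two thresholds and all function names are assumptions that the caller must choose, and a frame is simply marked as speech when either measure reaches its threshold.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of squared sample values in one frame. */
static long long frame_energy(const short *x, size_t n)
{
    long long e = 0;
    size_t i;

    for (i = 0; i < n; i++)
        e += (long long)x[i] * x[i];
    return e;
}

/* Number of sign changes (positive to negative or vice versa) in one frame. */
static int frame_zero_crossings(const short *x, size_t n)
{
    int count = 0;
    size_t i;

    for (i = 1; i < n; i++)
        if ((x[i - 1] >= 0) != (x[i] >= 0))
            count++;
    return count;
}

/*
 * Find the first and last frames that look like speech.
 * Returns 1 and fills *first and *last (frame indices) if any frame
 * qualifies, otherwise returns 0. A trailing partial frame is ignored.
 * The thresholds are hypothetical tuning parameters.
 */
int find_endpoints(const short *x, size_t n, size_t frame_len,
                   long long energy_thresh, int zc_thresh,
                   size_t *first, size_t *last)
{
    size_t nframes, f;
    int found = 0;

    assert(frame_len > 0);
    nframes = n / frame_len;
    for (f = 0; f < nframes; f++) {
        const short *frame = x + f * frame_len;

        if (frame_energy(frame, frame_len) >= energy_thresh ||
            frame_zero_crossings(frame, frame_len) >= zc_thresh) {
            if (!found) {
                *first = f;
                found = 1;
            }
            *last = f;
        }
    }
    return found;
}
```

In practice the thresholds are usually estimated from a stretch of the recording known to be silence, and the detected end points are widened by a few frames so that weak sounds at the edges are not cut off.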

Q2.4: FFT Software

* Comprehensive list of FFT software Links to over 65 different pieces of one-dimensional FFT code. http://tjev.tel.etf.hr/josip/DSP/fft.html

* FFT Software including optimised FFT routines and mixed-radix algorithms ftp://usc.edu/pub/C-numanal/fft-stuff.tar.gz OR, ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/analysis/fft-stuff.tar.gz

* mixfft03.zip: C-source for a very fast arbitrary N FFT routine. The C-source is ShareWare: read the text file included in the package before using the FFT routine commercially. Jens J. Nielsen: jnielsen@internet.dk Available from ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/analysis/mixfft03.zip OR ftp://ftp.coast.net/simtel/msdos/c/mixfft03.zip

* FFTW

FFTW is a C subroutine library for computing the FFT in one or more dimensions. It is not limited to sizes that are powers of two, and includes real-complex and parallel transforms. Also on the FFTW web site are benchmarks comparing the performance and accuracy of many public-domain FFT implementations on a variety of platforms, as well as links to other sources of FFT code and information. Available from http://theory.lcs.mit.edu/~fftw Developed by Matteo Frigo and Steven G. Johnson: fftw@theory.lcs.mit.edu

Q2.5: Signal processing in speech technology

This question is far too big to be answered in a FAQ posting. Here are some WWW resources and books which cover the area well.

Tony Robinson's Course Notes

Dr. Tony Robinson of the Engineering Dept of Cambridge University has put his Speech Analysis course notes on the web. The base page is http://svr-www.eng.cam.ac.uk/~ajr/SA95/. There is information on the following:

* Sampling theory

* Filter bank analysis

* Short-term Fourier analysis

* Linear prediction analysis

* Formant analysis and voicing analysis

* Speech coding

* and more

Joseph Picone's Course Notes

Joseph Picone of the Institute for Signal and Information Processing (ISIP) at Mississippi State University has put two sets of course notes on the web:

EE 4773/6773: Digital Signal Processing The course covers sampling, frequency analysis, z-transforms, filter design and more. The WWW site provides the syllabus, assignments, some source code data, exams, homework and solutions, lecture notes and more.

EE 8993: Fundamentals of Speech Recognition The course covers background probability and phonetics/acoustics, speech signal analysis, dynamic programming, dynamic time warping, hidden Markov modelling, language modelling, neural networks, etc. The WWW site provides the syllabus and lecture notes.

Signal Processing Home page

The Signal Processing Home page has information on a range of DSP issues. It includes references to a range of software and much more. http://tjev.tel.etf.hr/josip/DSP/sigproc.html

Books and other References

There are many good books which discuss signal processing for speech:

* Digital processing of speech signals; L. R. Rabiner, R. W. Schafer. Englewood Cliffs; London: Prentice-Hall, 1978

* Voice and Speech Processing; T. W. Parsons. New York; McGraw Hill 1986

* Computer Speech Processing; ed Frank Fallside, William A. Woods Englewood Cliffs: Prentice-Hall, c1985

* Digital speech processing : speech coding, synthesis, and recognition edited by A. Nejat Ince; Kluwer Academic Publishers, Boston, c1992

* Speech science and technology; edited by Shuzo Saito pub. Ohmsha, Tokyo, c1992

* Speech analysis; edited by Ronald W. Schafer, John D. Markel, New York, IEEE Press, c1979

* Applied Speech Technology Edited by: Ann Syrdal (AT&T Bell Labs, Holmdel, New Jersey), Raymond Bennett (Ameritech, Hoffman Estates, Illinois) and Steven Greenspan (AT&T Bell Labs, Murray Hill, New Jersey). Publisher: CRC Press.

* Speech Communication: Human and Machine Douglas O'Shaughnessy, Addison Wesley series in Electrical Engineering: Digital Signal Processing, 1987.

* Discrete-time processing of speech signals; John R Deller, John G Proakis, John H L Hansen; Macmillan 1993.

* Signal processing of speech; F J Owens; Macmillan 1993.

Q2.6: Speech sampling and signal processing hardware

In addition to the following information, have a look at the Audio File format document prepared by Guido van Rossum (see details in Section 1.8).

Information is included on hardware for the following systems:

* Macintosh Audio Hardware

* PC Audio Hardware

* Unix Audio Hardware

Can anyone provide information for SGI, NeXT, other UNIX hardware and any other PC soundcards?

Macintosh Audio Hardware - an overview

* Description: ALL Macintosh computers come with the ability to play back sounds at any sample rate (sample rate conversion is done in software). Older machines have 8 bit stereo output (hardware runs at 22254 samples/second). The newer machines have 16 bit stereo hardware running at 44100 samples/second.

Most of the recent Macintosh computers come with sound input hardware. There are probably exceptions to this, but the older and some of the current low-end machines have 8 bit (linear) mono hardware running at 22254.54 samples/second. All of the PowerPC, AV, and the 500 series notebook computers come with 16 bit 44kHz stereo sampling hardware. They can also record at 22050 samples/second. The sound manager implements an AGC (Automatic Gain Control) function for the 8 bit hardware. The drivers have a switch to turn off the AGC.

There are a number of DSP vendors that support high quality audio. Generally this means quieter analog sections, and more IO formats (AES/EBU, for example). Try DigiDesign and Spectral Innovations.

The software drivers for sound are described in "Inside Macintosh: Sound". If you want to see some sample code check out the sources for the Matlab "Sound and Image Toolbox". They can be found at ftp://ftp.apple.com/pub/malcolm/SoundAndImageToolbox.cpt.hqx Routines that play and record sounds using the toolbox are included (and interfaced to Matlab).

PC Audio Hardware

Note: new soundcards are becoming available all the time - the information below is definitely not up to date. Check out the following newsgroups for up-to-date information.

* comp.sys.ibm.pc.soundcard

* comp.sys.ibm.pc.soundcard.GUS

* comp.sys.ibm.pc.soundcard.advocacy

* comp.sys.ibm.pc.soundcard.games

* comp.sys.ibm.pc.soundcard.misc

* comp.sys.ibm.pc.soundcard.music

* comp.sys.ibm.pc.soundcard.tech

The Soundcard WWW Site is an excellent source of information:

* http://www.wi.leidenuniv.nl/audio/

A good source of programs and information for soundcards is SimTel:

* http://www.acs.oakland.edu/oak/SimTel/win3/sound.html

Additional information on PC soundcards is provided by the FAQ postings for the comp.sys.ibm.pc.soundcard.misc newsgroup. These are available by anonymous ftp from:

ftp://rtfm.mit.edu/pub/usenet/comp.sys.ibm.pc.soundcard.misc/

* Aria Soundcard FAQ

* Aria Soundcard Support List

* MIDI files software archives on the Internet

* Turtle Beach sound cards FAQ

Unix Audio Hardware

Could someone please provide information on the audio capabilities of other Unix platforms?

Sun standard audio port: SPARC I & II

* Input and Output: 1 channel, 8 bit mu-law encoded, 8kHz sample rate. This provides telephone quality sampling.

Sun DBRI audio port (SPARC 10 & 20)

* Input and Output: Stereo (2 channels). 16-bit linear sampling. Multiple sample rates (48000, 44100, 37800, 32000, 22050, 18900, 16000, 11025, 9600, 8000 Hz)

Silicon Graphics Audio

The Silicon Graphics audio Frequently Asked Questions (FAQ) is the best place to get information on SGI audio capabilities and programming. It provides information on connecting the audio output, using the DSP capabilities, controlling the audio output, programming, useful software and more. It is available from:

* WWW: http://www-viz.tamu.edu/~sgi-faq/faq/html/audio/

* News: comp.sys.sgi.misc

* Ftp: ftp://viz.tamu.edu/pub/sgi/faq/

Ariel Signal Processors

* Platform: Various

* Description: A range of signal I/O, A/D, D/A and DSP products are available. There are too many to list.

* Contact: Ariel Corp. 433 River Road, Highland Park, NJ 08904. Ph: 908-249-2900 Fax: 908-249-2123 DSP BBS: 908-249-2124

Q2.7: How do I convert to/from mu-law format?

Mu-law coding is a form of compression for audio signals including speech. It is widely used in the telecommunications field because it improves the signal-to-noise ratio without increasing the amount of data. Typically, mu-law compressed speech is carried in 8-bit samples. It is a companding technique. That means that it carries more information about the smaller signals than about larger signals.

On SUN Sparc systems have a look in the directory /usr/demo/SOUND. Included are table lookup macros for ulaw conversions. [Note however that not all systems will have /usr/demo/SOUND installed as it is optional - see your system admin if it is missing.]

OR, here is some sample conversion code in C.

/**
 ** Signal conversion routines for use with Sun4/60 audio chip
 **/

#include <stdio.h>

unsigned char linear2ulaw(int sample);
int ulaw2linear(unsigned char ulawbyte);

/*
** This routine converts from linear to ulaw
**
** Craig Reese: IDA/Supercomputing Research Center
** Joe Campbell: Department of Defense
** 29 September 1989
**
** References:
** 1) CCITT Recommendation G.711  (very difficult to follow)
** 2) "A New Digital Technique for Implementation of Any
**     Continuous PCM Companding Law," Villeret, Michel,
**     et al. 1973 IEEE Int. Conf. on Communications, Vol 1,
**     1973, pg. 11.12-11.17
** 3) MIL-STD-188-113,"Interoperability and Performance Standards
**     for Analog-to-Digital Conversion Techniques,"
**     17 February 1987
**
** Input: Signed 16 bit linear sample
** Output: 8 bit ulaw sample
*/

#define ZEROTRAP    /* turn on the trap as per the MIL-STD */
#define BIAS 0x84   /* define the add-in bias for 16 bit samples */
#define CLIP 32635

unsigned char
linear2ulaw(int sample)
{
  /* Segment (exponent) number for each value of bits 7..14 of the
     biased magnitude. */
  static int exp_lut[256] = {0,0,1,1,2,2,2,2,3,3,3,3,3,3,3,3,
                             4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,
                             5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,
                             5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,
                             6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
                             6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
                             6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
                             6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
                             7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
                             7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
                             7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
                             7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
                             7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
                             7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
                             7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
                             7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7};
  int sign, exponent, mantissa;
  unsigned char ulawbyte;

  /* Get the sample into sign-magnitude. */
  sign = (sample >> 8) & 0x80;          /* set aside the sign */
  if (sign != 0) sample = -sample;      /* get magnitude */
  if (sample > CLIP) sample = CLIP;     /* clip the magnitude */

  /* Convert from 16 bit linear to ulaw. */
  sample = sample + BIAS;
  exponent = exp_lut[(sample >> 7) & 0xFF];
  mantissa = (sample >> (exponent + 3)) & 0x0F;
  ulawbyte = ~(sign | (exponent << 4) | mantissa);
#ifdef ZEROTRAP
  if (ulawbyte == 0) ulawbyte = 0x02;   /* optional CCITT trap */
#endif

  return(ulawbyte);
}

/*
** This routine converts from ulaw to 16 bit linear.
**
** Craig Reese: IDA/Supercomputing Research Center
** 29 September 1989
**
** References:
** 1) CCITT Recommendation G.711  (very difficult to follow)
** 2) MIL-STD-188-113,"Interoperability and Performance Standards
**     for Analog-to-Digital Conversion Techniques,"
**     17 February 1987
**
** Input: 8 bit ulaw sample
** Output: signed 16 bit linear sample
*/

int
ulaw2linear(unsigned char ulawbyte)
{
  /* Linear value at the start of each segment. */
  static int exp_lut[8] = {0,132,396,924,1980,4092,8316,16764};
  int sign, exponent, mantissa, sample;

  ulawbyte = ~ulawbyte;                 /* ulaw bytes are stored inverted */
  sign = (ulawbyte & 0x80);
  exponent = (ulawbyte >> 4) & 0x07;
  mantissa = ulawbyte & 0x0F;
  sample = exp_lut[exponent] + (mantissa << (exponent + 3));
  if (sign != 0) sample = -sample;

  return(sample);
}
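To see what "more information about the smaller signals" means in practice, the following sketch measures the distance between adjacent mu-law codes in each of the eight segments. It repeats the decoding arithmetic of ulaw2linear so that it stands alone; the names ulaw_decode and ulaw_step are illustrative only.

```c
#include <assert.h>

/* Decode one 8-bit mu-law byte to a signed linear sample
   (same arithmetic as ulaw2linear). */
int ulaw_decode(unsigned char u)
{
    static const int base[8] = {0, 132, 396, 924, 1980, 4092, 8316, 16764};
    int sign, exponent, mantissa, sample;

    u = (unsigned char)~u;              /* mu-law bytes are stored inverted */
    sign = u & 0x80;
    exponent = (u >> 4) & 0x07;
    mantissa = u & 0x0F;
    sample = base[exponent] + (mantissa << (exponent + 3));
    return sign ? -sample : sample;
}

/* Distance between two adjacent positive codes in a segment (0..7). */
int ulaw_step(int exponent)
{
    unsigned char code0, code1;

    assert(exponent >= 0 && exponent <= 7);
    code0 = (unsigned char)~(exponent << 4);         /* mantissa 0 */
    code1 = (unsigned char)~((exponent << 4) | 1);   /* mantissa 1 */
    return ulaw_decode(code1) - ulaw_decode(code0);
}
```

The step is 8 in the lowest segment and doubles with each segment up to 1024 in the highest, so quiet samples are represented far more finely than loud ones.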

Q2.8: Signal Processing Software

[Note: Question 1.9 lists speech laboratory environments and audio editors, many of which provide basic and advanced signal processing capabilities.]

Signal Processing Products

* SigLib from Numerix Ltd.

On the Web

The following sites provide lists of useful DSP software. Not all the software is directly applicable to speech processing.

comp.dsp FAQ http://www.bdti.com/faq/dsp_faq.htm

DSP Internet Resources

http://www.eg3.com/

http://www.eg3.com/dsp.htm

Poynton's Digital Signal Processing Resource List http://www.inforamp.net/~poynton/Poynton-dsp.html

WWW Pages Relating to Sound Computation http://datura.cerl.uiuc.edu/netstuff/sigsoundLinks.html

Yahoo - Signal and Image Processing http://www.yahoo.com/Science/Engineering/Electrical_Engineering/Signal_and_Image_Processing/

Sound Related Resources http://pscinfo.psc.edu/~geigel/menus/sound.html

SPLIB: Signal Processing url LIBrary http://jazz.rice.edu/splib/

Wavelet's Home Page http://www.mat.sbg.ac.at/~uhl/wav.html

SigLib from Numerix Ltd.

* Platform: Windows, Unix and all major DSPs

* Description: SigLib is an ANSI C Source DSP Library and includes functions for the following areas: spectrum analysis, windowing, filtering (fixed and adaptive coefficient), convolution, correlation, covariance, signal generation, statistical analysis, regression analysis, communications and modulation, digital effects, vector processing, control, graphics and file I/O. Detailed product information and a description of the application of SigLib to speech processing is provided on the Numerix Ltd. WWW site.

* Availability: A free demonstration of SigLib V2.0 is available from the Numerix Ltd. WWW site. Educational discount is available for SigLib.

* Contact: Numerix Ltd., 157 Sileby Road, Barrow-on-Soar, Leics, LE12 8LW, UK. Phone/Fax : +44 (0)1509 413195 Email: numerix@numerix.co.uk WWW: http://www.numerix.co.uk/

Speech Coding and Compression

comp.speech FAQ Section 3

* SpeechLinks: Speech Coding

* Q3.1: Speech compression techniques

* Q3.2: Information on speech coding and compression

* Q3.3: Speech Compression / Coding Software

Q3.1: Speech compression techniques

Provided by Tony Robinson:

The aim of speech compression is to produce a compact representation of speech sounds such that when reconstructed it is perceived to be close to the original. The two main measures of closeness are intelligibility and naturalness.

The standard reference point is toll quality speech. This is the quality that would be expected over a telephone line: for example, speech sampled at 8 kHz using 8 bit ulaw coding, with a maximum frequency of about 3.3 kHz. This is a bit rate of 64 kbps, and as such is already a compressed form compared with (say) 16 bit, 16 kHz speech, which is the standard in speech recognition work.

ulaw coding does not exploit the (normally large) sample to sample correlations found in speech. ADPCM is the next family of speech coding techniques, and does exploit this redundancy by using a simple linear filter to predict the next sample of speech. The resulting prediction error is typically quantised to 4 bits thus giving a bit rate of 32 kbps (see, for example, the software in Q3.3: 32 kbps ADPCM, G.711/721/723 Compression, shorten). The advantages of ADPCM are that it is simple to implement and has very low delay.
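The predict-and-quantise idea can be sketched as follows. This is a deliberately simplified adaptive differential coder and is not compatible with G.721 or any other standard: the predictor is just the previous reconstructed sample, and the starting step size and the step adaptation rule are assumptions made up for illustration. Each sample becomes a 4 bit code (-8..7), which at 8000 samples per second gives the 32 kbps quoted above.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int predicted;  /* prediction of the next sample: the last reconstructed value */
    int step;       /* current quantiser step size */
} dpcm_state;

void dpcm_init(dpcm_state *s)
{
    s->predicted = 0;
    s->step = 16;   /* arbitrary starting step (an assumption) */
}

/* Shared by encoder and decoder so that both track the same state. */
static int dpcm_reconstruct(dpcm_state *s, int code)
{
    int mag = abs(code);
    int value = s->predicted + code * s->step;

    if (value > 32767) value = 32767;
    if (value < -32768) value = -32768;
    s->predicted = value;

    /* Adapt the step: grow after large codes, shrink after small ones. */
    if (mag >= 6 && s->step < 2048)
        s->step *= 2;
    else if (mag <= 1 && s->step > 1)
        s->step /= 2;
    return value;
}

/* Encode one 16 bit sample as a 4 bit code in the range -8..7. */
int dpcm_encode(dpcm_state *s, int sample)
{
    int code = (sample - s->predicted) / s->step;

    if (code > 7) code = 7;
    if (code < -8) code = -8;
    dpcm_reconstruct(s, code);
    return code;
}

/* Decode one 4 bit code back to a 16 bit sample. */
int dpcm_decode(dpcm_state *s, int code)
{
    assert(code >= -8 && code <= 7);
    return dpcm_reconstruct(s, code);
}
```

Because the encoder updates its state from the quantised code rather than from the original sample, an encoder and a decoder started from dpcm_init stay in step without any side information, and each sample is coded as soon as it arrives, which is why the delay is so low.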

To obtain more compression, specific properties of the speech signal must be modelled. The main assumption is known as the source filter model of speech production. This assumes that a source (voicing or fricative excitation) is passed through a filter (the vocal tract response) to produce the speech. The simplest implementation of this is known as an LPC synthesiser (e.g. LPC10e). At every frame the speech is analysed to compute the filter coefficients, the energy of the excitation, a voicing decision, and a pitch value if voiced. At the decoder a regular set of pulses for voiced speech or white noise for unvoiced speech is passed through the linear filter and multiplied by the gain to produce the speech. This is a very efficient system and typically produces speech coded at 1200-2400bps. With clever acoustic vector prediction this can be reduced to 300-600bps. The disadvantages are a loss of naturalness over most of the speech and occasionally a loss of intelligibility.
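The decoder side of this model can be sketched as an all-pole filter driven by an excitation signal. The code below is an illustration only, not LPC10e: it assumes the coefficient sign convention given in the comment, treats samples before the start of the frame as zero (a real coder carries the filter memory from frame to frame), and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * All-pole synthesis:
 *   out[n] = gain * excitation[n] + sum over k = 1..order of a[k-1] * out[n-k]
 * Samples before the start of the frame are taken as zero.
 */
void lpc_synthesise(const double *a, size_t order, double gain,
                    const double *excitation, double *out, size_t n)
{
    size_t i, k;

    assert(a != NULL && excitation != NULL && out != NULL);
    for (i = 0; i < n; i++) {
        double y = gain * excitation[i];

        for (k = 1; k <= order && k <= i; k++)
            y += a[k - 1] * out[i - k];
        out[i] = y;
    }
}

/* Voiced excitation: one unit pulse every `period` samples (the pitch period). */
void pulse_train(double *excitation, size_t n, size_t period)
{
    size_t i;

    assert(period > 0);
    for (i = 0; i < n; i++)
        excitation[i] = (i % period == 0) ? 1.0 : 0.0;
}
```

For an unvoiced frame the excitation buffer would be filled with white noise instead of pulses. The bit rate is low because only the filter coefficients, gain, voicing decision and pitch are sent for each frame, never the samples themselves.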

The CELP family of coders compensates for the lack of quality of the simple LPC model by using more information in the excitation. Each vector in a codebook of excitation vectors is tried and the index of the one that best matches the original speech is transmitted. This results in an increase in the bit rate to typically 4800-9600bps. Most speech coding research is currently directed towards CELP coders. (See, for example, CELP 3.2a, a TMS implementation, a G.728 LD-CELP vocoder, and the L&H implementation.)
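The codebook search at the heart of CELP can be sketched as an exhaustive loop that scores every candidate vector against a target. This is a simplified illustration rather than any standard coder: it assumes the candidate vectors have already been passed through the synthesis filter, it ignores perceptual weighting, and the function name and memory layout are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the index of the codebook vector that best matches `target`
 * (length `len`) in the squared-error sense, allowing each vector its
 * own optimal gain. `codebook` holds `count` vectors stored one after
 * another. The gain for the chosen vector is written to *best_gain.
 */
size_t codebook_search(const double *target, size_t len,
                       const double *codebook, size_t count,
                       double *best_gain)
{
    size_t best = 0, c, i;
    double best_score = -1.0;

    assert(count > 0 && len > 0);
    *best_gain = 0.0;
    for (c = 0; c < count; c++) {
        const double *v = codebook + c * len;
        double cross = 0.0, energy = 0.0;

        for (i = 0; i < len; i++) {
            cross += target[i] * v[i];
            energy += v[i] * v[i];
        }
        if (energy > 0.0) {
            /* With gain = cross / energy the squared error is
               |target|^2 - cross^2 / energy, so the best vector
               is the one that maximises cross^2 / energy. */
            double score = cross * cross / energy;

            if (score > best_score) {
                best_score = score;
                best = c;
                *best_gain = cross / energy;
            }
        }
    }
    return best;
}
```

Only the winning index and a quantised gain need to be transmitted for each block of samples, because the decoder holds the same codebook. The cost is the search itself, which is why CELP needs much more computation than plain LPC.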

Q3.2: Information on speech coding and compression

Reference Books

The following books cover speech coding/compression.

* Douglas O'Shaughnessy, Speech Communication: Human and Machine, Addison Wesley series in Electrical Engineering: Digital Signal Processing, 1987.

* Bishnu Atal in Fallside, F. and W. Woods, ed. Computer Speech Processing. London: Prentice/Hall International, 1985.

* N. S. Jayant and P. Noll, Digital Coding of Waveforms, Prentice Hall, ISBN 0-13-211913-7 01, 1984.

* W.B. Kleijn and K.K. Paliwal (Eds.), Speech Coding and Synthesis, Elsevier, Amsterdam, 1995. Contents, preface etc on the WWW:

http://www.elsevier.nl/section/engtech/scs/menu.htm

* Thomas P. Barnwell, Kambiz Nayebi and Craig H Richardson, Speech Coding: A Computer Laboratory Textbook, John Wiley and Sons Inc, 1996.

* Schuyler R Quackenbush, Tom P Barnwell III, Mark A Clements, Objective Measures of Speech Quality, Prentice-Hall, 1988.

And there are good tutorial articles.

* Makhoul, J. "Linear Prediction: A Tutorial Review." Proc. of the IEEE 63 (1975): 561 - 580.

On the WWW

comp.compression FAQ Includes a few questions and answers on the compression of speech. ftp://rtfm.mit.edu/pub/usenet/comp.compression/

Tony Robinson's Speech Analysis Course A complete course on speech analysis, including some stuff on speech coding.

http://svr-www.eng.cam.ac.uk/~ajr/SA95/

http://svr-www.eng.cam.ac.uk/~ajr/SA95/node78.html

ITU Coding Standards Members of the ITU (International Telecommunications Union) can obtain copies of the Series G Recommendations (including G.711/721/723/728) from the ITU WWW site (http://www.itu.ch/) and from http://www.itu.ch/itudoc/itu-t/rec/g/g700-799.html.

Jason Woodard's Speech Coding Page Introduction to speech coding plus information on a series of speech coding standards. http://www-mobile.ecs.soton.ac.uk/speech_codecs/index.html

WWW searchable online bibliography for Phonetics and Speech Technology Over 8000 entries provided by Institut fur Phonetik at Johann Wolfgang Goethe-Universitat Frankfurt. http://www.uni-frankfurt.de/~ifb/bib_engl.html

Ciaran McElroy's Speech Coding Page Introduction to many types of speech coding. http://wwwdsp.ucd.ie/speech/tutorial/speech_coding/speech_tut.html

Examples of speech coding

Nam Phamdo's Speech Coding Demonstration Examples of ADPCM, LD-CELP, CELP, LPC10 and CELP coding and coding over a noisy channel. http://admii.arl.mil/~fsbrn/phamdo/speech_demo.html

Phil Karn's Digital/Analog Voice Demo Examples of several speech coding systems. http://www.qualcomm.com/people/pkarn/voicedemo/

Q3.3: Speech Compression / Coding Software

The following speech compression software is described in the FAQ.

* 32 kbps ADPCM

* Castleton Network Systems - G.729 Voice Coder

* CELP 3.2a & LPC-10

* 8 Kbit/s CELP on the TMS320C5x family of DSP chips

* CyberVoice

* Rockwell's DigiTalk

* File format conversion

* G.711/721/723 Compression

* G.728 LD-CELP vocoder

* G.728 Compression

* GSM 06.10 Compression

* Lernout & Hauspie Speech Coding (5 products)

* Lernout & Hauspie Speech Coding SDK

* MPEG Audio

* shorten - a lossless compressor for speech signals

* Sipro Lab Telecom Inc. Coding

* Sonarc: Digital Audio Compression

* StarAudio Compressor/Player

* TrueSpeech from DSP Group

* U.S.F.S. 1016 CELP vocoder for DSP56001

* ToolVox from Voxware

32 kbps ADPCM

* Platform: SGI and Sun Sparcs

* Description: 32 kbps ADPCM C-source code (G.721 compatibility is uncertain)

* Contact: Jack Jansen

* Availability: http://www.cwi.nl/ftp/audio/adpcm.shar

Castleton Network Systems - G.729 Voice Coder

* Platform: TI TMS320C5x DSP

* Description: G.729, also called CS-ACELP (Conjugate-Structure Algebraic Code Excited Linear Prediction), is a state-of-the-art voice compression ITU (International Telecommunications Union) standard that can be used in a wide range of applications including wireless communications, digital satellite systems, packetized speech and digital leased lines. G.729 provides 8000 bits/s bandwidth for compressed speech at toll quality (equivalent to G.726 32 kbit/s ADPCM under clean channel conditions). Also, G.729 has lower complexity and lower bit rate than G.728.

The Castleton G.729 implementation provides a bit-exact implementation of the ITU standard on a single TI TMS320C5x DSP. The software is C callable and fully re-entrant, which allows easy interfacing and multi-channel capability. The encoder and decoder are fully independent, so a DSP device can run a number of full-duplex or half-duplex channels. The coder and the decoder are able to operate under a real-time task switching kernel.

* Cost and Availability: Contact Castleton Network Systems.

* Contact: Castleton Network Systems Corporation 350 Terry Fox Drive, Kanata, Ontario, Canada K2K 2W5 Ph: 613-591-8786, Fax: 613-591-8783 Email: inquire@castleton.com WWW: http://www.castleton.com/

CELP 3.2a & LPC-10

* Platform: Sun (the makefiles and source can be modified for other platforms)

* Description: CELP is a lossy compression technique. This is the US Department of Defense's Federal-Standard-1016 based 4800 bps code excited linear prediction voice coder version 3.2a (CELP 3.2a), supplied as Fortran and C simulation source code.

* Availability: By anonymous ftp from:

ftp://ftp.super.org/pub/speech/celp_3.2a.tar.Z

Or from the comp.speech ftp server

ftp://svr-ftp.eng.cam.ac.uk/comp.speech/coding/celp_3.2a.tar.Z

ftp://svr-ftp.eng.cam.ac.uk/comp.speech/coding/celp_3.2a.tar.gz

LPC-10 Fortran source code is also available:

ftp://ftp.super.org/pub/speech/lpc10-1.0.tar.gz

Here is a modified LPC-10 release that includes ANSI C source:

http://www.arl.wustl.edu/~jaf/lpc/

* Documentation: The following articles describe the Federal-Standard-1016 4.8-kbps CELP coder:

+ Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C. Welch, "The Federal Standard 1016 4800 bps CELP Voice Coder," Digital Signal Processing, Academic Press, 1991, Vol. 1, No. 3, p. 145-155.

+ Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C. Welch, "The DoD 4.8 kbps Standard (Proposed Federal Standard 1016)," in Advances in Speech Coding, ed. Atal, Cuperman and Gersho, Kluwer Academic Publishers, 1991, Chapter 12, p. 121-133.

The U.S. DoD's Federal-Standard-1015/NATO-STANAG-4198 based 2400 bps linear prediction coder (LPC-10) was republished as a Federal Information Processing Standards Publication 137 (FIPS Pub 137). It is described in:

+ Thomas E. Tremain, "The Government Standard Linear Predictive Coding Algorithm: LPC-10," Speech Technology Magazine, April 1982, p. 40-49.

There is also a section about FS-1015 in the book:

+ Panos E. Papamichalis, Practical Approaches to Speech Coding, Prentice-Hall, 1987.

The voicing classifier used in the enhanced LPC-10 (LPC-10e) is described in:

+ Campbell, Joseph P., Jr. and T. E. Tremain, "Voiced/Unvoiced Classification of Speech with Applications to the U.S. Government LPC-10E Algorithm," Proceedings of the IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing, 1986, p. 473-6.

* Vendors:

Realtime DSP code for FS-1015 and FS-1016 is sold by:

+ John DellaMorte, DSP Software Engineering 165 Middlesex Tpk, Suite 206, Bedford, MA 01730, USA Ph: 1-617-275-3733 Fax: 1-617-275-4323 Email: dspse.bedford@channel1.com

DSP Software Engineering's FS-1016 code can run on DSP Research's Tiger 30 (a PC board with a TMS320C3x and analog interface suited to development work).

+ DSP Research 1095 E. Duane Ave, Sunnyvale, CA 94086, USA Ph: (408)773-1042 Fax: (408)736-3451

8 Kbit/s CELP on the TMS320C5x family of DSP chips

* Description: For low bandwidth transmission of voice, compact voice storage for archival purposes, low-cost digital answering machines and efficient storage for voice mail.

Features:

+ near toll quality at 8 Kb/s.

+ Variable rate option with 1 Kb/s silence encoding.

+ Implemented on a fixed-point processor for lower system cost.

+ Attractive licensing scheme.

+ Future availability of 4 Kb/s.

+ Custom rates possible.

Capacity:

+ Two half-duplex or one full duplex channels on the 20 MIPS 'C5x (at 95% and 55% CPU utilization respectively).

+ Two full duplex channels on the 28.6 MIPS 'C5x (at 77% CPU utilization).

+ Requires 9 K-words program memory and 3 K-words data memory.

+ Decoding in real-time on a 486 class CPU.

* Contact:

CVI Inc. 443 Vienna Cres. North Vancouver, BC, Canada V7N 3B3 Tel: (604) 987 1719 Fax: (604) 986 8139 Email: cvi@extropia.wimsey.com

CyberVoice

* Description: Cybernetics InfoTech, Inc. offers the following products

+ Telephone voice compression at 1.2, 2.4, 4.8 and 6.0 kbit/s with good-communications-quality to near-toll-quality coded voice;

+ Wideband voice (7-kHz bandwidth) compression at 16 kbit/s with near-original-quality coded voice;

+ Internet Voice E-mail software with voice editing, high-quality low-data-rate voice compression, fast/slow voice playback, and more.

* Availability: C code and Windows .DLL for telephone voice compression and wideband voice compression are available for licensing. Real-time DSP codes are under development. Voice E-mail software is available for purchase and download from the CyberVoice home page.

* Contact: Cybernetics InfoTech, Inc. 2 Professional Dr., #228, Gaithersburg, MD 20879 WWW: http://www.cybit.com/ E-mail: info@cybit.com Fax: 301-590-0359

Rockwell's DigiTalk

* Description: The DigiTalk coder operates at a sampling rate of
