Speech Technology:
From Pre-History to History

Joseph Mariani
Director, ICT Department
French Ministry of Research
& LIMSI-CNRS

(Contribution to Panel on the History of Speech Technology,
Session 35D1b; Janet Baker, Chair
Interspeech 2005, Lisbon, September 6, 2005)

Some selected milestones

• Early implementations and applications of speech technology in France
• Use of quantitative evaluation
• Spoken language modelling and knowledge
• Speech within multimodal communication
• Availability of Multilingual Language Resources (LR)
• Internationalization of speech research
• Speech and Language...

Early implementations

• Speech research activities at LIMSI-CNRS (1972)
• Vecsys as a speech company (1979)
• ASR (J.S. Liénard, J. Mariani, J.L. Gauvain)
– Template matching, Non-linear time compression
• MOISE: Speaker-Dependent Isolated Word Recognition single board system (1980)
• MOZART: connected speech recognition (1982)
• ICOLOG: Diphone-based TTS for French (1979)
• Voice Terminal (1981)

Early applications

• Human-Machine Communication
– Pilot-plane dialog in cockpit design (first flight: 1982)
• Speech synthesis in helicopters
– Telephone voice dialling (1985)
– Air controller interaction (1985)
• Insufficient performance for critical tasks
• Need for an intelligent infrastructure behind the speech interface
• Shift to non-critical tasks: document retrieval
• From Human-Machine Communication to Human-Human through Machine Communication

Use of quantitative evaluation

• ARPA SUS program (1971-1976)
• The "99%+ correct" commercial claims
• The Martine Kempf case (1984)
• NATO RSG10 initiative (1980)
– American + British English, Dutch, French, German
• DARPA-NIST initiative (1984/1987-)
• Opening to international participation (1992)
• Pre-history to history thanks to pioneers
• Extend to NLP / Computer Vision

Spoken language modelling and knowledge

• Statistical vs rule-based approaches (AI)
– Hidden-Markov Models / Expert spectrogram reading
– 1986 Evaluation results
– ANN + evaluation paradigm criticism (1995-1996)
• Large improvement of ASR systems
– Refined methods / more computer power / more data
• Smaller increase of knowledge on human speech recognition processes
• Use advanced ASR tools and large LR to increase knowledge of HSR

Speech within Multimodal Communication

• Speech usually used together with other communication modalities (except telephone conversation)
• Multimodal speech recognition
– speech signal + visual lip reading
• Fusion of modalities
– "Put That There!": voice + gesture + vision (1980, 1993)
– Multimodal communication modelling
– Multimodal resources (recording, transcription)
– Still open issues: meeting transcription

Availability of Multilingual Language Resources

• Need for Language Resources to develop acceptable quality systems
• LDC: Linguistic Data Consortium (1992)
• ELRA: European Language Resources Association (1995) + ELDA (Distribution Agency)
• Cocosda: speech databases and evaluation (1991)
– Oriental Cocosda
• ICCWLRE/WRITE (2002-2003)

Internationalization of speech research

• Speech Technology Seminar (1974: Stockholm)
• IEEE ICASSP (1982: Paris, 1989: Glasgow)
• ICST (1987: Edinburgh)
• ESCA (1988) - ISCA (1999)
• Eurospeech (1989) - Interspeech (1999)
• ICSLP (1990)
• Cocosda (1991)
• ELRA (1995) - LREC (1998: Granada)

Speech and Language

• ELSNET (1991)
• Cocosda (1991) - WRITE (2002)
• ELRA (1995) - LREC (1998)
• NIST evaluations: speech / NLP - TIDES (1999-)
• Survey of the State of the Art in HLT (CUP, 1998)
• EC FP5 HLT program (1998-2002)
• US HLT conference series (2001-)
• IEEE/ACL Spoken Language Technology seminar (2006)

Bibliography

• J. Mariani, "Recent Advances in Speech Processing", IEEE ICASSP, Glasgow, May 23-26 1989, pp 429-440
• J. Mariani, J.L. Gauvain, L.F. Lamel, "Comments on 'Towards increasing speech recognition error rates', by H. Bourlard, H. Hermansky, and N. Morgan", Speech Communication, Vol. 18, No. 3, May 1996
• R. Cole, J. Mariani, H. Uszkoreit, N. Varile, A. Zaenen, A. Zampolli, V. Zue, "Survey of the State-of-the-Art in Human Language Technology", Cambridge University Press, 1998
• B.H. Juang, D. Childers, R.V. Cox, R. de Mori, S. Furui, J. Mariani, P. Price, S. Sagayama, M.M. Sondhi, R. Weischedel, "Speech Processing: Past, Present and Outlook", IEEE Signal Processing Magazine, May 1998
• J. Mariani, "Are we losing ground to the US?: A contrastive analysis of EU versus US frameworks", HLT Open House, Luxembourg, September 26, 2000.
