Speech Technology:
From Pre-History to History
Joseph Mariani
Director, ICT Department
French Ministry of Research
& LIMSI-CNRS
(Contribution to Panel on the History of Speech Technology,
Session 35D1b; Janet Baker, Chair
Interspeech 2005,
Lisbon, September 6, 2005)
Some selected milestones
• Early implementations and applications of speech technology in France
• Use of quantitative evaluation
• Spoken language modelling and knowledge
• Speech within multimodal communication
• Availability of Multilingual Language Resources (LR)
• Internationalization of speech research
• Speech and Language...
Early implementations
• Speech research activities at LIMSI-CNRS (1972)
• Vecsys as a speech company (1979)
• ASR (J.S. Liénard, J. Mariani, J.L. Gauvain)
– Template matching, non-linear time compression
• MOISE: speaker-dependent isolated-word recognition on a single board (1980)
• MOZART: connected speech recognition (1982)
• ICOLOG: Diphone-based TTS for French (1979)
• Voice Terminal (1981)
Early applications
• Human-Machine Communication
– Pilot-aircraft dialogue in cockpit design (first flight: 1982)
– Speech synthesis in helicopters
– Telephone voice dialling (1985)
– Air controller interaction (1985)
• Insufficient performance for critical tasks
• Need for an intelligent back-end infrastructure
• Shift to non-critical tasks: document retrieval
• From Human-Machine Communication to Human-Human Communication through the Machine
Use of quantitative evaluation
• ARPA SUS program (1971-1976)
• The "99%+ correct" commercial claims
• The Martine Kempf case (1984)
• NATO RSG10 initiative (1980)
– American + British English, Dutch, French, German
• DARPA-NIST initiative (1984/1987-)
• Opening to international participation (1992)
• Pre-history to history thanks to pioneers
• Extension to NLP and computer vision
Spoken language modelling and knowledge
• Statistical vs rule-based approaches (AI)
– Hidden Markov Models vs. expert spectrogram reading
– 1986 evaluation results
– ANNs and criticism of the evaluation paradigm (1995-1996)
• Large improvement in ASR systems
– Refined methods / more computer power / more data
• Smaller increase in knowledge of human speech recognition (HSR) processes
• Use advanced ASR tools and large language resources to increase knowledge of HSR
Speech within Multimodal Communication
• Speech is usually used together with other communication modalities (telephone conversation being the exception)
• Multimodal speech recognition
– speech signal + visual lip reading
• Fusion of modalities
– "Put That There!": voice + gesture + vision (1980, 1993)
– Multimodal communication modelling
– Multimodal resources (recording, transcription)
– Still open issues: meeting transcription
Availability of Multilingual Language Resources
• Need for language resources to develop systems of acceptable quality
• LDC: Linguistic Data Consortium (1992)
• ELRA: European Language Resources Association (1995) + ELDA (Distribution Agency)
• Cocosda: speech databases and evaluation (1991)
– Oriental Cocosda
• ICCWLRE/WRITE (2002-2003)
Internationalization of speech research
• Speech Technology Seminar (1974: Stockholm)
• IEEE ICASSP (1982: Paris, 1989: Glasgow)
• ICST (1987: Edinburgh)
• ESCA (1988) - ISCA (1999)
• Eurospeech (1989) - Interspeech (1999)
• ICSLP (1990)
• Cocosda (1991)
• ELRA (1995) - LREC (1998: Granada)
Speech and Language
• ELSNET (1991)
• Cocosda (1991) - WRITE (2002)
• ELRA (1995) - LREC (1998)
• NIST evaluations: speech / NLP - TIDES (1999-)
• Survey of the State of the Art in HLT (CUP, 1998)
• EC FP5 HLT program (1998-2002)
• US HLT conference series (2001-)
• IEEE/ACL Spoken Language Technology seminar (2006)
Bibliography
• J. Mariani, "Recent Advances in Speech Processing", IEEE ICASSP, Glasgow, May 23-26, 1989, pp. 429-440.
• J. Mariani, J.L. Gauvain, L.F. Lamel, "Comments on 'Towards increasing speech recognition error rates' by H. Bourlard, H. Hermansky, and N. Morgan", Speech Communication, Vol. 18, No. 3, May 1996.
• R. Cole, J. Mariani, H. Uszkoreit, N. Varile, A. Zaenen, A. Zampolli, V. Zue, "Survey of the State of the Art in Human Language Technology", Cambridge University Press, 1998.
• B.H. Juang, D. Childers, R.V. Cox, R. de Mori, S. Furui, J. Mariani, P. Price, S. Sagayama, M.M. Sondhi, R. Weischedel, "Speech Processing: Past, Present and Outlook", IEEE Signal Processing Magazine, May 1998.
• J. Mariani, "Are we losing ground to the US?: A contrastive analysis of EU versus US frameworks", HLT Open House, Luxembourg, September 26, 2000.