doi: 10.17586/2226-1494-2023-23-1-88-95


Dialogue system based on spoken conversations with access to an unstructured knowledge base

S. M. Masliukhin


Article in Russian

For citation:
Masliukhin S.M. Dialogue system based on spoken conversations with access to an unstructured knowledge base. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2023, vol. 23, no. 1, pp. 88–95 (in Russian). doi: 10.17586/2226-1494-2023-23-1-88-95


Abstract
This paper describes an approach to building a task-oriented dialogue system (a conversational agent) with unstructured knowledge access that operates on spoken conversations. The approach combines written-text augmentation that simulates speech recognition output, a combination of classifiers, and retrieval-augmented text generation. The training data are augmented in two ways: by converting the original texts into audio with a text-to-speech model and transcribing them back into text with an automatic speech recognition model, and by injecting artificially generated errors based on phonetic similarity. A dialogue system with access to an unstructured knowledge base must first detect the turns that require searching the knowledge base for additional information. For this purpose, Support Vector Machine, Convolutional Neural Network, Bidirectional Encoder Representations from Transformers, and Generative Pre-trained Transformer 2 models were trained, and the best of them are used in a weighted combination. Next, a suitable text fragment is selected from the knowledge base and an appropriate answer is generated; these tasks are solved by adapting the Retrieval Augmented Generation model. The proposed method was evaluated on data from the 10th Dialogue System Technology Challenge. On all metrics except Precision, the new approach significantly outperformed the baseline models provided by the organizers of the competition. The results of this work can be used to build chat-bot systems that automatically process natural-language user requests using unstructured knowledge sources, such as a database of answers to frequently asked questions.
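To illustrate the turn-detection step described above, the sketch below shows a weighted combination of classifier scores in Python. This is a minimal sketch only: the scores, weights, and threshold are hypothetical placeholders rather than values reported in the article, where the underlying classifiers are trained SVM, CNN, BERT, and GPT-2 models.

    # Minimal sketch (hypothetical values): deciding whether a user turn is
    # knowledge-seeking by combining the probabilities produced by several
    # binary classifiers. The scores and weights below are placeholders for
    # illustration, not values from the article.

    def detect_knowledge_seeking_turn(scores, weights, threshold=0.5):
        """Return True if the weighted average of classifier scores
        exceeds the decision threshold."""
        combined = sum(weights[name] * scores[name] for name in scores)
        total_weight = sum(weights[name] for name in scores)
        return combined / total_weight >= threshold

    # Hypothetical per-model probabilities that the current user turn
    # requires a lookup in the unstructured knowledge base.
    scores = {"svm": 0.62, "cnn": 0.71, "bert": 0.93, "gpt2": 0.88}

    # Illustrative ensemble weights; in practice they would be tuned on a
    # validation split (e.g. against the F1 score).
    weights = {"svm": 0.1, "cnn": 0.2, "bert": 0.4, "gpt2": 0.3}

    print(detect_knowledge_seeking_turn(scores, weights))  # -> True

Only when a turn is flagged as knowledge-seeking does the pipeline proceed to select a knowledge snippet and generate the answer with the adapted Retrieval Augmented Generation model.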

Keywords: dialogue systems, conversational agents, information retrieval, text augmentation, retrieval augmented generation

Acknowledgements. This research is financially supported by the Russian Science Foundation (No. 22-11-00128, https://rscf.ru/project/22-11-00128/).

References
  1. Moghe N., Arora S., Banerjee S., Khapra M.M. Towards exploiting background knowledge for building conversation systems. Proc. of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018, pp. 2322–2332. https://doi.org/10.18653/v1/D18-1255
  2. Dinan E., Roller S., Shuster K., Fan A., Auli M., Weston J. Wizard of Wikipedia: Knowledge-powered conversational agents. arXiv, 2019, arXiv:1811.01241. https://doi.org/10.48550/arXiv.1811.01241
  3. Zhou K., Prabhumoye S., Black A.W. A dataset for document grounded conversations. Proc. of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018, pp. 708–713. https://doi.org/10.18653/v1/D18-1076
  4. Hearst M., Dumais S., Osuna E., Platt J., Scholkopf B. Support vector machines. IEEE Intelligent Systems and their Applications, 1998, vol. 13, no. 4, pp. 18–28. https://doi.org/10.1109/5254.708428
  5. Johnson R., Zhang T. Convolutional neural networks for text categorization: Shallow word-level vs. deep character-level. arXiv, 2016, arXiv:1609.00718. https://doi.org/10.48550/arXiv.1609.00718
  6. Devlin J., Chang M.-W., Lee K., Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. Proc. of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Vol. 1 (Long and Short Papers), 2019, pp. 4171–4186. https://doi.org/10.18653/v1/N19-1423
  7. Radford A., Narasimhan K., Salimans T., Sutskever I. Improving language understanding by generative pre-training. Preprint. 2018.
  8. Karpukhin V., Oğuz B., Min S., Lewis P., Wu L., Edunov S., Chen D., Yih W.-T. Dense passage retrieval for open-domain question answering. Proc. of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020, pp. 6769–6781. https://doi.org/10.18653/v1/2020.emnlp-main.550
  9. Humeau S., Shuster K., Lachaux M., Weston J. Poly-encoders: Architectures and pre-training strategies for fast and accurate multi-sentence scoring. arXiv, 2020, arXiv:1905.01969. https://doi.org/10.48550/arXiv.1905.01969
  10. Lewis M., Liu Y., Goyal N., Ghazvininejad M., Mohamed A., Levy O., Stoyanov V., Zettlemoyer L. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. Proc. of the 58th Annual Meeting of the Association for Computational Linguistics, 2020, pp. 7871–7880. https://doi.org/10.18653/v1/2020.acl-main.703
  11. Kim S., Liu Y., Jin D., Papangelis A., Hedayatnia B., Gopalakrishnan K., Hakkani-Tur D. DSTC10 Track Proposal: Knowledge-grounded Task-oriented Dialogue Modeling on Spoken Conversations. 2021.
  12. Kim S., Eric M., Gopalakrishnan K., Hedayatnia B., Liu Y., Hakkani-Tur D.Z. Beyond domain APIs: task-oriented conversational modeling with unstructured knowledge access. Proc. of the 21st Annual Meeting of the Special Interest Group on Discourse and Dialogue, 2020, pp. 278–289.
  13. Budzianowski P., Wen T.-H., Tseng B.-H., Casanueva I., Ultes S., Ramadan O., Gašić M. MultiWOZ - A large-scale multi-domain wizard-of-oz dataset for task-oriented dialogue modelling. Proc. of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018, pp. 5016–5026. https://doi.org/10.18653/v1/D18-1547
  14. Eric M., Goel R., Paul S., Sethi A., Agarwal S., Gao S., Kumar A., Goyal A., Ku P., Hakkani-Tür D. MultiWOZ 2.1: Multi-domain dialogue state corrections and state tracking baselines. Proc. of the Twelfth Language Resources and Evaluation Conference, 2020, pp. 422–428.
  15. Zang X., Rastogi A., Sunkara S., Gupta R., Zhang J., Chen J. MultiWOZ 2.2: A dialogue dataset with additional annotation corrections and state tracking baselines. Proc. of the 2nd Workshop on Natural Language Processing for Conversational AI, 2020, pp. 109–117. https://doi.org/10.18653/v1/2020.nlp4convai-1.13
  16. Baevski A., Zhou H., Mohamed A., Auli M. Wav2vec 2.0: a framework for self-supervised learning of speech representations. Proc. of the 34th International Conference on Neural Information Processing Systems (NIPS'20), 2020, pp. 12449–12460.
  17. Panayotov V., Chen G., Povey D., Khudanpur S. Librispeech: An ASR corpus based on public domain audio books. Proc. of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015, pp. 5206–5210. https://doi.org/10.1109/ICASSP.2015.7178964
  18. Heafield K. KenLM: Faster and smaller language model queries. Proc. of the Sixth Workshop on Statistical Machine Translation, 2011, pp. 187–197.
  19. Gopalakrishnan K., Hedayatnia B., Wang L., Liu Y., Hakkani-Tür D. Are neural open-domain dialog systems robust to speech recognition errors in the dialog history? an empirical study. Proc. Interspeech 2020, 2020, pp. 911–915. https://doi.org/10.21437/Interspeech.2020-1508
  20. Wang L., Fazel-Zarandi M., Tiwari A., Matsoukas S., Polymenakos L. Data augmentation for training dialog models robust to speech recognition errors. Proc. of the 2nd Workshop on Natural Language Processing for Conversational AI, 2020, pp. 63–70. https://doi.org/10.18653/v1/2020.nlp4convai-1.8
  21. Xu L., Lian J., Zhao W.X., Gong M., Shou L., Jiang D., Xie X., Wen J. Negative sampling for contrastive representation learning: A review. arXiv, 2022, arXiv:2206.00212. https://doi.org/10.48550/arXiv.2206.00212



This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License