Improving Large Language Models' Ability to Find Word Relationships
Abstract
Background: The capabilities of popular and widely used large language models (LLMs), such as the Generative Pre-trained Transformer (GPT), can still be enhanced. One way to achieve this is the Retrieval-Augmented Generation (RAG) architecture, which incorporates external data into the model to improve its capabilities.
Objective: The aim of this research is to show that RAG can help LLMs respond with greater precision and sounder reasoning.
Method: The method used in this work applies the Hugging Face Application Programming Interface (API) to embed words, store the embeddings, and find the relationships between the words (a minimal sketch follows the abstract).
Result: The results show how well RAG performs, as the rendered knowledge graph makes clear. The knowledge obtained is logical and understandable; for example, the term Logistic Regression is related to accuracy and F1 score and is described as a simple model that performs best compared to the Naïve Bayes and Support Vector Machine (SVM) models.
Conclusion: RAG effectively improves the capabilities of LLMs.
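The Method describes embedding words with the Hugging Face API, storing the vectors, and finding relationships between them. The following is a minimal sketch of such a pipeline; the sentence-transformers package, the all-MiniLM-L6-v2 model, the example terms, and the in-memory store are illustrative assumptions, not details taken from this work.

```python
# Minimal sketch (assumptions noted above): embed terms with a Hugging Face
# model, keep the vectors in a simple in-memory "store", and rank the terms
# most related to a query term by cosine similarity.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Example terms drawn from the abstract's Result section.
terms = ["logistic regression", "accuracy", "F1 score",
         "naive bayes", "support vector machine"]

# Store: map each term to its embedding vector.
store = {t: model.encode(t) for t in terms}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def most_related(query: str, k: int = 3):
    """Return the k stored terms closest to the query by cosine similarity."""
    q = model.encode(query)
    ranked = sorted(store.items(), key=lambda kv: cosine(q, kv[1]), reverse=True)
    return [(t, round(cosine(q, v), 3)) for t, v in ranked[:k]]

print(most_related("logistic regression"))
```

In a full RAG setup, the in-memory dictionary would typically be replaced by a vector database so the retrieved neighbors can be passed to the LLM as context; the sketch only illustrates the embed-store-retrieve step.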