Uncertain about ChatGPT: enabling the uncertainty evaluation of large language models

ChatGPT, OpenAI’s chatbot, has gained consider-able attention since its launch in November 2022, owing to its ability to formulate articulated responses to text queries and comments relating to seemingly any conceivable subject. As impressive as the majority of interactions with ChatGPT are, this large language model has a number of acknowledged shortcomings, which in several cases, may be directly related to how ChatGPT handles uncertainty. The objective of this paper is to pave the way to formal analysis of ChatGPT uncertainty handling. To this end, the ability of the Uncertainty Representation and Reasoning Framework (URREF) ontology is assessed, to support such analysis. Elements of structured experiments for reproducible results are identified. The dataset built varies Information Criteria of Correctness, Non-specificity, Self-confidence, Relevance and Inconsistency, and the Source Criteria of Reliability, Competency and Type. ChatGPT’s answers are analyzed along Information Criteria of Correctness, Non-specificity and Self-confidence. Both generic and singular information are sequentially provided. The outcome of this preliminary study is twofold: Firstly, we validate that the experimental setup is efficient in capturing aspects of ChatGPT uncertainty handling. Secondly, we identify possible modifications to the URREF ontology that will be discussed and eventually implemented in URREF ontology Version 4.0 under development.