Here, I present the roles that generative AI can play in the various types of data analysis. I give prompt examples, and I evaluate the reliability and validity of using ChatGPT for MMR. When asking ChatGPT to conduct analysis, I recommend that you give it frameworks, evaluation criteria or any other standards that it could use to improve outputs [68]. Ask ChatGPT to be brutally honest, to look for problems and lack of cohesion, to evaluate your writing critically, and to find ways to overcome positivity bias. ChatGPT can act as a research assistant who is available, teachable, and inexpensive. Meet your artificial assistant, ChatGPT, also referred to here as Chad (the only good Chad you will ever meet). Chad is a naïve but well-meaning intern working for $20 monthly. Chad needs much supervision and makes silly mistakes but can add value when managed well. Chad is a well-mannered Chatbot (thanks to the guardrails) with much potential. Unfortunately, he is a people-pleaser, which can sometimes be annoying.
Qualitative data analysis with ChatGPT
ChatGPT and other similar generative AI platforms have immense capacity to analyse multimodal data [16]. ChatGPT can play several roles in qualitative analysis, including coding, where you ask ChatGPT to generate codes for textual data based on frameworks or theories. In addition to textual analysis, ChatGPT can detect sentiment, such as the emotional tone of the text [69]. ChatGPT can also generate themes and evaluate qualitative analysis for consistency by acting as a specialist inspector. The possibility of generating visualisations and tables with ChatGPT-4 is also helpful in quantifying or summarising aspects of the qualitative data. You can paste a limited amount of qualitative text into ChatGPT-3.5. In ChatGPT-4, the Advanced Data Analysis module makes it possible to upload data, request codes for each line of an Excel file, do more checks and comparisons, and create figures. Consider ChatGPT as an additional coder or a research assistant in this process. Crucially, however, do your own analysis and do not rely solely on the Chatbot, as it may miss rich aspects of your data or misinterpret the text.
In my experience, inductive coding does not work well in generative Artificial Intelligence settings, and I advise you to avoid asking the Chatbot to cold code. Instead, ChatGPT should be given both an analytical framework and a theoretical or conceptual framework to guide the coding and theme generation, as shown in Fig. 2 (see also the Prompt Library in the appendix). Examples of analytical frameworks to consider using include thematic, narrative, content, and discourse analysis. Another suggestion is to code the first few lines or paragraphs to show ChatGPT as examples.
ChatGPT assigns generic codes such as "other" when unsure or struggling to interpret the text. Such codes and themes require additional investigation, as complex and valuable information is often submerged in these generically named themes. When facing token restrictions, analyse a few interviews at a time; after that, ask ChatGPT to adjust the themes based on the additional interviews. As can be seen in Fig. 2, Chad provides well-written themes and quotes the participants as requested. The theoretical framework has also been woven into his answers.
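The advice to analyse a few interviews at a time can be sketched in code. The following is a minimal illustration, not a prescribed procedure: the function name `batch_interviews`, the transcript names, and the 3000-word budget are all assumptions, and word count is only a rough stand-in for the token limits that ChatGPT applies.

```python
def batch_interviews(interviews, max_words=3000):
    """Group interview transcripts into batches that stay under a word budget.

    Word count is a rough proxy for tokens; the 3000-word default is an
    illustrative assumption, not a documented ChatGPT limit.
    """
    batches, current, current_words = [], [], 0
    for name, text in interviews:
        n_words = len(text.split())
        # Start a new batch when adding this transcript would exceed the budget.
        if current and current_words + n_words > max_words:
            batches.append(current)
            current, current_words = [], 0
        current.append(name)
        current_words += n_words
    if current:
        batches.append(current)
    return batches


# Hypothetical transcripts, represented here by filler text of known length.
interviews = [
    ("interview_01", "word " * 1200),
    ("interview_02", "word " * 1500),
    ("interview_03", "word " * 900),
    ("interview_04", "word " * 2000),
]
batches = batch_interviews(interviews, max_words=3000)
print(batches)
```

Each batch can then be pasted or uploaded in turn, with a follow-up prompt asking the Chatbot to revise the existing themes in light of the new interviews. A transcript that exceeds the budget on its own still ends up in its own batch and would need to be split by hand.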
Figure 3 demonstrates the prompt used to ask ChatGPT to code open-ended survey items. I uploaded the Excel file with the responses and labelled the columns where I wanted the bot to add the codes per line.
In addition to producing the downloadable Excel file, Chad wrote reasonable themes, as seen in Fig. 3. Checking the Excel file to see how every line was coded was very useful. Many vague statements had been given the code "other", and I could compare this to the human coding, which was more nuanced and consequential.
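One way to make the line-by-line comparison between human and ChatGPT coding more systematic is to compute simple agreement statistics and to list the lines coded "other" for manual review. The sketch below is illustrative only: the codes are hypothetical, it assumes each response received exactly one code from each coder, and Cohen's kappa is offered as one common agreement measure rather than the one used in this study.

```python
from collections import Counter


def cohen_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders who each assigned one code per response."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("Both coders must code the same non-empty set of responses.")
    n = len(codes_a)
    # Proportion of responses on which the two coders agree.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Agreement expected by chance, from each coder's code frequencies.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical code throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes for six survey responses (illustrative data only).
human = ["interest", "career", "career", "social good", "interest", "career"]
chatgpt = ["interest", "career", "other", "other", "interest", "career"]

agreement = sum(h == c for h, c in zip(human, chatgpt)) / len(human)
kappa = cohen_kappa(human, chatgpt)
# Zero-based positions of responses that ChatGPT coded "other".
needs_review = [i for i, code in enumerate(chatgpt) if code == "other"]

print(f"Percent agreement: {agreement:.2f}")
print(f"Cohen's kappa: {kappa:.2f}")
print(f"Responses to review by hand: {needs_review}")
```

The responses flagged for review are the ones where nuanced information is most likely to have been lost, so they are a sensible place to start the human check.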
Quantitative data analysis with ChatGPT
Generative Artificial Intelligence (GAI) fulfils additional roles that GUI-based software fulfilled only to a limited extent or not at all. ChatGPT can assist with cleaning data, for example, finding out-of-range values in data files. The Advanced Data Analysis custom ChatGPT can also scan your data file and recommend analyses. Here, I suggest sharing a codebook with the Chatbot that lists the variables, labels, categories, and any required explanations, such as the level of measurement. Use the suggestions from the Chatbot with care, as ChatGPT might suggest analysis that cannot or should not be done with the type of data presented. Chad can quickly generate descriptive statistics, as I will demonstrate here, as well as a paragraph with an initial interpretation for the researcher to consider using. Commonly used inferential statistics can also be conducted in ChatGPT, with the bonus of interpretation. Currently, generative AI is fine for run-of-the-mill statistical analysis. Using generative AI to run more sophisticated statistical models requires further exploration. For example, I tried to conduct some structural equation modelling (SEM) with ChatGPT, but the Chatbot has not yet been able to handle such complex analysis.
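The out-of-range check described above can also be run independently, which is a useful way to verify what the Chatbot reports. The sketch below assumes a small hypothetical codebook; the variable names, ranges, and category codes are invented for illustration and are not taken from the study's data set.

```python
# Hypothetical codebook: each variable has a level of measurement and either
# a set of valid category codes or a numeric range.
codebook = {
    "age": {"level": "ratio", "min": 16, "max": 80},
    "gender": {"level": "nominal", "categories": {1, 2, 3}},
    "motivation": {"level": "ordinal", "categories": {1, 2, 3, 4, 5}},
}


def find_out_of_range(rows, codebook):
    """Return (row number, variable, value) for every value the codebook disallows."""
    problems = []
    for row_number, row in enumerate(rows, start=1):
        for variable, rule in codebook.items():
            value = row.get(variable)
            if value is None:
                continue  # missing values are a separate cleaning step
            if "categories" in rule:
                valid = value in rule["categories"]
            else:
                valid = rule["min"] <= value <= rule["max"]
            if not valid:
                problems.append((row_number, variable, value))
    return problems


# Illustrative survey rows with two deliberate data-entry errors.
rows = [
    {"age": 19, "gender": 1, "motivation": 4},
    {"age": 191, "gender": 2, "motivation": 5},
    {"age": 22, "gender": 9, "motivation": None},
]
problems = find_out_of_range(rows, codebook)
for row_number, variable, value in problems:
    print(f"Row {row_number}: {variable} = {value} is outside the codebook range")
```

Writing the codebook down in this structured form has a second benefit: the same information can be pasted into the prompt so that the Chatbot knows the level of measurement of each variable.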
In Fig. 4, I show the prompt asking for Chad's recommended data analysis.
As can be seen in Fig. 4, Chad recommends many different types of analysis for my specific data set. However, not all of them are feasible or applicable. Nonetheless, this is a good starting point and ChatGPT may suggest types of analysis you had yet to consider.
Figure 5 is where I asked ChatGPT to produce descriptive statistics in a specified table format. I also asked for an interpretation and write-up of the table. While I did not fully agree with its interpretation, this again provides a good departure point to help the researcher start the writing process.
In Fig. 6, I illustrate the prompt asking Chad for inferential statistics. Instead of asking for specific analysis, I asked ChatGPT to explore various relationships and to report statistical significance in an interpretative paragraph.
As can be seen from the response, the GAI accurately interprets the p-values, though it fails to report the effect sizes. I continued prompting beyond this point to obtain all the information I needed. Remember that just like a real assistant, you should keep speaking with the bot until you receive everything you need. See the appendix for the other prompts used.
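Because the Chatbot may omit effect sizes, it helps to know how one could be computed and checked by hand. The sketch below assumes, purely for illustration, a two-group non-parametric comparison of the kind tested with a Mann-Whitney U test, and computes the rank-biserial correlation as its effect size. The ratings are hypothetical, and the choice of test is an assumption, not a description of the analyses shown in Fig. 6.

```python
def rank_biserial(group_a, group_b):
    """Rank-biserial correlation for two independent groups.

    Computed from the Mann-Whitney U statistics as (U_a - U_b) / (n_a * n_b),
    where each tied pair counts as half a win for each group.
    """
    if not group_a or not group_b:
        raise ValueError("Both groups need at least one observation.")
    u_a = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    n_pairs = len(group_a) * len(group_b)
    u_b = n_pairs - u_a
    return (u_a - u_b) / n_pairs


# Hypothetical ordinal ratings from two groups of students.
group_a = [4, 5, 3, 5]
group_b = [2, 3, 1, 4]
effect = rank_biserial(group_a, group_b)
print(f"Rank-biserial correlation: {effect:.2f}")
```

The value ranges from -1 to 1, with 0 indicating no tendency for either group to score higher. Asking the Chatbot to report this statistic alongside each p-value, and spot-checking one or two results in this way, is a simple safeguard.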
Mixed methods data analysis with ChatGPT
A wide range of MMR analyses can be aided by generative AI, including concurrent triangulation, sequential explanatory and exploratory analysis, transformative designs (here, you must supply the paradigm) and multi-stage evaluations. ChatGPT-4 can also produce more complex figures and tables based on its own or your analysis, for example, side-by-side joint displays, statistics displayed by theme or vice versa and interview questions joint displays [70]. I have also created some infographics with ChatGPT-4, but these still require much input from the researcher, and it may be easier just to create your own in a different software environment. In ChatGPT-3.5, you can paste your findings from both the quantitative and qualitative analysis and work step by step with the bot to integrate analysis and output. To successfully use ChatGPT for mixed methods analysis, I recommend that you separately analyse the quantitative and qualitative data and combine the findings into a single document before asking the bot to assist with MMR analysis. In Fig. 7, I prompt ChatGPT to use my combined quantitative and qualitative findings document and conduct a sequential explanatory analysis using this study's theoretical framework. I specified both the analytical and the theoretical (or conceptual) frameworks as this yields the best results. Researchers should use comprehensive frameworks when conducting and analysing MMR data, as shown by Corrigan and Onwuegbuzie [71].
The output received in Fig. 7 was sparse, but further prompting into aspects reported by Chad led to more valuable outputs.
Based on the findings, I asked ChatGPT to create a joint table (see Fig. 8). While I would not use the table in its current format in a publication, I would use this summary to guide my writing and inform the creation of other graphics.
I recommend using generative AI to create initial tables, figures and text based on the quantitative and qualitative data. The final integration should be based on the aims of the study. When asking ChatGPT to create side-by-side tables, check them against your own version to make sure everything relevant is included. Write your interpretations based on your research aims and objectives; do not rely solely on the bot.
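The check of a ChatGPT-generated side-by-side table against your own version can be partly automated. The sketch below is a minimal illustration under stated assumptions: each row of a joint display is represented as a dictionary with hypothetical `theme`, `quantitative`, and `qualitative` keys, themes are matched by name only, and the cell contents are placeholders rather than findings from this study.

```python
def check_joint_display(own_rows, chatgpt_rows):
    """Compare a ChatGPT joint display with your own version.

    Returns two sorted lists of lower-cased theme names: themes missing from
    the ChatGPT table, and ChatGPT rows lacking a quantitative or qualitative
    entry.
    """
    def key(row):
        return row["theme"].strip().lower()

    chatgpt_by_theme = {key(row): row for row in chatgpt_rows}
    missing = sorted(key(row) for row in own_rows if key(row) not in chatgpt_by_theme)
    incomplete = sorted(
        theme
        for theme, row in chatgpt_by_theme.items()
        if not row.get("quantitative") or not row.get("qualitative")
    )
    return missing, incomplete


# Placeholder rows; replace the strings with your actual findings.
own_rows = [
    {"theme": "Student support", "quantitative": "stat A", "qualitative": "quote A"},
    {"theme": "Peer influence", "quantitative": "stat B", "qualitative": "quote B"},
    {"theme": "Career goals", "quantitative": "stat C", "qualitative": "quote C"},
]
chatgpt_rows = [
    {"theme": "student support", "quantitative": "stat A", "qualitative": "quote A"},
    {"theme": "Career goals", "quantitative": "", "qualitative": "quote C"},
]
missing, incomplete = check_joint_display(own_rows, chatgpt_rows)
print("Themes missing from the ChatGPT table:", missing)
print("Rows lacking one strand of evidence:", incomplete)
```

Because ChatGPT often renames themes, exact name matching will over-report missing themes; treat the output as a list of rows to inspect, not as a verdict.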
Reliability and validity of generative AI application
In Table 2, I compare the original analysis done by the research team and the outputs received from ChatGPT-4's Advanced Data Analysis Chatbot.
Table 2
Comparison of original quantitative and qualitative analysis with ChatGPT
| Type | Original (Researchers) | ChatGPT version |
| --- | --- | --- |
| Qualitative interviews | Theme: The academic program | Theme: Academic and Personal Growth |
| Qualitative interviews | Theme: Student support | Theme: Transition Challenges |
| Qualitative interviews | Theme: The role of peers in student success | Theme: Importance of Social Connections |
| Qualitative interviews | Theme: Behavioral and attitudinal factors | (none) |
| Quantitative data analysis | Demographic profile (Table & Text) | Demographic profile (Table & Text) |
| Quantitative data analysis | Recommended non-parametric analysis | Recommended non-parametric but also other not applicable analysis |
| Quantitative data analysis | Non-parametric analysis and interpretation | Non-parametric analysis done but less detail was provided |
| Open-ended survey questions | Theme: Intrinsic Psychological Motivation | Theme: Personal Interest and Passion |
| Open-ended survey questions | Theme: Engineering as Social Good | (none) |
| Open-ended survey questions | Theme: Focused on career opportunities | Theme: Career aspirations |
| Open-ended survey questions | Theme: Intrinsic behavioural motivation | Theme: Influence of Technology and Innovation |
| Open-ended survey questions | (none) | Theme: Curiosity and Problem-Solving |
When the original qualitative analysis done by the team of researchers is compared with the ChatGPT themes, most themes are similar. The content of the themes is also consistent; the most significant difference relates to the extent of coverage. ChatGPT found most of the same themes but gave shorter, less rich write-ups. ChatGPT also missed a theme we identified (behavioural and attitudinal factors).
In terms of the quantitative analysis, the Chatbot suggested creating demographic profiles (for example, gender, age, and ethnicity), mean distributions (for continuous variables) and summaries for the categorical variables. Other suggestions included cross-tabulations, non-parametric analysis, factor analysis (for constructs in the survey) and regression analysis to estimate predictors of the continuous variables. I had considered all of these suggestions myself, and they are reasonable. Further suggestions, which I had yet to consider, included cluster analysis to group the students (this was not feasible with the data set), creating profiles within the engineering student population, and outcomes based on educational preferences. These last suggestions were exciting but irrelevant for my data set, once again pointing to the fact that the scholar needs to understand their data and the statistical possibilities. The demographic tables, figures, and interpretations provided by ChatGPT were of high quality and a good match for our own demographic profiles. When I asked ChatGPT to suggest inferential analysis, it did suggest the same non-parametric tests that I initially used, but it also suggested less useful and sometimes irrelevant analysis.
While descriptive statistics can quickly and reliably be done using ChatGPT, researchers should use their knowledge and judgment of statistics when deciding which tests to run in generative AI. The outputs are deemed acceptable and accurate and could be used in a research report or journal article, with the caveat that the text interpretation should be rewritten. Look for inaccuracies or exaggerations in the Chatbot’s output; for example, in my outputs, I found the claim that the sample is diverse to be an exaggeration.
The open-ended survey questions revealed most of the same themes, but again, ChatGPT's descriptions and unpacking of the themes were shallow when compared to human analysis. ChatGPT also missed a theme when compared to the human coders. At the same time, Chad had a theme that the humans did not identify. This comparison between human and artificial analysis can create avenues for identifying additional themes. ChatGPT contributes to analysis, but human judgement and insight should also be present throughout the process.
Troubleshooting
As is the case with all technology, ChatGPT can experience technical difficulties. To troubleshoot, first clear your browser's history, cookies, and cache, selecting the "all time" range. Next, restart the session. Lastly, try a different browser (for example, Firefox works well) or disable all the Chrome extensions.