Historically, qualitative research in SE has grappled with challenges like the time-intensive nature of the work, limitations to scalability due to its manual nature, and the inherent subjectivity that qualitative methodologies can sometimes entail.
Time-intensive Work
Conducting qualitative research often requires intensive data analysis, which can be time-consuming. LLMs can help to automate or expedite parts of these processes. For example, they could help to make sense of large amounts of textual data, identify themes and patterns within data, and generate initial codes or categories. Such technical assistance could significantly speed up the data analysis process and allow researchers to handle larger datasets, thereby scaling qualitative analysis in ways hardly possible through a commensurate amount of manual effort.
Generalizability
Qualitative research is hard to generalize to wider populations outside the originally studied context, which is typically a relatively narrow phenomenon. From a constructivist worldview, such generalization may even be undesirable. However, the use of AI-based models and advanced natural language processing, such as those offered by LLMs, can help improve the relevance and generalizability of qualitative findings, such as descriptive findings, taxonomies, and theories, by expanding the contexts studied (Hoda 2021).
Consistency
Variations in qualitative data analysis are expected to exist across different researchers, but consistency can also be an issue for individual researchers. Depending on several factors, including but not limited to external and personal circumstances, achieving high levels of human consistency is a known challenge (Gentles et al. 2015; Watson 2006). On the other hand, LLMs, being computing entities, can process and analyze data in a consistent way, provided that the prompts are kept consistent. Improved consistency is likely to lend itself to better repeatability of the process and higher reproducibility of the research outcomes. This may be particularly desirable from a positivist perspective.
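As an illustration of how consistency might be checked rather than assumed, the sketch below compares the codes assigned to the same excerpts in two separate runs of an identical prompt. Cohen's kappa is our choice of agreement measure here, not one prescribed by the works cited above, and the code labels and the two runs are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical codes."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Proportion of excerpts that received the same code in both runs.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each run's code frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both runs used one identical code throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes assigned to the same six excerpts in two runs.
run_1 = ["tooling", "process", "tooling", "people", "process", "people"]
run_2 = ["tooling", "process", "people", "people", "process", "people"]
kappa = cohens_kappa(run_1, run_2)
```

The same function could equally compare an LLM's codes with those of a human coder; which threshold counts as acceptable agreement remains a judgment for the research team.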
Subjectivity
While it may be impossible or even undesirable to eliminate human subjectivity from qualitative research, LLMs could potentially add a further layer to the analysis. For example, the use of LLMs can help a team of qualitative researchers discuss and agree on the concepts emerging from their individual analyses. Furthermore, the concepts generated by LLMs can act as a ‘third party’ reference to help address and reconcile differences emerging from personal beliefs, experiences, or emotions. However, it seems early and somewhat naive to suggest that an LLM can act as an objective baseline or a source of a deciding ‘expert opinion’. LLMs, like humans, are known to harbor their own set of biases based on the training data and parameters that can influence their inference logic when it comes to qualitative research (Navigli, Conia, and Ross 2023). With rapid enhancements in LLM capabilities, these aspects can be reexamined in the future.
New Frontiers, New Challenges
While LLMs may seem to be a panacea for many traditional qualitative research issues, they bring with them a set of unique challenges. We summarize these below.
Ethical and Privacy Concerns
Incorporating LLMs into data analysis poses ethical and privacy challenges, especially with sensitive data. Ethical issues include ensuring data consent, proper anonymization to enable de-identification, and addressing biases that AI may perpetuate (Arora, Grundy, and Abdelrazek 2023; Ebert and Louridas 2023). These concerns necessitate a responsible AI framework that respects individual privacy and data rights. For ethical usage, Nguyen-Duc et al. (2023) recommend integrating AI with an awareness of ethical implications and privacy risks, such as by using AI to enhance rather than replace human decision-making, and keeping sensitive raw data local to avoid exposure. Ozkaya (2023) further suggests robust data governance to ensure AI applications adhere to ethical standards and privacy regulations, balancing AI's potential with necessary oversight.
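To make the idea of keeping sensitive raw data local more concrete, the following minimal sketch replaces e-mail addresses and a researcher-supplied list of participant names with placeholders before any text leaves the researcher's machine. It is an assumption-laden illustration, not a method from the cited works: the regular expression, the placeholder format, and the sample sentence are all hypothetical, and pattern-based redaction alone does not amount to proper anonymization.

```python
import re

# Deliberately simple e-mail pattern; real data may need a stricter one.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def pseudonymize(text, known_names):
    """Replace e-mail addresses and listed participant names with placeholders.

    Returns the redacted text and a placeholder-to-name mapping that should
    stay on the researcher's own machine.
    """
    redacted = EMAIL_RE.sub("[EMAIL]", text)
    mapping = {}
    for index, name in enumerate(known_names, start=1):
        placeholder = f"[P{index}]"
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        redacted = pattern.sub(placeholder, redacted)
        mapping[placeholder] = name
    return redacted, mapping


# Hypothetical interview excerpt.
sample = "Alice Smith (alice.smith@example.com) said the build was flaky."
clean, key = pseudonymize(sample, ["Alice Smith"])
```

Only `clean` would be passed to an LLM in this sketch; indirect identifiers such as project names or job titles would still need manual review.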
Model Biases
Like all machine learning models, LLMs can have inherent biases based on the data they were trained on, which can be flawed or insufficient. This could potentially skew the analysis or conclusions drawn from their use in qualitative research. For example, in SE qualitative research, if an LLM is trained on data that predominantly consists of contributions from male developers, it may inadvertently downplay or overlook the communication styles, coding preferences, or problem-solving approaches more common among female developers or those from underrepresented groups. In such cases, researchers have the responsibility to be aware of and acknowledge the inherent biases in the underlying data on which the LLMs are trained, as part of the limitations of their research.
Lack of Contextual and Philosophical Understanding
While LLMs can process and generate text based on patterns learned, they lack a true understanding of the context, which is crucial in qualitative research. This could lead to oversights and misinterpretations. For example, in an SE qualitative study analyzing developer communication on issue trackers, an LLM might interpret technical jargon or project-specific slang literally, missing the nuanced meaning intended by the developers. Similarly, while LLMs could identify and summarize discussions on a given research topic from various sources, articles, and grey literature, they might not fully grasp the subtleties of concerns that require a deeper philosophical understanding and contextual awareness, which human researchers provide. In such cases, researchers should pay special attention to any missing or misinterpreted contexts.
Dependency on Technology
There is a risk of becoming overly dependent on technology for research. While LLMs can assist in data analysis, they should not replace the human element of research, which includes critical thinking, contextual understanding, and ethical judgment (Bano et al. 2023). To educate and train the next generation of qualitative researchers, it is important not to rely excessively on augmented research technologies such as LLMs. We elaborate further on the level of expertise of researchers later in this paper.
Quality Control
Ensuring the quality and accuracy of the results generated by LLMs can be challenging. Researchers need to be vigilant and critical when interpreting the outputs of LLMs. For example, ChatGPT is known to be prone to hallucinations, that is, instances where LLMs generate inaccurate or entirely fabricated information. Failing to check for inaccurate and fabricated information generated by LLMs can have serious consequences for researchers. To address the issue of hallucinations, the involvement of human researchers is imperative. As pointed out by Rudolph et al. (2023) and Alkaissi and McFarlane (2023), these hallucinations can lead to misinterpretation of research outcomes, compromise the validity of results, and introduce bias or error. To counteract this, researchers must scrutinize, verify, and interpret the outputs of LLMs meticulously, ensuring that the conclusions are aligned with the actual context and maintain the integrity of the research. This human intervention is necessary not only for validation but also to continually refine and calibrate the models, thereby improving their understanding and minimizing potential drawbacks (Watkins 2023).
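One narrow check that can support, but not replace, this human scrutiny is to confirm that every quotation an LLM attributes to a participant actually occurs in the source material. The sketch below is a minimal illustration under our own assumptions: the function names, the transcript, and the quotes are hypothetical, and matching is verbatim apart from case and whitespace, so paraphrased evidence would still need to be checked by hand.

```python
import re


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return re.sub(r"\s+", " ", text).strip().lower()


def unsupported_quotes(quotes, transcript):
    """Return the quotes that cannot be found verbatim in the transcript."""
    haystack = normalize(transcript)
    flagged = []
    for quote in quotes:
        needle = normalize(quote)
        # Empty quotes are flagged too, since they support nothing.
        if not needle or needle not in haystack:
            flagged.append(quote)
    return flagged


# Hypothetical transcript and LLM-reported supporting quotes.
transcript = "We skip code review when the deadline is close.  Nobody likes it."
llm_quotes = [
    "we skip code review when the deadline is close",
    "management forbids code review",
]
flagged = unsupported_quotes(llm_quotes, transcript)
```

Any quote returned in `flagged` would then be traced back to the data, or discarded, by a human researcher.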
Reproducibility
As LLMs are continuously updated, and old models are deprecated, the ability to reproduce an analysis with the same precision diminishes over time, a phenomenon known as model drift. Researchers may provide exhaustive details on their methodology, including data sets, prompts, parameters, and the versions of models used, but this does not guarantee that the same analysis can be reproduced in the future by LLMs. Unlike human researchers, where insights and analytical reasoning can be revisited or clarified, LLMs do not offer the possibility to revisit the reasoning behind their outputs once the model version is no longer available.
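Although such documentation cannot guarantee reproducibility, recording it systematically at least makes clear what was run. The sketch below shows one possible way to capture the model version, parameters, prompt, and a fingerprint of the dataset in a single record. The field names, the model identifiers, and the parameter values are hypothetical and are not drawn from any particular LLM provider's API.

```python
import hashlib
import json


def sha256_of(text):
    """Hex digest used as a fingerprint of a prompt or dataset."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_run_manifest(model_name, model_version, prompt, parameters, dataset_text):
    """Serialize the details of one analysis run as a JSON string."""
    record = {
        "model_name": model_name,
        "model_version": model_version,
        "parameters": parameters,
        "prompt": prompt,
        "prompt_sha256": sha256_of(prompt),
        # Only the hash is stored, so sensitive raw data stays out of the record.
        "dataset_sha256": sha256_of(dataset_text),
    }
    # Sorted keys keep the output identical for identical inputs.
    return json.dumps(record, sort_keys=True, indent=2)


# Hypothetical run details.
manifest = build_run_manifest(
    model_name="example-llm",
    model_version="hypothetical-v1",
    prompt="Assign one code to each excerpt.",
    parameters={"temperature": 0.0, "max_tokens": 256},
    dataset_text="excerpt one\nexcerpt two",
)
```

A record like this could be archived alongside the raw LLM outputs, which remain the only faithful trace of the analysis once the model version is retired.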
Context of Related Work
Integrating an LLM’s data analysis within the broader context of related work poses a significant challenge, primarily because the model cannot access the entirety of potentially relevant literature, owing to constraints on data availability and to access rights restricted by paywalls. This limitation hampers the LLM’s ability to draw comprehensive connections and insights that are informed by the existing research, potentially narrowing the scope and depth of its analytical outputs. In the future, if LLMs are capable of handling large quantities of raw data from literature along with the context of related work, this could lead to augmenting systematic literature reviews (Kitchenham 2004) with LLMs.
Critical Thinking
Developing critical thinking in LLMs is a complex challenge, as it involves the model's exposure to a variety of data, including incorrect statements, to enhance its evaluative capabilities (Emmert-Streib 2023). To ensure LLMs are exposed to such a range of data, researchers could deliberately include datasets with known errors or contradictory information during the training phase. This method could potentially help LLMs learn to discern and evaluate the accuracy of information they analyze. However, this approach also raises concerns about how to effectively teach LLMs to recognize and appropriately handle incorrect information without perpetuating or amplifying these errors in their outputs. Currently, it’s unclear how critical thinking might be incorporated in LLMs when analysing qualitative data.
Intellectual Property (IP)
IP concerns are another dimension to consider in the use of LLMs in research. The contribution of LLMs’ responses and analyses to the creation of a research output could raise questions about authorship, such as whether ChatGPT should be credited as a co-author, reflecting the model's role in data processing and knowledge generation (Balel 2023). Another layer of complication is the copyright and IP of the data on which LLMs are trained. Determining the extent of LLMs’ contribution, and that of underlying sources, and its implications for IP rights and academic recognition is an ongoing debate in the research community (Polonsky and Rotman 2023).
The Evolving Role of the Human Researcher
Amid the LLM revolution, the role of the human researcher is undergoing a nuanced shift.
Ensuring Ethical Practices
Researchers must ensure that their studies are conducted ethically. This includes obtaining informed consent from participants, ensuring privacy and confidentiality, and treating the data in a way that respects the rights and dignity of the participants/sources (Watkins 2023).
Prompt Engineering
Prompt engineering is emerging as a crucial skill, underscoring the fact that the quality of LLM outputs hinges significantly on the inputs it receives. It's important to note that prompt engineering can also be a stage where researchers might unintentionally introduce bias, as the way questions are framed can influence the direction and nature of the LLM's response, potentially reinforcing certain perspectives or excluding others.
Defining Research Questions
Although LLMs can be used to brainstorm research topics and ideas, the researcher must define the research questions and objectives. An LLM can help process data, but LLMs do not have intellectual curiosity, intention, motivation, or enough information to set research directions, which will depend on the researcher.
Data Collection
While LLMs can help process and analyze large amounts of data and, now that they have web search capabilities, can collect data as well, it is still the researcher's responsibility to collect the data in certain qualitative research contexts such as interviews or surveys. However, in some instances where it is extremely difficult to recruit real participants for research, e.g., patients with chronic ailments in the health domain, LLMs can be used to simulate and role-play certain personas for data collection. The known limitations of using personas in research, as well as the lack of lived human experience in simulated data, will continue to be a challenge.
Interpreting Outputs of the LLM
An LLM can proficiently identify patterns and themes within a dataset, presenting a synthesized analysis. However, it remains the domain of a human researcher to ascribe meaning to these findings, contextualizing them within the framework of the research objectives. One might wonder why the task of interpretation cannot also be delegated to an LLM. The reason lies in the nuanced understanding and subjective judgment required—qualities that are distinctly human and currently beyond the ability of LLMs. Additionally, while it is possible for one LLM to analyze another LLM’s output (Jiang, Ren, and Lin 2023), this still does not replace the depth of insight and complex reasoning a human brings to the interpretation of research data.
Quality Checking
It is important for researchers to check the quality of the work done by the LLM. For instance, they need to look for biases in the analysis and ensure that the LLM is correctly interpreting and coding the data.
Theorizing
Developing rich theories that are grounded in evidence requires a deep understanding of the data, the ability to see connections and patterns, and the creativity to formulate a theory. These are all skills that are currently beyond the reach of LLMs.
Writing and Dissemination
Finally, the researcher is responsible for writing up the results of the study and disseminating them, and is generally accountable for the research and its results. This includes presenting the findings in a way that is understandable and useful to others and publishing or sharing the results in relevant forums. For example, the Journal of Information and Software Technology allows the use of Generative AI for improving readability and language, provided that authors give explicit acknowledgment statements and remain accountable for their produced work.
The Promise of LLMs Across Varied Research Expertise
Qualitative research is often rooted in a constructivist paradigm emphasizing the non-replicable human capacity to understand and contextualize social phenomena (Easterbrook et al. 2008; Hoda 2021). The constructivist paradigm in SE research is concerned with socio-technical realities that are not objective but constructed through human experiences and contexts. This paradigm values the researcher's role in interpreting data, where their involvement and perspective are considered integral to the analysis, especially in methods like ethnography, participant observation, and grounded theory.
Qualitative research in SE also offers unique advantages in exploring complex socio-technical processes and aiding in theory construction. It can reveal underlying reasons behind intricate socio-technical dynamics and is often used to generate new research questions and insights. These aspects underscore the necessity of the human element in data interpretation, despite the analytical capabilities of LLMs.
The expertise of a researcher is crucial across all research modalities, including the application of LLMs, as it guides the critical interpretation of data, the strategic questioning that leads to deeper insights, and the contextual understanding that LLMs alone cannot provide.
Further to the opportunities and challenges presented by LLMs in SE qualitative research discussed above, we present our collective thoughts on how these may vary by the experience level of the researchers. Firstly, and most importantly, with the introduction of LLMs, ethical considerations come to the fore. It is crucial for researchers at all stages to understand and uphold ethical practices, especially concerning data privacy, possible plagiarism, and potential biases that the LLMs might introduce or perpetuate (Treude and Hata 2023).
For novices in qualitative research in SE, LLMs can be both an assistive tool and a challenge. LLMs can be used to sift through extensive datasets, identify initial patterns, and assist in some basic data coding, making the initiation phases smoother. However, novice researchers must be cautious. Relying heavily on LLMs without understanding the underlying domain of inquiry or the principles of qualitative data analysis can compromise the quality of research outputs and their own capabilities as researchers. It is essential to strike a balance to ensure data integrity and true learning of the research process.
Intermediate researchers will find LLMs useful as they dive into more complex data. LLMs can aid in identifying recurring themes and intricate patterns, potentially elevating the quality of the analysis through their comprehensive approach. However, there is a potential risk of overreliance on the technology, leading to overconfidence in automated outputs. It is crucial for researchers to maintain a critical eye, ensuring that their growing reliance on LLMs does not overshadow the need for rigorous human oversight and contextual interpretation that their increasing experience affords them.
For seasoned qualitative researchers, LLMs present an opportunity to explore new breadth and depth within data analysis. For example, LLMs can be used to scale qualitative research beyond what is typically possible through human effort. Experienced qualitative researchers can boost their practice by taking on larger datasets for analysis, training bespoke LLMs where accessible, and developing descriptive findings, taxonomies, and theories that capture a wider range of contexts and are, therefore, more widely generalizable. But with this deeper dive comes a heightened responsibility for research integrity and accountability. The research outputs, while possibly enhanced by LLMs, must be thoroughly reviewed for inadvertent errors or biases. Furthermore, while LLMs can handle the heavy lifting of data analysis, experts must remain fully accountable for the interpretations and conclusions drawn.
For all levels of researchers, LLMs can expedite the data processing phase, but it is paramount that researchers do not bypass the essential learning and understanding phases of the research process. LLMs should be tools to enhance the process, not shortcuts that diminish the depth and richness of qualitative research in software engineering. The use of LLMs should not eclipse the importance of human judgment and insight.