The majority of the participants in this study were anesthesiologists from the anesthesia department of the University Hospital Zurich, a maximum care hospital with around 30,000 surgical procedures per year. Another participant came from the anesthesia department of the Kantonsspital Winterthur, Switzerland, a teaching hospital with about 10,000 surgical procedures per year.
In both study steps, all participants were attending physicians, resident physicians, or nurse anesthetists. All attending physicians held an anesthesia board certification, and all participating nurses had completed their anesthesia specialization training. We recruited participants who responded to institutional e-mail invitations and additionally asked co-workers in person to participate, according to their availability.
Most participants knew the data collectors personally before the study, as they worked in the same departments. We explained the purpose of the study, namely the evaluation of the novel avatar-based patient monitoring technology, in the invitation e-mails and, when approaching a participant directly, in person.
Part I: Qualitative analysis of interview answers
Study setup and data collectors
Interviews were conducted at the end of data collection sessions for the development of a novel visualization technology for patient monitoring (Visual Patient Technology). The methodology and the results of that study have already been published separately.19 Before each data collection session, participants also completed a survey of personal information, e.g., age, sex, and experience (in years) with patient monitors.
Two doctors conducted the interviews. Physician one (CBN) was a senior physician at the Institute of Anesthesiology of the University Hospital Zurich with over 20 years of clinical anesthesia experience. He worked 100% clinically at the time of the study, had completed advanced Good Clinical Practice (GCP) courses, and had rich experience in patient safety research projects.
The second data collector (LH) was a junior doctor in the second year of his residency in anesthesia. During this study, he worked in a 50% clinical and 50% scientific capacity at the University Hospital Zurich. Previously, he had completed an entry-level GCP course at the clinical trials center of the University of Zurich.
Description of the interview
We conducted the data collection sessions and interviews in various quiet rooms of the University Hospital Zurich, where the data collectors and the participants were undisturbed.
The question we asked the participants was: "What are the most common problems with patient monitoring in your daily work?"
Before the interviews, the data collectors encouraged the participants to answer the question openly with anything that came to their minds. Otherwise, no prompts or instructions were given. There were no time limits.
As the subjects shared their thoughts, the data collectors typed notes into a Microsoft Word document (Microsoft Corp., Redmond, WA, USA) on an Aspire V15 Nitro laptop computer (ACER, Inc., Taipei, Taiwan).
The transcript was visible to participants during data entry and was provided at the end of the interview for comments and corrections.
For analysis, we translated the original answers from German to English and unified words of similar meaning to make it easier to count and encode words. The matched words were tangling = cable clutter; not intuitive = non-intuitive; use, handling = operation. With the resulting English translation of the answers, we performed a word count (Table 2 in Supplementary file 1) and created a tag cloud (Figure 1) using Wordle.net. We omitted common English words like ‘and’ or ‘the’ in the word counts and the tag cloud.
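The unification and counting step described above can be sketched as follows. This is a minimal illustration, not the authors' actual script: the answers, stop-word list, and helper name `count_words` are hypothetical, and only the three term replacements named in the text are mirrored here.

```python
from collections import Counter
import re

# Hypothetical translated answers; the real responses are in Supplementary Table 1.
answers = [
    "cable clutter is a constant problem",
    "the operation of the monitor is non-intuitive",
]

# Unification of terms of similar meaning, as described in the text.
REPLACEMENTS = {
    "tangling": "cable clutter",
    "not intuitive": "non-intuitive",
    "handling": "operation",
}

# Small illustrative stop-word list; the study omitted common English words.
STOPWORDS = {"and", "the", "is", "a", "of", "or"}

def count_words(texts):
    """Unify synonymous terms, then count remaining non-stop-words."""
    counts = Counter()
    for text in texts:
        text = text.lower()
        for old, new in REPLACEMENTS.items():
            text = text.replace(old, new)
        for word in re.findall(r"[a-z\-]+", text):
            if word not in STOPWORDS:
                counts[word] += 1
    return counts

print(count_words(answers).most_common(5))
```

The resulting frequency table is what a tag-cloud generator such as Wordle.net consumes: word sizes are scaled to these counts.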
Study author DWT, a senior physician with previous experience in patient safety research who had not conducted any of the interviews himself, coded the respondents' interview answers.
Major topics and sub-topics were derived from the interviewees' responses by applying a two-step process: deductive coding based on word counts, followed by inductive (free) coding based on common topics that emerged from the answers but had not been identified by word counting.
We present and discuss these topics and sub-topics with examples in the results section and in Table 2. Additionally, we provide a figure of the coding tree. The complete dataset with unformatted original answers, the stepwise translation, and correction as well as the coding of the answers are provided in Supplementary Table 1.
Some participants gave two interviews because they took part in more than one cycle of the technology's systematic development process, including the interviews. In the analysis of the responses, we combined both responses into one and therefore counted these participants only once, as if they had given a single interview.
For data management, we used the software ATLAS.ti 8 (Scientific Software Development GmbH, Berlin, Germany) and Microsoft Word.
Part II: Quantitative analysis of statement ratings
For the second, quantitative part of this study, we conducted a field survey in which the participants rated their agreement or disagreement with the qualitative statements.
Description of the field survey
In the field survey, we asked the participants to rate a total of six statements, one for each of the main topics identified in the qualitative analysis of the interview responses. We considered these statements relevant for a better understanding of care providers' problems with patient monitoring.
The participants evaluated these statements on five-point Likert scales with the following categories: "strongly disagree," "disagree," "neutral," "agree," and "strongly agree."
We present the results of the field survey for each statement separately as percentages as well as medians and interquartile ranges (IQR). We used the Wilcoxon signed-rank test to determine whether the sample medians differed significantly from neutral. We considered a median difference from neutral practically significant and a p-value of < 0.05 statistically significant.
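This analysis can be sketched as a one-sample Wilcoxon signed-rank test against the scale midpoint. The sketch below uses invented ratings coded 1 ("strongly disagree") to 5 ("strongly agree"); it is an illustration under those assumptions, not the study's actual analysis script.

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical ratings for one statement, coded 1-5 (the real data differ).
ratings = np.array([4, 5, 4, 3, 5, 4, 2, 4, 5, 4])
NEUTRAL = 3  # midpoint of the five-point Likert scale

# Descriptive statistics: median and interquartile range.
median = np.median(ratings)
q1, q3 = np.percentile(ratings, [25, 75])

# One-sample Wilcoxon signed-rank test of the differences from neutral;
# zero_method="pratt" keeps ratings equal to neutral in the ranking
# instead of silently discarding them.
stat, p = wilcoxon(ratings - NEUTRAL, zero_method="pratt")

print(f"median {median} (IQR {q1}-{q3}), p = {p:.3f}")
```

With many tied Likert values, SciPy falls back to a normal approximation of the test statistic, which is the usual behavior for this kind of ordinal data.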
Through the evaluation of these statements, we aimed to quantify the participants' agreement or disagreement with statements derived from the interviews (Part I of the study) with a higher level of evidence than purely qualitative description.