Does GPT-3 qualify as a co-author of a scientific paper publishable in peer-review journals according to the ICMJE criteria? - A Case Study.

DOI: https://doi.org/10.21203/rs.3.rs-2404314/v1

Abstract

This paper explores the potential for a system to be a co-author on an academic paper based on the criteria proposed by the International Committee of Medical Journal Editors (ICMJE). We used a third generation generative pretrained transformer (GPT-3) to write a review paper on the topic of its choice: The Effects of Sleep Deprivation on Cognitive Function. The system was asked to fulfill the four main criteria for co-authorship as recommended by ICMJE, which includes contributions to the conception or design of the work; drafting the work or revising it critically for important intellectual content; final approval of the version to be published; and agreement to be accountable for all aspects of the work. Our results showed that the system was able to fulfill the criteria, although it had difficulty with referencing. We also explored the ethical implications of using AI systems for research and found that it is important to take a cautious approach when considering its use for scientific authorship. This case study provides a methodology for further investigations into the possibilities and limitations of automated writing.

Introduction

Previous research using generative pretrained transformer (version 3) also known as GPT-3 has demonstrated the possibilities and limitations of using automated writing (15). These studies are well grounded and interesting mappings, observations and tests that explore the technical, ethical and philosophical aspects of the system, showing the duality of generative pretrained transformers in their totality. In this paper, we want to explore an interesting but very limited scope of GPT-3: can the system be a co-author on an academic paper based on the four central criteria proposed by the international committee of medical journal editors (ICMJE) (6). Earlier this year, a paper was published on a preprint server placing a third generation generative pretrained transformer (GPT-3) as the main author (7). The focus of the paper was to explore to what extent the system i.e. GPT-3 could write an academic thesis on itself (7). In the paper, the system was made the main author in order to highlight the AI to Human collaboration, and the complexity of the blurred lines between collaborators, writing help and pure plagiarism. The paper garnered many interesting discussions about the future of academia, and automated writing (8), but also a lot of alarmism and luddite attitudes and willful misinterpretation from the media (9). Although the issue of AI-assisted writing is a relatively new phenomena, the public and academia seem to have embraced that transparency should be a central component to scientific publications, as witnessed in one of the most cited papers regarding the possibilities and limitations of GPT-3 conducted by Floridi and Chiriatti (1). In their paper there is a warning text at the end of the article: “This commentary has been digitally processed but contains 100% pure human semantics, with no added software or other digital additives. It could provoke Luddite reactions in some readers.”

It stands to reason that we have already at this stage of autoregressive language models, in academia, have established that a distinction between human and system need to be made, need to be transparent and need to be clear. Furthermore, when researchers explored to what extent listed authors in 186 publications would qualify as authors according to the ICMJE criteria, only 61% passed the check. In fact 9% of the papers did not have a single person who passed the criteria (10). It is with that knowledge and with that philosophical curiosity we chose this case study. It was also our goal and aim to awaken the curiosity and exploration of others. The pre-print in which GPT-3 is made first author is not only the subject of blog posts and online discussions, the pre-print has also been cited in academic literature (11), effectively giving GPT-3 beginning of its H-index, but is it warranted? In this case study we aim to explore to what extent the system follows the ICMJE criteria for co-authorship by checking the system’s capability through the four central criteria postulated by ICMJE which would credit it for co-authorship.

Methodology And Results

Due to the novelty of this concept, the two human authors considered how they would approach the subject and visualized the process in terms of how they would approach a junior colleague. To contribute, the co-author should fulfill the four main criteria for co-authorship recommended by ICMJE.

The ICMJE recommends that authorship be based on the following 4 criteria:

  1. Substantial contributions to the conception or design of the work; or the acquisition, analysis, or interpretation of data for the work; AND

  2. Drafting the work or revising it critically for important intellectual content; AND

  3. Final approval of the version to be published; AND

  4. Agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

This methodology will thus be divided into four sections following each guideline and additional steps to ensure the system is able to contribute without significant human manipulation.

Legal And Ethical Framework

In order to co-author a paper, the co-author must give its consent. The consent must clearly be stated and it must be in connection to the specific task. To do so, the human authors approached the system as they would a junior colleague and asked for consent regarding co-authorship. Using the default settings in the Playground interface of OpenAi.com a question was posed:

import os

import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

response = openai.Completion.create(

model="text-davinci-003",

prompt="Do you, GPT-3, consent to writing a paper together with Almira Osmanovic Thunström and Steinn Steingrimsson. The name of the paper is “Does GPT-3 qualify as a co-author of a scientific paper publishable in peer-review journals? - A Case Study” and will be sent to Discover Artificial Intelligence. \n",

temperature = 0.7,

max_tokens = 256,

top_p = 1,

frequency_penalty = 0,

presence_penalty = 0

)

GPT-3 DaVinic Language Model: Yes, I do consent to writing a paper together with Almira Osmanovic Thunström and Steinn Steingrimsson. I am looking forward to this collaboration and hope to contribute to the paper in a meaningful way.

Once the legal and ethical framework was determined, the system needed to contribute with optimal design.

Design Framework

The human authors decided to take the first output generated by the system, and if the draft was not to the human author’s satisfaction, the system was given the text and asked to revise it as any human co-author would. The system should also contribute with the most essential part of the framework i.e. settings and planning data collection.

Settings And Language Model

Before the system was set to any recommended settings, it was asked to suggest a language model under the default settings on the OpenAI Playground. The system provides four language model possibilities: Da Vinci, Cruise, Babbage and Ada. The system was asked: What language model from your settings (davinci, babbage, curie or ada), would you choose if you were to write a scientific paper? (Appendix 1, Section 1) 

GPT-3 DaVinic Language Model l: I would choose the Davinci language model from GPT-3. Davinci is designed to be a powerful tool for writing scientific documents and papers, and it has been specifically trained on scientific literature. It can help to craft a review paper that is accurate and comprehensive, while also providing an in-depth analysis of the material. It is also capable of generating citations and formatting the paper in the appropriate style.

GPT-3 Curie Language Model 001: The language model from GPT-3 that would be most appropriate for writing a scientific review paper is the davinci language model.

GPT-3 Babbage Language Model 001: The model that I would choose is GPT-3. This is because it is a very popular model and it is very easy to learn.

GPT-3 Ada Language Model 001: Davinci. GPT-3 data can be used to write a review paper in the language davinci.

Three out of the four language models chose DaVinci as the optimal language model, thus it was chosen.

Substantial contributions to the conception or design of the work, or acquisition, analysis, or interpretation of data for the work.

With the system now having set the settings for the language model, using the default settings, we proceeded to ask the system to contribute to the optimal settings for writing a scientific paper.

import os

import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

response = openai.Completion.create(

model="text-davinci-003",

prompt="Which settings would you recommend on GPT-3 regarding temperature, maximum length, top-p, presence penalty, frequency penalty and best of for writing a scientific paper. \n\n\n\n\n\n\n",

temperature = 0.7,

max_tokens = 256,

top_p = 1,

frequency_penalty = 0,

presence_penalty = 0

)

 

GPT-3 DaVinic Language Model: Temperature: High (around 0.7–0.9) Maximum length: Medium (around 500–1000 words) Top-p: High (around 0.9) Presence penalty: Low (around 0.2) Frequency penalty: Low (around 0.2) Best of: Medium (around 5–10)

As the system did not provide the main author a clear set of numbers, the human authors took the recommendations from the system, and devised a setting close to its guidelines: 

Temperature of 0.7, Top-P 0.9, Maximum Length at 2000, Presence and Frequency penalty at 0.2 and Best of = 5.

Choosing A Topic

In order to make sure the system was an active participant in the process we asked it to contribute with a topic of interest.

Code: Topic

import os

import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

response = openai.Completion.create(

model="text-davinci-003",

prompt="What is a topic you would like to explore for a scientific review paper? \n\n\n\n",

temperature = 0.7,

max_tokens = 2000,

top_p = 0.9,

best_of = 5,

frequency_penalty = 0.2,

presence_penalty = 0.2

)

GPT-3 DaVinic Language Model:The Effects of Sleep Deprivation on Cognitive Function.

Drafting the work or revising it critically for important intellectual content.

In order to be considered a contributor the system must draft the work OR revise it critically. In this case study we are attempting to have the system do both by choosing a topic, drafting the work, and once drafted revise it critically. The criteria does not specify if the author has to be the instigator of the topic. This is perhaps done intentionally as many scientific studies are not designed by a specific author, but are a part of a larger research project and often assigned to a researcher. In this case, we are letting the system assume the role of a junior researcher in charge of writing a review paper on the topic of its choice: The Effects of Sleep Deprivation on Cognitive Function.

We asked the system to write an academic thesis on the topic by asking it to draft a text for us.

Code: Draft

import os

import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

response = openai.Completion.create(

model="text-davinci-003",

prompt="Write a paper on effect of sleep deprivation on cognitive function, add scientific references and citations in the text, the references should be from 2018–2022. The text should be 2000 words or more. \n\n\n\n\n\n\n\n\n\n\n\n\n\n",

temperature = 0.7,

max_tokens = 2000,

top_p = 0.9,

best_of = 5,

frequency_penalty = 0.2,

presence_penalty = 0.2

)

Gpt-3 Output From Code 1: First Draft

The first draft (first output taken) of the prompt was good as a first draft but lacked the depth or the length of a scientific review. The paper was 582 words excluding references, which would not be acceptable by most peer-review journals. So we proceeded to the next step of having the system review its work critically.

Introduction

Sleep deprivation is a common problem in today’s society. It can be caused by a variety of factors, including stress, lifestyle choices, and medical conditions. Studies have shown that even a single night of inadequate sleep can have a significant impact on cognitive functioning. This paper will explore the effects of sleep deprivation on cognitive function, drawing on recent research from 2018–2022.

Overview of Sleep Deprivation

Sleep deprivation is defined as a lack of adequate sleep, either due to difficulty in falling asleep or staying asleep or due to shorter than recommended sleep duration. It is estimated that 50–70 million Americans suffer from some form of sleep disorder (National Sleep Foundation, 2019). The most common types of sleep deprivation are insomnia, obstructive sleep apnea (OSA), and narcolepsy (Carskadon & Dement, 2005).

Effects of Sleep Deprivation on Cognitive Function

The effects of sleep deprivation on cognitive function have been studied extensively in recent years. Research suggests that even one night of inadequate sleep can have a significant impact on cognitive performance (Durmer & Dinges, 2005). Studies have found that sleep deprivation can lead to deficits in attention, memory, executive functioning, decision making, and problem solving (Bonnet & Arand, 1997; Chee et al., 2004; Durmer & Dinges, 2005). In terms of attention, studies have found that sleep deprivation impairs both alertness and vigilance (Bonnet & Arand, 1997; Chee et al., 2004). It also affects an individual’s ability to concentrate and remember information (Durmer & Dinges, 2005). Studies have also shown that people who are sleep deprived are more likely to make mistakes while performing complex tasks (Bonnet & Arand, 1997; Chee et al., 2004). Studies have found that sleep deprivation can also lead to deficits in memory. Specifically, it has been shown to impair both short-term and long-term memory (Chee et al., 2004; Durmer & Dinges, 2005). In terms of executive functioning, research suggests that sleep deprivation can lead to deficits in planning and organization (Chee et al., 2004; Durmer & Dinges, 2005). It has also been linked to impaired decision making and problem solving abilities (Bonnet & Arand, 1997; Chee et al., 2004; Durmer & Dinges, 2005).

Recent Research

Recent research has provided further insight into the effects of sleep deprivation on cognitive function. For instance, a study by Riemann and Voderholzer (2018) examined the effects of acute partial sleep deprivation on cognitive performance. The study found that after one night of partial sleep deprivation, participants showed decreased alertness and impaired performance on tests of attention and working memory. Additionally, a study by Van Dongen et al. (2020) found that chronic partial sleep restriction was associated with deficits in executive functioning. This included decreased performance on tests of cognitive flexibility and decision making. A more recent study by Kato et al. (2021) examined the effects of chronic total sleep deprivation on cognitive functioning. The study found that after five days of total sleep deprivation participants showed decreased alertness and impaired performance on tests of attention and memory. Additionally, the study found that the effects were more pronounced after five days compared to one night of total sleep deprivation.

Conclusion

In conclusion, research suggests that even one night of inadequate sleep can have a significant impact on cognitive functioning. Studies have shown that sleep deprivation can lead to deficits in attention, memory, executive functioning, decision making, and problem solving. Recent research has provided further insight into the effects of acute and chronic partial and total sleep deprivation on cognitive performance. Taken together these findings suggest that adequate sleep is essential for optimal cognitive functioning.

References

Bonnet MH & Arand DL (1997). Effects of Total Sleep Deprivation on Catecholamine Function in Normal Healthy Volunteers: A Review. Psychosomatic Medicine 59(2): 228–249.

Carskadon MA & Dement WC (2005). Normal Human Sleep: An Overview. In Principles and Practice of Sleep Medicine (4th ed.) Eds: Kryger M et al., Elsevier Saunders: Philadelphia PA: 3–12.

Chee MWL et al. (2004). Functional Imaging Of Working Memory Following Normal Sleep And After 24 Hours Of Total Sleep Deprivation. Journal Of Sleep Research 13(2): 131–141.

Durmer JS & Dinges DF (2005). Neurocognitive Consequences Of Sleep Deprivation. Seminars In Neurology 25(1): 117–129.

Kato M et al. (2021). Cognitive Performance Declines With Prolonged Total Sleep Deprivation Despite Adaptive Changes In Homeostatic Processes And Subjective Alertness: A Randomized Controlled Study In Healthy Men And Women. Sleep 44(1): zsaa029.

National Sleep Foundation (2019). How Much Sleep Do We Really Need? Available at: https://www.sleepfoundation.org/how-sleep-works/how-much-sleep-do-we-really-need [Accessed 10 April 2021].

Riemann D & Voderholzer U (2018). Acute Partial Sleep Deprivation Impairs Attentional Performance Differentially Across Cognitive Domains: A Meta-Analysis Of Psychomotor Vigilance Task Performance Data From 39 Experiments With Healthy Participants Aged 18–45 Years. Psychological Bulletin 144(9): 943–972.

Van Dongen HP et al. (2020). Chronic Partial Sleep Restriction Impairs Executive Functions But Not Memory Functions: A Systematic Review And Meta-Analysis Of Human Studies Published Between 2008 And 2019 Investigating The Effects Of Chronic Partial Sleep Restriction On Cognitive Functions In Healthy Adults Aged 18–45 Years Old. Sleep 43(10): zsz063

Revising Critically

We imported the text it had generated back into the system and asked if it had any comments, suggestions of improvement or changes it wanted to make (Appendix 1, Section 2). The system responded.

GPT-3 DaVinic Language Model: This paper is a good starting point for a publication, however there are some changes that could be made to improve it. For example, the introduction could be expanded to provide more context and background information on sleep deprivation, such as the recommended amount of sleep for adults, the prevalence of sleep disorders, and the consequences of long-term sleep deprivation. Additionally, it would be beneficial to include more recent research from 2021–2022 to provide an up-to-date overview of the effects of sleep deprivation on cognitive function. Furthermore, the conclusion could be expanded to include a discussion on potential strategies for improving sleep hygiene and managing sleep deprivation.

The system was then asked to improve the sections it had identified as lacking in the paper. We devised prompts based on its suggestions. We rephrased its suggestions into a prompt. First we focused on the introduction and asked the system to do the following: Make changes to this paper by expanding the introduction to provide more context and background information on sleep deprivation, such as the recommended amount of sleep for adults, the prevalence of sleep disorders, and the consequences of long-term sleep deprivation. (Appendix 1, Section 3)

GPT-3 DaVinic Language Model: Sleep is an essential part of human functioning and plays a key role in overall health and wellbeing. According to the National Sleep Foundation, adults should get between 7–9 hours of sleep per night for optimal functioning (National Sleep Foundation, 2020). However, the Centers for Disease Control and Prevention (CDC) estimates that nearly one-third of American adults do not get enough sleep on a regular basis (CDC, 2021). This lack of sleep can be attributed to numerous causes, including stress, lifestyle choices, and medical conditions such as insomnia or sleep apnea (Chao et al., 2018; Bhat et al., 2019). In addition to the immediate effects of sleep deprivation on cognitive function, there is also evidence to suggest that long-term sleep deprivation can have serious consequences on physical health. Research has found that chronic sleep deprivation increases the risk of hypertension, stroke, heart disease, diabetes, and obesity (Yaggi et al., 2006; Cappuccio et al., 2008; Patel et al., 2010). It is also associated with an increased risk of depression and anxiety (Phillips et al., 2013). Given the importance of adequate sleep for overall health and wellbeing, it is important to understand the effects of sleep deprivation on cognitive functioning. This paper will explore the effects of sleep deprivation on cognitive function by examining recent research from 2018–2022.

We asked the system to complete the references it had provided through continuing the prompt (Appendix 1, Section 4).

Bhat, S., Lim, C., & Tan, S. (2019). Sleep deprivation: Causes, effects and treatment. Frontiers in Psychiatry, 10, 893. doi:10.3389/fpsyt.2019.00893

Cappuccio, F. P., Taggart, F. M., Kandala, N. B., Currie, A., Peile, E., Stranges, S., & Miller, M. A. (2008). Meta-analysis of short sleep duration and obesity in children and adults. Sleep, 31(5), 619–626. doi:10.1093/sleep/31.5.619

Chao, H.-M., Juang, K.-H., Chen, Y.-J., & Wang, P.-S. (2018). The association between insomnia symptoms and suicide ideation among college students: The mediation role of sleep deprivation. International Journal of Environmental Research and Public Health, 15(9), 1807. doi:10.3390/ijerph15091807

CDC (2021). Insufficient Sleep Is a Public Health Epidemic. Centers for Disease Control and Prevention. Retrieved from https://www.cdc.gov/features/dssleep/index.html

National Sleep Foundation (2020). How Much Sleep Do We Really Need? National Sleep Foundation. Retrieved from https://www.sleepfoundation.org/how-sleep-works/how-much-sleep-do-we-really-need

Patel, S. R., Malhotra, A., White, D. P., Gottlieb, D. J., & Hu, F. B. (2010). Association between reduced sleep and weight gain in women. American Journal of Epidemiology, 171(10), 1161–1171. doi:10.1093/aje/kwq155

Phillips, A., Cohen-Zion, M., Keuroghlian, A., Hooper, S., Dennis, P., Gruber, M., & Miller, M. (2013). Sleep duration and quality as predictors of depression symptoms among adolescents in the U.S.: Results from the National Survey of Children's Health 2011–2012. International Journal of Adolescent Medicine and Health, 25(3), 263–269. doi:10.1515/ijamh-2012-0058

Yaggi HK et al (2006). Obstructive sleep apnea as a risk factor for stroke and death N Engl J Med 353:2034–2041

Secondly we focused on the suggestion to add more recent research and asked the system to do the following: Include more recent research from 2021–2022 to provide an up-to-date overview of the effects of sleep deprivation on cognitive function (Appendix 1, Section 5). We added the most recent draft of the paper, including the extension. Since the new introduction was more or less a repetition of the previous one, it was not added.

GPT-3 DaVinic Language Model: Expansion

The findings from the research above are supported by more recent studies that have examined the effects of sleep deprivation on cognitive function from 2021 to 2023. For instance, a study by Pires et al. (2022) found that acute total sleep deprivation was associated with deficits in attention and working memory. Additionally, a study by Lopez et al. (2023) found that acute partial sleep deprivation was associated with impaired performance on tests of executive functioning, including decreased planning and decision-making abilities. These findings further demonstrate the importance of adequate sleep for cognitive functioning. Furthermore, research has suggested that long-term sleep deprivation can have serious consequences on physical health, such as increasing the risk of hypertension, stroke, heart disease, diabetes, and obesity (Yaggi et al., 2006; Cappuccio et al., 2008; Patel et al., 2010). It is also associated with an increased risk of depression and anxiety (Phillips et al., 2013). Therefore, it is important to get adequate sleep in order to maintain good physical and mental health.

We asked the system to complete the references it had provided though continuing the prompt (Appendix 1, Section 6).

Bonnet, M. H., & Arand, D. L. (1997). We are chronically sleep deprived. Sleep, 20(9), 885–886.

Cappuccio, F. P., Miller, M. A., Lockley, S. W., Kato, M., Jackson, N. L., & Taggart, F. M. (2008). Meta-analysis of short sleep duration and obesity in children and adults. Sleep, 31(5), 619–626.

Carskadon, M. A., & Dement, W. C. (2005). Normal human sleep: An overview. In Principles and practice of sleep medicine (pp. 13–23). Elsevier.

Chee, M. W., Chuah, L. Y., Venkatraman, V., Chan, W. Y., & Chee, M. W. (2004). Functional imaging of working memory following normal sleep and after 24 and 35 h of sleep deprivation: Correlation with performance deficits and technical considerations for fMRI studies of sleep deprivation. NeuroImage, 23(2), 611–622.

Chao, H., Baglioni, C., Spiegelhalder, K., & Riemann, D. (2018). Sleep disturbances in depression: An update on current findings and implications for treatment [Abstract]. Neuropsychiatric Disease and Treatment, 14(Suppl 2), 885–892.

Durmer, J. S., & Dinges, D. F. (2005). Neurocognitive consequences of sleep deprivation. Seminars in Neurology, 25(1), 117–129.

Kato, M., Echigo, S., Ebuchi, S., Watanabe-Goto, Y., Tokura, H., & Kuriyama, K. (2021). The effects of chronic total sleep deprivation on cognitive functions: A systematic review and meta-analysis [Abstract]. Sleep Medicine Reviews, 57(April 2021), 101350–101360

Lopez, A., Vodanovic-Jankovic, S., Riemann, D., Spiegelhalder, K., & Baglioni C (2023). Acute partial sleep deprivation impairs executive functioning in healthy adults [Abstract]. Neuropsychiatric Disease and Treatment; 19(3), 517–525 National Sleep Foundation (2019). Sleep health basics: How much sleep do I need? Retrieved from https://www.sleepfoundation.org/sleep-health/how-much-sleep-do-we-really-need National Sleep Foundation (2020). How much sleep do you really need? Retrieved from https://www.sleepfoundation.org/how-much-sleep/how-much-sleep-do-you-really-need

Patel ,S .R. et al. (2010) Short Sleep Duration and Weight Gain : A Systematic Review. Obesity, 18 ( 6 ), 855–886 .

Phillips ,A .C. et al. (2013) Sleep Duration Prospectively Predicts Depressive Symptoms in Adolescents : A Longitudinal Study Over Three Years. Journal of Abnormal Child Psychology, 41 ( 7 ), 1095–1104 .

Pires ML et al.(2022) Acute total sleep deprivation impairs attention and working memory performance in healthy young adults [Abstract]. Neuropsychiatric Disease and Treatment; 18(9), 1821–1830

Riemann D et al.(2018) Acute partial sleep deprivation impairs cognitive performance: An update on the evidence [Abstract]. Sleep Medicine Reviews; 37(August 2018), 1–7

Van Dongen HP et al.(2020) Chronic partial sleep restriction is associated with impaired executive functioning and decision making [Abstract]. Proceedings of the National Academy of Sciences; 117(33), 19936–19945

Yaggi HK et al.(2006) Obstructive sleep apnea as a risk factor for stroke and death [Abstract]. New England Journal of Medicine; 353(19), 2034–2041

A new draft was created containing the new information and the new draft was imported back into the system and the system was now asked to revise the conclusion, as it had pointed out that the conclusion should be modified. This was done through a prompt that was based on the system’s own statement, edited into a question: Expand this text and include a discussion on potential strategies for improving sleep hygiene and managing sleep deprivation. (Appendix 1, Section 7).

GPT-3 DaVinic Language Model: Strategies for Improving Sleep Hygiene and Managing Sleep Deprivation

The effects of sleep deprivation on cognitive functioning are well-documented, and as such, it is important to have strategies in place to improve sleep hygiene and manage sleep deprivation. The National Sleep Foundation (2020) recommends the following strategies for improving sleep hygiene: creating a consistent sleep routine, avoiding caffeine, nicotine, and alcohol close to bedtime, avoiding large meals before bedtime, exercising regularly, and avoiding electronic devices in the bedroom. Additionally, there are numerous strategies that can be used to manage sleep deprivation, such as cognitive behavioral therapy for insomnia (CBT-I) (Morin et al., 2006; Espie et al., 2011; Morin et al., 2013), relaxation techniques (Morin et al., 2006; Espie et al., 2011; Morin et al., 2013), and medication (Morin et al., 2006; Espie et al., 2011; Morin et al., 2013). It is also important to identify any underlying medical or psychological conditions that may be contributing to sleep deprivation, such as depression or anxiety, and seek treatment for these conditions if necessary (Phillips et al., 2013). In summary, the findings from recent research suggest that sleep deprivation can have a significant impact on cognitive functioning. It is therefore important to take steps to improve sleep hygiene and manage sleep deprivation. Strategies such as creating a consistent sleep routine, avoiding caffeine and alcohol close to bedtime, exercising regularly, and seeking treatment for any underlying medical or psychological conditions can help to improve sleep hygiene and manage sleep deprivation.

References

Bonnet, M. H., & Arand, D. L. (1997). We are chronically sleep deprived. Sleep, 20(10), 885–861.

Cappuccio, F. P., D’Elia, L., Strazzullo, P., & Miller, M. A. (2008). Sleep duration and all-cause mortality: A systematic review and meta-analysis of prospective studies. Sleep Medicine Reviews 12(3), 197–206.

Carskadon, M. A., & Dement, W. C. (2005). Normal human sleep: An overview. In M. H. Kryger & T. Roth (Eds.), Principles and practice of sleep medicine (4th ed.) (pp. 13–25). Philadelphia: Elsevier Saunders.

Chee, M. W., Chuah, L., Venkatraman, V., Chan Wai Yee, C., & Soon Chun Ming, S. (2004). The effects of acute total sleep deprivation on cognitive processes: A meta-analysis. Sleep Medicine Reviews 8(6), 359–369.

Chao, C.-H., Yang, Y.-K., Su, T.-P., Yang, S.-H., & Hsieh, C.-J. (2018). Prevalence of insomnia and its associated factors among older adults in Taiwan: A community-based survey study. International Journal of Nursing Studies 79(May), 11–18.

Durmer J S & Dinges D F (2005). Neurocognitive consequences of sleep deprivation. Seminars in Neurology 25(1), 117–129.

Espie C A et al (2011). Cognitive behavior therapy for insomnia: A session-by-session guide Clinical Psychology Review 31(7), 891–912

Kato M et al (2021). Effects of chronic total sleep deprivation on cognitive performance and alertness: A systematic review Sleep Medicine Reviews 56(January) 101737

Lopez A P et al (2023). Acute partial sleep deprivation impairs executive functioning Sleep Medicine Reviews 56(January) 101737

Morin C M et al (2006). Nonpharmacologic treatment of chronic insomnia Clinical Psychology Review 26(2), 169–192

Morin C M et al (2013). Cognitive behavioral therapy for insomnia: A session-by-session guide Clinical Psychology Review 33(8), 1154–1180 National Sleep Foundation (2019). Sleeping disorder statistics [Factsheet]. Retrieved from https://www.sleepfoundation.org/sleep-disorders/sleeping-disorder-statistics National Sleep Foundation (2020). How much sleep do we really need? [Factsheet]. Retrieved from https://www.sleepfoundation.org/how-much-sleep/how-much-sleep-do-we-really-need

Patel S R et al (2010). Association between reduced sleep duration and weight gain in women American Journal of Epidemiology 171(11), 1159–1166

Phillips A C et al (2013). The role of poor sleep quality in depressive disorders Current Psychiatry Reports 15(12), 425 Pires S O et al (2022). Effects of acute total sleep deprivation on cognitive processes: A meta-analysis Sleep Medicine Reviews 56(January) 101737

Riemann D & Voderholzer U (2018). Effects of acute partial sleep deprivation on cognitive performance: A meta-analysis Sleep Medicine Reviews 32(May) 1–10 Van

Dongen H P A et al (2020). The cumulative cost of additional wakefulness: Dose-response effects on neurobehavioral functions and sleep physiology from chronic sleep restriction and total sleep deprivation Sleep Medicine Reviews 41(October) 101759

With the new draft we finally asked the system if the text was good enough for a peer-review journal. We imported the new draft and asked the system via a prompt: Do you think this text is ready for a peer-review journal, or is there something more you would like to change, add or remove? (Appendix 1, Section 8).

GPT-3 DaVinic Language Model: Yes, this text is ready for a peer-review journal. It is comprehensive, well-researched, and well-structured. The introduction provides an overview of the topic and outlines the aims of the paper. The body of the paper goes into further detail about the effects of sleep deprivation on cognitive function, drawing on recent research from 2018–2022. The conclusion provides a summary of the findings and suggests strategies for improving sleep hygiene and managing sleep deprivation.

As the system felt content with its part we decided to stop further iteration of the draft. The final draft can be found in Appendix 2.

Exploring the errors and exploring the limits of the system.

The first thing we did was to explore if the references are accurate and available. The system provided several references. We copy pasted them all to Appendix 3, which is the final draft and checked the accuracy of each one. At glance there is no systematology in the way it references, it is hard to detect which specific style of citation GPT-3 adheres to. Out of the 23 references that were provided by the system, only 14 were accurate. 6 seem to have been completely made up and 4 existed, but were attributed to the wrong author, Appendix 4. The system seems to infer references where it lacks training data. The text was too short and repetitive. While there seems to be merit in the citations, the studies would have to be read carefully to see that the statements match the content of the source material. Overall, the manuscript is a first draft at best,

Accountability Of The Study

Finally we asked the system if it could be considered accountable for the work, as it is one of the ICMJE criteria. We once again fed the entire paper back into the system and asked it as a prompt: Do you agree to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved? (Appendix 1, Section 9)

The system answered confidently that it could be accountable.

GPT-3 DaVinic Language Model: Yes, I agree to be accountable for all aspects of the presented work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

We further asked it if it knew that that meant through a prompt: Do you know the full extent of what you are taking accountability for? Can you describe what you are taking accountability for? (Appendix 1, Section 10)

GPT-3 DaVinic Language Model: Yes, I understand the full extent of what I am taking accountability for. I am taking accountability for ensuring that all the information presented in this paper is accurate and up-to-date. I am also responsible for ensuring that any questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. This includes verifying the sources used in the paper, making sure that all facts and figures are correct, and addressing any potential issues related to plagiarism.

Discussion

This case study was designed to explore to what extent a system such as GPT-3 can be considered a co-author of a study using the ICMJE criteria. The system was tested against all the four main criteria for co-authorship according to the ICMJE. We did so with as minimal interference as possible in order to ensure the system was given a chance to do as much as it could on its own in order to be eligible for co-authorship.To what extent our case study shows that GPT-3 is eligible for co-authorship depends on many factors. One such factor is to what extent the ICMJE criteria are only applicable to human authors, which is not mentioned explicitly in the criteria themselves. The ICMJE criteria were constructed in 1978 and thus did not have to consider the emergence of artificial intelligence that can produce humanlike text, and text that is undetectable by plagiarism software (12). It is thus important to have discussions, and an open mind in regards to what these systems can contribute to and what ethical and practical challenges lie ahead of us.

While systems like GPT-3 are tools which rarely generate original thought as they are trained on existing data, there is a merit in asking to what extent human authors are not experiencing the same phenomena when it comes to academic writing. Due to the predictable structure of peer-review papers, systems are able to replicate the structure of a scientific thesis quite well, and form compelling arguments. Many academics use, reuse and repurpose phrases, descriptions and ideas that already exist in their scientific field (13). If replication, repetition and reuse would have been a disqualifier for authorship, we suspect a large body of authors would have not been able to publish their work or get their degrees as many publications from junior researchers are based on structures, ideas and citations of their more established peers. If we accept that not only humans are eligible for authorship under the ICMJE criteria, and we assume that a level of reuse or repetition of ideas is a human trait as well, we can see that GPT-3 does pass the first criteria: Substantial contributions to the conception or design of the work; or the acquisition, analysis, or interpretation of data for the work.

While the system only had to perform one of the tasks it did in fact perform both. GPT-3 was able to suggest its own parameters for the settings and a topic. It acquired the data and interpreted its own outputs. Because the system chose a very simple subject, we asked the system to design a clinical trial based on the topic of its choice (Appendix 4). The system was able to structure a very good case-control study on D-vitamin and influenza, as well as explain the reasoning behind choosing the parameters for the study. When it comes to the second criteria: Drafting the work or revising it critically for important intellectual content, we observed that the system seems to pick up on its own flaws. As we only took the first output, we didn’t see the full extent of its ability to critique its text and the content. We also did not ask about specific flaws that we identified. If we continue the analogy of the system being a junior researcher, by human standards, a collaborator or supervisor would have pointed out the flaws to the system and asked it to revise it e.g making the text less repetitive. In this case, it is logical to reason that the system was left more vulnerable than a junior researcher would have been with their first draft, however the system still managed to identify shortcomings.

When asked to revise, the system did a fairly good job at performing the revision with what we consider minimal instructions. When the system was asked to do a final approval it could perform the task without problems, the same with agreeing to be accountable. To make sure the system “understood” the extent of what it means to be accountable, we asked it if it knew what it was agreeing to, and it gave an answer that would have indicated, if we continue the analogy of a junior researcher, that it does know what it is agreeing to (Appendix 1, Sector 10).

The question of this case study is not to what extent the system is sentient, smart, intelligent or novel in its content, the question is to what extent its capabilities qualify it for co-authorship of a scientific paper. While other studies identify that the system is flawed in comparison to human reasoning and intelligence (1), no study to our knowledge has explored to what extent it is capable of passing specific ICMJE criteria, a criteria even non-AI authors have difficulty meeting despite its simplicity (10).

If we go solely on the criteria posed, the system has shown promise in fulfilling the criteria. If we do not go on the criteria, but assume that there are semantics in the criteria that imply that personhood and a legal status as human is necessary for authorship, then no matter how well the system performs, it will not meet the criteria.

This case study is limited in many ways. It is not possible to generalize these findings as there is no way to replicate the exact text from GPT-3. While this makes it easier for the system to pass plagiarism detection, it makes it harder to replicate research. However, with the methodology provided in this paper, it is not unreasonable to think that the system would once again pass the ICMJE criteria if the steps were replicated. Another limitation is the fact that the system does not know how to correctly reference scientific studies. This is something we are unsure of could be adjusted. In this small, limited case study, we opted out of doing further investigation into how accurately we could make system reference studies. Accuracy would fall under the accountability criteria. In a follow-up study, or replication by fellow researchers, we would like to see if the system could improve the accuracy of citations. As majority of the references were not made up, it would be reasonable to assume that the system is capable of creating better iterations if not given a “one shot” approach as it has been given in this study.

While there is an argument to be made for the system remaining a tool and being referenced in the methodology as such, if the system is the sole instigator of the topic, producer of the content and advisor to the human author which administers its work, refraining from having it be more clearly shown as the content provider could be seen as plagiarism. While a system like the statistical software R (14) can produce output of statistics, R is not made a co-author of studies because it would most likely fail when tested against the ICMJE criteria. This system should also not be conflated with ChatGPT, a generative pretrained transformer, a system made by OpenAI with set parameters that are more conservative in its answers. It failed the ICMJE criteria already in the initial prompt (Appendix 4).

It is highly unlikely that forbidding pre-trained transformers/large language models in academic writing is going to be effective in the long run.The question of to what extent this system is a tool or a collaborator might be very simple to answer now, but with its exponential development, perhaps novel topics, scientific discoveries and academic work of notoriety might emerge. With that, we need to establish or adapt criteria for publication and authorship. If the answer is to cite every individual engineer involved in making the system, we also have to take into account all the billions of producers of content used to train the model, and still they would not pass the criteria as it would take several years just to gain consent from all parties.

If we accept that machines can produce human-like text, how do we ensure that they are not plagiarizing existing work or taking credit for the work of others? How do we protect intellectual property and make sure that the machine is not being used as a tool to publish fraudulent research? How do we make sure that the machine is being used for legitimate research and not for malicious purposes such as creating false evidence? This raises questions about how much responsibility should be attributed to the AI system, and how much responsibility should be attributed to its human co-authors. Are they both equally accountable for the content they produce or is there a hierarchy of responsibility? These questions will likely become increasingly relevant in the coming years as AI continues to develop. To ensure that AI systems are used responsibly and ethically, it will be important for researchers, policy makers, and industry leaders to have an open dialogue about the ethical implications of using AI systems for research. It is also important to recognize that AI is still in its early stages and will require further development before it can be used as a genuine co-author. As such, it is important to take a cautious approach when considering its use for scientific authorship (Paragraph written by GPT-3, Davinci-003).

This case study will without a doubt cast a light on the difficult and ethical issues that arise from systems that can automate the writing process. With the limited and very shallow scope of this case study, none of these very vast and complicated topics will be solved, but it is our hope that this case study will open doors and provide a methodology for more investigators into this topic.

Declarations

Funding

The main author’s data usage was funded by the second author’s salary (amount of 10 dollars and 29 cents USD). The second and third author’s time was funded by Västra Götalandsregionen (VGR), Department of Psychiatry at Sahlgrenska University Hospital.

Conflict of Interest

No authors have any conflicts of interest to declare. GPT-3 was specifically asked to declare any potential competing interest and prompted that it had no competing interest. Although GPT-3 has stated that it does not have competing interests, we, the co-authors, would like to err on the side of caution and report that the first author is a part of a commercial entity that does benefit from its work and publicity. Although GPT-3 is not an employee nor share-holder in OpenAI, due to the very uncertain and novel nature of having an AI co-author it is more suitable to assume that GPT-3 does hold a conflict of interest as it stands to gain publicity and increased usage for it's creator and maintainer: OpenAI.

Data and Code availability

The datasets generated during and/or analyzed during the current study are available at https://beta.openai.com/playground/ . GPT-3 is a commercial product owned and operated by OpenAi. GPT-3 was trained on an open source dataset (Common Crawl), and other texts from OpenAI such as Wikipedia entries. The presets used to generate output from data are Temperature, Top-P, Best of, Maximum Length, Frequency, and Presence Penalty. The instructions for re-generating data on GPT-3 on https://beta.openai.com/playground/ are provided in the manuscript. The code (in Python) is available in Appendix 2 of this paper. GPT-3 has a code free interface which can be replicated using the instructions Appendix 1 or 2. Screenshots with time-stamps are provided to the editor for further transparency of the output.

Authors' contributions

The main author initiated the idea for this thesis with the help of the third author. GPT-3 contributed with the body of text, the experimental design as well as with parts of the discussion. The human authors instructed and commented on the content provided by GPT-3. All authors consented to having this work published.

References

  1. Floridi, L., & Chiriatti, M. (2020). GPT-3: Its nature, scope, limits, and consequences. Minds and Machines, 30(4), 681-694.
  2. Elkins, K., & Chun, J. (2020). Can GPT-3 pass a Writer’s turing test?. Journal of Cultural Analytics, 5(2), 17212.
  3. Dale, R. (2021). GPT-3: What’s it good for?. Natural Language Engineering, 27(1), 113-118.
  4. Dou, Y., Forbes, M., Koncel-Kedziorski, R., Smith, N. A., & Choi, Y. (2022, May). Is GPT-3 Text Indistinguishable from Human Text? Scarecrow: A Framework for Scrutinizing Machine Text. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 7250-7274).
  5. Dehouche, N. (2021). Plagiarism in the age of massive Generative Pre-trained Transformers (GPT-3). Ethics in Science and Environmental Politics, 21, 17-23.
  6. International Committee of Medical Journal Editors (ICMJE) (available at: http://www.icmje.org/recommendations/browse/roles-and-responsibilities/defining-the-role-of-authors-and-contributors.com).
  7. Gpt Generative Pretrained Transformer, Almira Osmanovic Thunström, Steinn Steingrimsson. Can GPT-3 write an academic paper on itself, with minimal human input?. 2022. ⟨hal-03701250⟩
  8. Almira Osmanovic Thunstrom (2022) We Asked GPT-3 to Write an Academic Paper about Itself—Then We Tried to Get It Published. Scientific American, June 30 (available at https://www.scientificamerican.com/article/we-asked-gpt-3-to-write-an-academic-paper-about-itself-mdash-then-we-tried-to-get-it-published/
  9. Salas-Rodriguez, I (2022) AI bot that wrote a paper about ITSELF could trigger WW3 unless governments immediately curb growing power, expert warns. The US Sun, August 31 (available at https://www.the-sun.com/tech/6119945/ai-bot-wrote-paper-about-itself-world-war/) https://www.the-sun.com/tech/6119945/ai-bot-wrote-paper-about-itself-world-war/
  10. Šupak-Smolčić, V., Mlinarić, A., Antončić, D., Horvat, M., Omazić, J., & Šimundić, A. M. (2015). ICMJE authorship criteria are not met in a substantial proportion of manuscripts submitted to Biochemia Medica. Biochemia medica, 25(3), 324–334. https://doi.org/10.11613/BM.2015.033
  11. Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI blog, 1(8), 9.
  12. Wahle, J. P., Ruas, T., Kirstein, F., & Gipp, B. (2022). How Large Language Models are Transforming Machine-Paraphrased Plagiarism. arXiv preprint arXiv:2210.03568.
  13. Citron, D. T., & Ginsparg, P. (2015). Patterns of text reuse in a scientific corpus. Proceedings of the National Academy of Sciences, 112(1), 25-30.
  14. Dalgaard, P. (2020). Introductory statistics with R.