Chronicle of the Causes of Famine in the World during 1840–2019 and their Risks

Famine still exists in the world, hindering the achievement of Sustainable Development Goal 2: Zero Hunger. The history and mechanisms of famine have been broadly studied; however, few studies have focused on quantitative, long-term analysis of the general characteristics of famine. This study aims to analyze the factors influencing famine and estimate its risk. We developed a famine database and estimated the probability of famine occurrence through logistic regression with physical and social factors. We identified a few factors that are strongly related to the occurrence of famine and found that soil water and gross domestic product per capita are the most influential. By extrapolating the logistic regression to future scenarios, we found that although the number of countries with high famine risk would decrease, some countries in sub-Saharan Africa would remain at high famine risk for some time.


Introduction
Famine is one of the oldest and most catastrophic forms of disaster worldwide. The history of famine can be dated back to 1700 BC [1], and many people have died from the lack of food since the dawn of human history. Approximately 70-80 million people died of famine in the 20th century, and this figure is far greater than that of any previous century [2].
Eradicating hunger is included in the UN Sustainable Development Goals (SDGs); SDG 2 [3] aims to achieve "Zero Hunger." However, the number of undernourished people has increased since 2015, and 821 million people were suffering undernourishment in 2017 [4], with famine as one of the leading causes.
Hunger is estimated to rise in some regions due to the COVID-19 pandemic [5]. With the rising concerns about food insecurity, exacerbated by population growth, climate/environmental changes, and the pandemic, the elimination of famine will remain a challenge even in the 21st century.
It is essential to assess the future risk of famine in each country to achieve this goal, and the investigation of past famines is necessary. The causes of famine and its processes have been widely examined [6]; however, most of the previous studies are descriptive and detailed for individual events rather than quantitative and comprehensive. Although some studies conducted quantitative analysis on past famines, most of them revealed facts and insights about famines, but not their risks. The Famine Early Warning System [7] adopts a scenario-based approach to assess the risk of food insecurity; however, the system is designed to support the administrator's decision in urgent food insecure situations and does not offer future prospects longer than six months.
The purpose of this study is to identify the essential components of past famines and their changes and to assess the risk of famine in the future. This study consists of two parts. In the first part, data on major famines since 1840 were assembled into a famine database based on a literature review, and their causes and processes were analyzed. In the second part, a logistic regression analysis was applied using physical and social factors related to famine, and the result was validated against the famine database. We also assessed the future risk of famine in each country by applying the logistic regression model under a few socio-economic scenarios and concentration pathways.

Analysis Of Past Famines
The word "famine" has several definitions. In this study, to clarify the criteria for adoption into the database, the term "famine" was defined by a mortality-based definition [8], which defines a famine as an event in which more than 10,000 people died.
We developed a famine database targeting major famines since 1840. This database consists of two parts: factor and description. In the factor section, the major factors that triggered the famine, such as drought or conflict, were attributed to each famine. The categorization of factors, shown in Supplementary Table 1, was defined according to the chapter division of "Theories of Famine" [9], which provides a comprehensive and organized overview of famine factors. The perspective given by DeRose [10] was also used to organize the categories. The description section provides auxiliary information: the pre-famine situation, the shock, and the social response to the famine are summarized.
Overall, recent famines were caused by a combination of various factors, which is consistent with the insights of Devereux [11]; famine causality is becoming more complex than ever. The trends in causes can be broadly divided into three periods according to the combination of causes. In the first period, from 1840 to 1899, environmental triggers, such as droughts, were the leading causes of famines. In the second period, from 1900 to 1959, the number of social triggers and domestic responses, including war or development failure, increased owing to colonization, revolution, and the two World Wars. In the third period, after 1960, the factors became more diverse; environmental factors further increased, international responses increased, and social triggers decreased. As for the breakdown of the factors in each category (see Fig. 1.b-e), famines in the third period seem to be triggered by various factors; however, the number of social triggers, domestic responses, and international responses decreased from the 1960s to the 2000s, while there is consistently more than one environmental factor in each famine. The environmental factors primarily consist of drought, and in the other categories, the proportion of each element varies depending on the period. Among the social triggers, factors such as inadequate infrastructure and market failure increased, while the share of war or conflict was relatively constant after 1900.

Assessing The Risk Of Famine
To assess the risk of famine, basic theories of famine and studies on risk were reviewed to define "famine risk." As for famine theory, the Food Entitlement Decline (FED) theory, which was developed by Sen [12], was applied in this research. Before the FED theory, the Food Availability Decline (FAD) theory, which argues that a decrease in food production leads to famine, was primarily used; however, this theory had several issues [13], and Sen [12] suggested that FAD is neither necessary nor sufficient for famine. Instead, the lack of "entitlement," which means "the set of alternative commodity bundles that a person can command in a society using the totality of rights and opportunities that he or she faces" [14], would lead to famine.
There are many definitions of "risk." In the context of disaster management, Crichton [15] defined risk as a combination of three aspects: hazard, exposure, and vulnerability (the risk triangle), whereas Sen [16] defined it as the composition of hazard, vulnerability, preparedness, and capability. Considering the feasibility of quantitative analysis of risk, we defined famine risk as the recurrence probability affected by hazard and vulnerability. This choice is due to the partial lack of numerical data on the scale of famine, such as the number of casualties.
Based on this, we conducted a logistic regression analysis on the risk of famine. We assumed that hazard and vulnerability, including food availability and entitlement, can be explained by the datasets shown in Supplementary Table 2. The definitions of hazard and vulnerability in this research and the dataset explaining each aspect are given below. Due to data availability, the target period of past famine analysis was set from 1961 to 2014.
Hazard refers to the direct triggers of the decline in food production. Based on the database we developed, hazards can be primarily divided into droughts and conflicts. As the drought index, the amount of soil water in cropland was used [17]. As conflict is challenging to quantify, the ratio of years of conflict in the country during the target period was used to approximate the likelihood of conflict occurring in each country.
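The conflict index described above can be illustrated with a minimal sketch. The function name, the input format (a list of years with a recorded conflict), and the sample records are assumptions made for illustration; only the 1961-2014 target period and the "ratio of conflict years" definition come from the text.

```python
def conflict_frequency(conflict_years, start=1961, end=2014):
    """Share of years in [start, end] with at least one recorded conflict."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    # A set collapses multiple conflict records that fall in the same year.
    years_in_period = {y for y in conflict_years if start <= y <= end}
    return len(years_in_period) / (end - start + 1)


# Hypothetical conflict records for one country (not data from the paper):
# conflict in 1975-1980, two records in 1999, and one outside the period.
records = list(range(1975, 1981)) + [1999, 1999, 2020]
frequency = conflict_frequency(records)
print(round(frequency, 3))
```

Because the index is a single ratio per country over the whole period, it captures how conflict-prone a country is rather than whether a conflict coincided with a particular famine year.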
Vulnerability is defined as the sensitivity of society to food insecurity. The correlation coefficient between agricultural water input (AWI) and the value of agricultural production (VAP) was selected as an index of the vulnerability of food production, following Nyariki and Wiggins [18], who suggested that food production varies with rainfall in food-insecure regions of sub-Saharan Africa. We hypothesized that, if this coefficient is large, production is influenced by climatic conditions. Cereal import dependency (CID) was also used as an indicator of vulnerability, hypothesizing that a lower CID ratio would lead to food supply instability. Gross domestic product (GDP) per capita was selected as the indicator of the resilience of society against food insecurity, assuming that a higher GDP would make it possible to import more food in food-insecure situations. The urban population rate is expected to indicate the ease of food distribution or to serve as a proxy for access to food through the market. The Gini coefficient, which represents the income inequality in each country, was also adopted based on the assumption that income inequality intensifies famine. Herein, GDP per capita, urban population rate, and the Gini coefficient are expected to explain people's capacity to obtain food: the food entitlement.
Fig. 2.a illustrates the two indices of hazard: soil water and conflict frequency. Most famines occurred in countries where the conflict frequency is greater than zero. Famine-experienced countries have relatively small soil water or high conflict frequency, indicating that these hazard values relate to famine occurrence. The indices of vulnerability are shown in Fig. 2.b-2.e. CID and the correlation coefficient between AWI and the value of agricultural production (CAV) are shown in Fig. 2.b. Countries with higher CAV experienced famine, supporting our hypothesis that CAV can be a proxy for production instability. Famine-experienced countries have a low CID value (approximately zero), implying that a self-sufficient agricultural supply is likely to result in an unstable food supply. However, countries with negative CID, such as cereal-exporting countries, did not experience famine in the target period. There are two possible explanations: first, the surplus of cereal production can act as a buffer against food insecurity; second, countries where mass production of cereal is possible tend to have a high GDP per capita, which reduces their vulnerability to famine. CID and GDP per capita are plotted in Fig. 2.c to support the second explanation, showing that countries with negative CID have a higher GDP per capita. As shown in Fig. 2.d, famine-experienced countries have a lower GDP per capita and urban population rate. The Gini coefficient in famine-experienced countries, shown in Fig. 2.e, is slightly higher than that in countries without famine, although the difference is not distinct.
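As an illustration of the CAV index, the sketch below computes a correlation coefficient between annual AWI and VAP series for one country. The text does not state which correlation coefficient was used; Pearson's is assumed here, and the function name and sample values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical annual series for one country (not data from the paper).
# Production follows water input exactly, so the coefficient is close to 1,
# the situation the text interprets as climate-sensitive production.
awi = [310.0, 280.0, 350.0, 240.0, 330.0]
vap = [62.0, 56.0, 70.0, 48.0, 66.0]
cav = pearson(awi, vap)
print(round(cav, 3))
```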

Risk Assessment Of Famine
The previous section illustrates that the listed indices of hazard and vulnerability have some relationship with famine occurrence. Herein, we define risk as the probability of famine and applied a logistic regression analysis to estimate the probability of famine occurrence. In the logistic regression analysis, seven indices of hazard and vulnerability were adopted as explanatory variables. Table 1 shows each case's estimated probability of famine and whether the case was a famine case. In each probability band, the ratio of famine cases falls within the range of the band; therefore, the probability of famine occurrence can be considered well estimated. Fig. 3 shows the variation in the estimated probability of famine in each country. Famine-experienced countries have higher values than those without; however, some countries with low probability have experienced famine. For example, North Korea experienced famine in 1995-1999 and 2012; however, the estimated probabilities in these cases were approximately 0.002, which is relatively low. This is likely because other components of famine were overlooked; these famines occurred due to an economic crisis or a political scheme that maintains the economic gap between civilians and people in the government and the military, which is difficult to quantify [19].
Table 2 shows the weight of each variable in the logistic regression analysis. The uncertainties of these values were calculated by the non-parametric bootstrap method (see Method), and the mean and variance of these coefficients are shown here. GDP per capita and soil water have the largest absolute values, approximately 0.9, indicating that they are the major factors for famine occurrence. However, the means of the CID and Gini coefficients are smaller than those of the others. As for CID, it is assumed that cereal-exporting countries (with negative CID values) have room to spare in their food supply, and cereal-importing countries (with high CID values) have a rather stable food supply; therefore, merely using the size of CID does not lead to a clear classification. As for the Gini coefficient, famine-experienced countries have a relatively high Gini coefficient (see Fig. 2.e), as stated in Nyariki and Wiggins [18]. However, in this study, a high Gini coefficient does not mean that there is a higher risk of famine, and some countries without famine have higher Gini coefficients than countries with famine (Fig. 2.e).
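The bootstrap summary behind the coefficient means and variances can be sketched as follows. This is only an illustration under stated assumptions: the text does not say whether the half-size samples were drawn with replacement (assumed here) or which solver was used (plain gradient ascent is used here), and the synthetic data, seeds, and function names are hypothetical.

```python
import math
import random
import statistics


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=1.0, n_iter=100):
    """Approximate maximum-likelihood fit by gradient ascent.

    Returns [intercept, coef_1, ..., coef_k].
    """
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for row, target in zip(X, y):
            linear = beta[0] + sum(b * v for b, v in zip(beta[1:], row))
            err = target - sigmoid(linear)
            grad[0] += err
            for j, v in enumerate(row):
                grad[j + 1] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def bootstrap_coefficients(X, y, n_rounds=100, fraction=0.5, seed=0):
    """Refit on random half-size resamples; return coefficient means and variances."""
    rng = random.Random(seed)
    size = max(1, int(len(X) * fraction))
    fits = []
    for _ in range(n_rounds):
        idx = [rng.randrange(len(X)) for _ in range(size)]  # with replacement (assumption)
        fits.append(fit_logistic([X[i] for i in idx], [y[i] for i in idx]))
    columns = list(zip(*fits))
    return ([statistics.mean(c) for c in columns],
            [statistics.variance(c) for c in columns])


# Synthetic data: one standardized index whose true coefficient is negative.
data_rng = random.Random(42)
X = [[data_rng.gauss(0.0, 1.0)] for _ in range(80)]
y = [1 if data_rng.random() < sigmoid(-1.0 - 2.0 * row[0]) else 0 for row in X]
means, variances = bootstrap_coefficients(X, y)
print(means, variances)
```

A small variance relative to the mean suggests that the sign and size of a coefficient do not depend strongly on which cases happen to be sampled.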

Future Projection
We projected famine risk from 2020 to 2050 by applying the logistic regression model to the datasets of future scenarios (Supplementary Table 5). Fig. 4 presents the results of the future projection under the socio-economic scenarios (SSPs) and climate change scenarios (RCPs). The figures for each SSP show the number of countries with famine probabilities ranging from 0.010 to 0.025, from 0.025 to 0.050, and above 0.050. We see that the risk of famine will generally decrease in the future. Notably, the number of countries with a high probability of famine occurrence will decrease in SSP1, whereas the number of such countries will be less likely to decrease in SSP3, and SSP2 is in between. This trend is primarily due to the increase in GDP per capita and urban population rate, and the differences among SSPs are due to the different growth of GDP per capita in each scenario.
Countries that are estimated to have relatively larger famine risk are located in Africa, especially in sub-Saharan countries (see Supplementary Fig. 2). There are no distinctive differences in the famine risk distribution among different scenarios.
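The per-scenario counts of countries in each probability band could be tallied along the following lines. The band limits are those given in the text; treating the lower edge as inclusive and the upper edge as exclusive is an assumption, and the country labels and probabilities are hypothetical.

```python
# Probability bands from the text; edge handling (lower inclusive,
# upper exclusive) is an assumption.
BANDS = [(0.010, 0.025), (0.025, 0.050), (0.050, float("inf"))]


def count_by_band(probability_by_country, bands=BANDS):
    """Count countries whose estimated famine probability falls in each band."""
    counts = [0] * len(bands)
    for probability in probability_by_country.values():
        for i, (low, high) in enumerate(bands):
            if low <= probability < high:
                counts[i] += 1
                break
    return counts


# Hypothetical probabilities for one scenario and one year.
scenario = {"A": 0.004, "B": 0.012, "C": 0.030, "D": 0.080, "E": 0.025}
counts = count_by_band(scenario)
print(counts)
```

Country A falls below the lowest band and is therefore not counted, which mirrors how low-risk countries drop out of the tallies over time.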

Discussion And Conclusions
In this research, a famine database was developed, the details of past famines were examined, and logistic regression analysis was applied to estimate the risk of famine by employing seven factors of hazard and vulnerability.
Famine is not a state, but an urgent event. This research focused on the events and did not analyze the extent of famine, such as the number of casualties, owing to the lack of sufficient information. An estimation of a famine's magnitude should be included in future work on the quantitative assessment of famine. Additionally, quantitative studies should be conducted on undernourished populations as well as on famine, as it has been suggested that the number of famine casualties is decreasing [20], whereas undernourishment remains prevalent in sub-Saharan Africa [21]. Considering the time-series processes of famine, the relationships among the factors are also critical. In this study, we evaluated the risk of famine from only the annual value of each factor because detailed data at monthly or weekly resolution were lacking; however, the magnitude of the risk could differ depending on the seasonality of the parameters during the famine. Additionally, it is important to assess the causal relationships among these factors, including process-based analysis, in the estimation of global long-term famine risk to obtain a clearer view of famine in the future.
Although there is room for a finer assessment, our study shows the comprehensive characteristics of famine, the general risk of famine, and its prospects. The famine database revealed that the factors triggering famine have become diverse. Past famines during 1961-2014 are explained by the logistic regression analysis based on risk assessment and famine theories, which shows that GDP per capita and soil water are the most influential explanatory variables. In the future, famine risk will decrease faster in SSP1 than in SSP2 or SSP3, and sub-Saharan Africa would be the last part of the world to suffer from famine. This research is the first step in the long-term assessment of famine risk, and it could guide the design of measures to help eradicate hunger from all countries in the long term.

Famine database
The detailed description of the famine database is as follows:

Year
The starting year for smaller famines is stated differently from literature to literature, and the end year is also different for larger famines. We selected the one that is most frequently used in the literature.

Country
Although many famines occurred in only a part of a country, each famine was attributed to the country/countries rather than regions, as statistical data are available at the country scale.

Number of deaths
The number of deaths differs significantly across the literature. For some famines, there is no record at all. As famine often occurs during war, it is challenging to separate famine deaths from war deaths.
Additionally, countries that experienced famine often lack the administrative capacity or political will needed for record-keeping. Therefore, the number of deaths was estimated from the available data and used only as a reference.

Contributing factors
Various explanations have been offered in literature for the mechanism and causalities of famine. To summarize these explanations into a comparative form, a literature study was performed, and each famine case was checked against 10 factor categories under four groups (Supplementary Table 1). Two or three studies were referred to for each famine, and if the literature included a keyword listed under a factor category as a cause, the famine received a score of one for that factor.
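The keyword-based scoring described above might look like the following sketch. The factor names and keyword lists here are hypothetical stand-ins, not the categories of Supplementary Table 1, and whole-word matching is an assumption about how keywords were checked.

```python
import re

# Hypothetical factor categories and keywords, for illustration only.
FACTOR_KEYWORDS = {
    "drought": ["drought", "rainfall failure"],
    "conflict": ["war", "conflict", "siege"],
    "market failure": ["market failure", "price spike"],
}


def score_factors(cause_statements, factor_keywords=FACTOR_KEYWORDS):
    """Score 1 for each factor whose keyword appears as a cause, else 0."""
    text = " ".join(cause_statements).lower()
    scores = {}
    for factor, keywords in factor_keywords.items():
        # Word boundaries keep "war" from matching inside "toward".
        hit = any(re.search(r"\b" + re.escape(k) + r"\b", text) for k in keywords)
        scores[factor] = int(hit)
    return scores


# Hypothetical cause statements drawn from two sources on the same famine.
statements = [
    "A prolonged drought ruined two harvests.",
    "Civil war blocked relief convoys toward the north.",
]
scores = score_factors(statements)
print(scores)
```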

Datasets
The datasets used in this simulation are listed in Supplementary Tables 2 and 5. The global hydrological model H08 [22] was used to calculate the agricultural water input (AWI) and soil water. A coupled simulation of the land surface process, river, crop growth, and reservoir operation was conducted with a spatial resolution of 0.5° and a time resolution of 1 day. The annual values of AWI and soil water in each country were calculated from the model's output using a nation mask with a spatial resolution of 0.5°. The future GDP data provided by the National Institute for Environmental Studies, Japan (NIES) [23] were converted into USD in 2019 to align with past data. Gridded GDP per capita in SSP1, SSP2, and SSP3 was calculated from future GDP and population data and converted into each country's value using the nation mask. Conflict frequency was calculated based on the conflict dataset provided by the Uppsala Conflict Data Program (UCDP) [24, 25]. The ratio of years in which conflict occurred during the target period was taken as the conflict frequency. For all data, missing values were filled in by interpolation.
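The step from gridded model output to country values can be sketched as below. The text does not specify how cells were combined within a country; an unweighted mean over the cells of each country is assumed here (ignoring that 0.5° cells shrink with latitude), and the tiny grid, country codes, and function name are hypothetical.

```python
from collections import defaultdict


def country_means(grid_values, nation_mask):
    """Average gridded values per country code.

    grid_values and nation_mask are 2-D lists of the same shape; the mask
    holds a country code per cell, or None for cells outside any country.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value_row, mask_row in zip(grid_values, nation_mask):
        for value, code in zip(value_row, mask_row):
            if code is None or value is None:
                continue  # skip ocean cells and missing model output
            sums[code] += value
            counts[code] += 1
    return {code: sums[code] / counts[code] for code in sums}


# Hypothetical 2 x 3 grid of annual soil water and its nation mask.
soil_water = [[0.2, 0.4, None],
              [0.6, 0.1, 0.3]]
mask = [["A", "A", None],
        ["A", "B", "B"]]
by_country = country_means(soil_water, mask)
print(by_country)
```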
To determine the significance of each dataset, the Mann-Whitney U test was conducted under the null hypothesis that the data for famine-experienced countries and other countries do not differ. The results shown in Supplementary Table 3 illustrate that the P-values are sufficiently small to conclude that each dataset differs significantly between famine-experienced countries and other countries.
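For readers unfamiliar with the test, a minimal sketch of a two-sided Mann-Whitney U test follows. It uses the normal approximation with a tie correction and no continuity correction, which may differ from the exact variant used in the study; the sample values are hypothetical.

```python
import math


def mann_whitney_u(sample_a, sample_b):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U, p_value), where U is the smaller of the two U statistics.
    """
    pooled = sorted((v, g) for g, s in enumerate((sample_a, sample_b)) for v in s)
    n1, n2 = len(sample_a), len(sample_b)
    n = n1 + n2
    rank_sum_a = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        ties = j - i
        tie_term += ties ** 3 - ties
        rank_sum_a += average_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u1 = rank_sum_a - n1 * (n1 + 1) / 2.0
    u2 = n1 * n2 - u1
    variance = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u1 - n1 * n2 / 2.0) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return min(u1, u2), p_value


# Hypothetical values of one index for the two groups of countries.
famine_countries = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
other_countries = [9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0]
u_stat, p_value = mann_whitney_u(famine_countries, other_countries)
print(u_stat, p_value)
```

The test compares ranks rather than raw values, so it does not require the indices to be normally distributed.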

Logistic regression analysis
Logistic regression is a statistical model of the probability of a certain class and is commonly used for binary classification. Herein, the probability p of being classified as a famine case is calculated as follows, using each explanatory variable x_i and its coefficient \beta_i, where \beta_0 is the intercept and n is the number of explanatory variables:

\[ p = \frac{1}{1 + \exp\left(-\left(\beta_0 + \sum_{i=1}^{n} \beta_i x_i\right)\right)} \]

The logit, the natural logarithm of the odds, is represented by the equation below.

\[ \operatorname{logit}(p) = \ln\frac{p}{1 - p} = \beta_0 + \sum_{i=1}^{n} \beta_i x_i \]

The regression coefficients are calculated by the maximum likelihood method. In the regression model, the values for each year and each country were used as explanatory variables. For soil water, GDP per capita, and urban population rate, we used each year's values in each country. For CID and the Gini coefficient, the values averaged for each country were used for every year in the target period because the available datasets were limited to a short period. Additionally, each index was normalized to a mean of 0 and a variance of 1 so that the weights of the indices explaining the risk of famine could be compared. The dependent variable is the occurrence of famine, represented as "Famine case" or "Non-famine case" according to the famine database. Herein, the year when the famine occurred and the three preceding years were flagged as "Famine case" because the number of famine cases was small and the three preceding years appear to have a high possibility of famine occurrence. There were only 76 famine cases among all 8748 cases during the target period.
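The case flagging, normalization, and fitting steps described above can be sketched together as follows. This is an illustration rather than the study's implementation: the solver (plain gradient ascent on the log-likelihood) is an assumption, and the function names and toy data are hypothetical.

```python
import math


def flag_famine_years(famine_years, start=1961, end=2014, lead=3):
    """Map each year to 1 if it is a famine year or one of the `lead` years before it."""
    flagged = {year - k for year in famine_years for k in range(lead + 1)}
    return {year: int(year in flagged) for year in range(start, end + 1)}


def standardize(values):
    """Rescale to mean 0 and variance 1; also return the mean and standard deviation."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / sd for v in values], mean, sd


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(beta, row):
    """Estimated probability for one case; beta = [intercept, coef_1, ...]."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], row)))


def fit_logistic(X, y, lr=0.5, n_iter=2000):
    """Approximate maximum-likelihood fit by gradient ascent."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for row, target in zip(X, y):
            err = target - predict(beta, row)
            grad[0] += err
            for j, v in enumerate(row):
                grad[j + 1] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


# A famine in 1984 flags 1981-1984 as famine cases.
flags = flag_famine_years([1984])

# Toy data: one index (e.g. a GDP-like variable) and famine labels.
raw_index = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
z_index, index_mean, index_sd = standardize(raw_index)
beta = fit_logistic([[z] for z in z_index], labels)
print(beta)
```

Because every index is standardized before fitting, the absolute sizes of the fitted coefficients can be compared directly as weights.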
In the future predictions, projected datasets of GDP per capita under SSP1, SSP2, and SSP3, soil water calculated under RCP2.6, RCP4.5, RCP6.0, and RCP8.5, and urban population rate were used. The same variance and the mean of each variable's past datasets were used to standardize these future datasets.
The same data as in the past were used for CAV, CID, and conflict frequency, assuming that the values will remain the same in the future.
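The point that future data are standardized with the statistics of the past data, not their own, can be shown with a short sketch. The GDP values, intercept, and coefficient below are hypothetical and are not the fitted values of the study.

```python
import math
import statistics


def standardize_with(values, mean, sd):
    """Standardize values using a mean and standard deviation fixed in advance."""
    return [(v - mean) / sd for v in values]


def famine_probability(z_values, intercept, coefficients):
    """Logistic-regression probability for one case of standardized indices."""
    linear = intercept + sum(c * z for c, z in zip(coefficients, z_values))
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical past and projected GDP per capita for one country.
past_gdp = [500.0, 1000.0, 1500.0]
past_mean = statistics.mean(past_gdp)
past_sd = statistics.pstdev(past_gdp)  # population standard deviation

future_gdp = [1000.0, 2000.0]
future_z = standardize_with(future_gdp, past_mean, past_sd)

# Hypothetical model parameters (not the values reported in Table 2).
INTERCEPT = -4.0
GDP_COEFFICIENT = -0.9
probabilities = [famine_probability([z], INTERCEPT, [GDP_COEFFICIENT])
                 for z in future_z]
print(future_z, probabilities)
```

Reusing the past mean and variance keeps the future values on the same scale as the data the coefficients were fitted to, so growth in GDP per capita shows up as a shift in the standardized index rather than being normalized away.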

Examination of uncertainty
A non-parametric bootstrap method was used to check the uncertainty of each coefficient. Logistic regression was applied 100 times to a random sample half the size of the dataset, and the mean and variance of each coefficient are presented in Table 2. Supplementary Table 4 shows the p-values for all pairs of variables under the null hypothesis that there is no difference between the variables. For every pair, the p-value is much lower than 0.0001; thus, each pair of variables has a statistically significant difference.

Tables
Table 1 Probability of famine
For each country, each year was counted as one case, and the total number of cases was 8748 (162 countries, 54 years). Famine cases are the famine years and the three years preceding a famine (see Method for the description of the dependent variable); non-famine cases are all others. Ratio is the proportion of famine cases among all cases in each probability band.
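The Ratio column of Table 1 can be reproduced in principle with a sketch like the one below, which groups cases by estimated probability and reports the share of famine cases per band. The band edges and sample values are hypothetical, since the actual bands of Table 1 are not given in the text.

```python
def band_ratios(probabilities, is_famine, edges):
    """Summarize famine cases per probability band [edges[i], edges[i+1])."""
    rows = []
    last = len(edges) - 2
    for i, (low, high) in enumerate(zip(edges[:-1], edges[1:])):
        labels = [
            flag for p, flag in zip(probabilities, is_famine)
            # The final band also includes its upper edge.
            if low <= p < high or (i == last and p == high)
        ]
        n_famine = sum(labels)
        ratio = n_famine / len(labels) if labels else None
        rows.append({
            "band": (low, high),
            "cases": len(labels),
            "famine": n_famine,
            "ratio": ratio,
            # True when the observed ratio lies inside the band itself.
            "ratio_within_band": ratio is not None and low <= ratio <= high,
        })
    return rows


# Hypothetical estimated probabilities and famine flags for eight cases.
estimated = [0.005, 0.008, 0.02, 0.03, 0.04, 0.07, 0.09, 0.2]
observed = [0, 0, 0, 0, 1, 0, 0, 1]
rows = band_ratios(estimated, observed, edges=[0.0, 0.01, 0.05, 0.1, 1.0])
for row in rows:
    print(row)
```

When the observed ratio in a band lies inside that band, the estimated probabilities agree with the observed frequency of famine cases, which is the sense in which the text calls the probability well estimated.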