Behavioral absorption of Black Swans: simulation with an artificial neural network

This article attempts to formalize the Black Swan theory as a phenomenon of collective behavioral change. A mathematical model of a collectively intelligent social structure, which absorbs random external disturbances, has been built, with a component borrowed from quantum physics, i.e. that of transitory, impossible states, represented by negative probabilities. The model served as the basis for building an artificial neural network, to simulate the behaviour of a collectively intelligent social structure optimizing a real sequence of observations in selected variables of Penn Tables 9.1. The simulation led to defining three different paths of collective learning: cyclical adjustment of structural proportions, long-term optimization of size, and long-term destabilization in markets. Capital markets seem to be the most likely to develop adverse long-term volatility in response to Black Swan events, as compared to other socio-economic variables.


Introduction
The present socio-economic context, with the biologically induced uncertainty of the pandemic, once again encourages the study of socio-economic resilience to events which we tend to label as catastrophic and unpredictable. That environment seems perfectly suited to a reconsideration of the Black Swan theory (Taleb 2007; Taleb & Blyth 2011). The shock of today will subside, sooner or later, and the big question is: how are we, as a civilization, going to absorb and own that experience? How are we collectively learning from it? Across the economic system, spots of new patterns can be observed, where businesses look as if they were bracing for events bound to occur, yet unknown in their exact nature and magnitude. The significant build-up of cash and cash-equivalent balances in the balance sheets of Big Tech companies is [...] economic equilibrium as a general concept. In one perspective, economic equilibrium is the only sensible place to be, whence concepts such as 'Golden Rules' or 'Golden Paths'. Yet another perspective is possible, where social change is a chain of equilibriums and disequilibriums, with the latter being the necessary spin and push of change. Adaptive social change can be interpreted as a recurrent rivalry between different strategies (involving both a choice of preferred outcomes and a repertoire of means to achieve them), with a 'niche' strategy corresponding to actions commendable in the presence of Black Swan events. This article investigates the unfolding of collective reaction to Black Swan events with a strong connection to that last viewpoint. The question is: how exactly does it happen?
The Model
Human societies can be represented as collectively intelligent phenomenological structures where the recurrent incidence of mutually coherent behavioral patterns yields observable socio-economic outcomes. The 'phenomenological' adjective means that, whilst the society is composed of individuals, it is structured by and into their recurrent, patterned behaviour. Mutual coherence of behavioral patterns means that social coordination happens through behavioral coupling between individuals, tacitly or explicitly.
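Formally, a human society can be studied as a complex structure {SR, P_SR, LC_SR, O}, where SR is the set of social roles sr_i, P_SR is the set of probabilities p(sr_i) attached to those roles, LC_SR is the set of lateral coherences lc(sr_i) between social roles, and O is the set of collective outcomes. The structure {SR, P_SR, LC_SR, O} is collectively intelligent to the extent that it learns, i.e. modifies its component subsets SR, P_SR, LC_SR, and O, by experimenting with many alternative versions of itself.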
Experimentation is sequential, i.e. each such alternative version of the structure {SR, P_SR, LC_SR, O} occurs at a different moment t in time. Therefore, the process of collectively intelligent learning happens as a chain of states {SR(t), P_SR(t), LC_SR(t), O(t)}. Absorption of external stressors means their transformation into endogenous constraints, which are expressed, in the first place, as regards collective outcomes. Any given instance {SR(t), P_SR(t), LC_SR(t), O(t)} of the structure {SR, P_SR, LC_SR, O} pitches its real local outcomes O(t) against their expected local state E[O(t)]. It should be stressed that expected outcomes are essentially local, i.e. they are instrumental to absorbing external stressors. There are no grounds to assume something like a general state of expectation E(O) in the structure {SR, P_SR, LC_SR, O}.
With the above-stated assumptions, the sequence of states from instance {SR(t0), P_SR(t0), LC_SR(t0), O(t0)} to {SR(t), P_SR(t), LC_SR(t), O(t)} can be studied as a Markov chain of states, which transform into each other through a σ-algebra. The current state {SR(t), P_SR(t), LC_SR(t), O(t)} and its expected outcomes E[O(t)] contain all the information from past learning, and therefore the local error in adaptation, i.e. e(t) = {E[O(t)] - O(t)}*dO(t), where dO(t) stands for the local derivative (local first moment) of O(t), conveys all the information from past learning. Error e(t) in adaptation is factorised into a residual difference and a first moment, as it is assumed that any current state instance {SR(t), P_SR(t), LC_SR(t), O(t)} [...]
Constraints produced by the structure {SR, P_SR, LC_SR, O} in response to external stressors take two forms: recurrent and incidental. The former impact individual decisions to endorse a given social role, i.e. those decisions take into account the past state of the structure {SR, P_SR, LC_SR, O} and randomly distributed, current exogenous information X(t). That random exogenous parcel of information affects all the people susceptible to endorse the given social role sr_i which, in turn, means arithmetical multiplication rather than addition, i.e. P_SR(t) = X(t)*[P_SR(t-1) + e(t-1)].
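The two formulas above, the error in adaptation and the recurrent update of probabilities, can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, and the local derivative dO(t) is approximated here by a first difference O(t) - O(t-1), which is an assumption and not something the model specifies.

```python
import random


def adaptation_error(expected, actual, previous_actual):
    """e(t) = {E[O(t)] - O(t)} * dO(t).

    Assumption: dO(t) is approximated by the first difference
    O(t) - O(t-1); the article only calls it a local derivative.
    """
    d_outcome = actual - previous_actual
    return (expected - actual) * d_outcome


def update_probabilities(previous_probs, previous_error, x):
    """P_SR(t) = X(t) * [P_SR(t-1) + e(t-1)], applied to each social role."""
    return [x * (p + previous_error) for p in previous_probs]


# Illustrative values only; they do not come from Penn Tables 9.1.
rng = random.Random(0)
probs = [0.2, 0.5, 0.1]
error = adaptation_error(expected=1.0, actual=0.8, previous_actual=0.7)
probs_next = update_probabilities(probs, error, rng.random())
```

Because X(t) multiplies the whole bracket, a negative error larger in magnitude than p(sr_i; t-1) yields a negative value of p(sr_i; t).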
Incidental exogenous stressors, thus events of the Black Swan type, consist in short-term, violently disturbing events, likely to drive some social roles extinct or, conversely, to trigger new social roles into existence. Extinction of a social role means that its probability becomes null: p(sr_i) = 0. The birth of a new social role, on the other hand, means that some pre-existing skillsets gain social recognition from the distant social environment of the people possessing them, and therefore turn into professions, crafts, business models etc.
Mathematically, it means that the set SR of social roles entails two subsets: active and dormant. Active social roles display p(sr_i; t) > 0, and, under the impact of a local, Black-Swan-type event, they can turn to p(sr_i; t) = 0. Dormant social roles are at p(sr_i; t) = 0 for now, and can turn to p(sr_i; t) > 0 in the presence of a Black Swan.
The above development leads to the issue of negative probabilities. With the assumptions stated above, [...] It is an otherwise frequent situation when the actual outcomes are below expectations, and yet display a positive gradient of change. The case of p(sr_i) < 0 is, technically, an impossible state. Can the collective intelligence of a human society go into those impossible states? Quantum physics supplies a possible interpretation in that respect. If the probability of an event is conditional on another probability, and this is precisely the case with the model presented here, negative probability corresponds to an essentially intermediary state, i.e. a state impossible to hold or impossible to verify directly (Feynman 1987; Curtright & Zachos 2001). That formal interpretation in quantum physics mirrors, to some extent, the distinction between economic equilibrium, and the lack thereof, in the theory of economic cycles (Schumpeter 1939). In that perspective, states marked by p(sr_i) < 0 are the necessary spin and push in the long-term learning of the collectively intelligent social structure.
Economic sciences supply interesting stylized facts with respect to equilibriums. There are some fundamental proportions in societies, such as the average time worked per person per year, or the average consumption of energy per person per year, which we collectively adjust ourselves around without even noticing much of it, and yet there are visible trends of change in those proportions. Adjustment of size, whether in demographics or in gross real output, is a bit harder, in the sense that it entails occasional bumps and requires collective effort (e.g. proper economic policies). Adjustment in markets is probably the hardest and the bumpiest, whence the common observation that prices and their gradient of change are the beating pulse of the economic system and can become volatile under new external stressors. Hence, a working hypothesis is formulated for the empirical application of the model presented above: 'Under the sudden, Black-Swan-type impact of external stressors, human societies can develop one of three possible paths of collectively intelligent learning: a) structural, cyclical adjustment without visible social change, b) adjustment of size with visible social change, and c) long-term destabilization'.
The Dataset And The Method Of Analysis
Penn Tables 9.1 (Feenstra et al. 2015) have been chosen as the source of empirical data. It is assumed that each variable in that database is the outcome of a collectively intelligent social structure {SR, P_SR, LC_SR, O}, and, accordingly, each country-year observation in that variable is the expected local outcome E[O(t)] coupled with a local instance {SR(t), P_SR(t), LC_SR(t), O(t)}. That coupling, together with the entire model developed in the preceding section, is translated into an artificial neural network designed by the author of this article. As the neural network requires a database with no empty cells inside, the initial contents of the Penn Tables 9.1 have been reduced to N = 3006 fully filled country-year observations. The basic cycle of learning applied to the neural network is precisely N = 3006 experimental instances.
The architecture of the network is based on two streams: the mainstream of adaptive walk in rugged landscape, with random recurrent disturbance, on the one hand, and incidental deep disturbance akin to Black-Swan-type events on the other. For the sake of presentational clarity, the former is further designated as the stream of learning, whilst the latter is the stream of disturbance. Since the stream of disturbance is relatively simpler in its architecture, it is presented first and, in the next presentational step, it is incorporated into the stream of learning as regards the {SR, P_SR, LC_SR, O} structure.
The stream of disturbance starts with the input neuron (input layer), which generates quasi-random numbers X(t) between 0 and 1. This neuron corresponds to the occurrence of random disturbance in the strict sense. The next two neurons (layers) transform the raw happening into stimuli for the intelligent structure {SR, P_SR, LC_SR, O}. The first phase of transformation consists in giving that raw event as many dimensions as there are input variables in the network, thus as many as there are social roles sr_i in the component set SR of the structure. Since that number, in the model, is 'm', the second neuron of disturbance takes X(t) to the power m. The idea behind this logical step is that each social role in the structure {SR, P_SR, LC_SR, O} absorbs just a local impact of the Black Swan. The third neuron transforms that multi-dimensional event into neural perception, through the function of hyperbolic tangent, i.e. tanh[X(t)^m] = (exp{2*[X(t)^m]} - 1) / (exp{2*[X(t)^m]} + 1). For clarity, that value is further called 'observable disturbance' or DO(t). Finally, the fourth neuron of disturbance compares DO(t) to the error e(t-1) generated in the preceding experimental round. If DO(t) > e(t-1), the stream of disturbance feeds the disturbance proper into the stream of learning (the logic of that particular feed is explained further in this section). In the opposite case, thus when DO(t) ≤ e(t-1), the stream of disturbance remains passive. With X(t) being quasi-randomly generated by the software supporting the network, the procedure 'Generate X(t) => take X(t) to the power m => take the hyperbolic tangent of X(t) to the power m' generates an incidental, essentially non-cyclical feed of disturbance. The reader can compare this approach to that proposed by Prestwich (2019 op. cit.).
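The four neurons of the stream of disturbance can be sketched as follows. This is a minimal Python illustration, not the Excel-programmed network used in the research; the function names are hypothetical, and Python's standard pseudo-random generator stands in for whatever the original software uses to generate X(t).

```python
import math
import random

M = 40  # number of social roles m, as used in the article


def observable_disturbance(x, m=M):
    """DO(t) = tanh[X(t)^m] = (exp{2*X(t)^m} - 1) / (exp{2*X(t)^m} + 1)."""
    return math.tanh(x ** m)


def disturbance_fires(x, previous_error, m=M):
    """Fourth neuron: the disturbance is fed forward only if DO(t) > e(t-1)."""
    return observable_disturbance(x, m) > previous_error


# One experimental round with an illustrative previous error e(t-1).
rng = random.Random(42)      # assumption: any generator of numbers in [0, 1)
x_t = rng.random()           # first neuron: raw random event X(t)
do_t = observable_disturbance(x_t)
active = disturbance_fires(x_t, previous_error=0.01)
```

With m = 40, X(t)^m stays very close to zero unless X(t) itself is close to 1, so DO(t) exceeds a positive error only occasionally, which is consistent with the incidental, non-cyclical character of the feed described above.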
In the stream of learning, the network contains m = 40 social roles in the component set SR of the structure, and therefore the input layer of learning consists in m = 40 probabilities P_SR = {p(sr_1), p(sr_2), …, p(sr_40)}. The m = 40 social roles SR are divided into 20 active social roles, with initial probabilities p(sr_i, t0) > 0 randomly generated in a standardized normal distribution, and 20 dormant social roles with initial probabilities p(sr_i, t0) = 0. Subsequent probabilities, for t1 through t3005, are fed forward from the output layer of the learning stream. That forward feed is conditional: at this point, the network starts generating conditional probabilities. That entails the interpretation developed earlier in the model: negative conditional probabilities are real, yet they designate essentially transitory states. If DO(t) > e(t-1), then p(sr_i, t) = 0 for active social roles and p(sr_i, t) = X(t) for dormant social roles, where X(t) is once again a randomly generated value. If, on the other hand, DO(t) ≤ e(t-1), then p(sr_i, t) = p(sr_i, t-1) + e(t-1) for the active social roles, and p(sr_i, t) = 0 in the dormant ones. This is the first impact of disturbance on learning: it breaks the replication of active social roles, breaks incremental learning, and triggers dormant social roles into activity.
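The conditional feed into the input layer of learning can be illustrated with the short Python sketch below. The function name is hypothetical, and two points are assumptions rather than facts stated in the article: the split into active and dormant roles is treated as fixed at its initial state, and one value of X(t) is shared by all dormant roles within a round.

```python
def feed_input_layer(prev_probs, initially_active, previous_error, disturbance, x):
    """Return the probabilities p(sr_i, t) fed into the input layer.

    prev_probs       -- p(sr_i, t-1), one value per social role
    initially_active -- True for active roles, False for dormant ones
                        (assumption: the labels stay fixed over time)
    previous_error   -- e(t-1)
    disturbance      -- observable disturbance DO(t)
    x                -- random value X(t) assigned to dormant roles
                        (assumption: one shared draw per round)
    """
    if disturbance > previous_error:
        # Black Swan round: active roles are zeroed, dormant roles wake up.
        return [0.0 if active else x for active in initially_active]
    # Ordinary round: active roles learn incrementally, dormant roles stay at zero.
    return [
        p + previous_error if active else 0.0
        for p, active in zip(prev_probs, initially_active)
    ]


# Toy structure with two active and two dormant roles (illustrative values).
roles_active = [True, True, False, False]
p_prev = [0.30, 0.10, 0.0, 0.0]
calm_round = feed_input_layer(p_prev, roles_active, 0.05, disturbance=0.01, x=0.7)
swan_round = feed_input_layer(p_prev, roles_active, 0.05, disturbance=0.60, x=0.7)
```

In the calm round the active roles move by e(t-1) and the dormant ones stay at zero; in the disturbed round the pattern flips.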
Given the further arrangement of subsequent layers, it can be assumed that each social role sr_i in the input layer is a separate neuron, as it separately projects into the 2nd layer, which consists in another set of 40 neurons, each computing the lateral coherence LC(sr_i) with the m - 1 other social roles in the set SR. That coherence is defined as the average Euclidean distance between p(sr_i) and the probabilities p(sr_j) of other social roles, as in equation (1). The use of Euclidean distance as a measure of coherence inside a social structure is based on the swarm theory, where learning in a collective manifests as a sequential shift between different strengths of behavioral coupling between individuals and groups (Stradner et al. 2013 [1]). The 2nd layer of learning is not directly exposed to disturbance.
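One possible reading of the verbal definition of lateral coherence is sketched below in Python. It assumes that, since each p(sr_i) is a single number, the Euclidean distance between two roles is sqrt[(p(sr_i) - p(sr_j))^2], and that the average is taken over the m - 1 other roles; the function name is hypothetical and the exact form used by the author may differ.

```python
import math


def lateral_coherence(probs):
    """Average Euclidean distance between each p(sr_i) and all other p(sr_j).

    Assumption: lc(sr_i) = (1 / (m - 1)) * sum over j != i of
    sqrt[(p(sr_i) - p(sr_j))^2].
    """
    m = len(probs)
    return [
        sum(
            math.sqrt((p_i - p_j) ** 2)
            for j, p_j in enumerate(probs)
            if j != i
        ) / (m - 1)
        for i, p_i in enumerate(probs)
    ]


# Three illustrative roles: the middle one is closest to the others.
lc = lateral_coherence([0.0, 0.5, 1.0])
```

Under this reading, a perfectly uniform set of probabilities gives lc(sr_i) = 0 for every role, and the measure grows as the probabilities spread apart.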
Neurons from both the 1st and the 2nd layer project into the third one, which generates a quasi-randomly weighted stimulus for neural activation. That 3rd layer of learning is also made of m = 40 neurons, which absorb the recurrent feed from layers 1 and 2, whilst generating another recurrent random factor X(t), and absorbing the disturbance DO(t). One remark is due, once again for clarity: random numbers X(t) are generated both in the stream of disturbance and in the stream of learning, but they are independently random. In the third layer of learning, if DO(t) > e(t-1), then the stimulus is TI(sr_i, t) = X(t)*p(sr_i, t). Should DO(t) ≤ e(t-1), then TI(sr_i, t) = X(t)*p(sr_i, t)*lc(sr_i, t-1). Given the logical structure of learning layers 1 and 2, the third layer reinforces the impact of the disturbance. When DO(t) > e(t-1), the ad-hoc activated dormant social roles are fed into the neural activation function, next in line, with no link to lateral coherence, whilst the hitherto active social roles remain zeroed. On the other hand, if disturbance remains at DO(t) ≤ e(t-1), dormant social roles remain dormant, and active social roles are fed into neural activation with correction for their lateral coherence.
[...], is one perspective. The way that expected state comes into being, thus the behaviour of that structure in the strict sense, makes another type of insight.
The expected states of the structure {SR, P_SR, LC_SR, O}, as measured with the average probabilities brought forth in Table 1, do differ according to the actual outcome being optimized. That disparity is much more pronounced as regards the initially active social roles, as compared to the dormant ones. Interestingly, disparity is almost non-existent inside those two categories, and, even more interestingly, any kind of differentiation happens only between the initially dormant social roles, whilst those initially active land as uniformly distributed. That seems counterintuitive. The initial state of active social roles is a quasi-random, Gaussian distribution, whilst dormant social roles start from a uniform distribution of null values. The neural network used to process those initial states somehow inverted their structures. The initially randomly distributed probabilities in active social roles end up as uniform, whilst the initially uniformly zeroed probabilities of dormant roles finish as slightly differentiated.
Thus, the network has the property of levelling out disparate states of the world, and conversely, adding some disparity to the uniform ones. On the other hand, as uniformized as p(sr_i) become across the initially active social roles, they are truly disparate according to the empirical variable from Penn Tables 9.1 taken as the expected outcome E[O(t)], thus the empirical base for assessing error e(t). Three categories of experiments delineate themselves. Networks pegged on, respectively, the headcount of population, real output, real capital stock, and price level in capital formation, make the first category: they produce noticeably low expected p(sr_i) in the initially active social roles, making them practically level with those initially dormant. These specific socio-economic outcomes have the property of spreading the occurrence of all 40 social roles almost evenly across the structure.
On the other end of the spectrum, the network pegged on a randomly weighted compound index made of all the variables in Penn Tables [...] Productivity (CTFP) is probably the most intriguing among the 13 studied as regards the process of learning (see Figure 2 in the Appendix). Whilst it produces somewhat median probabilities as the expected state of the structure, the way it arrives at that outcome is puzzling, even in the light of the assumptions laid out in the theoretical model. Error amplifies all along the learning process, and that is unusual for a simple network like this one. One is almost tempted to call that process 'unlearning'. The average lateral coherence lc(sr_i) between social roles swings cyclically, and steadily increases across the 3006 rounds of learning. The cyclicality of lc(sr_i) is, by the way, a common denominator across all the networks studied. It looks as if that specific intelligent structure was sequentially tightening its internal coherence and releasing it. The cycle is arrhythmic, and yet visible. Both the graph of error e(t) and the graph of average coherence lc(sr_i) look, in this CTFP-pegged network, like Ito processes with a clear upward drift.
One can observe a somewhat weaker version of that strange process of 'unlearning' when the network is optimizing price level in capital formation (PL_I). Error e(t) tends to widen its swing over the 3006 experimental rounds, yet there is no visible drift in it. Interestingly, other market-based variables, namely price level in household consumption (PL_C) and price level in exports (PL_X), produce a different process of learning, with the graph of error much more similar to that produced by the network when it is pegged on variables computed as structural proportions: AVH, LABSH, IRR, DELTA, and HC. These experiments yield a cyclical error, which neither widens nor narrows its amplitude and follows something like a steady swing with no visible drift.
Experiments which force the network to optimize size-type variables, namely POP, RGDPO, and RNNA, are the only ones to produce a process that can be labelled as true learning, at least from the standpoint of artificial intelligence. The network pegged on population seems to be the most salient in this respect: error e(t) visibly narrows down, producing something akin to equilibrium between the randomly disturbed structure {SR, P_SR, LC_SR, O} and the expected output, thus the number of humans being around.

Final Discussion
Results produced by experiments conducted with the neural network allow exploring and verifying the working hypothesis, formulated as the conclusion of the theoretical model. Three different patterns emerge, indeed, corresponding to the three hypothetical paths of collectively intelligent learning: cyclical adjustment devoid of clear trajectory, long-term adjustment in size-related variables, with visible learning, and, finally, the path of destabilization under the repeated impact of random disturbances.
How exactly should the probabilities p(sr i ) produced by the network be interpreted as social phenomena?
Probabilities are likelihoods that a randomly chosen individual endorses a social role. Relatively high probabilities mean that individuals are likely to endorse many social roles at once. This is a society of quick learning across many socially relevant skillsets. On the other hand, low probabilities correspond to a compartmentalized state of society, more likely to follow the model of one person endorsing just one social role. What is so exceptional about Total Factor Productivity (CTFP), as measured in Penn Tables 9.1, to make it destabilize the neural network in such a peculiar way? Why does this specific experiment behave as if the metric of productivity was particularly sensitive to random disturbances induced by the network, thus as if it was a highly volatile market price? Tentatively, it can be assumed that Total Factor Productivity is greatly impacted by the price of capital goods, perhaps more than by anything else. That explanation would coincide with the observably similar characteristics of the experiment pegged on the price level of capital formation (PL_I). Still, that leaves us with two more puzzles. Why do networks pegged on variables pertinent to the labour market (AVH, LABSH, HC) behave so differently? Perhaps, in the presence of random disturbance, the labour market is more accommodative and sort of 'sucks in' the randomly happening Black Swans, whilst the market of capital goods is much more prone to going haywire?
Furthermore, why do experiments pegged on other price levels, namely household consumption (PL_C), and exports (PL_X) behave as if these variables were structural proportions rather than market prices? Perhaps they are.
The theoretical model which served to run the empirical research is largely based on the theory of evolutionary games by Hammerstein & Selten (1994 op. cit.). Empirical research in itself allows formulating interesting hypotheses for future research in that respect. The occurrence of random disturbance of the Black Swan type seems to have the property of dragging dormant patterns of behaviour, thus niche strategies, out of the shadow, and that regardless of the kind of outcome the social structure is pursuing. Black Swan events could be seen as beneficial, as they force us to value and acknowledge fringe skillsets.
Finally, empirical results can be translated into the actual social reality we are living at the moment of writing this article. Black Swans coming from biology (pandemic), from climate change or from political instability are the most likely to destabilize capital markets. As strange as it may sound, our lifestyles, as well as the size of our economy, are much less likely to go south under the impact of those events. In these respects, our social system seems to be much more absorptive and accommodative in relation to environmental stressors.

Declarations
Conflicts of interest/Competing interests: Hereby I declare that research presented in this article, as well as the publication thereof, does not involve any sort of competing interests or conflict of interest, whether related to funding or to any other occurrence. Krzysztof Wasniewski
Availability of data and material (data transparency): all the empirical data used for the present research is fully available, either from external sources referred to in the article, or with me personally. Krzysztof Wasniewski
Code availability (software application or custom code): The Excel-programmed structure of the neural network used in the research presented here is fully available, either through links placed in the article or directly with me. Krzysztof Wasniewski.