Information transmission among multiple investors: a micro-perspective revealed by motifs

The concept of motifs provides a new perspective for studying local patterns, which is useful for understanding the nature of a network structure. In this study, the types and evolution of the motifs of the shareholder co-ownership network, constructed from common shareholding data from 2007 to 2017, are explored from a micro-perspective. Despite their low proportion, the closed motifs are found to be important motifs with statistical significance in the network. Furthermore, the motifs containing financial investment company shareholders tend to disappear on both short (quarterly) and long (annual) time scales. In contrast, the motifs containing general corporate shareholders tend to remain unchanged. Finally, the abnormal abrupt changes in the proportions of important motifs in the real network relative to the random network before and after the financial crisis are calculated. The number of Motif 4 instances containing state-owned companies, general companies, and individual investors decreases abnormally during the financial crisis. This research contributes to the understanding of information interaction among multiple investors.

them as small-scale isomorphic subgraphs with statistical significance that repeatedly occur in real networks. The application of motifs in a shareholder co-ownership network (SCN) provides an effective means to analyse the information interaction between shareholders at the micro-level. As the basic building blocks of the network, the motifs in the SCN refer to the miniature "circle of friends" of specific shareholders who tend to cluster together. By observing the behaviour of other members of a motif, shareholders may change their investment decisions on the basis of rational reasoning and psychological preferences [8].
There are profound economic implications behind the formation and evolution of specific motifs. Consider the 3-node motifs formed by two fund companies and a state-owned enterprise as an example. While stock prices rise, the two fund companies and the state-owned enterprise form a stable motif structure. Subsequently, as prices fall, the state-owned enterprise disappears, and the motif structure is broken. This process may involve a violation in which financial institutions are "lifting the sedan chair" for the state-owned enterprise (see footnote 1). In another practical case, four individual shareholders appeared in the list of the top ten shareholders of Jinyun Laser (stock code: 300220) in the 2019 annual report. Subsequently, these four shareholders were revealed to have close family relationships and were suspected of manipulating stock prices. Based on the analysis of network motifs, the number of 3-node motifs formed by individual shareholders increased significantly while the price was rising, suggesting that there may have been violations involving joint action by family members. At the theoretical level, Battiston et al. [9] pointed out in their review article published in Science that "major financial risk events are often caused by unpredictable random events that trigger minority shareholders to sell stocks. This behaviour spread throughout the network and ultimately led to the collapse of the entire stock market." Therefore, it is necessary to explore the motif structures (see footnote 2) of the SCN and their evolutionary laws, at both the practical and the theoretical level.
What types of important motif structures (see footnote 3) exist in the SCN? How do they evolve? Is it possible to provide early warning of financial crises based on the evolutionary laws of abnormal motif structures? These are the questions studied in this article.
First, this research explores the important motifs in the SCN. The Rand Enumerating Subgraph (Rand-ESU) algorithm is adopted to efficiently identify the motif structures in the SCN and its corresponding random network. This research introduces an analytical and unbiased maximum-entropy technique, which uses knowledge of both strengths and degrees to reconstruct an unbiased ensemble of random networks. Considering the link weights, the closed motifs are identified as important motifs with statistical significance in the network. Further considering the node attributes, the motifs in which all three nodes are financial investment companies have the highest proportion.
Second, the evolution of different types of motifs is investigated on two time scales: short cycle (quarterly) and long cycle (annual). This research distinguishes three evolutionary states: motif invariant, motif vanishing, and motif transforming. All motif structures are found to have a high tendency to vanish, and the longer the period, the greater the probability of vanishing. Furthermore, considering the node attributes, motifs containing financial investment company nodes have a higher tendency to disappear, while motifs containing general companies tend to keep their structure unchanged.
Finally, important motifs are examined to determine whether they undergo abnormal quantitative mutations before and after the financial crisis. Compared with other important motifs, Motif 4 shows a significantly abnormal downward trend during the financial crisis. Furthermore, the node attributes of Motif 4 are distinguished to explore the evolution of specific motif categories. Specifically, the number of Motif 4 instances containing nodes of state-owned companies, general companies, and individual investors is exceptionally low during the financial crisis. In contrast, the number of Motif 4 instances consisting of nodes of general companies and individual investors is unusually high around the financial crisis.
1 Lifting the sedan chair, a term specific to the stock market, refers to the situation in which a buyer anticipates that a certain stock will rise and therefore buys a large amount of it, which leads to an increase in the stock price. As a result, other people (generally referred to as the banker) sell at a high position and make a profit, and eventually the buyer is locked in.
2 The 3-node motif structure is the most common subgraph form in the network [7]. If the number of nodes is less than 3, there are too few subgraph types, which is not conducive to mining local information of the network. If there are too many nodes, the number of possible motif structures increases exponentially, which increases the complexity of motif recognition. At the same time, motif structures of more than 4 nodes can be generated by the superposition and combination of different 3-node motif structures. In short, the 3-node motif structure is sufficient for mining the local features of the network. The "motif structure" in this study refers to the "3-node motif structure" unless otherwise specified.
3 The important motif structure is a subset of the motif structure, which refers to the motif structure whose proportion in the real network is significantly higher than that in the random network.
This study makes several contributions to the field. First, the maturity of motif recognition algorithms has prompted a large number of studies to explore the motif structure of various networks, including biological system networks [10], social networks [11], electronic communication networks [12], investor transaction networks [13], interbank networks [14,15], listed company networks [16] and stock time series networks [17]. However, few studies have explored the motif structures of investor information networks. Considering the link weights, node attributes and different time scales, this research explores the motif structures of the SCN and their dynamic evolution.
Second, the analysis of economic and financial networks as channels of crisis transmission has received extensive attention. Much effort has been devoted to the search for regularities in the overall structure of financial networks, i.e., looking for community structure [18], core-periphery structure [19], network similarity [20], network connectedness [21], network centrality [22], and network structure [23]. However, a major financial crisis usually develops from a change in local network characteristics to the collapse of the overall network. Squartini et al. [14] and Saracco et al. [24] studied the abnormal motif mutations of interbank networks and world trade networks before and after the financial crisis, respectively. To the best of our knowledge, few studies have addressed early warning of financial crises from the perspective of shareholder information network motifs. This study explores the abnormal changes in the specific motif structures of the SCN before and after the financial crisis.
Third, this study also makes some theoretical contributions. Motif recognition requires comparing the real network with a corresponding random network. The simplest and most widely used random network model for weighted networks is the weighted configuration model (WCM), which is defined as an ensemble of random graphs with a given strength sequence [25][26][27][28]. It was recently shown that, despite its conceptual simplicity, the WCM suffers from significant bias problems [29][30][31]. This study introduces an analytical and unbiased maximum-entropy technique, which uses knowledge of both strengths and degrees to reconstruct an unbiased ensemble of weighted networks. In the shortest possible time, this method directly provides the expected value of the desired reconstructed properties in such a way that no explicit sampling of reconstructed graphs is required. Moreover, being based on maximum-entropy distributions, this method is unbiased by construction.
In applying our enhanced method to several networks of different nature, the results show that it leads to a significantly improved reconstruction. Moreover, rigorous information-theoretic criteria are introduced to confirm that the joint specification of the strengths and degrees cannot be reduced to that of the strengths alone. In other words, the reconstruction of the weighted network can be greatly enhanced by exploiting the irreducible set of joint degrees and strengths.
The remainder of this paper is organized as follows. Section 2 describes the data and modelling. Section 3 reports the results and analysis in three parts: motif identification, motif evolution, and risk warning based on important motifs. Section 4 concludes the paper.

Data
The data of the top ten shareholders of tradable stocks comes from the RESSET database. The time is from the first quarter of 2007 to the fourth quarter of 2017. 4 The samples are the constituent stocks of the CSI 300 index and are updated every quarter. 5 Shareholder names in different files are manually unified.
Subsidiaries of the same parent company are treated as a single shareholder because they act in concert. Index funds and index-enhanced funds are excluded because they hold stocks passively and lack information interaction with other shareholders. Controlling shareholders are also excluded because their holdings serve to gain control of the company, which differs from ordinary shareholders' use of private information for profit. According to Li et al. [32], shareholders are divided into five types: financial investment companies (FICs), state-owned companies (SOCs), general companies (GCs), qualified foreign institutional investors (QFIIs), and individual investors (IIs). The classification standard for shareholder categories comes from the RESSET database.

The SCN modelling
According to the network construction method in Newman [33], each shareholder is taken as a node, and the number of stocks jointly held by two shareholders is the edge weight. The SCN is a weighted but undirected network used to describe collective holding behaviour among investors. Using $H = \{H_1, H_2, \ldots, H_n\}$ to represent the shareholders, the matrix $W$ describes the SCN as follows:
$$W = \begin{pmatrix} w_{11} & w_{12} & \cdots & w_{1n} \\ w_{21} & w_{22} & \cdots & w_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ w_{n1} & w_{n2} & \cdots & w_{nn} \end{pmatrix} \tag{1}$$
where $w_{ij}$ is the number of listed companies that shareholders $H_i$ and $H_j$ jointly hold. Referring to Khan et al. [34,35], the nomenclature is provided in Appendix Table 5.
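The construction described above can be sketched in a few lines of Python. This is only an illustrative sketch: the input format (a mapping from each stock to its list of shareholders), the function name `build_scn`, and the toy data are assumptions made for the example and are not taken from the RESSET data.

```python
from collections import defaultdict
from itertools import combinations


def build_scn(holdings):
    """Build SCN edge weights from a stock -> shareholders mapping.

    `holdings` is a hypothetical input format: each key is a stock and each
    value lists the (already name-unified) shareholders of that stock.
    Returns a dict mapping a sorted shareholder pair (H_i, H_j) to w_ij,
    the number of stocks the two shareholders hold in common.
    """
    weights = defaultdict(int)
    for holders in holdings.values():
        # Every pair of co-holders of this stock gains one unit of weight.
        for pair in combinations(sorted(set(holders)), 2):
            weights[pair] += 1
    return dict(weights)


# Toy data (invented for illustration only).
holdings = {
    "stock_1": ["FundA", "FundB", "SOE1"],
    "stock_2": ["FundA", "FundB"],
    "stock_3": ["FundB", "SOE1", "Ind1"],
}
scn = build_scn(holdings)
for (h_i, h_j), w in sorted(scn.items()):
    print(h_i, h_j, w)
```

Storing only the pairs with positive weight keeps the representation sparse, which matters because most shareholder pairs share no stock at all.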

Motif categories in SCN
In undirected and unweighted networks, only two forms of 3-node motifs exist, shown as Motif 1 and Motif 4 in Fig. 1. The links between shareholders in the SCN are divided into strong and weak links. Two shareholders jointly holding exactly one stock form a weak link, while jointly holding more than one stock forms a strong link. When the link weights are considered, the 3-node motifs have seven forms, as shown in Fig. 1.
Figure 1 shows seven weighted 3-node motifs. The thick line is the strong co-holding relationship, and the thin line is the weak co-holding relationship.
Motifs 1, 2, and 3 have two nodes that are not directly connected but are indirectly related through an intermediate node, forming a chain structure (called the chain motifs). In contrast, in Motifs 4, 5, 6, and 7 all three nodes are directly connected, forming a closed structure (called the closed motifs). As the shape of the motif changes from chain to closed and the link weight changes from weak to strong, the speed of information transmission within the motif increases progressively [8]. Furthermore, node attributes are considered and classified into five types: FIC, SOC, GC, QFII, and II.
Figure 2 shows the schematic figure of motif identification when considering both link weights and node attributes. The node colour represents the shareholder category, where the red node represents FIC, the black node represents SOC, and the blue node represents QFII. The red edges represent the process of motif identification.
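The weak/strong link rule and the chain/closed distinction can be illustrated with a small classifier. The exact numbering of the seven motifs is defined by Fig. 1, which is not reproduced in the text, so the mapping used below (chain motifs 1 to 3 and closed motifs 4 to 7, each ordered by the number of strong links) is an assumption made for this sketch; the function name `classify_triad` is likewise hypothetical.

```python
def classify_triad(w_ab, w_bc, w_ac):
    """Classify three nodes by their pairwise co-holding weights.

    A weight of 1 is a weak link, a weight above 1 is a strong link, and a
    weight of 0 means no link. Returns a motif number in 1..7 under the
    ASSUMED numbering (chain: 1 + number of strong links, closed:
    4 + number of strong links), or None if the nodes are not connected.
    """
    links = [w for w in (w_ab, w_bc, w_ac) if w > 0]
    strong = sum(1 for w in links if w > 1)
    if len(links) < 2:
        return None            # at least one node is isolated
    if len(links) == 2:
        return 1 + strong      # chain motifs 1, 2, 3
    return 4 + strong          # closed motifs 4, 5, 6, 7


print(classify_triad(1, 1, 0))  # chain with two weak links
print(classify_triad(2, 1, 1))  # closed with one strong link
```

Node attributes (FIC, SOC, GC, QFII, II) could be attached by pairing the returned motif number with the sorted tuple of the three node types.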
The Rand-ESU algorithm is adopted to identify the motif structures in the SCN efficiently. Compared with traditional algorithms such as edge sampling (ESA) and enumerating subgraph (ESU), the Rand-ESU algorithm decides probabilistically whether a subgraph is extended during subgraph enumeration, thus greatly improving detection efficiency. The motif identification steps are as follows. First, the motif identification process shown in Fig. 2 is performed on the real network to calculate the proportion of each motif. Second, 1000 simulations on random networks of the same scale are performed to calculate the proportion of each motif. Finally, the statistical significance of each motif is calculated.
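To make the idea of probabilistic extension concrete, the following is a minimal sketch of ESU-style subgraph enumeration in which each extension step is taken only with a given probability. It is a simplified illustration and not the implementation used in this study: the adjacency format, the function name `rand_esu`, and the parameter `probs` (one probability per tree depth) are assumptions, and node labels are assumed to be comparable (for example integers).

```python
import random


def rand_esu(adj, k, probs, rng=None):
    """Sample connected k-node subgraphs with an ESU-style search tree.

    adj   : dict node -> set of neighbouring nodes (undirected graph)
    k     : subgraph size (3 for the motifs studied here)
    probs : k probabilities; probs[d] is the chance of following a branch
            that leads from depth d to depth d + 1 of the search tree.
            With all entries equal to 1.0 this is plain ESU (full enumeration).
    """
    rng = rng or random.Random(0)
    found = []

    def extend(sub, ext, v):
        if len(sub) == k:
            found.append(frozenset(sub))
            return
        ext = set(ext)
        while ext:
            w = ext.pop()
            # Nodes already in the subgraph or adjacent to it are excluded,
            # so each subgraph is reachable along exactly one branch.
            closed = set(sub).union(*(adj[u] for u in sub))
            new_ext = ext | {u for u in adj[w] if u > v and u not in closed}
            if rng.random() < probs[len(sub)]:
                extend(sub | {w}, new_ext, v)

    for v in adj:
        if rng.random() < probs[0]:
            extend({v}, {u for u in adj[v] if u > v}, v)
    return found


# Toy graph: a triangle 0-1-2 with a pendant node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
full = rand_esu(adj, 3, [1.0, 1.0, 1.0])
sample = rand_esu(adj, 3, [1.0, 0.5, 0.5], rng=random.Random(1))
print(len(full), len(sample))
```

Because every subgraph is reached with probability equal to the product of the entries of `probs`, dividing a sampled count by that product gives an unbiased estimate of the full count.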

Statistical indicators of motifs
Three common indicators measure the importance of motifs in networks: the frequency of motifs, the p value of motifs, and the Z-score of motifs.
(1) The frequency of the motif. For a 3-node motif $M$, let $n(M)$ be its number of occurrences in the real network and $N$ the total number of occurrences of all 3-node motifs; then the frequency of motif $M$ is as follows.
$$f(M) = \frac{n(M)}{N} \tag{2}$$
(2) The p value of the motif The p value of the motif refers to the number of times that a specific motif appears more frequently in the random network than in the real network divided by the total number of random networks. 6 Therefore, the p value of the motif is between 0 and 1. The smaller the p value, the more significant the motif, and the more critical it is in the real network.
(3) The Z-score of the motif. For motif $M$, the number of times it appears in the real network is $N_M^{real}$, and the number of times it appears in a random network is $N_M^{rand}$, with average value $\langle N_M^{rand} \rangle$ and standard deviation $\sigma_M^{rand}$ over the random networks. The Z-score of motif $M$ in the real network is as follows.
$$Z_M = \frac{N_M^{real} - \langle N_M^{rand} \rangle}{\sigma_M^{rand}} \tag{3}$$
The construction logic of the Z-score and p value is similar. They both compare the frequency of the motif in the real and random networks and use the difference to measure motif importance.
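As a hedged illustration, the three indicators can be computed from motif counts as follows. The function names and the toy counts are invented for the example, and the use of the population standard deviation (`statistics.pstdev`) rather than the sample standard deviation is an assumption, since the text does not specify which is used.

```python
from statistics import mean, pstdev


def motif_frequency(counts, motif):
    """Share of one motif among all 3-node motif occurrences."""
    return counts[motif] / sum(counts.values())


def motif_p_value(real_count, random_counts):
    """Fraction of random networks in which the motif occurs more often
    than in the real network."""
    more_frequent = sum(1 for c in random_counts if c > real_count)
    return more_frequent / len(random_counts)


def motif_z_score(real_count, random_counts):
    """(real count - mean random count) / std of random counts.

    Uses the population standard deviation (an assumption of this sketch);
    returns None when the random counts do not vary.
    """
    sigma = pstdev(random_counts)
    if sigma == 0:
        return None
    return (real_count - mean(random_counts)) / sigma


# Toy numbers, invented for illustration only.
real_counts = {1: 70, 2: 15, 3: 3, 4: 12}
random_counts_m4 = [4, 6, 5, 7, 8]   # Motif 4 counts in five random networks

print(motif_frequency(real_counts, 4))
print(motif_p_value(real_counts[4], random_counts_m4))
print(motif_z_score(real_counts[4], random_counts_m4))
```

In the toy data Motif 4 is rare (12 of 100 occurrences) yet far more common than in the random networks, which mirrors the situation where a low-proportion motif is nevertheless statistically significant.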

Random network model
The most widely used random network model for weighted networks is the weighted configuration model (WCM), which is defined as an ensemble of random graphs with a given strength sequence [25][26][27][28]. The expected weight of the link between nodes $i$ and $j$ predicted by the WCM is routinely written in the form
$$\langle w_{ij} \rangle_{WCM} = \frac{s_i s_j}{\sum_{k=1}^{N} s_k} \tag{4}$$
where $s_i$ denotes the strength of node $i$ and $N$ is the number of nodes. Unfortunately, despite its widespread use, the WCM suffers from significant problems of bias. Although Eq. (4) is treated as an expected value, there is no indication of which probability distribution it is derived from. Therefore, it is impossible to derive the expected value of topological properties, which are nonlinear functions of the weights. This research therefore develops a maximum-entropy approach to generate an unbiased "enhanced configuration model" (ECM), with given strengths and degrees, as the random network model.
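A short numerical sketch shows one visible symptom of the WCM. It assumes the commonly used form in which the expected weight between nodes i and j is the product of their strengths divided by the sum of all strengths; this form, the function name, and the toy strengths are assumptions of the example.

```python
def wcm_expected_weights(strengths):
    """Expected link weights under the ASSUMED WCM form s_i * s_j / sum(s).

    Self-loops are set to zero. Returns an N x N list of lists.
    """
    total = sum(strengths)
    n = len(strengths)
    return [
        [strengths[i] * strengths[j] / total if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]


strengths = [3, 2, 1]            # toy strength sequence
expected = wcm_expected_weights(strengths)
for row in expected:
    print([round(w, 3) for w in row])

# Every off-diagonal entry is positive whenever both strengths are positive,
# so the prediction carries no information about which links are absent.
all_positive = all(
    expected[i][j] > 0 for i in range(3) for j in range(3) if i != j
)
print(all_positive)
```

Since only the strengths enter the formula, the degree sequence of the real network plays no role, which is the gap the ECM is meant to close.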
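Formally, an ensemble of weighted networks with $N$ nodes can be characterized by a collection $\{W\}$ of $N \times N$ matrices and by an appropriate probability $P(W)$. On each network $W$, the strength of node $i$ is defined as $s_i(W) \equiv \sum_{j \neq i} w_{ij}$, and the degree is defined as $k_i(W) \equiv \sum_{j \neq i} \Theta(w_{ij})$, where each $w_{ij}$ is a non-negative integer. A normalized probability must be found such that the expected degree and strength of each node are both constrained while the ensemble is left as random as possible. This is achieved by requiring that $P(W)$ maximizes Shannon's entropy $S \equiv -\sum_{W} P(W) \ln P(W)$ with a constraint on the expected degree and strength sequences $\vec{k}$, $\vec{s}$. The fundamental result of this constrained maximization is the probability
$$P(W \mid \vec{x}, \vec{y}) = \prod_{i<j} q_{ij}(w_{ij} \mid \vec{x}, \vec{y}) \tag{5}$$
where $\vec{x}$ and $\vec{y}$ are two $N$-dimensional vectors of Lagrange multipliers controlling for the expected degrees and strengths, respectively (with $x_i \geq 0$ and $0 \leq y_i < 1$, $\forall i$), and
$$q_{ij}(w \mid \vec{x}, \vec{y}) = \frac{(x_i x_j)^{\Theta(w)} (y_i y_j)^{w} (1 - y_i y_j)}{1 - y_i y_j + x_i x_j y_i y_j} \tag{6}$$
is the probability that a link of weight $w$ exists between nodes $i$ and $j$. In the above expression, $\Theta(x) = 1$ if $x > 0$ and $\Theta(x) = 0$ otherwise. Due to the presence of $\Theta(x)$, Eq. (6) defines the "mixed" Bose-Fermi distribution, where establishing a link of unit weight between two nodes requires a different "cost" than the reinforcement of an already existing link. This feature is due to the existence of both binary and weighted constraints, making the ECM suitable for modelling real networks.
Next, this research applies the maximum-likelihood approach to implement the ECM. Consider a particular real weighted network $W^*$ of which only the degrees $k_i^* \equiv k_i(W^*)$ and strengths $s_i^* \equiv s_i(W^*)$ are known. The log-likelihood of the ECM defined by Eqs. (5) and (6) is
$$\mathcal{L}(\vec{x}, \vec{y}) \equiv \ln P(W^* \mid \vec{x}, \vec{y}) = \sum_{i} \left[ k_i^* \ln x_i + s_i^* \ln y_i \right] + \sum_{i<j} \ln \frac{1 - y_i y_j}{1 - y_i y_j + x_i x_j y_i y_j} \tag{7}$$
Now, look for the specific parameter values $\vec{x}^*$, $\vec{y}^*$ that maximize $\mathcal{L}(\vec{x}, \vec{y})$. Similar to the solving process of other random network models [27], the real solutions $\vec{x}^*$, $\vec{y}^*$ can be obtained from the following $2N$ coupled equations:
$$\langle k_i \rangle = \sum_{j \neq i} \frac{x_i^* x_j^* y_i^* y_j^*}{1 - y_i^* y_j^* + x_i^* x_j^* y_i^* y_j^*} = k_i^* \quad \forall i \tag{8}$$
$$\langle s_i \rangle = \sum_{j \neq i} \frac{x_i^* x_j^* y_i^* y_j^*}{(1 - y_i^* y_j^*)(1 - y_i^* y_j^* + x_i^* x_j^* y_i^* y_j^*)} = s_i^* \quad \forall i \tag{9}$$
Therefore, the likelihood-maximizing values $\vec{x}^*$, $\vec{y}^*$ are precisely those ensuring that the expected degree and strength sequences are consistent with the observed sequences $\vec{k}^*$, $\vec{s}^*$.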
It is worth noting that the values $\vec{x}^*$ and $\vec{y}^*$ contain all the information needed to rebuild the network. Therefore, the maximum-likelihood approach equivalently translates the time-consuming and bias-prone problem of generating multiple reconstructed networks into the much simpler problem of maximizing the function $\mathcal{L}(\vec{x}, \vec{y})$ of $2N$ variables. To find $\vec{x}^*$ and $\vec{y}^*$, the assignment method implemented in MATLAB is applied to solve Eqs. (8)-(9). Note that finding $\vec{x}^*$ and $\vec{y}^*$ only requires the information of degrees and strengths and not that of the entire network $W^*$. This is consistent with the fact that $\vec{k}^*$ and $\vec{s}^*$ are sufficient statistics.
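The following sketch checks numerically how the expected degrees and strengths follow from the pair-wise weight distribution. It assumes the mixed Bose-Fermi form known from the enhanced configuration model literature, in which the probability of weight w between nodes i and j is (x_i x_j)^Θ(w) (y_i y_j)^w (1 − y_i y_j) / (1 − y_i y_j + x_i x_j y_i y_j); this functional form, the function names, and the parameter values are assumptions of the example and should be checked against Eq. (6). The sketch does not solve Eqs. (8)-(9); it only evaluates their left-hand sides for given parameters.

```python
def q_weight(w, xi, xj, yi, yj):
    """Probability of an integer weight w >= 0 between two nodes (ASSUMED form)."""
    xx, yy = xi * xj, yi * yj
    theta = 1 if w > 0 else 0
    return (xx ** theta) * (yy ** w) * (1 - yy) / (1 - yy + xx * yy)


def link_probability(xi, xj, yi, yj):
    """Probability that any link (w > 0) exists: one term of the expected degree."""
    xx, yy = xi * xj, yi * yj
    return xx * yy / (1 - yy + xx * yy)


def expected_weight(xi, xj, yi, yj):
    """Expected weight of the pair: one term of the expected strength."""
    return link_probability(xi, xj, yi, yj) / (1 - yi * yj)


def expected_degrees_strengths(x, y):
    """Expected degree and strength of every node for given multipliers."""
    n = len(x)
    k = [sum(link_probability(x[i], x[j], y[i], y[j]) for j in range(n) if j != i)
         for i in range(n)]
    s = [sum(expected_weight(x[i], x[j], y[i], y[j]) for j in range(n) if j != i)
         for i in range(n)]
    return k, s


# Arbitrary toy parameters with x_i >= 0 and 0 <= y_i < 1.
x = [0.8, 1.5, 0.3]
y = [0.5, 0.6, 0.2]

# Brute-force checks on the pair (0, 1): the distribution sums to one and
# its mean matches the closed-form expected weight.
terms = [q_weight(w, x[0], x[1], y[0], y[1]) for w in range(2000)]
total_prob = sum(terms)
mean_weight = sum(w * p for w, p in enumerate(terms))
k_exp, s_exp = expected_degrees_strengths(x, y)
print(total_prob, mean_weight, k_exp, s_exp)
```

A root finder or optimizer would then adjust `x` and `y` until the expected sequences match the observed ones, which is the step the text delegates to MATLAB.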

Empirical tests based on real networks
Next, the superiority of our enhanced model over the naive model will be determined based on the reproduction of different types of real network topological properties. This study considers two classic social networks from [36], including research group social network (RGSN) and fraternity social network (FSN), and six food networks collected in [37], including Chesapeake Bay food network (CBFN), Crystal River food network (CRFN), Michigan Lake food network (MiLFN), Maspalomas Lagoon food network (MaLFN), Mondego Estuary food network (MEFN) and Everglades Marshes food network (EMFN). In Fig. 3, the real and reconstructed values of the topological properties (pure and weighted) of all networks in the sample are compared.
The purely topological property we choose is the simplest non-local one, i.e. the average nearest neighbour degree (ANND), which measures the correlation between the degrees of adjacent nodes and is defined as
$$k_i^{nn}(W) \equiv \frac{\sum_{j \neq i} a_{ij} k_j}{k_i} = \frac{\sum_{j \neq i} \sum_{k \neq j} a_{ij} a_{jk}}{\sum_{j \neq i} a_{ij}}$$
where $W$ is the full weighted matrix, and $w_{ij}$ is the weight of the link between node $i$ and node $j$. $a_{ij}$ is a Boolean variable equal to 1 if $w_{ij} > 0$ and 0 if $w_{ij} = 0$, indicating whether there is a connection between nodes $i$ and $j$ (compactly, we can write $a_{ij} \equiv w_{ij}^{0}$ with the convention $0^0 \equiv 0$). The corresponding weighted property is the average nearest neighbour strength (ANNS), defined as
$$s_i^{nn}(W) \equiv \frac{\sum_{j \neq i} a_{ij} s_j}{k_i} = \frac{\sum_{j \neq i} \sum_{k \neq j} a_{ij} w_{jk}}{\sum_{j \neq i} a_{ij}}$$
For all nodes and all networks, the measured values of the two properties for the real network and the corresponding reconstructed network predicted by the WCM (ECM) are illustrated in the left (right) two panels of Fig. 3. Each point in the graph is a node. The goal of a good reconstruction approach is to place all the points along the identity line. In most cases, the reconstructed values of the WCM for all nodes in a given network lie along a horizontal line, i.e. they are almost equal and completely independent of the true value. In contrast, our enhanced approach achieves a significant improvement over the naive approach.
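The two properties can be computed directly from a weight matrix, as in the following minimal sketch. It assumes the standard definitions (average degree, respectively average strength, of a node's neighbours); the function name and the toy matrix are invented for the example.

```python
def annd_anns(W):
    """Average nearest neighbour degree and strength for every node.

    W is a symmetric N x N matrix (list of lists) of non-negative integer
    weights with a zero diagonal. Nodes without neighbours get None.
    """
    n = len(W)
    # Binary adjacency: a_ij = 1 if w_ij > 0, else 0.
    A = [[1 if W[i][j] > 0 else 0 for j in range(n)] for i in range(n)]
    k = [sum(A[i][j] for j in range(n) if j != i) for i in range(n)]
    s = [sum(W[i][j] for j in range(n) if j != i) for i in range(n)]
    annd, anns = [], []
    for i in range(n):
        if k[i] == 0:
            annd.append(None)
            anns.append(None)
            continue
        annd.append(sum(A[i][j] * k[j] for j in range(n) if j != i) / k[i])
        anns.append(sum(A[i][j] * s[j] for j in range(n) if j != i) / k[i])
    return annd, anns


# Toy star-like network: node 0 is linked to node 1 (weight 2) and node 2 (weight 1).
W = [
    [0, 2, 1],
    [2, 0, 0],
    [1, 0, 0],
]
annd, anns = annd_anns(W)
print(annd)
print(anns)
```

Plotting these values for the real network against the values expected under a null model gives the kind of comparison summarized in Fig. 3.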
Most points are near the identity line, meaning that our approach successfully reconstructs these properties. The poor performance of the WCM might suggest that reconstructing networks from local node-specific information is intrinsically problematic, possibly because of the involvement of higher-order mechanisms in the formation of real networks. In fact, the WCM is often used as a null model in the detection of important higher-order properties: it filters out the local heterogeneity of nodes, so that differences between the real data and the WCM are interpreted as genuine non-local patterns. However, our enhanced method has now been shown to reach a highly satisfactory level, at least for the networks considered here. Therefore, it can be expected that if the ECM is used as an improved null model to detect higher-order patterns such as communities or motifs, the results will be quite different from those routinely obtained using the WCM prediction, for example in the definition of modularity. Besides representing an improved reconstruction method, the ECM has the potential to become a non-trivial tool as a null model of networks with local constraints.
Fig. 3 Topological properties of the models

Information-theoretic tests
The superiority of our enhanced reconstruction method has so far been assessed on the basis of its increased accuracy with respect to the naive approach. In this section, a rigorous goodness-of-fit approach is used to confirm these results by testing whether the ECM is also more efficient than the WCM in an information-theoretic sense. Precisely, the degrees, which are the extra parameters introduced into the ECM, are tested for non-redundancy.
To start with, the likelihood of the ordinary WCM needs to be compared with that of the ECM. Note that the WCM can be regarded as a special case of the ECM obtained by setting $\vec{x} = \vec{1}$ (i.e. $x_i = 1\ \forall i$). The log-likelihood of the WCM is therefore the reduced function $\mathcal{L}(\vec{1}, \vec{y})$ of $N$ variables, and is maximized by a new vector $\vec{y}^{**} \neq \vec{y}^{*}$, which is the solution of Eq. (9) with $\vec{x} = \vec{1}$:
$$\sum_{j \neq i} \frac{y_i^{**} y_j^{**}}{1 - y_i^{**} y_j^{**}} = s_i^* \quad \forall i$$
In the WCM, Eq. (8) no longer applies. If the maximized likelihoods of the WCM and ECM are simply compared, the conclusion $\mathcal{L}(\vec{x}^*, \vec{y}^*) \geq \mathcal{L}(\vec{1}, \vec{y}^{**})$ follows trivially, since the ECM always improves the fit of the real network, given that it includes the WCM as a special case. However, statistical and information-theoretic criteria can be used to evaluate whether the increase in accuracy of a model with more parameters is the result of overfitting. The most popular and simplest criterion is Akaike's information criterion (AIC). This criterion achieves a rigorous trade-off between accuracy and parsimony by discounting the number of free parameters from the maximized likelihood. AIC for our two competing null models is defined as
$$\mathrm{AIC}_{ECM} = -2\,\mathcal{L}(\vec{x}^*, \vec{y}^*) + 4N$$
$$\mathrm{AIC}_{WCM} = -2\,\mathcal{L}(\vec{1}, \vec{y}^{**}) + 2N$$
The optimal model is the one with minimum AIC. When the difference between the AIC values is small, however, the two models remain comparable. A more quantitative criterion is given by the so-called AIC weights, which in our case read
$$w_{ECM}^{AIC} = \frac{e^{-\mathrm{AIC}_{ECM}/2}}{e^{-\mathrm{AIC}_{ECM}/2} + e^{-\mathrm{AIC}_{WCM}/2}} \tag{12}$$
$$w_{WCM}^{AIC} = \frac{e^{-\mathrm{AIC}_{WCM}/2}}{e^{-\mathrm{AIC}_{ECM}/2} + e^{-\mathrm{AIC}_{WCM}/2}} \tag{13}$$
where $w_{ECM}^{AIC}$ is the probability that the ECM is the best model, and $w_{WCM}^{AIC}$ is the probability that the WCM is the best model.
The AIC weights of the two reconstruction methods are shown in Table 1 for all networks. With the exception of one social network, our enhanced approach is always superior to the naive approach and achieves unit probability of being the better of the two models. A closer look at the social network with the opposite result reveals that it is almost fully connected. This explains why the degree sequence is redundant for this network: the local constraints reduce to the strength sequence, so the "naive" WCM is preferable. However, most real-world networks are not fully connected, and the strengths and the degrees must be separately specified for better reconstruction. Therefore, it can be expected that degree sequences are irreducible to strength sequences for most real-world networks. In such cases, the inclusion of degree sequences in our enhanced method is non-redundant, which explains why our method maintains higher information efficiency.
In addition, the corrected Akaike's Information Criterion (AICc) is used to correct for small sample sizes. Based on the value of AICc, the corresponding weights are calculated in analogy with Eqs. (12) and (13). For all the networks in our sample, the AICc weights are the same as the AIC ones.
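The model comparison can be illustrated with a small calculation. This sketch uses the standard definitions AIC = 2k − 2 ln L and AICc = AIC + 2k(k + 1)/(n − k − 1), with k = 2N parameters for the ECM and k = N for the WCM. Taking the sample size n to be the number of node pairs N(N − 1)/2 is an assumption of this example, and the log-likelihood values are invented.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike's information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def aicc(log_likelihood, n_params, sample_size):
    """Small-sample corrected AIC (requires sample_size > n_params + 1)."""
    correction = 2 * n_params * (n_params + 1) / (sample_size - n_params - 1)
    return aic(log_likelihood, n_params) + correction


def akaike_weights(criteria):
    """Convert criterion values into model probabilities.

    Subtracting the minimum first avoids underflow in the exponentials
    and leaves the weights unchanged.
    """
    best = min(criteria)
    rel = [math.exp(-(c - best) / 2) for c in criteria]
    total = sum(rel)
    return [r / total for r in rel]


# Invented example: N = 10 nodes, hypothetical maximized log-likelihoods.
N = 10
loglik_ecm, loglik_wcm = -120.0, -150.0
pairs = N * (N - 1) // 2          # ASSUMED sample size for AICc

aic_values = [aic(loglik_ecm, 2 * N), aic(loglik_wcm, N)]
aicc_values = [aicc(loglik_ecm, 2 * N, pairs), aicc(loglik_wcm, N, pairs)]
w_aic = akaike_weights(aic_values)
w_aicc = akaike_weights(aicc_values)
print(aic_values, w_aic)
print(aicc_values, w_aicc)
```

In this toy case the gain in likelihood outweighs the penalty for the N extra parameters, so the weight of the ECM is close to one; for an almost fully connected network the gain would be small and the penalty would favour the WCM.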

Summary statistics
Panel A of

Consider the link weights
The motif structures help to understand local information transmission among shareholders. Considering the link weights, the important motifs of the SCN are explored in this section. Table 3 provides the results. Despite their low proportion, the four types of closed motifs are found to be important motifs with statistical significance in the network. In contrast, although the proportion of chain motifs is very high, they are not important motifs in the SCN. These results indicate that the closed motifs play an essential role in information transmission among shareholders.

Consider the node attribute
Next, further considering the node attributes, the components of the important motifs (closed motifs) in the SCN are explored. Table 4 lists the top five closed motif types with the highest frequency of occurrence in all quarters. The results show that the motifs with three FIC nodes have the highest proportion, and the other types of motifs contain at least two FIC nodes. In brief, FIC shareholders are important components of closed motifs. Additionally, the proportion of closed motifs in 2013-2017 is lower than that in the other two periods, which may be related to the stock market crash in 2016. A large number of shareholders sold their shares, causing motifs to disappear.

Motif evolution
Next, the motif evolution of the SCN is investigated to reveal the behavioural laws of shareholders in different motif structures. The analysis of motif evolution is divided into three steps. First, the intersection of the shareholders in the previous and subsequent periods is calculated. Second, the local structures of Motifs 1-7 formed by these common shareholders in the SCN of the previous period are extracted. Finally, the corresponding evolved motif structure is extracted in the later period. Two time scales, short cycle (quarterly) and long cycle (annual), are selected to explore the evolution of motifs.

Motif evolution in the short cycle
The motif in the later period corresponds to one of three evolution states: invariant motif, vanishing motif, and transforming motif. Specifically, the invariant motif refers to a motif that maintains the same structure in the two periods. The vanishing motif refers to a motif with at least one isolated node due to edge fracture. The transforming motif refers to a motif whose type changes between the two periods due to edge fracture or reconnection. Figure 4 shows the results of the motif evolution on the quarterly time scale. The first column corresponds to the motifs with at least one isolated node, representing the vanishing motif. The diagonal line represents invariant motifs that have the same structure in the previous and later periods. The other positions in Fig. 4 are the transforming motifs.
Panel A shows summary statistics of different types of shareholder holdings. FH represents the frequency of shareholders, PS represents the proportion of shareholders, NH represents the number of shareholders, SHS represents the number of stocks held by shareholders, TMC represents the total market capitalization of stocks held by shareholders, and AMC represents the average market capitalization of stocks held by shareholders. Panel B shows the summary statistics of the top five strong links. FIC-FIC represents a strong link between two nodes whose attributes are FIC. Other links have similar meanings. Panel C shows the summary statistics of the top five weak links. Panel D shows summary statistics of network topology characteristics, including the number of edges (NE), number of nodes (NN), average weighted degree (AWD), average path length (APL), network diameter (NDia), clustering coefficient (CC), and network density (NDen). These characteristics are calculated following Guan et al. [8].
Figure 4 shows the motif evolution in the short cycle. The vertical axis is the motif structure in the previous period, and the horizontal axis is the motif structure in the later period. The circle (and the shade of the colour) represents the evolution probability, which is the average of all quarters.
First, as shown in the first column, all motif structures have a high tendency to disappear. As the link weights change from weak to strong and the structure changes from chain to closed (from Motif 1 to Motif 7), the tendency to disappear gradually decreases. Second, as shown on the diagonal, all motif structures have a relatively high propensity to remain unchanged, especially Motif 4. Third, among the transforming motifs, the transforming probability from a motif with high information transmission speed (lower triangle) to a motif with low information transmission speed (upper triangle) is higher, whereas the probability of the reverse is relatively low. In short, edges tend to break rather than reconnect. Such evolution results are in line with the principle of entropy increase in physics: the system always tends to transform from an ordered state to a disordered state [38].
Table 3 shows the significance of motif occurrences. Freq represents the frequency of the motif, and Z-score represents the Z-score of the motif. The proportion of motifs in each year is the average of the four quarters, and the last row is the average of all quarters. *** indicates the significance level of 1%.
Next, considering the node attributes, the composition of each motif evolution state is calculated. The results are shown in Fig. 5. To ensure the accuracy of the results, motif types whose total number is less than 100 are deleted. Among the invariant motifs, the proportion of yellow and green nodes is relatively high, indicating that the holdings of GC and II are stable and tend to remain unchanged during motif evolution. Especially for Motif 4, the probability that the motif with all three nodes as II remains unchanged is 0.98. Furthermore, among the vanishing motifs, the motifs containing red nodes account for a relatively high proportion, indicating that FIC has active market transactions and is likely to disappear during motif evolution. Finally, among the transforming motifs, the holdings of FIC and QFII are unstable, so motifs containing these two types of nodes are easily transformed into other varieties in the evolution process.
Figure 5 shows motif evolution in the short cycle considering node attributes. 1st, 2nd, and 3rd represent the evolution probability of the top 3 motifs. The number in parentheses indicates the probability of motif evolution. Taking 0.81 in the upper left corner of the figure as an example, it means that among the cases where Motif 1 remains invariant, the motif with all three nodes being GC has the highest probability, which is 0.81. In other words, if there are 100 instances of Motif 1 with all three nodes as GC, 81 of them remain unchanged in the next quarter. The node colour represents the shareholder category. The red node represents FIC, the yellow node represents GC, the green node represents II, the black node represents SOC, and the blue node represents QFII.
The above analysis shows that motifs containing GC and II are stable during quarterly evolution, whereas motifs containing FIC tend to transform or even disappear, and motifs containing QFII tend to transform into other types.

Motif evolution in the long cycle
Next, the motif evolution is discussed on the annual scale. The calculation steps are the same as those in the quarterly period. The results are shown in Fig. 6.
Table 4 lists the top five closed motif types with the highest frequency of occurrence. The node colour represents the shareholder category. The red node represents FIC, the yellow node represents GC, the green node represents II, the black node represents SOC, and the blue node represents QFII.
Fig. 4 Motif evolution in the short cycle
Figure 6 shows the motif evolution in the long cycle. The vertical axis is the motif structure in the previous period, and the horizontal axis is the motif structure in the later period. The circle (and the shade of the colour) represents the evolution probability, which is the average of all quarters.
First, as shown in the first column of Fig. 6, all motif structures have a high tendency to disappear. Compared with the short-period results, the probability of vanishing is higher in the long period, which indicates that investors' shareholdings are less stable over a longer period. Second, as shown on the diagonal, all motif structures have a relatively high tendency to remain unchanged, although the probability is lower than in the short period. Third, the probability of transforming from motifs with high information transmission speed (lower triangle) to motifs with low information transmission speed (upper triangle) is higher than that of the reverse transformation. These results agree with those of the short-period motif evolution: the system always tends to evolve towards a state of increasing entropy.
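The first column, the diagonal and the two triangles discussed above can be read off a row-normalised transition matrix. The following is a minimal sketch, assuming that each tracked node triple is recorded as a pair (state before, state after), with 0 for "no motif" and 1 to 7 for Motif 1 to Motif 7. This encoding, the function names and the toy pairs are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

N_STATES = 8  # assumed encoding: 0 = no motif, 1..7 = Motif 1..Motif 7

def transition_matrix(pairs):
    """Row-normalised matrix P[before, after] built from observed pairs."""
    counts = np.zeros((N_STATES, N_STATES))
    for before, after in pairs:
        counts[before, after] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows without observations stay all-zero instead of dividing by zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts),
                     where=row_sums > 0)

def summarize(P):
    """Split each motif row into vanish / remain / transform shares."""
    motifs = P[1:, 1:]  # transitions between Motif 1..Motif 7 only
    return {
        "vanish": P[1:, 0],                                   # first column
        "remain": np.diag(motifs),                            # diagonal
        "to_lower_index": np.tril(motifs, k=-1).sum(axis=1),  # below diagonal
        "to_higher_index": np.triu(motifs, k=1).sum(axis=1),  # above diagonal
    }

# Toy pairs (made-up values): four Motif 7 and two Motif 4 instances.
pairs = [(7, 7), (7, 7), (7, 4), (7, 0), (4, 4), (4, 0)]
P = transition_matrix(pairs)
summary = summarize(P)
print(summary["remain"][6], summary["vanish"][6], summary["to_lower_index"][6])
```

With rows as the earlier state and columns as the later state, entries below the diagonal are transitions to a lower-numbered motif, so comparing `to_lower_index` with `to_higher_index` is one way to check whether edges break more often than they reconnect.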
Next, considering the node attributes, the composition of each motif evolution state is counted. The results are shown in Fig. 7. The conclusions for the long period are the same as those for the short period. However, the probability of a motif remaining unchanged is lower in the long period than in the short period, whereas the probabilities of a motif vanishing or transforming are higher.
Fig. 6 Motif evolution in the long cycle
Figure 7 shows motif evolution in the long cycle considering node attributes. 1st, 2nd, and 3rd represent the top three motifs by evolution probability. The number in parentheses indicates the probability of motif evolution. The node colour represents the shareholder category: red for FIC, yellow for GC, green for II, black for SOC, and blue for QFII.

Risk warning of important motifs in the financial crisis
The stock market is essentially an information market, and information guides stock prices [39]. As important investors in listed companies, shareholders often play the role of information carriers [32]. Major financial risk events are often caused by unpredictable random events that trigger minority shareholders to sell stocks. This behaviour spreads throughout the network and can ultimately lead to the collapse of the entire stock market. In 2015, the Chinese stock market suffered the worst crash in its history.
The Shanghai Composite Index fell by as much as 2000 points in the two months from June to July. The large short-term volatility of the stock market caused heavy losses for investors. This section explores the evolution of important motifs during the stock market crash. Node attributes are disregarded here, and only the link weights are considered. Each panel in Fig. 8 shows the evolution of a different important motif. For comparison, the motif ratios of the real network and the null network (ECM) are shown in the same figure. Intuitively, the proportion of Motif 4 in the real network relative to that in the null model declines abnormally from 2014, before the stock market crash, to 2016, after the crash. In contrast, Motif 5, Motif 6 and Motif 7 show no significant decreasing trend.
Furthermore, the node attributes of Motif 4 are distinguished to explore the evolution of specific motif categories. The top 10 Motif 4 categories per year, considering node attributes, are calculated and shown in Fig. 9. The Motif 4 category composed of GC, II and SOC maintained a high proportion until 2014. However, in the years around the stock market crash (2014-2016), this category declined abnormally. In contrast, the proportion of the Motif 4 category consisting only of GC and II nodes was unusually high around the stock market crash. These results suggest that SOCs did not play a ''ballast stone'' role in stabilizing the market during the financial crisis. Instead, GC and II became ''scapegoats'' during the financial crisis and suffered huge losses.
Figure 8 displays the evolution of important motifs. The horizontal axis is the time, and the vertical axis is the motif ratio.
Figure 9 shows the top 10 Motif 4 categories per year considering node attributes. The horizontal axis is the time, and the vertical axis is the top ten Motif 4 categories considering node attributes. The red node represents FIC, the yellow node represents GC, the green node represents II, the black node represents SOC, and the blue node represents QFII.
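The ranking of Motif 4 categories by year can be sketched as a simple grouping and counting step. The input layout (pairs of year and node-category tuple), the function name and the toy records are assumptions made for illustration only and do not reproduce the authors' data or results.

```python
from collections import Counter, defaultdict

def top_categories_by_year(motifs, k=10):
    """Return {year: [(category, share), ...]} with the k largest categories.

    motifs: iterable of (year, node_categories) pairs for one motif
    structure (hypothetical layout), e.g. (2014, ("GC", "II", "SOC")).
    """
    per_year = defaultdict(Counter)
    for year, nodes in motifs:
        # Sort so that ("SOC", "GC", "II") and ("GC", "II", "SOC") coincide.
        per_year[year][tuple(sorted(nodes))] += 1
    result = {}
    for year, counter in per_year.items():
        total = sum(counter.values())
        result[year] = [(cat, n / total) for cat, n in counter.most_common(k)]
    return result

# Toy records (made-up, not real shareholding data).
motifs = [
    (2014, ("GC", "II", "SOC")),
    (2014, ("SOC", "GC", "II")),
    (2014, ("GC", "II", "II")),
    (2015, ("GC", "II", "II")),
]
ranking = top_categories_by_year(motifs)
print(ranking[2014][0])
```

Tracking the share of a given category, such as the one composed of GC, II and SOC, across the yearly rankings is then a matter of looking up the same sorted tuple in each year's list.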

Conclusion
Based on shareholding data from 2007 to 2017, this study mines the motif structures of the SCN from a micro-perspective. Specifically, considering the link weights, the closed motifs are confirmed to be important motifs with statistical significance in the network. Further considering the node attributes, the motifs with all three nodes being FIC have the highest proportion.
Furthermore, the evolution of different types of motifs is investigated on two time scales: the quarterly cycle and the annual cycle. All motif structures are found to have a high tendency to vanish, and the longer the period, the greater the probability of vanishing. When the node attributes are considered, motifs containing FIC nodes have a higher tendency to disappear, while motifs containing GC nodes tend to keep their structure unchanged.
Finally, important motifs are examined to determine whether their proportions change abnormally and abruptly before and after the financial crisis. The proportion of the Motif 4 category containing SOC, GC, and II nodes was exceptionally low during the financial crisis. In contrast, the proportion of the Motif 4 category consisting only of GC and II nodes was unusually high around the financial crisis. These results show that SOCs did not play a role in stabilizing the market during the financial crisis.
In addition, at the theoretical level, this research introduces an analytical and unbiased maximum-entropy technique, which uses the knowledge of both strengths and degrees to reconstruct the unbiased ensemble of weighted networks. When this enhanced method is applied to several real networks of different nature, it leads to a significantly improved reconstruction. Moreover, rigorous information-theoretic criteria are introduced to confirm that the joint specification of the strengths and degrees cannot be reduced to that of the strengths alone.
In short, this research explores the microstructure and evolution mechanism of information interaction among shareholders from a micro-perspective. It is of great significance for understanding how information is transmitted in the network.

Declarations
Conflict of interest The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The authors have no affiliation with any organization with a direct or indirect financial interest in the subject matter discussed in the manuscript.
Data availability The authors confirm that data will be made available on reasonable request.

The log-likelihood estimation