Investigating Current Directions in Cross-Industry Innovation Research – a Systematic Literature Review

Background: The operating environment of companies is subject to constant change through digitization, Industry 4.0, artificial intelligence and megatrends, which pose enormous challenges. One possible way to master these challenges is to draw on knowledge from foreign industries. This phenomenon is called cross-industry innovation. The aim of this study is to identify major directions in cross-industry innovation research. An essential part of science is to build on and enhance already published knowledge. In order to identify this knowledge and to offer a comprehensible and reproducible process, a systematic approach is required. For this study, a systematic literature review is applied. During the review of the relevant literature, an understanding of the breadth and depth of the literature emerges and it becomes possible to identify research gaps, an identification that is often based on qualitative analyses. We develop a new approach for analyzing literature based on semantic similarity analysis. Methods: In order to achieve the research objectives, a systematic literature review is applied and slightly modified. First, we apply a systematic literature review protocol to establish reproducible results. Second, we develop a new approach for analyzing literature using a combination of semantic similarity analysis and cluster analysis. Conclusion: This review aids in determining the directions of cross-industry innovation research. Overall, six directions of cross-industry innovation research are identified. Furthermore, this study offers a new approach for analyzing literature based on quantitative analysis such as semantic similarity analysis.


Background
The digital transformation is confronting economies with radical structural change. Existing value chains are being modified by new data, networking, automation and the digital customer interface [1]. Many of these technological challenges cannot be mastered with existing knowledge, known strategies and proven technologies. New, innovative solutions are needed, and these have to be developed in an interdisciplinary manner. The search for interdisciplinary solutions is connected to an opening of the innovation process. The open innovation paradigm elaborated by [2] describes a shift from the traditional thinking of closed corporate innovation departments to a more open corporate innovation culture.
However, while we know a lot about the open innovation phenomenon described by [2], the literature on open innovation primarily focuses on "one-on-one relations between innovation partners" [3]. According to [3], researchers should address more complex settings of open innovation practices. [4] also identifies the need for more complex settings of open innovation practices and pursues the idea that a company's own innovation potential can be increased by involving distant knowledge sources in the innovation process. [5] uses the term cross-industry innovation (CII) to describe the transfer of knowledge and solutions beyond industry boundaries based on analogical thinking.
CII is also gaining interest in industry. Since 2016, NASA has organized an annual "cross industry innovation summit" with topics related to CII [6]. A further conference took place in 2018 in Hamburg, which focused primarily on CII between the creative industry and public or semi-public institutions [7].
In addition to the interest of the business community in the context of conferences, an article in the management magazine Harvard Business Review also refers to the topic of CII [8]. Besides industry, science is also interested in CII. A Google Scholar search for the term "cross industry innovation" in the period between 1970 and 1999, conducted on 11 March 2019, returns 12 results. The same search for the period between 2000 and 2018 returns 872 results. This raises the question of how CII research has developed. Surprisingly, no study has been published that summarizes CII research and presents its major directions. Therefore, this study sheds light on CII research and contributes a systematic literature review for an improved understanding of its origin and its state of the art. This leads to this paper's research question: What are the major directions in CII research?
This study is structured in six chapters. After the introduction in chapter 1, the methodological basics with a focus on the systematic literature review follow in chapter 2. In chapter 3, the systematic literature review is adapted and concretized for the case of CII. Chapter 4 presents the results, followed by chapter 5, which introduces future research by means of the morphological box method. Chapter 6 summarizes the work of this study with a conclusion.

Research Methodology
An essential part of science is to build on and enhance already published knowledge. In order to identify this knowledge and to offer a comprehensible and reproducible process, a systematic approach is required. For this study, a systematic literature review (SLR) is applied and adapted.
During the review of the relevant literature, an understanding of the breadth and depth of the literature emerges and it becomes possible to identify research gaps [9]. In the following, an SLR process according to [9] is applied and slightly modified. [9] analyzes scientific articles on the methodology of literature reviews with the objective of developing guidance on how to conduct an SLR. The process can be divided into the major steps "planning the review", "searching and extracting literature" and "analyzing literature and reporting the review". The three major steps comprise the eight sub steps "formulate the problem", "develop and validate the review protocol", "search the literature", "screen for inclusion", "assess quality", "extract data", "analyze and synthesize data" and "report findings" (see Fig. 1).
insert Fig. 1 around here
The first major step "planning the review" comprises the sub steps "formulate the problem" and "develop and validate the review protocol". First, it is important to become aware of the SLR's goal and to "formulate the problem", which requires determining research questions in order to tailor the subsequent research design, since the research design should be designed to answer the research questions [9].
Within the sub step "develop and validate the review protocol" a suitable research design has to be developed and documented. The protocol allows a reproducible SLR, which improves reliability.
Furthermore, the protocol should include the research question, inclusion criteria, search strategies, quality assessment, data export strategy and reporting.
The second major step "searching and extracting literature" comprises the sub steps "search the literature", "screen for inclusion", "assess quality", "extract data". The major goal is to identify and collect the relevant articles and ensure the quality.
Within the sub step "search the literature" the literature search methods are determined. [10] proposes a backward and forward search. The backward search uses the references of an article as a starting point to get a list of potential articles. A research area can be explored in an iterative search for further topic related references [10]. In contrast to backward search, forward search considers the citing articles of a basic article. In an iterative forward search approach, it is possible to open up a research area [10]. Today's literature databases (Scopus, Web of Science, and Google Scholar) are the primary way to search for literature. These databases can be searched based on keywords [11].
Different types of databases exist, each with a different scope.
In addition to the basic selection of the search strategy, detailed characteristics of the search query must be determined. In literature databases, the entire metadata of an article can be searched using a search string. By brainstorming or reviewing the common literature, keywords can be identified which delimit the research field. Furthermore, it is important to identify possible synonyms. In addition to the keywords, the publication date, indices, type of publication, etc. can also be used to restrict the search space.
In the sub step "screen for inclusion" the identified articles must be reviewed to determine whether they are relevant. It is necessary to create inclusion and exclusion criteria to evaluate the articles.
In the sub step "assess quality" [9] checks the quality of the identified articles based on the complete textual information.
In the sub step "extract data" depending on the type of review, different information of an article can be relevant. The required data leads to different extraction procedures. Thus, in a meta-analysis that is connected with a meta-regression, only the required data is collected [9]. In a review, with the aim of opening up the content of a research area, the major focus is on the acquisition of full text information and the subsequent extraction of topics.
The third major step "analyzing literature and reporting the review" comprises the sub steps "analyze and synthesize data" and "report findings". In the sub step "analyze and synthesize data" different analysis steps have to be carried out depending on the type of review chosen. At this point, both qualitative and quantitative analysis methods can be considered. One notable qualitative analysis method is qualitative content analysis, in which text components are categorized based on a codebook [12]. Co-citation analysis and semantic similarity analysis can be regarded as quantitative analysis methods. A co-citation analysis links cited articles to identify core articles in the scientific literature and offers an opportunity to enhance the comprehension of the intellectual structure [13][14][15]. Two articles that are cited together by another article are counted as a co-citation [16][17][18]. Articles with a high number of co-citations therefore have a higher similarity.
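As a minimal sketch of the co-citation counting described above, the following Python snippet counts how often two articles appear together in the reference list of a citing article. The article identifiers and reference lists are hypothetical and serve only as an illustration; they are not taken from the data set of this study.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: each citing article maps to the set of articles it cites.
reference_lists = {
    "citing_1": {"A", "B", "C"},
    "citing_2": {"A", "B"},
    "citing_3": {"B", "C"},
}


def count_cocitations(reference_lists):
    """Count, for every pair of cited articles, how many citing articles cite both."""
    counts = Counter()
    for cited in reference_lists.values():
        # Sorting makes each pair order-independent, e.g. always ("A", "B").
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts


cocitations = count_cocitations(reference_lists)
for pair, n in sorted(cocitations.items()):
    print(pair, n)
```

Pairs with a higher count would be treated as more similar in a co-citation analysis.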
Relationships between articles can be established not only with the help of citations, but also by analyzing the semantic content. To establish connections between articles, the content of the identified articles can be subjected to a semantic similarity measurement. Articles with a high number of similar text elements have a higher similarity.
The sub step "report findings" serves to present and interpret the analysis' results. Furthermore, the sub step should point out opportunities and directions of future research. All novel findings and unsuspected results should be highlighted.

Adaptation And Concretization Of The Systematic Literature Review Process
Following the approach of [9], the SLR is adapted and concretized in this section to identify current directions of cross-industry innovation (CII) research (see Table 1). Adaptation means (i) selection or deselection of sub steps from the SLR process by [9] and (ii) extension by newer and particularly fitting elements such as semantic similarity analysis. Concretization is the concrete application of the sub steps. The sub steps "formulate the problem" and "develop and validate the review protocol" are not adapted or concretized as they are necessary for the planning of the study but do not provide any results. In the following, the sub steps "search the literature", "assess quality", "screen for inclusion", "extract data", "analyze and synthesize data" and "report findings" are adapted and concretized.
insert Table 1
First, the search is performed in the entire full text including references. If an article refers to another article with the expression "cross-industry innovation" in the title, the citing article will be identified as a CII article based on its citation. Second, a reproducible search in Google Scholar is not possible because the search algorithm is not available. Third, predatory journals are listed in Google Scholar, which leads to a lack of quality. Therefore, Google Scholar is not considered as a search database.
To increase the recall and the reliability, the literature databases Web of Science (WoS) and Scopus are selected. These databases are among the largest and most comprehensive literature databases in the world [19]. Both databases allow a search to be restricted to different fields of the metadata. Within the search string, these fields can be linked by Boolean operators (AND, OR, NOT). Five different fields are restricted for the search in Scopus and WoS. The first field is restricted by the keywords. Based on brainstorming, keywords for CII research are collected and synonyms are derived. The major objective of the brainstorming is to abstract the first two word components of the term cross-industry innovation in order to find synonyms. Thus, the synonym "inter" is identified for the term "cross". For the term "industry" the synonyms "sector/sectoral" and "organizational" are identified. No synonyms are searched for the term innovation, as this is the core component of the research interest. After the brainstorming, all possible combinations of the brainstorming results are created and saved for further processing. During a first test, including the word combination "inter-organizational innovation" increases the number of results sharply, which leads to a strong decrease in precision. Therefore, this term combination is removed from the search string. As a second database restriction, the type of publication is restricted to peer-reviewed articles, since these articles are reviewed and should have a higher quality [11]. Consequently, conference proceedings, dissertations and working papers are not considered. As a third database restriction, only articles in English are taken into account, since a semantic similarity analysis is applied and a unified language is therefore necessary for further analyses. For the fourth database restriction, the period is limited. The period between 2000 and 2018 is considered, on the assumption that research efforts in the field of CII emerged together with research efforts in the field of open innovation. For the fifth database restriction, the indices in which journals are listed are restricted to Social Science Citation Index (SSCI), SSCI Expanded, and Emerging Sources Citation Index (ESCI). This ensures that no articles from distant scientific disciplines such as medicine are included. These database restrictions are translated into the query languages of WoS and Scopus. The search strings are shown in the appendix.
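The combination of brainstormed terms can be sketched as follows. This is only an illustration under stated assumptions: the hyphenated spelling of the phrases, the function names and the plain OR-joined output are choices made here, and database-specific field codes and the further restrictions (document type, language, period, index) are omitted. The actual search strings are those given in the appendix.

```python
from itertools import product

# Terms taken from the brainstorming step described above.
first_terms = ["cross", "inter"]
second_terms = ["industry", "sector", "sectoral", "organizational"]
core_term = "innovation"

# Combination removed after the first test because it reduced precision.
excluded = {"inter-organizational innovation"}


def build_keyword_phrases():
    """Create all combinations of the brainstormed terms, minus exclusions."""
    phrases = []
    for first, second in product(first_terms, second_terms):
        phrase = f"{first}-{second} {core_term}"
        if phrase not in excluded:
            phrases.append(phrase)
    return phrases


def build_query(phrases):
    """Join quoted phrases with OR; real database syntax adds field codes."""
    return " OR ".join(f'"{p}"' for p in phrases)


phrases = build_keyword_phrases()
query = build_query(phrases)
print(len(phrases))
print(query)
```

Generating the phrases programmatically keeps the keyword part of the review protocol reproducible when a term is added or removed.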
This search leads to the identification of 376 articles in WoS and 343 in Scopus. After merging the data sets and removing duplicates, 512 articles remain as potential CII articles.
(ii) Assess quality - For this study, in addition to [9], the quality assessment is related to the search string in order to test whether the search string has a sufficient recall. For this purpose, a basic article of the research field has to be identified. Afterwards, the references and citing articles of the basic article have to be examined to create a comparative data set. Then the search string results and the comparative data set are compared to check whether the data sets match or whether there are any deviations. If there are deviations, it has to be examined why articles could not be found, and the search string is adjusted. Subsequently, the sub step "search the literature" is carried out again.
The article [5] is used as the basic article and a forward and backward search is applied, which leads to the identification of 29 CII articles. After comparing the results of the sub steps "assess quality" and "search the literature", 22 of the 29 CII articles can be found in the first data set. The subsequent analysis of the unidentified articles shows that some articles do not use the term "cross-industry innovation" but deal with topics around cooperation or innovation across industry boundaries. For this reason, the search string is extended by the word combinations "analogical thinking innovation", "cross industry alliances innovation", "external problem solver + innovation", "crossing, domain specific boundaries + innovation". Accordingly, with the extended database search string it is possible to identify 27 out of 29 articles. The identification of the remaining two articles would be possible using an additional search string adjustment, but this would also lead to an increased number of non-relevant articles, a loss of precision and an enormous assessment effort.
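The recall implied by these figures can be made explicit with a short calculation. The function name is chosen here for illustration; the numbers are the ones reported above.

```python
def recall(found: int, relevant: int) -> float:
    """Share of the relevant articles that the search string retrieved."""
    if relevant <= 0:
        raise ValueError("relevant must be positive")
    return found / relevant


# Figures reported in the text: 29 CII articles in the comparative data set.
recall_initial = recall(22, 29)   # first search string
recall_extended = recall(27, 29)  # extended search string

print(f"initial:  {recall_initial:.1%}")
print(f"extended: {recall_extended:.1%}")
```

The extension of the search string thus raises the recall against the comparative data set from roughly 76% to roughly 93%.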
To examine the robustness of the search string, an analysis of the CII related terms "open innovation", "absorptive capacity" and "knowledge transfer + innovation" is conducted. These terms are taken from the standard literature and can be regarded as theoretical background or directions of CII research. The search is performed in Scopus using the same search string as in the sub step "search the literature". However, the keywords are replaced with the new search terms. Each search term is searched separately. The search term "open innovation" leads to 2566 results. The search term "absorptive capacity" leads to 976 results and the search term "knowledge transfer + innovation" leads to 351 results. In order to investigate how many CII articles can be found in the data set with the respective search term, a sample (confidence level = 95%; margin of error = 10%) is drawn. This leads to a sample size of 93 articles for the search term "open innovation", 88 articles for the search term "absorptive capacity", and 77 articles for the search term "knowledge transfer + innovation". The maximum percentage share of CII articles in the samples is 2%. All identified CII articles in the samples were also identified by the first search string executed in the sub step "search the literature". Consequently, including the three presented terms does not improve the recall, but rather reduces the precision.
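The text does not state which formula was used to derive the sample sizes. A common choice, assumed here purely for illustration, is Cochran's formula with a finite population correction, a z-value of 1.96 for 95% confidence and a conservative proportion of p = 0.5.

```python
import math


def sample_size(population: int, z: float = 1.96,
                margin: float = 0.10, p: float = 0.5) -> int:
    """Cochran's formula with finite population correction, rounded up.

    The formula and the defaults z = 1.96 and p = 0.5 are assumptions;
    the original text only gives 95% confidence and a 10% margin of error.
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2   # size for an infinite population
    n = n0 / (1 + (n0 - 1) / population)        # finite population correction
    return math.ceil(n)


# Hit counts reported in the text for the three search terms.
hits = {
    "open innovation": 2566,
    "absorptive capacity": 976,
    "knowledge transfer + innovation": 351,
}
sizes = {term: sample_size(n) for term, n in hits.items()}
for term, n in sizes.items():
    print(term, n)
```

Under these assumptions the sketch yields 93 and 88 for the first two terms, matching the reported values, and 76 for the third term, one below the reported 77. The formula or rounding actually used in the study may therefore differ slightly.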
(iii) Screen for inclusion - The research design of this SLR leads to the following inclusion criteria: The study should analyze cooperation or innovation across industry boundaries. Furthermore, the study should focus on the company level rather than on an economic industry perspective.
After an initial review of the title, abstract and keywords, the exclusion criteria are established. Following the inclusion criteria, the research focus should lie on cooperation or innovation across industry boundaries. In some cases the term cross industry appears in combination with words like sample, comparison or data, which suggests a cross-industry data basis rather than an analysis of CII. If articles are identified only based on these terms, they are removed from the data set. The screening for inclusion leads to 48 identified articles.
(iv) Extract data - Data extraction involves downloading and merging metadata from Scopus and WoS on the one hand, and full text articles on the other. Downloading the full text articles is divided into four stages. In the first stage, the 48 full text articles are downloaded from the official pages of the publishers. This can lead to the problem that the university's own library has not concluded a license agreement with the publisher and therefore has no access to the publisher's full text articles. 37 of 48 full text articles can be downloaded. Furthermore, it turns out that in two cases the article's language is stored incorrectly in the databases. Only English articles are searched for, but in two cases the abstract is in English while the full text is in German. These two articles are removed from the data set. The second stage involves a search on ResearchGate. ResearchGate is a social media platform that enables scientists to present and share published articles and projects. The remaining nine full text articles are searched for on ResearchGate and, if available, the authors are contacted and the full text article is requested. In this way, three additional full text articles can be procured. In the third stage, the focus shifts to direct contact and the e-mail addresses of the first authors are identified. Afterwards, six authors are contacted directly and five of them reply and share their articles. In the fourth stage, the last article is searched for with the help of interlibrary loan and a copy is obtained. Interlibrary loan is a service through which copies of articles can be searched for and requested at other universities. Within this four-stage process, it is possible to obtain all 46 full text articles.
All text information is copied from the available 46 articles and inserted into Excel. Each chapter of an article is assigned to one of seven categories. The categories represent an exemplary structure of a scientific article and include abstract, introduction, data, method, results, discussion, and conclusion. With this categorization, it is possible to analyze selected categories separately or in bundles. Unfortunately, almost every article has a different structure, so that the assignment is partly subject to bias. In addition to the full text extraction, the articles' metadata from Scopus and WoS are downloaded and consolidated.
(v) Analyze and synthesize data - The data is analyzed on two levels. First, the metadata of the articles are used to display descriptive statistics. Second, the major thematic directions are identified by a semantic similarity analysis and cluster analysis based on the articles' full text information.
Another possibility would be a clustering based on co-citations, but a first test showed that only 27 articles have co-citations, of which only 20 articles have more than ten co-citations. The topics within the resulting clusters are only slightly consistent. Furthermore, there is a strong time bias in the co-citation analysis. It should be noted that many articles were published in the last three years of the period under consideration. Therefore, the number of citations is at a low level. Consequently, the similarity between articles is calculated based on the full text information.
Before the semantic similarities can be measured, however, the data set must first be cleansed [20]. Different filters are used for this purpose in PatVisor™, an analysis tool of the Institute for Project Management and Innovation [21]. (i) Stop words such as "the" or "and" are removed from the texts. (ii) Roman and Arabic numbers are removed, as they provide no added value in terms of content. (iii) Abbreviations and synonyms are harmonized using a search and replacement filter to make the nouns used consistent in the data set. (iv) Furthermore, conjugated verbs are converted to their infinitive form using a lemmatizer, and nouns that occur in the plural are transformed into their respective singular forms [20]. The similarity of the texts to each other is then determined with the help of a semantic analysis. For this purpose, word combinations are formed, extracted, counted and compared with each other to measure similarities [22]. The calculation between the articles is based on bi-grams with a minimum word length of three in a word window of four. Thus, word combinations of two words are compared with each other. All text information from the articles except images, formulas and references is taken into account for the similarity calculation. The filters described above (stop words, numbers, lemmatization and synonym unification) are applied in order to obtain meaningful similarity values.
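The preprocessing and bi-gram extraction can be sketched roughly as follows. This is not the PatVisor implementation: the stop word list, the naive Roman numeral filter and the reading of "word window of four" as pairing each word with the following three words are assumptions made for illustration, and lemmatization and synonym harmonization are omitted because they require external resources.

```python
import re
from collections import Counter

# Tiny illustrative stop word list; the list used in the study is not given.
STOP_WORDS = {"the", "and", "of", "a", "an", "in", "to", "is", "are", "from", "for"}

# Naive Roman numeral pattern; it would also drop real words such as "mix".
ROMAN = re.compile(r"m{0,4}(cm|cd|d?c{0,3})(xc|xl|l?x{0,3})(ix|iv|v?i{0,3})")


def clean_tokens(text, min_length=3):
    """Lowercase, then drop stop words, numbers and words shorter than min_length."""
    kept = []
    for tok in re.findall(r"[a-z0-9]+", text.lower()):
        if tok in STOP_WORDS or tok.isdigit() or ROMAN.fullmatch(tok):
            continue
        if len(tok) >= min_length:
            kept.append(tok)
    return kept


def window_bigrams(tokens, window=4):
    """Pair each word with the words that follow it inside the window."""
    pairs = Counter()
    for i, first in enumerate(tokens):
        for second in tokens[i + 1 : i + window]:
            pairs[(first, second)] += 1
    return pairs


# Hypothetical example sentences, not taken from the analyzed articles.
text_a = "The transfer of knowledge from distant industries supports radical innovation in 2018."
text_b = "Knowledge transfer across distant industries enables radical innovation."

tokens_a = clean_tokens(text_a)
bigrams_a = window_bigrams(tokens_a)
bigrams_b = window_bigrams(clean_tokens(text_b))
shared = set(bigrams_a) & set(bigrams_b)
print(sorted(shared))
```

The shared bi-grams would then feed into a similarity coefficient; the coefficient used in the study is not reproduced in this sketch.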
Following [23] or [24], complete linkage of types b and c is selected as the linkage type and double single-sided inclusion as the similarity coefficient. Since the size of the available full text articles varies greatly and the double single-sided inclusion coefficient takes into account the different text lengths of the compared articles, this coefficient is a suitable choice.
The cluster analysis is applied in a two-step procedure. In the first step, outliers have to be identified. For this purpose, a cluster analysis using the single linkage method is applied to all 46 articles. The method systematically combines the most similar articles to form clusters. Especially dissimilar articles are clustered at the end of the procedure. Articles that are clustered at the end of the procedure are regarded as outliers and are eliminated. The cluster procedure is implemented in the programming language R with the function hclust. The dissimilarity matrix is read and the function hclust is executed. In addition, the corresponding dendrogram is analyzed to identify outliers. A dendrogram is a graphical illustration of a hierarchical cluster formation in the form of a tree structure. By means of a dendrogram, the point at which clusters are united or divided can be identified graphically [25]. The similarity in the data set is at a low level. This can be related to the thematic distance as well as to the style of writing. Based on the dendrogram, the last six added articles are removed, since the overall distance level is very high. These six articles are additionally reviewed, which leads to the insight that their research seems to be far away from the core CII research field.
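The outlier step can be illustrated with a small pure-Python version of single-linkage clustering. The study itself uses hclust in R; the code below is only a sketch of the same idea, and the five labels and the dissimilarity matrix are hypothetical.

```python
def single_linkage(labels, dist):
    """Agglomerative single-linkage clustering on a full dissimilarity matrix.

    Returns the merges in order as (cluster_a, cluster_b, height) tuples.
    """
    index = {label: i for i, label in enumerate(labels)}
    clusters = [(label,) for label in labels]
    merges = []

    def linkage(a, b):
        # Single linkage: distance between the two closest members.
        return min(dist[index[x]][index[y]] for x in a for y in b)

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


def late_singletons(merges, n_last):
    """Labels of single articles that were attached in the last n_last merges."""
    late = []
    for a, b, _ in merges[-n_last:]:
        for cluster in (a, b):
            if len(cluster) == 1:
                late.append(cluster[0])
    return late


# Hypothetical dissimilarities: A, B, C are close; D and E are far away.
labels = ["A", "B", "C", "D", "E"]
dist = [
    [0.00, 0.20, 0.30, 0.60, 0.90],
    [0.20, 0.00, 0.25, 0.65, 0.95],
    [0.30, 0.25, 0.00, 0.70, 0.92],
    [0.60, 0.65, 0.70, 0.00, 0.93],
    [0.90, 0.95, 0.92, 0.93, 0.00],
]

merges = single_linkage(labels, dist)
for a, b, height in merges:
    print(a, b, height)
print("candidate outliers:", late_singletons(merges, 2))
```

Articles that join the tree only in the final merges, at a high distance level, are the candidates for removal, which corresponds to reading the right-hand end of the dendrogram.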
(vi) Report findings - The findings are divided into two parts: the evaluation of the different cross-industry innovation directions and a discussion of future research based on the morphological box. The morphological box is a creativity technique based on [26] for identifying solutions in multi-dimensional and complex environments. It is built on parameters and their corresponding specifications. A complex problem can be divided into its independent parameters and possible specifications. A subsequent holistic view of the combinations of the different parameters' specifications leads to a diversified solution space. This method can be used to find opportunities or needs for future research.

Reporting The Results Of The Review
This section presents the results of the descriptive statistics plus the semantic similarity and cluster analysis. First, the descriptive statistics are presented, followed by the semantic similarity and cluster analysis. Finally, the resulting clusters are examined in detail.

Descriptive statistics
Scopus and Web of Science (WoS) offer a large number of evaluable fields, which can differ between the two databases. The field "research field", for example, which gives a rough classification of a research field, is only available in Scopus. In order to give an overview of CII research, the following section describes
insert Fig. 3 around here
Third, the evaluation of the authors gives information on leading scientists in the field of CII (see Table 2). The 46 identified articles are written by 92 different authors. Ellen Enkel leads the ranking with nine articles. She is involved in 20% of all identified CII articles. Thus, it can be stated that she is one of the most influential researchers in the field of CII. Ellen Enkel is followed by Oliver Gassmann and Sebastian Heil, each with four articles. Some of these articles are also published in co-authorship with Ellen Enkel.
insert Table 2 around here
Fourth, closely associated with the authors is the country distribution. For this purpose, the country codes of the corresponding affiliations from the metadata are used. Since an article can have several authors from different countries, a weighting is calculated based on the sum of each country's percentage share in the articles (see Table 3). Most of the research originates from Germany.
insert Table 3 around here
Fifth, the citations are used to identify influential publications. When looking at the citations per publication, four articles are characterized by a high number of citations. [5] is the most cited article with 140 citations. This is followed by [27] (70 citations), [28] (58 citations) and [29] (40 citations). These four publications can be classified as very influential. They are followed by 15 articles with a citation count between 15 and 39. Eleven articles can be classified as having a weak influence because they have fewer than 10 citations. Thirty articles have no citations at all, which is not necessarily surprising, since it takes a certain amount of time for an article to be cited.
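The country weighting mentioned under the fourth point can be read as fractional counting: each article contributes a total weight of one, split across countries in proportion to its authors' affiliations. The following sketch shows this reading; both the interpretation and the example data are assumptions, since the exact weighting scheme is not spelled out.

```python
from collections import Counter, defaultdict

# Hypothetical input: one list of author country codes per article.
articles = [
    ["DE", "DE", "CH"],
    ["DE"],
    ["IT", "DE"],
    ["IT", "IT"],
]


def country_shares(articles):
    """Sum each country's fractional share over all articles."""
    totals = defaultdict(float)
    for countries in articles:
        for country, n in Counter(countries).items():
            # Each article has a total weight of 1, split by author count.
            totals[country] += n / len(countries)
    return dict(totals)


shares = country_shares(articles)
for country, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(country, round(share, 2))
```

Because every article carries a weight of one, the shares add up to the number of articles, which makes them directly comparable across countries.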

Semantic similarity measurements and cluster analysis
The clustering of the remaining 40 articles is done by hierarchical cluster analysis based on the Ward method. This cluster method is also realized with the function hclust in the programming language R using the dissimilarity matrix. Since six articles have been removed from the data set, the similarity measurement is performed again for the 40 remaining articles with the same settings. The number of clusters is determined using the resulting dendrogram and the scree plot. First, the scree plot is used and the elbow criterion is applied [30]. According to this criterion, the number of clusters should be five. In this case, the elbow criterion serves only as a rough orientation, since the scree plot is irregular. Therefore, the number of clusters is varied to check whether there is a better solution. In addition to the five-cluster solution, the six-cluster solution is tested. The six-cluster solution divides a relatively large, undefined cluster into two smaller but more consistent clusters. Therefore, the six-cluster solution is selected for further analyses.
By means of multidimensional scaling (MDS), the similarities and the identified clusters can be illustrated on a map. MDS is a technique that allows dissimilarities to be visualized on a map with the objective of representing the distances as well as possible [31]. For this study, an MDS is created based on the similarity matrix and the Euclidean distance measurement. Furthermore, the articles of the six clusters are highlighted on the MDS map with different symbols (see Fig. 4). In general, the stress value according to [32] is used as a measure of the quality of the MDS. For this MDS the stress value is 0.21, which indicates a relatively weak quality of the MDS. On the MDS map, all articles from cluster 4 are relatively close to each other, whereas clusters 3 and 6 show a relatively large dispersion.
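For orientation, a widely used definition of the stress value is Kruskal's stress-1, given below. Whether [32] uses exactly this variant is an assumption. The symbols are introduced here for illustration only: $d_{ij}$ denotes the distance between articles $i$ and $j$ on the map, and $\hat{d}_{ij}$ the disparity derived from their input dissimilarity.

```latex
\[
  \text{Stress-1} =
  \sqrt{\frac{\sum_{i<j} \left( d_{ij} - \hat{d}_{ij} \right)^{2}}
             {\sum_{i<j} d_{ij}^{2}}}
\]
```

A value of 0 corresponds to a perfect representation of the dissimilarities; larger values indicate a poorer fit.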

insert Fig. 4 around here
By means of the cluster analysis, six clusters were identified, which are described in the following (see Table 4).
insert Table 4
The second cluster (C2) contains two case studies of CII in Italy's food and pharmaceutical industry. The cluster includes two articles from 2016, which use the CII theory for "exploring to what extent external knowledge sourcing affects innovation" and "how internal and external drivers affect innovation in Italian food industry" [35,36]. The studies identify an increased need for absorptive capacity and cooperation with universities for knowledge transfer and product innovation. Again, the sample size is listed as a limitation. Therefore, the results are representative only to a limited extent.
The third cluster (C3) has a very broad thematic base. It comprises 17 articles in the period between 2009 and 2018. The research questions of this cluster are not positioned at the core research field, namely absorptive capacity, knowledge transfer and CII processes. The third cluster includes CII case studies [27,42,44]. However, [51] attempts to develop identification methods for a specific type of CII. Furthermore, the topics knowledge sharing [47], inter-sectoral technological integration [49] and knowledge transfer [38] are mentioned.
[54] investigates how to build potential absorptive capacity for distant collaboration in order to obtain radical solutions. Furthermore, [53] examines, in a case study of the company Henkel, how firms systematically learn from distant partners across industry boundaries. [28] examines the influence of absorptive capacity on different types of knowledge. Knowledge from companies' own industry, knowledge from other industries and knowledge from research institutions are examined. [28] mentions that R&D intensity does not significantly influence the absorptive capacity for inter- and intra-industry knowledge. in the data set and is to be understood as the core research on CII through two conceptual articles.
Within the two conceptual articles, process models for the generation of innovations are presented [29,64]. In addition to conceptual approaches, qualitative and quantitative research methods are also applied in this cluster.

Perspectives In A Morphological View
From the previously described clusters, in combination with the morphological box, eight different parameters of CII research are identified (see Table 5). The parameters may have different specifications. (1) The first parameter is called "research design", which is distinguished between the specifications qualitative, quantitative or mixed method. (2) The second parameter is the "data basis", which is distinguished between the specifications survey, patent, interview and customer innovation survey. (3) In the development or exploitation of knowledge, different CII types occur, which leads to the third parameter "CII types" with the specifications outside-in, inside-out and coupled. (4) Some authors mention the radicalness of CII, which leads to the fourth parameter "degree of innovation" with the specifications radical or incremental. (5) Many authors do not exactly specify the cooperation structure or focus on the analysis of two partners, which leads to the fifth parameter "number of CII partners" with the specifications 2, 3 or greater than 3. (6) The sixth parameter is the "cultural composition" of CII partners, i.e. whether the partnership is national or international, with the specifications national and international. (7) In some articles, the small sample size with the associated limited representativeness is noticeable. Thus the seventh parameter "sample size" is derived with the specifications 1-50, 51-100, >100. (8) Using cluster analysis, six research directions were identified, which leads to the eighth parameter "research direction" with the specifications distance measurement, food pharma case study, diversified spectrum, absorptive capacity, knowledge transfer, and processes and analogies.
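The size of the solution space spanned by these eight parameters can be sketched as a Cartesian product. The parameter names and specifications are taken from the list above; the dictionary layout and function names are illustrative choices.

```python
from itertools import product
from math import prod

# Parameters and specifications as listed in the text above.
morphological_box = {
    "research design": ["qualitative", "quantitative", "mixed method"],
    "data basis": ["survey", "patent", "interview", "customer innovation survey"],
    "CII types": ["outside-in", "inside-out", "coupled"],
    "degree of innovation": ["radical", "incremental"],
    "number of CII partners": ["2", "3", ">3"],
    "cultural composition": ["national", "international"],
    "sample size": ["1-50", "51-100", ">100"],
    "research direction": [
        "distance measurement", "food pharma case study", "diversified spectrum",
        "absorptive capacity", "knowledge transfer", "processes and analogies",
    ],
}


def solution_space_size(box):
    """Number of distinct combinations of one specification per parameter."""
    return prod(len(specs) for specs in box.values())


def iter_configurations(box):
    """Yield every combination as a dict mapping parameter to specification."""
    names = list(box)
    for combo in product(*box.values()):
        yield dict(zip(names, combo))


total = solution_space_size(morphological_box)
first = next(iter_configurations(morphological_box))
print(total)
print(first)
```

Comparing the enumerated configurations with the combinations already covered by existing articles is one way to locate unexplored fields for future research.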