Next-generation antivirus endowed with a web-server sandbox applied to audit fileless attacks

Almost all malware running on web-servers consists of php code. Hence, the present paper creates a next-generation antivirus (NGAV) specialized in auditing web-based threats, specifically from php files, in real time. In our methodology, the malicious behaviors observed on the personal computer serve as input attributes of the statistical learning machines. In all, our dynamic feature extraction monitors 11,777 behaviors that a web fileless attack can perform when launched directly from a malicious web-server against a listening service on a personal computer. Our NGAV achieves an average accuracy of 99.95% in the distinction between benign and malicious web scripts. Distinct initial conditions and kernels of neural network classifiers are investigated in order to maximize the accuracy of our NGAV. Our NGAV can overcome the limitations of commercial antiviruses regarding the detection of web fileless attacks. In contrast to the analysis of individual events, our engine employs an authorial Web-Server Sandbox, machine learning, and artificial intelligence in order to identify malicious websites.


Introduction
The internet has become the main means of communication in contemporary society and is notable for the convergence of all previously existing media. Through the World Wide Web, it is possible to watch television, listen to the radio, read the newspaper, and access any other form of information across different peoples, languages, and cultures.
With the popularization of the Internet, students create their own virtual study environments, produce their own content, and can interact actively and constantly in the search for knowledge. The World Wide Web fosters creativity and technological abilities, opens up different world views, and increases communication and learning skills.
As a side effect, the increasing popularization of the Internet allows malware (malicious software) production to keep growing at a fast pace, since the internet is a large medium for the propagation of malicious applications. In 2016 alone, more than 7.1 million malware samples were created, an increase of 47.3% compared to 2015 (Intel 2018). It is emphasized that most of the damage caused by malware is irreversible.
Therefore, more and more is being invested in digital security through new technologies in antivirus, firewall, and biometrics. It is estimated that antivirus services are present in 95% of personal computers; in addition, 84% of Internet users have firewall services enabled, and 82% have Automatic Updates enabled on the Microsoft Operating System (Microsoft 2013).
Despite the massive presence of cyber-surveillance mechanisms in almost all personal computers, cyber-attacks have been causing billions in damages on an increasingly large scale (Microsoft 2013). One of the reasons for this failure is that, once a vulnerability is fixed, attackers come up with another tactic (Sophos 2014).
Currently, instead of conventional infections through portable executable (PE) files, modern cyber attacks employ fileless attacks. Technically, fileless server-side attacks are launched directly from a malicious web-server to a listening service in an endpoint (personal computer) (Conrad et al. 2017). According to Skybox Security's monitoring during 2017, of the 55 new vulnerabilities exploited, only 24% were client-side vulnerabilities, while 76% were server-side. The decline in client-side exploits reflects the decline in client-targeting exploit kits (Skybox 2018).
Symantec Research estimates that the main sources of cyber-infections over the internet are actually regular websites that have been compromised or infected with malicious code (Symantec 2012). The full list of the most dangerous website attack categories can be seen in Fig. 1. It is interesting to note that websites hosting adult/pornographic content are not in the top five, but ranked tenth. Moreover, religious and ideological sites were found to have triple the average number of threats per infectious site compared to adult/pornographic sites (Symantec 2012). It is concluded that, regardless of their behavior, Internet users are not safe from infection, since conventional advice, such as not accessing pornography sites in order to avoid cyber-invasions, is no longer useful.
Once installed on compromised sites, malicious scripts dynamically launch drive-by attacks through web browsers on the client side (personal computers). Almost all malware running on the server side is php code (a server-side scripting language commonly used on websites) (Sophos 2014).
In synthesis, server-side attacks through php malware have the ability to deceive web-hosting providers and other cyber-surveillance mechanisms (Sophos 2014). Hence, the present paper investigates (i) 86 commercial antiviruses with respect to their expertise in malicious phps. Malware detection ranged from 0 to 78.50%, depending on the antivirus. It is emphasized that, in our study, the analyzed malware samples have their malicious actions documented by incident responders. Even so, more than half of the commercial antiviruses evaluated had no knowledge of the actions of the malware files investigated.
In order to validate our authorial antivirus, the proposed paper develops (ii) a controlled environment named Web-Server Next Generation Sandbox. In our environment, the Server-side and the Client-side are developed in order to virtualize the malicious web-server and the personal computer, respectively. Then, the malicious behaviors originating from the web fileless attack serve as input attributes of the statistical learning machines. Our feature extraction monitors 11,777 behaviors that a fileless attack can perform when launched directly from a malicious web-server to a listening service in a personal computer. Our NGAV (Next Generation Antivirus) solution can (re)construct a chain of events and visualize the actions that an actual attacker might take, as opposed to looking at individual, discrete events.
Our NGAV (iii) achieves an average accuracy of 99.95% in the distinction between benign and malicious samples. Hence, the present paper demonstrates that artificial intelligence is a good alternative for commercial antivirus manufacturers. The limitations of current cyber-security mechanisms can be overcome by our antivirus, which specializes in auditing web fileless attacks. Our engine employs advanced data science, machine learning, and artificial intelligence in order to identify malicious behavior from the Server-side.
This work is organized as follows: in Sect. 2, we present the limitations of commercial antiviruses; in Sect. 3, we discuss the state of the art regarding artificial intelligence antiviruses; in Sect. 4, we present the proposed methodology; in Sect. 5, we compare the authorial network with classic ones; in Sect. 6, we show the results and discussion. Finally, in Sect. 7, we draw general conclusions and discuss the perspectives of our work.

Limitations of commercial antiviruses
Despite being questioned for more than a decade, the modus operandi of antiviruses is still based on signatures: the suspicious file is queried against blacklist databases (Lima 2020; Sans 2017). Therefore, it is enough that the hash of the investigated file is not in the antivirus blacklist for the malware to go undetected. The hash functions as a unique identifier for a specific file. Hence, given the limitations of commercial antiviruses, developing and distributing variants of malicious applications is not a difficult task. Basically, minor changes can be made to the original malware, such as repetitive loops and conditional deviations implemented through routines that have no actual utility to the program. Such useless changes make the hash value of the modified malware differ from the hash value of the original malware. Therefore, an antivirus that recognizes the original malware will not detect a variant padded with empty routines. It is worth noting that there are exploits that automatically create and distribute variants of an original malware. We come to the conclusion that signature-based antiviruses are not effective against variants of the same malware (Lima 2020; Sans 2017).
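As an illustrative sketch (Python is used here only for exposition; the php payload shown is a harmless stand-in, not actual malware), the hash-based evasion described above can be demonstrated in a few lines:

```python
import hashlib

def sha256_of(payload: bytes) -> str:
    """Return the SHA-256 hex digest used as a blacklist key."""
    return hashlib.sha256(payload).hexdigest()

# Hypothetical "original malware" body (a stand-in for illustration only).
original = b"<?php echo 'pretend payload'; ?>"

# Variant: the attacker appends a useless comment; behavior is identical.
variant = original + b"\n// padding with no effect on execution"

h1 = sha256_of(original)
h2 = sha256_of(variant)

blacklist = {h1}          # the signature database knows only the original
print(h2 in blacklist)    # prints False: the variant slips past the blacklist
```

Because the two byte streams differ, the digests differ, so a blacklist keyed on the original hash never matches the variant even though the two programs behave identically.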
Through the VirusTotal platform, the proposed paper investigates 86 commercial antiviruses, with their respective results presented in Table 1. We have utilized 200 malicious phps obtained from the PAEMAL dataset (2020). The aim is to verify the amount of virtual threats catalogued by the antiviruses. The motivation is that the acquisition of new virtual threats plays an important role in combating malicious applications. Therefore, the larger the malware database, named blacklist, the better the defense provided by the antivirus tends to be.
Initially, the malware is sent to a server belonging to the VirusTotal platform. After that, the php files are analyzed by the 86 commercial antiviruses associated with VirusTotal. Then, each antivirus provides diagnostic information on the php files submitted to the platform. VirusTotal allows three different types of diagnostics to be issued: malware, benign, and omission.
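The three-way diagnostic can be sketched as a small decision function (the per-engine report shape, with a boolean "detected" field and None for an engine that never scanned the file, is an assumption for illustration, not VirusTotal's exact schema):

```python
def virustotal_diagnosis(report):
    """Map a (hypothetical) per-antivirus entry to one of the three
    diagnostics discussed in the text."""
    if report is None:
        return "omission"        # the engine issued no opinion at all
    if report.get("detected"):
        return "malware"         # engine flags the sample (a hit in our study)
    return "benign"              # engine attests benignity (a false negative here)

print(virustotal_diagnosis({"detected": True}))   # malware
print(virustotal_diagnosis({"detected": False}))  # benign
print(virustotal_diagnosis(None))                 # omission
```

Since every submitted sample in our experiment is documented malware, the second branch always corresponds to a false negative and the third to an omission.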
As for the first possibility, the antivirus detects the malignancy of the suspicious file. In the proposed experimental environment, all submitted files are malware documented by incident responders. Hence, the antivirus scores a hit when it detects the malignancy of the investigated file. Malware detection indicates that the antivirus provides a robust service against cyber-intrusions.
In the second possibility, the antivirus attests to the benignity of the file under investigation. Since all samples are malicious, whenever the antivirus declares a file benign, it produces a false negative. This means that the file under investigation is malicious software, but the antivirus wrongly attested its benignity.
In the third possibility, the antivirus issues no diagnosis on the suspicious file. Omission means that the investigated file has never been evaluated by the antivirus, which points to a limited capacity for real-time evaluation and for large-scale services. Table 1 shows the results achieved by the 86 commercial antiviruses evaluated. The Ikarus antivirus obtained the best performance, being able to detect 78.50% of the malware investigated. A major adversity is the fact that antivirus manufacturers do not share their respective malware blacklists due to commercial disputes.
Through the analysis of Table 1, the proposed paper points to an aggravating factor of this adversity: the same antivirus manufacturer does not even share its databases among its own distinct antiviruses. Note, for example, that the Avast and AVG antiviruses belong to the same company, yet their blacklists are not shared with each other. Therefore, the commercial strategies of a single company hinder the cataloguing of malware. It follows that antivirus manufacturers are not necessarily concerned with avoiding cyber-invasions, but with optimizing their commercial income.
Malware detection ranged from 0 to 78.50%, depending on the antivirus investigated. On average, the 86 antiviruses were able to detect 16.82% of the malware evaluated, with a standard deviation of 21.88%. The high standard deviation indicates that the detection of malicious samples may change abruptly depending on the antivirus selected. We conclude that protection against network intrusion depends on the choice of an antivirus with an up-to-date and comprehensive blacklist.
On average, the antiviruses attested false negatives in 49.49% of the cases, with a standard deviation of 38.32%. Attesting to the benignity of a malware sample may imply unrecoverable losses: a person or institution, for instance, would rely on a certain application when, in fact, it is malware. As another unfavorable aspect, about 57% of the antiviruses did not issue an opinion on any of the 200 malicious samples. On average, the antiviruses omitted a diagnosis in 33.68% of the cases, with a standard deviation of 45.61%. The omission of the diagnosis points to the limitation of antiviruses regarding the detection of malware in real time.
A further adversity in the fight against malicious applications is the fact that commercial antiviruses have no standard for the classification of malware, as seen in Table 2. We chose 3 of the 998 malware samples to exemplify the variety of labels given by commercial antiviruses. Because there is no standard, each antivirus assigns whatever name it wants. For example, McAfee-GW-Edition may identify a php malware as "HEUR_HTJS.HDJSFN" while McAfee, belonging to the same company, identifies it as "JS.Blacole.H". Therefore, the lack of a standard hinders cyber-surveillance strategies, since each category of malware should receive a different treatment (vaccine). We conclude that supervised machine learning aimed at recognizing php malware categories is not feasible: given the confusing multi-class labelling provided by the experts (antiviruses), as seen in Table 2, it is statistically improbable that any machine learning technique would acquire generalization capability.

State-of-the-art
A large difficulty in combating malicious phps is that web browsing and other web-based applications are real-time by nature (PALOALTO 2013). Traditional signature-based detection engines often miss a large number of today's threats: while signature detection works well for known malware, detecting new forms with signature profiles is extremely difficult (PALOALTO 2013).
Given the limitations of commercial antiviruses, organizations seek to overcome the shortcomings of traditional antiviruses through cyber-security mechanisms named NGAVs (Next Generation Antiviruses). NGAV solutions seek to recognize patterns of malware behavior through advanced data science, machine learning, and neural networks (Lima 2020; Sans 2017). The recommendation of incident researchers is that NGAVs add multiple layers of threat intelligence and advanced analytics (Skycure 2016). Table 3 presents a list of some important antiviruses of the state of the art. As the state-of-the-art antiviruses target distinct systems (IoT, Windows, Android), all these learning algorithms were replicated employing our materials and methods in order to avoid unfair comparisons. Comparisons between the authorial antivirus and the state of the art are in Sect. 7.
Lima et al. (2021) create an NGAV able to detect PE-file malware with an average accuracy of 98.32%. The executable is submitted to a disassembling process; the executable can then be studied, making it possible to investigate the malicious intent of the file. The analysis by Lima et al. (2021) extracts 630 features from each executable. These features are the input neurons of artificial neural networks. The classification aims to group executables of 32-bit architectures into two classes: benign and malware. The antivirus by Lima et al. (2021) employs shallow neural networks.
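A shallow (single-hidden-layer) behavior classifier of the kind described above can be sketched as follows. This is a toy illustration, not the cited implementation: the dimensions (20 features, 400 samples) and the labeling rule are invented stand-ins for the real behavior repository.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the behavior repository (the real antivirus monitors
# thousands of behaviors; 20 binary features are used here for brevity).
X = rng.integers(0, 2, size=(400, 20)).astype(float)
y = X[:, 0].copy()      # toy rule: "malware" iff one critical behavior fires

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Shallow architecture: a single hidden layer (20 -> 8 -> 1).
W1 = rng.normal(0.0, 0.5, (20, 8)); b1 = np.zeros(8)
W2 = rng.normal(0.0, 0.5, (8, 1));  b2 = np.zeros(1)

lr = 1.0
for _ in range(4000):                    # plain batch gradient descent
    h = sigmoid(X @ W1 + b1)             # hidden activations
    p = sigmoid(h @ W2 + b2).ravel()     # predicted probability of malware
    g_out = (p - y)[:, None] / len(y)    # cross-entropy gradient at the output
    g_hid = g_out @ W2.T * h * (1 - h)   # backpropagated hidden gradient
    W2 -= lr * h.T @ g_out;  b2 -= lr * g_out.sum(axis=0)
    W1 -= lr * X.T @ g_hid;  b1 -= lr * g_hid.sum(axis=0)

pred = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2).ravel() > 0.5
accuracy = (pred == (y > 0.5)).mean()
print(f"training accuracy: {accuracy:.2f}")
```

The same two-class scheme (benign vs. malware, behaviors as input neurons) is what the shallow-net antiviruses in Table 3 implement at full scale.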
On the other side, deep-net-based antiviruses have also achieved excellent accuracies. Su et al. (2018) achieve an average accuracy of 94.00% in detecting IoT (Internet of Things) malware (Su and Vasconcellos 2018). Their deep network has 6 layers, 3 of which have learnable weights: 2 convolutional layers and 1 fully connected layer. The network is trained for 5000 iterations with a training batch size of 32 and a learning rate of 0.0001. Maniath et al. (2017) create an antivirus for detecting ransomware by employing LSTM (Long Short-Term Memory) deep networks (Maniath and Ashok 2017). The network consists of 3 layers with 64 LSTM nodes in each layer and is trained for 500 epochs with a batch size of 64. Maniath et al. (2017) achieve an average accuracy of 96.67%.
The growing popularity of mobile devices makes them targets for malicious applications. In addition to antiviruses, deep networks have also been employed in firewalls (Wozniak and Silka 2015). The goal is to segregate malicious network traffic from benign traffic. A non-intelligent firewall has static rules which block selected user ports and applications. If the user needs such a port for some application, he must manually disable the blocking; this port opening can result in a vulnerability to malicious traffic. In the firewall made by Wozniak et al. (2015), the network consists of 16 layers of LSTM nodes, ranging from 256 neurons down to 2 neurons in the final layer, trained for 1000 epochs. The firewall by Wozniak et al. (2015) achieves an average accuracy of 99.99%.
Due to the excellent results obtained by deep learning techniques, a common-sense belief has formed that deep learning provides the best accuracy for any type of application; in fact, this is untrue. Deep neural networks, especially convolutional networks, work based on linear filter convolution. Although it plays an important role in computer-vision applications, filter convolution is limited to applications where neighboring features form smooth gradients.
Consider, for example, biomedical images from mammography equipment. The image is full of phenomena that interfere with the breast tissue (Lima et al. 2016). The convolution of a filter is then important to eliminate noise, and therefore to discard small irregularities in findings corresponding to potential cancers. A convolution technique such as a Gaussian filter is very important for reducing noise in biomedical images.
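The denoising effect of Gaussian convolution can be sketched numerically (a one-dimensional signal stands in for an image row; the signal shape and noise level are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Clean signal plus additive noise, a toy stand-in for a noisy image row.
t = np.linspace(0, 2 * np.pi, 500)
clean = np.sin(t)
noisy = clean + rng.normal(0, 0.3, t.size)

# Discrete Gaussian kernel, normalized so the filter preserves the mean.
x = np.arange(-4, 5)
kernel = np.exp(-x**2 / 2.0)
kernel /= kernel.sum()

smoothed = np.convolve(noisy, kernel, mode="same")

err_noisy = np.mean((noisy - clean) ** 2)
err_smoothed = np.mean((smoothed - clean) ** 2)
print(err_smoothed < err_noisy)   # convolution reduced the noise
```

Because adjacent samples of the underlying signal vary smoothly, averaging each point with its neighbors cancels much of the noise while barely distorting the signal, which is exactly the regime in which convolution excels.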
Deep learning is also useful when applied to incomplete datasets. Failures in data collection are common in various types of applications, such as detection of air circulation, sea currents, and voice recording, among others. In these situations, the convolution of filters is important: the weighted averages of the gradient are calculated, so the convolutional deep network is able to make inferences and fill in the missing data.
Inference of missing data makes deep learning capable of handling multiple datasets simultaneously. For example, missing voice signals can be inferred from digital image processing in the same time unit. Conversely, noisy sound signals can be filtered out when there is no match in the other signals in the same time unit.
As a counter-example, consider the repository illustrated in Table 4. The features are completely disconnected from each other despite being neighbors. An application suspected of trying to read Wi-Fi data has no correlation with accessing the victim's image gallery or browser. Hence, when applying the linear convolution of filters to the repository illustrated in Table 4, the browser-access feature, containing the value 0, would be treated as noise, because its neighborhood has positive values. In synthesis, the suspect application would be accused of accessing the victim's browser even though the feature extraction audited the opposite. Consequently, convolutional techniques suffer a disadvantage when applied to malware pattern recognition.
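The distortion described above can be reproduced directly (the five-feature row and the feature names are illustrative stand-ins for a Table-4-style repository):

```python
import numpy as np

# One row of a hypothetical behavior repository: binary, unrelated features.
# Index 2 is "accessed browser" and was audited as 0 (it never happened).
behaviors = np.array([1.0, 1.0, 0.0, 1.0, 1.0])

# 3-tap averaging filter: the simplest linear convolution.
kernel = np.ones(3) / 3.0
convolved = np.convolve(behaviors, kernel, mode="same")

print(behaviors[2])    # 0.0 -> the application never touched the browser
print(convolved[2])    # ~0.67 -> convolution "fills in" the 0 from its neighbors
```

After convolution, the audited 0 has been overwritten by its neighbors' positive values, so a downstream classifier would see evidence of browser access that the sandbox never observed.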
Another major disadvantage of deep nets is the long training time. As an aggravating factor, deep networks have low parallelization capabilities because their layers are sequential. Deep networks are data-processing models which use a deep cascade graph. Conventionally, an architecture is constructed containing many sequential layers. State-of-the-art deep network architectures use layers containing different types of processing; usually, the layers perform convolution, normalization, activation functions, and dimensionality reduction. Each layer uses as input the result of the processing of the layer immediately before it. If all layers were executed simultaneously, the producer-consumer problem would occur: the consuming layer would read data still being processed by the producing layer, and could thus access premature and incorrect data (Patterson 2017).
State-of-the-art deep learning models have millions of adjustable parameters (Chollet 2017). Even if a hypothetical supercomputer had millions of cores, deep nets would not be able to optimize their processing times: the producer-consumer dependency prevents all layers from running simultaneously. A layer can be executed only after the preceding layer has completed its work. For example, the normalization layer cannot run in parallel with the convolutional layer in state-of-the-art deep nets. The sequential cascade model of deep nets presents a major challenge to parallel processing. In applications that require frequent (re)training, such as antivirus software, excessive time consumption can become an obstacle, because, on average, 8 new malwares are created every second (Intel 2018). In synthesis, the learning time of antivirus software must keep pace with the rate of new malware generation worldwide.
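The sequential dependency can be made concrete with a minimal forward pass (layer count and sizes are arbitrary): each layer's input is literally the previous layer's output, so no layer can start before its producer finishes.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=16)

# A cascade of four dense layers: the producer-consumer chain.
layers = [rng.normal(size=(16, 16)) * 0.1 for _ in range(4)]

activation = x
order = []
for i, W in enumerate(layers):
    order.append(i)                       # layers necessarily execute in order
    activation = np.tanh(W @ activation)  # consumes the producer's output

print(order)   # [0, 1, 2, 3]
```

Whatever hardware is available, the loop body for layer i cannot be scheduled until layer i-1 has written `activation`; parallelism is confined to the arithmetic inside each layer.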
One of our specific goals is to prove the efficiency of antiviruses based on shallow neural networks compared to deep learning. A shallow neural network can obtain the same performance as a next-generation deep learning model after proper parameter setting and training (Ba and Caurana 2014). In addition to shorter training time, our shallow-neural-network-based antivirus can also provide statistically higher accuracy than deep-network-based antiviruses, at a reduced computational cost.
Despite these difficulties, deep-net-based antiviruses are able to obtain average accuracies above 90% in malware detection (Maniath and Ashok 2017; Su and Vasconcellos 2018). The suspect executable goes through a reverse engineering process aiming to revert the binary file to its assembly code. This methodology, named static analysis, can overcome the limitations of traditional signature-based detection engines.
However, static analysis can be easily bypassed by a web fileless attack. In synthesis, static feature approaches are invalid for combating fileless attacks, seeing that there is no way, for a personal computer, to audit source code stored and executed on a remote web server.
The inability of the static feature approach to accurately detect fileless attacks shifted the focus of malware research to dynamic approaches. Hence, instead of the impracticable static analysis, the feature extraction of our NGAV concerns the traces of calls performed by all processes spawned by the malware, the files created, deleted, and downloaded by the malware during its execution, memory dumps of the malware processes, and the network traffic trace in PCAP format.
In all, our dynamic feature extraction monitors 11,777 behaviors that a web fileless attack can perform when launched directly from a malicious web-server to a listening service in a personal computer. In our experiments, the authorial antivirus has its accuracy compared to state-of-the-art antiviruses. In order to avoid unfair comparisons, the feature extraction stage of the evaluated state-of-the-art antiviruses is standardized with our 11,777 behaviors. Our antivirus combines high accuracy with reduced learning time.
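The conversion of a dynamic sandbox report into the fixed-width behavior vector described above can be sketched as follows (the behavior names and the five-entry vocabulary are invented stand-ins for the 11,777 monitored behaviors):

```python
def behavior_vector(report_events, vocabulary):
    """Turn a sandbox report (the set of observed events) into the binary
    feature vector consumed by the classifiers."""
    observed = set(report_events)
    return [1 if behavior in observed else 0 for behavior in vocabulary]

# Toy subset of the monitored-behavior vocabulary (names are illustrative).
vocabulary = ["api:CreateFileW", "file:deleted", "net:dns_query",
              "reg:run_key_set", "mem:dump_taken"]
# Hypothetical trace produced by one sandbox run.
report = ["net:dns_query", "api:CreateFileW"]

vec = behavior_vector(report, vocabulary)
print(vec)   # [1, 0, 1, 0, 0]
```

At full scale, each php sample yields one 11,777-dimensional binary vector of this form, which is what makes the extraction directly comparable across all evaluated antiviruses.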

Materials and methods
There is a variety of reasons why web-based malware presents such a challenge for traditional antivirus products (PALOALTO 2013). One of them concerns the acquisition of the malicious php file, since it would be necessary to obtain permission from the web-hosting provider from which the php was run remotely. However, in digital forensic practice, web-hosting businesses often work in a disintegrated way and do not share information with cyber-security companies (Sophos 2014). Therefore, the action strategies of web-hosting companies hinder and slow down the fight against php malware.
The use of malicious php scripts designed to make web-servers perform nefarious activities is increasing (Sophos 2014). As an adversity, web server-side attacks are already exceptionally difficult to catalog (Sophos 2014). Due to the low-margin nature of the hosting business, when some hosting providers discover an infected server, they often simply build a new virtual server instance rather than diagnosing what happened (Sophos 2014). Since neither they nor their security partners understand what happened, the new instances often become rapidly infected as well (Sophos 2014).
Therefore, the proposed paper claims that it is necessary to integrate web-hosting providers and cyber-surveillance companies, targeting server-side malware sharing. The lack of information sharing is one of the main challenges in combating malware. The level of insight that cyber-defenders have into cyber-criminals' activities is considerably limited, and the identification of evolving tactics usually occurs only after malicious campaigns start (Intel 2018).
On the other hand, cyber-criminals have the privilege of accessing research conducted by communities of enthusiasts who can download and use open-source malware (Intel 2018). This growing discrepancy in access to information between cyber-defenders and cyber-criminals is technically named asymmetrical cyberwarfare (Intel 2018).
Hence, the present paper employs PAEMAL (php Analysis Environment Applied to Malware Machine Learning), a dataset which allows the classification of php files as malicious or benign. PAEMAL is composed of 200 malicious php files and 1,000 benign php files. In regard to malware, PAEMAL extracted malicious php files from VirusShare. In order to catalog the 200 samples of php malware, it was necessary to acquire and analyze, by authorial script, about 1.3 million malware samples from the reports updated daily by VirusShare.
Regarding the benign php files, the catalog was built from native scripts of open-source tools such as phpMyAdmin. It is emphasized that all benign files were submitted to the VirusTotal audit; therefore, the samples of benign php files contained in PAEMAL had their benignity attested by the world's leading commercial antivirus companies. The results of the analyses of the benign php files and malware, resulting from the VirusTotal audit, are available for consultation at the virtual address of PAEMAL (2020).
If there were no treatment in PAEMAL, there would be a tendency toward higher hit rates in the majority class (benign) and a high error rate in the minority class (malware). The explanation is that the numbers of benign and malware samples are unequal: 1,000 and 200, respectively. Therefore, when employing unbalanced databases, the accuracy rates of the classifiers can be favored if they are biased toward the majority class. Aiming not to favor biased classifiers, the present work employs a strategy inspired by biomedical engineering works, where the presence of an abnormality (e.g., cancer) occurs once in every thousand diagnoses of healthy patients.
The biomedical strategy consists of repeating the training according to the ratio between the majority and minority classes (1,000:200 = 5 iterations) (Wang 2017). In our paper, at each of the five iterations, a distinct package of 200 samples of the majority class (benign) is presented together with the 200 samples of the minority class (malware). In this way, the non-favoring of biased classifiers is guaranteed, while maintaining the diversity of the samples of the majority class (benign) contained in the dataset (Wang 2017).
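The package-splitting strategy above can be sketched directly (sample identifiers stand in for the actual php files):

```python
import numpy as np

rng = np.random.default_rng(3)

n_benign, n_malware = 1000, 200
benign_ids = np.arange(n_benign)
rng.shuffle(benign_ids)

# Ratio 1000:200 = 5 iterations: five disjoint benign packages of 200,
# each paired with the full set of 200 malware samples.
packages = np.split(benign_ids, n_benign // n_malware)

for pkg in packages:
    balanced_round = {"benign": pkg, "malware": np.arange(n_malware)}
    # ...train/evaluate the classifier on this balanced 200:200 round...

print(len(packages), len(packages[0]))   # 5 200
```

Because the five benign packages are disjoint and jointly cover the whole majority class, every benign sample is used exactly once, preserving the diversity of the dataset while keeping each training round balanced.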
In clinical practice, missing a malignant sample (e.g., cancer) leads to a false negative, and the patient's chances of recovery are associated with early detection. The proposed paper is thus inspired by the state-of-the-art methodological care of biomedical engineering in reserving relevant amounts of benign and malware specimens in separate packages for training and testing. If a sample reserved for testing had little or no instance of the malware class, then a classification biased toward the benign class would have its hit rate favored. Hence, the proposed paper takes the methodological care to select benign and malware samples destined for training and testing equally and randomly.
The purpose of the PAEMAL dataset is to make it fully possible for the proposed methodology to be replicated by third parties in future works. Therefore, PAEMAL freely makes available, for all its benign and malware samples:
• the VirusTotal audits,
• the dynamic analyses made by our Web-Server Next Generation Sandbox.
At its virtual address, PAEMAL also provides its 1,000 benign php files. In addition, our dataset lists all 200 malicious php files; all of these malware samples can be acquired through the establishment of an agreement and submission to the terms of use of VirusShare. It is concluded that our PAEMAL database brings transparency and impartiality to the research, in addition to demonstrating the truthfulness of the results achieved. PAEMAL is thus expected to serve as a basis for the creation of new scientific works targeting new Web-Server Next Generation Antiviruses.
Figure 2 shows the proposed methodology in block-diagram form. Initially, a web application is created employing a suspicious php script on the Server-side. Then, the client requests the suspected web page from the Server-side. From there, the malicious behaviors originating from the web fileless attack are audited in Windows 7 by our Web-Server Next Generation Sandbox. In the following stage, the dynamic features of the php files are stored in the format of a machine learning repository. As a feature mining method, some behaviors audited by our Sandbox are discarded. The adopted mining criterion is the elimination of features which concern a single php file, for example, process IDs, process names, md5 and sha digests, among others.
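The feature-mining criterion described above can be sketched as a simple filter (the feature-name prefixes are illustrative conventions, not the actual repository format):

```python
def mine_features(feature_names):
    """Drop behaviors that identify a single php sample rather than a
    reusable pattern: process IDs, process names, md5/sha digests."""
    per_sample_markers = ("pid:", "process_name:", "md5:", "sha")
    return [f for f in feature_names
            if not f.lower().startswith(per_sample_markers)]

raw = ["reg:run_key_set", "pid:4812", "md5:9e107d9d...",
       "net:dns_query", "sha256:2c26b46b...", "process_name:evil.php"]
print(mine_features(raw))   # ['reg:run_key_set', 'net:dns_query']
```

Features tied to a single sample carry no generalizable signal and, if kept, would only help the classifier memorize individual files, so they are removed before the repository is handed to the learning machines.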

Proposed methodology
PAEMAL presents 200 malicious and 1,000 benign php files. If there were no treatment, there would be a tendency toward greater hit rates in the majority (benign) class and a high error rate in the minority class (malware). Thus, the proposed methodology employs a strategy inspired by the biomedical engineering state of the art (Wang 2017). For five iterations, a distinct package of 200 samples of the majority class (benign) is presented together with the 200 samples of the minority class (malware). After the database balancing, the suspicious behaviors of the php files serve as input attributes of the artificial neural networks employed as classifiers. The goal is to group the php files into two classes: benign and malware. In each combination (200 benign : 200 malware) from the dataset balancing, the k-fold cross-validation method is used, with k = 10. In the first iteration, the first part is destined for the test set, while the others are reserved for training. This alternation occurs for ten iterations, until each of the ten parts has been applied to the test phase. The accuracy of the classifier is the arithmetic mean of the hit rates obtained in the ten iterations.
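The 10-fold procedure above can be sketched generically (the classifier shown is a deliberately trivial stand-in, used only to exercise the fold mechanics, not the paper's neural networks):

```python
import numpy as np

def kfold_accuracy(X, y, train_and_score, k=10, seed=0):
    """Average accuracy over k folds: each part is the test set exactly once."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        scores.append(train_and_score(X[train], y[train], X[test], y[test]))
    return float(np.mean(scores))

# Toy stand-in classifier (illustrative only): predicts from the first feature.
def first_feature_rule(X_tr, y_tr, X_te, y_te):
    return float((X_te[:, 0] == y_te).mean())

y = np.array([0] * 200 + [1] * 200)    # one balanced 200:200 round
X = np.zeros((400, 2)); X[:, 0] = y    # first feature encodes the label
print(kfold_accuracy(X, y, first_feature_rule))   # 1.0
```

In the actual experiments, `train_and_score` would train a neural network on the nine training parts and return its hit rate on the held-out part; the reported accuracy is the mean over the ten folds.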

Web-Server Next Generation Sandbox
Sandboxes are controlled environments that are excellent for auditing suspicious files. The actions audited by the Sandbox refer to alterations in the OS registry, traces of calls performed by all processes spawned by the malware file, files created, deleted, and downloaded by the malware during its execution, memory dumps of the malware processes, and the network traffic trace.
There are also Sandboxes which only accept auditing through their websites; in this case, it is enough for the user to upload a suspicious file present on his/her computer. Although they play a key role in digital forensics, the Sandboxes employed in the state of the art do not present mechanisms to audit a fileless attack.
In order to validate our NGAV, the present work develops a controlled environment named Web-Server Next Generation Sandbox. Our goal is to monitor the harmful effects of web fileless attacks launched directly from malicious web-servers to listening services in personal computers. In our environment, we have developed the Server-side and the Client-side in order to virtualize the malicious Server-side and the personal computer, respectively. Our Server-side consists of:
• Linux OS: Linux is the underlying operating system running a large percentage of the Internet's web servers, including many of the world's most important, highest-volume, always-connected websites (Sophos 2014).
• Php interpreter: the Security Threat Report has identified that the large majority of web-exploits are malicious php scripts designed to make Linux servers operate as nodes in nefarious activities (Sophos 2014). Php scripts are dynamic web codes executed on the Server-side, unlike HTML, which is static and executed on the Client-side (browser). Php is invoked on the server by the client and has its source code executed remotely. Its result, including HTML, is then forwarded to the client for its use.
• MySQL Server: php works as a complement to the Web server, adding new functionalities, much used for database querying in the MySQL language, and widely used in the persistence of financial transactions and personal data of internet users. MySQL Servers are employed in a vast number of web applications such as e-commerce, social networks, blogs and human resource control portals.
• Apache HTTP Server: Apache Server is responsible for executing the HTTP protocol, the basis for communication on the World Wide Web. HTTP uses the Client-server model based on the request-response paradigm. Initially, the Client-side tries to establish a connection with the Server-side by sending a request. If the connection succeeds, the server responds with the requested content, in addition to other technical information about the server, such as the protocol version. After the server sends the response, the established connection is closed.
On the Client-side, the Windows 7 SP1 x86 Operating System is used, endowed with the following facilities:
• Java Virtual Machine: fileless exploits commonly aim to corrupt the JVM and consequently affect the Java Security Manager (IBM 2014). The reason is that the Java Security Manager is a class that manages the external boundary of the JVM. The Java Security Manager controls how a Java application executing within the JVM can interact with resources outside the JVM (OS level, e.g. Windows 7) (IBM 2014).
• Edge default browser: even after the malignant web-page has been closed, its malefactions can persist due to web-browser infection. From the corruption of files linked to the execution of the web-browser, the victim starts to suffer nefarious activities such as web-browser redirection, automatic downloads of other malwares and robbery/hijacking of social networking passwords (CISCO 2018).
• Adobe Reader: similarly, even after the malign web-page has been closed, there may be persistent malefactions in non-browser environments such as Adobe Reader. The explanation is that malicious scripts can be embedded in various file types, such as pdf, among others. In this way, hidden malware can be executed automatically when the document is opened by the victim (CISCO 2018).
• Microsoft Office: besides pdf files, malwares can also be embedded in various types of files such as rtf, doc and ppt, among others. Thus, the malware can be executed automatically when the document is opened by the victim through Microsoft Office. As an anti-forensic strategy, the first-stage script extracts another script into randomly named files on disk and creates a scheduled task to start it a minute later (Symantec 2017).
In our Web-Server Next Generation Sandbox, the Client-side queries the Server-side through the standard URL regarding the http protocol, the local domain, and port 80 for TCP in the transport layer. Then the suspicious php file is invoked on the server, has its code executed, and its result is forwarded to the Client-side for its use. From there, the malicious behaviors originating from the web fileless attack are audited in Windows 7 by our Sandbox. The forensics conducted by our Sandbox employs authorial scripts, Cuckoo Sandbox scripts, Oracle software and the VirtualBox virtual machine.
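The Client-side query above can be sketched as the raw HTTP request the browser would issue; the php script name is hypothetical, and the request is only built here, not actually sent.

```python
# Sketch of the Client-side query: standard URL with the http protocol,
# local domain, and TCP port 80. The suspicious page name is hypothetical.
from urllib.parse import urlsplit

url = "http://localhost:80/suspect.php"   # hypothetical suspicious page
parts = urlsplit(url)

request = (
    f"GET {parts.path} HTTP/1.1\r\n"
    f"Host: {parts.hostname}\r\n"
    "Connection: close\r\n\r\n"
)
# In the Sandbox, the server executes the php script and returns the
# resulting HTML; the behaviors triggered in Windows 7 by that response
# are then audited.
```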
On average, our dynamic feature extraction monitors 11,777 behaviors that the fileless attack can perform when launched directly from a malicious web-server to a listening service in a personal computer. Our NGAV solution can actually (re)construct a chain of events, visualizing what the actual attacker might be up to, as opposed to looking at individual, discrete events. In the next sub-section, the features cataloged by our Web-Server Next Generation Sandbox are described.

Dynamic feature mining
The features of php-type files originate from the dynamic analysis of the suspicious files. In our methodology, the malware is executed in order to infect Windows 7, audited in real time (dynamically) by our Web-Server Next Generation Sandbox. The number of dynamic features depends on the iterations of the database balancing. For five iterations, a distinct packet of 200 samples of the majority class (benign) is presented to the 200 samples of the minority class (malware). In the five iterations, 11,767, 11,786, 11,802, 11,764, and 11,767 suspicious behaviors are audited, respectively.
In our Web-Server Next Generation Sandbox, the number of features depends on the behavior of the audited files. On average, 11,777 features are generated from the monitoring of the suspect file in the proposed controlled environment. To facilitate the understanding of the input layer neurons, our repository extends the description of the attributes audited by the authorial antivirus. The authorial machine learning repositories are also freely available in matlab and csv formats (PAEMAL 2020). Next, the groups of features related to the controlled monitoring of the investigated files are detailed.
• Features related to Code Injection, a technique used by an attacker to introduce code into vulnerable programs and change their behavior. The audit checks whether the tested server tries to:
- execute a process and inject code into it while it is uncompressed;
- inject code into a remote process using one of the following functions: CreateRemoteThread or NtQueueApcThread.
• Features related to Keyloggers, programs that record all keyboard entries typed by the user. Their primary purpose is illegally capturing passwords and other confidential information.
• Features related to the search for other possibly installed programs. The goal is to verify whether the audited server tries to:
- discover where the browser is installed, if there is one in the system;
- discover whether there is a sniffer or an installed network packet analyzer.
• Features related to disabling Windows components. Our NGAV checks whether the tested server tries to disable any of the Windows programs CMD.exe, Device Manager, or Registry Editor by manipulating the Windows Regedit.
• Features related to memory dump, a process in which the contents of RAM memory are copied for diagnostic purposes. The proposed digital forensics audits whether the server tries to find malicious URLs during memory dump processing.
• Features related to crypto-coin mining. Our NGAV verifies whether the tested server tries to connect to mining pools; the goal is to generate virtual currencies without the cognition of (and without benefiting) the computer owner.
• Features related to system modifications. Our NGAV verifies whether the tested server tries to create or modify system certificates, security center warnings, user account control behaviors, the desktop wallpaper, or values in the ADS (Alternate Data Stream).
• Features related to Microsoft Office. The audit checks whether the tested server tries to:
- create a suspicious VBA object;
- run Microsoft Office processes inserted in a command-line-interface packed object.
• Features related to packing and obfuscation. The proposed digital forensics verifies whether the tested server:
- has packed or encrypted information indicative of packing;
- creates a slightly modified copy of itself (polymorphic packing);
- is compressed using UPX (Ultimate Packer for Executables) or VMProtect (software used to obfuscate code and virtualize programs).
• Features related to persistence, the ability to retain information in a system without the need to register it beforehand. Our Sandbox audits whether the suspicious server tries to:
- use JavaScript in a registry key value in Regedit;
- create an ADS (Alternate Data Stream), an NTFS feature that contains information to find a specific file by author or title. ADSs are used maliciously because the information present in them does not change the features of the associated file, making them an ideal option for the construction of rootkits, since they are hidden (steganography).
• Features related to POS (Point Of Sale), a type of attack that aims to obtain the credit and debit card information of victims.
• Features related to powershell code injectors. Our Sandbox checks whether the tested server attempts to create a suspicious powershell process.
• Features related to processes. The audit checks whether the tested server:
- is interested in some specific process in execution;
- repeatedly searches for a process not found;
- tries to terminate some process.
• Features related to ransomwares, cyber-attacks that make the computer data inaccessible, requiring payment in order to restore the user's access.
• Features related to the Windows 7 OS (Regedit):
- changes in the associations between file extensions and software installed on the machine;
- changes to the current user information;
- driver corruption;
- changes to the Windows appearance settings and settings made by users, such as wallpaper, screensaver, and themes;
- changes to hardware settings.
• Features related to Trojans (malicious programs that enter a computer masked as another, legitimate program) of remote access, or RATs (Remote Access Trojans).
• Features related to the network traffic trace of the Windows 7 OS in PCAP format.
• Features related to DNS servers (Domain Name System, the servers responsible for the translation of URL addresses into IPs).
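Each cataloged behavior above becomes one binary input attribute (observed or not) for a given php file. A minimal sketch, with an illustrative catalog rather than the real 11,777-entry one:

```python
# Sketch of turning audited behaviors into the classifier's input vector:
# one binary attribute per cataloged behavior. Catalog entries are
# hypothetical labels, not the Sandbox's real feature names.

CATALOG = [
    "code_injection_create_remote_thread",
    "keylogger_capture",
    "disable_regedit",
    "cryptocoin_pool_connection",
    "powershell_suspicious_process",
]

def behavior_vector(observed: set, catalog=CATALOG) -> list:
    """Binary feature vector aligned with the behavior catalog."""
    return [1 if behavior in observed else 0 for behavior in catalog]

vec = behavior_vector({"keylogger_capture", "disable_regedit"})
```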

Neural networks for malware pattern recognition
As for malware pattern recognition, an essential task is assigning a class (label) to each investigated file from its features. Based on a set of files, named the training set, it is possible to formulate a hypothesis about the different classes linked to our NGAV. It is then up to the classifier to estimate the class of an unprecedented file by comparing the features of its behavior, audited in real time, with those captured during the training stage. The present work employs artificial neural networks as classifiers.
In order to select the best configuration of the neural network architecture, diverse learning functions and initial configurations that require a bigger volume of computation, such as doubling the number of neurons in the hidden layer, are investigated. The neural network architectures have an input layer containing a number of neurons equal to the size of the feature extraction vector of the fileless attack.
As mentioned before, in the five iterations from the database balancing, 11,767, 11,786, 11,802, 11,764 and 11,767 suspicious behaviors are audited, respectively. Thus, the input layer of the networks contains as many neurons as the behaviors audited in each of the iterations. They correspond to the dynamic features coming from the fileless attack. The output layer has two neurons, corresponding to the benign and malware classes.
Neural networks are computational intelligence models used to solve pattern recognition problems, having as their main characteristic the power of generalization over data not presented to the network. In most neural networks, such as the MLP (Multilayer Perceptron) (Xiang et al. 2005), knowledge about the network parameters is needed to obtain maximum performance. A common concern in this network type is avoiding getting stuck in local minima (Huang et al. 2012), making it necessary to add network control methods to escape these regions. Another common characteristic of this type of network is the long training time required to make the network able to classify correctly.
The ELM (Extreme Learning Machine) network has as its main characteristics training speed and fast data prediction when compared to MLP neural networks. ELMs are powerful and flexible kernel-based learning machines whose main characteristics are fast training and robust classification performance (Huang et al. 2012). The ELM network is a single-hidden-layer, non-recurrent network based on an analytical method to estimate the network output weights, for any random initialization of the input weights.
ELMs have been widely applied in several areas such as Biomedical Engineering (Azevedo 2015a, b, 2020; Lima et al. 2014, 2016, 2020; Pereira 2020). ELM networks can greatly contribute to the advancement of the digital security of devices. The proposed paper applies ELMs in the area of information security, specifically in the recognition of malware patterns.
Mathematically, in the ELM neural network the input attributes x_ti correspond to the set {x_ti ∈ ℝ; t = 1, …, v; i = 1, …, n}. Therefore, there are n features extracted from the application and v training data vectors. The hidden layer h_j, consisting of m neurons, is represented by the set {h_j ∈ ℝ; j ∈ ℕ*; j = 1, …, m}. The ELM training process is fast because it is composed of only a few steps. Initially, the input weights w_ji and the biases b_jt are defined by random generation. Given an activation function f : ℝ → ℝ, the learning process is divided into three steps:
• random generation of the weights w_ji, corresponding to the weights between the input and hidden layers, and of the biases b_jt;
• calculation of the matrix H, which corresponds to the output of the neurons of the hidden layer;
• calculation of the matrix of output weights β = H†Y, where H† is the generalized Moore-Penrose inverse of the matrix H, and Y corresponds to the matrix of desired outputs.
The output of the hidden-layer neurons, corresponding to the H matrix, is calculated by the kernel ϕ applied to the dataset inputs and the weights between the input and the hidden layers.
The learning of ELM networks is based on kernels. Kernels are mathematical functions used as the learning method of ELM neural networks. Kernel-based learning offers the possibility of creating a non-linear mapping of the data, without the need to increase the number of adjustable parameters, such as the learning rate commonly used in neural networks based on backpropagation. Equation (4) describes a Sigmoid kernel ϕ of an ELM network, with the results shown in Fig. 3a. ϕ is a function of f(x_{t,1…n}, w_{1…m,1…n}, b_{1…m,t}).
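The three training steps above can be sketched with numpy; the toy dataset and layer sizes are illustrative, not the PAEMAL repository.

```python
# Minimal ELM sketch: random input weights and biases (step 1), hidden-layer
# output H through a sigmoid kernel (step 2), and output weights
# beta = pinv(H) @ Y via the Moore-Penrose pseudoinverse (step 3).
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def elm_train(X, Y, m=40):
    n = X.shape[1]
    W = rng.standard_normal((m, n))      # step 1: random input weights
    b = rng.standard_normal(m)           # step 1: random biases
    H = sigmoid(X @ W.T + b)             # step 2: hidden-layer output matrix
    beta = np.linalg.pinv(H) @ Y         # step 3: Moore-Penrose solution
    return W, b, beta

def elm_predict(X, W, b, beta):
    return sigmoid(X @ W.T + b) @ beta

# toy two-class problem: class 1 when the point's coordinates sum > 0
X = rng.standard_normal((60, 2))
Y = (X.sum(axis=1) > 0).astype(float).reshape(-1, 1)
W, b, beta = elm_train(X, Y)
train_acc = ((elm_predict(X, W, b, beta) > 0.5) == (Y > 0.5)).mean()
```

Note that no iterative weight update occurs: the only fitted quantity is β, obtained analytically, which is why ELM training is fast.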
The proposed work resulted in an antivirus composed of ELM neural networks seeking the preventive detection of malwares. Instead of conventional kernels, authorial kernels are used for the ELMs. We employ mELMs (morphological ELMs): ELMs with hidden-layer kernels based on the morphological operators of Erosion and Dilation from image processing.
There are two fundamental morphological operations, Erosion and Dilation. The theory of Mathematical Morphology can be considered constructive, because all operations are built based on Erosion and Dilation. Mathematically, Erosion and Dilation are defined according to Eqs. (5) and (6), respectively, where f : S → [0, 1] and g : S → [0, 1] are normalized images in the form of a matrix of format S, with S ∈ ℕ². A pixel is defined by the Cartesian pair (u, f(u)), where u is the position associated with the value f(u), and v covers the matrix of f(u) matched by g. The operators ∨ and ∪ are associated with the maximum operation, while ∧ and ∩ are associated with the minimum operation. g is the structuring element for both Erosion and Dilation, and ḡ is the negation of g (Santos 2011). In Eq. (5), the negation of the structuring element g occurs first. Then, the maximum operation takes place over the active region of the image. Finally, the value ε_g(f)(u), at position u of the eroded image, receives the minimum value among the maximums, via the ∩ operator. Erosion overlays g on the original image f. By associating 1's with absolute white and 0's with absolute black, Erosion increases the darker areas and eliminates the regions with greater intensity (Santos 2011).
Equation (6) shows the behavior of the morphological Dilation operation. Due to mathematical precedence, the minimum operation denoted by f(v) ∧ g(u − v) occurs first, where f(v) refers to the original image matrix f covered (matched) by g. Then, the value δ_g(f)(u), at position u of the dilated image, receives the maximum value among the minimums, through the ∪ operator. Dilation superimposes the structuring element g on the original image f; the goal is that areas similar to g expand. By associating 1's with absolute white and 0's with absolute black, Dilation increases the areas with more intense tonality and eliminates the dark regions (Santos 2011).
Our antivirus employs mELMs (Morphological Extreme Learning Machines). They are inspired by mathematical morphology, being based on the non-linear operators of Erosion and Dilation. The authorial kernels make the association between image processing and artificial neural networks: the active region of the image corresponds to the input neurons of the neural network, and the structuring element corresponds to the weights between the input and hidden layers. Following Eq. (5), concerning the Erosion image operator, the Erosion ELM kernel can be defined as a function of (x_{t,1…n}, w_{1…m,1…n}, b_{1…m,t}).
Similarly to the Erosion kernel, Eq. (8) defines the Dilation kernel, inspired by Eq. (6) and referring to the morphological operator of Dilation.
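A hedged numpy sketch of one plausible reading of the mELM hidden-layer nodes, under the assumption that inputs and weights are normalized to [0, 1] as in Eqs. (5)-(6): an Erosion node takes the minimum over inputs of max(x_i, 1 − w_ji), using the negated structuring element, while a Dilation node takes the maximum over inputs of min(x_i, w_ji). This is an illustrative interpretation, not the authors' exact formulation.

```python
# Sketch of morphological hidden-layer nodes (one reading of Eqs. (5)-(6)):
# erosion node:  min over i of max(x_i, 1 - w_ji)   (negated weights)
# dilation node: max over i of min(x_i, w_ji)
import numpy as np

def erosion_layer(X, W):
    # X: (samples, n), W: (m, n)  ->  output shape (samples, m)
    return np.maximum(X[:, None, :], 1.0 - W[None, :, :]).min(axis=2)

def dilation_layer(X, W):
    return np.minimum(X[:, None, :], W[None, :, :]).max(axis=2)

X = np.array([[0.2, 0.8]])
W = np.array([[0.5, 0.3]])
ero = erosion_layer(X, W)   # min(max(0.2, 0.5), max(0.8, 0.7)) = 0.5
dil = dilation_layer(X, W)  # max(min(0.2, 0.5), min(0.8, 0.3)) = 0.3
```

Because the node outputs come from pure min/max lattice operations rather than a smooth inner product, the resulting decision regions are not constrained to conventional geometric shapes.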
In order to achieve good performance with ELMs, it is necessary to choose a kernel able to optimize the decision boundary for the presented problem, as seen in Fig. 3a. A Linear kernel achieves great results when used to solve a linearly separable problem. However, when used to solve non-linearly separable problems, as shown in Fig. 3b for a sigmoid distribution, it does not perform satisfactorily. Figure 3c, d show the performance of the mELM Erosion and Dilation kernels, with respective accuracies of 95.05% and 99.50%. Analyzing the figures, it is possible to notice that the mELMs have the capacity to accurately map the different distributions of different problems.
The effectiveness of our morphological neural networks is due to their ability to adapt to any type of distribution, since their mapping does not obey any conventional geometric figure. Non-linear algorithms allow the treatment of data containing discontinuous regions (Abo-Hammour 2014; Abu Arqub 2016; Arqub 2020). The mapping of the decision border is made by the training data themselves: their very position in n-dimensional space determines whether a surrounding region belongs to class 1 or class 2, where n represents the number of neurons in the input layer. Therefore, our mELM kernels are able to naturally detect and model the n-dimensional regions divided into different classes by using Mathematical Morphology.

Results of ELM networks
We employ seven different kernel types for the ELM neural networks. Five of these kernels are described in the state-of-the-art by Huang et al. (2012): Wavelet Transform, Sigmoid, Sine, Hard Limit and Tribas (Triangular Basis Function). In addition, two authorial kernels are employed: Dilation and Erosion.
The Wavelet kernel has no hidden layer (Huang et al. 2012). Its calculations are based on the transformation of the input data, and it can work similarly to kernels containing hidden layers (Huang et al. 2012). A good generalization capability of these kernels depends on an adjusted choice of the parameters (C, γ) (Huang et al. 2012). The cost parameter C refers to a reasonable equilibrium point between the hyperplane margin width and the minimization of the classification error in relation to the training set. The kernel parameter γ controls the decision boundary as a function of the classes (Huang et al. 2012). There is no universal method for choosing the parameters (C, γ).
The best combination (C, γ) depends on the data set employed (Huang et al. 2012). In the proposed paper, the parameters (C, γ) are investigated inspired by the method proposed by Huang et al. (2012), which consists of training increasing sequences of C and γ, mathematically 2^n, where n = −24, −10, 0, 10, 25. The hypothesis is to verify whether parameters with values different from the defaults (C = 1, γ = 1) generate better results. In the Linear kernel, only the cost parameter C is investigated; it is not possible to explore the kernel parameter γ (Huang et al. 2012). Table 5 details the results obtained by the ELM neural networks with the Wavelet kernel. Five times, a separate package of benign samples (majority class) is presented to the package of malware samples (minority class). In each of these 5 times, we investigate 10 folds of the cross-validation of the k-fold method, where k = 10. Then, 5 × 10 totals 50 distinct executions in each row of Table 5. In relation to the precision in the test phase, the maximum average performance was 57.05% in the distinction between benign and malware samples, through the Wavelet kernel with the parameters (C, γ) = (2^10, 2^0). Table 5 describes only the best and worst cases, in that order.
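The (C, γ) exploration above amounts to a small grid search over powers of two; a minimal sketch, where the scoring function is a placeholder for training an ELM and returning its mean 10-fold test accuracy:

```python
# Sketch of the (C, gamma) exploration inspired by Huang et al. (2012):
# both parameters range over 2**n for n in {-24, -10, 0, 10, 25}, and every
# pair is evaluated. The evaluate() body is a placeholder.

EXPONENTS = [-24, -10, 0, 10, 25]
grid = [(2.0 ** nc, 2.0 ** ng) for nc in EXPONENTS for ng in EXPONENTS]

def evaluate(C, gamma):
    """Placeholder: train an ELM with (C, gamma) and return the mean
    10-fold test accuracy."""
    return 0.0

best_C, best_gamma = max(grid, key=lambda cg: evaluate(*cg))
```

Note that the defaults (C, γ) = (1, 1) correspond to n = 0 and are included in the grid, so the hypothesis that non-default values help is tested against them directly.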
The Sigmoid, Sine, Hard Limit, Tribas, Dilation, and Erosion kernels employ hidden-layer architectures. Therefore, the number of neurons in the hidden layer of these kernels is investigated. The hypothesis is to verify whether architectures that require a higher volume of calculations, such as doubling the number of neurons in the hidden layer, are able to generate better accuracy rates compared to architectures that require a smaller amount of calculation. Two architectures are evaluated, employing 100 and 500 neurons in their respective hidden layers; these architectures achieve excellent accuracy in the application of ELM networks in Biomedical Engineering (Lima et al. 2016). Table 6 details the results obtained by the ELM neural networks with the Sigmoid, Sine, Hard Limit, Tribas (Triangular Basis Function), Dilation, and Erosion kernels. Five times, a separate package of benign samples (majority class) is presented to the package of malware samples (minority class). In each of these 5 times, we investigate 10 folds referring to the k-fold method, where k = 10. Then, 5 × 10 totals 50 distinct executions in each row of Table 6. Regarding precision, the maximum average performance was 99.95%, with a standard deviation of 0.05, through the Dilation kernel endowed with 500 neurons in its hidden layer. Given the small deviation, our Dilation kernel does not suffer abrupt changes due to the initial conditions.

Results in relation to the state-of-the-art
Extreme neural networks can contribute to the advancement of information security, but deep networks are the state-of-the-art even with their inadequacies. To prove our theoretical basis, the authorial antivirus uses shallow extreme neural networks instead of deep networks. In the present section, the proposed antivirus is compared to state-of-the-art deep-network-based antiviruses.
In order to avoid unfair comparisons, the feature extraction stage is standardized by monitoring the behaviors that the web fileless attack can perform when launched directly from a malicious web-server to a listening service in a personal computer. In the classification stage, our antivirus is endowed with the mELM Dilation kernel and contains 500 neurons in its hidden layer. As experiments, the authorial antivirus is compared to antiviruses based on both deep and shallow networks.
With regard to shallow-network-based antiviruses, the antivirus made by Lima et al. (2021), which employs neural networks based on data backpropagation, is replicated. Lima et al. (2021) investigate eleven distinct learning functions in order to optimize the accuracy of their antivirus and, for each learning function, explore 4 hidden layer architectures. Our authorial antivirus is also compared to antiviruses based on deep neural networks: the antiviruses made by Su et al. (2018) and Faruki et al. (2019) are replicated. Finally, the firewall developed by Wozniak et al. (2015) was also replicated.
Figures 4 and 5 are graphical representations of the results described in Table 7. Figure 4a presents the boxplots, from the training stage, for the authorial antivirus and the state-of-the-art. The authorial antivirus obtained an average performance of 100.00% with a standard deviation of 0.00%. The antivirus made by Lima et al. (2021) obtained average accuracies of 49.70% and 100.00% in its worst and best scenarios, respectively. These results were obtained using the learning functions "Resilient backpropagation (Rprop)" and "Fletcher-Powell conjugate gradient backpropagation", each with 100 neurons in its hidden layer. The antivirus made by Su et al. (2018) obtains a training accuracy of 99.51% on average. The average training accuracy was 89.99% for the antivirus made by Maniath et al. (2017).
Figure 4b shows the boxplots for the best accuracy in the test phase. The authorial antivirus obtained an average accuracy of 99.95%. The antivirus made by Su et al. (2018) obtains an accuracy of 99.50% on average. The antivirus made by Maniath et al. (2017) achieved an average performance of 89.90%. The antivirus made by Lima et al. (2021) achieved mean accuracies of 49.67% and 99.96% in its worst and best scenarios, respectively. Therefore, it is corroborated that neural networks based on backpropagation can suffer major variations in their accuracies depending on their configuration parameters. The decision made by Lima et al. (2021) was thus salutary: this state-of-the-art antivirus explores different learning functions, gradients and architectures in order to optimize the accuracy of its neural networks based on data backpropagation.
Figure 5a, b present the boxplots referring to the times spent during the training and test phases, respectively. In relation to training time, deep-network-based antiviruses are slower, since they use the deep network recurrent structure.
In contrast, the authorial antivirus consumes only 23.05 s, on average, to conclude its training. Although its learning is based on backpropagation, the work made by Lima et al. (2021) concludes its training in the order of seconds. Regarding the time consumed during the test phase, all techniques consumed very close times, without great discrepancies.
Table 8 shows the confusion matrices of the techniques presented in Table 7, in percentage terms. The confusion matrix is important in order to verify the quality of supervised learning. In Table 8, B. and M. are abbreviations of Benign and Malware. The desired classes are arranged on the vertical label, while the obtained classes are on the horizontal label. In a confusion matrix, the main diagonal is occupied by the cases in which the obtained class coincides with the expected class, named true positive cases. A good classifier therefore has a main diagonal occupied by high values, while the other elements have low values. Table 8 shows the main diagonals emphasized in bold. Our antivirus, in the test phase, mistakenly classified on average 0.10% of cases as benign when they were malware cases (false negatives). Following the same reasoning, there was a mean classification of 0.00% of cases said to be malware when they were benign applications (false positives).
In digital forensics, a false positive implies a benign application wrongly condemned. On the other side, a false negative can result in an undetected malware. It is worth mentioning that malware can cause irreversible and irrecoverable harm across the entire World Wide Web. In synthesis, a false negative can result in the loss of the victim's dignity, finances and mental health. It is emphasized that the authorial antivirus presents the lowest average percentage of false negatives, with only 0.10%.
Still regarding Table 8, sensitivity and specificity refer to the ability of the antivirus to identify malware and benign applications, respectively. The proposed work presents the confusion matrix in percentage terms in order to facilitate the interpretation of sensitivity and specificity. In synthesis, the sensitivity and specificity are presented in the confusion matrix itself, described in Table 8. For example, the proposed antivirus averages 99.90% with respect to both sensitivity and true positives. Following the same reasoning, the authorial antivirus obtains, on average, 100.00% for both specificity and true negatives.
In Table 9, we employ the F1-score, the MCC (Matthews correlation coefficient), and the Kappa Index in order to give a holistic perspective in the comparison between the authorial antivirus and the state-of-the-art. These metrics are derived from the confusion matrices shown in Table 8. Table 10 shows the parametric Student's t and non-parametric Wilcoxon hypothesis tests between our antivirus and the state-of-the-art. It is possible to conclude that our authorial antivirus is statistically equivalent to the best configuration of the antivirus made by Lima et al. (2021).
The explanation is that, in both the parametric Student's t and the non-parametric Wilcoxon tests, the null hypothesis was accepted. Therefore, our authorial antivirus and the best scenario of the antivirus made by Lima et al. (2021) are statistically equivalent.
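The metrics above all derive from the two-class confusion matrix; a minimal sketch, where the counts are illustrative rather than the values reported in Table 8:

```python
# Sketch of the confusion-matrix metrics: sensitivity, specificity, F1-score,
# MCC (Matthews correlation coefficient), and the Kappa index. The counts
# passed in at the bottom are an illustrative example.
import math

def metrics(tp, fn, fp, tn):
    n = tp + fn + fp + tn
    sens = tp / (tp + fn)                      # malware correctly flagged
    spec = tn / (tn + fp)                      # benign correctly cleared
    f1 = 2 * tp / (2 * tp + fp + fn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    po = (tp + tn) / n                         # observed agreement
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (po - pe) / (1 - pe)
    return sens, spec, f1, mcc, kappa

# e.g. 999 malwares detected, 1 missed, no benign file condemned
sens, spec, f1, mcc, kappa = metrics(tp=999, fn=1, fp=0, tn=1000)
```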
The authorial antivirus demonstrated a major advantage when compared to the state-of-the-art: it was able to achieve the best average accuracy, 99.95%, accompanied by a training time of only 23.05 s. The relationship between percentage accuracy and training time, in reverse order, is employed in Biomedical Engineering (Lima et al. 2016). The establishment of this relationship assumes an important role in Information Security, since 8 (eight) new malwares are released per second (Intel 2018). Paradoxically, a newly launched antivirus may already be obsolete and require new training because of a newly discovered vulnerability. In synthesis, the learning time of an antivirus should not be discrepant in comparison to the rate of new malware creation worldwide.

Conclusion
With the growth of the World Wide Web, the expectation is that malware propagation will continue to grow for some years, as the Internet is the primary means of cyber-infection (Microsoft 2013). Despite the near-universal presence of antiviruses on personal computers, malwares have caused billion-dollar losses on increasingly larger scales. One explanation is that cyber-attacks are systematically renewed (Sophos 2014). Among modern cyber-invasions, we highlight fileless attacks, mainly through malicious web servers. Currently, the user is infected through attractive websites endowed with Social Engineering, with ideological and religious content, unable to raise any kind of suspicion about their real intentions (Symantec 2012).
In this paper, we evaluated 86 conventional antiviruses against cyber-threats in php files, since almost all malwares running on web servers are php codes (Sophos 2014). The detection of php malwares ranged from 0 to 78.50% depending on the antivirus. Commercial antiviruses, on average, detected only 16.82% of the threats. On average, conventional antiviruses reported false negatives in 49.49% of cases and omitted a verdict in 33.68% of cases.
The current work used VirusTotal as a system to automatically submit malicious code to commercial antiviruses. VirusTotal limits file submissions to the full (paid) product versions only. This implies that it was not possible to compare the full and free versions of the major worldwide antiviruses. It is assumed that the results of the free versions are substantially lower compared to the full licensed versions.
On average, 57% of commercial antiviruses, in full mode, did not detect any malicious php file. It is important to note that the malicious samples evaluated are in the public domain and catalogued by incident responders. We conclude that there are failures in the services provided by commercial antiviruses with respect to protection against cyber-threats on a large scale and in real time. In order to overcome the limitations of commercial antiviruses, the state of the art employs the analysis of the source code of the suspicious file, named the static feature approach. The algorithm can then be studied and, therefore, it is possible to investigate the malicious intent of the file even before it is executed by the user (Lima et al. 2021; Maniath and Ashok 2017; Su and Vasconcellos 2018). However, static analysis is impractical against a fileless attack since the file is executed remotely and, therefore, its source code is not present on the personal computer. Hence, instead of the unworkable static analysis, the feature extraction of our antivirus audits the anomalous behavior on the victim's computer caused by the fileless attack.
The state of the art often employs deep network models for malware pattern recognition. The goal is to group suspicious applications into two classes: benign and malware. A disadvantage of deep networks is the long training time. As an aggravating factor, deep networks have a low capacity for parallelism because the convolutional layers are sequential: a layer can only be executed after the layer immediately before it has finished its work. This can be a hindrance in applications that need frequent training, such as antivirus software. On average, 8 (eight) new malwares are released every second, so even a newly trained deep-net-based antivirus may already be obsolete.
Extreme learning can overcome the limitations of state-of-the-art deep network models. Extreme neural networks allow the creation of a non-linear mapping of the data. The authorial antivirus employs shallow extreme neural networks instead of deep nets. Our extreme learning classifier provides accuracy statistically equal to that of deep learning techniques. As an advantage, extreme learning training costs seconds, compared to a deep-learning-based antivirus.
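The speed advantage comes from the extreme learning machine's closed-form training: the hidden layer is random and fixed, and only the output weights are solved by least squares. A minimal generic ELM sketch (not the authors' morphological variant; class and variable names are ours):

```python
import numpy as np

class ELM:
    """Minimal extreme learning machine: random, untrained hidden
    layer followed by a closed-form least-squares output layer."""

    def __init__(self, n_hidden=100, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        n_features = X.shape[1]
        # Hidden weights are drawn once and never trained.
        self.W = self.rng.normal(size=(n_features, self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)    # hidden activations
        self.beta = np.linalg.pinv(H) @ y   # closed-form output weights
        return self

    def predict(self, X):
        H = np.tanh(X @ self.W + self.b)
        return (H @ self.beta > 0.5).astype(int)  # 0 = benign, 1 = malware

# Toy usage: 200 samples with 11 synthetic behaviour features
# (a stand-in for the paper's 11,777 monitored behaviours).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 11))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
model = ELM(n_hidden=50).fit(X, y)
acc = (model.predict(X) == y).mean()
```

Because training reduces to a single pseudo-inverse, there is no iterative backpropagation, which is what keeps the training time in the order of seconds.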
Our NGAV can be extended to provide cyber-protection to local networks. The future goal is for our NGAV to run both on personal computers and on the proxy server, which is the intermediary between the World Wide Web and the local network. Executed on the proxy, the NGAV will monitor the network traffic trace in PCAP format, minimizing the workload of the NGAV instances running on the personal computers. For this, it is necessary to create a new Web-Server Next Generation Sandbox endowed with an architecture composed of a web server, a proxy server and multiple personal computers.
Author Contributions WS and WS conceived the presented idea of morphological extreme learning machine. WS developed the theoretical formalism and WS performed the implementation. RP, DS, and SS carried out the experiment. PL, RL, JO, TM constructed the dataset and validated the samples, SL wrote the manuscript with support from SF and EA. All authors discussed the results and contributed to the final manuscript and the interpretation of the results. All authors provided critical feedback and helped shape the research, analysis and manuscript.
Funding This study received no funding.

Data availability
To make our work easier to understand, the machine learning repository is freely available in MATLAB and CSV formats. We also make available all benign and malware samples.

Conflict of interest
The authors declare that they have no conflict of interest.
Animal and human rights The authors declare that no human participants were involved in this research.
Informed consent This research did not include healthcare intervention of human participants.