Memory Malware Analysis: Detecting Malicious Signatures In Memory By VolatilityPlugin’s

doi:10.21203/rs.3.rs-2500418/v1

Research Article

This work is licensed under a CC BY 4.0 License

Version 1

Abstract

Memory forensics is used to investigate malware that is executed or stored in RAM. Whether the investigator performs static or dynamic malware analysis, each result is displayed in plaintext, and the investigator must examine every result and triage the malicious requests by hand. This is a labor-intensive process, and an investigator will occasionally copy malicious files to his or her own computer for analysis. These files could contain worms or otherwise infect the investigator's computer; if that happens, the attacker can monitor all future investigations and the evidence they produce. With the algorithm presented in this research, a malicious DLL or request is identified and flagged whenever it appears. This saves the investigator a lot of time, because malicious files are flagged before the investigator loads them onto his or her computer. We experimented with multiple malicious files, and our algorithm showed 98% efficacy.

Keywords: Malware Analysis, Memory Forensics, Volatility, Python, Kali Linux, Reverse Engineering

1. Introduction

The world today has seen an enormous surge in computing traffic [20][21][22][23][24]. With this rise in computing power, more and more cyber-attacks are affecting corporate and government networks and sometimes even the IT systems underlying critical infrastructure. These attacks raise major concerns from the law enforcement standpoint. Owing to the borderless nature of cyber-attacks, many offenders have been able to walk away because there was not enough supporting evidence to convict them. In this context, cyber forensics plays a major role by providing scientifically proven methods to gather, process, interpret, and use digital evidence to build a conclusive description of cybercrime activities [1]. Although the commercial software market is flooded with security products, the development of forensic IT solutions for law enforcement has been limited. Outstanding results have been achieved for forensically sound evidence gathering, but little has been done on the analysis of the acquired evidence [2]. This is particularly true for volatile evidence sources such as physical memory and cache, mainly because the data residing on these media is volatile and unstable.

One saying that has always stuck with me is ‘malware can hide, but it must run’. Malware will try to hide in places a user may not navigate to; however, when it runs, there will be a process for it on the device. On this basis, the easiest way to look for malware is to take a close look at the processes running on a device [2][9].

There are two types of analysis in a malware investigation: static malware analysis and dynamic malware analysis. After a malware investigation, a few malicious signatures, hashes, and DLLs remain stored in memory such as RAM. Once the forensic investigator turns on the target machine, the malware may destroy itself, and the evidence will be lost unless this malicious data is still stored in memory [15]; that is the reason for memory forensics analysis. When an investigator dumps the victim machine's memory to his or her own system to analyze malicious data and collect evidence, that harmful data may infect the investigator's machine, and the evidence from many other cases may be lost as well. This experiment helps prevent such incidents: during a memory investigation, any data of this kind that is triaged is shown in red, which protects the investigator's machine and secures the evidence. To implement this, we used Volatility, an open-source memory forensics tool built with the Python programming language and plugins [1][2].

1.1 Static Malware Analysis

Static analysis is performed before the malware infects the target machine. For example, if a malware sample or application tries to infect the user's machine but the antivirus or the SOC team blocks the trojan, the incident response team then investigates the malware, its source, the IP address it communicates with, and the files it tries to import and export. After the analysis, the team takes action so that this kind of attack is not repeated. It is sometimes also safe to perform basic analysis on memory in this way [4]. Examples include DLL analysis with Dependency Walker and monitoring running applications with Process Monitor [4][5].

Advanced static analysis, also known as code analysis, dissects the binary file to study each component, still without executing it. One method is to reverse engineer the code using a disassembler [4]. Machine code is translated into assembly code, which is readable and understandable [4][5]. By looking at the assembly instructions, an analyst can tell what the program is meant to do. A file's headers, functions, and strings can provide important details. Unfortunately, modern attackers are adept at evading this technique. By embedding certain syntax errors into their code, they can misdirect disassemblers and ensure the malicious code still runs. Because static malware analysis can be foiled more easily, dynamic malware analysis is also necessary.
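As a small illustration of the point that a file's strings can provide important details without executing it, the following is a minimal sketch in Python. The function names `sha256_of` and `printable_strings` and the sample bytes are hypothetical and chosen only for this example; they are not part of Volatility or of the method described in this paper.

```python
import hashlib
import re
from typing import List


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 digest of the raw bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def printable_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Extract runs of printable ASCII characters of at least min_len bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]


# Hypothetical bytes standing in for a binary; nothing here is executed.
sample = (
    b"MZ\x90\x00\x03kernel32.dll\x00\x01\x02LoadLibraryA\x00"
    b"\xffab\x00http://example.test/payload\x00"
)

print(sha256_of(sample))
print(printable_strings(sample))
```

The extracted strings (an imported DLL name, an API name, and a URL in this made-up sample) are the kind of low-cost indicators an analyst might look at first before moving on to disassembly.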

1.2 Dynamic Malware Analysis

Dynamic analysis is performed after the attack, once the malware has been stored or executed on the target machine or network. The malware investigator reverse engineers and analyzes in depth the malicious signatures found as evidence [4][5]. Once both analyses are complete, the investigator starts memory forensics to see whether any malicious hashes are still stored in memory, for example by analyzing the range from 0X000000000 to 0X7FFFFFFF and the address 0X7FFDF000 [4][17]. This can be done through Volatility plugins. Dynamic analysis tracks the program's behavior, looking for any signs of potentially malicious intent. This process may include analysis of any changes it makes within the registry, any writes it makes to memory, and any calls it makes to servers using APIs [4][5]. Supplementary network analysis can also uncover useful data concerning the type and quantity of data the suspicious program leaks and, potentially, the specifics of its remote command and control structure [4][5][9]. While a dynamic analysis approach generally results in a higher detection rate than simple static analysis, increasingly sophisticated malware authors have developed malware that is purpose-built to defeat dynamic analysis methods [2][9].

1.3 Memory Forensics with Volatility

After static and dynamic malware analysis, the investigator performs memory forensics with a tool called Volatility, written in the Python programming language, to verify whether malicious hashes remain in memory [1][2]. During the analysis, Volatility fetches the logs and the stored memory, but the fetched data carries no annotation: the tool triages nothing, and the investigator has to tell the bad processes from the good ones. Volatility only returns the data the investigator asks for and does not distinguish a good hash from a bad one, which is dangerous for the investigator [9].

1.4 N-GRAM Hash for Malware Investigation

N-grams have long been used as features for classification problems, and their distribution often allows selection of the top-k occurring n-grams as a reliable first-pass to feature selection. However, this top-k selection can be a performance bottleneck, especially when dealing with massive item sets and corpora [7]. In this work we introduce Hash-Grams, an approach to perform top-k feature mining for classification problems. We show that the Hash-Gram approach can be up to three orders of magnitude faster than exact top-k selection algorithms. Using a malware corpus of over 2 TB in size, we show how Hash-Grams retain comparable classification accuracy, while dramatically reducing computational requirements.
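The hashing idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation from the cited work or from this paper: instead of keeping an exact count for every distinct byte n-gram, each n-gram is hashed into a fixed-size table of counters, and the k buckets with the highest counts are selected. The function names, the use of `zlib.crc32` as the hash, and the table size are assumptions made for this example.

```python
import zlib


def byte_ngrams(data: bytes, n: int):
    """Yield every contiguous n-byte window of data."""
    for i in range(len(data) - n + 1):
        yield data[i:i + n]


def hashgram_counts(corpus, n=4, buckets=2 ** 16):
    """Count n-grams in a fixed-size table indexed by a hash of the n-gram.

    Memory use depends on `buckets`, not on the number of distinct n-grams;
    different n-grams may collide in the same bucket.
    """
    counts = [0] * buckets
    for data in corpus:
        for gram in byte_ngrams(data, n):
            counts[zlib.crc32(gram) % buckets] += 1
    return counts


def top_k_buckets(counts, k):
    """Return the indices of the k most frequent non-empty buckets."""
    ranked = sorted(range(len(counts)), key=lambda b: counts[b], reverse=True)
    return [b for b in ranked[:k] if counts[b] > 0]


# Hypothetical two-file corpus, for illustration only.
corpus = [b"ABCDABCDABCD", b"xxABCDyy"]
counts = hashgram_counts(corpus, n=4, buckets=1024)
top = top_k_buckets(counts, k=3)
print(top)
```

The selected bucket indices can then serve as features: a file's feature vector records which of the top-k buckets its n-grams fall into. The trade-off in this sketch is that collisions make the counts approximate in exchange for bounded memory.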

Recent work has shown that byte n-grams often learn low-entropy features, such as function imports and strings, which has called into question whether byte n-grams can learn information corresponding to higher-entropy levels, such as binary code [7]. We investigate that hypothesis in this work by performing byte n-gram analysis on only specific sub-sections of the binary file and comparing the results with those obtained by n-gram analysis on assembly code generated from disassembled binaries [7]. We do this by leveraging the change in model performance and ensembles to glean insights about the data. In doing so, we discover that byte n-grams can learn from the code regions but do not necessarily learn any new information. We also find that assembly n-grams may not be as effective as previously thought. Disambiguating instructions by their binary opcode, an approach not previously used for malware detection, is critical for model generalization [1][2].

1.5 Executing the algorithm in Python and implementing it in the Volatility plugin

Here we use the package Latexify.jl. The package allows latexification of many different kinds of Julia objects and can produce output in several formats, including Markdown [7]. The inputs supported by this package are expressions, strings, numbers, missing, symbols, symbolic expressions from SymEngine.jl, and DataFrame objects from DataFrames.jl.

Examples of modifying the algorithm:

using Formatting
latexify([12893.1 1.328e2]; fmt=x->format(round(x, sigdigits=2), autoscale=:metric))
Output: [13k 130]

Input string: "x/(2*k_1+x^2)"
str = "x/(2*k_1+x^2)"
latexify(str)

latexify(12345.678; fmt="%.1e")
Output: 1.2e+04

1.6 Python program header used for Volatility

Each import in the header below supports a different kind of memory analysis. From these headers we can see that this paper analyzes and debugs the registry, the heap, and random access memory. Most importantly, the paper also covers virtual memory, because that is where a lot of evidence is stored. Virtual memory addressing is discussed in Section 3.2 [1][2].

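import textwrap
import volatility.conf as conf
# Global configuration object shared by the framework and its plugins.
config = conf.ConfObject()
import volatility.constants as constants
import volatility.registry as registry      # plugin and class discovery
import volatility.exceptions as exceptions
import volatility.obj as obj                # object model for memory structures
import volatility.debug as debug            # logging and debug output
import volatility.addrspace as addrspace    # address space abstractions
import volatility.commands as commands      # base class for plugins
import volatility.scan as scan              # memory scanners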

1.7 Uniqueness

Volatility performs memory forensics by fetching results from virtual memory, and it returns those results as plain text. The investigator must then find the malicious data in that output and start the analysis. The fetched data is huge in volume, and it takes the investigator a long time to go through every log and decide which entries are malicious. If the investigator misses a malicious file and it is executed on the investigator's machine, a lot of evidence may be lost. This experiment helps the forensic investigator prevent this kind of attack: we detect the malicious hash using the N-GRAM hash algorithm, and if any process takes longer than expected, that process request is also triaged. Such entries are shown in red, so the investigator can easily find the malicious signature and avoid executing it on his or her own machine.
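The triage rule described above (flag an entry in red when its hash is known to be malicious or when the process runs longer than a threshold) can be sketched as below. This is a hedged illustration, not the paper's actual plugin code: the function `flag_rows`, the row layout, the sample hashes, and the threshold value are all assumptions made for this example. The red colour is produced with standard ANSI escape codes.

```python
RED = "\033[31m"   # ANSI escape code: switch the terminal text colour to red
RESET = "\033[0m"  # ANSI escape code: restore the default colour


def flag_rows(rows, known_bad_hashes, max_seconds):
    """Format each (name, digest, seconds) row, in red if it looks suspicious.

    A row is treated as suspicious when its digest is in known_bad_hashes
    or when its run time exceeds max_seconds.
    """
    out = []
    for name, digest, seconds in rows:
        suspicious = digest in known_bad_hashes or seconds > max_seconds
        line = f"{name:<16}{digest:<12}{seconds:>6.1f}s"
        out.append(f"{RED}{line}{RESET}" if suspicious else line)
    return out


# Hypothetical rows standing in for plain-text output fetched from memory.
rows = [
    ("svchost.exe", "a1b2c3d4", 0.4),
    ("doc6.exe", "deadbeef", 0.2),
    ("slowproc.exe", "0f0f0f0f", 9.5),
]
flagged = flag_rows(rows, known_bad_hashes={"deadbeef"}, max_seconds=5.0)
for line in flagged:
    print(line)
```

In this made-up input, the second row is flagged because of its hash and the third because of its run time, while the first row is printed unchanged.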

2. Literature Survey

Case and Richard survey the state of the art in memory forensics, provide a critical analysis of current-generation techniques, describe important changes in operating system design that affect memory forensics, and sketch important areas for further research [1]. The Art of Memory Forensics gives a detailed treatment of memory forensics on Mac, Linux, and Windows [2]. The evaluation of Eureka on multiple datasets reveals that it can simplify analysis of a large fraction of contemporary Internet malware by successfully unpacking and deobfuscating API references [3]. Ernst discusses the goals and challenges of static and dynamic analysis [4]. Kris Kendall, a security researcher at Mandiant, covers practical exploitation and analysis of malware [5]. Mistry and Dahiya discuss various challenging scenarios and identify evidence signatures using regular expressions; beyond these scenarios, recent ransomware attacks can also be addressed using volatile memory forensic analysis [6]. Wressnegger et al. apply their criteria in the scope of web intrusion detection and empirically validate their effectiveness with different learning-based detection methods for client-side and service-side attacks [7]. The evaluation of MathLatexEdit demonstrates not only the accuracy of the model but also its usefulness in editing real-world posts that are accepted on Mathematics Stack Exchange [8]. The book by Monnappa gives a detailed analysis of Mac and Windows memory functions [9]. The technique proposed by Shah et al. was able to detect and classify malware with an accuracy rate of 97.01% [10]. Chetry and Sharma study the growth of memory forensics frameworks for investigating cases that involve the dark web and anonymous networks, and identify the existing challenges in investigating such cases [11]. Marss-x86 is a QEMU-based micro-architectural and systems simulator for x86 multicore processors [13].

Bozkir et al. use UMAP, a state-of-the-art manifold learning and dimension reduction technique, for the first time in this problem domain to achieve better discrimination [14]. Vidas covers memory backup (acquisition) and analysis [15]. As an application of the method they develop, Gladyshev and Almansoori also discuss techniques for recovering AOL Instant Messenger (AIM) conversation fragments from RAM dumps [16]. Wang et al. present tools in which the analysis of memory structures enables interactive forensic analysis of the machine's memory content [17]. Ahmed and Aslam analyze captured memory with three different memory analysis tools; investigators can use the resulting comparisons to select tools according to their needs [18]. Yang et al. report that a rootkit was successfully detected using the Volatility Framework on memory images retrieved by AMExtractor [19]. Bhattacharya et al. provide a carbon footprint model that optimizes the total carbon emission from a green data center [20]. The experimental results of Mao et al. show that their modeling approach offers a practical way of conserving the energy consumed by virtual machines running in data centers [21]. Compared with existing strategies, REDUX provides a comprehensive view of the energy-efficient data center and demonstrates a prominent capability for mitigating average peak workload and boosting renewable-energy utilization [22]. Bhattacharya and Qin validate the accuracy of their model and demonstrate its use during peak and non-peak hours of the day to reduce brown energy consumption [23]. Bhattacharya et al. also deal with the cache-coherence problem to improve the accuracy of their prediction algorithms [24].

3. Methodology

3.1 System Architecture

The experimental setup consists of a Windows machine with x86 architecture; a machine running the Linux operating system; an installation of the Volatility tool and its plugins, which are built in Python; a source of malicious signatures for testing; a sandbox to confirm that the log source is malicious; GitHub; and raw memory data from .vhad virtual addresses [13].

3.2 Virtual addressing on memory

Virtual memory was developed at a time when physical memory -- also referenced as RAM -- was expensive. Computers have a finite amount of RAM, so memory will eventually run out when multiple programs run at the same time. A system using virtual memory uses a section of the hard drive to emulate RAM [12][13]. With virtual memory, a system can load larger or multiple programs running at the same time, enabling each one to operate as if it has more space, without having to purchase more RAM.[17]

When copying virtual memory into physical memory, the OS divides memory into pagefiles or swap files with a fixed number of addresses [12]. Each page is stored on disk, and when the page is needed, the OS copies it from disk to main memory and translates the virtual addresses into real addresses [13][15]. The process of swapping virtual memory to physical memory is rather slow [15], which means that using virtual memory generally causes a noticeable reduction in performance. Because of swapping, computers with more RAM are considered to have better performance [13] (Fig 1.1).
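The translation and swapping steps described above can be illustrated with a toy model. This is a simplified sketch, not how any particular operating system implements paging: the class `ToyMMU`, the 4096-byte page size, and the least-recently-used eviction policy are assumptions chosen for the example. A virtual address is split into a page number and an offset; if the page is not resident in a physical frame, a page fault brings it in, evicting another page to swap when all frames are in use.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


class ToyMMU:
    """Toy model of virtual-to-physical translation with swapping."""

    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.page_table = {}  # virtual page number -> physical frame number
        self.swap = set()     # virtual pages currently held on disk
        self.lru = []         # resident pages, least recently used first
        self.page_faults = 0

    def translate(self, vaddr):
        """Return the physical address for vaddr, paging in if needed."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn not in self.page_table:
            self._page_in(vpn)
        else:
            self.lru.remove(vpn)
        self.lru.append(vpn)  # mark this page as most recently used
        return self.page_table[vpn] * PAGE_SIZE + offset

    def _page_in(self, vpn):
        """Handle a page fault: find a frame, evicting to swap if RAM is full."""
        self.page_faults += 1
        if len(self.page_table) < self.num_frames:
            frame = len(self.page_table)       # a free frame is still available
        else:
            victim = self.lru.pop(0)           # evict least recently used page
            frame = self.page_table.pop(victim)
            self.swap.add(victim)
        self.swap.discard(vpn)
        self.page_table[vpn] = frame


mmu = ToyMMU(num_frames=2)
for vaddr in (0x0010, 0x1004, 0x2008, 0x0010):
    print(hex(vaddr), "->", hex(mmu.translate(vaddr)))
print("page faults:", mmu.page_faults)
```

With only two frames, the third access evicts the first page, so the fourth access to the same virtual address faults again and lands in a different physical frame. This is the slowdown from swapping that the text attributes to machines with little RAM.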

3.3 Virtual memory dump using Python (vol.py)

The following code was used to dump the data at virtual memory addresses; it is also part of the Volatility dump [9].

# Excerpt from a scanner constructor: store the scan options on the instance.
self.scanners = scanners
self.scan_virtual = scan_virtual
self.show_unalloc = show_unalloc
self.use_top_down = use_top_down
self.start_offset = start_offset
self.max_length = max_length
self.address_space = addr_space
self.pool_alignment = obj.VolMagic(self.address_space).PoolAlignment.v()

def is_kernel_text(addr_space, addr):
    # Look up the start and end symbols of the kernel code segment.
    profile = addr_space.obj_vm
    text_start = profile.get_symbol("_text")
    text_end = profile.get_symbol("_etext")
    # True when text_start <= addr < text_end.
    return (addr_space.address_compare(addr, text_start) != -1
            and addr_space.address_compare(addr, text_end) == -1)

The is_kernel_text function can be called by plugins that want to determine whether an address is within the code segment of the kernel [9][13]. The linux_check_syscall plugin walks each system call handler and uses logic similar to the is_kernel_text function to determine whether a handler is in a trusted region [15].
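The range check described above can be reproduced in isolation. The sketch below is a self-contained illustration and does not use the Volatility API: the function names `in_kernel_text` and `untrusted_handlers`, the symbol addresses, and the handler table are all hypothetical values chosen for the example. It applies the same half-open range test, start <= address < end, to each entry of a system call table and reports the handlers that fall outside the kernel code segment.

```python
def in_kernel_text(addr, text_start, text_end):
    """Return True when addr lies in the half-open range [text_start, text_end)."""
    return text_start <= addr < text_end


def untrusted_handlers(syscall_table, text_start, text_end):
    """Return (index, address) for every handler outside the kernel text range."""
    return [
        (index, addr)
        for index, addr in enumerate(syscall_table)
        if not in_kernel_text(addr, text_start, text_end)
    ]


# Hypothetical addresses standing in for the _text and _etext symbols.
TEXT_START = 0xC1000000
TEXT_END = 0xC1500000

# Hypothetical system call table; entry 2 points outside the kernel code.
syscall_table = [0xC1001000, 0xC1200ABC, 0xF8A41000, 0xC14FFFF0]

for index, addr in untrusted_handlers(syscall_table, TEXT_START, TEXT_END):
    print(f"syscall {index}: handler at {hex(addr)} is outside kernel text")
```

A handler address outside the range is not proof of compromise by itself, but it is the kind of entry an investigator would examine first, since hooked system calls often point into memory that does not belong to the kernel image.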

3.4 Listing processes in memory and determining process relations using Volatility

In the screenshot in Fig 1.2, Volatility has analyzed the application downloaded through explorer.exe and arranged the related processes as a tree. The downloaded application was first executed in vmtoolsd.exe and then gained access to outlook.exe. The Outlook application downloaded excel.exe, which executed cmd.exe, where the malware was uploaded and carried out remote code execution. The investigator then starts analyzing doc6.exe for more evidence. Fig 1.2 also shows vol.py, the Python code executed to fetch the data from the DD image and the virtual disk.

Figure 1.3 explains the malware detection in detail. There was a malicious process with PID 1572, downloaded from explorer.exe (PID 1608). Once downloaded, the file accessed outlook.exe (4068) and vmtoolsd.exe (1708), and then excel.exe (1124). Most maliciously, the file downloaded from explorer.exe accessed cmd.exe (4056) and doc6.exe.
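A process tree of the kind described above can be rebuilt from a flat list of (PID, parent PID, name) records. The following is a minimal sketch, not the Volatility plugin itself. The PIDs are the ones mentioned in the text, but the parent links and the assignment of PID 1572 to doc6.exe are assumptions made only so the example has a complete tree to print; the function names are likewise hypothetical.

```python
from collections import defaultdict


def build_tree(processes):
    """Group (pid, name) pairs under their parent PID.

    Processes whose parent is not in the list are treated as roots (key None).
    """
    children = defaultdict(list)
    pids = {pid for pid, _, _ in processes}
    for pid, ppid, name in processes:
        children[ppid if ppid in pids else None].append((pid, name))
    return children


def render(children, parent=None, depth=0):
    """Return the tree as indented lines, two spaces per nesting level."""
    lines = []
    for pid, name in sorted(children.get(parent, [])):
        lines.append("  " * depth + f"{name} ({pid})")
        lines.extend(render(children, pid, depth + 1))
    return lines


# PIDs come from the text; the parent PIDs are ASSUMED for illustration.
processes = [
    (1608, 1500, "explorer.exe"),
    (1708, 1608, "vmtoolsd.exe"),
    (4068, 1608, "outlook.exe"),
    (1124, 4068, "excel.exe"),
    (4056, 1124, "cmd.exe"),
    (1572, 4056, "doc6.exe"),
]

tree_lines = render(build_tree(processes))
print("\n".join(tree_lines))
```

Seen this way, a chain such as an office application spawning a command shell that in turn spawns an unknown executable stands out immediately, which is why the parent-child view is useful for triage.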

4. Detecting Malware and Highlighting Malware Hash

4.1 Successful execution of malware and highlighting malicious DLL

This is the main part of the project: executing the implemented algorithm and program successfully detects the malicious signatures and DLLs (functions) and highlights them in red. This behaviour saves the investigator a lot of time and protects the investigator's machine from importing any malicious document.

Volatility fetches results from memory so that malicious signatures can be checked, but it does not show which results are malicious and which are safe; it simply brings the results to the screen, and each one has to be checked by hand. In this experiment, malicious results are shown in red, which makes it easy for the malware investigator to pick out a signature and analyze it, and keeps the investigator from executing those malicious signatures on his or her own machine. The main tools used were Volatility, Python, and assembly language. The Python program vol.py was modified by implanting the N-GRAM hash algorithm using the latexify package; once the modified program is executed, the malicious applications or signatures are detected.

Acknowledgment

This work was carried out under the guidance of Professor Dr. Tathagata Bhattacharya. The tools used were Volatility and Python. Auburn University at Montgomery sponsored the log source needed for this experiment.

Compliance with Ethical Standards

This article complies with ethical standards.

Competing Interest

The authors of this research paper have no competing interests.

Research Data Policy and Data Availability Statement

Data is available upon request.

References

[1] Case, Andrew, and Golden G. Richard III. "Memory forensics: The path forward." Digital Investigation 20 (2017): 23–33.
[2] Ligh, Michael Hale, et al. The art of memory forensics: detecting malware and threats in Windows, Linux, and Mac memory. John Wiley & Sons, 2014.
[3] Sharif, Monirul, et al. "Eureka: A framework for enabling static malware analysis."
[4] Ernst, Michael D. "Static and dynamic analysis: Synergy and duality." (2003).
[5] Kendall, Kris, and Chad McMillan. "Practical malware analysis." Black Hat Conference, USA. 2007.
[6] Mistry, Nilay R., and M. S. Dahiya. "Signature based volatile memory forensics: a detection based approach for analyzing sophisticated cyber attacks." International Journal of Information Technology 11.3 (2019): 583–589.
[7] Wressnegger, Christian, et al. "A close look on n-grams in intrusion detection: anomaly detection vs. classification." Proceedings of the 2013 ACM workshop on Artificial intelligence and security. 2013.
[8] Ma, Suyu, et al. "Latexify Math: Mathematical Formula Markup Revision to Assist Collaborative Editing in Math Q&A Sites." Proceedings of the ACM on Human-Computer Interaction 5.CSCW2 (2021): 1–24.
[9] Monnappa, K. A. Learning Malware Analysis: Explore the concepts, tools, and techniques to analyze and investigate Windows malware. Packt Publishing Ltd, 2018.
[10] Shah, Syed Shakir Hameed, et al. "Memory Forensics-Based Malware Detection Using Computer Vision and Machine Learning." Electronics 11.16 (2022): 2579.
[11] Chetry, Arjun, and Uzzal Sharma. "Memory forensics analysis for investigation of online crime-a review." 2019 6th International Conference on Computing for Sustainable Global Development (INDIACom). IEEE, 2019.
[12] Osbourne, Grant. "Memory forensics: Review of acquisition and analysis techniques." (2013).
[13] Patel, Avadh, Furat Afram, and Kanad Ghose. "Marss-x86: A qemu-based micro-architectural and systems simulator for x86 multicore processors." 1st International Qemu Users’ Forum. 2011.
[14] Bozkir, Ahmet Selman, et al. "Catch them alive: A malware detection approach through memory forensics, manifold learning and computer vision." Computers & Security 103 (2021): 102166.
[15] Vidas, Timothy. "The acquisition and analysis of random access memory." Journal of Digital Forensic Practice 1.4 (2007): 315–323.
[16] Gladyshev, Pavel, and Afrah Almansoori. "Reliable acquisition of RAM dumps from Intel-based Apple Mac computers over FireWire." International Conference on Digital Forensics and Cyber Crime. Springer, Berlin, Heidelberg, 2010.
[17] Wang, Jiang, et al. "Firmware-assisted memory acquisition and analysis tools for digital forensics." 2011 Sixth IEEE International Workshop on Systematic Approaches to Digital Forensic Engineering. IEEE, 2011.
[18] Ahmed, Waqas, and Baber Aslam. "A comparison of windows physical memory acquisition tools." MILCOM 2015–2015 IEEE Military Communications Conference. IEEE, 2015.
[19] Yang, Haiyu, et al. "A tool for volatile memory acquisition from Android devices." IFIP International Conference on Digital Forensics. Springer, Cham, 2016.
[20] Bhattacharya, Tathagata, et al. "Capping carbon emission from green data centers." International Journal of Energy and Environmental Engineering (2022): 1–15.
[21] Mao, Jianzhou, et al. "Modeling energy consumption of virtual machines in dvfs-enabled cloud data centers." 2020 IEEE 39th International Performance Computing and Communications Conference (IPCCC). IEEE, 2020.
[22] Peng, Xiaopu, et al. "Exploiting Renewable Energy and UPS Systems to Reduce Power Consumption in Data Centers." Big Data Research 27 (2022): 100306.
[23] Bhattacharya, Tathagata, and Xiao Qin. "Modeling Energy Efficiency of Future Green Data centers." 2020 11th International Green and Sustainable Computing Workshops (IGSC). IEEE, 2020.
[24] Bhattacharya, Tathagata, et al. "Performance modeling for I/O-intensive applications on virtual machines." Concurrency and Computation: Practice and Experience 34.10 (2022): e6823.
