Today the world has seen an enormous surge in computing traffic [20] [21] [22] [23] [24]. With this rise in computing power, more and more cyber-attacks are affecting corporate and government networks, and sometimes even the IT systems underlying critical infrastructure. These attacks raise major concerns from a law enforcement standpoint. Owing to the borderless nature of cyber-attacks, many criminals and offenders have been able to walk away for lack of supporting evidence to convict them. In this context, cyber forensics plays a major role by providing scientifically proven methods to gather, process, interpret, and use digital evidence to produce a conclusive description of cybercrime activities [1]. In a commercial software market flooded with security products, the development of forensic IT solutions for law enforcement has been limited. Although outstanding results have been achieved in forensically sound evidence gathering, little has been done on the analysis of the acquired evidence [2]. This is particularly true for volatile evidence sources such as physical memory and cache, mainly because of the volatile and unstable nature of the data residing on these media.
One saying that has always stuck with me is ‘malware can hide, but it must run’. Malware will try to hide in places a user may not navigate to; however, when it runs, there will be a process for it running on the device. Based on this, the easiest way to look for malware is to take a close look at the running processes on a device [2][9].
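The process-centred view above can be illustrated with one common triage heuristic: certain Windows system processes are normally started by a known parent (for example, svchost.exe by services.exe), so an instance with a different parent deserves a closer look. This is a minimal sketch under stated assumptions, not the method used in this paper: the (pid, ppid, name) tuples are assumed to come from a process listing, and `EXPECTED_PARENT` and `flag_unexpected_parents` are hypothetical names holding only two illustrative rules.

```python
from typing import Dict, List, Tuple

Process = Tuple[int, int, str]  # (pid, ppid, name), assumed input format

# Hypothetical, incomplete rule table: process name -> expected parent name.
EXPECTED_PARENT: Dict[str, str] = {
    "svchost.exe": "services.exe",
    "lsass.exe": "wininit.exe",
}


def flag_unexpected_parents(processes: List[Process]) -> List[Process]:
    """Return processes whose parent name differs from the expected one."""
    names_by_pid = {pid: name.lower() for pid, _, name in processes}
    flagged = []
    for pid, ppid, name in processes:
        expected = EXPECTED_PARENT.get(name.lower())
        if expected is None:
            continue  # no rule for this process name
        actual = names_by_pid.get(ppid)  # None if the parent is not listed
        if actual != expected:
            flagged.append((pid, ppid, name))
    return flagged


if __name__ == "__main__":
    sample = [
        (600, 500, "services.exe"),
        (800, 600, "svchost.exe"),   # expected parent
        (900, 1200, "svchost.exe"),  # started by explorer.exe: suspicious
        (1200, 1100, "explorer.exe"),
    ]
    for proc in flag_unexpected_parents(sample):
        print("suspicious:", proc)
```

Parent-child rules are only one signal; a real rule set would be larger and would be combined with other checks such as image paths and hashes.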
There are two types of analysis in malware investigation: static malware analysis and dynamic malware analysis. After every malware investigation, a few malicious signatures, hashes, and DLLs remain stored in memory such as RAM. When the forensic investigator starts up the target machine, that malware may self-destruct, and the evidence will be lost [15]. This is the reason for memory forensics analysis. Whenever an investigator dumps the victim machine's memory to his/her own system to analyze the malicious data and collect evidence, that harmful data may infect the investigator's machine, and the evidence of many other cases may be lost as well. This experiment is intended to prevent such incidents: during a memory investigation, any such data that is triaged is shown in red, which protects the investigator's machine and secures the evidence. To implement this we used Volatility, an open-source memory forensics tool built in the Python programming language and extended through plugins [1][2].
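A minimal sketch of the red-highlighting idea, assuming each process entry has already been reduced to a hypothetical (pid, name, sha256) tuple and that a set of known-bad hashes is available. The `triage` function and the blocklist are illustrative only; the colouring uses standard ANSI escape codes, which most terminals render as red text.

```python
import hashlib
from typing import Iterable, List, Set, Tuple

RED = "\033[31m"   # ANSI escape: red foreground
RESET = "\033[0m"  # ANSI escape: reset attributes

Entry = Tuple[int, str, str]  # (pid, name, sha256 hex digest), assumed format


def triage(entries: Iterable[Entry], known_bad: Set[str]) -> List[str]:
    """Format one line per entry, wrapping entries with a known-bad hash in red."""
    lines = []
    for pid, name, digest in entries:
        line = f"{pid:>6}  {name:<16} {digest[:16]}..."
        if digest.lower() in known_bad:
            line = f"{RED}{line}  [MALICIOUS]{RESET}"
        lines.append(line)
    return lines


if __name__ == "__main__":
    # Hypothetical data: hashes of placeholder bytes standing in for process images.
    bad = hashlib.sha256(b"placeholder malicious image").hexdigest()
    good = hashlib.sha256(b"placeholder benign image").hexdigest()
    for line in triage([(1234, "evil.exe", bad), (4321, "notepad.exe", good)], {bad}):
        print(line)
```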
1.1 Static Malware Analysis
Static analysis is performed before the malware infects the target machine. For example, if a malware sample or application tries to infect a user's machine but the antivirus or the SOC team blocks the trojan, the incident response team then investigates the malware: its source, the IP addresses it communicates with, and the files it tries to import and export. After the analysis, they take action to prevent this kind of attack from recurring. It is sometimes also safe to perform basic analysis on memory with this approach [4]; examples include DLL analysis with Dependency Walker and monitoring running applications with Process Monitor [4][5].
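Hashing a sample is a typical first step of basic static analysis, because the digest can be compared against known-malware lists without ever executing the file. A minimal sketch using Python's standard `hashlib`; the function name `file_hashes` and the chunk size are assumptions for illustration.

```python
import hashlib
from pathlib import Path
from typing import Dict


def file_hashes(path: Path, chunk_size: int = 65536) -> Dict[str, str]:
    """Compute MD5, SHA-1 and SHA-256 of a file by reading it, never executing it."""
    digests = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    with path.open("rb") as handle:
        # Read in chunks so that large samples do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for digest in digests.values():
                digest.update(chunk)
    return {name: digest.hexdigest() for name, digest in digests.items()}
```

For example, `file_hashes(Path("sample.bin"))["sha256"]` gives a digest that can be looked up in a threat-intelligence database; the file name here is a placeholder.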
Advanced static analysis, also known as code analysis, dissects the binary file to study each component, still without executing it. One method is to reverse engineer the code using a disassembler [4]. Machine code is translated into assembly code, which is human-readable [4][5]. By reading the assembly instructions, an analyst can tell what the program is meant to do. A file's headers, functions, and strings can also provide important details. Unfortunately, modern attackers are adept at evading this technique. By embedding junk bytes or deliberately misleading instruction sequences in their code, they can misdirect disassemblers while ensuring the malicious code still runs. Because static malware analysis can be foiled more easily, dynamic malware analysis is also necessary.
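The strings mentioned above can be pulled out of a binary without running it. This sketch mimics the basic behaviour of the Unix `strings` utility by collecting runs of at least four printable ASCII bytes. The minimum length and the function name `extract_strings` are assumptions, and the sketch does not handle UTF-16 strings, which Windows binaries often contain.

```python
import re
from typing import List


def extract_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Return runs of at least `min_len` printable ASCII bytes (0x20 to 0x7E)."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


if __name__ == "__main__":
    sample = b"\x00\x01MZ\x90\x00kernel32.dll\x00\xffGetProcAddress\x00ab\x00"
    print(extract_strings(sample))  # ['kernel32.dll', 'GetProcAddress']
```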
1.2 Dynamic Malware Analysis
Dynamic analysis is performed after the attack: the malware investigator reverse engineers and analyzes in depth the malicious signatures found as evidence [4][5]. Once both analyses are complete, the investigator starts memory forensics to check whether any malicious hashes are still stored in memory, for example by analyzing the address range 0x00000000 to 0x7FFFFFFF and the address 0x7FFDF000 [4][17]. Put simply, this analysis is performed once the malware has been stored or executed on the target machine or network, and it can be carried out with Volatility plugins. Dynamic analysis tracks the program's behavior, looking for any signs of potentially malicious intent. This may include analysis of any changes it makes to the registry, any writes it makes to memory, and any calls it makes to servers through APIs [4][5]. Supplementary network analysis can also uncover useful data about the type and quantity of data the suspicious program leaks and, potentially, the specifics of its remote command-and-control structure [4] [5] [9]. Although a dynamic analysis approach generally yields a higher detection rate than simple static analysis, increasingly sophisticated malware authors have developed malware that is purpose-built to defeat dynamic analysis methods [2][9].
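The address range quoted above can be read, under an assumption, as the user-mode half of a 32-bit Windows address space with the default 2 GB user / 2 GB kernel split. This sketch classifies addresses and filters hypothetical scan hits down to user space; the constants and function names are illustrative, and 64-bit systems or the /3GB boot option use different boundaries.

```python
from typing import List, Tuple

# Assumption: 32-bit Windows with the default 2 GB user / 2 GB kernel split.
USER_START = 0x00000000
USER_END = 0x7FFFFFFF     # highest user-mode address
ADDRESS_MAX = 0xFFFFFFFF  # highest address in a 32-bit space


def region_of(address: int) -> str:
    """Classify a 32-bit virtual address as 'user' or 'kernel'."""
    if not 0 <= address <= ADDRESS_MAX:
        raise ValueError(f"not a 32-bit address: {address:#x}")
    return "user" if USER_START <= address <= USER_END else "kernel"


def user_mode_hits(hits: List[Tuple[int, str]]) -> List[Tuple[int, str]]:
    """Keep only (address, label) scan hits located in user space."""
    return [hit for hit in hits if region_of(hit[0]) == "user"]


if __name__ == "__main__":
    print(region_of(0x7FFDF000))  # user
    print(user_mode_hits([(0x00400000, "hit-a"), (0x80501000, "hit-b")]))
```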
1.3 Memory Forensics with Volatility
After the static and dynamic malware analyses, the investigator performs memory forensics to verify whether malicious hashes remain, checking the hard disk and memory with a tool called Volatility, which is written in the Python programming language [1][2]. When the investigator conducts the analysis, Volatility is used to fetch the malicious logs and the stored memory. The fetched data comes back unannotated: the tool triages nothing and raises no message, so the investigator has to tell malicious processes from benign ones. Volatility simply returns the data the investigator asks it to fetch; it does not distinguish a good hash from a bad one, which can be dangerous for the investigator [9].
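Because the tool's output is plain text, a first step toward automated triage is turning it into records. This sketch assumes a simplified layout (one header row, whitespace-separated cells with no embedded spaces, optional dashed separator lines); real Volatility plugin output, whose columns can contain spaces such as timestamps, would need a more careful parser. `parse_table` is a hypothetical helper.

```python
from typing import Dict, List


def parse_table(text: str) -> List[Dict[str, str]]:
    """Turn whitespace-separated plain-text output into one dict per row.

    Assumes one header row, no spaces inside cells, and optional separator
    lines made only of '-' characters.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    records = []
    for cells in body:
        if all(set(cell) == {"-"} for cell in cells):
            continue  # skip a "------ ----" separator line
        records.append(dict(zip(header, cells)))
    return records


if __name__ == "__main__":
    sample = """
Name         PID   PPID
------------ ----- -----
System       4     0
evil.exe     1337  600
"""
    for record in parse_table(sample):
        print(record)
```

Once rows are dictionaries, rules such as hash lookups or parent-process checks can be applied per record instead of by reading raw text.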
1.4 N-GRAM Hash for Malware Investigation
N-grams have long been used as features for classification problems, and their distribution often allows selection of the top-k occurring n-grams as a reliable first pass at feature selection. However, this top-k selection can be a performance bottleneck, especially when dealing with massive item sets and corpora [7]. In this work we introduce Hash-Grams, an approach to perform top-k feature mining for classification problems. We show that the Hash-Gram approach can be up to three orders of magnitude faster than exact top-k selection algorithms. Using a malware corpus of over 2 TB in size, we show how Hash-Grams retain comparable classification accuracy, while dramatically reducing computational requirements.
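A simplified sketch of the hashing idea behind Hash-Grams: instead of storing every distinct n-gram, each byte n-gram is hashed into a fixed-size table of counters, and the top-k buckets are selected with a heap. The use of CRC-32 as the hash (Python's built-in `hash` for bytes is randomised per process), the table size, and n = 4 are assumptions, not the published parameters.

```python
import heapq
import zlib
from typing import Iterable, List, Tuple


def hash_gram_counts(files: Iterable[bytes], n: int = 4, buckets: int = 2 ** 20) -> List[int]:
    """Count byte n-grams in a fixed-size table indexed by CRC-32 of the n-gram."""
    table = [0] * buckets
    for data in files:
        for i in range(len(data) - n + 1):
            table[zlib.crc32(data[i:i + n]) % buckets] += 1
    return table


def top_k_buckets(table: List[int], k: int) -> List[Tuple[int, int]]:
    """Return the k (bucket index, count) pairs with the largest counts."""
    return heapq.nlargest(k, enumerate(table), key=lambda item: item[1])


if __name__ == "__main__":
    corpus = [b"\x55\x8b\xec\x83\xec\x10\x55\x8b\xec", b"\x55\x8b\xec\x5d\xc3"]
    counts = hash_gram_counts(corpus, n=3, buckets=1024)
    print(top_k_buckets(counts, k=2))
```

Colliding n-grams share a bucket, so counts are approximate; a second pass over the data could map the winning buckets back to concrete n-grams.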
Recent work has shown that byte n-grams frequently learn low-entropy features, such as function imports and strings, which has brought into question whether byte n-grams can learn information corresponding to higher-entropy levels, such as binary code [7]. We investigate that hypothesis in this work by performing byte n-gram analysis on only specific sub-sections of the binary file and comparing the results to those obtained by n-gram analysis on assembly code generated from disassembled binaries [7]. We do this by leveraging changes in model performance and ensembles to glean insights about the data. In doing so, we discover that byte n-grams can learn from the code regions but do not necessarily learn any new information. We also find that assembly n-grams may not be as effective as previously thought. Disambiguating instructions by their binary opcode, an approach not previously used for malware detection, is critical for model generalization. [1][2]
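The point about disambiguating instructions by opcode can be seen on a toy example. The instruction list is hypothetical: on x86, opcodes 0x89 and 0x8B are both the `mov` mnemonic (with different operand directions), so mnemonic n-grams merge patterns that opcode n-grams keep apart. The function names are illustrative.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Instruction = Tuple[str, str]  # (opcode byte in hex, mnemonic)


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count contiguous n-grams of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def instruction_ngrams(instructions: List[Instruction], n: int, by_opcode: bool) -> Counter:
    """Build n-grams over opcodes (by_opcode=True) or over mnemonics."""
    tokens = [opcode if by_opcode else mnemonic for opcode, mnemonic in instructions]
    return ngrams(tokens, n)


if __name__ == "__main__":
    # Hypothetical disassembly: two different MOV encodings, then RET.
    listing = [("89", "mov"), ("8B", "mov"), ("89", "mov"), ("C3", "ret")]
    print(len(instruction_ngrams(listing, 2, by_opcode=False)))  # 2 distinct
    print(len(instruction_ngrams(listing, 2, by_opcode=True)))   # 3 distinct
```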
1.5 Executing the Algorithm in Python and Implementing It as a Volatility Plugin
In this section we use the package Latexify.jl [7]. The package supports latexification of many different kinds of Julia objects and can produce output in several formats, including Markdown. Supported inputs include expressions, strings, numbers, missing, symbols, symbolic expressions from SymEngine.jl, and DataFrames from DataFrames.jl.
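Examples of formatting numbers and expressions with Latexify.jl:
using Latexify
using Formatting
# Round to two significant digits with metric prefixes; renders as [13k 130]
latexify([12893.1 1.328e2]; fmt=x->format(round(x, sigdigits=2), autoscale=:metric))
# Convert a plain string expression into LaTeX
str = "x/(2*k_1+x^2)"
latexify(str)
# printf-style format string; renders as 1.2e+04
latexify(12345.678; fmt="%.1e")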
1.6 Python Program Header Used for Volatility
Every module imported in the header below is used for a different kind of memory analysis. From these imports it can be seen that this paper analyzes and debugs the registry, the heap, and random access memory. Most importantly, the paper also covers virtual memory, because that is where a lot of evidence is stored. Virtual memory addresses are discussed in Section 2 [1][2].
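# Header modelled on Volatility 2.x's vol.py (Volatility 2.x targets Python 2.7)
import textwrap                              # wraps long help and output text
import volatility.conf as conf               # configuration and command-line options
config = conf.ConfObject()                   # shared configuration object, created here as in vol.py
import volatility.constants as constants     # version and other framework constants
import volatility.registry as registry       # plugin discovery and registration
import volatility.exceptions as exceptions   # framework exception types
import volatility.obj as obj                 # object model for structures in the memory image
import volatility.debug as debug             # debug and logging helpers
import volatility.addrspace as addrspace     # address spaces (physical and virtual memory layers)
import volatility.commands as commands       # base class for plugin commands
import volatility.scan as scan               # scanners that search memory for signatures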
1.7 Uniqueness
Volatility performs memory forensics to fetch results from virtual memory, and the results it returns are plain text. The investigator must then find the malicious data and start the analysis. The fetched data is very large, and it takes a lot of time for the investigator to analyze every log and work out which entries are malicious. If the investigator misses a malicious file and that file is executed on the investigator's machine, a lot of evidence can be lost. This experiment aims to stop this kind of attack and to help the forensic investigator: we detect malicious hashes using the N-gram hash algorithm, and if any process consumes more time than it requested, its entry is triaged in red. The investigator can then easily find the malicious signature and stay secure while handling this kind of attack on their machine.
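The time-based rule above needs a baseline, which the text does not specify. As a hypothetical choice, this sketch flags a process when its CPU time exceeds a factor (default 3) of the median CPU time of all listed processes, and renders flagged rows in red with ANSI escape codes. The input dictionary, the factor, and the function names are assumptions.

```python
import statistics
from typing import Dict, List

RED, RESET = "\033[31m", "\033[0m"  # ANSI escape codes


def slow_processes(cpu_seconds: Dict[str, float], factor: float = 3.0) -> List[str]:
    """Names whose CPU time exceeds `factor` times the median of all processes."""
    if not cpu_seconds:
        return []
    baseline = statistics.median(cpu_seconds.values())
    return sorted(name for name, seconds in cpu_seconds.items() if seconds > factor * baseline)


def render(cpu_seconds: Dict[str, float], factor: float = 3.0) -> str:
    """One line per process, sorted by name; flagged lines are wrapped in red."""
    flagged = set(slow_processes(cpu_seconds, factor))
    lines = []
    for name, seconds in sorted(cpu_seconds.items()):
        line = f"{name:<16}{seconds:>10.1f}s"
        lines.append(f"{RED}{line}{RESET}" if name in flagged else line)
    return "\n".join(lines)


if __name__ == "__main__":
    print(render({"explorer.exe": 1.0, "svchost.exe": 2.0, "miner.exe": 40.0}))
```

The median is used rather than the mean so that a single runaway process does not inflate the baseline it is compared against.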