4.1.1 Application Implementation
To run the web application monitoring system, the server or computer needs the minimum and optimum hardware and software described below.
With the minimum hardware specification, the web server monitoring can check about 35 web applications in 15 to 25 seconds. If a webapp client responds slowly, the request is cut off by a timeout that we set to 20 seconds, so each loop needs about 35 to 45 seconds including the timeout. For the application to run smoothly, each loop must finish in under 60 seconds, because the application hits the webapp client servers every 60 seconds. The minimum hardware specification is therefore as shown in Table 4.
Table 4 Hardware Specification
RAM | 4 GB
Memory | 100 MB
CPU | 4 x 2.20
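The loop-time budget described before Table 4 can be checked with a few lines of arithmetic. The sketch below only restates the figures from the text; the constant and function names are hypothetical and are not taken from the application.

```python
# Figures taken from the text; all names are hypothetical.
BASE_LOOP_SECONDS = (15, 25)   # time to check about 35 web applications
TIMEOUT_SECONDS = 20           # timeout applied to a slow webapp client
POLL_INTERVAL_SECONDS = 60     # the application polls every 60 seconds


def loop_duration_range(base=BASE_LOOP_SECONDS, timeout=TIMEOUT_SECONDS):
    """Return (best, worst) loop duration in seconds when one timeout is hit."""
    return base[0] + timeout, base[1] + timeout


def fits_in_interval(duration, interval=POLL_INTERVAL_SECONDS):
    """A loop is acceptable only if it finishes before the next poll starts."""
    return duration < interval


best, worst = loop_duration_range()
print(best, worst, fits_in_interval(worst))  # 35 45 True
```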
The minimum and optimum software specification is shown in Table 5. The Prometheus and Grafana versions are fixed because the Grafana documentation states that the scripted dashboard, which the web application server monitoring needs in order to display its graphs, is deprecated and will be removed in the next release of Grafana. For that reason we use Grafana v6.0.2, in which the scripted dashboard is not yet deprecated, and Prometheus v2.8.0, which was released at about the same time as Grafana v6.0.2, to minimize the risk of version incompatibility.
Table 5 Software Specification
Database | SQL SERVER 2012
Javamelody | > v 1.77.0
Prometheus | v 2.8.0
Grafana | v 6.0.2
4.1.2 Application Guidelines
This section explains how to use the application. The top section of the home page shows text such as the current Server Monitor and the Server Status, which is Maintenance, Up or Down, together with the total number of web servers in each status. The meaning of each status is as follows:
- Maintenance status (gray) indicates that the webapp server is under maintenance.
- Up status (green) indicates that the webapp server is up and can be accessed by users.
- Down status (red) indicates that the webapp server is down and cannot be accessed by users. This status also appears if the web server monitoring does not get a response from the web app server within about 20 seconds (timeout).
- Danger status (yellow) indicates that the resource consumption of the web app server is above the configured limit.
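The four statuses can be modelled as a small decision function. This is a minimal sketch: the order in which the conditions are checked (maintenance first, then down, then danger) is an assumption, and the names `Status` and `classify` are hypothetical.

```python
from enum import Enum


class Status(Enum):
    """Status names mapped to the colors described above."""
    MAINTENANCE = "gray"
    UP = "green"
    DOWN = "red"
    DANGER = "yellow"


def classify(in_maintenance, responded, over_limit):
    """Pick a status for one web app server.

    in_maintenance: the current time is inside the maintenance window
    responded:      the server answered before the 20-second timeout
    over_limit:     at least one resource metric crossed its configured limit
    The precedence of the checks is an assumption, not taken from the source.
    """
    if in_maintenance:
        return Status.MAINTENANCE
    if not responded:
        return Status.DOWN
    if over_limit:
        return Status.DANGER
    return Status.UP


print(classify(False, True, True).name, classify(False, True, True).value)  # DANGER yellow
```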
Figure 12 shows the list of web app servers with their status. Clicking one of the servers redirects the user to the server detail page.
If there is no connection, the system shows only the text "There is no internet connection", as shown in Figure 13.
The server detail page opens when the user clicks one of the servers on the home page. The name of the webapp appears in the top middle section, and the top right section has a back button that returns to the home page. Below the name section, as shown in Figure 14, there is an input section for filtering by time period.
In addition, this page contains 6 graphs that represent the resource usage of the web app server. The 6 graphs are explained below.
1. Active Threads
Active Threads represents the processes that are running on the webapp server.
2. System CPU
System CPU represents the CPU usage of the server as a percentage. High usage indicates that we need either to improve the code and queries to reduce CPU usage or to upgrade the hardware.
3. Physical Memory
Physical Memory represents the memory used by the operating system (OS).
4. Used Memory
Used Memory represents the RAM (Random Access Memory) used on the server.
5. Garbage Collector
The Garbage Collector is a process that removes unused objects. In the Java programming language, the Garbage Collector runs automatically until the app is closed. This graph is important for detecting a memory leak that could lead to an out-of-memory error.
6. Free Disk Space
Free Disk Space is the remaining space on the server disk. A falling graph indicates that we need to remove unused files, such as logs, or move files to another disk.
To show graph data for a desired period, enter the date, hour and minutes in the From and End sections and then click the Apply button. Figure 15 shows an example of a graph over a range of 30 minutes.
Metrics Page (API)
This page provides the metrics data as an API that Prometheus reads and converts into data that Grafana can read.
System CPU and garbage collector are measured in percent, while used memory, physical memory and free disk space are measured in bytes. Active threads is reported as the actual number of threads in progress at the moment the data is received. After Prometheus reads the API, Grafana uses the API from Prometheus to create the graphs displayed in the web app server monitoring. Below is an example of the API output that Prometheus reads.
# HELP CSFMOSS_web_P_last_value_system_cpu_load CSFMOSS_web_P value per minute
# TYPE CSFMOSS_web_P_last_value_system_cpu_load gauge
CSFMOSS_web_P_last_value_system_cpu_load 0.0
# HELP CSFMOSS_web_P_last_value_used_memory CSFMOSS_web_P value per minute
# TYPE CSFMOSS_web_P_last_value_used_memory gauge
CSFMOSS_web_P_last_value_used_memory 0.0
# HELP CSFMOSS_web_P_last_value_gc CSFMOSS_web_P value per minute
# TYPE CSFMOSS_web_P_last_value_gc gauge
CSFMOSS_web_P_last_value_gc 0.0
# HELP CSFMOSS_web_P_last_value_active_threads CSFMOSS_web_P value per minute
# TYPE CSFMOSS_web_P_last_value_active_threads gauge
CSFMOSS_web_P_last_value_active_threads 0.0
# HELP CSFMOSS_web_P_last_value_used_physical_memory_size CSFMOSS_web_P value per minute
# TYPE CSFMOSS_web_P_last_value_used_physical_memory_size gauge
CSFMOSS_web_P_last_value_used_physical_memory_size 0.0
# HELP CSFMOSS_web_P_last_value_free_disk_space CSFMOSS_web_P value per minute
# TYPE CSFMOSS_web_P_last_value_free_disk_space gauge
CSFMOSS_web_P_last_value_free_disk_space 0.0
# HELP CSFMOSS_ws_P_last_value_system_cpu_load CSFMOSS_ws_P value per minute
# TYPE CSFMOSS_ws_P_last_value_system_cpu_load gauge
CSFMOSS_ws_P_last_value_system_cpu_load 17.250841529191895
# HELP CSFMOSS_ws_P_last_value_used_memory CSFMOSS_ws_P value per minute
# TYPE CSFMOSS_ws_P_last_value_used_memory gauge
CSFMOSS_ws_P_last_value_used_memory 1.477268976E9
# HELP CSFMOSS_ws_P_last_value_gc CSFMOSS_ws_P value per minute
# TYPE CSFMOSS_ws_P_last_value_gc gauge
CSFMOSS_ws_P_last_value_gc 0.0
# HELP CSFMOSS_ws_P_last_value_active_threads CSFMOSS_ws_P value per minute
# TYPE CSFMOSS_ws_P_last_value_active_threads gauge
CSFMOSS_ws_P_last_value_active_threads 2.0
# HELP CSFMOSS_ws_P_last_value_used_physical_memory_size CSFMOSS_ws_P value per minute
# TYPE CSFMOSS_ws_P_last_value_used_physical_memory_size gauge
CSFMOSS_ws_P_last_value_used_physical_memory_size 2.4339001344E10
# HELP CSFMOSS_ws_P_last_value_free_disk_space CSFMOSS_ws_P value per minute
# TYPE CSFMOSS_ws_P_last_value_free_disk_space gauge
CSFMOSS_ws_P_last_value_free_disk_space 7.6714299392E10
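The output above follows the Prometheus text exposition format: `# HELP` and `# TYPE` comment lines followed by a `name value` sample line. A minimal sketch of how such output can be read is shown below. The function `parse_metrics` is hypothetical, it is not how Prometheus itself is implemented, and it ignores labels and timestamps because the sample does not use them.

```python
def parse_metrics(text):
    """Parse Prometheus text exposition lines into {metric_name: float}.

    Comment lines (# HELP, # TYPE) and blank lines are skipped. Labels and
    timestamps are not handled because the sample above does not use them.
    """
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last whitespace-separated token on the line.
        name, value = line.rsplit(None, 1)
        metrics[name] = float(value)
    return metrics


sample = """\
# HELP CSFMOSS_ws_P_last_value_system_cpu_load CSFMOSS_ws_P value per minute
# TYPE CSFMOSS_ws_P_last_value_system_cpu_load gauge
CSFMOSS_ws_P_last_value_system_cpu_load 17.250841529191895
# HELP CSFMOSS_ws_P_last_value_used_memory CSFMOSS_ws_P value per minute
# TYPE CSFMOSS_ws_P_last_value_used_memory gauge
CSFMOSS_ws_P_last_value_used_memory 1.477268976E9
"""

metrics = parse_metrics(sample)
print(metrics["CSFMOSS_ws_P_last_value_used_memory"])  # 1477268976.0
```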
List Server Addition
To add or remove a web app server, the user edits the JSON data that the app reads. This method is preferred because it is easy to understand and modify.
To add or remove a server on the home page, the user adds or removes an object in the JSON file, using the following format.
{
"Server": {
"name" : "xxx", (name that will be displayed on the web app server monitor)
"url" : "https://www.xxx.com/monitoring", (URL of the app that will be monitored and that has javamelody applied)
"servertype" : "webapp", (either webapp or services)
"username": "username", (username to access javamelody from the given url)
"password": "password", (password to access javamelody from the given url)
"sqlmean": "6500", (average time in milliseconds; if a sql query runs above the sqlmean limit, the system records it in the database)
"sqlmax": "7500", (max time in milliseconds; if a sql query runs above sqlmax, the system records it in the database)
"alertThreads": "10", (maximum threads; if thread usage exceeds alertThreads, the web app status changes color to yellow, or danger)
"alertFreedisk": "150000000000", (if the current free disk space in bytes is below alertFreedisk, the web app status changes color to yellow, or danger)
"alertGcmax": "5", (maximum GC in percent; if GC usage exceeds alertGcmax, the web app status changes color to yellow, or danger)
"alertGcmean": "3", (average GC in percent; if average GC usage exceeds alertGcmean, the web app status changes color to yellow, or danger)
"alertSystemCPU": "50", (maximum SystemCPU in percent; if SystemCPU usage exceeds alertSystemCPU, the web app status changes color to yellow, or danger)
"alertPhysicalMemory": "16000000000", (maximum PhysicalMemory in bytes; if PhysicalMemory usage exceeds alertPhysicalMemory, the web app status changes color to yellow, or danger)
"alerttime": "5", (if a web app server exceeds n alerts in a row, the system sends a notification email to the production team to check the server condition)
"alertDowntime": "120", (once an email has been sent, no further email is sent even if the number of alerts is exceeded again, unless the time since the last email exceeds alertDowntime, in minutes. This minimizes spam email to the team when the server is continuously down and keeps exceeding the alert limit)
"maintenancestart": "00.00", (if the current time is between maintenancestart and maintenanceend, the web app status changes color to grey, or maintenance)
"maintenanceend": "00.00"
}
}
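The explanations in parentheses are annotations for the reader and are not valid JSON, so an actual file would contain only the key and value pairs. The sketch below shows one way such an entry could be read and how the maintenance window could be evaluated. The sample values, the helper names, and the window rules (start inclusive, end exclusive, wrap past midnight, equal start and end meaning no window) are all assumptions.

```python
import json
from datetime import time

# Hypothetical entry in the format above, without the explanatory annotations.
CONFIG_TEXT = """
{
  "Server": {
    "name": "xxx",
    "url": "https://www.xxx.com/monitoring",
    "servertype": "webapp",
    "alertThreads": "10",
    "alertSystemCPU": "50",
    "maintenancestart": "01.00",
    "maintenanceend": "02.30"
  }
}
"""


def parse_clock(value):
    """Turn an "HH.MM" string such as "02.30" into a datetime.time."""
    hours, minutes = value.split(".")
    return time(int(hours), int(minutes))


def in_maintenance(server, now):
    """Return True if `now` falls inside the maintenance window.

    Assumptions: the window is start-inclusive and end-exclusive, it may
    wrap past midnight, and equal start and end means there is no window.
    """
    start = parse_clock(server["maintenancestart"])
    end = parse_clock(server["maintenanceend"])
    if start == end:
        return False
    if start < end:
        return start <= now < end
    return now >= start or now < end


server = json.loads(CONFIG_TEXT)["Server"]
# Values are stored as strings, so numeric limits need an explicit conversion.
print(int(server["alertThreads"]), in_maintenance(server, time(1, 15)))  # 10 True
```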
Email Alert
This feature notifies the developers on the production team when a server is down or when resource usage passes the limit set in the JSON data above.
1. Server Down Alert
An email is sent to the developer when the web app server cannot be reached by the server monitoring, or when the monitoring application gets a timeout while hitting the webapp client, as shown in Figure 16.
2. Query Alert
An email is sent to the developer when there are queries that exceed the mean or max time set in the JSON, as shown in Figure 17.
3. Alert Free Disk Space
An email, as shown in Figure 18, is sent to the developer when the free disk space is below the parameter set in the JSON.
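How often these emails go out is governed by the `alerttime` and `alertDowntime` values in the JSON. The sketch below illustrates that rule with a hypothetical `AlertThrottle` class. Two details are assumptions: an email is allowed once the streak reaches `alerttime` alerts, and the quiet period ends once `alertDowntime` minutes have fully elapsed.

```python
from datetime import datetime, timedelta


class AlertThrottle:
    """Decide when an alert email may be sent for one server (hypothetical).

    alert_time:      consecutive alerts required before emailing ("alerttime")
    alert_down_time: minutes to wait between two emails ("alertDowntime")
    """

    def __init__(self, alert_time, alert_down_time):
        self.alert_time = alert_time
        self.cooldown = timedelta(minutes=alert_down_time)
        self.consecutive = 0
        self.last_sent = None

    def record(self, is_alert, now):
        """Record one check result and return True if an email should go out."""
        if not is_alert:
            self.consecutive = 0  # the streak is broken
            return False
        self.consecutive += 1
        if self.consecutive < self.alert_time:
            return False
        if self.last_sent is not None and now - self.last_sent < self.cooldown:
            return False  # still inside the quiet period
        self.last_sent = now
        return True


# One failing check per minute for 130 minutes.
throttle = AlertThrottle(alert_time=5, alert_down_time=120)
start = datetime(2021, 6, 18, 8, 0)
sent = [throttle.record(True, start + timedelta(minutes=i)) for i in range(130)]
print([i for i, s in enumerate(sent) if s])  # [4, 124]
```

With these settings a server that stays down triggers one email on the fifth failed check and the next one two hours later, which is the spam-limiting behaviour the `alertDowntime` annotation describes.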
Information
The alerts for GC Mean, GC Max, Physical Memory, System CPU and Threads have the same format as the Free Disk Space alert. The difference is the direction of the check: for free disk space, the application sends an email to the developer when the value is below the parameter, whereas for every other resource it sends an email when usage is above the parameter.
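The difference in direction can be expressed as a single comparison helper. This is a sketch under the assumption that readings and limits are keyed by the same names as in the JSON; the functions `breaches` and `breached_keys` are hypothetical, and the readings reuse values from the sample metrics output.

```python
# Metrics that alert when the value drops BELOW the limit; all others alert
# when the value rises ABOVE it. Key names follow the JSON format above.
ALERT_WHEN_BELOW = {"alertFreedisk"}


def breaches(key, value, limit):
    """Return True if `value` crosses `limit` in the alerting direction."""
    if key in ALERT_WHEN_BELOW:
        return value < limit
    return value > limit


def breached_keys(limits, readings):
    """List every configured limit that the current readings cross."""
    return [k for k, limit in limits.items() if breaches(k, readings[k], limit)]


limits = {"alertFreedisk": 150_000_000_000, "alertSystemCPU": 50, "alertThreads": 10}
readings = {"alertFreedisk": 76_714_299_392, "alertSystemCPU": 17.25, "alertThreads": 2}
print(breached_keys(limits, readings))  # ['alertFreedisk']
```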
Database
The database contains a total of 6 tables. The name and description of each table are given below.
This table is used to send emails to developers when a webapp server is down or when its resource usage exceeds the configured limit.
The table has 4 columns:
1. Emailid
Emailid is the primary key.
2. Email_to
Email_to is the email address of the developer who will receive the alert email. To send to more than one address, separate the addresses with a semicolon (;). Example for sending to 2 email addresses: [email protected];[email protected]
3. Email_cc
Email_cc is the email address that will receive a carbon copy of the alert email. To copy more than one address, separate the addresses with a semicolon (;). Example for sending a carbon copy to two recipients: [email protected];[email protected]
4. Is_active
Marks whether the email_to or email_cc entry is still active. If it is not active, no email is sent to the recipient.
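The semicolon-separated fields can be turned into recipient lists with a short helper. This is a minimal sketch: the row is represented as a plain dictionary, the helper names are hypothetical, and the addresses are placeholders.

```python
def split_recipients(field):
    """Split a semicolon-separated address field into a clean list."""
    return [part.strip() for part in field.split(";") if part.strip()]


def recipients(row):
    """Return (to, cc) lists for one row, or two empty lists if inactive."""
    if not row["Is_active"]:
        return [], []
    return split_recipients(row["Email_to"]), split_recipients(row["Email_cc"])


# Hypothetical row; the addresses are placeholders.
row = {"Email_to": "dev1@example.com; dev2@example.com", "Email_cc": "", "Is_active": 1}
print(recipients(row))  # (['dev1@example.com', 'dev2@example.com'], [])
```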
The PathData table has 2 columns:
1. Directory
The path from which the application reads the JSON or image files.
2. FileType
This column holds only 2 values: json and img.
This table has 8 columns and stores queries that ran past the max or mean limit. The columns are as follows:
- QueryID (PK)
- ClientCode
- URL
- QueryString: The heavy query that is running on the client webapp.
- MeanTime: Average time needed to execute the QueryString.
- MaxTime: Maximum time needed to execute the QueryString.
- FlagQuery: Either 1 or 0, where 1 means that the query uses the LIKE operator.
- TimeStamp: Time when the QueryString was stored in the database.
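The FlagQuery value can be derived from the query text. The sketch below uses a simple whole-word search, which is an assumption about how the flag is computed. It would also match the word LIKE inside a string literal or a comment, so a real implementation might need a SQL parser.

```python
import re

# Matches LIKE as a whole word, in any letter case.
LIKE_PATTERN = re.compile(r"\blike\b", re.IGNORECASE)


def flag_query(query_string):
    """Return 1 if the query text contains the LIKE operator, otherwise 0."""
    return 1 if LIKE_PATTERN.search(query_string) else 0


print(flag_query("SELECT * FROM users WHERE name LIKE '%a%'"))  # 1
print(flag_query("SELECT likes FROM posts"))                    # 0
```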
This table has 7 columns and stores the resource usage of each webapp client every minute. The columns are as follows:
- ID (PK)
- ClientCode
- Max_Memory: Memory usage on the webapp client when the data is taken.
- Max_Thread: Thread usage on the webapp client when the data is taken.
- Max_TPM: Total transactions per minute on the webapp client when the data is taken.
- Time_Stamp: Date and time when the data is taken.
- Max_GC: Garbage Collector usage on the webapp client when the data is taken.
The RV_ReportSummary table contains a daily summary of resource usage for every webapp client server. Its data is created by a stored procedure at the start of every month. The table has 8 columns: the time the data was collected, the name of the server, avg memory, max memory, max thread, avg TPM, max TPM and max GC, for each day and each client.
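The daily summary is an aggregation of the per-minute rows by day and client. The source computes it in a stored procedure that is not shown; the Python sketch below only illustrates the same aggregation, with per-minute rows represented as dictionaries keyed by the column names listed earlier. The function name and the output keys are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def daily_summary(rows):
    """Group per-minute rows by (date, client) and compute the summary values."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["Time_Stamp"].date(), row["ClientCode"])].append(row)

    summary = []
    for (day, client), items in sorted(groups.items(), key=lambda kv: kv[0]):
        summary.append({
            "date": day,
            "client": client,
            "avg_memory": mean(r["Max_Memory"] for r in items),
            "max_memory": max(r["Max_Memory"] for r in items),
            "max_thread": max(r["Max_Thread"] for r in items),
            "avg_tpm": mean(r["Max_TPM"] for r in items),
            "max_tpm": max(r["Max_TPM"] for r in items),
            "max_gc": max(r["Max_GC"] for r in items),
        })
    return summary


# Two hypothetical per-minute rows for one client on one day.
rows = [
    {"ClientCode": "A", "Time_Stamp": datetime(2021, 6, 18, 9, 0),
     "Max_Memory": 1700.0, "Max_Thread": 0, "Max_TPM": 60.0, "Max_GC": 0.0},
    {"ClientCode": "A", "Time_Stamp": datetime(2021, 6, 18, 9, 1),
     "Max_Memory": 1800.0, "Max_Thread": 1, "Max_TPM": 70.0, "Max_GC": 1.5},
]
first = daily_summary(rows)[0]
print(first["avg_memory"], first["max_memory"], first["avg_tpm"])  # 1750.0 1800.0 65.0
```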
The Utilities table has 4 columns:
1. ID
Primary key.
2. Name
This column holds 7 values:
- Sql_start_check: Hour at which the application starts checking sql queries and sending alerts for any query that passes the mean or max limit.
- Sql_pause_check: Interval, in hours, between checks.
- Sql_end_check: Hour at which the application stops checking sql queries.
- Timeout: Ping duration to the webapp server (ms).
- Send_sql_email: 1 or 0 (1 means the application sends an email to the developer if a query exceeds the limit or uses the LIKE operator).
- Send_email_down: 1 or 0 (1 means the application sends an email to the developer if a webapp client server cannot be reached by the application or returns a timeout many times in a row).
- Alert_count_threshold: Number of times in a row that a webapp server must have the down status before an email is sent to the developer.
3. Value
The value of the name above. For examples, refer to the Name section above.
4. Description
The description of the name above. For examples, refer to the Name section above.
Comparison evaluation between JavaMelody and web app server monitoring.
Table 6 compares the values read manually from JavaMelody, by viewing and interpreting its graphs each day, with the values recorded by the web server monitoring. Memory usage is given in megabytes and accuracy as a percentage. The percentage is calculated as follows:
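- If Manual <= Web App Monitoring, then: $\text{Accuracy} = \dfrac{\text{Manual}}{\text{Web App Monitoring}} \times 100\%$
- If Web App Monitoring < Manual, then: $\text{Accuracy} = \dfrac{\text{Web App Monitoring}}{\text{Manual}} \times 100\%$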
Table 6 Comparison data between JavaMelody and web app server monitoring
Server | Date | Source | AVG Memory (MB) | MAX Memory (MB) | Max Thread | AVG TPM | MAX TPM
Server 1 Webapp | 18 June 2021 | Manual | 2000 | 3300 | 0 | 66 | 331
Server 1 Webapp | 18 June 2021 | Server Monitoring | 1795.585 | 3287.001 | 0 | 65 | 394
Server 1 Webapp | 18 June 2021 | Accuracy | 90% | 100% | 100% | 98% | 84%
Server 1 Webapp | 19 June 2021 | Manual | 1900 | 3200 | 0 | 32 | 245
Server 1 Webapp | 19 June 2021 | Server Monitoring | 1750.147 | 3244.793 | 0 | 31 | 284
Server 1 Webapp | 19 June 2021 | Accuracy | 92% | 99% | 100% | 97% | 86%
Server 1 Webapp | 20 June 2021 | Manual | 1800 | 3250 | 0 | 39 | 225
Server 1 Webapp | 20 June 2021 | Server Monitoring | 1743.984 | 3256.274 | 0 | 38 | 277
Server 1 Webapp | 20 June 2021 | Accuracy | 97% | 100% | 100% | 97% | 81%
Server 1 Webservices | 18 June 2021 | Manual | 4000 | 9300 | 10 | 1200 | 4200
Server 1 Webservices | 18 June 2021 | Server Monitoring | 3816.403 | 9639.635 | 10 | 1213 | 4587
Server 1 Webservices | 18 June 2021 | Accuracy | 95% | 96% | 100% | 99% | 92%
Server 1 Webservices | 19 June 2021 | Manual | 4000 | 10000 | 8 | 868 | 3848
Server 1 Webservices | 19 June 2021 | Server Monitoring | 3622.263 | 10166.94 | 8 | 868 | 4544
Server 1 Webservices | 19 June 2021 | Accuracy | 91% | 98% | 100% | 100% | 85%
Server 1 Webservices | 20 June 2021 | Manual | 2000 | 3500 | 2 | 41 | 199
Server 1 Webservices | 20 June 2021 | Server Monitoring | 1911.265 | 3629.714 | 2 | 40 | 202
Server 1 Webservices | 20 June 2021 | Accuracy | 96% | 96% | 100% | 98% | 99%
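The Accuracy rows in Table 6 can be reproduced with a short function. The sketch assumes that the percentage is the smaller of the two readings divided by the larger, which matches the values in the table, and that two identical readings (including 0 and 0) count as 100%. The function name `accuracy` is hypothetical.

```python
def accuracy(manual, monitored):
    """Percentage agreement between two readings: smaller / larger * 100."""
    if manual == monitored:
        return 100.0  # also covers the 0 versus 0 case
    return min(manual, monitored) / max(manual, monitored) * 100


# Server 1 Webapp, 18 June 2021: AVG Memory, MAX TPM and Max Thread.
print(round(accuracy(2000, 1795.585)))  # 90
print(round(accuracy(331, 394)))        # 84
print(round(accuracy(0, 0)))            # 100
```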