To streamline and support the task addressed here, a web-based application (a system designed for use through a web browser) was developed. The Streamlit v0.581 [5] framework, a set of classes built on Python v3.8.22 [6], was used for this purpose. It was chosen because it is aimed at the development of applications focused on data analysis and exploration. It also allows an application to be created in less time than other methods, and it is customizable enough to meet the needs of the targeted application.
The operation of the application was designed to be as simple and automatic as possible: the user only imports the data and exports the results, and all intermediate steps are performed automatically. To guarantee the correct functioning of the application, the data must be contained in an xls file (Microsoft Excel spreadsheet) without column labels (only numerical data are necessary), and the columns must follow this order: time, blood pressure (BP) and pulse interval (PI). This sequence applies to the specific case described here; it can be adjusted for other types of data.
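As an illustration of this input layout, the following minimal sketch attaches labels to an unlabelled three-column table and rejects non-numeric content. The function name `label_columns` and the column labels are hypothetical; the source fixes only the column order. The example uses an in-memory table; in practice the file would presumably be read with `pandas.read_excel(..., header=None)`, which needs an Excel engine such as xlrd for xls files.

```python
import pandas as pd

# Hypothetical labels; only the order (time, BP, PI) comes from the text.
COLUMNS = ["time", "bp", "pi"]


def label_columns(raw: pd.DataFrame) -> pd.DataFrame:
    """Check the expected three-column layout and attach column labels."""
    if raw.shape[1] != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {raw.shape[1]}")
    labelled = raw.copy()
    labelled.columns = COLUMNS
    # Fail early if any cell is not numeric (e.g. a header row left in the file).
    return labelled.apply(pd.to_numeric, errors="raise")


# In-memory stand-in for a spreadsheet read without a header row.
raw = pd.DataFrame([[0.0, 120.5, 180.2], [0.2, 118.9, 182.0]])
data = label_columns(raw)
```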
The application consists of a single scrolling screen that contains all available options. First, the number of rows and columns in the imported file is displayed. Next, two tables show the first five rows of data and descriptive information about each column, such as mean, standard deviation (std), minimum (min), maximum (max) and quantiles. Finally, two graphs present the PI and the BP over time. This exploratory analysis allows the user to verify that the correct file was imported and to inspect the characteristics of the data. For this step, the Pandas v1.0.3 package [7] was used.
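The exploratory step can be sketched with standard Pandas calls. The function name `summarize` and the sample values are illustrative assumptions, not taken from the application; the code only shows how the dimensions, the five-row preview and the descriptive table could be obtained.

```python
import pandas as pd


def summarize(data: pd.DataFrame, n_preview: int = 5):
    """Return the dimensions, a preview and descriptive statistics."""
    dimensions = data.shape          # (number of rows, number of columns)
    preview = data.head(n_preview)   # first rows of the imported data
    stats = data.describe()          # count, mean, std, min, quartiles, max
    return dimensions, preview, stats


# Illustrative values only.
data = pd.DataFrame({
    "time": [0.0, 0.2, 0.4, 0.6, 0.8, 1.0],
    "bp": [120.0, 121.0, 119.0, 122.0, 118.0, 120.0],
    "pi": [180.0, 182.0, 179.0, 181.0, 183.0, 180.0],
})
dimensions, preview, stats = summarize(data)
```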
After these steps, the application automatically suggests cutoff points to minimize outliers according to the distribution of the variable. By default, the algorithm suggests the 1st and 99th percentiles (z-scores of approximately −2.33 and 2.33 in a normal distribution; the percentile function is available in the Numpy v1.18.4 package [8]), but the user can raise or lower these points according to their own criteria. Once the cutoff points are defined, data processing occurs automatically, with no further command required. A table with the post-processing data characteristics (mean, std, min, max, quantiles) is displayed, together with two graphs that overlay the original and the post-processing data so that the changes can be visualized. If necessary, the user can return to the step of defining the cutoff values and change them; the processing is then redone automatically, generating new tables and graphs, and this process can be repeated as many times as necessary.
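A minimal sketch of the default cutoff suggestion, assuming it is computed with `numpy.percentile` on a single variable. The function name `suggest_cutoffs` and its parameters are hypothetical; passing other percentages corresponds to the user customizing the points.

```python
import numpy as np


def suggest_cutoffs(values, lower_pct=1.0, upper_pct=99.0):
    """Return (lower, upper) cutoff values at the given percentiles."""
    lower, upper = np.percentile(values, [lower_pct, upper_pct])
    return float(lower), float(upper)


# Illustrative data: the integers 0..100, so each percentile equals its rank.
values = np.arange(101, dtype=float)
lower, upper = suggest_cutoffs(values)          # defaults: 1st and 99th
custom = suggest_cutoffs(values, 5.0, 95.0)     # user-adjusted cutoffs
```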
The operating logic of the algorithm that performs the data processing is:
- If the value of the analyzed point is greater than the maximum value defined for processing, this value will be replaced by the mean of the two preceding points.
- If the value of the analyzed point is less than the minimum value defined for processing, this value will be replaced by the mean of the two following points.
For this, the algorithm traverses the vector of points in a loop over the index i, covering the range of the data and using neighbouring points from i + 1 to i − 2, according to the logic below:
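A minimal sketch of the two replacement rules stated above, not the application's own listing. Several details are assumptions not given in the text: the function name `replace_outliers`, the in-place update (so preceding points that were already corrected are reused), and the choice to leave a point unchanged when it lacks two neighbours on the required side.

```python
def replace_outliers(values, lower, upper):
    """Replace points outside [lower, upper] with the mean of two neighbours."""
    out = [float(v) for v in values]
    n = len(out)
    for i in range(n):
        if out[i] > upper and i >= 2:
            # Above the maximum: mean of the two preceding points.
            out[i] = (out[i - 1] + out[i - 2]) / 2.0
        elif out[i] < lower and i + 2 < n:
            # Below the minimum: mean of the two following points.
            out[i] = (out[i + 1] + out[i + 2]) / 2.0
        # Assumption: points without two usable neighbours stay unchanged.
    return out


# Illustrative signal with one high and one low outlier.
signal = [100, 102, 250, 101, 20, 99, 103]
cleaned = replace_outliers(signal, lower=50, upper=200)
```

In this example the value 250 becomes the mean of 102 and 100, and the value 20 becomes the mean of 99 and 103.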
The logic applied to minimize the extreme points is the same as that established for performing manual corrections. The last step is the export of the processed data, which is performed by a specific function. The user only needs to click on download for the processed data to be stored on the local disk.
The application can be accessed for free at: https://signalproc.herokuapp.com/. The code used in the development, a data file for tests and further information are available on GitHub (https://github.com/gbazo/signalproc).