Brno University of Technology ECG Signal Database with Annotations of P Wave (BUT PDB)

Background: Brno University of Technology ECG Signal Database with Annotations of P Wave (BUT PDB) is an ECG signal database with marked peaks of P waves created by the cardiology team at the


Introduction
Electrocardiography (ECG) remains the most available and widely used method for examining the cardiovascular system [1]. The ECG signal reflects the electrical activity of the heart and provides a significant amount of information about heart function [2]. Accurate detection of ECG components, such as the P wave, QRS complex and T wave, is a fundamental step of ECG analysis and of the subsequent detection of pathological cardiac events. In practice, automated software evaluation of ECG records is necessary [3]. The detection of QRS complexes and T waves is usually efficient. However, methods for P wave detection are less successful in physiological signals and especially in pathological signals. This holds both in clinical practice and in research [4], [5], [6], [7].
One of the reasons preventing progress in this field is the lack of publicly available datasets with correct P wave annotations suitable for training and testing detection algorithms [5], [8].
The methods are usually tested only on a part of the publicly available QT database [9], [10] or on the CSE database [11], which is not publicly available; both have manual P wave annotations. In addition, there are two newer databases, namely MIT-BIH Arrhythmia Database P-Wave Annotations [5], [9], [12] and the Lobachevsky University Electrocardiography Database [9], [6], which are not yet widely used. There are also two publicly available sets of P wave annotations that contain mistakes: the P wave annotations of the MIT-BIH Arrhythmia Database by Elgendi et al. [13] and the automatically annotated part of the QT database [10]. These annotations therefore cannot be recommended for testing P wave detection algorithms.
The most commonly used databases, the QT database and the CSE database, contain predominantly physiological ECG records, or contain only those pathologies which do not affect P wave detection.
However, the pathologies represented in a database are very important for objective testing of P wave detection algorithms. When the heart functions pathologically, the positions of the P waves are very important for determining the diagnosis. Unfortunately, current algorithms are not able to detect P waves reliably in pathological signals. We therefore fill this gap and introduce a new database of ECG signals with manually annotated P waves. The Brno University of Technology ECG Signal Database with Annotations of P Wave (BUT PDB) consists of 50 2-minute 2-lead ECG signals with 23 types of pathologies. The P wave positions were manually annotated by two ECG experts with 7 years of practical experience in evaluating Holter ECG records in a cardiovascular outpatient clinic [5]. The database will help to develop new, more accurate and more robust methods for processing and analysing ECG records with respect to P wave detection.

Selection of data
The ECG signals were selected by two ECG experts from 3 existing databases of ECG signals: the MIT-BIH Arrhythmia Database (MIT-A) [9], [14], the MIT-BIH Supraventricular Arrhythmia Database (MIT-S) [9], [15] and the Long Term AF Database (LT-AF) [9], [16]. All three databases contain ECG signals and annotations of the positions and types of QRS complexes. The MIT-A database additionally contains annotations of the types of arrhythmias present in the records. Detailed information about these databases is available on Physionet [9] or in the articles [14], [15], [16].
Two ECG experts went through the records and selected interesting two-minute sections with higher incidence of pathologies during which it is usually difficult to detect P waves by automatic algorithms. The signals were chosen to represent as many types of pathologies present in records in real medical practice as possible.
The final database consists of 50 2-minute 2-lead ECG signals with various types of pathologies. The whole database contains 23 different types of pathologies. The number and types of pathologies were chosen to represent a real sample of data from medical practice. From the MIT-A database, 38 signals were selected. From the MIT-S database, 5 signals were selected. From the LT-AF database, 7 signals were selected.
The BUT PDB includes 7,638 QRS complexes, of which 2,120 are without a P wave (e.g. during atrial fibrillation, ventricular beats or nodal rhythm), and 81 P waves are without a QRS complex (e.g. 2nd degree atrioventricular block). Altogether, the BUT PDB includes 5,599 P waves. In the BUT PDB, 23 different types of pathologies are present. The types of pathologies with their abbreviations used in the BUT PDB, the number of cases (records) and the IDs of the records in which each pathology is present are listed in Table 1.
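The counts quoted above are mutually consistent, which a short calculation can confirm. The sketch below uses only the figures given in this section; the variable names are illustrative, and the assumption that every QRS complex not listed as "without P wave" carries exactly one P wave is ours.

```python
# Counts quoted in the text (variable names are illustrative).
qrs_total = 7638        # all QRS complexes in BUT PDB
qrs_without_p = 2120    # QRS complexes with no P wave
p_without_qrs = 81      # P waves with no QRS complex

# Assumption: each remaining QRS complex has exactly one P wave;
# the P waves without a QRS complex are added on top.
p_total = qrs_total - qrs_without_p + p_without_qrs

# Number of records taken from each source database.
records_per_source = {"MIT-A": 38, "MIT-S": 5, "LT-AF": 7}
records_total = sum(records_per_source.values())

print(p_total, records_total)  # prints: 5599 50
```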

Annotation Of Data
The P wave positions were manually annotated by two ECG experts with 7 years of practical experience in evaluating Holter ECG records in a cardiovascular outpatient clinic [18], [5]. The first expert provided manual annotations, and the second manually checked them. Everything was done manually, without the use of automated annotation software. To facilitate the work of the ECG experts, a free software tool, SignalPlant [17], was used for manual marking of P waves. The experts worked independently. Unclear parts of the records were discussed by both experts until a consensus was reached. The selected records are interesting two-minute sections with higher incidence of pathologies during which it is usually difficult to detect P waves by automatic algorithms. Each record also contains an annotation of the dominant diagnosis (pathology) and the types of QRS complexes (from the original databases) [9], [14]. The information on pathologies present in the records was checked (the original annotations were found correct) and taken over from the original databases, and it was supplemented by the ECG experts where it was missing (all signals from MIT-S). The information about the types of QRS complexes is taken over from the original databases.

Results And Discussion
The final database contains 23 different types of pathologies. The number and types of pathologies were chosen to represent a real sample of data from medical practice.
Table 2 lists information about each record from BUT PDB: the ID of the record, the types of pathologies present in the record, the original database, the ID of the record in the original database, and the start and the end (in samples) of the selected segment. Examples of records with various pathologies (i.e. 2nd degree atrioventricular block (rec. 01), ventricular premature contraction, ventricular tachycardia, atrial fibrillation (rec. 48)) and newly annotated P waves are shown in Fig. 1.
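Because the segment boundaries are given in samples, converting them to time requires the sampling frequency of the source record. The following is a minimal sketch of that conversion. The function name, the example boundaries and the 360 Hz sampling frequency (the rate commonly cited for the MIT-BIH Arrhythmia Database) are assumptions for illustration and are not taken from Table 2.

```python
def segment_seconds(start_sample, end_sample, fs):
    """Return (start_s, end_s, duration_s) for a segment given in samples."""
    if fs <= 0:
        raise ValueError("sampling frequency must be positive")
    if end_sample < start_sample:
        raise ValueError("segment end precedes segment start")
    start_s = start_sample / fs
    end_s = end_sample / fs
    return start_s, end_s, end_s - start_s


# Hypothetical boundaries: a two-minute segment at an assumed 360 Hz.
fs = 360
start = 36000
end = start + 120 * fs
print(segment_seconds(start, end, fs))  # prints: (100.0, 220.0, 120.0)
```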
The BUT PDB is available on Physionet [9], [19]. All data are provided in the WaveForm Database (WFDB) format, which is supported by the WFDB Software Package [9]. The IDs of the recordings are numbers from 01 to 50. The ECG signals are stored in files with the suffix *.dat, the annotations of P waves are stored in files with the suffix *.pwave, and the positions of QRS complexes, their types and the sampling frequency of each ECG signal are stored in files with the suffix *.qrs. The exact types of pathologies present in each signal are described in the text file named README.txt.
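The file layout described above can be read with any WFDB-compatible tool. The sketch below shows one possible way using the third-party `wfdb` Python package and its `rdrecord` and `rdann` functions. It assumes that the package is installed, that the files have been downloaded to a local directory, that each file is named `<ID>.<suffix>`, and that a WFDB header file (`.hea`) accompanies each record. The helper names are hypothetical.

```python
from pathlib import Path

# Suffixes named in the text. The ".hea" header and the "<ID>.<suffix>"
# naming scheme are assumptions based on the usual WFDB layout.
SUFFIXES = ("hea", "dat", "pwave", "qrs")


def record_ids():
    """Record IDs as two-digit strings, '01' to '50'."""
    return [f"{i:02d}" for i in range(1, 51)]


def record_files(data_dir, record_id):
    """Expected file paths for one record (assumed naming scheme)."""
    return {s: Path(data_dir) / f"{record_id}.{s}" for s in SUFFIXES}


def load_record(data_dir, record_id):
    """Read the signal, P wave and QRS annotations of one record."""
    # Third-party package, imported lazily so the helpers above
    # remain usable without it.
    import wfdb

    base = str(Path(data_dir) / record_id)
    record = wfdb.rdrecord(base)         # ECG samples and header information
    p_waves = wfdb.rdann(base, "pwave")  # P wave positions (sample indices)
    qrs = wfdb.rdann(base, "qrs")        # QRS positions and beat types
    return record, p_waves, qrs
```

With the files in a hypothetical local directory `but-pdb`, calling `load_record("but-pdb", "01")` would return the record object and two annotation objects; the sample indices of the annotated P waves would then be available as `p_waves.sample`.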

Conclusions
BUT PDB was created for the development, testing and objective comparison of algorithms for P wave detection. For an objective comparison, the prerequisite is that the algorithms are evaluated on the entire database and that the signals are not selected and/or shortened. The database includes a representative sample of pathologies present in records in real medical practice. If algorithms can reliably detect P waves in all types of pathologies present in this database, they will be suitable for implementation in software for ECG signal analysis in real medical practice.
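One way such a comparison could be scored is to match detected P wave positions to the reference annotations within a tolerance window and to report sensitivity and positive predictive value. The sketch below is an illustration under stated assumptions, not a protocol prescribed by the database: the greedy nearest-match rule, the choice of metrics, the function names and the tolerance value are all ours.

```python
def match_detections(reference, detected, tolerance):
    """Pair each reference P wave with the nearest unused detection
    within +/- tolerance samples. Returns (tp, fp, fn)."""
    detected = sorted(detected)
    used = [False] * len(detected)
    tp = 0
    for ref in sorted(reference):
        best, best_dist = None, None
        for i, det in enumerate(detected):
            if used[i]:
                continue
            dist = abs(det - ref)
            if dist <= tolerance and (best_dist is None or dist < best_dist):
                best, best_dist = i, dist
        if best is not None:
            used[best] = True  # each detection may match only once
            tp += 1
    fn = len(reference) - tp   # annotated P waves that were missed
    fp = len(detected) - tp    # detections with no annotated P wave
    return tp, fp, fn


def sensitivity_ppv(tp, fp, fn):
    """Sensitivity and positive predictive value from pooled counts."""
    se = tp / (tp + fn) if tp + fn else float("nan")
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    return se, ppv


# Hypothetical positions in samples; the tolerance of 18 samples is arbitrary.
reference = [100, 400, 700, 1000]
detected = [105, 390, 850, 1002]
counts = match_detections(reference, detected, tolerance=18)
print(counts, sensitivity_ppv(*counts))  # prints: (3, 1, 1) (0.75, 0.75)
```

In keeping with the prerequisite stated above, the counts would be accumulated over all 50 full-length records before the metrics are computed.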

Conflicts of Interest
There are no conflicts of interest.