IAMonSense: multi-level handwriting classification using spatiotemporal information

Online handwriting classification has become an open research problem, as it serves as a preliminary step for handwriting recognition systems and for applications in several other fields. This paper extends current trends and knowledge with multiple contributions to handwriting classification using spatiotemporal information. First, it enriches the annotations of three publicly available online handwriting datasets, SenseThePen, IAM-OnDB, and IAMonDO, for online handwriting classification and recognition tasks. These datasets are updated with three distinguished levels of annotation, i.e., stroke, sequence, and line levels. The enriched annotations extend the functionality of these datasets for online handwriting classification at different levels and for further research analysis. In addition to this enrichment, the paper unifies the annotation levels across the datasets, which enables the research community to benchmark proposed methods against multiple datasets for comparative analysis. All datasets with enriched annotations are made publicly available to the research community as part of the IAMonSense dataset. Moreover, this paper presents a comprehensive benchmark of these datasets using multiple deep neural networks, including traditional convolutional neural networks (CNNs), graph convolutional networks (GCNs), attention-based neural networks, and transformers. These benchmarks can serve as baselines for further development in online handwriting classification.


Introduction
Handwriting data has been the most trusted and reliable source of information preservation for ages, from writing on tree barks, stones, and animal hides to writing on paper and digital media. Even today, many areas, such as industry, banking, healthcare, education, and forensic science, rely heavily on handwriting data. The booming digital era urges the digitization of handwriting data for record-keeping and for storing essential information securely. Manual digitization is nearly impossible at scale and demands efficient and accurate automation. Hence, understanding handwriting becomes a vital step in this automation process.
Handwriting data analysis methods are broadly divided into two categories: offline systems, where only spatial information is available to process handwriting data, i.e., handwritten document images; and online systems, where both temporal and spatial information of handwriting data is available. Whether handwriting data is online or offline depends on the data acquisition mechanism: traditional pen and paper yield only offline data, whereas digital tools may provide both online and offline data. The acquisition tools greatly impact writing style and influence the outcomes, as the data contain both inter-personal and intra-personal variations. These variations make handwriting classification, regardless of modality, a task far from trivial.
Handwriting analysis is a vast field, and the research community is actively investigating multiple aspects of it, such as handwriting recognition [1][2][3], handwriting mode detection [4][5][6], writer identification [7][8][9][10], and handwriting classification [11][12][13]. An important aspect of online handwriting analysis is processing data at a specific level, whether stroke level, sequence level, or line level. The stroke is considered the basic unit of online handwritten data, whereas a sequence is composed of one or more strokes that convey meaningful information. The sequences in one line define the line level, providing more context to the online handwriting data. Liwicki et al. [1] present an approach for online handwriting recognition using line-level information. Many other approaches [13][14][15] used stroke-level information for text/non-text classification, handwriting recognition, handwriting mode detection, and graph or flowchart diagram detection. Younas et al. [11,12] and Bresler et al. [15] present methods for online handwriting classification problems using sequence-level information. All these methods make significant contributions to the online handwriting analysis domain, but processing data at different distinguished levels limits their scope for comparative analysis against other methods. In this paper, we seek to address this problem by enriching the annotations of the publicly available datasets SenseThePen [11], IAMonDO [16], and IAM-OnDB [17] at all three distinguished levels, i.e., stroke, sequence, and line level.
In recent years, there has been growing interest in deep learning methods. Recurrent models, such as long short-term memory (LSTM) [18] and bidirectional long short-term memory (BLSTM) [19] networks, have enjoyed a strong reputation for online handwriting data processing due to their ability to process temporal information using feedback connections. More recent deep learning methods, such as graph convolutional networks (GCNs) [20], graph attention networks (GAT) [21,22], and transformers [23], also look promising for online handwriting classification. In the presented work, we performed an ablation study to explore the potential of these methods to classify handwriting data, using spatiotemporal information, into plain text, mathematical expressions/formulas, and drawings/graphs. Furthermore, considerable uncertainty remains concerning which distinguished level is best suited for the classification task. This paper therefore also investigates the impact of the distinguished levels on classifying handwriting data using spatiotemporal information. This work aims to broaden current knowledge of online handwriting classification across different distinguished levels and classes.
Major contributions of this work are as follows:
• To the best of the authors' knowledge, this is the first systematic study with different classes (text, mathematical expression, and graph/drawing) at three distinguished levels (stroke, word, and line) to classify online handwriting.
• Extending the functionality of the publicly available datasets SenseThePen [11], IAMonDO [16], and IAM-OnDB [17] by enriching their annotations at three distinguished levels: strokes, sequences, and lines. The enriched datasets SenseThePen+, IAMonDO+, and IAM-OnDB+ are presented as part of the IAMonSense dataset.
• A novel data interpolation method is presented to address the variable length of sequences without losing or adding meaningful information.
• A benchmark is presented with state-of-the-art results using various deep learning methods.

Related work
The known history of handwriting classification traces back to the early 20th century, when police departments in different countries started using individuals' handwriting templates as biometric tools [24,25]. Later, towards the end of the 20th century, two studies were conducted by Tappert et al. [26] and Plamondon et al. [27] providing comprehensive and systematic analyses of developments in handwriting recognition. These studies also distinguished offline from online handwriting systems. Recent works in this field focus on handwriting recognition, whether offline or online [14,15,28,29]. Although there are existing methods to classify online handwriting data, every system and dataset has its own standards, and these methods are not directly comparable to one another, even when using the same dataset. Thus, we aim to widen the current knowledge space to different levels and classes in the online handwriting classification domain. Younas et al. [12] presented a method for classifying online handwritten sequences into text, math, and plot/graph. They used a simple sliding window method to segment handwritten data, followed by feature engineering to transform the segmented sequences into feature vectors. The extracted feature vectors were evaluated over several machine learning methods. They also looked into the impact of using context information on classification results. In another study, Younas et al. [11] present the online handwriting classification dataset SenseThePen, for classifying handwritten sequences into several classes, such as text, mathematical expression, and graph. They also present a comprehensive 49-feature set for online handwriting classification. Furthermore, the study includes an ablation study on the efficacy of the proposed feature set in comparison to existing feature sets using several machine learning and deep learning classifiers.
In this study, we extend the functionality of the SenseThePen dataset, and both of these methods are considered for comparison.
Liwicki et al. [17] presented a publicly available dataset, IAM-OnDB, comprising online handwritten sentences acquired via an electronic interface on a whiteboard. The IAM-OnDB dataset has stroke- and line-level annotations, but sequence-level annotations were missing. The dataset consists of online handwritten sequences of English text collected from 200 writers. The research community widely uses the IAM-OnDB dataset for handwriting recognition, gender identification, and writer identification problems. Liwicki et al. [19] presented one of the first deep learning methods for online handwriting recognition using the recurrent neural network variant BLSTM; the method was evaluated using the IAM-OnDB dataset. Carbune et al. [30] presented state-of-the-art results on the IAM-OnDB dataset using a combination of finite state machines and LSTM networks to speed up the recognition process, reducing the waiting time by up to 92.88%. Nguyen et al. [31] presented an online handwritten text recognition method using a finite state machine in combination with a recurrent neural network. Shivram et al. [32] used the IAM-OnDB dataset for writer identification from online handwriting data using hierarchical Bayesian models. In another method, Shivram et al. [33] presented a comparative study of online writer identification evaluating the influence of writing style (memetic factors) and of an individual's handwriting features. Ahmed et al. [34] used the IAM-OnDB dataset to implement an ensemble-based classifier system to predict the gender of a writer from their handwriting data. Another method for gender classification using the IAM-OnDB dataset was presented by Illouz et al. [35]; their methodology used traditional convolutional neural networks and produced better results than human evaluation.
Indermühle et al. [16] presented a large new online handwriting database, the IAMonDO dataset. The IAMonDO dataset is widely adopted by the online handwriting analysis community and is considered a benchmark for various problems in the domain. The IAMonDO dataset contains annotations for text, diagrams, drawings, formulas, lists, tables, and markings, but only at the stroke level. The IAMonDO dataset is used for online handwritten stroke classification, mode detection, and text/non-text classification. Weber et al. [6] presented a feature engineering method to classify online handwritten ink traces into text and graphics. Delaye et al. [13] presented a method based on conditional random fields to discriminate textual from non-textual ink strokes in unconstrained online handwritten documents. Phan et al. [4] proposed an approach using an LSTM recurrent neural network for binary classification of online handwritten strokes into text and non-text classes. Ye et al. [36] presented an approach for text/non-text stroke classification in online handwritten documents; conditional random fields extend the approach to incorporate contextual information in the classification process. Ye et al. [5] proposed another method to group the strokes of the same class using an edge graph attention network. Their methodology uses the spatiotemporal relationship of strokes to build the graph, with various attention mechanisms for information aggregation between the nodes, and the resulting graph attention networks produce impressive results for text/non-text and multi-class classification. A recent approach using the IAMonDO dataset was presented by Degtyarenko et al. [37] for online handwritten stroke classification using a hierarchical recurrent neural network. Their approach gives state-of-the-art results for the multi-class classification of online handwritten strokes into text, tables, formulas, drawings, and lists.
Ott et al. [38] presented a set of new datasets for online handwriting processing, such as handwriting recognition (character recognition), mathematical expression recognition (digit recognition), and sequence classification (word recognition). These datasets preserve the natural writing behavior of the writers as the data are collected using a sensor pen with normal/traditional paper to write on. The proposed datasets can be used for various tasks in the online handwriting domain. However, these datasets are unsuitable for the problem at hand.
This work focuses on the classification of online handwriting data using spatiotemporal information, employing the SenseThePen [11], IAMonDO [16], and IAM-OnDB [17] datasets at all three distinguished levels, i.e., stroke, sequence, and line level. We consider for comparison only those methods which address the problem of online handwriting classification using any of these datasets at any distinguished level.

Methods
In this work, we evaluate the efficacy of traditional neural networks, graph neural networks, and attention-based neural networks for classifying online handwritten data at three different distinguished levels. This section covers the details of data processing to train these networks, a novel data interpolation method to address the problem of variable-length input data, data augmentation methods to address class imbalance in the datasets, the processing of online handwritten data into graph representations for graph neural networks, and a brief introduction of the networks used in this study.

Data interpolation
For traditional convolutional neural networks (CNNs), efficient handling of input sequences of varying lengths is a real challenge. Padding is the technique commonly used to address varying-length input sequences. Here, however, we introduce a novel technique that uses median information and then enlarges or reduces a given sequence without losing meaningful information. For a given datum, i.e., stroke, sequence, or line, A_i is an ordered set of points with cardinality z_i = |A_i|. We calculate the median med() of the cardinalities over the data to define the target input length. We then process each input sequence to the median length using the enlarge and reduce functions defined in Eq. 1. The enlarging function enlarge(A_i) recursively adds a new point between two points in A_i if the cardinality z_i = |A_i| is less than the median. Similarly, the reducing function reduce(A_i) recursively removes points from A_i if the cardinality z_i = |A_i| is greater than the median. These functions keep the data representation and processing at the same length. Figure 1 shows the interpolation results at the different distinguished levels.
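The interpolation described above can be sketched in plain Python. This is a hedged sketch: the function names and the exact insertion/removal order are our assumptions, as the paper only specifies that points are added between neighbours (enlarge) or removed (reduce) until each sequence reaches the median length.

```python
from statistics import median

def midpoint(p, q):
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)

def enlarge(points, target):
    """Insert midpoints between neighbouring points until len == target.
    Assumes the sequence has at least two points."""
    points = list(points)
    i = 0
    while len(points) < target:
        points.insert(i + 1, midpoint(points[i], points[i + 1]))
        i = (i + 2) % (len(points) - 1)  # move past the inserted point
    return points

def reduce_seq(points, target):
    """Remove interior points until len == target, keeping the endpoints."""
    points = list(points)
    i = 1
    while len(points) > target:
        points.pop(i)
        i = 1 if i >= len(points) - 1 else i + 1
    return points

def interpolate(sequences):
    """Bring every sequence to the median length over the collection."""
    m = int(median(len(s) for s in sequences))
    return [enlarge(s, m) if len(s) < m else reduce_seq(s, m)
            for s in sequences]
```

Because only midpoints are inserted and only interior points removed, the start and end of each stroke are preserved.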

Data augmentation
Data augmentation is a common method to address class imbalance in a dataset by generating samples for the minority classes while preserving the original data style. Here, we use both over-sampling of minority-class data and under-sampling of majority-class data. We follow a similar approach to data augmentation as Hamdi et al. [39]: the frequency method with low-pass filtering, and the rotation method. The combination of low-pass filtering and the frequency method enables adaptation in both the frequency and spatial domains. We modified the high harmonic attenuation and accentuation to attenuate the displacement variation of x and y. The goal is to generate smoothed online handwriting with attenuated high frequencies using the high harmonic amplification represented by Eq. 2.
where H(n) is the transfer function of the filter and G is the amplification gain factor, with range G_min = 0.5 to G_max = 5.0. Some results of data augmentation using the Fourier transform with a low-pass filter are shown in Fig. 2.
We also combine the rotation with the Fourier method to further augment the data using multiple gain factors and rotation degrees, as shown in Fig. 3.
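The combined Fourier and rotation augmentation can be sketched with numpy. This is a hedged sketch: the exact filter H(n) and parameter schedule of Hamdi et al. [39] are not reproduced here, and `cutoff_ratio` is an assumed parameter controlling which harmonics count as "low frequency".

```python
import numpy as np

def lowpass_augment(x, y, gain=2.0, cutoff_ratio=0.1):
    """Smooth a handwriting trace by attenuating its high harmonics.

    x, y         : 1-D coordinate arrays of equal length
    gain         : amplification gain factor G (0.5 <= G <= 5.0 in the paper)
    cutoff_ratio : fraction of harmonics kept (an assumption; the paper's
                   exact H(n) is not reproduced here)
    """
    def filt(sig):
        spec = np.fft.rfft(np.asarray(sig, dtype=float))
        cutoff = max(1, int(len(spec) * cutoff_ratio))
        h = np.zeros(len(spec))
        h[:cutoff] = gain                 # amplify low harmonics
        return np.fft.irfft(spec * h, n=len(sig))  # high harmonics dropped
    return filt(x), filt(y)

def rotate(x, y, degrees):
    """Rotate a trace about its centroid (combined with the Fourier method)."""
    t = np.radians(degrees)
    cx, cy = np.mean(x), np.mean(y)
    xr = cx + (x - cx) * np.cos(t) - (y - cy) * np.sin(t)
    yr = cy + (x - cx) * np.sin(t) + (y - cy) * np.cos(t)
    return xr, yr
```

Varying `gain` and `degrees` over a grid yields multiple augmented copies of each minority-class sample, as in Fig. 3.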

Graph data representation
We need the input data in graph representation to train the graph neural networks. A graph is defined as G = (V, E), containing a set of nodes V and edges E. An edge (u, v) ∈ E connects two nodes u and v and represents the relation between them. As we have data at three distinguished levels, we build the graph starting from the lowest level (stroke nodes) and moving toward the higher levels (line nodes), as shown in Fig. 4. Each basic node carries the features h_i, using (x, y) for each basic node. We add the edge feature e_i using the time-frame feature t on the edges of the graph representation. Examples of the data and their corresponding graph representations are shown in Figs. 5 and 6.
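A minimal sketch of this hierarchical construction in plain Python (the paper builds its graphs with DGL; the exact edge scheme and the choice of centroid features for the higher-level nodes here are illustrative assumptions):

```python
def build_graph(strokes):
    """Build a simple hierarchical graph from online-handwriting strokes.

    strokes : list of strokes, each a list of (x, y, t) points.
    Returns node features h (coordinates), an edge list, and edge
    features e (time deltas). Point nodes are chained in writing order;
    each stroke adds a parent node connected to its points.
    """
    h, edges, e = [], [], []
    prev = None
    stroke_children = []
    for stroke in strokes:
        ids = []
        for (x, y, t) in stroke:
            nid = len(h)
            h.append((x, y))               # node feature h_i: coordinates
            if prev is not None:
                edges.append((prev[0], nid))
                e.append(t - prev[1])      # edge feature e_i: time delta
            prev = (nid, t)
            ids.append(nid)
        stroke_children.append(ids)
    # higher-level stroke nodes (feature: centroid of their points)
    for ids in stroke_children:
        sid = len(h)
        xs = [h[i][0] for i in ids]
        ys = [h[i][1] for i in ids]
        h.append((sum(xs) / len(xs), sum(ys) / len(ys)))
        for i in ids:
            edges.append((sid, i))
            e.append(0.0)
    return h, edges, e
```

Repeating the parent-node step over sequences and lines yields the full three-level hierarchy of Fig. 4.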

Deep learning methods
We divide the deep learning methods used in this study into three categories: conventional deep neural networks, graph neural networks, and transformers.

Conventional deep neural networks
We used three different variants of conventional deep learning models: 1D-CNN [40], LSTM [18], and BLSTM [19]. They can be combined to extract refined features, particularly when the data are spatiotemporal. We used the OS-CNN variant of 1D-CNN, which is well suited to time series data. The 1D-CNN model is composed of a single 1D convolutional layer followed by a global max-pooling layer and a dense layer; a fully connected layer then predicts the output. In the LSTM model, we use an LSTM layer with 128 hidden units, followed by a dropout layer with a rate of 0.2, a second LSTM layer with 64 hidden units, and another dropout layer. At the end of the network, we use two fully connected layers to predict the classification outcome. The BLSTM models share the same architecture. We also implement combinations of the 1D-CNN model with both the LSTM and BLSTM models, replicating the same architectures as mentioned earlier.
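The core of the 1D-CNN feature extractor, a single 1D convolution followed by global max-pooling, can be sketched in numpy (a hedged sketch of the operation, not the paper's trained model; kernel shapes and names are ours):

```python
import numpy as np

def conv1d_globalmaxpool(x, kernels, bias):
    """Single 1D convolution (valid padding) + global max-pooling.

    x       : (T, C_in) input sequence, e.g. interpolated (x, y) points
    kernels : (K, C_in, C_out) convolution weights
    bias    : (C_out,) biases
    Returns a fixed-size (C_out,) feature vector for any T >= K.
    """
    K, c_in, c_out = kernels.shape
    T = x.shape[0]
    feats = np.empty((T - K + 1, c_out))
    for t in range(T - K + 1):
        # each output channel: dot product of the window with its kernel
        feats[t] = np.einsum("kc,kco->o", x[t:t + K], kernels) + bias
    return feats.max(axis=0)  # global max-pool over the time axis
```

The global max-pool is what yields a fixed-size vector for the dense layers, which is why interpolation to a common length mainly matters for batching rather than for this step.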

Graph neural network
We used three graph neural network architectures in this research work: graph convolutional networks (GCNs) [20], graph attention networks (GAT) [21], and GATv2 [22]. We tried several different architectures to study and analyze the variants of graph neural networks for the online handwriting classification problem because of their inherent ability to incorporate contextual information and to handle varying-length input sequences without requiring extra padding or pre-processing. The graph neural networks used in this work start with a two-layer architecture, which we then extend sequentially to five layers, using both mean_node and max_node graph readout mechanisms. Similarly, all variants of graph attention networks were built on the same architecture, using two convolution layers with varying numbers of attention heads. The main difference between the two GAT variants used in this work is their attention mechanism: the GAT network [21] uses static attention heads, whereas GATv2 [22] uses dynamic attention heads.
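The two readout mechanisms, and a simplified form of one GCN layer, can be sketched in numpy (a hedged sketch: the paper uses DGL's implementations, and this mean-aggregation rule is a simplification of Kipf and Welling's normalized propagation):

```python
import numpy as np

def mean_node_readout(h):
    """mean_node readout: average node features into one graph vector."""
    return np.mean(h, axis=0)

def max_node_readout(h):
    """max_node readout: element-wise maximum over node features."""
    return np.max(h, axis=0)

def gcn_layer(h, adj, w):
    """One simplified GCN layer: mean-aggregate neighbours, then linear + ReLU.

    h   : (N, F) node features
    adj : (N, N) adjacency matrix with self-loops
    w   : (F, F') layer weights
    """
    deg = adj.sum(axis=1, keepdims=True)      # node degrees
    return np.maximum(0.0, (adj @ h / deg) @ w)
```

Stacking two to five such layers before a readout mirrors the depth ablation reported later; because the readout pools over however many nodes the graph has, no padding of the input sequences is needed.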

Transformers
As transformer models are receiving much attention for various problems, we also implement transformer models to explore their potential to classify online handwriting data. We implement two different variants of transformers [23]: a single-encoder transformer and a gated transformer [41]. The gated transformer uses a combination of two networks, in our case a 1D-CNN and a gated recurrent model (BLSTM). Both transformer variants are initialized with three 1D-CNN layers followed by Nx encoders. At the end of the transformer network, we used two layers of the BLSTM model. Compared to general transformer models, the decoder has been removed from the network because we address a classification problem, not machine translation. We utilize only the encoder and remove the input embedding and positional encoding parts. The Norm layer is also eliminated from the encoders. The architecture of the gated transformer used in our work is shown in Fig. 7.

Network parameters
All conventional deep learning models are trained using the Adam optimizer with learning_rate = 0.001, β_1 = 0.9, β_2 = 0.999, and ε = 1e−07. Graph neural networks have the inherent ability to handle variable-length input sequences; therefore, we did not use the data interpolation methods with graph neural networks, only the data augmentation methods to handle the class imbalance problem. We used the Deep Graph Library (DGL) [42] to build the graph neural networks. All graph neural networks shared the same set of hyper-parameters: Adam optimization with learning_rate = 0.01, β_1 = 0.9, β_2 = 0.999, and ε = 1e−08, along with cross-entropy loss. For the graph attention networks, additional attention head settings num_heads = {1, 3, 8} were used. A batch size of 5 is used for all graph networks.
Both transformer variants used in this work share the same set of hyper-parameters during training: head_size = 128, num_heads = 5, num_transformer_blocks = 4, mlp_dropout = 0.2, and feed_forward_dimension = 4. The loss function is sparse categorical cross-entropy. We use the Adam optimizer with learning_rate = 1e−4. We also used two different batch sizes, 16 and 32, for the stroke-level evaluation.

Datasets
One of the aims of this study is to prepare publicly available datasets for the systematic evaluation of online handwriting classification. We chose three publicly available datasets, SenseThePen [11], IAMonDO [16], and IAM-OnDB [17], and extend their functionality by enriching their annotations at three distinguished levels, i.e., stroke-level, word/sequence-level, and line/object-level annotations. A stroke is defined as the writing activity of a single pen-down event and is comprised of points (x_i, y_i). A word/sequence consists of multiple strokes. Subsequently, a line/object is defined by multiple words/sequences belonging to a single line in the case of text and mathematical expressions, or by multiple parts of an object in the case of the graph/drawing class. Table 1 presents an overview of the existing limitations of these datasets along with the contributions of this work.

IAM-OnDB+ dataset
The IAM-OnDB dataset was made publicly available to the research community by Liwicki et al. [17] for online handwriting recognition problems. Although this dataset contains only English-language text data collected from more than 200 writers, and its focus is online handwriting recognition rather than classification, we considered it in this study. The reasons are that online handwriting recognition is still an open problem and that the text class is the majority class in the other two online handwriting classification datasets. The IAM-OnDB+ dataset is therefore presented in this work with enriched annotations at the sequence level, and it extends the application of the other datasets to online handwriting recognition problems.

IAMonDO+ dataset
Indermühle et al. [16] introduced the IAMonDO dataset for mode detection in online handwritten documents. The IAMonDO dataset comprises online handwritten data from 200 writers in the form of 1000 documents. The IAMonDO dataset defines strokes from digital ink traces described by (x, y) coordinates, time, and pressure information. The first part of Fig. 8 shows the structure of the online handwritten documents, which include data in the form of text blocks, lists, tables, formulas, and diagrams. As this work focuses on online handwriting classification at different distinguished levels, we update the existing annotations by merging them to make this dataset usable for online handwriting classification, and present the result as the IAMonDO+ dataset with sequence- and line-level enriched annotations, as shown in the other part of Fig. 8.

SenseThePen+ dataset
SenseThePen [11] was the first publicly available dataset focusing on the classification of online handwriting data. The SenseThePen dataset was collected using a digital pen with a tablet from 30 users. The dataset contains information described by (x, y) coordinates, time, and writing pressure. As SenseThePen focuses on online handwriting classification, it already contains annotations for the text, mathematical expression, and graph/plot classes, and it already has annotated data at two distinguished levels, the stroke and sequence levels. This work therefore extends its functionality to the line level and presents the updated version as the SenseThePen+ dataset.

Fig. 8 The hierarchical structure of the online handwritten document in the IAMonDO dataset. The left image is the initial class structure, adapted from [16]. The right image defines the newly modified class structure in IAMonDO+. Yellow is for text, green is for graphs, and pink is for mathematical expression (colour figure online)

IAMonSense dataset
We present the IAMonSense dataset, containing the SenseThePen+, IAMonDO+, and IAM-OnDB+ datasets for online handwriting classification, without interfering with their individual identities. All datasets were carefully checked and prepared manually so that this new dataset can be used for online handwriting classification at the different distinguished levels (stroke, word, line) with annotations for three classes (text, math, graph). The IAMonSense dataset is prepared in a way that makes it suitable for use with various deep learning models (conventional neural networks, graph neural networks, attention networks, and transformers). The IAMonSense dataset contains the information for (x, y), timestamps, pre_x, pre_y, class, class_line, stroke_id, word_id, and line_id. Table 2 presents an overview of the IAMonSense dataset.
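To illustrate how these annotation fields enable multi-level use, a hedged sketch that groups per-point records into units at any of the three levels (field names are taken from the list above; the sample values are invented, not real IAMonSense data):

```python
from collections import defaultdict

def group_by_level(records, level_key):
    """Group per-point records into units at the requested level.

    records  : iterable of dicts carrying IAMonSense fields
               (x, y, timestamps, class, stroke_id, word_id, line_id, ...)
    level_key: "stroke_id", "word_id", or "line_id"
    Returns {unit_id: [records...]} preserving insertion order.
    """
    units = defaultdict(list)
    for r in records:
        units[r[level_key]].append(r)
    return dict(units)

# invented sample data: two strokes forming one word on one line
records = [
    {"x": 0, "y": 0, "timestamps": 0.00, "class": "text",
     "stroke_id": 0, "word_id": 0, "line_id": 0},
    {"x": 1, "y": 1, "timestamps": 0.02, "class": "text",
     "stroke_id": 0, "word_id": 0, "line_id": 0},
    {"x": 3, "y": 0, "timestamps": 0.10, "class": "text",
     "stroke_id": 1, "word_id": 0, "line_id": 0},
]
```

The same records thus yield stroke-, word-, or line-level samples simply by changing `level_key`, which is what makes the unified annotations usable across all the evaluated models.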

Evaluation protocol
To train the machine learning models, we need to split the datasets. A stratified random approach was chosen to obtain the train-validation-test sets, with a split of 60%, 20%, and 20%. The graph data representation does not need the data interpolation pre-processing of the input length: graphs handle this issue naturally, so data with varying input lengths can be used directly. Moreover, as the graph models do not require a validation set during training, we combined the validation data with the training data and set the train-test split for all graph neural networks to 80% and 20%, respectively. We use accuracy as the standard metric to report the overall performance of each network and the performance on individual classes.
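The stratified 60/20/20 split can be sketched in plain Python (a hedged sketch; the paper does not specify its exact splitting code, and the seed handling here is our assumption):

```python
import random

def stratified_split(samples, labels, ratios=(0.6, 0.2, 0.2), seed=42):
    """Randomly split samples into train/val/test, stratified by label.

    Each class is shuffled and partitioned separately so that every
    split preserves the class proportions of the full dataset.
    """
    rng = random.Random(seed)
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    splits = ([], [], [])
    for y, items in by_class.items():
        rng.shuffle(items)
        n = len(items)
        a = int(n * ratios[0])
        b = a + int(n * ratios[1])
        for part, chunk in zip(splits, (items[:a], items[a:b], items[b:])):
            part.extend(chunk)
    return splits  # (train, val, test)
```

For the graph models, the val portion would simply be merged into train to obtain the 80/20 split described above.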

Results and discussion
This section covers comprehensive details of the results obtained using conventional deep learning methods, graph neural networks, attention networks, and transformers on the SenseThePen+ and IAMonDO+ datasets at the different distinguished levels. The IAM-OnDB+ dataset is not considered in the evaluation process, as it contains only single-class data, i.e., the text class. We start the evaluation process with simple models using conventional deep learning methods: 1D-CNN, LSTM, BLSTM, and their combinations. Combining the 1D-CNN with the BLSTM produced the best results for stroke-level data, with an overall accuracy of 77.5%, and for line-level data, with an overall accuracy of 83.78%. Meanwhile, the combination of the 1D-CNN with the LSTM model produces the best sequence-level results, with an overall accuracy of 77.77%. We also note that line-level data improves the overall performance of the deep learning models for every class. The detailed results of the conventional deep learning models on the SenseThePen+ dataset are shown in Table 3. In the tables, bold values refer to the results of the best-performing models, H denotes the number of attention heads in the graph attention networks, and L denotes the number of layers in the graph convolutional and graph attention networks. Graph neural networks use the graph data representation as input to the model. In this work, we build the graph using the three distinguished levels: stroke, sequence, and line. Therefore, all graph networks are trained with the graph representation using line-level data. On the SenseThePen+ dataset, graph neural networks outperform the other deep learning variants used in this work and produce state-of-the-art results with an overall accuracy of 98.3%. The state-of-the-art graph network classifies the text class with an accuracy of 98.6%, the math class with 96.25%, and the graph/drawing class with 100%.
To our surprise, both variants of the graph attention networks, whether using static or dynamic attention heads, fell short of our expectations. On the SenseThePen+ dataset, the GAT network with 8 static attention heads yields an overall accuracy of 65.24%, and the GATv2 network with 8 dynamic attention heads results in an overall accuracy of 61.5%. On the contrary, both variants of the attention networks perform relatively well on the IAMonDO+ dataset: the GAT network with static attention heads achieved an overall accuracy of 81.64%, and the GATv2 model with dynamic attention heads produced a classification rate of 84.42%. Graph neural networks also perform well on the IAMonDO+ dataset, with an overall accuracy of 94.1%.
Using the graph neural network, the text class is predicted with an accuracy of 95%, the math class with 96%, and the graph/drawing class with 91.4%. A comprehensive overview of the detailed results for the different variants of graph networks is furnished in Table 4.
All graph models perform exceptionally well on both datasets, as shown in Fig. 10. One of the main reasons for this state-of-the-art performance is the graph data representation itself. In this work, we build the graph representation in a hierarchical structure so that the lower levels are carried towards the higher levels, enhancing the graph representation's ability to incorporate context information. Younas et al. [12] demonstrated in their work that context information aids a classifier's performance, and the results achieved by the graph neural networks also establish that context information helps the classifier improve its performance. Figure 12 shows a few occasions where the math class is confused with the text class on the SenseThePen+ dataset, whereas on the IAMonDO+ dataset the graph/drawing class is confused more with the text class than with the other classes. One potential reason for this confusion could be the merging of the table class into the text class while preparing the dataset. We also performed an ablation study to evaluate the influence of network depth and of varying the number of attention heads on both datasets, as shown in Fig. 9. The results reveal that increasing the depth improves not only the overall network performance but also the performance on individual classes. We also evaluated the different variants of transformer networks for online handwriting classification using the SenseThePen+ and IAMonDO+ datasets. Table 5 presents a detailed overview of the results achieved with the transformer networks. The gated transformer also uses recurrent neural networks, enhancing its ability to handle spatiotemporal correlations more effectively and efficiently. This is reflected in the results, as the gated variant of the transformers outperformed the normal variant at all distinguished data levels and across all data classes.
On the IAMonDO+ dataset, the gated transformers produced stroke-level results with an overall accuracy of 85.84%, while the overall accuracy for sequence-level and line-level data is 91.3% and 94.7%, respectively. In the case of the SenseThePen+ dataset, the overall accuracy is 79.4%, 78.18%, and 91.53% for stroke-, sequence-, and line-level data, respectively.
We select two existing methods, presented by Younas et al. [12] and [11], for comparison using the SenseThePen+ dataset. These existing methods use feature engineering in combination with machine learning classifiers. The results furnished in Table 6 show that all the methods used in the ablation study in this work achieved superior performance compared to the machine learning models by clear margins. The results also endorse that feature learning by deep learning methods achieves better performance than feature engineering and heuristic methods. The graph neural network achieves high accuracy for all classes (98.61% for text, 96.25% for math, and 100.00% for graph), with an overall accuracy of 98.29%, a new state-of-the-art result on the SenseThePen+ dataset.
We chose two current state-of-the-art methods to compare the proposed method on the IAMonDO+ dataset. Degtyarenko et al. [37] used a hierarchical RNN network to classify online handwritten strokes, whereas Ye et al. [5] used an edge-based GAT model for classification. Although our proposed methods achieve competitive results compared to the existing methods, the gated transformer network achieves state-of-the-art results for math class data with a perfect classification rate of 100%; detailed results are furnished in Table 7. Figure 11 shows that sometimes, the network confuses text and graph/drawing classes for table data. A potential reason for this confusion could be that we merged the annotations of table data into text data, as shown in Fig. 13. One possible solution is to drop the table class data from the text class and label them as a separate class.

Conclusion
The contributions of this work lay a foundation and serve as a baseline for the systematic evaluation of online handwriting classification problems. First of all, this work extends the functionality of existing datasets by enriching their annotations at three different distinguished levels for online handwriting recognition and classification problems. The ablation study results show the efficacy of several deep learning methods, including graph neural networks and transformers. The derived results further highlight the importance of context for the online handwriting classification problem, as the state-of-the-art results are produced using line-level data: the sequence level carries more contextual information than the stroke level, and the line level more than the sequence level. Considering the progress of graph neural networks, we recommend exploring graph neural networks and graph data representations to further improve the state of the art. We also plan to evaluate the IAMonSense dataset for additional online handwriting classification problems. To further extend this research, we plan to evaluate self-supervised and contrastive learning for online handwriting classification tasks.

Figs. 12 and 13 highlight the classification errors. The left images are the ground truth, and the right images are the predictions. Blue indicates text, green indicates math, and black indicates graph; in Fig. 13, red was added to highlight the mis-classification locations (colour figure online).