Invoices Classification Using Deep Features Based on SME Perspectives

Automation is widely seen as a way to improve productivity in the twenty-first century. Invoices are the foundation of accounting record keeping and a critical basis for inspections by auditing agencies and tax authorities. With the rise of artificial intelligence, automated record-keeping systems are becoming more widespread in major organizations, allowing them to complete tasks in real time with little effort and serving as a decision-making tool. Despite these benefits, many small and medium-sized enterprises (SMEs), particularly in Malaysia, are hesitant to implement such systems. Invoices are mostly processed manually, which is prone to human error and lowers company productivity. Artificial intelligence can further improve automated invoice handling, making it simpler and more efficient for businesses of all sizes, especially SMEs. This study presents a deep learning approach to record keeping that focuses on invoice recognition through invoice image classification. The deep learning models used in this research include the classic Convolutional Neural Network architecture and its variants VGG-16, VGG-19 and ResNet-50. The constraints and expectations of SMEs in Malaysia regarding implementation of the system are also presented, based on interview scores. The discussion section compares the results of the deep learning models with the perspective of the SMEs. ResNet-50 achieves high training and validation accuracy compared with the other models, with 95.90% accuracy on the training data and 74.24% on the validation data. Future work will examine other deep learning methods and intelligent features for more efficient invoice recognition in small and medium enterprises.


1. Introduction
Since invoices are often processed manually, especially in small and medium enterprises (SMEs), the digital evolution and automation of invoice processing presents an excellent opportunity for businesses to reduce costs, simplify administrative activities, and boost productivity and competitiveness [4]. The same authors note that the electronic exchange of invoices is expected to bring significant economic benefits, especially if the electronic invoice (e-invoice) contains standardized data for automated processing. Invoices connect the business processes of ordering, distribution, payment, and accounting. The invoice, including self-billing (invoicing by the company receiving the supply), is the central component of the Malaysian tax scheme (SME Corp Malaysia). Capturing, matching, and approving invoice data manually causes a slew of problems, and companies would like to avoid the logistical burden of processing inbound invoices. According to Teunissen (2017) [25], there is no benefit for invoices that are not constructed in a standard way. He points to an alternative: classifying invoices that share the same layout and assuming that their data fields are positioned similarly. Image features, structural features, and/or textual features may all be used for this classification. However, this method requires a learning set for each invoice class, which reintroduces manual sorting. The invoice is also a main document in record keeping and plays a major role in every organization. Good record keeping helps the directors of a business understand it, for example which items sell best, which items cost the business most, and which products are most profitable. Without proper record keeping and accurate financial information, it is hard to know exactly how the business is doing [17].
Today, many companies, especially Malaysian SMEs, still manage invoices manually, with each department keying in invoice data. This consumes a lot of time and is prone to human error. Limited trust in the reliability of machines is another obstacle to automated invoice handling. Investment in information and communication technology has a long-term impact on corporate growth as well as an immediate impact on labor productivity, so managers should strive to adopt the newest technology [7]. Because the information in every invoice is crucial, automated invoice recognition will help to reduce the limitations of manual invoice handling. Functionality such as scan-and-go identification of invoice types will save a lot of time, and the accuracy achieved in model training provides evidence of the system's reliability. Artificial intelligence techniques such as pattern recognition and expert business rules, used to automate invoice recognition as an important part of record keeping, add value to record-keeping technologies. This study focuses on the first step of invoice processing, which is invoice classification for the recognition process. It aims to classify invoice images using deep learning in order to improve the recognition process and, in particular, to eliminate the manual invoice handling that is still practiced in many SMEs. As invoice processing is crucial in any business, improved machine learning technology should be applied over time for better classification and recognition performance.
To achieve these aims, the paper is structured as follows. The next section gives the background of previous research and related works. Section 3 briefly discusses the methodology, which consists of method selection and the approach taken in the subject area. Section 4 presents the findings, analysis and discussion. Finally, Section 5 concludes the research.

Related Works
The sections below describe previous studies on invoice classification, its relationship to the adoption of artificial intelligence, and the benefits of its implementation for SMEs.

Invoices
The role of small businesses in economic development has proved significant, yet the needs of small business owners remain complicated; public programs and policies have therefore been created to do a better job of addressing the difficulties that small businesses encounter [8]. However, the majority of SME owners fail to recognize the importance of a well-oriented and structured accounting system that would help them keep systematic and accurate financial statements. The management of accounting systems and methods should explore cost-effective approaches that use accessible resources to support better decision-making [16]. Intelligent systems that are intended to offer socio-relational services might require a higher level of anthropomorphism than AI systems that are intended to be functional in improving efficiency [26]. In the twenty-first century, electronic commerce is regarded as a viable option for lowering costs and increasing productivity, among other advantages, all of which are felt through the digitization of billing [3]. Such an implementation helps businesses gain insight into their data: an explicit representation of logically related structure can be used to feed a database for decision making [9]. Doshi et al. (2020) [21] state that some invoicing procedures are automated in order to minimize workloads, ensure error-free processing, and reduce the need for human involvement. Artificial intelligence techniques such as machine learning and deep learning have lately been implemented in accounting systems for a variety of reasons, with the goal of improving the efficiency and precision of tedious and repetitive operations [1]. Researchers are also working on classifiers that can predict the contents of accounting papers as well as journal entries, focusing specifically on the prediction of account codes [2].

Convolutional Neural Network
Deep convolutional neural networks have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shone light on sequential data such as text and speech [28]. Deep learning is making significant progress in tackling problems that have long resisted the best efforts of the artificial intelligence industry. A deep-learning architecture is a multilayer stack of simple modules, all (or most) of which are subject to learning and many of which compute non-linear input-output mappings. Each module in the stack transforms its input to improve the representation's selectivity and invariance [10]. Convolutional neural networks (CNNs) take inputs whose neighboring elements are closely related, which makes them well suited to image classification [15]. A CNN, a type of deep neural network (DNN) architecture, includes many hidden layers and input parameters and is trained on a large set of images [14, 15]. A CNN is made up of several hierarchical layers, including feature-map layers and classification layers. As seen in Figure 1, a CNN typically begins with a convolutional layer that receives data from the input [12]. A CNN represents an image as an array of numbers, each corresponding to a pixel, and applies a sequence of operations to compute the likelihood that the input resembles each target class. The network is made up of multiple layers, including convolution, rectified linear unit (ReLU), pooling, fully connected, and loss layers.
He et al. (2015) [13] observed that deeper neural networks are more difficult to train. They therefore introduced a residual learning framework to ease the training of networks that are far deeper than those previously used. Instead of learning unreferenced functions, they explicitly reformulated the layers as learning residual functions with reference to the layer inputs. ResNet-50, from this residual network framework, is a 50-layer residual network. Rathi et al. (2020) [22] proposed a deep learning approach for automated sign language recognition, devising a novel two-level ResNet-50 to classify finger-spelled words. They used Barczak's (2011) standard American Sign Language hand gesture dataset. After various augmentation processes, the Level 1 model classifies the input image into one of four sets. The image is then fed into the associated second-level model, which predicts its actual class. The model achieved an accuracy of 99.03% on 12,048 test images. In another study, Mukti and Biswas (2019) [11] fine-tuned a ResNet-50 model to identify plant disease; it achieved an accuracy of 99.80%, higher than the other CNN models compared.
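As a brief illustration of the residual reformulation attributed to He et al. above, the idea can be written compactly. The symbols are introduced here for illustration and do not appear in the text: $x$ is the input to a block of stacked layers, $\mathcal{H}(x)$ is the mapping the block is meant to realize, and $\mathcal{F}(x)$ is the residual function the layers actually learn.

```latex
\mathcal{F}(x) := \mathcal{H}(x) - x, \qquad y = \mathcal{F}(x) + x
```

The stacked layers fit only the residual $\mathcal{F}$, and an identity shortcut adds the input $x$ back to form the block output $y$.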

Methodology
The purpose of this study includes capturing the expectations and perceptions of SMEs regarding an automated invoice handling system. Hence, an interview was conducted, and an experiment using deep learning techniques for image classification was carried out based on the key findings that scored highest.

Interview Selection (SME)
The interview was the primary data collection tool for the study [20]. A semi-structured interview was selected, with questions carefully crafted to provide sufficient coverage of the research's intent. In this study, the interview is designed to gather a qualitative view of the Malaysian perspective on applying automatic invoice recognition in business for better management and growth monitoring. It includes a virtual (video call) visit to the organization, observations, interviews and the administration of questionnaires. The data are then quantified and analyzed using descriptive statistics. Samples of Malaysian invoices were gathered during the interview to be used in the invoice recognition process with TensorFlow and Keras. Table 1 shows the criteria and selection used in conducting the interview. The interview selection comprises SMEs that process more than 100 invoices manually. The perceptions of the SMEs regarding automated functions in invoice handling were also recorded.

Criteria            Selection
Company selection   SME company registered with Suruhanjaya Syarikat Malaysia (SSM)
Location            Pelabuhan Klang, Malaysia
Daily activities    Processes daily invoices (>100 various invoices) manually

Table 1. Interview selection criteria

Data Setup and Description of data
A total of 1008 images in 3 classes were used in the experiments, as shown in Table 2. The classes are labelled 'Invoices', 'Tax Invoices' and 'Receipts'. For the 'Invoices' class, 390 images were selected and divided into 312 images for training and 78 for validation. For the 'Tax Invoices' class, 313 images were selected and divided into 253 for training and 60 for validation. For the 'Receipts' class, 305 images were selected and divided into 245 for training and 60 for validation. In total, 810 images were used for training and 198 for validation, giving a split of approximately 0.8 for training and 0.2 for validation. Sample images used in the experiment are shown in Figure 2.
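The split arithmetic above can be checked with a short sketch. The per-class counts are taken from the text; the dictionary layout and the function name `summarize_split` are illustrative choices, not part of the study's code.

```python
# Per-class image counts as reported in the text: (training, validation).
SPLIT = {
    "Invoices": (312, 78),
    "Tax Invoices": (253, 60),
    "Receipts": (245, 60),
}


def summarize_split(split):
    """Return total training count, total validation count, and training fraction."""
    n_train = sum(train for train, _ in split.values())
    n_val = sum(val for _, val in split.values())
    return n_train, n_val, n_train / (n_train + n_val)


n_train, n_val, train_fraction = summarize_split(SPLIT)
print(f"train={n_train}, validation={n_val}, train fraction={train_fraction:.3f}")
# prints: train=810, validation=198, train fraction=0.804
```

The overall training fraction is about 0.804, so the stated 0.8/0.2 split is a rounded figure; only the 'Invoices' class is split at exactly 0.8.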

Image Pre-Processing
Image pre-processing involves two major phases: noise reduction and image enhancement. Noise reduction can be achieved using filters, while image enhancement changes image attributes to make the image more suitable for further analysis [23]. In the pre-processing step, filters are used to eliminate noise, and in the feature extraction step, color moments are extracted as mean features of the images before being passed to a simple feed-forward ANN for classification [18]. In this study, all images in the dataset were normalized with a rescaling layer, which transforms the RGB values from the range [0, 255] to the standardized range [0, 1] that is better suited to neural networks.
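The effect of the rescaling step can be illustrated with a minimal, library-free sketch. This is not the study's code: the function `rescale_pixels` and the tiny example image are hypothetical. In Keras, the same operation is commonly expressed as `tf.keras.layers.Rescaling(1./255)`, although the text does not state the exact call that was used.

```python
def rescale_pixels(image, divisor=255.0):
    """Divide every channel value by `divisor`, mapping [0, 255] to [0, 1].

    `image` is a nested list shaped [height][width][channels].
    """
    return [[[channel / divisor for channel in pixel] for pixel in row] for row in image]


# A hypothetical 1x2 RGB image: one black pixel and one white pixel.
tiny_image = [[[0, 0, 0], [255, 255, 255]]]
scaled = rescale_pixels(tiny_image)
print(scaled)  # prints: [[[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]]
```

Keeping the rescaling inside the model as a layer, rather than as a separate script, means the same normalization is applied automatically at training and at prediction time.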

Model Parameter Setting
The goal of this stage is to meet the research objective by applying deep learning models to the image dataset in order to measure the reliability of the system. To evaluate the efficacy of deep learning networks on invoice images, four convolutional neural network architectures were chosen: the classic CNN, VGG-16, VGG-19, and ResNet-50. The parameters were fine-tuned to suit the dataset, and training and validation accuracy were evaluated. A summary of the parameter settings for each model is shown in Table 3.

Table 3. Parameter setting for models

Findings Analysis and Discussion
The breakdown of the interview findings is shown in Table 4. According to the interview, the constraints on implementing automated invoice handling are limited staff knowledge of both accountancy and information technology, the high investment required for the IT system and architecture, and the lack of time for the company to conduct extra training for its employees. The company believes that the IT specialists needed to maintain and deliver the system would require a large investment.

Table 4. Findings of SME interview (columns: Constraints, Expectation)
The company's expectations of the system, should it implement the automated system, were also recorded. The main expectation from the industry is reliability, as the invoice holds vital information in any accounting process. The company also stated that the system must be cost effective and able to reduce record errors caused by human error.
The interview performed in this study was used to identify the constraints and expectations regarding an automation system. According to the interview results, reliability received the highest score and is what SMEs most hope for if they were to implement the system. Hence, the experiment measures the accuracy of deep learning, a subset of artificial intelligence, in invoice recognition by evaluating the first step of the system, which is the classification of invoices.

Experiment results of Invoices Images Dataset using Deep Learning Models
Based on the SME interview, reliability was highlighted as the most important measure in implementing automated invoice handling. Hence, an experiment was conducted on the first stage of invoice handling, the classification of invoices, using deep learning as a subset of artificial intelligence, to provide evidence for the reliability of the system. The experiment was conducted in Python with the TensorFlow and Keras libraries for machine learning.
The training and validation accuracy of all the models are combined in Table 5 so that the accuracy of each model in classifying the image dataset can be compared clearly. The maximum number of epochs is set at 20 for both training and validation.

Table 5. Training and validation accuracy by model

Table 5 indicates that all four networks perform well. To assess the performance of the models at different epochs, the accuracy values at epoch 10 were also recorded. In the training phase, most models show an increase in accuracy, but in the validation phase VGG-19 exhibits a small reduction when the maximum epoch is reached. ResNet-50 has the best training accuracy (95.90% at the maximum epoch) but a somewhat lower validation accuracy than VGG-16, which produces a higher validation accuracy of 76.77% at the maximum epoch. The training accuracy of a network reveals how successful it is in correctly classifying the data on which it is being trained. A training accuracy of 0.9590 therefore indicates that the network correctly classified 95.90 percent of the images in the training set. Figure 3 compares the models by displaying the training and validation accuracy at the maximum epoch. ResNet-50 has the highest training accuracy, whereas the classic CNN has the lowest. Considering training and validation accuracy together, ResNet-50 is the best classifier. This also suggests that automated invoice classification can be reliable given further training data and additional epochs. Based on these accuracies, the experiment should also increase confidence in system reliability among SMEs and show that applying the technology to invoice handling will ease their daily tasks.
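To make the accuracy figures more concrete, the sketch below converts the reported validation percentages into image counts. It assumes, as a labeled assumption rather than a stated fact, that validation accuracy is computed over all 198 validation images; the function name `implied_correct` is hypothetical.

```python
N_VALIDATION = 198  # total validation images reported in the data setup


def implied_correct(accuracy_percent, n_images):
    """Number of correctly classified images implied by an accuracy percentage."""
    return round(accuracy_percent / 100 * n_images)


# Validation accuracies reported in the paper for ResNet-50 and VGG-16.
resnet50_correct = implied_correct(74.24, N_VALIDATION)
vgg16_correct = implied_correct(76.77, N_VALIDATION)
print(resnet50_correct, vgg16_correct)  # prints: 147 152
```

Under that assumption, the gap between VGG-16 and ResNet-50 on validation data corresponds to five images, which is worth keeping in mind when comparing models on a validation set of this size.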
Deep neural networks are expressive in structure, and their capacity to learn features is significant, allowing them to achieve a validation accuracy of more than 70% as training epochs increase. In the experiment, the classic CNN model shows the lowest training and validation accuracy, although its accuracy increases as more epochs are trained. ResNet-50, on the other hand, reaches high accuracy in fewer epochs. Training and validation accuracies of 70% and above support the reliability of the system on the input data. Overfitting was detected on VGG-16, VGG-19 and ResNet-50 as the number of epochs increased, based on the training and validation accuracy results. To address the problem, image augmentation was applied and dropout was added to every convolutional layer. The accuracy increased during training, but the models still showed signs of overfitting. This may be due to the small dataset, as VGG and ResNet were originally designed to be trained on larger datasets with 1000 classes. Because the experiment was not run on a large amount of data, a few limitations were encountered: 1) the structure of Invoices and Tax Invoices follows almost the same ledger format, so the system is confused when predicting these images; 2) as deep learning is often known to be data hungry, more quality data are needed for a better mapping between the inputs and outputs as well as better prediction accuracy.
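One simple way to quantify the overfitting described above is the difference between training and validation accuracy. The sketch below is an illustration only: the helper names and the 10-point threshold are hypothetical and are not taken from the study, which does not state how overfitting was detected numerically.

```python
def generalization_gap(train_acc_percent, val_acc_percent):
    """Training accuracy minus validation accuracy, in percentage points."""
    return round(train_acc_percent - val_acc_percent, 2)


def looks_overfit(train_acc_percent, val_acc_percent, threshold=10.0):
    """Flag a model when the gap exceeds `threshold` (a hypothetical cut-off)."""
    return generalization_gap(train_acc_percent, val_acc_percent) > threshold


# ResNet-50 figures reported in the paper: 95.90% training, 74.24% validation.
resnet50_gap = generalization_gap(95.90, 74.24)
print(resnet50_gap, looks_overfit(95.90, 74.24))  # prints: 21.66 True
```

A gap of this size is consistent with the observation that augmentation and dropout alone did not remove the overfitting on a dataset of about a thousand images.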

Conclusion
In this study, we investigated several deep learning approaches to the invoice classification problem using deep feature architectures, namely the Convolutional Neural Network (CNN), VGG-16, VGG-19, and ResNet-50. The models and algorithms were analyzed on the invoice classification task. For the performance evaluation, each model was run on the dataset with parameters tuned to provide its best training and validation accuracy. The SMEs' expectations of an automated invoice handling system, particularly regarding system reliability, are supported by the deep learning results: ResNet-50 produces strong outcomes in a smaller number of epochs, with a training accuracy of 95.90% and a validation accuracy of 74.24%, outperforming the other models (classic CNN, VGG-16 and VGG-19) overall. Future work could improve overall training and test accuracy by using more classes and instances in the dataset. In addition, experiments could be run on the dataset with other deep learning approaches, such as a recurrent neural network (RNN) with a long short-term memory (LSTM) model or a hybrid approach, to obtain more results for comparison with the CNN methods. An intelligent feature that extracts detailed information from invoices (journal extraction) is another area for experimentation.