All methods were carried out in accordance with the relevant guidelines and regulations, and the experimental protocols were approved by the Ethics Committees of Tianjin Medical University. The preliminary work was divided into three parts: planning, data collection, and analysis. During the planning phase, an interview outline was developed. The interviewers were diabetes clinicians, divided into twenty-six groups according to their hospitals, and they received professional training during this phase. During the data collection phase, interviewers conducted interviews of at least two hours with the participants. All participants signed an informed consent form before the interview, so every interview was conducted with the subject's consent. During the analysis phase, TCA, a systematic method of condensing data to extract the most salient information from many qualitative interviews, was used to analyze the qualitative data.
Sample Population
To capture as many vulnerable patients as possible, inclusion criteria were developed by an expert group; the case filters and their definitions are shown in Table 1. All participants were ≥18 years old. In total, 259 documents were received, of which 229 met the requirements for the next step of data analysis. For these 229 interviews, field staff digitally recorded the sessions and collected demographic and clinical information about the participants.
Table 1
Case filters and their definitions

Case filter | Definition
High BMI | BMI > 28 kg/m²
High blood glucose levels | FBG > 6.1 mmol/L, 2-h PBG > 7.8 mmol/L
Duration of diabetes/comorbidities | Has comorbidities
Health insurance | No worker basic insurance, urban basic insurance, or commercial health insurance
Employment status | Unemployed
Below poverty level | Urban residents: per capita income below 705 Yuan; rural residents: per capita income below 540 Yuan
Body size and physical characteristics | Waist circumference: male ≥ 90 cm, female ≥ 85 cm
Distance between home and work | ≥16 km
Education background | Primary school or illiterate
Physical activity level | Low (e.g., civil service work with no exercise)
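As an illustration only, the sketch below applies the Table 1 case filters as screening rules, assuming that matching any single filter flags a participant as vulnerable (the paper does not state how many filters must be met). All record field names are hypothetical, not variables from the study.

```python
# A minimal sketch of applying the Table 1 case filters as screening rules,
# assuming that matching any single filter flags a participant as vulnerable.
# All record field names are hypothetical, not taken from the study's data.
def is_vulnerable(r: dict) -> bool:
    """Return True if a participant record matches at least one case filter."""
    filters = [
        r["bmi"] > 28,                                # High BMI (kg/m²)
        r["fbg"] > 6.1 or r["pbg_2h"] > 7.8,          # High blood glucose (mmol/L)
        r["has_comorbidities"],                       # Duration of diabetes/comorbidities
        not r["has_insurance"],                       # No basic or commercial insurance
        r["unemployed"],                              # Employment status
        r["income"] < (705 if r["urban"] else 540),   # Below poverty level (Yuan)
        r["waist_cm"] >= (90 if r["male"] else 85),   # Body size
        r["home_work_km"] >= 16,                      # Distance between home and work
        r["education"] in ("primary", "illiterate"),  # Education background
        r["activity_level"] == "low",                 # Physical activity level
    ]
    return any(filters)
```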
Thematic Analysis
After discussion with the Cities Changing Diabetes Tianjin team and experts from University College London (UCL), we developed an initial code manual. A demographic and clinical overview of the participants was developed alongside the code manual to generate a "vulnerability matrix". Each coder coded two or three interviews according to the coding manual and then coded a transcript from another member of the team to verify that the manual coding was valid. During coding, we held several discussions to refine the code manual. Finally, we identified 12 themes and 25 factors associated with patients' vulnerability, all of which are shown in Table 2.
Table 2
Themes and factors of vulnerability of diabetic patients in Tianjin
Themes | Factors
Financial constraints | Low income
 | Unemployment
 | No medical insurance/low reimbursement ratio
 | Significant family expenditure
Severity of disease | Appearance of symptoms, complications, comorbidities
 | Poor disease control
Health literacy | Low literacy
Health beliefs | Perceives diabetes indifferently
 | Acquires health knowledge passively
 | Distrust of primary health services
Medical environment | Needs not met by medical services
Life restriction | Limited daily life behaviors
 | Occupational restriction
Lifestyle change | Adherence to a traditional or unhealthy diet
 | Lack of exercise
 | Low-quality sleep
Time poverty | Healthcare-seeking behaviors limited by work or by taking care of family
Mental condition | Negative emotions towards diabetes treatment or life
Levels of support | Lack of community support
 | Lack of support from friends and family
 | Lack of social support
Social integration | Low degree of social integration
 | Faith in suffering alone
Experience of transitions | Diet transformation
 | Dwelling environment/place of residence transformation
Sampling
Our study sampled from the Cities Changing Diabetes data set, focusing on the themes "Levels of support" and "Health beliefs". We compiled 239 and 104 sentences respectively, for a total of 343 sentences. This corpus was too small to support effective pre-training of the models. Following other researchers who faced similar problems [13], we expanded the corpus through operations such as synonym substitution, sentence restructuring, and paraphrasing. In the end, we obtained 899 statements about "Levels of support" and 400 statements about "Health beliefs".
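The sketch below illustrates the synonym-substitution style of augmentation described above. It is a minimal example; the tiny synonym table and the sample sentence are illustrative placeholders, not the study's actual lexicon or data.

```python
# A minimal sketch of synonym-substitution data augmentation. The synonym
# table and example sentence are placeholders, not the study's actual lexicon.
import random

SYNONYMS = {
    "support": ["help", "assistance"],
    "family": ["relatives", "household"],
}

def augment(sentence: str, n_variants: int = 2, seed: int = 0) -> list[str]:
    """Generate variants of a sentence by randomly swapping in synonyms."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n_variants):
        words = sentence.split()
        for i, w in enumerate(words):
            options = SYNONYMS.get(w.lower())
            if options and rng.random() < 0.5:
                words[i] = rng.choice(options)
        variants.append(" ".join(words))
    return variants

print(augment("My family gives me no support"))
```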
BERT
The BERT model uses a bidirectional Transformer mechanism, which takes into account the semantic information implied by the context and can adequately extract features from long and complicated sentences [8]. It is jointly pre-trained with two unsupervised tasks: the Masked Language Model (MLM) and Next Sentence Prediction (NSP). MLM randomly masks 15% of the words in a sentence and then uses the context to predict the masked content. NSP determines the contextual relationship between sentences by predicting whether two sentences are coherent neighbors. BERT is an advanced pre-trained word embedding model based on the Transformer encoding architecture [14], and it outputs one or more vectors.
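For illustration, the sketch below exercises the MLM objective with a pre-trained Chinese BERT checkpoint via the Hugging Face transformers library; the toolkit, checkpoint name, and example sentence are our assumptions, as the paper does not name its implementation.

```python
# A minimal sketch of BERT's masked-language-model behavior using the Hugging
# Face transformers library (an assumption; the paper does not name a toolkit).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-chinese")

# BERT was pre-trained to recover the token hidden behind [MASK] from context.
for prediction in fill_mask("我每天都要注意控制血[MASK]。")[:3]:
    print(prediction["token_str"], round(prediction["score"], 3))
```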
The hidden layer, Transformer layer, and algorithm input layer of the BERT model are connected through an activation function; the most common activation functions are Sigmoid and ReLU (Rectified Linear Unit). To improve convergence speed, alleviate the vanishing-gradient problem caused by Sigmoid, and improve computational efficiency, we chose the ReLU function in this study. ReLU is a piecewise linear function that outputs the input unchanged when it is positive and zero otherwise. The sparse model implemented by ReLU is more effective at mining target-related features and fitting the training data. Text vectorization is implemented in the input layer of the algorithm, which consists of three components: the word vector, the text vector, and the position vector. The core of the Transformer is the attention mechanism, which lets the vector corresponding to each word in a sentence incorporate information from all the words in that sentence [8]; consequently, any single token's output vector carries information about the whole sentence. SoftMax classification in the Transformer layer is a multi-class algorithm that maps inputs to probabilities suitable for classification.
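The two functions named above are simple enough to state directly. The following NumPy sketch is illustrative only, not code from the study.

```python
# A minimal sketch of the ReLU and SoftMax functions described above,
# written with NumPy for illustration rather than taken from the study's code.
import numpy as np

def relu(x: np.ndarray) -> np.ndarray:
    """Piecewise linear: identity for positive inputs, zero otherwise."""
    return np.maximum(0.0, x)

def softmax(logits: np.ndarray) -> np.ndarray:
    """Map raw scores to probabilities that sum to 1 (numerically stable)."""
    shifted = logits - logits.max()
    exps = np.exp(shifted)
    return exps / exps.sum()

print(relu(np.array([-2.0, 0.0, 3.0])))   # [0. 0. 3.]
print(softmax(np.array([1.0, 2.0])))      # two-class probabilities
```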
ERNIE
Inspired by the BERT masking strategy, ERNIE was designed to enhance language representation learning through knowledge masking strategies, including basic-level masking, entity-level masking, and phrase-level masking [9]. The model consists of two main layers. The first is the lower text encoder, which is responsible for capturing basic vocabulary and information from the input tokens. The second is the upper knowledge encoder, which integrates knowledge information into the text information so that the heterogeneous information of tokens and entities is represented in a unified feature space. ERNIE treats a phrase or an entity, which usually consists of several words, as a single unit: during word representation training, all words in the same unit are masked together, instead of just one word or character. ERNIE does not add knowledge embeddings directly; instead, it implicitly learns knowledge and longer semantic dependency information, which is used to guide word embedding learning. The first learning stage is basic-level masking. As with BERT, ERNIE randomly masks 15% of the basic language units and trains a Transformer to predict the masked units from the other basic units in the sentence; basic word representations are obtained at this stage. The second learning stage is phrase-level masking, which is unique to ERNIE: using basic linguistic units as training input, ERNIE randomly selects several phrases in a sentence and masks and predicts all basic units within each selected phrase. The third learning stage is entity-level masking, which is also unavailable in BERT; named entities can be abstract or concrete. After the three stages of basic-level, phrase-level, and entity-level masking, a more semantically informative word representation is obtained.
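The contrast between the two masking granularities can be sketched in a few lines. In the illustration below, the tokenized sentence and the span boundaries are placeholders chosen for clarity, not outputs of ERNIE's actual segmenter.

```python
# A minimal sketch contrasting BERT-style basic-level masking with ERNIE-style
# knowledge masking, where a whole phrase or entity is masked as one unit.
# The tokens and span boundaries are illustrative placeholders.
tokens = ["patients", "in", "Tianjin", "lack", "community", "support"]

def mask_span(seq: list[str], start: int, end: int) -> list[str]:
    """Replace every token in [start, end) with [MASK], as ERNIE does for
    all words belonging to the same phrase or named entity."""
    return ["[MASK]" if start <= i < end else t for i, t in enumerate(seq)]

print(mask_span(tokens, 2, 3))   # basic/entity level: one token ("Tianjin")
print(mask_span(tokens, 4, 6))   # phrase level: "community support" as a unit
```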
Pre-training Processing
Before training started, we deleted meaningless words from the sentences, such as unnecessary tone words and filler responses (e.g., um, ah, uh, huh), in order to improve the learning efficiency of the model. The final 1,299 statements were compiled and stored on Linux in a UTF-8 (with BOM) text file. In the second step, the data set was divided into training, validation, and testing subsets in the ratios 8:1:1, 7:2:1, and 6:3:1. In the third step, the relevant hyperparameters were set: batch size is the number of samples trained in each batch, and an epoch is one complete forward and backward pass of the whole data set through the neural network. In this study, the hyperparameters of both models were set to the same values; the fine-tuned hyperparameters of the BERT and ERNIE models are shown in Table 3.
Table 3
Fine-tuned hyperparameters of BERT and ERNIE models
Name of the hyperparameter | Value
Hidden size | 768
Learning rate | 5e−5
Pad size | 16
Require improvement | 1000
Epoch | 100
Batch size | 32
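The sketch below reproduces the preprocessing steps just described: reading the BOM-prefixed text file, making an 8:1:1 split, and collecting the Table 3 hyperparameters. It assumes scikit-learn for splitting; the file name is a placeholder, not the study's actual artifact.

```python
# A minimal sketch of the data split and hyperparameter setup described above,
# assuming scikit-learn for splitting; the file name is illustrative.
from sklearn.model_selection import train_test_split

with open("statements.txt", encoding="utf-8-sig") as f:  # UTF-8 with BOM
    lines = [line.strip() for line in f if line.strip()]

# 8:1:1 split: hold out 20%, then halve it into validation and test sets.
train, rest = train_test_split(lines, test_size=0.2, random_state=42)
val, test = train_test_split(rest, test_size=0.5, random_state=42)

# Hyperparameters from Table 3, shared by the BERT and ERNIE runs.
config = {
    "hidden_size": 768,
    "learning_rate": 5e-5,
    "pad_size": 16,               # max sequence length after padding/truncation
    "require_improvement": 1000,  # stop early if no gain for 1000 batches
    "num_epochs": 100,
    "batch_size": 32,
}
```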
Performance Metrics
In this study, we used the confusion matrix (Table 4) and its derived evaluation metrics, including precision, recall, F1 score, and test accuracy, for a comprehensive comparison of classifier performance. One axis of the confusion matrix represents the actual class of the cases, and the other represents the class assigned by the classifier.
Table 4
Confusion matrix of testing dataset
 | Predicted: Health beliefs | Predicted: Levels of support
Actual: Health beliefs | True positive (TP) | False negative (FN)
Actual: Levels of support | False positive (FP) | True negative (TN)
- In this matrix, "Health beliefs" is treated as the positive class and "Levels of support" as the negative class. To avoid duplication, we show only the matrix with Health beliefs as the positive class.
- When calculating the evaluation metrics for a given factor, that factor is treated as the positive class.
- The TP and TN cells indicate cases the model classified correctly.
Test accuracy represents the proportion of all cases that were correctly predicted by the pre-trained model.
$$\text{Test Accuracy}=\frac{TP+TN}{TP+TN+FP+FN}$$
1
Precision indicates the proportion of cases classified into a class that truly belong to that class.
$$\text{Precision}=\frac{TP}{TP+FP}$$
2
Recall measures the classifier's completeness: the proportion of all actual positive cases that were correctly predicted as positive.
$$\text{Recall}=\frac{TP}{TP+FN}$$
3
$$F_\beta=\frac{(\beta^2+1)\times \text{Precision}\times \text{Recall}}{\beta^2\times \text{Precision}+\text{Recall}}$$
4
Here β is a parameter that adjusts the relative weight of Precision and Recall. In practice, Precision and Recall are usually treated as equally important, so β is set to 1, giving the F1 score:
$$F_1=\frac{2\times \text{Precision}\times \text{Recall}}{\text{Precision}+\text{Recall}}$$
5
Because differences between the F1 scores of the individual classes can be difficult to interpret, we also calculated the macro-F1 score, the average of the per-class F1 scores, for comparison:
$$\text{Macro-}F_1=\frac{F_1(\text{Health beliefs})+F_1(\text{Levels of support})}{2}$$
6
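As a worked illustration of Eqs. 1-6, the sketch below computes all metrics from confusion-matrix counts, first with "Health beliefs" and then with "Levels of support" as the positive class. The counts are placeholders, not results from the study.

```python
# A minimal sketch of Eqs. 1-6 computed from confusion-matrix counts.
# The counts below are placeholders, not results from the study.
def metrics(tp: int, fn: int, fp: int, tn: int) -> dict[str, float]:
    accuracy = (tp + tn) / (tp + tn + fp + fn)   # Eq. 1
    precision = tp / (tp + fp)                   # Eq. 2
    recall = tp / (tp + fn)                      # Eq. 3
    f1 = 2 * precision * recall / (precision + recall)  # Eq. 5 (Eq. 4, beta=1)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# "Health beliefs" as the positive class, then "Levels of support":
# swapping the positive class swaps TP with TN and FP with FN.
beliefs = metrics(tp=35, fn=5, fp=8, tn=82)
support = metrics(tp=82, fn=8, fp=5, tn=35)

macro_f1 = (beliefs["f1"] + support["f1"]) / 2   # Eq. 6
print(beliefs, support, macro_f1)
```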