Development of a Physicians’ Choice Model Using Mixed Logit With Random Prices for Drugs: Case Study on Diabetes Type II

This paper presents a first experiment with a random generator of drug prices and a first simulation of physicians' treatment choices (pharmacotherapies) for type II diabetes care. It also aims to compare the effects of price variables under public versus private health plans (Medicare versus commercial health plans) on physicians' choices. The baseline model is a mixed logit model with random price variables. A series of experiments with random parameter generation is designed with various sequences and numbers of draws. The model is tested on a real analytical dataset extracted from the CDC physician survey (National Ambulatory Medical Care Survey, NAMCS) for patients with type II diabetes without complications, previously used for predictive econometrics with ENDEPUSresearch, Inc. The model uses a first drug choice set with three alternatives: oral agents only, combined therapies, and no drug. These choice models introduce qualitative dependent variables and complement the series of cumulative logistic models per disease. The MATLAB code for the new specification test of the Independence of Irrelevant Alternatives (IIA) at the individual level is modified to fit this type of medical application; first runs compare the main parameters of a full choice set with those of reduced choice sets. More experiments are planned for extended choice sets and broader applications, with the aim of developing user-friendly tools for medical systems. The collaboration with Professor Jerry Hausman on the US market will help in using the results and in finding new ways to assess the reliability of the selection of alternatives; it may also provide additional guidance for the algorithms used by professionals and for health policies.

by Burda, Harding and Hausman [7]. This will overcome the limitations of nested logit models and may address, in this case, all types of physicians' choices, with and without the IIA property. The new specification test [4] was first implemented in MATLAB code by Dr J Lustig for business research and requires the special modifications described in this paper for this medical application.

I. Review Of First Steps On The Development Of A Physician's Choice Model:
Prof C Huttin started this collaboration in 2017, following an invitation to the MIT Department of Economics, to design physicians' choice models and to transform the datasets she had previously generated through her company and through research services provided by Harvard Medical School (Countway Library of Medicine).
The data management step is described in the first working paper, from December 2018 [8]. Its main aim was to transform the selected diabetes dataset into long form for qualitative choice models and to impute new random price variables, in order to shift from cumulative logistic or partial odds ratio models to mixed logit models. The asmixlogit command in Stata IC 15 was used to build this economic model; the results presented at ISPOR [9] helped to guide the simulation approach, described in the next section of this paper, used to implement the new specification test for a mixed logit model on the selected diabetes dataset. The first runs estimated parameters until the mixed logit model converged, using some predictors from the previous studies with cumulative logistic models [10]. The current simulation implements the computations needed for this test, using both Stata and MATLAB, on the same dataset as the previous models.
This diabetes care case is particularly timely in North America, given the current insulin crisis [11] and the policy proposals to import drugs from Canada [12,13]. The development of this economic model addresses the heterogeneity of physicians' choices in a medical market, at the interaction of supply and demand. Clinical choices are usually analyzed mostly from the supply side, within the medical system, with a specific concept of induced supply; this concept of inducement also applies on the demand side, and more recent research from the principal-agent literature on asymmetric information uses the concept of induced demand. Business research methods are very useful for analyzing these types of influence, especially with experimental design plans (e.g. conjoint models); the integration of patient economics into a conjoint model, called "reversed conjoint" [14,15], clearly demonstrated its usefulness for understanding how physicians' cost awareness varies with cost cognitive cues.
The economic studies by Huttin on the US market over the period 2007-2017 also confirmed that physicians are cost aware at the point of care (the visit); for instance, the 2007 study on hypertension and diabetes provides statistical estimates of the impact of patient economics on physicians' treatment choices (especially prescribing patterns), using analytical datasets extracted from the CDC National Ambulatory Medical Care Survey (NAMCS) [16]. These statistical models (e.g. two-step models, often used in medical schools) mainly analyze the demand side of the market, especially the impact of patients' characteristics, such as socioeconomic and insurance profiles and clinical and epidemiologic predictors, on physicians' choices. The statistical runs covered different periods, which allows quantitative estimates of such predictors to be compared. Originally, this research stream aimed to disentangle how the reimbursement design of various public/private cost-sharing arrangements affects physicians at the point of care, while rules of medical ethics usually separate economic and clinical information processing. The series of predictors identified with the different models informs the statistical analysis described in this paper.
II. Description Of The Baseline Model On Mixed Logit And Modifications For The Simulation On The Diabetic Dataset:
The current model development uses a different approach: a mixed logit model with random prices for drugs and procedures. The dependent variables are physicians' qualitative choices among various combinations of choice sets. Independent variables combine product and patient characteristics, in order to analyze the market adjustment of supply and demand. In the first-choice experiments for the baseline mixed logit, drug prices are product attributes generated with random-number sequences, and patients' characteristics are case-specific variables. This type of model mainly comes from business research. The baseline model was implemented in computer code written by Dr J Lustig; the runs simulate consumers' choice sets over a limited number of product characteristics. Originally, the code was written for a case scenario of 1000 consumers, three product characteristics and combinations of ten choice sets. The implementation of this test requires a comparison between the full set of three alternatives and a reduced choice set from which one of the three alternatives is removed. This paper describes the modifications to the baseline model for the selected medical market case: type II diabetes without complications. This first experiment includes only one choice set, a drug choice set, while the original code allows up to ten choice sets.
This case study on physicians' choices first needs to be described, to ensure that the program modifications implementing the new test cover the issues added to the baseline model [4]. (i) The mixed logit model run with Stata is not a model of consumer choices, since the decision maker is the physician, who decides for or with the patient among different treatments according to the patient's clinical and socioeconomic characteristics (so both product and patient attributes enter the model). (ii) Physicians' choices are among treatment alternatives for type II diabetes; this also requires decisions on the classification of drugs, ingredients and procedures for each alternative (usually dependent on clinical guidelines). (iii) This case scenario covers drug choices only, and the choice set includes three alternatives, including the "no drug" alternative; the grouping is presented in the next section with descriptive statistics of the sample. (iv) The drug price and the interaction variable, drug price for Medicare (the main federal public plan, versus commercial plans), are random variables. (v) Three patient characteristics, age, obesity and sex, are case-specific variables for this simulation. Therefore, the model for this medical market case includes two random variables on products and three variables on patients. The first baseline model from Dr Lustig includes only one random variable, and its case-specific variables are linked to the decision maker: the consumer. The random variables for this medical application are in log form, contrary to the baseline model coded by Dr Lustig, which uses only normal distributions. The generation of a random variable for drug prices is also a critical step in the model development. The first working paper discusses the sources of drug price data and the selection of the main common forms. This selection will have to be improved to use more complex price indexes in the model.
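To make this specification concrete, the sketch below computes, for a single patient and a single draw of the two random price coefficients, the utilities and logit choice probabilities of the three alternatives: an alternative-specific log drug price, a price x Medicare interaction, and age, obesity and sex entering through alternative-specific coefficients. All names, the normalization on the "no drug" alternative, the placeholder price of 1 for "no drug" (log price 0) and every numeric value are illustrative assumptions, not the coefficients estimated in this study.

```python
import math

# Hypothetical labels for the three alternatives of the drug choice set.
ALTERNATIVES = ("oral_only", "combined", "no_drug")

def utilities(log_prices, medicare, patient, b_price, b_price_medicare, asc, delta):
    """Systematic utility of each alternative for one patient, for one draw of
    the two random price coefficients (b_price, b_price_medicare).

    log_prices: alternative-specific log drug prices (product attributes)
    medicare:   1 if the patient is covered by Medicare, 0 if private (control group)
    patient:    (age, obese, sex) case-specific variables
    asc, delta: alternative-specific constants and case-specific coefficients;
                the last alternative ("no_drug") is the base, with zeros.
    """
    utils = []
    for j, p in enumerate(log_prices):
        case_part = sum(d * z for d, z in zip(delta[j], patient))
        utils.append(b_price * p + b_price_medicare * p * medicare + asc[j] + case_part)
    return utils

def logit_probabilities(utils):
    """Logit choice probabilities, stabilized by subtracting the maximum utility."""
    m = max(utils)
    exps = [math.exp(u - m) for u in utils]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative values only (not estimates from this paper); the "no drug"
# alternative gets a placeholder price of 1, so its log price is 0.
u = utilities(
    log_prices=[math.log(40.0), math.log(120.0), 0.0],
    medicare=1,
    patient=(62.0, 1, 0),
    b_price=-0.8,
    b_price_medicare=0.2,
    asc=[1.5, 0.5, 0.0],
    delta=[(0.01, 0.3, 0.1), (0.02, 0.6, 0.0), (0.0, 0.0, 0.0)],
)
print(dict(zip(ALTERNATIVES, logit_probabilities(u))))
```

In a mixed logit, these probabilities would be averaged over many draws of b_price and b_price_medicare rather than evaluated at one fixed draw.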
In this paper, a generator of random numbers for prices is tested. The first formula for generating random price numbers, provided by Prof J Hausman, was the following: Usual formulas for generating random numbers use uniform random generation, with random variates distributed over the interval (0,1). In this formula for random drug prices, however, the interval of the random generator is modified to (0,2), a 5% artificial random variation is selected as a first experiment, and the formula is rewritten in log form for the random variable (1). So, the formula used to generate prices for this mixed logit model is the following: The formulas for random numbers were implemented in Stata 15. The formula was first tested (a) with the "lnormal" distribution of the asmixlogit Stata command, without the log form, using only the runiform() function to generate random numbers over the selected intervals. However, formula (b), with the log form and the runiform() function in it, led to more statistically significant parameters than formula (a) used with the lnormal option. The formula avoids negative random variates and zero prices in the dataset (which helps with the "no drug" alternative) (1). Additional random number functions may be needed in future steps of the model development, especially to investigate correlation issues between alternatives.
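The exact formula is not reproduced in this text, so the following is only one hypothetical reading of the description above: a uniform variate on (0,2), a 5% artificial variation around a list price, and a log transform. The function name, the centring of the uniform draw at 1 and the seed are all assumptions for illustration, not the generator actually used in Stata.

```python
import math
import random

def random_log_price(base_price, rng, variation=0.05):
    """Hypothetical random price generator (not the formula used in the paper).

    A uniform variate u on [0, 2) is centred at 1 and scaled so that the price
    moves by at most +/- `variation` (5% by default) around base_price; the log
    of the perturbed price is returned. base_price must be strictly positive,
    so the perturbed price can never be zero or negative.
    """
    if base_price <= 0:
        raise ValueError("base_price must be positive for the log form")
    u = 2.0 * rng.random()                        # uniform on [0, 2)
    perturbed = base_price * (1.0 + variation * (u - 1.0))
    return math.log(perturbed)

rng = random.Random(2019)   # fixed seed so the draws are reproducible
draws = [random_log_price(100.0, rng) for _ in range(5)]
print([round(math.exp(d), 2) for d in draws])   # prices within [95, 105)
```

Because the perturbation is multiplicative and bounded, the log is always defined, which mirrors the stated goal of avoiding negative or zero prices.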
Several random variables have been tested for prices, since this economic model aims to adjust demand models with product prices, usually with data from the supply side. At this stage of the research, the two random prices represent general price parameters for a drug selection [9] for type II diabetic patients without complications; however, the second random price variable also aims to capture the price differences paid across categories of insurance plans (these differences result largely from differences in discount practices, in addition to variations in supply chains and dispensing modes).
1. Economists usually need a log form in demand models of care to accommodate very skewed (long-tailed) distributions, partly due to the age distribution of various patient groups. Distributions of epidemiological data, used for instance in disease progression models, are also not normal (common choices include, for instance, the Weibull distribution).
The sample size of this dataset is sufficient to allow a comparison between public and private insurance, mainly between the federal public plan, Medicare, and categories of private plans (the classification of CDC insurance categories was part of a detailed analysis in previous runs estimating the effects of various cost-sharing categories).
Therefore, at this stage of development, the model includes two random price variables: a random drug price for each alternative (source: Redbook) and a random interaction variable, drug price x Medicare (the control group in this case is the private insurance category; future models may add other categories such as Medicare Advantage, Medicaid, dual eligibles, etc.).
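A sketch of how these two random variables could sit in the long-form data used by choice models (one row per patient x alternative) is given below. The field names, the alternative labels and the 0/1 coding of sex are hypothetical; the point is only that the interaction equals the price for Medicare patients and zero for privately insured patients.

```python
from dataclasses import dataclass

# Hypothetical labels for the three alternatives of the drug choice set.
ALTERNATIVES = ("oral_only", "combined", "no_drug")

@dataclass
class Patient:
    pid: int
    chosen: str     # one of ALTERNATIVES
    medicare: int   # 1 = Medicare, 0 = private insurance (control group)
    age: float
    obese: int
    sex: int        # hypothetical 0/1 coding

def to_long_form(patients, log_prices):
    """Expand one record per patient into one row per patient x alternative.

    log_prices maps (pid, alternative) to that row's random log drug price;
    the second random variable is the price x Medicare interaction, which is
    zero for privately insured patients."""
    rows = []
    for p in patients:
        for alt in ALTERNATIVES:
            price = log_prices[(p.pid, alt)]
            rows.append({
                "pid": p.pid,
                "alt": alt,
                "choice": int(alt == p.chosen),
                "price": price,
                "price_x_medicare": price * p.medicare,
                "age": p.age,
                "obese": p.obese,
                "sex": p.sex,
            })
    return rows

# Two illustrative patients with made-up log prices.
patients = [Patient(1, "oral_only", 1, 67.0, 1, 0), Patient(2, "no_drug", 0, 52.0, 0, 1)]
prices = {(pid, alt): 1.0 + i for pid in (1, 2) for i, alt in enumerate(ALTERNATIVES)}
for row in to_long_form(patients, prices):
    print(row)
```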
Mixed logit models (e.g. run with Stata) are widely used in the USA, especially since the Affordable Care Act, passed under the Obama administration, created health insurance exchanges. They are mainly used for comparative analysis of health insurance plans, with case-specific variables for the main types of health plans and random variables for plan characteristics such as deductibles and copayment types (e.g. tiered copays). Such analyses help to represent the net price paid by patients or proxies for payment arrangements; economists often use them to analyze the demand side of a medical market. In this paper, however, the random price variable is used for price adjustment in specific medical markets; the categorical insurance variables in the CDC survey are used to control for major differences in drug prices across the main types of insurance plans, especially public versus private. This can help to examine whether coverage differs substantially between patients under commercial plans and under Medicare, for age groups before 65 (especially between 55 and 65) and after 65. As this kind of modeling appears reliable in the statistical runs presented in this paper, it may be extended in further research with additional types of insurance profiles and with more complex price indices for each plan category. The main relevant price data providers have also been approached; a negotiation has started with Iqvia for legacy Pharmerit datasets (however, for the sampled diabetes case, this dataset under-represents Medicare enrollees). State Medicaid databases and IBM/Redbook were also consulted as possible sources. Additional sources used in the main international price studies have also been reviewed and may be used for comparative analysis [17][18][19][20].
Therefore, this medical application, implementing the generalization of the Independence of Irrelevant Alternatives test and its recent specification [4], is proposed for a mixed logit model including random variables for drug prices, control variables for price differences across the main types of insurance plans, and patient variables representing demographics and risk factors. At this stage, the model includes only three patient variables: age, sex and obesity. Age is a continuous variable, for adults over age 39 (as in previous predictive disease models); the cutoff point of 39 comes from the diabetes clinical guidelines at the time of the study [21]. Obesity is a categorical variable in the CDC survey instrument. For the random variables, two types of sequences were used for the simulations: Halton and Hammersley sequences. Hammersley sequence points may also be an alternative to random numbers and seem to significantly improve efficiency for some simulations (e.g. Monte Carlo simulation, cited in Stata documentation [22]). However, only the Halton sequence could be used at this point: a Stata user command was needed to input the parameters of the mixed logit models into the MATLAB code implementing the new specification test on the diabetic dataset, and Stata runs with this user command were only available with a Halton sequence by default.
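The difference between the two sequences can be illustrated with a short sketch. It uses one common textbook construction, the van der Corput radical inverse; Stata's and MATLAB's implementations may differ in details such as skipping initial points or offsetting the first coordinate, so the functions below are illustrative rather than a reproduction of either package.

```python
def radical_inverse(index, base):
    """Van der Corput radical inverse: mirror the base-`base` digits of index
    around the radix point, giving a number in [0, 1)."""
    result, scale = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * scale
        scale /= base
    return result

def halton(n, base, skip=0):
    """First n points of the one-dimensional Halton sequence in a prime base,
    optionally discarding the first `skip` elements (a common practice)."""
    return [radical_inverse(i, base) for i in range(skip + 1, skip + n + 1)]

def hammersley(n, bases):
    """n points of a Hammersley set: first coordinate i/n, remaining coordinates
    radical inverses in the given prime bases. Unlike a Halton sequence, the set
    depends on n, so it must be regenerated when the number of draws changes."""
    return [[i / n] + [radical_inverse(i, b) for b in bases] for i in range(n)]

print(halton(4, 2))        # [0.5, 0.25, 0.75, 0.125]
print(halton(3, 3))
print(hammersley(4, [2]))  # [[0.0, 0.0], [0.25, 0.5], [0.5, 0.25], [0.75, 0.75]]
```

Each random parameter usually gets its own prime base (2, 3, 5, ...); correlation between sequences in different bases, mentioned later in this paper, is one reason packages scramble or skip points.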

III. Results Of The Simulation With A Mixed Logit Model On The Diabetic Dataset:
The first series of mixed logit models was tested on a drug choice set. The choice set includes three treatment alternatives in primary care for type II diabetes without complications (the selection of the three alternatives partly results from the sample size of medical records on type II diabetic patients). Alternative 1 includes new oral agents and continued/old oral agents.

Alternative 2 includes combined drug therapies: insulin and oral agents; insulin and continued/old oral agents; insulin and both new and continued/old oral agents.
Alternative 3 is the "no drug" alternative.
The sample in this real database of physicians' choices includes 645 medical records of type II diabetic patients (extracted from the CDC NAMCS survey). Table 1 provides descriptive statistics on the sample classified into the three alternatives. A fourth alternative (injectables only) was excluded from this analysis, as it represented only 0.84% of the sample. The mixed logit model was first tested with the Stata command asmixlogit on this diabetic dataset, after transformation into long format. Table 2 presents some results from this first analysis, prepared for ISPOR, New Orleans [8]. The parameters of physicians' choices, estimated on the real diabetic dataset with the mixed logit model presented in Table 1, are used in the simulation approach presented in this paper; the simulation is implemented in MATLAB code. In order to use the new specification test proposed by Professors Hahn, Hausman and Lustig (2020) [4], the original code programmed by Dr Lustig had to be modified for this specific medical application. The Independence of Irrelevant Alternatives (IIA) property can then be examined for physicians' choices at the individual level, for the drug choice set including Alternatives 1, 2 and 3. The test compares estimated parameters between the choice set of three alternatives, including "no" diabetic drug, and a reduced choice set of two alternatives; the removed alternative is either the first, the second or the third. Normally, if the specification is verified (IIA holds), the coefficients estimated on the choice set of three alternatives and on a reduced choice set of two alternatives will be quite similar, whichever alternative is removed.
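The new individual-level test of [4] is not reproduced here. As a point of reference only, the sketch below computes the classic Hausman and McFadden (1984) IIA statistic, which formalizes the same idea of comparing the coefficients common to the full and reduced choice sets. The coefficient vectors and covariance matrices in the usage line are made-up numbers, and the small linear solver is included only to keep the example free of external libraries.

```python
def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def hausman_mcfadden(b_reduced, b_full, v_reduced, v_full):
    """Classic Hausman-McFadden IIA statistic on the coefficients common to both fits:
    H = d' (V_reduced - V_full)^(-1) d with d = b_reduced - b_full.
    Under IIA, H is asymptotically chi-square with len(d) degrees of freedom,
    provided V_reduced - V_full is positive definite."""
    d = [br - bf for br, bf in zip(b_reduced, b_full)]
    v_diff = [[vr - vf for vr, vf in zip(row_r, row_f)]
              for row_r, row_f in zip(v_reduced, v_full)]
    x = solve(v_diff, d)
    return sum(di * xi for di, xi in zip(d, x))

# Made-up numbers for two common coefficients (not results of this study).
h = hausman_mcfadden([0.5, 0.3], [0.3, 0.4],
                     [[0.03, 0.0], [0.0, 0.05]], [[0.02, 0.0], [0.0, 0.01]])
print(round(h, 4))   # compare with a chi-square(2) critical value, e.g. 5.99 at 5%
```

Similar coefficients give a small H and no evidence against IIA, which matches the informal reading of the coefficient comparisons reported below.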
Table 3 gives the variations of physicians' preferences with prices when alternative 1 ("on oral agent only") is removed, and Table 4 when alternative 2 ("on combined therapy") is removed. Tables 3 and 4 thus compare the coefficients of some parameters between the full and reduced choice sets in two cases out of three (they will be used for the application of the new specification test on IIA [4]). When alternative 1 is removed for the reduced choice set, the two coefficients associated with the random drug price for Medicare are quite similar between the two choice sets, as are the two coefficients associated with the interaction variable between obesity and alternative 2.
When alternative 2 is removed, it is the main random price variable (and not the one associated with Medicare) that shows negative coefficients in both choice sets; all coefficients associated with the interaction variables between age and alternative 1 (one oral agent) and between age and alternative 3 (no drug) are similar for the choice set with three alternatives and the reduced choice set of two alternatives.
When alternative 3 (no drug) is removed, the comparison of the two maximum likelihood functions does not work at this point, and additional modeling steps are being investigated. In particular, since the reduced choice set in this case includes the two pharmacotherapy strategies, different generators of random prices may be needed for the two drug therapies in order to avoid correlation issues (e.g. in sequences).
IV. Discussion And Limitation:
The computer simulation compares a choice set with three alternatives (two drug alternatives, either at least one oral agent or a combined therapy, versus a no-drug alternative) with reduced choice sets. When a drug alternative, either "at least one oral agent" (Alt 1) or "combined therapy" (Alt 2), is removed for the reduced choice set, the specification test at the individual level tends to show independence of these alternatives, since some coefficients are quite similar. However, when the "no drug" alternative is removed, the specification test comparing the reduced choice set of two drug therapies with the choice set of three alternatives (which includes the no-drug alternative) does not work.
Can we discuss, at this point, the independence of irrelevant alternatives property at the individual level of physicians? In the prediabetic stage, a drug therapy may not be relevant (e.g. the controversy between pharmacists and physicians at the time of the study on prescribing an oral agent versus an aggressive preventive strategy). This may explain why, when the "no drug" alternative is removed, the new test comparing a reduced choice set of two clinical strategies with drug treatment against a choice set that also includes no drug reflects irrelevant alternatives for some individuals. However, further statistical analysis is needed to explore potential estimation issues in implementing the new test for all alternatives on this diabetic dataset.

V. Conclusion:
This type of simulation is often used in the field of preference research, especially in health care for patient preferences, using Discrete Choice Experiments (DCE) or shared decision-making models; current methodological papers [23] discuss the benefits of simulated multinomial logit according to the number of draws, investigating the effects of correlation between Halton sequences on multinomial estimates. The IHAPR academy and the transportation research econometric society recommend performing sensitivity analysis according to the number of random parameters and the number of draws (to revise and complete: Chikososky, 2016; finance literature). The two runs with 50 and 200 draws already show changes in the likelihood ratios when the number of draws is increased; however, additional sensitivity analysis may be necessary to examine the effects of correlation between sequences. As in the first study using the asmixlogit Stata command (Table 1), sequences other than Halton sequences may also improve the results: the runs with a Hammersley sequence seemed to make some parameters more reliable.
The extraction methodology is described in several papers listed in the reference list and can be requested from the authors. The price information used is also publicly available from the Redbook price lists, held in medical libraries and also available online.
"Choice probabilities under mixed logit take the form of a multinomial integral over a mixing distribution" (Brownstone and Train, 1999). The integral is therefore evaluated numerically with simulation draws.
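The formulation of the mixed logit is the following. If $L_{in}(\beta)$ is the logit formula for the probability that decision maker $n$ chooses alternative $i$ given coefficients $\beta$,
$$L_{in}(\beta) = \frac{\exp(\beta' x_{in})}{\sum_{j} \exp(\beta' x_{jn})},$$
then the mixed logit choice probability is the integral of this formula over the mixing distribution $f(\beta \mid \theta)$:
$$P_{in} = \int L_{in}(\beta)\, f(\beta \mid \theta)\, d\beta .$$
This integral is estimated with either random draws or quasi-random draws (usually Halton draws) $\beta^{r}$, $r = 1, \dots, R$, from $f(\beta \mid \theta)$:
$$\check{P}_{in} = \frac{1}{R} \sum_{r=1}^{R} L_{in}(\beta^{r}).$$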
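The simulated approximation averages the logit formula $L_{in}$ over $R$ draws of the random coefficient. The sketch below does this for one decision maker with a single random price coefficient, assumed here to be negative lognormal, $\beta^r = -\exp(\mu + \sigma z_r)$, with $z_r$ obtained by mapping Halton points through the inverse standard normal CDF. The values of mu, sigma and the log prices are illustrative assumptions, and the comparison of 50 and 200 draws only mirrors the numbers of draws used in this paper.

```python
import math
from statistics import NormalDist

def radical_inverse(index, base):
    """Van der Corput radical inverse of index in the given base, in (0, 1) for index >= 1."""
    result, scale = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * scale
        scale /= base
    return result

def simulated_probabilities(prices, mu, sigma, n_draws, base=2):
    """Simulated mixed logit probabilities (1/R) * sum_r L_in(beta^r) for one
    decision maker, with a single random price coefficient
    beta^r = -exp(mu + sigma * z_r) (negative lognormal), where z_r is the
    r-th Halton point mapped to N(0, 1) by the inverse normal CDF."""
    std_normal = NormalDist()
    avg = [0.0] * len(prices)
    for r in range(1, n_draws + 1):          # start at 1 so the point is never 0
        z = std_normal.inv_cdf(radical_inverse(r, base))
        beta = -math.exp(mu + sigma * z)
        utils = [beta * p for p in prices]
        m = max(utils)
        exps = [math.exp(u - m) for u in utils]
        total = sum(exps)
        for i, e in enumerate(exps):
            avg[i] += e / total / n_draws
    return avg

# Illustrative log prices for the three alternatives; compare 50 and 200 draws.
log_prices = [math.log(40.0), math.log(120.0), 0.0]
for r in (50, 200):
    print(r, [round(p, 4) for p in simulated_probabilities(log_prices, mu=-0.5, sigma=0.5, n_draws=r)])
```

Re-running with more draws and checking how much the probabilities (or, in estimation, the log-likelihood) move is the simplest form of the sensitivity analysis on the number of draws discussed above.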

Appendix 2: Sequences and simulation in mixed logit
The simulation of parameters in mixed logit models has usually been performed with Halton sequences instead of random draws since the early studies by Bhat (1999); other findings, especially at UC Berkeley by Train (e.g. 2000), confirmed the improvement of parameter estimates with such sequences (1), and software usually includes this type of sequence for simulated results (e.g. MATLAB or Stata). Moreover, research in this field has shown the need for a large number of draws in order to obtain reliable parameter estimates. The alternative to be removed is usually called the status quo or outside good. In this application, the "no drug" alternative was first considered as the one to remove for the reduced choice set, with the two different drug therapies included in the reduced choice set. However, this choice is not obvious for type II diabetes (see the controversies on classification and preventive strategies); therefore, each alternative was removed in turn (alternative 1, alternative 2 and alternative 3) to form the reduced choice sets. Three comparisons between simulated parameters for three versus two alternatives were then run, using 50 and 200 draws. The simulations used a different version of the mixed logit model: it was run a second time, with the same predictors on the dataset, under a Harvard in-house Stata license, in order to transfer coefficients from the mixed logit model into the MATLAB code. Tables 3 and 4 present preliminary comparisons of coefficients for the simulated random price parameters and age. This code is currently under revision and needs additional modifications for various applications.
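Removing each alternative in turn can be sketched on long-form data as follows. This follows the usual construction for IIA comparisons (drop the removed alternative's rows and the patients who chose it), which is an assumption here rather than a description of the MATLAB code; the alternative labels and the two-patient dataset are hypothetical.

```python
# Hypothetical labels for the three alternatives of the drug choice set.
ALTERNATIVES = ("oral_only", "combined", "no_drug")

def reduced_choice_set(rows, removed):
    """Long-form rows for the reduced choice set without `removed`.

    The removed alternative's rows are dropped, and so are all rows of patients
    who actually chose the removed alternative, since their choice is not
    observed within the reduced set."""
    dropped = {r["pid"] for r in rows if r["alt"] == removed and r["choice"] == 1}
    return [r for r in rows if r["alt"] != removed and r["pid"] not in dropped]

def all_reduced_sets(rows):
    """Remove each alternative in turn, as done for alternatives 1, 2 and 3."""
    return {alt: reduced_choice_set(rows, alt) for alt in ALTERNATIVES}

# Patient 1 chose oral agents only; patient 2 chose no drug.
rows = [
    {"pid": 1, "alt": a, "choice": int(a == "oral_only")} for a in ALTERNATIVES
] + [
    {"pid": 2, "alt": a, "choice": int(a == "no_drug")} for a in ALTERNATIVES
]
for alt, subset in all_reduced_sets(rows).items():
    print(alt, sorted({r["pid"] for r in subset}), len(subset))
```

Each reduced dataset would then be re-estimated and its common coefficients compared with those of the full three-alternative model.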