Data Source
The current analysis is based on data from Wave 1 and Wave 2 of the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC), which was designed and sponsored by the National Institute on Alcohol Abuse and Alcoholism and conducted in 2001-02 and 2004-05, respectively. The NESARC sample represents the civilian, noninstitutionalized adult population of the United States (22). The surveys were conducted as face-to-face, computer-assisted, in-home interviews. One randomly selected adult (aged 18 years or older) from each sampled household was invited to participate. The overall response rate for Wave 1 was 81.0%, for a total sample size of 43,093. Of those, 34,653 (80.4%) were followed up in Wave 2 (8,440 participants were lost to follow-up).
The NESARC samples were weighted to adjust for unequal probabilities of selection and for nonresponse. Calibration was applied to match the target population based on the 2000 census. Details regarding the sampling, weighting, and calibration have been described elsewhere (23, 24).
Measures
Daily alcohol consumption
The NESARC contains detailed questions about the drink types, frequency of drinking, quantity and size of drinks consumed during the past 12 months. The amount of pure alcohol in each drink was calculated using an ethanol conversion factor, which accounts for the proportion of pure alcohol in the different types of drinks (23, 24). The average daily volume of pure alcohol consumption in grams during the past 12 months (referred to as daily alcohol consumption herein) was then calculated by dividing the total alcohol consumption across all drink types by 365.
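The calculation described above can be sketched as follows. This is a minimal illustration rather than the NESARC derivation itself: the field names, the use of millilitres, the ethanol density constant used to convert volume to grams, and the example respondent are all assumptions introduced here.

```python
from dataclasses import dataclass

ETHANOL_DENSITY_G_PER_ML = 0.789  # approximate density of pure ethanol (assumed unit conversion)


@dataclass
class DrinkTypeReport:
    """Past-12-month report for one drink type (hypothetical field names)."""
    drinking_days: float     # number of days on which this drink type was consumed
    drinks_per_day: float    # usual number of drinks on a drinking day
    drink_size_ml: float     # usual size of one drink, in millilitres
    ethanol_fraction: float  # ethanol conversion factor (proportion of pure alcohol)


def annual_ethanol_grams(report: DrinkTypeReport) -> float:
    """Grams of pure alcohol from one drink type over the past 12 months."""
    beverage_ml = report.drinking_days * report.drinks_per_day * report.drink_size_ml
    return beverage_ml * report.ethanol_fraction * ETHANOL_DENSITY_G_PER_ML


def daily_alcohol_grams(reports: list[DrinkTypeReport]) -> float:
    """Average daily consumption: total across all drink types divided by 365."""
    return sum(annual_ethanol_grams(r) for r in reports) / 365


# Illustrative respondent with made-up values (not NESARC data)
example = [
    DrinkTypeReport(drinking_days=104, drinks_per_day=2, drink_size_ml=355, ethanol_fraction=0.05),
    DrinkTypeReport(drinking_days=52, drinks_per_day=1, drink_size_ml=150, ethanol_fraction=0.12),
]
print(round(daily_alcohol_grams(example), 2))
```

Summing over drink types before dividing by 365 mirrors the order of operations in the text, so a respondent who reports no drinking for a given type simply contributes zero to the total.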
Alcohol dependence
Alcohol dependence in the past 12 months was assessed using the Alcohol Use Disorders and Associated Disabilities Interview Schedule-IV (AUDADIS-IV), based on the criteria of the fourth edition of the DSM (DSM-IV) (25, 26).
Treatment utilization
The NESARC broadly defines alcohol treatment utilization as “seeking help for alcohol‐related problems” from at least one of the following: Alcoholics/Narcotics/Cocaine Anonymous, or 12‐step meeting; family services or other social service agency; alcohol/drug detoxification ward/clinic; inpatient ward of psychiatric/general hospital or community mental; outpatient clinic, including outreach and day/partial patient programs; alcohol/drug rehabilitation program; emergency room because of drinking; halfway house/therapeutic community; crisis center because of drinking; employee assistance program; clergyman, priest, or rabbi; private physician, psychiatrist, psychologist, social worker, or any other professional; and any other agency or professional. Accordingly, for Wave 1, treatment utilization was defined as the endorsement of any of the above within the past 12 months. For Wave 2, alcohol treatment utilization was ascertained using the following question: “Have you gone anywhere or seen anyone to get help because of drinking since last interview?”
Statistical Analysis
As an exploratory analysis, following the traditional fitting of distributions to alcohol use data (27, 28), we used the Wave 1 survey to evaluate the fit of the Log-Normal, Gamma, and Weibull distributions and to determine whether the distribution of daily alcohol consumption could be appropriately described as unimodal. The three fits were examined using the Kolmogorov-Smirnov test, and the null hypothesis was rejected for all three, which suggested the possibility of a multi-modal distribution. Given the skewness of the data, a log-transformation was applied to the daily alcohol consumption variable, and the distribution was modelled and fitted using the following steps:
1. Density plots
Density plots of daily alcohol consumption were produced and the resulting graphs were used to visually identify the possible number of modes.
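A minimal sketch of this step, assuming a Gaussian kernel density estimate with Silverman's rule-of-thumb bandwidth (the text does not state which estimator or bandwidth was used). Counting local maxima on a grid stands in for the visual inspection described above. The function names and the synthetic log-scale samples are illustrative assumptions.

```python
import math
from statistics import NormalDist, quantiles, stdev


def silverman_bandwidth(sample):
    """Silverman's rule-of-thumb bandwidth for a Gaussian kernel."""
    q1, _, q3 = quantiles(sample, n=4)
    spread = min(stdev(sample), (q3 - q1) / 1.34)
    return 0.9 * spread * len(sample) ** -0.2


def kernel_density(sample, grid, bandwidth):
    """Gaussian kernel density estimate evaluated at each grid point."""
    scale = 1.0 / (len(sample) * bandwidth * math.sqrt(2 * math.pi))
    return [
        scale * sum(math.exp(-0.5 * ((g - s) / bandwidth) ** 2) for s in sample)
        for g in grid
    ]


def count_modes(sample, grid_points=200):
    """Count local maxima of the estimated density on an evenly spaced grid."""
    h = silverman_bandwidth(sample)
    lo, hi = min(sample) - 3 * h, max(sample) + 3 * h
    grid = [lo + (hi - lo) * i / (grid_points - 1) for i in range(grid_points)]
    dens = kernel_density(sample, grid, h)
    return sum(
        1 for i in range(1, grid_points - 1)
        if dens[i - 1] < dens[i] >= dens[i + 1]
    )


def normal_quantile_sample(mean, sd, n):
    """Deterministic stand-in for real data: n evenly spaced normal quantiles."""
    dist = NormalDist(mean, sd)
    return [dist.inv_cdf((i + 0.5) / n) for i in range(n)]


# Synthetic log-transformed consumption values (illustrative only)
one_group = normal_quantile_sample(1.0, 0.8, 300)
two_groups = normal_quantile_sample(0.0, 0.5, 200) + normal_quantile_sample(4.0, 0.5, 200)
print(count_modes(one_group), count_modes(two_groups))
```

The number of local maxima depends on the bandwidth, so a count like this is best treated as a prompt for inspecting the plot rather than as a formal test of multimodality.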
2. Clustering
Clustering algorithms were used to group the data points into clusters, so that data points in the same cluster were more similar to each other than to data points in other clusters. The number of clusters was chosen using the NbClust package, which varies the number of clusters, the clustering method, and the validity indices simultaneously to find the optimal number of clusters for the data (29). When the indices failed to suggest a best clustering scheme, K-means was used to select the number of clusters (30).
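The K-means fallback can be illustrated with a one-dimensional version of Lloyd's algorithm. This sketch does not reproduce the NbClust indices. It only shows how the within-cluster sum of squares changes with the number of clusters, which is one common basis for choosing that number. The quantile-based starting centers, the function name, and the synthetic sample are assumptions made for the example.

```python
from statistics import NormalDist


def kmeans_1d(values, k, max_iter=100):
    """Lloyd's algorithm in one dimension; returns (centers, within-cluster SS)."""
    xs = sorted(values)
    n = len(xs)
    # Deterministic start: centers at evenly spaced quantiles of the data
    centers = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center
        clusters = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            clusters[nearest].append(x)
        # Update step: each center moves to the mean of its cluster
        updated = [sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    wss = sum((x - centers[j]) ** 2 for j, c in enumerate(clusters) for x in c)
    return centers, wss


# Synthetic log-scale sample with two groups (illustrative only)
low = NormalDist(0.0, 0.5)
high = NormalDist(4.0, 0.5)
sample = [low.inv_cdf((i + 0.5) / 200) for i in range(200)]
sample += [high.inv_cdf((i + 0.5) / 200) for i in range(200)]

for k in range(1, 5):
    centers, wss = kmeans_1d(sample, k)
    print(k, [round(c, 2) for c in centers], round(wss, 1))
```

A large drop in the within-cluster sum of squares followed by only small further decreases suggests that additional clusters add little.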
3. Gaussian Mixture Model
Given the number of clusters, Gaussian Mixture Models (GMMs) (18) were used to estimate the likelihood that a given data point belonged to each component of a mixture of Gaussian distributions. The mixture distribution can be represented by writing the distribution function (F) as a weighted sum:
$$F(x) = \sum_{i=1}^{k} w_i P_i(x)$$
where $k$ is the number of clusters, $x$ represents the data points, the $w_i$ are the mixture weights (non-negative and summing to 1), and the component distributions $P_i(x)$ were assumed to be Gaussian. Each component is described by two parameters that determine the shape of the cluster: the mean and the standard deviation. The parameters were estimated via the Expectation-Maximization algorithm. There are two key advantages to using GMMs. First, GMMs are more flexible than K-means in terms of cluster covariance, because each component has its own variance. Second, since GMMs use probabilities, each data point can belong to more than one cluster. Therefore, if a data point is located between two overlapping distributions, its class can be defined by a mixed membership. The Bayesian Information Criterion was used to assess model fit. Sex-specific models were fit and visualized, as well as separate models for those with alcohol dependence.
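To make the estimation step concrete, here is a minimal sketch of the Expectation-Maximization algorithm for a one-dimensional Gaussian mixture, together with the Bayesian Information Criterion. It is not the software used in the analysis. The function names, the initialization from equal slices of the sorted data, the convergence tolerance, and the synthetic sample are assumptions. The BIC is written so that lower values indicate better fit, and some packages report it with the opposite sign.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def posterior(x, weights, means, sds):
    """Posterior membership probabilities of x and its log mixture density."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = max(sum(joint), 1e-300)  # guard against underflow to zero
    return [j / total for j in joint], math.log(total)


def fit_gmm_1d(values, k, max_iter=500, tol=1e-8):
    """Fit a k-component 1-D Gaussian mixture by Expectation-Maximization."""
    xs = sorted(values)
    n = len(xs)
    # Assumed initialization: k equal-sized slices of the sorted data
    slices = [xs[j * n // k:(j + 1) * n // k] for j in range(k)]
    weights = [1.0 / k] * k
    means = [sum(s) / len(s) for s in slices]
    grand_mean = sum(xs) / n
    sds = [math.sqrt(sum((x - grand_mean) ** 2 for x in xs) / n)] * k
    previous = -math.inf
    for _ in range(max_iter):
        # E-step: posterior probability of each component for each point
        resp = [posterior(x, weights, means, sds)[0] for x in xs]
        # M-step: re-estimate weight, mean, and standard deviation per component
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sds[j] = max(math.sqrt(var), 1e-6)  # floor avoids a degenerate component
        loglik = sum(posterior(x, weights, means, sds)[1] for x in xs)
        if loglik - previous < tol:
            break
        previous = loglik
    n_params = 3 * k - 1  # k means, k standard deviations, k - 1 free weights
    bic = n_params * math.log(n) - 2 * loglik  # lower is better in this convention
    return weights, means, sds, bic


# Synthetic log-scale sample with two groups (illustrative only)
low = NormalDist(0.0, 0.5)
high = NormalDist(4.0, 0.5)
sample = [low.inv_cdf((i + 0.5) / 200) for i in range(200)]
sample += [high.inv_cdf((i + 0.5) / 200) for i in range(200)]

fit_one = fit_gmm_1d(sample, 1)
fit_two = fit_gmm_1d(sample, 2)
print("BIC, 1 component:", round(fit_one[3], 1))
print("BIC, 2 components:", round(fit_two[3], 1))
```

The `posterior` helper returns the mixed membership discussed above: for a point lying between two overlapping components, the probabilities are split between them instead of being forced into a single cluster.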
Lastly, Wave 2 data were used to test whether the distributions described by the GMMs, and their parameters, were consistent with those from Wave 1. In addition, the same statistical approach described above was applied to the combined Wave 1 and Wave 2 data to investigate the distributions among individuals with alcohol dependence in both waves. Individuals were assigned to the cluster for which they had the highest posterior probability from the GMMs. Treatment utilization rates were then calculated for each cluster based on lifetime treatment (seeking treatment prior to Wave 2) or any recent treatment (seeking treatment between 12 months prior to Wave 1 and Wave 2).
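The assignment rule and the per-cluster rates can be sketched as follows. The record layout, the function names, and the example values are hypothetical. The sketch is also unweighted, whereas an analysis of NESARC data would normally apply the survey weights.

```python
def assign_cluster(posteriors):
    """Index of the cluster with the highest posterior probability."""
    return max(range(len(posteriors)), key=lambda j: posteriors[j])


def treatment_rates(records):
    """Unweighted share of individuals reporting treatment in each cluster.

    records: iterable of (posterior_probabilities, sought_treatment) pairs.
    """
    counts = {}
    for posteriors, treated in records:
        cluster = assign_cluster(posteriors)
        total, hits = counts.get(cluster, (0, 0))
        counts[cluster] = (total + 1, hits + int(treated))
    return {c: hits / total for c, (total, hits) in sorted(counts.items())}


# Made-up posterior probabilities and treatment indicators (not NESARC data)
records = [
    ([0.9, 0.1], False),
    ([0.8, 0.2], True),
    ([0.3, 0.7], True),
    ([0.4, 0.6], True),
]
print(treatment_rates(records))
```

The same function can be run once with a lifetime treatment indicator and once with a recent treatment indicator to obtain both sets of rates.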