2.1.1. Pre-tax cash incomes
Our approach draws on both unit record tax data and income survey data. The tax data set, known as ALife, comprises a 10% random sample of tax returns covering the period 1991 to 2018. The income survey data come from the Australian Bureau of Statistics’ Survey of Income and Housing (SIH), covering the period 1994 to 2018, but with some gaps. The SIH provides the longest time span of coverage for income survey data in Australia; the other main survey source is the Household, Income and Labour Dynamics in Australia (HILDA) Survey, a panel study that commenced in 2001.[2]
For pre-tax cash incomes of individuals, based on exploratory work with both ALife and the SIH, we determined that the best approach was to primarily base cash income estimates on the SIH, but with ALife tax data used to adjust incomes for the top 1%. This is because the tax data appear inferior in income capture for most of the distribution (see Fig. 2.1). Although non-labour income is higher in ALife than in the SIH for people with above-median incomes (see Fig. 2.3), it is not enough to compensate for the undercoverage of labour income evident in Fig. 2.2.
Up until 2015-16, the SIH unit record data contain measures of both annual income (for the preceding financial year, 1 July to 30 June) and ‘current weekly’ income. We use the annual income estimates for these surveys. However, in the 2017-18 SIH, only current weekly income is available. For this survey, we therefore use an annualised measure of current weekly income.
Our approach is something of a departure from existing studies, which have given greater weight to tax records data. However, DINA need to be flexible to national circumstances, and in Australia’s case, survey data is preferable to tax records data for all but the top 1%.
Australia is by no means unique in the finding that income survey data is at least as good as tax data for incomes below the top 1%. Burkhauser et al. (2012) found the US CPS matched income tax data up to the 99th percentile, and Burkhauser et al. (2018) similarly found the UK HBAI matched income tax data up to the 98th percentile. What perhaps requires some explanation is why the survey data actually captures more income below the 99th percentile than the tax data. There are two main explanations: some forms of income are nontaxable and are received even by high income earners; and there are incentives to minimise income reported to tax authorities that do not apply to reporting to statistical agencies. Regardless of the explanation, the fact remains that macroeconomic aggregates are better captured when income survey data is used for the bottom 99% and tax data is used only for the top 1%.
Aside from better capture of the incomes of the bottom 99%, additional reasons to use the SIH include better flexibility to look at different income concepts (including equivalised disposable cash incomes), income units (including the household unit) as well as information on wealth. That said, we focus on the four income concepts described in the DINA Guidelines.
We distribute incomes of households on an ‘equal-split adults’ basis, meaning each adult household member is assigned an equal share of the total household income, as per the ‘broad equal-split series’ in the DINA Guidelines (p23). Although our baseline estimates are based on these broad equal-split series, we also consider two alternatives. First, we build ‘individualistic series’, which assume no sharing within households and distribute income to each person individually according to individual earnings and ownership. This is a useful comparison point with the ‘broad equal-split series’ when we further break down income shares by individual characteristics. Second, we build and use the ‘narrow-split series’ to ensure consistency in the comparison with the US and France. The ‘narrow-split series’ distributes income to all adult individuals by splitting income equally within a couple, but not within the extended household.
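The three series differ only in the group over which income is pooled. The following is a minimal sketch of that difference, not the code used for the estimates; the record layout (household identifier, couple identifier, individual income) and all values are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical adult records: (household_id, couple_id or None, individual income)
adults = [
    ("h1", "c1", 90_000.0),  # partner A
    ("h1", "c1", 30_000.0),  # partner B
    ("h1", None, 0.0),       # another adult living in the same household
]


def split_equally(records, key):
    """Share income equally among adults with the same key.

    Adults whose key is None keep their own income.
    """
    totals, counts = defaultdict(float), defaultdict(int)
    for rec in records:
        k = key(rec)
        if k is not None:
            totals[k] += rec[2]
            counts[k] += 1
    shares = []
    for rec in records:
        k = key(rec)
        shares.append(rec[2] if k is None else totals[k] / counts[k])
    return shares


individualistic = [rec[2] for rec in adults]                  # no sharing
broad_equal_split = split_equally(adults, key=lambda r: r[0])  # whole household
narrow_split = split_equally(adults, key=lambda r: r[1])       # couples only
```

In this example the broad equal-split series assigns one third of household income to each adult, while the narrow-split series shares income only between the two partners and leaves the third adult with their own income.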
While the SIH is our preferred ‘core’ data source, it nonetheless has important limitations which need to be addressed. It is only available from 1994-95, and it has only been conducted every second year from 1997-98 to 2002-03 and from 2003-04 onwards. It also only has wealth data (and hence information on superannuation (private retirement account) balances and home equity required to distribute capital income; see below) in 2003-04, 2005-06 and 2009-10 onwards.
To produce estimates in non-SIH years, we interpolate distributions and adjust according to changes in the components of the National Accounts in those years. We use the national income price index to either inflate the distribution from the closest earlier year or to deflate it from the closest later year. If both an earlier and a later year are available, we apply both methods separately and compute the final DINA estimates by taking the average of the two series thus obtained.
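The averaging step can be sketched as follows. This is an illustrative simplification: it assumes the distribution is represented by incomes at a fixed set of quantiles, and the function name `interpolate` and the index values are hypothetical rather than taken from the National Accounts.

```python
def interpolate(target_year, index, earlier=None, later=None):
    """Estimate quantile incomes in target_year from neighbouring survey years.

    earlier / later are (year, incomes) pairs, with incomes listed at the
    same quantiles in both; index maps year -> price index level.
    """
    series = []
    for source in (earlier, later):
        if source is not None:
            year, incomes = source
            # Inflate (earlier year) or deflate (later year) to the target year
            factor = index[target_year] / index[year]
            series.append([y * factor for y in incomes])
    if not series:
        raise ValueError("need at least one neighbouring survey year")
    # Average the available series, quantile by quantile
    return [sum(vals) / len(vals) for vals in zip(*series)]


index = {1996: 100.0, 1997: 110.0, 1998: 121.0}  # illustrative values only
estimate_1997 = interpolate(
    1997,
    index,
    earlier=(1996, [10_000.0, 50_000.0]),
    later=(1998, [12_100.0, 66_550.0]),
)
```

When only one neighbouring year is supplied, the function returns the single inflated or deflated series unchanged, which corresponds to the case where no averaging is possible.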
Top 1%: combining survey and tax data
As a growing literature has shown, survey data tend to undercover top incomes. Comparison of survey and tax data has revealed that this is the case in Australia too (Burkhauser et al. 2016) and that it mostly affects the top 1%. We follow the cell-mean imputation method we developed for the UK in Burkhauser et al. (2018), using tax data (ALife) to impute incomes of the top 1% in the survey data.[3]
To implement this method, we first rank individuals in the ALife unit record data by their ‘tax gross income’, which is total income subject to taxation prior to any allowable deductions or rebates. This is the closest variable to ‘pre-tax income’ available in the tax records data. Second, we select individuals in the top 1%, using the ABS estimate of the total adult population for the relevant year. Third, we allocate top 1% individuals to income groups, with the size of each group equal to 1/100,000th of the total adult population, meaning we split the top 1% into 1,000 income groups. Fourth, we calculate the average income for each income group. Fifth, we repeat the first and second steps with the SIH data for the same year using our derived measure of individual gross income. Sixth, we duplicate each SIH record according to its sample weight. Finally, for each of the 1,000 SIH income groups within the top 1%, we replace the individual-level SIH incomes with the mean income of the corresponding group in ALife.
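A minimal sketch of the cell-mean replacement is given below. It is not the implementation used for the estimates: it assumes both inputs already contain one entry per adult in the top 1% (that is, records have been expanded by their weights), that both lists have the same length, and that this length is divisible by the number of groups. The function names and the toy values are hypothetical.

```python
def cell_means(incomes, n_groups):
    """Rank incomes from highest to lowest, cut them into n_groups
    equal-sized cells and return the mean of each cell."""
    ranked = sorted(incomes, reverse=True)
    size = len(ranked) // n_groups
    return [sum(ranked[g * size:(g + 1) * size]) / size for g in range(n_groups)]


def impute_top(survey_top, tax_top, n_groups):
    """Replace each survey income with the tax-data mean of the cell
    that has the same rank position."""
    means = cell_means(tax_top, n_groups)
    size = len(survey_top) // n_groups
    # Indices of survey records ordered from highest to lowest income
    order = sorted(range(len(survey_top)), key=lambda i: survey_top[i], reverse=True)
    adjusted = list(survey_top)
    for rank, i in enumerate(order):
        adjusted[i] = means[min(rank // size, n_groups - 1)]
    return adjusted


tax_top = [900.0, 700.0, 500.0, 300.0]     # toy tax incomes
survey_top = [400.0, 250.0, 600.0, 350.0]  # toy survey incomes
adjusted = impute_top(survey_top, tax_top, n_groups=2)
```

Under these assumptions the adjusted survey total equals the tax-data total, both overall and within each cell.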
In addition to imputing gross income from tax data for the top 1%, we also use the labour/capital income-source composition obtained from the tax data. An alternative would be to use the income composition determined by the survey data, but this tends to underestimate the importance of capital income for the top 1%. However, the tax data offer less detail, and thus less flexibility, when subsequently adjusting incomes to match National Accounts totals (e.g., mixed income is not directly observable in ALife). We address this issue by maintaining the assumption that the income-source compositions within capital income and within labour income are as obtained from the survey data.
Our procedure ensures that total ‘tax gross income’ for the top 1% – and for each of the 1,000 groups within the top 1% – is the same in the (adjusted) SIH and ALife data.
2.1.2. Labour income
Grossing up of labour incomes is required because of potential under-reporting in SIH as well as the failure of the SIH to capture (all of) salary sacrificed employment income, fringe benefits and fringe benefits tax, and ‘employer social contributions’ (i.e., employers’ superannuation contributions and workers’ compensation premiums). Employee incomes are grossed up by a constant factor so that total employee income in the SIH equals total employment income in the National Accounts.
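The constant-factor adjustment amounts to a single ratio applied to every record. The sketch below illustrates this under assumed inputs; the function name `gross_up` and all figures are hypothetical.

```python
def gross_up(incomes, weights, national_total):
    """Scale survey incomes by one constant factor so that their weighted
    total equals the National Accounts total."""
    survey_total = sum(y * w for y, w in zip(incomes, weights))
    factor = national_total / survey_total
    return factor, [y * factor for y in incomes]


# Toy example: two survey records with their sample weights
factor, adjusted = gross_up(
    incomes=[40_000.0, 60_000.0],
    weights=[1_000.0, 500.0],
    national_total=84_000_000.0,
)
```

Because the factor is constant, relative incomes within the survey are unchanged; only the level is shifted to match the aggregate.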
Mixed income is grossed up separately, also by a constant factor. ABS National Accounts data do not report net mixed income. We therefore estimate net mixed income based on gross mixed income, which is reported in the National Accounts data, by applying net-to-gross ratios for mixed income sourced from the WID for Australia.[4]
All grossing-up factors are provided in Appendix Table A.2. Total employee incomes have to be increased by between 10% and 26% to ensure consistency with National Accounts. The required increase is much larger and more volatile for mixed income, ranging from 25% to 226% depending on the year.
2.1.3. Capital income
Capital income is estimated based on reported business and investment income and imputed rent. A ‘grossing up’ adjustment is done separately for each of superannuation, imputed rent and other capital income. The principle is that superannuation income is imputed based on observed or estimated superannuation balances. Net operating surplus of households and non-profit institutions serving households (NOSHN) is distributed based on imputed rent. The remaining (i.e., non-pension non-imputed-rent) capital incomes not captured by the SIH are distributed according to reported non-pension non-imputed-rent capital incomes (hereafter called ‘other capital income’).
From the total capital stock (“National net wealth”) as measured in the National Accounts, we compute the share of the capital stock in superannuation funds (“Pension funds & life insurance”) and then use that share to allocate the appropriate proportion of total private capital income (other than NOSHN) accruing to superannuation funds. The implicit assumption is that returns on superannuation are the same as the overall return on the national private capital stock. Total private capital income is obtained here from the National Accounts by adding “total net property income of households and non-profit institutions serving households” and “total net primary income of corporations”.
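The aggregate calculation described above can be sketched as follows. The function and argument names are illustrative labels for the National Accounts items named in the text, and the numbers are made up.

```python
def superannuation_income(net_property_income_households,
                          net_primary_income_corporations,
                          superannuation_wealth,
                          national_net_wealth):
    """Aggregate capital income attributed to superannuation funds, assuming
    superannuation earns the same return as the overall private capital stock."""
    total_private_capital_income = (
        net_property_income_households + net_primary_income_corporations
    )
    share = superannuation_wealth / national_net_wealth
    return share * total_private_capital_income


# Toy aggregates (arbitrary units)
super_income_total = superannuation_income(
    net_property_income_households=300.0,
    net_primary_income_corporations=200.0,
    superannuation_wealth=2_000.0,
    national_net_wealth=10_000.0,
)
```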
Superannuation income, NOSHN and other capital incomes are thus allocated to each individual separately.
Superannuation income
We impute superannuation income proportionally to each individual’s superannuation balance. We use superannuation balances from the SIH for all years for which they are available (2003/04, 2005/06, 2009/10, 2011/12, 2013/14, 2015/16 and 2017/2018). For the years not covered by the SIH, we estimate superannuation balances separately for those aged 60 and over and those aged under 60.
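Proportional imputation of this kind can be sketched as below, assuming the aggregate superannuation income has already been derived and that each record carries a sample weight. The function name and values are hypothetical.

```python
def impute_super_income(total_super_income, balances, weights):
    """Allocate aggregate superannuation income to individuals in
    proportion to their superannuation balance."""
    weighted_balance = sum(b * w for b, w in zip(balances, weights))
    return [total_super_income * b / weighted_balance for b in balances]


balances = [0.0, 100_000.0, 300_000.0]  # toy balances
weights = [10.0, 10.0, 10.0]            # toy sample weights
imputed = impute_super_income(200_000.0, balances, weights)
```

Individuals with no balance receive no superannuation income, and the weighted sum of imputed incomes reproduces the aggregate.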
For those aged under 60, we estimate a regression model of superannuation balances on age, labour income and sex (as well as interactions). For those aged 60 and over, the model is enriched by including superannuation income. The coefficient estimates (see Appendix A.3) are then used to impute superannuation balances in the SIH data for years with no information, by using the set of estimated coefficients from the closest year available. This means that superannuation balances from 1991 to 2002 are all estimated based on the 2003 model. This approach is likely to generate some prediction errors. However, we note that superannuation wealth was limited in the 1990s, since compulsory contributions only commenced in 1992, initially at only 3% of gross earnings and gradually increasing to 9% as of 1 July 2002.[5] Moreover, it is the relative distribution of superannuation balances that matters for imputation and not the absolute values, and relativities by labour income, age and sex are likely to have remained broadly stable between 1991 and 2003.
Net operating surplus of households and non-profit institutions serving households (NOSHN)
ABS National Accounts data report only gross and not net operating surplus of households and non-profit institutions serving households. We use the share of the consumption of fixed capital attributable to operating surplus in NOSHN from the WID Australian National Accounts data (see footnote 5 above) to derive net operating surplus from the ABS National Accounts data on gross operating surplus.
We then impute NOSHN proportionally to each household’s net imputed rent. Where a household comprises more than one adult, the income is equally split. Gross and net imputed rents are directly provided in the SIH from 2005 onwards.[6] For earlier years, we predict gross and net imputed rents. Using 2005 values, we estimate a model to predict gross imputed rents based on reported tenure type, state of residence, area of residence, number of bedrooms, household gross income decile and landlord type. This draws heavily on the approach developed by the ABS (ABS 2008a). For net imputed rent, all covariates listed above are interacted with (predicted) gross imputed rent, and we add mortgage repayments and predicted gross imputed rent to the list of covariates. Coefficient estimates are reported in Appendix A.4. All models are estimated with and without tenure type, as this variable was not available before 1995 in the SIH and thus cannot be used for imputation before that year. These models fit the data well, with an adjusted R-squared of 0.97 for gross imputed rent and 0.69 for net imputed rent.
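The allocation of NOSHN across households and then across adults can be sketched as follows. This is an illustration only: the dictionary keys, the function name and the values are assumptions, and the sketch presumes non-negative net imputed rents.

```python
def allocate_noshn(noshn_total, households):
    """Return the NOSHN amount per adult for each household.

    households: list of dicts with 'weight', 'net_imputed_rent' and 'n_adults'.
    """
    # Weighted total of net imputed rent across all households
    base = sum(h["weight"] * h["net_imputed_rent"] for h in households)
    per_adult = []
    for h in households:
        household_amount = noshn_total * h["net_imputed_rent"] / base
        # Equal split among the adults of the household
        per_adult.append(household_amount / h["n_adults"])
    return per_adult


households = [
    {"weight": 100.0, "net_imputed_rent": 10_000.0, "n_adults": 2},
    {"weight": 200.0, "net_imputed_rent": 5_000.0, "n_adults": 1},
]
noshn_per_adult = allocate_noshn(400_000.0, households)
```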
Other capital income
Other capital income has two components: that captured by SIH and that not captured by SIH, the latter of which is a residual equal to total capital income[7] from the National Accounts minus superannuation income from the National Accounts minus non-pension capital income as measured in SIH. This non-captured capital income will primarily comprise corporate retained earnings. We distribute it assuming it has the same distribution as observed other (non-superannuation non-imputed rent) capital income. We take the same approach for adding foreign income received from tax havens and reinvested earnings on foreign portfolio investment. The latter captures retained earnings in foreign firms accruing to Australians whose shares comprise less than the 10% foreign direct investment threshold required to appear in the National Accounts. We use WID estimates of foreign income received from tax havens and reinvested earnings on foreign portfolio investment (see Zucman 2013).
Grossing-up factors reported in Appendix Table A.2 indicate that this captured capital income has to be multiplied by a factor of between 2 and 4.3 to match National Accounts totals.
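The residual calculation and its distribution, as described for other capital income, can be sketched as follows. This is a simplified illustration with made-up figures; the function names are hypothetical, and the sketch leaves out the additional foreign income components.

```python
def uncaptured_capital_income(na_total_capital_income,
                              na_superannuation_income,
                              sih_non_pension_capital_income):
    """Residual capital income not captured by the survey."""
    return (na_total_capital_income
            - na_superannuation_income
            - sih_non_pension_capital_income)


def distribute_like_observed(residual, observed, weights):
    """Add the residual to each record in proportion to its observed
    other capital income."""
    base = sum(y * w for y, w in zip(observed, weights))
    return [y + residual * y / base for y in observed]


residual = uncaptured_capital_income(1_000.0, 200.0, 300.0)
observed = [0.0, 50.0, 200.0]  # toy observed other capital income
weights = [1.0, 1.0, 1.0]      # toy sample weights
adjusted_other = distribute_like_observed(residual, observed, weights)
```

Records with no observed other capital income receive none of the residual, which reflects the assumption that the uncaptured income has the same distribution as the captured income.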