Content uploaded by Aleksandr G. Alekseev

Author content

All content in this area was uploaded by Aleksandr G. Alekseev on Jul 09, 2019

Content may be subject to copyright.

All content in this area was uploaded by Aleksandr G. Alekseev on Dec 25, 2018

Using Response Times to Measure Ability on a Cognitive Task∗

Aleksandr Alekseev†

March 27, 2019

Abstract

I show how using response times as a proxy for eﬀort coupled with an explicit process-based

model can address a long-standing issue of how to separate the eﬀect of cognitive ability on

performance from the eﬀect of motivation. My method is based on a dynamic stochastic model

of optimal eﬀort choice in which ability and motivation are the structural parameters. I show

how to estimate these parameters from the data on outcomes and response times in a cognitive

task. In a laboratory experiment, I ﬁnd that performance on a Digit-Symbol test is a noisy and

biased measure of cognitive ability. Ranking subjects by their performance leads to an incorrect

ranking by their ability in a substantial number of cases. These results suggest that interpreting

performance on a cognitive task as ability may be misleading.

Keywords: cognitive ability, test scores, response times, drift-diﬀusion model, choice-process

data

JEL codes: C24, C41, C91, D91, J24

∗I thank Jim Cox, Glenn Harrison, Susan Laury, Tom Mroz, Vjollca Sadiraj, and Todd Swarthout for their

valuable comments and suggestions. I thank conference participants at the Economic Science Association meetings,

the Southern Economic Association meetings, and the Western Economic Association meetings, as well as seminar

participants at Georgia State University, University of California San Diego, the University of Chicago, and Chapman

University for their feedback. This work has been supported by the Andrew Young School Dissertation Fellowship.

†Economic Science Institute, Chapman University, One University Drive, Orange, CA, 92866, e-mail:

alekseev@chapman.edu, phone: +1 (714) 744-7083, ORCID: 0000-0001-6542-1920.

1 Introduction

Correct measurement of cognitive ability is essential since ability is used as an explanatory variable
in a vast array of contexts. Economists have been using cognitive ability to explain differences in
earnings (Murnane et al., 1995; Heckman et al., 2006, 2013), risk and time preferences (Dohmen
et al., 2010), the quality of decision-making (Agarwal and Mazumder, 2013), strategic reasoning
(Gill and Prowse, 2016), as well as differences in various life outcomes, such as teenage pregnancy,
marital status, smoking, and engaging in criminal activities (Duckworth et al., 2011). This literature
traditionally uses performance on a cognitive test as a measure of cognitive ability. A fundamental
flaw in this approach is that performance never reflects cognitive ability by itself. Performance also
reflects character skills, such as motivation (Borghans et al., 2008; Duckworth et al., 2011; Segal,
2012).¹ The traditional approach thus confounds actual ability with the combination of ability and
motivation, which may result in wrong conclusions about the effect of ability. Using performance
as a proxy for ability could be justified if subjects' heterogeneity in motivation is small relative
to their heterogeneity in ability. However, the existing literature provides no way to empirically
evaluate this assumption.

I propose a new approach to measure cognitive ability that overcomes the issues with the

traditional approach. My method is based on a dynamic stochastic model of optimal eﬀort choice

in which ability and motivation are the structural parameters. I show how these parameters can

be separately identiﬁed from the data on outcomes and response times in a cognitive task. The

proposed method is based on explicit modeling of the decision-making process and is inspired by the
literature on drift-diffusion models (Ratcliff, 1978; Krajbich et al., 2012; Woodford, 2014; Clithero,
2018; Webb, 2019). These models have been shown to perform well in jointly predicting outcomes
and response times, as well as to match the actual processes in the brain.

I use response times as a proxy for effort, following Wilcox (1993) and Ofek et al. (2007). An
agent's effective effort is modeled as a Brownian motion with drift in which the drift rate represents
the agent's ability. Higher ability leads to faster accumulation of effective effort. The accumulated
effective effort at a given time determines the probability of answering a question correctly. A correct
answer yields utility that represents an agent's motivation. Effort is costly, and the more time an
agent spends on a task, the higher the accumulated cost of effort will be. The agent's problem is
to choose the optimal moment to stop the effective effort accumulation process. The solution to
the agent's problem takes the form of a threshold rule in terms of the accumulated effective effort.
I derive a closed-form solution for the optimal threshold and show how it is related to ability and
motivation. The parameters of the model can be estimated by maximum likelihood from the data
on outcomes and response times in a series of trials of a cognitive task. The
proposed estimation strategy can be viewed as a version of a threshold regression model used in
survival analysis (Lee and Whitmore, 2006).

¹ For example, consider two students, Adam and Bob, who are taking a cognitive test. Adam has high cognitive
ability but is not interested in the outcome of the test. Bob, on the other hand, has lower cognitive ability but is
highly motivated to get the right answers. As a result, Bob might end up having a higher score on the test, which
according to the traditional approach would imply that Bob has higher ability than Adam, while in reality, their
ranking by ability is the opposite.

I conduct a laboratory experiment to illustrate the proposed approach and compare it to the

traditional approach. In the experiment, subjects take a Digit-Symbol test (DST) in which they
have to match symbols to digits. The DST is designed to capture a subject's processing speed, which
underlies more complex cognitive functions. The DST is used in the economics literature (Segal, 2012;
Dohmen et al., 2010) and in intelligence scales such as WAIS (Weiss et al., 2010). Subjects are

free to choose how much time to spend on a task and are not extrinsically motivated for good

performance. I estimate ability and motivation for each subject individually and use the structural

model to perform a counterfactual simulation in which the only source of variation in performance

is variation in ability. I ﬁnd that performance is a noisy and biased measure of ability. Variation in

ability can explain only 0.58 of the variation in observed performance. Subjects with relatively low

ability have lower performance than they would have if performance were an unbiased measure of

ability, while subjects with relatively high ability have even higher performance than they would

have. Ranking subjects by performance leads to an incorrect ranking by ability 24% of the time.

These results suggest that more care should be given when interpreting performance as cognitive

ability since such an interpretation may be misleading. The present paper, however, should be
viewed as a first step towards uncoupling the effects of ability and motivation on performance. More work is

needed to understand how well performance approximates ability in other cognitive and real-eﬀort

tasks used in the literature. The main goal of the present paper is to provide the tools for this work

and to illustrate the usefulness of choice-process data and process-based modeling in developing

such tools.

2 Theoretical Model

Consider an agent working on a trial of a cognitive task. An outcome of the trial can be either a

success (the answer given by the agent is correct) or a failure (the answer given by the agent is

incorrect). The agent can exert eﬀort, approximated by response time t, to increase the probability

of success. Following the literature on the drift-diffusion model, I assume that the agent accumulates
effective effort $E_t$ according to a Brownian motion with drift:

$$dE_t = \alpha\,dt + \sigma\,dW_t, \qquad E_0 = 0, \tag{1}$$

in which the drift rate $\alpha > 0$ represents the agent's ability and the diffusion parameter $\sigma > 0$
represents her (inverse of) consistency.² Ability in this model is equivalent to the efficiency, or

intensity, of converting eﬀort (time spent on a trial) into performance (probability of success). This

eﬃciency can vary across agents for a given task but is assumed to be ﬁxed over the trials of a

task for each agent. Given a ﬁxed amount of time, an agent with higher ability will have higher

performance on the task than an agent with lower ability. Having higher ability in the model thus

corresponds well to an intuitive notion of being good, or able, at doing something.
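The role of the drift rate in process (1) can be illustrated with a small simulation. The sketch below is only an illustration under assumed parameter values (the values of `alpha`, `sigma`, the time horizon, and the step size `dt` are hypothetical and do not come from the paper): it discretizes the process with an Euler scheme and averages the effective effort accumulated by a fixed time.

```python
import math
import random


def simulate_effort(alpha, sigma, horizon, dt=0.01, rng=random):
    """Euler discretization of dE = alpha*dt + sigma*dW with E_0 = 0.

    Returns the effective effort accumulated by time `horizon`.
    """
    n_steps = int(round(horizon / dt))
    step_sd = sigma * math.sqrt(dt)  # std. dev. of one Brownian increment
    effort = 0.0
    for _ in range(n_steps):
        effort += alpha * dt + step_sd * rng.gauss(0.0, 1.0)
    return effort


def mean_effort(alpha, sigma, horizon, n_paths=2000, seed=0):
    """Average accumulated effective effort over independent simulated trials."""
    rng = random.Random(seed)
    total = sum(simulate_effort(alpha, sigma, horizon, rng=rng)
                for _ in range(n_paths))
    return total / n_paths


if __name__ == "__main__":
    # Hypothetical ability levels; sigma and the horizon are held fixed.
    for alpha in (1.0, 2.0):
        print(alpha, round(mean_effort(alpha, sigma=1.0, horizon=1.0), 3))
```

With time held fixed, the average accumulated effective effort should be close to $\alpha$ times the horizon, so the higher-ability agent ends up with more effective effort and hence a higher probability of success.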

The agent stops the accumulation of effective effort and gives an answer to a trial when the
effective effort process (1) hits a threshold. Unlike in a typical drift-diffusion model, I assume that
there is a single threshold that is chosen optimally by the agent.³ The agent uses a discount rate
$\rho > 0$ and is assumed to experience a unit cost of effort, in utility terms, per unit of effort spent
on a trial. The utility of success is $\mu > 0$, and the utility of failure is normalized to zero. Utility
$\mu$ represents the agent's motivation for succeeding on a task, which is allowed to be agent- and/or
task-specific.⁴ The probability of success $p(\cdot)$ depends on the accumulated effective effort. I assume
that $p(\cdot)$ is strictly increasing and strictly concave. At time $\tau$, the agent's discounted expected
utility is

$$\mathbb{E}\left[\int_0^{\tau} -e^{-\rho t}\,dt + \mu\,p(E_\tau)\,e^{-\rho\tau}\right], \tag{2}$$

which is the sum of the (negative) accumulated discounted cost of effort and the expected discounted
benefit from a success on a trial.

² The starting point of the effective effort process can be initialized at a value other than 0 to allow for the case
of multiple-answer questions. The starting value $E_0$ then would be chosen so that the probability of success at $E_0$
equals 1/[number of answer options].

³ Ratcliff and Van Dongen (2011) also study a single-threshold diffusion model; however, they do not consider
utility maximization.

⁴ Strictly speaking, motivation in this model is measured in the units of the cost of effort.

The agent chooses when to stop the accumulation of effective effort in order to maximize the
utility function (2). The solution to the agent's problem is a stopping rule in terms of the accumulated
effective effort. The agent continues the accumulation of effective effort $E_t$ until it reaches
a threshold of $E^*$. The agent stops as soon as the threshold is hit.⁵ Since the effective effort
accumulation process is stochastic, the optimal response time required to hit the threshold $E^*$ is
a random variable.⁶ Under the assumption of a Brownian motion with drift, the optimal response
time $\tau^*$ has an inverse Gaussian distribution with the pdf

$$f(\tau^*) = \frac{E^*}{\sqrt{2\pi\sigma^2(\tau^*)^3}} \exp\left(-\frac{(E^* - \alpha\tau^*)^2}{2\sigma^2\tau^*}\right). \tag{3}$$
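As a quick numerical sanity check on the density (3), the sketch below evaluates it on a fine grid and integrates it with a simple midpoint rule. The parameter values are hypothetical and chosen only for illustration; the helper `integrate` is an ad hoc utility rather than part of the paper's method.

```python
import math


def response_time_pdf(t, threshold, alpha, sigma):
    """Density (3): first time the process (1) hits `threshold`."""
    if t <= 0:
        return 0.0
    scale = threshold / math.sqrt(2.0 * math.pi * sigma**2 * t**3)
    return scale * math.exp(-(threshold - alpha * t)**2 / (2.0 * sigma**2 * t))


def integrate(f, lo, hi, n=20000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


if __name__ == "__main__":
    # Hypothetical values: threshold E* = 2, ability alpha = 2, sigma = 1.
    e_star, alpha, sigma = 2.0, 2.0, 1.0
    mass = integrate(lambda t: response_time_pdf(t, e_star, alpha, sigma), 0.0, 20.0)
    mean = integrate(lambda t: t * response_time_pdf(t, e_star, alpha, sigma), 0.0, 20.0)
    print(mass, mean, e_star / alpha)
```

The total mass should be close to one, and the mean response time should be close to $E^*/\alpha$, the standard mean of an inverse Gaussian first-passage time.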

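The threshold $E^*$ must satisfy the following optimality condition:⁷

$$\frac{\rho}{\beta}\,p'(E^*) - \rho\,p(E^*) = \frac{1}{\mu}, \qquad \text{where } \beta \equiv \frac{-\alpha + \sqrt{\alpha^2 + 2\rho\sigma^2}}{\sigma^2}. \tag{4}$$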

For the empirical application of the method, I consider a special case of the model in which the
discount rate $\rho$ tends to zero.⁸ In this case, the optimality condition for $E^*$ becomes

$$p'(E^*) = \frac{1}{\alpha\mu}. \tag{5}$$

To solve the problem analytically, I further assume that $p(E) = 1 - e^{-E}$. The optimal threshold
for the agent's problem is then simply

$$E^*(\alpha, \mu) = \ln\alpha + \ln\mu. \tag{6}$$
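The step from condition (4) to the closed form (6) can be checked numerically. The sketch below rests on two assumptions of mine that are not spelled out in the text: first, that $\beta$ can be rewritten as $2\rho / (\alpha + \sqrt{\alpha^2 + 2\rho\sigma^2})$, which is algebraically equivalent to the definition in (4) and makes the limit $\rho/\beta \to \alpha$ visible; second, that in the no-discounting limit the objective for a threshold $E$ reduces to $\mu\,p(E) - E/\alpha$, using the mean hitting time $E/\alpha$. All numerical values are hypothetical.

```python
import math


def beta(alpha, sigma, rho):
    """beta from (4), in an equivalent form that is stable for small rho."""
    return 2.0 * rho / (alpha + math.sqrt(alpha**2 + 2.0 * rho * sigma**2))


def success_probability(effort):
    """Assumed functional form p(E) = 1 - exp(-E)."""
    return 1.0 - math.exp(-effort)


def expected_utility(threshold, alpha, mu):
    """No-discounting objective: benefit mu*p(E) minus expected time E/alpha."""
    return mu * success_probability(threshold) - threshold / alpha


def optimal_threshold(alpha, mu):
    """Closed-form threshold from equation (6)."""
    return math.log(alpha) + math.log(mu)


if __name__ == "__main__":
    alpha, mu, sigma = 7.28, 2.0, 1.0  # hypothetical values
    for rho in (1e-2, 1e-4, 1e-8):
        print(rho, rho / beta(alpha, sigma, rho))  # approaches alpha
    grid = [i / 1000 for i in range(1, 10001)]
    best = max(grid, key=lambda e: expected_utility(e, alpha, mu))
    print(best, optimal_threshold(alpha, mu))
```

A grid search over thresholds should land next to $\ln\alpha + \ln\mu$, and at that point $p'(E^*) = e^{-E^*}$ equals $1/(\alpha\mu)$, as condition (5) requires.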

⁵ If $E_0 = E^*$, the agent should give an answer immediately.

⁶ Importantly, the probability of success on a trial will not depend on the realized value of the response time.
Figure C.5 in Online Appendix C empirically validates this prediction.

⁷ See Online Appendix B for the derivations.

⁸ In the experiment, decisions are made on timescales of under a minute. Discounting is unlikely to have a
meaningful effect on such short timescales.

It follows from equation (6) that the agent's optimal threshold, and hence her performance,
is increasing in both ability and motivation.⁹ The effect of ability on average effort, $\bar{\tau}^* = E^*/\alpha$,
cannot be unambiguously signed. For agents with high $E^*$ ($> 1$), an increase in ability will lead
to lower average effort, and vice versa. This result is a natural consequence of the concavity of
$p$:¹⁰ at high levels of $E^*$ the marginal increase in $E^*$ due to higher ability will be lower than the
marginal increase in $\alpha$. Hence the average effort needed to reach a higher effective effort threshold
will be lower. The comparative statics results for the optimal threshold and average effort imply,
in particular, that one cannot use a single measure, either performance or effort, to identify ability.
However, combining the two pieces of data does allow one to separate the effect of ability from the
effect of motivation, as the next section shows.
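The ambiguous sign of the effect of ability on average effort follows from differentiating $\bar{\tau}^* = (\ln\alpha + \ln\mu)/\alpha$ with respect to $\alpha$, which gives $(1 - E^*)/\alpha^2$. The sketch below tabulates this for a few hypothetical ability levels with motivation held at an assumed value of 1; none of the numbers come from the paper.

```python
import math


def optimal_threshold(alpha, mu):
    """Equation (6): E* = ln(alpha) + ln(mu)."""
    return math.log(alpha) + math.log(mu)


def average_effort(alpha, mu):
    """Mean response time E*/alpha implied by the optimal threshold."""
    return optimal_threshold(alpha, mu) / alpha


if __name__ == "__main__":
    mu = 1.0  # hypothetical motivation level
    for alpha in (1.5, 2.0, math.e, 4.0, 5.0):
        print(round(alpha, 3),
              round(optimal_threshold(alpha, mu), 3),
              round(average_effort(alpha, mu), 4))
```

In this illustration the threshold rises with ability throughout, while average effort rises up to the point where $E^* = 1$ (here $\alpha = e$) and falls afterwards, which is why effort alone cannot rank agents by ability.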

3 Estimation Strategy

Suppose that we observe a sequence of $N$ independent and identical trials of a cognitive task
performed by an individual. Each observation is a pair $(x_i, t_i)$, $i = 1, \dots, N$, where $x_i \in \{0, 1\}$ is
an outcome of a trial $i$ and $t_i > 0$ is a response time in that trial. The likelihood of an observation
$i$, conditional on the parameters of the model $\theta \equiv (\alpha, \mu, \sigma)$, is

$$l(x_i, t_i \mid \theta) = p(E^*(\theta))^{x_i}\,\bigl(1 - p(E^*(\theta))\bigr)^{1 - x_i}\,f(t_i \mid \theta). \tag{7}$$

The first part is simply the Bernoulli likelihood. The second part, $f(t_i \mid \theta)$, is the likelihood that the
stochastic process (1) hits the threshold $E^*$ at time $t_i$, given by (3). The ability of each individual,
as well as the two other parameters of the model, can then be estimated using the maximum
likelihood method:

$$\hat{\theta} = \arg\max_{\theta}\ \ln L(\theta \mid \mathbf{x}, \mathbf{t}) \equiv \sum_{i=1}^{N} \ln l(x_i, t_i \mid \theta). \tag{8}$$
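The objective in (8) can be written down directly from (3), (6), and (7). The sketch below is one possible implementation under the functional form $p(E) = 1 - e^{-E}$; the function name and the convention of returning minus infinity outside the region where $E^* > 0$ are my own choices, and any numerical optimizer could be used to maximize the result.

```python
import math


def log_likelihood(theta, outcomes, times):
    """Log-likelihood (8) for theta = (alpha, mu, sigma).

    `outcomes` holds 0/1 trial results and `times` the response times.
    """
    alpha, mu, sigma = theta
    if alpha <= 0 or mu <= 0 or sigma <= 0 or alpha * mu <= 1:
        return float("-inf")  # outside the region where E* > 0
    threshold = math.log(alpha) + math.log(mu)  # equation (6)
    p_success = 1.0 - math.exp(-threshold)      # p(E*) with p(E) = 1 - exp(-E)
    total = 0.0
    for x, t in zip(outcomes, times):
        # Bernoulli part of (7)
        log_bernoulli = math.log(p_success) if x == 1 else math.log(1.0 - p_success)
        # Inverse Gaussian part of (7), the log of density (3)
        log_density = (math.log(threshold)
                       - 0.5 * math.log(2.0 * math.pi * sigma**2 * t**3)
                       - (threshold - alpha * t)**2 / (2.0 * sigma**2 * t))
        total += log_bernoulli + log_density
    return total


if __name__ == "__main__":
    # Hypothetical data: ten trials, nine correct, times in seconds.
    outcomes = [1] * 9 + [0]
    times = [18.0, 22.0, 20.0, 25.0, 19.0, 21.0, 17.0, 24.0, 23.0, 21.0]
    print(log_likelihood((0.11, 90.0, 0.25), outcomes, times))
```

Working with the log of each factor rather than the product of raw likelihoods keeps the sum numerically stable over the 100 trials of the experiment.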

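There are three moments in the data and two functional relations, $p(E)$ and $E^*(\alpha, \mu)$, that
exactly identify the three parameters of the model. Let $X$ be a Bernoulli random variable encoding
an outcome of a trial, and $T$ be an inverse Gaussian random variable encoding a response time.
Then $\mathbb{E}[X] = p(E^*)$, which yields an estimate of the optimal threshold: $\widehat{E^*} = p^{-1}(\bar{X})$. The second
moment is $\mathbb{E}[T] = E^*/\alpha$, which yields an estimate of ability: $\hat{\alpha} = p^{-1}(\bar{X})/\bar{T}$. Equation (6) then yields an
estimate of motivation: $\hat{\mu} = \frac{\bar{T}}{p^{-1}(\bar{X})}\exp\left(p^{-1}(\bar{X})\right)$. Finally, the third moment,
$\mathbb{E}\left[\frac{1}{T}\right] = \frac{1}{\mathbb{E}[T]} + \frac{\sigma^2}{(E^*)^2}$,
yields an estimate of $\sigma^2$: $\widehat{\sigma^2} = \left(p^{-1}(\bar{X})\right)^2\left(\overline{1/T} - \frac{1}{\bar{T}}\right)$.¹¹
Here $\bar{X}$, $\bar{T}$, and $\overline{1/T}$ denote the sample means of the outcomes, the response times, and the
inverse response times.

⁹ It is straightforward to show using (5) that these comparative statics results hold for any increasing and concave
function $p$.

¹⁰ Diminishing marginal product of effective effort appears to be a reasonable assumption for a cognitive production
function $p$.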
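The moment-based formulas above translate into a few lines of code. The sketch below assumes $p(E) = 1 - e^{-E}$, so that $p^{-1}(\bar{X}) = -\ln(1 - \bar{X})$; the function name, the returned dictionary, and the example data are hypothetical. It raises an error for a perfect (or zero) score, where the inverse of $p$ is not finite.

```python
import math


def estimate_parameters(outcomes, times):
    """Moment-based estimates of threshold, ability, motivation, and sigma^2."""
    n = len(outcomes)
    x_bar = sum(outcomes) / n                   # share of correct answers
    if not 0.0 < x_bar < 1.0:
        raise ValueError("the score must lie strictly between 0 and 1")
    t_bar = sum(times) / n                      # mean response time
    inv_t_bar = sum(1.0 / t for t in times) / n  # mean inverse response time
    threshold = -math.log(1.0 - x_bar)          # p^{-1}(x_bar) for p(E) = 1 - exp(-E)
    alpha = threshold / t_bar                   # from E[T] = E*/alpha
    mu = math.exp(threshold) / alpha            # from E* = ln(alpha) + ln(mu)
    sigma2 = threshold**2 * (inv_t_bar - 1.0 / t_bar)
    return {"threshold": threshold, "alpha": alpha, "mu": mu, "sigma2": sigma2}


if __name__ == "__main__":
    # Hypothetical data: ten trials, nine correct, times in seconds.
    outcomes = [1] * 9 + [0]
    times = [18.0, 22.0, 20.0, 25.0, 19.0, 21.0, 17.0, 24.0, 23.0, 21.0]
    print(estimate_parameters(outcomes, times))
```

Because the mean of inverse response times is at least the inverse of the mean response time, the estimate of $\sigma^2$ is non-negative whenever the response times are not all identical.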

4 Experiment

To illustrate the method, I conducted an experiment at the Experimental Economics Center lab

at Georgia State University (GSU) in June 2017 and March-April 2018. The experiment consists

of 11 sessions with 192 participants in total. The subjects in the experiment are undergraduate

students at GSU. The average earnings in the experiment are $36.35.

The main part of the experiment is a cognitive task, which is a version of a Digit-Symbol test
(DST).¹² In a DST, subjects have to find correct correspondences between digits and symbols. In
the present implementation, subjects are given a key with six digit-symbol pairs and a list of 14
symbols to fill six numbered boxes.¹³ The DST consists of 100 trials in which the key and the
list of available symbols change in every trial.¹⁴ Subjects are free to choose how much time to
spend on each trial.¹⁵ Unconstrained (or endogenous, in the language of Spiliopoulos and Ortmann
(2018)) response time is important in this context since response time is assumed to be the only
margin of effort in the experiment. In order to minimize the interdependency between the rounds
(e.g., via learning) as much as possible, I do not provide subjects with any feedback between the
rounds. Subjects learn their score only at the end of the experiment. The incentives in the DST
are flat: each subject receives $20 for completion regardless of performance. This incentive scheme

allows one to elicit a subject's intrinsic motivation since good performance is not extrinsically
incentivized.¹⁶

¹¹ The functional form assumption $p(E) = 1 - e^{-E}$ involves an implicit normalization. One could introduce an
additional parameter $\gamma$ in the probability of success function, $p(E) = 1 - e^{-\gamma E}$, which in the present case is normalized
to 1. Then one would have to normalize $\sigma^2$ to 1 and estimate $\gamma$.

¹² The experiment also included a risk elicitation task and a survey, results of which are not reported here.

¹³ See Appendix A for the subject instructions and screenshots.

¹⁴ In a traditional implementation of a DST, the key does not change across trials. Performance on a traditional
DST then captures subjects' working memory in addition to processing speed. In the present context, however,
processing speed is the only quantity of interest. See Benndorf et al. (2018) for a similar argument.

¹⁵ In most implementations, the time that subjects are allowed to spend on a cognitive task is constrained. The performance
measure that is used in the present experiment, i.e., performance with no time constraint, is, therefore, not strictly
identical to the performance measures typically used. The underlying message, however, would remain the same even
if the time were constrained: in order to separate the effect of ability from the effect of motivation, one needs to
supplement a measure of performance with a measure of effort. In the case of a time constraint, however, the relevant
measure of effort would be difficult to observe.

The benefit of a DST is that it measures fluid intelligence, i.e., the ability to solve novel problems
that do not rely on any cultural background or accumulated knowledge for solution (Cattell, 1971).
Performance on a DST is associated with processing speed. Processing speed is positively associated
with other IQ measures since the processing speed is the basis for more complex cognitive functions
(Vernon, 1983). In economics, researchers have used a DST to study the relationship between
cognitive ability and risk and time preferences (Dohmen et al., 2010) and the role of motivation in
performance (Segal, 2012).

5 Results

Subjects perform surprisingly well on the DST given that they were not extrinsically rewarded for

good performance. The median score is 92 and the interquartile range (IQR) for the score is only

9. Such high scores suggest that the subjects had non-trivial levels of intrinsic motivation in the

task. The median subject took 20.6 seconds on average to complete a single trial. The IQR for the
mean response time (MRT) is 6.9 seconds.¹⁷ To get a rough idea of processing speed, one can look at the

ratio of a score to the total time spent on a task. According to this measure, the median subject

gave 2.6 correct answers per minute.

Figure 1 shows the distribution of the individual-level estimates¹⁸ of the ability parameter
$\alpha$.¹⁹ The distribution is bell-shaped and concentrated around the median but asymmetric. The
sample distribution has a higher mass of subjects with a just-below-median ability and a longer
and fatter right tail relative to a reference normal distribution. The graph shows considerable
variation in ability among subjects. For example, a subject at the 75th percentile is 1.5 times
better at converting their exerted effort into accumulated effective effort than a subject at the 25th
percentile. A subject with the highest ability is 2 times better than a median subject and 6.3 times
better than a subject with the lowest ability.

[Figure 1: Distributions of Raw Ability. Horizontal axis: Ability, with breaks at 2.32, 5.89, 7.28, 9.10, and 14.70; vertical axis: Density.]

Note: The figure shows the distribution of the individual-level estimates of ability in the sample. The smooth
solid line is the kernel density estimate, the vertical bars are the histogram, the dotted line is the reference
density of a normal distribution with the parameters matching the sample moments, and the vertical dashed
line is the sample median. The breaks on the horizontal axis correspond to the quintiles of the distribution.

¹⁶ An important modification of this baseline design would be to introduce variable conditional rewards, which
would allow one to study how parameter estimates change with the reward level.

¹⁷ The distribution of mean response times is long-tailed and well-approximated by an inverse Gaussian distribution.
See Figure C.2 in Online Appendix C for the distribution of scores and response times in the sample.

¹⁸ See Figure C.3 in Online Appendix C for a quantile probability plot of a model's fit.

¹⁹ The subjects with a perfect score of 100 (6 subjects or 3% of the sample) were assigned a score of 99 by randomly
selecting a trial and assigning it as incorrectly solved. The model cannot be estimated in the case of a perfect score.
This is a finite sample issue: adding more trials would likely eliminate the instances of perfect scores. Excluding the
subjects with perfect scores does not alter the results significantly.

The advantage of having an explicit structural model is that it allows one to conduct a counterfactual
exercise. This exercise asks the following question: What would the distribution of performance
in the sample look like if it varied only with ability? Answering this question is
important because it empirically evaluates how well performance approximates true ability. If the
two distributions are similar, performance is a good proxy for ability. If the two distributions are
different, a correction method, such as the one proposed here, is required. I use formula (6) to
compute the optimal effort threshold $E_i^*$ and the probability of success $p_i^*$ implied by this threshold
for each subject while holding motivation fixed at the median level. This procedure yields
the distribution of counterfactual performance that would arise in the sample due to variation in
ability alone. Note that this counterfactual performance is stripped of all the variation in
motivation and thus represents an unconfounded measure of ability.
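The counterfactual calculation described above amounts to plugging each subject's estimated ability and the median estimated motivation into (6) and then into $p(E) = 1 - e^{-E}$, which simplifies to $1 - 1/(\alpha_i \mu_{\text{med}})$. The sketch below shows this step; the ability values are taken from the axis breaks of Figure 1 purely as example inputs, and the motivation values are hypothetical.

```python
import math
import statistics


def counterfactual_performance(alphas, mus):
    """Success probability per subject with motivation fixed at its median."""
    mu_median = statistics.median(mus)
    result = []
    for alpha in alphas:
        threshold = math.log(alpha) + math.log(mu_median)  # equation (6)
        result.append(1.0 - math.exp(-threshold))          # p(E) = 1 - exp(-E)
    return result


if __name__ == "__main__":
    alphas = [5.89, 7.28, 9.10]  # example ability estimates
    mus = [1.0, 2.0, 4.0]        # hypothetical motivation estimates
    print(counterfactual_performance(alphas, mus))
```

Since motivation is held fixed, the resulting values are a strictly increasing function of ability alone, so ranking subjects by counterfactual performance is the same as ranking them by estimated ability.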

Before comparing the observed and counterfactual performance, it is worth recalling a simple
model of performance as a noisy measure of the true underlying ability:

$$P_1 = P_0 + \varepsilon, \tag{9}$$

[Figure 2: Observed and Counterfactual Performance. Panel A: Density (vertical axis) against Performance (horizontal axis) for the Counterfactual and Observed series. Panel B: Observed Performance (vertical axis) against Counterfactual Performance (horizontal axis).]

Note: Panel A shows the kernel density estimates of the distributions of observed and counterfactual performance.
Panel B shows the scatterplot of the observed and counterfactual performance. The dotted line
is the 45-degree line. The dashed line is the linear fit.

where $P_1$ is the observed performance, $P_0$ is the counterfactual performance defined as above that
reflects the true underlying ability,²⁰ and $\varepsilon$ is a mean-zero noise term. Noise in this model is caused
by the variation in motivation. For the observed performance to be a good proxy for ability, the
variance of the noise $\sigma^2_{\varepsilon}$ should be small relative to the variance of the observed performance $\sigma^2_{P_1}$,
and the noise term should be orthogonal to ability, $\operatorname{cov}(P_0, \varepsilon) = 0$. The observed performance in
our case fails to satisfy either property.

Figure 2 (Panel A) plots the kernel density estimates of the distributions of the observed and
counterfactual performance. It is immediately clear that the observed performance contains a
substantial degree of noise. The ratio of the variance of the noise to the variance of the observed
performance, $\sigma^2_{\varepsilon}/\sigma^2_{P_1}$, is 0.55. Such a high noise component results in only a moderate association
between the observed and counterfactual performance: variation in the counterfactual performance
can explain only 0.58 of the variation in the observed performance in a simple linear regression.
If one cares only about whether the ranking by performance is similar to the ranking by ability,
the picture is similarly unsatisfactory (Kendall's $\tau = 0.54$, p-value < 0.001). In particular, ranking
subjects by their performance leads to an incorrect ranking by their ability in 24% of cases.²¹
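The share of incorrectly ranked cases can be read as the share of subject pairs whose order by observed performance disagrees with their order by counterfactual performance. The sketch below computes that share; skipping tied pairs is my own convention and may differ from the paper's treatment, and the four data points are hypothetical.

```python
from itertools import combinations


def misranking_rate(observed, counterfactual):
    """Share of subject pairs ordered differently by the two measures.

    Pairs tied on either measure are skipped (an assumed convention).
    """
    discordant = 0
    comparable = 0
    for (o1, c1), (o2, c2) in combinations(zip(observed, counterfactual), 2):
        product = (o1 - o2) * (c1 - c2)
        if product == 0:
            continue  # tie on at least one measure
        comparable += 1
        if product < 0:
            discordant += 1
    return discordant / comparable if comparable else 0.0


if __name__ == "__main__":
    observed = [0.85, 0.95, 0.91, 0.99]        # hypothetical scores
    counterfactual = [0.90, 0.92, 0.94, 0.96]  # hypothetical ability-only scores
    print(misranking_rate(observed, counterfactual))
```

Absent ties, Kendall's $\tau$ equals one minus twice this rate, so $\tau = 0.54$ would correspond to roughly 23% of pairs, close to the reported 24%; the small gap presumably reflects ties or rounding.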

²⁰ For simplicity, I assume that there is no measurement error in the counterfactual performance, which of course
will not be true in practice. Allowing for this additional measurement error would only increase the overall noisiness
of the observed performance.

²¹ An alternative way to interpret this number is that the probability that two subjects taken at random will have
incorrect ability ranking, as implied by performance, is 24%.

However, the observed performance is not just a highly noisy measure of ability. It is also a

biased measure. The issue lies in the fact that ability, as measured by the counterfactual performance,
is positively associated with the noise term (Kendall's $\tau = 0.28$, p-value < 0.001). Panel B
of Figure 2 illustrates the bias that results from this association by presenting a scatterplot of the

observed performance against the counterfactual performance. If the observed performance were

an unbiased measure of true ability, the dots on the graph would lie along the 45-degree line (dotted

line on the graph). This is clearly not the case. Subjects with relatively low ability (<0.92) score

less than they should, while subjects with relatively high ability (>0.92) score even higher than

they should. This bias is represented by a linear ﬁt (dashed line on the graph) that has a slope

greater than one and a negative intercept.
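The link between the positive association of ability with noise and a fitted slope above one follows from the decomposition $P_1 = P_0 + \varepsilon$: the least-squares slope of $P_1$ on $P_0$ equals $1 + \operatorname{cov}(P_0, \varepsilon)/\operatorname{var}(P_0)$. The sketch below computes the noise share and this slope on four hypothetical data points; it is an illustration of the identity, not a reproduction of the paper's estimates.

```python
def mean(values):
    return sum(values) / len(values)


def covariance(a, b):
    """Population covariance of two equally long sequences."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


def noise_diagnostics(observed, counterfactual):
    """Noise share, ability-noise covariance, and slope of observed on counterfactual."""
    noise = [p1 - p0 for p1, p0 in zip(observed, counterfactual)]
    return {
        "noise_share": covariance(noise, noise) / covariance(observed, observed),
        "cov_ability_noise": covariance(counterfactual, noise),
        "slope": covariance(counterfactual, observed)
                 / covariance(counterfactual, counterfactual),
    }


if __name__ == "__main__":
    observed = [0.85, 0.91, 0.95, 0.99]        # hypothetical scores
    counterfactual = [0.90, 0.92, 0.94, 0.96]  # hypothetical ability-only scores
    print(noise_diagnostics(observed, counterfactual))
```

In this toy example low-ability subjects score below and high-ability subjects above their counterfactual values, so the covariance is positive and the slope exceeds one, matching the pattern described for Panel B.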

The high degree of noise coupled with the systematic bias in the observed performance is likely

to lead to invalid inferences when ability, proxied by performance, is used as a control or a causal

regressor. For instance, suppose that a researcher is interested in the eﬀect of ability on some

outcome of interest, and the outcome of interest might depend on both ability and motivation. The

researcher, however, only has access to performance as a proxy for ability. Then it is possible that

a researcher ﬁnds a positive eﬀect of performance on the outcome of interest and concludes that

ability has a positive effect when, in reality, ability has no effect: this would be the case when there
is a strong positive effect of motivation on the outcome of interest.²² This issue is an instance of
an omitted variable bias.
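The omitted variable problem described above can be reproduced in a small simulation. The sketch below is a hypothetical setup of my own, not the simulation reported in the Online Appendix: ability and motivation are drawn independently from assumed uniform ranges, performance is $1 - 1/(\alpha\mu)$ as implied by (6) and $p(E) = 1 - e^{-E}$, and the outcome depends on motivation only.

```python
import random


def ols_slope(x, y):
    """Slope coefficient from a simple linear regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    return s_xy / s_xx


def simulate(n=2000, seed=1):
    """Return the slopes of the outcome on performance and on true ability."""
    rng = random.Random(seed)
    ability = [rng.uniform(3.0, 12.0) for _ in range(n)]    # assumed range
    motivation = [rng.uniform(1.0, 5.0) for _ in range(n)]  # assumed range
    performance = [1.0 - 1.0 / (a * m) for a, m in zip(ability, motivation)]
    # The outcome is driven by motivation alone; ability has no effect.
    outcome = [m + rng.gauss(0.0, 1.0) for m in motivation]
    return ols_slope(performance, outcome), ols_slope(ability, outcome)


if __name__ == "__main__":
    slope_performance, slope_ability = simulate()
    print(slope_performance, slope_ability)
```

In this setup the regression on performance should show a clearly positive slope even though the slope on true ability is close to zero, because performance carries the motivation that actually drives the outcome.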

6 Conclusion

The economics literature uses cognitive ability as an explanatory variable in a vast array of economic

contexts. The traditional approach of using performance on a cognitive test as a measure of ability

confounds actual ability with the combination of ability and motivation, which may result in wrong

conclusions about the eﬀect of ability. In this paper, I propose a new approach to measure cognitive

ability that overcomes this issue. The proposed approach is based on using response times data, in

addition to performance data, as a proxy for eﬀort together with an explicit process-based model

inspired by the drift-diffusion model. I model ability and motivation as parameters of the structural
model and show how to estimate these parameters from the data on outcomes and response times
in a cognitive task. In a laboratory experiment, I find that performance is a noisy and biased
measure of cognitive ability. Ranking subjects by their performance leads to an incorrect ranking
by their ability in a substantial number of cases.

²² In general, issues of this kind will arise when the effects of ability and motivation on the outcome of interest
differ. Table C.2 in Online Appendix C makes this point clear by presenting the results of a simulation exercise.

These results suggest that more care should be given when interpreting performance as cognitive

ability, as is usually done, since such an interpretation may be misleading. The present paper

proposes a method to deal with this issue that calls for taking advantage of the response time

data and spelling out an explicit model of eﬀort choice that structurally separates ability from

motivation. The proposed method can be broadly applied in various settings, including existing

data, since collecting response time data is costless, and software applications collect these data in

the background. The estimates of ability and motivation can be easily computed since they are

simple functions of the sample moments. Future work should investigate how well performance

approximates ability in other cognitive and real-eﬀort tasks used in the literature.

References

Agarwal S, Mazumder B (2013). “Cognitive Abilities and Household Financial Decision Making.”
American Economic Journal: Applied Economics, 5(1), 193–207.

Benndorf V, Rau HA, Sölch C (2018). “Minimizing Learning Behavior in Repeated Real-Effort
Tasks.” Working Paper 343, Center for European, Governance and Economic Development
Research, Georg-August-Universität Göttingen.

Borghans L, Duckworth AL, Heckman JJ, Ter Weel B (2008). “The Economics and Psychology of
Personality Traits.” Journal of Human Resources, 43(4), 972–1059.

Cattell RB (1971). Abilities: Their Structure, Growth, and Action. Boston: Houghton Mifflin.

Clithero JA (2018). “Improving Out-Of-Sample Predictions Using Response Times and a Model of
the Decision Process.” Journal of Economic Behavior & Organization, 148, 344–375.

Dohmen T, Falk A, Huffman D, Sunde U (2010). “Are Risk Aversion and Impatience Related to
Cognitive Ability?” American Economic Review, 100(3), 1238–1260.

Duckworth AL, Quinn PD, Lynam DR, Loeber R, Stouthamer-Loeber M (2011). “Role of Test
Motivation in Intelligence Testing.” Proceedings of the National Academy of Sciences, 108(19),
7716–7720.

Gill D, Prowse V (2016). “Cognitive Ability, Character Skills, and Learning to Play Equilibrium:
A Level-k Analysis.” Journal of Political Economy, 124(6), 1619–1676.

Heckman J, Pinto R, Savelyev P (2013). “Understanding the Mechanisms Through Which an
Influential Early Childhood Program Boosted Adult Outcomes.” American Economic Review,
103(6), 2052–2086.

Heckman JJ, Stixrud J, Urzua S (2006). “The Effects of Cognitive and Noncognitive Abilities on
Labor Market Outcomes and Social Behavior.” Journal of Labor Economics, 24(3), 411–482.

Krajbich I, Lu D, Camerer C, Rangel A (2012). “The Attentional Drift-Diffusion Model Extends
to Simple Purchasing Decisions.” Frontiers in Psychology, 3, 193.

Lee MLT, Whitmore G (2006). “Threshold Regression for Survival Analysis: Modeling Event Times
by a Stochastic Process Reaching a Boundary.” Statistical Science, pp. 501–513.

Murnane R, Willett JB, Levy F (1995). “The Growing Importance of Cognitive Skills in Wage
Determination.” The Review of Economics and Statistics, 77(2), 251–66.

Ofek E, Yildiz M, Haruvy E (2007). “The Impact of Prior Decisions on Subsequent Valuations in
a Costly Contemplation Model.” Management Science, 53(8), 1217–1233.

Ratcliff R (1978). “A Theory of Memory Retrieval.” Psychological Review, 85(2), 59.

Ratcliff R, McKoon G (2007). “The Diffusion Decision Model: Theory and Data for Two-Choice
Decision Tasks.” Neural Computation, 20(4), 873–922.

Ratcliff R, Van Dongen HPA (2011). “Diffusion Model for One-Choice Reaction-Time Tasks and
the Cognitive Effects of Sleep Deprivation.” Proceedings of the National Academy of Sciences,
108(27), 11285–11290.

Segal C (2012). “Working When No One Is Watching: Motivation, Test Scores, and Economic
Success.” Management Science, 58(8), 1438–1457.

Spiliopoulos L, Ortmann A (2018). “The BCD of Response Time Analysis in Experimental Economics.”
Experimental Economics, 21(2), 383–433.

Vernon PA (1983). “Speed of Information Processing and General Intelligence.” Intelligence, 7(1),
53–70.

Webb R (2019). “The (Neural) Dynamics of Stochastic Choice.” Management Science, 65(1),
230–255.

Weiss LG, Saklofske DH, Coalson DL, Raiford SE (2010). WAIS-IV Clinical Use and Interpretation:
Scientist-Practitioner Perspectives. Academic Press.

Wilcox NT (1993). “Lottery Choice: Incentives, Complexity and Decision Time.” The Economic
Journal, 103(421), 1397–1417.

Woodford M (2014). “Stochastic Choice: An Optimizing Neuroeconomic Model.” American Economic
Review, 104(5), 495–500.

Appendices

A Experimental Instructions (Online)

This task is based on finding correct correspondences between numbers and symbols. In each round, you will see 6 pairs of number–symbol combinations (the key) arranged in a table at the upper part of the screen, see Figure A.1a for an example. Below the key, there will be 6 empty numbered boxes. You will use the key to fill in the boxes with the symbols located in a column to the left of the boxes. You will do this by dragging the symbols into the boxes. If a symbol from the column is in the key, drag it to the corresponding numbered box. Some of the symbols will not be listed in the key. In this case, you should not use them in any of the boxes. Some of the numbers will not have corresponding symbols. In this case, you should leave those boxes empty. Each box, therefore, can contain either one or no symbols. Figure A.1b shows an example of a correctly solved round.

After filling all the boxes as you see fit, click “Submit”, and you will proceed to the next round. You can proceed with each round at your own pace, there is no time limit. We ask that you complete all 100 rounds of the task. We will show you your score at the end of the task. You will receive $20 for completing this task.

You will have 3 practice rounds before the actual task begins. This will give you a chance to familiarize yourself with the interface. During the practice, you will receive feedback if you make a mistake.

(a) Decision screen (b) Correct solution

Figure A.1: Digit-Symbol Task


B Math Appendix (Online)

The discounted value function $h(E)$ of the problem (2) must satisfy the following Hamilton-Jacobi-Bellman (HJB) equation:

$$0 = -\rho h - k + \alpha h' + \frac{\sigma^2}{2} h'', \tag{B.1}$$

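where $k$ is the cost of a unit of effort, assumed to be equal to 1. The general solution to the HJB equation (B.1) is

$$h(E) = A e^{\beta_1 E} + B e^{\beta_2 E} - \frac{k}{\rho}, \tag{B.2}$$

where $\beta_{1,2}$ are the roots of the characteristic equation

$$\frac{\sigma^2}{2}\beta^2 + \alpha\beta - \rho = 0. \tag{B.3}$$

The two roots are

$$\beta_{1,2} = \frac{-\alpha \pm \sqrt{\alpha^2 + 2\rho\sigma^2}}{\sigma^2}. \tag{B.4}$$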

Note that the term with the negative root $\beta_1$ (the root with the minus sign in front of the square root) is explosive and thus needs to be eliminated, hence $A = 0$. The solution (B.2) then becomes

$$h(E) = B e^{\beta_2 E} - \frac{k}{\rho}. \tag{B.5}$$

To determine the optimal threshold $E^*$, two conditions are used: the value-matching condition $h(E^*) = \mu p(E^*)$ and the smooth-pasting condition $h'(E^*) = \mu p'(E^*)$. From the smooth-pasting condition it follows that

$$B e^{\beta E^*} = \frac{\mu p'(E^*)}{\beta}, \tag{B.6}$$

where $\beta \equiv \beta_2$. Plugging it into the value-matching condition yields

$$\frac{\mu p'(E^*)}{\beta} - \frac{k}{\rho} = \mu p(E^*), \tag{B.7}$$

which after re-arranging the terms becomes

$$\frac{\rho}{\beta} p'(E^*) - \rho p(E^*) = \frac{k}{\mu}. \tag{B.8}$$

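The limiting result in the case of $\rho \to 0$ follows from (B.8) after noting that

$$
\begin{aligned}
\lim_{\rho \to 0} \frac{\rho}{\beta}
&= \lim_{\rho \to 0} \frac{\rho\sigma^2}{-\alpha + \sqrt{\alpha^2 + 2\rho\sigma^2}} \\
&= \lim_{\rho \to 0} \frac{\sigma^2}{\dfrac{2\sigma^2}{2\sqrt{\alpha^2 + 2\rho\sigma^2}}} \quad \text{(l'Hôpital's rule)} \\
&= \lim_{\rho \to 0} \sqrt{\alpha^2 + 2\rho\sigma^2} \\
&= \alpha.
\end{aligned}
$$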

Equation (B.8) then becomes

$$p'(E^*) = \frac{k}{\alpha\mu}. \tag{B.9}$$

Assuming that $p(E) = 1 - e^{-E}$ and noting that $p'(E) = e^{-E}$, one obtains from (B.9)

$$e^{-E^*} = \frac{k}{\alpha\mu} \tag{B.10}$$

or

$$E^* = \ln\alpha + \ln\frac{\mu}{k}. \tag{B.11}$$
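The limiting argument can also be checked numerically. The sketch below is an illustration rather than part of the original derivation. Under the same assumption $p(E) = 1 - e^{-E}$, solving (B.8) for the threshold gives $E^* = \ln[(\rho/\beta + \rho)/(k/\mu + \rho)]$, and the code compares this value with the closed form (B.11) as $\rho$ shrinks. The function names and the parameter values (the first row of Table C.1) are chosen for illustration only.

```python
import math


def rho_over_beta(alpha, sigma, rho):
    """Ratio rho / beta for the positive root beta of (B.3).

    Rationalizing (B.4) gives rho / beta = (alpha + sqrt(alpha^2 + 2 rho sigma^2)) / 2,
    which avoids dividing two tiny numbers when rho is close to zero.
    """
    return 0.5 * (alpha + math.sqrt(alpha ** 2 + 2.0 * rho * sigma ** 2))


def threshold(alpha, mu, sigma, rho, k=1.0):
    """Solve (B.8) for E* when p(E) = 1 - exp(-E)."""
    ratio = rho_over_beta(alpha, sigma, rho)
    return math.log((ratio + rho) / (k / mu + rho))


def threshold_limit(alpha, mu, k=1.0):
    """Closed form (B.11): E* = ln(alpha) + ln(mu / k)."""
    return math.log(alpha) + math.log(mu / k)


# Illustrative parameter values (first row of Table C.1).
alpha, mu, sigma = 3.39, 4.35, 6.16
for rho in (1e-1, 1e-3, 1e-6):
    gap = threshold(alpha, mu, sigma, rho) - threshold_limit(alpha, mu)
    print(f"rho = {rho:g}: E*(rho) - E*(0) = {gap:.2e}")
```

The gap between the two thresholds should shrink toward zero as $\rho$ decreases, in line with (B.9) to (B.11).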

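Consider the average response time $\bar{\tau}^* = E^*/\alpha = \ln(\alpha\mu)/\alpha$ (using $k = 1$). The marginal effect of ability on $\bar{\tau}^*$ is given by

$$\frac{\partial \bar{\tau}^*}{\partial \alpha} = \frac{\alpha \frac{1}{\alpha\mu}\mu - \ln(\alpha\mu)}{\alpha^2} = \frac{1 - \ln(\alpha\mu)}{\alpha^2}.$$

This expression is positive if the optimal threshold is sufficiently low, that is, when $\alpha\mu < e$. If the optimal threshold is high, that is, when $\alpha\mu > e$, the effect of ability is negative.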
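The sign change at $\alpha\mu = e$ can be illustrated with a short numerical check. The sketch below is not part of the original analysis. It compares the analytical derivative with a central finite difference of $\bar{\tau}^* = \ln(\alpha\mu)/\alpha$ (with $k = 1$). The function names and the two parameter pairs are hypothetical.

```python
import math


def mean_response_time(alpha, mu):
    """Average response time E*/alpha with E* = ln(alpha * mu) and k = 1."""
    return math.log(alpha * mu) / alpha


def marginal_effect_of_ability(alpha, mu):
    """Analytical derivative of the mean response time with respect to alpha."""
    return (1.0 - math.log(alpha * mu)) / alpha ** 2


def finite_difference(alpha, mu, step=1e-6):
    """Central-difference approximation of the same derivative."""
    upper = mean_response_time(alpha + step, mu)
    lower = mean_response_time(alpha - step, mu)
    return (upper - lower) / (2.0 * step)


# One pair with alpha * mu < e and one with alpha * mu > e (hypothetical values).
for alpha, mu in [(1.0, 2.0), (2.0, 3.0)]:
    analytic = marginal_effect_of_ability(alpha, mu)
    numeric = finite_difference(alpha, mu)
    print(f"alpha*mu = {alpha * mu:.1f}: analytic = {analytic:.4f}, numeric = {numeric:.4f}")
```

The first pair lies below the cutoff and should yield a positive effect, while the second lies above it and should yield a negative one.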


C Additional Analysis (Online)

C.1 Monte Carlo Simulations

First, I consider the validity of the estimation procedure using a simulation exercise. Simulations are conducted for the five different parameter vectors listed in Table C.1. Parameter values are drawn from a uniform distribution on [1, 10]. For each parameter vector, I simulate 1000 data sets of 100 observations each on outcomes and response times using the theoretical model. The resulting distributions of the parameter estimates are shown in Figure C.1. The vertical lines indicate the true values of the parameters. As expected, the distributions of parameter estimates are well-centered around the true values.
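The logic of such a simulation exercise can be sketched as follows. This is a minimal illustration under stated assumptions, not the estimation procedure used in the paper. It assumes that the response time is the first-passage time of a Brownian motion with drift $\alpha$ and volatility $\sigma$ to the threshold $E^* = \ln(\alpha\mu)$, so that it follows an inverse Gaussian distribution with mean $E^*/\alpha$ and shape $(E^*/\sigma)^2$, and that each outcome is a success with probability $1 - (\alpha\mu)^{-1}$. The estimator is a simple moment-based stand-in, and all function names are hypothetical.

```python
import math
import random


def sample_inverse_gaussian(mean, shape, rng):
    """Draw one inverse Gaussian variate (Michael, Schucany, and Haas method)."""
    y = rng.gauss(0.0, 1.0) ** 2
    root = math.sqrt(4.0 * mean * shape * y + (mean * y) ** 2)
    x = mean + mean * mean * y / (2.0 * shape) - (mean / (2.0 * shape)) * root
    return x if rng.random() <= mean / (mean + x) else mean * mean / x


def simulate_subject(alpha, mu, sigma, n_rounds, rng):
    """Simulate outcomes and response times for one parameter vector (k = 1)."""
    threshold = math.log(alpha * mu)          # E* from (B.11)
    mean_rt = threshold / alpha               # mean first-passage time
    shape = (threshold / sigma) ** 2          # inverse Gaussian shape parameter
    p_success = 1.0 - 1.0 / (alpha * mu)      # p(E*) = 1 - exp(-E*)
    times = [sample_inverse_gaussian(mean_rt, shape, rng) for _ in range(n_rounds)]
    outcomes = [1 if rng.random() < p_success else 0 for _ in range(n_rounds)]
    return outcomes, times


def estimate_by_moments(outcomes, times):
    """Back out (alpha, mu, sigma) from the success rate and response-time moments."""
    n = len(times)
    p_hat = sum(outcomes) / n
    if not 0.0 < p_hat < 1.0:
        raise ValueError("success rate must lie strictly between 0 and 1")
    mean_rt = sum(times) / n
    inv_shape = sum(1.0 / t for t in times) / n - 1.0 / mean_rt  # estimate of 1/shape
    alpha_mu = 1.0 / (1.0 - p_hat)
    threshold = math.log(alpha_mu)
    alpha_hat = threshold / mean_rt
    return alpha_hat, alpha_mu / alpha_hat, threshold * math.sqrt(inv_shape)


# One large simulated sample for the first parameter vector in Table C.1.
rng = random.Random(2019)
outcomes, times = simulate_subject(3.39, 4.35, 6.16, 20000, rng)
alpha_hat, mu_hat, sigma_hat = estimate_by_moments(outcomes, times)
print(f"alpha = {alpha_hat:.2f}, mu = {mu_hat:.2f}, sigma = {sigma_hat:.2f}")
```

The usage example draws one large sample instead of 1000 samples of 100 observations, because the moment-based estimator is undefined when every simulated round is a success, which can happen in small samples.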

Table C.1: True Parameter Values and Mean Estimates

Vector   α      µ      σ

Panel A. True Values
1        3.39   4.35   6.16
2        2.66   7.32   6.16
3        2.51   8.27   4.46
4        6.27   1.08   3.64
5        2.80   7.17   9.25

Panel B. Mean Estimates
1        3.65   4.87   6.31
2        2.88   8.53   6.34
3        2.67   9.68   4.60
4        6.49   1.12   3.69
5        3.20   8.40   9.52

Notes: Panel A shows the five different true parameter vectors drawn from a uniform distribution. Panel B shows the corresponding mean estimates of parameters from the simulated data.

Figure C.1: Monte Carlo Simulations

[Figure: histograms of the parameter estimates, with one column per parameter (alpha, mu, sigma) and one row per true parameter vector (1 to 5); horizontal axis: Estimate; vertical axis: Relative Frequency.]

Note: The figure shows the histograms of the distributions of parameter estimates ($\hat{\alpha}$, $\hat{\mu}$, $\hat{\sigma}$) for five different values of the true vector of parameters. The vertical lines correspond to the true value of parameters.

Second, I consider the consequences of using performance as a proxy for ability. I assume that the true data generating process for some outcome of interest $y$ is

$$y_i = \beta_0 + \beta_1\alpha_i + \beta_2\mu_i + \varepsilon_i, \tag{C.12}$$

where $i$ indexes a subject, $\alpha_i$ is ability of a subject $i$, $\mu_i$ is motivation of a subject $i$, and $\varepsilon_i$ is an error term. I further assume that a researcher estimates a model in which only performance $p_i$ is observed, but not ability and motivation:

$$y_i = \gamma_0 + \gamma_1 p_i + \eta_i. \tag{C.13}$$

I then study how the true values of $\beta_1$ and $\beta_2$ affect the estimates of $\gamma_1$. In the simulation, the values of $\alpha_i$ are drawn from a truncated normal distribution with mean 7, standard deviation 1, and a lower bound of 1. The values of the logarithm of $\mu_i$ are drawn from a truncated normal distribution with mean 1 and standard deviation 1, bounded between 0 and 3. These distributional assumptions are made to roughly match the observed distributions of ability and motivation in the experiment. Performance as a function of ability and motivation is then computed using the model as $p_i = 1 - (\alpha_i\mu_i)^{-1}$. The noise term $\varepsilon_i$ is drawn from a normal distribution with mean 0 and standard deviation 3. The generated data consist of 1000 observations. Table C.2 shows the true values of $\beta_1$ and $\beta_2$ and the corresponding estimates of $\gamma_1$ with their standard errors. The table makes it clear that issues arise whenever $\operatorname{sgn}(\beta_1\beta_2) \neq 1$: the sign of the estimated coefficient on performance can differ from the sign of the ability coefficient in the true model, which would lead to wrong conclusions about the effect of ability on the outcome.
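The exercise can be reproduced in outline with a few lines of code. The sketch below is an illustrative re-implementation under the distributional assumptions stated above, not the code behind Table C.2. It sets $\beta_0 = 0$ (which does not affect the slope), uses rejection sampling for the truncated normal draws, and relies on hypothetical function names and an arbitrary random seed, so the numbers it produces will not match the table exactly.

```python
import math
import random


def truncated_normal(mean, sd, lower, upper, rng):
    """Rejection sampler: redraw until the value falls inside [lower, upper]."""
    while True:
        draw = rng.gauss(mean, sd)
        if lower <= draw <= upper:
            return draw


def ols_slope(x, y):
    """Slope from a simple regression of y on x with an intercept."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    var = sum((a - x_bar) ** 2 for a in x)
    return cov / var


def simulate_gamma1(beta1, beta2, n=1000, seed=0):
    """Estimate gamma_1 in (C.13) when the data come from (C.12) with beta_0 = 0."""
    rng = random.Random(seed)
    performance, outcome = [], []
    for _ in range(n):
        alpha = truncated_normal(7.0, 1.0, 1.0, math.inf, rng)      # ability
        mu = math.exp(truncated_normal(1.0, 1.0, 0.0, 3.0, rng))    # motivation
        performance.append(1.0 - 1.0 / (alpha * mu))                # p_i from the model
        outcome.append(beta1 * alpha + beta2 * mu + rng.gauss(0.0, 3.0))
    return ols_slope(performance, outcome)


for beta1, beta2 in [(0, 1), (1, -1), (-1, 1)]:
    print(f"beta1 = {beta1:2d}, beta2 = {beta2:2d}: gamma1_hat = {simulate_gamma1(beta1, beta2):.2f}")
```

In this sketch the sign of the estimated coefficient on performance follows the sign of $\beta_2$ whenever motivation enters the true model, regardless of the sign of $\beta_1$.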

Table C.2: True Parameter Values and Estimates

β1     β2     γ̂1        SE(γ̂1)
 0      1      72.52     3.30
 0     −1     −71.72     3.43
 1      1      78.81     3.29
 1     −1     −65.43     3.69
 1      0       6.69     2.79
−1      1      66.23     3.55
−1     −1     −78.01     3.39
−1      0      −5.89     2.76

Notes: The table reports the coefficients on ability (β1) and motivation (β2) in the true model, and the corresponding estimates of the coefficient on performance (γ̂1) and its standard error from the estimated model.


C.2 Summary Statistics of the DST

Figure C.2 (Panel A) shows the distribution of the raw scores from the DST. The subjects perform

very well on the DST with 75% of the subjects scoring 87 and above. Figure C.2 (Panel B) shows

the distribution of the mean response times, averaged across all rounds for each subject. The

distribution is tightly concentrated around the median of 20.6 seconds and has a relatively fat right

tail. The actual distribution (solid line) matches closely the reference inverse Gaussian distribution

(dotted line) with the parameters matching the sample moments. In fact, one cannot reject the null

hypothesis that the sample of mean response times comes from the inverse Gaussian distribution

(Kolmogorov-Smirnov test p-value = 0.393).

Figure C.2: Distributions of Scores and Mean Response Times on DST

[Figure: Panel A, distribution of Score (horizontal axis breaks at 58, 87, 92, 96, and 100; vertical axis: Density). Panel B, distribution of Mean Response Time (sec) (horizontal axis breaks at 12.7, 17.6, 20.6, 24.5, and 47.1; vertical axis: Density).]

Note: Panel A shows the distribution of the scores on the DST. Panel B shows the distribution of the mean

response times on the DST. The smooth solid line is the kernel density estimate, the vertical bars are the

histogram, and the vertical dashed line is the sample median. The breaks on the horizontal axis correspond

to the quintiles of the distribution. On Panel B, the dotted line is the reference density of an inverse Gaussian

distribution with the parameters matching the sample moments.


C.3 Additional Estimates

Figure C.3 shows the quantile probability plot adapted from Ratcliff and McKoon (2007).²³ The data are pooled across all subjects. The graph shows the proportions of correct and incorrect responses (horizontal axis) against the quantiles of the distribution of response times (vertical axis). The quantiles of response times are 0.1, 0.3, 0.5, 0.7, and 0.9. The circles represent the predicted values from the estimated model, and the crosses represent the actual values. As the figure shows, the model does a good job of jointly predicting outcomes and response times in the pooled data.
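The observed points of a quantile probability plot can be computed along the following lines. This is a minimal sketch with hypothetical function names and toy data, not the code used to produce the figure. It relies on the default interpolation rule of `statistics.quantiles`, which may differ from the rule used in the paper.

```python
import statistics


def rt_quantiles(response_times, probs=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Response-time quantiles at the requested probabilities (multiples of 0.1)."""
    deciles = statistics.quantiles(response_times, n=10)  # cut points at 0.1, ..., 0.9
    return [deciles[round(p * 10) - 1] for p in probs]


def quantile_probability_points(outcomes, response_times):
    """Pair each response proportion with the RT quantiles of that response type."""
    points = {}
    for label, value in (("error", 0), ("correct", 1)):
        subset = [t for o, t in zip(outcomes, response_times) if o == value]
        if len(subset) < 2:
            continue  # statistics.quantiles needs at least two observations
        proportion = len(subset) / len(outcomes)
        points[label] = (proportion, rt_quantiles(subset))
    return points


# Toy data: 80 correct and 20 incorrect responses with made-up response times.
toy_outcomes = [1] * 80 + [0] * 20
toy_times = [15.0 + 0.1 * i for i in range(100)]
for label, (proportion, quantiles) in quantile_probability_points(toy_outcomes, toy_times).items():
    print(label, proportion, [round(q, 2) for q in quantiles])
```

Each response type contributes one column of points: the horizontal position is the response proportion and the vertical positions are the five response-time quantiles.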

Figure C.3: Quantile Probability Graph for Pooled Data

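[Figure: quantile probability plot with Error Responses on the left and Correct Responses on the right; horizontal axis: Response Proportion (0.0 to 1.0); vertical axis: RT quantile (sec), with breaks at 17.6, 19.7, 21.3, 23.2, and 25.8; legend: Actual, Predicted.]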

Note: The ﬁgure shows the quantile probability plot from the pooled data (averaged across all subjects).

Points on the right (left) correspond to success (failure) rates. Circles represent the predicted values from

the estimated model, crosses represent the observed data.

²³ Since only one treatment was used and there was no variation in difficulty, it is not possible to draw the complete lines as in Ratcliff and McKoon (2007).

Figure C.4: Distributions of Raw and Transformed Ability

[Figure: Panel B, distribution of performance in $t_m$ seconds; horizontal axis: Performance in $t_m$ seconds, with breaks at 0.20, 0.43, 0.50, 0.58, and 0.75; vertical axis: Density.]

Note: The figure shows the distribution of the probability of success in $t_m$ seconds. The smooth solid line is the kernel density estimate, the vertical bars are the histogram, the dotted line is the reference density of a normal distribution with the parameters matching the sample moments, and the vertical dashed line is the sample median. The breaks on the horizontal axis correspond to the quintiles of the distribution.

A useful alternative way of looking at the ability differences across subjects is to convert the ability estimates into performance. To do so, I compute the probability of success at the average accumulated effective effort in $t_m$ seconds, where $t_m$ (≈ 5.71 seconds) is calibrated such that it is the time for a person with median ability to reach a 0.5 probability of success.²⁴ Due to variation in ability, subjects will have different levels of accumulated effective effort in $t_m$ seconds, which will then translate into different probabilities of success. Figure C.4 shows the distribution of the resulting performance. This distribution is more symmetric than the distribution of raw ability estimates. In fact, one cannot reject the null hypothesis of the distribution of performance in $t_m$ seconds being normal (Shapiro-Wilk test p-value = 0.574). A subject at the 75th percentile would have a 1.4 times higher performance in $t_m$ seconds than a subject at the 25th percentile. A subject with the highest ability would have a 1.5 times higher performance than a median subject and a 3.8 times higher performance than a subject with the lowest ability.
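The conversion of ability into performance can be sketched in code. The sketch assumes that the average accumulated effective effort after $t$ seconds equals $\alpha t$ and that the success probability is $p(E) = 1 - e^{-E}$, as in Appendix B. Both the function names and the ability values are hypothetical and serve only as an illustration.

```python
import math
import statistics


def calibrate_time(abilities, target=0.5):
    """Time t_m at which a subject with median ability reaches the target success probability."""
    median_ability = statistics.median(abilities)
    return -math.log(1.0 - target) / median_ability


def performance_in_time(ability, t_m):
    """Success probability p(E) = 1 - exp(-E) at accumulated effort E = ability * t_m."""
    return 1.0 - math.exp(-ability * t_m)


# Hypothetical ability estimates, for illustration only.
abilities = [0.08, 0.10, 0.12, 0.15, 0.20]
t_m = calibrate_time(abilities)
for ability in abilities:
    print(f"ability = {ability:.2f}: performance in t_m seconds = {performance_in_time(ability, t_m):.3f}")
```

By construction, the subject with median ability has a performance of 0.5, and the transformation is increasing and concave in ability, which compresses differences at the top of the ability distribution.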

In Figure C.5, I examine whether success on a trial of the DST is significantly correlated with the response time in that trial. I estimate a logistic regression of the outcome of a trial on the response time for each subject individually. I then present the estimated regression coefficients on the response time graphically, ordered from lowest to highest. The graph shows the point estimates and the 95% confidence intervals around them. In the overwhelming majority of cases, the null hypothesis of no effect of the response time on success cannot be rejected. The effect is significant for only 13 subjects, which represents 7% of the sample, and even among these subjects, there is no systematic relationship between response times and outcomes.

²⁴ The resulting performance is counterfactual in the sense that it is generated using the model and ability estimates from a hypothetical scenario. This counterfactual performance is of course different from the observed performance on the test, which is a conventional measure of ability. The benefit of this transformation is that it converts ability into familiar performance terms. Both the counterfactual performance and raw ability estimates can be viewed as the same quantity (ability) expressed in different units.

Figure C.5: Response Times and Success

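[Figure: individual-level regression coefficients ordered from lowest to highest; horizontal axis: Regression Coefficient (-1.0 to 1.5); vertical axis: CDF (0.00 to 1.00); legend: Not Significant, Significant.]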

Note: The graph shows the regression coeﬃcients from individual-level logistic regressions of outcomes on

response times. Each point on the graph represents an individual-level estimate, and the points are ordered

from lowest to highest. The error bars show 95% conﬁdence intervals. Signiﬁcance is determined based on

a 0.05 cutoﬀ for the p-value.
