Attention is a complex system involving multiple interacting components that jointly regulate information processing in the brain. It has been hypothesized that the computational goal of attention is to optimally integrate information under task demands, and supporting evidence has been reported for relatively simple learning and decision-making tasks. It remains unclear, however, whether this hypothesis can explain how attention is distributed in more complex real-world tasks that engage multiple attention systems. Here, taking advantage of the development of attention mechanisms in deep neural network (DNN) models, we investigate whether human attention during real-world reading comprehension tasks can be explained as a consequence of task optimization. In a goal-directed reading task, participants read a passage to answer a question. Eye-tracking results show that the attention on each word, quantified by the fixation time, is modulated by both the top-down reading goal and lower-level visual layout and textual features. When trained to perform the same goal-directed reading task, DNN models achieve human-level performance and naturally develop a human-like attention distribution, with deep layers tuned to the reading goal and shallow layers tuned to textual features. Further experiments suggest that different training tasks separately contribute to goal-directed and text-based attention. In summary, the results strongly suggest that human attention can be interpreted as a consequence of task optimization during real-world reading tasks.