In this work, we present a convolutional neural network (CNN) named CGFA-CNN for blind image quality assessment (BIQA). It uses a two-stage strategy: Sub-network I first identifies the distortion type in an image, and Sub-network II then quantifies that distortion. Unlike most deep neural networks, we extract hierarchical features as descriptors to enrich the image representation, and we design a feature aggregation layer, trained end to end, that applies Fisher encoding to visual vocabularies modeled by Gaussian mixture models (GMMs). To account for both authentic and synthetic distortions, the hierarchical features combine the characteristics of a CNN trained on our self-built dataset with those of a CNN trained on ImageNet. We evaluate our algorithm on four publicly available databases, and the results demonstrate that CGFA-CNN achieves superior performance over other methods on both synthetic and authentic databases.
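To make the Fisher encoding step concrete, the following is a minimal NumPy sketch of the standard Fisher vector computation over a diagonal-covariance GMM. It is an illustration under stated assumptions, not the CGFA-CNN aggregation layer itself: the GMM parameters here are fixed inputs rather than learned end to end, and the function names `gmm_posteriors` and `fisher_vector`, the power and L2 normalization, and the array shapes are all hypothetical choices made for this example.

```python
import numpy as np


def gmm_posteriors(x, weights, means, variances):
    """Soft-assign each descriptor to each GMM component.

    x: (T, D) descriptors; weights: (K,); means, variances: (K, D).
    Returns a (T, K) array whose rows sum to 1.
    """
    diff = x[:, None, :] - means[None, :, :]                      # (T, K, D)
    log_prob = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)
    )                                                             # (T, K)
    log_joint = np.log(weights) + log_prob
    # Subtract the row-wise maximum for numerical stability.
    log_joint -= log_joint.max(axis=1, keepdims=True)
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)


def fisher_vector(x, weights, means, variances):
    """Encode T descriptors as one 2*K*D Fisher vector.

    Uses the gradients with respect to the GMM means and standard
    deviations (assumption: the mixture-weight gradients are omitted).
    """
    T = x.shape[0]
    gamma = gmm_posteriors(x, weights, means, variances)          # (T, K)
    sigma = np.sqrt(variances)
    z = (x[:, None, :] - means[None, :, :]) / sigma[None, :, :]   # (T, K, D)

    g_mu = np.einsum("tk,tkd->kd", gamma, z)
    g_mu /= T * np.sqrt(weights)[:, None]
    g_sigma = np.einsum("tk,tkd->kd", gamma, z ** 2 - 1.0)
    g_sigma /= T * np.sqrt(2.0 * weights)[:, None]

    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])
    # Assumed post-processing: signed square root, then L2 normalization.
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv
```

With `K` components and `D`-dimensional descriptors, the encoding has length `2 * K * D` regardless of how many descriptors the image yields, which is what lets a variable number of local features be aggregated into a fixed-size representation.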