Thanks to the low storage cost and fast retrieval efficiency of hashing methods, deep hashing models are now widely used in cross-modal retrieval. However, images are often accompanied by corresponding textual descriptions rather than visual labels, so unsupervised cross-modal hash retrieval has received continued and widespread attention. In this paper, we propose CLIP-based Cycle Alignment Hashing for unsupervised vision-text retrieval (CCAH), which aims to exploit the semantic link between the original features of each modality and the reconstructed features. Firstly, we design a modal cyclic interaction method that aligns semantics within a modality, where one modal feature reconstructs the other modal feature, thus taking full account of the semantic similarity of both intra-modal and inter-modal relationships. Secondly, we introduce GAT into the cross-modal retrieval task, considering the influence of text neighbour nodes and adding an attention mechanism to capture the global features of the text modality. Thirdly, we extract fine-grained image features with the CLIP visual encoder. Finally, hash codes are learned through hash functions. Experiments on three widely used datasets demonstrate that our proposed CCAH achieves satisfactory results in total retrieval accuracy.
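To make the pipeline sketched in the abstract concrete, the following is a minimal PyTorch illustration, not the authors' implementation: CLIP image features and the text neighbour graph are assumed to be pre-extracted, the graph-attention layer is a simplified single-head variant, and the names SimpleGAT, CCAHSketch, cycle_alignment_loss, as well as all layer sizes and loss weights, are hypothetical placeholders.

```python
# Minimal sketch of a CCAH-style pipeline under the assumptions stated above:
# CLIP image features and a text neighbour graph (with self-loops) are given,
# each modality reconstructs the other, and hash codes come from sign().
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleGAT(nn.Module):
    """Simplified single-head graph attention over a text neighbour graph."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim, bias=False)
        self.attn = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, x, adj):
        # x: (N, in_dim) text features, adj: (N, N) adjacency incl. self-loops
        h = self.proj(x)
        n = h.size(0)
        hi = h.unsqueeze(1).expand(n, n, -1)
        hj = h.unsqueeze(0).expand(n, n, -1)
        e = torch.tanh(self.attn(torch.cat([hi, hj], dim=-1))).squeeze(-1)
        e = e.masked_fill(adj == 0, float("-inf"))   # attend only to neighbours
        alpha = torch.softmax(e, dim=-1)
        return alpha @ h                              # neighbour-aggregated text features


class CCAHSketch(nn.Module):
    def __init__(self, img_dim=512, txt_dim=1386, hidden=1024, code_len=64):
        super().__init__()
        # image branch: pre-extracted CLIP visual features -> continuous codes
        self.img_enc = nn.Sequential(nn.Linear(img_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, code_len))
        # text branch: graph attention over neighbour nodes -> continuous codes
        self.txt_gat = SimpleGAT(txt_dim, hidden)
        self.txt_enc = nn.Linear(hidden, code_len)
        # cycle heads: each modality reconstructs the other modality's features
        self.img2txt = nn.Linear(code_len, code_len)
        self.txt2img = nn.Linear(code_len, code_len)

    def forward(self, img_feat, txt_feat, txt_adj):
        h_img = torch.tanh(self.img_enc(img_feat))
        h_txt = torch.tanh(self.txt_enc(self.txt_gat(txt_feat, txt_adj)))
        rec_txt = torch.tanh(self.img2txt(h_img))     # image -> text reconstruction
        rec_img = torch.tanh(self.txt2img(h_txt))     # text -> image reconstruction
        b_img, b_txt = torch.sign(h_img), torch.sign(h_txt)  # binary hash codes
        return h_img, h_txt, rec_img, rec_txt, b_img, b_txt


def cycle_alignment_loss(h_img, h_txt, rec_img, rec_txt, gamma=1.0):
    # align original features with the features reconstructed across the cycle
    return F.mse_loss(rec_txt, h_txt) + gamma * F.mse_loss(rec_img, h_img)
```

Note that sign() is non-differentiable, so the sketch keeps tanh-relaxed continuous codes for training and applies sign() only when producing the final binary codes; whether CCAH uses this particular relaxation is not stated in the abstract.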