Sign language is an effective communication tool to convey information to each other, that is a bridge to reduce the communication gap between deaf and dumb people. The word level sign language recognition is a challenging task due to the wide range of body gestures, unidentified signals and hand configuration. To overcome this issue, a novel Inverted Residual Network Convolutional Vision Transformer based Mutation Boosted Tuna Swarm Optimization (IRNCViT-MBTSO) algorithm is proposed for recognizing double-handed sign language. The proposed dataset is designed to identify different dynamic words and the predicted images are preprocessed to enhance the generalization ability of the model and improve image quality. The local features are extracted after performing feature graining and the global features are captured from the preprocessed images by implementing the ViT transformer model. These extracted features are concatenated to generate a feature map and are classified into different dynamic words using the Inverted Residual Feed Forward Network (IRFFN). The TSO algorithm tunes the parameters of the IRNCViT model that is responsible for handling high-dimensional problems and convergence issues. The Mutation operator is introduced in this optimization phase to escape local optimum issues while updating the position of tuna. The performance valuation of this proposed model is performed in terms of recognition accuracy, convergence and visual output of the dataset that showed the highest performance than other state-of-the-art methods.