Background: The single individual haplotype (SIH) problem is to reconstruct the haplotypes of an individual from a set of input fragments sequenced from a given chromosome. Solving this problem is an important task in computational biology. It has many applications in the pharmaceutical industry, in clinical decision-making and in the study of genetic diseases. The problem is known to be NP-hard. Several methods have been proposed to solve it, but most of them perform poorly on noisy input fragments. Designing a method that is both accurate and scalable therefore remains a challenging task.
Results: In this paper, we introduce a method named NCMHap, which uses the Neutrosophic c-means (NCM) clustering algorithm. The NCM algorithm can effectively detect noise and outliers in the input data and reduce their effect on the clustering process. The proposed method has been evaluated on several benchmark datasets. Comparison with existing methods indicates that NCMHap is significantly superior in most cases. In particular, it outperforms the compared methods as the amount of noise increases.
Conclusion: The experimental results support applying the proposed method to datasets whose fragments contain a large number of gaps and a high level of noise.