Proposed Model
The model we propose for speaker recognition covers both speaker identification and speaker verification, the two major applications of speaker recognition technologies and methodologies [3]. If a speaker claims a certain identity and the voice is used to verify this claim, the task is called verification or authentication [4]. Identification, on the other hand, is the task of determining an unknown speaker’s identity [5]. In other words, speaker verification is a 1:1 match, in which the voice of a speaker is compared against a single template, whereas speaker identification is a 1:N match, in which the voice is compared against several templates [6].
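The difference between the 1:1 and the 1:N match can be illustrated with a minimal sketch. It assumes that each voice has already been reduced to a fixed-length feature vector and that cosine similarity with a fixed threshold is an acceptable score; the function names, the threshold value of 0.8 and the toy vectors are illustrative assumptions, not part of the proposed system.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def verify(probe, claimed_template, threshold=0.8):
    """Verification (1:1): accept or reject one claimed identity."""
    return cosine_similarity(probe, claimed_template) >= threshold


def identify(probe, templates, threshold=0.8):
    """Identification (1:N): return the best-matching speaker id,
    or None when no template reaches the (assumed) threshold."""
    best_id, best_score = None, threshold
    for speaker_id, template in templates.items():
        score = cosine_similarity(probe, template)
        if score >= best_score:
            best_id, best_score = speaker_id, score
    return best_id


# Toy templates; real templates would come from a speaker model.
templates = {"speaker_a": [1.0, 0.0, 0.0], "speaker_b": [0.0, 1.0, 0.0]}
probe = [0.9, 0.1, 0.0]
claim_accepted = verify(probe, templates["speaker_a"])
identified = identify(probe, templates)
```

Verification compares the probe with one template only, so its cost does not depend on the size of the database, while identification scans all N templates.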
Figures 1 and 2 explain the proposed system in detail. The system is triggered only by an incoming call from an unknown number; otherwise it yields no result. Likewise, if the unknown caller’s voice does not exist in the system’s database, no notification is sent.
If the voice of a spam caller already exists in our database, one of two events takes place.
- If the voice has previously been reported more than 5 times, a notification about the spam caller is sent to the user.
- If the voice has fewer than 5 reports, a notification is sent only to those users who reported it; no other user is notified. This precaution prevents the system from being used to harass people unjustly and protects innocent people from defamation.
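The notification rule described above can be sketched as a small decision function. The names `SpamProfile` and `should_notify` are hypothetical. The text does not say what happens at exactly 5 reports, so the sketch assumes that this case is treated like the low-report case.

```python
from dataclasses import dataclass, field

REPORT_THRESHOLD = 5  # "more than 5 times" in the text


@dataclass
class SpamProfile:
    """Hypothetical record kept for one reported voice."""
    report_count: int = 0
    reporters: set = field(default_factory=set)


def should_notify(profile, callee_id, threshold=REPORT_THRESHOLD):
    """Decide whether the called user receives a spam notification."""
    if profile is None:
        # The caller's voice is not in the database: never notify.
        return False
    if profile.report_count > threshold:
        # Widely reported voice: every user is notified.
        return True
    # Few reports (assumption: exactly `threshold` reports falls here too):
    # only users who reported this voice themselves are notified.
    return callee_id in profile.reporters


widely_reported = SpamProfile(report_count=6, reporters={"u1", "u2"})
rarely_reported = SpamProfile(report_count=2, reporters={"u1"})
```

With these profiles, any user called by the first voice is notified, while the second voice triggers a notification only for user `u1`.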
A user can report an incoming call if he/she feels harassed, threatened, tormented, humiliated, embarrassed or otherwise victimised, whether or not he/she received a spam notification about the unknown caller.
When a user reports a call, some processing takes place in the background and is not disclosed to the user. In figure 1, phases 1 and 4 are shown on the phone screen to inform the user about the corresponding actions, while phases 2 and 3 are hidden from the user. In these hidden phases the system extracts the specific voice from a noisy environment and from multiple speakers.
The processes shown in figure 1 are almost identical to those in figure 2, which has three additional phases: phase 5, phase 6 and phase 7. Phase 7 appears on the user’s screen, whereas phases 5 and 6 are hidden processes that lead to a warning message being sent to the user.
The following part deals with the working principle of these phases.
Phase 1: Reporting threat call
Immediately after a phone call that contains a threat of physical harm or violence, the user should report the call through our system. The user is then protected from this caller in the future: whenever the same caller phones the user from any number, the user is notified that the caller is a reported spammer. The report also helps other users to be aware of this spammer.
Phase 2: Extract vocals from noise
In automatic or semi-automatic speaker recognition, background noise is one of the main causes of performance degradation in various applications of digital speech processing [7]. We therefore need to reduce background noise, which improves the intelligibility and quality of a speech signal.
Recently, the REpeating Pattern Extraction Technique (REPET) was proposed to separate the repeating background from the non-repeating foreground [8, 9]. The fundamental idea is to identify the repeating audio components, compare them to repeating models derived from them, and extract the repeating patterns through time-frequency masking [10]. While the original REPET (and its extensions) assumes that repetitions happen periodically [11], REPET-SIM, a generalization of the method that uses a similarity matrix, was further proposed to handle structures where repetitions can also happen intermittently [12]. The only assumption is that the repeating background is dense and low-rank, while the non-repeating foreground is sparse and varied [10].
Repetitions occur in background noise such as car horns, construction work, crying babies and industrial machinery, all of which have repeating patterns. For this reason, we use this algorithm in our proposed system to extract vocals from background noise.
We obtained the following result by implementing this method. Figure 4 shows the voice and the noise combined; the wiggly lines in the figure are due to the vocal component. Our objective is to separate this vocal component from the background. In the output, the vocals and the background noise are separated into two slices.
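The masking idea behind REPET can be illustrated with a simplified sketch. This is not the implementation used for the figures, and it is not REPET-SIM. It follows the original periodic variant and assumes that the repeating period is already known and that the input is a magnitude spectrogram given as a list of frames. The function name `repet_soft_mask` and the toy data are illustrative only.

```python
from statistics import median


def repet_soft_mask(spectrogram, period):
    """Split a magnitude spectrogram into a repeating background and a
    non-repeating foreground with a soft time-frequency mask.

    spectrogram: list of frames, each a list of non-negative magnitudes.
    period: repeating period in frames (assumed known here).
    """
    n_frames = len(spectrogram)
    n_bins = len(spectrogram[0])
    background, foreground = [], []
    for t in range(n_frames):
        bg_frame, fg_frame = [], []
        for f in range(n_bins):
            # Values at the same position inside every repetition.
            repeats = [spectrogram[k][f]
                       for k in range(t % period, n_frames, period)]
            model = median(repeats)          # repeating model for this bin
            mix = spectrogram[t][f]
            repeating = min(model, mix)      # background cannot exceed mixture
            mask = repeating / mix if mix > 0 else 0.0
            bg_frame.append(mask * mix)
            fg_frame.append((1.0 - mask) * mix)
        background.append(bg_frame)
        foreground.append(fg_frame)
    return background, foreground


# Toy mixture: a background that repeats every 2 frames, plus one
# non-repeating burst of magnitude 3 in frame 2, bin 0.
mixture = [[1, 0], [0, 1], [4, 0], [0, 1], [1, 0], [0, 1]]
background, foreground = repet_soft_mask(mixture, period=2)
```

Because the median is taken across repetitions, a sparse non-repeating event such as the burst in frame 2 does not affect the repeating model and ends up in the foreground.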
Phase 3: Separation of spam caller from multiple speakers
The next phase separates the voice of the spam caller from a multi-speaker signal by making use of a reference signal from the target speaker, as presented in [13]. One way to deal with this problem is to first apply a speech separation system to the noisy audio in order to separate the voices of the different speakers. If the noisy signal contains N speakers, this approach yields N outputs, with a potential additional output for the noise [14].
This approach can easily be extended to more than one speaker of interest by repeating the process in turn for the reference recording of each target speaker [13].
Phase 4: Saving voice in database
In phases 2 and 3, the target voice has already been isolated. In this phase, the system saves this voice in the database for further action. Whenever this person calls the user from any number, the system matches the caller’s voice against the stored one (see Phase 5) and sends the user a notification that this caller may be harmful or dangerous.
Phase 5: Voice recognition in database
Identifying a person from speech samples of forensic quality is challenging. In this phase, the caller’s voice is checked against our database. For this purpose we use the forensic speaker recognition method proposed in [15]. In that work, each speaker’s voice is recorded in both clean and noisy environments, through a microphone and through a mobile channel, and the method has shown low equal error rates (EER) even with very short test samples. This diversity facilitates its use in forensic experimentation. A Gaussian mixture model-universal background model (GMM-UBM) is used for speaker modeling, and Mel-Frequency Cepstral Coefficients (MFCCs) are used as features [16].
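GMM-UBM systems commonly score a test utterance with an average log-likelihood ratio between the speaker model and the universal background model. The sketch below shows only this scoring step under simplifying assumptions: the feature frames stand in for MFCC vectors that have already been extracted, the models use diagonal covariances, and the toy parameters are invented for illustration. Training of the UBM and adaptation of the speaker model are not shown, and the function names are hypothetical.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, gmm):
    """Log-likelihood of x under a GMM given as (weight, mean, var) tuples.
    Uses the log-sum-exp trick for numerical stability."""
    logs = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    peak = max(logs)
    return peak + math.log(sum(math.exp(value - peak) for value in logs))


def llr_score(frames, speaker_gmm, ubm):
    """Average per-frame log-likelihood ratio: speaker model versus UBM."""
    total = sum(
        gmm_log_likelihood(x, speaker_gmm) - gmm_log_likelihood(x, ubm)
        for x in frames
    )
    return total / len(frames)


# Toy 2-dimensional models (illustrative values, not trained).
ubm = [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [4.0, 4.0], [1.0, 1.0])]
speaker_gmm = [(1.0, [4.0, 4.0], [1.0, 1.0])]

target_score = llr_score([[4.0, 4.0]], speaker_gmm, ubm)
impostor_score = llr_score([[0.0, 0.0]], speaker_gmm, ubm)
```

A positive score means that the frames are better explained by the speaker model than by the background model. The decision threshold applied to this score determines the trade-off between false acceptances and false rejections, and the EER is the operating point at which the two error rates are equal.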
Phase 6: Creating Spam reports
Whenever a user reports a spam voice, the system saves that voice in the database and creates a profile for the corresponding spammer, in which the number of spam reports is stored. If a voice has already been reported, it already has a profile with its corresponding spam reports. Hence, if anyone reports this voice again, no new profile is created; instead, the system increases the number of reports against this person.
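The report bookkeeping described in this phase can be sketched with an in-memory dictionary. The sketch assumes that the recognition step of Phase 5 has already mapped the reported voice to a stable identifier; the identifier `voice_id`, the function `register_report` and the profile fields are hypothetical names, and a real system would use persistent storage.

```python
def register_report(profiles, voice_id, reporter_id):
    """Record one spam report against a voice.

    profiles: dict mapping a voice identifier to its profile.
    A profile is created on the first report; later reports only
    increase the counter of the existing profile.
    """
    profile = profiles.get(voice_id)
    if profile is None:
        profile = {"report_count": 0, "reporters": set()}
        profiles[voice_id] = profile
    profile["report_count"] += 1
    profile["reporters"].add(reporter_id)
    return profile


profiles = {}
register_report(profiles, "voice_001", "user_a")  # first report: new profile
register_report(profiles, "voice_001", "user_b")  # same voice: counter only
```

After these two calls the database holds a single profile for `voice_001` with a report count of 2.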
Phase 7: Sending spam notification
A notification is sent when the spam caller’s voice already exists in the database. In this scenario, after a call from the unknown number is received, a notification appears on the notification bar within a few minutes. It contains the number of spam reports previously filed against this specific voice by other victims, which alerts the user to the spam caller. From this notification the user can get an idea about that person and take the necessary steps to stay safe or be more observant. After the call has ended, if the user also wishes to report that person, he/she just needs to pick the specific voice from the sorted list, as explained earlier.