The safety of clients’ money and data (e.g. transactions) is at the heart of banking culture and reputation. As one of the instruments to safeguard clients, banks use polygraph screenings (PS). Candidates are screened at hiring to keep untrustworthy people out, and employees in sensitive roles are screened regularly so that infringements are detected early. The PS topics include drug abuse, gambling addiction, insider trading, disclosure of confidential information, bribery, corruption, and misappropriation and fraud (sample screening questions are in Suppl. Table 2). The finance industry is not alone in applying PS; it is also used in critical sectors such as aviation and manufacturing, and by federal law enforcement agencies throughout the world1,2.
The classical polygraph is a device that records cardiovascular activity (such as heart rate), thoracic and abdominal respiration, galvanic skin response (a.k.a. electrodermal activity, or EDA), and tremor. An examiner asks the person being screened (the examinee) questions and accepts only “yes” or “no” answers. There are many good overviews of the classical polygraph and of questioning methods3,4,5.
Unorthodox lie detection studies analyze video and audio6 (including facial expressions7,8, pupil reaction9, and delays between question and answer10), electromyography (EMG)11, electroencephalogram (EEG)12, magnetic resonance tomography (MRT)13,14, or writing pattern (keystroke dynamics)15 in addition to or instead of classical polygraph data. Some of these approaches have even been piloted in new fields, such as the iBorderCtrl lie detector pilot in EU airports16,17 or the VeriPol deception detection pilot run by Spanish police on written insurance claims18,19. Yet, in the traditional fields, we are unaware of any case where the classical polygraph has been replaced by an unorthodox system. The classical polygraph remains the instrument of choice in traditional areas such as hiring screening and criminal and internal investigations.
The polygraph has a long history of drawing criticism from psychologists20,21 and legal scholars22, as well as from the public and the state1,23. A major concern is that the method does not reliably distinguish lies from truth. And yet, “paradoxically, although Congress expressed deep concerns about the efficacy of the technology, the EPPA permits the use of lie detectors in circumstances in which the accuracy of the results is of paramount importance: national defense, security, and legitimate ongoing investigations”22.
Critical related work offers many arguments for why a polygraph screening may fail to detect a lie or may mark a truthful answer as a lie. For example: “Polygraph tests do not assess deceptiveness, but rather are situations designed to elicit and assess fear”24. A truthful junior manager may fear being labeled corrupt more than a cold-blooded, corrupt senior manager fears being caught lying by a polygraph examiner. Another example of constructive critique is a well-grounded call for standardization of polygraph screening procedures and examiner education25. Of all these concerns, in this paper we tackle only one: the need for quality assessment (QA) of examiner work. Examiner errors happen, for example, when an examiner is inexperienced, exhausted or distracted, or biased26.
A simple QA solution exists: always have another examiner review the screening and confirm or refute the conclusions of the original examiner27. To QA a polygraph examiner's report, another examiner needs to review the recording of the screening, including the polygram (a graphical representation of recorded sensor data coupled with the examiner’s questions and the examinee’s answers) and sometimes the audio and video recording, and then compare their own conclusions with the original report. In our experience, QA takes at least half the time it took to perform the screening, and an average screening takes at least two hours. Thus, QAs are costly in terms of both time and money. For this reason, and to the best of our knowledge, industrial internal security departments QA screenings infrequently or not at all. We also note that having other examiners QA all screenings is not a bulletproof solution. Some examiner mistakes come not from the examiner’s bias or fatigue, but from the fact that the case is hard. In hard cases, the second examiner may make the same mistake the original examiner did.
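To make the cost argument concrete, the following is a minimal sketch of the arithmetic implied above. The two-hour screening duration and the one-half QA ratio are the lower bounds stated in the text; the function name and the number of screenings are hypothetical and serve only as an illustration.

```python
def qa_review_hours(num_screenings, screening_hours=2.0, qa_fraction=0.5):
    """Lower-bound estimate of examiner hours needed to QA every screening.

    screening_hours and qa_fraction default to the lower bounds quoted in
    the text (a screening takes at least two hours; QA takes at least half
    of that). num_screenings is a hypothetical input.
    """
    if num_screenings < 0:
        raise ValueError("num_screenings must be non-negative")
    return num_screenings * screening_hours * qa_fraction


# Hypothetical example: a department that runs 250 screenings per year.
print(qa_review_hours(250))  # 250.0
```

Under these assumptions, reviewing every screening costs at least one extra examiner hour per screening, which is why full manual QA scales poorly.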
Here we report the design and field testing of a machine learning (ML) tool to QA examiner reports for PS performed on the classical polygraph. A small number of reports, marked suspicious by this tool, are handed to another examiner for QA. Such a tool allows semi-automatic double-checking of all new and historical reports without hiring additional examiners. An additional advantage is that examiners who know that all of their work will be QA-ed are likely to make decisions more carefully.
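The triage idea can be sketched as follows. This is only an illustration of handing a small number of flagged reports to a second examiner, not the model described in this paper: the `Report` class, the `suspicion` score, and `select_for_review` are hypothetical names, and the scores are made-up values.

```python
from dataclasses import dataclass


@dataclass
class Report:
    report_id: str
    # Hypothetical score in [0, 1]; higher means the tool considers the
    # examiner's conclusion more likely to be wrong.
    suspicion: float


def select_for_review(reports, max_reviews):
    """Return up to max_reviews reports with the highest suspicion score."""
    if max_reviews < 0:
        raise ValueError("max_reviews must be non-negative")
    ranked = sorted(reports, key=lambda r: r.suspicion, reverse=True)
    return ranked[:max_reviews]


# Made-up scores for four hypothetical reports.
reports = [
    Report("r1", 0.10),
    Report("r2", 0.85),
    Report("r3", 0.40),
    Report("r4", 0.92),
]
flagged = select_for_review(reports, max_reviews=2)
print([r.report_id for r in flagged])  # ['r4', 'r2']
```

Capping the number of reviews rather than thresholding the score keeps the second examiner's workload fixed; either choice would fit the description above.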
Our results neither justify nor solidify the practice of classical polygraph screenings. Rather, we consider our results a temporary and partial patch that helps to eliminate a specific type of error of this method until better methods are devised and put into practice. More broadly, we believe we take a step towards rethinking classical polygraph practices.
Below we describe the steps from a basic model to the validation of the final model, which succeeded at exposing real examiner errors in historical field screenings.