Expert facial comparison evidence: Science versus pseudo science
Establishing that the person in the dock is the person who committed the crime is an essential component of many criminal trials. While an eyewitness who has identified the defendant as the perpetrator can have a persuasive effect on a jury, research suggests that the high value attached to this form of evidence is not always justified (Scheck, Neufeld & Dwyer, 2000). Indeed, figures from the Innocence Project, an organisation that works to overturn miscarriages of justice using DNA evidence, suggest that faulty eyewitness identification is the single greatest cause of wrongful conviction (The Innocence Project, 2015). In January 2015, post-conviction exonerations stood at 325, with 234 (72%) attributable to eyewitness misidentification. Research in this area appears to support this estimate. Huff, Rattner & Sagarin (1986) suggest that 60% of cases of wrongful imprisonment involve eyewitness misidentification, whereas Wells, Small, Penrod, Malpass, Fulero & Brimacombe (1998) put this figure as high as 90% (for further examples of eyewitness identification fallibility see: Cutler & Penrod, 1995; Lindsay & Pozzulo, 1999; Narby, Cutler & Penrod, 1996; Pezdek, 2012; Scheck, Neufeld & Dwyer, 2000; Wells & Bradfield, 1999; Westcott & Brace, 2002; Wright & Davies, 1999). Taken together, this compelling body of evidence suggests that eyewitness identification is highly unreliable. However, much progress has been made in recent years towards understanding the variables that contribute to the high error rates in this area.
Wells (1978) makes an important distinction between system and estimator variables. System variables are those that fall within the control of the legal system, such as the production of unbiased line-ups and the instructions for carrying out such procedures. Estimator variables are those that fall outside the control of the legal system, including viewing conditions at the time of the crime and the quality of the witness’s memory. By their nature, the influence of estimator variables can never be truly known. While great progress has been made in quantifying and controlling system variables (e.g., Wells et al., 1998; Wells, Malpass, Lindsay, Fisher, Turtle & Fulero, 2000; Wells, Memon, & Penrod, 2006), the problem of quantifying estimator variables persists. However, the recent proliferation of closed-circuit television (CCTV) cameras (estimated at between 4 million and 5.9 million in the UK; Security News Desk, 2015) offers a potential solution to the estimation problem. With a permanent record of the culprit’s image captured on CCTV, problems relating to viewing conditions at the time of the crime and to witness memory failure are eliminated. This reduces identification to the apparently much simpler task of matching the culprit’s face in the CCTV image with that of the suspect. However, it turns out that even this simple matching task is much more difficult than might be expected, at least with previously unfamiliar faces.
Face matching performance is excellent with faces that are familiar to us (such as family members, close friends, celebrities, etc.) even with very poor quality images (Burton et al., 1999; Liu et al., 2003), but the opposite is true when the faces are unfamiliar. In this situation matching performance is very poor, even when the image quality is excellent (Bruce, Henderson, Greenwood, Hancock, Burton, & Miller, 1999; Megreya & Burton, 2006; Kemp, Towell & Pike, 1997). Figure 1 shows arrays reproduced from Bruce et al. (1999).
Figure 1. From: Bruce, V., Henderson, Z., Greenwood, K., Hancock, P., Burton, A.M. & Miller, P. (1999). Verification of face identities from images captured on video. Journal of Experimental Psychology: Applied, 5, 339-360.
In this study, experimental witnesses were shown arrays of this type and told that the face at the top might or might not be present in the array below. They were asked to decide whether the target face was present in the array and, if so, to pick the correct person. Performance on this task was surprisingly poor. When the target was present, participants picked the correct person on about 70% of occasions. When the target was absent, participants mistakenly chose someone on roughly 30% of occasions. Given that the task was constructed to optimise performance, this level of performance is striking. Firstly, no memory component was involved, as the target face and the array were always simultaneously available. Secondly, all images were taken in good lighting from very similar full-face poses, and all were taken on the same day, eliminating transient differences such as changes in hairstyle. It should be noted that poor performance here might be related to task difficulty (matching one face against an array of ten). However, when the task is reduced to a simple verification task between pairs of images (judging whether both images belong to the same person), error rates remain high, at around 20% (Burton, White & McNeill, 2010; Megreya & Burton, 2006, 2007). These findings are further supported by Davis & Valentine (2009), who demonstrate that matching the identity of a live person to a person in a video is also highly susceptible to error. Taken together, this evidence suggests that, even in optimal conditions, the apparently simple task of matching two high-quality face images is much more difficult than might be expected.
One way to potentially improve matching accuracy is for a facial mapping practitioner to compare an image of the culprit taken from CCTV with an image of the suspect, and to provide the court with an opinion based on this comparison (Attorney General’s Reference, 2003). The decision to allow photo-comparison evidence was ratified by the Court of Appeal of England and Wales (R v Stockwell, 1993), and its admissibility has also been confirmed within the Scottish legal system (Church v HMA, 1995). Guidance produced by the Association of Chief Police Officers (ACPO) also endorses the use of a range of these techniques and offers recommendations for good practice (ACPO, 2003; 2009). While the techniques identified vary widely, they can be broadly classified into two categories: photographic-superimposition and photo-anthropometry (Oxlee, 2007; Stavrianos, Zouloumis, Papadopoulos, Emmanouil, Petalotis & Tsakmalis, 2012).
The first of these broad techniques, photographic-superimposition, involves overlaying one face on another using still or moving images (Vanezis & Brierley, 1996). The case brought against the Metropolitan Police in relation to the shooting of Jean Charles de Menezes (Reuters, 2007) provides an example of the still-image version of this technique (see Figure 2).
Figure 2. Photo-composite of Jean Charles de Menezes and Hussein Osman (left half), shown at the Stockwell Health and Safety prosecution at the Central Criminal Court in London, October 2, 2007. Source: Metropolitan Police.
Strathie, McNeill & White (2012) provide an evaluation of this technique in three experiments that compare performance across the following conditions: two full-faces, fully aligned half-face composites, misaligned half-face composites and half-face composites with a gap. They find that performance is best when two full-faces are presented, as compared to all composite presentations. However, most worryingly, the fully aligned composites, commonly used by facial mapping practitioners, appear to increase the likelihood that images of two different people will be judged to portray the same person.
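The three composite conditions can be made concrete with a short sketch. It assumes two greyscale images of equal size that have already been scaled and aligned, each stored as a list of pixel rows; the function names, the `shift` and `gap` parameters and the white fill value are hypothetical choices for illustration, not the stimulus-preparation procedure used by Strathie et al. (2012).

```python
# Hypothetical sketch of half-face composite construction.
# Images are equal-sized, pre-aligned greyscale images stored as lists of rows.

def aligned_composite(face_a, face_b):
    """Left half of face_a joined directly to the right half of face_b."""
    mid = len(face_a[0]) // 2
    return [row_a[:mid] + row_b[mid:] for row_a, row_b in zip(face_a, face_b)]


def misaligned_composite(face_a, face_b, shift, fill=255):
    """As above, but the right half is moved down by `shift` rows (padding uses `fill`)."""
    mid = len(face_a[0]) // 2
    height = len(face_a)
    right_width = len(face_b[0]) - mid
    rows = []
    for r in range(height + shift):
        left = face_a[r][:mid] if r < height else [fill] * mid
        src = r - shift  # row of face_b that lands on output row r
        right = face_b[src][mid:] if 0 <= src < height else [fill] * right_width
        rows.append(left + right)
    return rows


def gapped_composite(face_a, face_b, gap, fill=255):
    """Halves kept level but separated by `gap` blank columns."""
    mid = len(face_a[0]) // 2
    return [row_a[:mid] + [fill] * gap + row_b[mid:] for row_a, row_b in zip(face_a, face_b)]


if __name__ == "__main__":
    a = [[0] * 4 for _ in range(3)]  # toy "face" A: all black
    b = [[9] * 4 for _ in range(3)]  # toy "face" B: uniform grey
    print(aligned_composite(a, b)[0])
    print(misaligned_composite(a, b, 1)[0])
    print(gapped_composite(a, b, 2)[0])
```

Only the aligned version presents the two halves as a single, continuous face; the misaligned and gapped versions break up that whole-face impression, which is the contrast the experiments exploit.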
Video superimposition is similar to the still-image technique described above except that one image is superimposed on top of the other and the display gradually wipes vertically, horizontally or diagonally between the two (Vanezis & Brierley, 1996; Oxlee, 2007). An example of this type of presentation is shown in Figure 3.
Figure 3. Example of frames from a mismatched video wipe transition. From: Strathie, A. (2010). Person Identification in a Legal Setting (Unpublished doctoral thesis). Glasgow Caledonian University, Glasgow.
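A video wipe can be thought of as a sequence of frames in which a moving edge divides the picture: pixels on one side of the edge come from the incoming face and pixels on the other side from the outgoing face. The sketch below assumes two aligned images of equal size stored as lists of pixel rows, and a hard-edged, linearly moving boundary; both are simplifying assumptions, and practitioners' video software may use soft edges or different timing. The function name and parameters are hypothetical.

```python
# Hypothetical sketch of a hard-edged video wipe between two aligned face images.

def wipe_frames(face_a, face_b, n_frames, direction="horizontal"):
    """Return n_frames images that move from face_a to face_b along `direction`."""
    if n_frames < 2:
        raise ValueError("need at least two frames")
    height, width = len(face_a), len(face_a[0])
    # Position of each pixel along the direction of travel of the wipe edge.
    position = {
        "horizontal": lambda r, c: c,    # edge sweeps left to right
        "vertical": lambda r, c: r,      # edge sweeps top to bottom
        "diagonal": lambda r, c: r + c,  # edge sweeps from the top-left corner
    }[direction]
    extent = position(height - 1, width - 1) + 1  # one past the largest position
    frames = []
    for k in range(n_frames):
        edge = k * extent // (n_frames - 1)  # 0 on the first frame, extent on the last
        frames.append([
            [face_b[r][c] if position(r, c) < edge else face_a[r][c] for c in range(width)]
            for r in range(height)
        ])
    return frames
```

The first frame shows only the outgoing face and the last only the incoming one; every frame in between is effectively a moving, fully aligned composite of the two faces.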
The video-superimposition technique was evaluated in three experiments by Strathie (2010), in which video-wipes were compared with full-faces in normal, disguised and degraded presentations. In general, the results showed the same pattern of responding as that found in Strathie et al. (2012): performance was best for full-faces (in all presentations), and video-wipes increased the tendency to judge two different faces as the same in normal and disguised presentations (this bias was not present for degraded different-face pairs, but, importantly, the technique still produced no advantage in matching accuracy).
A theoretical explanation of these findings offered by Strathie et al. (2012) appeals to a broad consensus that faces are generally processed at a holistic rather than a feature level (Hole, 1994; Rossion, 2009; Tanaka & Farah, 1993; Young et al., 1987). They argue that the fully aligned composite images (whether still or moving) evoke holistic processing, thus creating the perception of a whole face, whereas misaligned half-faces or half-faces with a gap evoke a feature-based comparison. While further experimental work may be required to consolidate this argument, it is clear from an applied perspective that these studies demonstrate serious shortcomings of the photo-superimposition technique, using both still and moving images.
The second broad technique, photo-anthropometry, involves a comparison of angular and distance relationships between key features and can be used to compare an image of a suspect with that of an offender taken from CCTV (Iscan & Loth, 2000). The technique involves deriving a numerical signature for each face by computing distance and angular measurements between facial landmarks, such as the corners of the eyes and the centre of the mouth (see Figure 4 for an example).
Figure 4. From: Kleinberg, K.F., Vanezis, P., & Burton, A.M. (2007). Failure of anthropometry as a facial identification technique using high-quality photographs. Journal of Forensic Sciences, 52(4), 779-783.
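A minimal sketch of how such a signature might be computed, assuming the landmarks have already been located as pixel coordinates. Dividing every distance by one reference distance (here, the distance between the outer eye corners) is an assumption made so that photographs taken at different scales can be compared. The landmark names and the choice of reference are hypothetical, and published protocols (e.g., Kleinberg et al., 2007) may differ.

```python
import math
from itertools import combinations


def angle_at(vertex, p, q):
    """Angle in degrees at `vertex` between the lines to points p and q."""
    a1 = math.atan2(p[1] - vertex[1], p[0] - vertex[0])
    a2 = math.atan2(q[1] - vertex[1], q[0] - vertex[0])
    diff = abs(math.degrees(a1 - a2)) % 360
    return min(diff, 360 - diff)


def signature(landmarks, scale_pair, angles=()):
    """Scale-free numerical signature for one face image.

    landmarks:  {name: (x, y)} in image pixels (names are hypothetical)
    scale_pair: two landmark names whose distance is used as the unit of length
    angles:     (vertex, p, q) triples of landmark names
    """
    unit = math.dist(landmarks[scale_pair[0]], landmarks[scale_pair[1]])
    sig = {}
    # Every pairwise landmark distance, expressed in reference-distance units.
    for a, b in combinations(sorted(landmarks), 2):
        sig[f"d({a},{b})"] = math.dist(landmarks[a], landmarks[b]) / unit
    # Angles are already independent of image scale.
    for vertex, p, q in angles:
        sig[f"angle({p},{vertex},{q})"] = angle_at(landmarks[vertex], landmarks[p], landmarks[q])
    return sig


if __name__ == "__main__":
    face = {"left_eye_outer": (0, 0), "right_eye_outer": (4, 0), "mouth_centre": (2, 3)}
    sig = signature(face, ("left_eye_outer", "right_eye_outer"),
                    angles=[("mouth_centre", "left_eye_outer", "right_eye_outer")])
    for key, value in sig.items():
        print(f"{key}: {value:.3f}")
```

Because every entry is either a ratio of distances or an angle, doubling the size of the image leaves the signature unchanged, which is what allows signatures from different photographs to be compared at all.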
These metrics, and indices based on them, are then compared across images and used to determine whether two images belong to the same or different individuals (Iscan & Loth, 2000). However, experimental studies that have examined this technique have found it to be unreliable (Kleinberg, Vanezis & Burton, 2007; Davis, Valentine & Davis, 2010; Moreton & Morley, 2011). The main problem with this technique is that the variability between images of the same face, created by changes in lighting, pose, expression, camera angle and focal length, can be greater than the variability between images of different faces (Porter & Doran, 2000). As a result, it is entirely possible that two images of different faces might produce more similar angular and distance relationships than two images of the same face.
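The problem identified by Porter & Doran (2000) can be illustrated with a deliberately simplified toy model. The sketch assumes a perfectly flat face viewed under orthographic projection, so that turning the head by an angle θ about the vertical axis shrinks horizontal distances by cos θ and leaves vertical distances unchanged. The landmark coordinates for "person A" and "person B" are invented for illustration and are not measurements from any study; real faces have depth, so real pose effects are more complex.

```python
import math


def turn_head(points, degrees):
    """Toy yaw model: a flat face seen under orthographic projection.

    Horizontal coordinates shrink by cos(angle); vertical ones are unchanged.
    """
    c = math.cos(math.radians(degrees))
    return {name: (x * c, y) for name, (x, y) in points.items()}


def width_height_index(points):
    """Eye width divided by the vertical eye-to-mouth distance."""
    width = math.dist(points["left_eye"], points["right_eye"])
    eye_line = (points["left_eye"][1] + points["right_eye"][1]) / 2
    height = abs(points["mouth"][1] - eye_line)
    return width / height


# Invented landmark coordinates for two different people (illustration only).
person_a = {"left_eye": (0.0, 0.0), "right_eye": (4.0, 0.0), "mouth": (2.0, 3.0)}
person_b = {"left_eye": (0.0, 0.0), "right_eye": (4.0, 0.0), "mouth": (2.0, 3.2)}

a_frontal = width_height_index(person_a)
a_turned = width_height_index(turn_head(person_a, 30))
b_frontal = width_height_index(person_b)

within_person = abs(a_frontal - a_turned)    # same face, different pose
between_people = abs(a_frontal - b_frontal)  # different faces, same pose
print(f"same person, 30 degree turn: {within_person:.3f}")
print(f"different people, frontal:   {between_people:.3f}")
```

With these invented numbers the script prints 0.179 for the same person photographed in two poses and 0.083 for two different people photographed face-on: a modest head turn moves the index further than a genuine difference in identity does.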
The evidence reviewed so far offers no support for the use of either photo-superimposition or photo-anthropometry. In the following experiment we explore another photo-anthropometric technique, which places gridlines across two face images, as described by Oxlee (2007). In line with previous findings (Strathie, 2010; Strathie et al., 2012), we predict that identity match judgments will be more accurate for simple image pairs (without gridlines) than when the gridline technique is used, and that there will be an interaction such that the gridline technique will increase the likelihood that different-face pairs will be judged to be the same.
A 2 x 2 within-participants design was used. The first independent variable was Type of Presentation, with two levels: faces presented with or without gridlines. The second independent variable was Type of Face Pair, with two levels: faces of the same person or faces of different people. The dependent variable was the accuracy of the matching decision.
Eighty-seven volunteers (24 males and 63 females) were recruited for the study. Their ages ranged from 18 to 40 years. Participants were students at Glasgow Caledonian University, and all had normal or corrected-to-normal vision.
A set of 24 face-pair images, containing 12 same and 12 different face pairs, was selected from the Glasgow Face Matching Test (Burton, White & McNeill, 2010). Half of the images were of males and half of females. One face in each image pair was degraded (using the same standard blur operation in Adobe Photoshop) in order to simulate the quality of image that might ordinarily be derived from CCTV. This set was then modified to create a second set of face-pair images using the gridline technique described by Oxlee (2007), resulting in a total set of 48 experimental stimuli. Examples of face images are shown in Figure 5.
Figure 5. Examples of images of same and different face-pairs, with and without gridlines.
Participants viewed the full set of 48 face-pair images on a computer screen. Each face pair was presented twice, once with and once without gridlines, and the order of presentation was randomised across participants. The participants’ task was to decide whether the two faces in each trial were of the same person or of two different people. When the images did not contain gridlines, participants were simply asked to indicate whether the faces were the same or different. When making decisions about images with gridlines, they were encouraged to use the gridlines to help them reach their same or different decision. Presentation was self-paced: having recorded their decision on a separate response sheet, participants were instructed to press the space bar to move to the next slide.
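The trial structure can be summarised in a short sketch that builds the 48 trials (24 face pairs, each shown once with and once without gridlines) and gives each participant an independently shuffled order. The `Trial` record, the seeding scheme and the function names are hypothetical; the presentation software used in the study is not described beyond the details above, and the sketch ignores the male/female balance of the stimulus set.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    pair_id: int
    pair_type: str     # "same" or "different"
    presentation: str  # "no_gridlines" or "gridlines"


def build_trials(n_same=12, n_different=12):
    """Every face pair appears once without and once with gridlines (24 x 2 = 48 trials)."""
    pairs = [(i, "same") for i in range(n_same)]
    pairs += [(n_same + i, "different") for i in range(n_different)]
    return [Trial(pid, ptype, pres)
            for pid, ptype in pairs
            for pres in ("no_gridlines", "gridlines")]


def order_for_participant(trials, participant_id):
    """Independent random order per participant; seeding by ID makes it reproducible."""
    rng = random.Random(f"participant-{participant_id}")
    order = list(trials)
    rng.shuffle(order)
    return order


if __name__ == "__main__":
    trials = build_trials()
    print(len(trials))
    print(Counter((t.pair_type, t.presentation) for t in trials))
    print(order_for_participant(trials, participant_id=1)[:3])
```

Seeding each participant's generator from their ID is one way of making a "randomised across participants" order reproducible for later checking; a single shared generator would work equally well if reproducibility is not needed.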
Face matching accuracy was calculated for each condition and mean values for correct responding (out of a maximum of 12) and standard deviations are presented in Figure 6.
Figure 6. Type of Face Pair by Type of Presentation showing mean number of correct responses in each condition (standard deviations in brackets)
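Scoring follows directly from the design: a response is correct when the "same"/"different" decision matches the true pair type, and correct responses are counted separately in each of the four cells (maximum 12 per cell). A minimal sketch, assuming each participant's responses are stored as (pair type, presentation, response) tuples; this data layout and the function names are hypothetical.

```python
from collections import Counter
from statistics import mean, stdev

CELLS = [(pair_type, presentation)
         for pair_type in ("same", "different")
         for presentation in ("no_gridlines", "gridlines")]


def cell_scores(responses):
    """Number of correct decisions per cell for one participant.

    responses: iterable of (pair_type, presentation, response) tuples, where
    response is the participant's "same"/"different" decision.
    """
    correct = Counter()
    for pair_type, presentation, response in responses:
        if response == pair_type:  # correct if the decision matches the true pair type
            correct[(pair_type, presentation)] += 1
    return {cell: correct[cell] for cell in CELLS}


def summarise(participants):
    """Mean and standard deviation of correct responses per cell (needs >= 2 participants)."""
    scores = [cell_scores(responses) for responses in participants]
    return {cell: (mean([s[cell] for s in scores]), stdev([s[cell] for s in scores]))
            for cell in CELLS}
```

`statistics.stdev` computes the sample standard deviation (n - 1 denominator), which is the usual choice when reporting descriptive statistics alongside an ANOVA.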
Matching accuracy scores were analysed using a repeated-measures ANOVA. A significant main effect of Type of Presentation was found, F(1, 86) = 8.51, p < .05, indicating that face pairs without gridlines (mean accuracy = 8.5) were matched more accurately than face pairs with gridlines (mean accuracy = 8.1). A significant main effect of Type of Face Pair was also found, F(1, 86) = 42.10, p < .05, indicating that matching was more accurate for same-face pairs (mean accuracy = 9) than for different-face pairs (mean accuracy = 7.6). Moreover, there was a statistically significant interaction between Type of Presentation and Type of Face Pair, F(1, 86) = 8.11, p < .05. Examination of this interaction shows that it is entirely driven by the difference between the gridline and no-gridline conditions for different-face pairs, indicating that use of the gridline technique increases the likelihood that different-face pairs will be judged to be the same.
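Because both factors have only two levels, each effect in this 2 x 2 within-participants design has one degree of freedom, and its repeated-measures F value equals the squared one-sample t statistic computed on a per-participant contrast score. The sketch below, assuming each participant's four cell scores are available, uses this identity to compute the three F ratios with df = (1, n - 1); with 87 participants that gives the F(1, 86) form reported above. The names are hypothetical, and this is not the software the authors used.

```python
from statistics import mean, variance

SN, SG = ("same", "no_gridlines"), ("same", "gridlines")
DN, DG = ("different", "no_gridlines"), ("different", "gridlines")


def contrast_F(contrast_scores):
    """F(1, n - 1) for a one-degree-of-freedom within-participants effect.

    Equals the squared one-sample t statistic of the per-participant contrast
    scores. Needs n >= 2 and raises ZeroDivisionError if every score is equal.
    """
    n = len(contrast_scores)
    m = mean(contrast_scores)
    return n * m * m / variance(contrast_scores)


def rm_anova_2x2(cells):
    """cells: one dict per participant mapping SN, SG, DN, DG to a correct-response count."""
    # Main effect of presentation: no-gridline total minus gridline total.
    presentation = [(c[SN] + c[DN]) - (c[SG] + c[DG]) for c in cells]
    # Main effect of pair type: same-pair total minus different-pair total.
    pair_type = [(c[SN] + c[SG]) - (c[DN] + c[DG]) for c in cells]
    # Interaction: gridline cost for different pairs minus gridline cost for same pairs.
    interaction = [(c[DN] - c[DG]) - (c[SN] - c[SG]) for c in cells]
    df = (1, len(cells) - 1)
    return {
        "Type of Presentation": (contrast_F(presentation), df),
        "Type of Face Pair": (contrast_F(pair_type), df),
        "Interaction": (contrast_F(interaction), df),
    }
```

A p-value can then be obtained from the F distribution, for example with `scipy.stats.f.sf(F, 1, n - 1)`; the standard library has no F distribution, so that step is left out of the sketch.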
The results show that identity match decisions are more accurate when two face images are simply presented side by side than when they are presented using the gridline technique. There was also an advantage for same-face pairs over different-face pairs. More importantly, an interaction was found such that inaccurate responding was greatest when the face pairs were different and gridlines were applied. The increased probability that images of two different people will be judged as showing the same person is generally consistent with the findings of Strathie (2010) and Strathie et al. (2012).
Several facial-mapping techniques have now been investigated using different experimental methodologies, and none has been found to offer an advantage over simple visual inspection of two images. Kleinberg et al. (2007) found no advantage for photo-anthropometry, and more recent research by Davis et al. (2010), using a computer-assisted approach to anthropometry, also identified problems with the technique. Strathie (2010) and Strathie et al. (2012) show that there is no advantage for photo-superimposition over simple side-by-side image presentations, and that applying this technique increases the likelihood that two different faces will be judged to be the same. The results reported here show a similar pattern of responding for yet another anthropometric technique, one that uses a gridline procedure as a matching aid.
The findings reported are generally consistent with those theories of face processing that suggest that faces are processed as a whole rather than in a feature based fashion (Hole, 1994; Rossion, 2009; Tanaka & Farah, 1993; Young et al., 1987). In line with this theorising we suggest that, in the current experiment, same-face pairs automatically evoke holistic face processing and application of this mechanism provides optimum performance. When face-pairs are different, automatic holistic processing may not provide an immediate solution and it may be necessary to resort to a feature-based strategy, indicated by an overall drop in performance. Moreover, it appears that the gridline technique, in which the position of the features in one face can be aligned with the position of features in another, further encourages use of the less effective feature-based mechanism, resulting in poorest performance for different face-pairs with gridlines.
It is worth pointing out that our experimental participants had no special training, and their matching strategies might, of course, have differed from each other and from those used by facial-mapping practitioners. However, as already noted, the methodologies used by practitioners in this area vary widely, and it is clear there are no standardised procedures. Instead, each practitioner relies on a loose description of techniques and practices, none of which has been subject to scientific verification or scrutiny (Edmonds et al., 2009). In such circumstances it is not unreasonable to suggest that use of the gridline technique by facial mapping practitioners may also evoke the same non-optimal, feature-based face-matching strategy as described above for ‘non-experts’. While this suggestion is speculative, the data reported here are consistent with theorising in this area (Hole, 1994; Rossion, 2009; Strathie et al., 2012; Tanaka & Farah, 1993; Young et al., 1987) and with previous empirical findings (Kleinberg et al., 2007; Davis et al., 2010; Moreton & Morley, 2011; Strathie, 2010; Strathie et al., 2012).
A review of a recent case heard in the Court of Appeal in England (R v Atkins and Atkins, 2009) illustrates that current legal opinion in the UK regarding facial-mapping evidence is at odds with empirical findings in this area. At the original trial in this case, a facial mapping practitioner compared images captured on CCTV with an image of the accused, Mr Atkins. Based on this comparison, which included the use of the gridline procedure described above, the practitioner offered an opinion on the probability that the faces belonged to the same person and rated his opinion as between “it lends support” and “lends strong support”, using a hierarchical scale suggested by Bromby (2003) and recommended by the Forensic Imagery Analysis Group (FIAG, 2006) (see Figure 7).
Figure 7. Scale of comparison as recommended by the FIAG for use by Facial Comparison experts (FIAG, 2006)
The appeal was dismissed on the basis that expert photographic comparison evidence, if properly based on study and experience, may be expressed using such a scale. While this ruling is primarily concerned with the way the expert’s opinion is expressed, it also highlights current problems regarding the admissibility of photo-comparison evidence in general. Edmonds et al. (2010) provide a powerful critique of the judgment in this case, highlighting both the dubious reliability of the techniques used by the facial mapping practitioner and the subjective nature of the scale used to express opinion evidence. In fact, these authors go as far as to say, “Atkins, in effect, represents a jurisprudential backwater, largely indifferent to the scientific processes...” (Edmonds et al., 2010, p. 164). However, it should be noted that the majority of the legal and scientific community in the UK does not share this ‘indifference’. The Law Commission (2011) offered several recommendations to the UK Government regarding the admissibility of expert evidence in criminal trials, arguably the most important of which was a requirement for the trial judge to assess the reliability of expert evidence. The UK Government rejected this requirement (together with the vast majority of recommendations in the report), primarily on the grounds of cost (Ministry of Justice, 2013).
The current paper reviews previous findings in this area that demonstrate the unreliability of several facial-mapping techniques (Kleinberg et al., 2007; Davis et al., 2010; Moreton & Morley, 2011; Strathie, 2010; Strathie et al., 2012) and offers new evidence regarding the unreliability of photo-anthropometry using gridlines, thus adding weight to the critique of these techniques offered by Edmonds et al. (2009, 2010).
The probative value of expert-led photo-comparison evidence depends on the reliability of the techniques used by facial-mapping practitioners. None of the techniques investigated so far has proved to be reliable, and some actually increase the probability that images of two different people will be judged to belong to the same person, thus increasing the likelihood of wrongful conviction. On this basis, it is recommended that expert opinion evidence based on the use of photo-anthropometry and photo-superimposition techniques should not be admitted in court. In the absence of a statutory reliability test, as suggested by the Law Commission (2011), it will be incumbent on judges to ensure that evidence of dubious reliability is not led.