HyperAI

With 97% Accuracy, an Australian-Led Team Uses Deep Learning to Identify Gender from Skull CT Scans, Surpassing Human Forensic Observers


In recent years, the success of several suspense and crime-themed TV series has introduced audiences to a once-mysterious discipline: forensic medicine. Simply put, forensic medicine is like a "Sherlock Holmes" working behind the scenes to uncover the truth. Drawing on professional knowledge and advanced technology, it examines bones and trace evidence, interprets the silent testimony of remains and physical evidence, and points investigators toward solutions in difficult cases. It is a cornerstone of judicial fairness, and its importance is self-evident.

Among the many research areas of forensic medicine, gender identification of remains is an extremely important step. When faced with a pile of bones, traditional methods rely mainly on experienced forensic scientists who make an assessment according to published standards. However, these methods are susceptible to subjective factors, which can bias the results. With the spread of computing and deep learning, how to use technology to reduce the impact of human bias has become a new research question.

Recently, a team from the University of Western Australia, the University of New South Wales, and Hasanuddin University in Indonesia proposed an automated deep learning-based framework to improve the accuracy of gender estimation and reduce the impact of cognitive bias.

The study used 200 skull CT scans from a hospital in Indonesia to train and test three deep learning-based network configurations. The most accurate configuration estimates gender and skull features jointly. Its classification accuracy reaches 97%, significantly higher than the 82% achieved by human observers. The experiment confirms the potential of deep learning frameworks for broader application in forensic anthropology.

The relevant research results were published in the academic journal Scientific Reports under the title "Deep learning versus human assessors: forensic sex estimation from three-dimensional computed tomography scans".

Paper address:
https://www.nature.com/articles/s41598-024-81718-y

The open source project "awesome-ai4s" brings together more than 200 AI4S paper interpretations and provides massive data sets and tools:

https://github.com/hyperai/awesome-ai4s

Going a step further to make AI “trustworthy and usable”

In forensic anthropology, the skeleton carries many sex-related differences, especially in the skull. The most popular morphological method for sexing skulls in modern forensic practice is based on the five dimorphic skull features proposed by Phillip L. Walker (hereinafter referred to as the Walker features). The method scores differences between male and female skulls in the mental eminence (MEN), glabella (GLA), supraorbital margin (SUP), nuchal crest (NUC), and mastoid process (MAS).

For example, the study mentioned that the space between men's brows is usually more prominent and wide, and may have obvious ridges or nodules, while the space between women's brows is smoother and thinner. Men's eye sockets are mostly square or rectangular, with sharp corners and a tougher overall shape; women's eye sockets tend to be more round, with natural and soft edges and no obvious edges.

However, as forensic anthropology has developed, this method has shown limitations. On the one hand, methods of this kind are built on physical collections: a large number of physical skeletons must be gathered to obtain a sufficient sample. On the other hand, the reference samples behind this method come from British, American, and Native American individuals who lived in the 19th and 20th centuries, which limits how well the method generalizes across time periods and populations.

The emergence of virtual anthropology has provided a new way out for the practice of forensic anthropology. In terms of data set acquisition, unlike the data collection method used in Walker's research, clinical digital imaging technology such as computed tomography (CT) enables researchers to obtain enough bone data sets. Compared with the collection of physical bones, virtual bone data sets recorded by clinical imaging are undoubtedly easier to establish. In addition, with the widespread use of CT in modern medicine, the data sets obtained by this method are more representative of the contemporary population.

In terms of analysis and processing, deep learning-based technologies have also been applied to forensic anthropology. Researchers use deep learning to process large data sets and build models for assessing skeletal gender, assisting forensic anthropologists in biological assessments. For example, Bewes et al. trained GoogLeNet on 2D lateral images taken from 3D reconstructions of skull CT scans to identify skeletal gender, achieving a discrimination accuracy of 96% for males and 94% for females.

It is worth noting that, despite this progress, earlier deep learning-based methods for skeletal gender identification still face two challenges: being fully automatic and being explainable.

First, some studies relied on commercial software to remove surrounding structures and extract the skull by setting the Hounsfield Unit (HU) threshold with an empirical value, which may be affected by issues such as software accessibility, noise, artifacts, unwanted bone structures, and variability in HU values.

Secondly, unlike human observers, who identify specific skull features, deep learning-based networks are often described as "black boxes" whose hidden layers are difficult to interpret, which limits their explainability.
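To make the first limitation more concrete, here is a minimal NumPy sketch of skull extraction with a fixed HU cutoff. The 300 HU cutoff, the function name `threshold_bone`, and the toy voxel values are illustrative assumptions, not values taken from the study or from any commercial package.

```python
import numpy as np


def threshold_bone(volume_hu: np.ndarray, hu_threshold: float = 300.0) -> np.ndarray:
    """Return a boolean mask of voxels at or above a fixed HU cutoff.

    The default cutoff is an illustrative assumption, not a value from the paper.
    """
    return volume_hu >= hu_threshold


# Toy 1D "scan" with made-up values:
# air, soft tissue, thin bone, dense bone, metal artifact.
toy_scan = np.array([-1000.0, 40.0, 250.0, 700.0, 3000.0])
mask = threshold_bone(toy_scan)
print(mask.tolist())  # [False, False, False, True, True]
```

In this toy case the single cutoff drops the thin bone voxel and keeps the metal artifact, which is the kind of sensitivity to artifacts and HU variability that the paragraph above describes.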

Multiple designs create AI frameworks that surpass humans

In this study, the researchers developed a fully automated AI framework for forensic gender identification using skull CT scans and tested the model using the features proposed by Walker.

The AI framework consists of a preprocessing stage and a gender classification network. First, a pre-trained deep learning network segments the skull. Then, classification networks in different configurations are trained on different input compositions, using either multi-task learning to produce Walker feature scores alongside gender, or single-task learning for gender alone. The specific network settings are shown in the figure below:

Deep learning-based network configuration and its related output

* I is the preprocessed CT image;

* (I, S) is a dual-channel input, including preprocessed CT image and skull mask;

* I∩S is the skull region alone, that is, the CT image restricted to the skull mask;

* N1 and N2 use the combined loss function, and N3 uses the binary cross entropy loss function.
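The three input compositions listed above can be sketched with NumPy as follows. This is an illustration only: the function name `build_inputs`, the dictionary keys, the zero background value, and the placeholder mask are assumptions, and the study obtains its mask from a pre-trained segmentation network rather than a threshold.

```python
import numpy as np


def build_inputs(image: np.ndarray, skull_mask: np.ndarray) -> dict:
    """Build the three input compositions from a CT volume and a binary skull mask.

    Both arrays share one shape (depth, height, width). Zero is assumed to be the
    background value after preprocessing; the paper may use another convention.
    """
    mask = skull_mask.astype(image.dtype)
    return {
        "image": image[np.newaxis],                   # I: one channel
        "image_and_mask": np.stack([image, mask]),    # (I, S): two channels
        "skull_only": (image * mask)[np.newaxis],     # I∩S: voxels inside the skull
    }


rng = np.random.default_rng(0)
image = rng.random((4, 4, 4)).astype(np.float32)
skull_mask = image > 0.5  # placeholder mask, for illustration only
inputs = build_inputs(image, skull_mask)
for name, array in inputs.items():
    print(name, array.shape)
```

The leading axis is the channel axis, so the single-channel inputs have shape (1, 4, 4, 4) and the dual-channel input has shape (2, 4, 4, 4).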

The three deep learning network variants N1, N2, and N3 are built on ResNet, which consists of an Input Block and three Residual Blocks, including 3D convolution (Conv3D), batch normalization (Batch Norm), and rectified linear unit (ReLU) activation layers. The Input Block consists of 32 filters, and the Residual Blocks have 64, 128, and 256 filters respectively. The kernel size of Conv3D is 3 x 3 x 3. As shown in the figure below:

ResNet Backbone
3 network architecture variants built on the ResNet backbone: N1, N2, N3
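As a rough illustration of the backbone described above, the following PyTorch sketch stacks an input block with 32 filters and three residual blocks with 64, 128, and 256 filters, all using 3 x 3 x 3 Conv3D, batch normalization, and ReLU. The number of convolutions per block, the stride of 2, the 1 x 1 x 1 projection on the skip path, the global average pooling, and the class names are assumptions, since the article does not specify them.

```python
import torch
from torch import nn


class ResidualBlock(nn.Module):
    """Two Conv3D-BatchNorm stages with a skip connection (layout assumed)."""

    def __init__(self, in_ch: int, out_ch: int, stride: int = 2):
        super().__init__()
        self.conv1 = nn.Conv3d(in_ch, out_ch, kernel_size=3, stride=stride,
                               padding=1, bias=False)
        self.bn1 = nn.BatchNorm3d(out_ch)
        self.conv2 = nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm3d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        # 1x1x1 projection so the skip path matches the main path's shape.
        self.skip = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, kernel_size=1, stride=stride, bias=False),
            nn.BatchNorm3d(out_ch),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.skip(x))


class Backbone3D(nn.Module):
    """Input block (32 filters) followed by residual blocks (64, 128, 256)."""

    def __init__(self, in_channels: int = 1):
        super().__init__()
        self.input_block = nn.Sequential(
            nn.Conv3d(in_channels, 32, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm3d(32),
            nn.ReLU(inplace=True),
        )
        self.res_blocks = nn.Sequential(
            ResidualBlock(32, 64),
            ResidualBlock(64, 128),
            ResidualBlock(128, 256),
        )
        self.pool = nn.AdaptiveAvgPool3d(1)  # assumed global pooling

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.res_blocks(self.input_block(x))
        return torch.flatten(self.pool(x), 1)  # one 256-value vector per scan


backbone = Backbone3D(in_channels=2).eval()  # 2 channels for the (I, S) input
with torch.no_grad():
    features = backbone(torch.zeros(1, 2, 16, 16, 16))
print(features.shape)  # torch.Size([1, 256])
```

Setting `in_channels` to 1 would cover the single-channel inputs I and I∩S; the tiny 16-voxel cube is only there to keep the example fast.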

All networks were implemented in PyTorch 2.0 using Python 3.9 and trained on an NVIDIA Tesla P100 GPU with 16 GB of memory.
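The three variants differ mainly in how their outputs are wired. The PyTorch sketch below shows one possible reading: N1 predicts the five Walker scores and then gender from those scores, N2 predicts both in separate branches, and N3 predicts gender directly. The single linear layers, the 256-value feature size, and the form of the combined loss (binary cross entropy plus a weighted mean squared error on the Walker scores) are assumptions, because the article names a "combined loss function" without defining it.

```python
import torch
from torch import nn
import torch.nn.functional as F

N_WALKER = 5  # GLA, MAS, MEN, NUC, SUP


class HeadN1(nn.Module):
    """Sequential: features -> Walker scores -> gender logit."""

    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.walker = nn.Linear(feat_dim, N_WALKER)
        self.sex = nn.Linear(N_WALKER, 1)

    def forward(self, feats):
        scores = self.walker(feats)
        return scores, self.sex(scores).squeeze(1)


class HeadN2(nn.Module):
    """Parallel: Walker scores and gender logit in separate branches."""

    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.walker = nn.Linear(feat_dim, N_WALKER)
        self.sex = nn.Linear(feat_dim, 1)

    def forward(self, feats):
        return self.walker(feats), self.sex(feats).squeeze(1)


class HeadN3(nn.Module):
    """Single task: gender logit only."""

    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.sex = nn.Linear(feat_dim, 1)

    def forward(self, feats):
        return self.sex(feats).squeeze(1)


def combined_loss(walker_pred, sex_logit, walker_true, sex_true, walker_weight=1.0):
    """Assumed form: BCE on gender plus weighted MSE on the Walker scores."""
    sex_term = F.binary_cross_entropy_with_logits(sex_logit, sex_true)
    walker_term = F.mse_loss(walker_pred, walker_true)
    return sex_term + walker_weight * walker_term


feats = torch.randn(4, 256)              # stand-in for backbone features
walker_true = torch.full((4, 5), 3.0)    # made-up Walker scores
sex_true = torch.tensor([0.0, 1.0, 1.0, 0.0])
walker_pred, sex_logit = HeadN2()(feats)
loss = combined_loss(walker_pred, sex_logit, walker_true, sex_true)
print(walker_pred.shape, sex_logit.shape, loss.dim())
```

For N3, only the binary cross entropy term would apply, which matches the loss the article attributes to that variant.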

The dataset used in this study comes from Dr Wahidin Sudirohusodo General Hospital (RSWS), Indonesia, and consists of multi-slice CT (MSCT) scans of patients who underwent radiological examinations at the hospital between January 2020 and August 2022. There are 200 scans in total, from 87 females and 113 males. Of these, 166 are used for training and 34 for testing.

Specifically, the multi-task configuration N2, which estimates the Walker feature scores and gender in separate branches, achieves the highest AUROC and accuracy across the different inputs and is the most balanced model for gender identification. When the skull region is used as input, N2 achieves the highest accuracy, 0.97, and the lowest log loss, 0.30.

The multi-task configuration N1, which first estimates the Walker feature scores and then estimates gender, reaches an accuracy of 0.91 with the skull region as input. However, its AUROC is lower than that of N2 and N3 across the different inputs, and its log loss is higher.

The single-task network N3, which estimates gender directly, has an AUROC similar to that of N2 across the different inputs, but with the skull as input its accuracy is only 0.85, the lowest of the three networks. The specific results are shown in the figure below:

Performance metrics of 3 network models and human observers
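For readers less familiar with the three metrics reported here, the following NumPy sketch computes accuracy, log loss, and AUROC for a binary classifier. The labels and probabilities are made-up toy values, the 0.5 decision threshold is an assumption, and which class counts as positive is arbitrary; none of this reproduces the study's evaluation code.

```python
import numpy as np


def accuracy(y_true, p_pred, threshold=0.5):
    """Fraction of cases whose thresholded probability matches the label."""
    y = np.asarray(y_true)
    return float(np.mean((np.asarray(p_pred) >= threshold) == (y == 1)))


def log_loss(y_true, p_pred, eps=1e-12):
    """Mean binary cross entropy; probabilities are clipped to avoid log(0)."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def auroc(y_true, p_pred):
    """Chance that a random positive outscores a random negative (ties count half)."""
    y = np.asarray(y_true)
    p = np.asarray(p_pred, dtype=float)
    pos, neg = p[y == 1], p[y == 0]
    diff = pos[:, None] - neg[None, :]  # every positive-negative pair
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)


y_true = [0, 0, 1, 1]             # toy labels
p_pred = [0.1, 0.4, 0.35, 0.8]    # toy predicted probabilities for class 1
print(accuracy(y_true, p_pred))   # 0.75
print(auroc(y_true, p_pred))      # 0.75
print(round(log_loss(y_true, p_pred), 4))
```

Accuracy depends on the chosen threshold, AUROC only on how the cases are ranked, and log loss also penalizes overconfident wrong probabilities, which is why a network can look similar on one metric and worse on another.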

Notably, all three deep learning-based network models achieved higher gender classification accuracy than human observers. Specifically, N2 achieved the highest accuracy, 97%, while human observers achieved only 82%.
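A quick back-of-the-envelope check helps put these percentages in scale. The sketch below assumes that every reported accuracy was measured on the 34 test scans, which the article does not state explicitly, especially for the human observers, and converts each figure into an approximate count of correct cases.

```python
TEST_SCANS = 34  # test-set size stated in the article

REPORTED = {"N2": 0.97, "N1": 0.91, "N3": 0.85, "human observers": 0.82}


def correct_count(accuracy: float, n: int = TEST_SCANS) -> int:
    """Nearest whole number of correct cases implied by an accuracy on n cases."""
    return round(accuracy * n)


for name, acc in REPORTED.items():
    k = correct_count(acc)
    print(f"{name}: about {k}/{TEST_SCANS} correct ({k / TEST_SCANS:.3f})")
```

Under that assumption, the figures correspond to roughly 33, 31, 29, and 28 correct cases, so one scan is worth about three percentage points and the gap between N2 and the human observers is about five scans.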

To improve the interpretability of the network's decision-making process, the research team used Gradient-weighted Class Activation Mapping (Grad-CAM) to visualize the discriminative skull regions identified by the network. Grad-CAM is a method for explaining the decisions of convolutional neural networks. The key idea is to weight the feature maps of a convolutional layer by the gradient of the target class score with respect to them, then combine the weighted maps into a coarse heat map. This heat map can be upsampled and overlaid on the original image to show the regions the model relies on most when classifying. Its advantage is that it can be applied to any convolutional neural network without structural modification or retraining.
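The Grad-CAM computation can be sketched in a few lines of NumPy, assuming the activations of one convolutional layer and the gradients of the class score with respect to them have already been extracted from the network. The function names, the nearest-neighbour upsampling, and the random stand-in arrays are illustrative assumptions, not the study's code.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Coarse Grad-CAM heat map for one 3D volume.

    Both inputs have shape (channels, depth, height, width).
    """
    # One weight per channel: the gradient averaged over all spatial positions.
    weights = gradients.mean(axis=(1, 2, 3))
    # Weighted sum of the feature maps, giving shape (depth, height, width).
    cam = np.tensordot(weights, activations, axes=1)
    # Keep only regions that push the class score up.
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def upsample_nearest(cam: np.ndarray, factor: int) -> np.ndarray:
    """Enlarge the coarse map so it can be overlaid on the input volume."""
    for axis in range(cam.ndim):
        cam = np.repeat(cam, factor, axis=axis)
    return cam


rng = np.random.default_rng(0)
activations = rng.random((8, 2, 2, 2))           # stand-in feature maps
gradients = rng.standard_normal((8, 2, 2, 2))    # stand-in gradients
heatmap = upsample_nearest(grad_cam(activations, gradients), factor=8)
print(heatmap.shape)  # (16, 16, 16)
```

Because the map is built from a late, low-resolution layer, it is coarse by construction; smoother interpolation than nearest-neighbour is common in practice, and the choice here is only for brevity.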

The figure below shows the Grad-CAM heat map related to each feature prediction in the Walker feature branch of networks N1 and N2 when using the skull as input, where a, b, c, d, and e are GLA, MAS, MEN, NUC, and SUP respectively. The heat map particularly highlights GLA and NUC.

The figure below shows the Grad-CAM heat maps of the three networks' outputs when the skull is used as input. In addition to the GLA, the area surrounding the skull is also activated, most visibly in the N3 heat map. Given that the CT images are preprocessed to a uniform physical size, this may indicate that the model is analyzing the morphology of the entire skull, perhaps its size and shape, which are key features of human sexual dimorphism: male skulls are generally larger and heavier than female skulls.

In summary, the experiment demonstrated that a fully automated, deep learning-based AI framework can improve the accuracy of skeletal sex identification, and it points to broader forensic applicability than earlier baseline methods. The framework also shows the potential to outperform human observers and to help forensic anthropology become more intelligent and automated.

In addition, Grad-CAM also demonstrated the interpretability of deep learning-based network models in identifying gender through skulls. These integrations bring more standardized and objective evaluations to forensic anthropology, reducing the impact of cognitive bias and variability.

AI opens a new chapter in forensic anthropology

In fact, there are many studies on using AI to empower gender identification in forensic anthropology. Coincidentally, related papers included in Scientific Reports reveal many groundbreaking methods.

For example, a study titled "Sex estimation using skull silhouette images from postmortem computed tomography by deep learning" used deep learning to obtain two-dimensional silhouette images through CT scans, enhanced the contour shape of the skull, and then observed the silhouette images at different angles and performed majority voting to determine gender.


Paper address:
https://www.nature.com/articles/s41598-024-74703-y
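The majority-voting step of the silhouette study can be illustrated with a few lines of Python. The per-view predictions are made up, and the function name `majority_vote` is hypothetical; how that study breaks ties, if they can occur, is not described here.

```python
from collections import Counter


def majority_vote(view_predictions):
    """Return the label predicted for the largest number of viewing angles."""
    counts = Counter(view_predictions)
    label, _ = counts.most_common(1)[0]
    return label


# Made-up predictions from five viewing angles of one skull.
views = ["male", "female", "male", "male", "female"]
print(majority_vote(views))  # male
```

Using an odd number of views with two classes avoids ties altogether, which keeps the aggregation rule unambiguous.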

A deep learning-based craniofacial reconstruction method developed by the School of Computer Science and the West China School of Basic Medical and Forensic Medicine of Sichuan University restores faces from CT-scanned skull data. According to the team, it overcame technical difficulties in craniofacial restoration and developed the first face retrieval system based on craniofacial reconstruction. From a single skull, the system generates a series of restored faces of different ages and genders but with a consistent identity, eliminating the influence of age and even deformation on identification and thereby improving recognition accuracy.

The paper, titled "CR-GAN: Automatic craniofacial reconstruction for personal identification", was published in Pattern Recognition, a top journal in the field of pattern recognition.

Paper address:
https://www.sciencedirect.com/science/article/abs/pii/S0031320321005768

Of course, skeletal gender identification does not rely on the skull alone. As mentioned above, the skeleton contains a great deal of information about differences between men and women. For example, because the male and female pelvis serve different physiological functions, the pelvis differs markedly between the sexes. Deep learning-based gender identification methods that exploit these characteristics are also under study.

In summary, the popularity of AI provides an objective and sustainable solution to the problem of gender identification in forensic anthropology. It also allows this mysterious and niche field to move away from the ancient identification methods and gradually embrace intelligence and automation like other fields.

