Mixture Deconvolution Method for Identifying DNA Profiles

In forensic genetics, DNA profiling is essential for identifying individuals involved in criminal investigations. Forensic samples, however, frequently contain DNA from multiple contributors, which complicates analysis and interpretation. Accurate deconvolution of these mixtures is needed to isolate individual genetic profiles, which in turn supports investigative leads and matches against genetic databases. The growing use of Investigative Genetic Genealogy (IGG) in solving cases has intensified demand for mixture analysis techniques that can handle complex samples without relying on reference profiles.

Existing approaches to DNA mixture analysis face significant challenges. Traditional methods typically require reference DNA profiles, which limits their effectiveness when such samples are unavailable or when the DNA is low-template or degraded. They also often struggle to determine the number of contributors and each contributor's share of the mixture, leading to ambiguous or unreliable results. Dependence on manual interpretation increases the potential for human error and reduces reproducibility. Finally, the computational tools and algorithms available for integrating mixed DNA profiles with genetic genealogy databases remain insufficient, which impedes efficient deconvolution of complex mixtures and the generation of actionable investigative leads.
Technology Description
This DNA mixture analysis system separates and identifies the contributors to two-person DNA mixtures without needing reference DNA profiles. Its deconvolution pipeline combines mathematical procedures and machine learning algorithms to process DNA sequencing data, and has three primary components:
- Contributor Analysis: determines the number of contributors and their respective contributions to the mixture
- Sex Determination: uses a random forest model to predict the sex of each contributor from sex-specific genetic markers
- SNP Profile Deconvolution: a model that generates probability scores for the possible genotype combinations

The system's outputs include predicted genotypes with probabilities, estimated contributor percentages, sex determinations, and data formatted for Investigative Genetic Genealogy (IGG) database searches. It is implemented in R and deployable via Docker containers, which supports consistent performance across forensic settings.

What sets this technology apart is its ability to deconvolve mixed DNA samples without existing reference profiles. Because its output is formatted for IGG database searches, investigative leads can be generated from unresolved cases through genealogical relationships. Optimized probability thresholds and confidence filtering are used to keep results accurate and reliable, and the machine learning models, specifically random forests trained on simulated mixtures, support the analysis of complex DNA data. The approach improves the efficiency of genetic profiling in forensic contexts and opens new avenues for solving criminal cases through genetic genealogy research.
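
To make the idea of "probability scores for possible genotype combinations" concrete, the following is a minimal illustrative sketch, not the system's actual model (which is implemented in R and not detailed here). It assumes a biallelic SNP, a known mixture proportion for the major contributor, a simple binomial read-count model with a fixed sequencing error rate, and a uniform prior over genotype pairs. The function name, parameters, and example numbers are all hypothetical.

```python
from itertools import product
from math import comb


def genotype_pair_posteriors(alt_reads, total_reads, major_fraction, error_rate=0.01):
    """Score the 9 genotype pairs (major, minor) at one biallelic SNP.

    Genotypes are coded as the number of alternate alleles (0, 1, or 2).
    ASSUMPTIONS (illustrative only): binomial read sampling, a fixed
    symmetric error rate, and a uniform prior over genotype pairs.
    """
    likelihoods = {}
    for g_major, g_minor in product((0, 1, 2), repeat=2):
        # Expected alternate-allele fraction in the mixture for this pair.
        expected = major_fraction * g_major / 2 + (1 - major_fraction) * g_minor / 2
        # Allow for reads miscalled in either direction.
        expected = expected * (1 - error_rate) + (1 - expected) * error_rate
        # Binomial probability of observing alt_reads out of total_reads.
        likelihoods[(g_major, g_minor)] = (
            comb(total_reads, alt_reads)
            * expected ** alt_reads
            * (1 - expected) ** (total_reads - alt_reads)
        )
    # Normalise so the scores sum to 1 (uniform prior, so posterior is
    # proportional to the likelihood).
    total = sum(likelihoods.values())
    return {pair: value / total for pair, value in likelihoods.items()}


# Hypothetical example: 35 of 100 reads carry the alternate allele and the
# major contributor is assumed to make up 70% of the mixture.
posteriors = genotype_pair_posteriors(alt_reads=35, total_reads=100, major_fraction=0.7)
best_pair = max(posteriors, key=posteriors.get)
print(best_pair, round(posteriors[best_pair], 3))
```

In this toy setting the highest-scoring pair is the one whose expected alternate-allele fraction lies closest to the observed fraction. A real pipeline would also need to estimate the mixture proportion itself and handle coverage and error variation across sites.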
Benefits
- Accurately separates two-person DNA mixtures without requiring reference profiles
- Enables generation of investigative leads through searches of investigative/forensic genetic genealogy (IGG/FGG) databases
- Enhances forensic investigations by analyzing unresolved mixed DNA samples
- Utilizes machine learning algorithms for improved deconvolution accuracy and performance
- Provides comprehensive outputs, including genotype predictions, contributor percentages, and sex determinations
- Ensures consistent and reproducible performance through Docker container deployment
- Bridges mixed DNA samples with genealogical database searches, aiding in solving criminal cases
- Maintains high reliability through optimized probability thresholds and filtering of low-confidence calls
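
The threshold-based filtering mentioned in the last benefit can be sketched as follows. This is a hedged illustration only: the 0.9 threshold, the function name, the SNP identifiers, and the probability values are hypothetical and are not taken from the system itself.

```python
def filter_calls(posteriors_by_snp, threshold=0.9):
    """Keep the top genotype pair per SNP only if its probability is high enough.

    posteriors_by_snp maps a SNP identifier to a dict of
    {(major_genotype, minor_genotype): probability}.
    Returns {snp_id: (genotype_pair_or_None, best_probability)}, where None
    marks a no-call. The default threshold is an ASSUMPTION for illustration.
    """
    calls = {}
    for snp_id, posteriors in posteriors_by_snp.items():
        best_pair = max(posteriors, key=posteriors.get)
        best_prob = posteriors[best_pair]
        # Report a no-call when the best score falls below the threshold.
        calls[snp_id] = (best_pair if best_prob >= threshold else None, best_prob)
    return calls


# Hypothetical per-SNP probability scores.
example = {
    "snp_A": {(1, 0): 0.97, (0, 2): 0.03},
    "snp_B": {(1, 0): 0.55, (0, 2): 0.45},
}
calls = filter_calls(example)
print(calls)
```

Raising the threshold trades the number of reported SNPs for higher confidence in each reported genotype. The appropriate balance depends on how many markers a downstream genealogy search needs.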