James is a molecular biologist who is currently completing an MS in forensic science at the University of New Haven.
If you're like me and grew up obsessively watching criminal investigation shows, then you're probably just as fascinated as I am that we can connect someone to a crime they've committed just by collecting their DNA from a speck of blood they've left behind. But have you ever wondered just how forensic scientists can tell people apart using their DNA? If you're thinking about becoming a forensic DNA analyst or are simply curious about how it all works, keep reading to find out!
The Scientific Principle
For those who need a refresher from high school biology, DNA is the genetic code inside all of our cells that contains instructions for which proteins each cell needs to manufacture. The letters that make up this code are A, T, C, and G, and the order in which these letters appear determines what proteins are made, how many, and how quickly. DNA is stored in bundles called chromosomes, and we each inherit 23 chromosomes from our mom as well as 23 chromosomes from our dad. For this reason, we have 2 copies of each DNA sequence.
The types of sequences forensic scientists look at to tell different people apart are called microsatellites: stretches of DNA in which a short sequence is repeated a certain number of times, one copy right after another. This is why microsatellites are also called Short Tandem Repeats (STRs).
Using the image above as a reference, we can see that this microsatellite has repeating units of G and A. The first version (or allele) of this microsatellite has 8 repeating units of GA, the second allele has 7 units, and the third has 6 units. And remember, we all have 2 copies of this microsatellite, one from mom and one from dad, which means that the chance of 2 people having the exact same alleles (i.e. number of repeating units) is pretty slim. This is exactly what allows forensic scientists to determine whether someone's DNA matches the DNA found at a crime scene.
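To make the idea of "number of repeating units" concrete, here is a minimal Python sketch that counts the GA repeats in each copy of a microsatellite and reports the pair of counts. The function name and the flanking letters around the repeats are made up for illustration; real STR typing is done by measuring fragment sizes in the lab, not by scanning text like this.

```python
import re


def count_tandem_repeats(sequence: str, unit: str) -> int:
    """Return the length, in units, of the longest uninterrupted run of `unit`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence)
    return max((len(run) // len(unit) for run in runs), default=0)


# Hypothetical sequences: the flanking letters (TTC, CCT) are invented.
copy_from_mom = "TTC" + "GA" * 8 + "CCT"
copy_from_dad = "TTC" + "GA" * 7 + "CCT"

# A person's result at this microsatellite is the pair of repeat counts.
genotype = tuple(
    sorted(count_tandem_repeats(seq, "GA") for seq in (copy_from_mom, copy_from_dad))
)
print(genotype)  # (7, 8)
```

Sorting the two counts reflects the fact that we can't tell from the counts alone which copy came from which parent.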
Applying What We Learned
Let's use what we learned in a mock case example. Let's say that a masked assailant went into Bill's house and attacked him with a knife. Bill manages to fight off the assailant, who runs away leaving the knife behind. The police arrive and submit the knife to forensics, who successfully extract the assailant's DNA from the knife handle. It was found that at this microsatellite, the assailant had one allele with 8 repeating units of GA and another with 7 units. Bill suspects that the assailant was his co-worker David, who was recently fired due to a complaint that Bill had filed against him. So, the police collect a DNA sample from David to compare to the DNA from the knife handle.
To everyone's surprise, it turns out that David's DNA has one allele with 8 repeating units of GA and another with 6 units! Although it's plain as day that David hates Bill's guts, it's not a match and we have conclusively proven that the DNA from the knife handle did not come from David.
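The comparison we just made can be sketched in a few lines of Python. The function name is my own; the repeat counts are the ones from the mock case. The only subtlety is that the order of the two alleles doesn't matter, so we sort them before comparing.

```python
def same_genotype(sample_a, sample_b) -> bool:
    """Compare two allele pairs at one microsatellite, ignoring order."""
    return sorted(sample_a) == sorted(sample_b)


knife_handle = (8, 7)  # repeat counts found on the knife handle
david = (8, 6)         # repeat counts from David's reference sample

print(same_genotype(knife_handle, david))   # False: David is excluded
print(same_genotype(knife_handle, (7, 8)))  # True: same alleles, different order
```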
Bill then identifies his neighbor Todd as a potential suspect, as Bill had accidentally scratched Todd's beloved Porsche the other day. The police collect Todd's DNA and BAM! Just like the DNA from the knife handle, Todd's DNA has one allele with 8 repeating units and another with 7 units at this microsatellite. So, we've proven that Todd was the assailant and he's going to jail, right?
Well, not exactly. Considering that a large city could have as many as 1 million residents, it's not hard to imagine that we could find several thousand individuals who have the same alleles with the same number of repeat units at the same microsatellite. For this reason, we can only say that Todd "may" have been the assailant, which is not enough to convict. So how do we find out for sure?
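To see where "several thousand" could come from, here is a rough back-of-the-envelope sketch. The allele frequencies below are invented purely for illustration, and the formula 2pq for someone carrying two different alleles is the standard Hardy-Weinberg assumption from population genetics, which goes beyond what this article covers.

```python
def heterozygote_frequency(p: float, q: float) -> float:
    """Expected share of people carrying both alleles (Hardy-Weinberg: 2pq)."""
    return 2 * p * q


# Hypothetical allele frequencies, NOT real population data.
freq_8_repeats = 0.05
freq_7_repeats = 0.04

genotype_frequency = heterozygote_frequency(freq_8_repeats, freq_7_repeats)
city_population = 1_000_000
expected_matches = city_population * genotype_frequency

print(round(expected_matches))  # 4000
```

With these made-up numbers, about 4,000 residents would match the knife handle at this one microsatellite, which is why a single-locus match can't single anyone out.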
Modern STR Kits
We compare their DNA at multiple microsatellites. As common sense might dictate, the more microsatellites we have to compare, the less likely it is for 2 individuals to share the same alleles at every single one of those microsatellites. In fact, as of January 2017, the national DNA criminal database maintained by the FBI (known as CODIS) requires an offender's alleles from 20 different microsatellite locations (loci) to be uploaded. Depending on the prevalence of each allele in a given population, the power of discrimination achieved by STR profiling is anywhere from 10^14 to 10^23, whereas the total population on earth is only around 8 billion (approximately 10^10). In other words, the chance that 2 people share the same STR profile is extremely low.
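The reason the numbers get so extreme is multiplication: if the loci are inherited independently, the chance of matching at all of them is the product of the chances of matching at each one. The sketch below assumes, purely for illustration, that every one of 20 loci has a genotype carried by 1 person in 10; real frequencies vary by locus and by population.

```python
import math

# Hypothetical per-locus genotype frequencies, NOT real population data.
per_locus_frequencies = [0.1] * 20

# Product rule: multiply the per-locus frequencies together.
random_match_probability = math.prod(per_locus_frequencies)
print(f"{random_match_probability:.1e}")  # 1.0e-20

# Expected number of coincidental matches among everyone on earth.
world_population = 8_000_000_000
expected_matches_worldwide = world_population * random_match_probability
print(expected_matches_worldwide < 1)  # True
```

Even with fairly common genotypes at each individual locus, 20 loci together give a profile that is expected to appear far less than once in the world's population.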
Nowadays, STR kits are developed, manufactured, and sold commercially by large biotech companies such as Thermo Fisher and Promega. The most commonly used kits are the PowerPlex Fusion kit from Promega and the GlobalFiler kit from Thermo Fisher, both of which can detect 24 loci in a single reaction. These standardized kits make it faster and simpler for forensic DNA analysts to obtain STR profiles, which is a huge help considering that DNA labs test hundreds of evidence samples daily.
The above image shows part of what a real-life STR profile looks like. On this diagram (called an electropherogram), the microsatellites have been separated by their size (i.e. the total number of A, T, C, and G that make up the DNA sequence). The coded strings of letters and numbers above are the names of the microsatellite locations being observed. The thin peaks below these names are the alleles of the 2 copies of that microsatellite, and the number below each peak is the number of repeating units at that copy. For example, at the D5S818 locus, this individual has one microsatellite copy with 12 repeating units and another copy with 14 repeating units. At the D16S539 locus, they have one copy with 10 repeating units and another copy with 12 repeating units.
Is Todd the Masked Assailant?
So, now that we have a good understanding of how STR profiling works, let's go back and decide whether Todd was the masked assailant.
From the above electropherogram, we can see at locus A the microsatellite we looked at earlier, where Todd and the masked assailant both have 7-unit and 8-unit repeats. Looking further, we can see that they continue to share the same alleles at loci B, D, and E. However, upon closer inspection, we can see that the masked assailant has 10 and 14-unit repeats at locus C while Todd has 10 and 15-unit repeats. Furthermore, at locus F, the masked assailant has 7 and 14-unit repeats while Todd has 10 and 14-unit repeats.
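The locus-by-locus comparison can be sketched as a small Python function that lists every locus where two profiles disagree. The function name is my own, and only loci A, C, and F are included because those are the ones whose repeat counts are spelled out in the text; loci B, D, and E match but their values aren't stated, so they are left out rather than guessed.

```python
def mismatched_loci(profile_a: dict, profile_b: dict) -> list:
    """Return the loci (present in both profiles) where the allele pairs differ."""
    shared = profile_a.keys() & profile_b.keys()
    return sorted(
        locus
        for locus in shared
        if sorted(profile_a[locus]) != sorted(profile_b[locus])
    )


assailant = {"A": (7, 8), "C": (10, 14), "F": (7, 14)}
todd = {"A": (7, 8), "C": (10, 15), "F": (10, 14)}

differences = mismatched_loci(assailant, todd)
print(differences)             # ['C', 'F']
print(len(differences) == 0)   # False: Todd is excluded
```

An empty list would mean the two profiles are consistent at every compared locus; any entry at all is enough to exclude the suspect.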
So close, but alas, it was not Todd's DNA that was on the knife handle. It looks like we'll have to go back to the drawing board and either find new suspects or enter the masked assailant's profile into CODIS to see if we can get a hit. A little disappointing but such is the daily life of a DNA analyst.
In this article, we learned that forensic DNA analysts tell people apart by comparing the number of repeating units in each copy of the microsatellites found at multiple loci in the DNA. If even 1 allele at 1 locus is different, then we can automatically conclude that we're looking at two separate individuals. However, even if all the alleles match, we can't say for certain that it's the same person, which is why we have to compare multiple loci. The more loci we have to compare, the lower the likelihood that two people will be found to have the same alleles at every single locus. I hope you've enjoyed this peek into the world of forensic DNA and gained a bit of insight into how DNA profiling works!