Calculating the Time to Most Recent Common Ancestor (TMRCA)

 

DNA is composed of a string of nucleotides: molecules of cytosine, adenine, guanine, and thymine.  Adenine bonds to thymine (A-T) and cytosine bonds to guanine (C-G).  The DNA strand consists of a long line of these nucleotide pairs forming a “double helix”.  The strand is “read” by listing the sequence along one side of the pairs, such as ATAGCGAT.  The test done for family Y-DNA studies looks at only a small section of these long strands located on the Y chromosome.  We have 23 pairs of chromosomes, and the Y chromosome is the one unique to males and passed from father to son.

 

Scientists have been able to identify and label sites along a DNA strand.  Each site is designated a “marker”, which marks the starting point of a sequence of nucleotides.  They have found patterns of nucleotides that repeat, called “short tandem repeats” or STRs, that can be read starting at that marker.  The repeating pattern is usually a short section of 3 or 4 nucleotides, but the pattern can repeat 3 to 100 times.  For example, the marker called DYS393 is the first listed in our Y-DNA results table.  In our family it shows the sequence AGAT repeated 13 times.  We all have this value of 13 at DYS393.  DYS393 is just a label to denote where the marker is found along the DNA strand (DNA Y-Chromosome Segment # 393).
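As a rough illustration of what a marker value means, the sketch below counts the longest run of back-to-back copies of a motif in a string of nucleotides.  The function name and the flanking sequence are made up for the example; real STR calling from laboratory data is considerably more involved.

```python
def count_tandem_repeats(sequence: str, motif: str) -> int:
    """Return the longest run of back-to-back copies of `motif` in `sequence`."""
    best = 0
    for start in range(len(sequence) - len(motif) + 1):
        run = 0
        pos = start
        # Keep stepping one motif length at a time while the motif repeats.
        while sequence.startswith(motif, pos):
            run += 1
            pos += len(motif)
        best = max(best, run)
    return best


# Hypothetical stretch of DNA: made-up flanking bases around 13 copies of AGAT.
example = "CTTG" + "AGAT" * 13 + "CCGA"
print(count_tandem_repeats(example, "AGAT"))
```

A value of 13 reported at DYS393 corresponds to the repeat count this kind of tally would produce for the AGAT motif at that site.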

 

The sequence of nucleotides along a DNA strand changes over time.  There are various theories why this happens; perhaps a cosmic ray hits a cell, or the person is infected with a virus that damages the cell.  There is also a random chance of a change with each new generation; recall that the Y-DNA “unzips” to make ½ a cell when sperm are formed, and “zips” back with a new X half when a new baby boy is formed (X-Y).  It was found that over time, the rate at which the DNA strand changes is fairly predictable, and that rate of change can be used like a clock to estimate the time to a previous ancestor.

 

Knowledge in this area has become refined over the last 10 years.  Initially, estimates put the rate of change at about 1 in 500 chances per marker, a chance being each time a new son is formed.  For example, if you looked at 12 markers, then you would expect one of them to change to a different count in about 40 generations (12 x 40 = 480 chances).  While that may help with population genetics looking over tens of thousands of years, that time period is too long for family genealogy studies.

 

One way to increase precision is to test additional markers.  With 111 markers tested, at a rate of 1 in 500 one difference would be expected in about 5 generations (111 x 5 = 555 chances).  But it was also discovered that some markers change faster than others.  And it has been noted that in some family lines, some markers tend to change more quickly than in other families.  So, it is also important to pick which markers to test.  FamilyTreeDNA has selected markers for their 12, 25, 37, 67, and 111 marker kits that provide the best opportunity to see differences and similarities given the mutational rate of those markers.
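The arithmetic behind the 12-marker and 111-marker figures can be sketched as follows.  This assumes the same flat rate of 1 change in 500 chances for every marker and a single line of descent, which is the early simplification described above; the function names are only illustrative.

```python
def generations_per_expected_change(markers: int, chances_per_change: float = 500.0) -> float:
    """Generations needed before one change is expected among `markers` markers."""
    return chances_per_change / markers


def prob_at_least_one_change(markers: int, generations: int, rate: float = 1 / 500) -> float:
    """Chance that at least one marker has changed after `generations` father-to-son steps."""
    chances = markers * generations
    return 1 - (1 - rate) ** chances


for n in (12, 25, 37, 67, 111):
    print(n, "markers:", round(generations_per_expected_change(n), 1), "generations")
```

Note that "expected in about 40 generations" is not a certainty: the second function shows that a change by then is more likely than not, but far from guaranteed.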

 

Knowing the individual marker rate of change allowed for the development of the TiP program (published in 2004), which can show probabilities of time to the most recent common ancestor when comparing two samples.  In our family, paper documentation shows that 1767 Francis Wright and 1730 John Wright both were in Lancaster County, Virginia in the very early 1700s.  Y-DNA testing shows that descendants of these lines share a very similar pattern, so they are related.  I wanted to estimate the probable time period when they shared a common ancestor.  I used the TiP program to compare kit 18984 (representing the modal or most usual values at each marker of those who descend from 1767 Francis Wright) to kit 22990 (representing 1730 John Wright).  The program does not give the answer in terms of a specific person, or even a single generation, but rather gives a range of probabilities that may support and guide research of a family line.

 

There are several assumptions that go into the prediction of the time to most recent common ancestor: 1) the mutational model used, 2) the number of markers tested, 3) which markers are tested, 4) how certain you want the prediction to be, and 5) how many years there are in a generation.  FamilyTreeDNA gives TiP calculation results in generations.  They automatically use an “infinite allele” model, which is an assumption about how mutations occur.  They have included the mutation rate for each marker, and they use a 50% probability for degree of certainty of the prediction.

 

To use the TiP program, first run “Matches” in the Y-DNA tab.  Then click on the TiP button next to a person who matches.  The results are a table that looks like this:

 

 

The table shows a cumulative percentage probability (in the right column) that the common ancestor was within the generation shown (in the left column).  You can adjust the results to see the percentage cumulative probability for every generation, and you can adjust it to show the results when one knows that there was NOT a common ancestor within a defined number of generations.
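To make the idea of a cumulative probability table more concrete, here is a minimal sketch of how such a table could be built under an infinite allele assumption.  It is not FamilyTreeDNA's actual TiP calculation: it assumes one shared mutation rate for all markers (the 0.004 used below is a placeholder), a flat prior over generations, and a cut-off at 100 generations.  The function names are hypothetical.

```python
import math


def tmrca_probabilities(markers, mismatches, rate, max_generations=100, no_ca_within=0):
    """Per-generation probability that the common ancestor lived t generations ago.

    Assumes an infinite allele model with one rate for every marker: a marker
    still matches only if neither line mutated, i.e. with probability
    exp(-2 * rate * t).  Generations up to `no_ca_within` are ruled out.
    """
    weights = []
    for t in range(1, max_generations + 1):
        if t <= no_ca_within:
            weights.append(0.0)  # documentary evidence excludes this generation
            continue
        p_match = math.exp(-2 * rate * t)
        weights.append(p_match ** (markers - mismatches) * (1 - p_match) ** mismatches)
    total = sum(weights)
    return [w / total for w in weights]


def cumulative(probabilities):
    """Running total, so entry g-1 is the chance the ancestor is within g generations."""
    out, running = [], 0.0
    for p in probabilities:
        running += p
        out.append(running)
    return out


# A 36/37 match, with and without ruling out the last 8 generations.
plain = cumulative(tmrca_probabilities(37, 1, rate=0.004))
adjusted = cumulative(tmrca_probabilities(37, 1, rate=0.004, no_ca_within=8))
for g in (9, 10, 11):
    print(g, round(100 * plain[g - 1], 2), round(100 * adjusted[g - 1], 2))
```

The numbers this prints will not match the TiP output, since TiP uses marker-specific rates, but the shape is the same: ruling out recent generations lowers the cumulative figure at any given generation while raising the share that falls in the generations just beyond the cut-off.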

 

The graph below was generated by using the results table from the TiP program set to show every generation.  The graph shows the impact of using a different number of markers – in our case, comparing our two kits, there is a one-step difference in DYS389ii, which is marker number 12 on the results table.  The graph shows “cumulative probability” of the time to the common ancestor.  Since there was only one difference in the 12, 25, and 37 marker tests comparing these two samples, this gives a match of 11/12, 24/25, and 36/37 markers.  You can see that with 37 markers, it reaches about 100% cumulative probability that the common ancestor was within the last 17 generations.  Using 25 markers, this drops to about 92% at 17 generations, and with 12 markers it drops to about 50%.

 

I wanted to look at the probability that our common ancestor was between 9 and 11 generations ago.  Because the table is cumulative, you find this by subtracting the cumulative probability at 9 generations from the value at 11 generations, which is the same as adding the two single-generation intervals (see the table below).  This gives a probability of 3.99% that the common ancestor was between 9 and 11 generations ago; not so good.

 


 

| Cumulative Probability | 9 Generations Ago | 10 Generations Ago | 11 Generations Ago | Interval 9 to 10 | Interval 10 to 11 | Sum 9 to 11 |
| --- | --- | --- | --- | --- | --- | --- |
| CA not considered | 92.34% | 94.68% | 96.33% | 2.34% | 1.65% | 3.99% |
| CA > 8 Generations Ago | 50.63% | 65.71% | 76.33% | 15.08% | 10.62% | 25.70% |
| CA > 9 Generations Ago | 30.04% | 51.41% | 66.46% | 21.37% | 15.05% | 36.42% |
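The interval columns in the table are simply differences between cumulative values, which can be checked with a few lines.  The percentages are copied from the table; the dictionary layout and function name are only for illustration.

```python
# Cumulative probabilities (percent) at 9, 10 and 11 generations, from the table.
cumulative_pct = {
    "CA not considered": {9: 92.34, 10: 94.68, 11: 96.33},
    "CA > 8 generations ago": {9: 50.63, 10: 65.71, 11: 76.33},
    "CA > 9 generations ago": {9: 30.04, 10: 51.41, 11: 66.46},
}


def interval_pct(row, start, end):
    """Probability (percent) gained between two generations of a cumulative row."""
    return round(row[end] - row[start], 2)


for label, row in cumulative_pct.items():
    print(label, interval_pct(row, 9, 10), interval_pct(row, 10, 11), interval_pct(row, 9, 11))
```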

 

However, we know through documentary evidence that there was no common ancestor to these lines in the past 8 or 9 generations.  In the case of the 1767 Francis line, there are 8 generations from the participant to the father of 1767 Francis, and 9 generations from the participant to the father of 1772 John Wright (Goochland Carpenter).  This increases the probability to 46.9% to 66.5% that the common ancestor was between 9 and 11 generations ago.  This is a relatively high probability.

 

Cumulative probability is somewhat difficult to conceptualize – the graph below is a plot of the probability differences by generation showing the impact of adjusting for knowing the common ancestor was not within the last 8 or 9 generations.  The unadjusted curve has a peak probability 2 generations ago (due to the closeness of the match 36/37) with a long tail downward to the right.  When adjusted for no common ancestor in 8 or 9 generations, it forces the curve to the right.  This increases the probability that the common ancestor will be within those next few generations.

 

To calculate the actual timeframe of the common ancestor, you have to multiply the number of generations by the average years in a generation.  Many population genetics studies use 25 years in a generation.  Maybe that was true thousands of years ago, but it tends to be longer more recently.  There is no clearly defined method to calculate the years in a generation, but when I look at the average birth year of the sons compared to the birth year of their father in all the documented lines in our family, I find an average intergenerational time of about 31 years.  So, counting back from 2012, 10 generations ago would be 310 years ago, or about 1702.  Nine generations would be 279 years and 11 generations would be 341 years, giving a range of about 1671 to 1733.  That would be the estimated birth year of the common ancestor in our lines given the above assumptions and probabilities.  That fits well with the estimated birth range for sons of 1690 Francis Wright.
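The conversion from generations to calendar years is easy to sketch.  The 31-year generation and the 2012 reference year come from the text above; the father and son birth years in the second half are invented placeholders that only show how an average generation length could be worked out from documented lines.

```python
def estimated_birth_year(generations, years_per_generation=31, reference_year=2012):
    """Approximate birth year of an ancestor a given number of generations back."""
    return reference_year - generations * years_per_generation


def average_generation_length(father_son_birth_years):
    """Mean gap in years between each father's birth and his son's birth."""
    gaps = [son - father for father, son in father_son_birth_years]
    return sum(gaps) / len(gaps)


for g in (9, 10, 11):
    print(g, "generations:", g * 31, "years, about", estimated_birth_year(g))

# Hypothetical (father, son) birth years, not taken from the family records.
sample_pairs = [(1800, 1830), (1830, 1862), (1862, 1893)]
print(average_generation_length(sample_pairs))
```

Changing `years_per_generation` to 25 shows how much the choice of generation length moves the estimate: 10 generations becomes 250 years rather than 310.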

 

Another program that I have found very useful is the Y-Utility written by Dean McGee.  This program has the advantage that you can manipulate the variables such as the mutational model, the intergenerational time, and the mutation rates of the markers.  It is also useful in that any number of samples can be compared at once, whereas the TiP program compares only 2 samples at a time.  Here is an example, using our family line matches, using the Y-Utility.  It gives the years based on a 50% probability, a 30 year generation time, FamilyTreeDNA mutation rates, and the infinite allele mutational model.  Here you can see the TMRCA for any pair of samples and to a theoretical modal for all samples.  The range turns out to be 30 to 300 years with an average of 164 years.  If you average the averages for each sample comparison, that equals 252 years.  If you want to be 95% certain instead of 50%, then the average to modal common ancestor is 464 years ago.
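The first step in comparing many samples at once can be sketched in a few lines: work out the modal value at each marker, then count how many markers differ between each pair of samples and between each sample and the modal.  This is not the Y-Utility's code, and the kit names and marker values below are invented for the example; under an infinite allele model any difference at a marker counts as one mismatch, whatever its size.

```python
from collections import Counter
from itertools import combinations

# Hypothetical kits with repeat counts at the same five markers, in the same order.
kits = {
    "kit_A": [13, 24, 14, 11, 29],
    "kit_B": [13, 24, 14, 11, 30],
    "kit_C": [13, 24, 14, 11, 29],
    "kit_D": [13, 23, 14, 11, 29],
}


def modal_haplotype(haplotypes):
    """Most common value at each marker position across all samples."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*haplotypes)]


def mismatches(a, b):
    """Number of markers at which two haplotypes differ (infinite allele count)."""
    return sum(1 for x, y in zip(a, b) if x != y)


modal = modal_haplotype(list(kits.values()))
print("modal:", modal)
for name, values in kits.items():
    print(name, "vs modal:", mismatches(values, modal))
for (name_a, a), (name_b, b) in combinations(kits.items(), 2):
    print(name_a, "vs", name_b, ":", mismatches(a, b))
```

Each mismatch count would then be turned into a TMRCA estimate using the chosen mutation rates, probability level, and generation length.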

 

 

 

 

 

Jeffrey A. Wright

6 September 2012