Counting Chromosomes
A blog of random musings on genealogy, genetics, science, and history

There is only one thing genealogically special about the X chromosome in relation to males: the inheritance pattern. The important distinction is that, in males, the X is "naturally phased."

Human X chromosome

Q: I've been told that if two men share a match on the X chromosome, special conditions apply: because they're both male, and because each can only have gotten his X chromosome from his mother, the match means more than matches on other chromosomes do. I've seen claims that a very small amount of shared DNA on the X chromosome, between two males only, can be used as confirmation of the relationship, with no other DNA information needed. That just doesn't sound right to me, and I wanted to get your thoughts.

A: Thanks for writing, and I've seen the same contention about the X chromosome in different places. I'm uncertain how that myth first started, but your impression—and discomfort with the claims—is well-founded.

Prove and Confirm

First, though, let's talk a moment about that word "confirmation." We do see it used in reference to genetic genealogy and, in my opinion, much too liberally so. The verb "confirm" has different meanings based on the circumstances or environment to which it's being applied, but let's look at what Google presents us at the top of its search results for the word:

General Definition of Confirm

Establish the truth or correctness of something previously believed or suspected; state with assurance that a report or fact is true.

In her indispensable book, Evidence Explained: Citing History Sources from Artifacts to Cyberspace, my genealogy totem spirit and guide, Elizabeth Shown Mills, defines "confirm" this way:

confirm: to test the accuracy of an assertion or conclusion by (a) consulting at least one other source that is both independently created and authoritative; and (b) finding agreement or compatibility between them.
     —Elizabeth Shown Mills

Note the conjunction and in Elizabeth's definition: for genealogy, both conditions a and b must be met, not either/or. Note also the phrase "independently created and authoritative."

Evidence Analysis Process

A digression, tangential though pertinent: while I'm mentioning Evidence Explained, I'd be remiss if I didn't call attention to the inside cover of the book, where Elizabeth gives us a simple and intuitive diagram of her Evidence Analysis Process Map. The image below is from her website, "QuickLesson 17: The Evidence Analysis Process Map." It isn't lengthy; I suggest it be in everyone's bookmarks list.

Evidence Analysis Process Map

At the risk of paraphrasing Elizabeth incorrectly, the Evidence Analysis Process lays out for us that a "source" does not equal "information," and "information" does not equal "evidence." It means that there is a logical progression of refinement—and critical evaluation—that must occur for us to arrive at what we can consider to be evidence in our search for genealogical conclusivity.

Last year I wrote a short piece that was met with more controversy than I'd expected. In part it was a call for consideration of a formal certification in genetic genealogy (this was the part that caused the controversy), and in part it was a poorly explained case for distinctions that can be, perhaps should be, made between genealogical evidence and DNA evidence. In the end, for genealogists, all evidence falls under the guidance of the Genealogical Proof Standard. The differences lie in the intervening critical analyses: between "information" and "evidence," and between "evidence" and a written conclusion statement.

The term "scientific method" has no real fixed meaning, not in the form of a detailed procedure at any rate, and it probably merits future posts here delving into its various formulations and applications, in particular the work of Karl Popper and the concept of falsifiability. But much like Elizabeth's Evidence Analysis Process Map, there is a generally agreed-upon progression of criteria for scientific evidence. It is most often displayed as a pyramid; the image below, courtesy of TheLogicOfScience.com, illustrates this nicely:

Scientific Evidence Pyramid from TheLogicOfScience.com

The Lexicography of Science and Genealogy

One difficulty we have when classic genealogy and the scientific method meet is that our lexicons differ. In science, "theory" is likely to carry a meaning very different from the one we genealogists intend when we say, for example, "Based on these 5th cousins both being male and sharing 5 cM on the X chromosome, I have a theory about their common ancestor." In science, what we casually call theories are not theories at all; they are hypotheses, at best. But where the genealogy/science lexicon really breaks down is that, in genealogy, the very name "Genealogical Proof Standard" implies that proof is possible. In science, final proof is never actually possible. This 2017 article by Ethan Siegel for Forbes, "Scientific Proof is a Myth," makes interesting reading.

To scientists, proof isn't the Holy Grail. It isn't even in the equation, so to speak (the term "proof" takes on a different meaning in pure mathematics). In fact, if you read the actual BCG definition of the Genealogical Proof Standard's five steps you will, interestingly enough, not find the word "proof."

To revisit the most common definition of "confirm":

Establish the truth or correctness of something previously believed or suspected; state with assurance that a report or fact is true.

To scientists, the word "confirm" doesn't grate quite as much as "proof," but it still grates. We simply cannot "state with assurance that a fact is true."

DNA is first and foremost in the realm of science. For genealogy, in the case of monozygotic twins or a biological parent, "confirmation" and DNA can get away with being in the same sentence. For all other relationships, DNA can provide evidence anywhere along a spectrum from very powerful to weak and inconsequential, and it is the inconsequential end of that spectrum that I all too often see used in genealogy.

But DNA data are evidence, and only evidence. DNA should not be viewed as genealogical proof or confirmation.

And certainly not at face value with nothing but a small X chromosome match.

The X Chromosome

Rounding that large digressional loop, keeping the framework in mind, and returning to the original question, what we note first is that none of the DNA testing companies performs matching based only on the X chromosome. In fact, 23andMe is the sole company even to include the xDNA amount when reporting the total DNA shared with a match, and it does not consider an X match unless the autosomal matching first meets its minimum criterion. All the other companies report matching information only about the 22 autosomes. This should be a first clue that there is something particularly variable about the X, and that, rather than being held to a looser standard, those matches should in fact be scrutinized more skeptically.
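To make that gating concrete, here is a minimal Python sketch of the rule as described above for 23andMe: the X can add to a reported total, but it can never create a match on its own. This is not any company's actual code; the `MatchSegments` class, the `reported_total_cm` function, and the 7 cM threshold in the usage lines are hypothetical placeholders chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MatchSegments:
    """Hypothetical summary of what two test-takers share."""
    autosomal_cm: float  # total shared cM on chromosomes 1-22
    x_cm: float          # total shared cM on the X chromosome


def reported_total_cm(match: MatchSegments, autosomal_minimum_cm: float) -> Optional[float]:
    """Return the total cM to report, or None if the pair is not a match.

    The X never qualifies a match by itself: it is added to the total only
    after the autosomal DNA alone has cleared the minimum.
    """
    if match.autosomal_cm < autosomal_minimum_cm:
        return None  # X-only or too-small autosomal sharing: no match at all
    return match.autosomal_cm + match.x_cm


# Hypothetical 7 cM threshold, for illustration only.
print(reported_total_cm(MatchSegments(autosomal_cm=0.0, x_cm=15.0), 7.0))   # None
print(reported_total_cm(MatchSegments(autosomal_cm=20.0, x_cm=5.0), 7.0))   # 25.0
```

Note that in this sketch a 15 cM X-only match reports nothing at all, which is the point: the X rides along with autosomal evidence rather than standing in for it.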

The X chromosome, while undergoing recombination at frequencies not radically different from those of an autosome of the same size, still seems, in practice, to be more volatile when used to estimate relationships. Genetic genealogist Jim Owston has written about it, for example, as have Jared Smith and Roberta Estes. Not only can xDNA segments be "sticky" and date back generations before a hypothesized MRCA, but in relationships as close as grandparent/grandchild it's possible to see the X inherited 100%, whole...or no segments inherited at all. By itself, the X is an unreliable predictor of relationship degree, and a small shared xDNA segment, even one that is triangulated, may be meaningless as evidence of the hypothesized MRCA unless combined with additional autosomal evidence.
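The grandparent/grandchild extreme is easy to see in a toy model. A son's X comes only through his mother, so only his maternal grandparents can contribute to it, and if no crossover happens in her meiosis he inherits one grandparent's X whole and none of the other's. The sketch below is a simplification, not a model of real recombination: it assumes the 196 cM X length cited later in this post, takes hypothetical crossover positions as input, and ignores crossover interference and the PAR.

```python
X_LENGTH_CM = 196.0  # total X length, the figure cited later in this post


def maternal_grandparent_shares(crossovers_cm, starts_with_grandmother=True):
    """Fraction of a son's X inherited from each maternal grandparent.

    Toy model: the mother's two X chromosomes are her mother's and her
    father's; the copy she passes on switches between them at each
    crossover position (in cM). Interference and the PAR are ignored.
    """
    if any(not 0.0 <= c <= X_LENGTH_CM for c in crossovers_cm):
        raise ValueError("crossover positions must lie within the X")
    boundaries = [0.0] + sorted(crossovers_cm) + [X_LENGTH_CM]
    from_grandmother = 0.0
    on_grandmother = starts_with_grandmother
    for start, end in zip(boundaries, boundaries[1:]):
        if on_grandmother:
            from_grandmother += end - start
        on_grandmother = not on_grandmother  # switch source at each crossover
    share = from_grandmother / X_LENGTH_CM
    return {"grandmother": share, "grandfather": 1.0 - share}


print(maternal_grandparent_shares([]))        # {'grandmother': 1.0, 'grandfather': 0.0}
print(maternal_grandparent_shares([], starts_with_grandmother=False))
# {'grandmother': 0.0, 'grandfather': 1.0}
print(maternal_grandparent_shares([98.0]))    # {'grandmother': 0.5, 'grandfather': 0.5}
```

With zero crossovers the grandson shares his entire X with one maternal grandparent and nothing with the other, and he shares no X at all with either paternal grandparent.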

Naturally Phased

There is only one thing special about the X chromosome in relation to males: the inheritance pattern. The important distinction for genealogy is that, in males, the X is "naturally phased," a term that Debbie Kennett reminded me of recently. ISOGG succinctly defines phasing as "the task or process of assigning alleles...to the paternal and maternal chromosomes." Every male's X chromosome comes from only one place: his mother. Ergo, it is naturally phased.
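Here is what "naturally phased" buys us in practice. For a female, a heterozygous X call such as AG can't be assigned to her mother or her father without phasing against a parent's data. For a male, outside the PAR, every X call should be a single allele, and that allele is his mother's. This minimal sketch assumes a hypothetical input of (SNP ID, genotype) pairs with made-up IDs; real raw-data formats vary by company and aren't modeled here.

```python
def male_x_haplotype(calls):
    """Turn a male's non-PAR X genotype calls into a maternal haplotype.

    calls: list of (snp_id, genotype) pairs, e.g. ("rs_a", "AA") or ("rs_b", "G").
    Returns (snp_id, allele) pairs; allele is None for no-calls and for
    heterozygous calls, which a male should not have outside the PAR and
    which most often point to a genotyping error, not phase information.
    """
    haplotype = []
    for snp_id, genotype in calls:
        if not genotype or "-" in genotype:
            haplotype.append((snp_id, None))          # no-call
        elif len(set(genotype)) == 1:
            haplotype.append((snp_id, genotype[0]))   # one allele, his mother's
        else:
            haplotype.append((snp_id, None))          # unexpected heterozygous call
    return haplotype


print(male_x_haplotype([("rs_a", "AA"), ("rs_b", "G"), ("rs_c", "AG"), ("rs_d", "--")]))
# [('rs_a', 'A'), ('rs_b', 'G'), ('rs_c', None), ('rs_d', None)]
```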

However, a misconception persists that this condition also confers on any xDNA sharing between male cousins some additional super powers: that even tiny segments can be relationship-predictive, and that triangulation is unnecessary, whereas between two females, or between a male and a female test-taker, larger segments are required as evidence. There is no scientific justification or existing experimental evidence for this male-matching favoritism.

Recombination and Inheritance

At oogenesis, the X chromosome undergoes crossover (recombination) just like any autosome. Blaine Bettinger began the "X-DNA Inheritance Project" two years ago. I don't believe it got anywhere near the traction that the Shared cM Project did, but Blaine's interest was in compiling information about xDNA recombination. To do so, he used GEDmatch to analyze grandparent/grandchild relationships only: a necessary step back from the parents, but not so far back that the crossover events couldn't be accurately analyzed. After 150 unique data submissions to the project, he charted X chromosome crossovers per meiosis against those of chromosome 7 (Chr 7 was chosen because it is roughly equal in size to the X: 159 million bp vs. 153 million):

Chart from the X-DNA Inheritance Project

What he found was that, regardless of whether the test-taker was male or female, the inherited X underwent recombination at a frequency not disproportionate to that of an equivalently sized autosome. The X chromosome had a slightly higher chance, 3.3%, of seeing no crossover during a given meiosis, and a lower probability of seeing larger numbers of crossovers. The most probable outcome was two crossovers per meiosis. Admittedly the sample size is not large, but I'm aware of no peer-reviewed study that has yet examined this.

With the exception of the X chromosome's two pseudoautosomal regions, which aren't evaluated for genetic genealogy anyway, the X goes through crossover in the mother, not the father. If we step back to the level of the great-grandparents (2nd cousins), the male test-taker's X chromosome will have had four possible opportunities for crossover before he inherits it. At the same relationship level, a female test-taker's X will have had six. One generation further (2g-grandparents and 3rd cousins), the number of meioses in which recombination is possible is seven for the male and eleven for the female.

Magic Numbers

At any given generation, the number of ancestors who might contribute an X chromosome follows the Fibonacci sequence—each number is the sum of the two preceding ones, e.g.: 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144—with a male starting at 1 (his X can only come from his mother) and a female starting at 2 (she receives one X chromosome each from her mother and her father).
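The Fibonacci pattern falls straight out of the two inheritance rules just stated: every female in the tree has two X-contributing parents, and every male has only one. This Python sketch simply applies those two rules generation by generation; the function name is my own.

```python
def x_ancestors_per_generation(test_taker_sex, generations):
    """Count the ancestors in each generation who could have contributed X DNA.

    test_taker_sex: "M" or "F". Generation 1 is the parents.
    """
    # (females, males) among the X-contributing ancestors in generation 1
    if test_taker_sex == "M":
        females, males = 1, 0  # a male's X comes from his mother only
    else:
        females, males = 1, 1  # a female gets one X from each parent
    counts = []
    for _ in range(generations):
        counts.append(females + males)
        # Each female ancestor had two X-contributing parents (one of each
        # sex); each male ancestor had one, his mother.
        females, males = females + males, females
    return counts


print(x_ancestors_per_generation("M", 11))  # [1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144]
print(x_ancestors_per_generation("F", 10))  # [2, 3, 5, 8, 13, 21, 34, 55, 89, 144]
```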

Since recombination of the X occurs only in the mother, the running total of possible recombination events differs between males and females. For males it's 1, 2, 4, 7, 12, 20, 33, etc. For females, 1, 3, 6, 11, 19, 32, 53, etc. It isn't immediately as elegant as the Fibonacci sequence, but interestingly enough, the ratio between successive terms approaches a number strongly linked to the Fibonacci sequence: the Golden Ratio. It is an irrational number whose decimal expansion continues infinitely, beginning 1.6180339....
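The same bookkeeping, keeping a running total of mothers only, reproduces both recombination sequences. Working the sums through, the male sequence at generation n equals F(n+2) - 1 and the female sequence F(n+3) - 2, where F is the Fibonacci sequence with F(1) = F(2) = 1; those constant offsets shrink to nothing relative to the terms, which is why the ratio of successive terms still converges on the Golden Ratio. A minimal sketch:

```python
def recombining_meioses(test_taker_sex, generations):
    """Running total of X-recombining meioses back to each generation.

    Only mothers recombine the X they pass on, so each generation adds the
    number of females among the X-contributing ancestors in it.
    """
    # (females, males) among X-contributing ancestors in generation 1
    females, males = (1, 0) if test_taker_sex == "M" else (1, 1)
    totals, running = [], 0
    for _ in range(generations):
        running += females
        totals.append(running)
        # next generation: each female has a mother and a father contributing X,
        # each male has only a mother
        females, males = females + males, females
    return totals


male = recombining_meioses("M", 20)
female = recombining_meioses("F", 20)
print(male[:7])    # [1, 2, 4, 7, 12, 20, 33]
print(female[:7])  # [1, 3, 6, 11, 19, 32, 53]
# Ratios of the last two terms; both approach the Golden Ratio, 1.6180339...
print(male[-1] / male[-2], female[-1] / female[-2])
```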

A Fun with Math moment aside, the point is that, while every male receives his whole X chromosome from his mother, the chromosome she passes on has been through multiple opportunities for recombination over the course of generations, including in her own meiosis. In addition, Blaine Bettinger's data indicated that the X chromosome might average 1.71 crossovers per meiosis.

Between two male 4th cousins, then, there might be as few as six or as many as 10 opportunities for X chromosome recombination. At an average of 1.71 crossovers per meiosis, 10 meioses would give 17.1 crossovers. Blaine's data indicates that three crossovers occur 17.3% of the time, so if each of those 10 meioses produced three, there would be 30 crossovers between our two male 4th cousins. We have no peer-reviewed studies to help us frame or constrain the information, but it seems clear that the potential for a significant number of crossovers exists between two male cousins, even those somewhat closely related.
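The "six to 10" range can be checked by brute force. Walking up from a male test-taker, the ancestor above any male on an X path must be his mother, so the path can never contain two males in a row. This sketch enumerates every such path to an ancestor n + 1 generations back (the generation that nth cousins share), counts the mothers on it, doubles the fewest and the most for the two cousins, and applies the 1.71 average. It models only which meioses could recombine, not where crossovers fall; the function names are my own.

```python
from itertools import product


def x_paths(generations):
    """Every sex sequence (parent, grandparent, ...) along which a male
    test-taker could inherit X DNA from an ancestor `generations` back."""
    paths = []
    for path in product("FM", repeat=generations):
        if path[0] != "F":
            continue  # a male's X comes only from his mother
        if any(child == "M" and parent == "M" for child, parent in zip(path, path[1:])):
            continue  # no father-to-son X transmission anywhere on the path
        paths.append(path)
    return paths


def recombining_range_for_male_cousins(cousin_degree):
    """(fewest, most) meioses in which the X could recombine, summed over
    the two male cousins' paths back to their shared ancestor."""
    generations = cousin_degree + 1  # nth cousins share ancestors n + 1 generations back
    mothers_per_path = [path.count("F") for path in x_paths(generations)]
    # Both cousins can follow the same pattern of sexes up to the shared
    # ancestor, so the extremes for the pair are the single-path extremes doubled.
    return 2 * min(mothers_per_path), 2 * max(mothers_per_path)


fewest, most = recombining_range_for_male_cousins(4)
print(fewest, most)            # 6 10
print(round(most * 1.71, 1))   # 17.1 expected crossovers at the upper bound
print(most * 3)                # 30 if every one of those meioses had three crossovers
```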

The X: No Special Super Powers

Other than the naturally phased condition of the X chromosome in males, absolutely no special conclusions can be drawn regarding evidence of relationship, size of relevant chromosomal segment, or removal of the requirement for triangulation when two males are compared vs. a male and a female or two females. I personally would never use xDNA in a vacuum as evidence of any genealogical relationship. It's invaluable in conjunction with autosomal data to isolate an inheritance pattern, but a nominally sized X chromosome match doesn't mean a great deal in and of itself.

Speaking of the pseudoautosomal regions, PAR1 and PAR2: these are small, homologous areas at the two ends of both the X and the Y chromosomes. PAR1 sits at the tips of the short arms and is, in GRCh38, about 2.78 million base pairs long. PAR2 is much smaller, only about 330,000 base pairs, at the tips of the long arms, and is fractionally longer on the X than on the Y. Their primary purpose is to let the X and the Y pair up with each other during male meiosis so that they can then be properly separated. Crossover does occur between the X and the Y chromosomes, but it is restricted to the PAR.
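For anyone filtering raw data, a position check is all it takes to set the PAR aside. The coordinates below are the GRCh38 chrX boundaries as I understand the Genome Reference Consortium's published values; many consumer raw-data files use the older GRCh37 build, where the coordinates differ, so check the build before reusing them. The (SNP ID, position, genotype) tuple format and the SNP IDs are hypothetical stand-ins.

```python
# GRCh38 chrX pseudoautosomal region boundaries (1-based, inclusive).
PAR_GRCH38_X = {
    "PAR1": (10_001, 2_781_479),
    "PAR2": (155_701_383, 156_030_895),
}


def par_region(position):
    """Return "PAR1" or "PAR2" if a GRCh38 chrX position falls in a PAR, else None."""
    for name, (start, end) in PAR_GRCH38_X.items():
        if start <= position <= end:
            return name
    return None


def drop_par_snps(x_snps):
    """Keep only X SNPs outside the PAR.

    x_snps: hypothetical list of (snp_id, grch38_position, genotype) tuples.
    """
    return [snp for snp in x_snps if par_region(snp[1]) is None]


snps = [("rs_a", 1_500_000, "AG"), ("rs_b", 73_000_000, "C"), ("rs_c", 155_800_000, "TT")]
print(drop_par_snps(snps))  # [('rs_b', 73000000, 'C')]
```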

The pseudoautosomal regions of neither sex chromosome are meaningful to us for genealogical purposes, and for that reason you will seldom see SNP data reported from the PAR, even though some microarray genotyping tests include them. Even utilities like the Human Genome Map Interpolator at Rutgers University, which estimates centiMorgan equivalents from the distance between base-pair loci, ignore the PAR: you will see a total possible value for the X chromosome reported as 196 cM regardless of whether or not the 2.8 Mbp of PAR1 is included.
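The idea behind converting base-pair positions to centiMorgans can be sketched as linear interpolation between the points of a genetic map. The map points below are made up for illustration (the end value just borrows the 196 cM total mentioned above); they are not real X chromosome map data, and this is not the Rutgers interpolator's code. Clamping at the ends is my own design choice, and in this toy map it means positions inside PAR1 contribute nothing, echoing the behavior described above.

```python
from bisect import bisect_right

# Made-up (bp position, cM position) points for illustration only;
# real genetic maps have thousands of points.
TOY_X_MAP = [
    (2_781_480, 0.0),
    (50_000_000, 70.0),
    (100_000_000, 130.0),
    (155_701_382, 196.0),
]


def interpolate_cm(position, genetic_map):
    """Estimate the cM position of a base-pair position by linear
    interpolation between the two flanking map points. Positions outside
    the map are clamped to its first or last cM value."""
    bps = [bp for bp, _ in genetic_map]
    i = bisect_right(bps, position)
    if i == 0:
        return genetic_map[0][1]
    if i == len(genetic_map):
        return genetic_map[-1][1]
    (bp0, cm0), (bp1, cm1) = genetic_map[i - 1], genetic_map[i]
    return cm0 + (cm1 - cm0) * (position - bp0) / (bp1 - bp0)


def segment_length_cm(start_bp, end_bp, genetic_map):
    """Genetic length in cM of the segment between two base-pair positions."""
    return interpolate_cm(end_bp, genetic_map) - interpolate_cm(start_bp, genetic_map)


print(interpolate_cm(75_000_000, TOY_X_MAP))                   # 100.0
print(segment_length_cm(10_001, 2_781_479, TOY_X_MAP))         # 0.0: PAR1 lies before the first map point
print(segment_length_cm(50_000_000, 100_000_000, TOY_X_MAP))   # 60.0
```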

In summary, if you see genealogically distant cousins whose relationship is purported to be confirmed by DNA with only an X-chromosome match as evidence, I would urge a large measure of skepticism. If the xDNA segment is of reasonable size, it is probable that a common ancestor exists. Without additional evidence from matching segments on the autosomes, however, a small match on the X is no more predictive of a specific relationship than a small match on one of the autosomes would be.
