About 9:00 p.m. EST on June 4, MyHeritage announced the discovery of a data breach affecting 92.3 million accounts: every user who had registered at MyHeritage up to and including the date of the breach, 26 October 2017. Approximately eight hours earlier, an independent security researcher had notified the company that he had discovered a file "on a private server outside of MyHeritage" containing the email addresses and so-called "hashed" passwords of these accounts.
The scale is staggering, but the company has stated that "other websites and services owned and operated by MyHeritage, such as Geni.com and Legacy Family Tree, have not been affected by the incident." Further information shows that MyHeritage has added about 4 million new accounts since the 26 October breach, bringing its current registered base to over 96 million. There is as yet no information about the source of the leaked data.
As of this writing, MyHeritage has issued the original announcement at blog.myheritage.com/2018/06/myheritage-statement-about-a-cybersecurity-incident/, and an update today, June 6, at blog.myheritage.com/2018/06/cybersecurity-incident-june-5-6-update/.
The company has reassured us that it has no reason to believe passwords were compromised because it "does not store user passwords, but rather a one-way hash of each password, in which the hash key differs for each customer." More on this in a moment. The company added:
We have no reason to believe that any other MyHeritage systems were compromised. As an example, credit card information is not stored on MyHeritage to begin with, but only on trusted third-party billing providers (e.g. BlueSnap, PayPal) utilized by MyHeritage. Other types of sensitive data such as family trees and DNA data are stored by MyHeritage on segregated systems, separate from those that store the email addresses, and they include added layers of security.
During the melee, the news hit that the infamous Golden State Killer had been identified using the free autosomal DNA comparison tools at GEDmatch. Within days, on the heels of that development, Parabon NanoLabs (a company best known for its Parabon Snapshot DNA Phenotyping Service, which creates facial composites based upon DNA profiles) announced on May 8 its new forensic genealogy service, to be headed by well-known genetic genealogist CeCe Moore. A predictable but unfortunate repercussion was that some GEDmatch users either hid or completely deleted the test kits they manage.
We'll never see any actual numbers, but I have a feeling that this year's Father's Day DNA sales won't be as successful as in years past.
To be clear, none of the DNA testing, comparing, or matching services make your autosomal results available online. The only way to get that raw data is to go through specific steps to download a digitally compressed file for your own use elsewhere. Until we eventually achieve affordable technology that allows direct-to-consumer full genome sequencing of our 3.2 billion DNA base pairs, what the current tests for genealogy offer is a look at only about 0.023% of your genome and, since these tests use specific micro-array chips that are designed for purposes of population genetics, the vast majority of the markers tested are not in locations that include protein-encoding genes. The tests need to measure variances, and our protein-encoding genes can't vary that much without endangering the survival of the organism; that's one big reason that all humans have over 99.9% of their DNA in common.
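The percentage above is simple arithmetic. A quick sketch, assuming roughly 700,000 SNPs on a typical chip (the figure cited elsewhere on this blog) and 3.2 billion base pairs; it comes out to about 0.022%, close to the 0.023% quoted, and the last digit shifts depending on which genome size you plug in.

```python
def percent_of_genome(snps_tested: int, genome_base_pairs: int) -> float:
    """Percentage of the genome's base pairs that a SNP chip actually reads."""
    return snps_tested / genome_base_pairs * 100


# Assumed inputs: ~700,000 SNPs on a typical chip, ~3.2 billion base pairs.
share = percent_of_genome(700_000, 3_200_000_000)
print(f"About {share:.3f}% of the genome is examined")
```

Using 3.1 billion base pairs instead rounds to 0.023%.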
For genealogy, we test autosomal SNPs (Single Nucleotide Polymorphisms). The forensic DNA testing used by law enforcement, which TV shows like CSI always get wrong, doesn't test SNPs at all; it uses STRs (Short Tandem Repeats) to identify individuals. For example, the FBI's National DNA Index System (NDIS), implemented in 1998 as part of the Combined DNA Index System (CODIS), originally looked at only 13 specific loci, measuring at each one the number of times a short DNA sequence is repeated; those repeat counts are the allele values. Beginning in 2017, the original 13 core loci were retained and seven more were added, for a total of 20. As of last April, there were over 13.3 million offender profiles in NDIS.
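An STR allele value is essentially a count of back-to-back copies of a short motif. A minimal sketch of that counting, assuming a made-up stretch of DNA and a 4-base motif chosen for illustration (real profiles record two counts per locus, one from each parent, and use lab methods rather than string scans):

```python
def longest_tandem_repeat(sequence: str, motif: str) -> int:
    """Count the longest run of back-to-back copies of `motif` in `sequence`."""
    if not motif:
        raise ValueError("motif must be non-empty")
    k, best = len(motif), 0
    for start in range(len(sequence)):
        count, pos = 0, start
        # Step forward one motif-length at a time while the copies keep matching.
        while sequence[pos:pos + k] == motif:
            count += 1
            pos += k
        best = max(best, count)
    return best


# Hypothetical stretch of DNA: flanking bases around 11 copies of a 4-base motif.
read = "TTGA" + "TCTA" * 11 + "GGCA"
print(longest_tandem_repeat(read, "TCTA"))  # 11
```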
We do use STR testing when examining the Y-chromosome for genealogy. If you look at any surname project at Family Tree DNA you can see results of the project members grouped and displayed; most marker names start with "DYS" followed by a number. How can these be displayed openly without endangering privacy? It's simple: these markers are not individually identifying. They can group men into a haplotype and predict that they share a common male ancestor, but they can't show how one man is related to another. Useful for genealogical and anthropological research, but not for forensic purposes. The most popular test looks at 37 markers; at that level an exact match predicts a 95% probability that the men share an ancestor within eight generations. Even with a perfect match on 111 markers, we don't reach a 99% probability that two men share an ancestor until we get to their fourth-great-grandparents. That two men both descend from a man born about 200 years ago doesn't do much for a criminal court case.
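Y-STR results are just repeat counts keyed by marker name, which is why comparing two men mostly means counting differences, and why the result says nothing about exactly how they are related. A minimal sketch using a simple step-wise count (each repeat of difference adds one); the marker values are hypothetical, and testing companies may score some markers differently.

```python
def genetic_distance(man_a: dict[str, int], man_b: dict[str, int]) -> int:
    """Sum of repeat-count differences over the markers both men tested (step-wise model)."""
    shared = man_a.keys() & man_b.keys()
    return sum(abs(man_a[m] - man_b[m]) for m in shared)


# Hypothetical repeat counts on four DYS markers.
man_a = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}
man_b = {"DYS393": 13, "DYS390": 25, "DYS19": 14, "DYS391": 10}
print(genetic_distance(man_a, man_b))  # 2
```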
The net message is that hackers aren't after your DNA data; they couldn't do anything with it even if they got it. Everyone, from the very start of consumer-direct testing in 2000, has gone into it with the understanding that the results might reveal something unexpected: a non-paternal event in the family line, an undocumented adoption, a name change. We will see continued discussion about the ethics of "forensic genealogy" and the likes of the Parabon initiative to use GEDmatch and similar sites with the goal of uncovering leads on cold-case violent crimes. I for one am fine with it. They're using the databases no differently than any of us do when assisting an adoptee in search of biological family. My GEDmatch kit number is A493619 and I'm fine with anyone knowing it. Drop me a note if you're one of my undiscovered cousins (please, though, no contact about matching at less than 25cM). Serial killers should be worried about forensic genealogy, not the rest of us.
Noted genetic genealogist Blaine Bettinger commented yesterday: "I've always contended that whether today or 25 years from now, I will be able to tell a lot more about a person from their credit history or bank records than their DNA will ever reveal."
Buy DNA tests and circulate your results so that other family researchers can find you. None of the recent events or developments should change your behaviors with regard to DNA testing.
To some, this is going to be an old refrain, and one that may seem more than a bit annoying. Like it or not, though, our online presence and digital footprint increase steadily, and in many instances (think mergers, acquisitions, business closures) we have no real control over what happens to the data associated with that presence. Unless we decide to build a log cabin a hundred miles from anything, with no electricity and no telephone, well, we're stuck with the ever-expanding nature of our online worlds. What you really should consider doing, with a little explanation to follow:
Usernames and nicknames are generally publicly viewable, and there's seldom a reason (unless you have a special one of your own) to obscure or obfuscate them. The email address and password are the important identifying factors everywhere you have an account.
Statistics say that the majority of us use the same password and email address on many, if not most, of the websites where we have accounts. If hackers obtain the matched pair, all they have to do is check where else you might have accounts or, just for grins, try logging in to major financial service providers to see if they can get into your accounts. Even if all they obtain is the email address, and it's the same one you use for multiple accounts, they can use simple brute-force, dictionary-type hacking tools to try to guess the password, moving on to the next site once the maximum number of failed attempts is used up on the first. It's all automated, so the hacker can go grab a Starbucks and come back later to see if he was successful.
MyHeritage has told us that there is no reason to believe any passwords were compromised as a result of this breach. However, and understandably so, it hasn't revealed precisely what techniques beyond basic "hashing" were used to protect password data. Hashing is a common safeguard, and it's the reason most websites tell you that your password cannot be recovered if you lose or forget it: you can reset the password, but not learn what it was.
In a nutshell, cryptographic hashing uses a mathematical algorithm to take the character string of a password and transform it into an entirely unrelated-looking string of gibberish. The gibberish is what's stored on the server, not the actual password. When you log on to the service, an application on the server acts something like a secret decoder ring that only ever encodes: it runs the password you just entered through the same algorithm and then verifies that the resulting gibberish matches the gibberish stored in a database. Well and good so far.
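The store-the-gibberish, compare-the-gibberish flow can be sketched in a few lines. This assumes plain, unsalted SHA-256 purely to show the mechanics; on its own it is not how passwords should be stored, since it has no salt and is far too fast to compute.

```python
import hashlib
import hmac


def hash_password(password: str) -> str:
    """One-way transform: the same input always yields the same hex digest."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def verify(attempt: str, stored_digest: str) -> bool:
    """Hash the login attempt and compare it with the stored digest."""
    return hmac.compare_digest(hash_password(attempt), stored_digest)


# At registration, only the digest is saved; the password itself is discarded.
stored = hash_password("correct horse battery staple")
print(verify("correct horse battery staple", stored))  # True
print(verify("Correct horse battery staple", stored))  # False
```

Note that the server never "decodes" anything; it only re-hashes and compares.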
But there are hashing algorithms...and then there are hashing algorithms. Black-hat nerds are constantly at work attempting to defeat any and all online security measures. The basic hashing methods developed three decades ago don't cut it anymore. Not only have older, simpler hash functions (like the MD series designed by Ronald Rivest of MIT, or SHA-1, an early iteration of the "Secure Hash Algorithm" developed by the U.S. National Security Agency in 1995) proven vulnerable, but hackers have even developed and published open-source applications that can crack them with the push of a button. A 13-year-old kid in his basement can do it with no training or experience, just the click of a mouse.
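The push-button cracking is easiest to see against an unsalted, fast hash. A minimal sketch of a dictionary attack, assuming a hypothetical leak of unsalted MD5 digests and a tiny stand-in wordlist (real tools use lists of millions of entries):

```python
import hashlib


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def dictionary_attack(leaked_digests: set[str], wordlist: list[str]) -> dict[str, str]:
    """Hash every candidate once, then look up each leaked digest in that table."""
    lookup = {md5_hex(word): word for word in wordlist}
    return {digest: lookup[digest] for digest in leaked_digests if digest in lookup}


# Hypothetical leak: unsalted MD5 digests of three users' passwords.
leaked = {md5_hex("123456"), md5_hex("gr8geneal0g1st"), md5_hex("vQ7#tLm9!xRz2&Kw")}
candidates = ["password", "123456", "letmein", "genealogy", "gr8geneal0g1st"]

cracked = dictionary_attack(leaked, candidates)
print(sorted(cracked.values()))  # ['123456', 'gr8geneal0g1st']
```

Because there is no salt, one precomputed table cracks every account that shares a weak password, all at once.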
To combat that, techniques like "salting" have been developed: a bunch of randomly generated clutter characters, different for each password, is added to the password before it's hashed. MyHeritage stated that it uses "...a one-way hash of each password, in which the hash key differs for each customer." This almost certainly refers to a salt as the method of generating that unique hash key. In addition, more robust password-hashing functions like bcrypt, which build in salting and are deliberately slow to compute, have been created to provide a greater defense against brute-force attacks.
All the technology in the world between your keyboard and the web server isn't going to help if your password is "password" or "123456". And, statistically, far too many passwords aren't any more complicated than that.
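A minimal sketch of salted, deliberately slow hashing. PBKDF2 is swapped in here for bcrypt simply because it ships in Python's standard library (bcrypt needs a third-party package); the iteration count is an assumption to tune for your hardware, not a value MyHeritage or anyone else has published.

```python
import hashlib
import hmac
import os

# Assumption: iteration count chosen for illustration; tune it to your hardware.
ITERATIONS = 600_000


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest); a fresh random salt is generated for every password."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(attempt: str, salt: bytes, stored_digest: bytes) -> bool:
    """Re-derive the digest with the stored salt and compare in constant time."""
    digest = hashlib.pbkdf2_hmac("sha256", attempt.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(digest, stored_digest)


# Two users who pick the same weak password still end up with different digests.
salt_a, digest_a = hash_password("123456")
salt_b, digest_b = hash_password("123456")
print(digest_a != digest_b)                         # True
print(verify_password("123456", salt_a, digest_a))  # True
```

With a unique salt per account, an attacker has to redo the slow work separately for every user instead of reusing one lookup table.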
It all starts with a password—a unique password that you use for no other purpose—that is complex enough to be considered strong. The aspects of a strong password are length (the longer the better); a mix of letters (upper and lower case), numbers, and symbols (special keyboard characters); no dictionary words, including words gleaned from a different language; and no association with your personal information.
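The criteria above translate naturally into a checklist. A rough sketch, assuming a 16-character minimum (the article says only "the longer the better") and a tiny stand-in word list; a real checker would use a full dictionary and a list of breached passwords. It also undoes common letter/numeral swaps before looking for words.

```python
import string

# A tiny stand-in for a real dictionary / breached-password list (assumption).
COMMON_WORDS = {"password", "letmein", "family", "history", "genealogist", "genealogy"}

# Undo common letter/numeral swaps before looking for words.
LEET = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t", "@": "a", "$": "s"})

MIN_LENGTH = 16  # assumption: the article says only "the longer the better"


def password_problems(password: str, personal_terms: tuple[str, ...] = ()) -> list[str]:
    """List every strength criterion the password fails."""
    problems = []
    if len(password) < MIN_LENGTH:
        problems.append(f"shorter than {MIN_LENGTH} characters")
    if not any(c.islower() for c in password):
        problems.append("no lowercase letters")
    if not any(c.isupper() for c in password):
        problems.append("no uppercase letters")
    if not any(c.isdigit() for c in password):
        problems.append("no numbers")
    if not any(c in string.punctuation for c in password):
        problems.append("no symbols")
    lowered = password.lower()
    variants = (lowered, lowered.translate(LEET))
    if any(word in v for word in COMMON_WORDS for v in variants):
        problems.append("contains a dictionary word")
    if any(term.lower() in v for term in personal_terms for v in variants):
        problems.append("contains personal information")
    return problems


print(password_problems("MyFam1lyH1st0ry"))
# ['shorter than 16 characters', 'no symbols', 'contains a dictionary word']
```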
We all probably have accounts at sites where we had to set up security questions to which only we should know the answers. Not only is that so Y2K, but as genealogists, haven't you ever looked at the available questions and shaken your head, knowing it would take you all of five minutes to discover someone's paternal grandfather's first name or maternal grandmother's maiden name? Don't use those types of security questions, and if your great-grandfather Earl was born in 1882, don't use 18Earl82 as a password. Simple letter/numeral swapping looks cute, but it stopped being effective a couple of decades ago. Avoid gr8geneal0g1st and MyFam1lyH1st0ry.
I've frightened people with this sort of thing before, but here's an example of an expired password (that will never be employed again) I've used here on the Threlkeld One-Name Study webserver:
Simple and easy, right? There may be some remarkable humans who are able to memorize scores of passwords of that ilk, and memorize new ones every couple of months, but I'm not one of them. You probably aren't either.
There are two options, and I use both of them for different purposes. The most basic is to store passwords in a simple document. I have a few different websites I manage, and each has separate and unique credentials at many of the same places (e.g., Google webmaster tools, Norton Safe Web, associated social media accounts), in addition to multiple email addresses for system accounts on the server. My preference is to keep all those related credentials stored in the same place, and separate from the other sets of credentials.
But wait! What about the maxim to never write down your passwords? I maintain discrete Microsoft Word documents and use robust data encryption technology to keep prying eyes out of the files. Your needs may differ from mine, but there are two broad categories of these products: one encrypts entire sectors or folders on your hard drive and everything that goes into them, and the other encrypts one file at a time. Many of these are inexpensive yet provide a solid level of protection. You can read about a few choices in this December 2017 review by PC Magazine, and another from last April by TechRadar.pro.
If going standalone, you'll probably want to find a good random password generator. Here's an article from Digital.com that describes 13 of them. Some don't reach the password lengths that I prefer, so I'll just do two generation runs and paste the results together. Another thing I try to avoid, just for clarity, is using characters that look too similar, e.g., 1 I | and 0 o O. If I ever have an issue doing a copy-and-paste, I want to be certain which character I'm staring at. In some fonts, these can be downright indistinguishable from one another.
The second category is password management systems. These encrypt, store, and call up your passwords on demand. I've found them less suitable where I require separate credentials for the same website (again, like Google webmaster tools or Buffer.com). Some also store the necessary data in the cloud so that you can reach your passwords from multiple devices, including your phone. The online storage represents a possible security threat, but everything's a trade-off. For websites where I use only one set of credentials, I find a password manager extremely handy.
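If you'd rather not depend on a website for generation, a few lines of Python will do it. A minimal sketch, assuming Python's `secrets` module as the randomness source and a default length of 24 (an arbitrary choice); it drops the look-alike characters mentioned above, plus lowercase "l", which is our own addition to the list, and retries until every character class is present.

```python
import secrets
import string

# Look-alike characters to exclude; lowercase "l" is an addition to the article's list.
AMBIGUOUS = set("1Il|0oO")
ALPHABET = "".join(
    c for c in string.ascii_letters + string.digits + string.punctuation if c not in AMBIGUOUS
)


def generate_password(length: int = 24) -> str:
    """Draw characters with a CSPRNG until the result has every character class."""
    if length < 4:
        raise ValueError("length must be at least 4 to include every character class")
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if (
            any(c.islower() for c in candidate)
            and any(c.isupper() for c in candidate)
            and any(c.isdigit() for c in candidate)
            and any(c in string.punctuation for c in candidate)
        ):
            return candidate


print(generate_password(32))
```

Generating the full length in one run avoids the two-runs-and-paste step.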
There are password management systems that come bundled with larger security applications, like Norton's Identity Safe, and some that are available as separate software titles. Here's a review from last May at ConsumersAdvocate.org, another from May by PC Magazine, and one from February by c|net.
A few paragraphs ago, I said you should use a unique email address at every website where you have a registration. And you thought I was crazy, didn't you? Gmail offers one very good, free email service...but setting up 100 different Gmail accounts?
You guessed it: there's an app for that, too. In fact, a number of them. A quick search for email forwarding services will find you a bunch to choose from.
The one that I use and can recommend is 33mail. Their free tier of service is likely to be all you'll need, but there is a "premium" level for $12 per year that includes extra features and no advertisements, and a "professional" level for $50 per year. It's incredibly easy to use, and gives that important benefit of unique email addresses everywhere you register. Their own explanation of how it works can't be improved upon, so I'll just quote that:
"Sign up and pick a username, for example, 'joesmith'. Now, any email address ending with ...@joesmith.33mail.com will be forwarded to you. The next time you visit a website that asks for your email address, instead of giving them your real email address, just make one up especially for them.
"For example, if the website is tribble.com, you might give them tribble.com@joesmith.33mail.com. Don't worry, you don't need to do anything else, we'll create an alias automatically the first time they try to send you an email, and we'll forward any emails they send to you. You can even reply anonymously to any mails received through 33mail.
"Later, if tribble.com start to send you emails you don't want, or even if they sell your email address to a spammer, just click on the link that we add to the top of every email we forward, we'll kill their alias, and they won't bother you any more."
It's just that simple. Every email comes into the single email address that you specified at setup (and yes, you can change that later if you want), and each email shows you who sent it and to which 33mail.com alias it was sent. If you had used one of these for your MyHeritage account, it would now be as simple as switching to a new email alias at MyHeritage and then blocking the old alias, rendering it useless to whoever snagged the MyHeritage data.
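The forwarding behavior described in the 33mail quote (catch-all addresses, aliases created on first use, one-click blocking) can be modeled in a few lines. This is a hypothetical toy model: the class, its methods, and the addresses are invented for illustration and are not 33mail's code or API.

```python
class AliasForwarder:
    """Toy model of catch-all alias forwarding; hypothetical, not 33mail's code or API."""

    def __init__(self, username: str, domain: str, real_inbox: str):
        self.suffix = f"@{username}.{domain}".lower()
        self.real_inbox = real_inbox
        self.active: dict[str, bool] = {}  # alias -> still forwarding?
        self.delivered: list[tuple[str, str, str]] = []  # (alias, sender, subject)

    def receive(self, to_address: str, sender: str, subject: str) -> bool:
        alias = to_address.lower()
        if not alias.endswith(self.suffix):
            return False  # not an address under this account
        # The first message to a new alias creates it automatically.
        if not self.active.setdefault(alias, True):
            return False  # alias was blocked, so the message is dropped
        self.delivered.append((alias, sender, subject))  # stands in for forwarding to real_inbox
        return True

    def block(self, to_address: str) -> None:
        self.active[to_address.lower()] = False


fwd = AliasForwarder("joesmith", "33mail.com", "joe@example.com")
old = "myheritage@joesmith.33mail.com"
print(fwd.receive(old, "sender@example.org", "Welcome"))        # True
fwd.block(old)  # after the breach, kill the leaked alias
print(fwd.receive(old, "attacker@example.net", "Phish"))        # False
new = "myheritage2018@joesmith.33mail.com"
print(fwd.receive(new, "sender@example.org", "Email updated"))  # True
```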
Do genealogy safely, my friends!
Interest in and concern about the European Union's General Data Protection Regulation (GDPR), taking effect May 25, has rapidly escalated over the past several weeks, even for U.S.-based organizations, commercial or not. Small genealogy organizations and websites are feeling the pressure, so much so that some, notably Ysearch.org, Mitosearch.org, and WorldFamilies.net, are closing permanently.
The popular online service for autosomal DNA matching and comparison, GEDmatch.com, has been facing not just pressure from the impending GDPR regulations, but also an unfortunate media backlash over how law enforcement is using its tools to investigate cold-case violent crimes. The now month-long media blitz began April 27 when Paul Holes, a retired investigator with the Contra Costa County District Attorney's Office, told the Sacramento Bee that he used DNA samples collected from crime scenes to create a genetic profile of the suspected Golden State Killer and used it to search genealogy websites. The genealogy website that proved effective was GEDmatch.
Parabon NanoLabs, a company best known for its Parabon Snapshot DNA Phenotyping Service, which creates facial composites based upon DNA profiles, unveiled its new forensic genealogy service on May 8, to be headed by well-known genetic genealogist CeCe Moore. This is a for-fee service to be sold to law enforcement; the price tag for a single analysis is $3,750. The results of the new service's first case came a scant 10 days later. On May 18 an arrest was announced in the 1987 murder in Washington State of a young Canadian couple. DNA from the case was tested and run against the data in GEDmatch. Close matches were made to two of the suspect's relatives.
One thing we can take away from this is that the GEDmatch utilities—most of which are free—are effective and accurate. But GEDmatch's Curtis Rogers made it very clear that the company was never approached by law enforcement, and that they did nothing knowingly to aid any investigation. In truth, GEDmatch was used in the investigations precisely the same way hundreds of us use it every day for genealogy. The only difference is that, instead of me sending in a sample to AncestryDNA or 23andMe or Family Tree DNA for testing and then uploading the results to GEDmatch, a law enforcement agency used an old sample collected at a crime scene, sequenced it, and created a pseudo-profile on GEDmatch with which to upload the results.
Beyond the anticipated verbiage to satisfy GDPR strictures, there is interesting new language dealing with the raw data files provided to GEDmatch:
Raw DNA Data Provided to GEDmatch
When you upload Raw Data to GEDmatch, you agree that the Raw Data is one of the following:
'Violent crime' is defined as homicide or sexual assault.
Quite explicit now, addressing not only "DNA obtained and authorized by law enforcement," but also the contingency of artificially created DNA information. Current consumer tests look at approximately 700,000 SNPs (Single Nucleotide Polymorphisms), with the result at each position expressed in the four nitrogenous bases, the "letters" of DNA: A, G, C, and T. The raw data we get from the major testing companies is a plain text file that can be edited by anyone with the know-how. It's difficult to imagine what purpose that could serve, because the SNPs comprise only about 0.023% of the total base pairs in the human genome: insightful for making assumptions about a chromosomal segment potentially shared between second cousins, but not really for any other use. Regardless, GEDmatch has addressed it.
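To see how little stands between a raw data file and an edited one, here is a sketch that reads a tab-separated file and changes one call. It assumes a 23andMe-style layout (rsid, chromosome, position, genotype); other companies differ slightly (AncestryDNA, for instance, splits the two alleles into separate columns). The rows are illustrative only, not taken from any real kit.

```python
# Illustrative rows in an assumed 23andMe-style layout: rsid, chromosome, position, genotype.
SAMPLE = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs4477212\t1\t82154\tAA\n"
    "rs3094315\t1\t752566\tAG\n"
    "rs3131972\t1\t752721\tGG\n"
)


def parse_raw_data(text: str) -> dict[str, tuple[str, int, str]]:
    """Map rsid -> (chromosome, position, genotype), skipping comment lines."""
    calls = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        rsid, chromosome, position, genotype = line.split("\t")
        calls[rsid] = (chromosome, int(position), genotype)
    return calls


original = parse_raw_data(SAMPLE)
# Nothing stops anyone from changing a value with a text editor (or one line of code).
edited = parse_raw_data(SAMPLE.replace("752566\tAG", "752566\tAA"))
print(original["rs3094315"], edited["rs3094315"])
```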
Beginning May 21, your next visit to GEDmatch, after you enter your log-on credentials, will present you with this message:
When you scroll down through the new terms of service, you will have three options to choose from:
GEDmatch has been one of the best new additions to genetic genealogy. The utilities they offer are extremely valuable to researchers, and the majority of them are free. I'll continue to be a Tier 1 subscriber, and I sincerely hope that these past and coming weeks do not damage GEDmatch's viability as a company and that, if anything, it is strengthened by recognition of the validity and effectiveness of its toolset.
Q: My AncestryDNA test results came back, and they don't make much sense compared to our family history. My mother's father was Italian. His grandfather came to America from Italy, but I'm not showing anything at all that looks like that side of the family in my results. Should I take another test at a different company?
A: Thanks for the question. You're touching on a matter that concerns me, one that I believe is the primary downside to the marketing tack AncestryDNA employed, which all the others had to follow or see their market shares eaten alive, and for which serious genealogists are paying the price. Marketing ethnicity/admixture as the primary reason to take an autosomal DNA test is, frankly, a bit disingenuous at best; at worst, it might be interpreted by some as deceptive. The whole "traded my lederhosen for a kilt" nonsense.
Q: I've been in touch with a gentleman who says that he is related to my deceased father's family line. His family tree shows this (six generations back), but on GEDmatch he doesn't match me, my two siblings, or two known 1st cousins. He told me that DNA will skip a generation, and insists that if I don't match him I'll probably match his son, whose test results, autosomal and Y-chromosome, are pending. I have no male immediate family members to test the Y-chromosome. I know we don't all get the same DNA from our ancestors, but aren't we limited to the DNA of our parents? Nothing could show up in his son that came from his father (the son's grandfather) but that he doesn't carry himself, correct?
A: Assuming the gentleman and his wife aren't related (e.g., two brothers married two sisters in their grandparents' generation, making them double 2nd cousins and allowing differing autosomal DNA segments to pass down from both lines), you are absolutely correct. Without pedigree collapse in our trees, at least in relatively recent generations, there should be no surprises in the son's results. I'm going to digress a moment before addressing the skip-a-generation thing.
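A small simulation makes the point concrete. It is a deliberately simplified model under stated assumptions: one chromosome of 1,000 markers, exactly two crossovers, no mutation, and only the son's paternal copy (the mother's contribution, the exception noted above, is ignored). The labels are hypothetical.

```python
import random


def make_gamete(copy_a: list, copy_b: list, crossovers: int, rng: random.Random) -> list:
    """Build the single copy a parent passes on: a patchwork of the parent's two copies."""
    n = len(copy_a)
    cuts = sorted(rng.sample(range(1, n), crossovers)) + [n]
    use_a = rng.random() < 0.5
    gamete, start = [], 0
    for cut in cuts:
        gamete.extend((copy_a if use_a else copy_b)[start:cut])
        use_a = not use_a  # a crossover switches to the other copy
        start = cut
    return gamete


rng = random.Random(2018)
MARKERS = 1_000
# Label every marker on the father's two copies by the grandparent it came from.
father_from_gf = [("grandfather", i) for i in range(MARKERS)]
father_from_gm = [("grandmother", i) for i in range(MARKERS)]

son_paternal = make_gamete(father_from_gf, father_from_gm, crossovers=2, rng=rng)

# Every marker the son got from his father is one the father actually carries...
assert all(son_paternal[i] in (father_from_gf[i], father_from_gm[i]) for i in range(MARKERS))
# ...but the son carries only part of what the father got from each grandparent.
from_gf = sum(1 for label, _ in son_paternal if label == "grandfather")
print(f"Son received {from_gf} of the father's {MARKERS} grandfather-derived markers")
```

DNA can be lost from one generation to the next, but it cannot reappear: if the father doesn't carry a segment, his son can't have inherited it through him.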
On 3 May 2018, seven weeks after this article was first published, Dr. Bryan Sykes announced that Oxford Ancestors would not be closing as had been previously communicated. He posted the following to the company's website:
I have just received some very good news. Our labs at the University, which were threatened with closure for up to a year from July 2018 owing to redevelopment of the Science Area, have now been reprieved. In light of this I am very pleased to announce that Oxford Ancestors will remain open for business as usual.
I wish Oxford Ancestors new life and continued success. EW
Large-scale DNA studies focusing on Ireland seem to be appearing almost back-to-back. Last December in Scientific Reports, Gilbert, O'Reilly, Merrigan, et al. published "The Irish DNA Atlas: Revealing Fine-Scale Population Structure and History within Ireland." From that abstract:
The extent of population structure within Ireland is largely unknown, as is the impact of historical migrations. Here we illustrate fine-scale genetic structure across Ireland that follows geographic boundaries and present evidence of admixture events into Ireland.
Less than two months later, we had a new peer-reviewed article published by Ross Byrne, Rui Martiniano, Lara Cassidy, Matthew Carrigan, Garrett Hellenthal, Orla Hardiman, Daniel G. Bradley, and Russell L. McLaughlin in PLOS Genetics: http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1007152.
Titled "Insular Celtic Population Structure and Genomic Footprints of Migration," this study used haplotype-based fineSTRUCTURE
MyHeritage made a big splash at RootsTech last week highlighting its new chromosome browser, which initial reports indicate is nicely constructed and robust, and includes a true triangulation feature. The browser debuted in conjunction with the announcement of major improvements in the MyHeritage matching procedures and algorithms.
The announcements came not at RootsTech, which wrapped up last Saturday in Salt Lake City, but a few weeks earlier on the MyHeritage Blog. It was at the massive RootsTech, though (estimated this year to have had over 14,200 paid attendees), that MyHeritage took center stage.
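Triangulation boils down to an overlap test: you match A, you match B, and A matches B, all on the same stretch of the same chromosome. A simplified sketch with hypothetical segment positions; it is not MyHeritage's implementation, and real tools also apply size thresholds (in cM) and other checks.

```python
from typing import NamedTuple, Optional


class Segment(NamedTuple):
    chromosome: int
    start: int  # base-pair position
    end: int


def overlap(a: Segment, b: Segment) -> Optional[Segment]:
    """Region covered by both segments, or None if they don't overlap."""
    if a.chromosome != b.chromosome:
        return None
    start, end = max(a.start, b.start), min(a.end, b.end)
    return Segment(a.chromosome, start, end) if start < end else None


def triangulate(you_with_a: Segment, you_with_b: Segment, a_with_b: Segment) -> Optional[Segment]:
    """All three pairwise matches must cover the same region for the group to triangulate."""
    shared = overlap(you_with_a, you_with_b)
    return overlap(shared, a_with_b) if shared is not None else None


# Hypothetical matching segments on chromosome 7.
you_a = Segment(7, 20_000_000, 45_000_000)
you_b = Segment(7, 30_000_000, 50_000_000)
a_b = Segment(7, 32_000_000, 60_000_000)
print(triangulate(you_a, you_b, a_b))
```

Requiring the A-with-B match matters: your matches to A and B can overlap on positions yet sit on different copies of the chromosome, one from each of your parents.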
Anyone who took a MyHeritage DNA test, and anyone who uploaded DNA data from another service, will now receive more accurate DNA Matches; more plentiful matches (about 10x more); fewer false positives; more specific and more accurate relationship estimates; and indications on lower confidence DNA Matches to help focus research efforts.
—MyHeritage Blog, 11 January 2018
DNA Painter, the autosomal DNA visualization tool for genealogy created and developed by Jonny Perl, has not only gained thousands of users in its seven-month existence but was also announced on March 2 in Salt Lake City as the winner of the 2018 RootsTech DNA Innovation Contest.
Jonny, a web and applications developer in England, has been involved in genealogy for over ten years, but took his first DNA test in December 2016. He admits that he was skeptical of DNA testing initially, and accordingly had delayed testing himself for years. After he saw his results, he was less than completely satisfied with the way they were displayed, and thought that there had to be a better way.
He became involved with a UK-based Facebook group discussing DNA and genealogy, and credits it with helping move his understanding of DNA rapidly past the basic and intermediate stages. He began looking at ways to group and display chromosome mapping and segment sharing more intuitively and visually and, in July 2017, invited just a few people to have a look at what he was working on as, essentially, "alpha" testers. Jonny readily admits that if we'd seen the application at that stage, we would not have been impressed.