Counting Chromosomes
A blog of random musings on genealogy, genetics, science, and history

Internet Security
The data breach affects all MyHeritage user accounts created before 27 October 2017

Summary of Events

About 9:00 p.m. EST on June 4, MyHeritage announced that a data breach of their systems had been discovered that affects 92.3 million accounts, users who had registered at MyHeritage up to and including the date of the breach, 26 October 2017. Approximately eight hours earlier, an independent security researcher notified the company that he had discovered a file "on a private server outside of MyHeritage" that contained the email addresses and so-called "hashed" passwords of these accounts.

Staggering to comprehend, but the company has stated that "other websites and services owned and operated by MyHeritage, such as Geni.com and Legacy Family Tree, have not been affected by the incident." Further information shows that they have added about 4 million new accounts since the breach on 26 October, making the current registered base over 96 million. There is as yet no information about the source of the leaked data.

As of this writing, MyHeritage has issued the original announcement at blog.myheritage.com/2018/06/myheritage-statement-about-a-cybersecurity-incident/, and an update today, June 6, at blog.myheritage.com/2018/06/cybersecurity-incident-june-5-6-update/.

The company has reassured us that they have no reason to believe passwords were compromised because it "does not store user passwords, but rather a one-way hash of each password, in which the hash key differs for each customer." More on this in a moment. The company added:

We have no reason to believe that any other MyHeritage systems were compromised. As an example, credit card information is not stored on MyHeritage to begin with, but only on trusted third-party billing providers (e.g. BlueSnap, PayPal) utilized by MyHeritage. Other types of sensitive data such as family trees and DNA data are stored by MyHeritage on segregated systems, separate from those that store the email addresses, and they include added layers of security.

Steps MyHeritage is Taking

  • Upon learning of the breach, the company formed an "Information Security Incident Response Team" to research and address the incident. It is not clear whether this was a standing team that was mobilized, or one created ad hoc after news of the breach. The response does include hiring "a leading, independent cybersecurity firm to conduct comprehensive forensic reviews."
     
  • They began a process sometime late June 5 to force-expire all MyHeritage passwords, ones created after the date of the breach included. This means all their users will be "forced to set a new password and will not be able to access their account and data on MyHeritage until they complete this. This procedure can only be done through an email sent to their account's email address at MyHeritage."
     
  • The company has expedited development of a planned, but optional, two-factor authentication feature. Familiar to most with online financial accounts, this uses the standard password in conjunction with a rolling, random code available to the user for a brief period of time via a secondary device. In this case, it looks as if MyHeritage will employ an SMS text method that will send a token code to a pre-registered mobile device. Both the login password and the random code would be needed to access the site.
     
  • They have created a customer service response team available by email, or by phone via the toll-free helpline (USA) 888-672-2875, available 24 hours a day.
     
  • And, of course, and especially if you are a registered MyHeritage customer, it's a good idea to look for additional updates at the company's blog, blog.myheritage.com.
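For the technically curious, the two-factor idea in the list above can be sketched in a few lines of Python. This is only an illustration, not MyHeritage's implementation: the function names, the six-digit code length, and the five-minute lifetime are all assumptions made for the example, and the step that actually sends the SMS is left out.

```python
import hmac
import secrets
import time

CODE_LIFETIME_SECONDS = 300  # assumed five-minute validity window


def issue_code(now=None):
    """Create a random six-digit code and the time at which it expires."""
    now = time.time() if now is None else now
    code = f"{secrets.randbelow(10**6):06d}"  # zero-padded, e.g. "004217"
    return code, now + CODE_LIFETIME_SECONDS


def login_allowed(password_ok, submitted_code, issued_code, expires_at, now=None):
    """Both factors must pass: a correct password AND a matching, unexpired code."""
    now = time.time() if now is None else now
    if now > expires_at:
        return False
    # compare_digest avoids leaking information through comparison timing
    codes_match = hmac.compare_digest(submitted_code.encode(), issued_code.encode())
    return bool(password_ok) and codes_match
```

The point of the design is that a stolen password alone is not enough: whoever has it would also need the phone that receives the short-lived code.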

Opinion and Recommendations

Unfortunate Timing

MyHeritage had no influence over the timing of this announcement, but for the genetic genealogy community it could hardly be worse. We're just now coming off several weeks of furor and confusion over the European Union's General Data Protection Regulation (GDPR) which caused an array of businesses, not just genealogy and DNA businesses, to issue revised privacy policies and terms of use. It's forced some services—notably Ysearch, Mitosearch, and WorldFamilies.net—to close; it's forced sites like WikiTree to drastically curtail the effectiveness of using DNA information in conjunction with its shared family tree; it's bombarded us with notifications and annoying "this website uses cookies" pop-ups; and, while not a bad thing, brought privacy concerns front and center in genealogists' minds.

During the melee, the news hit that the infamous Golden State Killer was identified by use of the free autosomal DNA comparison tools at GEDmatch, and within days, on the heels of that development, came the announcement on May 8 by Parabon NanoLabs (a company best known for its Parabon Snapshot DNA Phenotyping Service that creates composite facial images based upon DNA profiles) of its new forensic genealogy service, to be headed by well-known genetic genealogist CeCe Moore. A predictable but unfortunate repercussion was a reaction from some GEDmatch users to either hide or completely delete test kits that they manage.

We'll never see any actual numbers, but I have a feeling that this year's Father's Day DNA sales won't be as successful as in years past.

Hackers Don't Want Your DNA Data

To be clear, none of the DNA testing, comparing, or matching services make your autosomal results available online. The only way to get that raw data is to go through specific steps to download a digitally compressed file for your own use elsewhere. Until we eventually achieve affordable technology that allows direct-to-consumer full genome sequencing of our 3.2 billion DNA base pairs, what the current tests for genealogy offer is a look at only about 0.023% of your genome and, since these tests use specific micro-array chips that are designed for purposes of population genetics, the vast majority of the markers tested are not in locations that include protein-encoding genes. The tests need to measure variances, and our protein-encoding genes can't vary that much without endangering the survival of the organism; that's one big reason that all humans have over 99.9% of their DNA in common.

For genealogy, we test autosomal SNPs, Single Nucleotide Polymorphisms. The forensic DNA testing used by law enforcement—and which TV shows like CSI always get wrong—doesn't test for SNPs at all; that testing uses STRs (Short Tandem Repeat) to identify individuals. For example, the FBI's National DNA Index System (NDIS), implemented in 1998 and part of the Combined DNA Index System (CODIS), began looking at only 13 specific alleles, or loci, to measure the number of times a copy of the allele values repeated. Beginning in 2017, the original 13 core loci were retained but seven additional markers were added for a total of 20. As of last April, there were over 13.3 million offender profiles in NDIS.

We do use STR testing when examining the Y-chromosome for genealogy. If you look at any surname project at Family Tree DNA you can see results of the project members grouped and displayed; most marker names start with "DYS" followed by a number. How can these be displayed openly without endangering privacy? It's simple: these markers are not individually identifying. They can group men into a haplotype and predict that they share a common male ancestor, but they can't show how one man is related to another. Useful for genealogical and anthropological research, but not for forensic purposes. The most popular test looks at 37 markers; at that level an exact match predicts a 95% probability that the men share an ancestor within eight generations. Even with a perfect match on 111 markers, we don't reach a 99% probability that two men share an ancestor until we get to their fourth-great-grandparents. That two men both descend from a man born about 200 years ago doesn't do much for a criminal court case.

The net message is that hackers aren't after your DNA data. They can't do anything with it if they got it. Everyone, from the very start of consumer-direct testing in 2000, has gone into it with the understanding that the results might reveal something unexpected: a non-paternal event in the family line, an undocumented adoption, a name change. We will see continued discussion about the ethics of "forensic genealogy" and the likes of the Parabon initiative to use GEDmatch and similar sites with the goal of uncovering leads on cold-case violent crimes. I for one am fine with it. They're using the databases for nothing different than any of us have when assisting an adoptee in search of biological family. My GEDmatch kit number is A493619 and I'm fine with anyone knowing it. Drop me a note if you're one of my undiscovered cousins (please, though, no contact about matching at less than 25cM). Serial killers should be worried about forensic genealogy; not the rest of us.

Noted genetic genealogist, Blaine Bettinger, commented yesterday: "I've always contended that whether today or 25 years from now, I will be able to tell a lot more about a person from their credit history or bank records than their DNA will ever reveal."

Buy DNA tests and circulate your results so that other family researchers can find you. None of the recent events or developments should change your behaviors with regard to DNA testing.

Steps You Can Take Now

To some, this is going to be an old refrain and one that may seem more than a bit annoying. Like it or not, though, our online presence and digital footprint increases steadily, and in many instances—think mergers, acquisitions, business closures—we have no real control over what happens to the data associated with that presence. Unless we decide to build a log cabin a hundred miles from anything, with no electricity and no telephone, well, we're stuck with the ever-expanding nature of our online worlds. What you really should consider doing, with a little explanation to follow:

  1. MyHeritage will force you to change your password the first time you log on after June 6. Do it as soon as possible.
     
  2. Anyplace you used that same MyHeritage password, go change it.
     
  3. Stop using weak passwords. Anything you can remember should be considered too weak. If it isn't at least 12 characters long, and preferably longer, or doesn't contain at least three "special characters" (e.g., ! # $ % @ { ^), it's too weak.
     
  4. Never use the same password in two places. Every account and online presence you have should use a unique password.
     
  5. Never share your password with anyone; but if you must (like for the IT support guy) change it immediately afterward.
     
  6. Change your password periodically. If it's been a year since you changed it, you're way overdue. I'm on a rotating quarterly schedule on most of mine, but some particularly important ones are changed monthly. For general purposes, I'd recommend going no longer than six months between changes.
     
  7. Enable two-factor authentication wherever the option is offered. Don't let it be an excuse to slacken your guard about numbers 3 and 4, however. With cell phones now ubiquitous, companies can send you an automated SMS text with a temporary numeric PIN that has to be entered in conjunction with your password.
     
  8. Stop using the same email address for multiple online accounts. As with passwords, if it's a website where you register for any reason, even to receive email alerts about blog posts, you should use a unique email address. (See below; it isn't as difficult as it sounds.)

Usernames and nicknames are generally publicly viewable, and there's seldom a reason—unless there's a special one to you—to obscure or obfuscate it. The email address and password are the important identifying factors everyplace you have an account.

Statistics say that the majority of us use the same password and email address on many, if not most, of the websites where we have accounts. If hackers obtain the matched pair, all they have to do is see where else you might have accounts or, just for grins, try logging onto major financial service providers to see if they can get into your accounts. Even if all they obtain is the email address, and it's the same one you use for multiple accounts, they can try simple brute-force dictionary-type hacking tools to see if they can crack the password, moving on to the next site once the maximum number of failed attempts is used up on the first site. All with automated tools so the hacker can go grab a Starbucks and come back later to see if he was successful.

Your Password

Password Infographic from TeleSign

MyHeritage has told us that there is no reason to believe any passwords were compromised as a result of this breach. However—and understandably so—they haven't revealed precisely what type of techniques beyond basic "hashing" were used to protect password data. Hashing is a common safeguard, and the reason that most websites will tell you that your password cannot be recovered if you lose or forget: you can reset the password, but not learn what it was.

In a nutshell, cryptographic hashing uses a mathematical algorithm to take the character string of a password and transform it into an entirely unrelated string of gibberish. The gibberish is what's stored on the server, not the actual password. When you log on to the service, an application on the server acts like a secret decoder ring; it translates the password you just entered into the lingua franca gibberish and then verifies that the gibberish matches up to a string of identical gibberish stored in a database. Well and good so far.
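Here's a minimal sketch of that store-the-gibberish, compare-the-gibberish process, using Python's standard hashlib module. The function names and the sample password are made up for illustration, and the choice of SHA-256 is an assumption: MyHeritage hasn't said which algorithm it uses. A bare, unsalted hash like this shows the idea only; it is not how passwords should be protected in practice.

```python
import hashlib
import hmac


def hash_password(password: str) -> str:
    """Turn a password into a fixed-length string of hexadecimal 'gibberish'."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def check_login(attempt: str, stored_digest: str) -> bool:
    """Hash the attempt the same way and compare it with what is on file."""
    return hmac.compare_digest(hash_password(attempt), stored_digest)


# What the server keeps on file: the digest, never the password itself.
stored = hash_password("correct horse battery staple")
```

Note that nothing here ever turns the gibberish back into the password. The server only checks whether a fresh hash of what you typed matches the stored one, which is why a site can reset a forgotten password but can't tell you what it was.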

But there are hashing algorithms...and then there are hashing algorithms. Black-hat nerds are constantly at work attempting to defeat any and all online security measures. The basic hashing methods developed three decades ago don't cut it anymore. Not only have older, simpler hash operations (like the MD-series designed by Ronald Rivest of MIT, or the first iteration of the "Secure Hash Algorithm," SHA-1, developed by the U.S. National Security Agency in 1995) proven to be vulnerable, but hackers have even developed and published open-source applications that can crack them with the push of a button. A 13-year-old kid in his basement can do it with no training or experience, just the click of a mouse.

To combat that, techniques like "salting" have been developed to randomly generate a bunch of clutter-characters to add to a password before it's hashed. MyHeritage stated that they use "...a one-way hash of each password, in which the hash key differs for each customer." This most likely refers to salt as the method of generating that unique hash key. In addition, more robust password-hashing methods, like bcrypt, have been created to work in tandem with salting to provide a greater defense against brute-force attacks.
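To make salting concrete, here's a hedged sketch in Python. Because bcrypt isn't part of Python's standard library, this example swaps in PBKDF2 (hashlib.pbkdf2_hmac), a different deliberately slow password-hashing function. The function names, the 16-byte salt, and the iteration count are assumptions for illustration, and none of this is a claim about what MyHeritage actually runs.

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 100_000  # assumed work factor; higher means slower guessing


def make_record(password: str) -> Tuple[bytes, bytes]:
    """Generate a fresh random salt and hash the password together with it."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify(password: str, salt: bytes, digest: bytes) -> bool:
    """Re-hash the attempt with the stored salt and compare the results."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)


# Two customers who both chose the same weak password...
salt_a, digest_a = make_record("123456")
salt_b, digest_b = make_record("123456")
# ...end up with different stored hashes, because their salts differ.
```

Because every account gets its own salt, an attacker can't crack one hash and instantly recognize every other account that uses the same password; each one has to be attacked separately, and the slow hash makes each guess expensive.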

All the technology in the world between your keyboard and the webserver isn't going to help if your password is password or 123456. And, statistically, most aren't more complicated than that.

It all starts with a password—a unique password that you use for no other purpose—that is complex enough to be considered strong. The aspects of a strong password are length (the longer the better); a mix of letters (upper and lower case), numbers, and symbols (special keyboard characters); no dictionary words, including words gleaned from a different language; and no association with your personal information.
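As a rough illustration, the mechanical parts of that checklist can be expressed in a short Python function. The function name is hypothetical, and the thresholds (12 characters, three symbols) are taken from the recommendations earlier in this post. It can't detect dictionary words or personal information, so passing this check is necessary but not sufficient.

```python
import string
from typing import List


def password_problems(password: str, min_length: int = 12, min_symbols: int = 3) -> List[str]:
    """Return a list of the mechanical strength rules a password fails."""
    problems = []
    if len(password) < min_length:
        problems.append(f"shorter than {min_length} characters")
    if not any(c.islower() for c in password):
        problems.append("no lower-case letter")
    if not any(c.isupper() for c in password):
        problems.append("no upper-case letter")
    if not any(c.isdigit() for c in password):
        problems.append("no number")
    # string.punctuation covers the ASCII keyboard symbols such as ! # $ % @ { ^
    symbol_count = sum(1 for c in password if c in string.punctuation)
    if symbol_count < min_symbols:
        problems.append(f"fewer than {min_symbols} symbols")
    return problems
```

An empty list means the password cleared every rule checked here. Something like 123456 fails nearly all of them.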

We all probably have accounts at sites where we had to set up security questions to which only we should know the answer. Not only is that so Y2K, but as genealogists, have you ever looked at those available questions and shaken your head, because you know it would take you all of five minutes to discover someone's paternal grandfather's first name, or maternal grandmother's maiden name? Don't use those types of security questions, and if your great-grandfather Earl was born in 1882, don't use 18Earl82 as a password. And simply doing letter/numeral swapping looks cute but stopped being effective a couple of decades ago. Avoid gr8geneal0g1st and MyFam1lyH1st0ry.

I've frightened people with this sort of thing before, but here's an example of an expired password (that will never be employed again) I've used here on the Threlkeld One-Name Study webserver:

b*:*P@^44W{-]+8_sg~@7}^G-g2[:Uf3>J#7*zR_}f~T9]k+]G@5

Simple and easy, right? There may be some remarkable humans who are able to memorize scores of passwords of that ilk, and memorize new ones every couple of months, but I'm not one of them. You probably aren't either.

There are two options, and I use both of them for different purposes. The most basic is to store passwords in a simple document. I have a few different websites I manage, and each has separate and unique credentials at many of the same places (e.g., Google webmaster tools, Norton Safe Web, associated social media accounts), in addition to multiple email addresses for system accounts on the server. My preference is to keep all those related credentials stored in the same place, and separate from the other sets of credentials.

But wait! What about the maxim to never write down your passwords? I maintain discrete Microsoft Word documents and use a robust data encryption technology to prevent prying eyes from using the files. Your needs may differ from mine, but there are two broad categories of these products: one encrypts entire sectors or folders on your hard drive and everything that goes into them, and the others are used for one file at a time. Many of these are inexpensive, but can provide a solid level of protection. You can read about a few choices from this December 2017 review by PC Magazine, and another from last April by TechRadar.pro.

If going standalone, you'll probably want to find a good random password generator. Here's an article from Digital.com that describes 13 of them. Some don't reach the password lengths that I prefer, so I'll just do two generation runs and paste the results together. Another thing I try to avoid, just for clarity, is using characters/numerals that are too similar visually, i.e., 1 I | and 0 o O. If I ever have an issue doing a copy-and-paste, I want to be certain what that character is I'm staring at. In some fonts, these can be downright indistinguishable from one another.
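If you'd rather roll your own than rely on a website, a generator along these lines takes only a few lines of Python. This is a sketch, not a vetted security product: the function name and the default length of 32 are my own choices for the example, and the excluded look-alike characters are the ones listed above. It uses Python's secrets module, which is intended for security-sensitive randomness.

```python
import secrets
import string

# Look-alike characters to leave out: 1 I | and 0 o O
AMBIGUOUS = set("1I|0oO")

ALPHABET = [
    c
    for c in string.ascii_letters + string.digits + string.punctuation
    if c not in AMBIGUOUS
]


def generate_password(length: int = 32) -> str:
    """Build a password by picking each character at random from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(generate_password(40))
```

Each character is drawn independently, so a given result isn't guaranteed to contain every character type. At these lengths it almost always will, but check the result against whatever rules the website imposes and generate another if needed.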

The second category is password management systems. These encrypt, store, and call up your passwords on demand. I've found them less suitable where I require separate credentials for the same website (again, like Google webmaster tools or Buffer.com), and some store the necessary data on the cloud so that you can access the password management systems from multiple devices, including your phone. The online storage represents a possible security threat, but everything's a trade-off. For websites where I use only one set of credentials I find a password manager to be extremely handy.

There are password management systems that come bundled with larger security applications, like Norton's Identity Safe, and some that are available as separate software titles. Here's a review from last May at ConsumersAdvocate.org, another from May by PC Magazine, and one from February by c|net.

Your Email Address

A few paragraphs ago, I said you should use a unique email address at every website where you have a registration. And you thought I was crazy, didn't you? Gmail offers one very good, free email service...but setting up 100 different Gmail accounts?

You guessed it: there's an app for that, too. In fact, a number of them. A quick search for email forwarding services will find you a bunch to choose from.

The one that I use and can recommend is 33mail. Their free tier of service is likely to be all you'll need, but there is a "premium" level for $12 per year that includes extra features and no advertisements, and a "professional" level for $50 per year. It's incredibly easy to use, and gives that important benefit of unique email addresses everywhere you register. Their own explanation of how it works can't be improved upon, so I'll just quote that:

"Sign up and pick a username, for example, 'joesmith'. Now, any email address ending with ...@joesmith.33mail.com will be forwarded to you. The next time you visit a website that asks for your email address, instead of giving them your real email address, just make one up especially for them.

"For example, if the website is tribble.com, you might give them tribble@joesmith.33mail.com. Don't worry, you don't need to do anything else, we'll create an alias automatically the first time they try to send you an email, and we'll forward any emails they send to you. You can even reply anonymously to any mails received through 33mail.

"Later, if tribble.com start to send you emails you don't want, or even if they sell your email address to a spammer, just click on the link that we add to the top of every email we forward, we'll kill their alias, and they won't bother you any more."

It's just that simple. Every email comes into the single email address that you specified at setup (and yes, you can change that later if you want), and each email shows you who sent it, and to which 33mail.com alias it was sent. If you used one of these for your MyHeritage account, it would now be as simple as changing to a new email alias at MyHeritage, then blocking the old alias, rendering it useless to whoever snagged the MyHeritage data.
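If you like to keep things consistent, the one-alias-per-site pattern quoted above is easy to express in code. The helper below is purely hypothetical (33mail doesn't require any tool; you simply make the address up), and its way of picking a label from a domain name is deliberately naive: it takes the part just before the final suffix, which won't give the right answer for every domain.

```python
def alias_for(site: str, username: str) -> str:
    """Build a per-site address of the form <site>@<username>.33mail.com."""
    parts = site.lower().strip().split(".")
    # Naive choice of label: "tribble.com" -> "tribble"
    label = parts[-2] if len(parts) >= 2 else parts[0]
    return f"{label}@{username}.33mail.com"


print(alias_for("tribble.com", "joesmith"))
```

Whatever naming scheme you choose, the benefit is the same: if an alias shows up in a breach or on a spam list, you know exactly which site it came from and can retire that one address without touching any other account.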
 

Do genealogy safely, my friends!

Reference Links