Hadoop Framework is Used to Stitch or Link Records Together at Ancestry.com

The following teaser is from an interesting post, published on February 26, 2013, about Ancestry’s use of the Hadoop framework to help users find more ancestors. Good article…

Genealogy website Ancestry.com, an Internet veteran that traces its own roots back to 1997, has accumulated 11 billion records (and counting) to date, including birth and death certificates, marriage licenses, immigration documents, and millions of family trees. That translates into 4 PB of data — a lot of it unstructured.

If the volume and variety of the Provo, Utah-based company’s data doesn’t meet the threshold for big data, there’s no doubt its newest venture will: Genealogy by autosomal DNA test. The service, which launched in May 2012 and has received mixed reviews online, offers to help subscribers discover their “cultural roots” by comparing their genome sequence against other sequences and genetic information to determine ethnicity and find potential matches.

“We have to have massive storage technology to store 4 PB of content, and then it requires us to deploy massively parallel processing [MPP] solutions to mine the data,” said Scott Sorensen, senior vice president of engineering at Ancestry.com.

Meeting that need required reinventing Ancestry.com’s traditional systems for processing and storing data.

Read the full article at searchcio.techtarget.com.

Author: Leland Meitzler

Leland K. Meitzler founded Heritage Quest in 1985, and has worked as Managing Editor of both Heritage Quest Magazine and The Genealogical Helper. He currently operates Family Roots Publishing Company (www.FamilyRootsPublishing.com), writes daily at GenealogyBlog.com, writes the weekly Genealogy Newsline, conducts the annual Salt Lake Christmas Tour to the Family History Library, and speaks nationally, having given over 2000 lectures since 1983.
