Hadoop Framework is Used to Stitch or Link Records Together at Ancestry.com

The following teaser is from a post published on February 26, 2013, about Ancestry.com’s use of the Hadoop framework to help users find more ancestors. Good article…

Genealogy website Ancestry.com, an Internet veteran that traces its own roots back to 1997, has accumulated 11 billion records (and counting) to date, including birth and death certificates, marriage licenses, immigration documents, and millions of family trees. That translates into 4 PB of data — a lot of it unstructured.

If the volume and variety of the Provo, Utah-based company’s data doesn’t meet the threshold for big data, there’s no doubt its newest venture will: Genealogy by autosomal DNA test. The service, which launched in May 2012 and has received mixed reviews online, offers to help subscribers discover their “cultural roots” by comparing their genome sequence against other sequences and genetic information to determine ethnicity and find potential matches.

“We have to have massive storage technology to store 4 PB of content, and then it requires us to deploy massively parallel processing [MPP] solutions to mine the data,” said Scott Sorensen, senior vice president of engineering at Ancestry.com.

Meeting that need required reinventing Ancestry.com’s traditional systems for processing and storing data.

Read the full article at searchcio.techtarget.com.
