Download Ancestry DNA Match Information
Ancestry Match Downloader is designed to use the new Ancestry DNA API to scan, store, and download your current matches.
The extension limits the download to matches that share 20 cM or more with the tester. This is the "4th Cousin or Closer" filter that Ancestry uses. Ancestry does not include matches with less than 20 cM shared in its in-common matching.
Typically a user will have up to 10,000 matches that fall in this range. The match scan takes approximately 1 minute per 5,000 matches. Scanning In Common With (ICW) matches still takes a long time to complete. If you have 2,500 matches, expect the ICW scan to take around 30 minutes. If you have 4,000 matches, expect it to take 2 hours. If you have 6,000 matches, expect it to take 3 to 4 hours.
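The 20 cM cutoff described above can be pictured as a simple filter. This is only a minimal sketch, not the extension's actual code: the Match class and its field names are hypothetical, and only the 20 cM threshold comes from the description.

```python
from dataclasses import dataclass

# Threshold taken from the description: matches sharing 20 cM or more are kept.
MIN_SHARED_CM = 20.0


@dataclass
class Match:
    # Hypothetical record layout, used only for this illustration.
    name: str
    shared_cm: float


def fourth_cousin_or_closer(matches):
    """Return only the matches that share at least MIN_SHARED_CM centimorgans."""
    return [m for m in matches if m.shared_cm >= MIN_SHARED_CM]


sample = [Match("A", 250.0), Match("B", 20.0), Match("C", 19.9)]
kept = fourth_cousin_or_closer(sample)
print([m.name for m in kept])
```

Here a match at exactly 20 cM is kept and one at 19.9 cM is dropped, which is how "20 cM or more" is read in this sketch.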
Changed the default export file names to include the test taker's name. The Cluster file name also includes the cluster filter settings: Name-ICW-Clusters-MaxCM-MinCM-Percent-MinSize.XML.
Added message box to export routines to show number of matches exported and clusters exported.
Added more error catching to the exports.
Added some error catching to the export routines to catch incomplete scans of match data.
Added a pop-up message when the match scan is complete and when the ICW scan is complete.
Added an alert message if communication with the Ancestry server is interrupted during a scan.
Eliminated the current ICW matches label. When scanning ICW matches, if you have close family matches it may take 15 seconds or so for the scanned ICW matches number to increase. Be patient and trust that it is working, or open the developer tools with Ctrl+Shift+I and click the Network tab to view the network requests in real time.
Added tree information to the cluster output. The column beside the match's name tells you whether a common ancestor has been identified, whether the tree is public or private, and the number of people in the tree if it is linked to the match. If there is no tree, public or private, it simply says no tree.
Cleaned up the user interface.
Added a filter to not include single match clusters. Set minimum cluster size to 2 or more to filter small clusters.
Fixed some bugs that could cause corrupt data to get saved to storage.
Combined the scan match information and tree information to a single button press.
Fixed a bug in the filter for clusters.
Fixed a bug that prevented the last few ICW matches from being saved.
Modified the saving of match data to storage to reduce CPU load.
Modified the ICW matrix export to reduce memory usage. Large numbers of matches were causing out-of-memory crashes.
Changed extension to load into a new tab by default instead of being an iframe. Scans now continue if you change to a new tab or window.
Scan times greatly reduced.
Modified the in-common matches download to eliminate unneeded requests. This should speed up ICW downloads somewhat.
Fixed a bug in storing In Common With matches.
Updated match scanning for changes to the Ancestry API that increased the matches returned per page from 20 to 100. This greatly reduces the time to scan matches and ICW matches. Scans should now complete about 5x faster than before.
I am working on downloading matches' pedigree trees and combining them into a single GEDCOM file. Currently Ancestry beta allows viewing a 6-generation pedigree tree of matches even if you are not subscribed to Ancestry's service.
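The 5x figure in the page-size note above follows from simple arithmetic. The sketch below assumes one request per page and that request count dominates scan time. Both are assumptions for illustration, not measured behavior, and pages_needed is a hypothetical helper.

```python
import math


def pages_needed(total_matches: int, page_size: int) -> int:
    """Number of page requests needed to list every match (assumes one request per page)."""
    return math.ceil(total_matches / page_size)


# Page sizes from the changelog entry: 20 per page before, 100 per page after.
old_requests = pages_needed(5000, 20)
new_requests = pages_needed(5000, 100)
print(old_requests, new_requests, old_requests / new_requests)
```

For 5000 matches this gives 250 requests at the old page size and 50 at the new one, a ratio of 5.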
Instructions for use:
In order for the extension to work you must be signed in to your Ancestry account where you have access to the DNA test.
I suggest opening a new window in Chrome and then browsing to Ancestry.com. Make sure you are logged in and then click the extension icon. The extension will load in a new tab and will stay active even if you switch to other tabs or windows.
1. On first use of the extension, click the "Get Test IDs" button to populate the drop-down with the DNA tests available on the signed-in account. The IDs will be stored locally. You only need to click the "Get Test IDs" button the first time you run the extension, unless you add more testers to your account.
2. Select the tester's name from the drop-down selection box. This will populate the Number of matches field with the number of matches currently available to scan for this tester.
Note: If you have previously scanned this tester it will also populate the number of matches previously scanned, number of matches scanned for tree information, and the number of matches that have been scanned for "In Common With" matches.
3. To begin scanning the matches, click the "Scan matches" button. If you have previously scanned your matches, a prompt will ask you to confirm that you wish to rescan them. "Match information scanned" updates as the matches are scanned and works as a progress indicator. After all matches are scanned, the extension then scans the matches for tree information. "Tree information scanned" updates to show the progress of that scan. Once complete, the Match information and Tree information numbers should equal the "Number of Matches".
Note: Ancestry's new API does not provide a means of getting only new matches, so each time you wish to update the match database you must rescan all of your matches.
Note: The scan does not change the viewed status of a match. It essentially creates a list equivalent to the list of matches seen on Ancestry's site.
Note: If the internet connection is lost or Ancestry's server has problems during the scan, the match information and tree information counts may differ from the number of matches. In that case, rescan the matches by pressing "Scan matches" and selecting OK in the message box.
Note: The program will retry a scan 3 times prior to giving up.
4. The "Matches in common with" scan is a long scan but is required to use the clustering tool. The initial scan time depends greatly on the number of matches you have and how interrelated those matches are.
Note: A tester with 2000 matches, where the average number of matches in common for each match is less than 100, will take about 30 minutes to complete.
Note: A tester with 5000 matches, where the average number of matches in common for each match is 200, will take about 3 hours to complete.
5. If the Matches in common scan fails (for example, the internet connection drops), the last match fully scanned will be saved. You can resume the scan as long as you have not run Scan Matches to update your list of scanned matches. Click the Resume button to continue a failed scan of ICW matches.
Note: Clicking Resume will prompt you to continue a failed scan. Select OK to continue.
6. If you have updated your scanned matches since last running "Scan ICW" select "Update" instead of "Resume". This will continue scanning your in common matches and will update the previously scanned matches that are in common with any new matches.
7. "Export Matches" button will dump all the saved match information for the current tester in CSV format.
Note: ICW match IDs are appended at the end of each row of matches.
Note: The columns labeled 1000 to 1023 represent any custom color tags that the tester has used to tag their matches.
Note: Export may take up to 1 minute to complete if you have a large number of matches.
8. "Export" button creates 2 CSV files. The first contains the match information without the ICW data. The second is a matrix of the tester's matches with the ICW matches marked with an "X".
Note: The second file, the ICW matrix, may take up to 1 minute to output.
9. "Cluster" button runs a clustering routine on the ICW match data. It creates clusters from the matches that fall between "Max CM" and "Min CM". Use the "Min percent of matches in common" setting to tighten up the clusters. The default is 50%; however, 65% or more may give better results. For best results, set Max CM below the value of any 1C1R matches. Raising or lowering Min CM will change the number of clusters. Set the minimum cluster size to 2 or more to filter out small clusters.
Note: The output of the Clusters is an Excel XML Spreadsheet file. This format can be directly opened in Excel 2007 or newer.
Note: The XML will have 1 worksheet for each cluster and will already have the rows/columns formatted to make viewing easier.
Note: For each match, the routine counts the number of times that match shows up as an ICW match to the members of the cluster. That count is the value shown in each column.
Note: The matches are sorted by that count.
Note: I hope to continue working on the algorithm and improve the cluster outputs in future versions.
Note: If a CSV file output is desired I can add this to the output. All of the clusters would have to be on a single sheet though as CSV does not support multiple sheets.
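To make the Cluster settings in step 9 more concrete, here is a minimal greedy sketch of how "Max CM", "Min CM", "Min percent of matches in common", and minimum cluster size could interact. It is not the extension's actual algorithm, which is not described here. The function names, the data layout, and the greedy strategy are all assumptions chosen for illustration.

```python
def cluster_matches(shared_cm, icw, max_cm, min_cm, min_percent=50.0, min_size=2):
    """Greedy illustration only: group matches whose ICW overlap with a cluster
    is at least min_percent of that cluster's current members.

    shared_cm: dict of match ID -> shared cM (hypothetical layout)
    icw:       dict of match ID -> set of match IDs in common with it
    """
    # Only matches between Min CM and Max CM take part in clustering.
    eligible = [m for m, cm in shared_cm.items() if min_cm <= cm <= max_cm]
    # Assumption: process the strongest matches first.
    eligible.sort(key=lambda m: shared_cm[m], reverse=True)

    clusters = []
    for m in eligible:
        for cluster in clusters:
            in_common = sum(1 for other in cluster if other in icw.get(m, set()))
            if 100.0 * in_common / len(cluster) >= min_percent:
                cluster.append(m)
                break
        else:
            # No existing cluster was a good enough fit, so start a new one.
            clusters.append([m])
    # Drop clusters smaller than the minimum cluster size.
    return [c for c in clusters if len(c) >= min_size]


def icw_counts(cluster, icw):
    """Count how often each member appears as an ICW match of the other members,
    sorted from highest to lowest count."""
    counts = {
        m: sum(1 for other in cluster if other != m and m in icw.get(other, set()))
        for m in cluster
    }
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Made-up sample data: F is above Max CM, G has no ICW matches.
shared_cm = {"A": 90, "B": 80, "C": 70, "D": 60, "E": 50, "F": 400, "G": 25}
icw = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"},
    "D": {"E"}, "E": {"D"}, "F": {"A"}, "G": set(),
}

clusters = cluster_matches(shared_cm, icw, max_cm=200, min_cm=20)
print(clusters)
print(icw_counts(clusters[0], icw))
```

With this sample data, F is left out because it is above Max CM, and G ends up alone and is removed by the minimum cluster size of 2. Raising min_percent makes it harder for a match to join an existing cluster, which is one way to read "tighten up the clusters".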