With all the holiday trappings completely stowed for another year, and the shimmer of New Year's Day hopefulness having already lost its luster, I've been ready to get back to normal, whatever that is. So I turned to my favorite traditional virtual volunteer activity, indexing digitized records at FamilySearch.org.
Apparently, things aren't back to "normal" there, either. In fact, they might be in the midst of an exciting development—or, as one person on the "Automated Content Extraction" team put it, "explosion."
It's not a development we couldn't have foreseen. After all, though volunteers like me who index to help make more digitized records computer-searchable may find the process as relaxing as knitting, with every year—no, every record gained—we are falling farther and farther behind what could have been accomplished.
For instance, while genealogy organizations like FamilySearch.org have been publishing digitized records at an exponential rate—the first one billion images by 2014, but two billion achieved by 2018—our human fingers on the computer keyboard have been clunking away at the rate of thousands in comparison. The hurrier we go, the behinder we get.
It's no surprise, then, to see people searching for ways to apply technology to our lagging rate of indexing, that is, of converting images to searchable terms. It was interesting, as I explored who had been thinking about this issue, to find a thread begun on a forum back in 2017. Someone, thoughtful and curious, had posted a query on the forum asking whether anything was being done to explore the application of handwriting recognition software to the indexing process at FamilySearch.org.
Apparently, there were others thinking along those same lines. In 2018, at a family history conference at Brigham Young University, one presentation outlined the task at hand—read the presentation notes from Ben Baker below the slideshare here—focusing specifically on the game-changing usage of machine learning and other technologies in an effort to play indexing catch-up. With that key shift in strategy, indexers may now serve more as quality assurance to check the initial indexing work done much faster by machines through optical character recognition.
Thankfully, this savvy application of technological capabilities to our insatiable craving for more clickable historical documents has been helping organizations like FamilySearch.org narrow the gap between records digitized and records made searchable, despite the very human limitations of those willing indexing volunteers. Sure enough, when I checked for available indexing projects—in my quest to get back to "normal" with my monthly volunteering stint—there weren't really that many opportunities available.
Perhaps, in a race with machines, they will beat me every time. And I'm glad of that, especially for the sake of the many who are still searching for evidence of their family's history. But for my desire to find that relaxing volunteer equivalent of knitting, perhaps I'll need to adjust my expectations a bit. Perhaps now, we'll be crowdsourcing our analysis of how well those machines did at reading hundred-year-old documents. At least now we won't get bleary-eyed over that prospect.
A little scary, but as a person who often reads through hundreds of pages of handwriting searching for a record, I will keep my fingers crossed.
From what it sounds like, Miss Merry, this may be a process of letting the "machines" have first dibs at tackling the information, then taking the resultant transcriptions and crowdsourcing the corrections.
This is somewhat like what has already been done in the indexing process--two people indexing the same document, then an arbitrator evaluating any discrepancies to adjust to the correct version. Only now, the initial sweep may be done by a character-recognition program. At least, that's what it sounds like to me. We'll see how it goes.
Personally, I think it is a lot easier to look at someone else's work and make the judgment call as to whether it was done correctly, rather than have to type in everything, myself, at the start. Hopefully, the system will be worked out soon, as it sounds like the production speed will be greatly improved! Not to mention, my eyes started tearing up, just thinking about how many pages of handwriting you end up reading through!