Many of you who stop by regularly, here at A Family Tapestry, are also bloggers in
your own right. You likely work hard to produce posts that will accurately
represent your research—or share your latest discoveries in whatever topic you
choose to discuss. Though you may not get paid for your efforts, you offer them
in the sincere hope that your work will be of benefit to others.
In one way, as bloggers, we share and share alike.
That, however, is vastly different from the plight of
those who freely offer their work on their own site, only to have it, unbeknownst to them,
lifted and repackaged on another
website: a place likely using that very content to make a profit for someone else.
That little sleight of hand is called content scraping (or, in some cases, blog scraping), and
it has occasionally become a topic of conversation amongst members of the
genea-blogging community, usually in the form of outraged diatribes against such
perpetrators by those personally wronged.
If you think this has never happened to you, dear blogger,
think again. Simply pasting the title, or an excerpt, from one of your
recent blog posts into the Google search box may reveal otherwise. I know it
has done so for me.
Not only that, but there are tools available to such content
scrapers to make their “job” even easier. When I googled the term to find
relevant sites to support today’s post, the first item to come up was not an
example or definition source, but an ad for software to facilitate content
scraping. You see, you are up against not just a well-meaning but misguidedly
zealous fellow researcher, but a worldwide assortment of people who see no problem
in stealing your hard work.
I’ve been blogging for less than four years, but during that
time, I’ve also been an avid genealogy blog reader. And I recall several of my
fellow bloggers reporting that it had happened to them personally.
The instance that stands out most vividly in my mind is when it happened to GeneaBloggers originator Thomas MacEntee. Thomas, who blogs not only out of fascination with genealogy but also because, well, computer geeks can do this stuff blindfolded with one arm tied behind their backs, took this loss as the serious threat to his business that it was, and put his considerable computing knowledge to work fighting back.
If you don’t recall Thomas’ frustration back in 2012 in dealing
with “sploggers” who were stealing his content, you might find it helpful to
check out how he went about combating the problem. He also shared a page of resource links on how to do this, which he posted on Pinterest.
In another instance, blogger Heather Kuhn
Roelker of Leaves for Trees commented,
“I work too hard on writing my blog for it just to be stolen.” Heather found
Thomas’ advice helpful. I’m sure a number of others have, too.
Content scrapers do not only target genealogy bloggers, of
course. So it is no surprise to find blogs which offer generic advice for all
sorts of bloggers in this predicament, such as this one for WordPress bloggers.
In fact, it was a recent announcement about a new anti-scraping plug-in for WP bloggers that
got me re-thinking this very issue.
In the past, my thoughts had ranged everywhere from “Who
would copy my stuff?” to “So what if they copy my stuff; I have enough internal
links to lead readers back to my own site.” I pretty much still hold to
that latter thought. However, since content scraping software likely
can’t tell the difference between the body of a post and a concluding
sentence that essentially says, “Hey, if you found this post anywhere but my blog,
come read it at my own site,” I’d like to start adding a sentence like that to
the bottom of my posts. That way, when the scrapers lift the rest of my
content onto their own site, they’ll also be lifting a sentence that
tells readers where to go to get the rest of the story.
Of course, for you who are reading my posts here on my own
blog site, it will seem redundant. But humor me. It can take me anywhere from
ninety minutes to three or four hours to complete the research and writing for
just one of my posts. I’m with
Heather: I don’t want to do all that work for someone else’s online profit-making
machine, either. However, I don’t want to add several more hours to that
tally just to jump through all the hoops necessary to get those people
to cease and desist.
I just want to spend my time on what I feel is my
best contribution: doing the research and writing for my own posts. I write them for you, my regular readers who stop by here at A Family Tapestry for a moment every day and perhaps share a few words of comment as well. I do this for a community of readers and to further our mutual research interests, and I'd like to keep it that way.
Hmm -- what happened? Where is this going? What prompted this post?
Hmmm...lately, Wendy, you've been thinking of all the questions I should have asked myself before clicking the "publish" button...
Nothing in particular has been happening, at least to my blog here, but a few things trouble me. For instance, if I take a particular section of one of my posts, enclose it in quote marks, and search for the exact phrase on Google, it comes up in the results...on two other sites, neither of which has my permission to re-post my articles.
Second, I can see from my analytics that, for some strange reason, I am deeply beloved by voracious readers in the Ukraine. Well, at least that's the latest country to pump my numbers up well over a hundred more than my daily norm. (Incidentally, if you are reading this in Ukrainian, you have my invitation to come read this post on its original site!)
What prompted this post is that I'd like to give my readers (my real readers) a heads-up about some small statements that I'll be adding to my posts and pages in hopes of deflecting some of this bleed-off. To you who come to this site legitimately, the changes will (hopefully) be barely perceptible...but may seem silly at first, until you know the back story.
Thanks Jacqui for the shout out and highlighting this issue - I've shared with the GeneaBloggers page over at Facebook. If there is a specific incident of scraping or copying that I can help with, please let me know!
Thomas, thanks so much! Your site is a wealth of information, so I'll start there. But thanks for the offer--and for sharing on your Facebook page. Much appreciated!
I can't understand why anyone would do this sort of thing myself - but yeah, there is a slimeball for every spam, hack, virus, and malware out there... I just can't understand why they do what they do.... I wish I could give them a swift kick in the ass with steel toed boots.
Yeah, it's frustrating...but it is what it is, so we have to do whatever it takes. Fortunately, we have options :)
I think it happens more than we realize. I was reading a blog one day and ran across a photo of mine that another blogger claimed was hers...only it had my watermark...
What Iggy said!
Now, that takes the cake, Far Side! Incredible what gall people have! Wonder just what readers thought your watermark meant?!
Excellent post! I've only been blogging for 6 months and I found this to be very interesting and informative. I will be reading Thomas's posts as well.
Dawn, thanks for stopping by! Yes, do look at Thomas' posts on this subject. Actually, he is a wealth of blogging information beyond just the link I've shared.
Best wishes on your blogging endeavors, Dawn. You mentioned "only" blogging for six months, but I bet you've learned a lot in those six months!