Spam Part Two - Attack of the Clones
This post was written by our Chief Spam Fighter and delves into the subject of why spam is such a tricky little beast. It was prompted by a question that a Searchme user posted at getsatisfaction.com.
In Spam Part One, I touched on the adversarial nature of spammers and how they cheat by yelling and shape-shifting. Now let’s discuss the second reason why spam is a particularly tricky problem: the numbers.
First of all, there is just so much darn spam out there. Billions and billions of pages. Dealing with the sheer mass of it is a never-ending, soul-wearying battle.
Second of all, spammers multiply like the devil. Say each person in our stadium represents one good site. Well, the spammers in the crowd have found a way to clone themselves, so what looks like a whole end zone full of people could in fact be one bad spammer. This cloning process is so fast and so cheap that even if we cleared out the end zone at halftime, it would be full again by the third quarter.
Here’s an example to illustrate this point: We once found a spam site that led to 381 billion pages. One domain created a flood of spam pages that was more than ten times the size of Google’s index.
That’s the kind of enemy we’re dealing with.
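To make the cloning idea a little more concrete, here is a minimal sketch of how one might spot a single host that accounts for an outsized share of a crawl. It is an illustration only, not a description of how our own systems work: the function names, the `max_share` threshold, and the sample URLs are all hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def pages_per_host(urls):
    """Count how many crawled URLs belong to each hostname."""
    counts = Counter()
    for url in urls:
        host = urlsplit(url).hostname
        if host:  # skip malformed URLs that have no hostname
            counts[host] += 1
    return counts


def dominant_hosts(urls, max_share=0.5):
    """Return hosts whose share of all counted pages exceeds max_share.

    max_share is a hypothetical threshold chosen for illustration only.
    """
    counts = pages_per_host(urls)
    total = sum(counts.values())
    if total == 0:
        return []
    return [host for host, n in counts.most_common() if n / total > max_share]


# Hypothetical sample: two ordinary sites and one host "cloning" itself.
sample_urls = [
    "http://good-site.example/",
    "http://another-good-site.example/about",
] + [f"http://spam.example/page{i}" for i in range(8)]

flagged = dominant_hosts(sample_urls)
print(flagged)
```

A raw page count like this is only a first signal. Plenty of large, legitimate sites have huge numbers of pages, so a real filter would need far more evidence before calling anything spam.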
Next time I’ll post about how hard it is to distinguish what is and is not spam (even though it’s everywhere).
Tomorrow: Spam Part Three - Babies with the Bathwater

April 15th, 2008 at 4:16 pm
A lot of folks in my category achieve top slots by using various Black Hat techniques, which include multiple domain redirects, hidden text (background colour same as font colour), keyword spamming (in both the meta tags and the content), keyword repetition, etc.
I’m trying to take the high road and not resort to any of these techniques. Thankfully, it’s paid off because Google has placed me in the #1 slot for many of my keyword phrases.
June 3rd, 2008 at 1:02 pm
Is it possible to flag hosting companies that host some of these spam sites? You would then also damage the ranking of good sites hosted by the same provider. This would upset their client base and put some pressure on hosting companies to check their hosting for spam sites, rather than leaving everything up to the search engines to figure out.