How Google’s Disavow Tool Works with the Penguin Algorithm


More than a year after Google announced its Penguin anti-spam algorithm (in April 2012), many search marketing bloggers continue to misidentify manual spam actions as Penguin downgrades.

If your Website has been affected by the Penguin algorithm, YOU DID NOT GET A MESSAGE FROM GOOGLE.

If you received a message about unnatural links or spam from Google through your Google Webmaster Tools account, you have been identified through a manual spam action, NOT the Penguin algorithm.

Here is a recent example of a manual penalty action that several marketing bloggers have misreported as a Penguin “penalty”: Google Notifies Sprint Of Spam Penalty; Seeks Advice In Google Help Forums. Despite the fact that the Search Engine Land article never mentions the Penguin algorithm, I have come across several marketing blogs that link specifically to this article as an example of a Penguin “penalty”.

How Google’s Panda and Penguin Algorithms Differ From Each Other

Another common issue with bloggers writing about Penguin is that they appear to confuse its priorities with the Panda algorithm. Google’s Panda algorithm, released in late February 2011, was designed to evaluate the quality of Websites according to Google’s own preferences and automatically “downgrade” those sites on a page-by-page basis. We think this downgrade is applied in the form of a page-level score that is combined with all the other page-level scores (such as PageRank) that Google computes for all Web documents.

The Penguin algorithm may very well apply a similar score to each page on a Website but it’s looking at entirely different criteria. Whereas Panda appears to be targeting poor page organization and presentation, Penguin is looking for spammy content: keyword stuffed text and manipulative links.

Think of the Penguin algorithm as the result of years of spam research by the Google team, all wrapped up in a “grab the low-hanging fruit” package that frees up spam team time to dig deep for other types of spam. Penguin just takes a lot of cheap, easily created Web spam off the table, but like Panda it offers a dim hope of self-driven recovery for Webmasters. In theory all you have to do is clean up your Website or your backlinks and Penguin will eventually re-evaluate your site.

But like the Panda algorithm, Penguin is so complex (and, I suspect, unstable) that it has to be processed “offline” (outside of the live SERP updates). That means someone has to review the data and how the SERPs will be adjusted by a Penguin iteration. Hence, it takes weeks or months for Google to release a new Penguin iteration into the search index — and meanwhile people who have done all they can think of to fix their Penguin downgrades are waiting, waiting, waiting….

Penguin Looks at Both ON-PAGE and OFF-PAGE Factors

In their announcement for Penguin, Google provided two (extreme) examples of Websites that would be targeted by the algorithm. The first example dealt with on-page keyword stuffing. The article cautioned readers not to assume all sites affected by the algorithm for keyword stuffing would look like this; the algorithm scans for less obvious keyword stuffing.

Example 1 of a Penguin-affected Website: Keyword Stuffing

Keyword stuffing may not in fact be intentional. Your Web design may include elements that the search engine’s parsing and page layout algorithms don’t fully understand. For example, if you include long lists of city names, state names, country names, and other types of special data in on-page controls or widgets, it’s possible that all the search engine sees is a long string of names with no context.

Some Web designers use CSS to hide large blocks of text that are jammed together in the source code, creating a mass of run-on sentences and sentence fragments. Search engine algorithms may be able to understand that; then again, they may not. In August 2010 Googler Matt Cutts released a Webmaster video that cautioned people about including too many tag links in “tag clouds”. He specifically pointed out that sometimes large tag clouds can look like spammy links to search engines.


A Little More About “Tag Clouds” and Huge Blocks of Links

Why would a tag cloud look spammy to a search engine? Here is an example I grabbed off a Website (that, so far as I know, has NOT been penalized or downgraded by Google) of a tag/label cloud that includes a LOT of links:

Penguin Example 3: A very large tag cloud on a Website.

This particular screen shot shows just a small portion of the whole tag cloud. This cloud could be completely innocuous (in fact, I am sure there is nothing deceptive in the live site’s tag cloud) but it could also be used to hide manipulative links pointing to other Websites.

In fact, an old spam tactic is to create what are called “Hallway Pages” or “Crawl Pages” — basically just a page loaded with links that point to “Doorway Pages” on other Websites. Now, good, old-fashioned guideline-compliant SEO can utilize crawl pages, too — but we make them useful to real people. In other words, if a random visitor to a Website lands on a page that is loaded with links but can still make sense of the page and use it to find something interesting or helpful, that’s probably NOT a spammy page (unless you have dozens or hundreds of such pages on the site, in which case you’re either running a Web directory or you need to learn a whole different kind of SEO).

Tag clouds have their own inherent issues (as Matt points out in the video above) but they can be wholly perverted into just linking out to other Websites with keyword-rich anchor text. That’s manipulative linking.

A Little More About Manipulative Links

Through the years I have pointed out many times that if an SEO places a link anywhere on a Website it is more than likely placed with search engine algorithms in mind — and therefore it is “manipulative”.

But search engineers understand the difference between working with Website navigation and outbound links and just being gamed. When your link placements are really ignoring people and just intended for search engines, you’re violating the guidelines. But even if a link placement is useful for people it might STILL violate guidelines because someone paid for the link, or because it’s part of a complex linking scheme.

We can easily identify the egregious examples of manipulative linking but the search engines have drawn the line somewhere deep in their algorithms that we cannot see. Hence, many people who believe their link placements should be acceptable are, in fact, violating search engine guidelines.

If you embed a link on a page expecting or hoping for some SEO benefit you are more likely to be violating a search engine’s guidelines than not. That doesn’t mean you are but it does mean that in a random set of such link placements a lot of those links will be deemed suspicious by search algorithms.

Why? Because you put the search engine marketing benefit ahead of the benefit your visitors will derive from the link. Of course, some people have argued that they only sell (or buy) “quality links” but the links were placed with specific anchor text and omitted use of the rel="nofollow" link attribute so that PageRank and anchor text would flow to the destinations.

When every requested or self-placed link is manipulative, search engines have to draw an arbitrary line somewhere and that is what they are doing. You just don’t happen to like where the line is drawn.
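To see which links on a page would pass PageRank and anchor text, you can list the anchors that lack the rel="nofollow" attribute mentioned above. The following is a minimal Python sketch using only the standard library. The class name FollowedLinkFinder and the helper followed_links are hypothetical names chosen for illustration, and this only reads the markup you give it; it says nothing about how a search engine actually scores a link.

```python
from html.parser import HTMLParser


class FollowedLinkFinder(HTMLParser):
    """Collects href values of <a> tags that do NOT carry rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.followed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        # rel may hold several space-separated tokens, in any letter case
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if href and "nofollow" not in rel_tokens:
            self.followed.append(href)


def followed_links(html):
    """Return the hrefs of all links in `html` that lack rel="nofollow"."""
    finder = FollowedLinkFinder()
    finder.feed(html)
    return finder.followed
```

Running followed_links over the HTML of a sidebar or footer gives you a quick list of the link placements that a reviewer, human or algorithmic, would most likely want explained.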

Some Manipulative Links Are Easy to Spot, But Not All

The classic model for “paid links” is a small group of say 3-10 links embedded in a page margin with random-seeming anchors like “buy cheap tickets here”, “best online poker sites”, “free airport parking”, “easy payday loans”, etc. Search engines assume those links would not have been purchased if the search engines didn’t exist (although in reality such links were purchased long before Inktomi and Google started laying down rules for acceptable linking).

In another example from their Penguin announcement Google showed us a screen capture of a page that was using a “spun” article. Article spinning goes back about 100 years but for the first 80-90 years the spinning was all done manually, with a writer carefully rewriting the article or press release each time it was reused. Some freelance writers claimed they resold an article dozens of times to small newspapers and magazines, each time “spinning” the article a little to give it a custom/bespoke angle for its intended market.

Penguin Example 2: A spun article with manipulative links.

When Internet marketers started spinning articles for links they just mass-produced thousands of slightly varied texts based on a single “source” document. Produced by word and phrase substitution with little to no editing, these articles are usually so poorly written that it’s obvious no human hand touched them. Some article spinning practitioners made an effort to edit their output but they could not keep up with the automated junk spinners.

Article spinning software has been adapted to create fake blog comments, fake social media posts (Tweets, Facebook updates, etc.), and even fake social media profiles. Their laughably bad grammar and spelling are easy to spot but these link drop tools produce a lot of spam that gets past Webmaster filters and editing, either on abandoned forums and blogs or on “liberally-managed” forums and blogs. In fact, some spammers have set up their own blogs and forums for spamming (through free hosting services that make this simple and cheap to do).

User-generated content is especially risky because it is a favored tactic of link spammers. Unlike spun articles, user-generated content is hard to target. The Google Blogacalypse of March 2012 saw the delisting/de-indexing of tens of thousands of spammy blogs that were used by networks to host manipulative links. These subscription link networks were quite popular for a few years but one of the reasons why Google was able to identify them was that the vast majority of articles were spun mush.

A lot of decent domain names were burned with totally crap content and no one cared as long as they got their links. But I digress.

In October 2012 Google Released the Disavow Tool

You hear so many complaints about this tool today that you would think no one in the SEO community wanted it. In fact, many of us begged Google for such a tool for years. I had a conversation with Matt Cutts either in late 2011 or early 2012 in which I asked once again for the ability to disavow links. At the time Matt indicated he wasn’t sure what Google would be able to do with such data, but it was already on the radar. They had received so many requests for the ability to say, “Google — please ignore this link” that they put the project together and worked on it for months.

Today we hear a lot of heart-breaking stories about how people don’t see any results from using the Disavow Tools for Bing and Google. But let’s face it: no one tool can solve everyone’s problem. Some people have, in fact, reported success in using Google’s Disavow Tool. I have read several such case studies. So while there may be room for doubt about whether the tool worked as desired, the end result was that after going through a thorough process that included disavowing links some people did see improvement in their search referral traffic.

So why doesn’t everyone see a full recovery?

Reasons Why the Disavow Tool Might Not Help You As Expected

To me one of the most obvious explanations for failure to recover traffic seems to be the one least often cited: if you got all that traffic from using spammy links, then disavowing the links is not going to restore the PageRank and Anchor Text they had once conferred upon your site.

If your backlink profile consists primarily of bad links and you are either manually penalized or algorithmically downgraded, what that means is that no matter how many more links you obtain you’re not going to improve your search referral traffic. Using the Disavow Tool may get the manual penalty lifted, in which case you can start earning credit from natural, editorially-given links again. But all that delicious rankings success you once enjoyed is dead, gone, and buried.

And Matt Cutts did recently confirm that the Disavow Tool can help you with a Penguin downgrade. So how does that work, if no one is manually reviewing your site and its backlinks for Penguin?

There Are Multiple Lag Times Involved in Disavowing Links

First and foremost, you need to understand the directions for the Disavow Tool. Some people have either submitted badly formed files or they have disavowed their own Websites. The search engines are trying to process these files as efficiently as possible but part of the SEO’s responsibility now is to ensure that Disavow Files are properly formatted and only include appropriate domain names (not individual links).
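The kind of sanity check described above can be sketched in a few lines of Python. This assumes the plain-text format Google documents for the tool: one entry per line, either a full URL or a whole domain prefixed with domain:, with lines starting with # treated as comments. The function name check_disavow_lines, the own_domain parameter, and the .example hostnames are hypothetical, and the checks are only illustrative; they are not what the search engines run on their side.

```python
from urllib.parse import urlparse


def check_disavow_lines(lines, own_domain):
    """Return (line_number, problem) tuples for suspicious disavow entries."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are fine
        if line.startswith("domain:"):
            host = line[len("domain:"):].strip().lower()
            # a domain entry should be a bare hostname, not a URL or phrase
            if not host or "/" in host or " " in host or "." not in host:
                problems.append((number, "malformed domain entry"))
                continue
        else:
            parsed = urlparse(line)
            if parsed.scheme not in ("http", "https") or not parsed.hostname:
                problems.append((number, "not a domain: entry or a full URL"))
                continue
            host = parsed.hostname.lower()
        # catch the classic mistake of disavowing your own Website
        if host == own_domain or host.endswith("." + own_domain):
            problems.append((number, "disavows your own site"))
    return problems


sample = [
    "# links from a blog network",
    "domain:spammy-links.example",
    "http://bad.example/page.html",
    "domain:mysite.example",
    "spammy-links.example",
]
for number, problem in check_disavow_lines(sample, "mysite.example"):
    print(number, problem)
```

The sketch accepts individual URL entries because the file format allows them. If you follow the advice above and disavow whole domains only, you could flag every non-domain: line instead.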

The Disavow Files, we are told, are manually reviewed. What does that mean? Other than that it’s taking up one or more engineers’ time, I don’t know. But it surely means (among other things) that you won’t see any immediate benefit from Disavowing Links.

Secondly, if your Website has been manually penalized I think you’re more likely to see some sort of improvement sooner than if your site was downgraded by Penguin. Why? Because Google will still need to integrate the link Disavows into the Penguin data, and if that work is still weeks away in the future when your Disavowed Links are entered into the database (or whatever) then you won’t realize any immediate benefit.

So you have to allow for the time required to manually review the Disavow File; and you have to allow for the time required to integrate the Disavowed File into your backlink profile; and then you have to allow for the time required to process and release the next iteration of Penguin.

In this situation it’s almost preferable to be manually penalized than to be downgraded by Penguin — except that with a manual penalty your Website gets a “rap sheet”. The search engines keep track of who has been naughty and nice and if they see a pattern of abuse-and-apologize they may be a little more harsh with future penalties. To date I have seen no indication from Google that Websites affected by Penguin are being tracked.

So How Long Do You Have to Wait?

The longest reported wait time I have seen so far (for a Penguin recovery) was four-to-six months. But there are so many misconceptions about Penguin out there some people might have recovered sooner and never known they were downgraded by Penguin. A lot of people have submitted Disavow Files, “cleaned up their backlink profiles”, and redesigned their Websites since April 2012. You’re just not going to see a majority of these issues reported on your favorite SEO blogs and forums, so we’re all in the dark about how successful Penguin cleanup can be.

Truth be told, though, if you’re only looking at backlinks for a Penguin downgrade you may be overlooking something important on your Website. Just because you think you are not keyword stuffing doesn’t mean some algorithm somewhere didn’t tally up one too many occurrences of some word in your page titles and domain names.
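No one outside Google knows what Penguin actually tallies, so any self-audit is guesswork. Still, a rough count of how often a phrase appears on a page is easy to produce. The Python sketch below is only that: a hypothetical phrase_stats helper that counts occurrences of a phrase and reports what share of the page's words it accounts for. The tokenizing rule (lowercase letters, digits, and apostrophes) is an assumption, and no threshold is implied.

```python
import re


def phrase_stats(text, phrase):
    """Return (occurrences, share_of_words) for `phrase` within `text`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    target = phrase.lower().split()
    n = len(target)
    if n == 0 or not words:
        return 0, 0.0
    # slide a window of n words across the page and compare to the phrase
    count = sum(
        1 for i in range(len(words) - n + 1) if words[i:i + n] == target
    )
    share = count * n / len(words)
    return count, share
```

Run it separately over page titles, headings, and body text; a phrase that looks harmless in a long body can dominate a short title.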

As a programmer with more than 30 years’ experience in the field (including many years of processing large amounts of text data using fuzzy logic and other techniques) I can assure you that if there is a way to create a crazy, unexpected pattern in a batch of text someone out there has done so without realizing it — and you just might be one of those people.

Programmers cannot anticipate every possible exception to the rules. The exceptions have to be painstakingly documented and compiled and integrated into the algorithms, and these exceptions are accrued over long periods of time. I have no doubt that the search engines rewrite many of their document classifiers several times each. Code becomes confused and inefficient as it is changed. Eventually it’s better to just rewrite a program that has outgrown its initial specifications so that it can handle current and future needs much better.

I wish I could give you a formula to calculate how long you have to wait for the Disavow process to work, but I doubt even Matt Cutts could give you a reasonable estimate. The search engines are still learning from the data we supply them, and they no doubt are trying to be careful to weed out false-positive data. Just because you Disavow a linking Website doesn’t mean it’s really misbehaving. Remember, as soon as these tools were made available people began speculating that “bad Disavows” could be used to get sites penalized or downgraded. Regardless of whether the search engineers thought about that possibility in advance, they know about it now.

And that just may help explain why it is taking so long.

UPDATE: After this article was published, Search Engine Roundtable spotted a discussion in the Google support forums where it was pointed out that “only the most recent copy of the Disavow File (list) is used” to block PageRank and Anchor Text from passing to your site. So if at first you don’t succeed, you can try, try again. Presumably, Google resets the block list when it detects a change in submitted URLs.



7 comments for “How Google’s Disavow Tool Works with the Penguin Algorithm”

  1. Jon Wade
    July 3, 2013 at 4:35 am

    Interesting stuff. Do we have any idea of the sort of possible on-site factors that may be at play? I was affected by Penguin2. I have disavowed a bunch of dodgy looking links, including some old directory links with keywords for anchors, no change yet.

    Wondering about on page factors. One of my main sites has a 2 word keyword phrase mentioned 137 times. It is a big page though, with over 30,000 words, so for old skoolers that is about 0.9%. Wondering if the sheer number of mentions looks odd to Google now. Most are in the comments where people have searched for X and then asked for more info on X, so it does not read odd at all.

    Other than that, nothing else odd that I can see on the site, so maybe I should be looking more at links.

    • July 3, 2013 at 5:54 pm

      Jon, I would scan the page for readability issues. Beyond that, you might look into Russ Jones’ Penguin analysis (Cf. Open Penguin Data). You should download the data and just look through it, visit some sites, etc. The correlation data is not very helpful except to illustrate some of the things you can look at/for in a statistical analysis.

  2. Martin
    July 3, 2013 at 1:15 pm

    A very illustrative and quite coherent article, but the hard part is knowing which links are hurting you. There are some tools that offer to report toxic links, and they are not cheap at all, but I do not believe an automated tool can tell you which links those are.
    As with the previous comment from Jon Wade, I have an affected Spanish-language motoring site. Since May 2012 I have made changes to the site. I have disavowed several links that I regard as suspicious, among them links I had in blog comments with anchor text.
    I do not have a manual penalty, as I was informed in Webmaster Tools.
    Do we know of specific cases that have recovered?
    Matt Cutts already said when Penguin was launched that in many cases perhaps the only solution was to start a new site.

    • July 3, 2013 at 6:12 pm

      Martin,

      I believe that anyone affected by Penguin — if they used popular SEO link tools and link tactics — should know which links to disavow. If you “placed links” through “guest blogging”, “blog comments”, “SEO friendly directory”, “infographics”, “blog networks”, or “article marketing” then those are probably bad links. If you hired someone to “build links” for you, then you know which links to disavow.

      I would only look at so-called “dofollow” links. Any tool to detect bad links will make questionable guesses.

      I did not make a list of Penguin recovery case studies because I cannot be sure they are reliable. People may believe they were fixing Penguin downgrades but may have fixed something else. Most of the “claimed recoveries” I have read about did say they started with new sites.

      People should look carefully for sites that were similar to their own, in content and backlinks. If any of those recovered sites are similar enough, that is the best guide for you.

  3. Arvid
    July 13, 2013 at 8:06 am

    Michael, it’s a great analysis, but the thing that really interests me is in the 2nd paragraph. You say that if it’s just the algo, then there’s no message. Can I go a little bit off-topic?

    Do you remember last July when webmasters were sent two different unnatural links warning messages?

    Let’s say Site A and Site B both received the severe message on 19th July but then the Site B also received the more lenient (updated) message on 23rd July.

    Would you say that both sites were hit by a manual penalty, or was it just the Site A that was hit by the manual action?

    • July 13, 2013 at 9:19 am

      The article assumes that any message you receive from Google is intentional, not a mistake.

      If I received a message that was sent by accident and I saw no changes in Google referral traffic I would not be concerned.

      If I did see changes in Google referral traffic but they said the message was a mistake I would err on the side of caution and assume that something is wrong but that the message is coincidental. So I would review my site looking for things an algorithm might address without the assumption of a manual action.

Comments are closed.