Selected answers from the Dumb SEO Questions G+ community.
Jesse McDonald: If you have a page that’s been indexed and you need it removed from the index, the follow directive allows the page to keep being crawled so the noindex directive can be seen by the crawlbot. At least, that’s how it’s been explained to me.
Ryan Jones: Basically you don’t want that page indexed, but it may link to other pages that you do. This allows Google to discover those other pages.
Arsen Rabinovich: Have you ever had to use it in this capacity? The only case I can think of is if there are a bunch of pages that are not linked to from anywhere on the site (or the sitemap) and the only way for Google to get to those pages is through this page that you don't want to be indexed. Which is kind of silly.
Ammon Johns: >> "This allows google to discover those other pages."

I know you know it Ryan, but be careful with the absolutes. It makes it *possible*, and it *may* allow Google to discover the links. However, there have been instances where a search engine treated the mere presence of a NOINDEX to trump all else, including adding the links to a link index. Basically, if the parser finds a NOINDEX it ceases parsing. That's pretty fair.

AltaVista, back in the day, treated ANY robots meta as a NOINDEX for a while, and always treated the NOINDEX as trumping and superseding any following of links. After all, technically, they have been given a clear instruction NOT to index the contents of the document, including its links, under their own agreed process (Robots Exclusion).

For certain, links on a NOINDEX page, even if set to follow, are ignored completely when it comes to the calculation of PageRank, in all the original papers and patents.
Dan Thies: My favorite application for meta noindex and nofollow is to always put them in the code unless the request comes from a verified spider. Keeps lots of proxies from getting indexed.
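Dan's technique relies on verifying that a requester really is a search engine spider before serving the normal page. A minimal Python sketch of that idea, using the reverse-then-forward DNS check Google documents for Googlebot; the function names are illustrative, and the resolvers are injectable so the logic can be tested offline:

```python
import socket

def is_verified_googlebot(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname):
    """Reverse-then-forward DNS check: the claimed crawler IP must reverse
    to a googlebot.com/google.com hostname that resolves back to the same IP."""
    try:
        host = reverse(ip)[0]
    except OSError:
        return False
    if not host.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        return forward(host) == ip
    except OSError:
        return False

def robots_meta_for(request_ip, verifier=is_verified_googlebot):
    """Serve a restrictive robots meta to everyone except verified spiders,
    so proxies and scrapers re-serving the page get noindexed."""
    if verifier(request_ip):
        return ""  # the real crawler sees the page without restrictions
    return '<meta name="robots" content="noindex, nofollow">'
```

The same pattern works for any crawler that publishes a verification hostname; only the suffix list changes.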
Ammon Johns: The NOINDEX meta is utterly and completely unmissable and *always* assigned to the document. In that regard it is more reliable for preventing indexing than robots.txt, which requires a separate GET and a separate parsing process.

The advantage of the robots.txt is to, hopefully, reduce the number of GETs on files you don't want indexed in the first place.

So, to an extent, think of the meta as the most reliable method for preventing indexation, while the robots.txt is more about controlling crawl budget and resource allocation.
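The split Ammon describes - robots.txt as a separately fetched crawl control, the robots meta as an in-document indexing control - can be illustrated with Python's standard library (the file and page contents here are invented):

```python
from urllib import robotparser
from html.parser import HTMLParser

# robots.txt controls *crawling*: a separate file, separately fetched and parsed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""
rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())
print(rp.can_fetch("*", "https://example.com/private/page.html"))  # False

# The robots meta controls *indexing*: it lives inside the page itself,
# so the bot must be allowed to crawl the page in order to ever see it.
class RobotsMetaParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name", "").lower() == "robots":
            self.directives += [d.strip().lower() for d in a.get("content", "").split(",")]

page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
p = RobotsMetaParser()
p.feed(page)
print("noindex" in p.directives)  # True: crawlable, but not to be indexed
```

This is also why the two mechanisms fail differently: a missed robots.txt fetch loses the crawl rules entirely, while the meta travels with every copy of the document.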
Arsen Rabinovich: Ammon, makes sense! My question is about a meta robots directive to noindex/follow, where you are telling G to crawl through a page but to not pick up any information, and how/when is it used.
Ammon Johns: Arsen Rabinovich in my above reply to Ryan Jones you'll find those specifics, I think.
Arsen Rabinovich: I read that, but still can't figure out in which scenario it is absolutely necessary to use this meta robots directive.
Ammon Johns: Arsen Rabinovich in the scenario of wishful thinking, of course. :) Remember the old REVISIT-AFTER meta tags? They were there to act as a throttle, originally, but lots of people used wishful thinking to try to use them as an accelerator. The exact same thinking applies, on both sides.

The Robots Exclusions are a tool to limit and restrict bots. Never, ever, a tool to control them. FOLLOW just tells the bot that it's okay IF IT WANTS TO follow the links. It is never, ever, treated as an instruction.

Even if the bot followed the links, unless it finds other, indexable links as well, the resulting URL is treated as an orphan page, with zero INDEXED links, and completely and utterly removed from PageRank calculations, just as dangling links are removed.
Rishi Lakhani: What's your thought on this, Ammon Johns? http://refugeeks.com/search-results-pages-index-robots.../
Ammon Johns: Rishi Lakhani I think, between the 8 or so replies already made, I've been pretty clear. :) Like you, I've seen instances where a robots.txt has failed - sometimes due to a robot getting an error when trying to grab that file. So like you, when I really want to keep a page out of results, I'm far more likely to use the robots meta tag set to NOINDEX.

The only place we differ is that I have NEVER seen an instance where the follow or nofollow made any difference to indexing. There's absolutely no reason or logic for it to do so.
Rishi Lakhani: Alan Bleiweiss Ammon Johns what about http://refugeeks.com/keeping-page-out-of-googles-index/... #
Alan Bleiweiss: Rishi whatever that page is about I don't know. Multiple attempts have taken more than 20 seconds without rendering starting, so I left.
Alan Bleiweiss: Rishi okay page finally loads. So meta robots noindex is, itself, not a problem, and is valid for noindex needs. My issue is with the magic combination of noindex, follow. That hilarious combination.
Rishi Lakhani: Alan Bleiweiss I think you aren't seeing my argument here. Noindex by ITSELF doesn't guarantee non-indexation. The noindex, follow combo DOES.
Rishi Lakhani: Alan Bleiweiss With noindex alone, Google can still index a URL, just without the snippet.
Rishi Lakhani: Alan Bleiweiss https://www.google.co.uk/search?q=refugeeks+best+seo+site
Jim Munro: Rishi Lakhani I'm not sure that's accurate, mate. "follow" is not amongst the list of directives. "follow" is what happens if you don't add a comma and specify a second directive, "nofollow". https://developers.google.com/.../refe.../robots_meta_tag...
Ammon Johns: Rishi Lakhani regarding the https://www.google.co.uk/search... thing, obviously there is no robots meta involved as the page doesn't exist.

Are you conflating different parts of the algorithms here and getting a bit mixed up?
Rishi Lakhani: Ammon Johns there are cross examples, yes. I think I need to run a couple of practical experiments to demonstrate what I mean.
Rishi Lakhani: Jim Munro OK, I see where my error is: we are looking at the X-Robots-Tag, not the page-level meta?
Ammon Johns: Incidentally, a 404 page is where I most recommend putting a NOINDEX meta tag, precisely because it prevents that sort of mishap. Otherwise, if enough links to the URL exist, and it is a query without millions of results, Google may well rely on its link database and trust that lots of webmasters knew something the spider couldn't reach - something they hadn't specifically been told never to index.
Jim Munro: Rishi Lakhani I'm still not sure if you are teasing me, mate, but the X-Robots-Tag and the meta robots tag work the same. They differ only in the delivery.
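Jim's point - same semantics, different delivery - can be sketched in Python. The helper names are mine, and the meta extraction is a deliberately naive regex rather than a real HTML parser:

```python
import re

def directives_from_header(headers):
    """Robots directives delivered as X-Robots-Tag HTTP response headers."""
    out = set()
    for name, value in headers:
        if name.lower() == "x-robots-tag":
            out |= {d.strip().lower() for d in value.split(",")}
    return out

def directives_from_meta(html):
    """The same directives delivered as a <meta name="robots"> tag
    (a naive regex sketch, not a production HTML parser)."""
    out = set()
    pattern = r'<meta[^>]+name=["\']robots["\'][^>]+content=["\']([^"\']+)["\']'
    for m in re.finditer(pattern, html, re.I):
        out |= {d.strip().lower() for d in m.group(1).split(",")}
    return out

# Same outcome from either delivery mechanism:
hdrs = [("Content-Type", "text/html"), ("X-Robots-Tag", "noindex, nofollow")]
page = '<head><meta name="robots" content="noindex, nofollow"></head>'
print(directives_from_header(hdrs) == directives_from_meta(page))  # True
```

The header form is the only option for non-HTML resources like PDFs and images, which is the usual reason to reach for it.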
Rishi Lakhani: Jim Munro I think I need to rework the logic in my instance. Using page-level robots meta: when you "noindex" a page, the SE will not index it; however, if the URL is found via some other means, then it MAY index just the URL AND surface it in results. When you "noindex, follow", search engines WON'T index the URL regardless of whether they found it elsewhere. Therefore, in situations where a page's discoverability is out of your hands, e.g. it is linked to by an external site, using a "noindex, follow" directive works better at keeping even the URL out of the index. That's what I am trying to say.
Ammon Johns: Rishi Lakhani I've seen a fresh URL surfaced where Google hadn't crawled the page, so hadn't yet discovered its meta content, and knew of the URL only from links - but it was a very specific set of circumstances, and, as stated, they hadn't been able to grab the page yet to even see the NOINDEX meta content.

I haven't seen a page Google have crawled and indexed despite having found a robots meta set to NOINDEX. Ever. Got an example?
Rishi Lakhani: Ammon Johns I am going to have to run the experiments again. Do you want to collaborate on building the test rules, so that you can adjudicate?
Arsen Rabinovich: TBH, I was kind of waiting for Alan to chime in on this. But I already know what his thoughts on this are. <3
Ammon Johns: Sheesh, if I'd known you only wanted the newbie response ... ;)
Arsen Rabinovich: I want to hear a good case for the use of it. Why? Because I've been yelling at people, telling them that it is utterly useless, and just wanted to make sure I'm not missing anything. So far it looks like I will continue yelling at people who use it.
Ammon Johns: Arsen Rabinovich It's still useful as the most reliable way to prevent the indexing of the page while *allowing the possibility* of the discovery of links. After all, technically, couldn't the presence of NOINDEX and NOFOLLOW together be interpreted to suggest that, at least potentially, the author/publisher didn't want any of the links indexed either?

It's not harmful. Unless someone thinks it WILL make the links count, which is highly unlikely.
Arsen Rabinovich: Do you think noindex/follow may become harmful by eating up crawl budget allocation? I think Mueller said something at one point about us not worrying about crawl budgets, but I think he was talking about 301s.
Ammon Johns: Google would rather retain full control of their own indexing and crawl, outside of what is explicitly forbidden. But then, Google don't mind if the 10% of your site that is algorithmically designated as 'low priority' isn't indexed, while you might.

You wanted a specific use case where the tag might be useful and not ridiculous. Some years back, a particular content site had an HTML page sitemap (well, not really HTML as it was dynamic, but same difference). But that sitemap wasn't a brilliant landing page for giving a good experience to the user - more a tool of last resort for navigation.

Some of the searches that led to the site were for specific article titles, and where those were in the deeper, less-used parts of the extensive site, sometimes the superior linkage to the sitemap (linked to from every page, including the homepage) made it come up above the actual article.

Eventually, they put the NOINDEX meta tag into the page, as they still had many other methods of page discovery, and it really wasn't a great 'first impression' landing page - too confusing. I think that was a sensible use case. They set it to NOINDEX, FOLLOW just to send a clear directive of their intent: only deindex THIS page.
Jim Munro: Sorry, mate, but there are good reasons for using noindex. For example, you can use it to exclude auto-generated fluff like tag pages. BTW, the default is index, follow, so leaving off the follow directive will have the same outcome as adding it. Better to leave it off altogether. Also, if you noindex a page with a meta tag or X-Robots-Tag, make sure that you do not also block that page in robots.txt, so that Googlebot can freely discover the noindex. Googlebot's going to crawl the page on its own timetable no matter what you do, so its only use is in hiding low-quality content.
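Jim's caveat - a noindex the crawler is forbidden from ever fetching is self-defeating - can be checked mechanically. A stdlib-only sketch; the function name, sample robots.txt, and URLs are all invented for illustration:

```python
from urllib import robotparser

def audit_noindex(robots_txt, url, page_has_noindex, agent="Googlebot"):
    """Flag the self-defeating combination: a noindexed page that
    robots.txt prevents the crawler from fetching in the first place."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    crawlable = rp.can_fetch(agent, url)
    if page_has_noindex and not crawlable:
        return "conflict: robots.txt blocks the crawl, so the noindex is never seen"
    if page_has_noindex:
        return "ok: page is crawlable, so the noindex will be discovered"
    return "ok: no noindex on the page"

robots = "User-agent: *\nDisallow: /tags/\n"
print(audit_noindex(robots, "https://example.com/tags/widgets", True))
print(audit_noindex(robots, "https://example.com/articles/widgets", True))
```

Running a check like this over a site's noindexed URLs is a quick way to catch pages that will linger in the index as bare URLs despite the directive.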
Michael Martinez: Arsen Rabinovich The Web marketing community's emphasis on "crawl budget" is misplaced, due (in my opinion) to confusing the search engine's "crawl budget" with what the Website owner can do to manage crawl (two completely different things). The sea...
Ammon Johns: Michael Martinez yeah, the language gets convoluted fast. Essentially, only Google controls how much budget is given, but the webmaster has some control over how much of it may be 'wasted', or rather spent on low-priority over high-priority tasks. You can't make Google want to index more of your site than it does - but you can do quite a lot to control which 80% of it it does. Does that make it clearer?