Dumb SEO Questions

(This entry was posted by Emma Schwartz in the Dumb SEO Questions community on Facebook, 05/21/2020.)

Googlebot does not yet support HTTP/2

I'm getting an error in Search Console saying that a number of my posts and pages are displaying a "noindex" tag. These pages are not set to noindex. Any idea what could be causing this? (This is happening on two different websites.) I've checked the source code and the pages are definitely not set to noindex.
This question begins at 00:27:36 into the clip. Watch this question on YouTube commencing at 00:27:36.

YOUR ANSWERS

Selected answers from the Dumb SEO Questions Facebook & G+ community.

  • Jacek Wieczorek: Maybe it's a GSC glitch. I'd run a separate crawl to be 100% sure. Have you checked the X-Robots-Tag header too? I'd recommend pulno.com for a crawl :)
  • Emma Schwartz: It is only displaying the error in Search Console. I have quite a few pages that are not indexed and it's baffling me as to why.
  • Emma Schwartz: And it's happening on two different sites 🙈
  • Jacek Wieczorek: I can see 10 pages with a nofollow attribute on your site. How many did Google find?
  • Emma Schwartz: The only pages that should be noindex are the category archives.
  • Emma Schwartz: Also there are around 10 noindexed pages which should not be indexed as they are landing pages. GSC is showing a lot of my posts as noindex when these are not set to noindex.
  • Jacek Wieczorek: Have you recrawled them in GSC? What does Google say after another "inspection"?
  • Jacek Wieczorek: It's the status from December 2019. Request re-indexation and let's see what G will tell us.
  • Brenda Michelin: Emma Schwartz That is your XML Sitemap, it should NOT be indexed, that is fine.
  • Emma Schwartz: Jacek Wieczorek I keep requesting it and nothing happens.
  • Brenda Michelin: Your category / tags are set to noindex on those pages, so Google is right.
  • Emma Schwartz: Brenda Michelin The category archives are set to noindex but not the actual posts / pages
  • Emma Schwartz: It's weird because some new posts get indexed quite quickly, whereas other posts and pages are not being indexed at all even when I request indexing. It's actually happening on 3 websites, which is quite strange (maybe it's something I have done without realising?)
  • Jacek Wieczorek: In this case, I'd say it may be backlinks and low domain power. https://www.pulno.com/blog/tutorial/no-indexing-reasons... Pulno finds only one link. It could also be that your page was noindexed before, or that there were some issues with your robots.txt (like 500 errors). Also, the page "pricing" isn't of much value to Google. Add a couple of paragraphs and get a few backlinks. Can you share the other URL you have issues with indexing?
  • Emma Schwartz: Jacek Wieczorek I'm not too bothered about that particular page to be honest (it's just one example). The site doesn't have many backlinks, but I wouldn't have thought that would cause Search Console to display errors relating to noindex tags that aren't there. It's very frustrating but probably not much I can do about it. This site also has the same issue: fruitandvegbox.co.uk.
  • Jacek Wieczorek: It's usually not a good idea to noindex blog or category pages. Check this out: https://www.seroundtable.com/google-long-term-noindex...
  • Emma Schwartz: Jacek Wieczorek I have never known noindex on category archives to cause the posts themselves to not be indexed. Easily sorted if that's the problem! :)
  • Richard Hearne: Brenda Michelin Nope. It's a page, /pricing. The report only says the URL was listed in the XML sitemap.
  • Richard Hearne: Brenda Michelin Sorry, it will seem like I'm picking on you. I'm not. But I don't think your comment about categories / tags is relevant either.
  • Brenda Michelin: LOL, I am the grasshopper! 😀
  • Henry Vanny Unabor: If you use Yoast, check if the noindex is set there.
  • Emma Schwartz: It's not set to noindex.
  • Henry Vanny Unabor: Emma Schwartz Okay. Check if you have any plugin that you have set to protect yourself from crawlers or the public domain. Also check the settings of any security plugin you are using. You might as well check for updates if available and, finally, disable individual plugins to see if the cause is one of them. If you use an AMP plugin, also check its SEO settings too.
  • Emma Schwartz: Henry Vanny Unabor Tried disabling plugins (apart from Elementor and Elementor Pro).
  • Henry Vanny Unabor: I think it may be your cache. It just happened to me. Please clear your cache, both WordPress and any external caching, e.g. Cloudflare or Sucuri.
  • Maja Jovančević: Have you checked the HTTP headers for an X-Robots-Tag?
  • Emma Schwartz: It all seems to be OK as far as I can tell.
  • Michael Martinez: Emma Schwartz Are you using a tool like this one to check the HTTP headers?

    https://www.webconfs.com/http-header-check.php
  • Emma Schwartz: I tried a couple of different tools, nothing seems to be showing any noindex tags 🤷‍♀️
  • Michael Martinez: Well, the only other thing that comes to mind is a controversial 302-hijack trick. I would be surprised if it works.

    What do you see when you click on one of the links in Google Search Console? Does it provide any more information in the deeper report?
  • Emma Schwartz: When I click on it, it basically just says that a noindex tag has been detected.
  • Michael Martinez: Emma Schwartz If you send me a PM with an example URL I will try to help you diagnose this further.
  • Emma Schwartz: Michael Martinez Thank you, will do.
  • Michael Martinez: Do the URLs end in "/feed/"? If so, many popular SEO plugins automatically add a "noindex" to these RSS feeds. They are mostly single page and comment section feeds.
  • Emma Schwartz: No, they are actual posts and pages (mainly posts).
  • Victoria Gitelshtein: You can scan your site(s) with Screaming Frog and check what is actually non-indexable.
  • Emma Schwartz: The URLs are showing as indexable in Screaming Frog but not in Search Console.
  • Richard Hearne: Emma Schwartz Try doing a live test via GSC or the Mobile-Friendly Testing Tool (https://search.google.com/test/mobile-friendly), check the HTML output and headers that either of those tools provides, and look for NOINDEX.

    It's very weird that Google would retain such a stale version of a page in this day and age. Stats from Dec 2019 are pretty odd. You could create a new XML sitemap file and manually submit it. I'd imagine they'll crawl pretty quickly.

    But this is very odd indeed.
  • Emma Schwartz: It's really weird. I have checked the HTML using the mobile testing tools (amongst other tools) and there is no noindex.
  • Richard Hearne: I've just checked using an X-Forwarded-For Googlebot/Google IP, and it looks clean with INDEX. The above might be able to get around cloaking sometimes done by hacks.

    My best/last suggestion is that you wait for the indexing request to go through in GSC and wait for the report to update based on the current page.

    If that updates and it still shows NOINDEX I'd be expecting either something very weird with your hosting, or a hack of some kind (although the value of NOINDEX really negates this possibility IMO). Best of luck with it.
  • Richard Hearne: Just in case it might have any use, here are the CURL queries I used (first HEAD request, second full request):

    curl -I -L -H "X-Forwarded-For: 66.249.66.219" -H "User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" https://emmaschwartz.co.uk/pricing/

    curl -H "X-Forwarded-For: 66.249.66.219" -H "User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" https://emmaschwartz.co.uk/pricing/
  • Emma Schwartz: Richard Hearne Thank you. I have actually been waiting a while after requesting indexing and it's still doing the same. Some URLs are indexing fine; it seems quite random which ones index and which don't. Thanks for your help.
  • Michael Martinez: I have a partial update on this. Unfortunately I forgot that Facebook sometimes neglects to tell me when I have messages, so it was quite a few hours before I looked at the URLs Emma shared with me.

    I did suggest she try submitting the URLs for indexing before reading her last reply in this discussion.

    The site is running on HTTP/2. Googlebot doesn't crawl HTTP/2. It's supposed to negotiate down to HTTP/1.1 with a server if it can do that.

    I tested one of the URLs with Google's mobile-friendly test tool and it was able to fetch SOME parts of the page but not all.

    ADDED ON EDIT: I've advised Emma to check with her Web host to see if they can shed any light on the HTTP/2 versus HTTP/1.1 situation. It looks to me as though the server is handling some of Googlebot-mobile's requests.
  • Tye Dee: Time to get a new host there, Emma Schwartz.
  • Emma Schwartz: Tye Dee I'm with Siteground, and I've not heard of any other Siteground users having this issue. I'd like to be sure it's definitely a hosting issue before moving to a new host, especially as I've already paid upfront for the year 🤦‍♀️
  • Tye Dee: Your site is new then, correct?
  • Emma Schwartz: Tye Dee No it is not a new website.
  • Tye Dee: Are you seeing any of the errors mentioned in Search Console?
  • Emma Schwartz: It says that certain pages have noindex tags when they don't.
  • Emma Schwartz: And there are a lot of pages that haven't been crawled for months (or at all).
  • Tye Dee: When you do site:yoururl in Google, do you see all those pages indexed, or not all of them?
  • Emma Schwartz: No, not all the pages are there; only a few URLs are indexed.
  • Tye Dee: What SEO plugin are you using?
  • Emma Schwartz: Just Yoast
  • Michael Martinez: Has Siteground replied to your inquiry?
  • Emma Schwartz: Michael Martinez Yes, they said it isn't to do with them and they can't help.
  • Michael Martinez: Emma Schwartz Well, there is a way to test whether HTTP/2 is the problem. I don't know how to do it on Nginx but maybe someone can suggest how it's done.

    Basically, you configure your site NOT to serve content over HTTP/2. This can be done in the ".htaccess" file for Apache sites.

    If your server is configured to use HTTP/1.1 as a backup then everything should be fine.

    If it's configured to only use HTTP/2 then your site will immediately stop working and you'll have to undo the configuration change.

    I don't know if you can do this on a Siteground hosting plan. It's not something I would recommend for a beginner.

    A lot of people are reporting indexing problems with Google. It`s possible HTTP/2 is the problem but I think more likely there are multiple potential causes.

    Earlier this week on Twitter, Googler Gary Illyes said they'll dump what their algorithms determine to be "low quality content" from the index if they need the space. They've always dumped content from the index and most of it is recrawled and reindexed quickly.

    One possible indication that your content has been deemed "low quality" *MIGHT* be if the URL inspection tool in Google Search Console throws up a CAPTCHA every time you submit a URL for indexing (this is only a hypothesis; without confirmation from Google there is no way to be sure).

    If you're not getting a CAPTCHA on URL submission then I don't think Google has flagged the site as problematic. But, again, without confirmation from Google there is no way to confirm that hypothesis. They use the CAPTCHA for a reason but I have never seen an official explanation for what that reason is.
  • Michael Martinez: I should add that if Siteground is only running on HTTP/2 then my hypothesis seems less likely. As recently as a year ago I found independent confirmation of Googlebot's inability to crawl HTTP/2. I haven't seen anything recent.

    I have 1 client who launched an HTTP/2 site last year (we confirmed that the hosting server had deactivated HTTP/1.1) and it had problems getting into Google's index. The site is now fully indexed and running on HTTP/2. We submitted and submitted and checked everything on the site for over a month and nothing happened. When I checked recently, Google had been fully indexing the site for several weeks.

    So it's conceivable they are upgrading their crawlers and not all of them are yet able to handle HTTP/2. But you should continue looking for other possible reasons why this is happening to you.
  • Michael Martinez: Okay, Googler Martin Splitt *HAS* confirmed (as of April 29, 2020) that Googlebot does not yet support HTTP/2.

    This may relate to your problem but it's hard to tell. I am sure the problem has something to do with the resources that the Google Mobile Friendly test tool cannot fetch.

    https://youtu.be/nZO0OVN37aY?t=710
  • Emma Schwartz: Wouldn't all Siteground users (and all sites running HTTP/2) experience similar issues if that were the case? I will keep checking it and hopefully will see some change.
  • Michael Martinez: Emma Schwartz If Siteground is only using HTTP/2 then everyone should feel the pain.

    So I'm guessing that HTTP/1.1 is still running in the background. That leaves open the possibility that some resources might only be served via HTTP/2.

    Why? I don't know. HTTP/2 is kind of a spooky protocol, sort of operating like quantum mechanics. There have been questions about its stability ever since Google developed SPDY (the protocol from which HTTP/2 was developed).

    I think if I were in your position I would look at the resources that the mobile-friendly test tool cannot fetch. They may be the key to your problem.
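
A couple of the checks discussed in this thread can be reproduced from the command line. First, the header and HTML checks Maja Jovančević and Michael Martinez describe (looking for a noindex in the X-Robots-Tag response header or in a meta robots tag). A minimal sketch using curl and grep; the URL below is a placeholder, so substitute one of the affected posts:

    # Look for a noindex directive sent in the HTTP response headers
    curl -sI https://example.com/sample-post/ | grep -i "x-robots-tag"

    # Look for a noindex directive in the returned HTML (meta robots tag)
    curl -s https://example.com/sample-post/ | grep -io "<meta[^>]*robots[^>]*>"

If neither command turns up a noindex, the live page looks indexable, and the Search Console report is probably based on a stale or differently served copy of the page, as suggested above.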
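
Second, for the HTTP/2 question Michael Martinez raises: rather than changing the live server configuration, you can first check which protocol the server negotiates with curl. A minimal sketch, assuming a curl build with HTTP/2 support and version 7.50 or newer (for the %{http_version} write-out variable); again the URL is a placeholder:

    # Prefer HTTP/2 and print the protocol version the server actually negotiated
    curl -sI --http2 -o /dev/null -w "%{http_version}\n" https://example.com/

    # Force HTTP/1.1; if this also returns a normal response, the fallback Michael mentions is available
    curl -sI --http1.1 -o /dev/null -w "%{http_version}\n" https://example.com/

On Apache 2.4.17 and later, the offered protocols are controlled by the Protocols directive (for example "Protocols http/1.1" to stop offering h2), which lives in the server or virtual-host configuration rather than .htaccess, so on shared hosting it is usually something the host has to change. This is only a sketch of the Apache case described above; an Nginx or Siteground setup may expose different controls.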

View the original question in the Dumb SEO Questions community on Facebook, 05/21/2020.