Thursday, March 8, 2018

Googlebot added /Common/Ajax before the link and caused a 404

This is pretty interesting. It was sent to me by a top-100 e-tailer. Has anyone seen this before? What advice would you pass along to their internal team?

"We found a lot of 404s from Googlebot.

Browsers don’t seem to trigger these 404s, but Googlebot seems to be taking the protocol-relative links (“//” instead of “https://”) and, when rendering the JS, prepending “/Common/Ajax/”.

The 404s look like they come from image requests such as: /Common/Ajax/\x5C/\x5C/images10.xxxxxx.com\x5C/Marketing_Place\x5C/Seller_logo\x5C/Seller_A958_2cfd34af-4b2f-471d-8195-4ada385bb9c0.gif

1. The link should be \/\/images10.xxxxxx.com\/Marketing_Place\/Seller_logo\/Seller_A958_2cfd34af-4b2f-471d-8195-4ada385bb9c0.gif

2. But Googlebot added /Common/Ajax/ before this link, causing a 404."

Again, I can't say I've ever seen this. My thanks in advance for any feedback.
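The reported pattern is consistent with a JSON-escaped URL being resolved as a relative path. Below is a minimal Python sketch of that mechanism, assuming the image URL sits in the JS/JSON source with every "/" escaped as "\/" and is resolved against a document URL under /Common/Ajax/. The base URL, the host images10.example.com, and the file name are hypothetical placeholders, not the retailer's real values. (The \x5C in the log line is just the hex escape for a backslash.)

```python
import json
from urllib.parse import urljoin

# Hypothetical base: the URL of whatever document the link was found in.
BASE = "https://www.example.com/Common/Ajax/GetSellerLogos"

# The value as it sits in the JSON/JS source, with every "/" escaped as "\/".
# A raw string keeps the backslashes literal.
RAW = r"\/\/images10.example.com\/Marketing_Place\/Seller_logo\/Seller_A958.gif"


def resolve_undecoded(base: str, raw: str) -> str:
    """Resolve the escaped string as-is, as a client that never decodes it would.

    Because it does not start with "//", it is treated as a relative path
    and appended to the base URL's directory (/Common/Ajax/).
    """
    return urljoin(base, raw)


def resolve_decoded(base: str, raw: str) -> str:
    """Decode the JSON string escapes first, then resolve.

    json.loads turns each "\\/" back into "/", giving a protocol-relative
    "//host/path" URL that takes only the scheme from the base.
    """
    decoded = json.loads('"' + raw + '"')
    return urljoin(base, decoded)


if __name__ == "__main__":
    print(resolve_undecoded(BASE, RAW))  # lands under /Common/Ajax/
    print(resolve_decoded(BASE, RAW))    # lands on the image host
```

The undecoded call reproduces the shape of the 404 path (/Common/Ajax/ followed by the backslash-escaped link), while the decoded call yields https://images10.example.com/Marketing_Place/Seller_logo/Seller_A958.gif.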

  • Arsen Rabinovich: Following
  • George G.: I'd send them https://developers.google.com/webmasters/ajax-crawling/docs/learn-more and ask whether all of this is done the way Google wants it.
  • Michael Martinez: They should verify the IP addresses. This sounds like a broken bot spoofing Googlebot; some people run fake bots from Google Cloud. Googlebot has its own IP addresses.
  • Stockbridge Truslow: Somewhere in your script, something is wrong. Most likely it's a double quote where there should be a single quote, or vice versa. It's not parsing the URL correctly. First, it's not recognizing the path because the // isn't being decoded back from the \x5C/\x5C/ value used to store it in the database. That leading // is what says, "Start over from the top." Without it being parsed and decoded, the script starts at its own path and appends the URL there. I'm not sure which script that would be. I'd probably start by looking for Lazy Load, since that's the most common Ajax script used for images, but really, it could be anything.
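To act on the IP-verification suggestion, here is a sketch of the reverse-then-forward DNS check Google documents for confirming Googlebot: look up the PTR hostname for the requesting IP, confirm it ends in googlebot.com or google.com, then resolve that hostname forward and confirm it maps back to the same IP. The function names are illustrative, and the resolvers are injectable so the logic can be exercised without network access. socket.gethostbyname_ex returns IPv4 addresses only, so IPv6 crawler addresses would need socket.getaddrinfo instead. Google's current verification page is the authority on the accepted domain list, which may have been extended since.

```python
import socket
from typing import Callable, List, Tuple

Resolver = Callable[[str], Tuple[str, List[str], List[str]]]

# Domains assumed valid for Googlebot reverse-DNS names; confirm against
# Google's current "verifying Googlebot" documentation.
GOOGLE_DOMAINS = ("googlebot.com", "google.com")


def is_google_hostname(hostname: str) -> bool:
    """True if the hostname is a subdomain of one of the Google crawler domains."""
    host = hostname.rstrip(".").lower()
    return any(host.endswith("." + d) for d in GOOGLE_DOMAINS)


def verify_googlebot(
    ip: str,
    reverse: Resolver = socket.gethostbyaddr,
    forward: Resolver = socket.gethostbyname_ex,
) -> bool:
    """Reverse-then-forward DNS check for an IP claiming to be Googlebot (IPv4 sketch)."""
    try:
        hostname, _aliases, _ips = reverse(ip)
    except OSError:  # socket.herror / socket.gaierror: no usable PTR record
        return False
    if not is_google_hostname(hostname):
        return False
    try:
        _name, _aliases, addresses = forward(hostname)
    except OSError:  # forward lookup failed
        return False
    # A spoofer can fake a PTR record, but not Google's forward DNS,
    # so the hostname must resolve back to the original IP.
    return ip in addresses
```

Called with just an IP, e.g. verify_googlebot(ip), it performs live DNS lookups; running it over the IPs behind the /Common/Ajax/ 404s would show whether those requests really came from Googlebot.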

