Hello there,
I've asked my client's developer to fix the 500 errors that keep appearing in the Google Search Console crawl errors report, but he doesn't want to tackle this until we get to the root of where these URLs are coming from.
The URLs being detected appear to be from a much older version of the site (and potentially from some other domains they had set up as effective duplicates of the site under different domain names, which have since been redirected).
His questions are:
- why is GSC detecting these URLs now, and where is it finding them? (The vast majority of these URLs are linked internally from other pages that also serve a 500 error.)
- how is GSC finding new ones? It's worth noting that I did a Wayback Machine search on a few of these and some appear to be as old as 2010, so I'm guessing that if WBM can find them, they were pages that existed at some point and the redirects haven't been configured correctly, hence why Google is finding them (?)
- why does GSC drop URLs from the list? (I have explained that it does so after crawling them and finding the error several times.)
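To help with the "where is it finding them" question, here's a rough Python sketch that sorts the error URLs by the kind of page linking to them. It assumes you've copied each error URL and its "Linked from" entries out of GSC into a dictionary by hand; that data shape, the function name `classify_sources` and the example.com URLs are all made up for illustration. If most URLs land in the "only linked from other error pages" bucket, that would suggest Google is working from its own older crawl data rather than following a link that is live today.

```python
from typing import Dict, List, Set
from urllib.parse import urlsplit


def classify_sources(
    linked_from: Dict[str, List[str]],
    error_urls: Set[str],
    own_host: str,
) -> Dict[str, str]:
    """Label each error URL by the kind of page that links to it.

    linked_from maps an error URL to the pages GSC lists as linking to it
    (a hypothetical, hand-built structure, not an official GSC export format).
    """
    own_host = own_host.lower()
    labels: Dict[str, str] = {}
    for url, sources in linked_from.items():
        if not sources:
            labels[url] = "no known source"
            continue
        # urlsplit().hostname is already lower-cased (or None if absent).
        internal = [s for s in sources if urlsplit(s).hostname == own_host]
        if any(s not in error_urls for s in internal):
            # A page on the site that is NOT in the error list still links here.
            labels[url] = "linked from a live internal page"
        elif len(internal) < len(sources):
            # At least one linking page lives on another domain.
            labels[url] = "has external source"
        else:
            # Every linking page is itself an error page on this site.
            labels[url] = "only linked from other error pages"
    return labels


if __name__ == "__main__":
    # Hypothetical sample data using example.com placeholders.
    sample = {
        "https://www.example.com/a.php": ["https://www.example.com/b.php"],
        "https://www.example.com/b.php": ["https://www.example.com/a.php"],
        "https://www.example.com/c.php": ["https://www.example.com/contact"],
        "https://www.example.com/d.php": ["http://other-site.example.net/x"],
    }
    for url, label in classify_sources(sample, set(sample), "www.example.com").items():
        print(f"{url}: {label}")
```

The "live internal page" bucket is the one worth fixing first, since those are links the current site is still handing out.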
There were a few URLs linked from an external source: some random page on an Aussie site which may have been hacked, judging by how malformed the URLs were and by the results that appeared when I performed a site: search on that domain. When I checked these errors a week or so later, they had disappeared from the list. It looked to me as though the linking site had been compromised but since fixed, so these URLs had also dropped out of the error report.
My recommendation is to redirect all of these URLs that serve a 500 error to the homepage. He has written rewrites for a number of them (which I have now marked as fixed), but as new ones keep appearing, he doesn't want to keep writing redirect rules for every new URL that turns up. Which is fair enough, because who knows how long GSC will keep picking up URLs, or how many more there will be.
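On the "no more one-off rules" point: instead of one rule per URL, a single pattern-based rule can catch a whole family of old URLs. Below is a small Python sketch of that matching logic only. The patterns (paths ending in `.php` and a made-up `/old-section/` folder) and the function name are hypothetical placeholders; on IIS the real thing would normally be a single rule in the URL Rewrite module rather than code like this.

```python
import re
from typing import Optional

# Hypothetical patterns for URLs left over from the old PHP/Apache site.
# Replace these with whatever patterns actually show up in the GSC report.
LEGACY_PATTERNS = [
    re.compile(r"\.php$", re.IGNORECASE),          # old PHP page names
    re.compile(r"^/old-section/", re.IGNORECASE),  # hypothetical retired folder
]

HOMEPAGE = "/"


def legacy_redirect_target(path: str) -> Optional[str]:
    """Return where a legacy path should redirect, or None to serve it normally."""
    # Ignore any query string when matching, e.g. /view.php?id=3 -> /view.php
    bare = path.split("?", 1)[0]
    for pattern in LEGACY_PATTERNS:
        if pattern.search(bare):
            return HOMEPAGE
    return None


if __name__ == "__main__":
    for p in ["/products/view.php?id=3", "/old-section/page", "/about"]:
        print(p, "->", legacy_redirect_target(p))
```

Here `/products/view.php?id=3` and `/old-section/page` both map to `/`, while `/about` returns `None` and is left alone. Pointing matches at the homepage follows the recommendation above; the same matcher could just as easily drive a 404 or 410 response instead if that turns out to be the better call.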
If it helps at all, the site is currently hosted on an IIS server, and I understand it was previously hosted on an Apache server. It's now built in .NET (I think it was previously PHP), so I don't know if there is some kind of compatibility issue with rewrites. All of the server errors are caused by URLs from the OLD version(s) of the site. Could it even be that these pages should serve a 404 but are instead serving a 500 for some reason?
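On the 404-versus-500 question, a quick way to see exactly what the server returns for the old URLs is to request them and record the status codes. Here's a minimal Python sketch using only the standard library; `fetch_status`, `summarise` and the script name are my own placeholders. Redirects are followed, so a URL that redirects to a working page reports the final page's status. If old URLs that simply don't exist any more come back as 5xx rather than 4xx, that would point to the .NET app erroring out instead of returning a proper 404.

```python
import sys
import urllib.error
import urllib.request
from collections import Counter
from typing import Dict


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status code for url (redirects are followed)."""
    request = urllib.request.Request(url, headers={"User-Agent": "legacy-url-check"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; .code holds the status.
        return err.code


def summarise(statuses: Dict[str, int]) -> Counter:
    """Count URLs per status class, e.g. '4xx' versus '5xx'."""
    return Counter(f"{code // 100}xx" for code in statuses.values())


if __name__ == "__main__":
    results: Dict[str, int] = {}
    for url in sys.argv[1:]:
        try:
            results[url] = fetch_status(url)
        except OSError as err:
            # Connection failures and timeouts (URLError is an OSError subclass).
            print(f"{url}: request failed ({err})")
            continue
        print(f"{url}: {results[url]}")
    print(summarise(results))
```

Run it with the old URLs as arguments, e.g. `python check_status.py https://www.example.com/old-page.php` (the domain and path are placeholders).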
Hopefully someone can help me; failing that, I'll go stick my head in an oven.
Many thanks,
Vic