Selected answers from the Dumb SEO Questions G+ community.
Michael Martinez: It's not a crawl budget waster unless you're talking about 100,000+ URLs. Orphaned pages are not necessarily a bad thing, but they do make it more difficult for search engines to crawl a site (and most if not all SEO crawlers will miss them in basic crawling).
Scott Clark: The number of sitemap references is at that level. The actual number of pages referred to is <5k. Lots of repeats. I have no idea how they made this sitemap.
Michael Martinez: That sounds like it could be a problem. If the pages aren't serving any useful purpose, you should probably recommend they purge them. Otherwise, some sort of restructuring seems necessary. At the very least, rebuild the sitemap.
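To put numbers on the "lots of repeats" problem before rebuilding, something like the following could be run against the sitemap file. This is only a rough sketch: it assumes a standard `<urlset>` document using the sitemaps.org namespace, the function name `sitemap_stats` and the `example.com` URLs are made up for illustration, and it does not follow sitemap index files or hreflang alternates.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace used by standard XML sitemaps.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_stats(xml_text):
    """Return (total_refs, unique_urls, repeats) for one sitemap document.

    repeats maps each URL listed more than once to its count.
    """
    root = ET.fromstring(xml_text)
    # Collect every <loc> value, trimming stray whitespace.
    urls = [(loc.text or "").strip() for loc in root.iter(SITEMAP_NS + "loc")]
    counts = Counter(u for u in urls if u)
    repeats = {u: n for u, n in counts.items() if n > 1}
    return sum(counts.values()), len(counts), repeats


# Hypothetical sample: four references, but only two distinct pages.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/a</loc></url>
  <url><loc>https://example.com/a</loc></url>
  <url><loc>https://example.com/a</loc></url>
</urlset>"""

total, unique, repeats = sitemap_stats(SAMPLE)
print(total, unique, repeats)
```

A large gap between the total and unique counts is the situation described above; the keys of `repeats` show which URLs to collapse when the sitemap is rebuilt.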
Scott Clark: I'm sure there's a good story behind this. But I'm thinking these numbers are large enough to potentially be a core cause of their (serious) ranking issue. I just don't know if I need to use the word "core" here.
Stockbridge Truslow: Honestly, if it were me, I'd pull the site map altogether - at least until it's cleaned up. You don't NEED a site map to get crawled. All Google does with a site map is check it against what it has discovered and make sure it's hitting what your link system is saying is there. (Well, it's a little more than that, but...) It is NOT used as the "basis" from which Google crawls your site.
On very large sites, I often have better luck with no sitemap at all than with one that contains redundancies, orphan pages, and other issues. Let Google discover what's important (all the important stuff should be no more than a handful of hops from the home page anyway). It'll index everything just fine.
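The "handful of hops" idea can be checked with a breadth-first walk over the internal link graph. The sketch below assumes you already have a crawl export as a mapping of each page to the pages it links to; the function name `click_depths` and the sample paths are hypothetical.

```python
from collections import deque


def click_depths(links, home):
    """Return {page: minimum number of clicks from home}.

    links maps each page to the pages it links to. Pages that cannot be
    reached from home by following links are simply absent from the result.
    """
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in depths:
                # First visit in a breadth-first walk is the shortest path.
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical link graph: "/x" links out but nothing links to it.
links = {
    "/": ["/a", "/b"],
    "/a": ["/c"],
    "/c": ["/d"],
    "/x": ["/a"],
}
depths = click_depths(links, "/")
deep_pages = sorted(p for p, d in depths.items() if d > 2)
print(depths, deep_pages)
```

Anything with a large depth, or missing from `depths` entirely, is a candidate for better internal linking; what counts as too deep is a judgment call, and the threshold of 2 here is arbitrary.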
Mixed and confusing signals from a sitemap that doesn't reflect what the site itself says is there hurt more than they help. They make Big G less likely to trust the other signals you're sending. It's better to send no signal at all than a confusing one.
Just my two cents. I know a zillion SEOs will disagree, but... sitemaps aren't as important as Google and the SEO community make them out to be. Good navigation and structure trump a site map every time.
Scott Clark: Well, the trend I'm seeing is that injection via XML is the future and crawled discovery is on its way out. So if their XML is as screwed up as this, removal may very well be a good idea.
Stockbridge Truslow: I think in terms of this mysterious and ambiguous metric of "trust" that may be the case. If all your signals are consistent and clear and Google has a good understanding of how your site works, what it all means, and can feel safe in making assumptions about certain things, it may very well find a page on the site map, index that page, and then start ranking it based upon its assumptions about it.
For sites where that's not the case, discovery will always be key. You can't rank a page until you know how it connects to everything else and all that fun stuff. If Google doesn't fully understand your site, or if some things do this and some do that and it generally doesn't have a good sense of what you're doing, it can't make assumptions and has to wait through the discovery period anyway.
Without a site map, if you have a "What's New" page set up and it clearly shows all the new stuff, Google will learn that and start to use it as its discovery seed as well. I've been doing that for years and it still works, whether it's a blog, product database, or whatever.
Dave Elliott: I'd treat it as a site structure issue as opposed to worrying about the sitemap. Are the orphaned pages important? If so, why aren't there any internal links pointing at them? If they aren't, why the hell are they still live? Yeah, it could be a crawl budget issue, but if they are orphaned, is Google really crawling them often?
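Answering those questions starts with a list of which sitemap URLs are actually orphaned. A minimal sketch, assuming you have the sitemap URLs as a list and a crawl export mapping each page to its internal links (the name `find_orphans` and the sample paths are invented for the example):

```python
def find_orphans(sitemap_urls, links, home):
    """Return sitemap URLs that no other page links to, sorted.

    links maps each page to the pages it links to. Self-links are ignored,
    and the home page is never reported as an orphan.
    """
    linked = {
        target
        for source, targets in links.items()
        for target in targets
        if target != source
    }
    return sorted(set(sitemap_urls) - linked - {home})


# Hypothetical data: "/old-promo" is in the sitemap but nothing links to it.
sitemap_urls = ["/", "/a", "/b", "/old-promo", "/a"]
links = {
    "/": ["/a"],
    "/a": ["/b", "/a"],
    "/old-promo": ["/"],
}
orphans = find_orphans(sitemap_urls, links, "/")
print(orphans)
```

Each URL in the result then gets the decision described above: link to it from somewhere sensible if it matters, or retire it if it does not.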
Scott Clark: Yes, and yes. I'm on here asking because I've never encountered such a sitemap as this, even on multinational sites with the language signaling in the sitemap.
I feel confident in saying that fixing this could fundamentally change their ranking, but it's an outlier situation.