Dumb SEO Questions

(Entry was posted by Patrick Healy in the Dumb SEO Questions community on Facebook, 07/28/2015.)

HTTPS Issues

Hi Folks, I'm having an issue with a client site that's been having issues since we moved to https – so I'm turning to a few communities that I trust to see if there is something I've missed. All ideas are welcome.

The site is developed in Joomla and has a subdirectory with a WordPress install as the blog (domain.com/blog). We're submitting the sitemaps via XML and RSS for both site and blog.

The WP pages are indexing fine. There are duplicates being submitted due to tags and categories, but the 'indexed pages' under the blog line up with the number of posts that they have. Don't think there's any issue there.

The Joomla site, however, as you can see from the screenshot below, is not doing nearly as well. According to the Google index, there are a stable 554 pages in their index, of which 405 are the site (not the blog). So the pages are in there and are showing https (which is a relief).

The thing that's bothering me is that we're submitting 217 pages from the Joomla site and GWMT is only saying it's indexing 4. How can this be?

You'll notice that the last three sitemaps on the list are the ones I'm talking about and the last two are somewhat redundant. We put that in on the advice of the XML sitemap generation plugin (Jsitemap) developer. He said that sometimes putting in a variation of the XML sitemap URL fixes the problem. It hasn't (as you can see) and my intention is to delete it.

Does anyone have any ideas here? I'm nervous that at some point I will have to submit all pages manually. I have never experienced something like this before, and my gut is telling me that it's the plugin making a bad XML sitemap. But if that's the case, how come GWMT is recognizing that we are submitting 217 pages?

I thank you all in advance for your thoughts.
This question begins at 00:21:37 into the clip.

YOUR ANSWERS

Selected answers from the Dumb SEO Questions Facebook & G+ community.

  • Phillip Marquez: Would you mind DMing me the domain you're working with? I'd like to poke around a little if I can manage it today between work.
  • Patrick Healy: Done!
  • Phillip Marquez: Brief summary for the public. You never know when someone else might stumble across this looking for a little guidance:

    Remember that sitemaps are guidance to get your pages indexed. Check your Index Status in GSC (GWT). If you have most or all of your page count accounted for in Total Indexed here, then you're already OK. Also, not every page deserves to be in the index -- this is normal.

    If you're still convinced you're missing out:

    1) Does GSC show crawl errors that need to be addressed?

    2) Did you submit the https version of the domain to GSC (GWT)?

    3) Do your canonicals support your submitted URLs in your sitemap? (Are they http in your sitemap? Do you use a filename (e.g. default.html) in one but not the other?)

    4) www and non-WWW? (not the case here, but may be the case for others)

    Remember, sitemap submitted URLs vs sitemap indexed vs total indexed are, quite often, radically different. I have a site with 111 submitted URLs, 94 sitemap indexed and 1,578 Total Indexed pages. The important part here is Total Indexed. As long as this graph looks stable and changes relative to actual changes to my site, then I'm happy.
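
Checks 2 to 4 above can be partly automated. The following is a minimal Python sketch, not a definitive tool. It assumes a plain `urlset` sitemap in the standard sitemaps.org namespace. The names `audit_sitemap_urls` and `DEFAULT_FILENAMES`, and the example.com host used in the usage note, are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical list of "default document" filenames; adjust for the site.
DEFAULT_FILENAMES = {"index.html", "index.php", "default.html"}


def audit_sitemap_urls(sitemap_xml, expected_scheme, expected_host):
    """Return (url, problem) pairs for sitemap URLs that do not match
    the scheme and host of the property registered in Search Console."""
    problems = []
    root = ET.fromstring(sitemap_xml)
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        parts = urlsplit(url)
        # Check 2 and 3: http URLs left in the sitemap after an https move.
        if parts.scheme != expected_scheme:
            problems.append(
                (url, "scheme is %s, expected %s" % (parts.scheme, expected_scheme))
            )
        # Check 4: www vs non-www mismatch.
        if parts.hostname != expected_host:
            problems.append(
                (url, "host is %s, expected %s" % (parts.hostname, expected_host))
            )
        # Check 3: a filename in the sitemap URL that the canonical may omit.
        filename = parts.path.rsplit("/", 1)[-1]
        if filename.lower() in DEFAULT_FILENAMES:
            problems.append((url, "ends with a default filename: " + filename))
    return problems
```

Calling `audit_sitemap_urls(xml_text, "https", "example.com")` returns an empty list when every `<loc>` matches. It does not fetch pages, so comparing each URL against the page's actual canonical tag would still need a separate step.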
  • Phillip Marquez: Hmm. Also pop into URL Parameters (under Crawl in GSC) and do a sanity check there as well. I don't think this will be the problem since it looks like the index has a good chunk of non-blog pages based on your post and what I can see, but this is a really good idea if you haven't done so already.
  • Phillip Marquez: Also, make sure all your pages in your sitemap are returning a server 200 response. You can check Crawl Errors - Server Error tab or, if you want to really get down to it, take your sitemap and feed it into something like Screaming Frog and then check response codes for all sitemap URLs.

    (sorry, I guess I'll just keep dumping more and more as I think of 'em)
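
For anyone without a crawler handy, the response-code check can be roughed out with the Python standard library. This is a sketch under assumptions, not a replacement for Screaming Frog. The names `NoRedirect`, `fetch_status` and `find_non_200` are made up for this example. It sends HEAD requests, which some servers reject, and it deliberately does not follow redirects so that a 301 from http to https shows up as 301 rather than 200.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so 3xx codes stay visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None makes urllib raise HTTPError for the 3xx response.
        return None


_OPENER = urllib.request.build_opener(NoRedirect)


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url without following redirects.

    Connection failures (urllib.error.URLError) are left to propagate.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with _OPENER.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 3xx, 4xx and 5xx responses arrive here; the code is on the error.
        return err.code


def find_non_200(urls, fetch=fetch_status):
    """Return {url: status} for every URL whose status is not 200."""
    bad = {}
    for url in urls:
        status = fetch(url)
        if status != 200:
            bad[url] = status
    return bad
```

The `fetch` parameter is there so the logic can be tried against a dictionary of canned statuses before pointing it at a live site.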
  • Patrick Healy: Great feedback +Phillip Marquez! I've done all of this - so this feedback actually makes me feel better in that I'm not crazy or deficient. :-) I've got a suspicion that this XML sitemap plugin is just not crafting the URLs right. The formatting (I know it means nothing) is not like anything I've ever seen so it immediately made me apprehensive.
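
One way to test the suspicion about the plugin's output is to check the sitemap's structure against the sitemaps.org protocol rather than judging it by eye. The sketch below is illustrative only. `check_sitemap_structure` is a hypothetical helper, and it assumes the file is a plain `urlset` sitemap, so a sitemap index file would be reported as having the wrong root element.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Namespace required by the sitemaps.org protocol.
SITEMAP_NAMESPACE = "http://www.sitemaps.org/schemas/sitemap/0.9"


def check_sitemap_structure(sitemap_xml):
    """Return a list of human-readable structural problems (empty if none)."""
    try:
        root = ET.fromstring(sitemap_xml)
    except ET.ParseError as err:
        return ["not well-formed XML: %s" % err]

    problems = []
    expected_root = "{%s}urlset" % SITEMAP_NAMESPACE
    if root.tag != expected_root:
        problems.append(
            "root element is %r, expected %r" % (root.tag, expected_root)
        )

    url_tag = "{%s}url" % SITEMAP_NAMESPACE
    loc_tag = "{%s}loc" % SITEMAP_NAMESPACE
    entries = root.findall(url_tag)
    if not entries:
        problems.append("no <url> entries found in the sitemap namespace")

    for index, entry in enumerate(entries, start=1):
        loc = entry.find(loc_tag)
        text = (loc.text or "").strip() if loc is not None else ""
        if not text:
            problems.append("entry %d has no <loc>" % index)
            continue
        # The protocol expects fully qualified URLs, not relative paths.
        parts = urlsplit(text)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append(
                "entry %d is not an absolute URL: %s" % (index, text)
            )
    return problems
```

An empty list only means the basic structure looks right. It says nothing about whether the listed URLs match the canonicals or return a 200.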
