Dumb SEO Questions

(This entry was posted by Alexander Velinov in the Dumb SEO Questions community on Facebook, 03/15/2014.)

What should I do to de-index all these pages?

Hi everybody. I am new to this community and I would be happy if someone could help me with some advice.
I have the following issue. We started a new site, which was developed under dev.mysite.com. That's why all of its pages are currently indexed. When I found this, I suggested setting a password on the dev version so it would stop being indexed. I also think it would be good to set a 301 redirect for all pages from the dev version. What should I do to de-index all these pages in a gentle way, without any issues with Google? Is anything I am saying wrong?
Thank you in advance.
This question begins at 00:19:59 into the clip. Watch this question on YouTube commencing at 00:19:59.

YOUR ANSWERS

Selected answers from the Dumb SEO Questions Facebook & G+ community.

  • Jim Munro: Hi Alexander, I am not an expert but I think rel=canonical would be the kindest, gentlest method of correction. Do not block in robots.txt.

    https://support.google.com/webmasters/answer/139394?hl=en
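As an illustration of Jim's suggestion, a cross-domain rel=canonical is a single link element in the head of each dev page pointing at its live counterpart (the URLs below are placeholders based on the dev.mysite.com example from the question):

```html
<!-- In the <head> of the dev copy, e.g. https://dev.mysite.com/about.html -->
<!-- Tells Google that the live URL is the preferred version of this page -->
<link rel="canonical" href="https://www.mysite.com/about.html">
```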
  • David Harry: Robots.txt them out and hit the page removal tool in GWT. That's about all you can do. In the future, always block the dev server from Google.
  • Jim Munro: I'm not sure you noticed but the issue is that the pages are in the index, David.

    robots.txt won't take them out, will it?
  • David Harry: Nope, but if you use the page removal tool and then robots.txt out the dev server, it should work.
  • Tony McCreath: The removal requests only last 90 days so the blocked pages may return to the search index.

    I think the best solution is not to block in robots.txt but do a noindex.

    You could do it with the robots meta tag on every page. However, that may not be a trivial task, and it does not cover non-HTML files like images and include files.

    An alternative that should work better is to set the X-Robots-Tag HTTP header to noindex on every file, via the .htaccess file or similar, e.g.

    Header set X-Robots-Tag "noindex, nofollow"

    Google has this guide:

    https://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag
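A sketch of that .htaccess approach, assuming Apache with mod_headers enabled on the dev host (the Header directive and value are from Tony's example; the IfModule wrapper and comments are assumptions):

```apache
# dev.mysite.com/.htaccess — mark every response as noindex
<IfModule mod_headers.c>
    # Applies to all file types served, including images and include files
    Header set X-Robots-Tag "noindex, nofollow"
</IfModule>
```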
  • Jim Munro: The reason I like rel=canonical for this scenario is that the dev subdomain's pages are already in the index and there's no way of knowing how long it will take to index the primary site, particularly when googlebot already knows that identical pages pre-exist on the dev subdomain.

    If he adds a rel=canonical, page for page, from the subdomain to his primary site, they'll be seamlessly replaced as they are recrawled, but not if they are robotted out. Noindexing will only drop the dev pages from the index; it won't cause googlebot to forget them.
  • Tony McCreath: +Jim Munro I do like that. Leverage them to help speed up indexing.
  • Alexander Velinov: Thank you for all the comments. They will be very useful for me. We just set password protection on the dev version, and it seems that dev.mysite is slowly being replaced by the mysite version. I like the canonical approach, but in this case, when developers need the dev version to stay password protected, do you think the canonical will work?
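For reference, the password protection mentioned here is typically done with HTTP basic auth. A minimal .htaccess sketch, assuming Apache (the realm name and credentials-file path are placeholders):

```apache
# dev.mysite.com/.htaccess — require a login for the whole dev site
AuthType Basic
AuthName "Development"
# Credentials file created beforehand with: htpasswd -c /path/to/.htpasswd someuser
AuthUserFile /path/to/.htpasswd
Require valid-user
```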
  • Jim Munro: If googlebot is not allowed to read the page, it won't learn of the canonical to apply it. It should be OK to have a dev site as long as all of its pages have a rel=canonical pointing to the same page on the primary site. I think so, anyway. However, if +Eric Wu or +Alistair Lattimore have an opinion on this, you should take note of what they say.
  • Eric Wu: 1. Check analytics to see if you're receiving any organic search traffic to dev.

    If no (or very little), just 301 redirect all URLs to www.
    If yes, then rel=canonical as +Jim Munro says.

    After all the traffic has migrated over, or you're seeing a slowdown in Googlebot requests to dev. in your server access logs, and you're really annoyed by having the URLs in the search index, then at that time put up a robots.txt to block all the URLs and request removal through Google Webmaster Tools.

    +Tony McCreath is correct that removal only lasts 90 days, but it can be nearly permanent if there aren't any search signals pointing in that direction anymore (via the canonicals / 301s).

    If you want to prevent your dev or stage environment from being indexed in the future, check out my post on StackOverflow for some hints on how to get that done: http://stackoverflow.com/questions/12197979/how-to-prevent-staging-to-be-indexed-in-search-engines/12219987#12219987
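Eric's first option — 301 redirecting every dev URL to its www counterpart — can be sketched in .htaccess as well, assuming Apache with mod_rewrite (hostnames are the placeholders from the question):

```apache
# dev.mysite.com/.htaccess — permanently redirect every dev URL to www
<IfModule mod_rewrite.c>
    RewriteEngine On
    # Preserves the request path; R=301 marks the redirect as permanent
    RewriteRule ^(.*)$ https://www.mysite.com/$1 [R=301,L]
</IfModule>
```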
  • W.E. Jonk: From the expert panel in this week's SEO Questions hangout on air, at 00:20:07 into the YouTube video: https://dumbseoquestions.com/q/what_should_i_do_to_deindex_all_these_pages +Alexander Velinov

View the original question in the Dumb SEO Questions community on Facebook, 03/15/2014.