3 Advanced WordPress SEO and duplicate content
Once you’ve done all the basic stuff, you’ll find that the rest of the problems amount to one simple thing: duplicate content. Loads of it in fact. Out of the box, WordPress comes with a few different types of taxonomy:
- date based
- category based
- tag based
Next to that, WordPress seems to think you need to be able to click from page to page, starting at the front page and going all the way back to the first post you ever wrote. Last but not least, each author has his own archive too, under /author/<author-name>/, resulting in completely duplicate content on single author blogs.
In essence that means that, worst case scenario, a post is available on 5 pages outside of the single page where it should be available. We’re going to get rid of all those duplicate content pools, by still allowing them to be spidered, but not indexed, and fixing the pagination issues that come with these things.
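To make that worst case concrete, here is a minimal sketch that lists the five archive pages on which a single post can also show up. It assumes WordPress's usual pretty permalink patterns (/category/, /tag/, /author/, /YYYY/MM/ and /page/N/). The `Post` class, the `archive_urls` function and the sample values are made up for illustration and are not part of WordPress or the Yoast SEO plugin.

```python
from dataclasses import dataclass


@dataclass
class Post:
    # Hypothetical minimal post record, for illustration only.
    slug: str
    year: int
    month: int
    category: str
    tag: str
    author: str


def archive_urls(post: Post, front_page_number: int) -> list[str]:
    """List the archive pages that repeat the post besides its own URL."""
    return [
        f"/{post.year}/{post.month:02d}/",  # date based archive
        f"/category/{post.category}/",      # category based archive
        f"/tag/{post.tag}/",                # tag based archive
        f"/author/{post.author}/",          # author archive
        f"/page/{front_page_number}/",      # paginated front page
    ]


# Sample values are assumptions, not data from a real site.
post = Post("hello-world", 2009, 2, "news", "seo", "jane")
for url in archive_urls(post, front_page_number=3):
    print(url)
```

A post with several categories or tags would appear on even more pages; the sketch sticks to one of each to match the count above.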
3.1 Noindex, follow archive pages and disable some archives
Using the Yoast SEO plugin, make sure to prevent indexing (or even the existence) of archive pages that do not apply to your site. You do this under SEO → Titles & Metas, where you’ll find the following options on the “Archives” tab:
The settings above are the settings for our site. As you can see, we’ve completely disabled the date based archives, as we don’t use those. Any date based link will redirect to our homepage because of this setting. We’ve left the author archives untouched, but we have set the subpages of those archives to be noindex, follow by default. So you’ll never land on page 2 of an archive on our site from the search engines (change this on SEO → Titles & Metas → Other tab):
On smaller sites it might make sense to noindex either the category or the tag structure, but in our experience noindexing those on yoast.com makes little to no difference.
There is one more type of archive that is noindex, follow by default in the Yoast SEO plugin: the search result pages. This is a best practice from Google, and there is no setting for it because you should simply have it anyway.
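The rules described in this section can be sketched as a small decision function. This is only an illustration of the policy, not how the Yoast SEO plugin implements it. The names `DISABLED_ARCHIVES` and `robots_for` are hypothetical, and the set of disabled archives mirrors the settings described above.

```python
from typing import Optional

# Assumption: only date based archives are disabled, as described above.
DISABLED_ARCHIVES = {"date"}


def robots_for(page_type: str, page_number: int = 1) -> Optional[str]:
    """Return the robots meta content for an archive page.

    Returns None when the archive is disabled, meaning the URL
    should redirect to the homepage instead of rendering.
    """
    if page_type in DISABLED_ARCHIVES:
        return None
    if page_type == "search":
        # Search result pages are never indexed.
        return "noindex,follow"
    if page_number > 1:
        # Page 2 and beyond of any archive stays out of the index.
        return "noindex,follow"
    # The first page of the remaining archives is left untouched.
    return "index,follow"


for page_type, page_number in [("author", 1), ("author", 2), ("search", 1), ("date", 1)]:
    content = robots_for(page_type, page_number)
    if content is None:
        print(page_type, page_number, "-> redirect to homepage")
    else:
        print(page_type, page_number, f'-> <meta name="robots" content="{content}">')
```

The key point is that noindex is always paired with follow, so the links on those pages still get spidered.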
A lot has changed recently in how Google handles paginated archives, since they introduced support for rel="prev" links. We’ve written an article about that: rel="prev" for paginated archives. It is a bit too technical to fully cover here, but suffice it to say our Yoast SEO plugin takes care of all the needed changes automatically.
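For the curious, this is roughly what that markup looks like for one page of a paginated archive. It is a minimal sketch and not the plugin's code. It assumes the standard companion rel="next" link alongside rel="prev", and the WordPress style /page/N/ URL pattern. The `pagination_links` function and the example.com URL are hypothetical.

```python
def pagination_links(base_url: str, page: int, total_pages: int) -> list[str]:
    """Build the rel="prev" / rel="next" link elements for one archive page."""

    def page_url(n: int) -> str:
        # Page 1 lives at the archive's base URL, later pages at /page/N/.
        return base_url if n == 1 else f"{base_url}page/{n}/"

    links = []
    if page > 1:
        links.append(f'<link rel="prev" href="{page_url(page - 1)}">')
    if page < total_pages:
        links.append(f'<link rel="next" href="{page_url(page + 1)}">')
    return links


# Page 2 of a 3 page category archive gets both links.
for link in pagination_links("https://example.com/category/news/", 2, 3):
    print(link)
```

The first page only gets a next link and the last page only a prev link, so the chain has a clear start and end.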
3.2 Disable unnecessary archives
If your blog is a single author blog, or you don’t think you need author archives, use the Yoast SEO plugin to disable the author archives. Also, if you don’t think you need a date based archive, disable it as we have. Even if you’re not using these archives in your template, someone might link to them and thus break your WordPress SEO…
Thirdly, you’ll want to make sure that if a bot goes to a category page, it can reach all underlying pages without any trouble. Otherwise, if you have a lot of posts in a category, a bot might have to go back 10 pages before being able to find the link to one of your awesome earlier posts…
There’s an easy fix; in fact, there are several plugins that deal with this. Our favorite one by far is WP-PageNavi, maintained by Scribu, one of the best WordPress developers around. If you have the Genesis Theme like we do here on Yoast.com, you can just enable numeric navigation under Theme Settings → Content Archives.
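To see why numeric navigation helps, here is a minimal sketch of the idea: every archive page links to the first page, the last page and a few neighbors, so a bot never has to click through all pages one by one. This is not WP-PageNavi's or Genesis's actual algorithm. The `numeric_pages` function and its `window` parameter are hypothetical.

```python
def numeric_pages(current: int, total: int, window: int = 2) -> list[int]:
    """Return the page numbers to link to from the current archive page."""
    # Always link to the first and the last page.
    pages = {1, total}
    # Add the pages within `window` steps of the current one.
    for n in range(current - window, current + window + 1):
        if 1 <= n <= total:
            pages.add(n)
    return sorted(pages)


# On page 5 of 10, the navigation links to these pages.
print(numeric_pages(5, 10))
```

With plain "older posts" links, the tenth page is nine clicks away from the first. With a navigation like this, it is one click away from every page.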
In February 2009, the major search engines introduced the rel="canonical" element. This is another utility to help fight duplicate content. WordPress has built-in support for canonical link elements on single posts and pages, but that support has some slight bugs. It doesn’t output canonical links on any other page. With our Yoast SEO plugin activated, you automatically get canonical link elements for every page type in WordPress.
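As an illustration of what a canonical link element does, here is a minimal sketch that maps several variants of a URL to one canonical form. It is not how WordPress or the Yoast SEO plugin compute canonicals. It assumes pretty permalinks with a trailing slash and simply drops the query string and fragment, which would be wrong for sites that use ?p=123 style URLs. The `canonical_link` function and the example.com URL are hypothetical.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_link(url: str) -> str:
    """Build a canonical link element for a URL.

    Assumption: the canonical form is a lowercase host, a path with a
    trailing slash, and no query string or fragment.
    """
    parts = urlsplit(url)
    path = parts.path if parts.path.endswith("/") else parts.path + "/"
    clean = urlunsplit((parts.scheme, parts.netloc.lower(), path, "", ""))
    return f'<link rel="canonical" href="{escape(clean, quote=True)}">'


# A tracking parameter and a fragment both collapse to the same canonical URL.
print(canonical_link("https://Example.com/hello-world?utm_source=feed#comments"))
```

Every duplicate variant of a page then points the search engines at the same single URL.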