How to find and fix on-site Duplicate Content
Many of you may already know that having duplicate content on your blog is BAD. Search engines (especially Google) penalise sites that have duplicate content, and this will negatively affect your overall search engine rankings. What you may not know is that you may be publishing duplicate content every time you hit the publish button on your WordPress site.
Test your URL with and without the WWW prefix
Before we begin: I am assuming you have not tackled this issue before and that no redirects have been set up on your site. If there are redirects in place, or if you have modified files like “.htaccess” or “robots.txt”, this article is NOT for you.
Let’s begin. A very simple check will show whether your blog is serving duplicate content to the search engines. Open your web browser and enter your blog or website URL with the WWW prefix (e.g. http://www.yourdomain.com). Does your website load? Now do it again without the WWW prefix (e.g. http://yourdomain.com). If your site loads at both addresses without one redirecting to the other, and you haven’t tinkered with redirects or preferred domain settings, you are probably publishing duplicate content.
If your site is reachable over both HTTP and HTTPS, those variants may also be causing duplicate content (e.g. http://www.yourdomain.com and https://www.yourdomain.com). Another such situation is a page that loads both with and without a trailing forward slash at the end of its URL (e.g. www.yourdomain.com/sample-page and www.yourdomain.com/sample-page/).
All the variations above are seen as unique pages by the Search Engines and thus regarded as being duplicate content. There are more situations where duplicate content comes into play, but these are the basics for beginners to consider.
Should you use WWW prefix or not
A clean install of WordPress will automatically default to resolving your website to the non-WWW version of the domain name (http://yourdomain.com). In the WordPress Dashboard, navigate to Settings >> General and set the WordPress Address (URL) and Site Address (URL) fields to the version you want, with or without the WWW prefix.
I won’t go into detail as to which is better (i.e. WWW or non-WWW). You simply need to decide which one you are going to use and stick to it indefinitely. You can change this later on, but doing so will have a major impact on your SEO and website traffic, so it’s not recommended unless you are prepared to take the knock.
Setting the Preferred Domain in Google Webmaster Tools
If you don’t already have a Google Webmaster Tools (Search Console) account, I highly suggest you get one. This post isn’t about the features of Webmaster Tools or how to use it, but I do share a little tip that makes all the difference to how Google indexes your URLs.
Google allows you to set your preferred URL in the Site Settings section of any domain you have added and verified within the Search Console. To do this, you first need to add both the WWW version and the non-WWW version of the domain name to the Search Console. Once both are verified, follow the steps below:
- Log in to Google Webmaster Tools
- Click on your preferred domain
- Click on the ‘cog icon’ (top right corner) and select the ‘Site Settings’ option
- Under the section ‘Preferred Domain’, select the option that suits you best
- That’s it. You will get a notification message to confirm the change
For the vast majority of users this should be sufficient, since you are mostly targeting traffic from Google. However, if you want to take this further, or want to define these settings for Bing or any other search engine, you need to get your hands dirty and play with some code…
Edit your .htaccess file to redirect traffic
Messing around with your .htaccess file isn’t recommended unless you know what you are doing. Also, I highly recommend you make a backup of this file before attempting any changes. Again, I’m not giving lessons on how to use FTP in this post. If you want to edit your .htaccess file, you will need to download a copy of this file from your Web Server. Personally, I recommend you contact your Web Hosting Company and ask them to edit your .htaccess file to create the necessary redirects to or from WWW or non-WWW as per your requirements.
For those of you comfortable with editing the .htaccess file, here is a code snippet to create the necessary redirects from non-WWW to WWW.
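A minimal sketch of such a redirect is shown below. It assumes your host runs Apache with mod_rewrite enabled and that your site is served over plain HTTP, as in the examples above. "yourdomain.com" is a placeholder, so swap in your own domain before using it.

```apache
# Sketch only: assumes Apache with mod_rewrite enabled.
# "yourdomain.com" is a placeholder; replace it with your own domain.
<IfModule mod_rewrite.c>
RewriteEngine On

# Only act when the request came in on the bare (non-WWW) host name.
# [NC] makes the host comparison case-insensitive.
RewriteCond %{HTTP_HOST} ^yourdomain\.com$ [NC]

# Send a permanent (301) redirect to the same path on the WWW host.
# [L] stops further rewrite rules from running for this request.
RewriteRule ^(.*)$ http://www.yourdomain.com/$1 [R=301,L]
</IfModule>
```

On a WordPress site, these lines would normally sit above the block that starts with "# BEGIN WordPress", so the redirect runs before WordPress's own rewrite rules. To redirect in the other direction (WWW to non-WWW), the condition would match ^www\.yourdomain\.com$ and the rule would point at http://yourdomain.com/$1 instead. If your site uses HTTPS, the target URL would need to start with https://.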
NOTE: The jury is still out when it comes to using the WWW prefix or not. Some say it’s best to use WWW and others say it’s best without WWW. Personally, I have used the WWW for most sites simply because that’s what the majority of visitors are familiar with. If a domain name is relatively long, then I would consider going without the WWW purely to shorten the URL. [Update: See my comment about dropping WWW on future sites]
Tackling Duplicate Content in more detail
Using tools like Majestic or Ahrefs, one can do a more detailed audit to find duplicate content that comes from varying URL strings, categories and taxonomies, but this falls outside the scope of this post. One last thing I will mention, which is rather complex to understand, is canonical redirects. I mention it here because, while researching ‘how to tackle duplicate content’, I came across this post, which takes what I’ve discussed to another level.
NOTE: There are plugins available that can help you with page/post redirects, and some that can also edit your .htaccess file for you. I use the Redirection plugin to do individual page/post redirects, but I don’t like plugins that edit my .htaccess file.
What measures do you take to prevent on-site duplicate content issues?