Duplicate content, plagiarism and impact on SEO
Duplicate content issues are common in SEO and can hurt your rankings in Google's results and, as a result, your online traffic.
Duplicate content is often cited as one of the five most common problems in SEO. It is therefore important to understand what can lead Google to consider content duplicated, with harmful consequences for your work of acquiring visibility and traffic.
The quality of content in SEO starts with unique texts, on and off the site.
What is duplicate content in SEO?
Duplicate content refers to text that is very similar, or exactly the same, and that appears on the Internet in more than one place. Each of those places is identified by a unique website address (URL). So, if the same content appears at multiple web addresses, you have duplicate content.
Duplicate content can be found across different sites, but also within the same site. It is estimated that 25-30% of the web is made up of duplicate content, which adds little or no value for visitors. URLs with little or no body content are also considered duplicates.
It has several disadvantages:
- It brings little or no added value to the search engine or to your visitors.
- It can harm your SEO work.
- It can have many origins, intentional or not.
Here's Google's definition of duplicate content: "Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar."
Why is Google fighting against it?
The first goal of the Mountain View company is to offer its users an optimal search experience on its engine. The second is to save resources by focusing on legitimate, unique texts.
Given these two objectives, it is important for the search engine to fight against pages that bring no particular value. Google therefore values and rewards original content. Favoring originality makes low-effort duplication a poor SEO investment while creating a better user experience.
When Googlebot visits your site, it saves the content in its databases, where it is compared to other content already indexed. If substantial matches are found, Google may decide that your content is duplicated.
Google will then decide which version is the most reliable and the most worthy of being shown to Internet users.
Duplication is a real problem when you have multiple versions of the same text on your site: in these cases, Google may struggle to choose which page to display.
What problems does it cause for SEO?
Duplication can have significant repercussions on your SEO depending on whether it is internal or external.
Internal and external duplicate content poses these concerns:
- Wasted crawl budget: Google limits its visits to your site to conserve its machine resources. Too much duplicate content can waste that crawl budget and prevent indexing resources from being focused on your unique, important content.
- Panda-type penalties: URLs with thin content can be considered duplicates, or too similar, and result in penalties. It is important to avoid automatically generating pages with poor, low-value text.
- Diluted netlinking: by offering different URLs for the same content, you increase the chances of external links being spread across several pages rather than centralizing backlinks on a single URL. Google will have a hard time consolidating link metrics, especially when other sites link to multiple versions.
- Lack of control over positioning: when Google detects duplicate URLs (title, descriptions, text, etc.), it generally favors one page and may display a different one than the one you want to highlight.
- Difficulty identifying the original: when multiple versions exist, it is hard for Google to determine which version to show in its search results and who the original author is. If your content is duplicated by a site with greater authority than yours, authorship may even be attributed to the higher-authority copier.
Is there a duplicate content penalty?
Duplication can hurt your SEO performance, but it won't incur a penalty unless you intentionally copy someone else's website. If you are an honest site owner and run into technical issues without trying to trick Google, you do not have to worry about a penalty.
If you have intentionally copied large amounts of content from other sites, you find yourself in a sticky situation.
"Duplicate content on a site is not grounds for action on that site unless it appears that the intent is to be misleading and to manipulate search engine results." — Google.
What are the main causes of duplicate content?
These problems can have many origins, more or less easy to identify depending on whether or not you are an SEO specialist. Here are the main causes of duplicated content.
Theft and copying
Google is not always able to distinguish the original from the copy. It is therefore important to monitor any copying of your texts. A number of tools make it possible to set up this kind of monitoring; Copyscape is the best known.
Duplication from one site to another
Some content, such as product sheets supplied by manufacturers, is typically found on a multitude of e-commerce sites. It is therefore important for your SEO strategy to use unique, relevant texts in all your publications.
Google considers each URL to be unique. Depending on the web development techniques and the CMS used, the home page of a site can be accessible from several addresses (for example /, /index.html and /index.php) and therefore present as many duplicates as there are addresses.
DUST (Duplicate URL, Same Text) occurs within the same site when the same content is visible via several different URLs.
Syndication and curation
When content is deliberately republished on other platforms to increase its visibility, it is important to set rules with the publishers you work with so that syndication does not turn into a duplicate-content SEO problem.
Ideally, the publisher should add a canonical tag to the syndicated article indicating that your site is the original source of the content. Another option is to use a noindex tag on the syndicated copies.
Session parameters and identifiers
Sites often use parameters for filtering or visitor tracking. Session IDs, for example, keep track of which items a visitor has placed in their shopping cart. These parameters or session identifiers are added to the original URL without modifying the content. Again, to a search engine, https://www.yoursite.com/ is a different page from a parameterized version such as https://www.yoursite.com/?sessionid=12345.
Dynamically generated URL parameters
They are often used to store certain user information (such as session IDs) or to display a slightly different version of the same page (such as a sorting or filtering adjustment).
This results in URLs that look like this (illustrative examples): https://www.yoursite.com/products?sort=price or https://www.yoursite.com/products?color=blue
These pages usually contain the same or very similar content, which is considered duplicate. Most of the time, these dynamic parameters create dozens of different versions. For e-commerce sites with hundreds or thousands of product references, this can become a major concern if it applies to every reference.
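As a toy illustration of the problem, the following Python sketch normalizes such URLs by stripping parameters that change tracking or display but not the content. The parameter names (`sessionid`, `sort`, `color`, `utm_source`) are hypothetical examples, not a standard list:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Parameters that change tracking/display but not the content itself
# (hypothetical list -- adapt it to your own site's parameters).
IGNORED_PARAMS = {"sessionid", "sort", "color", "utm_source", "utm_medium"}

def normalize(url: str) -> str:
    """Return the URL with content-neutral query parameters removed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in IGNORED_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

print(normalize("https://www.yoursite.com/products?sort=price&sessionid=123"))
# → https://www.yoursite.com/products
```

All the parameterized variants collapse to a single address, which is the behavior you want search engines to infer via redirects or canonical tags.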
The WWW and non-WWW versions of a site
Many people assume that www.example.com and example.com are the same. But these two URLs are completely different in the eyes of search engines. Allowing every page of a site to be reached in both configurations effectively duplicates the entire site.
This problem is usually solved by implementing 301 redirects or specifying your preferred domain in Search Console.
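On an Apache server, such a 301 redirect can be declared in the `.htaccess` file. A minimal sketch, assuming `mod_rewrite` is enabled and `example.com` stands for your own domain:

```apache
RewriteEngine On
# Permanently (301) redirect the bare domain to the www version
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]
```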
The HTTPS and HTTP versions of a site
Many sites have both a secure (https) and an insecure (http) version. As with www and non-www, https://www.example.com/ and http://www.example.com/ are not the same page. A site should be accessible through one or the other. Ideally, the preferred version should be secure (https): Google has indicated that having a secure site is a positive ranking factor and has announced that it indexes secure versions of pages first.
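Forcing the secure version can be handled the same way as the www redirect. A minimal Apache sketch, again assuming `mod_rewrite` is enabled:

```apache
RewriteEngine On
# Send every plain-HTTP request to its HTTPS equivalent with a 301
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://%{HTTP_HOST}/$1 [R=301,L]
```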
Poor and similar content
When we talk about duplication, we often imagine completely identical content. However, very similar content also falls within the definition.
"If you have several similar pages, consider expanding each one or consolidating them into one. For example, if you have a travel site with separate pages for two cities, but with the same information, you can either merge them into one page about both cities, or you can expand each to contain unique content about each city." — Google.
Such issues can frequently arise with e-commerce sites , with descriptions of similar products differing only in a few specifications.
Blogs often make it possible to group articles by theme and keyword via taxonomy features. Use this feature with tact: it can generate duplicate content very easily if the same content appears too often on the pages generated for each category or keyword.
Domain name migration
When you change your domain name, it is important to signal the change and redirect the old URLs to the new ones. A successful migration after a domain-name change, without losing your SEO work, is not complicated, but it does require a few specific operations.
How to avoid duplicate content issues on your site?
To combat these concerns, it is necessary to pay particular attention to these points:
- Do not use content already present on the Internet.
- Do not duplicate your content across several of your sites.
- Make sure other publishers are not using your content through plagiarism, even partial.
- Create something original and unique; do not look for shortcuts.
- Write substantial texts and do not duplicate even small parts of them.
- Fight plagiarism.
- Use a verification tool such as Copyscape or KillDuplicate.
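Beyond external tools like Copyscape, a quick in-house check of how similar two texts are can be sketched with Python's standard library. The 0.9 threshold below is an arbitrary illustration, not a value Google publishes:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Ratio between 0 (completely different) and 1 (identical)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Two hypothetical product descriptions differing only in one word
page_a = "Our red widget is sturdy, lightweight and ships worldwide."
page_b = "Our blue widget is sturdy, lightweight and ships worldwide."

score = similarity(page_a, page_b)
print(f"similarity: {score:.2f}")
if score > 0.9:  # arbitrary threshold for this illustration
    print("These pages are near-duplicates; consider merging them.")
```

This character-level ratio is a crude proxy; dedicated plagiarism tools compare against the whole indexed web, which a local script cannot do.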
Correcting duplicate content within a site means telling Google which page should be taken into account and which ones are copies.
In many cases, the best way to combat duplicate content is to set up a 301 redirect from the “duplicate” page to the original one. When multiple pages with the potential to rank well are combined into one, they not only stop competing with each other, they also create a stronger relevance and popularity signal overall.
This has a positive impact on the merged page's ability to rank well. Returning a 404 error instead would throw away the authority earned by the deleted page and its links.
The rel=canonical tags
Another solution is to use the rel=canonical tag. A canonical URL is the preferred version among a set of pages with similar content. The tag tells search engines that a given page should be treated as a copy of the specified URL, and that all links and popularity signals pointing at the copy should actually be credited to the canonical URL.
Choosing a canonical URL solves several problems:
- To set the URL to display in the results.
- To group links for similar or duplicate URLs.
- To simplify tracking statistics for a single product/topic.
- To manage syndicated content.
- To avoid wasting time crawling duplicate pages and to optimize the crawl budget. Only one version of each URL should be submitted to Google's crawlers via the sitemap file.
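In practice, the canonical is declared in the `<head>` of every duplicate page (and usually of the preferred page itself). A minimal sketch, assuming `https://www.yoursite.com/product` is the version you want indexed:

```html
<!-- Placed in the <head> of every variant (e.g. /product?sort=price),
     pointing search engines to the preferred URL -->
<link rel="canonical" href="https://www.yoursite.com/product" />
```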
Noindex Meta Robots
The "noindex,follow" robots meta tag can be particularly useful in dealing with these issues. It lets search engines follow the links on a page while preventing the page itself from being included in their index. Meta robots is a particularly effective solution for duplicate content related to pagination.
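A sketch of the tag as it would appear in the `<head>` of a paginated or duplicate page:

```html
<!-- Crawlers may follow the links on this page but must not index it -->
<meta name="robots" content="noindex, follow" />
```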