#1
Duplicate content: will PDF version hurt me?
Hello,
I have a VERY LONG academic article (more than 50,000 words) that I want to place online. Important terms such as autoimmunity, squamous, and mastopathy are sprinkled throughout it. I will divide the content into "chapters" of about 700 words each, and each chapter will also be available in PDF format.

I understand that search engines do not like duplicate content. What if I have the exact same content in BOTH the HTML version and the PDF version? Will my search-engine placement be compromised because of the duplicate content?

One SEO veteran told me to put the PDF files in a directory that my robots.txt blocks search engines from crawling, but I do not want to do this, because if someone searches specifically for PDF documents I want to be found:
http://www.google.com/search?as_q=mastopathy&hl=en&as_filetype=pdf

Will the PDF versions help or hurt me? One idea I had is to give my PDF versions a lower priority in my sitemap.xml file.

Please share your thoughts, thanks!
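For illustration, the lower-priority sitemap idea could be generated with a short script like the one below. This is a minimal sketch under assumptions: the base URL, the `/chapters/N.html` and `/pdf/N.pdf` layout, and the priority values 0.8 and 0.3 are all hypothetical. The sitemaps.org protocol describes `<priority>` as a hint about a URL's importance relative to other URLs on the same site, so search engines may give it little or no weight.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url, chapter_count, html_priority=0.8, pdf_priority=0.3):
    """Return sitemap XML that lists each chapter as HTML and as PDF.

    The URL layout (/chapters/N.html and /pdf/N.pdf) is hypothetical.
    PDF copies get a lower <priority>, which is only a relative hint.
    """
    # Writing xmlns as a plain attribute declares the default namespace.
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for n in range(1, chapter_count + 1):
        for path, priority in (
            (f"/chapters/{n}.html", html_priority),
            (f"/pdf/{n}.pdf", pdf_priority),
        ):
            url = ET.SubElement(urlset, "url")
            ET.SubElement(url, "loc").text = base_url.rstrip("/") + path
            ET.SubElement(url, "priority").text = f"{priority:.1f}"
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap("http://www.example.com/", chapter_count=2))
```

Both copies stay in the sitemap, so the PDFs remain discoverable for filetype searches; only the relative hint differs between the two formats.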

#2
Quote:
Search engines don't like it if you're using the same content as five other websites, but if you are repeating your own content on your own site, you are safe. What will happen is that the search engines will decide which version is more important, the HTML or the PDF, and will display only that one in the results pages. If it is vital that one version or the other shows up, there are several things you can do to tell the search engines that one is more important than the other. Take a look at how citeseer.ist.psu.edu does it.

#3
HTML/PDF
Quote:
If I am repeating my own content within my site and I don't care whether the HTML or the PDF version is displayed, should I still block the PDF with robots.txt or use nofollow, or is it okay to just include a link to the PDF file from the HTML page with the same content? Thanks!

Last edited by JagNet: January 2nd, 2008 at 11:04 AM. Reason: fix quote tags

#4
The approach I'd take is:
1) Embed links in the PDF back to your site and the article. If others link to the PDF version rather than the HTML pages, you'll still get the benefit of link juice flowing to other pages within your site. Also, if others link to the PDF and it has no embedded links, you lose the potential traffic benefits. How many times have you seen .pdf and .doc documents in the SERPs that contain no references or links to the site they came from?!

2) Nofollow the link from the HTML page to the PDF. Don't block the PDF version with robots.txt, otherwise you'll lose any benefit from other sites linking to it. The nofollowed link will stop internal link juice from being wasted on a page you don't intend to rank, and it'll be passed elsewhere in your site instead.

3) If you really don't want the PDF to appear in the SERPs, consider sending a noindex HTTP header with the PDF file:

X-Robots-Tag: noindex

Any incoming link juice to the PDF from other sites will still flow to your site through the links within the document.
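How the noindex header gets sent depends on the web server. On Apache with mod_headers, Google's documentation shows a `<FilesMatch "\.pdf$">` block containing `Header set X-Robots-Tag "noindex"`. As a server-neutral sketch, the Python handler below, built on the standard library's `http.server`, adds the header to PDF responses only. The names `extra_headers_for` and `NoindexPdfHandler`, and the choice to key on the `.pdf` extension, are assumptions for illustration; a live site would set the header in its real server configuration instead.

```python
import http.server
import posixpath
from urllib.parse import urlsplit


def extra_headers_for(path):
    """Return extra response headers for a request path (hypothetical helper).

    PDFs get X-Robots-Tag: noindex. Only noindex is sent, not nofollow,
    so the links inside the PDF are not marked nofollow.
    """
    # Ignore any query string or fragment when checking the extension.
    clean_path = urlsplit(path).path
    if posixpath.splitext(clean_path)[1].lower() == ".pdf":
        return [("X-Robots-Tag", "noindex")]
    return []


class NoindexPdfHandler(http.server.SimpleHTTPRequestHandler):
    """Static file handler that adds X-Robots-Tag to PDF responses."""

    def end_headers(self):
        # Inject our headers just before the header block is closed.
        for name, value in extra_headers_for(self.path):
            self.send_header(name, value)
        super().end_headers()


if __name__ == "__main__":
    # Serve the current directory on port 8000 (local testing only).
    server = http.server.ThreadingHTTPServer(("", 8000), NoindexPdfHandler)
    server.serve_forever()
```

With this running, `curl -I http://localhost:8000/some.pdf` should show the `X-Robots-Tag: noindex` line, while HTML pages are served without it.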
Viewing: SEO Chat Forums > Search Engine Strategies > Search Engine Optimization > Duplicate content: will PDF version hurt me?