Search Engine Optimization
 
#1  January 1st, 2008, 04:27 PM
hknight
Duplicate content: will PDF version hurt me?

Hello,

I have a VERY LONG academic article that I want to place online (more than 50,000 words).

The article has many important words sprinkled throughout it like autoimmunity, squamous, and mastopathy.

I will divide my content into "chapters" that are about 700 words long.

Each chapter will be available in PDF format.

I understand that search engines do not like duplicate content.

What if I have the exact content in BOTH the HTML version and the PDF version? Will my search-engine placement be compromised because of the duplicate content?

One SEO veteran told me to place the PDF files in a directory that my robots.txt blocks search engines from crawling. I do not want to do this, because I want to be found when someone searches specifically for PDF documents:

http://www.google.com/search?as_q=mastopathy&hl=en&as_filetype=pdf
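
For readers unfamiliar with the robots.txt approach mentioned above, here is a minimal sketch of what such a block would do, using Python's standard `urllib.robotparser`. The `/pdf/` directory, the `example.com` URLs, and the helper name `is_crawlable` are assumptions for illustration only; they are not taken from the poster's site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks a /pdf/ directory for all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /pdf/
"""


def is_crawlable(url, user_agent="*"):
    """Return True if the rules in ROBOTS_TXT allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URLs: one HTML chapter and its PDF twin in the blocked directory.
html_ok = is_crawlable("http://www.example.com/chapters/chapter-01.html")
pdf_ok = is_crawlable("http://www.example.com/pdf/chapter-01.pdf")
```

Under these assumed rules the HTML chapter stays crawlable and the PDF does not, which is exactly the trade-off described: a compliant crawler would not fetch the PDF, so it could not surface in a filetype search like the one linked above.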

Will the PDF versions help or hurt me?

One idea I had is to give my PDF versions a lower priority in my sitemap.xml file.
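
As a rough illustration of the sitemap idea, the sketch below builds a sitemap in which the PDF gets a lower `<priority>` than the HTML chapter. The URLs, the priority values, and the function name `build_sitemap` are hypothetical. Note that in the sitemaps protocol `<priority>` is only a hint about a URL's importance relative to other URLs on the same site; nothing in this thread establishes that it decides which duplicate a search engine shows.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML from (url, priority) pairs and return it as a string."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, priority in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # The protocol allows values from 0.0 to 1.0.
        ET.SubElement(url, "priority").text = f"{priority:.1f}"
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical chapter URLs: the HTML version is rated above its PDF twin.
sitemap_xml = build_sitemap([
    ("http://www.example.com/chapters/chapter-01.html", 0.8),
    ("http://www.example.com/pdf/chapter-01.pdf", 0.3),
])
```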

Please share your thoughts, thanks!

#2  January 1st, 2008, 08:30 PM
Powerspirit
Quote:
Originally Posted by hknight
I understand that search engines do not like duplicate content.

What if I have the exact content in BOTH the HTML version and the PDF version? Will my search-engine placement be compromised because of the duplicate content?


They don't like it when you use the same content as five other websites. If you are only repeating your own content on your own site, you are safe.

What will happen is that the search engines will decide which version is more important, the HTML or the PDF, and will display only that one in the results pages. If it is vital that one particular version shows up, there are several things you can do to tell the SEs that it is more important than the other.

Take a look at how citeseer.ist.psu.edu does it.
#3  January 2nd, 2008, 09:07 AM
entrancesw
HTML/PDF

Quote:
Originally Posted by Powerspirit
if you are repeating your own content on your own site then you are safe ...
What will happen is that the search engines will decide which is more important, the HTML version or the PDF and will display only it in the results pages.


If I am repeating my own content within my site and I don't care whether the HTML or the PDF version is displayed/viewed, should I still add a robots.txt exclusion or a nofollow, or is it okay to simply link to the PDF file from the HTML page with the same content? Thanks!

#4  January 2nd, 2008, 11:02 AM
JagNet
The approach I'd take is:

1) Embed links in the PDF back to your site and to the article. If others link to the PDF version rather than the HTML pages, you'll still get the benefit of link juice flowing to other pages within your site. Also, if others link to the PDF and it contains no embedded links, you lose the potential traffic. How many times have you seen .pdf and .doc documents in the SERPs that contain no references or links to the site they came from?!

2) Nofollow the link from the HTML page to the PDF. Don't block the PDF version with robots.txt, otherwise you'll lose any benefit from other sites linking to it.

The nofollowed link stops internal link juice being wasted on a page you don't intend to rank, so it is passed elsewhere in your site instead.

3) If you really don't want the PDF to appear in the SERPs, then consider sending a noindex HTTP header with the PDF file: X-Robots-Tag: noindex

Any incoming link juice to the PDF from other sites will still flow to your site through the links within the document.
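
Points 2 and 3 above could be sketched roughly as follows. This is only an illustration under assumptions: the helper names `pdf_link` and `response_headers`, the `hide_pdf` switch, and the example paths are hypothetical, and a real site would more likely set the header in its web server configuration than in application code.

```python
from html import escape


def pdf_link(href, text):
    """Render an HTML link to the PDF marked rel="nofollow" (point 2)."""
    return f'<a href="{escape(href, quote=True)}" rel="nofollow">{escape(text)}</a>'


def response_headers(path, hide_pdf=False):
    """Return response headers for a file as (name, value) pairs.

    When hide_pdf is True, PDF responses also carry X-Robots-Tag: noindex
    (point 3); other files are never given that header here.
    """
    is_pdf = path.lower().endswith(".pdf")
    content_type = "application/pdf" if is_pdf else "text/html; charset=utf-8"
    headers = [("Content-Type", content_type)]
    if is_pdf and hide_pdf:
        headers.append(("X-Robots-Tag", "noindex"))
    return headers


# Hypothetical chapter paths, for illustration only.
link = pdf_link("/pdf/chapter-01.pdf", "Download chapter 1 (PDF)")
headers = response_headers("/pdf/chapter-01.pdf", hide_pdf=True)
```

Keeping the header optional mirrors the advice in point 3: it only applies if you really don't want the PDF in the results, and it leaves the file crawlable so the links embedded in it can still be followed.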