Search Engine Optimization
 
#1
January 1st, 2008, 04:27 PM
hknight
Duplicate content: will PDF version hurt me?

Hello,

I have a VERY LONG academic article that I want to place online (more than 50,000 words).

The article contains many important keywords, such as autoimmunity, squamous, and mastopathy.

I will divide my content into "chapters" that are about 700 words long.

Each chapter will be available in PDF format.

I understand that search engines do not like duplicate content.

What if I have the exact content in BOTH the HTML version and the PDF version? Will my search-engine placement be compromised because of the duplicate content?

One SEO veteran told me to place the PDF files in a directory that my robots.txt blocks search engines from crawling. I do not want to do this, because I want to be found when someone searches specifically for PDF documents:

http://www.google.com/search?as_q=mastopathy&hl=en&as_filetype=pdf
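
For reference, here is a rough sketch of what the robots.txt approach would do. It uses Python's standard urllib.robotparser to evaluate a sample rule; the /pdf/ directory and the example.com URLs are made up for illustration and are not from my actual site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks every crawler from the /pdf/ directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /pdf/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical URLs: one HTML chapter and its PDF twin.
html_allowed = parser.can_fetch("*", "http://www.example.com/chapter-01.html")
pdf_allowed = parser.can_fetch("*", "http://www.example.com/pdf/chapter-01.pdf")

print("HTML crawlable:", html_allowed)
print("PDF crawlable:", pdf_allowed)
```

With a rule like this, a well-behaved crawler would skip everything under /pdf/, which is exactly why the PDFs would stop showing up in filetype searches.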

Will the PDF versions help or hurt me?

One idea I had is to give my PDF versions a lower priority in the sitemaps.xml file.
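
To make the sitemap idea concrete, here is a minimal sketch that builds a sitemap with a lower priority value for the PDF than for the HTML page. The URLs and the priority values 0.8 and 0.3 are assumptions chosen for illustration. As far as I understand the sitemap protocol, priority runs from 0.0 to 1.0 and is only a hint relative to other URLs on the same site, so this may not settle which version gets shown.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML from (url, priority) pairs and return it as a string."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, priority in entries:
        node = ET.SubElement(urlset, "url")
        ET.SubElement(node, "loc").text = url
        # Priority is written with one decimal place, e.g. 0.8
        ET.SubElement(node, "priority").text = f"{priority:.1f}"
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical URLs: the HTML chapter gets a higher priority than its PDF twin.
sitemap_xml = build_sitemap([
    ("http://www.example.com/chapter-01.html", 0.8),
    ("http://www.example.com/pdf/chapter-01.pdf", 0.3),
])
print(sitemap_xml)
```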

Please share your thoughts, thanks!

#2
January 1st, 2008, 08:30 PM
Powerspirit
Quote:
Originally Posted by hknight
I understand that search engines do not like duplicate content.

What if I have the exact content in BOTH the HTML version and the PDF version? Will my search-engine placement be compromised because of the duplicate content?


They don't like it when you use the same content as five other websites. If you are repeating your own content on your own site, you are safe.

What will happen is that the search engines will decide which version is more important, the HTML or the PDF, and will display only that one in the results pages. If it is vital that one particular version shows up, there are several things you can do to tell the search engines that it is more important than the other.

Take a look at how citeseer.ist.psu.edu does it.

#3
January 2nd, 2008, 09:07 AM
entrancesw
HTML/PDF

Quote:
Originally Posted by Powerspirit
if you are repeating your own content on your own site then you are safe ...
What will happen is that the search engines will decide which is more important, the HTML version or the PDF and will display only it in the results pages.


If I am repeating my own content within my site and I don't care whether the HTML or the PDF version is displayed, should I still add a robots.txt block or a nofollow, or is it okay to just link to the PDF file from the HTML page with the same content? Thanks!

Last edited by JagNet : January 2nd, 2008 at 11:04 AM. Reason: fix quote tags

#4
January 2nd, 2008, 11:02 AM
JagNet
The approach I'd take is:

1) Embed links in the PDF back to your site and the article. If others link to the PDF version rather than the HTML pages, you'll still get the benefit of link juice flowing to other pages within your site. Also, if others link to the PDF and there are no embedded links, you lose the potential traffic benefits. How many times have you seen .pdf and .doc documents in the SERPs that contain no references or links to the site from which they came?!

2) Nofollow the link from the HTML page to the PDF. Don't block the PDF version with robots.txt, otherwise you'll lose any benefit from other sites linking to it.

The nofollowed link will keep internal link juice from being wasted on a page you don't intend to rank, and instead it'll be passed elsewhere in your site.

3) If you really don't want the PDF to appear in the SERPs, then consider sending a noindex HTTP header for the PDF file: X-Robots-Tag: noindex

Any incoming link juice to the PDF from other sites will still flow to your site through the links within the document.
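
As a rough illustration of points 2 and 3, here is a minimal sketch in Python. The helper names nofollow_link and robots_headers, and the /pdf/chapter-01.pdf path, are hypothetical; how you actually attach the header depends on your web server or framework, so treat this as an outline of the idea rather than a drop-in solution.

```python
from html import escape


def nofollow_link(href, text):
    """Return an HTML anchor to the PDF that carries rel="nofollow" (point 2)."""
    return f'<a href="{escape(href, quote=True)}" rel="nofollow">{escape(text)}</a>'


def robots_headers(path):
    """Return extra HTTP response headers for a requested path (point 3).

    PDF files get X-Robots-Tag: noindex; everything else gets nothing extra.
    """
    if path.lower().endswith(".pdf"):
        return {"X-Robots-Tag": "noindex"}
    return {}


# Hypothetical usage for one chapter.
print(nofollow_link("/pdf/chapter-01.pdf", "Download this chapter as PDF"))
print(robots_headers("/pdf/chapter-01.pdf"))
print(robots_headers("/chapter-01.html"))
```

Note that the header only works if the crawler is allowed to fetch the PDF, which is another reason not to block it in robots.txt.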


© 2003-2008 by Developer Shed. All rights reserved. DS Cluster 3 hosted by Hostway