Page 1 of 2
    #1
    Contributing User
    SEO Chat Explorer (0 - 99 posts)

    Join Date
    May 2003
    Posts
    84
    Rep Power
    16

    If-Modified-Since http header


    Is this the header that Googlebot uses to know whether a page is new? If so, how can I tell whether my server sends this information in response to an HTTP GET request? (Is there a way for me to perform the request and see the results?)
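A header viewer is the easy route, but the same check can be scripted. Below is a minimal sketch in Python using only the standard library's `http.client`. `fetch_headers` is a hypothetical helper name. By default it sends a HEAD request, which asks for the headers without the page body. Some servers answer HEAD differently from GET, so `method="GET"` is available as a fallback:

```python
import http.client
from urllib.parse import urlsplit


def fetch_headers(url, method="HEAD"):
    """Request `url` and return (status, reason, [(header, value), ...])."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=10)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request(method, path)
        resp = conn.getresponse()
        resp.read()  # drain any body so the connection can close cleanly
        return resp.status, resp.reason, resp.getheaders()
    finally:
        conn.close()
```

For example, call `status, reason, headers = fetch_headers("http://www.example.com/")` and print each `name: value` pair to see everything the server sent. If no `Last-Modified` pair appears, the server is not sending one for that URL.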
  2. #2
  3. Moderator
    SEO Chat Good Citizen (1000 - 1499 posts)

    Join Date
    Jan 2003
    Location
    Madrid, Spain
    Posts
    1,382
    Rep Power
    17
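    No, If-Modified-Since is used so the server can tell Google the page hasn't been updated. The crawler sends the date of the copy it already has, and if nothing has changed since then, the server replies "304 Not Modified" instead of sending the page again. This can prevent Google from re-spidering the page unnecessarily and save you bandwidth.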

    Gringo.
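To watch this exchange from the crawler's side, a script can repeat a request and put the Last-Modified date from an earlier response into an If-Modified-Since header. The sketch below is a rough Python version that assumes a plain `http://` URL. `conditional_get` is an illustrative name and is not part of any library. A server that honors conditional requests should answer 304 with an empty body when the page hasn't changed since that date, and 200 with the full page otherwise:

```python
import http.client
from urllib.parse import urlsplit


def conditional_get(url, last_seen):
    """GET `url` with If-Modified-Since, as a crawler re-checking a page might.

    `last_seen` is the Last-Modified value (an HTTP-date string) from an
    earlier response. Returns (status, body); a 304 carries no body.
    """
    parts = urlsplit(url)  # this sketch handles plain http:// URLs only
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=10)
    try:
        conn.request("GET", path, headers={"If-Modified-Since": last_seen})
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()
```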
  4. #3
    Contributing User
    SEO Chat Explorer (0 - 99 posts)

    Join Date
    May 2003
    Posts
    84
    Rep Power
    16
    I see. Is there another header, similar to this one, that lets the bot know when the page was last updated, and therefore whether it needs to be refreshed?

    In other words, how can Freshbot know my page has changed, and what can I do to help it recognize that it has?
  6. #4
  7. Contributing User
    SEO Chat Adventurer (500 - 999 posts)

    Join Date
    Mar 2003
    Location
    Maine USA
    Posts
    524
    Rep Power
    16

    Re: If-Modified-Since http header


    Originally posted by "roni"

    Is this the header that Googlebot uses to know whether a page is new? If so, how can I tell whether my server sends this information in response to an HTTP GET request? (Is there a way for me to perform the request and see the results?)

    roni,

    http://www.ranks.nl/cgi-bin/ranksnl/spider/spider.cgi?lang=

    Cheers,
    theBear
  8. #5
    Contributing User
    SEO Chat Explorer (0 - 99 posts)

    Join Date
    May 2003
    Posts
    84
    Rep Power
    16
    Thanks.

    So I'm guessing from this page that my server doesn't return "Last-Modified" (it says "Last Modified: Unknown").

    So how do I make the server return Last-Modified? Is there something I can do with Apache or .htaccess?
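For a plain static file, a server typically takes Last-Modified from the file's modification time. Pages that the server treats as dynamic often have no file date to report unless the script or the server configuration adds one. The minimal Python sketch below turns a file's modification time into a correctly formatted header value. It assumes a local file path. `last_modified_header` is a hypothetical name and is not an Apache or .htaccess setting:

```python
import os
from email.utils import formatdate


def last_modified_header(path):
    """Build a Last-Modified header value from a file's modification time."""
    mtime = os.path.getmtime(path)  # seconds since the Unix epoch
    # HTTP dates are always expressed in GMT, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
    return formatdate(mtime, usegmt=True)
```

For example, a file whose modification time is the Unix epoch yields `Thu, 01 Jan 1970 00:00:00 GMT`.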
  10. #6
    Registered User
    SEO Chat Explorer (0 - 99 posts)

    Join Date
    May 2003
    Posts
    7
    Rep Power
    0
    I too am looking for this information. I have scoured the Apache site, httpd.conf, etc., and darned if I can find out how to accomplish this!

    Any insight would be appreciated. For whatever reason, Freshbot is not picking up the changes to my pages. I have even changed the name of one of my sites and it has not been picked up. However, new content is grabbed right away, which is frustrating!
  12. #7
    TJ
    Contributing User
    SEO Chat Explorer (0 - 99 posts)

    Join Date
    Apr 2003
    Posts
    39
    Rep Power
    16
    What is your PR?

    It also depends on your PR. If your PR is lower than 3 or 4, Freshbot will not visit your site very often.

    Tejas
  14. #8
  15. Professional SEO
    SEO Chat Adventurer (500 - 999 posts)

    Join Date
    May 2003
    Location
    Finland
    Posts
    710
    Rep Power
    16
    First, a text snippet that explains some things:

    "The Last-Modified header doesn't affect the browser in such a way that it requests the page again. The Last-Modified and If-Modified-Since headers are used by the server to determine whether a "304 Not Modified" status should be sent. The client does not use them, except to communicate the last-modified date of its cached file back to the server. The Last-Modified header is not a command to the client. However, the expiration date is a command to the client."
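The server-side decision described in that snippet can be sketched in a few lines of Python. `choose_status` is a made-up name. The sketch assumes the page's last-modification time is already known as a timezone-aware `datetime`. If the client's If-Modified-Since date is at or after that time, its cached copy is current and 304 is appropriate. A missing or unparseable date falls back to a normal 200 response:

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def choose_status(if_modified_since, last_modified):
    """Return 304 if the client's cached copy is still current, else 200.

    if_modified_since: raw If-Modified-Since request header value, or None.
    last_modified: timezone-aware datetime of the page's last change.
    """
    if not if_modified_since:
        return 200  # no cached copy to validate: send the full page
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # invalid date: treat it like a normal GET
    if since.tzinfo is None:  # "-0000" dates parse as naive; treat as UTC
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop sub-second precision.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200
```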

    To set Last-Modified, or to get Googlebot to always fetch the full page, you could:

    a) Send an HTTP 200 (OK) status with all your PHP (or whatever) files. To do this, use mod_headers with Apache, or with IIS use the MMC to make the correct settings.

    b) Set it programmatically.

    In ASP you can add a Last-Modified header with:
    Response.AddHeader "Last-Modified", "Mon, 01 Sep 1997 01:03:33 GMT"

    Or with PHP (header() must be called before any output is sent):

    header("Expires: Mon, 01 Jan 2001 00:00:01 GMT"); // date in the past
    header("Last-Modified: " . gmdate("D, d M Y H:i:s") . " GMT"); // always modified
    header("Cache-Control: no-cache, must-revalidate"); // HTTP/1.1
    header("Pragma: no-cache"); // HTTP/1.0

    But there are problems with the PHP + Apache 2.x series. Instead of using a dynamic date in Last-Modified, you have to use static text; otherwise the server will return 304 automatically after the second reload of the page. This is a PHP/Apache bug.
  16. #9
  17. Professional SEO
    SEO Chat Adventurer (500 - 999 posts)

    Join Date
    May 2003
    Location
    Finland
    Posts
    710
    Rep Power
    16
    Just started thinking (still waking up) and delved into the depths of RFC 2616. And...

    "A Last-Modified time, when used as a validator in a request, is implicitly weak unless it is possible to deduce that it is strong, using the following rules: [...]

    The presented Last-Modified time is at least 60 seconds before the Date value."

    So this may be the reason why

    header("Last-Modified: " . gmdate("D, d M Y H:i:s") . " GMT"); // always modified

    doesn't work. But setting the date 61 seconds in the past might do the trick ;)
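The 60-second rule and the "-61 seconds" idea can be checked with a small Python sketch. Both function names are illustrative. `is_implicitly_strong` covers only the one condition quoted above, not every rule in RFC 2616 section 13.3.3:

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def is_implicitly_strong(last_modified, date):
    """Check the quoted condition on two HTTP-date strings: is
    Last-Modified at least 60 seconds before the response's Date?"""
    modified = parsedate_to_datetime(last_modified)
    sent = parsedate_to_datetime(date)
    return sent - modified >= timedelta(seconds=60)


def backdated_last_modified(now=None, seconds=61):
    """Return an HTTP-date `seconds` before `now` (default: current UTC time)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return format_datetime(now - timedelta(seconds=seconds), usegmt=True)
```

Backdating by 61 seconds makes the emitted date satisfy that condition. Whether this actually avoids the Apache 2.x behavior described above is not tested by this sketch.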

    Also... mod_headers is not the only way. You could also use mod_setenvif, which sets environment variables on a per-browser (User-Agent) basis; mod_headers can then set headers conditionally on those variables. So you could serve human users HTTP 304, but robots always 200.
  18. #10
    Contributing User
    SEO Chat Explorer (0 - 99 posts)

    Join Date
    Mar 2003
    Posts
    36
    Rep Power
    16
    Since I launched a site about 4 months ago I have been disappointed by the
    low frequency of Freshbot visits and the absence of subsequent refreshing.

    The front page has PR 04, the inner pages have PR 03 and all of them are
    modified every week or so.

    It looks like updates only happened after deepbot visits and the resulting
    dances (as we knew them...), that is, roughly once per month.

    So I ran a diagnostic with the help of a webmaster-tools type site (forgot
    which) and the answer was that my server did not report the last update
    date and time in the HTTP header.

    I went on researching "HTTP headers" and found nothing practical or
    useful.

    So, could someone please help with the following, in an HTTP-for-dummies
    fashion (without suggesting that I could just switch servers; I can't):

    - How can I visualize the HTTP header my server sends to a visitor? Is there
    a small app for this? A DOS command? Something to add to the URL typed in
    the browser's address bar? Anything else? The source code never shows
    anything above the start of the HTML section of a page...

    - Is there any way to force the server to report the last update to visitors
    (INCLUDING SPIDERS!) without having access to it, through some instruction
    in the head section or elsewhere? I imagine that the "visit every now and then"
    META tag would not be a way out, since search engines disregard it,
    or so I've heard.

    Edited: by analogy, I know a tracking script that I use (but
    don't understand) that has a set of instructions that forces an
    unwilling server to spit out the REMOTE_HOST variable even when this
    function has been disabled by the server's admin to save on
    processing time. Maybe something similar is possible to make the
    update date and time appear and be sent to the visitor?


    - If not through the HTTP header, is there any way to let a spider know that
    the visited page has been updated, and when?

    If deep crawling as we knew it is to disappear, these questions become VERY
    important. Please help!

    Pierre
  20. #11
  21. Professional SEO
    SEO Chat Adventurer (500 - 999 posts)

    Join Date
    May 2003
    Location
    Finland
    Posts
    710
    Rep Power
    16
    So, could someone please help with the following, in an HTTP-for-dummies
    fashion
    Let's give it a try. However, this is such a vast and difficult area that I must warn you: there are no easy answers. You just have to read up and study most of it yourself (or pay someone ;)

    First... if the server is not sending a Last-Modified header, there's nothing wrong. This usually means that pages are created dynamically and thus are always refreshed. They get an automatic status of 200 (OK) unless someone has messed with the server settings to save bandwidth.

    How can I visualize the HTTP header my server sends to a visitor?
    You could do some coding (try a search for "server environment variables HTTP Headers" with the language of your choice).

    Or use a web service that checks headers for you, e.g. try a Google search for "HTTP Header Viewer" (with quotes).
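The "server environment variables" route shows the request side of the conversation. Under CGI and Python's WSGI, each header that a visitor or crawler sends arrives as an `HTTP_...` variable; for example, If-Modified-Since becomes `HTTP_IF_MODIFIED_SINCE`. The minimal WSGI sketch below lists those variables. `show_request_headers` is a hypothetical name. Note that it shows what the client sent. It does not show the response headers your server sends back, which is what a header viewer checks:

```python
def show_request_headers(environ, start_response):
    """WSGI app that echoes back the HTTP request headers it received.

    Each request header appears in environ with an HTTP_ prefix
    (Content-Type and Content-Length are exceptions and are not listed).
    """
    lines = [
        # e.g. HTTP_IF_MODIFIED_SINCE -> If-Modified-Since
        key[len("HTTP_"):].replace("_", "-").title() + ": " + str(value)
        for key, value in sorted(environ.items())
        if key.startswith("HTTP_")
    ]
    body = ("\n".join(lines) + "\n").encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, run `from wsgiref.simple_server import make_server` and then `make_server("", 8000, show_request_headers).serve_forever()`, and open http://localhost:8000/ in a browser.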
  22. #12
  23. Contributing User
    SEO Chat Adventurer (500 - 999 posts)

    Join Date
    Mar 2003
    Location
    Maine USA
    Posts
    524
    Rep Power
    16
    Originally posted by "Popup"

    Since I launched a site about 4 months ago I have been disappointed by the
    low frequency of Freshbot visits and the absence of subsequent refreshing.

    The front page has PR 04, the inner pages have PR 03 and all of them are
    modified every week or so.

    It looks like updates only happened after deepbot visits and the resulting
    dances (as we knew them...), that is, roughly once per month.

    So I ran a diagnostic with the help of a webmaster-tools type site (forgot
    which) and the answer was that my server did not report the last update
    date and time in the HTTP header.

    I went on researching "HTTP headers" and found nothing practical or
    useful.

    So, could someone please help with the following, in an HTTP-for-dummies
    fashion (without suggesting that I could just switch servers; I can't):

    - How can I visualize the HTTP header my server sends to a visitor? Is there
    a small app for this? A DOS command? Something to add to the URL typed in
    the browser's address bar? Anything else? The source code never shows
    anything above the start of the HTML section of a page...
    http://www.ranks.nl/cgi-bin/ranksnl/spider/spider.cgi?lang=


    Originally posted by "Popup"



    - Is there any way to force the server to report the last update to visitors
    (INCLUDING SPIDERS!) without having access to it, through some instruction
    in the head section or elsewhere? I imagine that the "visit every now and then"
    META tag would not be a way out, since search engines disregard it,
    or so I've heard.

    Edited: by analogy, I know a tracking script that I use (but
    don't understand) that has a set of instructions that forces an
    unwilling server to spit out the REMOTE_HOST variable even when this
    function has been disabled by the server's admin to save on
    processing time. Maybe something similar is possible to make the
    update date and time appear and be sent to the visitor?
    There are methods to determine the host's system type from quirks of its TCP stack implementation. This may be what your script does. The value isn't being sent by the host; it is being guessed by the tracking script (quite accurately in a lot of cases).

    Originally posted by "Popup"



    - If not through the HTTP header, is there any way to let a spider know that
    the visited page has been updated, and when?

    If deep crawling as we knew it is to disappear, these questions become VERY
    important. Please help!

    Pierre
    I seriously doubt you'll be able to tell a Google spider when to visit; how often it visits, and maybe how deep it goes, appears to be related to your PR. I'd further bet that Freshbot (in the past and maybe now) follows links it did not cover on pages it fetched in prior trips to your site, along with the first few pages of your site, to see if you added a new section. It also appears that Deepbot followed a similar pattern, just going for many more pages.

    It used to take several Deepcrawls to index a large site.

    Cheers,
    theBear
  24. #13
    Contributing User
    SEO Chat Explorer (0 - 99 posts)

    Join Date
    Mar 2003
    Posts
    36
    Rep Power
    16
    To 2K:

    Thanks for your answer. Don't get me wrong, it's not laziness (I knew NOTHING
    a few months back...) but rather a compromise between getting something
    to work FAST, even if not well understood, and a sound learning curve.

    Found a few header viewers, thanks.

    The only dynamically created parts of my pages are two lines coming from
    Googletrax and Spydertrax (I'm the belt-AND-suspenders type...). The
    pages are HTML but my .htaccess tells the server to parse them as SHTML. And
    the header does indeed report a 200 status. Would that be because of these
    two lines only?

    Now, would that 200 status mean to Freshbot that the page is frequently
    (or constantly) updated and hence worth frequent visits? Or would the
    absence of a "last-modified" date in the header discourage Freshbot from
    visiting more than once a month AND refreshing the index? If this is true,
    the problem remains of making the "last-modified" date appear somehow.
    But how to do it? Would removing the SSI instructions for
    Googletrax and Spydertrax restore the "last-modified" date in the
    header, since the pages would no longer be dynamically updated? (It would
    be ironic if a spider tracking script indirectly prevented a
    spider from acting!)
    In fact, the front page (/index.html) status IS 200 and its PR IS 04, but
    Freshbot only comes once a month and never bothers to refresh anything
    in the index. Only Deepbot does (when it used to come...). The same goes for
    all the other pages (all in the root), except that their PR is 03.

    Edited: in the meantime, I've temporarily removed the SSI instructions from
    one of the pages. Result: no change. Return code: OK (200)
    Last Modified: Unknown


    Since the last "true" dance I've tweaked all pages a lot; I've even
    301-redirected to a new domain where my main keyword is more
    prominent. That's why I'm impatient to see some refreshing by Google.

    Because I'm seeing various comments in this forum about sites with only PR04
    being promptly refreshed in the Google index (even now) after frequent
    visits by Freshbot, I must think that there is something wrong somewhere
    in my case.


    To theBear:

    Thanks for your tips!

    I didn't explain clearly:

    I understand that the only way to "seduce" Freshbot is to show him, DURING a
    visit, that the site is frequently updated (and has a good PR), and that
    you can't book a Freshbot visit on the phone (what a dream...). So my last
    question was: would there be any way of writing on the walls, other than the
    HTTP header, for a spider to see DURING its crawl (like a meta tag or
    something) that the page has been recently changed, given that my bad-tempered
    server doesn't send this info in the HTTP header?

    Sorry for this lengthy post, and thanks for your patience with a dummy, but I
    think that many of us would love to improve our relationship with Freshbot
    (or whatever future FreshDeep hybrid Google is cooking up), and it has
    complicated aspects, as 2K has said.

    Pierre
  26. #14
  27. Contributing User
    SEO Chat Adventurer (500 - 999 posts)

    Join Date
    Mar 2003
    Location
    Maine USA
    Posts
    524
    Rep Power
    16
    Originally posted by "Popup"


    To theBear:

    Thanks for your tips!
    You're welcome.

    Originally posted by "Popup"



    I didn't explain clearly:

    I understand that the only way to "seduce" Freshbot is to show him, DURING a
    visit, that the site is frequently updated (and has a good PR), and that
    you can't book a Freshbot visit on the phone (what a dream...). So my last
    question was: would there be any way of writing on the walls, other than the
    HTTP header, for a spider to see DURING its crawl (like a meta tag or
    something) that the page has been recently changed, given that my bad-tempered
    server doesn't send this info in the HTTP header?

    Sorry for this lengthy post, and thanks for your patience with a dummy, but I
    think that many of us would love to improve our relationship with Freshbot
    (or whatever future FreshDeep hybrid Google is cooking up), and it has
    complicated aspects, as 2K has said.

    Pierre

    I think I understood what you were getting at; maybe I wasn't clear with my answer.

    I believe Google schedules the bot visits according to PR and available time/crawlers.

    Thus, the lower down the food chain (PR) you are, the less frequently you get visited, which makes sense from what I'm seeing.

    Google doesn't care how often you change your content, and it doesn't care what your cache / modification headers (or lack thereof) say.

    Everyone gets placed in a hopper (dang rabbits), and if your site pops up before the next cycle starts, then the bot visits. On the next cycle the process starts all over.

    Remember, PR4 for one site is not exactly the same as PR4 for another site. The sites may occupy vastly different points in the hopper. That is, there may be 3 million sites (or more) between two PR4 sites.

    Cheers,
    theBear
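theBear's "hopper" description can be made concrete with a toy model. All sites are ranked by some score, and the crawler takes only as many as it has capacity for in each cycle. The model also shows that two sites with the same rounded "PR4" can land on opposite sides of the cut. This is purely an illustration of the idea as described in the post. The scores, the capacity and the rounding are invented, and this is not how Google actually schedules crawling:

```python
import heapq


def crawl_cycle(scores, capacity):
    """Toy 'hopper': return the `capacity` sites with the highest scores.

    scores maps site name -> a hypothetical priority score. Sites that
    miss the cut simply wait for the next cycle.
    """
    ranked = heapq.nlargest(capacity, scores.items(), key=lambda item: item[1])
    return [site for site, _ in ranked]


def toolbar_pr(score):
    """Hypothetical coarse display value: both 4.1 and 4.9 show as 'PR4'."""
    return int(score)
```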
  28. #15
  29. Professional SEO
    SEO Chat Adventurer (500 - 999 posts)

    Join Date
    May 2003
    Location
    Finland
    Posts
    710
    Rep Power
    16
    The
    pages are HTML but my .htaccess tells the server to parse them as SHTML. And
    the header does indeed report a 200 status
    And that is why your server does not return Last-Modified: SHTML files are usually treated as dynamic files.

    To add headers in PHP or ASP, see the examples in my posts above. For other languages, try Google for "LanguageX set last-modified".

    Happy holidays, I'm off for one week to the beach ;)
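As one "other language" example, here is a rough Python/WSGI counterpart of the PHP `header()` calls quoted earlier in the thread. The names `no_cache_headers` and `page` are illustrative. By default, Last-Modified is the current time, which mirrors the PHP "always modified" line. To report when the page really changed, pass a file's modification time instead:

```python
import time
from email.utils import formatdate


def no_cache_headers(last_modified=None):
    """Headers equivalent to the PHP header() example earlier in the thread.

    last_modified: Unix timestamp to report; defaults to "now", mirroring
    the PHP "always modified" line.
    """
    if last_modified is None:
        last_modified = time.time()
    return [
        ("Expires", "Mon, 01 Jan 2001 00:00:01 GMT"),  # date in the past
        ("Last-Modified", formatdate(last_modified, usegmt=True)),
        ("Cache-Control", "no-cache, must-revalidate"),  # HTTP/1.1
        ("Pragma", "no-cache"),  # HTTP/1.0
    ]


def page(environ, start_response):
    """Minimal WSGI app serving a fixed HTML page with those headers."""
    body = b"<html><body>Hello</body></html>"
    headers = [("Content-Type", "text/html"), ("Content-Length", str(len(body)))]
    start_response("200 OK", headers + no_cache_headers())
    return [body]
```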

    PS. You can always ask your server host/admin for their settings concerning last-modified headers.

Similar Threads

  1. Replies: 2
    Last Post: Sep 12th, 2003, 09:28 AM
  2. Atill unsure about HTTP header effect on Search Engines
    By robbielockie in forum Google Optimization
    Replies: 6
    Last Post: Aug 12th, 2003, 12:02 PM
  3. Server HTTP HEADERS and Search engines
    By robbielockie in forum Google Optimization
    Replies: 2
    Last Post: Jul 30th, 2003, 10:20 AM
  4. will google / users frown upon a 80Kb flash header?
    By Gary in forum Google Optimization
    Replies: 11
    Last Post: Jun 15th, 2003, 03:09 PM
  5. Anyone else ever searched for 'http' in google?
    By digitalirony in forum SEO Help (General Chat)
    Replies: 9
    Last Post: Feb 28th, 2003, 02:39 AM
