#1
  1. No Profile Picture
    Newbie
    SEO Chat Explorer (0 - 99 posts)

    Join Date
    Jul 2016
    Posts
    5
    Rep Power
    0

    How to use the database?


    Hi. I have built a database. For example, I read all the domains and their robots.txt files. Today I have about 5 million domains. I can run different queries against it, such as selecting the sites that contain partner-pub. I do not know what this is good for or who might need it.
    Please advise what useful things I could do with my database.
    P.S.
    I am too shy to show my website. I am not selling anything.
  2. #2
  3. SEO Since 97
    SEO Chat Mastermind (5000+ posts)

    Join Date
    Mar 2011
    Location
    Arizona
    Posts
    8,766
    Rep Power
    5665
    Your question, or lack of a question, makes no sense at all.
    Try again.
  4. #3
  5. Dinosaur
    SEO Chat Mastermind (5000+ posts)

    Join Date
    Jun 2011
    Location
    UK
    Posts
    5,147
    Rep Power
    7343
    I can tell you exactly what to do with the database



    It's totally worthless.
    5 million records from around 1 billion websites? Such a small sample, and I bet it's way out of date as well.
    Owner of Page Explorer the page onsite SEO checker
    Useful Tools: Site Crawler: Screaming Frog | Free SSL: Cloudflare | Backlinking 101: Backlinking 101
  6. #4
  7. No Profile Picture
    Newbie
    SEO Chat Explorer (0 - 99 posts)

    Join Date
    Jul 2016
    Posts
    5
    Rep Power
    0
    Test-Ok,
    I do not know how to ask questions. I realize that nobody needs this.
    ---
    Chedders,
    For example, I can list all the home pages that contain the code <a href = "https://www.semrush.com/sem/?ref Why would I need that? I do not know.
    Or find any code, including JS or CSS. Or extract all alt attributes that contain a specific word or domain. Why do I need it? I don't know.
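    The alt-attribute query above can be sketched roughly as follows. This is only an illustration using Python's standard `html.parser`; the names `AltCollector` and `find_alts` and the sample page are hypothetical and not taken from the actual database.

```python
from html.parser import HTMLParser


class AltCollector(HTMLParser):
    """Collects alt attribute values that contain a given word."""

    def __init__(self, word):
        super().__init__()
        self.word = word.lower()
        self.matches = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        for name, value in attrs:
            if name == "alt" and value and self.word in value.lower():
                self.matches.append(value)


def find_alts(html, word):
    """Return every alt value in `html` that contains `word` (case-insensitive)."""
    collector = AltCollector(word)
    collector.feed(html)
    return collector.matches


# Hypothetical stored page, for illustration only
page = '<img src="a.png" alt="SEO tools"><img src="b.png" alt="Cat"><img src="c.png">'
print(find_alts(page, "seo"))
```

    Parsing every stored page on each query would be slow at millions of pages, so a real system would more likely extract such attributes once at crawl time and index them.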
    For example, I learned that executable code is being distributed in robots.txt files. That is, first come the rules, then the code. Why do I need it? I don't know.
    ---
    Today I do not strip the meta tags; I keep a full copy of the page, preserving letter case. Even so, out of every 3 sites read (sparsennyh), only one page is written to the database, because the rest are absolutely identical (I compare MD5 sums). That is, there may be a billion sites, but only about one-third of the home pages will be unique. And this is with no filters in my database.
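    The MD5 comparison described above might look something like this minimal sketch. The function name `store_unique`, the `(domain, body)` input shape, and the sample domains are assumptions made for illustration.

```python
import hashlib


def store_unique(pages):
    """Keep one copy per distinct page body, keyed by its MD5 hex digest.

    `pages` is an iterable of (domain, body_bytes) pairs (hypothetical shape).
    Returns (unique, duplicates):
      unique     maps digest -> first domain seen with that body
      duplicates maps domain -> the domain whose copy was kept
    """
    unique = {}
    duplicates = {}
    for domain, body in pages:
        digest = hashlib.md5(body).hexdigest()
        if digest in unique:
            duplicates[domain] = unique[digest]
        else:
            unique[digest] = domain
    return unique, duplicates


# Hypothetical crawl results: two parked domains serving the same bytes
pages = [
    ("a.example", b"<html>parked</html>"),
    ("b.example", b"<html>parked</html>"),
    ("c.example", b"<html>shop</html>"),
]
unique, duplicates = store_unique(pages)
print(len(unique), duplicates)
```

    Note that an MD5 match only catches byte-identical pages; pages that differ by a timestamp or a session token would still be stored separately.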
    I check 250 thousand links per day; this can be increased to 1 million links per day on 1 computer with an internet speed of 50 mb. With ten computers, you could read everything in half a year. Accordingly, with enough resources it would be possible to read ten billion pages in ten days. So far I have read six million sites. Who needs it?
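    As a rough check of those figures, here is a back-of-the-envelope sketch. It assumes the "around 1 billion websites" estimate quoted earlier in the thread, one fetch per site, and a steady rate; the function names are hypothetical.

```python
def days_to_crawl(total_pages, pages_per_day, machines):
    """Days needed to fetch total_pages once at a steady per-machine rate."""
    return total_pages / (pages_per_day * machines)


def machines_needed(total_pages, pages_per_day, days):
    """Smallest whole number of machines that finishes within `days`."""
    capacity = pages_per_day * days      # pages one machine fetches in the window
    return -(-total_pages // capacity)   # ceiling division on integers


print(days_to_crawl(1_000_000_000, 250_000, 10))       # 400.0
print(days_to_crawl(1_000_000_000, 1_000_000, 10))     # 100.0
print(machines_needed(10_000_000_000, 1_000_000, 10))  # 1000
```

    Under these assumptions, "half a year" on ten computers falls between the current rate (400 days) and the hoped-for rate (100 days), and ten billion pages in ten days would take on the order of a thousand machines.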
  8. #5
  9. No Profile Picture
    Newbie
    SEO Chat Explorer (0 - 99 posts)

    Join Date
    Jul 2016
    Posts
    5
    Rep Power
    0
    sparsennyh = reads
  10. #6
  11. Dinosaur
    SEO Chat Mastermind (5000+ posts)

    Join Date
    Jun 2011
    Location
    UK
    Posts
    5,147
    Rep Power
    7343
    It's really hard to understand your English, but I think I understand what you mean.

    The issue is that reading websites is not hard; it's 1 line of code. Extract all the links, put them in a queue and, hey presto, you have a spider or a bot. It's only of any use if you want to create a search engine like Google. But just having the data is worthless unless you have the ability to analyze it and display it to users in a useful way. Google, for example, does exactly this.
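    The "extract the links and put them in a queue" idea can be sketched as below. This is a minimal illustration, not a production crawler: `fetch` is a caller-supplied function (an assumption, so the sketch runs without network access), and it ignores robots.txt, politeness delays, and error handling.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl; `fetch(url)` returns HTML text, or None on failure."""
    queue = deque([start_url])
    seen = {start_url}          # URLs already queued, to avoid loops
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)   # resolve relative links
            if absolute.startswith("http") and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Hypothetical three-page "web" standing in for real HTTP requests
site = {
    "http://a.example/": '<a href="/b">b</a><a href="http://c.example/">c</a>',
    "http://a.example/b": '<a href="/">home</a>',
    "http://c.example/": "",
}
print(crawl("http://a.example/", site.get))
```

    In a real crawler, `fetch` would wrap an HTTP client; the queue and the `seen` set are the parts that make it a spider.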

    A 50mb broadband line is nothing in the scheme of things. You need massive amounts of bandwidth to capture the entire internet, and again massive amounts of storage to store it.


    This image is just part of what Google uses to house its systems, and even they can't keep it up to date in real time.

    So I have no idea where you're going with this. Are you thinking of taking on Google from your garage?
    Last edited by Chedders; Jul 24th, 2016 at 06:19 AM.
  12. #7
  13. No Profile Picture
    Newbie
    SEO Chat Explorer (0 - 99 posts)

    Join Date
    Jul 2016
    Posts
    5
    Rep Power
    0
    Chedders,
    Sorry for my English.
    I made two mistakes. First, I built everything as a single database; it should have been a distributed storage system. Now I understand how to do that.
    Second, I did not understand the importance of character encodings, and now I have difficulty indexing Arabic and Japanese sites.
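    One common way to approach the encoding problem is to decode each page with the charset it declares and fall back to UTF-8 when that fails. The sketch below assumes the declared charset (for example from the Content-Type header) was saved with the page; `decode_page` is a hypothetical helper, not part of the existing database.

```python
def decode_page(raw, declared=None):
    """Decode page bytes, trying the declared charset first, then UTF-8.

    Returns (text, encoding_used). If neither works, undecodable bytes are
    replaced with U+FFFD so the page can still be stored and searched.
    """
    for encoding in (declared, "utf-8"):
        if not encoding:
            continue
        try:
            return raw.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            # LookupError: the declared charset name is unknown to Python
            continue
    return raw.decode("utf-8", errors="replace"), "utf-8 (lossy)"


# Hypothetical pages: one Japanese (Shift_JIS), one Arabic (UTF-8)
japanese = "こんにちは".encode("shift_jis")
arabic = "مرحبا".encode("utf-8")
print(decode_page(japanese, "shift_jis"))
print(decode_page(arabic))
```

    Storing the text as Unicode after decoding, together with the encoding that was used, keeps later indexing independent of each site's original charset.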
    I also have very great difficulties with LIKE queries, but that is a table design error.
    I have no intention of beating Google. For example, I can list all the sites that carry a semrush ref link. I do not care about presenting the information conveniently; the result is simply a list of sites. In Google, you will not be able to do this. Such queries do not need data centers; hundreds of computer enthusiasts would be enough.
    However, since it is of no use to anyone, there is no sense in dreaming about those computers.
  14. #8
  15. Super Moderator
    SEO Chat Mastermind (5000+ posts)

    Join Date
    Mar 2004
    Location
    Gloucester (South West UK).
    Posts
    6,533
    Rep Power
    3522
    Originally Posted by Chedders
    It's really hard to understand your English, but I think I understand what you mean
    You're a feckin' genius then... I read the thread twice and I have no idea what he's saying (even with your reply to 'point me in the right direction')!
    ClickyB
    "The quality of the visitor is more important than the volume..." (Egol 22nd Feb 2008)
    [New to SEO/SeoChat?] [Canonical Problems?] [Forum Rules & Posting Guidelines]
  16. #9
  17. Dinosaur
    SEO Chat Mastermind (5000+ posts)

    Join Date
    Jun 2011
    Location
    UK
    Posts
    5,147
    Rep Power
    7343
    LOL
    He has written a crawler and gathered 5 million domain names, and is looking for ideas on what to do with the database, I think.
    To be honest, it's about 30 minutes of coding, maybe a bit more if you want it multi-threaded, and then you just let it run. It's schoolboy stuff and worth nothing. The only limits are bandwidth and disk storage, and it's something any competent coder can do. Pointless.

    Comments on this post

    • ClickyB agrees : feckin' genius... (sorry no rp4u)!
  18. #10
  19. No Profile Picture
    Newbie
    SEO Chat Explorer (0 - 99 posts)

    Join Date
    Jul 2016
    Posts
    5
    Rep Power
    0
    Someone showed me an example of such a site, thank you: publicwww.com
    I have 11 million domains; they have 183 million domains.
    I also have technical and financial problems. However, the search idea is exactly the same.
    Last edited by Chedders; Aug 24th, 2016 at 02:07 AM.
