Check thousands of web domains from a list daily, searching for new pages containing certain keywords
€30-250 EUR
Closed
Posted about 8 years ago
Paid on delivery
Write software, running 24/7 on an internet-connected server, that checks thousands of web domains from a list every day, searching for newly published pages that contain keywords taken from a second list.
There are about 10,000 web domains to check, each with hundreds of pages, so the software has to check about 1,000,000 web pages daily.
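As a rough feasibility check, the daily volume above can be turned into a sustained request rate. The sketch below only does that arithmetic; the function name `required_rate` and the assumption that requests are spread evenly over the available hours are illustrative, not part of the brief.

```python
def required_rate(pages_per_day: int, hours_available: float = 24.0) -> float:
    """Average pages per second needed to finish within the time window."""
    return pages_per_day / (hours_available * 3600)


if __name__ == "__main__":
    # 1,000,000 pages spread over a full day
    print(f"{required_rate(1_000_000):.1f} pages/second")  # prints 11.6 pages/second
```

Spread over 10,000 domains, 1,000,000 pages a day is about 100 requests per domain per day, so the load on any single site stays low even though the total is large.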
The difficulty is the volume: the software has to check a very large number of web pages every day, and it has to find new pages containing the keywords no more than one week after those pages were published.
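One possible way to tell which pages are new is to remember every URL already seen and treat anything else as new. The sketch below assumes a small SQLite table for that purpose; the table name `seen`, the helper names and the date format are hypothetical choices, not requirements from the brief.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the store of URLs that were already seen."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seen (url TEXT PRIMARY KEY, first_seen TEXT)"
    )
    return conn


def record_new(conn, urls, today):
    """Insert unseen URLs and return only those that were not known before."""
    new_urls = []
    for url in urls:
        cur = conn.execute(
            "INSERT OR IGNORE INTO seen (url, first_seen) VALUES (?, ?)",
            (url, today),
        )
        if cur.rowcount == 1:  # the row was inserted, so the URL is new
            new_urls.append(url)
    conn.commit()
    return new_urls
```

Storing the first-seen date also gives a fallback estimate of a page's age when the page itself carries no publication date.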
Time is crucial: pages must be new, no more than 7 days past their publication date.
There are not many keywords to check: 10 or 20, no more.
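The two conditions above (a short keyword list and a 7-day limit) could be checked per page roughly as follows. This is a minimal sketch: it assumes the page text and a publication date have already been obtained somehow, uses plain case-insensitive substring matching, and the keyword values shown are placeholders.

```python
from datetime import datetime, timedelta

KEYWORDS = ["example", "sample"]  # hypothetical placeholders for the real list


def matched_keywords(text, keywords):
    """Return the keywords that occur in the text, ignoring case."""
    lowered = text.lower()
    return [k for k in keywords if k.lower() in lowered]


def is_fresh(published, now, max_age_days=7):
    """True if the page was published within the last max_age_days days."""
    age = now - published
    return timedelta(0) <= age <= timedelta(days=max_age_days)
```

Substring matching also hits longer words that contain a keyword, so a word-boundary match may be preferable depending on the keywords.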
Perhaps the software can start by checking all domains once and building a tree map of each site, then cut every branch of the tree that doesn't contain any of the keywords.
On later runs the software would check for new pages only in the "right" branches of the tree, not in the whole tree.
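The pruning idea can be sketched by treating the first path segment of each URL as a "branch" and keeping only branches where the initial full crawl found a keyword. The choice of one path segment as the branch depth and the names `branch_of`, `relevant_branches` and `should_crawl` are assumptions made for illustration.

```python
from urllib.parse import urlparse


def branch_of(url, depth=1):
    """Identify a branch by its domain plus the first `depth` path segments."""
    parsed = urlparse(url)
    segments = [s for s in parsed.path.split("/") if s]
    return (parsed.netloc, *segments[:depth])


def relevant_branches(page_hits, depth=1):
    """page_hits maps url -> True/False (keyword found during the full crawl)."""
    return {branch_of(url, depth) for url, hit in page_hits.items() if hit}


def should_crawl(url, branches, depth=1):
    """On later runs, visit a URL only if it lies in a kept branch."""
    return branch_of(url, depth) in branches
```

For example, if `http://example.com/news/a` matched a keyword and `http://example.com/about/team` did not, later runs would follow `http://example.com/news/b` but skip `http://example.com/about/jobs`. The trade-off is that a matching page published in a pruned branch would be missed, which is the risk the brute-force approach avoids.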
This is the only strategy I can imagine to reduce the number of pages to be checked, but I need advice. I would prefer a "brute force" approach, checking all the pages in all domains every day, if this can be done.
Every time the software finds a new web page containing one of the keywords, it should email the link to the new page.
One email a day is enough, with the links to all the new web pages found.
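The daily email could be assembled with Python's standard `email` package, as in the sketch below. It only builds the message; the addresses, the subject wording and the function name `build_digest` are hypothetical, and actually sending it would need SMTP server details that the brief does not give.

```python
from email.message import EmailMessage


def build_digest(links, sender, recipient, day):
    """Build one plain-text email listing all new matching pages for a day."""
    msg = EmailMessage()
    msg["Subject"] = f"New matching pages for {day}: {len(links)} found"
    msg["From"] = sender
    msg["To"] = recipient
    body = "\n".join(links) if links else "No new matching pages today."
    msg.set_content(body)
    return msg

# Sending is left out here; with a reachable SMTP server it could be done via
# smtplib.SMTP(host).send_message(msg), where host is an assumption.
```

Collecting links during the day and calling `build_digest` once per run keeps the output to a single email, as requested.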
That's a complex project that seems possible only if you have a machine good enough to run the software. Do you have any language requirements? Could the interface be very simple? How should the appearance of the tags be reported?
Hope you are doing well.
Your project sounds excellent to me. It can be achieved using cron jobs and scripting.
I am very familiar with what you need and quite experienced with task automation. Please let me know when to start.