Prosper Intelligence is looking for external support to improve the performance of its internal development tasks. We are looking for a PHP or Python programmer to develop a universal website parsing script that can be scheduled flexibly by start time and repetition. Because many web pages do not provide a common public API, the code must be universally applicable, which may require languages such as JavaScript, jQuery, or similar, either instead of PHP or Python or as a complement.
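To illustrate the scheduling requirement above, here is a minimal sketch in Python. It assumes that "started by time and repeatability" means a configurable start time, a fixed interval between runs, and a run count; the function name `planned_runs` and its parameters are hypothetical and not part of the requirements.

```python
from datetime import datetime, timedelta
from typing import List


def planned_runs(start: datetime, interval: timedelta, repeats: int) -> List[datetime]:
    """Return the times at which the parser would be started.

    Assumption: runs are evenly spaced, beginning at `start`.
    """
    return [start + i * interval for i in range(repeats)]


if __name__ == "__main__":
    # Example: three runs, two hours apart, starting at 06:00.
    for run_at in planned_runs(datetime(2024, 1, 1, 6, 0), timedelta(hours=2), 3):
        print(run_at.isoformat())
```

In practice the same settings could also be handed to an external scheduler such as cron; the sketch only shows how the start time and repetition might be expressed as parameters.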
The parser is used mainly to parse news from the websites of various news channels, or from specific sections of those websites (e.g. /business/, /jobs/, /economy/, /finance/). The nature of the news channel is not relevant; in most cases, however, the channels have a political or economic background, or a mix of both.
Since we do not want to download the whole content of a website, we want to parse the headlines for specific words or short phrases and, if one or more of them match, open the link belonging to that headline and repeat the process on the full article. The purpose is to find out whether the complete article matches more words than were registered in the headline. Our aim is to identify news that is highly relevant for our purpose. As output you need to provide all words identified in a single news article (including the headline and the text of the article) and the source where it was found (URL). We have a strong interest in not troubling any website owner, because of the risk of being penalized. Our parsing must be quick and efficient and must not produce any noticeable load on the hosting server.
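The workflow described above (match the headline first, download the article only on a match, then report all matched words with the URL) can be sketched as follows. This is an illustration under assumptions, not a definitive implementation: the names `find_keywords`, `scan_headlines`, `fetch_article`, and `delay_seconds` are hypothetical, headlines are assumed to be already extracted as (headline, URL) pairs, and keywords are assumed to begin and end with a letter or digit so that whole-word matching works.

```python
import re
import time
from typing import Callable, Dict, Iterable, List, Tuple


def find_keywords(text: str, keywords: Iterable[str]) -> List[str]:
    """Return the keywords or short phrases found in text (case-insensitive, whole words)."""
    found = []
    for kw in keywords:
        pattern = r"\b" + re.escape(kw) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(kw)
    return found


def scan_headlines(
    headlines: Iterable[Tuple[str, str]],
    keywords: List[str],
    fetch_article: Callable[[str], str],
    delay_seconds: float = 1.0,
) -> List[Dict[str, object]]:
    """Check each headline; fetch the article only if the headline matches."""
    results = []
    for headline, url in headlines:
        headline_hits = find_keywords(headline, keywords)
        if not headline_hits:
            continue  # no match: the article is never downloaded
        time.sleep(delay_seconds)  # pause before each request to keep server load low
        body = fetch_article(url)  # hypothetical callable returning the article text
        all_hits = find_keywords(headline + "\n" + body, keywords)
        results.append(
            {"url": url, "headline_matches": headline_hits, "all_matches": all_hits}
        )
    return results
```

Passing the download step in as `fetch_article` keeps the matching logic independent of how a page is retrieved, so a plain HTTP request or a JavaScript-capable browser could be substituted per site. The fixed pause before each request is one simple way to limit load; the appropriate delay is an assumption to be agreed on.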
Please read the attached document and respond only if you can fulfill the MANDATORY REQUIREMENTS.
I've read the requirements, and the task seems straightforward to me. I would like to send you my programming and sequencing approach. I will wait for your answer in chat.