Closed

Scrape a website & insert into database & perform some tasks with the information

I need someone to write some software that will archive every listing posted on a particular website and use that information as described in the features section of this post.

Basic logic of program:

1. Send a request to a website that returns listings in XML format

2. Check each listing against a MySQL database

3. Send a web request to each new listing individually to get all the information

4. Features 1, 2, 3 (explained in detail below)

5. Upload images from the listings to Amazon S3

6. Add the information for each listing to a MySQL database

7. Sleep before looping back to step 1 (Read feature 4)
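
The sleep in step 7 could be computed along these lines. This is only a sketch: the 2 minute, 15 minute and 2 hour bounds come from feature 4, but the adjustment rule (aim for roughly half a page of new listings per cycle) and the names `next_sleep` and `max_sleep` are assumptions made for illustration, not part of the specification.

```python
from datetime import datetime

MIN_SLEEP = 2 * 60               # minimum sleep: 2 minutes
MAX_SLEEP_DAY = 15 * 60          # maximum between 7AM and 11PM
MAX_SLEEP_NIGHT = 2 * 60 * 60    # maximum between 11PM and 7AM
PAGE_SIZE = 20                   # listings returned per request


def max_sleep(now):
    """Return the upper bound on the sleep time for the given local time."""
    return MAX_SLEEP_DAY if 7 <= now.hour < 23 else MAX_SLEEP_NIGHT


def next_sleep(current_sleep, new_count, now):
    """Scale the sleep time so the next cycle finds about half a page.

    Assumed heuristic: if more listings than the target arrived, sleep
    less; if fewer arrived, sleep more. The result is always clamped to
    the configured minimum and the time-of-day maximum.
    """
    target = PAGE_SIZE / 2
    if new_count == 0:
        proposed = current_sleep * 2          # nothing new: back off
    else:
        proposed = current_sleep * target / new_count
    return int(min(max(proposed, MIN_SLEEP), max_sleep(now)))


if __name__ == "__main__":
    print(next_sleep(300, 40, datetime.now()))
```

Keeping the expected number of new listings below one page means a single request in step 1 is usually enough to reach previously archived listings.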

Limitations:

The website returns at most 20 listings per request (Step 1). If every listing on a page is new, keep requesting the next page of listings until previously archived listings are found, so that no listings are missed. (During peak times, more than 20 listings may be posted within the minimum sleep period of 2 minutes.)
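
The paging rule above can be sketched as follows. The XML layout (`<listing id="...">` elements), the `fetch_page` callable and the `max_pages` safety cap are assumptions made for illustration; the real feed format is not given in this post. In practice `known_ids` would be a lookup against the MySQL archive.

```python
import xml.etree.ElementTree as ET


def parse_listing_ids(xml_text):
    """Extract listing ids from one page of results (assumed XML layout)."""
    root = ET.fromstring(xml_text)
    return [item.get("id") for item in root.iter("listing")]


def collect_new_ids(fetch_page, known_ids, max_pages=50):
    """Request pages until one contains a previously archived listing.

    fetch_page(page_number) must return the XML text for that page.
    """
    new_ids = []
    seen = set()  # guards against repeats when listings shift between pages
    for page in range(1, max_pages + 1):
        ids = parse_listing_ids(fetch_page(page))
        if not ids:
            break  # ran out of listings
        for listing_id in ids:
            if listing_id not in known_ids and listing_id not in seen:
                seen.add(listing_id)
                new_ids.append(listing_id)
        if any(listing_id in known_ids for listing_id in ids):
            break  # reached listings that are already archived
    return new_ids
```

The `seen` set matters because a listing posted while paging is in progress pushes older listings onto the next page, so the same id can appear twice.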

Features:

1. Create a table that tracks listings that are from the same user (identified by two values found in the listing). Keep a tally of how many listings that user has posted and a tally of how many of those listings are unique. (I suggest this be done on a separate thread so as not to slow down the scraping.)

2. If enabled, check each new listing's price against comparable listings on another website (web request to an API), and calculate the average value for comparable listings using the archive of listings in my database. Decide whether the listing is undervalued by a configurable amount or percentage and, if so, send an alert (Amazon SNS and database entry). (This must be done on a separate thread so as not to slow down the scraping.)

3. Check each listing against search criteria, which can be configured by adding rows of criteria to a MySQL database, and send an alert (Amazon SNS and database entry) if a new listing satisfies those criteria. (These will be simple criteria, such as whether the listing's price is >100, or whether the listing is a specific model, etc.) (This must be done on a separate thread so as not to slow down the scraping.)

4. Adjust the sleep time automatically so as to minimize the number of pages requested before finding previous listings (explained in Limitations). The sleep time before looping has a minimum of 2 minutes, a maximum of 15 minutes from 7AM-11PM, and a maximum of 2 hours from 11PM-7AM.

5. Once daily, check each active listing in the database against the website to see whether the listing has been updated or deleted. If it has been updated, save the changes to the database as a new row. If it has been deleted, change the status in the database so the listing will not be checked again. (I suggest this be a separate script run by a cron job.)
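
The alert decisions in features 2 and 3 might look like the sketch below. The criterion format `(field, operator, value)`, the function names and the percentage-based threshold are assumptions chosen for illustration; the post does not define the criteria table or the exact undervaluation formula. Sending the alert (Amazon SNS and database entry) and running on a separate thread are left out.

```python
import operator
from statistics import mean

# Comparison operators a criteria row may use (assumed set).
OPS = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "=": operator.eq,
}


def matches(listing, criterion):
    """Return True if the listing satisfies one (field, op, value) row."""
    field, op, value = criterion
    if field not in listing:
        return False
    return OPS[op](listing[field], value)


def is_undervalued(price, comparable_prices, threshold_pct):
    """Return True if price is at least threshold_pct below the average.

    comparable_prices would combine prices from the other website's API
    and from the archive of listings; an empty list never triggers.
    """
    if not comparable_prices:
        return False
    average = mean(comparable_prices)
    return price <= average * (1 - threshold_pct / 100)


if __name__ == "__main__":
    listing = {"price": 80, "model": "X1"}
    print(matches(listing, ("model", "=", "X1")))
    print(is_undervalued(listing["price"], [100, 120, 110], 20))
```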

Requirements:

1. Must run on a Linux server

2. Error Handling (Website down, website responds with unexpected data, etc)

3. Log activity/errors in a text file. Send an alert if errors occur (Amazon SNS and entry into database)
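
One possible shape for requirements 2 and 3 is a small wrapper that runs each step, logs the outcome to a text file and raises an alert on failure. The names `make_logger` and `run_step` and the `send_alert` callback are hypothetical; in the real program `send_alert` would publish to Amazon SNS and insert a database row.

```python
import logging


def make_logger(path):
    """Create a logger that appends timestamped lines to a text file."""
    logger = logging.getLogger("scraper")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid adding duplicate handlers
        handler = logging.FileHandler(path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


def run_step(name, func, logger, send_alert):
    """Run one step; on any error, log it, alert, and return None.

    Catching Exception broadly is deliberate here: a site outage and a
    malformed response should both be recorded without stopping the loop.
    """
    try:
        result = func()
    except Exception as exc:
        logger.error("%s failed: %s", name, exc)
        send_alert(f"{name} failed: {exc}")
        return None
    logger.info("%s succeeded", name)
    return result
```

Returning `None` on failure lets the main loop skip the rest of the cycle and try again after the next sleep.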

The program can be coded in any language that can run on a Linux VPS and take advantage of the multiple IP addresses the server has. PHP would be preferred.

Skills: Data Entry, Excel


About the Employer:
( 0 reviews ) Bangladesh

Project ID: #10186611

12 freelancers are bidding on average $380 for this job

diamond247

We are a team (19 operator and 2 Quality checker)here from last 4 year giving all research service world wide with best quality output , I have gone through your project description, It is really a interesting job, and …

$250 CAD in 10 days
(154 Reviews)
6.8
mananraja

Hi, I have read the description & would like to discuss.. I have good web scraping experience & reviews. & can develop web scraping scripts in Python & C# Hope we can discuss details..

$250 CAD in 3 days
(24 Reviews)
4.8
demossoft

I have reviewed your bid request and I am very interested in your project. I was trained overseas and have an extensive customer service record so contact me so we can discuss further or begin. I work in milestones and …

$261 CAD in 7 days
(6 Reviews)
3.9
fluo3

I have great expertise in web scraping in PHP. I have built up a personal library that lets me accomplish every request easily. I can handle sessions, proxies and avoid anti-scraping controls.

$250 CAD in 3 days
(0 Reviews)
0.0
rahulkatyal

I am New to Freelancer. But i have been working with a company and was working really good i have made few apps and done like more than 1K data entry projects and i have typing speed almost 95 WPM and can assure to com…

$277 CAD in 10 days
(0 Reviews)
0.0
leomedina01

Hi sir, My name is Leonardo Medina, and I'm from Brazil. I'm a web developer, have lots of experience in PHP/MySQL/Jquery/Ajax, and perhaps I can assist you in your project. Please drop me a line, so we can discu…

$250 CAD in 3 days
(0 Reviews)
0.0
$444 CAD in 10 days
(0 Reviews)
0.0
$555 CAD in 10 days
(0 Reviews)
0.0
Felisha1

A proposal has not yet been provided

$555 CAD in 10 days
(0 Reviews)
0.0
stdhtelkom

Hello, We are understood with your requirement and we already has engine with PHP to do this job. We are expert on this kind of jobs and has long term relationship with some clients with trust on each other. Her…

$600 CAD in 10 days
(0 Reviews)
0.0
responsiveweb15

Hello: Greeting for the day! We have gone through your job post and are very excited about bringing your project on board. We are design and development company and providing outsource services. Our main expertis…

$555 CAD in 15 days
(0 Reviews)
0.0
simeunfurtula

I have strong background in web scraping, api client development and similar things in php. I have developed various web site monitoring tools for big ISP in past.

$311 CAD in 10 days
(0 Reviews)
0.0
$555 CAD in 10 days
(0 Reviews)
0.0