307778 EPESI - DATA - FETCHING - PHP

In Progress Posted Mar 29, 2009 Paid on delivery

Need a module that works under the custom EPESI platform. It will fetch (scrape) data, both on demand and on a daily basis, from websites such as [url removed, login to view], [url removed, login to view], etc.

Data would have to be matched based on UPC/EAN, MPN, SKU, product name, etc., and displayed on a webpage as multiple or single line items (if matching cannot be done automatically, it should be possible to do it manually).

The main goal is to scrape more than 100 websites, probably closer to 200. One scraper per website is fine as long as there is a central configuration that handles the settings for all scrapers.
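One way to picture "one scraper per website plus a shared configuration" is a small plugin registry. The sketch below is only an illustration in Python (the module itself would be written in PHP for EPESI); the names `ScraperConfig`, `SCRAPERS`, `register`, `run_all`, and the site `example-shop` are hypothetical and not part of EPESI.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ScraperConfig:
    """Per-site settings an admin could edit without touching scraper code."""
    site: str
    base_url: str
    currency: str = "USD"
    enabled: bool = True


# Hypothetical registry: site name -> scraper function for that site.
SCRAPERS: Dict[str, Callable[[ScraperConfig, str], List[dict]]] = {}


def register(site: str):
    """Decorator that adds a scraper plugin to the registry under `site`."""
    def wrap(func):
        SCRAPERS[site] = func
        return func
    return wrap


@register("example-shop")
def scrape_example_shop(config: ScraperConfig, query: str) -> List[dict]:
    # Placeholder only: a real plugin would download and parse pages here.
    return [{"site": config.site, "title": query,
             "price": 0.0, "currency": config.currency}]


def run_all(configs: List[ScraperConfig], query: str) -> List[dict]:
    """Run every enabled scraper that has a registered plugin; merge results."""
    results: List[dict] = []
    for cfg in configs:
        scraper = SCRAPERS.get(cfg.site)
        if cfg.enabled and scraper is not None:
            results.extend(scraper(cfg, query))
    return results
```

With this layout, adding a new site means writing one decorated function and one configuration entry, which is the kind of low-effort extension the brief asks for.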

Websites should contain:

Deliverables:

1) Complete and fully-functional working program(s) as well as complete source code of all work done.

2) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):

a) For web sites or other server-side deliverables intended to only ever exist in one place in the Buyer's environment--Deliverables must be installed by the Seller in ready-to-run condition in the Buyer's environment.

b) For all others including software the buyer intends to distribute: A software installation package that will install the software in ready-to-run condition on the platform(s) specified in this bid request.

3) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site per the coder's Seller Legal Agreement).

4) NDA

Requirements:

1. Has to work under the EPESI platform located at [url removed, login to view]

2. Products are identified by common identifiers such as UPC/EAN, MPN, or SKU. If no identifier is found for a product, it will be matched by title.

3. Whenever possible, use functions that already exist in the system; unnecessary overhead is not acceptable.

4. Products that are scraped (grabbed) from price search websites should be matched as per the specification.

5. Products need to have a price search action button next to them in the WMS module, as well as the ability to search using an autocomplete function.

6. New sites to scrape should be easy to add (no more than 2 hours of work for an admin to add a new site) and should utilize plugins.

7. Scrapers for sites that are already done have to be configurable so that the admin can make script corrections.

8. Website categories should be matched with the categories in the WMS and e-commerce modules, so that a single click on an action button in the WMS module fetches the desired products from a chosen category.

9. Multiple fetching hosts. Hosts will fetch data from multiple website locations and place it in the same DB. The master module will command the slave modules how, when, and from which website to fetch the data, and will decide which fetching host to use for which website. It should alternate hosts randomly and report any problems through the alerter in EPESI. If there are problems with fetching, the active hosts will redistribute the load (something like load balancing).

10. Website categories should be indexed and saved for future fetching requests. Indexing has to be repeated periodically, and if any current settings have changed, an alert should be sent through EPESI so that the admin can make changes.

11. Implement on-the-fly translation in order to match the categories and product info being fetched. If the identification in point 2 of the requirements is not available, matching should be based on a sample of products in the category.

12. Data should be stored in the database for easy retrieval as per EPESI project manager specification

14. Real-time exchange rates updates for different currencies

15. Number of entries to calculate average prices needs to be configurable

16. All displayed columns have to be sortable.

17. When a fetch action is initiated, it should collect data:

a) Once a day for the whole system

b) For a category (list of products)

c) For a single product

18. Ability to add, remove columns (data)
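The master/slave fetching described in requirement 9 can be sketched as a random assignment of sites to active hosts. This is a minimal illustration in Python rather than the PHP the module would use; `assign_hosts` and its parameters are hypothetical names, and a real master module would also raise alerts through EPESI instead of an exception.

```python
import random
from typing import Dict, List, Optional, Set


def assign_hosts(sites: List[str], hosts: List[str],
                 failed: Optional[Set[str]] = None,
                 rng: Optional[random.Random] = None) -> Dict[str, str]:
    """Randomly assign each site to one active fetching host.

    Hosts listed in `failed` are skipped, so the sites they would have
    served are spread over the remaining active hosts.
    """
    failed = failed or set()
    rng = rng or random.Random()
    active = [h for h in hosts if h not in failed]
    if not active:
        # Assumption: with no hosts left, the admin has to be alerted.
        raise RuntimeError("no active fetching hosts; alert the admin")
    return {site: rng.choice(active) for site in sites}
```

Calling the function again after a host is added to `failed` produces a new plan that avoids the failing host, which is the redistribution behaviour the requirement describes.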

Pages should have:

1. Manufacturer, Model, Description, Vendor, Category, Lowest Price from website (converted to the selected default currency), and actual lowest price; show the percentage difference between the lowest value and the website value

a) Highlight the lowest price in green and the highest price in red.

b) Ability to click on the price to go to the particular website's product page

c) Ability to remove vendors that have unrealistic prices on the website or have very low ratings

2. Manufacturer, Model, Description, Category, Average Price of the 5 (configurable) lowest entries from website (in the selected default currency), Average Price of the 5 lowest entries in the original currency; show the percentage difference between the average of the 5 lowest entries and the highest website price

a) Highlight the lowest price in green and the highest price in red.

b) Ability to click on the average price to go to the particular website's product page

c) Ability to remove vendors that have unrealistic prices on the website or have very low ratings

3. Product Name, Description, the 5 websites with the lowest prices (default currency), price range of the 5 lowest websites (default currency), price range of the 5 highest websites (default currency), percentage difference between the averages of the 5 lowest and 5 highest website prices (default currency)

a) Ability to click on price to go to the particular website's products page

b) Ability to remove vendors that have unrealistic prices on the website or have very low ratings

4. Reporting module to generate reports from the data stored in the database

a) The reporting module will have to work similarly to Crystal Reports: I can create my own reports, and the data would be populated on the website. Charting is not necessary, but it would be fine if it comes with the reports.

b) Products that have the highest percentage difference
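The price columns listed above (lowest price, average of the N lowest entries, percentage difference, all in a default currency) could be computed roughly as follows. This is a sketch in Python for illustration only; the function names, the offer dictionary layout, and the `rates` table are assumptions, and the percentage formula (highest relative to lowest) is one possible reading of the brief.

```python
from typing import Dict, List


def to_default(price: float, currency: str, rates: Dict[str, float]) -> float:
    """Convert a price to the default currency.

    `rates` holds units of default currency per one unit of `currency`.
    """
    return price * rates[currency]


def summarize(offers: List[dict], rates: Dict[str, float], n: int = 5) -> dict:
    """Return lowest, highest, average of the n lowest, and percent spread."""
    prices = sorted(to_default(o["price"], o["currency"], rates)
                    for o in offers)
    if not prices:
        raise ValueError("no offers to summarize")
    lowest_n = prices[:n]  # n is the configurable number of entries
    lowest, highest = prices[0], prices[-1]
    avg_lowest = sum(lowest_n) / len(lowest_n)
    # Assumed formula: how far the highest price is above the lowest, in %.
    spread_pct = (highest - lowest) / lowest * 100 if lowest else 0.0
    return {"lowest": lowest, "highest": highest,
            "avg_lowest_n": avg_lowest, "spread_pct": spread_pct}
```

The `lowest` and `highest` values are the ones a page would highlight in green and red; the exchange rates would come from whatever real-time source requirement 14 ends up using.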

It has to be easy to integrate into a website (modular design) and have admin site to control the configuration.

All parameters used should be configurable as per EPESI module administration

Additional websites (price search engines) should be easy to add. The data grabber should run without locking up, and be fast and responsive.

The operation of the module has to be user friendly.

Platform:

EPESI, PHP, AJAX, JavaScript, and MySQL DB

Additional questions.

UPC/EAN matching, MPN matching, SKU matching. If no identifier is found for a product, it will be matched by title.
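The identifier cascade could look like the sketch below: try UPC/EAN first, then MPN, then SKU, and return nothing when no identifier matches so the caller can fall back to title matching. It is written in Python for illustration only; the field names `upc_ean`, `mpn`, `sku` and the order of precedence are assumptions, since the brief does not fix them.

```python
from typing import List, Optional

# Assumed field names, checked in this order of precedence.
ID_FIELDS = ("upc_ean", "mpn", "sku")


def normalize(value) -> str:
    """Strip whitespace and ignore case so ' ab-1 ' equals 'AB-1'."""
    if value is None:
        return ""
    return "".join(str(value).split()).upper()


def match_by_identifier(item: dict, catalog: List[dict]) -> Optional[dict]:
    """Return the first catalog product sharing an identifier with `item`.

    Returns None when no identifier matches, so the caller can fall
    back to title matching.
    """
    for field in ID_FIELDS:
        wanted = normalize(item.get(field))
        if not wanted:
            continue  # the scraped item lacks this identifier
        for product in catalog:
            if normalize(product.get(field)) == wanted:
                return product
    return None
```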

Title matching should be automatic, based on probability:

1. If All words are present it would be 100% match.

2. If at least 2 words are matched and the rest is not it would be 75% match.

3. If 1 word is present it would be 25% match.

4. No match

The exact match percentages do not matter at this point; they would have to be worked out.

The base for text matching is a Title of the product.

Point 1. "Nikon D90" present everywhere would be 100% match

Point 2. "Nikon D90 body", "Nikon D90 korpus", "Nikon D90 kit" 75%

Point 3. "Nikon lens", "Nikon flash", "Nikon P80" 25%

Point 4. Products with no match would be left in the repository for later matching or deletion.

All 100% matches would be done without manual intervention.

The 75% matches would be shown to the end user as the best suggestion, to be accepted or rejected. If rejected, the product would have to be matched against the remaining products.

The 25% matches would be shown to the end user as the best suggestion. If rejected, the product would have to be matched against the remaining products.

Once a product is matched, the match should be remembered.
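The title-matching tiers above can be sketched as a word-overlap score. The Python below is only an illustration (the module itself would be PHP). It assumes that "all words present" means both titles contain exactly the same set of words, which is what the "Nikon D90 body" example at 75% implies; `tokens` and `title_match_score` are hypothetical names, and the percentages are the provisional ones from the brief.

```python
import re
from typing import Set


def tokens(title: str) -> Set[str]:
    """Lower-case a title and split it into a set of words."""
    return set(re.findall(r"\w+", title.lower()))


def title_match_score(reference: str, candidate: str) -> int:
    """Score a candidate title against a reference title: 100, 75, 25, or 0."""
    ref, cand = tokens(reference), tokens(candidate)
    shared = ref & cand
    if ref and ref == cand:
        return 100  # same words on both sides: automatic match
    if len(shared) >= 2:
        return 75   # at least two shared words: strong suggestion
    if len(shared) == 1:
        return 25   # one shared word: weak suggestion
    return 0        # nothing shared: leave unmatched
```

For example, `title_match_score("Nikon D90", "Nikon D90 kit")` gives 75 and `title_match_score("Nikon D90", "Nikon P80")` gives 25, in line with Points 2 and 3. Only scores of 100 would be accepted without user confirmation.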

Example:

Initial project will have 3 websites:

1. [url removed, login to view]

2. [url removed, login to view]

3. [url removed, login to view]

Here is a sample page for the Nikon D90 from which info should be pulled and matched.

I think the best way to grab and match the data is by category; here is a sample from the photo category:

http://cameras.pricegrabber.com/digital/Nikon-D90-Black-SLR-Digital-Body/m90725732.html/search=Nikon%20d90/st=product/sv=title

[url removed, login to view]

[url removed, login to view]


Project ID: #2053574

About the project

Remote project Active Jul 11, 2012