Need a program to automatically scrape data from a site.
This project was awarded to gangabass for $263 USD.
Project Budget: $250 - $750 USD
I need someone to write a program that automatically scrapes data from this site: [url removed, login to view]
The program is what I [url removed, login to view] the data.
First, look at the category list on the right side of the site.
The program needs to automatically scrape the data of some categories,
from [キングダム] to [横浜線ドッペルゲンガー].
*Check the attachment named “category”
You have to output data in the following way:
url----URL of the article
picture----the cover of the article
*Check the attachments named “tip1” and “tip2”
[cat] means the name of [url removed, login to view] the category list in the right [url removed, login to view] can see words like [キングダム] and [トキワ来たれり!!]; these are the categories.
[title]: click the category [キングダム] and a new page opens, where you can see text such as [キングダム 最新 492話 ネタバレ＆感想 入隊選抜試験と逸材発見！？] or [キングダム 最新 491話 ネタバレ＆感想 秦趙決裂と軍備強化]; these are the titles.
[article]: click a title such as [キングダム 最新 492話 ネタバレ＆感想 入隊選抜試験と逸材発見！？] and a new page opens, where you can see an article with a lot of [url removed, login to view] have to scrape the body, from the title (キングダム 最新 492話 ネタバレ＆感想 入隊選抜試験と逸材発見！？) to the end of the article (ending just above [第４９１話へ][第４９3話へ] and the advertisements).
[entry_data_at] means the publish time of the article. For example, the publish time of キングダム 最新 492話 ネタバレ＆感想 入隊選抜試験と逸材発見！？ is the one written under the title - 2016/10/[url removed, login to view] have to record it as a Unix timestamp, which would turn 2016/10/01 into 1475251200 (midnight in UTC/GMT+08:00).
[url] means the URL of the article, like [url removed, login to view]
[site] is always written as [url removed, login to view]
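The date-to-timestamp conversion for [entry_data_at] could be sketched as below. This is only an illustration: it assumes the dates shown on the site are in UTC/GMT+08:00 (the zone used in the [created_at] example) and that a date without a time means midnight. The names `SITE_TZ` and `date_to_timestamp` are hypothetical. Under these assumptions 2016/10/01 maps to 1475251200, while the value 1451577600 corresponds to 2016/01/01.

```python
from datetime import datetime, timedelta, timezone

# Assumption: dates on the site are local to UTC+08:00.
SITE_TZ = timezone(timedelta(hours=8))


def date_to_timestamp(text: str, tz: timezone = SITE_TZ) -> int:
    """Convert a 'YYYY/MM/DD' string to a Unix timestamp at local midnight."""
    local_midnight = datetime.strptime(text, "%Y/%m/%d").replace(tzinfo=tz)
    return int(local_midnight.timestamp())


print(date_to_timestamp("2016/10/01"))  # 1475251200
print(date_to_timestamp("2016/01/01"))  # 1451577600
```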
[url removed, login to view]
You can see the text [第４９２話 成長への募兵] written in blue.
In the Developer Tools it appears as:
<span style="font-size: x-large; color: #0000ff;">
The number 492 is the [character].
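One possible way to pull the [character] number out of that blue heading is sketched below, using only the Python standard library. It assumes the heading is the first `<span>` whose style contains `#0000ff` and that the number sits between 第 and 話. The class and function names are hypothetical, and the sample HTML is built from the snippet above rather than copied from the real page.

```python
import re
import unicodedata
from html.parser import HTMLParser
from typing import Optional


class BlueSpanText(HTMLParser):
    """Collect the text of the first <span> whose style sets color #0000ff."""

    def __init__(self) -> None:
        super().__init__()
        self._depth = 0      # > 0 while inside the matching span
        self._parts = []     # text fragments seen inside the span
        self.text = None     # final text once the span closes

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self._depth:
            self._depth += 1  # nested span inside the heading
        elif self.text is None:
            style = (dict(attrs).get("style") or "").lower()
            if "#0000ff" in style:
                self._depth = 1

    def handle_endtag(self, tag):
        if tag == "span" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.text = "".join(self._parts).strip()

    def handle_data(self, data):
        if self._depth:
            self._parts.append(data)


def find_blue_heading(html: str) -> Optional[str]:
    parser = BlueSpanText()
    parser.feed(html)
    parser.close()
    return parser.text


def chapter_number(heading: str) -> Optional[int]:
    """Return the number between 第 and 話, accepting full-width digits."""
    normalized = unicodedata.normalize("NFKC", heading)  # ４９２ -> 492
    match = re.search(r"第\s*(\d+)\s*話", normalized)
    return int(match.group(1)) if match else None


sample = '<span style="font-size: x-large; color: #0000ff;">第４９２話 成長への募兵</span>'
print(chapter_number(find_blue_heading(sample)))  # 492
```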
For [author], [magazine], [genre], [picture], [id] and [created_at], do the following step first.
Search for the [cat] in [url removed, login to view] and use the first result.
For example, search for [キングダム] in [url removed, login to view] and you get:
ジャンル： バトル・アクション / 歴史 / 青年マンガ / アニメ化 / 中国史・三国志
[author] means the text after [作家：]. In the example, the [author] is [原泰久].
[magazine] means the text after [雑誌・レーベル：]. In the example, the [magazine] is [ヤングジャンプ].
[genre] means the text after [ジャンル：]; use "," to separate the entries. In the example, the [genre] is [バトル・アクション,歴史,青年マンガ,アニメ化,中国史・三国志].
[picture]: the cover of the first [url removed, login to view] have to scrape the covers and store [url removed, login to view] the database there should be a [picture] column holding the URL of each cover.
[id] means the order: the first one is 1, the second one is 2, etc. (MySQL auto-increment field)
[created_at] means the time you scrape the article, also recorded as a Unix timestamp. For example, if I scrape the data at 2016/10/11 14:40:30 (UTC/GMT+08:00), the [created_at] should be 1476168030.
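The label-based fields could be extracted roughly as follows. This is a sketch under assumptions: it supposes the first search result has already been reduced to plain text with one `label： value` pair per line, using the full-width colon shown in the ジャンル line above. The layout of the 作家 and 雑誌・レーベル lines is a guess based on the labels named in this description, and `LABELS` and `parse_details` are hypothetical names.

```python
# Map the labels named in the description to the output field names.
LABELS = {"作家": "author", "雑誌・レーベル": "magazine", "ジャンル": "genre"}


def parse_details(text: str) -> dict:
    """Extract author, magazine and genre from 'label： value' lines."""
    record = {}
    for line in text.splitlines():
        label, sep, value = line.partition("：")  # full-width colon
        field = LABELS.get(label.strip())
        if not sep or field is None:
            continue  # not one of the labelled lines we care about
        value = value.strip()
        if field == "genre":
            # "A / B / C" becomes "A,B,C"
            value = ",".join(p.strip() for p in value.split("/") if p.strip())
        record[field] = value
    return record


sample = (
    "作家： 原泰久\n"
    "雑誌・レーベル： ヤングジャンプ\n"
    "ジャンル： バトル・アクション / 歴史 / 青年マンガ / アニメ化 / 中国史・三国志"
)
print(parse_details(sample)["genre"])
```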
Using [キングダム] as the example and following the steps above, you get:
url:[url removed, login to view]
title:キングダム 最新 492話 ネタバレ＆感想 入隊選抜試験と逸材発見！？
site:[url removed, login to view]
*Check the explanation named “database sample”.
This is what I [url removed, login to view] have to make the program scrape data in this way so that my server can recognize the data.
The data needs to be scraped once every 2 hours.
You need to send me the program you write to scrape the data.
I need the full data scraper, and also a program that scrapes new data without scraping old data again.
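The "new data only" requirement could be met by letting the database reject article URLs it has already stored. The sketch below uses SQLite purely so that it runs on its own. The description mentions MySQL, where `INSERT IGNORE` with a unique key on the URL plays a similar role. The table name `articles`, the reduced column set, and the `example.invalid` URLs are all hypothetical.

```python
import sqlite3
import time


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the store and create the (hypothetical) articles table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS articles (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               url TEXT NOT NULL UNIQUE,
               title TEXT NOT NULL,
               created_at INTEGER NOT NULL
           )"""
    )
    return conn


def save_new(conn: sqlite3.Connection, articles, now=None) -> list:
    """Insert only articles whose URL is not stored yet; return the added URLs."""
    now = int(time.time()) if now is None else now
    added = []
    for art in articles:
        cur = conn.execute(
            "INSERT OR IGNORE INTO articles (url, title, created_at) VALUES (?, ?, ?)",
            (art["url"], art["title"], now),
        )
        if cur.rowcount == 1:  # 0 means the URL was already present
            added.append(art["url"])
    conn.commit()
    return added


conn = open_store()
first_run = [
    {"url": "https://example.invalid/a1", "title": "article 1"},
    {"url": "https://example.invalid/a2", "title": "article 2"},
]
second_run = first_run[1:] + [{"url": "https://example.invalid/a3", "title": "article 3"}]
print(len(save_new(conn, first_run)))   # 2
print(len(save_new(conn, second_run)))  # 1
```

Running such a script from a scheduler (for example a cron entry that fires every two hours) would cover the 2-hour interval without the program having to stay resident.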
Type 113114 in your bid.