Need a program to automatically scrape data from a site.

This project was awarded to gangabass for $263 USD.

Project Budget
$250 - $750 USD
Project Description

Hi there.

I need someone to write a program that can automatically scrape data from this site: [url removed, login to view]

The program is what I [url removed, login to view] the data.

First, look at the category list on the right side of the site.

I need the program to automatically scrape the data for some of the categories.

These are the categories from [キングダム] to [横浜線ドッペルゲンガー].

*Check the attachment named “category”.

You have to output data in the following way:

Data Structure:

url----URL of the article

site----target website

article----the text of the article

entry_data_at----publish time

created_at----scrape time

picture----the cover picture for the article

*Check the attachments named “tip1” and “tip2”.
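The field names used in this posting could be gathered into one record type, for example as below. This is only a sketch: the class name `ArticleRecord`, the Python types, and the default values are assumptions, and the real column types are defined by the “database sample” attachment, which is not reproduced here.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class ArticleRecord:
    """One scraped article, using the field names from the posting."""
    url: str                          # URL of the article
    site: str                         # target website
    cat: str                          # category name, e.g. キングダム
    title: str                        # article title
    article: str                      # article body
    entry_data_at: int                # publish time as a Unix timestamp
    created_at: int                   # scrape time as a Unix timestamp
    character: Optional[int] = None   # chapter number, when one is found
    author: str = ""
    magazine: str = ""
    genre: str = ""                   # comma-separated genre names
    picture: str = ""                 # URL of the cover image
    id: Optional[int] = None          # assigned by the database (auto-increment)


# Placeholder values only; the real URLs are not shown in the posting.
record = ArticleRecord(
    url="https://example.com/article",
    site="example.com",
    cat="キングダム",
    title="example title",
    article="<h1>example title</h1>",
    entry_data_at=0,
    created_at=0,
)
row = asdict(record)  # plain dict, ready to pass to a database layer
```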


[cat]: the name of [url removed, login to view] the category list in the right [url removed, login to view] can see words like [キングダム] and [トキワ来たれり!!]; these are the categories.

[title]: click the category [キングダム] and a new page opens, where you can see text such as [キングダム 最新 492話 ネタバレ&感想 入隊選抜試験と逸材発見!?] or [キングダム 最新 491話 ネタバレ&感想 秦趙決裂と軍備強化]; these are the titles.

[article]: click a title such as [キングダム 最新 492話 ネタバレ&感想 入隊選抜試験と逸材発見!?] and a new page opens, where you can see an article with a lot of [url removed, login to view] have to scrape the body, from the title (キングダム 最新 492話 ネタバレ&感想 入隊選抜試験と逸材発見!?) to the end of the article (ending just above [第491話へ][第493話へ] and the advertisements).

[entry_data_at]: the publish time of the article. For example, the publish time of キングダム 最新 492話 ネタバレ&感想 入隊選抜試験と逸材発見!? is the one written under the title - 2016/10/[url removed, login to view] have to record it as a Unix timestamp, which would turn 2016/10/01 into 1475251200 (midnight, UTC+08:00).
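A minimal sketch of the date conversion, assuming the dates on the site are written as YYYY/MM/DD and should be read as midnight in UTC+08:00 (the time zone used in the [created_at] example of this posting). The names `SITE_TZ` and `publish_date_to_timestamp` are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: publish dates are local to UTC+08:00.
SITE_TZ = timezone(timedelta(hours=8))


def publish_date_to_timestamp(date_text: str) -> int:
    """Convert a 'YYYY/MM/DD' date to a Unix timestamp at midnight in SITE_TZ."""
    local_midnight = datetime.strptime(date_text, "%Y/%m/%d").replace(tzinfo=SITE_TZ)
    return int(local_midnight.timestamp())


entry_data_at = publish_date_to_timestamp("2016/10/01")
```

If the site turns out to show a time of day as well, the format string would need to be extended accordingly.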

[url]: the URL of the article, like [url removed, login to view]

[site]: always written as [url removed, login to view]

[character]: for example, open

[url removed, login to view]

You can see the words written in blue: [第492話 成長への募兵].

In the Developer Tools this appears as:

<span style="font-size: x-large; color: #0000ff;">

<strong>第492話 成長への募兵</strong>

</span>

The number 492 is the [character].
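One way to pull the number out of such a heading is a regular expression on the 第…話 pattern. This is a sketch only; it assumes every chapter heading follows the same pattern as the example, and the names `CHAPTER_RE` and `extract_character` are hypothetical.

```python
import re
from typing import Optional

# Matches headings such as "第492話 成長への募兵" and captures the number.
CHAPTER_RE = re.compile(r"第\s*(\d+)\s*話")


def extract_character(heading_text: str) -> Optional[int]:
    """Return the chapter number found in the heading, or None if there is none."""
    match = CHAPTER_RE.search(heading_text)
    return int(match.group(1)) if match else None


print(extract_character("<strong>第492話 成長への募兵</strong>"))  # 492
```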

For [author], [magazine], [genre], [picture], [id] and [created_at], you should do the following step first.

Search for any [cat] in [url removed, login to view] and use the first result.

For example, searching for [キングダム] in [url removed, login to view] gives:

作家: 原泰久

雑誌・レーベル: ヤングジャンプ

ジャンル: バトル・アクション / 歴史 / 青年マンガ / アニメ化 / 中国史・三国志


[author]: the words after [作家:]. In the example the [author] is [原泰久].

[magazine]: the words after [雑誌・レーベル:]. In the example the [magazine] is [ヤングジャンプ].

[genre]: the words after [ジャンル:], with "," used to separate them. In the example the [genre] is [バトル・アクション,歴史,青年マンガ,アニメ化,中国史・三国志].
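The label rules above could be sketched as follows, assuming the search result has already been reduced to plain "label: value" text lines. How those lines are obtained from the page is not covered here, and `LABELS` and `parse_metadata` are hypothetical names.

```python
# Site labels mapped to the field names used in this posting.
LABELS = {"作家": "author", "雑誌・レーベル": "magazine", "ジャンル": "genre"}


def parse_metadata(lines):
    """Turn 'label: value' lines into a dict of author, magazine and genre."""
    record = {}
    for line in lines:
        # Accept both the ASCII colon and the full-width colon.
        label, sep, value = line.replace(":", ":").partition(":")
        field = LABELS.get(label.strip())
        if not sep or field is None:
            continue  # not one of the three lines we care about
        value = value.strip()
        if field == "genre":
            # "A / B / C" on the site becomes "A,B,C" in the output.
            value = ",".join(p.strip() for p in value.split("/") if p.strip())
        record[field] = value
    return record


example = parse_metadata([
    "作家: 原泰久",
    "雑誌・レーベル: ヤングジャンプ",
    "ジャンル: バトル・アクション / 歴史 / 青年マンガ / アニメ化 / 中国史・三国志",
])
print(example["genre"])  # バトル・アクション,歴史,青年マンガ,アニメ化,中国史・三国志
```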

[picture]: the cover of the first [url removed, login to view] have to scrape covers and store [url removed, login to view] the database there should be a [picture] field that holds the URL of each cover.

[id]: the order; the first one is 1, the second one is 2, etc. (MySQL auto-increment field)

[created_at]: the time you scrape the article, also recorded as a Unix timestamp. For example, if the data is scraped at 2016/10/11 14:40:30 UTC/GMT+08:00, the [created_at] should be 1476168030.
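The [created_at] example can be checked with a short sketch like the one below. The function names are made up for illustration; the only assumption is that the timestamp is a whole number of seconds since the Unix epoch.

```python
from datetime import datetime, timedelta, timezone


def to_unix_timestamp(moment: datetime) -> int:
    """Convert a timezone-aware datetime to whole seconds since the Unix epoch."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return int(moment.timestamp())


def now_timestamp() -> int:
    """Timestamp to store in created_at at the moment an article is scraped."""
    return to_unix_timestamp(datetime.now(timezone.utc))


utc_plus_8 = timezone(timedelta(hours=8))
example = datetime(2016, 10, 11, 14, 40, 30, tzinfo=utc_plus_8)
print(to_unix_timestamp(example))  # 1476168030
```

A Unix timestamp does not depend on the time zone of the machine that runs the scraper, so `now_timestamp()` gives the same value on any server.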

Using [キングダム] as the example and following the steps above, you would get:


url:[url removed, login to view]


title:キングダム 最新 492話 ネタバレ&感想 入隊選抜試験と逸材発見!?


author: 原泰久

magazine: ヤングジャンプ

genre: バトル・アクション,歴史,青年マンガ,アニメ化,中国史・三国志


site:[url removed, login to view]

article:<h1 class="entry-title">......



*Check the explanation named “database sample”.

This is what I [url removed, login to view] have to make the program scrape data in this way so that my server can recognize the data.

The data needs to be scraped once every 2 hours.
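A rough sketch of the 2-hour schedule as a plain loop. `run_periodically` and its parameters are hypothetical, and `job` stands for whatever function performs one scraping pass. A cron entry on the server would be an equally valid way to get the same schedule.

```python
import time

TWO_HOURS = 2 * 60 * 60  # interval in seconds


def run_periodically(job, interval_seconds=TWO_HOURS, max_runs=None, sleep=time.sleep):
    """Call job(), wait interval_seconds, and repeat.

    max_runs=None means run forever; a number stops after that many runs.
    The sleep function is injectable so the loop can be tested without waiting.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        job()
        runs += 1
        if max_runs is None or runs < max_runs:
            sleep(interval_seconds)
    return runs


# Dry run: three passes with a fake sleep that only records the delays.
delays = []
run_periodically(lambda: None, max_runs=3, sleep=delays.append)
```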

You need to send me the program you write to scrape the data.

I need the full data scraper, and also a program that scrapes new data without scraping old data again.

Type 113114 in your bid.
