Closed

Instagram / Twitter / Flickr Python Data Crawler

This project received 11 bids from talented freelancers with an average bid price of $859 USD.

Employer working
Project Budget
$250 - $750 USD
Total Bids
11
Project Description

We need to crawl 10M geotagged records from Flickr / Instagram / Twitter to build a data visualization on a map, something like:

[url removed, login to view]

Freelancer will need to deliver

tasks:

1. Register Flickr / Instagram / Twitter developer accounts.

2. Research their APIs and write a crawler that grabs the data within a geofence bounding box, e.g. the San Francisco bounding box: [url removed, login to view], [url removed, login to view], [url removed, login to view], 37.8324.

3.

deliverables:

1. Three daemon/service-like Python programs, one each for Flickr, Instagram and Twitter, that crawl the geotagged data and store it in the NoSQL database MongoDB.

2. It should be stable enough to crawl the data 24/7.

3. It should crawl 1 million geotagged records per week, even given the rate limits of the APIs.

4. The programs must be scalable and support multithreaded workers, using a queue library such as Celery in Python.
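As a rough sanity check on deliverable 3, the weekly target can be converted into the average rate the crawlers must sustain. This is only a back-of-the-envelope sketch: the function names are made up for illustration, and the 15-minute window is an assumed example of how an API might meter requests, not a figure from this posting.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800 seconds


def required_rate(records_per_week: int, window_seconds: int = 1) -> float:
    """Average number of records needed per window to hit the weekly target."""
    return records_per_week * window_seconds / SECONDS_PER_WEEK


def weeks_to_reach(total_records: int, records_per_week: int) -> float:
    """Weeks needed to collect total_records at a steady weekly pace."""
    return total_records / records_per_week


if __name__ == "__main__":
    per_second = required_rate(1_000_000)
    # Assumed 15-minute metering window, for illustration only.
    per_window = required_rate(1_000_000, window_seconds=15 * 60)
    print(f"{per_second:.2f} records/second on average")
    print(f"{per_window:.0f} records per 15-minute window")
    print(f"{weeks_to_reach(10_000_000, 1_000_000):.0f} weeks to reach 10M")
```

At the stated pace of 1 million records per week, the 10M goal takes about ten weeks of uninterrupted crawling, which is why the 24/7 stability requirement matters.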

GEOTAG is a must! We don't need data with no GPS information.
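The geotag requirement and the bounding-box task could be enforced with a small filter that runs before anything is written to the database. The sketch below is an assumption-heavy illustration: the `longitude` / `latitude` keys are hypothetical (each API returns coordinates in its own JSON layout), and `EXAMPLE_BBOX` uses arbitrary placeholder values, not the San Francisco box from the task list.

```python
from typing import NamedTuple, Optional, Tuple


class BoundingBox(NamedTuple):
    west: float   # minimum longitude
    south: float  # minimum latitude
    east: float   # maximum longitude
    north: float  # maximum latitude

    def contains(self, lon: float, lat: float) -> bool:
        # Simple check; does not handle boxes that cross the antimeridian.
        return self.west <= lon <= self.east and self.south <= lat <= self.north


def extract_point(record: dict) -> Optional[Tuple[float, float]]:
    """Return (lon, lat) from the hypothetical keys, or None if missing/invalid."""
    lon, lat = record.get("longitude"), record.get("latitude")
    if lon is None or lat is None:
        return None
    try:
        return float(lon), float(lat)
    except (TypeError, ValueError):
        return None


def keep_geotagged(records, bbox: BoundingBox) -> list:
    """Keep only records that carry coordinates inside the bounding box."""
    kept = []
    for record in records:
        point = extract_point(record)
        if point is not None and bbox.contains(*point):
            kept.append(record)
    return kept


# Placeholder box for illustration only (not a real city's coordinates).
EXAMPLE_BBOX = BoundingBox(west=-1.0, south=-1.0, east=1.0, north=1.0)

sample = [
    {"id": 1, "longitude": 0.5, "latitude": 0.25},   # inside the box
    {"id": 2, "longitude": 5.0, "latitude": 0.25},   # outside the box
    {"id": 3, "text": "no GPS information"},          # no geotag
]
inside = keep_geotagged(sample, EXAMPLE_BBOX)
```

In a real crawler, `extract_point` would need one variant per API, since the three services do not share a response format.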

Qualities needed to be successful

Python experience writing service / daemon-like programs

MongoDB, Redis, Celery

Twitter / Instagram / Flickr API experience.

Other Skills: Data Science, Data Scraping, MongoDB, Python, Redis, Web Crawler

You will be asked to answer the following questions when submitting a proposal:

(1) Have you written a Python crawler that uses the Twitter / Instagram / Flickr APIs before?

(2) Have you used a queue library (e.g. Celery) with multithreaded workers in Python to write a daemon/service-like program?

(3) Have you used a NoSQL database such as MongoDB to store data before?

(4) How much time do you estimate you will need for the whole project?

(5) We want to set up a small interview milestone as a test: simply use the APIs to grab 10+ raw JSON records from Instagram, Flickr and Twitter with GEOTAG (latitude and longitude).

(6) How would you deal with rate limits while crawling data? Multiple IPs / accounts?
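One possible answer to question (6) is to throttle on the client side so that the crawler never exceeds the quota of a given credential. The sketch below is a minimal sliding-window limiter using only the standard library. The class name and the limits in the usage example are hypothetical; real quotas would have to come from each API's documentation.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any window_seconds interval."""

    def __init__(self, max_calls, window_seconds,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.window = window_seconds
        self.clock = clock    # injectable for testing
        self.sleep = sleep    # injectable for testing
        self.calls = deque()  # timestamps of recent calls, oldest first

    def wait_time(self) -> float:
        """Seconds to wait before the next call is allowed (0.0 if none)."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.window - (now - self.calls[0])

    def acquire(self) -> None:
        """Block until a call is allowed, then record it."""
        while (delay := self.wait_time()) > 0:
            self.sleep(delay)
        self.calls.append(self.clock())


if __name__ == "__main__":
    # Hypothetical quota: 3 requests per 2 seconds.
    limiter = SlidingWindowLimiter(max_calls=3, window_seconds=2.0)
    for i in range(5):
        limiter.acquire()
        print(f"request {i} sent")  # an API call would go here
```

With several workers, one limiter per credential would have to be shared between them (for example through Redis), which this in-process sketch does not attempt.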
