Scraping USDA 2016 database
Go to [url removed, login to view]
Scrape all the items in the database (77,413 listed):
For each page (1,549 listed), grab the following for each item:
The information in the three table columns (NDB. No, Description, Food group):
From “Description”, extract the UPC code if available (also keep it in the Description field)
Follow the link in the “NDB. No” column and grab the information from these columns:
Column 1: “Nutrient”
Column 2: “Unit”
Column 3: “Value per 100g”
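The per-item steps above can be sketched in Python with only the standard library. This is a minimal sketch, not a tested scraper. It assumes, hypothetically, that UPCs appear in descriptions as `UPC: <digits>` and that each nutrient row on an item page is a well-formed `<tr>` whose first three cells are Nutrient, Unit and Value per 100g. Fetching the pages is left out because the URL is not given. `extract_upc`, `TableRowParser` and `parse_nutrients` are illustrative names.

```python
import re
from html.parser import HTMLParser

# Assumption: branded items carry the code in the description as "UPC: <digits>".
UPC_PATTERN = re.compile(r"UPC:\s*(\d+)")


def extract_upc(description):
    """Return the UPC digits found in a description, or None if there are none.

    The description itself is not modified, so it can be exported unchanged.
    """
    match = UPC_PATTERN.search(description)
    return match.group(1) if match else None


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse internal whitespace and strip the ends of the cell text.
            text = " ".join("".join(self._cell).split())
            if self._row is not None:
                self._row.append(text)
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def parse_nutrients(html):
    """Map "Nutrient (unit)" to the value-per-100g string for one item page.

    Assumption: data rows hold Nutrient, Unit and Value per 100g in their
    first three cells; the header row and rows with fewer cells are skipped.
    """
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    nutrients = {}
    for row in parser.rows:
        if len(row) < 3 or row[0] == "Nutrient":
            continue
        name, unit, value = row[0], row[1], row[2]
        nutrients[f"{name} ({unit})"] = value
    return nutrients
```

Each nutrient is keyed as "Nutrient (unit)", so the unit stays attached to its value. The same nutrient reported in two different units therefore does not silently share one column.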
Export to an Excel file with:
1 row per item
Columns: NDB. No, Description, Food group, UPC, and nutrition information
Note: “nutrition information” does not have a consistent format. Some items list “energy, proteins, total lipid, carbohydrates, sodium”, others list “energy, proteins, total lipid, carbohydrates, fibers, sugar, calcium, iron”, and still others may list different nutrients. Make sure to keep matching nutrients in the same column, and add a new column for every new nutrient found.
In the output file, matching nutrients must go in the same column. If an item has no information on a nutrient, leave the cell blank (do not enter a zero).
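One way to meet these column rules is sketched below. It assumes each scraped item is a dict holding the four fixed fields plus a `nutrients` mapping from a nutrient name to its value (for example `{"Energy (kcal)": "52"}`). The sketch collects every nutrient name seen across all items into the column list, in first-seen order, then fills missing cells with an empty string rather than zero. As a swapped-in simplification it writes CSV, which Excel opens directly, instead of a native .xlsx file. `build_table` and `write_csv` are hypothetical names.

```python
import csv

FIXED_COLUMNS = ["NDB. No", "Description", "Food group", "UPC"]


def build_table(items):
    """Return (header, rows) with one row per item and one column per nutrient.

    Nutrient columns are added in the order they are first seen; a nutrient
    an item does not report becomes an empty string, never zero.
    """
    nutrient_columns = []
    seen = set()
    for item in items:
        for name in item["nutrients"]:
            if name not in seen:
                seen.add(name)
                nutrient_columns.append(name)

    header = FIXED_COLUMNS + nutrient_columns
    rows = []
    for item in items:
        # A missing or None fixed field (e.g. no UPC) is also left blank.
        fixed = [item.get(col) or "" for col in FIXED_COLUMNS]
        values = [item["nutrients"].get(name, "") for name in nutrient_columns]
        rows.append(fixed + values)
    return header, rows


def write_csv(path, header, rows):
    """Write the table as UTF-8 CSV; the BOM from "utf-8-sig" helps Excel detect the encoding."""
    with open(path, "w", newline="", encoding="utf-8-sig") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
```

Matching here is by exact name, so nutrients that are similar but spelled differently would need an explicit mapping before `build_table` runs. Keeping NDB. No as a string preserves any leading zeros. For a real .xlsx file, the same header and rows could be passed to a spreadsheet library such as openpyxl.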
29 freelancers are bidding an average of $141 for this job
Hi there, I have checked the website link. I can write a web scraping script to get this data from the website. Let me know and we can discuss the details. Thanks.
Hi, I have read and understood the work. I can provide you with accurate data by scraping the site; I have done the same type of scraping project before. Can we discuss, please?
Hello, after reading your job request, I believe I'm suitable for the job (Web Scraping), as I'm an expert in web scraping. Please feel free to contact me. I look forward to hearing from you. Best regards