I need a Perl script which will parse an HTML result page of business listings from [login to view URL] into comma-delimited text for easy database insertion. This program DOES NOT NEED TO ACCESS THE WEBPAGE; it only needs to work at the command line, accepting a locally saved copy of an HTML result page from [login to view URL] and writing the results to a text file. Example command line execution: [login to view URL] < [login to view URL] > [login to view URL]
## Deliverables
The program should accept a search result page from [login to view URL] that has been saved to the computer and parse it locally. Here is an example of how the program should be executed: [login to view URL] < [login to view URL] > [login to view URL]. I have attached a complete example input file to test the parsing. Here is a snippet of the HTML which contains one row's worth of information:
[Barina Jerome F][1]
| <nobr>(262) 637-1555 </nobr> |
| 201 6th, Racine, WI 53403
This row contains a name, a phone number, and an address. In the output, the fields should be separated by commas and each row terminated with a semicolon. Example:

    "Barina Jerome F","(262) 637-1555","201 6th, Racine, WI 53403";

I would also like the category and subcategory included in each row; this information can be found in the attached HTML source file. For the attached file, the category is Attorneys and the subcategory is Attorneys. So the final output should look like this:

    "name","phone","address","category","subcategory";
    "Barina Jerome F","(262) 637-1555","201 6th, Racine, WI 53403","Attorneys","Attorneys";

There is a script on O'Reilly's website which does almost EXACTLY what I want, except that it is built for Google's phonebook output. Here is the script, if it helps:

    #!/usr/bin/perl
    use strict;

    my $file='[login to view URL]';
    chomp $file;
    die('no filename passed to me!') unless ($file);
    open(FILE,$file) or die("Couldn't open the file!".$!);

    # CSV header
    print qq{"name","phone number","address"\n};

    # slurp the whole file and split it into listings on horizontal rules
    my @listings = split /<hr[^>]*>/, join '', <FILE>;

    # skip the first and last chunks (page header and footer)
    foreach (@listings[1..($#listings-1)]) {
        s!\n!!g;      # drop spurious newlines
        s!<.+?>!!g;   # drop all HTML tags
        s!"!""!g;     # double escape " marks
        print '"' . join('","', (split /\s+-\s+/)[0..2]) . "\"\n";
    }

    close FILE;

Thanks for looking. If you have any additional questions or concerns, don't hesitate to contact me.

Additional Requirements:

1) Complete and fully functional working program(s) in executable form, as well as complete source code of all work done.
2) Complete ownership and distribution copyrights to all work purchased.
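As a rough illustration of the output format described above (not a finished solution), the quoting and row layout could be sketched like this. The `<tr>`/`<td>` markup, the embedded sample row, and the sub names `format_row`, `clean_cell`, and `parse_listings` are assumptions made for illustration only; the real page layout is in the attached file and may differ. The category and subcategory are hard-coded here rather than read from the page.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Quote every field, double any embedded " marks, and end the row with ';'.
sub format_row {
    my @fields = @_;
    for (@fields) {
        $_ = '' unless defined $_;
        s/"/""/g;
    }
    return '"' . join('","', @fields) . '";';
}

# Strip HTML tags, then collapse and trim whitespace in one cell.
sub clean_cell {
    my ($html) = @_;
    $html =~ s/<[^>]+>//g;
    $html =~ s/\s+/ /g;
    $html =~ s/^\s+|\s+$//g;
    return $html;
}

# ASSUMPTION: each listing is one <tr> holding three <td> cells
# (name, phone, address). The real page may be structured differently.
sub parse_listings {
    my ($page) = @_;
    my @rows;
    while ($page =~ m{<tr[^>]*>(.*?)</tr>}gis) {
        my $row   = $1;
        my @cells = map { clean_cell($_) } $row =~ m{<td[^>]*>(.*?)</td>}gis;
        push @rows, [ @cells[0 .. 2] ] if @cells >= 3;
    }
    return @rows;
}

# Hypothetical sample input standing in for the saved result page.
my $sample = <<'HTML';
<table>
<tr><td><a href="#">Barina Jerome F</a></td>
<td><nobr>(262) 637-1555 </nobr></td>
<td>201 6th, Racine, WI 53403</td></tr>
</table>
HTML

# Hard-coded for the sketch; the real script would read these from the page.
my ($category, $subcategory) = ('Attorneys', 'Attorneys');

print format_row(qw(name phone address category subcategory)), "\n";
print format_row(@$_, $category, $subcategory), "\n"
    for parse_listings($sample);
```

To use it as a command line filter, the embedded sample could be replaced by the saved page read from standard input, for example `my $page = do { local $/; <STDIN> };`, with the output redirected to a text file as in the example command above.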
## Platform
Perl 5+