I need someone to write a parser (preferably in Perl) that takes articles in HTML format and transfers them into Excel.
The HTML files will often contain multiple articles. They were downloaded from two article databases, which present their results differently from each other, but each database generates its own HTML files consistently. For output, each article entry should be one row in Excel, with the company name (taken from the filename of the HTML file), date of publication, full text, and article ID each in a separate column.
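To make the expected output concrete, here is a rough sketch of the kind of extraction described above. It is written in Python rather than the preferred Perl, purely as an illustration. The markup it assumes (a `<div class="article" id="...">` wrapping a `<span class="date">` and one or more `<p>` elements) is hypothetical and is not taken from the attached files; the real selectors would differ for each of the two databases. It also assumes the company name is the filename without its extension, and it writes CSV, which Excel can open, instead of a native workbook.

```python
import csv
import os
from html.parser import HTMLParser


class ArticleParser(HTMLParser):
    """Collects articles from a HYPOTHETICAL layout (not the real databases):
    <div class="article" id="ID"><span class="date">...</span><p>...</p></div>
    Assumes no nested <div> inside an article and no <span> inside a <p>.
    """

    def __init__(self):
        super().__init__()
        self.articles = []    # finished articles, in document order
        self._current = None  # article currently being built
        self._field = None    # "date" or "text" while inside that element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "article":
            self._current = {"article_id": attrs.get("id", ""),
                             "date": "", "text": []}
        elif self._current is not None:
            if tag == "span" and attrs.get("class") == "date":
                self._field = "date"
            elif tag == "p":
                self._field = "text"

    def handle_endtag(self, tag):
        if self._current is None:
            return
        if tag in ("span", "p"):
            self._field = None
        elif tag == "div":
            # Closing the article: join the collected text chunks.
            self._current["text"] = " ".join(self._current["text"])
            self.articles.append(self._current)
            self._current = None

    def handle_data(self, data):
        if self._current is None or self._field is None:
            return
        chunk = data.strip()
        if not chunk:
            return
        if self._field == "date":
            self._current["date"] += chunk
        else:
            self._current["text"].append(chunk)


def rows_from_html(filename, html):
    """Return one [company, date, full_text, article_id] row per article."""
    # Assumption: the company name is the filename minus its extension.
    company = os.path.splitext(os.path.basename(filename))[0]
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    return [[company, a["date"], a["text"], a["article_id"]]
            for a in parser.articles]


def write_csv(rows, out):
    """Write a header plus the rows to an open text file object."""
    writer = csv.writer(out)
    writer.writerow(["company", "date", "full_text", "article_id"])
    writer.writerows(rows)
```

A file would be processed by reading it into a string, calling `rows_from_html(path, html)`, and passing the combined rows to `write_csv` with a file opened using `newline=""`. Two points would need deciding in a real implementation: producing a native `.xlsx` file requires a third-party library in either Perl or Python, and Excel limits a single cell to 32,767 characters, so very long articles may need truncating or splitting.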
## Deliverables
The attached file contains examples of the HTML files that need processing.