We need an extraction script that searches through a given public URL and saves the results in an Excel file.
The URL is " [login to view URL] " and the variable part is the counter "ID=003931", which goes up to about 488500 (the number grows every day).
The tasks are:
1. To write the script with an interface where we can choose the initial and final counter numbers (search interval). In the end, we want to have the script so that we can extract new records ourselves.
2. To have an Excel sheet with the information extracted between record number "ID=003931" and the last one available.
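To illustrate the search interval in task 1, here is a minimal Python sketch that builds one URL per counter value. The function name build_urls and the base address are hypothetical, and it assumes the counter is passed as a zero-padded, six-digit "ID" query parameter, as the example "ID=003931" suggests.

```python
from typing import Iterator


def build_urls(base_url: str, first_id: int, last_id: int) -> Iterator[str]:
    """Yield one URL per counter value in the inclusive interval."""
    if first_id > last_id:
        raise ValueError("first_id must not exceed last_id")
    for counter in range(first_id, last_id + 1):
        # Assumption: the counter is a zero-padded, six-digit query parameter.
        yield f"{base_url}?ID={counter:06d}"
```

The initial and final numbers chosen in the interface would simply be passed as first_id and last_id.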
We are only interested in records that have the words "Insc. 1 -" on the line before the data we want to extract.
The script has to access each page (increasing the counter) and extract the following fields:
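The "Insc. 1 -" condition could be checked along these lines. This is only a sketch: it assumes the page has already been converted to plain text, and the function name lines_after_marker is hypothetical.

```python
from typing import List, Optional

MARKER = "Insc. 1 -"


def lines_after_marker(page_text: str) -> Optional[List[str]]:
    """Return the non-empty lines that follow the marker line, or None
    when the page has no "Insc. 1 -" line and should be skipped."""
    lines = [line.strip() for line in page_text.splitlines()]
    for index, line in enumerate(lines):
        if MARKER in line:
            # Everything after the marker line is candidate data.
            return [text for text in lines[index + 1:] if text]
    return None
```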
1. FIRMA:
2. NIPC:
3. NATUREZA JURÍDICA:
4. SEDE:
5. Distrito:
6. Código Postal:
7. OBJECTO:
8. CAPITAL :
which, on the given URL page, correspond to:
1. FIRMA: AUTO OLIVEIRA & CUNHA LDA
2. NIPC: 507519400
3. NATUREZA JURÍDICA: SOCIEDADE POR QUOTAS
4. SEDE: Rua do Rondão, Sub-Cave Esqº
5. Distrito: Braga Concelho: Guimarães Freguesia: Polvoreira
6. 4800 GUIMARÃES
7. OBJECTO: Reparação automóvel , mecânica automóvel.
8. CAPITAL : 5.000,00 Euros
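As a rough illustration of the field extraction, the following Python sketch reads "LABEL: value" lines into a dictionary. It assumes each field sits on its own line of plain text; the function name parse_fields is hypothetical. Because the example above shows the postal code without its label, the sketch accepts both a labelled line and a bare line starting with a four-digit code.

```python
import re
from typing import Dict, List

LABELS = ["FIRMA", "NIPC", "NATUREZA JURÍDICA", "SEDE", "Distrito",
          "Código Postal", "OBJECTO", "CAPITAL"]
# Assumption: an unlabelled postal code line looks like "4800 GUIMARÃES".
POSTAL_CODE = re.compile(r"^\d{4}(?:-\d{3})?\s+\S.*$")


def parse_fields(lines: List[str]) -> Dict[str, str]:
    """Map each known label to the text that follows it on its line."""
    record: Dict[str, str] = {}
    for line in lines:
        text = line.strip()
        for label in LABELS:
            # Allow optional spaces before the colon, as in "CAPITAL :".
            match = re.match(rf"^{re.escape(label)}\s*:\s*(.*)$", text)
            if match and label not in record:
                record[label] = match.group(1).strip()
                break
        else:
            # No label matched: fall back to the bare postal code pattern.
            if "Código Postal" not in record and POSTAL_CODE.match(text):
                record["Código Postal"] = text
    return record
```

Each dictionary would then become one row of the Excel sheet, with the labels as column headers.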
The script must also extract the managers of the company, which on the given URL page are:
GERÊNCIA:
José Filipe Ribeiro Oliveira
João Pedro Fernandes Cunha
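The managers could be collected with something like the Python sketch below. It assumes the names appear one per line directly under the "GERÊNCIA:" heading and that the list ends at a blank line or at the next line containing a colon; the function name extract_managers is hypothetical.

```python
from typing import List


def extract_managers(lines: List[str], heading: str = "GERÊNCIA:") -> List[str]:
    """Return the names listed under the heading, in page order."""
    managers: List[str] = []
    collecting = False
    for line in lines:
        text = line.strip()
        if text == heading:
            collecting = True
            continue
        if collecting:
            # Assumption: a blank line or another "LABEL:" line ends the list.
            if not text or ":" in text:
                break
            managers.append(text)
    return managers
```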
If you have any questions, please drop me a line.
Regards