Build a library to open a PDF, build a full text index and save pages as images usable by DeepZoom
$30-5000 USD
Paid on delivery
We need to show PDF files in Silverlight with DeepZoom, with the ability to do full text searches. All the data must be stored in a database.
We provide numerous hints and code samples that can be used to solve the problem: text extraction, conversion to images, DeepZoom file format.
## Deliverables
**Requisites:** Use C#, SQL Server and, optionally, open source external libraries that can be deployed as part of the main application. The links below only give hints on how to solve the problem. It is not mandatory to follow them if you can suggest better solutions.
**Build a full text index**
The iTextSharp open source project seems to be usable for extracting text from a PDF.
<[url removed, login to view]>
The comment "To extract text using itextSharp" here:
<[url removed, login to view]>
gives sample code showing how to do it. Suggest a solution for creating a full text index that returns the list of page numbers on which a given search term appears. The search must be case insensitive and accent insensitive. If multiple words are entered, they are combined with a logical AND.
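To make the search requirement concrete, here is a minimal, language-neutral sketch in Python (the deliverable itself would be C#). It assumes the per-page text has already been extracted; the function names `normalize`, `build_index` and `search` are hypothetical, and an in-memory dictionary stands in for the database table.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text):
    """Strip accents and fold case so 'Équipe' and 'equipe' compare equal."""
    decomposed = unicodedata.normalize("NFD", text)
    # After NFD decomposition, accents are separate combining marks (category Mn).
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return stripped.casefold()


def tokenize(text):
    """Split normalized text into words."""
    return re.findall(r"\w+", normalize(text))


def build_index(pages):
    """pages maps page number -> extracted text. Returns word -> set of page numbers."""
    index = defaultdict(set)
    for page_number, text in pages.items():
        for word in tokenize(text):
            index[word].add(page_number)
    return index


def search(index, query):
    """Return the sorted page numbers that contain every word of the query (logical AND)."""
    words = tokenize(query)
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)
```

In a SQL Server implementation, the same effect might be obtained with an accent-insensitive, case-insensitive collation such as `Latin1_General_CI_AI` on the word column, or with a full-text catalog created with `ACCENT_SENSITIVITY = OFF`; either choice is a suggestion, not a requirement of this project.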
**Extract pages as images**
The project:
<[url removed, login to view]>
gives an example of how to use Ghostscript to convert PDF pages to images.
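As an illustration of the conversion step, the sketch below builds a Ghostscript command line for rendering a page range to PNG files. It is written in Python only to stay language-neutral; the function name `ghostscript_args`, the output pattern and the 150 dpi default are assumptions, and the executable name `gs` would typically be `gswin32c.exe` or `gswin64c.exe` on Windows.

```python
def ghostscript_args(pdf_path, first_page, last_page,
                     output_pattern="page-%03d.png", dpi=150):
    """Build the argument list for rendering a page range to 24-bit PNG files."""
    if first_page < 1 or last_page < first_page:
        raise ValueError("invalid page range")
    return [
        "gs",                               # assumed executable name
        "-dBATCH", "-dNOPAUSE", "-dSAFER",  # run unattended, then exit
        "-sDEVICE=png16m",                  # 24-bit RGB PNG output device
        f"-r{dpi}",                         # rendering resolution in dpi
        f"-dFirstPage={first_page}",
        f"-dLastPage={last_page}",
        f"-sOutputFile={output_pattern}",   # %03d is replaced by a counter
        pdf_path,
    ]
```

The list can be passed to a process launcher (`subprocess.run(args, check=True)` in Python, `System.Diagnostics.Process` in C#). Note that the counter in the output file name normally starts at 1 for the first rendered page rather than following the PDF page number, so the caller may need to map file names back to page numbers.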
**Store tiles in database for DeepZoom**
The images should then be processed and stored in the database to be usable by a MultiScaleImage control in Silverlight.
The following project could be used:
<[url removed, login to view]>
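To give an idea of what has to be stored, the following sketch computes the Deep Zoom pyramid for one page image: level 0 is a 1x1 pixel image, each level doubles the resolution, and the top level holds the full-size page cut into tiles. It is written in Python for illustration only; the function names and the 256-pixel tile size are assumptions, and tile overlap is ignored for brevity.

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def pyramid_levels(width, height, tile_size=256):
    """Return (level, width, height, columns, rows) for every pyramid level."""
    # Highest level = ceil(log2(longest side)), computed with integers only.
    max_level = (max(width, height) - 1).bit_length()
    levels = []
    for level in range(max_level + 1):
        scale = 2 ** (max_level - level)
        w = ceil_div(width, scale)
        h = ceil_div(height, scale)
        levels.append((level, w, h, ceil_div(w, tile_size), ceil_div(h, tile_size)))
    return levels


def tile_key(level, column, row, fmt="png"):
    """Tile name following the Deep Zoom convention level/column_row.format."""
    return f"{level}/{column}_{row}.{fmt}"
```

For a 1240 x 1754 pixel image (roughly an A4 page at 150 dpi) this yields 12 levels, with the top level split into 5 x 7 tiles. A database table keyed by page, level, column and row could hold each tile as a binary column, and on the Silverlight side a custom `MultiScaleTileSource` could request tiles by those coordinates; both are suggestions rather than a prescribed design.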
**Sample program**
Create a C# program (a console executable is fine) that takes a PDF and a page range and creates the text index and the page images. It must allow a search in the index that returns the list of pages where the words are found. Also create a Silverlight sample showing the processed PDF, one page at a time.
Project ID: #3892390