Hi,
I have just set up the LogicalDOC system on my QNAP NAS.
When I started filling the LogicalDOC system with scans that I have made, I noticed that the full-text search does not seem to work on documents that I have scanned to PDF (using an OCR scanner).
Does LogicalDOC index the text layer of such a PDF?
Can I check the indexes to see what is in them at this point?
Full text search with PDF
Moderator: car031
Post
Re: Full text search with PDF
The OCR feature is only available in the commercial editions and requires some configuration, as described in the installation guide. Once you create a document in LogicalDOC, it is not immediately indexed and available for full-text searches. You have to wait for the indexer task to process the new documents. Your PDF will then be processed and OCRed.
When a document is indexed you will see a small white cylinder icon. Click on it and all the extracted text will be downloaded as a .txt file for easy inspection.
Post
Re: Full text search with PDF
Hi Bertsjuhn,
I confirm that LogicalDOC indexes the text in PDF files,
but this operation is typically performed as a scheduled task.
The indexing operation is performed by the task "Document Indexing"
(see the attached image)
http://help.logicaldoc.com/en/administr ... uled-tasks
Once the document has been indexed, an icon representing a small gray silo appears to its left.
By clicking on this icon you can view the text that the indexer has extracted.
- Attachments
- Indexed text: 02-indexed-text.gif
- Scheduled tasks: 01-Scheduled-tasks.gif
Post
Re: Full text search with PDF
Thanks for the reply
I just checked the extracted text.
This text is directly selected out of the PDF:
Code:
Een schadegeval i s altijd vervelend. Mocht u echter schade hebben, dan kunt u rekenen
op onze persoonlijke dienstverlening en snelle en adequate schadebehandeling.
This is the same part of the document, but taken from the extracted text file:
Code:
E e n s c h a d e g e v a l i s a l t i j d v e r v e l e n d . Mocht u e c h t e r schade h e b b e n , d a n k u n t u r e k e n e n
op onze p e r s o o n l i j k e d i e n s t v e r l e n i n g en s n e l l e en adequate schadebehande l ing .
Post
Re: Full text search with PDF
That PDF is probably the result of OCR. While your PDF viewer shows you the words correctly, inside the file each character was stored as a separate word.
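For anyone who wants to check a batch of extracted .txt files for this symptom, here is a rough sketch in Python. It only measures how many whitespace-separated tokens are single letters. The function names and the 0.5 threshold are my own assumptions for illustration and are not part of LogicalDOC.

```python
def single_char_ratio(text: str) -> float:
    """Return the fraction of whitespace-separated tokens that are one letter long."""
    tokens = text.split()
    if not tokens:
        return 0.0
    singles = sum(1 for token in tokens if len(token) == 1 and token.isalpha())
    return singles / len(tokens)


def looks_letter_spaced(text: str, threshold: float = 0.5) -> bool:
    """Heuristic: flag text where at least `threshold` of the tokens are single letters.

    The threshold is an arbitrary starting point (an assumption), not a tuned value.
    """
    return single_char_ratio(text) >= threshold


# Samples taken from the two excerpts quoted earlier in this thread.
from_pdf_viewer = "Een schadegeval i s altijd vervelend."
from_extracted_txt = "E e n s c h a d e g e v a l i s a l t i j d v e r v e l e n d ."

print(looks_letter_spaced(from_pdf_viewer))
print(looks_letter_spaced(from_extracted_txt))
```

This only detects the problem. It cannot repair the text, because once every letter is its own token the original word boundaries are lost.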
Post
Re: Full text search with PDF
The strange part is that if I extract the text out of the same PDF, I do get the correct text.
Without the spaces between every letter.
Post
Re: Full text search with PDF
If possible, please send your PDF to our support service; we would like to examine it.
Send it to support at logicaldoc.com or attach it to this thread (as a .zip file)
Post
Re: Full text search with PDF
Hello Bertsjuhn,
We tried your file, and the extracted text does indeed contain every letter as a separate token.
Currently LogicalDOC is not able to index it properly.
Commercial versions of LogicalDOC have an integrated OCR (Tesseract), which can perform character recognition on images and raster PDFs.
You should try adding to LogicalDOC a version of the document that has not been through OCR, and let the LogicalDOC internal OCR process it.
More information about Tesseract in LogicalDOC is available here:
http://help.logicaldoc.com/en/installat ... ware-linux
http://help.logicaldoc.com/en/installat ... /tesseract
The same guide is also available for Ubuntu.
On Windows, Tesseract is installed by the LogicalDOC setup, so you don't need to worry about it.
See the images below to configure the OCR in LogicalDOC.
- Attachments
- Tesseract OCR Windows (OCR enabled): tesseract-OCR-windows-enabled.gif
- Tesseract OCR Windows (OCR disabled): tesseract-OCR-windows-disabled.gif
- Tesseract OCR Linux: tesseract-OCR-linux-02.gif
Post
Re: Full text search with PDF
Hi Bertsjuhn,
I read in your interesting post that you were able to install LogicalDOC on a QNAP NAS. I need some help and advice: how did you get root access on the QNAP to install it, given that SSH and telnet access to the QNAP is restricted to the admin user?
Hope to hear from you soon.
Best regards,
Melvyn