Full text search with PDF

Posted: Sun Feb 22, 2015 4:34 pm
by Bertsjuhn
Hi,

I have just set up the LogicalDOC system on my QNAP NAS.
When I started filling LogicalDOC with scans I had made, I noticed that full-text search does not seem to work on documents that I scanned to PDF (using an OCR scanner).
Does LogicalDOC index the text layer of such a PDF?
Can I check the indexes to see what is in them at this point?

Re: Full text search with PDF

Posted: Mon Feb 23, 2015 2:36 pm
by mmeschieri
The OCR feature is only available in the commercial editions and requires some configuration, as described in the installation guide. Once you create a document in LogicalDOC, it is not immediately indexed and available for full-text searches. You have to wait for the indexer task to process the new documents. Your PDF will then be processed and OCRed.

When a document is indexed you will see a small white cylinder icon. Click on it and all the extracted text will be downloaded as a .txt file for easy inspection.

Re: Full text search with PDF

Posted: Mon Feb 23, 2015 2:46 pm
by agaspa
Hi Bertsjuhn,
I can confirm that LogicalDOC indexes the text in PDF files.

But this operation is typically performed as a scheduled task.
The indexing operation is performed by the task "Document Indexing"
(see the attached image)
http://help.logicaldoc.com/en/administr ... uled-tasks

Once the document has been indexed, an icon representing a small gray silo appears to its left.
By clicking on this icon you can view the text that the indexer extracted.

Re: Full text search with PDF

Posted: Mon Feb 23, 2015 3:53 pm
by Bertsjuhn
Thanks for the reply :)

I just checked the extracted text.

This text is directly selected out of the PDF:


Een schadegeval i s altijd vervelend. Mocht u echter schade hebben, dan kunt u rekenen
op onze persoonlijke dienstverlening en snelle en adequate schadebehandeling.

This is the same part of the document, but taken from the extracted text file:


E e n s c h a d e g e v a l i s a l t i j d v e r v e l e n d . Mocht u e c h t e r schade h e b b e n , d a n k u n t u r e k e n e n
op onze p e r s o o n l i j k e d i e n s t v e r l e n i n g en s n e l l e en adequate schadebehande l ing .

Re: Full text search with PDF

Posted: Mon Feb 23, 2015 4:15 pm
by mmeschieri
That PDF is probably the result of OCR. While your PDF viewer shows you the words correctly, in the file each character was stored as a separate word.
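
One rough way to check an extracted .txt file for this pattern is to measure how many whitespace-separated tokens are single letters. The sketch below is only an illustration and not part of LogicalDOC; the function names and the 0.5 threshold are assumptions chosen for this example.

```python
def single_char_ratio(text: str) -> float:
    """Fraction of whitespace-separated tokens that are exactly one letter."""
    tokens = text.split()
    if not tokens:
        return 0.0
    singles = sum(1 for token in tokens if len(token) == 1 and token.isalpha())
    return singles / len(tokens)


def looks_letter_spaced(text: str, threshold: float = 0.5) -> bool:
    """Heuristic: True when most tokens are single letters (threshold is an assumption)."""
    return single_char_ratio(text) >= threshold


# Sample lines copied from the posts above.
selected = ("Een schadegeval i s altijd vervelend. "
            "Mocht u echter schade hebben, dan kunt u rekenen")
extracted = ("E e n s c h a d e g e v a l i s a l t i j d v e r v e l e n d . "
             "Mocht u e c h t e r schade h e b b e n , d a n k u n t u r e k e n e n")

print(looks_letter_spaced(selected))
print(looks_letter_spaced(extracted))
```

A high ratio only suggests that the text layer was written one character at a time; it does not repair the text, since the original word boundaries cannot be reliably recovered from the spaced-out output alone.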

Re: Full text search with PDF

Posted: Mon Feb 23, 2015 8:00 pm
by Bertsjuhn
The strange part is that if I extract the text from the same PDF myself, I do get the correct text, without the spaces between every letter.

Re: Full text search with PDF

Posted: Tue Feb 24, 2015 9:14 am
by agaspa
If possible, please send your PDF to our support service; we would like to examine it.
Send it to support at logicaldoc.com or attach it to this thread (as a .zip file).

Re: Full text search with PDF

Posted: Tue Feb 24, 2015 10:08 pm
by Bertsjuhn
Since the file is too large, I have sent an email :)

Re: Full text search with PDF

Posted: Wed Feb 25, 2015 11:53 am
by agaspa
Hello Bertsjuhn,

We tried your file, and the extracted text does indeed contain every letter as a separate token.
Currently LogicalDOC is not able to index it properly.
Commercial versions of LogicalDOC have an integrated OCR (Tesseract), which is able to execute the character recognition on images and raster PDFs.
You should try adding a version of the document without OCR to LogicalDOC and using LogicalDOC's internal OCR instead.

More information about Tesseract in LogicalDOC is available here:
http://help.logicaldoc.com/en/installat ... ware-linux
http://help.logicaldoc.com/en/installat ... /tesseract

The same guide is also available for Ubuntu.
On Windows, Tesseract is installed by the LogicalDOC setup, so you don't need to worry about it.

See the images below to configure the OCR in LogicalDOC

Re: Full text search with PDF

Posted: Tue Nov 03, 2015 2:48 pm
by neuromanticx
Hi Bertsjuhn,

I read your interesting post and saw that you were able to install LogicalDOC on a QNAP NAS. I need some help and advice on how you got root access on the QNAP to install it, since SSH and telnet access to the QNAP is restricted to the admin user.

Hope to hear from you soon.

Best regards,
Melvyn