Foxit PDF SDK
foxit::addon::ocr::OCRConfig Class Reference

Inherits Object.

Public Member Functions

 OCRConfig ()
 Constructor.
 
 OCRConfig (bool is_detect_pictures, bool is_remove_noise, bool is_correct_skew, bool is_enable_text_extraction_mode, bool is_sequentially_process, bool is_auto_overwrite_resolution, int resolution_to_overwrite, int confidence)
 Constructor, with parameters.
 
bool operator!= (const OCRConfig &other)
 Not equal operator.
 
OCRConfig & operator= (const OCRConfig &other)
 Assign operator.
 
void Set (bool is_detect_pictures, bool is_remove_noise, bool is_correct_skew, bool is_enable_text_extraction_mode, bool is_sequentially_process, bool is_auto_overwrite_resolution, int resolution_to_overwrite, int confidence)
 Set value.
 

Public Attributes

int confidence
 The confidence threshold used to determine whether the recognized text is reliable.
 
bool is_auto_overwrite_resolution
 Decide whether to set the resolution automatically.
 
bool is_correct_skew
 Decide whether to enable skew correction. true means to enable skew correction. false means not to enable skew correction. Default value: true.
 
bool is_detect_pictures
 Decide whether to detect pictures. true means pictures will be detected during the analysis process. false means pictures will not be detected, so picture content on the image of the PDF document might be interpreted as text. If you would like to extract only text from the image, this option can be set to false. Default value: true.
 
bool is_enable_text_extraction_mode
 Decide whether to enable text extraction mode.
 
bool is_remove_noise
 Decide whether to remove noise from the image of the PDF. This can be useful if the image contains noise, such as random black dots or speckles. If the lines of letters on the image are thin, this option should be set to false; otherwise it will affect the recognition of the text. true means noise in the image will be filtered out during the OCR process and will not be recognized as text. false means noise will not be blocked. Default value: true.
 
bool is_sequentially_process
 Decide whether the OCR engine will process pages sequentially on one process.
 
int resolution_to_overwrite
 The resolution (DPI) used to overwrite the image resolution of PDF document.
 

Detailed Description

This class represents the configuration used for OCR.

Constructor & Destructor Documentation

◆ OCRConfig()

foxit::addon::ocr::OCRConfig::OCRConfig ( bool  is_detect_pictures,
bool  is_remove_noise,
bool  is_correct_skew,
bool  is_enable_text_extraction_mode,
bool  is_sequentially_process,
bool  is_auto_overwrite_resolution,
int  resolution_to_overwrite,
int  confidence 
)
inline

Constructor, with parameters.

Parameters
[in]  is_detect_pictures  Decide whether to detect pictures.
[in]  is_remove_noise  Decide whether to remove noise of the image of PDF.
[in]  is_correct_skew  Decide whether to enable skew correction.
[in]  is_enable_text_extraction_mode  Decide whether to enable text extraction mode.
[in]  is_sequentially_process  Decide whether the OCR engine will process pages sequentially on one process.
[in]  is_auto_overwrite_resolution  Decide whether to auto overwrite resolution.
[in]  resolution_to_overwrite  The resolution to overwrite. This parameter is valid only when parameter is_auto_overwrite_resolution is set to false.
[in]  confidence  The confidence threshold used to determine whether the recognized text is reliable. The value range is from 0 to 100.
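
Six of the eight arguments are bool, so a positional call is easy to get wrong. The sketch below names each argument before passing it. ctor_demo::OCRConfig is only a stand-in that mirrors the documented members so the snippet compiles without the SDK; it is not the SDK class. MakeTextOnlyConfig and the chosen values are illustrative assumptions.

```cpp
namespace ctor_demo {
// Stand-in mirroring the documented public members. It is NOT the SDK class.
struct OCRConfig {
  bool is_detect_pictures;
  bool is_remove_noise;
  bool is_correct_skew;
  bool is_enable_text_extraction_mode;
  bool is_sequentially_process;
  bool is_auto_overwrite_resolution;
  int resolution_to_overwrite;
  int confidence;

  // Same parameter order as the documented constructor.
  OCRConfig(bool detect_pictures, bool remove_noise, bool correct_skew,
            bool text_extraction_mode, bool sequentially_process,
            bool auto_overwrite_resolution, int resolution,
            int confidence_threshold)
      : is_detect_pictures(detect_pictures),
        is_remove_noise(remove_noise),
        is_correct_skew(correct_skew),
        is_enable_text_extraction_mode(text_extraction_mode),
        is_sequentially_process(sequentially_process),
        is_auto_overwrite_resolution(auto_overwrite_resolution),
        resolution_to_overwrite(resolution),
        confidence(confidence_threshold) {}
};
}  // namespace ctor_demo

// Hypothetical helper: a config for text-only scans at a fixed 300 DPI.
inline ctor_demo::OCRConfig MakeTextOnlyConfig() {
  const bool is_detect_pictures = false;            // extract text only
  const bool is_remove_noise = true;
  const bool is_correct_skew = true;
  const bool is_enable_text_extraction_mode = false;
  const bool is_sequentially_process = false;
  const bool is_auto_overwrite_resolution = false;  // use the manual DPI below
  const int resolution_to_overwrite = 300;          // honored only when auto is false
  const int confidence = 30;                        // documented range is 0 to 100
  return ctor_demo::OCRConfig(
      is_detect_pictures, is_remove_noise, is_correct_skew,
      is_enable_text_extraction_mode, is_sequentially_process,
      is_auto_overwrite_resolution, resolution_to_overwrite, confidence);
}
```

With the real SDK, the same named locals would be passed to foxit::addon::ocr::OCRConfig in the same order.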

Member Function Documentation

◆ operator!=()

bool foxit::addon::ocr::OCRConfig::operator!= ( const OCRConfig & other)
inline

Not equal operator.

Parameters
[in]  other  Another OCR config object. This function checks whether the current object is not equal to it.
Returns
true means not equal, while false means equal.

◆ operator=()

OCRConfig& foxit::addon::ocr::OCRConfig::operator= ( const OCRConfig & other)
inline

Assign operator.

Parameters
[in]  other  Another OCR config object, whose value would be assigned to current object.
Returns
Reference to current object itself.

◆ Set()

void foxit::addon::ocr::OCRConfig::Set ( bool  is_detect_pictures,
bool  is_remove_noise,
bool  is_correct_skew,
bool  is_enable_text_extraction_mode,
bool  is_sequentially_process,
bool  is_auto_overwrite_resolution,
int  resolution_to_overwrite,
int  confidence 
)
inline

Set value.

Parameters
[in]  is_detect_pictures  Decide whether to detect pictures.
[in]  is_remove_noise  Decide whether to remove noise of the image of PDF.
[in]  is_correct_skew  Decide whether to enable skew correction.
[in]  is_enable_text_extraction_mode  Decide whether to enable text extraction mode.
[in]  is_sequentially_process  Decide whether the OCR engine will process pages sequentially on one process.
[in]  is_auto_overwrite_resolution  Decide whether to auto overwrite resolution.
[in]  resolution_to_overwrite  The resolution to overwrite. This parameter is valid only when parameter is_auto_overwrite_resolution is set to false.
[in]  confidence  The confidence threshold used to determine whether the recognized text is reliable. The value range is from 0 to 100.
Returns
None.
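
Set() replaces all eight values at once, while the public attributes allow a single value to be changed directly. The sketch below shows both, plus a change check with operator!=. set_demo::OCRConfig is a stand-in that mirrors the documented members and is not the SDK class. The member-wise comparison is an assumption, since the reference does not state how inequality is determined, and ThresholdChangeIsDetected is a hypothetical helper.

```cpp
namespace set_demo {
// Stand-in mirroring the documented members. It is NOT the SDK class.
struct OCRConfig {
  bool is_detect_pictures;
  bool is_remove_noise;
  bool is_correct_skew;
  bool is_enable_text_extraction_mode;
  bool is_sequentially_process;
  bool is_auto_overwrite_resolution;
  int resolution_to_overwrite;
  int confidence;

  // Same parameter order as the documented Set().
  void Set(bool detect_pictures, bool remove_noise, bool correct_skew,
           bool text_extraction_mode, bool sequentially_process,
           bool auto_overwrite_resolution, int resolution,
           int confidence_threshold) {
    is_detect_pictures = detect_pictures;
    is_remove_noise = remove_noise;
    is_correct_skew = correct_skew;
    is_enable_text_extraction_mode = text_extraction_mode;
    is_sequentially_process = sequentially_process;
    is_auto_overwrite_resolution = auto_overwrite_resolution;
    resolution_to_overwrite = resolution;
    confidence = confidence_threshold;
  }

  // Assumption: two configs differ when any member differs.
  bool operator!=(const OCRConfig& other) const {
    return is_detect_pictures != other.is_detect_pictures ||
           is_remove_noise != other.is_remove_noise ||
           is_correct_skew != other.is_correct_skew ||
           is_enable_text_extraction_mode != other.is_enable_text_extraction_mode ||
           is_sequentially_process != other.is_sequentially_process ||
           is_auto_overwrite_resolution != other.is_auto_overwrite_resolution ||
           resolution_to_overwrite != other.resolution_to_overwrite ||
           confidence != other.confidence;
  }
};

// Hypothetical helper: true when changing one attribute makes the configs differ.
inline bool ThresholdChangeIsDetected() {
  OCRConfig before{};
  before.Set(true, true, true, false, false, false, 300, 0);
  OCRConfig after = before;  // copy all eight values
  after.confidence = 50;     // change a single public attribute directly
  return after != before;
}
}  // namespace set_demo
```

A check like this can be used to skip re-running OCR when the settings have not changed.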

Member Data Documentation

◆ confidence

int foxit::addon::ocr::OCRConfig::confidence

The confidence threshold used to determine whether the recognized text is reliable.

The value range is [0, 100]. The larger the value, the higher the confidence requirement. For example, if this value is set to 30, recognized text with a confidence lower than 30 is considered unreliable and is removed. Default value: 0.
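
The threshold rule can be illustrated with a small sketch. The engine applies the filter internally, and how it scores text is not described here, so RecognizedWord and DropUnreliable are hypothetical names used only to show which items a given threshold would remove.

```cpp
#include <string>
#include <vector>

// Hypothetical type: one recognized word with its confidence score (0 to 100).
struct RecognizedWord {
  std::string text;
  int confidence;
};

// Keeps only the words whose confidence is not lower than `threshold`.
inline std::vector<RecognizedWord> DropUnreliable(
    const std::vector<RecognizedWord>& words, int threshold) {
  std::vector<RecognizedWord> kept;
  for (const RecognizedWord& word : words) {
    if (word.confidence >= threshold) {
      kept.push_back(word);
    }
  }
  return kept;
}
```

In this sketch a threshold of 0, the documented default, removes nothing, and a word scored exactly at the threshold is kept because only text lower than the threshold counts as unreliable.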

◆ is_auto_overwrite_resolution

bool foxit::addon::ocr::OCRConfig::is_auto_overwrite_resolution

Decide whether to set the resolution automatically.

true means the OCR engine will automatically detect and overwrite the image resolution. false means the resolution is set manually through resolution_to_overwrite.

◆ is_correct_skew

bool foxit::addon::ocr::OCRConfig::is_correct_skew

Decide whether to enable skew correction. true means to enable skew correction. false means not to enable skew correction. Default value: true.

Note
Skew can be corrected only for angles not greater than 20 degrees.

◆ is_enable_text_extraction_mode

bool foxit::addon::ocr::OCRConfig::is_enable_text_extraction_mode

Decide whether to enable text extraction mode.

It is recommended to set this parameter to true when some parts of the text are not found as a text block, such as text on a picture or handwriting. It is recommended to set it to false when the complete text of a picture is already recognized correctly, or when the sample contains images or patterns that may be mistaken for text. In short, this parameter makes the engine recognize anything that remotely resembles letters as text. true means to enable text extraction mode, while false means not to enable it. Default value: false.

◆ is_sequentially_process

bool foxit::addon::ocr::OCRConfig::is_sequentially_process

Decide whether the OCR engine will process pages sequentially on one process.

This parameter is only used in OCR conversion. true means the OCR engine will process pages sequentially on one process, which increases the conversion time.
false means the OCR engine will detect the number of processes automatically: if only one page is processed or the system has only one processor, one process is used; otherwise, parallel processing is used.
Default value: false.
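
The decision rule described above can be written as a small predicate. This is only a sketch of the documented rule, not the engine's implementation: UsesSingleProcess and its page_count and processor_count parameters are hypothetical, and how many processes the engine starts in the parallel case is not documented.

```cpp
// Returns true when, per the rule above, a single process would be used.
inline bool UsesSingleProcess(bool is_sequentially_process, int page_count,
                              unsigned processor_count) {
  if (is_sequentially_process) {
    return true;  // sequential processing was requested explicitly
  }
  if (page_count <= 1) {
    return true;  // a single page leaves nothing to parallelize
  }
  if (processor_count <= 1) {
    return true;  // a single processor cannot run pages in parallel
  }
  return false;   // otherwise parallel processing is used
}
```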

◆ resolution_to_overwrite

int foxit::addon::ocr::OCRConfig::resolution_to_overwrite

The resolution (DPI) used to overwrite the image resolution of PDF document.

This parameter is valid only when parameter is_auto_overwrite_resolution is set to false. Default value: 300.
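
The interaction between the two resolution settings can be sketched as follows. DescribeResolution is a hypothetical helper, not part of the SDK. It only reports which setting takes effect: resolution_to_overwrite is ignored whenever is_auto_overwrite_resolution is true.

```cpp
#include <string>

// Reports which resolution setting would take effect for a given pair of values.
inline std::string DescribeResolution(bool is_auto_overwrite_resolution,
                                      int resolution_to_overwrite) {
  if (is_auto_overwrite_resolution) {
    return "auto";  // the engine detects the resolution; the manual value is ignored
  }
  return std::to_string(resolution_to_overwrite) + " DPI";
}
```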