OCR & Document Processing
OCR (Optical Character Recognition) is an advanced feature of Heptora that allows you to automatically extract information from physical or digital documents. It converts printed, handwritten, or digital text into structured data that can be processed, validated, and used in your automations.
Digital Document Transformation
Heptora’s OCR system eliminates the need for manual data entry, reducing errors and dramatically accelerating document processing in your workflows.
Advantages of Integrated OCR
- 📄 Multiple Formats: Processes PDF, JPG/PNG/TIFF images, and scanned documents
- 🤖 Integrated AI: Automatic document type classification
- 🎯 Intelligent Extraction: Automatically identifies key fields without prior configuration
- 📊 Complex Structures: Recognizes tables, grids, and complex layouts
- ✓ Automatic Validation: Verifies formats of tax IDs, IBANs, dates, and other data
- 🌍 Multilingual: Support for multiple languages and special characters
- 📈 High Accuracy: Works with variable quality documents
Extraction Capabilities
Supported Formats
Heptora’s OCR can process a wide variety of input formats:
Digital Documents
- Native PDFs: Digitally created PDF documents
- Scanned PDFs: Physical documents converted to PDF
- Hybrid documents: PDFs with both digital and scanned content
Images
- JPG/JPEG: Document photographs
- PNG: Screenshots and digital documents
- TIFF: High-quality scanned documents
- BMP: Bitmap images
Adaptive Quality
The system automatically adapts to different conditions:
- Documents with variable resolution (from 150 DPI)
- Images with uneven lighting
- Documents with slight rotation or tilt
- Texts with different font sizes
- Documents with watermarks or stamps
Structured Extraction
Heptora’s OCR goes beyond simple text extraction, identifying document structure:
Text Fields
- Headers: Main titles and sections
- Paragraphs: Text blocks with semantic structure
- Lists: Enumerated or bulleted items
- Footnotes: References and annotations
- Form fields: Data in predefined templates
Tables and Grids
The system recognizes and preserves tabular structure:

    {
      "table_1": {
        "headers": ["Item", "Quantity", "Price", "Total"],
        "rows": [
          ["Product A", "10", "$25.00", "$250.00"],
          ["Product B", "5", "$40.00", "$200.00"]
        ],
        "total_rows": 2
      }
    }

Graphic Elements
Identification of relevant non-textual elements:
- Logos: Extraction and position of corporate images
- Signatures: Detection of signed areas
- Barcodes: Reading 1D and 2D codes
- QR Codes: Extraction of encoded information
- Stamps: Identification of official marks
Coordinates and Positioning
Each extracted element includes its exact location in the document:

    {
      "field": "tax_id",
      "value": "12-3456789",
      "confidence": 0.98,
      "coordinates": {
        "x": 120,
        "y": 350,
        "width": 100,
        "height": 20,
        "page": 1
      }
    }

This allows:
- Verifying the expected position of critical fields
- Detecting misplaced or missing fields
- Creating visualizations of the extraction process
- Validating document structure
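A position check of this kind can be sketched in a few lines of Python. This is a minimal sketch, not Heptora's implementation: the `Box` class, the `field_in_expected_zone` helper, and the default tolerance of 5 units are hypothetical; only the shape of the `coordinates` object is taken from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle on one page (hypothetical helper type)."""
    x: float
    y: float
    width: float
    height: float
    page: int = 1

    def contains(self, other: "Box", tolerance: float = 0.0) -> bool:
        """True if `other` lies fully inside this box on the same page."""
        return (
            self.page == other.page
            and other.x >= self.x - tolerance
            and other.y >= self.y - tolerance
            and other.x + other.width <= self.x + self.width + tolerance
            and other.y + other.height <= self.y + self.height + tolerance
        )


def field_in_expected_zone(field: dict, expected: Box, tolerance: float = 5.0) -> bool:
    """Compare an extracted field's coordinates with the zone where it is expected."""
    c = field["coordinates"]
    actual = Box(c["x"], c["y"], c["width"], c["height"], c.get("page", 1))
    return expected.contains(actual, tolerance)


# Field taken from the example output above
tax_id_field = {
    "field": "tax_id",
    "value": "12-3456789",
    "confidence": 0.98,
    "coordinates": {"x": 120, "y": 350, "width": 100, "height": 20, "page": 1},
}
is_placed_correctly = field_in_expected_zone(tax_id_field, Box(100, 340, 150, 40))
```

A field that falls outside its expected zone can then be flagged as misplaced instead of being silently accepted.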
Supported Document Types
Invoices
Complete extraction of commercial invoice information:
Issuer Data
- Business name and trade name
- Issuer tax ID
- Complete fiscal address
- Contact details (phone, email, website)
Recipient Data
- Customer name or business name
- Recipient tax ID
- Billing address
- Shipping address (if different)
Invoice Information
- Invoice number
- Invoice series
- Issue date
- Due date
- Billing period
Items and Totals
- Product/service descriptions
- Quantities and units
- Unit prices
- Applied discounts
- Tax base by tax rate
- Itemized tax amounts
- Withholdings (income tax, etc.)
- Total invoice amount
Additional Information
- Payment method
- Bank details (IBAN)
- Purchase order reference
- Notes and observations
Contracts
Intelligent analysis of contractual documents:
Party Identification
- Names of contracting parties
- Legal representatives
- Powers and authorities
- Registered addresses
Main Clauses
- Contract subject
- Duration and validity
- Automatic renewals
- Termination conditions
- Penalties
Economic Information
- Price or consideration
- Payment method and terms
- Price reviews
- Guarantees and bonds
Relevant Dates
- Signature date
- Effective start date
- End date
- Important milestones
Signatures and Annexes
- Detection of signature areas
- Identification of signatories
- List of mentioned annexes
- References to external documents
Forms
Automated processing of structured forms:
Field Types
- Free text: Names, addresses, comments
- Checkboxes: Checked/unchecked options
- Radio buttons: Single selection among options
- Dropdown lists: Selected values
- Dates: In various formats (mm/dd/yyyy, etc.)
- Signatures: Handwritten or digital
Field Validation
- Required fields completed
- Correct data format
- Consistency between related fields
- Detection of blank fields
Use Cases
- Job applications
- Registration forms
- Surveys and questionnaires
- Medical forms
- Administrative declarations
Certificates
Data extraction from certified documents:
Academic Certificates
- Issuing institution
- Degree obtained
- Grades
- Issue date
- Registration number
Professional Certificates
- Certifying organization
- Certification type
- Level or category
- Issue and expiration dates
- Verification code
Official Certificates
- Issuing entity
- Certification subject
- Beneficiary data
- Validity
- Official seals and signatures
Identity Documents
Secure extraction of personal identification data:
ID Cards
- Document number
- Full name
- Date of birth
- Nationality
- Issue and expiration dates
- Support number
Passports
- Passport number
- Document type
- Issuing country
- Personal data
- MRZ (Machine Readable Zone)
- Issue and expiration dates
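Fields read from the MRZ carry check digits, which makes them a natural target for automatic validation. The sketch below computes the check digit defined in ICAO Doc 9303 (weights 7, 3, 1 repeated; letters A to Z count as 10 to 35; the filler `<` counts as 0). The function names are hypothetical and are not part of Heptora's API.

```python
def mrz_char_value(ch: str) -> int:
    """Numeric value of a single MRZ character as defined by ICAO Doc 9303."""
    if "0" <= ch <= "9":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    if ch == "<":
        return 0
    raise ValueError(f"invalid MRZ character: {ch!r}")


def mrz_check_digit(field: str) -> int:
    """Weighted sum of the field's characters (weights 7, 3, 1 repeating), modulo 10."""
    weights = (7, 3, 1)
    total = sum(mrz_char_value(ch) * weights[i % 3] for i, ch in enumerate(field))
    return total % 10


def mrz_field_is_valid(field: str, check_digit: str) -> bool:
    """Compare the computed check digit with the one printed in the MRZ."""
    return check_digit.isdigit() and mrz_check_digit(field) == int(check_digit)
```

A mismatch between the computed and printed digit usually points to a misread character, so the field can be routed to review regardless of its confidence score.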
Driver’s Licenses
- License number
- Authorized categories
- Issue date
- Expiration date
- Restrictions
Receipts and Tickets
Processing of payment vouchers:
Purchase Tickets
- Issuing merchant
- Merchant tax ID
- Purchase date and time
- List of products/services
- Individual prices
- Applied discounts
- Total paid
- Payment method
Payment Receipts
- Payment concept
- Issuer and recipient
- Amount
- Payment date
- Payment method
- Receipt reference
Use Cases
- Company expense management
- Parking ticket control
- Utility receipt processing
- Payment reconciliation
Validation and Enrichment
Format Validation
The system includes specific validators for structured data:
Tax IDs / Business Numbers
- Validation of check digit algorithm
- Verification of correct format
- Detection of impossible numbers
- Identification of type (individual/business)
IBAN
- Validation of country code
- Verification of check digits
- Format according to international standard
- Correct length by country
Dates
- Recognized formats: mm/dd/yyyy, dd-mm-yy, yyyy-mm-dd, etc.
- Validation of impossible dates (February 31st, etc.)
- Normalization to standard format
- Detection of temporal inconsistencies
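The IBAN checks listed above (country code, check digits, length by country) follow the standard mod-97 procedure, which can be sketched as follows. This is an illustration rather than Heptora's validator: `is_valid_iban` is a hypothetical name, and `SAMPLE_LENGTHS` covers only three countries as an example of a per-country length table.

```python
from typing import Dict, Optional

# Small illustrative sample; a real table would cover every IBAN country.
SAMPLE_LENGTHS: Dict[str, int] = {"GB": 22, "DE": 22, "ES": 24}


def is_valid_iban(iban: str, lengths: Optional[Dict[str, int]] = None) -> bool:
    """Validate format, optional per-country length, and the mod-97 check digits."""
    s = iban.replace(" ", "").upper()
    if not (5 <= len(s) <= 34) or not s.isascii() or not s.isalnum():
        return False
    # Two-letter country code followed by two check digits
    if not (s[:2].isalpha() and s[2:4].isdigit()):
        return False
    if lengths and s[:2] in lengths and len(s) != lengths[s[:2]]:
        return False
    # Move the first four characters to the end, map letters to 10..35,
    # and check that the resulting integer leaves remainder 1 modulo 97.
    rearranged = s[4:] + s[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1
```

Because a single misread digit changes the remainder, this check catches most OCR character errors in bank details before they reach a payment system.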
Amounts
- Recognition of decimal separators (. or ,)
- Detection of currency symbols ($, €, etc.)
- Normalization to numeric format
- Validation of expected ranges
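As an illustration of the separator and currency handling described above, the following sketch normalizes an amount string to a `Decimal`. The heuristic is an assumption made for this example, not Heptora's documented behavior: when both `.` and `,` appear, the rightmost one is treated as the decimal separator; when only one kind appears, it is treated as a thousands separator if it is repeated or followed by exactly three digits. The name `parse_amount` is hypothetical.

```python
import re
from decimal import Decimal


def parse_amount(text: str) -> Decimal:
    """Normalize strings such as '$1,250.00' or '1.250,00 €' to a Decimal."""
    cleaned = re.sub(r"[^\d.,-]", "", text)  # drop currency symbols and spaces
    if not re.search(r"\d", cleaned):
        raise ValueError(f"no digits found in {text!r}")

    last_dot, last_comma = cleaned.rfind("."), cleaned.rfind(",")
    decimal_sep = None
    if last_dot != -1 and last_comma != -1:
        # Both separators present: the rightmost one marks the decimals
        decimal_sep = "." if last_dot > last_comma else ","
    else:
        sep = "." if last_dot != -1 else "," if last_comma != -1 else None
        if sep is not None:
            digits_after = len(cleaned) - cleaned.rfind(sep) - 1
            # A single separator not followed by exactly 3 digits is a decimal mark
            if cleaned.count(sep) == 1 and digits_after != 3:
                decimal_sep = sep

    if decimal_sep is None:
        integer_part, fraction = cleaned, ""
    else:
        integer_part, _, fraction = cleaned.rpartition(decimal_sep)
    integer_part = integer_part.replace(".", "").replace(",", "")
    return Decimal(f"{integer_part}.{fraction}" if fraction else integer_part)
```

Inputs such as `1.250` remain ambiguous under any heuristic; where the locale or currency is known from the document, it is safer to pass that information in than to guess.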
Emails and URLs
- Email format validation
- URL structure verification
- Domain detection
Inconsistency Detection
The system automatically identifies anomalies:
Mathematical Inconsistencies

    {
      "error": "calculation_mismatch",
      "field": "total_invoice",
      "extracted_value": "$1,250.00",
      "calculated_value": "$1,235.50",
      "difference": "$14.50",
      "severity": "high"
    }

Missing Data
- Empty required fields
- Incomplete sections
- Missing pages (in multi-page documents)
Outliers
- Amounts outside expected range
- Future dates in historical documents
- Duplicate data
- Inconsistent formats
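The `calculation_mismatch` report shown under Mathematical Inconsistencies can be produced by a check like the one below. This is a minimal sketch under stated assumptions: the function name, the one-cent tolerance, and the decision to omit `severity` (the source does not state how it is assigned) are all choices made for this example.

```python
from decimal import Decimal
from typing import Iterable, Optional


def check_invoice_total(
    line_totals: Iterable[Decimal],
    tax: Decimal,
    stated_total: Decimal,
    tolerance: Decimal = Decimal("0.01"),
) -> Optional[dict]:
    """Return a mismatch report, or None when the stated total is consistent."""
    calculated = sum(line_totals, Decimal("0")) + tax
    difference = abs(stated_total - calculated)
    if difference <= tolerance:
        return None
    return {
        "error": "calculation_mismatch",
        "field": "total_invoice",
        "extracted_value": str(stated_total),
        "calculated_value": str(calculated),
        "difference": str(difference),
    }


report = check_invoice_total(
    [Decimal("1000.00"), Decimal("135.50")], Decimal("100.00"), Decimal("1250.00")
)
```

Using `Decimal` instead of `float` avoids rounding noise that would otherwise trigger false mismatches on amounts with cents.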
AI Enrichment
Artificial intelligence complements extraction with additional analysis:
Automatic Classification
The system identifies document type without prior configuration:

    {
      "document_type": "invoice",
      "confidence": 0.95,
      "sub_type": "service_invoice",
      "detected_features": [
        "invoice_number",
        "tax_breakdown",
        "line_items",
        "company_header"
      ]
    }

Semantic Extraction
Understands content meaning, not just text:
- Named entities: People, organizations, locations
- Relationships: Who invoices whom, who signs what
- Intentions: Request, notification, certification
- Sentiment: Document tone (for contracts and communications)
Categorization
Automatic document organization:
- By document type
- By supplier or customer
- By responsible department
- By date or period
- By amount or relevance
Confidence Score per Field
Each extracted field includes a confidence score:

    {
      "invoice_number": {
        "value": "INV-2024-00123",
        "confidence": 0.99,
        "status": "verified"
      },
      "invoice_date": {
        "value": "2024-03-15",
        "confidence": 0.95,
        "status": "verified"
      },
      "total_amount": {
        "value": "$1,250.00",
        "confidence": 0.72,
        "status": "review_required",
        "reason": "low_image_quality"
      }
    }

Confidence Thresholds
- 0.95 - 1.00: Automatically verified
- 0.80 - 0.94: Accepted with validation
- 0.60 - 0.79: Review recommended
- < 0.60: Review required
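The threshold bands above translate directly into a small lookup. The sketch below is illustrative only: the function names and the returned labels are hypothetical (they paraphrase the band names in the list rather than reproduce Heptora's `status` values), and scores between 0.94 and 0.95 are assigned to the lower band.

```python
def review_band(confidence: float) -> str:
    """Map a confidence score in [0, 1] to one of the four bands listed above."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    if confidence >= 0.95:
        return "automatically_verified"
    if confidence >= 0.80:
        return "accepted_with_validation"
    if confidence >= 0.60:
        return "review_recommended"
    return "review_required"


def fields_needing_review(extracted: dict, threshold: float = 0.80) -> list:
    """Names of fields whose confidence falls below the review threshold."""
    return [name for name, field in extracted.items() if field["confidence"] < threshold]


# Confidence values taken from the per-field example above
example = {
    "invoice_number": {"value": "INV-2024-00123", "confidence": 0.99},
    "invoice_date": {"value": "2024-03-15", "confidence": 0.95},
    "total_amount": {"value": "$1,250.00", "confidence": 0.72},
}
to_review = fields_needing_review(example)
```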
Assisted Review
Specialized interface for human validation of low-confidence data:
Original Document View
- Visualization of source document
- Highlighting of extracted fields
- Zoom on problematic areas
- Page navigation
Validation Panel
- List of fields to review
- Confidence indicator per field
- Alternative suggestions
- History of similar extractions
Quick Correction
- Direct value editing
- Selection among suggested options
- Marking fields as correct
- Indication of OCR errors
Workflow
- System marks fields with confidence < 0.80
- Sent to human review queue
- User validates or corrects values
- System learns from corrections
- Validated data integrated into process
Process Integration
Section titled “Process Integration”OCR Block in Builder
OCR integrates as a draggable block in the visual process designer:
Basic Configuration

    Block: OCR Document Processing
    Input: Document (file or URL)
    Configuration:
      - Document type: Invoice
      - Language: English
      - Quality: High precision
    Output: Structured data (JSON)

Flow Location
The OCR block can be placed at any point in the process:

    [Receive Email] → [Download Attachment] → [OCR] → [Validate Data] → [Insert into ERP]

Visual Configuration
From the visual builder you can:
- Select document type
- Define required fields
- Establish validation rules
- Configure actions based on confidence
- Define alternative flows for review
Zone Configuration
For documents with consistent layout, you can define specific zones:
Rectangular Zones
Define exact document areas:

    {
      "zones": [
        {
          "name": "invoice_number",
          "coordinates": { "x": 450, "y": 100, "width": 150, "height": 30 },
          "page": 1,
          "type": "text",
          "validation": "alphanumeric"
        },
        {
          "name": "total_amount",
          "coordinates": { "x": 450, "y": 650, "width": 100, "height": 25 },
          "page": 1,
          "type": "currency",
          "validation": "positive_number"
        }
      ]
    }

Relative Zones
Define areas relative to fixed elements:

    {
      "zone": "client_name",
      "reference_text": "Customer:",
      "offset_x": 100,
      "offset_y": 0,
      "width": 300,
      "height": 20
    }

Zone Advantages
- Greater precision in structured documents
- Shorter processing time
- Reduction of false positives
- Stricter validation
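To make the relative-zone idea concrete, the sketch below resolves a relative zone into absolute coordinates once the reference text has been located. It assumes that offsets are measured from the top-left corner of the reference text's bounding box; the source does not specify this, and `resolve_relative_zone` is a hypothetical helper.

```python
def resolve_relative_zone(zone: dict, reference_box: dict) -> dict:
    """Turn a relative zone into an absolute rectangle.

    `reference_box` is the located bounding box of `zone["reference_text"]`.
    Assumption: offsets are applied to the box's top-left corner.
    """
    return {
        "name": zone["zone"],
        "x": reference_box["x"] + zone["offset_x"],
        "y": reference_box["y"] + zone["offset_y"],
        "width": zone["width"],
        "height": zone["height"],
        "page": reference_box.get("page", 1),
    }


# Relative zone from the example above
client_name_zone = {
    "zone": "client_name",
    "reference_text": "Customer:",
    "offset_x": 100,
    "offset_y": 0,
    "width": 300,
    "height": 20,
}
# Hypothetical position where the label "Customer:" was found on the page
label_box = {"x": 50, "y": 200, "width": 80, "height": 20, "page": 1}
absolute_zone = resolve_relative_zone(client_name_zone, label_box)
```

Because the zone moves with the label, this approach tolerates documents where the whole layout shifts slightly between scans.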
Document Templates
Predefined models to accelerate configuration:
Included Templates
Heptora includes templates for the most common documents:
- Generic invoices: Standard US/international model
- Electronic invoices: Standard electronic format
- Delivery notes: Shipping documents
- Purchase orders: Order documents
- Employment contracts: Standard models
- Identity documents: Various national IDs
Create Custom Templates
For organization-specific documents:
- Upload sample documents (minimum 3-5 examples)
- Label key fields in each example
- Define specific validations
- Test with new documents
- Refine and publish the template
Use Templates

    OCR Configuration:
      template: "vendor_invoice_xyz"
      fallback: "generic_invoice"
      confidence_threshold: 0.85

Structured Output
The OCR result is a complete JSON object:

    {
      "document_id": "doc_20240315_123456",
      "processing_date": "2024-03-15T10:30:00Z",
      "document_type": "invoice",
      "confidence": 0.94,
      "pages": 1,
      "language": "en",

      "extracted_data": {
        "invoice_number": {
          "value": "INV-2024-00123",
          "confidence": 0.99,
          "coordinates": {"x": 450, "y": 100, "width": 150, "height": 30}
        },
        "invoice_date": {
          "value": "2024-03-15",
          "confidence": 0.97,
          "coordinates": {"x": 450, "y": 130, "width": 100, "height": 25}
        },
        "supplier": {
          "name": "Example Supplier Inc.",
          "tax_id": "12-3456789",
          "address": "123 Main Street, New York, NY 10001"
        },
        "customer": {
          "name": "My Company LLC",
          "tax_id": "98-7654321",
          "address": "456 Business Ave, Los Angeles, CA 90001"
        },
        "line_items": [
          {
            "description": "Product A",
            "quantity": 10,
            "unit_price": 25.00,
            "total": 250.00
          }
        ],
        "totals": {
          "subtotal": 250.00,
          "tax": 20.00,
          "total": 270.00,
          "currency": "USD"
        }
      },

      "validation": {
        "status": "validated",
        "errors": [],
        "warnings": ["Image quality could be improved"]
      },

      "metadata": {
        "file_name": "invoice_example.pdf",
        "file_size": 245678,
        "processing_time_ms": 2340
      }
    }

Accessing the Data
In your process, access extracted data:

    # Get OCR result
    ocr_result = step_output["ocr_document"]

    # Access specific fields
    invoice_num = ocr_result["extracted_data"]["invoice_number"]["value"]
    total = ocr_result["extracted_data"]["totals"]["total"]
    supplier_tax_id = ocr_result["extracted_data"]["supplier"]["tax_id"]

    # Check confidence
    if ocr_result["confidence"] > 0.9:
        # Automatic processing
        process_automatically(ocr_result)
    else:
        # Send to review
        send_to_review(ocr_result)

Post-processing
Transform and normalize extracted data:
Common Transformations

    # Normalize tax IDs (remove spaces, hyphens)
    tax_id_clean = normalize_tax_id(extracted_tax_id)

    # Convert dates to ISO format
    date_iso = convert_to_iso_date(extracted_date)

    # Format amounts
    amount_decimal = parse_currency(extracted_amount)

    # Validate and format IBAN
    iban_formatted = validate_and_format_iban(extracted_iban)

Data Enrichment
Complement extracted data with external information:

    # Look up supplier in database
    supplier = database.find_supplier_by_tax_id(extracted_tax_id)
    if supplier:
        ocr_result["supplier_id"] = supplier.id
        ocr_result["supplier_category"] = supplier.category

    # Validate product codes
    for item in line_items:
        product = database.find_product(item["description"])
        if product:
            item["product_id"] = product.id
            item["product_category"] = product.category

Business Rules
Apply organization-specific logic:

    # Classify invoice by amount
    if total > 10000:
        approval_level = "director"
    elif total > 1000:
        approval_level = "manager"
    else:
        approval_level = "supervisor"

    # Assign to department by supplier
    department = get_department_by_supplier(supplier_tax_id)

    # Calculate payment date based on terms
    payment_date = calculate_payment_date(
        invoice_date,
        payment_terms,
        holidays_calendar
    )

Practical Use Cases
Accounts Payable Automation
Scenario: Automatic processing of supplier invoices

    1. [Email with invoice] → [Download PDF attachment]
    2. [OCR: Extract invoice data]
    3. [Validate: Supplier tax ID exists in system]
    4. [Verify: Calculations are correct]
    5. [Check: Associated purchase order]
    6. [If confidence >= 95%] → [Register automatically in ERP]
    7. [If confidence < 95%] → [Send to human validation]
    8. [Update status] → [Notify accounting]

Benefits:
- 80% reduction in processing time
- Elimination of transcription errors
- Complete process traceability
- Staff freed up for analysis tasks
Contract Management
Scenario: Extraction of expiration dates and key conditions

    1. [Signed contract] → [Scan or upload PDF]
    2. [OCR: Extract clauses and dates]
    3. [AI: Identify renewal conditions]
    4. [Extract: Expiration dates]
    5. [Create: Calendar alerts]
    6. [Register: In document management system]
    7. [30 days before expiration] → [Notify responsible party]

Benefits:
- No missed renewal dates
- Centralization of contractual conditions
- Proactive alerts
- Facilitates audits and reviews
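The alert step in this scenario (notify 30 days before expiration) comes down to simple date arithmetic. The sketch below uses hypothetical names and a hypothetical contract record shape (`id` and `expiration`); it only illustrates the calculation, not how Heptora schedules alerts.

```python
from datetime import date, timedelta


def alert_date(expiration: date, days_before: int = 30) -> date:
    """Date on which the responsible party should be notified."""
    return expiration - timedelta(days=days_before)


def contracts_to_notify(contracts: list, today: date, days_before: int = 30) -> list:
    """IDs of contracts whose alert window has opened and that have not yet expired."""
    return [
        c["id"]
        for c in contracts
        if alert_date(c["expiration"], days_before) <= today <= c["expiration"]
    ]


# Illustrative records; in practice the dates would come from the OCR output
contracts = [
    {"id": "A", "expiration": date(2024, 12, 31)},
    {"id": "B", "expiration": date(2025, 6, 30)},
]
due = contracts_to_notify(contracts, today=date(2024, 12, 5))
```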
Expense Control
Scenario: Processing employee receipts and tickets

    1. [Employee photographs receipt] → [Sends via mobile app]
    2. [OCR: Extract merchant, date, amount]
    3. [Classify: Expense type (meals, transport, etc.)]
    4. [Validate: Within company policy]
    5. [Associate: To project or client]
    6. [If valid] → [Approve automatically]
    7. [Register: In reimbursement system]
    8. [Generate: Monthly expense report]

Benefits:
- Immediate reimbursement processing
- Compliance with expense policies
- Traceability and automatic reporting
- Improved employee experience
Customer Onboarding
Scenario: Identity and documentation verification

    1. [Customer uploads ID and documents] → [Web portal]
    2. [OCR: Extract ID data]
    3. [Validate: ID number correct]
    4. [Verify: Legal age]
    5. [Compare: Data with completed form]
    6. [OCR: Process additional documents]
    7. [If all OK] → [Activate account automatically]
    8. [If discrepancies] → [Request clarification]

Benefits:
- Instant onboarding (24/7)
- Reduced abandonment
- Regulatory compliance (KYC)
- Improved customer experience
Best Practices
Document Preparation
Image Quality
To maximize accuracy:
- Resolution: 300 DPI or higher recommended, optimal 400-600 DPI (the system accepts input from 150 DPI)
- Format: Preferably PDF, or high-quality PNG/JPG
- Lighting: Uniform, without pronounced shadows
- Orientation: Document properly aligned
- Size: Avoid excessively large images (> 10MB)
Scanning
If scanning physical documents:
- Use color or grayscale scanning mode
- Avoid plain text (black-and-white) mode, which is less flexible
- Clean scanner glass
- Flatten wrinkled documents
- Scan one page per file
Mobile Photography
When using a phone:
- Good natural or artificial lighting
- Avoid glare and reflections
- Frame entire document
- Keep phone parallel to document
- Use apps with automatic perspective correction
Performance Optimization
Batch Processing
For large volumes:

    # Process multiple documents in parallel
    documents = get_pending_documents()

    # Divide into batches of 10
    batches = chunk_list(documents, 10)

    for batch in batches:
        results = process_ocr_batch(batch, parallel=True)
        save_results(results)

Result Caching
Avoid reprocessing documents:

    from datetime import timedelta

    def get_ocr_result(document):
        # Check if already processed
        doc_hash = calculate_hash(document)
        cached_result = cache.get(doc_hash)
        if cached_result:
            return cached_result

        # Not cached: process and keep the result for 7 days
        # (adapt the expiry value to the type your cache expects)
        result = process_ocr(document)
        cache.set(doc_hash, result, expiry=timedelta(days=7))
        return result

Incremental Processing
For multi-page documents:
- Process pages in parallel
- Allow early-exit if initial pages indicate invalid document
- Show progress to user
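The first two points can be combined: process the first page alone, stop if it shows the document is invalid, and only then fan the remaining pages out in parallel. The sketch below uses Python's standard `concurrent.futures`; `process_pages`, `ocr_page`, and `is_valid_first_page` are hypothetical names, with the two callables standing in for whatever per-page OCR and validity check your process uses.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence, TypeVar

Page = TypeVar("Page")
Result = TypeVar("Result")


def process_pages(
    pages: Sequence[Page],
    ocr_page: Callable[[Page], Result],
    is_valid_first_page: Callable[[Result], bool],
    max_workers: int = 4,
) -> Optional[List[Result]]:
    """Return per-page results in order, or None if the first page is rejected."""
    if not pages:
        return []
    first = ocr_page(pages[0])
    if not is_valid_first_page(first):
        return None  # early exit: skip the remaining pages entirely
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves input order even when pages finish out of order
        rest = list(pool.map(ocr_page, pages[1:]))
    return [first, *rest]
```

Threads suit this pattern when the per-page work is I/O-bound (for example, a call to a remote OCR service); for CPU-bound local processing, a process pool may be the better fit.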
Error Management
Error Types

    try:
        result = process_ocr(document)
    except OCRError as e:
        if e.type == "unreadable_document":
            notify_user("Document is not readable. Please improve quality.")
        elif e.type == "unsupported_format":
            notify_user("Unsupported format. Use PDF, JPG, PNG, TIFF, or BMP.")
        elif e.type == "corrupted_file":
            notify_user("File is corrupted. Please upload again.")
        else:
            log_error(e)
            send_to_support(document, e)

Smart Retries

    max_retries = 3
    result = None

    for attempt in range(1, max_retries + 1):
        try:
            result = process_ocr(document, quality="high")
            break
        except LowConfidenceError:
            if attempt < max_retries:
                # Enhance the image before the next attempt
                document = enhance_image_quality(document)
    else:
        # Every attempt failed: send to manual review
        send_to_review_queue(document)

Security and Privacy
Data Minimization
- Extract only necessary fields
- Don’t store unnecessary personal data
- Implement limited retention of original documents
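One simple way to apply the first two points is an allow-list filter applied before anything is stored. The sketch below is illustrative: `ALLOWED_FIELDS` and `minimize` are hypothetical, and the field names are borrowed from the structured output example in this guide.

```python
# Hypothetical allow-list: only these fields are needed downstream
ALLOWED_FIELDS = {"invoice_number", "invoice_date", "totals"}


def minimize(extracted_data: dict, allowed: set = ALLOWED_FIELDS) -> dict:
    """Keep only the fields on the allow-list; everything else is dropped."""
    return {name: value for name, value in extracted_data.items() if name in allowed}


extracted = {
    "invoice_number": {"value": "INV-2024-00123"},
    "invoice_date": {"value": "2024-03-15"},
    "customer": {"name": "My Company LLC", "address": "456 Business Ave"},
    "totals": {"total": 270.00, "currency": "USD"},
}
stored = minimize(extracted)
```

An allow-list fails safe: a new field added to the OCR output is excluded from storage until someone decides it is needed.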
Encryption
- Encrypt documents in transit (HTTPS)
- Encrypt storage of sensitive documents
- Use secrets for external system credentials
Traceability
Log all operations:

    audit_log = {
        "timestamp": "2024-03-15T10:30:00Z",
        "user": "user@company.com",
        "action": "ocr_process",
        "document_id": "doc_123456",
        "document_type": "invoice",
        "fields_extracted": ["invoice_number", "total", "supplier_tax_id"],
        "confidence": 0.94,
        "status": "success"
    }

    log_to_audit_system(audit_log)

Anonymization
For documents with personal data:

    # Anonymize before storing for analysis
    anonymized = {
        "document_type": result["document_type"],
        "confidence": result["confidence"],
        "processing_time": result["metadata"]["processing_time_ms"],
        # Don't include personal data
    }

    store_for_analytics(anonymized)

Troubleshooting
Low Extraction Accuracy
Symptoms: Many fields with low confidence or incorrect values
Possible causes:
- Insufficient image quality
- Non-standard document format
- Language not configured correctly
- Document type misidentified
Solutions:
- Improve image quality (higher resolution, better lighting)
- Use specific templates for non-standard documents
- Verify configured language is correct
- Manually specify document type
- Define specific zones for critical fields
Tables Not Recognized
Symptoms: Tables are not extracted or lose their structure
Possible causes:
- Very faint table lines
- Table without visible borders
- Complex merged cells
- Non-standard table format
Solutions:
- Activate “advanced table detection” in configuration
- Improve document contrast
- For borderless tables, use spacing-based detection
- Consider manual extraction for complex tables
- Define expected table structure in template
Multi-page Documents
Symptoms: Only first page is processed
Possible causes:
- Limited page configuration
- Processing timeout
- Very heavy document
Solutions:
- Verify configuration: “Process all pages”
- Increase processing timeout
- Split very large documents (>50 pages)
- Use batch processing for heavy documents
Special Characters Misinterpreted
Symptoms: Symbols or special characters are extracted incorrectly
Possible causes:
- Incorrect encoding
- Language not configured
- Non-standard typeface
Solutions:
- Explicitly configure document language
- Verify encoding (UTF-8 recommended)
- For handwritten text, activate “handwriting recognition”
- Apply post-processing to normalize characters
Slow Processing
Symptoms: OCR takes a long time
Possible causes:
- Very large document or high resolution
- Multi-page processing
- Extraction of many tables
- Limited system resources
Solutions:
- Reduce resolution if > 600 DPI
- Process pages in parallel
- Use asynchronous processing for large documents
- Implement caching for repeated documents
- Consider scaling robot resources
Frequently Asked Questions
How accurate is Heptora’s OCR?
Accuracy varies by document type and quality:
- Quality digital documents: 95-99% accuracy
- Good quality scanned documents: 90-95%
- Mobile photographed documents: 85-93%
- Low quality documents: 70-85%
Fields with confidence < 80% are marked for review.
Can I process handwritten documents?
Yes, but with limitations. Legible handwriting typically reaches 70-85% accuracy. For forms with handwritten fields, it’s better to combine automatic OCR with human review of those specific fields.
How many documents can I process per month?
It depends on your Heptora plan. OCR consumes credits based on:
- Number of pages processed
- Document complexity (tables, low quality)
- Advanced features (AI, validation)
Check your usage dashboard or contact sales.
Are documents stored in the cloud?
It depends on your configuration:
- Local mode: Documents processed only on local robot, not sent to cloud
- Hybrid mode: Document sent for processing but not permanently stored
- Cloud mode: Documents stored according to your retention configuration
Choose based on your privacy requirements.
Can I train OCR with my documents?
Yes. You can create custom templates by training the system with examples of your specific documents. This significantly improves accuracy for proprietary or non-standard formats.
Does OCR work offline?
Basic processing can work locally on the robot, but advanced AI features (classification, semantic validation) require connectivity. Choose the processing mode according to your needs.
What do I do with fields that always have low confidence?
For recurring problematic fields:
- Define a specific zone for that field
- Adjust validation parameters
- Create a custom template
- Consider specific post-processing
- If it persists, implement human validation only for that field
Need more help?
If this guide didn’t solve your problem or you found an error in the documentation:
- Technical support: help@heptora.com
- Describe the type of document you’re trying to process
- Include a sample document (without sensitive data)
- Indicate specific fields with problems
- Mention the confidence obtained in fields
Our team will help you optimize OCR for your specific documents.
Related Resources
- Process Builder - How to create automations with OCR
- Data Validation - Advanced validation rules (coming soon)
- ERP Integrations - Connect extracted data with your ERP (coming soon)
- Secrets Management - Protect external system credentials