Computer Vision Automation
Computer Vision Automation in Heptora lets you interact with any application visible on screen through visual element recognition, keyboard and mouse simulation, and text reading (OCR). It is well suited to automating legacy applications, systems without APIs, mainframes, and proprietary software.
Visual Application Control
Heptora uses advanced computer vision technology to “see” the screen as a human would, identify visual elements, and execute precise actions on them. This eliminates dependency on APIs or programmatic interfaces.
Visual Automation Advantages
- 🎯 Universal Compatibility: Works with any visible application, no APIs needed
- 🖼️ Intelligent Recognition: Identifies buttons, fields, icons, and interface elements
- 📝 Integrated OCR: Reads and extracts text directly from screen
- 🎮 Complete Control: Simulates clicks, keystrokes, dragging, and any user action
- 🔍 Pixel-Perfect Precision: Exact localization of on-screen elements
- 🏢 Ideal for Legacy: Automates old systems without modernization
- 🚀 No Modifications: Requires no changes to target applications
- 🔄 Resilient: Adapts to minor interface changes
Visual Recognition Capabilities
UI Element Identification
The computer vision system recognizes multiple types of on-screen elements:
Standard Components
Buttons and Controls:
- Text buttons
- Icon buttons
- Radio buttons
- Checkboxes
- Selectors and dropdowns
- Sliders
Input Fields:
- Text fields
- Multi-line text areas
- Password fields
- Numeric fields
- Date pickers
- Search fields
Navigation Elements:
- Menus and submenus
- Tabs
- Toolbars
- Breadcrumbs
- Links and hyperlinks
- Navigation icons
Visual Elements
Image Recognition:
# Find a specific button by its image
save_button = vision.find_image(
    template="images/save_button.png",
    confidence=0.9
)

if save_button:
    vision.mouse.click(save_button.center)
    print(f"Button found at: {save_button.coordinates}")
else:
    print("Button not found on screen")
Recognition features:
- Tolerance to color variations
- Detection at different scales
- Resistance to lighting changes
- Configurable partial matching
- Search in specific regions
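To make the interplay of `confidence` and a search region concrete, here is a toy template matcher in pure Python. It scores each candidate position by sum of absolute differences on small grayscale grids, which is only one of several matching techniques. It is an illustrative sketch, not Heptora's implementation; the `match_template` name and its scoring rule (1 minus the mean absolute difference divided by 255) are assumptions.

```python
from typing import Optional

Grid = list[list[int]]  # grayscale pixels (0-255), indexed as grid[row][col]


def match_template(
    screen: Grid,
    template: Grid,
    confidence: float = 0.9,
    region: Optional[dict] = None,
) -> Optional[tuple[int, int, float]]:
    """Return (x, y, score) of the best match scoring at least `confidence`, else None.

    Hypothetical scoring: 1 - (mean absolute pixel difference / 255), so 1.0 is a
    perfect match. Assumes `region` lies entirely inside `screen`.
    """
    th, tw = len(template), len(template[0])
    if region is None:
        region = {"x": 0, "y": 0, "width": len(screen[0]), "height": len(screen)}
    # Only top-left corners where the whole template fits inside the region
    last_x = region["x"] + region["width"] - tw
    last_y = region["y"] + region["height"] - th

    best = None
    for y in range(region["y"], last_y + 1):
        for x in range(region["x"], last_x + 1):
            diff = sum(
                abs(screen[y + r][x + c] - template[r][c])
                for r in range(th)
                for c in range(tw)
            )
            score = 1 - diff / (255 * th * tw)
            if best is None or score > best[2]:
                best = (x, y, score)

    # The best candidate is only reported if it clears the confidence threshold
    if best is not None and best[2] >= confidence:
        return best
    return None
```

On a 5 × 5 screen with a bright 2 × 2 patch, a full search finds the patch with a score of 1.0, while a region that excludes the patch returns `None` because no window reaches the 0.9 threshold. Restricting the region both speeds up the search and prevents matches in unrelated parts of the screen.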
OCR - Text Recognition
Extract text directly from any screen area:
Simple Text Reading
# Read text from a specific area
region = {"x": 100, "y": 200, "width": 300, "height": 50}
text = vision.read_text(region=region)
print(f"Text found: {text}")
Text Search on Screen
# Search for a specific word or phrase
result = vision.find_text(
    text="Total amount",
    exact_match=False
)

if result:
    # Read the numeric value to the right
    value_region = {
        "x": result.x + result.width + 10,
        "y": result.y,
        "width": 100,
        "height": result.height
    }
    value = vision.read_text(region=value_region)
    print(f"Value found: {value}")
Structured Extraction
Tables and Grids:
# Extract data from a visual table
table = vision.extract_table(
    region={"x": 50, "y": 100, "width": 800, "height": 400},
    has_header=True
)

for row in table.rows:
    print(f"ID: {row[0]}, Name: {row[1]}, Amount: {row[2]}")
Forms:
# Extract all fields from a form
form = vision.extract_form_fields(
    region="full_screen"
)

for field, value in form.items():
    print(f"{field}: {value}")
Screen Change Detection
Monitor specific areas to detect updates:
# Wait for an element to appear
vision.wait_for_image(
    template="images/success_message.png",
    timeout=30
)

# Wait for an element to disappear (e.g., loading spinner)
vision.wait_until_gone(
    template="images/spinner.png",
    timeout=60
)

# Wait for a screen area to change
vision.wait_for_change(
    region={"x": 500, "y": 200, "width": 300, "height": 100},
    timeout=10
)
Visual Validation
Verify application state through visual inspection:
# Verify we're on the correct screen
if vision.image_exists("images/app_logo.png"):
    print("Start screen confirmed")
else:
    raise Exception("Not on expected screen")

# Validate that a process completed
if vision.text_exists("Process completed successfully"):
    log.info("Operation finished correctly")
else:
    log.warning("Confirmation message not found")

# Check color of an element (e.g., status indicator).
# Compare with a per-channel tolerance (the value 30 is illustrative): exact RGB
# equality often fails because of anti-aliasing, color depth, or remote-session compression.
def color_close(actual, expected, tolerance=30):
    return all(abs(a - e) <= tolerance for a, e in zip(actual, expected))

indicator_color = vision.get_pixel_color(x=850, y=120)
if color_close(indicator_color, (0, 255, 0)):  # Green
    print("Status: Active")
elif color_close(indicator_color, (255, 0, 0)):  # Red
    print("Status: Error")
Keyboard and Mouse Control
Mouse Actions
Clicks and Movements
# Simple click at specific coordinates
vision.mouse.click(x=500, y=300)
# Right click
vision.mouse.right_click(x=500, y=300)

# Double click
vision.mouse.double_click(x=500, y=300)

# Move cursor without clicking
vision.mouse.move_to(x=500, y=300)

# Click at the center of a found image (find_image returns None if not found)
button = vision.find_image("accept_button.png")
if button:
    vision.mouse.click(button.center)
Drag & Drop
# Drag from one point to another
vision.mouse.drag_to(
    from_x=200, from_y=300,
    to_x=400, to_y=300
)

# Drag a found element onto another found element
element = vision.find_image("file.png")
destination = vision.find_image("folder.png")
if element and destination:
    vision.mouse.drag_to(
        from_x=element.center.x, from_y=element.center.y,
        to_x=destination.center.x, to_y=destination.center.y
    )
Scroll and Navigation
# Vertical scroll (positive down, negative up)
vision.mouse.scroll(clicks=-5)  # Scroll up

# Horizontal scroll
vision.mouse.horizontal_scroll(clicks=3)  # Scroll right

# Scroll in a specific region
vision.mouse.move_to(x=600, y=400)
vision.mouse.scroll(clicks=10)  # Scroll in that area
Keyboard Actions
Keys and Combinations
# Type text
vision.keyboard.type("Hello World")

# Special keys
vision.keyboard.press("Enter")
vision.keyboard.press("Tab")
vision.keyboard.press("Escape")

# Key combinations
vision.keyboard.hotkey("Ctrl", "C")  # Copy
vision.keyboard.hotkey("Ctrl", "V")  # Paste
vision.keyboard.hotkey("Ctrl", "S")  # Save
vision.keyboard.hotkey("Alt", "F4")  # Close window

# Hold key down
vision.keyboard.key_down("Shift")
vision.keyboard.press("A")
vision.keyboard.press("B")
vision.keyboard.press("C")
vision.keyboard.key_up("Shift")  # Types "ABC"
Intelligent Typing
# Type with pause between characters (more natural)
vision.keyboard.type("user@example.com", interval=0.1)

# Type with interspersed special keys
vision.keyboard.type("Name: ")
vision.keyboard.type("John Doe")
vision.keyboard.press("Tab")
vision.keyboard.type("Email: ")
vision.keyboard.type("john@example.com")

# Clear field and type
vision.keyboard.hotkey("Ctrl", "A")  # Select all
vision.keyboard.press("Delete")      # Delete
vision.keyboard.type("New text")     # Type
System-Specific Shortcuts
# Windows
vision.keyboard.hotkey("Win", "R")  # Run
vision.keyboard.type("notepad")
vision.keyboard.press("Enter")

# Window operations
vision.keyboard.hotkey("Alt", "Tab")  # Switch window
vision.keyboard.hotkey("Win", "D")  # Show desktop
vision.keyboard.hotkey("Ctrl", "Shift", "Escape")  # Task manager

# Text operations
vision.keyboard.hotkey("Ctrl", "Z")  # Undo
vision.keyboard.hotkey("Ctrl", "Y")  # Redo
vision.keyboard.hotkey("Ctrl", "F")  # Find
Action Combinations
Create complex sequences combining mouse and keyboard:
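import time

# `clipboard` is assumed to be a clipboard helper module with a paste() function
# (for example, the pyperclip package provides pyperclip.paste())

# Copy text visible on screen
def copy_text_on_screen(region):
    # Triple click at the center of the region to select the entire paragraph
    center_x = region["x"] + region["width"] // 2
    center_y = region["y"] + region["height"] // 2

    vision.mouse.move_to(center_x, center_y)
    vision.mouse.triple_click(center_x, center_y)

    # Copy to clipboard, then give the application time to update it
    vision.keyboard.hotkey("Ctrl", "C")
    time.sleep(0.5)

    # Read from clipboard
    return clipboard.paste()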
# Fill complete form
def fill_form(data):
    # Click in the first field, to the right of its "Name:" label
    first_field = vision.find_text("Name:")
    if not first_field:
        raise Exception("Form not found on screen")
    vision.mouse.click(first_field.x + 100, first_field.y)

    # Fill fields with Tab between them
    vision.keyboard.type(data["name"])
    vision.keyboard.press("Tab")

    vision.keyboard.type(data["lastname"])
    vision.keyboard.press("Tab")

    vision.keyboard.type(data["email"])
    vision.keyboard.press("Tab")

    vision.keyboard.type(data["phone"])

    # Submit form
    vision.keyboard.press("Enter")
Legacy Application Automation
Terminal Systems and Mainframes
Computer vision is ideal for automating terminal applications that lack modern APIs:
Terminal Emulators (AS/400, IBM 3270)
Scenario: Order entry in mainframe system
# Wait for login screen to load
vision.wait_for_text("MAIN SYSTEM", timeout=10)

# Enter credentials
vision.keyboard.type(username)
vision.keyboard.press("Tab")
vision.keyboard.type(password)
vision.keyboard.press("Enter")

# Wait for main menu
vision.wait_for_text("MAIN MENU", timeout=5)

# Navigate to orders module (option 2)
vision.keyboard.type("2")
vision.keyboard.press("Enter")

# Wait for order entry screen
vision.wait_for_text("ORDER ENTRY", timeout=5)

# Fill order form
vision.keyboard.type(customer_code)
vision.keyboard.press("Tab")
vision.keyboard.type(order_number)
vision.keyboard.press("Tab")

# Enter order lines
for item in order_items:
    vision.keyboard.type(item["code"])
    vision.keyboard.press("Tab")
    vision.keyboard.type(str(item["quantity"]))
    vision.keyboard.press("Tab")
    vision.keyboard.press("Enter")  # Confirm line

# Confirm order (F5)
vision.keyboard.press("F5")

# Verify confirmation message
if vision.text_exists("ORDER REGISTERED"):
    # Extract confirmed order number
    number_region = vision.find_text("ORDER NO:")
    confirmed_number = vision.read_text(
        region={
            "x": number_region.x + 150,
            "y": number_region.y,
            "width": 100,
            "height": 20
        }
    )
    log.info(f"Order confirmed: {confirmed_number}")
else:
    raise Exception("Order confirmation not received")

# Return to main menu
vision.keyboard.press("F3")
DOS Applications
import os
import time

# Launch DOS application
# (on Windows, os.system runs the command through cmd.exe, where `start` launches it without waiting)
os.system("start dosbox legacy_app.exe")

# Wait for application to load
vision.wait_for_text("INVENTORY SYSTEM V3.2", timeout=15)

# Navigate menus (number + Enter)
vision.keyboard.type("1")  # Queries
vision.keyboard.press("Enter")

vision.keyboard.type("3")  # Query by code
vision.keyboard.press("Enter")

# Enter product code
vision.keyboard.type("PROD-12345")
vision.keyboard.press("Enter")

# Extract information from screen
name = vision.read_text(region={"x": 150, "y": 100, "width": 300, "height": 20})
stock = vision.read_text(region={"x": 150, "y": 140, "width": 100, "height": 20})
price = vision.read_text(region={"x": 150, "y": 180, "width": 100, "height": 20})

print(f"Product: {name}")
print(f"Stock: {stock}")
print(f"Price: {price}")

# Exit (multiple Escape)
for _ in range(3):
    vision.keyboard.press("Escape")
    time.sleep(0.5)
Citrix and Remote Desktop Applications
Automate applications running in remote sessions:
Connection and Navigation
# Connect to Citrix session
def connect_citrix(application):
    # Open Citrix Workspace
    vision.keyboard.hotkey("Win", "R")
    vision.keyboard.type("citrix workspace")
    vision.keyboard.press("Enter")

    # Wait for loading
    vision.wait_for_image("citrix_workspace_logo.png", timeout=10)

    # Search for application
    search_field = vision.find_image("search_icon.png")
    vision.mouse.click(search_field.center)
    vision.keyboard.type(application)
    time.sleep(1)

    # Click on first result
    first_result = vision.find_image(f"{application}_icon.png")
    vision.mouse.click(first_result.center)

    # Wait for application launch
    vision.wait_for_text("Loading...", timeout=5)
    vision.wait_until_gone("Loading...", timeout=30)

# Use remote application
connect_citrix("Financial ERP")

# Now interact normally
vision.wait_for_text("MAIN MENU")
# ... rest of automation
Latency Handling
import time

# For remote connections, increase wait times
REMOTE_WAIT_TIME = 2.0

def remote_click(x, y):
    vision.mouse.click(x, y)
    time.sleep(REMOTE_WAIT_TIME)

def remote_type(text):
    vision.keyboard.type(text, interval=0.15)  # Slower
    time.sleep(REMOTE_WAIT_TIME)

def verify_remote_screen_change(expected_text, timeout=30):
    # time.monotonic() is unaffected by system clock adjustments
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        if vision.text_exists(expected_text):
            time.sleep(REMOTE_WAIT_TIME)  # Additional wait for the screen to finish rendering
            return True
        time.sleep(1)
    return False
Proprietary Applications without APIs
Automate closed or undocumented software:
Windows Desktop Applications
import time

# Example: Proprietary medical management software
def register_patient(patient_data):
    # Check whether the application is already open and visible
    app_window = vision.find_image("app_logo.png")
    if not app_window:
        # Launch application if not open
        vision.keyboard.hotkey("Win", "R")
        vision.keyboard.type("C:\\Program\\MediManage\\mediman.exe")
        vision.keyboard.press("Enter")
        vision.wait_for_image("app_logo.png", timeout=20)

    # Navigate to patients module
    patients_button = vision.find_image("buttons/patients.png")
    vision.mouse.click(patients_button.center)

    # Click "New Patient"
    vision.wait_for_image("buttons/new_patient.png", timeout=5)
    new_button = vision.find_image("buttons/new_patient.png")
    vision.mouse.click(new_button.center)

    # Wait for form to open
    vision.wait_for_text("PATIENT DATA", timeout=3)

    # Fill form
    fields = [
        patient_data["name"],
        patient_data["lastname"],
        patient_data["id_number"],
        patient_data["birth_date"],
        patient_data["phone"],
        patient_data["email"],
        patient_data["address"]
    ]

    for field in fields:
        vision.keyboard.type(str(field))
        vision.keyboard.press("Tab")
        time.sleep(0.3)

    # Save
    save_button = vision.find_image("buttons/save.png")
    vision.mouse.click(save_button.center)

    # Verify confirmation
    if vision.wait_for_text("PATIENT REGISTERED", timeout=5):
        # Extract medical record number
        number_region = vision.find_text("Record No:")
        medical_record = vision.read_text(
            region={
                "x": number_region.x + 120,
                "y": number_region.y,
                "width": 100,
                "height": 20
            }
        )
        return medical_record
    else:
        raise Exception("Could not confirm registration")
Software with Non-Standard Interface
# Example: Application with custom-drawn controls
def calculate_cluster_center(locations):
    # Hypothetical helper: assumes find_color returns a list of (x, y) tuples that
    # all belong to one button, and returns their centroid
    xs = [x for x, _ in locations]
    ys = [y for _, y in locations]
    return sum(xs) // len(xs), sum(ys) // len(ys)

def click_custom_button(button_color, search_region):
    """
    Find and click custom drawn buttons based on their distinctive color
    """
    # Find pixels of button color
    locations = vision.find_color(
        color=button_color,
        tolerance=10,
        region=search_region
    )

    if locations:
        # Click the center of the pixel cluster
        center_x, center_y = calculate_cluster_center(locations)
        vision.mouse.click(center_x, center_y)
        return True
    return False

# Usage
GREEN_ACCEPT = (0, 200, 50)
RED_CANCEL = (200, 50, 50)

button_region = {"x": 400, "y": 500, "width": 300, "height": 100}

if click_custom_button(GREEN_ACCEPT, button_region):
    print("Accept button pressed")
SAP GUI
Although SAP GUI offers a scripting interface, computer vision can be useful in restricted environments where scripting is disabled:
import time

# Launch SAP transaction
def execute_sap_transaction(transaction_code):
    # Focus on command field
    command_field = vision.find_image("sap_command_field.png")
    vision.mouse.click(command_field.center)

    # Clear any previous text
    vision.keyboard.hotkey("Ctrl", "A")
    vision.keyboard.press("Delete")

    # Enter transaction ("/n" ends the current transaction before starting the new one)
    vision.keyboard.type(f"/n{transaction_code}")
    vision.keyboard.press("Enter")

    # Wait for loading
    time.sleep(2)

# Extract data from SAP table
def extract_sap_table(table_region):
    # Click on first cell
    vision.mouse.click(
        table_region["x"] + 50,
        table_region["y"] + 50
    )

    rows = []
    current_row = 0
    max_rows = 50

    while current_row < max_rows:
        # Select entire row
        vision.keyboard.hotkey("Shift", "End")
        vision.keyboard.hotkey("Ctrl", "C")
        time.sleep(0.3)

        # Get data from clipboard
        row_text = clipboard.paste()

        if not row_text or row_text in rows:
            break  # End of table or repeated row

        rows.append(row_text)

        # Go to next row
        vision.keyboard.press("Down")
        vision.keyboard.press("Home")
        time.sleep(0.2)

        current_row += 1

    return rows
Advanced Techniques
Multiple Monitor Handling
# Specify which monitor to search
primary_monitor = vision.get_monitor(0)
secondary_monitor = vision.get_monitor(1)

# Find element on specific monitor
button = vision.find_image(
    "button.png",
    region=secondary_monitor.bounds
)

# Move cursor between monitors
vision.mouse.move_to(
    secondary_monitor.x + 500,
    secondary_monitor.y + 300
)
Capture and Comparison
import time

# Capture current state of an area
initial_capture = vision.capture_region(
    region={"x": 100, "y": 100, "width": 400, "height": 300}
)

# Perform some action
vision.mouse.click(500, 400)
time.sleep(2)

# Capture new state
final_capture = vision.capture_region(
    region={"x": 100, "y": 100, "width": 400, "height": 300}
)

# Compare if changed
if vision.images_are_different(initial_capture, final_capture, threshold=0.95):
    print("Screen changed after action")
else:
    print("No changes detected")
Adaptive Recognition
# Find element with multiple variants
variants = [
    "save_button_en.png",
    "save_button_es.png",
    "save_button_alternative.png"
]

found_button = None
for variant in variants:
    button = vision.find_image(variant, confidence=0.85)
    if button:
        found_button = button
        break

if found_button:
    vision.mouse.click(found_button.center)
else:
    # Try text search as fallback
    if vision.text_exists("Save"):
        region = vision.find_text("Save")
        vision.mouse.click(region.center)
Coordination with Other Automations
# Combine computer vision with web automation
def hybrid_process():
    # Part 1: Extract data from legacy app with vision
    legacy_data = extract_mainframe_data()

    # Part 2: Process in modern web application
    browser = heptora.browser.create()
    browser.navigate("https://modern-erp.company.com")
    browser.fill_form(legacy_data)

    # Part 3: Verify result with computer vision
    vision.wait_for_text("Synchronization completed")

    return True
Practical Use Cases
Data Migration from Legacy System
Scenario: Extract product catalog from old DOS software
import time
import pandas  # DataFrame.to_excel also needs an Excel writer such as openpyxl

def extract_complete_catalog():
    products = []

    # Open legacy system
    open_inventory_system()

    # Go to product query
    vision.keyboard.type("1")  # Products menu
    vision.keyboard.press("Enter")
    vision.keyboard.type("2")  # List all
    vision.keyboard.press("Enter")

    # Extract page by page
    page = 1
    while True:
        # Read products from current screen
        for line in range(5, 20):  # 15 products per page
            # Position of each field
            y = 80 + (line * 25)

            code = vision.read_text(
                region={"x": 50, "y": y, "width": 100, "height": 20}
            ).strip()

            if not code:
                break  # End of products

            name = vision.read_text(
                region={"x": 160, "y": y, "width": 300, "height": 20}
            ).strip()

            price = vision.read_text(
                region={"x": 470, "y": y, "width": 80, "height": 20}
            ).strip()

            stock = vision.read_text(
                region={"x": 560, "y": y, "width": 60, "height": 20}
            ).strip()

            products.append({
                "code": code,
                "name": name,
                "price": price,
                "stock": stock
            })

        # Try to go to next page
        vision.keyboard.press("PageDown")
        time.sleep(1)

        # Check if there are more pages (look for indicator)
        if vision.text_exists("END OF LIST"):
            break

        page += 1

        if page > 100:  # Safety limit
            break

    return products

# Export to modern format
products = extract_complete_catalog()
df = pandas.DataFrame(products)
df.to_excel("migrated_catalog.xlsx", index=False)
print(f"Extracted {len(products)} products")
Medical Application Automation
Scenario: Laboratory results registration
def process_lab_results(results_file):
    # Read results from file
    results = read_excel(results_file)

    # Open hospital management application
    open_hospital_app()

    for result in results:
        # Search for patient
        search_patient(result["id_number"])

        # Go to laboratory module
        lab_button = vision.find_image("modules/laboratory.png")
        vision.mouse.click(lab_button.center)

        # New result
        vision.keyboard.hotkey("Ctrl", "N")
        vision.wait_for_text("NEW RESULT")

        # Fill form
        vision.keyboard.type(result["analysis_type"])
        vision.keyboard.press("Tab")

        vision.keyboard.type(result["date"])
        vision.keyboard.press("Tab")

        # Enter values
        for value in result["values"]:
            vision.keyboard.type(value["parameter"])
            vision.keyboard.press("Tab")
            vision.keyboard.type(str(value["result"]))
            vision.keyboard.press("Tab")
            vision.keyboard.type(value["unit"])
            vision.keyboard.press("Enter")

        # Save
        vision.keyboard.hotkey("Ctrl", "S")

        # Verify saved
        if vision.wait_for_text("RESULT SAVED", timeout=5):
            log.info(f"Result saved for patient {result['id_number']}")
        else:
            log.error(f"Error saving result for {result['id_number']}")
            screenshot = vision.capture_screen()
            vision.save_image(screenshot, f"error_{result['id_number']}.png")

        # Return to main menu
        vision.keyboard.press("Escape")
        vision.keyboard.press("Escape")
Batch Process Validation
Scenario: Verify a batch process executed correctly
def verify_batch_process(process_name):
    # Open system monitor
    open_system_monitor()

    # Search for specific process
    search_field = vision.find_image("search_icon.png")
    vision.mouse.click(search_field.center)
    vision.keyboard.type(process_name)
    vision.keyboard.press("Enter")
    time.sleep(2)

    # Verify status
    status_region = vision.find_text("Status:")
    status_text = vision.read_text(
        region={
            "x": status_region.x + 80,
            "y": status_region.y,
            "width": 150,
            "height": 25
        }
    )

    # Visually verify indicator color
    indicator_color = vision.get_pixel_color(
        x=status_region.x - 30,
        y=status_region.y + 10
    )

    # Green means a strong green channel with weak red and blue;
    # checking green alone would also accept white or yellow
    is_green = (
        indicator_color[1] > 200
        and indicator_color[0] < 100
        and indicator_color[2] < 100
    )

    if "COMPLETED" in status_text.upper() and is_green:
        log.info(f"Process {process_name} completed successfully")

        # Extract statistics
        records = vision.read_text_after_label("Records processed:")
        errors = vision.read_text_after_label("Errors:")
        duration = vision.read_text_after_label("Duration:")

        return {
            "status": "COMPLETED",
            "records": records,
            "errors": errors,
            "duration": duration
        }
    else:
        log.error(f"Process {process_name} failed or pending")
        return {"status": "ERROR"}
Best Practices
Search Strategies
- Use stable references
  - Search for elements that don’t change (logos, titles)
  - Avoid dynamic elements like timestamps
- Combine methods
    # Robust method: try image, then text
    button = vision.find_image("next_button.png")
    if not button:
        button = vision.find_text("Next")
    if button:
        vision.mouse.click(button.center)
- Use search regions
    # Faster and more precise
    button = vision.find_image(
        "button.png",
        region={"x": 700, "y": 400, "width": 200, "height": 100}
    )
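The "Combine methods" idea generalizes to an ordered list of lookup strategies that are tried until one succeeds. A minimal sketch, assuming each strategy is a zero-argument callable that returns a match or `None`; the `find_first` helper is hypothetical and not part of the `vision` API.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def find_first(strategies: Iterable[Callable[[], Optional[T]]]) -> Optional[T]:
    """Call each lookup strategy in order; return the first non-None result.

    Later strategies are never called once an earlier one succeeds.
    """
    for strategy in strategies:
        result = strategy()
        if result is not None:
            return result
    return None
```

In a process, the strategies could be lambdas wrapping `vision.find_image("next_button.png")` and `vision.find_text("Next")`, ordered so the most precise lookup runs first and slower fallbacks only run when needed.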
Error Handling
def robust_action(max_attempts=3):
    for attempt in range(max_attempts):
        try:
            # Try the action
            vision.mouse.click_image("button.png")

            # Verify it worked
            if vision.wait_for_text("Confirmation", timeout=5):
                return True
        except ElementNotFoundError:
            if attempt < max_attempts - 1:
                log.warning(f"Attempt {attempt + 1} failed, retrying...")
                time.sleep(2)
            else:
                # Capture error evidence
                screenshot = vision.capture_screen()
                vision.save_image(screenshot, "final_error.png")
                raise

    return False
Performance Optimization
- Cache template images
    # Load templates at start
    TEMPLATES = {
        "save": vision.load_template("buttons/save.png"),
        "cancel": vision.load_template("buttons/cancel.png"),
        "accept": vision.load_template("buttons/accept.png")
    }

    # Use cached templates
    button = vision.find_template(TEMPLATES["save"])
- Reduce search area
    # Define common regions
    REGIONS = {
        "bottom_buttons": {"x": 0, "y": 700, "width": 1920, "height": 280},
        "top_menu": {"x": 0, "y": 0, "width": 1920, "height": 100},
        "right_panel": {"x": 1400, "y": 100, "width": 520, "height": 900}
    }
- Adjust confidence levels
    # For stable elements, use high confidence
    logo = vision.find_image("logo.png", confidence=0.95)

    # For variable elements, reduce confidence
    button = vision.find_image("dynamic_button.png", confidence=0.75)
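The "Cache template images" tip can also be expressed with the standard library's `functools.lru_cache`, which loads each template on first use instead of all at startup. In this sketch `load_template_file` is a hypothetical stand-in for `vision.load_template` that simply reads the file bytes; in a real process the decorated function would call the actual loader.

```python
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=64)
def load_template_file(path: str) -> bytes:
    """Read a template image once; later calls with the same path hit the cache."""
    return Path(path).read_bytes()
```

`load_template_file.cache_info()` reports hits and misses, which makes it easy to confirm the cache is working. One trade-off: a template recaptured on disk is not reloaded until `load_template_file.cache_clear()` is called.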
Evidence Capture
import os
from datetime import datetime

def execute_with_evidence(process_name):
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    evidence_folder = f"evidence/{process_name}_{timestamp}"
    os.makedirs(evidence_folder, exist_ok=True)

    step = 0

    def capture_step(description):
        nonlocal step
        step += 1
        screenshot = vision.capture_screen()
        path = f"{evidence_folder}/step_{step:02d}_{description}.png"
        vision.save_image(screenshot, path)
        log.info(f"Captured: {description}")

    try:
        capture_step("start")

        # Execute process
        open_application()
        capture_step("application_opened")

        perform_operation()
        capture_step("operation_completed")

        capture_step("successful_end")

    except Exception as e:
        capture_step("error")
        log.error(f"Error: {str(e)}")
        raise
Troubleshooting
Element Not Found
Symptoms: vision.find_image() or vision.find_text() returns None
Solutions:
- Verify element is visible on screen
- Reduce confidence level: confidence=0.8 instead of 0.9
- Capture new template under current conditions
- Use more specific search region
- Try searching by text instead of image
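The "reduce confidence level" advice can be automated so that a strict threshold is tried first and looser ones only when needed, which limits false positives. A minimal sketch, assuming a `find` callable that accepts the same `confidence` keyword as `vision.find_image` and returns `None` when nothing matches; the `find_relaxing_confidence` helper and its default levels are hypothetical.

```python
from typing import Any, Callable, Optional, Sequence


def find_relaxing_confidence(
    find: Callable[..., Optional[Any]],
    template: str,
    levels: Sequence[float] = (0.95, 0.9, 0.85, 0.8),
) -> tuple[Optional[Any], Optional[float]]:
    """Search at each confidence level in turn, strictest first.

    Returns (match, level_used), or (None, None) if no level finds the template.
    """
    for level in levels:
        match = find(template, confidence=level)
        if match is not None:
            return match, level
    return None, None
```

Logging the level that finally matched is a cheap way to spot templates that drift toward the lower bound and should be recaptured.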
Clicks Not Registered
Symptoms: Click executes but has no effect
Solutions:
- Add time.sleep(0.5) after click
- Verify application has focus
- Use double_click() if necessary
- Verify click coordinates with screenshot
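Several of these checks can be folded into one helper that clicks, waits, verifies that the expected effect happened, and retries with a growing delay. A sketch under stated assumptions: `click` and `verify` are hypothetical callables (for example, wrappers around `vision.mouse.click` and `vision.text_exists`), the delays are illustrative, and `sleep` is injectable only so the helper can be tested without real waits.

```python
import time
from typing import Callable


def click_and_verify(
    click: Callable[[], None],
    verify: Callable[[], bool],
    attempts: int = 3,
    settle: float = 0.5,
    backoff: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Click, wait for the UI to settle, and confirm the effect; retry on failure.

    The wait is multiplied by `backoff` after each failed attempt
    (0.5 s, 1.0 s, 2.0 s with the defaults).
    """
    delay = settle
    for _ in range(attempts):
        click()
        sleep(delay)
        if verify():
            return True
        delay *= backoff
    return False
```

Verifying an observable effect, rather than assuming the click landed, is what distinguishes a missed click from a slow application; the growing delay gives slow or remote applications more time on each retry.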
Inaccurate OCR
Symptoms: Text read incorrectly
Solutions:
- Increase contrast of captured region
- Specify OCR language: vision.read_text(lang='eng')
- Preprocess image (grayscale, thresholding)
- Use region more tightly fitted to specific text
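Preprocessing usually means converting to grayscale and then thresholding to pure black and white, which helps OCR with colored or low-contrast text. A pure-Python sketch on rows of RGB tuples, using the ITU-R BT.601 luma weights (0.299, 0.587, 0.114), the same weights Pillow documents for its "L" mode conversion. In practice an imaging library would do this, and the cutoff of 128 is only a starting point to tune per application.

```python
Pixel = tuple[int, int, int]


def to_grayscale(rows: list[list[Pixel]]) -> list[list[int]]:
    """Convert RGB pixels to 0-255 luma using ITU-R BT.601 weights."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rows
    ]


def threshold(gray: list[list[int]], cutoff: int = 128) -> list[list[int]]:
    """Map each pixel to 255 (white) if it is at or above `cutoff`, else 0 (black)."""
    return [[255 if v >= cutoff else 0 for v in row] for row in gray]
```

For example, dark red (200, 30, 30) becomes luma 81 and turns black, while bright green (30, 200, 30) becomes luma 130 and turns white, because the eye, and therefore the luma formula, weights green most heavily.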
Slow Performance
Symptoms: Searches take too long
Solutions:
- Use limited search regions
- Reduce template resolution
- Cache loaded templates
- Avoid full screen searches
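Why limited regions help so much: an exhaustive sliding-window matcher compares the template at every position where it fits, which is (W - w + 1) × (H - h + 1) positions for a W × H search area and a w × h template. This assumes a brute-force matcher; real engines may use image pyramids or other shortcuts, but the ratio is still a useful guide. For a 100 × 40 template:

```python
def candidate_positions(area_w: int, area_h: int, tpl_w: int, tpl_h: int) -> int:
    """Number of top-left positions where a tpl_w x tpl_h template fits in the area."""
    if tpl_w > area_w or tpl_h > area_h:
        return 0
    return (area_w - tpl_w + 1) * (area_h - tpl_h + 1)


full_screen = candidate_positions(1920, 1080, 100, 40)  # 1821 * 1041 = 1,895,661
region = candidate_positions(200, 100, 100, 40)         # 101 * 61 = 6,161
print(f"{full_screen / region:.0f}x fewer positions")  # prints "308x fewer positions"
```

Each position also costs roughly w × h pixel comparisons, so reducing template resolution shrinks the work per position, while a tight region shrinks the number of positions.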
Frequently Asked Questions
Does computer vision work at any resolution?
Yes, but templates should be captured at the same resolution at which the process will run. If a process has to run at several resolutions, use text search instead of image search whenever possible.
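When a region measured at one resolution has to be reused at another, its coordinates can be rescaled proportionally. A minimal sketch, assuming the application's layout scales linearly with the screen (fixed-size dialogs and display scaling often break this, which is why text search remains the safer choice); `scale_region` is a hypothetical helper that works on the `{"x", "y", "width", "height"}` dictionaries used throughout this guide.

```python
def scale_region(
    region: dict, from_size: tuple[int, int], to_size: tuple[int, int]
) -> dict:
    """Rescale a region measured at `from_size` (width, height) to `to_size`."""
    sx = to_size[0] / from_size[0]
    sy = to_size[1] / from_size[1]
    return {
        "x": round(region["x"] * sx),
        "y": round(region["y"] * sy),
        "width": round(region["width"] * sx),
        "height": round(region["height"] * sy),
    }
```

For example, the OCR region {"x": 100, "y": 200, "width": 300, "height": 50} measured at 1920 × 1080 becomes {"x": 67, "y": 133, "width": 200, "height": 33} at 1280 × 720. Rounding can shift edges by a pixel, so a small margin around scaled OCR regions is prudent.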
Can I automate applications in headless mode?
No, computer vision requires the application to be visible on screen. For background automation, consider running the robot in a virtual machine with an active, visible desktop session.
Does it work with applications in multiple languages?
Yes, but you must capture templates for each language or use text recognition with the appropriate language specified.
What if the interface changes slightly?
You can adjust the confidence level to tolerate minor variations. For major changes, you’ll need to update image templates.
Can I combine computer vision with web automation?
Absolutely. It’s common to use computer vision for legacy systems and web automation for modern systems in the same process.
How do I handle unexpected pop-ups?
Implement periodic checks:
if vision.image_exists("error_popup.png"):
    vision.mouse.click_image("close_popup_button.png")
Need more help?
If this guide didn’t solve your problem or you found an error in the documentation:
- Technical support: help@heptora.com
  - Describe the application you’re trying to automate
  - Include screenshots of the interface
  - Indicate which element you cannot locate or interact with
  - Mention operating system and screen resolution
Our team will help you design the best visual automation strategy for your specific case.
Related Resources
- Process Recorder - Record visual actions automatically
- OCR & Document Processing - Advanced OCR for documents
- Process Builder - How to create visual automations
- Secrets Management - Protect legacy system credentials