Computer Vision Automation

Computer Vision Automation in Heptora enables interaction with any visible application on screen through visual element recognition, keyboard and mouse simulation, and text reading. It is well suited to automating legacy applications, systems without APIs, mainframes, and proprietary software.

Heptora uses advanced computer vision technology to “see” the screen as a human would, identify visual elements, and execute precise actions on them. This eliminates dependency on APIs or programmatic interfaces.

  • 🎯 Universal Compatibility: Works with any visible application, no APIs needed
  • 🖼️ Intelligent Recognition: Identifies buttons, fields, icons, and interface elements
  • 📝 Integrated OCR: Reads and extracts text directly from screen
  • 🎮 Complete Control: Simulates clicks, keystrokes, dragging, and any user action
  • 🔍 Pixel-Perfect Precision: Exact localization of on-screen elements
  • 🏢 Ideal for Legacy: Automates old systems without modernization
  • 🚀 No Modifications: Requires no changes to target applications
  • 🔄 Resilient: Adapts to minor interface changes

The computer vision system recognizes multiple types of on-screen elements:

Buttons and Controls:

  • Text buttons
  • Icon buttons
  • Radio buttons
  • Checkboxes
  • Selectors and dropdowns
  • Sliders and slide controls

Input Fields:

  • Text fields
  • Multi-line text areas
  • Password fields
  • Numeric fields
  • Date pickers
  • Search fields

Navigation Elements:

  • Menus and submenus
  • Tabs
  • Toolbars
  • Breadcrumbs
  • Links and hyperlinks
  • Navigation icons

Image Recognition:

# Find a specific button by its image
save_button = vision.find_image(
    template="images/save_button.png",
    confidence=0.9
)
if save_button:
    vision.click(save_button.center)
    print(f"Button found at: {save_button.coordinates}")
else:
    print("Button not found on screen")

Recognition features:

  • Tolerance to color variations
  • Detection at different scales
  • Resistance to lighting changes
  • Configurable partial matching
  • Search in specific regions

Extract text directly from any screen area:

# Read text from a specific area
region = {"x": 100, "y": 200, "width": 300, "height": 50}
text = vision.read_text(region=region)
print(f"Text found: {text}")

# Search for a specific word or phrase
result = vision.find_text(
    text="Total amount",
    exact_match=False
)
if result:
    # Read the numeric value to the right
    value_region = {
        "x": result.x + result.width + 10,
        "y": result.y,
        "width": 100,
        "height": result.height
    }
    value = vision.read_text(region=value_region)
    print(f"Value found: {value}")
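
The offset arithmetic above (start just past the label, keep its vertical position and height) can be factored into a small helper. This is a minimal sketch, assuming the text match exposes x, y, width, and height as in the example; LabelBox and region_right_of are hypothetical names and not part of the Heptora API.

```python
from dataclasses import dataclass


@dataclass
class LabelBox:
    """Hypothetical stand-in for the object returned by a text search."""
    x: int
    y: int
    width: int
    height: int


def region_right_of(label, width, gap=10):
    """Build a region dict for the area immediately to the right of a label."""
    return {
        "x": label.x + label.width + gap,  # skip past the label plus a small gap
        "y": label.y,                      # stay on the same text line
        "width": width,
        "height": label.height,
    }


label = LabelBox(x=100, y=200, width=120, height=24)
value_region = region_right_of(label, width=100)
print(value_region)
```

The resulting dict has the same shape as the region arguments used throughout this page, so it could be passed wherever a region is expected.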

Tables and Grids:

# Extract data from a visual table
table = vision.extract_table(
    region={"x": 50, "y": 100, "width": 800, "height": 400},
    has_header=True
)
for row in table.rows:
    print(f"ID: {row[0]}, Name: {row[1]}, Amount: {row[2]}")
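
Indexing rows by position (row[0], row[1], ...) becomes brittle once columns move. The sketch below pairs each row with column names instead. It assumes the header is available as a plain list of strings and each row as a list of cell strings; how the extracted table exposes its header is not documented here, so rows_to_dicts is a hypothetical helper operating on plain lists.

```python
def rows_to_dicts(header, rows):
    """Pair each row with the header names; short rows are padded with None."""
    records = []
    for row in rows:
        # OCR may drop trailing empty cells, so pad up to the header length
        padded = list(row) + [None] * (len(header) - len(row))
        records.append(dict(zip(header, padded)))
    return records


# Illustrative values only
header = ["ID", "Name", "Amount"]
rows = [["1", "Widget", "9.50"], ["2", "Gadget"]]
records = rows_to_dicts(header, rows)

for record in records:
    print(record)
```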

Forms:

# Extract all fields from a form
form = vision.extract_form_fields(
    region="full_screen"
)
for field, value in form.items():
    print(f"{field}: {value}")

Monitor specific areas to detect updates:

# Wait for an element to appear
vision.wait_for_image(
    template="images/success_message.png",
    timeout=30
)

# Wait for an element to disappear (e.g., loading spinner)
vision.wait_until_gone(
    template="images/spinner.png",
    timeout=60
)

# Wait for a screen area to change
vision.wait_for_change(
    region={"x": 500, "y": 200, "width": 300, "height": 100},
    timeout=10
)

Verify application state through visual inspection:

# Verify we're on the correct screen
if vision.image_exists("images/app_logo.png"):
    print("Start screen confirmed")
else:
    raise Exception("Not on expected screen")

# Validate that a process completed
if vision.text_exists("Process completed successfully"):
    log.info("Operation finished correctly")
else:
    log.warning("Confirmation message not found")

# Check color of an element (e.g., status indicator)
indicator_color = vision.get_pixel_color(x=850, y=120)
if indicator_color == (0, 255, 0):  # Green
    print("Status: Active")
elif indicator_color == (255, 0, 0):  # Red
    print("Status: Error")
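
Exact equality on pixel colors, as in the status check above, can miss when anti-aliasing, themes, or remote-session compression shift channel values slightly. The following is a minimal sketch of a tolerance-based comparison, assuming colors are (R, G, B) tuples as in the example; color_matches and classify_status are hypothetical helpers, not Heptora functions.

```python
def color_matches(actual, expected, tolerance=30):
    """Return True when every RGB channel is within `tolerance` of the expected value."""
    return all(abs(a - e) <= tolerance for a, e in zip(actual, expected))


GREEN = (0, 255, 0)
RED = (255, 0, 0)


def classify_status(pixel):
    """Map an indicator pixel to a status label, allowing for slight color drift."""
    if color_matches(pixel, GREEN):
        return "Active"
    if color_matches(pixel, RED):
        return "Error"
    return "Unknown"


print(classify_status((10, 240, 12)))    # slightly off-green pixel
print(classify_status((128, 128, 128)))  # gray pixel, neither status
```
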
# Simple click at specific coordinates
vision.mouse.click(x=500, y=300)
# Right click
vision.mouse.right_click(x=500, y=300)
# Double click
vision.mouse.double_click(x=500, y=300)
# Move cursor without clicking
vision.mouse.move_to(x=500, y=300)
# Click at the center of a found image (find_image returns None when nothing matches)
button = vision.find_image("accept_button.png")
if button:
    vision.mouse.click(button.center)

# Drag from one point to another
vision.mouse.drag_to(
    from_x=200,
    from_y=300,
    to_x=400,
    to_y=300
)

# Drag a found element
element = vision.find_image("file.png")
destination = vision.find_image("folder.png")
if element and destination:
    vision.mouse.drag_to(
        from_x=element.center.x,
        from_y=element.center.y,
        to_x=destination.center.x,
        to_y=destination.center.y
    )
# Vertical scroll (positive down, negative up)
vision.mouse.scroll(clicks=-5) # Scroll up
# Horizontal scroll
vision.mouse.horizontal_scroll(clicks=3) # Scroll right
# Scroll in a specific region
vision.mouse.move_to(x=600, y=400)
vision.mouse.scroll(clicks=10) # Scroll in that area
# Type text
vision.keyboard.type("Hello World")
# Special keys
vision.keyboard.press("Enter")
vision.keyboard.press("Tab")
vision.keyboard.press("Escape")
# Key combinations
vision.keyboard.hotkey("Ctrl", "C") # Copy
vision.keyboard.hotkey("Ctrl", "V") # Paste
vision.keyboard.hotkey("Ctrl", "S") # Save
vision.keyboard.hotkey("Alt", "F4") # Close window
# Hold key down
vision.keyboard.key_down("Shift")
vision.keyboard.press("A")
vision.keyboard.press("B")
vision.keyboard.press("C")
vision.keyboard.key_up("Shift") # Types "ABC"
# Type with pause between characters (more natural)
vision.keyboard.type("user@example.com", interval=0.1)
# Type with interspersed special keys
vision.keyboard.type("Name: ")
vision.keyboard.type("John Doe")
vision.keyboard.press("Tab")
vision.keyboard.type("Email: ")
vision.keyboard.type("john@example.com")
# Clear field and type
vision.keyboard.hotkey("Ctrl", "A") # Select all
vision.keyboard.press("Delete") # Delete
vision.keyboard.type("New text") # Type
# Windows
vision.keyboard.hotkey("Win", "R") # Run
vision.keyboard.type("notepad")
vision.keyboard.press("Enter")
# Window operations
vision.keyboard.hotkey("Alt", "Tab") # Switch window
vision.keyboard.hotkey("Win", "D") # Show desktop
vision.keyboard.hotkey("Ctrl", "Shift", "Escape") # Task manager
# Text operations
vision.keyboard.hotkey("Ctrl", "Z") # Undo
vision.keyboard.hotkey("Ctrl", "Y") # Redo
vision.keyboard.hotkey("Ctrl", "F") # Find

Create complex sequences combining mouse and keyboard:

import time

# Copy text visible on screen
def copy_text_on_screen(region):
    # Triple click to select entire paragraph
    center_x = region["x"] + region["width"] // 2
    center_y = region["y"] + region["height"] // 2
    vision.mouse.move_to(center_x, center_y)
    vision.mouse.triple_click(center_x, center_y)
    # Copy to clipboard
    vision.keyboard.hotkey("Ctrl", "C")
    time.sleep(0.5)
    # Read from clipboard
    return clipboard.paste()

# Fill complete form
def fill_form(data):
    # Click on first field
    first_field = vision.find_text("Name:")
    vision.mouse.click(first_field.x + 100, first_field.y)
    # Fill fields with Tab between them
    vision.keyboard.type(data["name"])
    vision.keyboard.press("Tab")
    vision.keyboard.type(data["lastname"])
    vision.keyboard.press("Tab")
    vision.keyboard.type(data["email"])
    vision.keyboard.press("Tab")
    vision.keyboard.type(data["phone"])
    # Submit form
    vision.keyboard.press("Enter")

Computer vision is ideal for automating terminal applications that lack modern APIs:

Scenario: Order entry in mainframe system

# Wait for login screen to load
vision.wait_for_text("MAIN SYSTEM", timeout=10)

# Enter credentials
vision.keyboard.type(username)
vision.keyboard.press("Tab")
vision.keyboard.type(password)
vision.keyboard.press("Enter")

# Wait for main menu
vision.wait_for_text("MAIN MENU", timeout=5)

# Navigate to orders module (option 2)
vision.keyboard.type("2")
vision.keyboard.press("Enter")

# Wait for order entry screen
vision.wait_for_text("ORDER ENTRY", timeout=5)

# Fill order form
vision.keyboard.type(customer_code)
vision.keyboard.press("Tab")
vision.keyboard.type(order_number)
vision.keyboard.press("Tab")

# Enter order lines
for item in order_items:
    vision.keyboard.type(item["code"])
    vision.keyboard.press("Tab")
    vision.keyboard.type(str(item["quantity"]))
    vision.keyboard.press("Tab")
    vision.keyboard.press("Enter")  # Confirm line

# Confirm order (F5)
vision.keyboard.press("F5")

# Verify confirmation message
if vision.text_exists("ORDER REGISTERED"):
    # Extract confirmed order number
    number_region = vision.find_text("ORDER NO:")
    confirmed_number = vision.read_text(
        region={
            "x": number_region.x + 150,
            "y": number_region.y,
            "width": 100,
            "height": 20
        }
    )
    log.info(f"Order confirmed: {confirmed_number}")
else:
    raise Exception("Order confirmation not received")

# Return to main menu
vision.keyboard.press("F3")

import os
import time

# Launch DOS application
os.system("start dosbox legacy_app.exe")

# Wait for application to load
vision.wait_for_text("INVENTORY SYSTEM V3.2", timeout=15)

# Navigate menus (number + Enter)
vision.keyboard.type("1")  # Queries
vision.keyboard.press("Enter")
vision.keyboard.type("3")  # Query by code
vision.keyboard.press("Enter")

# Enter product code
vision.keyboard.type("PROD-12345")
vision.keyboard.press("Enter")

# Extract information from screen
name = vision.read_text(region={"x": 150, "y": 100, "width": 300, "height": 20})
stock = vision.read_text(region={"x": 150, "y": 140, "width": 100, "height": 20})
price = vision.read_text(region={"x": 150, "y": 180, "width": 100, "height": 20})
print(f"Product: {name}")
print(f"Stock: {stock}")
print(f"Price: {price}")

# Exit (multiple Escape)
for _ in range(3):
    vision.keyboard.press("Escape")
    time.sleep(0.5)

Automate applications running in remote sessions:

import time

# Connect to Citrix session
def connect_citrix(application):
    # Open Citrix Workspace
    vision.keyboard.hotkey("Win", "R")
    vision.keyboard.type("citrix workspace")
    vision.keyboard.press("Enter")
    # Wait for loading
    vision.wait_for_image("citrix_workspace_logo.png", timeout=10)
    # Search for application
    search_field = vision.find_image("search_icon.png")
    vision.mouse.click(search_field.center)
    vision.keyboard.type(application)
    time.sleep(1)
    # Click on first result
    first_result = vision.find_image(f"{application}_icon.png")
    vision.mouse.click(first_result.center)
    # Wait for application launch
    vision.wait_for_text("Loading...", timeout=5)
    vision.wait_until_gone("Loading...", timeout=30)

# Use remote application
connect_citrix("Financial ERP")
# Now interact normally
vision.wait_for_text("MAIN MENU")
# ... rest of automation

# For remote connections, increase wait times
REMOTE_WAIT_TIME = 2.0

def remote_click(x, y):
    vision.mouse.click(x, y)
    time.sleep(REMOTE_WAIT_TIME)

def remote_type(text):
    vision.keyboard.type(text, interval=0.15)  # Slower
    time.sleep(REMOTE_WAIT_TIME)

def verify_remote_screen_change(expected_text, timeout=30):
    start = time.time()
    while time.time() - start < timeout:
        if vision.text_exists(expected_text):
            time.sleep(REMOTE_WAIT_TIME)  # Additional wait
            return True
        time.sleep(1)
    return False
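
The polling loop in verify_remote_screen_change generalizes to any condition. Below is a minimal sketch of such a helper; wait_until and its parameters are hypothetical names, and it uses time.monotonic() rather than time.time() so that system clock adjustments do not distort the timeout.

```python
import time


def wait_until(predicate, timeout=30.0, interval=1.0, settle=0.0):
    """Poll `predicate` until it returns a truthy value or `timeout` seconds elapse.

    `settle` is an extra pause after success, useful on slow remote sessions.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            time.sleep(settle)
            return True
        time.sleep(interval)
    return False


# Demonstration with a condition that becomes true on the third check
calls = {"count": 0}


def appears_on_third_check():
    calls["count"] += 1
    return calls["count"] >= 3


found = wait_until(appears_on_third_check, timeout=5.0, interval=0.01)
print(found, calls["count"])
```

With the functions shown earlier, the predicate could be something like lambda: vision.text_exists("MAIN MENU"), with settle set to REMOTE_WAIT_TIME.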

Automate closed or undocumented software:

# Example: Proprietary medical management software
def register_patient(patient_data):
    # Ensure application is in foreground
    app_window = vision.find_image("app_logo.png")
    if not app_window:
        # Launch application if not open
        vision.keyboard.hotkey("Win", "R")
        vision.keyboard.type("C:\\Program\\MediManage\\mediman.exe")
        vision.keyboard.press("Enter")
        vision.wait_for_image("app_logo.png", timeout=20)
    # Navigate to patients module
    patients_button = vision.find_image("buttons/patients.png")
    vision.mouse.click(patients_button.center)
    # Click "New Patient"
    vision.wait_for_image("buttons/new_patient.png", timeout=5)
    new_button = vision.find_image("buttons/new_patient.png")
    vision.mouse.click(new_button.center)
    # Wait for form to open
    vision.wait_for_text("PATIENT DATA", timeout=3)
    # Fill form
    fields = [
        patient_data["name"],
        patient_data["lastname"],
        patient_data["id_number"],
        patient_data["birth_date"],
        patient_data["phone"],
        patient_data["email"],
        patient_data["address"]
    ]
    for field in fields:
        vision.keyboard.type(str(field))
        vision.keyboard.press("Tab")
        time.sleep(0.3)
    # Save
    save_button = vision.find_image("buttons/save.png")
    vision.mouse.click(save_button.center)
    # Verify confirmation
    if vision.wait_for_text("PATIENT REGISTERED", timeout=5):
        # Extract medical record number
        number_region = vision.find_text("Record No:")
        medical_record = vision.read_text(
            region={
                "x": number_region.x + 120,
                "y": number_region.y,
                "width": 100,
                "height": 20
            }
        )
        return medical_record
    else:
        raise Exception("Could not confirm registration")

# Example: Application with custom drawn controls
def click_custom_button(button_color, search_region):
    """
    Find and click custom drawn buttons
    based on their distinctive color
    """
    # Capture screenshot of region
    screenshot = vision.capture_region(search_region)
    # Find pixels of button color
    locations = vision.find_color(
        color=button_color,
        tolerance=10,
        region=search_region
    )
    if locations:
        # Calculate center of pixel cluster
        center = calculate_cluster_center(locations)
        vision.mouse.click(center.x, center.y)
        return True
    return False

# Usage
GREEN_ACCEPT = (0, 200, 50)
RED_CANCEL = (200, 50, 50)
button_region = {"x": 400, "y": 500, "width": 300, "height": 100}
if click_custom_button(GREEN_ACCEPT, button_region):
    print("Accept button pressed")
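
The example above calls calculate_cluster_center without defining it. One possible implementation is the centroid of the matched pixels. This is a sketch under the assumption that the color search yields a list of (x, y) tuples; the actual return type is not documented here, and Point is a hypothetical type chosen to provide the .x and .y attributes the example uses.

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])


def calculate_cluster_center(locations):
    """Return the centroid of (x, y) pixel locations, rounded to whole pixels."""
    if not locations:
        raise ValueError("locations must not be empty")
    xs = [x for x, _ in locations]
    ys = [y for _, y in locations]
    return Point(round(sum(xs) / len(xs)), round(sum(ys) / len(ys)))


# Four corner pixels of a 20x20 button (illustrative coordinates)
center = calculate_cluster_center([(410, 520), (430, 520), (410, 540), (430, 540)])
print(center)
```

A plain centroid is only meaningful when the matched pixels form a single cluster. If the same color appears in two separate places, the centroid may land between them, so keeping the search region tight matters.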

While SAP has scripting, computer vision can be useful in restricted environments:

import time

# Launch SAP transaction
def execute_sap_transaction(transaction_code):
    # Focus on command field
    command_field = vision.find_image("sap_command_field.png")
    vision.mouse.click(command_field.center)
    # Clear any previous text
    vision.keyboard.hotkey("Ctrl", "A")
    vision.keyboard.press("Delete")
    # Enter transaction
    vision.keyboard.type(f"/n{transaction_code}")
    vision.keyboard.press("Enter")
    # Wait for loading
    time.sleep(2)

# Extract data from SAP table
def extract_sap_table(table_region):
    # Click on first cell
    vision.mouse.click(
        table_region["x"] + 50,
        table_region["y"] + 50
    )
    rows = []
    current_row = 0
    max_rows = 50
    while current_row < max_rows:
        # Select entire row
        vision.keyboard.hotkey("Shift", "End")
        vision.keyboard.hotkey("Ctrl", "C")
        time.sleep(0.3)
        # Get data from clipboard
        row_text = clipboard.paste()
        if not row_text or row_text in rows:
            break  # End of table or repeated row
        rows.append(row_text)
        # Go to next row
        vision.keyboard.press("Down")
        vision.keyboard.press("Home")
        time.sleep(0.2)
        current_row += 1
    return rows
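
The rows collected above are raw clipboard strings. A small post-processing step can split them into cells. The sketch below assumes cells are tab-separated, which is common for grid copies but should be checked against the actual clipboard content; parse_clipboard_row is a hypothetical helper and the sample values are made up for illustration.

```python
def parse_clipboard_row(row_text, delimiter="\t"):
    """Split one copied row into trimmed cell values."""
    # Drop the trailing line break that usually accompanies copied rows
    return [cell.strip() for cell in row_text.rstrip("\r\n").split(delimiter)]


# Illustrative clipboard contents
raw_rows = ["4500001234\tACME Corp\t1,250.00\r\n", "4500001235\tGlobex\t  980.10"]
parsed = [parse_clipboard_row(row) for row in raw_rows]

for cells in parsed:
    print(cells)
```
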
# Specify which monitor to search
primary_monitor = vision.get_monitor(0)
secondary_monitor = vision.get_monitor(1)

# Find element on specific monitor
button = vision.find_image(
    "button.png",
    region=secondary_monitor.bounds
)

# Move cursor between monitors
vision.mouse.move_to(
    secondary_monitor.x + 500,
    secondary_monitor.y + 300
)

# Capture current state of an area
initial_capture = vision.capture_region(
    region={"x": 100, "y": 100, "width": 400, "height": 300}
)

# Perform some action
vision.mouse.click(500, 400)
time.sleep(2)

# Capture new state
final_capture = vision.capture_region(
    region={"x": 100, "y": 100, "width": 400, "height": 300}
)

# Compare if changed
if vision.images_are_different(initial_capture, final_capture, threshold=0.95):
    print("Screen changed after action")
else:
    print("No changes detected")
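
To make the idea of a comparison threshold concrete, here is a pure-Python sketch of one possible interpretation: two captures count as different when the fraction of identical pixels falls below the threshold. How images_are_different actually interprets threshold is not specified here, so this is an assumption; similarity and are_different are hypothetical helpers operating on nested lists of pixel values.

```python
def similarity(image_a, image_b):
    """Fraction of pixels that are identical between two equally sized grids."""
    same_shape = len(image_a) == len(image_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(image_a, image_b)
    )
    if not same_shape:
        raise ValueError("images must have the same dimensions")
    total = sum(len(row) for row in image_a)
    if total == 0:
        return 1.0
    same = sum(
        pixel_a == pixel_b
        for row_a, row_b in zip(image_a, image_b)
        for pixel_a, pixel_b in zip(row_a, row_b)
    )
    return same / total


def are_different(image_a, image_b, threshold=0.95):
    """Treat the images as different when similarity drops below the threshold."""
    return similarity(image_a, image_b) < threshold


before = [[0, 0, 0, 0], [0, 0, 0, 0]]
after = [[0, 0, 0, 0], [0, 0, 255, 255]]  # two of eight pixels changed
print(similarity(before, after), are_different(before, after))
```
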
# Find element with multiple variants
variants = [
    "save_button_en.png",
    "save_button_es.png",
    "save_button_alternative.png"
]
found_button = None
for variant in variants:
    button = vision.find_image(variant, confidence=0.85)
    if button:
        found_button = button
        break
if found_button:
    vision.mouse.click(found_button.center)
else:
    # Try text search as fallback
    if vision.text_exists("Save"):
        region = vision.find_text("Save")
        vision.mouse.click(region.center)

# Combine computer vision with web automation
def hybrid_process():
    # Part 1: Extract data from legacy app with vision
    legacy_data = extract_mainframe_data()
    # Part 2: Process in modern web application
    browser = heptora.browser.create()
    browser.navigate("https://modern-erp.company.com")
    browser.fill_form(legacy_data)
    # Part 3: Verify result with computer vision
    vision.wait_for_text("Synchronization completed")
    return True

Scenario: Extract product catalog from old DOS software

import time

import pandas

def extract_complete_catalog():
    products = []
    # Open legacy system
    open_inventory_system()
    # Go to product query
    vision.keyboard.type("1")  # Products menu
    vision.keyboard.press("Enter")
    vision.keyboard.type("2")  # List all
    vision.keyboard.press("Enter")
    # Extract page by page
    page = 1
    while True:
        # Read products from current screen
        for line in range(5, 20):  # 15 products per page
            # Position of each field
            y = 80 + (line * 25)
            code = vision.read_text(
                region={"x": 50, "y": y, "width": 100, "height": 20}
            ).strip()
            if not code:
                break  # End of products on this page
            name = vision.read_text(
                region={"x": 160, "y": y, "width": 300, "height": 20}
            ).strip()
            price = vision.read_text(
                region={"x": 470, "y": y, "width": 80, "height": 20}
            ).strip()
            stock = vision.read_text(
                region={"x": 560, "y": y, "width": 60, "height": 20}
            ).strip()
            products.append({
                "code": code,
                "name": name,
                "price": price,
                "stock": stock
            })
        # Try to go to next page
        vision.keyboard.press("PageDown")
        time.sleep(1)
        # Check if there are more pages (look for indicator)
        if vision.text_exists("END OF LIST"):
            break
        page += 1
        if page > 100:  # Safety limit
            break
    return products

# Export to modern format
products = extract_complete_catalog()
df = pandas.DataFrame(products)
df.to_excel("migrated_catalog.xlsx", index=False)
print(f"Extracted {len(products)} products")

Scenario: Laboratory results registration

def process_lab_results(results_file):
    # Read results from file
    results = read_excel(results_file)
    # Open hospital management application
    open_hospital_app()
    for result in results:
        # Search for patient
        search_patient(result["id_number"])
        # Go to laboratory module
        lab_button = vision.find_image("modules/laboratory.png")
        vision.mouse.click(lab_button.center)
        # New result
        vision.keyboard.hotkey("Ctrl", "N")
        vision.wait_for_text("NEW RESULT")
        # Fill form
        vision.keyboard.type(result["analysis_type"])
        vision.keyboard.press("Tab")
        vision.keyboard.type(result["date"])
        vision.keyboard.press("Tab")
        # Enter values
        for value in result["values"]:
            vision.keyboard.type(value["parameter"])
            vision.keyboard.press("Tab")
            vision.keyboard.type(str(value["result"]))
            vision.keyboard.press("Tab")
            vision.keyboard.type(value["unit"])
            vision.keyboard.press("Enter")
        # Save
        vision.keyboard.hotkey("Ctrl", "S")
        # Verify saved
        if vision.wait_for_text("RESULT SAVED", timeout=5):
            log.info(f"Result saved for patient {result['id_number']}")
        else:
            log.error(f"Error saving result for {result['id_number']}")
            screenshot = vision.capture_screen()
            vision.save_image(screenshot, f"error_{result['id_number']}.png")
        # Return to main menu
        vision.keyboard.press("Escape")
        vision.keyboard.press("Escape")

Scenario: Verify a batch process executed correctly

def verify_batch_process(process_name):
    # Open system monitor
    open_system_monitor()
    # Search for specific process
    search_field = vision.find_image("search_icon.png")
    vision.mouse.click(search_field.center)
    vision.keyboard.type(process_name)
    vision.keyboard.press("Enter")
    time.sleep(2)
    # Verify status
    status_region = vision.find_text("Status:")
    status_text = vision.read_text(
        region={
            "x": status_region.x + 80,
            "y": status_region.y,
            "width": 150,
            "height": 25
        }
    )
    # Visually verify indicator color
    indicator_color = vision.get_pixel_color(
        x=status_region.x - 30,
        y=status_region.y + 10
    )
    if "COMPLETED" in status_text.upper() and indicator_color[1] > 200:  # Green
        log.info(f"Process {process_name} completed successfully")
        # Extract statistics
        records = vision.read_text_after_label("Records processed:")
        errors = vision.read_text_after_label("Errors:")
        duration = vision.read_text_after_label("Duration:")
        return {
            "status": "COMPLETED",
            "records": records,
            "errors": errors,
            "duration": duration
        }
    else:
        log.error(f"Process {process_name} failed or pending")
        return {"status": "ERROR"}

  1. Use stable references

    • Search for elements that don’t change (logos, titles)
    • Avoid dynamic elements like timestamps
  2. Combine methods

    # Robust method: try image, then text
    button = vision.find_image("next_button.png")
    if not button:
        button = vision.find_text("Next")
    if button:
        vision.mouse.click(button.center)
  3. Use search regions

    # Faster and more precise
    button = vision.find_image(
        "button.png",
        region={"x": 700, "y": 400, "width": 200, "height": 100}
    )

def robust_action(max_attempts=3):
    for attempt in range(max_attempts):
        try:
            # Try the action
            vision.mouse.click_image("button.png")
            # Verify it worked
            if vision.wait_for_text("Confirmation", timeout=5):
                return True
        except ElementNotFoundError:
            if attempt < max_attempts - 1:
                log.warning(f"Attempt {attempt + 1} failed, retrying...")
                time.sleep(2)
            else:
                # Capture error evidence
                screenshot = vision.capture_screen()
                vision.save_image(screenshot, "final_error.png")
                raise
    return False
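
The retry logic in robust_action can be separated from the action itself so it is reusable across steps. The following is a minimal sketch; retry is a hypothetical helper, and the ElementNotFoundError class defined here is only a local stand-in so the example runs on its own.

```python
import time


class ElementNotFoundError(Exception):
    """Local stand-in for the exception used in the example above."""


def retry(action, max_attempts=3, delay=2.0, exceptions=(ElementNotFoundError,)):
    """Call `action` until it succeeds; re-raise the last error after `max_attempts`."""
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except exceptions:
            if attempt == max_attempts:
                raise
            time.sleep(delay)


# Demonstration: an action that fails twice before succeeding
attempts = []


def flaky_click():
    attempts.append(1)
    if len(attempts) < 3:
        raise ElementNotFoundError("button not visible yet")
    return "clicked"


result = retry(flaky_click, delay=0)
print(result, len(attempts))
```

Only the listed exception types trigger a retry. Anything else propagates immediately, which keeps genuine bugs from being masked by repeated attempts.
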
  1. Cache template images

    # Load templates at start
    TEMPLATES = {
        "save": vision.load_template("buttons/save.png"),
        "cancel": vision.load_template("buttons/cancel.png"),
        "accept": vision.load_template("buttons/accept.png")
    }
    # Use cached templates
    button = vision.find_template(TEMPLATES["save"])
  2. Reduce search area

    # Define common regions
    REGIONS = {
        "bottom_buttons": {"x": 0, "y": 700, "width": 1920, "height": 280},
        "top_menu": {"x": 0, "y": 0, "width": 1920, "height": 100},
        "right_panel": {"x": 1400, "y": 100, "width": 520, "height": 900}
    }
  3. Adjust confidence levels

    # For stable elements, use high confidence
    logo = vision.find_image("logo.png", confidence=0.95)
    # For variable elements, reduce confidence
    button = vision.find_image("dynamic_button.png", confidence=0.75)
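
The regions above are hardcoded for a 1920-pixel-wide screen. If the same process runs at another resolution, they can be scaled proportionally. This is a sketch assuming a 1920x1080 source layout (the height is an assumption, only the width appears above); scale_region is a hypothetical helper, and it presumes the application layout scales linearly with the screen.

```python
def scale_region(region, source_size, target_size):
    """Scale a region dict from one screen size (width, height) to another."""
    source_w, source_h = source_size
    target_w, target_h = target_size
    return {
        "x": round(region["x"] * target_w / source_w),
        "y": round(region["y"] * target_h / source_h),
        "width": round(region["width"] * target_w / source_w),
        "height": round(region["height"] * target_h / source_h),
    }


top_menu = {"x": 0, "y": 0, "width": 1920, "height": 100}
scaled = scale_region(top_menu, source_size=(1920, 1080), target_size=(1280, 720))
print(scaled)
```

Scaling only adjusts where to search. Image templates captured at the original resolution may still need to be recaptured.
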
import os
from datetime import datetime

def execute_with_evidence(process_name):
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    evidence_folder = f"evidence/{process_name}_{timestamp}"
    os.makedirs(evidence_folder, exist_ok=True)
    step = 0

    def capture_step(description):
        # Save a numbered screenshot for each step of the process
        nonlocal step
        step += 1
        screenshot = vision.capture_screen()
        path = f"{evidence_folder}/step_{step:02d}_{description}.png"
        vision.save_image(screenshot, path)
        log.info(f"Captured: {description}")

    try:
        capture_step("start")
        # Execute process
        open_application()
        capture_step("application_opened")
        perform_operation()
        capture_step("operation_completed")
        capture_step("successful_end")
    except Exception as e:
        capture_step("error")
        log.error(f"Error: {str(e)}")
        raise

Symptoms: vision.find_image() or vision.find_text() returns None

Solutions:

  1. Verify element is visible on screen
  2. Reduce confidence level: confidence=0.8 instead of 0.9
  3. Capture new template under current conditions
  4. Use more specific search region
  5. Try searching by text instead of image

Symptoms: Click executes but has no effect

Solutions:

  1. Add time.sleep(0.5) after click
  2. Verify application has focus
  3. Use double_click() if necessary
  4. Verify click coordinates with screenshot

Symptoms: Text read incorrectly

Solutions:

  1. Increase contrast of captured region
  2. Specify OCR language: vision.read_text(lang='eng')
  3. Preprocess image (grayscale, thresholding)
  4. Use region more tightly fitted to specific text
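
To illustrate the preprocessing step mentioned above (grayscale, then thresholding), here is a pure-Python sketch on a tiny grid of RGB tuples. In practice an imaging library would do this on real captures; to_gray and binarize are hypothetical helpers, and the cutoff of 128 is an arbitrary starting point that usually needs tuning per application.

```python
def to_gray(pixel):
    """Convert an (R, G, B) pixel to a luma value using integer ITU-R BT.601 weights."""
    r, g, b = pixel
    return (299 * r + 587 * g + 114 * b) // 1000


def binarize(pixels, cutoff=128):
    """Map each pixel to 255 (light) or 0 (dark) based on its luma."""
    return [[255 if to_gray(p) >= cutoff else 0 for p in row] for row in pixels]


# Illustrative 2x2 image: light background pixels and dark text pixels
sample = [
    [(250, 250, 250), (40, 40, 60)],
    [(200, 180, 30), (10, 10, 10)],
]
result = binarize(sample)
print(result)
```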

Symptoms: Searches take too long

Solutions:

  1. Use limited search regions
  2. Reduce template resolution
  3. Cache loaded templates
  4. Avoid full screen searches

Does computer vision work at any resolution?

Yes, but it’s recommended to capture templates at the same resolution at which the process will run. If the process must run at several resolutions, prefer text search over image matching when possible.

Can I automate applications in headless mode?

No, computer vision requires the application to be visible on screen. For background automation, consider running the robot in a virtual machine with a visible desktop.

Does it work with applications in multiple languages?

Yes, but you must capture templates for each language or use text recognition with the appropriate language specified.

If the application’s interface changes, you can adjust the confidence level to tolerate minor variations. For major changes, you’ll need to update the image templates.

Can I combine computer vision with web automation?

Absolutely. It’s common to use computer vision for legacy systems and web automation for modern systems in the same process.

To handle unexpected popups, implement periodic checks:

if vision.image_exists("error_popup.png"):
    vision.mouse.click_image("close_popup_button.png")

If this guide didn’t solve your problem or you found an error in the documentation:

  • Technical support: help@heptora.com
  • Describe the application you’re trying to automate
  • Include screenshots of the interface
  • Indicate which element you cannot locate or interact with
  • Mention operating system and screen resolution

Our team will help you design the best visual automation strategy for your specific case.