Debugging selenium • selenium

Web automation is a complex and fragile task, with many ways to go wrong. This article describes some common errors and pitfalls, and how you might go about resolving them.

Starting the client

If initializing a SeleniumSession fails, it is often useful to look at the logs from the server. If you ran the java -jar ... command manually, then you should be able to see the logs, but if you used selenium_server(), then the logs are unavailable by default. However, you can use the stdout and stderr arguments to enable log collection, and then use read_output() and read_error() to read the logs.

server <- selenium_server(stdout = "|", stderr = "|")
server$read_output()
server$read_error()

This will show any output/errors that the server has written to the console.

Timeout

Sometimes, the server can just take a very long time to start up. If you get an error from wait_for_server() or wait_for_selenium_available(), it can be worth increasing the max_time argument to something higher than 60, and seeing if that fixes the issue.

Wrong Java version

If starting the server results in an error, and wait_for_server(), server$read_error() or the logs show an error similar to:

... java.lang.UnsupportedClassVersionError: {file} has been compiled by a more recent version of the Java Runtime ...

This probably means that you need to update your Java version. Selenium’s minimum version is now Java 11 (see https://www.selenium.dev/blog/2023/java-8-support/). You can find later versions of Java here

Port and IP address

One reason why you may be unable to connect to the server is that the port and IP address you are connecting to is wrong.

If you are using selenium_server(), server$host and server$port give you the host IP address and port, respectively.

You can also get the IP address and port from the server logs. You should see a line like: INFO [Standalone.execute] - Started Selenium Standalone ... (revision ...): http://<IP>:<PORT>

The URL at the end of this message can be used to extract an IP address and a port number, which can then be passed into the host and port arguments. For example, if the URL was: http://172.17.0.1/4444, you would run:

session <- SeleniumSession$new(host = "172.17.0.1", port = 4444)

Using a different debugging port for Chrome

If you are using Chrome, and you see a browser open, but the call to SeleniumSession$new() times out, you may need to use a different debugging port. For example:

session <- SeleniumSession$new(
  browser = "chrome",
  capabilities = list(
    `goog:chromeOptions` = list(
      args = list("remote-debugging-port=9222")
    )
  )
)

Increasing /dev/shm/ size when using docker

If you are running selenium using docker, you may need to increase the size of /dev/shm/ to avoid running out of memory. This issue usually happens when using Chrome, and usually results in a message like session deleted because of page crash.

You can use the --shm-size to the selenium docker images to fix this issue. For example: docker run --shm-size="2g" selenium/standalone-chrome:<version>

Other common errors

Stale element reference errors

At some point, when using selenium, you will encounter the following error:

#> Error in `element$click()`:
#> ! Stale element reference.
#> ✖ The element with the reference <...> is not known in the current browsing context
#> Caused by error in `httr2::req_perform()`:
#> ! HTTP 404 Not Found.
#> Run `rlang::last_trace()` to see where the error occurred.

This error is common when automating a website. Selenium is telling you that an element which you previously identified no longer exists. In all websites, especially complex ones, the DOM will be constantly updating itself, constantly invalidating references to elements. This error is a particularly annoying one, as it can happen at any time and is impossible to predict.

One way to deal with this error is to use elements as soon as they are created, only keeping references to elements if you are sure that they will not be invalidated. For example, if you want to click the same element twice, with a second-long gap in between, you may want to consider fetching the element once for each time, rather than sharing the reference between the actions.

However, this solution is not infallible. If you find yourself encountering this error a lot, it may be a sign that a more high-level package, that can deal with this issue (e.g. selenider), is needed.