This vignette introduces you to the basics of automating a web browser using selenider.
Starting the session
To use selenider, first start a session with selenider_session(). If you skip this step, a session is created automatically for you, but you may want to change some of the options from their defaults (the backend, for example). Here, we use chromote as a backend (the default), and we set the timeout to 10 seconds (the default is 4). Finally, we’ll use chromote_options() to set options that are specific to chromote (here we disable headless mode, which allows us to see the browser).
session <- selenider_session(
  "chromote",
  timeout = 10,
  options = chromote_options(headless = FALSE)
)
The session, once created, will be set as the local session inside the current environment, meaning that in this case, it can be accessed anywhere in this script, and will be closed automatically when the script finishes running.
One thing to remember is that if you start a session inside a function, it will be closed automatically when the function finishes running. If you want to use the session outside the function, you need to use the .env argument. For example, let’s say we want a wrapper function around selenider_session() that always uses selenium:
# Bad (unless you only need to use the session inside the function)
my_selenider_session <- function(...) {
  selenider_session("selenium", ...)
  # The session will be closed here
}
# Good - the session will be open in the caller environment/function
my_selenider_session <- function(..., .env = rlang::caller_env()) {
  selenider_session("selenium", ..., .env = .env)
}
Use open_url() to navigate to a website. selenider also provides the back() and forward() functions to move through the browsing history, and the reload() function to reload the current page.
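As a rough sketch of these navigation functions, the snippet below opens a page, visits a second one, and then moves through the history. The URL is an assumption: the output printed throughout this vignette suggests the tidyverse home page, and /packages/ is one of the links listed on it.

```r
library(selenider)

# Assumed starting page (inferred from the output shown in this vignette)
open_url("https://www.tidyverse.org/")

# Visit a second page so that there is some history to move through
open_url("https://www.tidyverse.org/packages/")

back()    # back to the home page
forward() # forward to the packages page
reload()  # reload the packages page

# Return to the home page, which the remaining examples appear to use
back()
```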
Selecting elements
Use s() to select an element. By default, CSS selectors are used, but other options are available.
header <- s("#rStudioHeader")
header
#> { selenider_element }
#> <div id="rStudioHeader">
#> \n <div class="band">\n <div class="innards bandContent ...
#> </div>
For example, an XPath can be used instead. XPaths can be useful for more complex selectors, and are not limited to selecting from the descendants of the current element. However, they can be difficult to read.
s(xpath = "//div/a")
#> { selenider_element }
#> <a class="productName" href="/">
#> Tidyverse
#> </a>
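Besides CSS selectors and XPaths, s() can also select by id or by class name through named arguments. The sketch below assumes the same page as above and reuses an id and a class that appear in its output; each call should be equivalent to the CSS selector noted in the comment.

```r
library(selenider)

# Select by id: intended to match the same element as s("#rStudioHeader")
header_by_id <- s(id = "rStudioHeader")

# Select by class name: intended to match the same element as s(".productName")
title_by_class <- s(class_name = "productName")
```

Named arguments avoid having to escape characters that carry special meaning in CSS, at the cost of being less flexible than a full selector.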
Use ss() to select multiple elements.
all_links <- ss("a")
all_links
#> { selenider_elements (25) }
#> [1] <a class="productName" href="/">Tidyverse</a>
#> [2] <a class="menuItem " href="/packages/">Packages</a>
#> [3] <a class="menuItem " href="/blog/">Blog</a>
#> [4] <a class="menuItem " href="/learn/">Learn</a>
#> [5] <a class="menuItem " href="/help/">Help</a>
#> [6] <a class="menuItem " href="/contribute/">Contribute</a>
#> [7] <a href="https://dplyr.tidyverse.org"><img src="/css/images/hex/dplyr.png" al ...
#> [8] <a href="https://ggplot2.tidyverse.org"><img src="/css/images/hex/ggplot2.png ...
#> [9] <a href="https://readr.tidyverse.org"><img src="/css/images/hex/readr.png" al ...
#> [10] <a href="https://forcats.tidyverse.org"><img src="/css/images/hex/forcats.png ...
#> [11] <a href="https://stringr.tidyverse.org"><img src="/css/images/hex/stringr.png ...
#> [12] <a href="https://tibble.tidyverse.org"><img src="/css/images/hex/tibble.png" ...
#> [13] <a href="https://tidyr.tidyverse.org"><img src="/css/images/hex/tidyr.png" al ...
#> [14] <a href="https://purrr.tidyverse.org"><img src="/css/images/hex/purrr.png" al ...
#> [15] <a href="/packages">collection of R packages</a>
#> [16] <a href="https://r4ds.hadley.nz/" target="_blank" rel="noopener">online</a>
#> [17] <a href="http://amzn.to/2aHLAQ1" target="_blank" rel="noopener">the book</a>
#> [18] <a href="/learn/">resource</a>
#> [19] <a href="http://amzn.to/2aHLAQ1"><img class="bookCover" src="/images/cover.pn ...
#> [20] <a href="/help/#reprex">reprex</a>
#> ...
Use find_element() and find_elements() to find child elements of an existing element. These can be chained with the pipe operator (|>) to specify paths to elements. Just like s() and ss(), a variety of selector types are available, but CSS selectors are used by default.
tidyverse_title <- s("#rStudioHeader") |>
  find_element("div") |>
  find_element(".productName")
tidyverse_title
#> { selenider_element }
#> <a class="productName" href="/">
#> Tidyverse
#> </a>
menu_items <- s("#rStudioHeader") |>
  find_element("#menu") |>
  find_elements(".menuItem")
menu_items
#> { selenider_elements (5) }
#> [1] <a class="menuItem " href="/packages/">Packages</a>
#> [2] <a class="menuItem " href="/blog/">Blog</a>
#> [3] <a class="menuItem " href="/learn/">Learn</a>
#> [4] <a class="menuItem " href="/help/">Help</a>
#> [5] <a class="menuItem " href="/contribute/">Contribute</a>
Use elem_children() and friends to find elements by their position relative to another element.
s("#menuItems") |>
  elem_children()
#> { selenider_elements (5) }
#> [1] <a class="menuItem " href="/packages/">Packages</a>
#> [2] <a class="menuItem " href="/blog/">Blog</a>
#> [3] <a class="menuItem " href="/learn/">Learn</a>
#> [4] <a class="menuItem " href="/help/">Help</a>
#> [5] <a class="menuItem " href="/contribute/">Contribute</a>
s("#menuItems") |>
  elem_ancestors()
#> { selenider_elements (8) }
#> [1] <html lang="en-us"><head>\n\n\n\n\n\n<link rel="stylesheet" href="/css/fonts. ...
#> [2] <body>\n <div id="appTidyverseSite">\n <div id="main">\n \n ...
#> [3] <div id="appTidyverseSite">\n <div id="main">\n \n <div id ...
#> [4] <div id="main">\n \n <div id="rStudioHeader">\n <div c ...
#> [5] <div id="rStudioHeader">\n <div class="band">\n <div clas ...
#> [6] <div class="band">\n <div class="innards bandContent">\n ...
#> [7] <div class="innards bandContent">\n <div>\n <a cl ...
#> [8] <div id="menu">\n <div id="menuToggler"></div>\n <div id="menuItems" class= ...
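The other relative-position functions follow the same pattern. As a minimal sketch, assuming the same page and the #menuItems element used above, the remaining "friends" can be called like this (output omitted):

```r
library(selenider)

menu <- s("#menuItems")

# The direct parent; per the ancestor listing above, this should be
# the <div id="menu"> element
elem_parent(menu)

# Every element nested inside the menu, at any depth
elem_descendants(menu)

# The elements that share a parent with the first menu item
first_item <- elem_children(menu)[[1]]
elem_siblings(first_item)
```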
You can use elem_filter() and elem_find() to filter collections of elements using a custom function. elem_find() returns the first matching element, while elem_filter() returns all matching elements. These functions use the same interface as elem_expect(): see the “Expectations” section below.
# Find the blog item in the menu
menu_items |>
  elem_find(has_text("Blog"))
#> { selenider_element }
#> <a class="menuItem " href="/blog/">
#> Blog
#> </a>
# Find the hex badges on the second row
s(".hexBadges") |>
  find_elements("img") |>
  elem_filter(
    \(x) substring(elem_attr(x, "class"), 1, 2) == "r2"
  )
#> { selenider_elements (3) }
#> [1] <img src="/css/images/hex/ggplot2.png" alt="ggplot2 hex sticker" class="r2 c0">
#> [2] <img src="/css/images/hex/forcats.png" alt="forcats hex sticker" class="r2 c1">
#> [3] <img src="/css/images/hex/tibble.png" alt="tibble hex sticker" class="r2 c2">
Interacting with an element
selenider elements are lazy, meaning that when you specify the path to an element or group of elements, they are not actually located in the DOM until you do something with them.
There are three types of functions that force an element to be collected:
- actions (e.g. elem_click())
- properties (e.g. elem_text())
- conditions (e.g. is_visible())
Most functions that act on elements use the elem_ prefix.
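To illustrate the laziness described above, here is a small sketch. The selector .does-not-exist is made up and is assumed to match nothing on the page: building the element does not raise an error, because the DOM is only queried once a condition (or an action or property) is applied.

```r
library(selenider)

# No lookup happens here, so this works even though the (made-up)
# selector is assumed to match nothing
missing <- s(".does-not-exist")

# The DOM is queried only now; conditions answer immediately
is_present(missing)
```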
Actions
There are various ways to interact with an HTML element. Use elem_click(), elem_right_click(), or elem_double_click() to click on an element, and elem_hover() to hover over an element. Use elem_scroll_to() to scroll to an element before clicking it, which is useful if the element is not currently in view.
s(".blurb") |>
  find_element("a") |> # List of packages
  elem_scroll_to() |>
  elem_click()
Some links will not work when clicked, since they open their content in a new tab. To follow such a link, pass its URL to open_url() instead. This approach is recommended over elem_click(), as it is more reliable.
s(".packages") |>
  find_elements("a") |>
  elem_find(has_text("dplyr")) |> # Find the link to the dplyr documentation
  elem_attr("href") |> # Get the URL
  open_url()
Use elem_set_value() to set the value of an input element, and elem_clear_value() to clear the value.
s("input[type='search']") |>
  elem_set_value("filter")
# Go back to the main page
back()
back()
selenider also provides an elem_submit() function, allowing you to submit an HTML form using any element inside the form.
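A minimal sketch of how this could look is shown below. The helper name search_for() and the input[name='q'] selector are hypothetical; they stand in for whatever form your own page contains and do not refer to the page used in this vignette.

```r
library(selenider)

# Hypothetical helper: fill in a search box and submit the form it belongs to.
# The default selector is an assumption about the target page.
search_for <- function(query, selector = "input[name='q']") {
  s(selector) |>
    elem_set_value(query) |>
    elem_submit() # Submits the enclosing form, not just the input
}
```

Calling search_for("dplyr") on a page with a matching input would type the query and submit the form, without needing to locate the submit button itself.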
Properties
HTML elements have a number of accessible properties.
# Get the tag name
s("#appTidyverseSite") |>
  elem_name()
#> [1] "div"
# Get the text inside the element
s(".tagline") |>
  elem_text()
#> [1] "\n R packages for data science\n "
# Get an attribute
s(".hexBadges") |>
  find_element("img") |>
  elem_attr("alt")
#> [1] "dplyr hex sticker"
# Get every attribute
s(".hexBadges") |>
  find_element("img") |>
  elem_attrs()
#> $src
#> [1] "/css/images/hex/dplyr.png"
#>
#> $alt
#> [1] "dplyr hex sticker"
#>
#> $class
#> [1] "r1 c0"
# Get the 'value' attribute (`NULL` in this case)
s("#homeContent") |>
  elem_value()
#> NULL
# Get a CSS property
s(".tagline") |>
  elem_css_property("font-size")
#> [1] "36px"
Conditions
Conditions are predicate functions on HTML elements. Unlike all other functions in selenider, they do not wait for the element to exist or for the condition to be met: they return TRUE or FALSE (or throw an error) instantly. For this reason, they are designed to be used with elem_expect() and elem_wait_until(), which will automatically wait for conditions to be met.
There is a wide range of conditions, many of which do the same thing. Each HTML property has a corresponding condition, and selenider also provides conditions for basic checks like is_present(), is_visible() and is_enabled(). In the documentation for any condition, you can find all other conditions in the “See Also” section.
s(".hexBadges") |>
  is_present()
#> [1] TRUE
Expectations
selenider provides a concise testing interface using the elem_expect() function. Provide an element, and one or more conditions, and the function will wait until all the conditions are met, throwing an error if they are not met before the timeout. Conditions can be functions or simple calls (e.g. has_text("text") will be turned into has_text(<THE ELEMENT>, "text")). elem_expect() tends to work well with R’s lambda function syntax.
s(".tagline") |>
  elem_expect(is_present) |>
  elem_expect(has_text("data science"))
s(".hexBadges") |>
  find_element("a") |>
  elem_expect(is_visible, is_enabled)
s("#menu") |>
  find_element("#menuItems") |>
  elem_children() |>
  elem_expect(has_at_least(4))
s(".productName") |>
  elem_expect(
    \(x) substring(elem_text(x), 1, 1) == "T" # Tidyverse starts with T
  )
Errors try to give as much information as possible. Since we know this condition is going to fail, we’ll set the timeout to a lower value so we don’t have to wait for too long.
s(".band.first") |>
  find_element(".blurb") |>
  find_element("code") |>
  elem_expect(has_text('install.packages("selenider")'), timeout = 1)
#> Error in `elem_expect()`:
#> ! Condition failed after waiting for 1 seconds:
#> `has_text("install.packages(\"selenider\")")`
#> ℹ `x` does not have text "install.packages(\"selenider\")".
#> ℹ Actual text: "install.packages(\"tidyverse\")".
And (&&), or (||) and not (!) can be used as if the conditions were logical values. Additionally, you can omit the first argument to elem_expect() (but in this case, all conditions must be calls).
s(".random-class") |>
  elem_expect(!is_present)
s(".innards") |>
  elem_expect(is_visible || is_enabled)
elem_1 <- s(".random-class")
elem_2 <- s("#main")
# Test that either the first or second element exists
elem_expect(is_present(elem_1) || is_present(elem_2))
Use elem_wait_until() if you don’t want an error to be thrown when a condition is not met. elem_wait_until() does exactly the same thing as elem_expect(), but always returns TRUE or FALSE.
elem_wait_until(is_present(elem_1) || is_present(elem_2))
#> [1] TRUE
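Because it returns a logical value, elem_wait_until() lends itself to branching on content that may or may not appear. The sketch below is only illustrative: the .cookie-banner selector is hypothetical and is not part of the page used above.

```r
library(selenider)

# Hypothetical selector for an optional banner
banner <- s(".cookie-banner")

# Wait up to 2 seconds, then branch on the result instead of erroring
if (elem_wait_until(banner, is_visible, timeout = 2)) {
  banner |>
    find_element("button") |>
    elem_click()
} else {
  message("No banner appeared; continuing.")
}
```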
The syntax used for elem_expect() and elem_wait_until() can also be used in elem_filter() and elem_find() to filter element collections. Additionally, selenider provides elem_expect_all() and elem_wait_until_all() to test a condition on every element in a collection.
s(".hexBadges") |>
  find_elements("a") |>
  elem_expect_all(is_visible)
Once we are done, we do not need to close the session; it is closed for us automatically!