Review of prototype 2 • financialDataAnalysis

This is a demonstration of prototype 2 of the ‘financialDataAnalysis’ project.

library(financialDataAnalysis)

Prototype 2 is a functionally complete project, meaning that almost all promised features are included. However, the user interface is very minimal, and involves directly calling R functions without any GUI.

The demonstration of this project also loads the following packages (although they are by no means required to have loaded for the project to work):

library(tibble)
library(vroom)
library(writexl)
library(lubridate)
library(prophet)
library(workflows)

Uploading data

This project allows you to upload your own files using the input_data() function. We’re going to create two dummy files to demonstrate this.

data_1 <- tibble(
  x = 1:10,
  y = 10:1
)
file_1 <- tempfile(fileext = ".csv")
vroom_write(data_1, file_1, ",")

data_2 <- tibble(
  x = 1:100,
  z = letters[1:100]
)
file_2 <- tempfile(fileext = ".xlsx")
write_xlsx(data_2, file_2)

The first argument of input_data() should be a character vector of one or more file paths, to be converted into data frames. If we give the function a bad file or a file containing bad data, it will return the default stock data and print out a message describing the problem.

input_data("aaaa")
#> [1] "Files were not converted correctly."
#> # A tibble: 497 × 100
#>    symbol company…¹ excha…² indus…³ website descr…⁴ ceo   secur…⁵ sector prima…⁶
#>    <chr>  <chr>     <chr>   <chr>   <chr>   <chr>   <chr> <chr>   <chr>    <dbl>
#>  1 MMM    3M Co.    NEW YO… "Offic… www.3m… "3M Co… Mich… 3M Co.  Manag…    3841
#>  2 AOS    A.O. Smi… NEW YO… "Heati… www.ao… "A. O.… Kevi… A.O. S… Manuf…    3630
#>  3 ABT    Abbott L… NEW YO… "Pharm… www.ab… "Abbot… Robe… Abbott… Manuf…    2834
#>  4 ABBV   Abbvie I… NEW YO… "Pharm… www.ab… "AbbVi… Rich… Abbvie… Manuf…    2834
#>  5 ABMD   Abiomed … NASDAQ  "Surgi… www.ab… "Abiom… Mich… Abiome… Manuf…    3841
#>  6 ACN    Accentur… NEW YO… "Data … www.ac… "Accen… NA    Accent… Infor…    7389
#>  7 ATVI   Activisi… NASDAQ  "Softw… www.ac… "Activ… Robe… Activi… Infor…    7372
#>  8 ADM    Archer D… NEW YO… "Flour… www.ad… "ADM u… Juan… Archer… Manuf…    2041
#>  9 ADBE   Adobe Inc NASDAQ  "Softw… www.ad… "Adobe… Shan… Adobe … Infor…    7372
#> 10 ADP    Automati… NASDAQ  "Data … www.ad… "ADP i… Carl… Automa… Infor…    7374
#> # … with 487 more rows, 90 more variables: employees <dbl>, address <chr>,
#> #   state <chr>, city <chr>, ZIP <chr>, country <chr>, phone <dbl>,
#> #   capital_expenditures <dbl>, cash_change <dbl>, cash_flow <dbl>,
#> #   cash_flow_financing <dbl>, changes_in_inventories <dbl>,
#> #   changes_in_receivables <dbl>, currency <chr>, depreciation <dbl>,
#> #   filing_type <chr>, fiscal_date <dbl>, net_borrowings <dbl>,
#> #   net_income <dbl>, report_date <dbl>, total_investing_cash_flows <dbl>, …

It is currently able to convert CSV and Excel files.

input_data(file_1)
#> # A tibble: 10 × 2
#>        x     y
#>    <dbl> <dbl>
#>  1     1    10
#>  2     2     9
#>  3     3     8
#>  4     4     7
#>  5     5     6
#>  6     6     5
#>  7     7     4
#>  8     8     3
#>  9     9     2
#> 10    10     1
input_data(file_2)
#> # A tibble: 100 × 2
#>        x z    
#>    <dbl> <chr>
#>  1     1 a    
#>  2     2 b    
#>  3     3 c    
#>  4     4 d    
#>  5     5 e    
#>  6     6 f    
#>  7     7 g    
#>  8     8 h    
#>  9     9 i    
#> 10    10 j    
#> # … with 90 more rows

If more than one files are given, the data frames are combined together. This is done in a way such that all data is preserved and the function never errors.

input_data(c(file_1, file_2))
#> # A tibble: 100 × 3
#>        x     y z    
#>    <dbl> <dbl> <chr>
#>  1     1    10 a    
#>  2     2     9 b    
#>  3     3     8 c    
#>  4     4     7 d    
#>  5     5     6 e    
#>  6     6     5 f    
#>  7     7     4 g    
#>  8     8     3 h    
#>  9     9     2 i    
#> 10    10     1 j    
#> # … with 90 more rows

The function also has a combine argument, which allows you to combine your data with the default stock data. This is useful if you want to add more rows or columns to the data.

input_data(file_1, combine = TRUE)
#> # A tibble: 507 × 102
#>        x     y symbol company_name excha…¹ indus…² website descr…³ ceo   secur…⁴
#>    <dbl> <dbl> <chr>  <chr>        <chr>   <chr>   <chr>   <chr>   <chr> <chr>  
#>  1     1    10 NA     NA           NA      NA      NA      NA      NA    NA     
#>  2     2     9 NA     NA           NA      NA      NA      NA      NA    NA     
#>  3     3     8 NA     NA           NA      NA      NA      NA      NA    NA     
#>  4     4     7 NA     NA           NA      NA      NA      NA      NA    NA     
#>  5     5     6 NA     NA           NA      NA      NA      NA      NA    NA     
#>  6     6     5 NA     NA           NA      NA      NA      NA      NA    NA     
#>  7     7     4 NA     NA           NA      NA      NA      NA      NA    NA     
#>  8     8     3 NA     NA           NA      NA      NA      NA      NA    NA     
#>  9     9     2 NA     NA           NA      NA      NA      NA      NA    NA     
#> 10    10     1 NA     NA           NA      NA      NA      NA      NA    NA     
#> # … with 497 more rows, 92 more variables: sector <chr>,
#> #   primary_sic_code <dbl>, employees <dbl>, address <chr>, state <chr>,
#> #   city <chr>, ZIP <chr>, country <chr>, phone <dbl>,
#> #   capital_expenditures <dbl>, cash_change <dbl>, cash_flow <dbl>,
#> #   cash_flow_financing <dbl>, changes_in_inventories <dbl>,
#> #   changes_in_receivables <dbl>, currency <chr>, depreciation <dbl>,
#> #   filing_type <chr>, fiscal_date <dbl>, net_borrowings <dbl>, …
input_data(c(file_1, file_2), combine = TRUE)
#> # A tibble: 597 × 103
#>        x     y z     symbol company_name exchange industry website descr…¹ ceo  
#>    <dbl> <dbl> <chr> <chr>  <chr>        <chr>    <chr>    <chr>   <chr>   <chr>
#>  1     1    10 a     NA     NA           NA       NA       NA      NA      NA   
#>  2     2     9 b     NA     NA           NA       NA       NA      NA      NA   
#>  3     3     8 c     NA     NA           NA       NA       NA      NA      NA   
#>  4     4     7 d     NA     NA           NA       NA       NA      NA      NA   
#>  5     5     6 e     NA     NA           NA       NA       NA      NA      NA   
#>  6     6     5 f     NA     NA           NA       NA       NA      NA      NA   
#>  7     7     4 g     NA     NA           NA       NA       NA      NA      NA   
#>  8     8     3 h     NA     NA           NA       NA       NA      NA      NA   
#>  9     9     2 i     NA     NA           NA       NA       NA      NA      NA   
#> 10    10     1 j     NA     NA           NA       NA       NA      NA      NA   
#> # … with 587 more rows, 93 more variables: security_name <chr>, sector <chr>,
#> #   primary_sic_code <dbl>, employees <dbl>, address <chr>, state <chr>,
#> #   city <chr>, ZIP <chr>, country <chr>, phone <dbl>,
#> #   capital_expenditures <dbl>, cash_change <dbl>, cash_flow <dbl>,
#> #   cash_flow_financing <dbl>, changes_in_inventories <dbl>,
#> #   changes_in_receivables <dbl>, currency <chr>, depreciation <dbl>,
#> #   filing_type <chr>, fiscal_date <dbl>, net_borrowings <dbl>, …

Scoring data

Once you have imported your data into R, you now need to score it. You do this by creating score specifications, which are definitions of how a score should be created.

These score specifications are stored in a table, where each row represents a single score.

The scores_init object is a table storing zero scores.

scores_init
#> # A tibble: 0 × 12
#> # … with 12 variables: score_type <chr>, colname <chr>, score_name <chr>,
#> #   weight <dbl>, lb <dbl>, ub <dbl>, centre <dbl>, inverse <lgl>,
#> #   exponential <lgl>, logarithmic <lgl>, magnitude <dbl>, custom_args <list>

This table has a number of fields that control how each score will be created:

Each score will always be between 0 and 1.

The score_type argument is arguably the most important, as it defines the method used to create a score. Each score type has a number of specific arguments.

Universal arguments

These arguments have to be defined for every score.

The colname argument defines the column that the score will be creating using.

The score_name argument defines the name of the score, which will be used as a column name when the score is added back to the original data frame. If score_name is “Default”, an informative and sensible default name will be used.

The weight argument defines the weight of the score when it is used to calculate the final score.

Exponential arguments

The exponential argument is required for all scores. If it is FALSE, the score is not modified after it has been created. If it is TRUE, an exponential transformation is applied to the score, and you will need to specify two additional arguments. The score will continue to be bounded by 0 and 1.

The logarithmic argument defines whether the transformation is exponential or logarithmic. If it TRUE, the transformation is inverted.

The magnitude argument defines the magnitude of the transformation: a higher number means that the transformation will have a bigger effect.

Linear scores

When the score_name is “Linear”, a linear score is created. To make the score you need to specify the lb and ub arguments.

Calculating the score

If the column value is less than or equal to the lb argument, the score is 0. If the column value is more than or equal to the ub argument, the score is 1.

Otherwise, the score is defined is the proportion of the distance of the column value between the lb and ub.

If the lb argument is more than the ub argument, the score is inverted. This means that the lb produces a score of 1, the ub produces a score of 0, etc.

Peak scores

When the score_name is “Peak”, a peak score is created. To make the score you need to specify the lb, ub, centre and inverse arguments. The lb, ub and centre arguments must be numeric, and the centre must be between the lb and ub.

Calculating the score

If the column value is less than or equal to the lb argument, the score is 0. If the column value is equal to the centre argument, the score is 1. If the column value is more than or equal to the ub argument, the score is 1.

If the column value is in between the lb and centre arguments, the score is defined as the proportion of the column value along between the lb and centre. If the column value is in between the centre and ub arguments, the score is defined as the proportion of the column value along between the ub and centre.

When inverse is TRUE, the score is inverted: the lower bound and upper bound produce a score of 1, and the centre produces a score of 0.

Custom scores

When score_type is “Custom coordinates”, a custom score is created. This allows you to define a set of coordinates, where the x coordinate is a value in the column, and the y coordinate is a score between 0 and 1. The score will then be created by connecting the coordinates together. The coordinates should be in the form of a data frame, with the x coordinates in the ‘x’ column and the y coordinates in the ‘y’ column.

This can be used to create a huge variety of different scores.

To add a score to a table, use the create_score() function. Lets create a linear and a peak score.

scores <- create_score(
  scores_init,
  score_type = "Linear", colname = "x", score_name = "Default",
  weight = 1, lb = 0, ub = 10, exponential = FALSE
)
scores <- create_score(
  scores,
  score_type = "Peak", colname = "y", score_name = "Default",
  weight = 5, lb = 0, ub = 20, centre = 5, inverse = FALSE, exponential = TRUE,
  logarithmic = FALSE, magnitude = 2
)
scores
#> # A tibble: 2 × 12
#>   score_type colname score_n…¹ weight    lb    ub centre inverse expon…² logar…³
#>   <chr>      <chr>   <glue>     <dbl> <dbl> <dbl>  <dbl> <lgl>   <lgl>   <lgl>  
#> 1 Linear     x       Score 1:…      1     0    10     NA NA      FALSE   NA     
#> 2 Peak       y       Score 2:…      5     0    20      5 FALSE   TRUE    FALSE  
#> # … with 2 more variables: magnitude <dbl>, custom_args <list>, and abbreviated
#> #   variable names ¹score_name, ²exponential, ³logarithmic

The create_score() function also allows you to edit a score, using the editing argument. Here, we will edit the linear score to change the weight. Note that the arguments you give must always be valid for the score to be added or edited.

scores <- create_score(
  scores,
  editing = 1, score_type = "Linear", colname = "x",
  score_name = "Default", weight = 2, lb = 0, ub = 10, exponential = FALSE
)
scores
#> # A tibble: 2 × 12
#>   score_type colname score_n…¹ weight    lb    ub centre inverse expon…² logar…³
#>   <chr>      <chr>   <glue>     <dbl> <dbl> <dbl>  <dbl> <lgl>   <lgl>   <lgl>  
#> 1 Linear     x       Score 1:…      2     0    10     NA NA      FALSE   NA     
#> 2 Peak       y       Score 2:…      5     0    20      5 FALSE   TRUE    FALSE  
#> # … with 2 more variables: magnitude <dbl>, custom_args <list>, and abbreviated
#> #   variable names ¹score_name, ²exponential, ³logarithmic

Scores can be deleted with the delete_scores() function. Enter a vector containing multiple numbers to delete multiple scores.

delete_scores(scores, 2)
#> # A tibble: 1 × 12
#>   score_type colname score_n…¹ weight    lb    ub centre inverse expon…² logar…³
#>   <chr>      <chr>   <glue>     <dbl> <dbl> <dbl>  <dbl> <lgl>   <lgl>   <lgl>  
#> 1 Linear     x       Score 1:…      2     0    10     NA NA      FALSE   NA     
#> # … with 2 more variables: magnitude <dbl>, custom_args <list>, and abbreviated
#> #   variable names ¹score_name, ²exponential, ³logarithmic
delete_scores(scores, c(1, 2))
#> # A tibble: 0 × 12
#> # … with 12 variables: score_type <chr>, colname <chr>, score_name <glue>,
#> #   weight <dbl>, lb <dbl>, ub <dbl>, centre <dbl>, inverse <lgl>,
#> #   exponential <lgl>, logarithmic <lgl>, magnitude <dbl>, custom_args <list>

You can observe how a score will be created using the score_summary() function. This is useful to check that the score you are going to create is what you expect.

column <- 1:100
score_summary(column, score_type = "Linear", lb = 0, ub = 10, exponential = FALSE)

Applying score specifications

Create the actual scores, and add them to your data, with the apply_scores() function.

data <- tibble(
  x = 1:100,
  y = 100:1,
  z = letters[rep(1:10, 10)]
)
scored_data <- apply_scores(data, scores)
scored_data
#> # A tibble: 100 × 5
#>        x     y z     `Score 1: x` `Score 2: y`
#>    <int> <int> <chr>        <dbl>        <dbl>
#>  1     1   100 a              0.1            0
#>  2     2    99 b              0.2            0
#>  3     3    98 c              0.3            0
#>  4     4    97 d              0.4            0
#>  5     5    96 e              0.5            0
#>  6     6    95 f              0.6            0
#>  7     7    94 g              0.7            0
#>  8     8    93 h              0.8            0
#>  9     9    92 i              0.9            0
#> 10    10    91 j              1              0
#> # … with 90 more rows

Finally, you can create a final score with the score_final() function. This calculates a weighted mean of all the scores you have created.

final_data <- score_final(scored_data, scores)
final_data
#> # A tibble: 100 × 6
#>        x     y z     `Score 1: x` `Score 2: y` final_score
#>    <int> <int> <chr>        <dbl>        <dbl>       <dbl>
#>  1     1   100 a              0.1            0      0.0286
#>  2     2    99 b              0.2            0      0.0571
#>  3     3    98 c              0.3            0      0.0857
#>  4     4    97 d              0.4            0      0.114 
#>  5     5    96 e              0.5            0      0.143 
#>  6     6    95 f              0.6            0      0.171 
#>  7     7    94 g              0.7            0      0.2   
#>  8     8    93 h              0.8            0      0.229 
#>  9     9    92 i              0.9            0      0.257 
#> 10    10    91 j              1              0      0.286 
#> # … with 90 more rows

Filtering data

Once you have scored your data, it is useful to be able to filter and sort it. Filters are stored in a table in the same way as scores are. Use the filters_init object to get a table with 0 filters in.

filters_init
#> # A tibble: 0 × 5
#> # … with 5 variables: type <chr>, colname <chr>, pattern <chr>, min <dbl>,
#> #   max <dbl>

Create a filter with the add_filter() function. All you need to specify initially is the column you want to filter.

filters <- add_filter(filters_init, "Score 1: x", final_data)
filters <- add_filter(filters, "z", final_data)
filters
#> # A tibble: 2 × 5
#>   type      colname    pattern   min   max
#>   <chr>     <chr>      <chr>   <dbl> <dbl>
#> 1 numeric   Score 1: x NA        0.1     1
#> 2 character z          ""       NA      NA

There are two types of filters: numeric and character filters.

Numeric filters filter numeric columns using a minimum and maximum value. Only rows where the value of the specified column is between the minimum and maximum are included in the filtered data frame.

String filters filter string (word) columns using a pattern. Only rows where the value of the specified column contains the pattern are included in the filtered data frame.

You can edit created filters with the edit_filter() function.

filters <- edit_filter(filters, 1, min = 0.5, max = 1)
filters <- edit_filter(filters, 2, pattern = "a")
filters
#> # A tibble: 2 × 5
#>   type      colname    pattern   min   max
#>   <chr>     <chr>      <chr>   <dbl> <dbl>
#> 1 numeric   Score 1: x NA        0.5     1
#> 2 character z          a        NA      NA

You can then apply these filters with the apply_filters() function.

filtered_data <- apply_filters(final_data, filters[2, ])
filtered_data
#> # A tibble: 10 × 6
#>        x     y z     `Score 1: x` `Score 2: y` final_score
#>    <int> <int> <chr>        <dbl>        <dbl>       <dbl>
#>  1     1   100 a              0.1        0          0.0286
#>  2    11    90 a              1          0          0.286 
#>  3    21    80 a              1          0          0.286 
#>  4    31    70 a              1          0          0.286 
#>  5    41    60 a              1          0          0.286 
#>  6    51    50 a              1          0          0.286 
#>  7    61    40 a              1          0          0.286 
#>  8    71    30 a              1          0          0.286 
#>  9    81    20 a              1          0          0.286 
#> 10    91    10 a              1          0.208      0.434

Sorting data

Data can be sorted with the sort_df() function. The desc argument controls whether the column is sorted in ascending or descending order.

sort_df(filtered_data, "y", desc = FALSE)
#> # A tibble: 10 × 6
#>        x     y z     `Score 1: x` `Score 2: y` final_score
#>    <int> <int> <chr>        <dbl>        <dbl>       <dbl>
#>  1    91    10 a              1          0.208      0.434 
#>  2    81    20 a              1          0          0.286 
#>  3    71    30 a              1          0          0.286 
#>  4    61    40 a              1          0          0.286 
#>  5    51    50 a              1          0          0.286 
#>  6    41    60 a              1          0          0.286 
#>  7    31    70 a              1          0          0.286 
#>  8    21    80 a              1          0          0.286 
#>  9    11    90 a              1          0          0.286 
#> 10     1   100 a              0.1        0          0.0286

Downloading data

Use the download_df() function to write your data frame to a file. The file_type argument currently only accepts “CSV” and “Excel”.

download_df(filtered_data, "CSV", "myfile.csv")

Predicting prices

A major part of this app is the ability to predict the price of a specified stock. First, certain stocks can be ‘favourited’ using the favourite_stock() function.

stock_data <- favourite_stock(default_stock_data, "GOOGL")

Stock data can be searched using the search_stock() function. The results will show the favourited stocks at the top.

search_stocks(stock_data, "go")
#> # A tibble: 4 × 100
#>   symbol company_…¹ excha…² indus…³ website descr…⁴ ceo   secur…⁵ sector prima…⁶
#>   <chr>  <chr>      <chr>   <chr>   <chr>   <chr>   <chr> <chr>   <chr>    <dbl>
#> 1 GOOGL  Alphabet … NASDAQ  "All O… abc.xyz Larry … Sund… Alphab… Infor…    7375
#> 2 AVGO   Broadcom … NASDAQ  "Semic… www.br… Broadc… Hock… Broadc… Manuf…    3674
#> 3 GS     Goldman S… NEW YO… "Inves… www.gs… The Go… Davi… Goldma… Finan…    6211
#> 4 WFC    Wells Far… NEW YO… "Comme… www.we… Wells … Char… Wells … Finan…    6021
#> # … with 90 more variables: employees <dbl>, address <chr>, state <chr>,
#> #   city <chr>, ZIP <chr>, country <chr>, phone <dbl>,
#> #   capital_expenditures <dbl>, cash_change <dbl>, cash_flow <dbl>,
#> #   cash_flow_financing <dbl>, changes_in_inventories <dbl>,
#> #   changes_in_receivables <dbl>, currency <chr>, depreciation <dbl>,
#> #   filing_type <chr>, fiscal_date <dbl>, net_borrowings <dbl>,
#> #   net_income <dbl>, report_date <dbl>, total_investing_cash_flows <dbl>, …

Once you have found a stock you want to make predictions on, you can generate a summary of it using the stock_summary() function.

stock <- which(stock_data$symbol == "GOOGL")
stock_summary(stock_data, stock)
#> # A tibble: 1 × 100
#>   symbol company_…¹ excha…² indus…³ website descr…⁴ ceo   secur…⁵ sector prima…⁶
#>   <chr>  <chr>      <chr>   <chr>   <chr>   <chr>   <chr> <chr>   <chr>    <dbl>
#> 1 GOOGL  Alphabet … NASDAQ  "All O… abc.xyz Larry … Sund… Alphab… Infor…    7375
#> # … with 90 more variables: employees <dbl>, address <chr>, state <chr>,
#> #   city <chr>, ZIP <chr>, country <chr>, phone <dbl>,
#> #   capital_expenditures <dbl>, cash_change <dbl>, cash_flow <dbl>,
#> #   cash_flow_financing <dbl>, changes_in_inventories <dbl>,
#> #   changes_in_receivables <dbl>, currency <chr>, depreciation <dbl>,
#> #   filing_type <chr>, fiscal_date <dbl>, net_borrowings <dbl>,
#> #   net_income <dbl>, report_date <dbl>, total_investing_cash_flows <dbl>, …

Finally, predictions can be made using the predict_price() function. Specify a stock, a start date and an end date to start making predictions. The function can make daily or monthly predictions, specify this using the freq argument.

predictions_daily <- predict_price(
  "GOOGL",
  start_date = today(),
  end_date = today() %m+% months(2), freq = "daily"
)
predictions_monthly <- predict_price(
  "GOOGL",
  start_date = today(),
  end_date = today() + years(1), freq = "monthly"
)

predictions_daily
#> # A tibble: 60 × 4
#>     yhat yhat_lower yhat_upper ref_date  
#>    <dbl>      <dbl>      <dbl> <date>    
#>  1  97.6       66.3       133. 2023-02-19
#>  2 114.        75.6       155. 2023-02-20
#>  3 114.        75.0       155. 2023-02-21
#>  4 114.        73.8       157. 2023-02-22
#>  5 115.        73.7       158. 2023-02-23
#>  6 115.        72.9       159. 2023-02-24
#>  7  99.2       62.3       138. 2023-02-25
#>  8  99.1       61.2       138. 2023-02-26
#>  9 115.        70.4       162. 2023-02-27
#> 10 115.        69.9       161. 2023-02-28
#> # … with 50 more rows
predictions_monthly
#> # A tibble: 13 × 4
#>     yhat yhat_lower yhat_upper ref_date  
#>    <dbl>      <dbl>      <dbl> <date>    
#>  1  137.       130.       158. 2023-02-19
#>  2  125.       120.       150. 2023-03-19
#>  3  116.       104.       133. 2023-04-19
#>  4  137.       126.       155. 2023-05-19
#>  5  145.       132.       159. 2023-06-19
#>  6  138.       121.       149. 2023-07-19
#>  7  129.       118.       146. 2023-08-19
#>  8  162.       155.       183. 2023-09-19
#>  9  130.       124.       152. 2023-10-19
#> 10  129.       124.       150. 2023-11-19
#> 11  142.       124.       152. 2023-12-19
#> 12  165.       144.       171. 2024-01-19
#> 13  161.       141.       169. 2024-02-19

Make a graph of these predictions using the plot_predictions() function.

plot_predictions(predictions_daily)

Plotting data

The project currently provides two ‘default’ plots, and a framework for you to create your own custom plots.

Score distribution

The score distribution plot creates a box plot with a jitter overlay to show the distribution of each of your scores.

To get your scores to plot, use the get_scores() function.

actual_scores <- get_scores(final_data, scores)
actual_scores
#> # A tibble: 100 × 3
#>    `Score 1: x` `Score 2: y` final_score
#>           <dbl>        <dbl>       <dbl>
#>  1          0.1            0      0.0286
#>  2          0.2            0      0.0571
#>  3          0.3            0      0.0857
#>  4          0.4            0      0.114 
#>  5          0.5            0      0.143 
#>  6          0.6            0      0.171 
#>  7          0.7            0      0.2   
#>  8          0.8            0      0.229 
#>  9          0.9            0      0.257 
#> 10          1              0      0.286 
#> # … with 90 more rows

This is the only argument to the score_distributions() function.

score_distributions(actual_scores)

Here we can see that most of the column values produced a single score, so we may want to change our score specifications to be more specific.

Score performance

The score performance graph allows you to plot a column of your choice against every one of your scores. To create this graph, use the score_performance() function.

score_performance(final_data, "y", actual_scores)

Custom plots

The custom_plot() function allows you to create a vast number of plots from your data. The first argument to the function is the data, followed by the plotting method. The rest of the arguments depend on the plotting method. Each argument should be specified in the format aesthetic = "column_name", where aesthetic is a visual property that a variable can be mapped to (e.g. x, colour), and column_name is the name of a column in your data.

Currently, three different types of graphs can be created: line graphs, scatter graphs and histograms.

Line graphs

Create line graphs passing in “line” to the plotting_method argument.

Line graphs accept the following aesthetics:

x - the variable on the x axis.
y - the variable on the y axis.
colour - the colour of the line.

x and y are required arguments, meaning that they must be supplied for a plot to be outputted.

custom_plot(final_data, "line", x = "x", y = "final_score", colour = "y")

Scatter graphs

Create scatter graphs passing in “scatter” to the plotting_method argument.

Scatter graphs accept the following aesthetics:

x - the variable on the x axis.
y - the variable on the y axis.
colour - the colour of the point.
size - the size of the point.
shape - the shape of the point.

x and y are required arguments, meaning that they must be supplied for a plot to be outputted.

custom_plot(final_data, "scatter", x = "x", y = "y", size = "Score 2: y")

Histograms

Create histograms passing in “histogram” to the plotting_method argument.

Histograms accept the following aesthetics:

x - the variable on the x axis.
colour - the colour of the bar.
size - the size of the bar.

x is a required arguments, meaning that it must be supplied for a plot to be outputted.

The y aesthetic of a histogram is the frequency density of the x coordinate.

custom_plot(final_data, "histogram", x = "Score 1: x", colour = "z")
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.