Skip to contents

update_snapshots makes it easy to create and update a historical data table on a remote (SQL) server. The function takes the data (.data) as it looks on a given point in time (timestamp) and then updates (or creates) an remote table identified by db_table. This update only stores the changes between the new data (.data) and the data currently stored on the remote. This way, the data can be reconstructed as it looked at any point in time while taking as little space as possible.

See vignette("basic-principles") for further introduction to the function.

Usage

update_snapshot(
  .data,
  conn,
  db_table,
  timestamp,
  filters = NULL,
  message = NULL,
  tic = Sys.time(),
  logger = NULL,
  enforce_chronological_order = TRUE
)

Arguments

.data

(data.frame(1), tibble(1), data.table(1), or tbl_dbi(1))
Data object.

conn

(DBIConnection(1))
Connection object.

db_table

(id-like object(1))
A table specification (coercible by id()).

timestamp

(POSIXct(1), Date(1), or character(1))
The timestamp describing the data being processed (not the current time).

filters

(data.frame(1), tibble(1), data.table(1), or tbl_dbi(1))
A object subset data by. If filters is NULL, no filtering occurs. Otherwise, an inner_join() is performed using all columns of the filter object.

message

(character(1))
A message to add to the log-file (useful for supplying metadata to the log).

tic

(POSIXct(1))
A timestamp when computation began. If not supplied, it will be created at call-time (used to more accurately convey the runtime of the update process).

logger

(Logger(1))
A configured logging object. If none is given, one is initialized with default arguments.

enforce_chronological_order

(logical(1))
Are updates allowed if they are chronologically earlier than latest update?

Value

No return value, called for side effects.

See also

filter_keys

Examples

  conn <- get_connection()

  data <- dplyr::copy_to(conn, mtcars)

  # Copy the first 3 records
  update_snapshot(
    head(data, 3),
    conn = conn,
    db_table = "test.mtcars",
    timestamp = Sys.time()
  )
#> Warning: `log_path` and `log_tbl` are both `NULL` and therefore NO file or database logging will be done.
#> Consider adding options SCDB.log_table_id and/or SCDB.log_path to your .Rprofile
#> 2024-10-04 07:59:21.870 - runner - INFO - Started
#> 2024-10-04 07:59:21.870 - runner - INFO - Parsing data for table main.test.mtcars started
#> 2024-10-04 07:59:21.870 - runner - INFO - Given timestamp for table is 2024-10-04 07:59:21.870566
#> 2024-10-04 07:59:22.229 - runner - INFO - Deactivating records
#> 2024-10-04 07:59:22.229 - runner - INFO - After to_remove
#> 2024-10-04 07:59:22.329 - runner - INFO - After to_add
#> 2024-10-04 07:59:22.330 - runner - INFO - Deactivate records count: 0
#> 2024-10-04 07:59:22.330 - runner - INFO - Adding new records
#> 2024-10-04 07:59:22.388 - runner - INFO - Insert records count: 3
#> 2024-10-04 07:59:22.443 - runner - INFO - Finished processing for table main.test.mtcars
#> NULL

  # Update with the first 5 records
  update_snapshot(
    head(data, 5),
    conn = conn,
    db_table = "test.mtcars",
    timestamp = Sys.time()
  )
#> Warning: `log_path` and `log_tbl` are both `NULL` and therefore NO file or database logging will be done.
#> Consider adding options SCDB.log_table_id and/or SCDB.log_path to your .Rprofile
#> 2024-10-04 07:59:22.458 - runner - INFO - Started
#> 2024-10-04 07:59:22.458 - runner - INFO - Parsing data for table main.test.mtcars started
#> 2024-10-04 07:59:22.458 - runner - INFO - Given timestamp for table is 2024-10-04 07:59:22.458765
#> 2024-10-04 07:59:22.900 - runner - INFO - Deactivating records
#> 2024-10-04 07:59:22.992 - runner - INFO - After to_remove
#> 2024-10-04 07:59:23.094 - runner - INFO - After to_add
#> 2024-10-04 07:59:23.095 - runner - INFO - Deactivate records count: 0
#> 2024-10-04 07:59:23.095 - runner - INFO - Adding new records
#> 2024-10-04 07:59:23.156 - runner - INFO - Insert records count: 2
#> 2024-10-04 07:59:23.214 - runner - INFO - Finished processing for table main.test.mtcars
#> NULL

  dplyr::tbl(conn, "test.mtcars")
#> # Source:   table<`test.mtcars`> [5 x 14]
#> # Database: sqlite 3.46.0 [:memory:]
#>     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb checksum    
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>       
#> 1  21       6   160   110  3.9   2.62  16.5     0     1     4     4 1da1bd6281e…
#> 2  21       6   160   110  3.9   2.88  17.0     0     1     4     4 7a302dae353…
#> 3  22.8     4   108    93  3.85  2.32  18.6     1     1     4     1 0532ea093ca…
#> 4  18.7     8   360   175  3.15  3.44  17.0     0     0     3     2 613d84d8ade…
#> 5  21.4     6   258   110  3.08  3.22  19.4     1     0     3     1 7ff73a7388d…
#> # ℹ 2 more variables: from_ts <chr>, until_ts <dbl>

  close_connection(conn)