Cache
Cache the results of data loaders using the Serialization standard library. Cache keys are determined by the loader "recipe" and the type requested.
Note that not all data types can be cached effectively; an IOStream is one example.
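To make the keying rule concrete, here is a minimal sketch in Python (chosen purely for illustration; it is not the plugin's implementation). The `LoaderCache` class, its `get` method, and the "abc123" recipe hash are all hypothetical names, and `pickle` stands in for the serialisation step.

```python
import pickle


class LoaderCache:
    """Hypothetical cache keyed on (recipe hash, requested type)."""

    def __init__(self):
        self._entries = {}
        self.misses = 0  # number of times the loader actually ran

    def get(self, recipe_hash, requested_type, load):
        key = (recipe_hash, requested_type.__name__)
        if key not in self._entries:
            self.misses += 1
            # Store the serialised result, as an on-disk cache would.
            self._entries[key] = pickle.dumps(load())
        return pickle.loads(self._entries[key])


cache = LoaderCache()
as_list = cache.get("abc123", list, lambda: [1, 2, 3])
again = cache.get("abc123", list, lambda: [1, 2, 3])      # same key: cache hit
as_tuple = cache.get("abc123", tuple, lambda: (1, 2, 3))  # new type: cache miss
print(cache.misses)  # 2
```

The same recipe requested as a different type is a separate entry, which is why the requested type forms part of the key.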
Recipe hashing
The driver, parameters, and type(s) of a loader, together with the storage drivers of its data set, are all combined into the "recipe hash" of the loader.
╭─────────╮             ╭──────╮
│ Storage │             │ Type │
╰───┬─────╯             ╰───┬──╯
    │    ╭╌╌╌╌╌╌╌╌╌╮    ╭───┴────╮ ╭────────╮
    ├╌╌╌╌┤ DataSet ├╌╌╌╌┤ Loader ├─┤ Driver │
    │    ╰╌╌╌╌╌╌╌╌╌╯    ╰───┬────╯ ╰────────╯
╭───┴─────╮             ╭───┴────────╮
│ Storage ├─╼           │ Parameters ├─╼
╰─────┬───╯             ╰───────┬────╯
      ╽                         ╽
Since the parameters of the loader (and each storage backend) can reference other data sets (indicated with ╼ and ╽), this hash is computed recursively, forming a Merkle tree. In this manner the entire "recipe" leading to the final result is hashed.
                ╭───╮
                │ E │
        ╭───╮   ╰─┬─╯
        │ B ├──▶──┤
╭───╮   ╰─┬─╯   ╭─┴─╮
│ A ├──▶──┤     │ D │
╰───╯   ╭─┴─╮   ╰───╯
        │ C ├──▶──┐
        ╰───╯   ╭─┴─╮
                │ D │
                ╰───╯
In this example, the hash for a loader of data set "A" relies on the data sets "B" and "C", and so their hashes are calculated and included. "D" is required by both "B" and "C", and so is included in each. "E" is also used in "D".
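The recursion can be sketched as follows. This is an illustrative Python sketch only, not the plugin's actual hashing code: the `recipe_hash` and `entry` helpers, the driver and type names, and the choice of SHA-256 are all assumptions, and references between data sets are assumed to be acyclic.

```python
import copy
import hashlib


def recipe_hash(name, recipes, memo=None):
    """Hash a recipe together with the hashes of every data set it references."""
    if memo is None:
        memo = {}
    if name in memo:  # "D" is reached via both "B" and "C"; hash it once
        return memo[name]
    recipe = recipes[name]
    h = hashlib.sha256()
    for part in (recipe["driver"], recipe["type"], *recipe["storage"]):
        h.update(part.encode() + b"\0")
    for key, value in sorted(recipe["params"].items()):
        h.update(f"{key}={value}".encode() + b"\0")
    for ref in recipe["refs"]:  # recursive step: child hashes feed the parent
        h.update(recipe_hash(ref, recipes, memo).encode())
    memo[name] = h.hexdigest()
    return memo[name]


def entry(driver, type_, refs=(), **params):
    """Build a hypothetical, simplified recipe record."""
    return {"driver": driver, "type": type_, "storage": ["filesystem"],
            "params": params, "refs": list(refs)}


# The dependency graph from the example above.
recipes = {
    "A": entry("csv", "DataFrame", refs=["B", "C"]),
    "B": entry("json", "Dict", refs=["D"]),
    "C": entry("csv", "Matrix", refs=["D"]),
    "D": entry("toml", "Dict", refs=["E"]),
    "E": entry("text", "String"),
}
original = {n: recipe_hash(n, recipes) for n in recipes}

# Change one parameter of "B" and see which hashes move.
edited = copy.deepcopy(recipes)
edited["B"]["params"]["indent"] = 2
updated = {n: recipe_hash(n, edited) for n in edited}
changed = sorted(n for n in recipes if original[n] != updated[n])
print(changed)  # ['A', 'B']
```

Only "B" and everything that depends on it ("A") get new hashes, while "C", "D", and "E" are unchanged. A change to "E" would instead propagate to every data set in the graph.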
Configuration
Store path
This uses the same store.path configuration variable as the store plugin (which see).
Disabling on a per-loader basis
Caching of individual loaders can be disabled by setting the "cache" parameter to false, for example:
[[somedata.loader]]
cache = false
...
Store management
System-wide configuration can be set via the store config set REPL command, or by directly modifying the DataToolkitCommon.Store.getinventory().config struct.
A few (system-wide) settings determine garbage collection behaviour:
auto_gc (default 2): How often to automatically run garbage collection (in hours). Set to a non-positive value to disable.
max_age (default 30): The maximum number of days since a collection was last seen before it is removed from consideration.
max_size (default 53687091200): The maximum (total) size of the store.
recency_beta (default 1): When removing items to avoid going over max_size, how much recency should be valued. Can be set to any value in (-∞, ∞). Larger (positive) values weight recency more, and negative values weight size more. -1 and 1 are equivalent.
store_dir (default store): The directory (either as an absolute path, or relative to the inventory file) that should be used for storage (IO) cache files.
cache_dir (default cache): The directory (either as an absolute path, or relative to the inventory file) that should be used for Julia cache files.