Contributing
This is a guide for contributing to DataToolkitCommon. It is intended to make it easier to contribute new transformers and plugins, but may also be of some general interest.
Using the development versions of everything
Given the inter-dependent packages and monorepo setup, the easiest way to use the development version of everything is by pasting the following into a Project.toml:
[deps]
DataToolkit = "dc83c90b-d41d-4e55-bdb7-0fc919659999"
DataToolkitBase = "e209d0c3-e863-446f-9b45-de6ca9730756"
DataToolkitCommon = "9e6fccbf-6142-406a-aa4c-75c1ae647f53"
DataToolkitCore = "caac3e55-418c-402e-a061-64d454aa8f4f"
DataToolkitREPL = "c58528a0-97a2-40a0-9a44-056fe1196995"
DataToolkitStore = "082ec3c2-3fb3-458f-ad22-5e5e31d4377a"
[sources]
DataToolkit = {url = "https://github.com/tecosaur/DataToolkit.jl.git", subdir="Main"}
DataToolkitBase = {url = "https://github.com/tecosaur/DataToolkit.jl.git", subdir="Base"}
DataToolkitCommon = {url = "https://github.com/tecosaur/DataToolkit.jl.git", subdir="Common"}
DataToolkitCore = {url = "https://github.com/tecosaur/DataToolkit.jl.git", subdir="Core"}
DataToolkitREPL = {url = "https://github.com/tecosaur/DataToolkit.jl.git", subdir="REPL"}
DataToolkitStore = {url = "https://github.com/tecosaur/DataToolkit.jl.git", subdir="Store"}
Creating a new transformer
Say there's a format you're familiar with or need to work with that's relatively common and not (yet) supported out-of-the-box by DataToolkitCommon. This is a great opportunity to spin up a PR adding support 😉. If you get stuck on anything, just open an issue or DM me (@tecosaur on Zulip, Slack, and more) and I'll happily see if I can help 🙂.
I always appreciate the value of a good example. Here are some transformers that I think might be helpful as a point of reference:
- The filesystem storage
- The arrow loader and writer
Loader
- Create a new file src/transformers/saveload/{name}.jl
- Add an include("transformers/saveload/{name}.jl") line to src/DataToolkitCommon.jl (maintaining the sorted order). The path is relative to src/DataToolkitCommon.jl, as with the plugin include lines.
- Decide whether you want to use an extra package, if so:
  a. With DataToolkitCommon as the current project, in the Pkg REPL run add --weak {MyPkg}
  b. Modify the Project.toml to add a {MyPkg}Ext to the [extensions] section
  c. Create ext/{MyPkg}Ext.jl
  d. Add a stub method to saveload/{name}.jl, and implement it in ext/{MyPkg}Ext.jl
  e. Use @require {MyPkg} at the start of your load method implementation
  f. Add a @addpkg {MyPkg} {UUID} line to the __init__ method in src/DataToolkitCommon.jl (maintaining the sorted order)
- Implement one or more load methods for DataLoader{:name}. Use @getparam if you want to access parameters of the loader or dataset.
- If you implemented multiple load methods, consider whether it would also be appropriate to implement a specialised supportedtypes method.
- Consider whether there is a reasonable implementation of createauto you could write.
- At the end of the file, assign a docstring to a const using the form const {name}_DOC = md"""...""", and update the append!(DataToolkitCore.TRANSFORMER_DOCUMENTATION, [...]) call in src/DataToolkitCommon.jl's __init__ method appropriately.
- Add "{name}" to the DocSaveload list in docs/make.jl
- For brownie points: find a test file for the new loader, PR it to DataToolkitTestAssets, and write a test using it.
Storage
The same as the loader steps, except:
- You want to create the file src/transformers/storage/{name}.jl
- You want to implement either:
  - storage
  - getstorage and/or putstorage
- Add "{name}" to DocStorage instead of DocSaveload in docs/make.jl
Writer
The same as the loader steps, except:
- You want to implement save
Creating a new plugin
If you feel that DataToolkit lacks something, not support for a certain format or storage provider but some more fundamental behaviour, it's entirely likely that this behaviour can be added via a plugin.
Depending on the behaviour you have in mind, implementing a plugin can take five minutes and be just a dozen or two lines total, or something much larger (like DataToolkitStore). Feel free to reach out to me for a chat if you're not sure whether or how something can be done 🙂.
The plugins in src/plugins/ should provide an indication of what implementing a plugin can look like. The broad strokes look something like this though:
- Compare the behaviour in your mind to the join points currently available, and contemplate which of them would need to be changed to accommodate your target behaviour
- Create src/plugins/{name}.jl, and implement advice functions that modify the identified join points
- Construct a Plugin and assign it to a const variable with a docstring
- Add an include("plugins/myplugin.jl") line to src/DataToolkitCommon.jl, and a @dataplugin MY_PLUGIN line to the __init__ function (around the middle, with the other plugins).
- Add the plugin to the DocPlugins list in docs/make.jl, providing a mapping from the display name to the actual name.