class: center, middle, inverse, title-slide

.title[
# Efficient dependency management with {renv}
]
.author[
### Julia Romanowska
]
.date[
### 2023-03-15
]

---

## {renv} package

<img src="figures/Screenshot_2023-03-14_Introduction_to_renv.png" alt="Screenshot of the 'Introduction to renv' vignette online. Text that says: 'The renv package is a new effort to bring project-local R dependency management to your projects. The goal is for renv to be a robust, stable replacement for the Packrat package, with fewer surprises and better default behaviors. Underlying the philosophy of renv is that any of your existing workflows should just work as they did before – renv helps manage library paths (and other project-specific state) to help isolate your project’s R dependencies, and the existing tools you’ve used for managing R packages (e.g. install.packages(), remove.packages()) should work as they did before.'" style="width: 80%; position: absolute; left: 10%; top: 150px;">

---

class: inverse, middle, center

## Why?

--

### reproducibility

???

One of the points to consider when creating reproducible code is the versions of software one uses.

--

### peace of mind

???

Remember the last time you needed to get back to a project after a longer break and suddenly you get some errors because packages were updated in the meantime?

--

### collaboration

???

No need to ask collaborators to try to re-install certain packages after your code would not run on their machines!

---

## How?

--

1. `renv::init()`
1. work as usual
1. `renv::snapshot()`
1. work as usual
1. `renv::snapshot()` or `renv::restore()`

???

1. Call `renv::init()` to initialize a new project-local environment with a private R library,
1. Work in the project as normal, installing and removing new R packages as they are needed in the project,
1. Call `renv::snapshot()` to save the state of the project library to the lockfile (called renv.lock),
1. Continue working on your project, installing and updating R packages as needed.
1. Call `renv::snapshot()` again to save the state of your project library if your attempts to update R packages were successful, or call `renv::restore()` to revert to the previous state as encoded in the lockfile if your attempts to update packages introduced some new problems.

> `snapshot()` records: pkg version and installation source

> `restore()` attempts to re-install the packages based on the info saved

---

## Details - file structure

.pull-left[
new files and folders in your project:

- `.Rprofile`
- `renv.lock`
- `renv/activate.R`
- `renv/library`
- `renv/settings.dcf`
]

--

.pull-right[
<br>
<br>
← _commit to git_

← _commit to git_

← _commit to git_

← _ignore in git_
]

???

| File | Usage |
|-------|-------|
| .Rprofile | Used to activate renv for new R sessions launched in the project. |
| renv.lock | The lockfile, describing the state of your project’s library at some point in time. |
| renv/activate.R | The activation script run by the project .Rprofile. |
| renv/library | The private project library. |
| renv/settings.dcf | Project settings – see ?settings for more details. |

---

## Details - dependency discovery

- `renv::dependencies()` searches for `library()` calls

???

This should work, but sometimes we're using only one function from a package, without loading it. Or the package is not easily found in standard sources. Or we don't want {renv} to search through some files. We can adjust that manually.

--

- you can create `_dependencies.R` with only `library()` calls
- use `.gitignore` and `.renvignore`
- add `DESCRIPTION` file to the project

???

- `_dependencies.R` or `DESCRIPTION` works in a similar way: telling {renv} which packages to include _explicitly_
- `.renvignore` lists files that {renv} should not read

---

## Details - cache

- packages live in global _cache_ = _main library dir_
- `library` in each project has links to cache

???

Use of cached packages makes `restore()` and `install()` faster and saves disk space. You can always disable cache for each project separately or globally.

--

> _NOTE: this will not work across disks on Windows!_

???

If you're working on a network disk but have all the libraries installed on a local disk, you should either move the projects to the local disk or the main library to a network disk.

---

## Caveats

- takes care only of R-packages (see also `?renv::equip()`)
- package version needs to be available (online/offline)
- _default sources_
  - CRAN,
  - GitHub/GitLab,
  - Bioconductor,
  - Bitbucket
- all things reproducibility:
  [Building reproducible analytical pipelines with R](https://raps-with-r.dev/)

???

This is not a panacea! If an R package depends on an external library, that library has to be installed manually. `renv::equip()` can help for some software on Windows.

A copy of the required package version needs to be available, either online or offline.

You can add sources manually, through settings - check [vignette for details](https://rstudio.github.io/renv/articles/renv.html#custom-r-package-repositories).

---

class: inverse, left, middle

## What next?

- combine with Docker
- combine with git/GitHub
- combine with CI/GitHub Actions
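
???

For the git/GitHub point, the commit/ignore advice from the file-structure slide can be sketched as a single project-level `.gitignore` entry. This is only an illustration, assuming the default renv layout described earlier. Recent renv versions typically write their own `renv/.gitignore`, in which case this entry may be redundant, so check your project before adding it.

```gitignore
# Private project library: machine-specific, re-created by renv::restore()
renv/library/
```

`.Rprofile`, `renv.lock` and `renv/activate.R` stay tracked, so a collaborator who clones the repository can rebuild the library from the lockfile with `renv::restore()`.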