Contributing#
plenoptic is a python library of tools to help researchers better understand
their models. We welcome and encourage contributions from everyone!
First, please check out the Code of Conduct and read it before going any further. You may also want to check out the main page of the documentation for a longer overview of the project and how to get everything installed, as well as pointers for further reading, depending on your interests.
If you encounter any issues with plenoptic, first search the existing
issues and
discussions
to see if there’s already information to help you. If not, please open a new
issue! We have
a template for bug reports, and following it (so you provide the necessary
details) will make solving your problem much easier.
If you’d like to help improve plenoptic, there are many ways you can
contribute, from improving documentation to writing code. For those not already
familiar with the project, it can be very helpful for us if you go through the
tutorials, README, and documentation and let us know if anything is unclear,
what needs more detail (or clearer writing), etc. For those that want to
contribute code, we also have many
issues that we
are working through. If you would like to work on one of those, please give it a
try!
In order to submit changes, create a branch or fork of the project, make your changes, add documentation and tests, and submit a Pull Request. See contributing to the code below for more details on this process.
We try to keep all our communication on Github, and we use several channels:
Discussions is the place to ask usage questions, discuss issues too broad for a single issue, or show off what you’ve made with plenoptic.
If you’ve come across a bug, open an issue.
If you have an idea for an extension or enhancement, please post in the ideas section of discussions first. We’ll discuss it there and, if we decide to pursue it, open an issue to track progress.
Supported versions#
plenoptic tries to follow pytorch: we support the python versions that they support. We run our CPU tests on all supported versions, and the GPU tests and documentation build use the middle version.
Contributing to the code#
Contribution workflow#
We welcome contributions to plenoptic! We follow
the GitHub
Flow
workflow: no one is allowed to push to the main branch, all development
happens in separate feature branches (each of which, ideally, implements a
single feature, addresses a single issue, or fixes a single problem), and these
get merged into main once we have determined they’re ready. Then, after enough
changes have accumulated, we put out a new release, adding a new tag which
increments the version number, and uploading the new release to PyPI (see
Releases for more details).
In addition to the information that follows, Github (unsurprisingly) has good information on this workflow, as does the Caiman package (though note that Caiman uses git flow, which involves a separate develop branch in addition to main).
Before we begin: everyone finds git confusing the first few (dozen) times they encounter it! And even people with a hard-won understanding frequently look up information on how it works. If you find the following difficult, we’re happy to help walk you through the process. Please post on our GitHub discussions page to get help.
Creating a development environment#
You’ll need a local copy of plenoptic which keeps up-to-date with any changes you make. To do so, you will need to fork and clone plenoptic.
Go to the plenoptic repo and click on the Fork button at the top right of the page. This creates a copy of plenoptic in your Github account.
You should then clone your fork to your local machine and create an editable installation. You will also add the upstream version, which makes it easy to keep your plenoptic up to date with the canonical version. To do so, run the lines of code below, also outlined in our docs:
# replace with your personal GitHub username
git clone https://github.com/github-username/plenoptic.git
cd plenoptic
# add the upstream branch. you will now have two remotes:
# your fork (origin) and the canonical version (upstream)
git remote add upstream https://github.com/plenoptic-org/plenoptic.git
You will then want to install all the docs and dev optional dependencies, so that you can run tests and build the documentation. To do so, make a virtual environment and run the installation from within the copy of plenoptic on your machine.
# create virtual environment
python -m venv .venv
# activate environment (this path is for Windows; on Linux/macOS, run
# `source .venv/bin/activate` instead)
.venv/Scripts/activate
# install all dependencies. see jupyter notes if you also want to install [nb]
pip install -e ".[docs,dev]"
Finally, we strongly recommend you install pre-commit, which will run linting and formatting each time you make a commit. This will help you deal with common errors locally prior to pushing your code.
pip install pre-commit
pre-commit install
Creating a new branch#
As discussed above, each feature in plenoptic is worked on in a separate branch. This allows us to have multiple people developing multiple features simultaneously, without interfering with each other’s work. To create your own branch, run the following from within your plenoptic directory:
# switch to main branch of your fork
git checkout main
# update your fork from your github
git pull origin main
# ensure your fork is in sync with the canonical version
git pull upstream main
# update your fork's main branch with any changes from upstream
git push origin main
# create and switch to the branch
git checkout -b my_cool_branch
Anytime you want to sync with upstream changes, just run git pull upstream main. If you aren’t comfortable with git commands, we recommend the Software Carpentry git lesson.
Then, create new changes on this branch and, when you’re ready, you can begin running tests locally and committing to your branch.
Committing to your branch#
After making changes to the code on your branch, you should commit your changes. These should be done regularly so it is easy to see changes that were made and is simple to revert back to previous versions if issues arise. To make a commit, navigate to the plenoptic directory and run the following:
# stage the changes
git add src/plenoptic/the_file_you_changed.py
# commit your changes
git commit -m "A helpful message explaining my changes"
# if the pre-commit checks pass, you can then push to the origin remote
git push origin my_cool_branch
You can run git status at any time to keep track of what files have been changed and which are staged for committing. You can also run git switch my_other_branch to move between branches.
If you have installed pre-commit, you will see a number of linting and formatting checks running when you try to commit (see Code Style and Linting for more details). Often, the resulting errors will include formatting specifications from ruff. Until these are fixed, the commit will not go through, so you must fix the errors, re-stage, and re-commit (or see Ignoring Ruff Linting if needed).
Documenting your code#
Depending on the type of change you are making, you will likely need to add or edit documentation related to the changes. There are four classes of documentation in plenoptic:
Docstrings: the “help” information that is written alongside code. If you add a new function or object, you will need to write an accompanying docstring.
Doctests: A section of the docstring, examples which show how to use plenoptic functions or objects. We are in the process of adding examples to every function in plenoptic – see issue 237 for progress and how to help! Any new functions or objects need to have examples included.
(Less common) Markdown files that live in docs/, plain text which might show some example code. These are used for explanations like this one!
(Less common) MyST notebooks: specially-formatted markdown files that live in docs/, these get converted into jupyter notebooks and executed while building the documentation. These are used to provide longer examples of how to use plenoptic, mixed with code.
If you are adding a new function or object to plenoptic, you must include a docstring with accompanying doctests. Adding a new file to the docs/ folder is less common and determined on a case-by-case basis, and we can discuss whether it’s necessary during the course of a contribution. Please see the Documentation section for specifics on formatting and testing, with notes on the above types, and Build the documentation for how to build the documentation locally.
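As a rough illustration of a docstring with accompanying doctests, here is a minimal sketch. The function `rescale` is hypothetical (it is not part of plenoptic), and the numpydoc-style section layout is an assumption; check existing plenoptic docstrings for the conventions actually in use.

```python
def rescale(values, low=0.0, high=1.0):
    """Linearly rescale values so they span the interval [low, high].

    Parameters
    ----------
    values : list of float
        Numbers to rescale. Must contain at least two distinct values.
    low, high : float
        Lower and upper bounds of the output range.

    Returns
    -------
    list of float
        The rescaled values, in the same order as the input.

    Examples
    --------
    >>> rescale([0, 5, 10])
    [0.0, 0.5, 1.0]
    >>> rescale([2, 4], low=-1, high=1)
    [-1.0, 1.0]
    """
    vmin, vmax = min(values), max(values)
    if vmin == vmax:
        raise ValueError("values must contain at least two distinct numbers")
    scale = (high - low) / (vmax - vmin)
    return [low + (v - vmin) * scale for v in values]
```

The lines starting with `>>>` are the doctests: each one is executed, and its output is compared against the line that follows it.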
Contributing your change back to plenoptic#
You can make any number of changes on your branch. Once you’re happy with your changes, make sure you have added tests and documentation, check that existing tests all run successfully, that your branch is up-to-date with main, and then open a pull request by clicking on the big Compare & pull request button that appears at the top of your fork after pushing to your branch (see here for a tutorial).
Your pull request should include information on what you changed and why, referencing any relevant issues or discussions, and highlighting any portion of your changes where you have lingering questions (e.g., “was this the right way to implement this?”) or want reviewers to pay special attention. You can look at previous closed pull requests to see what this looks like.
At this point, we will be notified of the pull request and will read it over. We will try to give an initial response quickly, and then do a longer in-depth review, at which point you will probably need to respond to our comments, making changes as appropriate. We’ll then respond again, and proceed in an iterative fashion until everyone is happy with the proposed changes. This process can take a while! (The more focused your pull request, the less time it will take.)
If your changes are integrated, you will be added as a Github contributor and as one of the authors of the package. Thank you for being part of plenoptic!
Code Style and Linting#
We use Ruff for linting and formatting our Python code to maintain a consistent code style and catch potential errors early. We run ruff as part of our CI (using pre-commit, see below) and non-compliant code will not be merged! You can see the version of ruff that we are currently using in the .pre-commit-config.yaml file in the project root.
Using Ruff#
Ruff is a fast and comprehensive Python formatter and linter that checks for common style and code quality issues. It combines multiple tools, like black, Pyflakes, pycodestyle, isort, and other linting rules into one efficient tool, which are specified in pyproject.toml. Before submitting your code, make sure to run Ruff to catch any issues. See other sections of this document for how to use nox and pre-commit to simplify this process.
Ruff has two components, a formatter and a linter. Formatters and linters are both static analysis tools, but formatters “quickly check and reformat your code for stylistic consistency without changing the runtime behavior of the code”, while linters “detect not just stylistic inconsistency but also potential logical bugs, and often suggest code fixes” (per GitHub’s readme project). There are many choices of formatters and linters in python; ruff aims to combine the features of many of them while being very fast.
For both the formatter and the linter, you can run ruff without any additional arguments; our configuration options are stored in the pyproject.toml file and so don’t need to be specified explicitly.
Formatting:#
ruff format is the primary entrypoint to the formatter. It accepts a list of files or directories, and formats all discovered Python files:
ruff format # Format all files in the current directory.
ruff format path/to/code/ # Format all files in `path/to/code` (and any subdirectories).
ruff format path/to/file.py # Format a single file.
For the full list of supported options, run ruff format --help.
Using Ruff for Linting:#
To run Ruff on your code:
ruff check .
It’ll then tell you which lines are violating linting rules and may suggest that some errors are automatically fixable.
To automatically fix linting errors, run:
ruff check --fix .
Be careful with unsafe fixes! Safe fixes are symbolized with the tools emoji and are listed here.
Ignoring Ruff Linting#
In some cases, it may be acceptable to suppress lint errors, for example when an overly long line (code E501) is preferable because breaking it up would make a URL unreadable. These ignores will be evaluated on a case-by-case basis.
You can do this by adding the following to the end of the line:
This line is tooooooo long. # noqa: E501
If you want to suppress an error across an entire file, do this at the top of the file:
# ruff: noqa: E501
Below is my python script
...
...
And any line living in this file can be as long as it wants ...
...
In some cases, you want to not only suppress the error message a linter throws but actually disable a linting rule. An example might be if the import order matters and running isort would mess with this.
In these cases, you can introduce an action comment like this such that ruff does not sort the following packages alphabetically:
import numpy as np # isort: skip
import my_package as mp # isort: skip
For more details, refer to the documentation.
General Style Guide Recommendations:#
Longer, descriptive names are preferred (e.g., x is not an appropriate name for a variable), especially for anything user-facing, such as methods, attributes, or arguments.
Any public method or function must have a complete type-annotated docstring (see below for details). Hidden ones do not need to have complete docstrings, but they probably should.
Pre-Commit Hooks: Identifying simple issues before submission to code review (and how to ignore those)#
Pre-commit hooks are useful for the developer to check that all the linting and formatting rules (see Ruff above) are honored before committing. That is, when you commit, the pre-commit hooks run and automatically fix problems where possible (e.g., trailing whitespace). You then need to run git add again if you want these changes to be included in your commit. If the problem is not automatically fixable, you will need to manually update your code before you are able to commit.
Using pre-commit is optional. We use pre-commit.ci to run pre-commit as part of PRs (auto-fixing wherever possible), but it may simplify your life to integrate pre-commit into your workflow.
In order to use pre-commit, you must install the pre-commit package into your development environment, and then install the hooks:
pip install pre-commit
pre-commit install
See pre-commit docs for more details.
After installation, should you want to ignore pre-commit hooks for some reason (e.g., because you have to run to a meeting and so don’t have time to fix all the linting errors but still want your changes to be committed), you can add --no-verify to your commit command like this:
git commit -m "my commit message" --no-verify
Adding models or synthesis methods#
In addition to the above, see the documentation for a description of models and synthesis objects. Any new models or synthesis objects will need to meet the requirements outlined in those pages.
Releases#
We create releases on Github, deploy on / distribute via pypi, and try to follow semantic versioning:
Given a version number MAJOR.MINOR.PATCH, increment the:
MAJOR version when you make incompatible API changes
MINOR version when you add functionality in a backward compatible manner
PATCH version when you make backward compatible bug fixes
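The versioning rule above can be sketched in a few lines of Python. This is only an illustration: `bump_version` is a hypothetical helper, not something plenoptic provides (the real version number is derived from the Github release tag, as described below).

```python
def bump_version(version: str, part: str) -> str:
    """Return the next MAJOR.MINOR.PATCH version for the given kind of change."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        # incompatible API change: reset MINOR and PATCH
        return f"{major + 1}.0.0"
    if part == "minor":
        # backward compatible new functionality: reset PATCH
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        # backward compatible bug fix
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"part must be 'major', 'minor', or 'patch', got {part!r}")
```

For example, a bug fix released after version `1.2.3` would be tagged `bump_version("1.2.3", "patch")`.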
When doing a new release, the following steps must be taken:
In a new PR do the following. The deploy action will not pass if these steps are not followed!
Update docs/_static/version_switcher.json. You will need to add a section for the new release and move the preferred=true line to that section.
Go to the plenoptic-binder repo and create a new branch whose name matches the tag for the new release, and update the postBuild file, replacing main in the call to replace_crossrefs.py with this tag as well.
After merging the above PR into the main branch, create a Github release with a new tag matching that used in the section above. Creating the release will trigger the deployment to pypi, via our deploy action (found in .github/workflows/deploy.yml). The built version will grab the version tag from the Github release, using setuptools_scm.
Shortly after the deploy to pypi goes through (typically within a day), a PR will be automatically opened on the conda-forge/plenoptic-feedstock repo. After merging that PR, the plenoptic version on conda-forge will also be updated.
Testing#
Before running tests locally, you’ll need
ffmpeg installed on your system, as well as
the dev optional dependencies (i.e., you should run pip install -e ".[dev]"
from within your local copy of plenoptic).
To run all tests, run pytest from the main plenoptic directory. This
will take a while, as we have many tests, broken into categories. There are
several choices for how to run a subset of the tests:
Run tests from one file: pytest tests/test_mod.py
Run tests by keyword expressions: pytest -k "MyClass and not method". This will run tests which contain names that match the given string expression, which can include Python operators that use filenames, class names and function names as variables. The example above will run TestMyClass.test_something but not TestMyClass.test_method_simple.
To run a specific test within a module: pytest tests/test_mod.py::test_func
Another example specifying a test method in the command line: pytest tests/test_mod.py::TestClass::test_method
View the pytest documentation for more info.
CPU and GPU tests#
Plenoptic tests are all run on both the CPU and the GPU, and both must pass before any PR is merged.
On the CPU, tests are run using GitHub actions. Tests use all supported python versions and the most recent pytorch version.
On the GPU, tests are run using a Simons Foundation-hosted Jenkins instance. Tests use the middle supported python version and the pytorch version specified in jenkins/Dockerfile, which we manually update as pytorch puts out new releases.
Some of our tests check for reproducibility of synthesis methods. See below for a discussion of some difficulties guaranteeing reproducibility across different devices, but the tl;dr is that the tests that are most likely to fail for this reason are only run on the GPU and are expected to fail on GPUs that are different from that used by the Jenkins runner. Thus, if you run the full test suite locally (using a GPU) and find that tests in test_uploaded_files.py are failing when you have changed nothing that should affect them, you can probably safely ignore this — this is only a problem if these tests fail during the CI. Ask a maintainer if you have questions here.
Running pytest with non-standard cache directories#
When running tests on some machines (e.g., nodes of the Flatiron cluster), the default cache directories used by some of the libraries will not exist, so running the tests like normal will result in errors that complain about directories not existing or not having permission to create directories in e.g., /home/wbroderick.
To avoid this problem, we can use environment variables to control the behavior of these libraries. To do so for pooch (downloading test files), torch (downloading pretrained models), and matplotlib (caching config files), prepend the following to your pytest command: PYTORCH_KERNEL_CACHE_PATH=~/.cache/torch/kernels TORCH_HOME=~/.cache/torch MPLCONFIGDIR=~/.cache/matplotlib PLENOPTIC_CACHE_DIR=~/.cache/plenoptic (or replace ~/.cache/ with some other directory).
Tip
This may also be helpful when building the documentation.
See relevant torch 1 and 2, pooch, and matplotlib docs.
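If you prefer to launch pytest from a script rather than typing the variables each time, the same overrides can be assembled in Python. This is a minimal sketch: `cache_env` is a hypothetical helper name, and only the four environment variables listed above are taken from this document.

```python
import os
from pathlib import Path


def cache_env(base_dir="~/.cache"):
    """Return a copy of the current environment with cache directories redirected."""
    base = Path(base_dir).expanduser()
    overrides = {
        "PYTORCH_KERNEL_CACHE_PATH": str(base / "torch" / "kernels"),
        "TORCH_HOME": str(base / "torch"),
        "MPLCONFIGDIR": str(base / "matplotlib"),
        "PLENOPTIC_CACHE_DIR": str(base / "plenoptic"),
    }
    # overrides take precedence over any existing values
    return {**os.environ, **overrides}


# Example usage (not executed here):
# import subprocess
# subprocess.run(["pytest"], env=cache_env("/some/writable/dir"), check=False)
```

Passing the result as `env` affects only the child process, so your shell's own environment is left untouched.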
Using nox to simplify testing and linting#
This section is optional, but useful if you want to easily run tests in an isolated environment using the nox command-line tool.
Before proceeding, you’ll need to install nox. You can do this in your plenoptic environment, but it is more common to install it system-wide using pipx: pipx install nox. This installs nox into a globally-available isolated environment, see pipx docs for more details.
You will also need to install pyyaml in the same environment as nox. If you used pipx, then run pipx inject nox pyyaml.
To run all tests, formatters, and linters through nox, from the root folder of the
plenoptic package, execute the following command,
nox
nox will read the configuration from the noxfile.py script.
If you only want to run an individual session (e.g., lint or test), you can first check which sessions are available with the following command:
nox -l
Then you can use
nox -s <your_nox_session>
to run the session of your choice.
Here are some examples:
If you want to run just the tests:
nox -s tests
If you want to run only the linters:
nox -s lint
nox offers a variety of configuration options, you can learn more about it from their
documentation.
Note that nox works particularly well with pyenv, discussed later in this file, which makes it easy to install the multiple python versions used in testing.
Multi-python version testing with pyenv#
Sometimes, before opening a pull-request that will trigger the .github/workflows/ci.yml continuous
integration workflow, you may want to test your changes over all the supported python versions locally.
Handling multiple installed python versions on the same machine can be challenging and confusing. pyenv is a great tool that really comes to the rescue. Note that pyenv just handles python versions — virtual environments have to be handled separately, using pyenv-virtualenv!
This tool doesn’t come with the package dependencies and has to be installed separately. Installation instructions are system specific but the package readme is very detailed, see here.
Carefully follow the instructions to configure pyenv after installation.
Once you have the package installed and configured, you can install multiple python versions through it. First get a list of the available versions with the command,
pyenv install -l
Install the python version you need. For this example, let’s assume we want python 3.10.11 and python 3.11.8,
pyenv install 3.10.11
pyenv install 3.11.8
You can check which python version is currently set as default, by typing,
pyenv which python
And you can list all available versions of python with,
pyenv versions
If you want to run nox on multiple python versions, all you need to do is:
Set your desired versions as global.
pyenv global 3.11.8 3.10.11
This will make both versions available, and the default python will be set to the first one listed (3.11.8 in this case).
Run nox specifying the python version as an option.
nox -p 3.10
Note that noxfile.py lists the available options as keyword arguments in a session-specific manner.
As mentioned earlier, if you have multiple python versions installed, we recommend you manage your virtual environments through pyenv using the pyenv-virtualenv extension.
This tool works with most environment managers, including venv and conda.
Creating an environment with it is as simple as calling,
pyenv virtualenv my-python my-environment
Here, my-python is the python version, chosen from those listed by pyenv versions, and my-environment is your new environment name.
If my-python has conda installed, it will create a conda environment, if not, it will use venv.
You can list only the virtual environments with,
pyenv virtualenvs
And you can uninstall an environment with,
pyenv uninstall my-environment
Adding tests#
New tests can be added in any of the existing tests/test_*.py scripts. Tests
should be functions, contained within classes. The class contains a bunch of
related tests (e.g., metamers, metrics), and each test should ideally be a unit
test, only testing one thing. The classes should be named TestSomething, while
test functions should be named test_something in snake case.
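A minimal sketch of this layout follows. Both `clamp` and `TestClamp` are hypothetical names used only for illustration; in a real test file the function under test would be imported from plenoptic.

```python
def clamp(value, low, high):
    """Hypothetical function under test: restrict value to the range [low, high]."""
    return max(low, min(value, high))


class TestClamp:
    """Groups related unit tests; each method checks exactly one behavior."""

    def test_value_inside_range_is_unchanged(self):
        assert clamp(5, 0, 10) == 5

    def test_value_below_range_is_raised_to_low(self):
        assert clamp(-3, 0, 10) == 0

    def test_value_above_range_is_lowered_to_high(self):
        assert clamp(42, 0, 10) == 10
```

Keeping each method to a single assertion about a single behavior means a failure points directly at what broke.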
If you’re adding a substantial bunch of tests that are separate from the
existing ones, you can create a new test script. Its name must begin with
test_, it must have a .py extension, and it must be contained within the
tests directory. Assuming you do that, our github actions will automatically
find it and add it to the tests-to-run.
Note that we also require that tests raise no warnings (see PR #335). This allows us to stay on top of deprecation warnings from our dependencies. There are several ways to avoid warnings in tests, in order from most-to-least preferred:
Write tests such that they avoid warnings. For example, all synthesis methods call validate_model, which will raise a warning if the model is in “training mode” (e.g., model.training exists and is True). The default behavior of torch.nn.Module objects is to be in training mode after initialization. Thus, we call model.eval() before passing a model to a synthesis method in a test.
Selectively ignore the warning on a given test using @pytest.mark.filterwarnings. See our tests for an example or the pytest documentation for an explanation.
Configure pytest to ignore the warning for all tests by updating filterwarnings in pyproject.toml (they must come after "error"). These should only include warnings that are temporary, such as deprecation warnings that we raise or warnings that have been fixed upstream but not released yet.
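To see why an ignore entry has to come after "error", it can help to reproduce the behavior with Python's standard `warnings` module, whose filter syntax pytest reuses. The sketch below is an analogy using only the standard library, not plenoptic's actual configuration; `noisy` is a hypothetical function that emits a deprecation warning.

```python
import warnings


def noisy():
    """Hypothetical function that emits a deprecation warning."""
    warnings.warn("this API is deprecated", DeprecationWarning)
    return 42


def warning_becomes_error():
    """With only the 'error' filter active, the warning is raised as an exception."""
    with warnings.catch_warnings():
        warnings.simplefilter("error")
        try:
            noisy()
        except DeprecationWarning:
            return True
        return False


def ignore_added_after_error():
    """A more specific filter added after 'error' takes precedence over it."""
    with warnings.catch_warnings():
        warnings.simplefilter("error")
        warnings.filterwarnings("ignore", category=DeprecationWarning)
        return noisy()
```

Filters added later are consulted first, which is why a targeted ignore only has an effect when it is listed after the blanket "error" entry.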
Testing notebooks#
We use jupyter
execute
to test our notebooks and make sure everything runs. You can run it locally to
try and debug some errors (though errors that result from environment issues
obviously will be harder to figure out locally); jupyter execute is part of
the standard jupyter install as long as you have nbclient>=0.5.5.
Long-running synthesis and tutorial notebooks#
Occasionally, we want to include one or more synthesis calls within a tutorial
notebook that take a long time to run (for example, because we’re reproducing a
result from the literature). In order to avoid having the documentation build
take a long time, we instead write a regression test (in
tests/test_uploaded_files.py), which runs the synthesis, saves the output, and
compares it against a cached version stored in our OSF project. See
tests/test_uploaded_files.py to see how these tests look. Some important notes:
The new tests should be added to the
The new tests should be added to the TestTutorialNotebooks class in test_uploaded_files.py.
The synthesize call should be shown in the notebook, in a code block (unlike a code-cell, a code-block is not run). This code-block should be preceded by a markdown comment giving the class and name of the corresponding test. So, if our test was called test_berardino_onoff and found within the TestDemoEigendistortion notebook, the corresponding code block should look like:
<!-- TestDemoEigendistortion.test_berardino_onoff -->
```{code-block} python
eigendist_f.synthesize(k=3, method="power", max_iter=2000)
```
This block will be checked for whether it is part of the corresponding test (literally, with in).
If a variable has a different name in the block and in the test, the preceding comment should include square brackets containing a comma-separated list of the replacements: <!-- TestDemoEigendistortion.test_berardino_onoff[eigendist_f:eig] -->.
If the test has lines we want to ignore (because they’re test-specific), they should contain lint_ignore somewhere on the line, not in a comment (e.g., this_variable_lint_ignore = 100 but not this_variable = 100 # lint_ignore).
src/plenoptic/data/_fetch.py needs the hash and the URL slug of each new file, so make sure to update them. The hash can be computed by calling openssl sha256 path/to/file on the command line.
We have a linter that checks the conditions above.
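To make these conditions concrete, here is a rough sketch of the kind of check described. It is not the actual linter: `block_in_test` and its whitespace normalization are assumptions made for illustration, and it simply drops any test line containing `lint_ignore`.

```python
def _normalize(text):
    """Strip indentation and blank lines so that only the code itself is compared."""
    return "\n".join(line.strip() for line in text.splitlines() if line.strip())


def block_in_test(block, test_source, replacements=None):
    """Return True if the notebook code block appears within the test's source.

    replacements maps variable names used in the notebook block to the names
    used in the test (e.g., {"eigendist_f": "eig"}).
    """
    for notebook_name, test_name in (replacements or {}).items():
        block = block.replace(notebook_name, test_name)
    # test-specific lines are marked with lint_ignore and skipped
    kept = "\n".join(
        line for line in test_source.splitlines() if "lint_ignore" not in line
    )
    return _normalize(block) in _normalize(kept)


# hypothetical test source and notebook block, for illustration only
example_test = '''
def test_berardino_onoff(self):
    n_steps_lint_ignore = 10
    eig.synthesize(k=3, method="power", max_iter=2000)
'''
example_block = 'eigendist_f.synthesize(k=3, method="power", max_iter=2000)'
matches = block_in_test(example_block, example_test, {"eigendist_f": "eig"})
```

Without the `eigendist_f:eig` replacement, the block would not be found in the test, which is the situation the square-bracket syntax is meant to handle.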
pytest does not run the tests found under TestTutorialNotebooks by default,
since they take a long time. In order to run them, you must explicitly set the
environment variable RUN_REGRESSION_SYNTH=1 when calling pytest.
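If openssl is not available on your machine, the SHA-256 hash of a new file can also be computed with Python's standard library. This is a minimal sketch; `sha256_of_file` is a hypothetical helper name, and you should check src/plenoptic/data/_fetch.py for the exact format in which hashes are stored.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal SHA-256 digest of the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # read in chunks so that large files do not need to fit in memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string should match the hexadecimal digest printed by `openssl sha256 path/to/file`.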
Exact reproducibility#
Exact reproducibility with pytorch is hard. See issue #368 for some details, but the tl;dr is: you should not expect to get the same outputs (or even outputs that match within floating point precision) when running synthesis for long enough (seems to be > 1000 iterations) on devices with different CUDA versions and driver versions. Small differences in the output of e.g., torch.einsum / torch.matmul will lead to small differences in the gradient, which will accumulate and eventually lead to fairly different optimization outputs.
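One likely source of such small differences is that floating-point addition is not associative, so the order in which a sum is accumulated changes the result slightly. The pure-Python sketch below illustrates this general effect; it is not taken from issue #368 and does not involve pytorch.

```python
# the same three numbers, summed in two different orders
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

# the two results differ in the last bit of the floating-point representation
results_differ = left_to_right != right_to_left
difference = abs(left_to_right - right_to_left)
```

A discrepancy this small is harmless in a single operation, but an optimization that runs for thousands of iterations feeds each result into the next step, so such differences can compound.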
To deal with this, all tests for reproducibility should use dtype torch.float64 (rather than the default torch.float32). Additionally, these regression tests should save their output into the uploaded_files folder after synthesis (and before checking). The contents of that folder will be made available as Jenkins artifacts, which you can then download after the test. This will allow you to download the output of any failing test, manually verify the results look good, and then upload them to the OSF to test against.
Most of these tests are only run when explicitly enabled by setting the environment variable RUN_REGRESSION_SYNTH=1, which we only do on the GPU, since they take a while to run.
Test parameterizations and fixtures#
Parametrize#
If you have many variants on a test you wish to run, you should probably make
use of pytest’s parametrize mark. There are many examples throughout our
existing tests (and see official pytest
docs), but the basic idea
is that you write a function that takes an argument and then use the
@pytest.mark.parametrize decorator to show pytest how to iterate over the
arguments. For example, instead of writing:
def test_basic_1():
    assert int('3') == 3


def test_basic_2():
    assert int('5') == 5
You could write:
import pytest


# the parameters are strings, matching the comparisons in the test body
@pytest.mark.parametrize('a', ['3', '5'])
def test_basic(a):
    if a == '3':
        test_val = 3
    elif a == '5':
        test_val = 5
    assert int(a) == test_val
This starts to become very helpful when you have multiple arguments you wish to iterate over in this manner.
Fixtures#
If you are using an object that gets used in multiple tests (such as an image or
model), you should make use of fixtures to avoid having to load or initialize
the object multiple times. Look at conftest.py to see those fixtures available
for all tests, or you can write your own (though pay attention to the
scope).
For example, conftest.py contains several images that you can use for your
tests, such as basic_stim, curie_img, or color_img. To use them, simply
add them as arguments to your function:
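import plenoptic as po
import torch


def test_img(curie_img):
    # curie_img is supplied by the fixture of the same name in conftest.py
    img = po.load_images('data/curie.pgm')
    assert torch.allclose(img, curie_img)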
WARNING: If you’re using fixtures, make sure you don’t modify them in your test (or you reset them to their original state at the end of the test). The fixture is a single object that will get reused across tests, so modifying it will lead to unexpected behaviors in other tests depending on which tests were run and their execution order.
Combining the two#
You can combine fixtures and parameterization, which is helpful for when you
want to test multiple models with a synthesis method, for example. This is
slightly more complicated and relies on pytest’s indirect
parametrization
(and requires pytest>=5.1.2 to work properly). For example, conftest.py has
a fixture, model, which accepts a string and returns an instantiated model on
the right device. Use it like so:
import plenoptic as po
import pytest


@pytest.mark.parametrize('model', ['SPyr', 'LNL'], indirect=True)
def test_synth(curie_img, model):
    met = po.Metamer(curie_img, model)
    met.synthesize()
This test will be run twice, once with the steerable pyramid model and once
with the Linear-Nonlinear model. See the get_model function in conftest.py
for the available strings. Note that unlike in the simple
parametrize example, we add the indirect=True argument here.
If we did not include that argument, model would just be the strings 'SPyr'
and 'LNL'!
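To illustrate what indirect=True does, here is a minimal sketch of how such a fixture can be written. The build_model helper and the 'double' / 'square' options are hypothetical placeholders, not plenoptic's actual get_model function or model fixture. The point is only that pytest passes each parametrized string to the fixture as request.param, and the test receives whatever the fixture returns.

```python
import pytest


def build_model(name):
    # Hypothetical stand-in for a helper like get_model: maps a string to
    # an object (plain callables here rather than real models).
    if name == "double":
        return lambda x: 2 * x
    if name == "square":
        return lambda x: x**2
    raise ValueError(f"Unknown model name: {name}")


@pytest.fixture
def model(request):
    # With indirect=True, each parametrized value arrives as request.param.
    return build_model(request.param)


@pytest.mark.parametrize("model", ["double", "square"], indirect=True)
def test_model_is_callable(model):
    # `model` is the object returned by the fixture, not the string.
    assert callable(model)
    assert model(3) in (6, 9)
```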
Documentation#
Adding documentation#
The amount and form of documentation that needs to be added alongside a change depend on the size of the submitted change. For a significant change (a new model or synthesis method), please include a new tutorial notebook that walks through how to use it. For enhancements of existing methods, you can probably just modify the existing tutorials and add documentation. If unsure, ask!
Documentation in plenoptic is built using Sphinx on some of Flatiron’s Jenkins
runners and hosted on GitHub pages. If that means nothing to you, don’t worry!
All of our documentation is written as markdown files, with the extension md. We use the myst parser, along with myst-nb. Both process markdown files, but myst-nb allows us to write text-based notebooks, with python code that gets executed when the documentation is built.
The text-based notebooks are tutorials and show how to use the various functions and classes contained in the package. If you add or change a substantial amount of code, please add a tutorial showing how to use it.
In all markdown files, you should try to use sphinx’s cross-reference syntax to refer to code objects in API documentation whenever one is mentioned. For example, you should refer to the Metamer class as
{class}`~plenoptic.Metamer`
You should similarly refer to code objects in other packages (e.g., pytorch and matplotlib), though the syntax is different. See myst-parser docs for more details and the existing documentation for more examples. As part of the pull request review process, we run linters that will check for missing cross-references. The only objects that can be referred to simply in monospace font are function arguments and generic attributes / methods (e.g., saying that plenoptic models must have a forward method). The linter will ignore any monospace text that is followed by the word “argument” or “keyword” (e.g., “the scales keyword” or “the scales argument”) or by an html comment containing “skip-lint”, for example:
the `scales` <!-- skip-lint --> method
(html comments are not rendered in sphinx).
The regular markdown files contain everything else, especially discussions about why you should use some code in the package and the theory behind it, and should all be located in one of the subfolders within the docs/ directory. Decide which subfolder to place it in (ask for help if you’re unsure) and add it to that subfolder’s index.md by adding the name of the file (without extension) to the toctree block.
In order for the table of contents to look good, your md file must be well structured. All markdown files (text-based notebooks and regular) must have a single H1 header (you can have as many sub-headers as you’d like).
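If you want a quick local check of the single-H1 rule before building, something like the following sketch could help. The count_h1_headers helper is hypothetical and not part of plenoptic's tooling. It is also a simplification: it only skips backtick-fenced blocks (so that Python comments inside code cells are not mistaken for headers) and does not handle nested fences.

```python
def count_h1_headers(markdown_text):
    """Count H1 headers, ignoring lines inside backtick-fenced blocks."""
    n_h1 = 0
    in_fence = False
    for line in markdown_text.splitlines():
        # A line starting with three backticks opens or closes a fence.
        if line.lstrip().startswith("`" * 3):
            in_fence = not in_fence
            continue
        # "# " marks an H1; "## " and deeper do not start with "# ".
        if not in_fence and line.startswith("# "):
            n_h1 += 1
    return n_h1


fence = "`" * 3
example = (
    "# Title\n\nText\n\n"
    + fence + "python\n# a comment, not a header\n" + fence
    + "\n\n## Section\n"
)
print(count_h1_headers(example))  # 1
```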
You should build the docs yourself to ensure they look correct before pushing.
Images and plots#
You can include images in md files in the documentation as well. Simply
place them in the docs/images/ folder and use the figure directive, e.g.:
:::{figure} images/path_to_my_image.svg
:figwidth: 100%
:alt: Alt-text describing my image.

Caption describing my image.
:::
To refer to it directly, you may want to use the numref role (which has been enabled for our documentation).
If you have plots or other images generated by code that you wish to include,
you can include them in the file directly without either saving the output in
docs/_static/images/ or turning the page into a notebook. This is useful if you want
to show something generated by code but the code itself isn’t the point. We
handle this with matplotlib’s plot directive
(which has already been enabled). Add a python script to docs/scripts/ and
write a function that creates the matplotlib figure you wish to display. Then,
in your documentation, add:
```{eval-rst}
.. plot:: scripts/path_to_my_script.py my_useful_plotting_function

   Caption describing what is in my plot.
```
Similar to figures, you can use numref to refer to plots as well.
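As an illustration, a script for the plot directive might look like the following sketch. The file name and function name are the placeholders used above, and the plotted data is made up. The only requirement described above is that the function creates the matplotlib figure you want displayed.

```python
# Hypothetical contents of docs/scripts/path_to_my_script.py
import math

import matplotlib.pyplot as plt


def my_useful_plotting_function():
    """Create a simple figure for the plot directive to render."""
    xs = [i / 50 for i in range(101)]
    ys = [math.sin(2 * math.pi * x) for x in xs]
    fig, ax = plt.subplots(figsize=(4, 3))
    ax.plot(xs, ys)
    ax.set(xlabel="x", ylabel="sin(2*pi*x)", title="Example plot")
    # Returning the figure is optional but makes the function easy to test.
    return fig
```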
API Documentation#
All public functions and classes must be included on the API documentation page.
Therefore, if you add a new public function or class, make sure to add it to one
of the rst files in docs/api/ in an appropriate location. If this is not
done, linting/check_apidocs.py will fail (this check is included in our
pre-commit config and thus is required to pass for a PR to merge).
If the new function or class does not belong in one of the existing rst files
found in docs/api/, you can create a new one, referring to the existing files
as templates. You must then add it to the api_order list in docs/conf.py.
This list determines the order in which these documents are displayed in the
index page of the API documentation, roughly from most to least important. Ask
for help if you’re not sure where to put it.
Docstrings#
All public-facing functions and classes should have complete docstrings, which
start with a one-line summary of the function, followed by a medium-length
description of the function / class and what it does and a complete description
of all arguments and return values. Math should be included in a Notes section when
necessary to explain what the function is doing, and references to primary
literature should be included in a References section when appropriate.
Docstrings should be relatively short, providing the information necessary for a
user to use the code. Longer discussions of why you would use one method over
another, an explanation of the theory behind a method, and extended examples
should instead be part of the tutorials or documentation.
Private functions and classes should have sufficient explanation that other developers know what the function / class does and how to use it, but do not need to be as extensive.
We follow the numpydoc conventions for docstring structure.
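As a sketch of what this looks like in practice, here is a docstring for a hypothetical rescale_values function (not part of plenoptic) laid out with numpydoc sections: a one-line summary, a longer description, the arguments and return value, a Notes section for the math, and an Examples section.

```python
def rescale_values(values, a=0.0, b=1.0):
    """Linearly rescale values to a target range.

    Shift and scale ``values`` so that their minimum maps to ``a`` and
    their maximum maps to ``b``.

    Parameters
    ----------
    values : list of float
        Values to rescale. Must contain at least two distinct values.
    a : float, optional
        Minimum of the output range.
    b : float, optional
        Maximum of the output range.

    Returns
    -------
    list of float
        The rescaled values, in the same order as ``values``.

    Raises
    ------
    ValueError
        If all entries of ``values`` are identical.

    Notes
    -----
    Each value :math:`x` is mapped to
    :math:`a + (b - a) (x - x_{min}) / (x_{max} - x_{min})`.

    Examples
    --------
    >>> rescale_values([0.0, 5.0, 10.0])
    [0.0, 0.5, 1.0]

    Set the output range with the optional arguments:

    >>> rescale_values([0.0, 5.0, 10.0], a=-1.0, b=1.0)
    [-1.0, 0.0, 1.0]
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("values must contain at least two distinct values")
    return [a + (b - a) * (v - lo) / (hi - lo) for v in values]
```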
Doctests#
All public-facing functions and classes should include doctests, which are the standard python way of showing short code examples in docstrings. These should be included in their own Examples section of the docstring. Every docstring should include at least one example, which shows the most common way of interacting with the function / class. Additional examples should be included where helpful, to show other common ways of interacting with the object (e.g., setting optional arguments), with brief descriptions of what each example is doing.
A function’s Examples section must run independently from those of other functions (i.e., it can’t reuse an object defined in a different function), but different blocks within the section can depend on each other (so that e.g., you don’t have to reimport plenoptic in each block).
Our doctests are built using Sphinx as part of the documentation (which is part of the Jenkins CI step) and tested using pytest in our CI (the doctests action). Both are required to pass before any PR will be merged. Some notes about this:
If you would like to include a figure, use matplotlib’s plot directive. That means, your example should be structured like:
.. plot::
   :context: close-figs

   >>> import plenoptic as po
   >>> # more example code here...
:context: close-figs is important to make sure that the figures are independent across examples (this should probably be :context: reset for the first plot directive in a given docstring, close-figs thereafter). However, unfortunately, only sphinx knows how to interpret this directive; pytest ignores it. That means the doctests must be written in such a way that they will not fail if they are run with open figures lying around. One could easily start their doctests by closing any open figures, but this generally goes against the principle of making these examples as compact and useful as possible. Unfortunately, I have not found a good general solution here.
sphinx only runs the blocks contained within the plot directive. That means, if, for example, you have two blocks in the doctest, only the second of which creates a plot, both of them need to be contained within a plot directive so that the second block can reuse objects (including the imported plenoptic library!) from the first. That is, the following will fail:
Examples
--------
>>> import plenoptic as po
>>> einstein = po.data.einstein()

And now we display the image:

.. plot::
   :context: reset

   >>> po.plot.imshow(einstein)
whereas the following will succeed:
Examples
--------
.. plot::
   :context: reset

   >>> import plenoptic as po
   >>> einstein = po.data.einstein()

And now we display the image:

.. plot::
   :context: close-figs

   >>> po.plot.imshow(einstein)
Note that the second block cannot have :context: reset or you will be unable to use the objects from the first block!
This only affects sphinx. pytest ignores the plot directive and so either structure will pass.
pytest only checks whether the actual execution outputs of Python code match the expected results written in your documentation. Therefore, while the following will succeed in sphinx,
Examples
--------
.. plot::
   :context: reset

   >>> import plenoptic as po
   >>> einstein = po.data.einstein()
   >>> po.plot.imshow(einstein)
you must add the expected output in order for pytest to succeed:
Examples
--------
.. plot::
   :context: reset

   >>> import plenoptic as po
   >>> einstein = po.data.einstein()
   >>> po.plot.imshow(einstein)
   <PyrFigure size ... with 1 Axes>
pytest allows for the use of fixtures in doctests. In order to be used, they must be included in the src/plenoptic/conftest.py file (not the tests/conftest.py file). Among other things, any downloads from torchvision / torch.hub should happen in those fixtures (since they are difficult to properly write doctests for, see note there for details). This does mean that the first time you run doctests on your local machine, it will take a while to download those files; they are cached and won’t be downloaded again, so this is only a one-time cost.
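The ... in an expected output such as <PyrFigure size ... with 1 Axes> relies on doctest's ELLIPSIS option, which lets ... stand for any substring. The sketch below uses the standard library's doctest.OutputChecker to show how that comparison behaves. It assumes ELLIPSIS is enabled in the project's doctest configuration (which that expected output implies but this page does not state), and the figure size in the "actual" string is made up for illustration.

```python
import doctest

checker = doctest.OutputChecker()

# What you write in the docstring (want) vs. what the code prints (got).
# The size in `got` is a made-up value for illustration.
want = "<PyrFigure size ... with 1 Axes>\n"
got = "<PyrFigure size 512x512 with 1 Axes>\n"

# With ELLIPSIS, "..." in the expected output matches any substring.
print(checker.check_output(want, got, doctest.ELLIPSIS))  # True
# Without it, the two strings must match exactly.
print(checker.check_output(want, got, 0))  # False
```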
Testing doctests with Pytest#
You should also run pytest locally before submitting a pull request: pytest -n 0 --doctest-continue-on-failure --doctest-modules src/ -W "ignore". You can also replace src/ with the path to a subdirectory or specific file to avoid running tests on everything. This will check whether the output of each example matches what you have documented it to be. It can be very particular about format and syntax; see the pytest docs for details.
Build the documentation#
Important
If you just want to read the documentation, you do not need to do this; documentation is built automatically, pushed to the plenoptic-documentation github repo and published at http://docs.plenoptic.org/.
Sphinx#
In addition to being viewed on the web, plenoptic’s documentation can be built locally. You should do this if you’ve made changes locally to the documentation (or the docstrings) that you would like to examine before pushing. All necessary requirements are included in the [docs] optional dependency bundle, which you can install with pip install plenoptic[docs].
Then, to build the documentation, run make html from within the docs/ directory. The outputs will be put into docs/_build/html/ and you can then open the built pages in your browser. For example, you can open the landing page (docs/_build/html/index.html) from your file browser or from the command line using your choice of browser (e.g., firefox docs/_build/html/index.html). Figures will typically not be displayed if their formatting is incorrect. However, sphinx does not check whether the code output is accurate; this is done with pytest.
By default, the text-based notebooks (see earlier) are not run because they take a while, especially if you do not have a GPU. In order to run all of them, prepend RUN_NB=1 to the make command above. In order to run specific notebooks, set RUN_NB to a globbable comma-separated string in the above, e.g., RUN_NB=Metamer,MAD to run docs/user_guide/synthesis/Metamer, docs/user_guide/synthesis/MAD_Competition_1, and docs/user_guide/synthesis/MAD_Competition_2.
Additionally, our docstrings have a variety of Examples blocks that are run by sphinx in order to render plots in the documentation, which can take a while to run. You can temporarily disable these by prepending SKIP_MPL=1 to the make command. This will cause docutils to raise an error every time it encounters a plot directive, and so the build will appear to fail. However, the output will be rendered correctly (with the exception of those plot blocks), and so this can be useful for rapidly viewing changes locally when editing other parts of the documentation. In order to ensure the documentation build completes without any warnings or errors, you should build the docs normally before pushing.
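If you switch between these options often, a small wrapper can keep them straight. The following is only a sketch: docs_build_env and build_docs are hypothetical helpers (not part of plenoptic) that set the RUN_NB and SKIP_MPL environment variables described above and then call make html in the docs/ directory.

```python
import os
import subprocess


def docs_build_env(run_nb=None, skip_mpl=False, base_env=None):
    """Return a copy of the environment with the docs build options set."""
    env = dict(os.environ if base_env is None else base_env)
    if run_nb is not None:
        # e.g. "1" to run every notebook, or "Metamer,MAD" for a subset
        env["RUN_NB"] = run_nb
    if skip_mpl:
        env["SKIP_MPL"] = "1"
    return env


def build_docs(docs_dir="docs", run_nb=None, skip_mpl=False):
    """Run `make html` in docs_dir and return the completed process."""
    env = docs_build_env(run_nb=run_nb, skip_mpl=skip_mpl)
    # check=False: with SKIP_MPL=1 the build may report errors even though
    # the output is usable, so a non-zero exit code should not raise here.
    return subprocess.run(["make", "html"], cwd=docs_dir, env=env, check=False)
```

For example, build_docs(run_nb="Metamer,MAD") corresponds to running RUN_NB=Metamer,MAD make html from within docs/.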