plenoptic.metric package
Submodules
plenoptic.metric.classes module
- class plenoptic.metric.classes.NLP[source]
Bases: Module
Simple class for implementing the normalized Laplacian pyramid.
This class just calls plenoptic.metric.normalized_laplacian_pyramid on the image and returns a 3d tensor with the flattened activations.
NOTE: synthesis using this class will not be exactly the same as synthesis using the plenoptic.metric.nlpd function (by default), because the nlpd function uses the root-mean-square of the L2 distance (i.e., torch.sqrt(torch.mean((x - y) ** 2))) as the distance metric between representations.
Methods
add_module(name, module): Add a child module to the current module.
apply(fn): Apply fn recursively to every submodule (as returned by .children()) as well as self.
bfloat16(): Casts all floating point parameters and buffers to bfloat16 datatype.
buffers([recurse]): Return an iterator over module buffers.
children(): Return an iterator over immediate children modules.
compile(*args, **kwargs): Compile this Module's forward using torch.compile().
cpu(): Move all model parameters and buffers to the CPU.
cuda([device]): Move all model parameters and buffers to the GPU.
double(): Casts all floating point parameters and buffers to double datatype.
eval(): Set the module in evaluation mode.
extra_repr(): Set the extra representation of the module.
float(): Casts all floating point parameters and buffers to float datatype.
forward(image): Returns flattened NLP activations.
get_buffer(target): Return the buffer given by target if it exists, otherwise throw an error.
get_extra_state(): Return any extra state to include in the module's state_dict.
get_parameter(target): Return the parameter given by target if it exists, otherwise throw an error.
get_submodule(target): Return the submodule given by target if it exists, otherwise throw an error.
half(): Casts all floating point parameters and buffers to half datatype.
ipu([device]): Move all model parameters and buffers to the IPU.
load_state_dict(state_dict[, strict, assign]): Copy parameters and buffers from state_dict into this module and its descendants.
modules(): Return an iterator over all modules in the network.
mtia([device]): Move all model parameters and buffers to the MTIA.
named_buffers([prefix, recurse, ...]): Return an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.
named_children(): Return an iterator over immediate children modules, yielding both the name of the module as well as the module itself.
named_modules([memo, prefix, remove_duplicate]): Return an iterator over all modules in the network, yielding both the name of the module as well as the module itself.
named_parameters([prefix, recurse, ...]): Return an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.
parameters([recurse]): Return an iterator over module parameters.
register_backward_hook(hook): Register a backward hook on the module.
register_buffer(name, tensor[, persistent]): Add a buffer to the module.
register_forward_hook(hook, *[, prepend, ...]): Register a forward hook on the module.
register_forward_pre_hook(hook, *[, ...]): Register a forward pre-hook on the module.
register_full_backward_hook(hook[, prepend]): Register a backward hook on the module.
register_full_backward_pre_hook(hook[, prepend]): Register a backward pre-hook on the module.
register_load_state_dict_post_hook(hook): Register a post-hook to be run after module's load_state_dict() is called.
register_load_state_dict_pre_hook(hook): Register a pre-hook to be run before module's load_state_dict() is called.
register_module(name, module): Alias for add_module().
register_parameter(name, param): Add a parameter to the module.
register_state_dict_post_hook(hook): Register a post-hook for the state_dict() method.
register_state_dict_pre_hook(hook): Register a pre-hook for the state_dict() method.
requires_grad_([requires_grad]): Change if autograd should record operations on parameters in this module.
set_extra_state(state): Set extra state contained in the loaded state_dict.
set_submodule(target, module): Set the submodule given by target if it exists, otherwise throw an error.
share_memory(): See torch.Tensor.share_memory_().
state_dict(*args[, destination, prefix, ...]): Return a dictionary containing references to the whole state of the module.
to(*args, **kwargs): Move and/or cast the parameters and buffers.
to_empty(*, device[, recurse]): Move the parameters and buffers to the specified device without copying storage.
train([mode]): Set the module in training mode.
type(dst_type): Casts all parameters and buffers to dst_type.
xpu([device]): Move all model parameters and buffers to the XPU.
zero_grad([set_to_none]): Reset gradients of all model parameters.
__call__
- forward(image)[source]
Returns flattened NLP activations.
WARNING: For now, this only supports images with batch and channel size 1.
- Parameters:
image (torch.Tensor) – image to pass to normalized_laplacian_pyramid
- Returns:
representation – 3d tensor with flattened NLP activations
- Return type:
torch.Tensor
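A minimal usage sketch: the random tensor below is just a stand-in for a real grayscale image in [0, 1], and the 256x256 size is an arbitrary choice.

```python
import torch

from plenoptic.metric.classes import NLP

model = NLP()
# for now, only batch and channel size 1 are supported
image = torch.rand(1, 1, 256, 256)
representation = model(image)  # 3d tensor of flattened NLP activations
print(representation.shape)
```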
plenoptic.metric.model_metric module
plenoptic.metric.naive module
- plenoptic.metric.naive.mse(img1, img2)[source]
Return the MSE between img1 and img2.
Our baseline metric for comparing two images is often the mean-squared error (MSE). This is not a good approximation of the human visual system, but it is handy to compare against.
For two images, \(x\) and \(y\), with \(n\) pixels each:
\[MSE = \frac{1}{n}\sum_{i=1}^n (x_i - y_i)^2\]
The two images must have a float dtype.
- Parameters:
img1 (torch.Tensor) – The first image to compare
img2 (torch.Tensor) – The second image to compare; must be the same size as img1
- Returns:
mse – the mean-squared error between img1 and img2
- Return type:
torch.float
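A short sketch of the formula above, assuming two same-sized float images in [0, 1]; random tensors stand in for real images.

```python
import torch

from plenoptic.metric.naive import mse

img1 = torch.rand(1, 1, 256, 256)
img2 = (img1 + 0.1 * torch.randn_like(img1)).clamp(0, 1)

print(mse(img1, img2))              # mean-squared error between the two images
print(((img1 - img2) ** 2).mean())  # the same quantity, written out from the formula above
```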
plenoptic.metric.perceptual_distance module
- plenoptic.metric.perceptual_distance.ms_ssim(img1, img2, power_factors=None)[source]
Multiscale structural similarity index (MS-SSIM)
As described in [1], multiscale structural similarity index (MS-SSIM) is an improvement upon structural similarity index (SSIM) that takes into account the perceptual distance between two images on different scales.
SSIM is based on three comparison measurements between the two images: luminance, contrast, and structure. All of these are computed convolutionally across the images, producing three maps instead of scalars. The SSIM map is the elementwise product of these three maps. See metric.ssim and metric.ssim_map for a full description of SSIM.
To get images of different scales, average pooling operations with kernel size 2 are performed recursively on the input images. The product of the contrast map and structure map (the "contrast-structure map") is computed for all but the coarsest scale, and the overall SSIM map is computed only for the coarsest scale. Their mean values are raised to exponents and multiplied to produce MS-SSIM:
\[MSSSIM = {SSIM}_M^{a_M} \prod_{i=1}^{M-1} ({CS}_i)^{a_i}\]
Here \(M\) is the number of scales, \({CS}_i\) is the mean value of the contrast-structure map for the \(i\)'th finest scale, and \({SSIM}_M\) is the mean value of the SSIM map for the coarsest scale. If at least one of these terms is negative, the value of MS-SSIM is zero. The values of \(a_i, i=1,\dots,M\) are taken from the argument power_factors.
- Parameters:
img1 (torch.Tensor of shape (batch, channel, height, width)) – The first image or batch of images.
img2 (torch.Tensor of shape (batch, channel, height, width)) – The second image or batch of images. The heights and widths of img1 and img2 must be the same. The numbers of batches and channels of img1 and img2 need to be broadcastable: either they are the same or one of them is 1. The output will be computed separately for each channel (so channels are treated in the same way as batches). Both images should have values between 0 and 1. Otherwise, the result may be inaccurate, and we will raise a warning (but will still compute it).
power_factors (1D array, optional) – Power exponents for the mean values of the maps at the different scales (from fine to coarse). The length of this array determines the number of scales. By default, this is set to [0.0448, 0.2856, 0.3001, 0.2363, 0.1333], which is what the psychophysical experiments in [1] found.
- Returns:
msssim – 2d tensor of shape (batch, channel) containing the MS-SSIM for each image
- Return type:
torch.Tensor
References
[1] Wang, Z., Simoncelli, E.P. and Bovik, A.C., 2003. Multiscale structural similarity for image quality assessment. The 37th Asilomar Conference on Signals, Systems & Computers, Vol. 2, pp. 1398-1402.
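A brief usage sketch with the default power_factors; the random tensors stand in for real images in [0, 1], and the assumed 256x256 size comfortably supports the default five scales.

```python
import torch

from plenoptic.metric.perceptual_distance import ms_ssim

img1 = torch.rand(1, 1, 256, 256)
img2 = (img1 + 0.05 * torch.randn_like(img1)).clamp(0, 1)

score = ms_ssim(img1, img2)  # default power_factors, i.e. five scales
print(score.shape)           # (batch, channel) == (1, 1)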
- plenoptic.metric.perceptual_distance.nlpd(img1, img2)[source]
Normalized Laplacian Pyramid Distance
As described in [1], this is an image quality metric based on the transformations associated with the early visual system: local luminance subtraction and local contrast gain control.
A Laplacian pyramid subtracts a local estimate of the mean luminance at six scales. Then a local gain control divides these centered coefficients by a weighted sum of absolute values in a spatial neighborhood.
These weight parameters were optimized for redundancy reduction over a training database of (undistorted) natural images.
Note that we compute the root-mean-squared error for each scale, and then average over these, effectively giving larger weight to the lower-frequency coefficients (which are fewer in number, due to subsampling).
- Parameters:
img1 (torch.Tensor of shape (batch, channel, height, width)) – The first image or batch of images.
img2 (torch.Tensor of shape (batch, channel, height, width)) – The second image or batch of images. The heights and widths of img1 and img2 must be the same. The numbers of batches and channels of img1 and img2 need to be broadcastable: either they are the same or one of them is 1. The output will be computed separately for each channel (so channels are treated in the same way as batches). Both images should have values between 0 and 1. Otherwise, the result may be inaccurate, and we will raise a warning (but will still compute it).
- Returns:
distance – The normalized Laplacian Pyramid distance.
- Return type:
torch.Tensor of shape (batch, channel)
References
[1] Laparra, V., Ballé, J., Berardino, A. and Simoncelli, E.P., 2016. Perceptual image quality assessment using a normalized Laplacian pyramid. Electronic Imaging, 2016(16), pp. 1-6.
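A short sketch comparing a clean and a lightly distorted image; the random tensors are stand-ins, and any same-sized pair of grayscale images in [0, 1] would do.

```python
import torch

from plenoptic.metric.perceptual_distance import nlpd

img1 = torch.rand(1, 1, 256, 256)
img2 = (img1 + 0.05 * torch.randn_like(img1)).clamp(0, 1)

distance = nlpd(img1, img2)  # shape (batch, channel); larger means more perceptually different
print(distance)
print(nlpd(img1, img1))      # identical images give a distance of 0
```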
- plenoptic.metric.perceptual_distance.normalized_laplacian_pyramid(img)[source]
Compute the normalized Laplacian Pyramid using pre-optimized parameters
- Parameters:
img (torch.Tensor of shape (batch, channel, height, width)) – Image, or batch of images. This representation is designed for grayscale images and will be computed separately for each channel (so channels are treated in the same way as batches).
- Returns:
normalized_laplacian_activations – The normalized Laplacian Pyramid with six scales
- Return type:
list of torch.Tensor
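A sketch of inspecting the returned pyramid; the random tensor stands in for a grayscale image in [0, 1].

```python
import torch

from plenoptic.metric.perceptual_distance import normalized_laplacian_pyramid

img = torch.rand(1, 1, 256, 256)
activations = normalized_laplacian_pyramid(img)
print(len(activations))          # six scales
for scale, act in enumerate(activations):
    print(scale, act.shape)      # spatial size shrinks at coarser scales
```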
- plenoptic.metric.perceptual_distance.ssim(img1, img2, weighted=False, pad=False)[source]
Structural similarity index
As described in [1], the structural similarity index (SSIM) is a perceptual distance metric, giving the distance between two images. SSIM is based on three comparison measurements between the two images: luminance, contrast, and structure. All of these are computed convolutionally across the images. See the references for more information.
This implementation follows the original implementation, as found at [2], as well as providing the option to use the weighted version used in [4] (which was shown to consistently improve the image quality prediction on the LIVE database).
Note that this is a similarity metric (not a distance), and so 1 means the two images are identical and 0 means they’re very different. When the two images are negatively correlated, SSIM can be negative. SSIM is bounded between -1 and 1.
This function returns the mean SSIM, a scalar-valued metric giving the average over the whole image. For the SSIM map (showing the computed value across the image), call ssim_map.
- Parameters:
img1 (torch.Tensor of shape (batch, channel, height, width)) – The first image or batch of images.
img2 (torch.Tensor of shape (batch, channel, height, width)) – The second image or batch of images. The heights and widths of img1 and img2 must be the same. The numbers of batches and channels of img1 and img2 need to be broadcastable: either they are the same or one of them is 1. The output will be computed separately for each channel (so channels are treated in the same way as batches). Both images should have values between 0 and 1. Otherwise, the result may be inaccurate, and we will raise a warning (but will still compute it).
weighted (bool, optional) – whether to use the original, unweighted SSIM version (False), as used in [1], or the weighted version (True), as used in [4]. See the Notes section for the weight.
pad ({False, 'constant', 'reflect', 'replicate', 'circular'}, optional) – If not False, how to pad the image for the convolutions computing the local average of each image. See torch.nn.functional.pad for how these work.
- Returns:
mssim – 2d tensor of shape (batch, channel) containing the mean SSIM for each image, averaged over the whole image
- Return type:
torch.Tensor
Notes
The weight used when weighted=True is:
\[\log\left(\left(1+\frac{\sigma_1^2}{C_2}\right)\left(1+\frac{\sigma_2^2}{C_2}\right)\right)\]
where \(\sigma_1^2\) and \(\sigma_2^2\) are the variances of img1 and img2, respectively, and \(C_2\) is a constant. See [4] for more details.
References
[1] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error measurement to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 1, Jan. 2004.
[2] [matlab code](https://www.cns.nyu.edu/~lcv/ssim/ssim_index.m)
[3] [project page](https://www.cns.nyu.edu/~lcv/ssim/)
[4] Wang, Z., & Simoncelli, E. P. (2008). Maximum differentiation (MAD) competition: A methodology for comparing computational models of perceptual discriminability. Journal of Vision, 8(12), 1–13. https://dx.doi.org/10.1167/8.12.8
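A short usage sketch showing the unweighted and weighted variants; the random tensors stand in for real images in [0, 1].

```python
import torch

from plenoptic.metric.perceptual_distance import ssim

img1 = torch.rand(1, 1, 256, 256)
img2 = (img1 + 0.05 * torch.randn_like(img1)).clamp(0, 1)

print(ssim(img1, img2))                 # mean SSIM, shape (batch, channel)
print(ssim(img1, img2, weighted=True))  # weighted variant from [4]
print(ssim(img1, img1))                 # identical images give 1
```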
- plenoptic.metric.perceptual_distance.ssim_map(img1, img2)[source]
Structural similarity index map
As described in [1], the structural similarity index (SSIM) is a perceptual distance metric, giving the distance between two images. SSIM is based on three comparison measurements between the two images: luminance, contrast, and structure. All of these are computed convolutionally across the images. See the references for more information.
This implementation follows the original implementation, as found at [2], as well as providing the option to use the weighted version used in [4] (which was shown to consistently improve the image quality prediction on the LIVE database).
Note that this is a similarity metric (not a distance), and so 1 means the two images are identical and 0 means they’re very different. When the two images are negatively correlated, SSIM can be negative. SSIM is bounded between -1 and 1.
This function returns the SSIM map, showing the SSIM values across the image. For the mean SSIM (a single value metric), call ssim.
- Parameters:
img1 (torch.Tensor of shape (batch, channel, height, width)) – The first image or batch of images.
img2 (torch.Tensor of shape (batch, channel, height, width)) – The second image or batch of images. The heights and widths of img1 and img2 must be the same. The numbers of batches and channels of img1 and img2 need to be broadcastable: either they are the same or one of them is 1. The output will be computed separately for each channel (so channels are treated in the same way as batches). Both images should have values between 0 and 1. Otherwise, the result may be inaccurate, and we will raise a warning (but will still compute it).
weighted (bool, optional) – whether to use the original, unweighted SSIM version (False), as used in [1], or the weighted version (True), as used in [4]. See the Notes section of ssim for the weight.
- Returns:
ssim_map – 4d tensor containing the map of SSIM values.
- Return type:
torch.Tensor
References
[1] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error measurement to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 1, Jan. 2004.
[2] [matlab code](https://www.cns.nyu.edu/~lcv/ssim/ssim_index.m)
[3] [project page](https://www.cns.nyu.edu/~lcv/ssim/)
[4] Wang, Z., & Simoncelli, E. P. (2008). Maximum differentiation (MAD) competition: A methodology for comparing computational models of perceptual discriminability. Journal of Vision, 8(12), 1–13. https://dx.doi.org/10.1167/8.12.8
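A short sketch of getting the map itself; the random tensors are stand-ins, and the exact spatial size of the map depends on how the convolution handles the image borders.

```python
import torch

from plenoptic.metric.perceptual_distance import ssim_map

img1 = torch.rand(1, 1, 256, 256)
img2 = (img1 + 0.05 * torch.randn_like(img1)).clamp(0, 1)

smap = ssim_map(img1, img2)
print(smap.shape)              # 4d: (batch, channel, map height, map width)
print(smap.min(), smap.max())  # values lie between -1 and 1
```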