plenoptic.metric.ssim

plenoptic.metric.ssim(img1, img2, weighted=False, pad=False)

Compute the structural similarity index.

As described in Wang et al., 2004 [1], the structural similarity index (SSIM) is a perceptual similarity metric that measures how alike two images are. SSIM is based on three comparison measurements between the two images: luminance, contrast, and structure. All of these are computed convolutionally across the images. See the references for more information.
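To make the three comparisons concrete, here is a minimal sketch that computes them from whole-image statistics on flat lists of pixel values. It is a simplification, not this function's implementation: it uses a single window covering the whole image rather than a convolution, and the names `ssim_components` and `global_ssim` are hypothetical. The constants assume the values commonly used with SSIM (K1 = 0.01, K2 = 0.03, C3 = C2 / 2) and a dynamic range of 1.

```python
import math
from statistics import fmean


def ssim_components(x, y, dynamic_range=1.0):
    """Return (luminance, contrast, structure) from whole-image statistics."""
    # Stabilizing constants (assumed: K1 = 0.01, K2 = 0.03, C3 = C2 / 2).
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    c3 = c2 / 2

    mu_x, mu_y = fmean(x), fmean(y)
    var_x = fmean([(a - mu_x) ** 2 for a in x])
    var_y = fmean([(b - mu_y) ** 2 for b in y])
    cov = fmean([(a - mu_x) * (b - mu_y) for a, b in zip(x, y)])
    sd_x, sd_y = math.sqrt(var_x), math.sqrt(var_y)

    luminance = (2 * mu_x * mu_y + c1) / (mu_x**2 + mu_y**2 + c1)
    contrast = (2 * sd_x * sd_y + c2) / (var_x + var_y + c2)
    structure = (cov + c3) / (sd_x * sd_y + c3)
    return luminance, contrast, structure


def global_ssim(x, y, dynamic_range=1.0):
    """Product of the three comparisons (a single-window SSIM)."""
    luminance, contrast, structure = ssim_components(x, y, dynamic_range)
    return luminance * contrast * structure


pixels = [0.1, 0.4, 0.7, 0.9]
inverted = [1 - p for p in pixels]
print(global_ssim(pixels, pixels))    # identical inputs
print(global_ssim(pixels, inverted))  # negatively correlated inputs
```

In this sketch, identical inputs score 1 and contrast-inverted inputs score below 0, because only the structure term can change sign.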

This implementation follows the original implementation, available online [2], and also provides the option to use the weighted version from Wang and Simoncelli, 2008 [4], which was shown to consistently improve image quality prediction on the LIVE database. More information can be found online [3].

Note that this is a similarity metric (not a distance), and so 1 means the two images are identical and 0 means they’re very different. When the two images are negatively correlated, SSIM can be negative. SSIM is bounded between -1 and 1.

This function returns the mean SSIM, a scalar-valued metric giving the average over the whole image. For the SSIM map (showing the computed value across the image), call ssim_map.

Parameters:

img1 (Tensor) – The first image or batch of images, of shape (batch, channel, height, width).
img2 (Tensor) – The second image or batch of images, of shape (batch, channel, height, width). The heights and widths of img1 and img2 must be the same. The numbers of batches and channels of img1 and img2 must be broadcastable: either they are the same or one of them is 1. The output is computed separately for each channel (so channels are treated in the same way as batches). Both images should have values between 0 and 1; otherwise a warning is raised and the result, which is still computed, may be inaccurate.
weighted (bool (default: False)) – Whether to use the original, unweighted SSIM version (False) as used in [1] or the weighted version (True) as used in [4]. See Notes section for the weight.
pad (Literal[False, 'constant', 'reflect', 'replicate', 'circular'] (default: False)) – If not False, how to pad the image for the convolutions computing the local average of each image. See torch.nn.functional.pad for how these work.
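As a rough illustration of the four padding modes accepted by pad, the sketch below applies torch.nn.functional.pad to a tiny 3x3 image. The pad width of 1 is chosen only to keep the output readable; it is not the width this function uses internally.

```python
import torch
import torch.nn.functional as F

# A tiny 3x3 "image" in the (batch, channel, height, width) layout.
img = torch.arange(9.0).reshape(1, 1, 3, 3)

# Pad one pixel on every side with each mode (illustrative width only).
padded = {
    mode: F.pad(img, (1, 1, 1, 1), mode=mode)
    for mode in ("constant", "reflect", "replicate", "circular")
}

for mode, out in padded.items():
    # Every mode grows the image from 3x3 to 5x5; only the border values differ.
    print(mode, tuple(out.shape), "top-left:", out[0, 0, 0, 0].item())
```

The modes differ only in how the border is filled: 'constant' fills with zeros, 'reflect' mirrors the image without repeating the edge pixel, 'replicate' repeats the edge pixel, and 'circular' wraps around to the opposite side.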

Return type:

Tensor

Returns:

mssim – 2d tensor of shape (batch, channel) containing the mean SSIM for each image, averaged over the whole image.

Raises:

ValueError – If either img1 or img2 is not 4d.
ValueError – If img1 and img2 have different height or width.
ValueError – If img1 and img2 have different batch or channel, unless one of them has a 1 there, so they can be broadcast.
ValueError – If img1 and img2 have different dtypes.
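The shape rules above can be sketched as a small stand-alone check. The helper `expected_mssim_shape` is hypothetical and not part of plenoptic; it mirrors only the documented shape conditions (the dtype check is omitted) and returns the (batch, channel) shape the result would have.

```python
def expected_mssim_shape(shape1, shape2):
    """Hypothetical helper: return the (batch, channel) output shape.

    Raises ValueError under the documented shape conditions.
    """
    shape1, shape2 = tuple(shape1), tuple(shape2)
    if len(shape1) != 4 or len(shape2) != 4:
        raise ValueError("img1 and img2 must both be 4d")
    if shape1[2:] != shape2[2:]:
        raise ValueError("img1 and img2 must have the same height and width")

    out = []
    for name, n1, n2 in zip(("batch", "channel"), shape1[:2], shape2[:2]):
        # Broadcastable means: equal sizes, or one of the two sizes is 1.
        if n1 != n2 and 1 not in (n1, n2):
            raise ValueError(f"{name} sizes {n1} and {n2} cannot be broadcast")
        out.append(max(n1, n2))
    return tuple(out)


# A batch of 4 images compared against a single reference image.
print(expected_mssim_shape((4, 1, 64, 64), (1, 1, 64, 64)))
```

Comparing a batch against a single reference image is the typical use of broadcasting here: the reference is reused for every image in the batch.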

Warns:

UserWarning – If either img1 or img2 has multiple channels, as SSIM was designed for grayscale images.
UserWarning – If at least one scale from either img1 or img2 has height or width of less than 11, since SSIM uses an 11x11 convolutional kernel.

Notes

The weight used when weighted=True is:

\[\log((1+\frac{\sigma_1^2}{C_2})(1+\frac{\sigma_2^2}{C_2}))\]

where \(\sigma_1^2\) and \(\sigma_2^2\) are the variances of img1 and img2, respectively, and \(C_2\) is a constant. See [4] for more details.
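As a worked illustration of the weight, the sketch below evaluates the formula for a flat patch and a textured patch, then pools two local SSIM values with those weights. The value C2 = 0.03² (a dynamic range of 1) and the normalized weighted average are assumptions made for this example, and the names `ssim_weight` and `weighted_mean` are hypothetical.

```python
import math

# Assumed constant: K2 = 0.03 with a dynamic range of 1.
C2 = 0.03**2


def ssim_weight(var1, var2, c2=C2):
    """Weight from the formula above, given the two local variances."""
    return math.log((1 + var1 / c2) * (1 + var2 / c2))


def weighted_mean(values, weights):
    """Normalized weighted average (assumed pooling rule)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


flat = ssim_weight(0.0, 0.0)        # no variance in either patch
textured = ssim_weight(0.01, 0.01)  # both patches have visible structure
print(flat, textured)

# Two local SSIM values: the flat patch matches well, the textured one poorly.
local_ssim = [1.0, 0.2]
print(weighted_mean(local_ssim, [flat, textured]))
```

A flat patch gets a weight of 0, so under this pooling rule regions with more local variance dominate the average. If every weight is 0, the average is undefined and this sketch raises ZeroDivisionError.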

References

Examples

>>> import plenoptic as po
>>> import torch
>>> po.set_seed(0)
>>> img = po.data.einstein()
>>> po.metric.ssim(img, img + torch.rand_like(img))
tensor([[0.0519]])