plenoptic.metric.ms_ssim

plenoptic.metric.ms_ssim(img1, img2, power_factors=None)

Multiscale structural similarity index (MS-SSIM).

As described in Wang et al., 2003 [9], the multiscale structural similarity index (MS-SSIM) improves on the structural similarity index (SSIM) by taking into account the perceptual distance between two images at different scales.

SSIM is based on three comparison measurements between the two images: luminance, contrast, and structure. All of these are computed convolutionally across the images, producing three maps instead of scalars. The SSIM map is the elementwise product of these three maps. See ssim and ssim_map for a full description of SSIM.

To obtain images at different scales, average pooling with kernel size 2 is applied recursively to the input images. The product of the contrast map and the structure map (the “contrast-structure map”) is computed for every scale except the coarsest, and the full SSIM map is computed only for the coarsest scale. The mean values of these maps are raised to exponents and multiplied to produce MS-SSIM:

\[MSSSIM = {SSIM}_M^{a_M} \prod_{i=1}^{M-1} ({CS}_i)^{a_i}\]

Here \(M\) is the number of scales, \({CS}_i\) is the mean value of the contrast-structure map for the \(i\)-th finest scale, and \({SSIM}_M\) is the mean value of the SSIM map for the coarsest scale. If at least one of these terms is negative, the value of MS-SSIM is zero. The values of \(a_i, i=1,...,M\) are taken from the argument power_factors.
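The combination step described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the helper name `combine_scales` and its arguments are hypothetical, and it assumes the per-scale mean values have already been computed as floats.

```python
import math

# Default exponents listed in this documentation (fine to coarse).
DEFAULT_POWER_FACTORS = [0.0448, 0.2856, 0.3001, 0.2363, 0.1333]


def combine_scales(cs_means, ssim_coarsest, power_factors=None):
    """Hypothetical helper: combine per-scale means into one MS-SSIM value.

    cs_means: mean contrast-structure values for the M-1 finest scales.
    ssim_coarsest: mean SSIM value for the coarsest scale.
    """
    if power_factors is None:
        power_factors = DEFAULT_POWER_FACTORS
    terms = list(cs_means) + [ssim_coarsest]
    if len(terms) != len(power_factors):
        raise ValueError("need exactly one exponent per scale")
    # As stated above, any negative term makes the result zero.
    if any(t < 0 for t in terms):
        return 0.0
    # Raise each term to its exponent and multiply.
    return math.prod(t ** a for t, a in zip(terms, power_factors))


identical = combine_scales([1.0, 1.0, 1.0, 1.0], 1.0)        # all terms are 1
with_negative = combine_scales([0.9, -0.1, 0.8, 0.7], 0.6)  # one negative term
```

In this sketch, the number of scales follows from the length of `power_factors`, which matches how the argument is described below.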

Parameters:
  • img1 (Tensor) – The first image or batch of images, of shape (batch, channel, height, width).

  • img2 (Tensor) – The second image or batch of images, of shape (batch, channel, height, width). The heights and widths of img1 and img2 must be the same. The numbers of batches and channels of img1 and img2 need to be broadcastable: either they are the same or one of them is 1. The output will be computed separately for each channel (so channels are treated in the same way as batches). Both images should have values between 0 and 1; otherwise the result may be inaccurate, and a warning is raised (the result is still computed).

  • power_factors (Tensor | None (default: None)) – Power exponents for the mean values of maps, for different scales (from fine to coarse). The length of this tensor determines the number of scales. If None, set to [0.0448, 0.2856, 0.3001, 0.2363, 0.1333], the values found by the psychophysical experiments in Wang et al., 2003 [9].

Return type:

Tensor

Returns:

msssim – 2d tensor of shape (batch, channel) containing the MS-SSIM for each image.

Raises:
  • ValueError – If either img1 or img2 is not 4d.

  • ValueError – If img1 and img2 have different height or width.

  • ValueError – If img1 and img2 have different batch or channel sizes, unless one of them has size 1 in that dimension, so that they can be broadcast.

  • ValueError – If img1 and img2 have different dtypes.
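The shape rules behind the errors listed above can be sketched as follows. The helper `check_shapes` is hypothetical and works on plain shape tuples; it only illustrates the rules as documented here and is not the library's own validation code. The dtype check is left out.

```python
def check_shapes(shape1, shape2):
    """Hypothetical helper mirroring the documented shape rules.

    Returns the (batch, channel) shape the output would have.
    """
    if len(shape1) != 4 or len(shape2) != 4:
        raise ValueError("both images must be 4d: (batch, channel, height, width)")
    if tuple(shape1[2:]) != tuple(shape2[2:]):
        raise ValueError("heights and widths must match")
    out = []
    for name, n1, n2 in zip(("batch", "channel"), shape1[:2], shape2[:2]):
        # Broadcastable means the sizes are equal, or one of them is 1.
        if n1 != n2 and 1 not in (n1, n2):
            raise ValueError(f"{name} sizes {n1} and {n2} cannot be broadcast")
        out.append(max(n1, n2))
    return tuple(out)


# A single image compared against a batch of four images.
out_shape = check_shapes((1, 1, 256, 256), (4, 1, 256, 256))
```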

Warns:
  • UserWarning – If either img1 or img2 has multiple channels, as MS-SSIM was designed for grayscale images.

  • UserWarning – If at least one scale from either img1 or img2 has height or width of less than 11, since SSIM uses an 11x11 convolutional kernel.
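To see when the size warning above might apply, the image size at each scale can be estimated ahead of time. The sketch below is a rough illustration with hypothetical helper names. It assumes each pooling step halves the height and width with floor division; the library may treat odd sizes differently.

```python
def scale_sizes(height, width, n_scales=5):
    """Hypothetical helper: (height, width) at each scale, fine to coarse.

    Assumes every pooling step halves each dimension with floor division.
    """
    sizes = []
    for _ in range(n_scales):
        sizes.append((height, width))
        height, width = height // 2, width // 2
    return sizes


def too_small(height, width, n_scales=5, kernel=11):
    """Return the scale sizes that are smaller than the SSIM kernel."""
    return [s for s in scale_sizes(height, width, n_scales) if min(s) < kernel]


sizes_256 = scale_sizes(256, 256)  # five scales, from 256x256 down to 16x16
small_128 = too_small(128, 128)    # the coarsest scale is 8x8
```

Under this assumption and with the default five scales, a 256x256 image stays at or above 11 pixels at every scale, while a 128x128 image drops to 8x8 at the coarsest scale.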

References

Examples

>>> import plenoptic as po
>>> import torch
>>> po.set_seed(0)
>>> img = po.data.einstein()
>>> po.metric.ms_ssim(img, img + torch.rand_like(img))
tensor([[0.4684]])