plenoptic.metric.ms_ssim
- plenoptic.metric.ms_ssim(img1, img2, power_factors=None)
Multiscale structural similarity index (MS-SSIM).
As described in Wang et al., 2003 [9], the multiscale structural similarity index (MS-SSIM) improves on the structural similarity index (SSIM) by taking into account the perceptual distance between two images at different scales.
SSIM is based on three comparison measurements between the two images: luminance, contrast, and structure. All of these are computed convolutionally across the images, producing three maps instead of scalars. The SSIM map is the elementwise product of these three maps. See ssim and ssim_map for a full description of SSIM.
To get images of different scales, average pooling operations with kernel size 2 are performed recursively on the input images. The product of the contrast map and the structure map (the “contrast-structure map”) is computed for all but the coarsest scale, and the overall SSIM map is computed only for the coarsest scale. Their mean values are raised to exponents and multiplied to produce MS-SSIM:
\[MSSSIM = {SSIM}_M^{a_M} \prod_{i=1}^{M-1} ({CS}_i)^{a_i}\]
Here \(M\) is the number of scales, \({CS}_i\) is the mean value of the contrast-structure map for the \(i\)-th finest scale, and \({SSIM}_M\) is the mean value of the SSIM map for the coarsest scale. If at least one of these terms is negative, the value of MS-SSIM is zero. The values of \(a_i, i=1,...,M\) are taken from the argument power_factors.
- Parameters:
img1 (Tensor) – The first image or batch of images, of shape (batch, channel, height, width).
img2 (Tensor) – The second image or batch of images, of shape (batch, channel, height, width). The heights and widths of img1 and img2 must be the same. The numbers of batches and channels of img1 and img2 need to be broadcastable: either they are the same or one of them is 1. The output will be computed separately for each channel (so channels are treated in the same way as batches). Both images should have values between 0 and 1. Otherwise, the result may be inaccurate, and we will raise a warning (but will still compute it).
power_factors (Tensor | None (default: None)) – Power exponents for the mean values of maps, for different scales (from fine to coarse). The length of this array determines the number of scales. If None, set to [0.0448, 0.2856, 0.3001, 0.2363, 0.1333], the values found by the psychophysical experiments in Wang et al., 2003 [9].
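To make the role of power_factors concrete, here is a minimal sketch of how the per-scale mean values could be combined according to the formula above. It is not plenoptic's implementation: combine_ms_ssim is a hypothetical helper, and it assumes the mean values have already been computed and are passed in as plain floats.

```python
import math

# Default exponents quoted in the power_factors description (fine to coarse).
DEFAULT_POWER_FACTORS = [0.0448, 0.2856, 0.3001, 0.2363, 0.1333]


def combine_ms_ssim(cs_means, ssim_mean, power_factors=None):
    """Hypothetical helper: combine per-scale means into one MS-SSIM value.

    cs_means: mean contrast-structure values for the M-1 finest scales.
    ssim_mean: mean SSIM value for the coarsest scale.
    """
    if power_factors is None:
        power_factors = DEFAULT_POWER_FACTORS
    terms = list(cs_means) + [ssim_mean]
    if len(terms) != len(power_factors):
        raise ValueError("need exactly one term per power factor")
    # A negative term makes the result zero, as the description states.
    if any(t < 0 for t in terms):
        return 0.0
    # Raise each term to its exponent and multiply the results.
    return math.prod(t ** a for t, a in zip(terms, power_factors))
```

With identical images every term equals 1, so combine_ms_ssim([1.0] * 4, 1.0) returns 1.0 regardless of the exponents.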
- Return type:
Tensor
- Returns:
msssim – 2d tensor of shape (batch, channel) containing the MS-SSIM for each image.
- Raises:
ValueError – If either img1 or img2 is not 4d.
ValueError – If img1 and img2 have different height or width.
ValueError – If img1 and img2 have different batch or channel sizes, unless one of them has size 1 there, so they can be broadcast.
ValueError – If img1 and img2 have different dtypes.
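The shape rules above can be sketched as a small check on shape tuples. This is an illustration only, not the library's validation code: broadcast_batch_channel is a hypothetical helper, it works on plain tuples rather than tensors, and it leaves out the dtype check.

```python
def broadcast_batch_channel(shape1, shape2):
    """Hypothetical helper: return the (batch, channel) shape of the output.

    Both arguments are (batch, channel, height, width) tuples.
    """
    if len(shape1) != 4 or len(shape2) != 4:
        raise ValueError("both images must be 4d")
    if tuple(shape1[2:]) != tuple(shape2[2:]):
        raise ValueError("images must have the same height and width")
    out = []
    for d1, d2 in zip(shape1[:2], shape2[:2]):
        # Sizes are compatible if they match or if one of them is 1.
        if d1 != d2 and 1 not in (d1, d2):
            raise ValueError("batch and channel sizes must match or be 1")
        out.append(max(d1, d2))
    return tuple(out)
```

For example, comparing a batch of four grayscale images with a single reference image, broadcast_batch_channel((4, 1, 256, 256), (1, 1, 256, 256)) gives (4, 1), which matches the (batch, channel) shape of the returned tensor.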
- Warns:
UserWarning – If either img1 or img2 has multiple channels, as MS-SSIM was designed for grayscale images.
UserWarning – If at least one scale from either img1 or img2 has height or width of less than 11, since SSIM uses an 11x11 convolutional kernel.
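The size warning follows from the recursive pooling: each scale has half the height and width of the previous one. The sketch below estimates the size at every scale and checks it against the 11x11 kernel. Both functions are hypothetical helpers, and the sketch assumes that pooling with kernel size 2 rounds odd sizes down, which the description above does not state.

```python
def scale_sizes(height, width, num_scales=5):
    """Hypothetical helper: (height, width) at each scale, fine to coarse."""
    sizes = []
    for _ in range(num_scales):
        sizes.append((height, width))
        # Assumption: kernel size 2 halves each side, rounding down.
        height, width = height // 2, width // 2
    return sizes


def has_small_scale(height, width, num_scales=5, kernel_size=11):
    """Return True if any scale is smaller than the SSIM kernel."""
    return any(
        min(h, w) < kernel_size
        for h, w in scale_sizes(height, width, num_scales)
    )
```

Under these assumptions, a 256x256 image with the default five scales shrinks to 16x16 at the coarsest scale and stays above the kernel size, while a 128x128 image reaches 8x8 and would fall below it.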
References
Examples
>>> import plenoptic as po
>>> import torch
>>> po.set_seed(0)
>>> img = po.data.einstein()
>>> po.metric.ms_ssim(img, img + torch.rand_like(img))
tensor([[0.4684]])