In this and the previous article I discuss how Modulation Transfer Functions (MTF) obtained from every raw color plane of a Bayer CFA in isolation can be combined to provide a meaningful composite MTF curve for the imaging system as a whole. There are two main ways to accomplish this objective:
- an input-referred linear Hardware System MTF (L) that reflects the mix of spectral information captured in the raw data, divorced from downstream color science; and
- an output-referred linear Luminance System MTF (Y) that reflects the luminance channel of the image as neutrally displayed.
Both are valid on their own, though the weights of the former are fixed for any Bayer sensor while the latter are scene, camera/lens and illuminant dependent. For this reason I usually prefer input-referred weights as a first pass when comparing cameras and lens hardware in similar conditions.
3 Color Planes Yield 3 MTF curves
Each raw color plane of a Bayer sensor will stand on its own and produce ‘grayscale’ images with linear spatial characteristics that depend on the spectral properties of light from the scene after it has gone through the lens and the filter it is sitting under, as earlier discussed.
Even in the center of well corrected lenses there are usually substantial differences in the spatial frequency response of the three planes, in part because of the different wavelengths that each lets through but also because of axial color and spherical aberrations, which often result in some color planes being better focused than others. The red color plane in consumer prime lenses around their sharpest working apertures is usually quite defocused compared to the green and blue ones.
The three color planes are effectively normalized to the same neutral intensity by white balance without affecting the relative MTF curves, which are themselves equivalently normalized to one at the origin.

So once we have obtained the MTFs off the three color planes making up the raw image by the slanted edge or other method, how can these be combined to estimate the performance of the system as a whole?
Hardware System MTF: L
We know from Dubois’ paper discussed in the preceding post[3] that when a monochromatic raw capture of a neutral subject is white balanced the subsampled chromatic components should ideally disappear, leaving us with just the full resolution baseband image (L) in the frequency domain. In that case L is related to the 2D Fourier transforms of the individual, white balanced, raw color plane images (denoted by hats below) by the following relationship

$$\hat{L} = \tfrac{1}{4}\,\hat{r} + \tfrac{1}{2}\,\hat{g} + \tfrac{1}{4}\,\hat{b} \qquad (1)$$
This is just a representation of the full resolution, neutral, white-balanced raw CFA image (L) in the frequency domain. Since we are assuming a linear system, the same relationship of course also applies to the relative MTF curves in Figure 1: simply add them up element-by-element after multiplying them by the shown weights.
The following plot drives the point home. It shows system MTF obtained three ways from a white-balanced Bayer CFA raw image of a neutral slanted edge:
- the blue curve is the MTF measured from the fully populated raw Bayer CFA image as in Figure 1, after white-balancing;
- the open circles are the MTF curve resulting from the white-balanced, perfectly demosaiced image L, with every pixel equal to the weighted sum of the intensity of each color channel in the proportions of Equation (1);
- the x’s represent the weighted sum of the MTF curves of each subsampled raw color plane individually, in the same proportions as in Equation (1).

They are practically the same (other than for some noise at higher frequencies introduced by the highly defocused red channel MTF messing up the other two).
So MTF curves obtained from the three separate raw color planes in isolation can be added in the following proportions to produce an accurate System MTF for the hardware as a whole:
$$MTF_L = \tfrac{1}{4}\,MTF_r + \tfrac{1}{2}\,MTF_g + \tfrac{1}{4}\,MTF_b \qquad (2)$$

where L stands for linear luma, a hardware proxy for luminance before involving color science. This System MTF curve represents the input-referred spatial frequency capabilities of the hardware. We can use this aggregate System MTF to derive sharpness metrics like MTF50, SQF or Acutance, and it can legitimately be used to compare the performance of different cameras and lenses in similar conditions.
This result is not surprising if looked at from an Information Science perspective: it indeed represents a correct proportion of the spatial information collected in the raw data of a Bayer CFA sensor. Superposition in the spatial domain would also suggest this result.
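For instance, once the three per-plane MTF curves are sampled on a common frequency axis, Equation (2) reduces to a weighted element-by-element sum. Below is a minimal sketch in Python (the article's own plots were produced with the Matlab code of note 6); the array names and the MTF50 helper are illustrative assumptions, not the original code.

```python
import numpy as np

def hardware_system_mtf(mtf_r, mtf_g, mtf_b):
    """Equation (2): input-referred linear luma MTF with the fixed Bayer weights 1/4, 1/2, 1/4."""
    return 0.25 * np.asarray(mtf_r) + 0.5 * np.asarray(mtf_g) + 0.25 * np.asarray(mtf_b)

def mtf50(freq, mtf):
    """Frequency at which the MTF curve first drops below 0.5 (linear interpolation)."""
    freq, mtf = np.asarray(freq), np.asarray(mtf)
    below = np.nonzero(mtf < 0.5)[0]
    if below.size == 0:
        return np.nan                      # never drops below 0.5 in the measured range
    i = below[0]
    if i == 0:
        return freq[0]
    # interpolate between the two samples that bracket 0.5
    return freq[i - 1] + (0.5 - mtf[i - 1]) * (freq[i] - freq[i - 1]) / (mtf[i] - mtf[i - 1])
```

The same element-by-element mix, with different weights, is reused for the Luminance System MTF below.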
Luminance System MTF: Y
The Human Visual System, however, does not perceive the intensity of the colors in the three color planes equally. In addition we know from the psychologists who study color science that humans are at least four times more sensitive to luminance than to chrominance changes in an image. The previous article and Hunt’s model provide some intuition for why that is. So a more perceptual System MTF can be derived, based on the luminance of the color image as perceived by a typical observer.
Luminance is simply input spectral radiance times the Luminosity Function, which not coincidentally happens to be the same as the ȳ(λ) curve of the 2 degree CIE Standard Observer Color Matching Functions, responsible for producing the Y channel in the standard XYZ connection color space. In other words Y represents absolute Luminance in cd/m² if all the units are carried through. Otherwise it is relative Luminance, which is equally useful in imaging.
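As a quick illustration of how the units carry through, here is a sketch that computes absolute and relative Luminance from a sampled spectral radiance; the array names and the sampling grid are assumptions, and the CIE ȳ data has to be supplied separately.

```python
import numpy as np

# radiance: spectral radiance in W / (m^2 sr nm); ybar: the CIE 1931 2-degree y-bar
# color matching function sampled at the same wavelengths (e.g. 380-780 nm).

def luminance_cd_m2(wavelengths_nm, radiance, ybar):
    # 683 lm/W is the luminous efficacy constant that carries the units to cd/m^2
    return 683.0 * np.trapz(radiance * ybar, wavelengths_nm)

def relative_luminance(wavelengths_nm, radiance, ybar, white_radiance):
    # same integral, normalized to a reference white: relative Luminance in [0, 1]
    return np.trapz(radiance * ybar, wavelengths_nm) / np.trapz(white_radiance * ybar, wavelengths_nm)
```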
When an image is captured in the raw data with a Bayer sensor, every converter first has to project it to a connection colorimetric color space (XYZ), where its coordinates are ideally coincident with those perceived by a typical viewer observing the scene directly.
The projection of raw data to XYZ occurs by applying a Compromise Color Matrix, specifically derived for the given camera/lens, scene and illuminant. For DPReview’s Nikon D610 camera with 85mm f/1.8G lens capturing a ColorChecker 24 target under D65 lighting such a Forward Matrix looks like this:
Refer to the linked article for how to obtain it and its meaning. Its middle row produces the Y channel when multiplied by the white balanced raw data: an estimate of Luminance from the scene as captured by a Bayer sensor, without the additional confounding processing that is part of rendering an RGB image. It is the most faithful representation of the grayscale image that a neutral raw converter would and should produce. Therefore for this setup

$$Y = m_r\, r_{wb} + m_g\, g_{wb} + m_b\, b_{wb} \qquad (3)$$

with m_r, m_g and m_b the coefficients in the middle row of the Forward Matrix above.
It is this scene, illuminant and camera/lens dependent weighted mix of spatial frequencies that comes closest to the Luminance that was present at the scene and thus to what would have been perceived by a typical observer there.
Therefore, to obtain a Luminance System MTF representative of the image as displayed by a truly neutral raw converter, MTF curves from the individual raw planes should be mixed with those same weights for this kit and setup:

$$MTF_Y = m_r\, MTF_r + m_g\, MTF_g + m_b\, MTF_b \qquad (4)$$
Contrary to the input-referred Hardware System MTF, this set of weights is camera, scene and illuminant dependent. Below you can see that MTF curves from the individual color planes added accordingly (x’s) are equivalent to measuring MTF directly off the Y channel of a neutrally rendered image (o’s).

Note the negative coefficient in the blue channel, meaning that about 30% of the MTF curve obtained in the blue plane is subtracted element-by-element from the green one plus 1/4 of the red MTF curve. This is correct, as color science presumes a linear system. What does this tell us about the desirability of a very ‘sharp’ blue channel?
In case you are wondering, white balance multipliers do not enter the equation because MTF is normalized to one at the origin by definition so, ignoring noise, the MTF curves of individual color planes are independent of the absolute intensity of the target. In other words, the spectral energy is mixed presuming the color planes are intensity equalized, which is the job of white balance. Forward Matrices presume the same so their coefficients are relevant as-is.
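Putting Equation (4) and the white balance remark together, a minimal sketch of the output-referred combination could look as follows; the forward_matrix argument is a placeholder for a setup-specific 3×3 Forward Matrix like the one discussed above, not the actual D610 values.

```python
import numpy as np

def luminance_system_mtf(mtf_r, mtf_g, mtf_b, forward_matrix):
    """Equation (4): weight the per-plane MTFs by the middle (Y) row of the Forward Matrix."""
    m_r, m_g, m_b = forward_matrix[1]          # middle row produces the Y channel
    return m_r * np.asarray(mtf_r) + m_g * np.asarray(mtf_g) + m_b * np.asarray(mtf_b)

# White balance multipliers appear nowhere above: each per-plane MTF is already
# normalized to 1 at the origin, so scaling a plane's intensity changes nothing,
# exactly as discussed in the text.
```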
Conclusion: 3 Color Planes, 2 System MTFs
Therefore with a few simplifying assumptions we come up with the answer to the question posed a couple of posts ago: as far as the measurement of linear spatial resolution off a white balanced raw capture by a Bayer CFA sensor of a neutral slanted edge under a uniform illuminant is concerned, MTF curves obtained from the three separate raw color planes in isolation can be added in the above proportions to produce an accurate MTF for the imaging system as a whole, taking into consideration either the hardware only (Hardware System MTF) or also the sensitivity of a typical observer (Luminance System MTF).
The former’s weights are the same for all Bayer sensors independently of scene, equipment and illuminant, though comparisons should still be made under similar conditions; the latter’s are not, so coefficients need to be derived for every specific setup. For lens sharpness comparisons I prefer to just use the green channel, which dominates and is typically the one ‘in-focus’. In the next article I discuss how to use the other two to evaluate axial aberrations.
Appendix I: Wrong Luminance
In a fully linear imaging system Luminance[5] from the scene in cd/m² is supposed to be proportional to the Luminance in cd/m² striking the eyes of the viewer of the photograph. In the days of black and white photography and TV that was self-evident. When color came of age and captured images started to be produced in standard (colorimetric) output RGB color spaces, different formulas were developed to attempt to recover the original luminance channel Y when unavailable:
Not surprisingly the formulas above are the middle row of the relative RGB → XYZ linear matrix that converts RGB values from the given output colorimetric color space to XYZ, where Y is supposed to be a proxy for Luminance. Depending on the application and the relative color space, variations on Y take names like luminance, luminosity, luma, lightness, brightness, value etc.
The various coefficients try to estimate from data in a standard output RGB color space the original full resolution grayscale intensity proportional to Luminance from the scene. They attempt to reflect the fact that we perceive red and green as substantially brighter than blue, that the phosphors in analog TVs were close to certain primaries while the primaries of modern LCD/LED panels can be close to others, and that the response of older output devices was nonlinear and required gamma correction (the first equation above, Y') while today’s may not (the other three, linear Y).
Their goal is to approximate a grayscale image similar to what a typical observer would perceive from calibrated black and white monitors. However, collected data is usually subjected to often non-linear processing before landing in such output color spaces, contaminating derived results.
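The specific formulas are not reproduced above, but two well known members of that family give the flavor; whether these were exactly the ones originally shown is an assumption on my part.

```python
def rec709_relative_luminance(r_lin, g_lin, b_lin):
    # linear-light combination used as a Luminance proxy with Rec. 709 / sRGB primaries
    return 0.2126 * r_lin + 0.7152 * g_lin + 0.0722 * b_lin

def rec601_luma(r_prime, g_prime, b_prime):
    # combination of gamma-encoded values (the 'first equation above' kind), strictly luma Y'
    return 0.299 * r_prime + 0.587 * g_prime + 0.114 * b_prime
```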
Appendix II: What does L Look Like?
Fortunately for our purposes[1] we don’t need to determine a setup-dependent Equation (3) because we do in fact have a signal proportional to the original full resolution luminance information: it is the white balanced, grayscale, baseband image in the otherwise pristinely linear raw data, proportional to scene Luminance.
Once the neutral edge under uniform illumination is captured and the relative raw data is white balanced, each of the four raw channels (r, g1, g2, b) is effectively gray and can be a proxy for luminance: twice the luminance, twice the mean values; half the luminance, half the mean values; and so on all across the edge profile.
And because the system is supposed to be linear, that proportionality should hold all the way to the output color space. If a pixel – any pixel, independently of its color heritage – has an intensity value of 1/10th of full scale in the white balanced raw data, it should ideally have an intensity of 1/10th of full scale in sRGB before gamma is applied. Ideally zero maps to zero, 100% maps to 100% and all other neutral values in between fall into place linearly. If a hueless subject is white balanced in the raw data with r=g1=g2=b it should be neutral in sRGB with R=G=B, and vice versa. Recall that the linear colorimetric RGB color image and the raw CFA image are supposed to share the same luma channel [4].
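A tiny sketch of that linearity claim, with the standard sRGB transfer function applied only at the very end; the 1/10th of full scale figure is just the example from the text.

```python
def srgb_encode(v_linear):
    # IEC 61966-2-1 sRGB transfer function, the only nonlinear step in an ideal pipeline
    if v_linear <= 0.0031308:
        return 12.92 * v_linear
    return 1.055 * v_linear ** (1.0 / 2.4) - 0.055

raw_wb = 0.10                  # a neutral pixel at 1/10th of full scale in white balanced raw
srgb_linear = raw_wb           # ideally unchanged through the linear color transforms
srgb_encoded = srgb_encode(srgb_linear)   # nonlinearity applied only at display encoding
```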
Of course pixels sitting under the r, g or b Bayer CFA filters will have different characteristics because they are the result of physically different wavelengths and quantities of photons. These differences only mean different noise and spatial properties in the three planes. We do not care about different noise because within these limits the slanted edge method is relatively insensitive to it; and the different spatial properties are what we wish to measure.
WB Raw = A Grayscale Image
So the white-balanced, undemosaiced raw data mosaic represents an accurate grayscale image for our purposes[1]. It can be a proxy for Y, what the top four formulas above are trying to reengineer back-to-front from output RGB data:

$$Y \;\propto\; L = \mathrm{CFA}_{wb} \qquad (5)$$

with CFA_wb the white balanced raw data under the four CFA color channels (r_wb, g1, g2, b_wb) as laid out on the sensor. In this achromatic context we already have a version of Y in the raw data for free, just as Dubois and Alleysson explained[3][4]. Here is one example of full resolution D610 CFA raw data displayed as-is, after white balancing and nothing else:

It’s part of the B&W back cover of a magazine, illuminated with a halogen light and white balanced off the forehead. There was also some light coming in from outdoors.
It was captured slightly out of focus on purpose because moiré off the fabric and the printing process was otherwise drawing lines all over the page (aliasing = sharp). The objective is to show the untouched grayscale channel in the raw data. There is some slight pixelation locally where the white balance begins to be less representative of neutral; for instance the magazine paper and the white paper it is resting on have different white points.
Of course the grayscale image breaks down in the upside-down ColorChecker color patches, which are not neutral hence outside the perimeter of this article[1]. To drive the point home here is another example of the grayscale image in the untouched raw data, this time a Sony a7II ISO200 capture of DPReview.com’s studio scene, just the CFA white balanced intensities as they were on the sensor:

In the neutral areas of the scene the full resolution CFA raw data is representative of luminance as-is. In the color areas it is not, hence the pixelation, but that does not concern us here. The pixelation of course disappears if the image is downsized (as happens if you don’t click on the image above), producing an intensity equal to the luma L of Equation (1).
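For the record, a sketch of this kind of processing, assuming an RGGB layout and known white balance multipliers (both assumptions); averaging each 2×2 quad afterwards yields exactly the 1/4 r + 1/2 g + 1/4 b luma mix of Equation (1).

```python
import numpy as np

def white_balance_rggb(cfa, wb_r, wb_b):
    # scale the r and b sites in place, leaving the two g sites as the reference
    out = cfa.astype(float).copy()
    out[0::2, 0::2] *= wb_r           # r sites (RGGB layout assumed)
    out[1::2, 1::2] *= wb_b           # b sites
    return out

def downsize_to_luma(cfa_wb):
    # average each 2x2 quad: (r + g1 + g2 + b) / 4 = 1/4 r + 1/2 g + 1/4 b
    h, w = cfa_wb.shape
    quads = cfa_wb[:h // 2 * 2, :w // 2 * 2].reshape(h // 2, 2, w // 2, 2)
    return quads.mean(axis=(1, 3))
```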
A more in-depth take on the physical meaning of the input-referred grayscale image L vs colorimetric Y can be found in the article on the effect of a Bayer CFA on sharpness.
Appendix III: Slanted Edges Make Things Simple
The capture of a neutral slanted edge in the raw data with good technique can provide an Edge Spread Function (ESF), the differential of which is its Line Spread Function (LSF), the normalized modulus of the Fourier transform of which is a good approximation to the Modulation Transfer Function (MTF) of the imaging system.
The resulting MTF curve refers to the linear spatial frequency response of the imaging system near the center of the edge in the direction perpendicular to it. As a consequence of its slant, the edge is oversampled, avoiding aliasing effects and yielding a number of useful properties (see this dedicated article for a more in-depth description of the method).
For example, because of super-resolution, pixel spacing is not critical in order to obtain a good edge outline, assuming edge length and angle are appropriately chosen. Within limits, an accurate edge spread function could just as easily be obtained if only every other raw pixel were present in the mosaic, as is the case for the separate color planes shown in Figure 2 below.
Nor is noise a major issue in typical testing conditions, because with tens or hundreds of pixels being projected every small interval onto the edge normal, the edge’s underlying intensity can be accurately estimated by regression, eliminating much of the noise in the ESF.
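For orientation, here is a very condensed sketch of the core computation described in this appendix, with edge angle estimation, windowing and the usual corrections omitted; the function and parameter names are mine, and the plane is assumed to carry NaN at sites missing from the mosaic.

```python
import numpy as np

def slanted_edge_mtf(plane, normal_angle_rad, bins_per_pixel=4):
    """ESF -> LSF -> normalized |FFT|, on a possibly sparse color plane (NaN = missing site)."""
    ys, xs = np.nonzero(~np.isnan(plane))
    values = plane[ys, xs]
    # project each pixel center onto the edge normal (angle of the normal is assumed known)
    distance = xs * np.cos(normal_angle_rad) + ys * np.sin(normal_angle_rad)
    # bin the projected samples into an oversampled edge spread function
    bins = np.round((distance - distance.min()) * bins_per_pixel).astype(int)
    counts = np.bincount(bins)
    esf = np.bincount(bins, weights=values) / np.maximum(counts, 1)
    lsf = np.gradient(esf)                         # line spread function
    mtf = np.abs(np.fft.rfft(lsf))
    mtf /= mtf[0]                                  # normalize to 1 at the origin
    freq = np.fft.rfftfreq(lsf.size, d=1.0 / bins_per_pixel)   # cycles per pixel
    return freq, mtf
```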
And recall that what matters is the relative intensity of the edge because MTF does not care about absolute intensity, since the curves are normalized to 1 at the origin. So white balancing the raw data does not affect the resulting MTFs.

Therefore, within limits, it makes no difference to the ESF whether it is the result of a fully populated monochrome sensor as the one shown above right, or a sparsely populated one such as those found in the single raw color planes of a Bayer CFA sensor below.

As far as the method is concerned, they can be considered as full and separate images, one for each color plane. The resulting MTFs will be practically as good as those produced by a 3-way beam splitter projecting the scene onto three fully populated monochrome sensors behind sheet color filters with characteristics equivalent to the filters in the CFA. This avoids having to demosaic the raw data, which is useful when evaluating hardware because it eliminates an additional confounding variable.
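In practice the three plane images can be carved out of the mosaic simply by masking; a sketch, again assuming an RGGB layout, that pairs with the slanted edge routine above.

```python
import numpy as np

def extract_plane(cfa, color):
    # keep the sites of one color, mark everything else as missing (RGGB layout assumed)
    offsets = {"r": [(0, 0)], "g": [(0, 1), (1, 0)], "b": [(1, 1)]}
    plane = np.full(cfa.shape, np.nan, dtype=float)
    for dy, dx in offsets[color]:
        plane[dy::2, dx::2] = cfa[dy::2, dx::2]
    return plane
```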
Notes and References
1. In this article ‘the context’ or ‘the purpose’ will mean raw captures of neutral (hueless, achromatic) slanted edges under a uniform illuminant by Bayer CFA digital cameras, for the purpose of measuring the linear spatial resolution (‘sharpness’) of photographic equipment by the slanted edge method.
2. Taking the Fourier Transform of the Line Spread Function is equivalent to applying the Fourier Slice Theorem to the Radon Transform of the system’s two dimensional Point Spread Function. It results in a radial slice of the two dimensional Fourier Transform of the two dimensional PSF in the direction of the edge normal.
3. Frequency-Domain Methods for Demosaicking of Bayer-Sampled Color Images. Eric Dubois. IEEE SIGNAL PROCESSING LETTERS, 2005, 12 (12), p. 847.
4. Frequency selection demosaicking: A review and a look ahead. D. Alleysson and B. Chaix de Lavarène, Proc. SPIE 6822, 68221M, 2008, Section 2, Spatio-Chromatic Model of CFA Image.
5. In this article capitalized Luminance refers to the photometric quantity in cd/m², while relative luminance with a lowercase ‘l’ means the spatial map of linear image intensity Y in XYZ space, normally expressed as a percentage of full scale – but also in cd/m² when properly calibrated (one also finds Y' in the literature, which however does not concern us here since it depends on gamma encoded values).
6. Matlab code that shows how the plots were produced.
Jim Kasson Wrote:
These are not different approximations. These are different because the color space primaries are different, and thus conversions to 1931 CIE XYZ yields different Y. If the standard observer is right, they are right. Of course, the standard observer does not take adaptation and spatial effects into account. That’s where the guesstimation occurs.
Makes sense, thanks Jim. I’ve corrected the text to reflect that.