In this and the previous article I discuss how Modulation Transfer Functions (MTF) obtained from every color channel of a Bayer CFA raw capture in isolation can be combined to provide a meaningful composite MTF curve for the imaging system as a whole.
There are two ways that this can be accomplished: an input-referred approach ($L$) that reflects the performance of the hardware only; and an output-referred one ($Y$) that also takes into consideration how the image will be displayed. Both are valid and the differences are typically minor, though the weights of the latter are scene, camera/lens and illuminant dependent, while the former are not. Therefore my recommendation in this context is to stick with input-referred weights when comparing cameras and lenses[1].
Slanted Edges Make Things Simple
The capture of a neutral slanted edge in the raw data with good technique can provide an Edge Spread Function (ESF); the derivative of the ESF is the Line Spread Function (LSF); and the normalized modulus of the Fourier transform of the LSF is a good approximation to the Modulation Transfer Function (MTF) of the imaging system.
The resulting MTF curve refers to the linear spatial frequency response of the imaging system near the center of the edge in the direction perpendicular to it. As a consequence of its slant, the edge is oversampled, avoiding aliasing effects and yielding a number of useful properties (see this dedicated article for a more in-depth description of the method).
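To make the pipeline concrete, here is a minimal sketch in Python of the last two steps, assuming an ESF has already been extracted at `oversample` samples per pixel. The function name and the Hamming window are illustrative choices, not the exact recipe used by slanted edge tools like MTF Mapper:

```python
import numpy as np

def esf_to_mtf(esf, oversample=4):
    """ESF -> LSF -> MTF for a 1D edge profile sampled at
    `oversample` bins per sensor pixel."""
    lsf = np.gradient(esf)                  # LSF is the derivative of the ESF
    lsf *= np.hamming(lsf.size)             # window to tame noise at the tails
    mtf = np.abs(np.fft.rfft(lsf))          # modulus of the Fourier transform
    mtf /= mtf[0]                           # normalized to 1 at the origin
    f = np.fft.rfftfreq(lsf.size, d=1.0 / oversample)  # cycles/pixel
    return f, mtf
```

Normalizing at the origin is what makes the curve insensitive to absolute intensity, a property we will rely on below.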
For example, because of super-resolution, pixel spacing is not critical in order to obtain a good edge outline, assuming edge length and angle are appropriately chosen. Within limits, an accurate edge spread function could just as easily be obtained if only every other raw pixel were present in the mosaic, as is the case with the separate color planes shown in Figure 2 below.
Nor is noise a major issue in typical testing conditions, because with tens or hundreds of pixels being projected every small interval onto the edge normal, the edge’s underlying intensity can be accurately estimated by regression, eliminating much of the noise in the ESF.
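A sketch of that supersampling step follows, with a simple binned average standing in for the regression just mentioned. The function name, the near-vertical edge geometry and the boolean `mask` argument are assumptions for illustration; the mask is what lets a sparse color plane be handled exactly like a full mono sensor:

```python
import numpy as np

def supersampled_esf(img, mask, theta, oversample=4):
    """Project pixel centers onto the edge normal and average their
    intensities in bins 1/oversample of a pixel wide. `theta` is the
    edge angle in radians from vertical; `mask` flags which pixels
    exist (all of them for a mono sensor, every other one for a
    single Bayer color plane)."""
    h, w = img.shape
    y, x = np.mgrid[0:h, 0:w]
    # signed distance of each pixel center from an edge through the image center
    d = (x - w / 2) * np.cos(theta) - (y - h / 2) * np.sin(theta)
    bins = np.round(d[mask] * oversample).astype(int)
    bins -= bins.min()
    sums = np.bincount(bins, weights=img[mask].astype(float))
    counts = np.bincount(bins)
    return sums / np.maximum(counts, 1)     # mean intensity per bin = ESF
```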
And recall that what matters is the relative intensity of the edge because MTF does not care about absolute intensity, since the curves are normalized to 1 at the origin. So white balancing the raw data does not affect the resulting MTFs.
Therefore, within limits, it makes no difference to the ESF whether it is the result of a fully populated monochrome sensor like the one shown above right, or of a sparsely populated one such as those found in the single raw color planes of a Bayer CFA sensor below.
As far as the method is concerned, they can be considered as full and separate images, one for each color plane. The resulting MTFs will be practically as good as those produced by a 3-way beam splitter projecting the scene onto three fully populated monochrome sensors behind sheet color filters with characteristics equivalent to the filters in the CFA. This avoids having to demosaic the raw data, which is useful when evaluating hardware because it eliminates an additional confounding variable.
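In code, selecting each sparse plane can be as simple as building boolean masks at the original grid positions (an RGGB layout is assumed here; other CFA phases just shift the offsets). Keeping the planes on the original pixel grid, rather than decimating them, keeps spatial frequencies in cycles per sensor pixel:

```python
import numpy as np

def cfa_masks(shape):
    """Boolean masks selecting the four Bayer planes, RGGB phase assumed."""
    m = {ch: np.zeros(shape, dtype=bool) for ch in ('r', 'g1', 'g2', 'b')}
    m['r'][0::2, 0::2] = True
    m['g1'][0::2, 1::2] = True
    m['g2'][1::2, 0::2] = True
    m['b'][1::2, 1::2] = True
    return m
```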
3 Continuous Grayscale ESFs Yield 3 MTFs
Each raw color plane will stand on its own and produce ‘grayscale’ images with linear spatial characteristics that depend on the spectral properties of light from the scene and the filter it is sitting under, as discussed earlier. The resulting ESFs can be normalized to the same mean intensity by white balance without affecting the relative MTF curve.
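Combining the hypothetical helpers sketched so far, the three per-plane measurements might look like this (`raw_wb` is the white balanced mosaic and `theta` the known edge angle; pooling the two green planes is an assumption of convenience):

```python
# one MTF per raw color plane, each measured on the original pixel grid
masks = cfa_masks(raw_wb.shape)
plane_masks = {'r': masks['r'],
               'g': masks['g1'] | masks['g2'],   # pool the two green planes
               'b': masks['b']}
mtfs = {}
for ch, mask in plane_masks.items():
    esf = supersampled_esf(raw_wb, mask, theta, oversample=4)
    f, mtfs[ch] = esf_to_mtf(esf, oversample=4)
```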
The Fourier transform of the individual LSFs then produces three independent curves, $MTF_R$, $MTF_G$ and $MTF_B$, one for each color plane ($r$, $g$, $b$), valid around the location of the edge in the direction perpendicular to it[2]. So once we have obtained by the slanted edge method the MTFs of the three subsampled color channels making up the raw image, how are these related to the performance of the system as a whole?
Combined System MTF: Hardware = Raw L
We know from Dubois' paper[3] discussed in the preceding post that when a monochromatic raw capture of a neutral subject is white balanced the subsampled chromatic components should ideally disappear, leaving us with the full resolution baseband image ($L$). In that case $\mathcal{F}(L)$ is related to the 2D Fourier transforms of the individual, white-balanced, raw color plane images ($\mathcal{F}(r)$, $\mathcal{F}(g)$, $\mathcal{F}(b)$) by the following relationship
$$\mathcal{F}(L) \;=\; \tfrac{1}{4}\,\mathcal{F}(r) \;+\; \tfrac{1}{2}\,\mathcal{F}(g) \;+\; \tfrac{1}{4}\,\mathcal{F}(b) \qquad (1)$$
This is just a representation of the full resolution, neutral, white-balanced raw CFA image ($L$) in the frequency domain. This relationship of course also applies to the relative MTF curves in Figure 3.
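Equation (1) is just linearity of the Fourier transform, which a few lines of numerical check make obvious (the full resolution planes here are random stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)
r, g, b = (rng.random((64, 64)) for _ in range(3))  # stand-in full-res planes
lhs = np.fft.fft2(0.25 * r + 0.5 * g + 0.25 * b)    # transform of the mix
rhs = 0.25 * np.fft.fft2(r) + 0.5 * np.fft.fft2(g) + 0.25 * np.fft.fft2(b)
assert np.allclose(lhs, rhs)                        # identical, by linearity
```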
The following plot drives the point home. It shows system MTF obtained three ways from a white-balanced Bayer CFA raw image of a neutral slanted edge:
- the blue curve is the result of running MTF Mapper on the fully populated raw Bayer CFA image as in Figure 1, after white-balancing;
- the open circles are the result of running MTF Mapper on the full-resolution, white-balanced, perfectly demosaiced image $L$, with every pixel equal to the weighted sum of the intensities of the three color channels there, in the proportions of Equation (1);
- the x’s represent the weighted sum of the MTF curves obtained by running MTF Mapper on each subsampled raw color plane individually, in the same proportions as in Equation (1).
They are practically the same, other than some noise at higher frequencies introduced by the defocused red channel's MTF hovering above zero where in reality there should be no energy.
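The third route is worth a sketch because it is the one that needs no demosaicing at all. It reuses the hypothetical helpers above; the trim to a common length just papers over the slightly different bin counts of the sparse planes:

```python
import numpy as np

# MTF of the full white balanced mosaic...
full = supersampled_esf(raw_wb, np.ones(raw_wb.shape, bool), theta, oversample=4)
f_full, mtf_full = esf_to_mtf(full, oversample=4)

# ...versus the weighted sum of the three per-plane curves
mtf_sum = 0.25 * mtfs['r'] + 0.5 * mtfs['g'] + 0.25 * mtfs['b']
n = min(mtf_full.size, mtf_sum.size)              # bin counts may differ slightly
print(np.abs(mtf_full[:n] - mtf_sum[:n]).max())   # small, noise aside
```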
Therefore with a few simplifying assumptions we come up with the answer to the question posed a couple of posts ago: as far as the measurement of linear spatial resolution off a white balanced raw capture by a Bayer CFA sensor of a neutral slanted edge under a uniform illuminant is concerned, MTFs obtained from the three separate raw color planes in isolation can be added in the following proportions to produce an accurate System MTF for the hardware as a whole:
$$MTF_L \;=\; \tfrac{1}{4}\,MTF_R \;+\; \tfrac{1}{2}\,MTF_G \;+\; \tfrac{1}{4}\,MTF_B \qquad (2)$$
where $L$ stands for linear luma, proportional to luminance. This System MTF curve represents the input-referred spatial frequency capabilities of the hardware. We can use this aggregate System MTF to derive sharpness metrics like MTF50, SQF or Acutance. It can be legitimately used to compare the performance of different cameras and lenses in similar conditions.
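As a sketch of how a metric might be pulled from the composite curve, here is a minimal MTF50 finder by linear interpolation, using `f` and `mtfs` from the earlier sketches (no claim that published tools compute it exactly this way):

```python
import numpy as np

def mtf50(f, mtf):
    """Frequency (cycles/pixel) where the MTF curve first drops below 0.5."""
    i = int(np.argmax(mtf < 0.5))           # first index below 0.5
    f0, f1, m0, m1 = f[i - 1], f[i], mtf[i - 1], mtf[i]
    return f0 + (0.5 - m0) * (f1 - f0) / (m1 - m0)

mtf_L = 0.25 * mtfs['r'] + 0.5 * mtfs['g'] + 0.25 * mtfs['b']   # Equation (2)
print('System MTF50: %.3f cycles/pixel' % mtf50(f, mtf_L))
```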
This result is not surprising if looked at from an Information Science perspective: it indeed represents a correct proportion of the spatial information collected in the raw data of a Bayer CFA sensor. Superposition in the spatial domain would also suggest this result.
Combined System MTF: As Displayed = Y
In a fully linear imaging system Luminance[5] from the scene in cd/m² is supposed to be proportional to the Luminance in cd/m² striking the eyes of the viewer of the photograph. In the days of black and white photography and TV that was self-evident. When color came of age and captured images started to be produced in standard (colorimetric) output RGB color spaces, different formulas were developed to attempt to recover the original luminance channel when unavailable:

$Y' = 0.299R' + 0.587G' + 0.114B'$   (Rec. 601, gamma corrected values)
$Y = 0.2126R + 0.7152G + 0.0722B$   (Rec. 709 / sRGB)
$Y = 0.2974R + 0.6273G + 0.0753B$   (Adobe RGB)
$Y = 0.2627R + 0.6780G + 0.0593B$   (Rec. 2020)
$Y = w_R\,r_{wb} + w_G\,g_{wb} + w_B\,b_{wb}$   (camera specific, from white balanced raw)
Not surprisingly, the first four formulas above are the middle row of the relative RGB -> linear XYZ matrix that converts values from the given output colorimetric color space to XYZ, where Y is supposed to be a proxy for Luminance. Depending on the application and the relative color space, variations on $Y$ take names like luminance, luminosity, luma, lightness, brightness, value etc.
The various coefficients try to estimate, from data in a standard output RGB color space, the original full resolution grayscale intensity proportional to Luminance from the scene. They attempt to reflect the fact that we perceive red and green as substantially brighter than blue; that phosphors in analog TVs were close to certain primaries while the primaries of modern LCD/LED panels can be close to others; and that the response of older output devices was nonlinear and required gamma correction (the first equation above, $Y'$) while today's may not (the other three, linear $Y$). However, their goal is to approximate a grayscale image similar to what one would get from calibrated black and white monitors.
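For a concrete feel of how little the standard weightings differ, here is a minimal sketch applying the three linear ones to the same linear RGB triplet (the coefficients are the middle rows of the respective RGB -> XYZ matrices; the triplet is arbitrary):

```python
import numpy as np

LUMA_WEIGHTS = {                    # middle rows of the RGB -> XYZ matrices
    'Rec. 709 / sRGB': (0.2126, 0.7152, 0.0722),
    'Adobe RGB':       (0.2974, 0.6273, 0.0753),
    'Rec. 2020':       (0.2627, 0.6780, 0.0593),
}
rgb = np.array([0.40, 0.30, 0.20])  # an arbitrary linear RGB triplet
for space, w in LUMA_WEIGHTS.items():
    print('%-16s Y = %.4f' % (space, np.dot(w, rgb)))
```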
The last equation instead converts input white balanced and demosaiced raw intensity captured by the sensor to the same space. It is the one that comes closest to bringing the input-referred raw data described earlier directly into a colorimetric color space, without the additional confounding processing that is part of rendering an RGB image. It is the grayscale image that a neutral raw converter should produce.
Note that, contrary to all the others, it has a negative component, with implications better explained in the next article. It is this scene, illuminant and camera/lens dependent weighted mix of spatial frequencies that comes closest to the Luminance that will be presented to, and thus perceived by, the Human Visual System in the displayed image. Therefore we can use this mix of MTF curves from the individual raw planes of this camera and lens, as set up, to obtain a System MTF representative of the image as displayed by a neutral raw converter:
$$MTF_Y \;=\; w_R\,MTF_R \;+\; w_G\,MTF_G \;+\; w_B\,MTF_B \qquad (3)$$
However, contrary to the input-referred System MTF, this set of weights is camera, scene and illuminant dependent, thus in this case only valid for a generic CC24-like target captured in D65 lighting by a Nikon D610 with a 24-120mm/4 lens.
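In code it is the same one-liner as Equation (2) with different weights. The numbers below are placeholders only, since the real ones must be derived from the specific camera, scene and illuminant; note the small negative blue weight, echoing the remark above:

```python
# hypothetical output-referred weights, for illustration only
w_R, w_G, w_B = 0.35, 0.75, -0.10
mtf_Y = w_R * mtfs['r'] + w_G * mtfs['g'] + w_B * mtfs['b']   # Equation (3)
```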
Appendix: What does L Look Like?
Fortunately for our purposes[1] we don't need to determine a setup-dependent Equation (3), because we do in fact have a signal proportional to the original full resolution luminance information: it is the white balanced, grayscale, baseband image $L$ in the otherwise pristinely linear raw data, proportional to scene Luminance.
Once the neutral edge under uniform illumination is captured and the relative raw data is white balanced, each of the four raw channels ($r$, $g_1$, $g_2$, $b$) is effectively gray and can be a proxy for luminance: twice the luminance, twice the mean values; half the luminance, half the mean values; and so on all across the edge profile.
And because the system is supposed to be linear, so it should remain all the way to the output color space. If a pixel – any pixel, independently of its color heritage – has an intensity value of 1/10th of full scale in the white balanced raw data, it should ideally have an intensity of 1/10th of full scale in sRGB before gamma is applied. Ideally zero maps to zero, 100% maps to 100% and all other neutral values in between fall into place linearly. If a hueless subject is white balanced in the raw data with r=g1=g2=b it should be neutral in sRGB with R=G=B, and vice versa. Recall that the linear colorimetric RGB color image and the raw CFA image are supposed to share the same luma channel $L$[4].
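White balancing the mosaic itself is just a per-channel gain, which a short sketch makes explicit; gains would typically be taken from the channel means over a neutral patch, and `cfa_masks` is the hypothetical helper from earlier:

```python
import numpy as np

def white_balance(raw, gains, masks):
    """Apply one multiplier per CFA channel so that a neutral patch
    ends up with r = g1 = g2 = b."""
    out = raw.astype(float)                 # float copy of the mosaic
    for ch, m in masks.items():
        out[m] *= gains[ch]
    return out

# e.g. gains off a neutral patch: gains[ch] = mean(green) / mean(channel ch)
```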
Of course pixels sitting under the r, g or b Bayer CFA filters will have different characteristics because they are the result of physically different wavelengths and quantities of photons. These differences only mean different noise and spatial properties in the three planes. We do not care about different noise because within these limits the slanted edge method is relatively insensitive to it; and the different spatial properties are what we wish to measure.
WB Raw = A Grayscale Image
So the white-balanced, undemosaiced raw data mosaic represents an accurate grayscale image for our purposes[1]. It can be a proxy for $Y$, what the top four formulas above are trying to reengineer back-to-front from output RGB data:
$$Y \;\propto\; L \;=\; \begin{bmatrix} r & g_1 \\ g_2 & b \end{bmatrix}_{wb} \qquad (4)$$
with $r$, $g_1$, $g_2$, $b$ the white balanced raw data under the four CFA color channels as laid out on the sensor. In this achromatic context we already have a version of $Y$ in the raw data for free, just as Dubois and Alleysson explained[3][4]. Here is one example of full resolution D610 CFA raw data displayed as-is, after white balancing and nothing else:
It’s part of the B&W back cover of a magazine, illuminated with a halogen light and white balanced off the forehead. There was also some light coming in from outdoors.
It was captured slightly out of focus on purpose because moiré off the fabric and the printing process was otherwise drawing lines all over the page (aliasing = sharp). The objective is to show the untouched grayscale channel in the raw data. There is some slight pixelation locally where the white balance begins to be less representative of neutral; for instance the magazine paper and the white paper it is resting on have different white points.
Of course the grayscale image breaks down in the upside-down ColorChecker color patches, which are not neutral, hence outside the scope of this article[1]. To reinforce the point, here is another example of the grayscale image in the untouched raw data, this time a Sony a7II ISO200 capture of DPReview.com's studio scene, just the CFA white balanced intensities as they were on the sensor:
In the neutral areas of the scene the full resolution CFA raw data is representative of luminance as-is. In the color areas it is not, thus pixelated, but that does not concern us here. Pixelation of course disappears if the image is downsized (as it is if you don't click on the image above), producing intensities equal to $L$.
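The downsizing remark can be made precise with a sketch: averaging each 2x2 RGGB quad of the white balanced mosaic counts green twice, which is exactly the 1/4, 1/2, 1/4 mix of Equation (2):

```python
import numpy as np

def bin2x2(raw_wb):
    """Average each RGGB quad: (r + g1 + g2 + b)/4 = r/4 + g/2 + b/4 = L."""
    h, w = (s // 2 * 2 for s in raw_wb.shape)   # crop to even dimensions
    x = raw_wb[:h, :w].astype(float)
    return (x[0::2, 0::2] + x[0::2, 1::2] + x[1::2, 0::2] + x[1::2, 1::2]) / 4
```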
A more in-depth take on the physical meaning of the input-referred grayscale image $L$ vs colorimetric $Y$ can be found in the article on the effect of a Bayer CFA on sharpness.
Notes and References
1. In this article 'the context' or 'the purpose' will mean raw captures of neutral (hueless, achromatic) slanted edges under a uniform illuminant by Bayer CFA digital cameras, for the purpose of measuring by the slanted edge method the linear spatial resolution ('sharpness') of photographic equipment.
2. Taking the Fourier Transform of the Line Spread Function is equivalent to applying the Fourier Slice Theorem to the Radon Transform of the system’s two dimensional Point Spread Function. It results in a radial slice of the two dimensional Fourier Transform of the two dimensional PSF in the direction of the edge normal.
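In symbols, with $\hat{n} = (\cos\theta, \sin\theta)$ the direction of the edge normal and the LSF the projection (Radon transform) of the PSF onto it, the slice relation reads:

$$\mathcal{F}_{1D}\{\mathrm{LSF}\}(f) \;=\; \mathcal{F}_{2D}\{\mathrm{PSF}\}(f\cos\theta,\; f\sin\theta)$$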
3. Frequency-Domain Methods for Demosaicking of Bayer-Sampled Color Images. Eric Dubois. IEEE Signal Processing Letters, vol. 12, no. 12, p. 847, 2005.
4. Frequency selection demosaicking: A review and a look ahead. D. Alleysson and B. Chaix de Lavarène, Proc. SPIE 6822, 68221M, 2008, Section 2, Spatio-Chromatic Model of CFA Image.
5. In this article capitalized Luminance refers to the photometric quantity in cd/m², while relative luminance with a lowercase 'l' means the spatial map of linear image intensity Y in XYZ space, normally expressed as a percentage of full scale – but also in cd/m² when properly calibrated (one also finds Y' in the literature, which however does not concern us here since it depends on gamma encoded values).
Jim Kasson Wrote:
These are not different approximations. These are different because the color space primaries are different, and thus conversions to 1931 CIE XYZ yields different Y. If the standard observer is right, they are right. Of course, the standard observer does not take adaptation and spatial effects into account. That’s where the guesstimation occurs.
Makes sense, thanks Jim. I’ve corrected the text to reflect that.