In this article we shall see that the effect of a Bayer CFA on the spatial frequencies, and hence on the ‘sharpness’ information captured by a sensor compared to the corresponding monochrome version, can range from (almost) nothing to a halving of the potentially unaliased range, depending on the chrominance content of the image and the direction in which the spatial frequencies are being stressed.
A Little Sampling Theory
We know from Goodman[1] and previous articles that the sampled image $I_s(x,y)$ captured in the raw data by a typical current digital camera can be represented mathematically as the continuous image on the sensing plane $I(x,y)$ multiplied by a rectangular lattice of Dirac delta functions positioned at the center of each pixel at coordinates $(x,y)$:
$$I_s(x,y) = I(x,y)\cdot \mathrm{comb}\!\left(\frac{x}{p}\right)\mathrm{comb}\!\left(\frac{y}{p}\right) \qquad (1)$$
with the $\mathrm{comb}$ functions representing the two dimensional grid of delta functions, sampling pitch $p$ apart horizontally and vertically.
To keep things simple the sensing plane is considered here to be the imager’s silicon itself, which sits below microlenses and other filters, so the continuous image is assumed to incorporate their effects as well as the pixel aperture’s.
Because spatial domain multiplications become convolutions in the frequency domain, the Fourier Spectrum $|F_s|$ of the sampled image is just the magnitude of the Fourier Transform $F$ of the continuous image convolved with the Fourier Transform of the comb functions, which turns out to also be comb functions, albeit of inverse spacing, as shown below:
$$|F_s(f_x,f_y)| = \left|F(f_x,f_y) \;**\; \mathrm{comb}(p f_x)\,\mathrm{comb}(p f_y)\right| \qquad (2)$$
with $**$ indicating two dimensional convolution and $p$ the sampling pitch (pixel pitch here). We saw what the Spectrum looked like with a perfect lens and pixel aperture in the article on Aliasing:
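The spectral replication in equation (2) is easy to check numerically. The short numpy sketch below (my own illustration; the article’s figures were produced in Matlab/Octave) subsamples a 1D signal by a factor of two and verifies that the DFT of the coarser samples is the original spectrum folded about the new, halved Nyquist frequency:

```python
import numpy as np

# A simple 1D test signal: two sinusoids on a fine grid of N samples.
N = 256
n = np.arange(N)
x = np.sin(2 * np.pi * 10 * n / N) + 0.5 * np.sin(2 * np.pi * 30 * n / N)

# Keep every other sample, i.e. double the sampling pitch p.
x_sub = x[::2]

# Per equation (2), coarser sampling packs the spectral replicas closer
# together: the N/2-point DFT of the subsampled signal equals the average
# of the two halves of the original N-point DFT (aliasing by folding).
X = np.fft.fft(x)
X_sub = np.fft.fft(x_sub)
X_folded = 0.5 * (X[:N // 2] + X[N // 2:])

assert np.allclose(X_sub, X_folded)
```

The identity holds for any signal, band-limited or not; when energy sits above the new Nyquist frequency the folded copy lands on top of baseband and cannot be separated afterwards.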
Monochrome Spectrum
This time I am going to show the magnitude of the Discrete Fourier Transform of a typical test target captured by a monochrome digital camera in 2D as an image, with brightness representing the energy of the relative spatial frequency:
On the left is the grayscale rendition of a Siemens star target captured by the fine folks at DPReview.com with a Leica Monochrome Typ 216 at base ISO. On the right is the linear magnitude of the DFT of the same raw capture as performed by Matlab/Octave, otherwise known as its two dimensional Fourier Spectrum, shown as an image.
It is difficult to see what’s going on above right because of the very large energy excursion involved, so normally the logarithm of the Spectrum is shown instead – keeping in mind that this approach tends to overemphasize low energy effects. Below the Spectrum in Figure 2 is displayed as the natural logarithm of (1+Spectrum):
The horizontal and vertical units are cycles per pixel pitch with zero cycles/pitch (c/p) top left and one c/p at the other three corners: (0,0), (1,0), (1,1), (0,1) clockwise for linear spatial frequencies $(f_x, f_y)$ in the x and y directions respectively. The monochrome Nyquist frequencies beyond which aliasing can occur are then halfway along the top, bottom, left and right edges of Figure 3, corresponding to the yellow lines at 0.5 c/p horizontally and vertically.
The DFT routine only shows one period of the baseband Spectrum but, mentally, tile Figure 3 vertically and horizontally (almost) forever, because the function is actually infinitely periodic, per equation (2). The tiling is caused by the orthogonal Dirac delta combs, which effectively modulate baseband out at a spacing of one cycle per sampling pitch, as seen in the repeating ‘tent’ pattern in Figure 1. When you do that it sometimes becomes useful to look at the baseband Spectrum with the spatial frequency origin (0,0) shifted to the center of the relative spectrum image, as if we were looking down onto a single tent in Figure 1:
Now the origin of the $(f_x, f_y)$ spatial frequencies is in the center of the Spectrum image, the four corners representing (-0.5,-0.5), (0.5,-0.5), (0.5,0.5), (-0.5,0.5) c/p clockwise starting top left. The monochrome Nyquist frequencies therefore run along the yellow lines, now along the four edges. It is clear that as long as all the energy of the spatial image fits within the Nyquist Frequency boundaries there will be no aliasing. On the other hand it is obvious by looking at Figure 3 that this is not the case here because some of the Spectrum clearly extends into neighboring quadrants, therefore exceeding Nyquist. Ignore the ‘reflection’ of rays at the edges of the Spectrum in Figure 4: they are an artifact due to having performed the DFT without any padding (tsk, tsk).
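For readers who want to reproduce this kind of display, here is a minimal numpy sketch (an illustration of the method, not the article’s actual Matlab/Octave code). The zone-plate test pattern is my own stand-in for the raw capture; the log(1+Spectrum) step matches Figure 3 and the shift of the origin to the center matches Figure 4:

```python
import numpy as np

# A synthetic "capture": a zone-plate pattern whose local spatial
# frequency grows with radius, similar in spirit to a Siemens star.
N = 256
y, x = np.mgrid[0:N, 0:N] - N // 2
img = 0.5 + 0.5 * np.cos(0.02 * (x**2 + y**2))

# Linear Spectrum as in Figure 2, then log(1 + Spectrum) as in Figure 3.
spectrum = np.abs(np.fft.fft2(img))
log_spectrum = np.log1p(spectrum)

# fftshift moves the (0,0) frequency from the top-left corner to the
# center of the image, turning the Figure 3 layout into Figure 4.
centered = np.fft.fftshift(log_spectrum)

# The DC term now sits at the center pixel.
assert centered[N // 2, N // 2] == log_spectrum[0, 0]
```

Padding the image before the transform (e.g. with `np.fft.fft2(img, (2*N, 2*N))`) avoids the edge ‘reflection’ artifact mentioned above.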
The Impact of a Bayer CFA
So that is the story for a sampled image produced by a monochrome digital sensor. What impact will the introduction of a Bayer Color Filter Array have on the Spectrum of a current digital camera all else equal? Linearity and superposition apply so one way to look at the CFA image is to pretend that we are actually sub-sampling four separate continuous images, one per color plane, at a spacing of every other monochrome pixel according to the Bayer layout. We can then estimate the spectrum of each sub-sampled image separately – and add the individual results up to obtain the spectrum of the CFA image.
Each sub-sampled color plane has half the linear pixels of the fully populated monochrome sensor, so the sampling pitch is apparently doubled to $2p$ for a Bayer CFA sensor compared to monochrome. Twice the pitch means convolving the baseband spectrum with comb functions with half the spacing between the Dirac deltas in equation (2), therefore halving the Nyquist frequency. To verify this intuition we can take a look at the spectrum of a Siemens star, again captured in the raw data by the fine folks at DPReview.com, this time with a Bayer Color Filter Array Nikon D7200 at base ISO. It is shown below with the same spatial frequency scale as in Figure 4. Note the pixelation in the spatial image to the left, due to the different relative intensity of the four color planes resulting from the different Spectral Sensitivity Functions of each channel in the CFA:
Comparing Figure 6 to Figure 4, the spectrum of the Bayer CFA D7200 does indeed appear to have components packed twice as tightly as the Monochrome Typ 216, reducing by half the potentially pristine unaliased baseband signal. Nyquist here seems to occur at -1/4 and 1/4 cycles per monochrome pitch, versus -1/2 and 1/2 before.
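The four-plane sub-sampling view of the thought experiment can be sketched as follows, assuming one RGGB-style layout (the actual Bayer phase varies by camera). Each plane is extracted at twice the monochrome pitch and therefore has half the linear pixels:

```python
import numpy as np

rng = np.random.default_rng(0)
mono = rng.random((128, 128))  # stand-in for a full-resolution capture

# Sub-sample the four Bayer sites (RGGB layout assumed here): each
# plane is taken every other pixel, i.e. at twice the sampling pitch.
planes = {
    'R':  mono[0::2, 0::2],
    'G1': mono[0::2, 1::2],
    'G2': mono[1::2, 0::2],
    'B':  mono[1::2, 1::2],
}

# Each plane has half the linear resolution of the full lattice...
assert all(p.shape == (64, 64) for p in planes.values())

# ...and interleaving the four planes recovers the full lattice,
# which is why superposition of the four spectra gives the CFA spectrum.
rebuilt = np.empty_like(mono)
rebuilt[0::2, 0::2] = planes['R']
rebuilt[0::2, 1::2] = planes['G1']
rebuilt[1::2, 0::2] = planes['G2']
rebuilt[1::2, 1::2] = planes['B']
assert np.array_equal(rebuilt, mono)
```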
However, this intuitive explanation is somewhat dissatisfying. For instance, why couldn’t a Bayer CFA sensor behave more like a monochrome sensor if the subject were neutral and the raw data properly white balanced before demosaicing?
A Frequency Domain Bayer CFA Model
David Alleysson[2] and, subsequently, Eric Dubois[3] came up with a mathematical model that clearly explains the effect of color on a Bayer CFA sampled image in the frequency domain; their papers are worth a read. In this article I will use Dubois’ notation, which I find a little easier to follow.
Their insight is based on assuming three fully populated planes (i.e. not subsampled, each the size of the equivalent full size monochrome sensor), receiving exactly the same light from the scene, each behind a large color filter with spectral sensitivity equivalent to that of the respective $R$, $G$ or $B$ Color Filter Array on the digital sensor, as shown below left.
Following the simple math in Dubois’ letter we can see that the image information captured in the raw data by a Bayer CFA sensor can be expressed as follows in the spatial domain:
$$f_{CFA}(x_1,x_2) = f_L(x_1,x_2) + f_{C_1}(x_1,x_2)\,e^{j2\pi\frac{x_1+x_2}{2}} + f_{C_2}(x_1,x_2)\left(e^{j2\pi\frac{x_1}{2}} - e^{j2\pi\frac{x_2}{2}}\right) \qquad (3)$$
where:
- $f_{CFA}(x_1,x_2)$ is the Bayer CFA image captured in the raw data
- $(x_1,x_2)$ are the coordinates of the center of each pixel in the full size image on the sampling lattice, $0,1,2,3,\dots$ #samples horizontally and vertically resp.
- $f_L$ is the input-referred full resolution baseband ‘luma’ component of the image on the sensing plane, defined below
- $f_{C_1}$ and $f_{C_2}$ are the two full resolution ‘chrominance’ components defined below
- $e^{j2\pi\frac{x_1+x_2}{2}}$, $e^{j2\pi\frac{x_1}{2}}$ and $e^{j2\pi\frac{x_2}{2}}$ are the exponential terms responsible for the checkered look of the Bayer CFA image; the 2 in the denominator indicates half the ‘sampling’ frequency of the full lattice.
$f_L$, $f_{C_1}$ and $f_{C_2}$ are input-referred and defined in terms of full resolution $R$, $G$ and $B$ raw color plane data as follows:
$$\begin{aligned} f_L &= \tfrac{1}{4}R + \tfrac{1}{2}G + \tfrac{1}{4}B \\ f_{C_1} &= -\tfrac{1}{4}R + \tfrac{1}{2}G - \tfrac{1}{4}B \\ f_{C_2} &= -\tfrac{1}{4}R + \tfrac{1}{4}B \end{aligned} \qquad (4)$$
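The decomposition can be verified numerically: building the mosaic directly from three fully populated planes gives exactly the same pixel values as summing luma and the two modulated chrominance components. The numpy sketch below (my illustration) uses real carriers, $(-1)^{x_1} = e^{j2\pi x_1/2}$ for integer coordinates, and one assumed Bayer phase (G where $x_1+x_2$ is even):

```python
import numpy as np

rng = np.random.default_rng(1)
H = W = 64
# Three fully populated color planes seeing the same scene.
R, G, B = rng.random((3, H, W))

# Input-referred luma and chrominance planes per equation (4).
fL  =  0.25 * R + 0.50 * G + 0.25 * B
fC1 = -0.25 * R + 0.50 * G - 0.25 * B
fC2 = -0.25 * R + 0.25 * B

# Modulation carriers from equation (3): exp(j*2*pi*x/2) = (-1)^x.
x1, x2 = np.mgrid[0:H, 0:W]
cfa = fL + fC1 * (-1.0) ** (x1 + x2) + fC2 * ((-1.0) ** x1 - (-1.0) ** x2)

# The same mosaic built directly from the Bayer layout assumed here:
# G where x1+x2 is even, R where x1 is odd and x2 even, B elsewhere.
mosaic = np.where((x1 + x2) % 2 == 0, G, np.where(x1 % 2 == 1, R, B))

assert np.allclose(cfa, mosaic)
```

At a G site the carriers evaluate to $+1$ and $0$, so $f_L + f_{C_1} = G$; at R and B sites they evaluate to $-1$ and $\mp 2$, recovering $R$ and $B$ respectively.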
Full Res Grayscale, Subsampled Chrominance
By full resolution I mean not sub-sampled – but with the same pixel layout and resolution as an equivalent monochrome sensor. As Dubois says, after taking the Fourier Transform of each term in Equation (3) the image in each quadrant of the frequency domain ‘can be interpreted as a baseband luma component $f_L$, a chrominance component $f_{C_1}$ modulated at the spatial frequency (0.5, 0.5), and a second chrominance component $f_{C_2}$ modulated at the two spatial frequencies (0.5, 0) and (0, 0.5), where spatial frequencies are expressed in cycles per pixel…’, not too far from our earlier intuition.
Going from the spatial to the frequency domain, i.e. taking the Fourier Transform of Equation (3), each exponential $e^{j2\pi f_0 x}$ becomes a delta function at frequency $f_0$ and products become convolutions, so it is clear that the exponential terms are responsible for performing the mentioned modulation, hence for the replicas of $f_{C_1}$ at the corners and $f_{C_2}$ at the cardinal points of the 2D spectrum.
This is interesting because it suggests that mathematically a Bayer CFA image consists of a full resolution baseband luma component $f_L$ with Nyquist frequency at 0.5 monochrome c/p, just like in a monochrome sensor – and only the additive chrominance components $f_{C_1}$ and $f_{C_2}$ are sub-sampled at twice the pitch.
As a result the subsampled chrominance components get modulated out to the monochrome Nyquist frequencies and potentially corrupt the otherwise pristine baseband signal, possibly halving monochrome Nyquist and the useful frequency range depending on their energy, as shown in Figure 8. In natural scenes chrominance typically has a smaller bandwidth than luminance, which tends to be the dominant determinant of perceived sharpness.
A Bayer CFA raw file contains a full resolution baseband luma image because of the correlation between adjacent color pixels, which our earlier thought experiment ignored. Interestingly this is also true of the Human Visual System, as it appears not to pay a penalty for incorporating trichromacy.[4] Perhaps not coincidentally, neurons in the retina also encode the signal into luminance and chrominance components before sending it to the visual cortex in the brain.
To Subsample or not to Subsample
The other insight comes from Equation (4), where we can easily see that the sub-sampling function of the exponential terms in Equation (3) on the $f_{C_1}$ and $f_{C_2}$ full resolution images is immaterial when $f_{C_1}$ and/or $f_{C_2}$ are equal to zero.
For instance it is obvious that $f_{C_2}$ disappears whenever $R = B$, that is when those two color channels have the same intensity. I simulated this case by taking the full resolution raw data of the Monochrome Typ 216 and applying a factor of 0.6 to the pixels that would have corresponded to the $R$ and $B$ channels in a Bayer CFA file – a common scenario in raw data captured by current digital cameras in daylight. Below to the left you can see the demosaiced image, to the right the Spectrum of the mosaiced CFA as described. Note the missing components at the cardinal points:
The same will be true anywhere the $R$ and $B$ planes are equal.
On the other hand $f_{C_1}$ disappears when the sum of the $R$ and $B$ channels is equal to the sum of the two $G$ channels. In this case I applied factors of 1.4 and 0.6 to the $R$ and $B$ planes respectively; note the missing components at the corners of the spectrum:
The same will be true anywhere the sum of the $R$ and $B$ planes is equal to the sum of the two $G$ planes. Here for instance factors of 0.7 and 1.3 respectively were applied instead, to the same effect:
Of course both $f_{C_1}$ and $f_{C_2}$ chrominance components disappear when $R = G = B$, the case where the subject is neutral and the color planes each see the same intensity. Here the baseband luma component $f_L$ is left alone, uncorrupted by chrominance crosstalk:
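The three vanishing conditions just discussed follow directly from equation (4) and are quick to check numerically. A sketch with made-up data, reusing the same 0.6 and 1.4 factors as the simulations above:

```python
import numpy as np

rng = np.random.default_rng(2)
G = rng.random((64, 64))  # stand-in green plane

def chroma(R, G, B):
    """Input-referred chrominance planes per equation (4)."""
    fC1 = -0.25 * R + 0.50 * G - 0.25 * B
    fC2 = -0.25 * R + 0.25 * B
    return fC1, fC2

# R = B (e.g. both scaled by 0.6): fC2 vanishes everywhere.
fC1, fC2 = chroma(0.6 * G, G, 0.6 * G)
assert np.allclose(fC2, 0)

# R + B equal to the sum of the two green channels (2G): fC1 vanishes.
fC1, fC2 = chroma(1.4 * G, G, 0.6 * G)
assert np.allclose(fC1, 0)

# Neutral, white balanced subject: R = G = B, both components vanish.
fC1, fC2 = chroma(G, G, G)
assert np.allclose(fC1, 0) and np.allclose(fC2, 0)
```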
What about in practice?
That’s how it works in theory. In the real life Bayer CFA Spectrum of the earlier D7200 capture in Figure 6, white balancing the raw data before demosaicing effectively suppressed $f_{C_1}$ and $f_{C_2}$, as shown below in Figure 13. There’s the approximation of the full resolution baseband luma image present in the raw data of a Bayer sensor (don’t forget that what is actually shown is the natural logarithm of the magnitude of the Discrete Fourier Transform, which tends to overemphasize low level signals).
White balancing the raw data was not able to completely eliminate $f_{C_1}$ and $f_{C_2}$ because of residual slight differences in the information collected by each channel, some physical and some due to non-idealities in the system: for instance shot noise, a non-monochromatic light source, differences in the amount of noise, diffraction, chromatic aberrations – and non-uniformities in the sensor.
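The effect of white balancing before demosaicing can be simulated along the same lines. In this idealized sketch (my own toy model: a perfectly neutral scene and noiseless, hypothetical channel gains) rescaling the R and B pixels removes the chrominance carriers entirely and recovers the monochrome image; the residual effects listed above are exactly what this model leaves out:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 64
scene = rng.random((N, N))                 # neutral subject: R = G = B
gains = {'R': 0.5, 'G': 1.0, 'B': 0.6}     # hypothetical channel gains

# Bayer site masks (same assumed phase as before: G on the even-sum lattice).
x1, x2 = np.mgrid[0:N, 0:N]
r_mask = (x1 % 2 == 1) & (x2 % 2 == 0)
b_mask = (x1 % 2 == 0) & (x2 % 2 == 1)

mosaic = scene * np.where((x1 + x2) % 2 == 0, gains['G'],
                          np.where(r_mask, gains['R'], gains['B']))

# White balance before demosaicing: scale R and B pixels so that a
# neutral patch reads R = G = B again.
wb = mosaic.copy()
wb[r_mask] /= gains['R']
wb[b_mask] /= gains['B']

# In this noiseless model the white balanced mosaic IS the monochrome
# image, so the chrominance carrier at (0.5, 0.5) c/p collapses.
corner_before = np.abs(np.fft.fft2(mosaic))[N // 2, N // 2]
corner_after = np.abs(np.fft.fft2(wb))[N // 2, N // 2]
assert np.allclose(wb, scene)
assert corner_after < corner_before / 2
```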
If you would like to see a couple of example black and white images of undemosaiced white balanced raw data you can take a look at those in an earlier article with a less formal take on this subject.
How a Bayer CFA Affects Sharpness
In conclusion we have seen that the effect of a Bayer CFA on the spatial frequencies and hence the ‘sharpness’ information captured by a sensor compared to those from the corresponding monochrome version can go from (almost) nothing to halving the potentially unaliased range, based on the chrominance content of the image and the direction in which the spatial frequencies are being stressed.
This kind of analysis was responsible for a flurry of papers on frequency domain demosaicing which were state-of-the-art about a decade ago and still pretty good today.
Appendix
1) What do $L$, $C_1$ and $C_2$ represent?
$L$, $C_1$ and $C_2$, the luma and chrominance components under discussion, make up a linear space. Together I consider them an input-referred linear $LC_1C_2$ space because they represent the performance of the hardware directly, before any psychovisual weighting. As such there is no one-for-one correspondence to standard output-referred Color Science terms like photometric luminance $Y$ and chromaticity from the colorimetric $XYZ$ color space, which are instead weighted by the CIE Color Matching Functions $\bar{x}$, $\bar{y}$, $\bar{z}$, purported to represent the response of the Human Visual System.
Of course unless the reference monochrome sensor had a spectral response exactly equal to $\bar{y}$, which in most CMF implementations is made virtually the same as the photopic luminosity function $V(\lambda)$, its baseband response would also not correspond exactly to luminance.
In the Alleysson and Dubois papers luma is defined to be exactly equal to $0.25R + 0.5G + 0.25B$, with $R$, $G$ and $B$ representing white balanced raw data from a Bayer CFA sensor for the given illuminant. On the other hand the estimate of photometric luminance $Y$ for the Nikon D7200 treated in this page is the result of a compromise, which I estimated off a ColorChecker 24 to be approximately equal to $0.23R + 0.93G - 0.16B$.
Since $LC_1C_2$ and $XYZ$ are both color spaces linked linearly to $RGB$, they are also linearly related to each other by simple matrix projection. With a neutral subject, that is when $R = G = B$ and therefore $C_1 = C_2 = 0$, baseband grayscale image $L$ is equally legitimate to $Y$ and similar in both spaces, as explained in part 3) below.
What space is more appropriate for resolution evaluations is a good question. I would argue that both have their place depending on context – and in fact perhaps there is a yet-to-be-defined third space more relevant to perceptual sharpness measurements. Until then, in my opinion $LC_1C_2$ is a convenient input-referred space useful for investigating the effects of imaging hardware on resolution.
2) Space Conversion Matrices
Since both $LC_1C_2$ and $XYZ$ are linear spaces linearly related to linear white balanced raw data $RGB$, we can obtain matrices to convert from one space to the other.
$$\begin{bmatrix} L \\ C_1 \\ C_2 \end{bmatrix} = \begin{bmatrix} \tfrac{1}{4} & \tfrac{1}{2} & \tfrac{1}{4} \\ -\tfrac{1}{4} & \tfrac{1}{2} & -\tfrac{1}{4} \\ -\tfrac{1}{4} & 0 & \tfrac{1}{4} \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix} \qquad (5)$$
$$\begin{bmatrix} R \\ G \\ B \end{bmatrix} = \begin{bmatrix} 1 & -1 & -2 \\ 1 & 1 & 0 \\ 1 & -1 & 2 \end{bmatrix} \begin{bmatrix} L \\ C_1 \\ C_2 \end{bmatrix} \qquad (6)$$
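As a sanity check, the two conversion matrices implied by the luma and chrominance definitions in equation (4) are indeed inverses of each other:

```python
import numpy as np

# LC1C2 from white balanced raw RGB, rows per equation (4).
M_lc_from_rgb = np.array([
    [ 0.25, 0.50,  0.25],
    [-0.25, 0.50, -0.25],
    [-0.25, 0.00,  0.25],
])

# RGB back from LC1C2: the inverse of the matrix above.
M_rgb_from_lc = np.array([
    [1.0, -1.0, -2.0],
    [1.0,  1.0,  0.0],
    [1.0, -1.0,  2.0],
])

# The product is the 3x3 identity, so the round trip is lossless.
assert np.allclose(M_lc_from_rgb @ M_rgb_from_lc, np.eye(3))
```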
On the other hand the following compromise matrices only apply approximately to the Bayer CFA Nikon D7200 in the conditions tested here:
(7)
Assuming the raw data was the result of a capture under illuminant D65, substituting $RGB$ from Equation (6) into Equation (7) we obtain the compromise matrix for conversion from $LC_1C_2$ to $XYZ$:
(8)
Inverting Equation (8), we obtain the compromise matrix from $XYZ$ to $LC_1C_2$:
(9)
Finally for completeness we can take $LC_1C_2$ from the Nikon D7200 in this condition to two widely used linear colorimetric output RGB spaces, sRGB and Adobe RGB:
(10)
(11)
For obvious reasons the first columns will always be ones while the second and third columns will vary with the camera and the illuminant.
3) Is luma the Baseband Grayscale Image?
Assume that a neutral subject in daylight projects image $Y(x,y)$ on a digital camera’s sensing plane, representing the photometric luminance distribution of the subject. A monochrome sensor at the sensing plane captures an approximation proportional to $Y$ in its raw file; call it $Y_m$. It is an approximation because the monochrome sensor’s response is not in practice the same as the photopic luminosity function that determines photometric luminance.
Now cover the monochrome sensor with a Bayer Color Filter Array. After white balancing the captured raw data so that $R = G = B$ in a uniform patch of the neutral subject, it also stores an approximation proportional to $Y$ in its file; call it $Y_b$.
It is then clear from Equation (5) that $L = Y_b$ and $C_1 = C_2 = 0$. From Equation (8) $Y = L = Y_b$. Equations (10) and (11) confirm that, everything being linear, in the output RGB spaces a neutral subject results in $R = G = B = L$ and therefore in a rendition proportional to $Y_b$.
So with a neutral, white balanced input, the chrominance components disappear and luma image $L$ represents a valid approximation of the achromatic baseband image throughout the chain – just as legitimate, when rendered in grayscale, as those reproduced in $XYZ$ or any other space by the monochrome sensor, as evidenced by such renditions of the B&W magazine pictures in the previous article. The difference from colorimetric $Y$ can be estimated by computing $Y - L$, which for the D7200 in this page can be seen to be approximately $0.4(G - B)$.
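The last claim is quick arithmetic on the two sets of weights given in part 1) of this Appendix:

```python
import numpy as np

# Compromise photometric luminance for the D7200 (from the Appendix)
# and the Alleysson/Dubois luma, as weights on white balanced R, G, B.
y_weights = np.array([0.23, 0.93, -0.16])
l_weights = np.array([0.25, 0.50, 0.25])

# Weights of Y - L: (-0.02, 0.43, -0.41), i.e. roughly 0.4 * (G - B)
# with a negligible R term.
diff = y_weights - l_weights
assert np.allclose(diff, [0.0, 0.4, -0.4], atol=0.05)

# Both sets of weights sum to (about) one, so with a neutral subject
# (R = G = B) luma L and luminance Y coincide, as argued above.
assert np.isclose(y_weights.sum(), 1.0, atol=0.01)
assert np.isclose(l_weights.sum(), 1.0)
```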
Notes and References
1. Introduction to Fourier Optics 3rd Edition, Joseph W. Goodman, p. 22.
2. Linear demosaicing inspired by the human visual system. David Alleysson, Sabine Susstrunk, Jeanny Herault. IEEE Transactions on Image Processing, Institute of Electrical and Electronics Engineers, 2005, 14 (4), pp.439-449.
3. Frequency-Domain Methods for Demosaicking of Bayer-Sampled Color Images. Eric Dubois. IEEE SIGNAL PROCESSING LETTERS, 2005, 12 (12), p. 847.
4. The Cost of Trichromacy for Spatial Vision. David R. Williams, Nobutoshi Sekiguchi, William Haake, David Brainard, Orin Packer.
5. Adaptive Filtering for Color Filter Array Demosaicking. Nai-Xiang Lian, Lanlan Chang, Yap-Peng Tan, and Vitali Zagorodnov. IEEE Transactions on Image Processing, vol. 16, no. 10, October 2007.
6. I was introduced to this line of thinking by Cliff Rames in this fascinating thread with contributions by star physicist and AMAZE demosaicing creator Emil J. Martinec.