Building on a preceding article of this series, once demosaiced, raw data from a Bayer Color Filter Array sensor represents the captured image as a set of triplets, corresponding to the estimated light intensity at a given pixel under each of the three spectral filters that make up the CFA. The filters are band-pass and named for the representative peak wavelength that they let through: typically red, green and blue, or r, g, b for short.
Since the resulting intensities are linearly independent, they can form the basis of a 3D coordinate system, with each triplet representing a point within it. The system is bounded in the raw data by the extent of the Analog to Digital Converter, with all three channels spanning the same range, from Black Level with no light to clipping with maximum recordable light. It can therefore be thought of as a space in the form of a cube – or better, a parallelepiped – with the origin at [0,0,0] and the opposite vertex at the clipping value in Data Numbers, expressed as [1,1,1] if we normalize all data by it.
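As a minimal Matlab/Octave sketch of that normalization, with blackLevel and clipValue hypothetical names for figures read from the camera metadata:

```matlab
% Minimal sketch: normalize raw data to the [0,1] cube. blackLevel and
% clipValue are hypothetical scalars taken from the camera metadata.
rgb_raw = (double(raw) - blackLevel) ./ (clipValue - blackLevel);
rgb_raw = min(max(rgb_raw, 0), 1);   % keep data within the normalized cube
```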
The job of the color transform is to project demosaiced raw data to a standard output color space designed for viewing. Such spaces have names like sRGB, Adobe RGB or Rec. 2020. The output space can also be shown in 3D as a parallelepiped with the origin at [0,0,0] with no light and the opposite vertex at [1,1,1] with maximum displayable light.
The projection is often expressed as follows, with $T$ the needed color transformation

$$RGB_{out} = T * rgb_{raw} \tag{1}$$
The convention in Color Science is for the input and output data to be in column-vector format, i.e. 3xN, with N the number of tone triplets, as shown below. At its simplest the transform is a 3×3 matrix made up of 9 coefficients related to the camera+lens, the scene, the light source and the output color space. In that case * represents matrix multiplication and linear algebra applies: just matrix multiply demosaiced raw data triplets by the appropriate matrix to obtain image intensities in the color space of choice.
$$\begin{bmatrix} R_1 & R_2 & \cdots & R_N \\ G_1 & G_2 & \cdots & G_N \\ B_1 & B_2 & \cdots & B_N \end{bmatrix}_{out} = T_{3\times3} * \begin{bmatrix} r_1 & r_2 & \cdots & r_N \\ g_1 & g_2 & \cdots & g_N \\ b_1 & b_2 & \cdots & b_N \end{bmatrix}_{raw} \tag{2}$$
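By way of a toy example, a minimal Matlab/Octave sketch of Equation (2), with made-up coefficients and three tone triplets:

```matlab
% Minimal sketch of Equation (2): three raw tone triplets (3xN, N = 3)
% projected through a hypothetical 3x3 color transform; every column
% is one pixel's triplet. All values are made up for illustration.
T   = [ 1.8 -0.6 -0.2;
       -0.3  1.4 -0.1;
        0.0 -0.5  1.5];
rgb = [0.20 0.50 0.90;    % r values of the three triplets
       0.30 0.50 0.10;    % g values
       0.40 0.50 0.80];   % b values
RGB = T * rgb;            % output triplets, also 3xN
```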
In this article we will break down the transform into its components using as an example a sunny 5300K (D53) capture of a ColorChecker 24 target by the Nikon D5100 camera and lens featured recently.
Step by Step
If we had a set of raw data from our camera/lens and the respective reference values in the colorimetric color space of choice, we could solve for the transform directly, as will be shown below. However, it is useful to first break down the process into its components to distill it into its essence and to better understand the factors at play.
Thanks to the associative property of matrix multiplication, we know that the single matrix $T$ can be split into the product of a number of 3×3 matrices representing every step of the color conversion, from raw to the output color space. For a given camera and lens, demosaiced raw data is in sequence:
- white balanced by the illuminant-dependent diagonal matrix $W = \mathrm{diag}(K_{rgb})$, with the white balance multipliers $K_{rgb}$ on the diagonal;[1]
- projected to XYZ by the appropriate scene-and-illuminant-dependent compromise color matrix (or simply $M$, for consistency with previous articles);
- Chromatically Adapted from the capture's illuminant White Point (here D53) to that of the viewer (say D65) by matrix $CA$;
- projected to $RGB_{out}$, the output color space (for example sRGB), by standard matrix $M_{sRGB}$.
This can be expressed as follows
$$RGB_{out} = M_{sRGB} * CA * M * W * rgb_{raw} \tag{3}$$
The sequence of multiplication appears to be in reverse order because image intensities in Color Science are conventionally expected to be column vectors arranged in a 3xN array, as shown in Equation (2).[2] We will refer to the projection from demosaiced, white balanced raw data to the XYZ color space in Step 2 as $M$ for the rest of the article, in order to simplify notation and because that is what we have been calling it in this series.
Note that the final matrix $M_{sRGB}$ to the chosen output color space in Step 4 is constant (we can lift it from Bruce Lindbloom's site[9]) but the other three depend on the illuminant.
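In Matlab/Octave terms, Equation (3) boils down to a few lines; this is just a sketch, with the four step matrices assumed already computed and named by me:

```matlab
% Minimal sketch of Equation (3), right to left. W, M, CA and M_srgb
% are assumed to have been computed already (the names are mine).
W = diag(K_rgb);                % Step 1: white balance multipliers
T = M_srgb * CA * M * W;        % Steps 4, 3, 2, 1 collapsed into one matrix
RGB_out = T * rgb_raw;          % rgb_raw: demosaiced raw data, 3xN
```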
Pre-Cooked Transforms (M)
Assuming we knew the characteristics of the light source, we should be able to obtain the white balance multipliers in Step 1 for the given camera and lens – for instance from the camera's own estimate or from a gray card – and to calculate the chromatic adaptation matrix in Step 3.[8] Therefore, for a given illuminant, camera and lens the only remaining unknowns would be the 9 coefficients in the 3×3 matrix of Step 2, the Compromise Color Matrix $M$ that projects white balanced raw data to XYZ in the given conditions. In addition to the above, these coefficients also require knowledge of the scene.[3]
If the camera/converter had a number of precomputed white balanced raw to XYZ transforms for different scenes and illuminants, the main input variable in the full-trip conversion from raw to output color space would be the illuminant: for a given scene, figure out the illuminant, select the appropriate transform and the full-trip conversion falls into place.
And so it is: every camera (and raw converter) indeed comes loaded with a set of pre-cooked transforms computed for the specific Spectral Sensitivity Functions of its setup, each to be used with a particular scene and illuminant combination.
Solving for M by Linear Regression
To determine the 3×3 matrix $M$ we typically capture a known target under a known illuminant in the raw data of the camera that we wish to characterize. We attempted such a feat less formally in the Forward Matrix article; it may be useful to give it a quick glance before reading on.
The target presents a number of representative diffuse reflectances – the more representative of the scenes we are planning to shoot, the better. For general photography the 24 patches of a ColorChecker 24 can be unexpectedly effective.[4]
Since there are typically more equations than unknowns by design, the problem is overdetermined. In linear algebra it is usually shown in the following form
$$A\,x = b \tag{4}$$
which presumes data in row-vector format, i.e. Nx3 (also the format used by Matlab/Octave for vectorization). Compared to the notation used in matrix multiplication and Color Science, $A$ therefore corresponds to demosaiced and white balanced raw data $rgb_{wb}^T$, $b$ to the relative intensity in $XYZ^T$ and $x$ to $M^T$, with $^T$ indicating the transpose of the relative array. Restating Equation (1) in the form of (4) we have for the problem at hand
$$rgb_{wb}^T * M^T = XYZ^T \tag{5}$$
So with $rgb_{wb}$ and reference $XYZ$ data in row-vector format, the resulting matrix is the transpose of that expected by Color Science. The Normal Equation provides a unique least-squares solution to the overdetermined problem as stated. In our case
$$M^T = \left(rgb_{wb} * rgb_{wb}^T\right)^{-1} * rgb_{wb} * XYZ^T \tag{6}$$
with $^{-1}$ indicating the inverse, $^T$ the transpose of the relative array, and all arrays referenced to Color Science friendly 3xN column-vector data. For instance, if $rgb_{wb}$ is the BabelColor30 CC24 target 'scene' as captured in the raw data[5] by the Nikon D5100 with the mystery lens featured in the last few articles,[6] under a daylight illuminant of correlated color temperature around 5300 K – call it D53 – the resulting matrix would be
with a mean residual CIEDE2000 error of 1.093 over the 18 color patches plus mid-gray patch 22. This matrix projects white balanced raw data to the XYZ color space with a white point equal to [0.9559 1.0000 0.8763], as can be gathered by multiplying it by the white balanced raw triplet corresponding to normalized maximum diffuse white, [1,1,1] – equivalent to summing its rows.
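For reference, a minimal Matlab/Octave sketch of Equation (6); rgb_wb and XYZ_ref are my names for the 3xN white balanced raw patch readings and the corresponding reference values:

```matlab
% Minimal sketch of Equation (6): least-squares solution for M given
% 3xN white balanced raw data rgb_wb and 3xN reference values XYZ_ref.
Mt = (rgb_wb * rgb_wb') \ (rgb_wb * XYZ_ref');  % M transposed, per Eq. (6)
M  = Mt';
% Equivalent one-liner on row-vector (Nx3) data, per Equations (4)-(5):
% Mt = rgb_wb' \ XYZ_ref';
whitePoint = sum(M, 2)'   % sanity check: row sums ~ matrix white point
```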
Solving for M with an Optimization Algorithm
The least-squares solution minimizes the sum of the squares of the differences between the entries on either side of the equal sign of Equation (5) with equal weight, which is often not ideal because the Human Visual System is more sensitive to some colors than others. We can fine tune the result by applying an efficient optimization algorithm (such as Matlab's genetic algorithm or pattern search routines) to find more perceptually relevant solutions for $M$. In this case I used fminunc to minimize the residual CIEDE2000 differences.[10]
Now mean CIEDE2000 over the 19 patches is a little better at 0.950; in other cases the improvement can be more marked. This matrix projects white balanced raw data to the XYZ color space with white point [0.9587 1.0000 0.8802]. Here are the residual errors for each of the CC24 patches after optimization:
As can be seen in the code linked to in the notes, the cost function for the optimization routine is the mean CIEDE2000 of all of the 24 patches, 6 of which are varying intensities of gray. This ensures that the result does not stray too far from the appropriate White Point for the given illuminant.
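For the curious, a minimal sketch of such an optimization in Matlab/Octave; xyz2lab() and deltaE2000() are hypothetical helper functions and wp is the reference White Point, so treat this as an outline rather than the actual code in the notes:

```matlab
% Minimal sketch of the perceptual refinement: minimize mean CIEDE2000
% over the patches, seeded with the least-squares M. xyz2lab() and
% deltaE2000() are hypothetical helpers; wp is the reference White Point.
labRef = xyz2lab(XYZ_ref, wp);
cost   = @(m) mean(deltaE2000(xyz2lab(reshape(m,3,3) * rgb_wb, wp), labRef));
mOpt   = fminunc(cost, M(:));          % unconstrained search, seeded with M
M_opt  = reshape(mOpt, 3, 3);
```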
Note however that neither of the last two matrices has a white point exactly equal to the White Point of the illuminant, D53, which is instead [0.9593 1.0000 0.8833] in the 380-730nm range we are using in this series of articles with CIE (2012) 2-deg "physiologically relevant" CMFs. These are small errors in this case but they can be substantial in others, causing unwelcome color casts.
White Preserving Normal Equation
The solution is to correct the Normal Equation by adding a Lagrange multiplier in order to force the sums of the rows to be equal to the White Point of the illuminant. I used the formula in the appendix of the relative CIC97 paper by Finlayson and Drew,[7] modified to allow specifying the White Point. You can find the code in a Matlab/Octave function referenced in the notes at the bottom.
The result is the least squares solution constrained to the White Point of the illuminant:
And in fact the sums of the rows now correspond to illuminant D53's White Point [0.9593 1.0000 0.8833], though this transform's weakness is still that it does not take into consideration the varying sensitivity of the Human Visual System to different tones.
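A minimal Matlab/Octave sketch of this constrained solution, after Finlayson and Drew;[7] w_xyz is the illuminant White Point, the white balanced raw white is assumed to be [1,1,1] and the variable names are mine:

```matlab
% Minimal sketch of white point preserving least squares, one row of M
% at a time: minimize ||A*m - b||^2 subject to w_raw' * m = w_xyz(i).
A     = rgb_wb';                 % Nx3 white balanced raw data
Q     = inv(A' * A);
w_raw = [1; 1; 1];               % white balanced raw diffuse white
M_wpp = zeros(3, 3);
for i = 1:3
    b    = XYZ_ref(i, :)';                    % Nx1 reference channel
    mLS  = Q * (A' * b);                      % unconstrained least squares
    lam  = (w_raw' * mLS - w_xyz(i)) / (w_raw' * Q * w_raw);
    M_wpp(i, :) = (mLS - lam * Q * w_raw)';   % Lagrange-corrected row
end
```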
White Preserving Optimization
However, we can also set up the optimization algorithm to only look for solutions whose row sums equal the White Point. This effectively reduces the number of unknowns in the 3×3 matrix from 9 to 6, since we are adding three equations specifying the weighted sums of the variables as three constants. Should you be interested, you can see how this is accomplished in the Matlab code in the notes and in the sketch a few paragraphs below. Here is the relative White Point Preserving optimized matrix:
The sums of the rows indeed correspond to the D53 White Point, [0.9593 1.0000 0.8833]. In exchange we have lost a tiny bit of fidelity: the CIEDE2000 average difference over the chosen 19 patches is now 0.954 instead of 0.950, immaterial in this case (though it can be material in others). These are the residual errors for each of the CC24 patches after white point preserving optimization:
In this case the White Point Preserving optimized matrix is only trivially different from the freely optimized one – but this is not always the case.
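One way to set up the 6-parameter search, sketched in Matlab/Octave under the same assumptions (and cost function) as the earlier snippets:

```matlab
% Minimal sketch: white preserving optimization with 6 free parameters.
% The third column of M is fixed by the constraint that row sums equal
% the White Point w_xyz, the white balanced raw white being [1,1,1].
buildM = @(p) [reshape(p, 3, 2), w_xyz(:) - sum(reshape(p, 3, 2), 2)];
costWP = @(p) cost(reshape(buildM(p), 9, 1));  % reuse the CIEDE2000 cost
p0     = M_wpp(:, 1:2);                        % seed with the WPP solution
pOpt   = fminunc(costWP, p0(:));
M_wpo  = buildM(pOpt);                         % white preserving by design
```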
Step by Step
To conclude our example we apply the four steps that together represent the transform $T$ that projects demosaiced raw data – from a generic CC24 scene illuminated by sunny D53, captured by a Nikon D5100 and lens – to the sRGB output color space.
Step 1: the white balance multipliers for the Nikon D5100 and lens under the sun are the inverse of the readings off a gray card, put into a diagonal matrix and normalized so that the green multiplier is unity:
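In code, a minimal sketch with a hypothetical gray card reading:

```matlab
% Minimal sketch of Step 1: multipliers are the inverse of the raw gray
% card readings, normalized to green. The gray triplet is hypothetical.
gray  = [0.46 1.00 0.75];       % hypothetical raw gray card triplet
K_rgb = gray(2) ./ gray;        % inverse readings, green multiplier = 1
W     = diag(K_rgb);            % Step 1 diagonal matrix
```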
Step 2: the chosen matrix $M$ from white balanced raw data to XYZ with a D53 White Point would be the white preserving, perceptually optimized one above:
Step 3: per Lindbloom's simplified method,[8] the Chromatic Adaptation matrix $CA$ from D53's White Point to sRGB's D65, using the Bradford transform, is:
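A minimal Matlab/Octave sketch of the calculation, with wpS and wpD my names for the source and destination White Points as XYZ column vectors:

```matlab
% Minimal sketch of Lindbloom's chromatic adaptation calculation with
% the Bradford transform; wpS and wpD are XYZ White Points (here D53
% and D65 respectively).
B  = [ 0.8951  0.2664 -0.1614;   % Bradford cone response matrix
      -0.7502  1.7135  0.0367;
       0.0389 -0.0685  1.0296];
CA = B \ (diag((B * wpD(:)) ./ (B * wpS(:))) * B);  % inv(B) * diag(.) * B
```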
Step 4: from Lindbloom's site,[9] the projection $M_{sRGB}$ to sRGB from XYZ once adapted to D65 is:
Different matrices can be selected in Steps 3 and 4 to end up in any desired destination color space, such as Adobe RGB, Rec. 2020 etc.[11]
The Linear Color Transform
Multiplying the matrices from the four steps together per Equation (3) we get the full-trip matrix $T$:
This is the linear transform that we set out to find: it projects demosaiced raw data triplets, in column-vector format, to the sRGB output color space for our setup.
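In practice, applying $T$ to a whole demosaiced image is a couple of reshapes away; a minimal Matlab/Octave sketch:

```matlab
% Minimal sketch: apply the full-trip matrix T to an HxWx3 demosaiced
% raw image by reshaping to 3xN column-vector format and back.
[h, w, ~] = size(img);                    % img: demosaiced raw, HxWx3
rgb_raw   = reshape(img, [], 3)';         % 3xN triplets, one per pixel
sRGB_lin  = min(max(T * rgb_raw, 0), 1);  % project and clip to the gamut
out = reshape(sRGB_lin', h, w, 3);        % linear sRGB, gamma still to apply
```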
And all with an estimated average error below a just noticeable difference, 1 CIEDE2000.
Out of Gamut
The sums of the rows of the inverse of $T$ are the reciprocals of the white balance multipliers because, once data is normalized per Step 1 above, the remaining transform maps the white-balanced raw cube to the output color space, also a parallelepiped with the origin at [0,0,0] and the opposite vertex at [1,1,1].
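This claim is easy to verify numerically; a quick check under the earlier assumptions:

```matlab
% Quick check: the row sums of inv(T) should be proportional to the
% reciprocals of the white balance multipliers K_rgb.
rawWhite = sum(inv(T), 2)';    % = (inv(T) * [1;1;1])', raw triplet of white
disp(rawWhite .* K_rgb)        % ~ equal entries if the claim holds
```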
Note that the conversion from XYZ to the sRGB color space, Step 4 in particular, has introduced some sizable positive and negative coefficients. Large positive coefficients can result in values greater than one; try for instance a pure [1,0,0] or a pure [0,0,1] triplet from the camera with $T$ above. Since 1 represents the brightest normalized intensity that a display can produce, these values are effectively clipped. Displays cannot produce negative intensities either, so resulting values below zero are blocked and never see the light of day; try for instance [0,1,1] or [1,0,1] raw data. Such tones fall outside of the output 'cube' and are out of gamut; you can see some examples from a natural image in the article on applying the Forward Matrix.
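To see this in action, a minimal sketch probing $T$ with the extreme raw triplets mentioned above:

```matlab
% Minimal sketch: probe the gamut with extreme raw triplets (the columns
% are [1,0,0], [0,0,1], [0,1,1] and [1,0,1]) and clip as a display would.
probe = T * [1 0 0 1;
             0 0 1 0;
             0 1 1 1];
clipped = min(max(probe, 0), 1);   % entries outside [0,1] are out of gamut
```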
Non Linear Color Transforms
Of course the matrix is the result of many compromises, so the transform can be – and often is – improved by the application of Look Up Tables (so-called LUTs) to tamp down larger errors such as those sticking out in the last few figures. Our cameras and raw converters contain several such kit-specific transforms for a number of illuminant and scene type combinations. Choosing the right one is not trivial, and choosing poorly can result in incorrect tones. Just how incorrect will be the subject of the next article or three.
Notes and References
1. The derivation of raw data and White Balance multipliers $K_{rgb}$ in physical units was the subject of an earlier article in this series; you can find it here.
2. In linear algebra transposing the result of a product of a number of matrices is the same as transposing each individual matrix and reversing the order of multiplication.
3. Adobe's curated, open source Digital Negative (DNG) specification 1.6.0.0 can be found here.
4. Jim Kasson trained transforms on a large number of different reflectance sets, including ColorCheckers, and tested the results on a number of databases. The ColorCheckers did unexpectedly well. See for instance his blog here.
5. The Excel spreadsheet with BabelColor.com average measured reflectances of 30 ColorChecker 24 targets in their database can be found here.
6. Maryam Mohammadzadeh Darrodi, Graham Finlayson, Teresa Goodman and Michal Mackiewicz, "Reference data set for camera spectral sensitivity estimation," J. Opt. Soc. Am. A, Vol. 32, No. 3, March 2015. The relative data, the paper and additional information are available on this page.
7. Graham D. Finlayson and Mark S. Drew, "White-Point Preserving Color Correction," Proc. IS&T/SID Fifth Color Imaging Conference (CIC97), 1997.
8. Bruce Lindbloom’s instructions on calculating Chromatic Adaptation matrices can be found here.
9. Bruce Lindbloom’s pre-calculated matrix from XYZ D65 to sRGB can be found here.
10. The Matlab / Octave code used in the article can be downloaded by clicking here.
11. If you are trying this at home, any small discrepancies in the figures are probably due to rounding errors and to my using only the 380-730nm range for the calculations, in order to match the BabelColor-collected CC24 reflectance data.
I’m loving the series and looking forward to the next post, especially in the context of how easy it is to naively overfit color correction matrices in typical capture scenarios, e.g.:
https://www.colour-science.org/posts/the-colorchecker-considered-mostly-harmless/
Thanks for the kind words Jed, I agree that controlling the large number of variables in the field is not easy.
Hi, sorry for my silly question and bad English. Here is the point that I do not understand:
Isn't Chromatic Adaptation a better way of doing White Balance? Why should I apply R/G and B/G gains first to make gray's R=G=B and then still need to transfer the white points?
I have read the CAM book (3rd edition); when it introduces the Von Kries Model (a relatively simple form of CAT), it tells me that the model "scales or aligns" the white points of the input and output space in the LMS domain. To me it is just like what camera WB is doing…
In the DNG spec, when a ForwardMatrix does not exist, the Camera2XYZ_D50 matrix = CA * Inverse(AB*CC*CM), where AB is the AnalogBalance matrix, which is usually not used in modern cameras(?). Besides that, I did not find how Adobe applies the xy chromaticity point written in the tag…
Or maybe the camera WB gain is there to adjust camera RGB to the D53 white point's xy chromaticity value?
Hello Huang,
To answer your first question: white balance is not chromatic adaptation by 'wrong' Von Kries, because the three raw channels are linearly independent at that stage. We can do whatever we want with them linearly and it won't affect the final result, because the subsequently derived compromise color matrix will compensate for anything we may have done and transfer the white point correctly to XYZ. Chromatic adaptation comes after that, usually via Bradford and LMS.
The DNG workflow provides Forward Matrices to a D50 white point assuming white balanced raw data.
If you have further questions, send me a message via the form in the ‘About’ page and we can continue via email.
Jack
Hi, Jack. Sorry for the late response as I did not get a notification of your reply.
As far as I know, the RGB raw data must first be converted to an intermediate color space, say CIE 1931 XYZ, and after that we can apply the CAT with both the input and output in XYZ.
So for the first conversion from RGB raw data to XYZ we must assume some gain to finish it; that is just a preparation, and the actual adaptation is the later step.
Is this what you mean by "it won't affect the final result"?
Yes, pretend for instance that the camera/lens spectral response is the same as color matching functions for XYZ or LMS. That’s what we try to mimic with a single compromise color matrix from raw data as-is to one of those spaces. Sometimes for compatibility and/or convenience we may break down the single matrix into the product of two or more matrices: one containing white balance multipliers (Step 1 in the article), one with a sort of Forward Matrix (Step 2), etc.
Raw intensities so projected to XYZ or LMS will depend entirely on the physics of the setting and the hardware, nothing to do with adaptation. Next we consider adaptation (Step 3)…