Goodman, in his excellent Introduction to Fourier Optics[1], describes how an image is formed on a camera's sensing plane starting from first principles, that is, from electromagnetic propagation according to the wave equation that follows from Maxwell's equations. If you want the play-by-play account, I highly recommend his math-intensive book. But for the budding photographer it is sufficient to know what happens at the Exit Pupil of the lens, because from there the transformations to the Point Spread Function and the Modulation Transfer Function are straightforward, as we will show in this article.
The following diagram illustrates the last few millimeters of the journey that light from the scene has to travel in order to be absorbed by a camera's sensing medium. Light from the scene arrives at the front of the lens in the form of an electromagnetic field. It passes through the lens, which partly blocks and distorts it, before arriving at the lens's virtual back end, the Exit Pupil. We'll call this blocking and distorting action the pupil function. Other than in very simple cases, the Exit Pupil does not necessarily coincide with a specific physical element or Principal surface.[iv] It is a convenient mathematical construct that condenses all of the light-transforming properties of a lens into a single plane.
The complex light field at the Exit Pupil's two-dimensional plane is then the product of the arriving field and the blocking/distorting function, as shown below (not to scale; the product of the two arrays is element by element):
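For readers who prefer to see this numerically, here is a minimal NumPy sketch of that element-by-element product. Everything in it is an assumption made for illustration rather than something taken from Goodman or from the diagram: the names `U` (arriving field) and `P` (blocking/distorting function), the normalized pupil coordinates `u` and `v`, the unit-amplitude plane wave, the clear circular aperture, and the optional defocus-like phase term controlled by the hypothetical parameter `w020_waves`.

```python
import numpy as np


def exit_pupil_field(n=256, aperture_radius=0.5, w020_waves=0.0):
    """Return (field, U, P) sampled on an n x n grid at the exit pupil.

    All names and parameters are hypothetical, chosen for this sketch only.
    """
    # Normalized coordinates on the exit pupil plane, spanning [-1, 1)
    coords = np.linspace(-1.0, 1.0, n, endpoint=False)
    u, v = np.meshgrid(coords, coords)
    rho = np.hypot(u, v)

    # Assumed arriving field: unit amplitude, zero phase everywhere
    U = np.ones((n, n), dtype=complex)

    # Assumed blocking part: 1 inside a clear circular aperture, 0 outside
    mask = (rho <= aperture_radius).astype(float)

    # Assumed distorting part: a quadratic (defocus-like) phase error,
    # expressed in waves at the edge of the aperture
    phase_error = w020_waves * (rho / aperture_radius) ** 2
    P = mask * np.exp(2j * np.pi * phase_error)

    # Element-by-element product of the two arrays
    field = U * P
    return field, U, P


field, U, P = exit_pupil_field()
```

With the default arguments the arriving field is all ones and there is no phase error, so `field` is simply the aperture mask: one inside the circle, zero outside. Passing a nonzero `w020_waves` leaves the amplitude unchanged and alters only the phase inside the aperture, which is the "distorting" half of the function.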