In the context of fluorescence microscopy, the probability of measuring a set of photon counts (or digitization counts proportional to the detected light) \mathbf{m} = [m_0, \dots, m_K] for expected values \mathbf{E} = [E_0, \dots, E_K] on a detector with K + 1 pixels is given by P(\mathbf{m} \mid \mathbf{E}) = \prod_{i=0}^{K} \operatorname{Poisson}(m_i \mid E_i) = \prod_{i=0}^{K} \frac{E_i^{m_i} e^{-E_i}}{m_i!}. Since in the context of maximum-likelihood estimation the aim is to locate the maximum of the likelihood function without concern for its absolute value, it is convenient to work with \ln(P): \ln P(\mathbf{m} \mid \mathbf{E}) = \sum_{i=0}^{K} [(m_i \ln E_i - E_i) - \ln(m_i!)]. Moreover, since \ln(m_i!) does not depend on \mathbf{E}, it carries no information about the position of the maximum and may be dropped, leaving \alpha(\mathbf{m} \mid \mathbf{E}) = \sum_{i=0}^{K} [m_i \ln E_i - E_i], a function whose maximum lies at the same position as that of P(\mathbf{m} \mid \mathbf{E}).
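As a minimal numerical illustration of this last point (a sketch assuming NumPy and SciPy are available; the counts and expected values are invented for the example), the following evaluates both \ln P and \alpha along a one-parameter family of candidate \mathbf{E} and confirms that the two curves peak at the same position:

<syntaxhighlight lang="python">
import numpy as np
from scipy.special import gammaln  # ln(m!) = gammaln(m + 1)

m = np.array([4.0, 7.0, 2.0])          # fixed, invented photon counts
E0 = np.array([5.0, 5.0, 5.0])         # reference expected values
scales = np.linspace(0.1, 3.0, 1000)   # candidate scalings of E0

def log_P(E):
    # Full Poisson log-likelihood, including the constant ln(m_i!) terms.
    return np.sum(m * np.log(E) - E - gammaln(m + 1))

def alpha(E):
    # Same expression with the ln(m_i!) terms dropped.
    return np.sum(m * np.log(E) - E)

lp = [log_P(s * E0) for s in scales]
a = [alpha(s * E0) for s in scales]

# The constant offset shifts the curve but not its argmax.
assert np.argmax(lp) == np.argmax(a)
</syntaxhighlight>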
Now consider that \mathbf{E} arises from a ground truth \mathbf{x} through a measurement process \mathbf{H} which is assumed to be linear. Then \mathbf{E} = \mathbf{H} \mathbf{x}, where a matrix multiplication is implied. This can also be written componentwise as E_i = \sum_{j=0}^{K} H_{ij} x_j, which shows how \mathbf{H} mixes or blurs the ground truth. The derivative of an element E_i of \mathbf{E} with respect to an element x_j of \mathbf{x} is then {{NumBlk|:| \frac{\partial E_i}{\partial x_j} = H_{ij}. |{{EquationRef|1}}}} This is easy to verify by writing out, say, a 5 × 5 matrix \mathbf{H} and two 5-element vectors \mathbf{E} and \mathbf{x} and checking it directly (see the sketch below). Equation ({{EquationNote|1}}) can be interpreted as quantifying how much one element x_j of the ground truth influences a given element E_i of the measurement (the case i = j included). For example, in a typical case an element of the ground truth \mathbf{x} will influence nearby elements of \mathbf{E} but not the very distant ones (a value of 0 is expected in those matrix elements).
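The following sketch performs exactly this check numerically, assuming NumPy; the matrix and vectors are randomly generated placeholders:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
H = rng.random((5, 5))   # arbitrary 5 x 5 measurement matrix
x = rng.random(5)        # arbitrary 5-element ground truth
E = H @ x                # forward model E = H x

# Because E = H x is linear in x, dE_i/dx_j equals H_ij exactly,
# so a finite difference in x_j recovers the j-th column of H.
eps = 1e-6
for j in range(5):
    x_pert = x.copy()
    x_pert[j] += eps
    dE_dxj = (H @ x_pert - E) / eps
    assert np.allclose(dE_dxj, H[:, j])
</syntaxhighlight>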
Now comes a key, and somewhat arbitrary, step: \mathbf{x} is not known, but it may be estimated. Let \hat{\mathbf{x}}_\text{old} and \hat{\mathbf{x}}_\text{new} denote the estimated ground truths at successive steps of the RL algorithm, where the hat distinguishes the estimator from the ground truth itself, and update the estimate by moving along the gradient: {{NumBlk|:| \hat{\mathbf{x}}_\text{new} = \hat{\mathbf{x}}_\text{old} + \left.\lambda \frac{\partial\alpha(\mathbf{m} \mid \mathbf{E}(\mathbf{x}))}{\partial\mathbf{x}}\right|_{\hat{\mathbf{x}}_\text{old}}, |{{EquationRef|2}}}} where \frac{\partial}{\partial\mathbf{x}} stands for the (K + 1)-dimensional gradient. Performing the partial derivative of \alpha(\mathbf{m} \mid \mathbf{E}(\mathbf{x})) yields \frac{\partial\alpha(\mathbf{m} \mid \mathbf{E}(\mathbf{x}))}{\partial x_j} = \frac{\partial}{\partial x_j} \sum_{i=0}^{K} [m_i \ln E_i - E_i] = \sum_{i=0}^{K} \left[\frac{m_i}{E_i} \frac{\partial E_i}{\partial x_j} - \frac{\partial E_i}{\partial x_j} \right] = \sum_{i=0}^{K} \frac{\partial E_i}{\partial x_j} \left[\frac{m_i}{E_i} - 1 \right]. By substituting ({{EquationNote|1}}), it follows that \frac{\partial\alpha(\mathbf{m} \mid \mathbf{E}(\mathbf{x}))}{\partial x_j} = \sum_{i=0}^{K} H_{ij} \left[\frac{m_i}{E_i} - 1 \right]. Since H^T_{ji} = H_{ij} by the definition of the matrix transpose, {{NumBlk|:| \frac{\partial\alpha(\mathbf{m} \mid \mathbf{E}(\mathbf{x}))}{\partial x_j} = \sum_{i=0}^{K} H^T_{ji} \left[\frac{m_i}{E_i} - 1 \right]. |{{EquationRef|3}}}} Since this equation holds for every j from 0 to K, these K + 1 equations may be rewritten compactly as a single vector equation \frac{\partial\alpha(\mathbf{m} \mid \mathbf{E}(\mathbf{x}))}{\partial \mathbf{x}} = \mathbf{H}^T \left[\frac{\mathbf{m}}{\mathbf{E}} - \mathbf{1}\right], where \mathbf{H}^T acts as a matrix, \mathbf{m}, \mathbf{E} and \mathbf{1} are vectors, and the division is element-wise. Now, as a seemingly arbitrary but key step, let {{NumBlk|:| \lambda = \frac{\hat{\mathbf{x}}_\text{old}}{\mathbf{H}^T \mathbf{1}}, |{{EquationRef|4}}}} where \mathbf{1} is a vector of ones of size K + 1 (the same size as \mathbf{m}, \mathbf{E} and \mathbf{x}) and the division is element-wise. By using ({{EquationNote|3}}) and ({{EquationNote|4}}), ({{EquationNote|2}}) may be rewritten as \hat{\mathbf{x}}_\text{new} = \hat{\mathbf{x}}_\text{old} + \lambda \frac{\partial\alpha(\mathbf{m} \mid \mathbf{E}(\mathbf{x}))}{\partial \mathbf{x}} = \hat{\mathbf{x}}_\text{old} + \frac{\hat{\mathbf{x}}_\text{old}}{\mathbf{H}^T \mathbf{1}} \mathbf{H}^T \left[\frac{\mathbf{m}}{\mathbf{E}} - \mathbf{1}\right] = \hat{\mathbf{x}}_\text{old} + \frac{\hat{\mathbf{x}}_\text{old}}{\mathbf{H}^T \mathbf{1}} \mathbf{H}^T \frac{\mathbf{m}}{\mathbf{E}} - \hat{\mathbf{x}}_\text{old}, which yields {{NumBlk|:| \hat{\mathbf{x}}_\text{new} = \hat{\mathbf{x}}_\text{old} \, \mathbf{H}^T \left(\frac{\mathbf{m}}{\mathbf{E}}\right) \Big/ \mathbf{H}^T \mathbf{1}, |{{EquationRef|5}}}} where the division and the product (implicit after \hat{\mathbf{x}}_\text{old}) are element-wise, while \mathbf{H}^T operates as a matrix. Here \mathbf{E} \equiv \mathbf{E}(\hat{\mathbf{x}}_\text{old}) = \mathbf{H} \hat{\mathbf{x}}_\text{old} can be calculated because it is assumed that
* the initial guess \hat{\mathbf{x}}_0 is known, and
* the measurement matrix \mathbf{H} is known.
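The iteration defined by ({{EquationNote|5}}) can be sketched in a few lines. The following is an illustrative implementation under stated assumptions (NumPy available, a synthetic nonnegative \mathbf{H}, Poisson-simulated data), not a production deconvolution routine:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)

n = 32                                     # number of pixels, K + 1
x_true = rng.random(n)                     # synthetic ground truth
H = np.abs(rng.normal(size=(n, n)))        # known nonnegative measurement matrix
m = rng.poisson(H @ x_true).astype(float)  # simulated Poisson measurement

x_hat = np.full(n, m.mean() / H.sum(axis=0).mean())  # rough positive initial guess
HT1 = H.T @ np.ones(n)                     # the normalization H^T 1 from (4)

for _ in range(200):
    E = H @ x_hat                          # E(x_hat_old) = H x_hat_old
    x_hat = x_hat * (H.T @ (m / E)) / HT1  # update (5); product and divisions element-wise
</syntaxhighlight>

Note that the iterates remain nonnegative whenever the initial guess, \mathbf{H} and \mathbf{m} are nonnegative, since the update only multiplies by nonnegative factors.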
On the other hand, \mathbf{m} is the experimental data. Therefore, equation ({{EquationNote|5}}), applied successively, provides an algorithm for estimating the ground truth by ascending the likelihood landscape, since each step moves in the direction of the gradient of the likelihood. The derivation above does not demonstrate that the algorithm converges, nor does it show any dependence on the initial choice. Note that equation ({{EquationNote|2}}) provides a way of following the direction that increases the likelihood, but the choice of the log-derivative is arbitrary. Equation ({{EquationNote|4}}), on the other hand, introduces a way of weighting the movement from the previous step of the iteration. Note that if this term were not present in ({{EquationNote|5}}), then the algorithm would output a movement in the estimate even if \mathbf{m} = \mathbf{E}(\hat{\mathbf{x}}_\text{old}).
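A quick numerical confirmation of this fixed-point property (again an illustrative NumPy sketch with invented data):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)
H = np.abs(rng.normal(size=(8, 8)))  # illustrative measurement matrix
x_hat = rng.random(8) + 0.1          # some current estimate
m = H @ x_hat                        # data that match E(x_hat) exactly

E = H @ x_hat
x_new = x_hat * (H.T @ (m / E)) / (H.T @ np.ones(8))

# m / E is all ones, so H^T(m/E) = H^T 1 and the update is the identity:
# x_hat is a fixed point of (5).
assert np.allclose(x_new, x_hat)
</syntaxhighlight>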
It is worth noting that no prior knowledge of the shape of the ground truth \mathbf{x} is used in this derivation, and that the only strategy employed is to maximize the likelihood at all costs; as a result, artifacts can be introduced in the image.

==Software==