The MIMO system can be described by \mathbf{y} = \mathbf{H} \mathbf{x} + \mathbf{n}, where \mathbf{y} is the received vector, \mathbf{H} is the channel matrix, \mathbf{x} is the transmitted vector, and \mathbf{n} is the noise vector. The goal of MIMO detection is to estimate \mathbf{x} from \mathbf{y} given knowledge of \mathbf{H}. This can be posed as a statistical detection problem and addressed using a variety of techniques, including zero-forcing, successive interference cancellation (also known as V-BLAST), maximum likelihood detection and, more recently, neural-network-based MIMO detection. Such techniques commonly assume that the channel matrix \mathbf{H} is known at the receiver. In practice, the transmitter sends a
pilot signal, and the receiver estimates the channel state (i.e., \mathbf{H}) from the received signal and the known pilot. Recently, deep-learning-based MIMO detectors have been shown to outperform classical methods such as zero-forcing.
== Zero forcing ==
The zero-forcing (ZF) detector simply solves for the unknown transmitted signals while ignoring the noise. The ZF solution takes the form \hat{x}_{ZF} = G_{ZF} z, where G_{ZF} is the pseudo-inverse of the matrix H, given by: G_{ZF} = H^{\dagger} = (H^H H)^{-1} H^H. Despite its simplicity, this approach suffers from noise enhancement. After decoupling by the ZF filter, the solution \hat{x}_{ZF} is either quantized and demapped to binary bits or used to compute the LLR. As ZF detection decouples the multiple correlated streams into independent streams, the extrinsic LLR of the jth bit of the current symbol in the pth stream resembles soft-output equalization and is given by: L_{j}^{(p),E} = \frac{1}{\|g_p\|^2} \left( \max_{X^{(p)} \in \chi_{1,j}} \left[ -|\hat{X}^{(p)}_{ZF} - X^{(p)}|^2 \right] - \max_{X^{(p)} \in \chi_{-1,j}} \left[ -|\hat{X}^{(p)}_{ZF} - X^{(p)}|^2 \right] \right), where g_p denotes the pth column vector of matrix G_{ZF}^T, \hat{X}^{(p)}_{ZF} is the pth element of the symbol vector \hat{x}_{ZF}, and \chi_{b,j} indicates the subset of constellation points whose jth bit has value b. Note that the max-log approximation used in this LLR introduces negligible error-rate degradation while significantly reducing the computation needed.
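As an illustration, hard-output ZF detection can be sketched in a few lines of NumPy. The channel matrix, noise values, QPSK constellation, and the name `zf_detect` below are illustrative assumptions, not taken from the text:

```python
import numpy as np

# QPSK constellation with unit average energy (illustrative choice)
QPSK = np.array([1+1j, 1-1j, -1+1j, -1-1j]) / np.sqrt(2)

def zf_detect(H, z, constellation=QPSK):
    """Zero-forcing detection: apply the pseudo-inverse of H, then quantize."""
    G = np.linalg.pinv(H)          # G_ZF = (H^H H)^{-1} H^H
    x_zf = G @ z                   # decoupled (but noise-enhanced) estimates
    # Quantize each stream to the nearest constellation point
    idx = np.argmin(np.abs(x_zf[:, None] - constellation[None, :]), axis=1)
    return constellation[idx]

# Example: a fixed, well-conditioned 2x2 channel and two QPSK symbols
H = np.array([[1.0+0.5j, 0.3-0.2j], [0.2+0.1j, 0.9-0.4j]])
x = QPSK[[0, 3]]
z = H @ x + np.array([0.01+0.01j, -0.01j])   # small additive noise
x_hat = zf_detect(H, z)
```

At this noise level the well-conditioned channel keeps the noise enhancement small, so the quantized estimates match the transmitted symbols.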
== Minimum mean squared error ==
The minimum mean squared error (MMSE) algorithm detects the transmitted signals, \tilde{\mathbf{x}}, by minimizing the mean squared error (MSE), \mathbb{E}\{(\tilde{\mathbf{x}} - \mathbf{x})(\tilde{\mathbf{x}} - \mathbf{x})^H\}. The MMSE detector has a form similar to the ZF detector: \tilde{\mathbf{x}} = \mathbf{G}_{\mathrm{MMSE}} \mathbf{z} \quad \text{(1.1)} where \mathbf{G}_{\mathrm{MMSE}} = \mathbf{R}_{\mathbf{xz}} \mathbf{R}_{\mathbf{zz}}^{-1} \quad \text{(1.2)} Note that the cross-correlation matrix \mathbf{R}_{\mathbf{xz}} is computed as: \mathbf{R}_{\mathbf{xz}} = \mathbb{E}\{\mathbf{x} \mathbf{z}^H\} = \mathbf{R}_{\mathbf{xx}} \mathbf{H}^H = \sigma_x^2 \mathbf{H}^H \quad \text{(1.3)} whereas the auto-correlation matrix is given by: \mathbf{R}_{\mathbf{zz}} = \mathbb{E}\{\mathbf{z} \mathbf{z}^H\} = \sigma_x^2 (\mathbf{H} \mathbf{H}^H) + \sigma_v^2 \mathbf{I}_Q \quad \text{(1.4)} where \sigma_x^2 and \sigma_v^2 are the signal energy and the noise variance, respectively. Combining the above three equations, one obtains: \mathbf{G}_\mathrm{MMSE} = \mathbf{H}^H \left( \mathbf{H} \mathbf{H}^H + \frac{\mathbf{I}_Q}{\rho} \right)^{-1} = \left( \mathbf{H}^H \mathbf{H} + \frac{\mathbf{I}_P}{\rho} \right)^{-1} \mathbf{H}^H with the SNR \rho = \sigma_x^2 / \sigma_v^2. The effective SINR of the signal in the pth stream of the MMSE detection output can be formulated as: \alpha^{(p)} = \mathbf{h}_p^H \left( \mathbf{H}_{(-p)} \mathbf{H}_{(-p)}^H + \frac{\mathbf{I}_Q}{\rho} \right)^{-1} \mathbf{h}_p \quad \text{(1.5)} where \mathbf{H}_{(-p)} represents the matrix \mathbf{H} with the pth column removed, and \mathbf{h}_p is the pth column vector of \mathbf{H}. Equation (1.1) is referred to as the biased MMSE detector because the detected signal power is smaller than the transmitted signal power by a factor of \alpha^{(p)} / (\alpha^{(p)} + 1).
To avoid this degradation, an unbiased MMSE detector has been proposed: \hat{\mathbf{x}}_{\mathrm{MMSE}} = \mathbf{B} \mathbf{G}_{\mathrm{MMSE}} \mathbf{z} \quad \text{(1.6)} where \mathbf{B} is a diagonal matrix whose pth diagonal element equals (\alpha^{(p)} + 1)/\alpha^{(p)}. The unbiased MMSE detector achieves better BER performance than the biased one. Interestingly, this implies that minimizing the MSE does not necessarily minimize the BER. The soft-output computation for the unbiased MMSE detector parallels that of the soft-output ZF detector: L_j^{(p),E} = \alpha^{(p)} \left( \max_{X^{(p)} \in \chi_{1,j}} \left[ -\left| \hat{X}^{(p)}_{\mathrm{MMSE}} - X^{(p)} \right|^2 \right] - \max_{X^{(p)} \in \chi_{-1,j}} \left[ -\left| \hat{X}^{(p)}_{\mathrm{MMSE}} - X^{(p)} \right|^2 \right] \right)
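A sketch of the biased filter (Eq. (1.1)–(1.4) combined) followed by the unbiasing step (Eq. (1.6)) might look as follows. The channel matrix, the high SNR value, and the name `mmse_detect` are illustrative assumptions; a noiseless received vector is used so the result can be checked directly against the transmitted symbols:

```python
import numpy as np

def mmse_detect(H, z, rho):
    """Unbiased MMSE detection: biased filter G_MMSE, then per-stream unbiasing."""
    Qdim, P = H.shape
    # G_MMSE = (H^H H + I/rho)^{-1} H^H  (biased MMSE filter)
    G = np.linalg.solve(H.conj().T @ H + np.eye(P) / rho, H.conj().T)
    x_biased = G @ z
    # Effective per-stream SINR alpha^(p) (Eq. 1.5)
    alpha = np.empty(P)
    for p in range(P):
        Hm = np.delete(H, p, axis=1)               # H with the pth column removed
        A = Hm @ Hm.conj().T + np.eye(Qdim) / rho
        alpha[p] = (H[:, p].conj() @ np.linalg.solve(A, H[:, p])).real
    # B is diagonal with entries (alpha + 1) / alpha  (Eq. 1.6)
    return (alpha + 1) / alpha * x_biased

QPSK = np.array([1+1j, 1-1j, -1+1j, -1-1j]) / np.sqrt(2)
H = np.array([[1.0+0.5j, 0.3-0.2j], [0.2+0.1j, 0.9-0.4j]])
x = QPSK[[1, 2]]
x_hat = mmse_detect(H, H @ x, rho=1e6)   # noiseless received vector, high SNR
```

At high SNR the MMSE filter approaches the ZF pseudo-inverse and the unbiasing factor approaches one, so the estimate converges to the transmitted vector.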
== Ordered successive interference cancellation (OSIC, V-BLAST) ==
Both the ZF and the MMSE detectors are linear. There also exist nonlinear methods that solve the MIMO detection problem for spatially multiplexed MIMO systems; among these, OSIC is the simplest. In the ith iteration, a symbol is detected by: \hat{X}^{(p)} = Q\left( \mathbf{g}_{p(i)}^T \mathbf{z}^{(i)} \right), where Q(\cdot) denotes the quantizer and \mathbf{g}_{p(i)} is the p(i)th column vector of \mathbf{G}^{(i)T}, the detection matrix at iteration i. Then, the interference from \hat{X}^{(p)} is cancelled from the received vector: \mathbf{z}^{(i+1)} = \mathbf{z}^{(i)} - \mathbf{h}_p \hat{X}^{(p)}. OSIC is known to perform better than linear detectors at high SNR but worse at low SNR; therefore, adequately switching between linear detection and OSIC can further improve the error-rate performance.
=== Example ===
Assume that the 2×2 channel matrix is: \mathbf{H} = \begin{bmatrix} H(0,0) & H(0,1) \\ H(1,0) & H(1,1) \end{bmatrix} The OSIC scheme compares the row norms of \mathbf{G} = \mathbf{H}^{-1} to decide the detection order. If the row associated with X^{(1)} has the smaller norm, i.e., |H(0,0)|^2 + |H(1,0)|^2 \leq |H(0,1)|^2 + |H(1,1)|^2, then X^{(1)} is detected first: \tilde{X}^{(1)} = \frac{1}{\det(\mathbf{H})} \begin{bmatrix} - H(1,0) & H(0,0) \end{bmatrix} \begin{bmatrix} Z(0) \\ Z(1) \end{bmatrix}, \quad \hat{X}^{(1)} = Q(\tilde{X}^{(1)}) and \tilde{X}^{(0)} = \frac{1}{|H(0,0)|^2 + |H(1,0)|^2} \begin{bmatrix} H(0,0)^* & H(1,0)^* \end{bmatrix} \left( \begin{bmatrix} Z(0) \\ Z(1) \end{bmatrix} - \hat{X}^{(1)} \begin{bmatrix} H(0,1) \\ H(1,1) \end{bmatrix} \right), \quad \hat{X}^{(0)} = Q(\tilde{X}^{(0)}) Otherwise, X^{(0)} is detected first: \tilde{X}^{(0)} = \frac{1}{\det(\mathbf{H})} \begin{bmatrix} - H(0,1) & H(1,1) \end{bmatrix} \begin{bmatrix} Z(1) \\ Z(0) \end{bmatrix}, \quad \hat{X}^{(0)} = Q(\tilde{X}^{(0)}) and \tilde{X}^{(1)} = \frac{1}{|H(1,1)|^2 + |H(0,1)|^2} \begin{bmatrix} H(1,1)^* & H(0,1)^* \end{bmatrix} \left( \begin{bmatrix} Z(1) \\ Z(0) \end{bmatrix} - \hat{X}^{(0)} \begin{bmatrix} H(1,0) \\ H(0,0) \end{bmatrix} \right), \quad \hat{X}^{(1)} = Q(\tilde{X}^{(1)})
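A general OSIC sketch (any number of streams, ZF nulling, detecting the stream whose ZF row has the smallest norm first) can be written compactly; the channel values, QPSK constellation, and the names `quantize`/`osic_detect` are illustrative assumptions:

```python
import numpy as np

QPSK = np.array([1+1j, 1-1j, -1+1j, -1-1j]) / np.sqrt(2)

def quantize(s, constellation=QPSK):
    """Map a soft estimate to the nearest constellation point."""
    return constellation[np.argmin(np.abs(constellation - s))]

def osic_detect(H, z, constellation=QPSK):
    """ZF-based OSIC: detect the most reliable stream, cancel it, repeat."""
    H, z = H.copy(), z.copy()
    remaining = list(range(H.shape[1]))
    x_hat = np.zeros(len(remaining), dtype=complex)
    while remaining:
        G = np.linalg.pinv(H)
        # The row of G with the smallest norm suffers the least noise enhancement
        i = int(np.argmin(np.sum(np.abs(G) ** 2, axis=1)))
        s = quantize(G[i] @ z, constellation)
        x_hat[remaining[i]] = s
        z -= H[:, i] * s                    # cancel the detected symbol
        H = np.delete(H, i, axis=1)         # remove its column from the channel
        remaining.pop(i)
    return x_hat

H = np.array([[1.0+0.5j, 0.3-0.2j], [0.2+0.1j, 0.9-0.4j]])
x = QPSK[[2, 1]]
x_hat = osic_detect(H, H @ x)               # noiseless for a simple check
```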
== Maximum likelihood detection ==
The maximum likelihood (ML) detector exhaustively searches all possible transmitted symbol vectors \mathbf{x} and selects the one that minimizes the Euclidean distance: \hat{\mathbf{x}}_{ML} = \arg\min_{\mathbf{x} \in \chi^{N_t}} \|\mathbf{y} - \mathbf{H} \mathbf{x}\|^2 Although ML provides optimal performance, its complexity grows exponentially with the number of transmit antennas and modulation order, making it impractical for large MIMO systems.
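The exhaustive search can be written directly as a loop over the Cartesian product of the constellation; with QPSK and N_t antennas this visits 4^{N_t} candidates. The channel, noise values, and the name `ml_detect` below are illustrative:

```python
import numpy as np
from itertools import product

QPSK = np.array([1+1j, 1-1j, -1+1j, -1-1j]) / np.sqrt(2)

def ml_detect(H, y, constellation=QPSK):
    """Exhaustive ML detection: try all |constellation|^Nt symbol vectors."""
    best, best_metric = None, np.inf
    for cand in product(constellation, repeat=H.shape[1]):
        xc = np.array(cand)
        m = np.linalg.norm(y - H @ xc) ** 2   # Euclidean distance metric
        if m < best_metric:
            best, best_metric = xc, m
    return best

H = np.array([[1.0+0.5j, 0.3-0.2j], [0.2+0.1j, 0.9-0.4j]])
x = QPSK[[3, 0]]
x_hat = ml_detect(H, H @ x + np.array([0.02-0.01j, 0.01+0.02j]))
```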
== Sphere decoder ==
The ML solution to the MIMO detection problem simultaneously determines the P spatially multiplexed symbols by: \mathbf{x}_{\mathrm{ML}} = \arg\min_{\mathbf{x} \in \chi^P} \left\| \mathbf{z} - \mathbf{H} \mathbf{x} \right\|^2 = \arg\min_{\mathbf{x} \in \chi^P} M(\mathbf{x}), where \chi^P is the P-fold Cartesian product of the constellation set \chi, and M(\mathbf{x}) is the metric value of a symbol vector. The ML detector must search all possible combinations of P symbols, so the complexity grows exponentially with P. In light of this huge complexity, the sphere decoder (SD) was proposed to reduce the search space of the ML MIMO detector. The SD only searches those constellation points lying within a P-dimensional hypersphere of radius d, i.e., points satisfying M(\mathbf{x}) \leq d^2. This is effective only when the radius d is large enough for the hypersphere to contain the ML solution. The
QR decomposition (QRD) is typically applied to convert the exhaustive search into a constrained tree search: \mathbf{x}_{\mathrm{ML}} = \arg\min_{\mathbf{x} \in \chi^P} \left\| \tilde{\mathbf{z}} - \mathbf{R} \mathbf{x} \right\|^2, \quad \text{where} \quad \tilde{\mathbf{z}} = \mathbf{Q}^H \mathbf{z}. Let the (i, j)th element of \mathbf{R} be R_{i,j} and the pth element of \tilde{\mathbf{z}} be \tilde{Z}(p). Then the metric M(\mathbf{x}) can be expressed as: M(\mathbf{x}) = \sum_{p=0}^{P-1} T(p), where the partial distance (PD) is defined as: T(p) = \left| \tilde{Z}(p) - \sum_{j=p}^{P-1} R_{p,j} X(j) \right|^2. The resulting sphere decoding process becomes a P-level tree search. In level L, only child nodes of parent nodes satisfying: \sum_{p=L+1}^{P-1} T(p) \leq d^2 are considered. Once the accumulated partial distance: T(L) + \sum_{p=L+1}^{P-1} T(p) exceeds d^2, all nodes in the subtree rooted at that child node are removed from the search space. When the nodes at the bottommost layer have been visited, the ML solution: \mathbf{x}_{\mathrm{ML}} = \left[ \hat{X}(0), \hat{X}(1), \dots, \hat{X}(P-1) \right] is the path with the minimum metric value. For example, one starts from variable X(P-1) and discards all the nodes where: T(P-1) > d^2. Then, for the surviving X(P-1) nodes, the SD procedure proceeds to examine all the underlying X(P-2) and again discards those partial vectors [X(P-1), X(P-2)] for which: T(P-1) + T(P-2) > d^2. Since T(p) \geq 0, the accumulated PDs increase monotonically, and more nodes are pruned at lower layers. With careful design of the radius and search strategy, sphere decoding can approach ML performance with significantly lower average complexity. Different tree search algorithms significantly affect the sphere decoder's efficiency. In algorithm design, tree search strategies are commonly categorized into three major types:
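The metric decomposition above can be checked numerically. The following sketch (illustrative channel and symbols) verifies that the partial distances T(p), computed from the QRD, sum to the original Euclidean metric \|z - Hx\|^2 — this holds because \mathbf{Q} is unitary when \mathbf{H} is square:

```python
import numpy as np

QPSK = np.array([1+1j, 1-1j, -1+1j, -1-1j]) / np.sqrt(2)
H = np.array([[1.0+0.5j, 0.3-0.2j], [0.2+0.1j, 0.9-0.4j]])
x = QPSK[[0, 1]]
z = H @ x + np.array([0.05+0.02j, -0.03+0.04j])   # arbitrary small noise

# QRD: the search runs on upper-triangular R with z_tilde = Q^H z
Q, R = np.linalg.qr(H)
zt = Q.conj().T @ z
P = H.shape[1]

def partial_distance(p, xv):
    """T(p) = |z_tilde(p) - sum_{j >= p} R[p, j] X(j)|^2 (R is upper triangular)."""
    return abs(zt[p] - R[p, p:] @ xv[p:]) ** 2

metric = sum(partial_distance(p, x) for p in range(P))
```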
depth-first search, breadth-first search, and
best-first search.
=== 1. Depth-first tree search ===
As its name implies, this algorithm explores the tree by diving down toward the bottommost layer first (the forward step), until a leaf node is reached or the accumulated partial distance (PD) exceeds the radius constraint. When the forward step cannot proceed, a backward step returns the search to the upper layer, and the algorithm continues down another branch. This process repeats until all nodes satisfying the radius constraint are visited. In the natural span scheme, the next child node is selected in a predefined order rather than by metric. Its advantage is that it avoids enumerating all possible child nodes, which is a major source of complexity in the closest-point-first scheme.
==== Radius update ====
In the closest-point-first method, the next node is chosen based on the smallest PD. When this method is combined with depth-first search, the first full symbol vector found is known as the Babai point. The radius constraint d^2 of the sphere decoder can then be updated to the metric value of the Babai point, effectively shrinking the search space. If another full-length leaf node is later discovered with an even smaller metric, the radius is updated again, reducing the search space further.
==== Characteristics ====
Depth-first tree search is favored for its speed. The first valid full solution (the Babai point) can be found by visiting only P nodes. When combined with radius update, the ML solution is often identified quickly. Thus, this approach is especially suitable for hard-output MIMO detectors. However, its drawbacks include variable latency and runtime complexity. In some cases, especially under low SNR, the algorithm may need to explore many nodes before finding the ML solution, particularly when it is far from the Babai point. To address this, the run-time constraint concept is introduced: a fixed upper limit is imposed on the number of visited nodes, and once the limit is reached, the search terminates early. In summary, the depth-first tree search method is best suited for hard-output MIMO detection in high-SNR environments due to its speed and efficiency, especially when combined with radius-update strategies.
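A compact depth-first sphere decoder with radius update might be sketched as below. The radius is initialized to infinity, so the first leaf found plays the role of the Babai point and thereafter shrinks the radius; children are enumerated in natural order rather than closest-point-first. The channel, symbols, and the name `sphere_decode` are illustrative:

```python
import numpy as np

QPSK = np.array([1+1j, 1-1j, -1+1j, -1-1j]) / np.sqrt(2)

def sphere_decode(H, z, constellation=QPSK):
    """Depth-first sphere decoding with radius update (initial radius = inf)."""
    Q, R = np.linalg.qr(H)
    zt = Q.conj().T @ z
    P = H.shape[1]
    best = {"metric": np.inf, "x": None}

    def search(level, x, acc):
        if acc >= best["metric"]:        # prune: already outside the current radius
            return
        if level < 0:                    # leaf reached: update solution and radius
            best["metric"], best["x"] = acc, x.copy()
            return
        for s in constellation:          # natural-span enumeration of children
            x[level] = s
            t = abs(zt[level] - R[level, level:] @ x[level:]) ** 2
            search(level - 1, x, acc + t)

    search(P - 1, np.zeros(P, dtype=complex), 0.0)
    return best["x"]

H = np.array([[1.0+0.5j, 0.3-0.2j], [0.2+0.1j, 0.9-0.4j]])
x = QPSK[[1, 3]]
x_hat = sphere_decode(H, H @ x)          # noiseless for a simple check
```

Because the search is exhaustive apart from pruning against the best metric found so far, the returned vector is the ML solution.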
=== 2. Breadth-first tree search ===
The breadth-first tree search algorithm features two main properties: (1) multiple nodes are visited simultaneously within a layer, and (2) only forward traversal is allowed (no backward step). As a result, all symbol vectors that satisfy the radius constraint are found concurrently once the search reaches the bottommost layer. Unlike in depth-first search, the sphere radius cannot be dynamically updated during a breadth-first search, so the initial radius is the sole parameter for balancing complexity against performance. If the radius is too small, no valid solution may be found, and the search must be restarted with a larger radius. Conversely, if the radius is too large, the search may visit many unnecessary nodes and their descendants. A notable issue with this algorithm is the variable number of visited nodes per layer, which poses implementation challenges, especially in hardware designs that must accommodate the worst case.
=== K-best algorithm ===
A well-known derivative of the breadth-first search is the K-best tree search, where K is the number of nodes retained at each layer for further downward traversal. The search complexity is therefore fixed, determined by K and the number of tree layers. Several strategies exist to enumerate the best K nodes at a given layer. One common approach:
1. Enumerate the best child node of each surviving parent node.
2. Among these children, determine the overall best node.
3. From the parent whose best child was just selected, enumerate its next-best child.
4. Repeat the selection and enumeration process until the top K nodes are determined.
This procedure continues layer by layer. In a typical K-best sphere decoder, the radius is implicitly set to infinity. However, it is possible to combine a fixed radius constraint with the K-best criterion: among nodes with PD below the radius, at most K are selected. If the radius is small, fewer than K nodes might be available in a layer, making K act more like a layer-wise runtime constraint. The choice of K is critical to achieving a good tradeoff between complexity and detection performance. For instance, in a 4×4 MIMO system, the maximum affordable number of visited nodes may be around 100; thus, K \approx 25. In real implementations, however, K is often smaller. For small K, it is possible that an ancestor of the ML solution is pruned: although the ML path has the smallest total metric, its early PDs may be larger than those of other nodes at the same layer.
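A minimal K-best sketch follows. For simplicity it sorts all children of the survivors and keeps the best K, rather than using the incremental best-child enumeration described above; the channel, symbols, and the name `k_best_detect` are illustrative. With QPSK, two layers, and K = 4, the survivors happen to cover every path here, so the result matches ML:

```python
import numpy as np

QPSK = np.array([1+1j, 1-1j, -1+1j, -1-1j]) / np.sqrt(2)

def k_best_detect(H, z, K, constellation=QPSK):
    """Breadth-first K-best search: keep the K lowest-metric partial paths per layer."""
    Q, R = np.linalg.qr(H)
    zt = Q.conj().T @ z
    P = H.shape[1]
    survivors = [(0.0, [])]              # (accumulated metric, path for levels p..P-1)
    for level in range(P - 1, -1, -1):
        children = []
        for acc, path in survivors:
            for s in constellation:
                ext = [s] + path                     # symbols for levels level..P-1
                t = abs(zt[level] - R[level, level:] @ np.array(ext)) ** 2
                children.append((acc + t, ext))
        survivors = sorted(children, key=lambda c: c[0])[:K]   # keep the best K
    return np.array(survivors[0][1])

H = np.array([[1.0+0.5j, 0.3-0.2j], [0.2+0.1j, 0.9-0.4j]])
x = QPSK[[0, 2]]
x_hat = k_best_detect(H, H @ x, K=4)
```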
=== 3. Best-first tree search ===
Unlike depth-first and breadth-first tree search algorithms, the best-first tree search does not follow strict layer boundaries. In this approach, candidate nodes are defined as all nodes that can be visited next, regardless of their depth in the tree. At each traversal step, the best candidate node, i.e., the one with the smallest accumulated partial distance (PD), is visited. To manage cross-layer candidates, a node pool is maintained to store all viable candidate nodes and their PDs. This method achieves the lowest average complexity among ML tree searches. While both depth-first and best-first search can reach the ML solution, their behavior differs fundamentally:
• Depth-first search cannot confirm the ML solution until all valid paths are explored.
• Best-first search proceeds from low to high PD values and guarantees the ML solution upon reaching the first full-length leaf node, since its metric must be the lowest.
However, the best-first tree search has some limitations:
1. Memory usage: a large node pool is required.
2. Enumeration overhead: dynamic control logic is needed to manage the pool.
3. Soft-output inefficiency: few full-length solutions may be found, which is problematic for soft-output MIMO detection.
To address these limitations, two variants are introduced:
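The pool-based traversal above maps naturally onto a priority queue. The sketch below (illustrative channel and naming; the pool is a binary heap keyed on accumulated PD) returns the first full-length leaf popped, which by the best-first property is the ML solution:

```python
import heapq
import numpy as np

QPSK = np.array([1+1j, 1-1j, -1+1j, -1-1j]) / np.sqrt(2)

def best_first_detect(H, z, constellation=QPSK):
    """Best-first search: always expand the pool entry with the smallest PD."""
    Q, R = np.linalg.qr(H)
    zt = Q.conj().T @ z
    P = H.shape[1]
    tie = 0                                  # tie-breaker so heapq never compares paths
    pool = [(0.0, tie, P, [])]               # (metric, tie, levels left to fill, path)
    while pool:
        acc, _, level, path = heapq.heappop(pool)
        if level == 0:
            return np.array(path)            # first full-length leaf popped: ML solution
        for s in constellation:
            ext = [s] + path                 # symbols for levels (level-1)..P-1
            t = abs(zt[level - 1] - R[level - 1, level - 1:] @ np.array(ext)) ** 2
            tie += 1
            heapq.heappush(pool, (acc + t, tie, level - 1, ext))
    return None

H = np.array([[1.0+0.5j, 0.3-0.2j], [0.2+0.1j, 0.9-0.4j]])
x = QPSK[[3, 1]]
x_hat = best_first_detect(H, H @ x)
```

Note that this naive version pushes all M children of each expanded node into the pool, which is exactly the memory and enumeration overhead that the MBF variants below are designed to reduce.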
=== Modified best-first tree search ===
The modified best-first (MBF) tree search transforms the M-ary search tree into a binary tree using a first-child/next-sibling structure. Instead of pushing all M children of a node into the pool, only:
- the best child in the next layer, and
- the best unvisited sibling
are added when a node is visited; the current node is then removed from the pool. This encoding reduces the branching factor and keeps the node pool compact, improving search efficiency while preserving forward and horizontal traversal capabilities. The technique is analogous to the standard binary-tree encoding of general trees in data structures.
=== Modified best-first with fast descent ===
The modified best-first with fast descent (MBF-FD) search further improves MBF by combining it with depth-first traversal principles. The idea is to descend quickly along the best-child path until a leaf node is reached, pushing the best sibling nodes encountered along the way into the pool. Once a leaf is found, a new descent begins from the next-best node in the pool. This method ensures that more full-length paths are explored, which is especially beneficial for soft-output MIMO detection requiring multiple high-quality symbol vectors, and it retains the efficiency of MBF while expanding search diversity and depth.