3D Gaussian splatting (3DGS) is a rasterization-based technique in the field of real-time radiance fields for representing and rendering photorealistic 3D scenes from a sparse set of 2D images. It enables high-quality, real-time novel-view synthesis from collections of photos or videos, addressing a significant challenge in the field. The method represents scenes with 3D Gaussians that retain properties of continuous volumetric radiance fields, integrating the sparse points produced during camera calibration. It introduces three key components:
• An anisotropic representation of radiance fields using 3D Gaussians.
• Interleaved optimization and adaptive density control of the Gaussians.
• A fast, visibility-aware rendering algorithm supporting anisotropic splatting, designed for GPU execution and implemented as a tile-based rasterizer that enables fast sorting, an efficient backward pass, and efficient blending of Gaussian components.
The representation, differentiable 3D Gaussian splatting, is unstructured and explicit, allowing rapid rendering and projection of the Gaussians to 2D splats. The gradients for all parameters are derived explicitly to avoid the overhead of automatic differentiation. The covariance of each Gaussian can be thought of as the configuration of an ellipsoid, which can be mathematically decomposed into a scaling matrix and a rotation matrix, as in the sketch below.
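A minimal sketch (illustrative, not the authors' implementation) of this decomposition, building a covariance matrix Σ = R S Sᵀ Rᵀ from a per-axis scale vector and a unit rotation quaternion; this parameterization keeps the covariance symmetric positive semi-definite during optimization.

```python
# Illustrative sketch: build a 3D Gaussian covariance from scale + rotation.
# Sigma = R S S^T R^T, where S is a diagonal scaling and R a rotation matrix.
import numpy as np

def quaternion_to_rotation(q):
    """Convert a unit quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def covariance_from_scaling_rotation(scale, quat):
    R = quaternion_to_rotation(np.asarray(quat, dtype=float))
    S = np.diag(scale)          # per-axis extents of the ellipsoid
    M = R @ S
    return M @ M.T              # = R S S^T R^T, symmetric PSD by construction

# Example: an ellipsoid stretched along x, rotated 90 degrees about z.
sigma = covariance_from_scaling_rotation(
    [2.0, 0.5, 0.5], [np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
```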
Each rendering step is followed by a comparison against the training views available in the dataset, and the optimization uses the resulting loss to produce a dense set of 3D Gaussians that represent the scene as accurately as possible.
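The overall loop can be pictured as follows. This is a heavily simplified sketch under stated assumptions: `render` is a placeholder standing in for the differentiable tile-based rasterizer, the loss shown is plain L1 (the original work combines L1 with a D-SSIM term), and all names are illustrative rather than the authors' API.

```python
# Simplified sketch of the optimization loop; `render` is a trivial
# placeholder so the snippet runs, not the real tile-based rasterizer.
import torch

n = 1024  # number of Gaussians
params = {
    "means":      torch.randn(n, 3, requires_grad=True),  # positions
    "log_scales": torch.zeros(n, 3, requires_grad=True),  # ellipsoid extents
    "quats":      torch.randn(n, 4, requires_grad=True),  # rotations
    "opacities":  torch.zeros(n, requires_grad=True),
    "colors":     torch.rand(n, 3, requires_grad=True),   # stand-in for SH color
}
optimizer = torch.optim.Adam(params.values(), lr=1e-3)

def render(params, camera):
    # Placeholder: a real rasterizer projects each Gaussian to a 2D splat
    # and alpha-blends splats front to back within screen tiles.
    return torch.sigmoid(params["colors"]).mean(dim=0).expand(64, 64, 3)

# (camera, image) pairs; cameras are unused by the placeholder renderer.
training_views = [(None, torch.rand(64, 64, 3)) for _ in range(8)]

for step in range(1000):
    camera, target = training_views[step % len(training_views)]
    loss = (render(params, camera) - target).abs().mean()  # L1 only here
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Periodically, the method also densifies (clones/splits) Gaussians with
    # large positional gradients and prunes nearly transparent ones.
```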
After optimization, the resulting set of 3D Gaussians is saved to storage. As in the training step, a renderer creates views from these Gaussians, and several sets of Gaussians can be composed together into larger scenes.
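Rendering a view from a set of Gaussians involves projecting each 3D covariance to a 2D screen-space covariance, Σ′ = J W Σ Wᵀ Jᵀ, where W is the viewing transformation and J is the Jacobian of an affine approximation of the projective transformation. A minimal sketch, assuming a simple pinhole camera and illustrative names:

```python
# Sketch of projecting a 3D covariance to the 2D splat covariance,
# Sigma' = J W Sigma W^T J^T, for an assumed pinhole camera model.
import numpy as np

def project_covariance(sigma3d, W, mean_cam, fx, fy):
    """sigma3d: 3x3 world-space covariance; W: 3x3 world-to-camera rotation;
    mean_cam: Gaussian center in camera space; fx, fy: focal lengths."""
    x, y, z = mean_cam
    # Jacobian of the perspective projection (u, v) = (fx*x/z, fy*y/z).
    J = np.array([
        [fx / z, 0.0,    -fx * x / z**2],
        [0.0,    fy / z, -fy * y / z**2],
    ])
    cov_cam = W @ sigma3d @ W.T
    return J @ cov_cam @ J.T   # 2x2 screen-space covariance of the splat
```

In practice, implementations typically also add a small low-pass term to the 2D covariance so that each splat covers at least about a pixel on screen.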
===Results and evaluation===
The authors compared their method against state-of-the-art techniques such as Mip-NeRF360, InstantNGP, and Plenoxels, using PSNR, SSIM, and LPIPS as quantitative evaluation metrics. Their fully converged model (30,000 iterations) achieves quality on par with, or slightly better than, Mip-NeRF360, but with significantly reduced training time (35–45 minutes vs. 48 hours) and faster rendering (real time vs. 10 seconds per frame). At 7,000 iterations (5–10 minutes of training), the method already achieves quality comparable to InstantNGP and Plenoxels. On synthetic bounded scenes (the Blender dataset), it achieves state-of-the-art results even with random initialization, starting from 100,000 uniformly random Gaussians.
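Of these metrics, PSNR has the simplest definition; for reference, a short illustrative sketch:

```python
# PSNR in decibels between two images with values in [0, max_val].
import numpy as np

def psnr(img, ref, max_val=1.0):
    mse = np.mean((np.asarray(img, float) - np.asarray(ref, float)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)
```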
===Limitations===
Some limitations of the method include:
• Elongated artifacts or "splotchy" Gaussians in some areas.
• Occasional popping artifacts due to large Gaussians created by the optimization, especially in regions with view-dependent appearance.
• Higher memory consumption than NeRF-based solutions, though still more compact than previous point-based approaches.
• A need for hyperparameter tuning (e.g., reducing the position learning rate) on very large scenes.
• High peak GPU memory consumption during training (over 20 GB) in the current unoptimized prototype.
The authors note that some of these limitations could be addressed through future improvements such as better culling approaches, antialiasing, regularization, and compression techniques.

==3D temporal Gaussian splatting==