<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Awesome Gaussian Splatting Latest Papers</title><link>https://yourusername.github.io/Awesome-Gaussian-Splatting</link><description>Daily updated feed of the latest Gaussian Splatting papers from arXiv.</description><language>en-us</language><lastBuildDate>Sat, 04 Apr 2026 06:05:59 +0000</lastBuildDate><atom:link href="https://yourusername.github.io/Awesome-Gaussian-Splatting/feed.xml" rel="self" type="application/rss+xml" /><item><title>GEMM-GS: Accelerating 3D Gaussian Splatting on Tensor Cores with GEMM-Compatible Blending</title><link>https://arxiv.org/abs/2604.02120</link><guid>https://arxiv.org/abs/2604.02120</guid><description>Neural Radiance Fields (NeRF) enables 3D scene reconstruction from several 2D images but incurs high rendering latency via its point-sampling design. 3D Gaussian Splatting (3DGS) improves on NeRF with explicit scene representation and an optimized pipeline yet still fails to meet practical real-time demands. Existing acceleration works overlook the evolving Tensor Cores of modern GPUs because the 3DGS pipeline lacks General Matrix Multiplication (GEMM) operations. This paper proposes GEMM-GS, an ...</description><pubDate>Thu, 02 Apr 2026 14:56:06 +0000</pubDate><author>Haomin Li, Bowen Zhu, Fangxin Liu, Zongwu Wang, Xinran Liang et al.</author></item><item><title>ProDiG: Progressive Diffusion-Guided Gaussian Splatting for Aerial to Ground Reconstruction</title><link>https://arxiv.org/abs/2604.02003</link><guid>https://arxiv.org/abs/2604.02003</guid><description>Generating ground-level views and coherent 3D site models from aerial-only imagery is challenging due to extreme viewpoint changes, missing intermediate observations, and large scale variations.
Existing methods either refine renderings post-hoc, often producing geometrically inconsistent results, or rely on multi-altitude ground-truth, which is rarely available. Gaussian Splatting and diffusion-based refinements improve fidelity under small variations but fail under wide aerial-to-ground gap...</description><pubDate>Thu, 02 Apr 2026 13:09:05 +0000</pubDate><category>Dynamic</category><category>Generation</category><author>Sirshapan Mitra, Yogesh S. Rawat</author></item><item><title>Resonance4D: Frequency-Domain Motion Supervision for Preset-Free Physical Parameter Learning in 4D Dynamic Physical Scene Simulation</title><link>https://arxiv.org/abs/2604.01994</link><guid>https://arxiv.org/abs/2604.01994</guid><description>Physics-driven 4D dynamic simulation from static 3D scenes remains constrained by an overlooked contradiction: reliable motion supervision often relies on online video diffusion or optical-flow pipelines whose computational cost exceeds that of the simulator itself. Existing methods further simplify inverse physical modeling by optimizing only partial material parameters, limiting realism in scenes with complex materials and dynamics. We present Resonance4D, a physics-driven 4D dynamic simula...</description><pubDate>Thu, 02 Apr 2026 13:00:22 +0000</pubDate><category>Compression</category><category>Dynamic</category><category>Generation</category><category>Physics</category><category>Segmentation</category><author>Changshe Zhang, Jie Feng, Siyu Chen, Guanbin Li, Ronghua Shang et al.</author></item><item><title>GS^2: Graph-based Spatial Distribution Optimization for Compact 3D Gaussian Splatting</title><link>https://arxiv.org/abs/2604.01884</link><guid>https://arxiv.org/abs/2604.01884</guid><description>3D Gaussian Splatting (3DGS) has demonstrated breakthrough performance in novel view synthesis and real-time rendering.
Nevertheless, its practicality is constrained by the high memory cost due to a huge number of Gaussian points. Many pruning-based 3DGS variants have been proposed for memory saving, but often compromise spatial consistency and may lead to rendering artifacts. To address this issue, we propose graph-based spatial distribution optimization for compact 3D Gaussian Splatting (GS...</description><pubDate>Thu, 02 Apr 2026 10:41:51 +0000</pubDate><category>Compression</category><category>Dynamic</category><category>Rendering</category><author>Xianben Yang, Tao Wang, Yuxuan Li, Yi Jin, Haibin Ling</author></item><item><title>A3R: Agentic Affordance Reasoning via Cross-Dimensional Evidence in 3D Gaussian Scenes</title><link>https://arxiv.org/abs/2604.01882</link><guid>https://arxiv.org/abs/2604.01882</guid><description>Affordance reasoning in 3D Gaussian scenes aims to identify the region that supports the action specified by a given text instruction in complex environments. Existing methods typically cast this problem as one-shot prediction from static scene observations, assuming sufficient evidence is already available for reasoning. However, in complex 3D scenes, many failure cases arise not from weak prediction capacity, but from incomplete task-relevant evidence under fixed observations. To address th...</description><pubDate>Thu, 02 Apr 2026 10:40:51 +0000</pubDate><category>Segmentation</category><category>Sparse View</category><author>Di Li, Jie Feng, Guanbin Li, Ronghua Shang, Yuhui Zheng et al.</author></item><item><title>FaCT-GS: Fast and Scalable CT Reconstruction with Gaussian Splatting</title><link>https://arxiv.org/abs/2604.01844</link><guid>https://arxiv.org/abs/2604.01844</guid><description>Gaussian Splatting (GS) has emerged as a dominating technique for image rendering and has quickly been adapted for the X-ray Computed Tomography (CT) reconstruction task.
However, despite being on par or better than many of its predecessors, the benefits of GS are typically not substantial enough to motivate a transition from well-established reconstruction algorithms. This paper addresses the most significant remaining limitations of the GS-based approach by introducing FaCT-GS, a framework ...</description><pubDate>Thu, 02 Apr 2026 09:58:00 +0000</pubDate><category>Compression</category><category>Medical</category><author>Pawel Tomasz Pieta, Rasmus Juul Pedersen, Sina Borgi, Jakob Sauer Jørgensen, Jens Wenzel Andreasen et al.</author></item><item><title>Director: Instance-aware Gaussian Splatting for Dynamic Scene Modeling and Understanding</title><link>https://arxiv.org/abs/2604.01678</link><guid>https://arxiv.org/abs/2604.01678</guid><description>Volumetric video seeks to model dynamic scenes as temporally coherent 4D representations. While recent Gaussian-based approaches achieve impressive rendering fidelity, they primarily emphasize appearance but are largely agnostic to instance-level structure, limiting stable tracking and semantic reasoning in highly dynamic scenarios.
In this paper, we present Director, a unified spatio-temporal Gaussian representation that jointly models human performance, high-fidelity rendering, and instance...</description><pubDate>Thu, 02 Apr 2026 06:29:53 +0000</pubDate><category>Dynamic</category><category>Language</category><category>Mesh</category><category>Segmentation</category><author>Yuheng Jiang, Yiwen Cai, Zihao Wang, Yize Wu, Sicheng Li et al.</author></item><item><title>F3DGS: Federated 3D Gaussian Splatting for Decentralized Multi-Agent World Modeling</title><link>https://arxiv.org/abs/2604.01605</link><guid>https://arxiv.org/abs/2604.01605</guid><description>We present F3DGS, a federated 3D Gaussian Splatting framework for decentralized multi-agent 3D reconstruction. Existing 3DGS pipelines assume centralized access to all observations, which limits their applicability in distributed robotic settings where agents operate independently, and centralized data aggregation may be restricted. Directly extending centralized training to multi-agent systems introduces communication overhead and geometric inconsistency. F3DGS first constructs a shared geom...</description><pubDate>Thu, 02 Apr 2026 04:29:44 +0000</pubDate><category>Autonomous Driving</category><category>Robotics</category><author>Morui Zhu, Mohammad Dehghani Tezerjani, Mátyás Szántó, Márton Vaitkus, Song Fu et al.</author></item><item><title>Satellite-Free Training for Drone-View Geo-Localization</title><link>https://arxiv.org/abs/2604.01581</link><guid>https://arxiv.org/abs/2604.01581</guid><description>Drone-view geo-localization (DVGL) aims to determine the location of drones in GPS-denied environments by retrieving the corresponding geotagged satellite tile from a reference gallery given UAV observations of a location.
In many existing formulations, these observations are represented by a single oblique UAV image. In contrast, our satellite-free setting is designed for multi-view UAV sequences, which are used to construct a geometry-normalized UAV-side location representation before cross...</description><pubDate>Thu, 02 Apr 2026 03:48:53 +0000</pubDate><category>Compression</category><category>Editing</category><category>Generation</category><author>Tao Liu, Yingzhi Zhang, Kan Ren, Xiaoqi Zhao</author></item><item><title>ColorGradedGaussians: Palette-Based Color Grading for 3D Gaussian Splatting via View-Space Sparse Decomposition</title><link>https://arxiv.org/abs/2604.01551</link><guid>https://arxiv.org/abs/2604.01551</guid><description>Professional color editing requires precise control over both color (hue and saturation) and lightness, ideally through separate, independent controls. We present a real-time interactive color editing framework for 3D Gaussian Splatting (3DGS) that enables palette-based recoloring, per-palette tone curves for color-aware lightness adjustment, and accurate pixel-level constraints -- capabilities unavailable in prior palette-based 3DGS methods. Existing approaches decompose colors at the primit...</description><pubDate>Thu, 02 Apr 2026 02:54:01 +0000</pubDate><category>Editing</category><author>Cheng-Kang Ted Chao, Yotam Gingold</author></item><item><title>Better Rigs, Not Bigger Networks: A Body Model Ablation for Gaussian Avatars</title><link>https://arxiv.org/abs/2604.01447</link><guid>https://arxiv.org/abs/2604.01447</guid><description>Recent 3D Gaussian splatting methods built atop SMPL achieve remarkable visual fidelity while continually increasing the complexity of the overall training architecture. 
We demonstrate that much of this complexity is unnecessary: by replacing SMPL with the Momentum Human Rig (MHR), estimated via SAM-3D-Body, a minimal pipeline with no learned deformations or pose-dependent corrections achieves the highest reported PSNR and competitive or superior LPIPS and SSIM on PeopleSnapshot and ZJU-MoCap...</description><pubDate>Wed, 01 Apr 2026 22:42:57 +0000</pubDate><category>Avatar</category><category>Mesh</category><category>Physics</category><author>Derek Austin</author></item><item><title>LESV: Language Embedded Sparse Voxel Fusion for Open-Vocabulary 3D Scene Understanding</title><link>https://arxiv.org/abs/2604.01388</link><guid>https://arxiv.org/abs/2604.01388</guid><description>Recent advancements in open-vocabulary 3D scene understanding heavily rely on 3D Gaussian Splatting (3DGS) to register vision-language features into 3D space. However, we identify two critical limitations in these approaches: the spatial ambiguity arising from unstructured, overlapping Gaussians which necessitates probabilistic feature registration, and the multi-level semantic ambiguity caused by pooling features over object-level masks, which dilutes fine-grained details. To address these c...</description><pubDate>Wed, 01 Apr 2026 20:48:06 +0000</pubDate><category>Language</category><category>Segmentation</category><author>Fusang Wang, Nathan Piasco, Moussab Bennehar, Luis Roldão, Dzmitry Tsishkou et al.</author></item><item><title>TRACE: High-Fidelity 3D Scene Editing via Tangible Reconstruction and Geometry-Aligned Contextual Video Masking</title><link>https://arxiv.org/abs/2604.01207</link><guid>https://arxiv.org/abs/2604.01207</guid><description>We present TRACE, a mesh-guided 3DGS editing framework that achieves automated, high-fidelity scene transformation.
By anchoring video diffusion with explicit 3D geometry, TRACE uniquely enables fine-grained, part-level manipulation--such as local pose shifting or component replacement--while preserving the structural integrity of the central subject, a capability largely absent in existing editing methods. Our approach comprises three key stages: (1) Multi-view 3D-Anchor Synthesis, which lever...</description><pubDate>Wed, 01 Apr 2026 17:51:00 +0000</pubDate><category>Dynamic</category><category>Editing</category><category>Generation</category><category>Mesh</category><category>Physics</category><category>Robotics</category><category>Sparse View</category><author>Jiyuan Hu, Zechuan Zhang, Zongxin Yang, Yi Yang</author></item><item><title>Neural Harmonic Textures for High-Quality Primitive Based Neural Reconstruction</title><link>https://arxiv.org/abs/2604.01204</link><guid>https://arxiv.org/abs/2604.01204</guid><description>Primitive-based methods such as 3D Gaussian Splatting have recently become the state-of-the-art for novel-view synthesis and related reconstruction tasks. Compared to neural fields, these representations are more flexible, adaptive, and scale better to large scenes. However, the limited expressivity of individual primitives makes modeling high-frequency detail challenging.
We introduce Neural Harmonic Textures, a neural representation approach that anchors latent feature vectors on a virtual ...</description><pubDate>Wed, 01 Apr 2026 17:48:22 +0000</pubDate><category>Rendering</category><category>Segmentation</category><author>Jorge Condor, Nicolas Moenne-Loccoz, Merlin Nimier-David, Piotr Didyk, Zan Gojcic et al.</author></item><item><title>Diff3R: Feed-forward 3D Gaussian Splatting with Uncertainty-aware Differentiable Optimization</title><link>https://arxiv.org/abs/2604.01030</link><guid>https://arxiv.org/abs/2604.01030</guid><description>Recent advances in 3D Gaussian Splatting (3DGS) present two main directions: feed-forward models offer fast inference in sparse-view settings, while per-scene optimization yields high-quality renderings but is computationally expensive. To combine the benefits of both, we introduce Diff3R, a novel framework that explicitly bridges feed-forward prediction and test-time optimization. By incorporating a differentiable 3DGS optimization layer directly into the training loop, our network learns to...</description><pubDate>Wed, 01 Apr 2026 15:40:20 +0000</pubDate><category>Sparse View</category><author>Yueh-Cheng Liu, Jozef Hladký, Matthias Nießner, Angela Dai</author></item><item><title>DLWM: Dual Latent World Models enable Holistic Gaussian-centric Pre-training in Autonomous Driving</title><link>https://arxiv.org/abs/2604.00969</link><guid>https://arxiv.org/abs/2604.00969</guid><description>Vision-based autonomous driving has gained much attention due to its low costs and excellent performance. Compared with dense BEV (Bird's Eye View) or sparse query models, the Gaussian-centric method is a comprehensive yet sparse representation, describing the scene with 3D semantic Gaussians.
In this paper, we introduce DLWM, a novel paradigm with Dual Latent World Models specifically designed to enable holistic Gaussian-centric pre-training in autonomous driving using two stages. In the first sta...</description><pubDate>Wed, 01 Apr 2026 14:41:21 +0000</pubDate><category>Autonomous Driving</category><category>Dynamic</category><category>Robotics</category><category>Segmentation</category><author>Yiyao Zhu, Ying Xue, Haiming Zhang, Guangfeng Jiang, Wending Zhou et al.</author></item><item><title>Autoregressive Appearance Prediction for 3D Gaussian Avatars</title><link>https://arxiv.org/abs/2604.00928</link><guid>https://arxiv.org/abs/2604.00928</guid><description>A photorealistic and immersive human avatar experience demands capturing fine, person-specific details such as cloth and hair dynamics, subtle facial expressions, and characteristic motion patterns. Achieving this requires large, high-quality datasets, which often introduce ambiguities and spurious correlations when very similar poses correspond to different appearances. Models that fit these details during training can overfit and produce unstable, abrupt appearance changes for novel poses.
...</description><pubDate>Wed, 01 Apr 2026 14:07:14 +0000</pubDate><category>Avatar</category><category>Compression</category><category>Dynamic</category><category>Physics</category><author>Michael Steiner, Zhang Chen, Alexander Richard, Vasu Agrawal, Markus Steinberger et al.</author></item><item><title>Compact Keyframe-Optimized Multi-Agent Gaussian Splatting SLAM</title><link>https://arxiv.org/abs/2604.00804</link><guid>https://arxiv.org/abs/2604.00804</guid><description>Efficient multi-agent 3D mapping is essential for robotic teams operating in unknown environments, but dense representations hinder real-time exchange over constrained communication links. In multi-agent Simultaneous Localization and Mapping (SLAM), systems typically rely on a centralized server to merge and optimize the local maps produced by individual agents. However, sharing these large map representations, particularly those generated by recent methods such as Gaussian Splatting, becomes...</description><pubDate>Wed, 01 Apr 2026 12:11:20 +0000</pubDate><category>Compression</category><category>Generation</category><category>Robotics</category><category>SLAM</category><author>Monica M. Q. Li, Pierre-Yves Lajoie, Jialiang Liu, Giovanni Beltrame</author></item><item><title>DirectFisheye-GS: Enabling Native Fisheye Input in Gaussian Splatting with Cross-View Joint Optimization</title><link>https://arxiv.org/abs/2604.00648</link><guid>https://arxiv.org/abs/2604.00648</guid><description>3D Gaussian Splatting (3DGS) has enabled efficient 3D scene reconstruction from everyday images with real-time, high-fidelity rendering, greatly advancing VR/AR applications. Fisheye cameras, with their wider field of view (FOV), promise high-quality reconstructions from fewer inputs and have recently attracted much attention.
However, since 3DGS relies on rasterization, most subsequent works involving fisheye camera inputs first undistort images before training, which introduces two problems...</description><pubDate>Wed, 01 Apr 2026 09:00:04 +0000</pubDate><author>Zhengxian Yang, Fei Xie, Xutao Xue, Rui Zhang, Taicheng Huang et al.</author></item><item><title>TRiGS: Temporal Rigid-Body Motion for Scalable 4D Gaussian Splatting</title><link>https://arxiv.org/abs/2604.00538</link><guid>https://arxiv.org/abs/2604.00538</guid><description>Recent 4D Gaussian Splatting (4DGS) methods achieve impressive dynamic scene reconstruction but often rely on piecewise linear velocity approximations and short temporal windows. This disjointed modeling leads to severe temporal fragmentation, forcing primitives to be repeatedly eliminated and regenerated to track complex nonlinear dynamics. This makeshift approximation eliminates the long-term temporal identity of objects and causes an inevitable proliferation of Gaussians, hindering scalabi...</description><pubDate>Wed, 01 Apr 2026 06:35:13 +0000</pubDate><category>Dynamic</category><category>Generation</category><author>Suwoong Yeom, Joonsik Nam, Seunggyu Choi, Lucas Yunkyu Lee, Sangmin Kim et al.</author></item><item><title>RT-GS: Gaussian Splatting with Reflection and Transmittance Primitives</title><link>https://arxiv.org/abs/2604.00509</link><guid>https://arxiv.org/abs/2604.00509</guid><description>Gaussian Splatting is a powerful tool for reconstructing diffuse scenes, but it struggles to simultaneously model specular reflections and the appearance of objects behind semi-transparent surfaces. These specular reflections and transmittance are essential for realistic novel view synthesis, and existing methods do not properly incorporate the underlying physical processes to simulate them.
To address this issue, we propose RT-GS, a unified framework that integrates a microfacet material mod...</description><pubDate>Wed, 01 Apr 2026 05:50:03 +0000</pubDate><category>Physics</category><category>Rendering</category><author>Kunnong Zeng, Chensheng Peng, Yichen Xie, Masayoshi Tomizuka, Cem Yuksel</author></item><item><title>ARGS: Auto-Regressive Gaussian Splatting via Parallel Progressive Next-Scale Prediction</title><link>https://arxiv.org/abs/2604.00494</link><guid>https://arxiv.org/abs/2604.00494</guid><description>Auto-regressive frameworks for next-scale prediction of 2D images have demonstrated strong potential for producing diverse and sophisticated content by progressively refining a coarse input. However, extending this paradigm to 3D object generation remains largely unexplored. In this paper, we introduce auto-regressive Gaussian splatting (ARGS), a framework for making next-scale predictions in parallel for generation according to levels of detail. We propose a Gaussian simplification strategy ...</description><pubDate>Wed, 01 Apr 2026 05:21:59 +0000</pubDate><category>Generation</category><author>Quanyuan Ruan, Kewei Shi, Jiabao Lei, Xifeng Gao, Xiaoguang Han</author></item><item><title>GRVS: a Generalizable and Recurrent Approach to Monocular Dynamic View Synthesis</title><link>https://arxiv.org/abs/2603.29734</link><guid>https://arxiv.org/abs/2603.29734</guid><description>Synthesizing novel views from monocular videos of dynamic scenes remains a challenging problem. Scene-specific methods that optimize 4D representations with explicit motion priors often break down in highly dynamic regions where multi-view information is hard to exploit. Diffusion-based approaches that integrate camera control into large pre-trained models can produce visually plausible videos but frequently suffer from geometric inconsistencies across both static and dynamic areas. 
Both fami...</description><pubDate>Tue, 31 Mar 2026 13:35:14 +0000</pubDate><category>Dynamic</category><category>Generation</category><category>Rendering</category><author>Thomas Tanay, Mohammed Brahimi, Michal Nazarczuk, Qingwen Zhang, Sibi Catley-Chandar et al.</author></item><item><title>AA-Splat: Anti-Aliased Feed-forward Gaussian Splatting</title><link>https://arxiv.org/abs/2603.29394</link><guid>https://arxiv.org/abs/2603.29394</guid><description>Feed-forward 3D Gaussian Splatting (FF-3DGS) emerges as a fast and robust solution for sparse-view 3D reconstruction and novel view synthesis (NVS). However, existing FF-3DGS methods are built on incorrect screen-space dilation filters, causing severe rendering artifacts when rendering at out-of-distribution sampling rates. We first propose an FF-3DGS model, called AA-Splat, to enable robust anti-aliased rendering at any resolution. AA-Splat utilizes an opacity-balanced band-limiting (OBBL)...</description><pubDate>Tue, 31 Mar 2026 07:59:51 +0000</pubDate><category>Generation</category><category>Rendering</category><category>Sparse View</category><author>Taewoo Suh, Sungpyo Kim, Jongmin Park, Munchurl Kim</author></item><item><title>MotionScale: Reconstructing Appearance, Geometry, and Motion of Dynamic Scenes with Scalable 4D Gaussian Splatting</title><link>https://arxiv.org/abs/2603.29296</link><guid>https://arxiv.org/abs/2603.29296</guid><description>Realistic reconstruction of dynamic 4D scenes from monocular videos is essential for understanding the physical world. Despite recent progress in neural rendering, existing methods often struggle to recover accurate 3D geometry and temporally consistent motion in complex environments.
To address these challenges, we propose MotionScale, a 4D Gaussian Splatting framework that scales efficiently to large scenes and extended sequences while maintaining high-fidelity structural and motion coheren...</description><pubDate>Tue, 31 Mar 2026 06:03:59 +0000</pubDate><category>Dynamic</category><category>Physics</category><author>Haoran Zhou, Gim Hee Lee</author></item><item><title>LightHarmony3D: Harmonizing Illumination and Shadows for Object Insertion in 3D Gaussian Splatting</title><link>https://arxiv.org/abs/2603.29209</link><guid>https://arxiv.org/abs/2603.29209</guid><description>3D Gaussian Splatting (3DGS) enables high-fidelity reconstruction of scene geometry and appearance. Building on this capability, inserting external mesh objects into reconstructed 3DGS scenes enables interactive editing and content augmentation for immersive applications such as AR/VR, virtual staging, and digital content creation. However, achieving physically consistent lighting and shadows for mesh insertion remains challenging, as it requires accurate scene illumination estimation and mul...</description><pubDate>Tue, 31 Mar 2026 03:26:30 +0000</pubDate><category>Editing</category><category>Generation</category><category>Mesh</category><category>Physics</category><author>Tianyu Huang, Zhenyang Ren, Zhenchen Wan, Jiyang Zheng, Wenjie Wang et al.</author></item><item><title>Efficient Camera Pose Augmentation for View Generalization in Robotic Policy Learning</title><link>https://arxiv.org/abs/2603.29192</link><guid>https://arxiv.org/abs/2603.29192</guid><description>Prevailing 2D-centric visuomotor policies exhibit a pronounced deficiency in novel view generalization, as their reliance on static observations hinders consistent action mapping across unseen views.
In response, we introduce GenSplat, a feed-forward 3D Gaussian Splatting framework that facilitates view-generalized policy learning through novel view rendering. GenSplat employs a permutation-equivariant architecture to reconstruct high-fidelity 3D scenes from sparse, uncalibrated inputs in a s...</description><pubDate>Tue, 31 Mar 2026 02:56:10 +0000</pubDate><category>Rendering</category><category>Robotics</category><author>Sen Wang, Huaiyi Dong, Jingyi Tian, Jiayi Li, Zhuo Yang et al.</author></item><item><title>Hierarchical Visual Relocalization with Nearest View Synthesis from Feature Gaussian Splatting</title><link>https://arxiv.org/abs/2603.29185</link><guid>https://arxiv.org/abs/2603.29185</guid><description>Visual relocalization is a fundamental task in the field of 3D computer vision, estimating a camera's pose when it revisits a previously known scene. While point-based hierarchical relocalization methods have shown strong scalability and efficiency, they are often limited by sparse image observations and weak feature matching. In this work, we propose SplatHLoc, a novel hierarchical visual relocalization framework that uses Feature Gaussian Splatting as the scene representation. To address th...</description><pubDate>Tue, 31 Mar 2026 02:51:14 +0000</pubDate><category>Rendering</category><author>Huaqi Tao, Bingxi Liu, Guangcheng Chen, Fulin Tang, Li He et al.</author></item><item><title>UltraG-Ray: Physics-Based Gaussian Ray Casting for Novel Ultrasound View Synthesis</title><link>https://arxiv.org/abs/2603.29022</link><guid>https://arxiv.org/abs/2603.29022</guid><description>Novel view synthesis (NVS) in ultrasound has gained attention as a technique for generating anatomically plausible views beyond the acquired frames, offering new capabilities for training clinicians or data augmentation.
However, current methods struggle with complex tissue and view-dependent acoustic effects. Physics-based NVS aims to address these limitations by including the ultrasound image formation process into the simulation. Recent approaches combine a learnable implicit scene represe...</description><pubDate>Mon, 30 Mar 2026 21:30:05 +0000</pubDate><category>Generation</category><category>Medical</category><category>Physics</category><category>Rendering</category><author>Felix Duelmer, Jakob Klaushofer, Magdalena Wysocki, Nassir Navab, Mohammad Farid Azampour</author></item><item><title>Gleanmer: A 6 mW SoC for Real-Time 3D Gaussian Occupancy Mapping</title><link>https://arxiv.org/abs/2603.29005</link><guid>https://arxiv.org/abs/2603.29005</guid><description>High-fidelity 3D occupancy mapping is essential for many edge-based applications (such as AR/VR and autonomous navigation) but is limited by power constraints. We present Gleanmer, a system on chip (SoC) with an accelerator for GMMap, a 3D occupancy map using Gaussians. Through algorithm-hardware co-optimizations for direct computation and efficient reuse of these compact Gaussians, Gleanmer reduces construction and query energy by up to 63% and 81%, respectively. Approximate computation on G...</description><pubDate>Mon, 30 Mar 2026 21:08:52 +0000</pubDate><category>Compression</category><category>Robotics</category><author>Zih-Sing Fu, Peter Zhi Xuan Li, Sertac Karaman, Vivienne Sze</author></item><item><title>LG-HCC: Local Geometry-Aware Hierarchical Context Compression for 3D Gaussian Splatting</title><link>https://arxiv.org/abs/2603.28431</link><guid>https://arxiv.org/abs/2603.28431</guid><description>Although 3D Gaussian Splatting (3DGS) enables high-fidelity real-time rendering, its prohibitive storage overhead severely hinders practical deployment. Recent anchor-based 3DGS compression schemes reduce Gaussian redundancy through advanced context models.
However, they overlook explicit geometric dependencies, leading to structural degradation and suboptimal rate-distortion performance. In this paper, we propose a Local Geometry-aware Hierarchical Context Compression framework for 3DGS (...</description><pubDate>Mon, 30 Mar 2026 13:39:35 +0000</pubDate><category>Compression</category><category>Rendering</category><author>Xuan Deng, Xiandong Meng, Hengyu Man, Qiang Zhu, Tiange Zhang et al.</author></item><item><title>ObjectMorpher: 3D-Aware Image Editing via Deformable 3DGS Models</title><link>https://arxiv.org/abs/2603.28152</link><guid>https://arxiv.org/abs/2603.28152</guid><description>Achieving precise, object-level control in image editing remains challenging: 2D methods lack 3D awareness and often yield ambiguous or implausible results, while existing 3D-aware approaches rely on heavy optimization or incomplete monocular reconstructions. We present ObjectMorpher, a unified, interactive framework that converts ambiguous 2D edits into geometry-grounded operations. ObjectMorpher lifts target instances with an image-to-3D generator into editable 3D Gaussian Splatting (3DGS),...</description><pubDate>Mon, 30 Mar 2026 08:15:29 +0000</pubDate><category>Dynamic</category><category>Editing</category><category>Generation</category><category>Physics</category><category>Robotics</category><author>Yuhuan Xie, Aoxuan Pan, Yi-Hua Huang, Chirui Chang, Peng Dai et al.</author></item><item><title>SVGS: Single-View to 3D Object Editing via Gaussian Splatting</title><link>https://arxiv.org/abs/2603.28126</link><guid>https://arxiv.org/abs/2603.28126</guid><description>Text-driven 3D scene editing has attracted considerable interest due to its convenience and user-friendliness.
However, methods that rely on implicit 3D representations, such as Neural Radiance Fields (NeRF), while effective in rendering complex scenes, are hindered by slow processing speeds and limited control over specific regions of the scene. Moreover, existing approaches, including Instruct-NeRF2NeRF and GaussianEditor, which utilize multi-view editing strategies, frequently produce inco...</description><pubDate>Mon, 30 Mar 2026 07:45:03 +0000</pubDate><category>Editing</category><category>Generation</category><author>Pengcheng Xue, Yan Tian, Qiutao Song, Ziyi Wang, Linyang He et al.</author></item><item><title>4DSurf: High-Fidelity Dynamic Scene Surface Reconstruction</title><link>https://arxiv.org/abs/2603.28064</link><guid>https://arxiv.org/abs/2603.28064</guid><description>This paper addresses the problem of dynamic scene surface reconstruction using Gaussian Splatting (GS), aiming to recover temporally consistent geometry. While existing GS-based dynamic surface reconstruction methods can yield superior reconstruction, they are typically limited to either a single object or objects with only small deformations, struggling to maintain temporally consistent surface reconstruction of large deformations over time. We propose "4DSurf", a novel and unifie...</description><pubDate>Mon, 30 Mar 2026 06:09:37 +0000</pubDate><category>Avatar</category><category>Dynamic</category><category>Mesh</category><category>Physics</category><category>Segmentation</category><category>Sparse View</category><author>Renjie Wu, Hongdong Li, Jose M. 
Alvarez, Miaomiao Liu</author></item><item><title>Physically Inspired Gaussian Splatting for HDR Novel View Synthesis</title><link>https://arxiv.org/abs/2603.28020</link><guid>https://arxiv.org/abs/2603.28020</guid><description>High dynamic range novel view synthesis (HDR-NVS) reconstructs scenes with dynamic details by fusing multi-exposure low dynamic range (LDR) views, yet it struggles to capture ambient illumination-dependent appearance. Implicitly supervising HDR content by constraining tone-mapped results fails to correct abnormal HDR values and results in limited gradients for Gaussians in under/over-exposed regions. To this end, we introduce PhysHDR-GS, a physically inspired HDR-NVS framework that models...</description><pubDate>Mon, 30 Mar 2026 04:27:39 +0000</pubDate><category>Dynamic</category><category>Physics</category><category>Rendering</category><author>Huimin Zeng, Yue Bai, Hailing Wang, Yun Fu</author></item><item><title>DipGuava: Disentangling Personalized Gaussian Features for 3D Head Avatars from Monocular Video</title><link>https://arxiv.org/abs/2603.28003</link><guid>https://arxiv.org/abs/2603.28003</guid><description>While recent 3D head avatar creation methods attempt to animate facial dynamics, they often fail to capture personalized details, limiting realism and expressiveness. To fill this gap, we present DipGuava (Disentangled and Personalized Gaussian UV Avatar), a novel 3D Gaussian head avatar creation method that successfully generates avatars with personalized attributes from monocular video. 
DipGuava is the first method to explicitly disentangle facial appearance into two complementary component...</description><pubDate>Mon, 30 Mar 2026 03:50:23 +0000</pubDate><category>Avatar</category><category>Dynamic</category><category>Generation</category><category>Physics</category><category>Segmentation</category><author>Jeonghaeng Lee, Seok Keun Choi, Zhixuan Li, Weisi Lin, Sanghoon Lee</author></item><item><title>GS3LAM: Gaussian Semantic Splatting SLAM</title><link>https://arxiv.org/abs/2603.27781</link><guid>https://arxiv.org/abs/2603.27781</guid><description>Recently, the multi-modal fusion of RGB, depth, and semantics has shown great potential in dense Simultaneous Localization and Mapping (SLAM). However, a prerequisite for generating consistent semantic maps is the availability of dense, efficient, and scalable scene representations. Existing semantic SLAM systems based on explicit representations are often limited by resolution and an inability to predict unknown areas. Conversely, implicit representations typically rely on time-consuming ray...</description><pubDate>Sun, 29 Mar 2026 17:25:18 +0000</pubDate><category>Generation</category><category>Rendering</category><category>SLAM</category><category>Segmentation</category><author>Linfei Li, Lin Zhang, Zhong Wang, Ying Shen</author></item><item><title>SGS-Intrinsic: Semantic-Invariant Gaussian Splatting for Sparse-View Indoor Inverse Rendering</title><link>https://arxiv.org/abs/2603.27516</link><guid>https://arxiv.org/abs/2603.27516</guid><description>We present SGS-Intrinsic, an indoor inverse rendering framework that works well for sparse-view images. Unlike existing 3D Gaussian Splatting (3DGS) based methods that focus on object-centric reconstruction and fail to work under sparse view settings, our method achieves high-quality geometry reconstruction and accurate disentanglement of material and illumination. 
The core idea is to construct a dense and geometry-consistent Gaussian semantic field guided by semantic and geometric p...</description><pubDate>Sun, 29 Mar 2026 04:45:31 +0000</pubDate><category>Segmentation</category><category>Sparse View</category><author>Jiahao Niu, Rongjia Zheng, Wenju Xu, Wei-Shi Zheng, Qing Zhang</author></item><item><title>From None to All: Self-Supervised 3D Reconstruction via Novel View Synthesis</title><link>https://arxiv.org/abs/2603.27455</link><guid>https://arxiv.org/abs/2603.27455</guid><description>In this paper, we introduce NAS3R, a self-supervised feed-forward framework that jointly learns explicit 3D geometry and camera parameters with no ground-truth annotations and no pretrained priors. During training, NAS3R reconstructs 3D Gaussians from uncalibrated and unposed context views and renders target views using its self-predicted camera parameters, enabling self-supervised training from 2D photometric supervision. To ensure stable convergence, NAS3R integrates reconstruction and came...</description><pubDate>Sun, 29 Mar 2026 00:28:38 +0000</pubDate><category>Rendering</category><author>Ranran Huang, Weixun Luo, Ye Mao, Krystian Mikolajczyk</author></item><item><title>NimbusGS: Unified 3D Scene Reconstruction under Hybrid Weather</title><link>https://arxiv.org/abs/2603.27228</link><guid>https://arxiv.org/abs/2603.27228</guid><description>We present NimbusGS, a unified framework for reconstructing high-quality 3D scenes from degraded multi-view inputs captured under diverse and mixed adverse weather conditions. Unlike existing methods that target specific weather types, NimbusGS addresses the broader challenge of generalization by modeling the dual nature of weather: a continuous, view-consistent medium that attenuates light, and dynamic, view-dependent particles that cause scattering and occlusion. 
To capture this structure, ...</description><pubDate>Sat, 28 Mar 2026 10:46:29 +0000</pubDate><category>Dynamic</category><category>Physics</category><author>Yanying Li, Jinyang Li, Shengfeng He, Yangyang Xu, Junyu Dong et al.</author></item><item><title>DiffSoup: Direct Differentiable Rasterization of Triangle Soup for Extreme Radiance Field Simplification</title><link>https://arxiv.org/abs/2603.27151</link><guid>https://arxiv.org/abs/2603.27151</guid><description>Radiance field reconstruction aims to recover high-quality 3D representations from multi-view RGB images. Recent advances, such as 3D Gaussian splatting, enable real-time rendering with high visual fidelity on sufficiently powerful graphics hardware. However, efficient online transmission and rendering across diverse platforms requires drastic model simplification, reducing the number of primitives by several orders of magnitude. We introduce DiffSoup, a radiance field representation that emp...</description><pubDate>Sat, 28 Mar 2026 06:00:57 +0000</pubDate><category>Rendering</category><author>Kenji Tojo, Bernd Bickel, Nobuyuki Umetani</author></item><item><title>Detailed Geometry and Appearance from Opportunistic Motion</title><link>https://arxiv.org/abs/2603.26665</link><guid>https://arxiv.org/abs/2603.26665</guid><description>Reconstructing 3D geometry and appearance from a sparse set of fixed cameras is a foundational task with broad applications, yet it remains fundamentally constrained by the limited viewpoints. We show that this bound can be broken by exploiting opportunistic object motion: as a person manipulates an object (e.g., moving a chair or lifting a mug), the static cameras effectively "orbit" the object in its local coordinate frame, providing additional virtual viewpoints. 
Harnessing this object m...</description><pubDate>Fri, 27 Mar 2026 17:59:16 +0000</pubDate><category>Dynamic</category><category>Robotics</category><category>Sparse View</category><author>Ryosuke Hirai, Kohei Yamashita, Antoine Guédon, Ryo Kawahara, Vincent Lepetit et al.</author></item><item><title>GaussianGPT: Towards Autoregressive 3D Gaussian Scene Generation</title><link>https://arxiv.org/abs/2603.26661</link><guid>https://arxiv.org/abs/2603.26661</guid><description>Most recent advances in 3D generative modeling rely on diffusion or flow-matching formulations. We instead explore a fully autoregressive alternative and introduce GaussianGPT, a transformer-based model that directly generates 3D Gaussians via next-token prediction, thus facilitating full 3D scene generation. We first compress Gaussian primitives into a discrete latent grid using a sparse 3D convolutional autoencoder with vector quantization. The resulting tokens are serialized and modeled us...</description><pubDate>Fri, 27 Mar 2026 17:58:05 +0000</pubDate><category>Compression</category><category>Generation</category><author>Nicolas von Lützow, Barbara Rössle, Katharina Schmid, Matthias Nießner</author></item><item><title>Drive-Through 3D Vehicle Exterior Reconstruction via Dynamic-Scene SfM and Distortion-Aware Gaussian Splatting</title><link>https://arxiv.org/abs/2603.26638</link><guid>https://arxiv.org/abs/2603.26638</guid><description>High-fidelity 3D reconstruction of vehicle exteriors improves buyer confidence in online automotive marketplaces, but generating these models in cluttered dealership drive-throughs presents severe technical challenges. Unlike static-scene photogrammetry, this setting features a dynamic vehicle moving against heavily cluttered, static backgrounds. 
This problem is further compounded by wide-angle lens distortion, specular automotive paint, and non-rigid wheel rotations that violate classical ep...</description><pubDate>Fri, 27 Mar 2026 17:42:42 +0000</pubDate><category>Dynamic</category><category>Generation</category><category>Segmentation</category><author>Nitin Kulkarni, Akhil Devarashetti, Charlie Cluss, Livio Forte, Philip Schneider et al.</author></item><item><title>Scene Grounding In the Wild</title><link>https://arxiv.org/abs/2603.26584</link><guid>https://arxiv.org/abs/2603.26584</guid><description>Reconstructing accurate 3D models of large-scale real-world scenes from unstructured, in-the-wild imagery remains a core challenge in computer vision, especially when the input views have little or no overlap. In such cases, existing reconstruction pipelines often produce multiple disconnected partial reconstructions or erroneously merge non-overlapping regions into overlapping geometry. In this work, we propose a framework that grounds each partial reconstruction to a complete reference mode...</description><pubDate>Fri, 27 Mar 2026 16:41:20 +0000</pubDate><category>Segmentation</category><author>Tamir Cohen, Leo Segre, Shay Shomer-Chai, Shai Avidan, Hadar Averbuch-Elor</author></item><item><title>GLINT: Modeling Scene-Scale Transparency via Gaussian Radiance Transport</title><link>https://arxiv.org/abs/2603.26181</link><guid>https://arxiv.org/abs/2603.26181</guid><description>While 3D Gaussian splatting has emerged as a powerful paradigm, it fundamentally fails to model transparency such as glass panels. The core challenge lies in decoupling the intertwined radiance contributions from transparent interfaces and the transmitted geometry observed through the glass. We present GLINT, a framework that models scene-scale transparency through explicit decomposed Gaussian representation. 
GLINT reconstructs the primary interface and models reflected and transmitted radian...</description><pubDate>Fri, 27 Mar 2026 08:52:12 +0000</pubDate><category>Rendering</category><author>Youngju Na, Jaeseong Yun, Soohyun Ryu, Hyunsu Kim, Sung-Eui Yoon et al.</author></item><item><title>R-PGA: Robust Physical Adversarial Camouflage Generation via Relightable 3D Gaussian Splatting</title><link>https://arxiv.org/abs/2603.26067</link><guid>https://arxiv.org/abs/2603.26067</guid><description>Physical adversarial camouflage poses a severe security threat to autonomous driving systems by mapping adversarial textures onto 3D objects. Nevertheless, current methods remain brittle in complex dynamic scenarios, failing to generalize across diverse geometric (e.g., viewing configurations) and radiometric (e.g., dynamic illumination, atmospheric scattering) variations. We attribute this deficiency to two fundamental limitations in simulation and optimization. First, the reliance on coarse...</description><pubDate>Fri, 27 Mar 2026 04:38:04 +0000</pubDate><category>Autonomous Driving</category><category>Dynamic</category><category>Generation</category><category>Physics</category><author>Tianrui Lou, Siyuan Liang, Jiawei Liang, Yuze Gao, Xiaochun Cao</author></item><item><title>Less Gaussians, Texture More: 4K Feed-Forward Textured Splatting</title><link>https://arxiv.org/abs/2603.25745</link><guid>https://arxiv.org/abs/2603.25745</guid><description>Existing feed-forward 3D Gaussian Splatting methods predict pixel-aligned primitives, leading to a quadratic growth in primitive count as resolution increases. This fundamentally limits their scalability, making high-resolution synthesis such as 4K intractable. We introduce LGTM (Less Gaussians, Texture More), a feed-forward framework that overcomes this resolution scaling barrier. 
By predicting compact Gaussian primitives coupled with per-primitive textures, LGTM decouples geometric complexi...</description><pubDate>Thu, 26 Mar 2026 17:59:59 +0000</pubDate><category>Compression</category><category>Rendering</category><author>Yixing Lao, Xuyang Bai, Xiaoyang Wu, Nuoyuan Yan, Zixin Luo et al.</author></item><item><title>arg-VU: Affordance Reasoning with Physics-Aware 3D Geometry for Visual Understanding in Robotic Surgery</title><link>https://arxiv.org/abs/2603.26814</link><guid>https://arxiv.org/abs/2603.26814</guid><description>Affordance reasoning provides a principled link between perception and action, yet remains underexplored in surgical robotics, where tissues are highly deformable, compliant, and dynamically coupled with tool motion. We present arg-VU, a physics-aware affordance reasoning framework that integrates temporally consistent geometry tracking with constraint-induced mechanical modeling for surgical visual understanding. Surgical scenes are reconstructed using 3D Gaussian Splatting (3DGS) and conver...</description><pubDate>Thu, 26 Mar 2026 17:14:28 +0000</pubDate><category>Dynamic</category><category>Medical</category><category>Physics</category><category>Robotics</category><author>Nan Xiao, Yunxin Fan, Farong Wang, Fei Liu</author></item><item><title>Unblur-SLAM: Dense Neural SLAM for Blurry Inputs</title><link>https://arxiv.org/abs/2603.26810</link><guid>https://arxiv.org/abs/2603.26810</guid><description>We propose Unblur-SLAM, a novel RGB SLAM pipeline for sharp 3D reconstruction from blurred image inputs. In contrast to previous work, our approach is able to handle different types of blur and demonstrates state-of-the-art performance in the presence of both motion blur and defocus blur. Moreover, we adjust the computational effort according to the amount of blur in the input image. 
As a first stage, our method uses a feed-forward image deblurring model for which we propose a suitable training scheme ...</description><pubDate>Thu, 26 Mar 2026 14:29:47 +0000</pubDate><category>Dynamic</category><category>Physics</category><category>SLAM</category><author>Qi Zhang, Denis Rozumny, Francesco Girlanda, Sezer Karaoglu, Marc Pollefeys et al.</author></item></channel></rss>