3D Building Model Segmentation using GNN and ViT

Rashidan H., Musliman I. A., Abdulrahman A., Büyüksalih G.

GeoAdvances 2025 – 10th International Conference on GeoInformation Advances, Marrakech, Morocco, 29-30 May 2025, vol.17, pp.279-283, (Full Text Paper)

Publication Type: Conference Paper / Full Text Paper
Volume: 17
DOI: 10.5194/isprs-archives-xlviii-4-w17-2025-279-2026
City of Publication: Marrakech
Country of Publication: Morocco
Pages: pp.279-283
Affiliated with Istanbul University: Yes

Abstract

Reliable semantics in 3D building models support practical urban tasks such as planning, asset inventory, and maintenance. This paper presents an approach that pairs geometry processing by a Graph Neural Network (GNN) with appearance analysis by a Vision Transformer (ViT) to improve component segmentation. The GNN is first applied to the building mesh to capture structural cues and produce initial labels. Multi-view 2D projections (orthographic and perspective) are then rendered and processed with the ViT to recover visual patterns related to windows, doors, roofs, and walls. The two streams are reconciled through a simple consensus fusion that projects the ViT predictions back onto the 3D geometry and refines the initial labels. In experiments, the proposed pipeline improves accuracy and class-wise consistency over a GNN baseline, with clearer gains on small or visually ambiguous elements.
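
The abstract does not spell out the fusion rule, so the following is only a minimal sketch of one way a consensus fusion of this kind could look, not the authors' implementation. It assumes that the GNN yields per-face class probabilities, that each rendered view comes with a face-index map (which mesh face is visible at each pixel, -1 for background), and that the ViT stream yields per-pixel class probabilities. The function name `fuse_labels` and the mixing weight `vit_weight` are hypothetical.

```python
import numpy as np

def fuse_labels(gnn_probs, view_face_ids, view_pixel_probs, vit_weight=0.5):
    """Blend per-face GNN probabilities with back-projected image predictions.

    gnn_probs:        (F, C) per-face class probabilities (assumed GNN output).
    view_face_ids:    list of (H, W) int arrays; each pixel stores the index of
                      the visible face, or -1 for background (assumed renderer output).
    view_pixel_probs: list of (H, W, C) per-pixel class probabilities
                      (assumed ViT output), one array per rendered view.
    vit_weight:       hypothetical mixing weight in [0, 1].
    """
    num_faces, num_classes = gnn_probs.shape
    vit_sum = np.zeros((num_faces, num_classes))
    hits = np.zeros(num_faces)

    # Back-project: accumulate the image predictions on the faces they hit.
    for face_ids, pixel_probs in zip(view_face_ids, view_pixel_probs):
        visible = face_ids >= 0
        ids = face_ids[visible]          # (N,) face index per visible pixel
        probs = pixel_probs[visible]     # (N, C) matching class probabilities
        np.add.at(vit_sum, ids, probs)   # unbuffered add handles repeated indices
        np.add.at(hits, ids, 1)

    # Faces never seen in any view keep their GNN prediction unchanged.
    seen = hits > 0
    fused = gnn_probs.copy()
    vit_mean = vit_sum[seen] / hits[seen, None]
    fused[seen] = (1 - vit_weight) * gnn_probs[seen] + vit_weight * vit_mean
    return fused.argmax(axis=1)

# Toy example: 3 faces, 2 classes (0 = wall, 1 = window), one 2x2 view.
gnn_probs = np.array([[0.6, 0.4], [0.5, 0.5], [0.9, 0.1]])
face_ids = np.array([[0, 1], [1, -1]])
pixel_probs = np.array([[[0.2, 0.8], [0.1, 0.9]],
                        [[0.3, 0.7], [0.5, 0.5]]])
labels = fuse_labels(gnn_probs, [face_ids], [pixel_probs])
```

In this toy case the two faces visible in the view are relabeled toward the image evidence, while the third face, which no pixel covers, keeps its GNN label. Averaging probabilities is only one option; a majority vote over views or a confidence-gated override would fit the same description.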