Microscopic Tissue Boundary Identification in Laparoscopic Videos Using Transformer-Based Segmentation
https://doi.org/10.5281/zenodo.18801897
Keywords:
Laparoscopic Video Segmentation; Transformer-Based Segmentation; Tissue Boundary Detection; Hybrid CNN–Transformer; Real-Time Surgical Assistance; Medical Image Analysis
Abstract
Accurate soft tissue boundary identification in laparoscopic videos is essential for surgical precision and intraoperative decision support. However, challenges such as smoke, specular reflections, rapid camera motion, occlusions, and low-contrast structures make reliable segmentation difficult. Traditional convolutional neural network (CNN)-based methods often struggle to capture long-range contextual dependencies, resulting in coarse and inconsistent boundary predictions.
This research proposes an efficient hybrid Transformer-guided encoder–decoder architecture for real-time microscopic tissue boundary segmentation in laparoscopic videos. The model combines convolutional layers for local feature extraction with Transformer-based self-attention to capture global context. A multi-scale decoder with skip connections reconstructs high-resolution segmentation maps, while an auxiliary boundary refinement head improves contour accuracy. The system is trained with a combination of Dice and cross-entropy losses and evaluated using Dice, IoU, Hausdorff distance, and per-frame latency.
The proposed approach aims to achieve improved boundary accuracy over existing CNN and Transformer baselines while maintaining near real-time performance (≤100 ms per frame) on a single GPU. This work contributes toward intelligent, reliable, and clinically deployable surgical assistance systems.
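The training objective and two of the overlap metrics named in the abstract can be illustrated with a minimal NumPy sketch. It assumes binary (foreground/background) masks and an equal weighting of the two loss terms. The abstract does not specify the weighting, the number of classes, or the framework, so the function names `dice_ce_loss` and `dice_and_iou` and the parameter `ce_weight` are hypothetical and do not represent the authors' implementation.

```python
import numpy as np


def dice_ce_loss(probs, target, ce_weight=0.5, eps=1e-6):
    """Weighted sum of binary cross-entropy and soft Dice loss.

    probs     : predicted foreground probabilities in [0, 1]
    target    : binary ground-truth mask with the same shape
    ce_weight : hypothetical mixing weight (not stated in the source)
    """
    # Clip so that log() never sees exactly 0 or 1.
    p = np.clip(np.asarray(probs, dtype=np.float64), eps, 1.0 - eps)
    t = np.asarray(target, dtype=np.float64)
    # Soft Dice coefficient: 2|P.T| / (|P| + |T|), smoothed by eps.
    dice = (2.0 * np.sum(p * t) + eps) / (np.sum(p) + np.sum(t) + eps)
    # Pixel-wise binary cross-entropy, averaged over all pixels.
    ce = -np.mean(t * np.log(p) + (1.0 - t) * np.log(1.0 - p))
    return float(ce_weight * ce + (1.0 - ce_weight) * (1.0 - dice))


def dice_and_iou(pred_mask, target):
    """Dice and IoU for two binary masks; both are 1.0 if both masks are empty."""
    a = np.asarray(pred_mask, dtype=bool)
    b = np.asarray(target, dtype=bool)
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    total = a.sum() + b.sum()
    dice = 2.0 * inter / total if total else 1.0
    iou = inter / union if union else 1.0
    return float(dice), float(iou)
```

In a sketch like this, the cross-entropy term scores each pixel independently while the Dice term scores region overlap, which is one common reason for pairing them when foreground structures such as thin boundaries occupy few pixels. Hausdorff distance and latency are not covered here.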
