FPM-R2Net: Fused Photoacoustic and operating Microscopic imaging with cross-modality Representation and Registration Network

Yuxuan Liu, Jiasheng Zhou, Yating Luo, et al.

Medical Image Analysis, 2025, 105:103698.

 

Abstract

Robot-assisted microsurgery is a promising technique for a number of clinical specialties, including neurosurgery. One of the prerequisites of such procedures is accurate vision guidance, delineating not only the exposed surface details but also the embedded microvasculature. Conventional microscopic cameras used for vascular imaging are susceptible to specular reflections and changes in ambient light, and offer low tissue resolution and contrast. Photoacoustic microscopy (PAM) is emerging as a promising tool and is increasingly used for vascular imaging due to its high image resolution and tissue contrast. This paper presents a fused microscopic imaging scheme that integrates standard surgical microscopy with PAM for improved intraoperative visualization and guidance. We propose FPM-R2Net to Fuse Photoacoustic and surgical Microscopic imaging via a cross-modality Representation and Registration Network. A MOdality Representation Network (MORNet) is used to extract a unified feature representation across the white-light and PAM modalities, and a Hierarchical Iterative Registration Network (HIRNet) is used to establish the correspondence between the two modalities in a coarse-to-fine manner based on multi-resolution feature maps. A synthetic dataset with ground-truth correspondence and an in vivo dataset of mouse brain vasculature are used to evaluate the proposed network. Extensive validation on the two datasets has shown significant improvements over current state-of-the-art methods, assessed with intersection-over-union (IoU) and Dice scores (10.3% and 6.6% on the synthetic dataset and 15.9% and 11.8% on the in vivo dataset, respectively).

 

Introduction and Methods

To achieve this goal, several inherent challenges exist in establishing the correspondence and fusion between the information-rich PAM modality and the information-scarce white-light microscopic modality. The main challenge lies in the significant modality shift between PAM and RGB images. The intensity of RGB images is determined by the absorption and reflection of natural white light and is stored in red, green, and blue channels. In contrast, the intensity of PAM signals corresponds to the amplitude of the generated sound waves and is stored in a single channel. This modality shift necessitates the extraction of a unified feature representation for both PAM and RGB images that is invariant to modality appearance. Meanwhile, the two modalities also carry unequal amounts of information. RGB images exhibit low-contrast superficial features and are susceptible to ambient illumination, while PAM images offer high resolution and high contrast and can capture vessel signals at greater depths. These distinct characteristics mean that PAM images contain more detailed structural information than RGB images. Specifically, for microvasculature imaging, RGB images are limited to capturing the primary vessels on the surface, while the intricate details of the microvasculature can only be seen in PAM images. In addition, paired PAM and RGB images are scarce due to the difficulty of collecting ground-truth correspondences in vivo. This limitation not only compromises the accuracy of evaluating proposed algorithms but also hinders the generalization ability of different methods.
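To make the channel mismatch concrete, the following is a minimal PyTorch sketch (not the authors' code) of how a modality-specific stem can absorb the 3-channel RGB versus 1-channel PAM difference before a shared-style body maps both onto a single-channel vessel representation; the ModalityEncoder name and all layer sizes are illustrative assumptions.

import torch
import torch.nn as nn

class ModalityEncoder(nn.Module):
    """Maps a 3-channel RGB or 1-channel PAM image onto a single-channel
    vessel-probability-like modality map (hypothetical layer sizes)."""
    def __init__(self, in_channels, width=32):
        super().__init__()
        # Modality-specific stem absorbs the channel mismatch.
        self.stem = nn.Conv2d(in_channels, width, 3, padding=1)
        self.body = nn.Sequential(
            nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(width, 1, 1),
            nn.Sigmoid(),  # appearance-invariant vessel map in [0, 1]
        )
    def forward(self, x):
        return self.body(self.stem(x))

rgb = torch.rand(1, 3, 256, 256)   # white light: three channels
pam = torch.rand(1, 1, 256, 256)   # PA amplitude: one channel
rgb_map = ModalityEncoder(3)(rgb)  # both outputs: (1, 1, 256, 256)
pam_map = ModalityEncoder(1)(pam)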
To address the aforementioned challenges, we propose to Fuse Photoacoustic and surgical Microscopic imaging via a cross-modality Representation and Registration Network, namely FPM-R2Net. The proposed network consists of two subnetworks: a MOdality Representation Network (MORNet) and a Hierarchical Iterative Registration Network (HIRNet). MORNet extracts the unified feature representation across the imaging modalities, and HIRNet performs the cross-modality registration in a coarse-to-fine manner. With this network, the proposed method establishes accurate correspondence between intraoperative white-light microscopic images and preoperative PAM images through self-supervised learning. By leveraging the distinct characteristics of the two modalities, we propose a new pipeline to generate synthetic image pairs with ground-truth correspondence to improve the performance of the proposed method. More importantly, the generated synthetic dataset enables a comprehensive quantitative evaluation based on known ground-truth correspondence. To demonstrate the clinical potential of the method, we also present an in vivo dataset consisting of preoperative PAM images and intraoperative white-light microscopic images of mouse brain vasculature.
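The overall flow described above can be summarized in a short, hedged Python sketch; mornet, hirnet, and warp are placeholders standing in for the paper's MORNet, HIRNet, and resampling step, and the green-channel overlay is only one plausible fusion rule, not the paper's stated one.

import torch

def fuse_pam_rgb(pam, rgb, mornet, hirnet, warp):
    """Hypothetical end-to-end pass mirroring the described pipeline.
    `mornet` is assumed to handle both channel counts internally."""
    pam_map = mornet(pam)              # unified vessel map from PAM
    rgb_map = mornet(rgb)              # unified vessel map from RGB
    field = hirnet(pam_map, rgb_map)   # dense transformation field
    warped_pam = warp(pam, field)      # bring PAM into the RGB frame
    fused = rgb.clone()                # green-on-RGB overlay as a
    fused[:, 1:2] = torch.maximum(fused[:, 1:2], warped_pam)  # stand-in fusion
    return fused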

 

Key Results and Conclusions

This paper proposes a fused microscopic imaging scheme that improves the quality of intraoperative white-light microscopic vision via the proposed FPM-R2Net, which establishes the correspondence between white-light microscopic and PAM images. FPM-R2Net contains two subnetworks that extract a unified feature representation of cross-modality microvasculature images and perform registration in an iterative coarse-to-fine manner based on hierarchical multi-resolution feature maps. A generation pipeline for synthetic image pairs is proposed to enable a detailed evaluation against ground-truth correspondence, and an in vivo dataset of mouse brain vasculature is presented to demonstrate the potential for clinical use. To fully evaluate the method, we conduct detailed experiments on both synthetic and in vivo datasets; our method achieves the best performance among existing methods, with improvements of 10.3% and 6.6% in IoU and Dice on the synthetic dataset, and 15.9% and 11.8% in the in vivo scenarios, respectively. Both quantitative and qualitative results demonstrate the superior performance of our method. Further improvement could be achieved via knowledge distillation to reduce the number of model parameters and shorten the inference time toward real-time image processing.
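For reference, the two reported metrics follow their standard definitions on binary vessel masks; the NumPy sketch below is illustrative, not the paper's evaluation code.

import numpy as np

def iou(pred, gt):
    """Intersection over union of two boolean vessel masks."""
    union = np.logical_or(pred, gt).sum()
    return np.logical_and(pred, gt).sum() / union if union else 1.0

def dice(pred, gt):
    """Dice coefficient of two boolean vessel masks."""
    total = pred.sum() + gt.sum()
    return 2 * np.logical_and(pred, gt).sum() / total if total else 1.0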

 

 

Fig. 1: (a) Illustration of the cross-modality representation and registration task. We propose a framework to establish the correspondence between the white-light microscopic and PAM modalities, and to fuse standard operating microscopic images with preoperative PAM for improved intraoperative visualization and guidance. (b) Comparison between images captured by white-light microscopic vision and the PAM system. The RGB images exhibit low contrast and low resolution and are easily affected by ambient illumination, while the PAM images have high contrast, superior resolution, and greater depth penetration.

 

 

Fig. 2: Overview of the proposed FPM-R2Net. The method takes paired PAM and RGB images as input and predicts the correspondence, which is used to obtain the final fused image as output. It contains two subnetworks, i.e., MORNet (Modality Representation Network) and HIRNet (Hierarchical Iterative Registration Network). MORNet takes the input images and extracts modality maps that contain a unified representation of vessels with background noise removed. HIRNet estimates the correspondence based on the modality maps in a coarse-to-fine manner. The lower part visualizes the detailed network architectures of MORNet and HIRNet.

 

 

Fig. 3: Illustration of the information flow for the proposed hierarchical iterative registration network which estimates the correspondence in a coarse-to-fine manner. The input PAM and RGB modality maps are down-sampled by different scale factors and fed into the backbone network iteratively. The output correspondence of each down-sampling level is resized to the original shape and added together to form the final transformation field. The fused image is obtained based on the final transformation field. The black arrows indicate the direction of information flow and the networks in brown color share the same weights.
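As a rough sketch of the loop Fig. 3 describes (an interpretation, not the released implementation): the modality maps are down-sampled by a set of scale factors, a shared-weight backbone is run at each level, and each level's output field is resized to full resolution and accumulated. The backbone callable and the scale factors (4, 2, 1) are assumptions.

import torch
import torch.nn.functional as F

def hierarchical_register(pam_map, rgb_map, backbone, scales=(4, 2, 1)):
    """Coarse-to-fine field estimation; `backbone` is a shared-weight
    network returning a (B, 2, h, w) displacement field."""
    b, _, h, w = pam_map.shape
    field = torch.zeros(b, 2, h, w, device=pam_map.device)
    for s in scales:  # coarse (s = 4) first, finest (s = 1) last
        pam_s = F.interpolate(pam_map, scale_factor=1 / s, mode="bilinear")
        rgb_s = F.interpolate(rgb_map, scale_factor=1 / s, mode="bilinear")
        delta = backbone(pam_s, rgb_s)  # same weights at every level
        # Resize each level's field to the original shape and accumulate
        # (in practice displacement magnitudes may also be rescaled by s).
        delta = F.interpolate(delta, size=(h, w), mode="bilinear")
        field = field + delta
    return field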

 

 

Fig. 4: Illustration of the proposed datasets. (a) The data generation pipeline of the synthetic dataset D-I. Based on the given PAM image and modality map, a degradation mask and transformation field are randomly generated to imitate the modality shift between the PAM and RGB modalities. The synthetic RGB image is reconstructed from the degraded PAM modality map by a pre-trained image generation network. (b) The data collection pipeline of the in vivo dataset D-II. The PAM system is shown on the left. A 532 nm pulsed laser is used to excite the PA signals of mouse brain vessels. BS: beam splitter; PD: photodetector; ND: neutral density filter; L: lens; DL: dichroic lens; FC: fiber coupler; SMF: single-mode fiber; PC: personal computer; DAQ: data acquisition card; AMP: preamplifier.
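A minimal sketch of the generation idea in (a), under stated assumptions: a smooth random displacement field provides the ground-truth correspondence, and a random smooth mask degrades fine vessels to imitate what RGB cannot see; the pre-trained RGB generator itself is omitted. The random_pair helper and all parameter values are hypothetical.

import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def random_pair(modality_map, max_disp=8.0, sigma=20.0, keep=0.6, rng=None):
    """Degrade and deform a PAM modality map; returns the degraded map
    (input to a pre-trained RGB generator, omitted) and the GT field."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = modality_map.shape
    # Smooth random displacement field = ground-truth correspondence.
    dy = gaussian_filter(rng.standard_normal((h, w)), sigma)
    dx = gaussian_filter(rng.standard_normal((h, w)), sigma)
    dy *= max_disp / (np.abs(dy).max() + 1e-8)
    dx *= max_disp / (np.abs(dx).max() + 1e-8)
    yy, xx = np.mgrid[0:h, 0:w].astype(float)
    warped = map_coordinates(modality_map, [yy + dy, xx + dx], order=1)
    # Random smooth mask drops fine vessels that RGB would not capture.
    noise = gaussian_filter(rng.random((h, w)), sigma)
    mask = noise > np.quantile(noise, 1 - keep)
    return warped * mask, np.stack([dy, dx])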

 

 

Fig. 5: Selected examples of our proposed synthetic dataset. The first and the last columns refer to the PAM images and reconstructed RGB images. The green color represents the PAM modality in the second column and the red color represents the RGB modality in the fourth column. The third column refers to the ground truth correspondence of two modality maps.

 

 

Fig. 6: Illustration of the evaluation protocols defined in this paper. The overall performance is calculated based on the ground-truth warped PAM image shown in the middle. The visible metric is calculated over the area (red) that can be seen in both the RGB and PAM modalities. The invisible metric is calculated over the area (green) that cannot be seen in the RGB modality but can be seen in the PAM modality.
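In code form, the region split underlying the two protocols might look like the following (boolean masks; names are illustrative, not the paper's):

import numpy as np

def split_regions(warped_pam_mask, rgb_mask):
    """Both inputs are boolean vessel masks in the RGB frame."""
    visible = warped_pam_mask & rgb_mask       # red region in Fig. 6
    invisible = warped_pam_mask & ~rgb_mask    # green region in Fig. 6
    return visible, invisible

The visible/invisible IoU or Dice is then obtained by restricting the predicted and ground-truth masks to each region before scoring.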

 

 

Fig. 7: Visualization results of different methods on synthetic dataset D-I. The green color represents the PAM modality and the red color represents the RGB modality.

 

 

Fig. 8: Statistical analysis of the quantitative results of our proposed method and the comparison methods. (a) Results on synthetic dataset D-I. (b) Results on in vivo dataset D-II. P-values are calculated with respect to our results. The symbol **** represents a p-value < 0.0001.
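The caption does not state which statistical test was used; a paired non-parametric test such as the Wilcoxon signed-rank test is a common choice for per-sample IoU/Dice comparisons, sketched below with purely illustrative numbers.

from scipy.stats import wilcoxon

# Illustrative per-sample Dice scores (NOT values from the paper).
ours = [0.82, 0.79, 0.85, 0.81, 0.88]
baseline = [0.71, 0.68, 0.74, 0.70, 0.77]  # same samples, another method

stat, p = wilcoxon(ours, baseline)  # paired, non-parametric
print(f"p = {p:.4g}")               # **** in Fig. 8 denotes p < 0.0001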

 

 

Fig. 9: Visualization results on in vivo dataset D-II. (a) Comparison results with different methods. (b) Diverse results estimated by our proposed method. The green color represents the PAM modality and the red color represents the RGB modality.

 

 

Fig. 10: Results showing the estimated transformation fields on in vivo samples from D-II. The green color in the third column represents the transformed PAM images overlaid on the RGB images.
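A minimal sketch of how a dense transformation field can warp the PAM image into the RGB frame and overlay it in green, as in the third column; the pixel-offset convention and grid normalization are assumptions.

import torch
import torch.nn.functional as F

def warp_and_overlay(pam, rgb, field):
    """pam: (1,1,H,W), rgb: (1,3,H,W), field: (1,2,H,W) pixel offsets
    with channel 0 = x-offset and channel 1 = y-offset (assumed)."""
    b, _, h, w = pam.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=torch.float32),
        torch.arange(w, dtype=torch.float32),
        indexing="ij",
    )
    # Absolute sampling positions, normalized to [-1, 1] for grid_sample.
    x = (xs + field[:, 0]) / (w - 1) * 2 - 1
    y = (ys + field[:, 1]) / (h - 1) * 2 - 1
    grid = torch.stack([x, y], dim=-1)           # (b, H, W, 2), (x, y) order
    warped = F.grid_sample(pam, grid, align_corners=True)
    fused = rgb.clone()
    fused[:, 1:2] = torch.maximum(fused[:, 1:2], warped)  # green channel
    return fused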

 

 

Fig. 11: Visualization results estimated from different modality maps, i.e., the intermediate down-sampled maps of our proposed HIRNet. The green color represents the PAM modality and the red color represents the RGB modality. Zoomed views of the gray shaded areas are shown in the upper left.

 

 

Fig. 12: Comparison results of different  on synthetic dataset D-I for (a) visible metrics and (b) invisible metrics.

 

 

Fig. 13: Visualization results by selecting different . The green color represents the PAM modality and the red color represents the RGB modality. The zoomed images of the gray shaded area are visualized in the upper left part.

 

 

https://www.sciencedirect.com/science/article/pii/S1361841525002452
