Detecting topographic changes in an urban environment and keeping city-level point clouds up-to-date are important tasks for urban planning and monitoring. In practice, remote sensing data are often available only in different modalities for two epochs. Change detection between airborne laser scanning data and photogrammetric data is challenging due to the multi-modality of the input data and dense matching errors. This paper proposes a method to detect building changes between multimodal acquisitions. The multimodal inputs are converted and fed into a light-weighted pseudo-Siamese convolutional neural network (PSI-CNN) for change detection. Different network configurations and fusion strategies are compared. Our experiments on a large urban data set demonstrate the effectiveness of the proposed method. Our change map achieves a recall rate of 86.17%, a precision rate of 68.16%, and an F1-score of 76.13%. The comparison between Siamese architecture and feed-forward architecture brings many interesting findings and suggestions to the design of networks for multimodal data processing.