Computer-aided clinical decision support tools for radiology often generalize poorly in multi-centric settings because of data heterogeneity. In particular, magnetic resonance images depend on a large number of acquisition protocol parameters, as well as on hardware and software characteristics, that may differ between or even within institutions. In this work, we use a supervised image-to-image harmonization framework based on a conditional generative adversarial network to reduce inter-site differences in T1-weighted images acquired with different dementia protocols. We investigate hybrid losses that combine standard voxel-wise distances with a more recent perceptual similarity metric, and examine how they relate to image similarity metrics and to volumetric consistency in brain segmentation. In a test cohort of 30 multiprotocol patients affected by dementia, we show that, despite improvements in image similarity, the generated synthetic images do not necessarily reduce inter-site volumetric differences, which highlights the mismatch between harmonization performance and its impact on the robustness of post-processing applications. Hence, our results suggest that traditional image similarity metrics such as PSNR or SSIM may poorly reflect how well different harmonization techniques improve cross-domain consistency.
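As an illustration of the two kinds of measure contrasted above, the following minimal sketch computes PSNR between two images and a relative difference between two segmented volumes. It is not the evaluation code used in this work: the function names `psnr` and `relative_volume_difference`, the flat-sequence image representation, and the choice to normalize the volume difference by the mean of the two volumes are all assumptions made for this example.

```python
import math


def psnr(reference, generated, data_range):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are given as flat sequences of voxel intensities (an assumption
    made to keep this sketch dependency-free); data_range is the maximum
    possible intensity span, e.g. 1.0 for images scaled to [0, 1].
    """
    if len(reference) != len(generated):
        raise ValueError("images must have the same number of voxels")
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(data_range ** 2 / mse)


def relative_volume_difference(volume_a, volume_b):
    """Absolute difference between two volumes, normalized by their mean.

    Hypothetical consistency measure for one brain structure segmented
    on images from two sites.
    """
    mean_volume = (volume_a + volume_b) / 2.0
    return abs(volume_a - volume_b) / mean_volume
```

The first function depends only on voxel intensities, while the second depends only on the output of a segmentation tool, so a gain in one does not by construction imply a gain in the other.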