Streamflow is often the only variable used to evaluate hydrological models. In a previous international comparison study, eight research groups followed an identical protocol to calibrate 12 hydrological models using observed streamflow of catchments within the Meuse basin. In the current study, we systematically and comprehensively quantify the differences in five states and fluxes of these 12 process-based models, which have similar streamflow performance. Next, we assess the plausibility of model behavior by ranking the models against a set of criteria based on streamflow and remote-sensing data of evaporation, snow cover, soil moisture and total storage anomalies. We found substantial dissimilarities between models for annual interception and seasonal evaporation rates, the annual number of days with water stored as snow, the mean annual maximum snow storage and the size of the root-zone storage capacity. These differences in internal process representation imply that these models cannot all simultaneously be close to reality. Modeled annual evaporation rates are consistent with Global Land Evaporation Amsterdam Model (GLEAM) estimates. However, there is a large uncertainty in both modeled and remote-sensing estimates of annual interception. Substantial differences are also found between the modeled number of days with snow storage and that derived from Moderate Resolution Imaging Spectroradiometer (MODIS) data. Models with relatively small root-zone storage capacities and without root water uptake reduction under dry conditions tend to have an empty root-zone storage for several days each summer, which is not suggested by remote-sensing data of evaporation, soil moisture and vegetation indices. On the other hand, models with relatively large root-zone storage capacities tend to overestimate very dry total storage anomalies relative to those of the Gravity Recovery and Climate Experiment (GRACE). None of the models is systematically consistent with the information available from all the different (remote-sensing) data sources. Yet we did not reject models, given the uncertainties in these data sources and their changing relevance for the system under investigation.