A multi-model intercomparison study was conducted to evaluate how accurately ten potato crop models predict potato yield in response to elevated CO2 (Ce) when calibrated with ambient CO2 (Ca) data. Experimental data from seven open-top chamber (OTC) and free-air CO2 enrichment (FACE) facilities across continental Europe were used. Model ensemble percent errors for simulated yields, averaged over all datasets, were 26.5% for Ca and 27.2% for Ce data. Metrics such as Willmott’s index of agreement (IA) and root mean square relative error (RMSRE) ranged broadly among individual models and locations, such that four of the ten models outperformed the median or mean of the ensemble for about half of the Ce datasets. These top-performing models represented three different model structural groups: radiation use efficiency, transpiration efficiency, and leaf-level approaches. When averaged across all locations, the relative response to an increase in CO2 was modeled more accurately than absolute yield responses, and was within 3.3 kg ppm⁻¹ (or 5%) of observed values. Specific parts of the model structure in need of improvement could not be identified because of large and inconsistent variation in the accuracy of yield predictions across locations. However, models with the lowest calibration errors also tended to be top performers for Ce predictions. Such results suggest calibration is at least as important as model structure. Where possible, modelers using potato models to estimate Ce responses should use Ce calibration data to improve confidence in such predictions.
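
The two agreement metrics named above can be sketched in a few lines of Python. This is a minimal illustration that assumes the commonly used forms of Willmott's index of agreement and RMSRE; the abstract does not state the exact formulations used in the study. The function names and the yield values are hypothetical and are not data from the study.

```python
import math


def index_of_agreement(observed, simulated):
    """Willmott's index of agreement (assumed standard form).

    1.0 means perfect agreement. Undefined (division by zero) when all
    observed and simulated values equal the observed mean.
    """
    mean_obs = sum(observed) / len(observed)
    # Sum of squared errors between simulated and observed values.
    sq_error = sum((s - o) ** 2 for o, s in zip(observed, simulated))
    # "Potential error": deviations of both series from the observed mean.
    potential = sum(
        (abs(s - mean_obs) + abs(o - mean_obs)) ** 2
        for o, s in zip(observed, simulated)
    )
    return 1.0 - sq_error / potential


def rmsre(observed, simulated):
    """Root mean square relative error, with errors scaled by the observations."""
    rel_sq = [((s - o) / o) ** 2 for o, s in zip(observed, simulated)]
    return math.sqrt(sum(rel_sq) / len(rel_sq))


# Hypothetical yields (arbitrary units), for illustration only.
obs = [10.0, 20.0]
sim = [11.0, 18.0]
ia_value = index_of_agreement(obs, sim)
rmsre_value = rmsre(obs, sim)
```

Under these assumed definitions, a higher IA and a lower RMSRE both indicate closer agreement between simulated and observed yields, so the two metrics can rank the same set of models differently.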