Long-term Global Horizontal Irradiance (GHI) data sets are essential for assessing the local solar resource and estimating the potential power production of photovoltaic systems. Statistical models have been found to be very effective in estimating GHI. In this study, we examine to what extent the performance of such models is affected by the distance and direction between the training and testing stations, and by the temporal difference between the training and testing periods. To quantify these factors, three machine learning models are considered: Random Forest, Extreme Gradient Boosting, and Artificial Neural Network. These models estimate GHI at 15 weather stations in the Netherlands from 11 meteorological variables. The paper demonstrates that GHI estimation is more accurate when the model is trained on a station located closer to the target station: the error increases by 3% and 7% for distances of up to 40 km and 120 km, respectively. In addition, in this case study the accuracy of GHI estimation improves when the test station is located northeast, east, southeast, or south of the training station, which partly correlates with the prevailing wind direction. Finally, the choice of testing period is found to significantly affect the obtained model performance, whereas the influence of the training period is minimal.
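The distance and direction between a training station and a test station can be derived from the station coordinates. The following is a minimal sketch, assuming great-circle (haversine) distance, the initial bearing from the training station to the test station, and binning of that bearing into eight compass sectors. These choices, the function names, and the example coordinates are hypothetical illustrations, not necessarily the procedure used in the study.

```python
import math

# Mean Earth radius in km (assumption: spherical Earth model).
EARTH_RADIUS_KM = 6371.0

# Eight compass sectors of 45 degrees each, starting at north (hypothetical binning).
SECTORS = ["north", "northeast", "east", "southeast",
           "south", "southwest", "west", "northwest"]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def initial_bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing from point 1 to point 2, in degrees clockwise from north [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    x = math.sin(dlam) * math.cos(phi2)
    y = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlam))
    return (math.degrees(math.atan2(x, y)) + 360.0) % 360.0


def compass_sector(bearing):
    """Map a bearing in degrees to one of eight 45-degree compass sectors."""
    return SECTORS[int((bearing + 22.5) // 45) % 8]


def station_relation(train, test):
    """Return (distance_km, direction) of the test station as seen from the training station.

    `train` and `test` are hypothetical (lat, lon) tuples in degrees.
    """
    distance = haversine_km(*train, *test)
    direction = compass_sector(initial_bearing_deg(*train, *test))
    return distance, direction


if __name__ == "__main__":
    # Hypothetical coordinates, not actual station locations from the study.
    train_station = (52.0, 5.0)
    test_station = (52.0, 6.0)
    d, direction = station_relation(train_station, test_station)
    print(f"{d:.1f} km, test station lies {direction} of training station")
```

Under these assumptions, each training/test station pair can be tagged with a distance and a sector, after which model errors can be grouped by distance band or by direction.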