Dynamic greenhouse climate and crop simulation models generate high-quality data without the time and cost of real experiments, and can thereby support decisions on greenhouse climate control, crop management and greenhouse design. The reliability of simulation-based decisions depends on both the prediction accuracy and the interpretability of the simulation models. Prediction accuracy can be increased by: 1) improving the mechanisms in process-based models; 2) calibrating process-based model parameters; 3) deriving black-box relationships from data. Because interpretability decreases from (1) to (3), this study presents a knowledge-based data-driven modelling approach in which a process-based model is first selected and modified based on domain knowledge, and data-driven improvement is then applied in two steps: parameter value estimation by a particle filter (PF), followed by further black-box improvement by deep neural networks (DNNs). The approach was tested on the example of a greenhouse climate-tomato production system. Modules from GreenLight (Katzin et al., 2020) and TOMSIM (Heuvelink, 1995, 1996) were selected, modified and integrated into a process-based greenhouse climate-tomato model. Validation showed that PF-calibration of five greenhouse parameters decreased the seasonal relative root mean squared error (RRMSE) of indoor air vapor pressure predictions from 40.7% before PF-calibration to 16.4%, but did not decrease the RRMSE of indoor air temperature predictions. Combining the PF-calibrated model with a DNN trained on one season of data decreased the RRMSE of indoor air temperature from 15.0% (without DNN) to 6.7%, and decreased the RRMSE of indoor air vapor pressure to 12.6%. The resulting knowledge-based data-driven greenhouse climate-tomato model had a relative error of 0.9% for seasonal total fresh yield and an RRMSE of 6.6% for the cumulative yield throughout the season.
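As an illustration of the error metric reported above, the following minimal sketch computes an RRMSE in percent. It assumes that the RMSE is normalised by the mean of the measured values, which is a common convention but is not stated in this text; the function name `rrmse` and its arguments are hypothetical.

```python
import math


def rrmse(predicted, measured):
    """Relative RMSE in percent.

    Assumption: the RMSE is divided by the mean of the measured values.
    Other normalisations (e.g. by the measured range) also exist.
    """
    if len(predicted) != len(measured) or len(measured) == 0:
        raise ValueError("predicted and measured must be non-empty and of equal length")
    # Mean squared error between predictions and measurements
    mse = sum((p - m) ** 2 for p, m in zip(predicted, measured)) / len(measured)
    mean_measured = sum(measured) / len(measured)
    return 100.0 * math.sqrt(mse) / mean_measured
```

For example, predictions `[2, 2]` against measurements `[1, 3]` have an RMSE of 1 and a measured mean of 2, so `rrmse([2, 2], [1, 3])` evaluates to 50.0.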
When the process-based model parameters were not calibrated before the model was combined with DNNs, a larger and more diverse set of DNN training data was required, because the DNNs had to learn more information from the data. Without PF-calibration, combining the process-based model with a DNN trained on 50 days of data resulted in RRMSEs of 44.8% and 31.8% for indoor air temperature and vapor pressure predictions, respectively; with PF-calibration, these RRMSEs decreased to 13.1% and 17.9%, respectively. The proposed three-step knowledge-based data-driven approach can not only improve model prediction accuracy, but can also help to track and interpret the improvements.
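The PF-calibration step can be illustrated with a minimal bootstrap particle filter that estimates a single static parameter. This is a sketch under strong assumptions and not the study's implementation: `toy_model` is a hypothetical one-parameter stand-in for the process-based greenhouse model (the study calibrated five greenhouse parameters), and the uniform prior, Gaussian observation noise and random-walk jitter are illustrative choices.

```python
import numpy as np


def toy_model(theta, u):
    """Hypothetical stand-in for a process-based model: output = theta * input."""
    return theta * u


def pf_estimate_parameter(inputs, observations, n_particles=500,
                          prior_range=(0.0, 5.0), obs_std=0.1,
                          jitter_std=0.02, seed=0):
    """Estimate a static parameter with a bootstrap particle filter (sketch)."""
    rng = np.random.default_rng(seed)
    # Draw initial parameter particles from an assumed uniform prior
    particles = rng.uniform(prior_range[0], prior_range[1], n_particles)
    for u, y in zip(inputs, observations):
        # Small random-walk jitter keeps particle diversity for a static parameter
        particles = particles + rng.normal(0.0, jitter_std, n_particles)
        # Weight each particle by an assumed Gaussian observation likelihood
        predicted = toy_model(particles, u)
        log_w = -0.5 * ((y - predicted) / obs_std) ** 2
        weights = np.exp(log_w - log_w.max())
        weights /= weights.sum()
        # Multinomial resampling in proportion to the weights
        idx = rng.choice(n_particles, size=n_particles, p=weights)
        particles = particles[idx]
    # Posterior mean of the resampled particles as the point estimate
    return float(particles.mean())


# Usage: recover theta = 2.0 from noisy synthetic observations
data_rng = np.random.default_rng(1)
u = np.linspace(0.5, 2.0, 50)
y = toy_model(2.0, u) + data_rng.normal(0.0, 0.1, u.size)
estimate = pf_estimate_parameter(u, y)
```

In this sketch the calibrated parameter would then be fixed in the process-based model before any black-box correction is trained, mirroring the order of steps described above.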