Large MS/MS dataset built from data obtained from GNPS (accessed on 2020-05-11): https://gnps-external.ucsd.edu/gnpslibrary/ALL_GNPS.json
The data was cleaned and pre-processed using notebooks provided here: https://github.com/iomega/spec2vec_gnps_data_analysis/tree/master/notebooks
● 112,956 positive ion mode spectra
● metadata was cleaned and corrected using matchms (https://github.com/matchms/matchms) and lookup routines using PubChem
● 92,954 of the spectra have SMILES and InChIKey annotations (13,717 unique InChIKeys when compared on the first 14 characters)
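
The count of unique InChIKeys "in the first 14 characters" refers to the first block of the key, which encodes the molecular connectivity. The snippet below is a minimal sketch of how such a count could be reproduced. It is not taken from the linked notebooks: the function name `count_unique_inchikey14`, the metadata field name `inchikey`, and the toy records are assumptions made for illustration.

```python
def count_unique_inchikey14(records, key="inchikey"):
    """Count distinct 14-character InChIKey prefixes among metadata records.

    `records` is an iterable of dicts; `key` is the (assumed) name of the
    metadata field holding the InChIKey. Missing or too-short values are skipped.
    """
    prefixes = set()
    for record in records:
        inchikey = (record.get(key) or "").strip().upper()
        if len(inchikey) >= 14:
            # Keep only the first block (connectivity layer) of the key.
            prefixes.add(inchikey[:14])
    return len(prefixes)


# Toy metadata records (illustrative only, not part of the dataset).
records = [
    {"inchikey": "RYYVLZVUVIJVGH-UHFFFAOYSA-N"},
    {"inchikey": "RYYVLZVUVIJVGH-UHFFFAOYSA-O"},  # same first 14 characters
    {"inchikey": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"},
    {"inchikey": None},                           # spectrum without annotation
]

print(count_unique_inchikey14(records))
```

Comparing only the first block treats entries that differ in stereochemistry or protonation state as the same compound, which is why this count is lower than the number of annotated spectra.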
Date made available: 10 Aug 2020