Overview: To address the lack of peak annotation in NEIMS compared to CFM-EI, we developed a Python script that automates the annotation of peaks in EI-MS spectra. This program, called PeakAnnotator, can annotate peaks from predicted EI-MS spectra (generated by EI-MS beta or EI-MS gamma) as well as peaks from experimentally acquired EI-MS spectra. Because most EI-MS spectra have low resolution (~1 Da), many different molecular formulae can correspond to the same nominal mass or m/z value.

To reduce this redundancy and to avoid mislabeling peaks, PeakAnnotator applies a number of basic constraints on formula generation, given the masses to be explained (in this case, all the observed or predicted m/z values in a spectrum) and the elemental composition (i.e., the formula of the parent molecule). The program combinatorially generates all possible subformulae for each observed (or predicted) m/z value and then narrows them to a smaller set of candidates using rules such as the nitrogen rule, the SENIOR rule, and the degree of unsaturation.
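The enumeration step can be illustrated with a minimal sketch. It is not PeakAnnotator's actual code: the names `NOMINAL_MASS`, `dbe`, and `candidate_subformulae` are hypothetical, only C, H, N, and O are handled, nominal (integer) masses are assumed, and only a simplified degree-of-unsaturation check is applied (the nitrogen and SENIOR rules are omitted).

```python
from itertools import product

# Nominal (integer) masses; an assumption suited to ~1 Da resolution spectra.
NOMINAL_MASS = {"C": 12, "H": 1, "N": 14, "O": 16}


def dbe(counts):
    """Degree of unsaturation (double-bond equivalents) for a CHNO formula."""
    c = counts.get("C", 0)
    h = counts.get("H", 0)
    n = counts.get("N", 0)
    return c - h / 2 + n / 2 + 1


def candidate_subformulae(parent, target_mz):
    """Enumerate subformulae of `parent` whose nominal mass equals `target_mz`.

    `parent` maps element symbols to atom counts. A candidate may use at most
    as many atoms of each element as the parent contains.
    """
    elements = sorted(parent)
    ranges = [range(parent[e] + 1) for e in elements]
    hits = []
    for combo in product(*ranges):
        counts = dict(zip(elements, combo))
        mass = sum(NOMINAL_MASS[e] * n for e, n in counts.items())
        if mass != target_mz:
            continue
        # Simplified filter: reject formulae with negative unsaturation.
        if dbe(counts) < 0:
            continue
        hits.append({e: n for e, n in counts.items() if n > 0})
    return hits


# Ethyl acetate (C4H8O2): the peak at m/z 43 has two candidate subformulae.
ethyl_acetate = {"C": 4, "H": 8, "O": 2}
print(candidate_subformulae(ethyl_acetate, 43))
# [{'C': 2, 'H': 3, 'O': 1}, {'C': 3, 'H': 7}]
```

Even for this small parent molecule, one nominal mass admits two chemically plausible subformulae (C2H3O and C3H7), which is why further constraints are needed to pick between them.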

We also incorporate a knowledge base of frequently observed EI-MS peak patterns for particular functional groups and structures to further refine the formula generation process. Within each cluster of peaks in a given EI-MS spectrum, PeakAnnotator also removes every peak whose relative intensity is less than 1 percent of the most intense peak in that cluster. This avoids unnecessary annotation of peaks arising from 13C isotopes or spectrometer noise. PeakAnnotator also includes rules for handling and annotating other high-abundance isotopic elements such as Cl, Br, and I.
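The intensity filter can be sketched as follows. This is an illustrative example rather than PeakAnnotator's implementation: the function names are hypothetical, and a "cluster" is assumed here to mean a run of peaks whose integer m/z values are no more than 1 Da apart, since the text does not define it.

```python
def split_into_clusters(peaks, max_gap=1):
    """Group (mz, intensity) peaks into runs of neighbouring m/z values.

    Assumption: two peaks belong to the same cluster when their m/z values
    differ by at most `max_gap` from the previous peak in the run.
    """
    clusters = []
    for mz, intensity in sorted(peaks):
        if clusters and mz - clusters[-1][-1][0] <= max_gap:
            clusters[-1].append((mz, intensity))
        else:
            clusters.append([(mz, intensity)])
    return clusters


def filter_weak_peaks(peaks, threshold=0.01):
    """Drop peaks below `threshold` times the strongest peak of their cluster."""
    kept = []
    for cluster in split_into_clusters(peaks):
        cutoff = threshold * max(intensity for _, intensity in cluster)
        kept.extend(p for p in cluster if p[1] >= cutoff)
    return kept


# Made-up spectrum for illustration only.
spectrum = [(41, 5.0), (42, 80.0), (43, 999.0), (44, 25.0), (58, 300.0), (59, 2.0)]
print(filter_weak_peaks(spectrum))
# [(42, 80.0), (43, 999.0), (44, 25.0), (58, 300.0)]
```

Because the cutoff is relative to each cluster's own maximum, a weak peak next to a strong one is dropped, while an equally weak peak in a low-intensity cluster can survive.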

To further improve the formula generation process, PeakAnnotator incorporates the subformulae and substructures generated by CFM-EI’s fragmentation module to narrow down the candidate formulae once more. Finally, if any peaks remain for which PeakAnnotator cannot suggest an appropriate annotation, the algorithm automatically adds or removes hydrogens (up to four) from nearby confirmed subformulae to annotate them.
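The final hydrogen-adjustment step might look like the sketch below. It is written under stated assumptions and does not reproduce PeakAnnotator's code: the function name is hypothetical, nominal masses are assumed (so a difference of n Da corresponds to n hydrogens), the nearest confirmed peak is preferred, and the hydrogen count is capped by the parent formula.

```python
def annotate_by_hydrogen_shift(unannotated_mz, confirmed, parent, max_shift=4):
    """Derive a formula for `unannotated_mz` from a nearby confirmed peak.

    `confirmed` maps integer m/z values to formula dicts (element -> count).
    Returns the adjusted formula, or None if no confirmed peak lies within
    `max_shift` Da or the adjustment would give an impossible hydrogen count.
    """
    best = None  # (delta, formula) for the closest usable confirmed peak
    for mz, formula in confirmed.items():
        delta = unannotated_mz - mz  # hydrogens to add (+) or remove (-)
        if delta == 0 or abs(delta) > max_shift:
            continue
        new_h = formula.get("H", 0) + delta
        # Assumption: a fragment cannot have negative H or more H than the parent.
        if new_h < 0 or new_h > parent.get("H", 0):
            continue
        if best is None or abs(delta) < abs(best[0]):
            candidate = dict(formula)
            candidate["H"] = new_h
            best = (delta, {e: n for e, n in candidate.items() if n > 0})
    return None if best is None else best[1]


# Acetone (C3H6O) with two confirmed annotations, for illustration only.
acetone = {"C": 3, "H": 6, "O": 1}
confirmed = {43: {"C": 2, "H": 3, "O": 1}, 39: {"C": 3, "H": 3}}
print(annotate_by_hydrogen_shift(42, confirmed, acetone))  # {'C': 2, 'H': 2, 'O': 1}
print(annotate_by_hydrogen_shift(50, confirmed, acetone))  # None
```

When two confirmed peaks are equally close, this sketch keeps whichever it encounters first; a real implementation would need an explicit tie-breaking rule.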