Triacylglycerols (TGs) are the most abundant lipids in the human body and the primary source of energy storage. Altered TGs are implicated in metabolic syndrome and detailed structural information may provide novel insights into diseases pathology. TGs are comprised of three fatty acyls with various lengths and double bond composition, complicating structural annotation. DIA-based lipidomics enables a continuous and unbiased acquisition of all TGs, creating the potential for more comprehensive TG analysis. However, DIA generates multiplexed tandem mass spectra (MS2) that pose challenges to identifying TGs. Here, we present DIATAGeR, an R package aimed to automate and improve TG identification to the molecular species level in DIA-based lipidomics.
DIATAGeR requires two data inputs: a feature list and MS2 spectra. Feature list (txt format) can be an MS-DIAL alignment file or generic format . MS2 spectra can be in txt, msp and mgf format (exported from MSDIAL). In addition, DIATAGeR can accept data exported from MZMine: feature list (needs to be formatted to match the generic format) and MS2 spectra in mgf format.
Example of Feature lists and MS2 spectra are available in [MetaboLights] or [Google Drive]
DIATAGeR includes 6 functions: measureFileImport, SpectraImport,
RemoveIsotopes, LipidIdentifier, ScoreLipids and FDRCutoff. For further
details, please use
help('name of the function', package = "DIATAGeR")
In the example below, Feature_List_Liver_MSDIAL.txt and Spectra_txt_Liver_MSDIAL folder with MS2 spectra in txt format can be downloaded from [MetaboLights] or [Google Drive]
measureFileImport reads in the feature list (MSDIAL Alignment file or Generic Format).
# Set the processing parameters
DIADataObj = DIA_Pos # the object stores feature list
fileORfoldername #path to the folder that stores centroid MS2 spectra
ion.mode <- c("Pos","Neg")
results.file.type = c("MSDIAL", "Generic")
spectra.file.type =c("txt","msp","mgf","mgf_mzmine")
rttol #retention time tolerance in seconds. Default to 2
# Example
DIA_Pos <- SpectraImport(DIADataObj = DIA_Pos,
fileORfoldername = "Spectra_txt_Liver_MSDIAL/Centroid",
ion.mode = "Pos",
spectra.file.type ="txt",
results.file.type = "MSDIAL",
rttol = 10)Any features with missing spectra are stored in a folder named “Troubleshooting”.
# Set the processing parameters
DIADataObj = DIA_Pos # the object stores feature list
ppmtol #ppm tolerance
low.intensity.cutoff # Any peaks with intensity below this value will be removed. Default 100.
density.cutoff = c(TRUE,FALSE) #If TRUE, a density-based cutoff will be applied.
density.cutoff.value # Default 1. Change with caution.
#Example
DIA_Pos <- RemoveIsotopes(DIADataObj = DIA_Pos,
ppmtol = 15,
low.intensity.cutoff=100,
density.cutoff=T,
density.cutoff.value=1)# Set the processing parameters
DIADataObj = DIA_Pos
lipid = "TAG" #DIATAGeR is for triacylglycerols only.
ion.mode = c("Pos","Neg")
format = c("MSDIAL","Generic")
ppmtol #ppm tolerance. Default to 15
rttol #retention time tolerance in seconds. Default to 5
intensity.window # Defaults to 0 and infinity.
version # To specify the version of generated result file.
ms1.precursors #Number of precursors found in MS1 to identify compounds. Default to 1
ms2.precursors #Number of precursors found in MS2 to identify compounds.
which.frag = c("any","firsttwo") #specifying whether any fragment should be used or only the first two fragments for identification. Default 'any'.
max.tails # list of desired number of carbons and the maximum number of double
# bonds for each fatty acyls.
# Default = "8.2, 9.0, 10.2, 11.0, 12.3, 13.1, 14.3, 15.3, 16.5, 17.3,
# 18.5, 19.5, 20.6, 21.5, 22.6, 23.0, 24.4, 25.0, 26.0"
exact.tails # library can be customizable for specific tails. Default NULL.
#Eg: exact_tails = c("18.1","18.2) creates library of TGs made up by 18:1 and 18:2 fatty acids
print.MS2spectra = c(TRUE,FALSE) # If TRUE, prints MS/MS spectra and mirrored
# reference peaks. Default FALSE.
write.annotations = c(TRUE, FALSE) # If TRUE, the function automatically writes
# RDS file of identified lipids. Default TRUE.
#Example:
LipidIdentifier(DIADataObj = DIA_Pos,
lipid = "TAG",
ion.mode = "Pos",
format = "MSDIAL",
ppmtol=15, rttol=5,
intensity.window = c(0,Inf),
version = "September2024",
ms1.precursors= 1, ms2.precursors = 1,
max.tails = "8.2, 9.0, 10.2, 11.0, 12.3, 13.1, 14.3, 15.3, 16.5, 17.3, 18.5, 19.5, 20.6, 21.5, 22.6, 23.0, 24.4, 25.0, 26.0",
exact.tails = NULL,
which.frag = "any",
print.MS2spectra = F,
write.annotations = T)A folder named according to ion.mode (Pos/Neg) will be created in the working directory and contain the result file (Eg: Identified_TAG_Pos_September2024_1). This file will contain TGs identified from target database.
A separate TG identification is performed with the decoy database, then the targeted and decoy annotations are merged and scored using the machine learning algorithm.
# Set the processing parameters
DIADataObj = DIA_Pos #The object stores alignment file and spectra
format # Same as LipidIdentifier
ion.mode # Same as LipidIdentifier
lipid # Same as LipidIdentifier
version # Same as LipidIdentifier
ppmtol # Same as LipidIdentifier
rttol # Same as LipidIdentifier
ms1.precursor # Same as LipidIdentifier
ms2.precursors # Same as LipidIdentifier
max.tails # Same as LipidIdentifier
exact.tails # Same as LipidIdentifier
spectra.file #File path for retrieving spectra from all samples.
spectra.file.type = c("txt","msp","mgf","mgf_mzmine")
#Example:
ResultsTG<- ScoreLipids(DIADataObj = DIA_Pos,
format = "MSDIAL",
lipid = "TAG",
ion.mode = "Pos",
version = "September2024",
ppmtol= 15,
rttol = 5,
ms1.precursors =1, ms2.precursors =1,
spectra.file.type ="txt",
max.tails = "8.2, 9.0, 10.2, 11.0, 12.3, 13.1, 14.3, 15.3, 16.5, 17.3, 18.5, 19.5, 20.6, 21.5, 22.6, 23.0, 24.4, 25.0, 26.0",
exact.tails=NULL,
spectra.file = "Spectra_txt_Liver_MSDIAL/Centroid")
saveRDS(ResultsTG, file = "ResultsTG.RDS")The average number of species, molecular species and AUC values after multiple iterations are reported. The function outputs the file containing the highest number of molecular species. If a list of standard TGs is provided, the number of standards found and FDR required to find all standards are reported. Standards_File_Format_example.csv file can be found in [MetaboLights] or [Google Drive]
# Set the processing parameters
identified.lipids # List of combined TG annotations
standard.file # list of standard TGs (Optional)
iteration # #Default 10
cutoff # Empirical FDR. Default 0.1
rttol #rt tolerance between the rt in standard file and in the result. Default 10
#Example:
ResultsTG_FDR <- FDRCutoff(identified.lipids = ResultsTG,
standard.file = "Standards_File_Format_example.csv",
iteration=10,
cutoff = 0.1,
rttol=10)
saveRDS(ResultsTG_FDR, file = "ResultsTG_FDR.RDS")