Week 9: Application to Real Data¶
Topics¶
This week’s assignments will guide you through the following topics:
Computing workflow of CMS
Issues related to real data
Reading¶
Please read the following:
Read about the CMS event data model (EDM): https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookCMSSWFramework
Learn about the CMS software stack (CMSSW): https://cms-sw.github.io/index.html
Browse an example pull request to CMSSW: https://github.com/cms-sw/cmssw/pull/30072
Skim some papers on the framework and multithreading: https://indico.cern.ch/event/408139/contributions/979800/attachments/815724/1117731/FrameworkPaper.pdf, [24]
Tasks¶
Propose a project for Quarter 2. Possible extensions include:
Explore other model architectures for the same H(bb) identification task, like transformers, tensor networks, equivariant neural networks.
Expand the model to do multiclass classification (classifying all flavors of QCD quarks, gluons, H(bb) present in the dataset).
Try an unsupervised learning approach like (variational) autoencoders for anomaly detection.
Explore model compression, knowledge distillation, quantization or other techniques to reduce the model size or complexity; Can it be made more efficient?
Perform a regression task, like correcting the energy or mass of the particle jet based on generator-level “truth” information.
Apply concepts you learned to a new dataset like the TrackML Particle Tracking Challenge: https://www.kaggle.com/c/trackml-particle-identification/overview.
Apply your algorithm to real data.
Apply concepts you learned to a slightly different jet tagging problem for VBS production of 2H(bb) 2H(WW), which may be used in a real CMS analysis!
Explainable AI for GNNs using layerwise relevance propagation (LRP) or GNNExplainer [25].
Weekly Questions¶
Answer the following questions
What are some differences between the CMS event data model and other processing frameworks