2025 Computer Science PhD Qualifier Exam: Data/ML/AI
Committee
- Chang-Tien Lu
- Dawei Zhou
- Liqing Zhang
- Pinar Yanardag Delul
- Xuan Wang (Chair)
Registered Students
TBD
Tentative Instructions
- At the beginning of the examination period, all students will receive a document that contains questions.
- By the end of the examination period, each student must turn in a written solution to one of those questions. Solutions must be no longer than 8 pages (excluding references), in an 11-point or larger font, using a format to be announced.
- Written solutions should take the form of a scientific paper. Each solution should include at least the following:
- a motivation section making clear the context of the problem/situation;
- a clear statement of the problem in terms of concepts and terminology in the information/data area, that addresses the situation/context;
- a review of related literature, drawing on multiple relevant works from the reading list and including additional references found by the student through a thorough literature search;
- a description of approaches to solve the problem; and
- an evaluation plan for how such approaches would be validated.
- Students will then give an oral presentation detailing their solution. Each presentation must be completed within 15 minutes: 12 minutes for the presentation and 3 minutes for answering questions posed by faculty examiners.
- Each solution will be graded by at least 2 faculty members. The area committee will then assign each student a combined grade of Pass/Fail based on all faculty input, as called for by GPC policies.
Early Withdrawal Policy
A student registered for the PhD qualifier exam may withdraw at any time before the early withdrawal deadline, which is 2/1/2025. After this date, withdrawal is prohibited. Students with questions about this policy should contact the exam chair directly.
Academic Integrity
Discussion among students of the papers identified for the exam is permitted until the exam questions are publicly released. Once the questions are released, all such discussion must cease; students are required to answer the qualifier questions entirely on their own. This examination is conducted under the University’s Graduate Honor System. Students are encouraged to draw on papers beyond those listed in the exam to the extent that doing so strengthens their arguments. However, the submitted answers must represent the sole and complete work of the student submitting them. Material substantially derived from other works, whether published in print or found on the web, must be explicitly and fully cited. Note that your grade will be influenced more strongly by the arguments you make than by those you quote or cite.
Tentative Exam Schedule
- 1/1/2025: Release of reading lists
- 1/6/2025 - 1/13/2025: Students register for the qualifier exam
- 1/20/2025: Qualifier waiver decisions
- 1/20/2025: Release of exam questions
- 2/1/2025: Last date to withdraw from the qualifier exam
- Early February: Oral exams
- 3/1/2025: Students submit the written exam solutions
- 3/20/2025: Qualifier result decisions
Tentative Reading Lists
The reading lists below cover various topics in the area of data and information. You may choose any one of these lists for your exam. You are expected to significantly expand on your selected list while preparing your written solution. You are also welcome to create your own reading list on a topic relevant to data and information that is not listed here, but that reading list must be approved by your research advisor by 2/1/2025.
The reading list and qualifying exam topic need not be your dissertation topic, though you are welcome to make the two overlap if desired. Rather, you will be expected to reason about, write about, conduct a literature search on, and present this topic to demonstrate your ability to conduct doctoral research.
List 1: Data Mining and Information Retrieval
- MentorGNN: Deriving Curriculum for Pre-Training GNNs, Dawei Zhou, Lecheng Zheng, Dongqi Fu, Jiawei Han, and Jingrui He. CIKM, 2022.
- A Data-Driven Graph Generative Model for Temporal Interaction Networks, Dawei Zhou, Lecheng Zheng, Jiawei Han, Jingrui He. KDD, 2020.
- Beta embeddings for multi-hop logical reasoning in knowledge graphs, Hongyu Ren, and Jure Leskovec. NeurIPS, 2020.
- Local motif clustering on time-evolving graphs, Dongqi Fu, Dawei Zhou, and Jingrui He. KDD, 2020.
- Domain Adaptive Multi-Modality Neural Attention Network for Financial Forecasting, Dawei Zhou, Lecheng Zheng, Jianbo Li, Yada Zhu, Jingrui He. WWW, 2020.
- Adversarial attacks on neural networks for graph data, Daniel Zügner, Amir Akbarnejad, and Stephan Günnemann. KDD, 2018.
List 2: Natural Language Processing
- Fine-tuned Language Models are Zero-Shot Learners, Jason Wei, Maarten Bosma, Vincent Y. Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, and Quoc V. Le. ICLR, 2022.
- Lifelong Event Detection with Knowledge Transfer, Pengfei Yu, Heng Ji, and Prem Natarajan. EMNLP, 2021.
- Diffusion-LM Improves Controllable Text Generation, Xiang Lisa Li, John Thickstun, Ishaan Gulrajani, Percy Liang, and Tatsunori B. Hashimoto. NeurIPS, 2022.
- MERLOT: Multimodal Neural Script Knowledge Models, Rowan Zellers, Ximing Lu, Jack Hessel, Youngjae Yu, Jae Sung Park, Jize Cao, Ali Farhadi, and Yejin Choi. NeurIPS, 2021.
- TaPas: Weakly Supervised Table Parsing via Pre-training, Jonathan Herzig, Pawel Krzysztof Nowak, Thomas Müller, Francesco Piccinno, and Julian Eisenschlos. ACL, 2020.
- OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework, Peng Wang, An Yang, Rui Men, Junyang Lin, Shuai Bai, Zhikang Li, Jianxin Ma, Chang Zhou, Jingren Zhou, and Hongxia Yang. ICML, 2022.
List 3: Reinforcement Learning
- Brunke, Lukas, Melissa Greeff, Adam W. Hall, Zhaocong Yuan, Siqi Zhou, Jacopo Panerati, and Angela P. Schoellig. “Safe learning in robotics: From learning-based control to safe reinforcement learning.” Annual Review of Control, Robotics, and Autonomous Systems 5 (2022): 411-444.
- Chen, Lili, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Misha Laskin, Pieter Abbeel, Aravind Srinivas, and Igor Mordatch. “Decision transformer: Reinforcement learning via sequence modeling.” Advances in neural information processing systems 34 (2021): 15084-15097.
- Fawzi, Alhussein, Matej Balog, Aja Huang, Thomas Hubert, Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov et al. “Discovering faster matrix multiplication algorithms with reinforcement learning.” Nature 610, no. 7930 (2022): 47-53.
- Vinyals, Oriol, Igor Babuschkin, Wojciech M. Czarnecki, Michaël Mathieu, Andrew Dudzik, Junyoung Chung, David H. Choi et al. “Grandmaster level in StarCraft II using multi-agent reinforcement learning.” Nature 575, no. 7782 (2019): 350-354.
- Mai, Vincent, Kaustubh Mani, and Liam Paull. “Sample Efficient Deep Reinforcement Learning via Uncertainty Estimation.” In International Conference on Learning Representations. 2021.
- Zhang, Ruohan, Faraz Torabi, Lin Guan, Dana H. Ballard, and Peter Stone. “Leveraging human guidance for deep reinforcement learning tasks.” arXiv preprint arXiv:1909.09906 (2019).
List 4: Machine Learning and Security
- He, Xinlei, Savvas Zannettou, Yun Shen, and Yang Zhang. “You Only Prompt Once: On the Capabilities of Prompt Learning on Large Language Models to Tackle Toxic Content.” In 2024 IEEE Symposium on Security and Privacy (SP). IEEE Computer Society, 2024.
- Zou, Andy, Zifan Wang, J. Zico Kolter, and Matt Fredrikson. “Universal and transferable adversarial attacks on aligned language models.” arXiv preprint arXiv:2307.15043 (2023).
- Gehman, Samuel, Suchin Gururangan, Maarten Sap, Yejin Choi, and Noah A. Smith. “RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models.” In Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 3356-3369. 2020.
- Qi, Xiangyu, Yi Zeng, Tinghao Xie, Pin-Yu Chen, Ruoxi Jia, Prateek Mittal, and Peter Henderson. “Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!.” arXiv preprint arXiv:2310.03693 (2023).
- Wei, Jerry, Da Huang, Yifeng Lu, Denny Zhou, and Quoc V. Le. “Simple synthetic data reduces sycophancy in large language models.” arXiv preprint arXiv:2308.03958 (2023).
- Lahnala, Allison, Charles Welch, Béla Neuendorf, and Lucie Flek. “Mitigating Toxic Degeneration with Empathetic Data: Exploring the Relationship Between Toxicity and Empathy.” In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 4926-4938. 2022.
- Carlini, Nicholas, Milad Nasr, Christopher A. Choquette-Choo, Matthew Jagielski, Irena Gao, Anas Awadalla, Pang Wei Koh et al. “Are aligned neural networks adversarially aligned?.” arXiv preprint arXiv:2306.15447 (2023).
List 5: Machine Learning and Software
- ProGraML: A Graph-based Program Representation for Data Flow Analysis and Compiler Optimizations, Chris Cummins, Zacharias V. Fisches, Tal Ben-Nun, Torsten Hoefler, Michael O’Boyle, Hugh Leather. ICML, 2021.
- Scalable Deep Learning via I/O Analysis and Optimization, Sarunya Pumma, Min Si, Wu-chun Feng, Pavan Balaji. TOPC, 2019.
- Iterative Machine Learning (IterML) for Effective Parameter Pruning and Tuning in Accelerators, Xuewen Cui, Wu-chun Feng. Computing Frontiers, 2019.
- An Oracle for Guiding Large-Scale Model/Hybrid Parallel Training of Convolutional Neural Networks, Albert Njoroge Kahira et al. HPDC, 2021.
- Why Globally Re-shuffle? Revisiting Data Shuffling in Large Scale Deep Learning, Truong Thao Nguyen et al. IPDPS, 2022.
List 6: Online Learning
- Seldin, Y., Bartlett, P. L., Crammer, K., and Abbasi-Yadkori, Y. Prediction with limited advice and multiarmed bandits with paid observations, In International Conference on Machine Learning, 2014.
- Altschuler, J. M. and Talwar, K. Online learning over a finite action set with limited switching, Proceedings of the 31st Conference On Learning Theory, PMLR 75:1569-1573, 2018.
- Arora, R., Marinov, T. V., and Mohri, M. Bandits with feedback graphs and switching costs, Advances in Neural Information Processing Systems, 32, 2019.
- Shi, M., Lin, X., and Jiao, L. Power-of-2-arms for bandit learning with switching costs, In Proceedings of the Twenty-Third International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing, pp. 131–140, 2022.
- Duo Cheng, Xingyu Zhou, and Bo Ji, Understanding the Role of Feedback in Online Learning with Switching Costs, Proceedings of ICML 2023, Honolulu, HI, July 2023.
List 7: Spatiotemporal Data Mining
- Hamdi, A., Shaban, K., Erradi, A. et al. Spatiotemporal data mining: a survey on challenges and open problems, Artif Intell Rev 55, 1441–1488 (2022).
- Wang, Senzhang, Jiannong Cao, and Philip S. Yu. Deep learning for spatio-temporal data mining: A survey, IEEE Transactions on Knowledge and Data Engineering 34, no. 8 (2020): 3681-3700.
- Liang Zhao, Jiangzhuo Chen, Feng Chen, Fang Jin, Wei Wang, Chang-Tien Lu, and Naren Ramakrishnan. Online flu epidemiological deep modeling on disease contact network, GeoInformatica, Vol. 24, pp. 443–475, 2020.
- Qianyue Hao, Lin Chen, Fengli Xu, and Yong Li. Understanding the urban pandemic spreading of covid-19 with real world mobility data, In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 3485–3492, 2020.
- Bai, Lei, Lina Yao, Can Li, Xianzhi Wang, and Can Wang. Adaptive graph convolutional recurrent network for traffic forecasting, Advances in neural information processing systems 33 (2020): 17804-17815.
- Li, Kunchang, Yali Wang, Peng Gao, Guanglu Song, Yu Liu, Hongsheng Li, and Yu Qiao. Uniformer: Unified transformer for efficient spatiotemporal representation learning, arXiv preprint arXiv:2201.04676 (2022).
Grading Scale
The exam will ultimately be graded on a scale of Pass/Fail, in accordance with GPC policies.
Use of Generative AI Tools
The use of any generative AI tools (e.g., LLMs) during the written exam is strictly prohibited for any purpose, including but not limited to writing, editing, idea generation, grammar correction, or polishing answers. Violations will result in an automatic Fail grade for the qualifier exam. All submissions will be monitored, and students may be asked to explain their work if AI use is suspected. Your work must be entirely your own. No exceptions.