2025 Computer Science PhD Qualifier Exam: Data/ML/AI
Committee
- Chang-Tien Lu
- Dawei Zhou
- Liqing Zhang
- Pinar Yanardag Delul
- Xuan Wang (Chair)
Registered Students
TBD
Tentative Instructions
- At the beginning of the examination period, all students will receive a document that contains questions.
- By the end of the examination period, each student must turn in a written solution to one of those questions. Solutions must be no longer than 8 pages (excluding references), in an 11-point or larger font, using a format to be announced.
- Written solutions should take the form of a scientific paper. Each solution should include at least the following:
- a motivation section making clear the context of the problem/situation;
- a clear statement of the problem in terms of concepts and terminology in the information/data area, that addresses the situation/context;
- a review of related literature, drawing on multiple relevant works from the reading list and including additional references found by the student through a thorough literature search;
- a description of approaches to solve the problem; and
- an evaluation plan for how such approaches would be validated.
- Students will then give an oral presentation detailing their solution. Each presentation must be completed within 15 minutes: 12 minutes for the presentation and 3 minutes for answering questions posed by faculty examiners.
- Each solution will be graded by at least 2 faculty members. The area committee will then assign each student a combined grade of Pass/Fail based on all faculty input, as called for by GPC policies.
Early Withdrawal Policy
A student registered for the PhD qualifier exam may withdraw at any time before the early withdrawal deadline, which is 2/1/2025. After this date, withdrawal is prohibited. Students with questions about this policy should contact the exam chair directly.
Academic Integrity
Discussion among students of the papers identified for the exam is permitted until the exam questions are publicly released. Once the questions are released, all such discussion must cease; students are required to answer the qualifier questions entirely on their own. This examination is conducted under the University’s Graduate Honor System. Students are encouraged to draw on papers beyond those listed in the exam to the extent that doing so strengthens their arguments. However, the submitted answers must represent the sole and complete work of the student submitting them. Material substantially derived from other works, whether published in print or found on the web, must be explicitly and fully cited. Note that your grade will be influenced more strongly by the arguments you make than by those you quote or cite.
Tentative Exam Schedule
- 1/1/2025: Release of reading lists
- 1/6/2025 - 1/13/2025: Students register for the qualifier exam
- 1/20/2025: Qualifier waiver decisions
- 1/20/2025: Release of exam questions
- 2/1/2025: Last date to withdraw from the qualifier exam
- Early February: Oral exams
- 3/1/2025: Students submit the written exam solutions
- 3/20/2025: Qualifier result decisions
Tentative Reading Lists
The reading lists below cover various topics in the area of data and information. You may choose any one of these lists for your exam. You are expected to significantly expand on your selected list while preparing your written solution. You are also welcome to create your own reading list on a topic relevant to data and information that is not listed here, but that reading list must be approved by your research advisor by 2/1/2025.
The reading list and qualifying exam topic need not be your dissertation topic, though you are welcome to make the two overlap if desired. Rather, you will be expected to reason about, write about, conduct a literature search on, and present this topic to demonstrate your ability to conduct doctoral research.
List 1: Data Mining and Information Retrieval
- MentorGNN: Deriving Curriculum for Pre-Training GNNs, Dawei Zhou, Lecheng Zheng, Dongqi Fu, Jiawei Han, and Jingrui He. CIKM, 2022.
- A Data-Driven Graph Generative Model for Temporal Interaction Networks, Dawei Zhou, Lecheng Zheng, Jiawei Han, Jingrui He. KDD, 2020.
- Beta embeddings for multi-hop logical reasoning in knowledge graphs, Hongyu Ren, and Jure Leskovec. NeurIPS, 2020.
- Local motif clustering on time-evolving graphs, Dongqi Fu, Dawei Zhou, and Jingrui He. KDD, 2020.
- Domain Adaptive Multi-Modality Neural Attention Network for Financial Forecasting, Dawei Zhou, Lecheng Zheng, Jianbo Li, Yada Zhu, Jingrui He. WWW, 2020.
- Adversarial attacks on neural networks for graph data, Daniel Zügner, Amir Akbarnejad, and Stephan Günnemann. KDD, 2018.
List 2: Natural Language Processing
- Fine-tuned Language Models are Zero-Shot Learners, Jason Wei, Maarten Bosma, Vincent Y. Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, and Quoc V. Le. ICLR, 2022.
- Lifelong Event Detection with Knowledge Transfer, Pengfei Yu, Heng Ji, and Prem Natarajan. EMNLP, 2021.
- Diffusion-LM Improves Controllable Text Generation, Xiang Lisa Li, John Thickstun, Ishaan Gulrajani, Percy Liang, and Tatsunori B. Hashimoto. NeurIPS, 2022.
- MERLOT: Multimodal Neural Script Knowledge Models, Rowan Zellers, Ximing Lu, Jack Hessel, Youngjae Yu, Jae Sung Park, Jize Cao, Ali Farhadi, and Yejin Choi. NeurIPS, 2021.
- TaPas: Weakly Supervised Table Parsing via Pre-training, Jonathan Herzig, Pawel Krzysztof Nowak, Thomas Müller, Francesco Piccinno, and Julian Eisenschlos. ACL, 2020.
- OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework, Peng Wang, An Yang, Rui Men, Junyang Lin, Shuai Bai, Zhikang Li, Jianxin Ma, Chang Zhou, Jingren Zhou, and Hongxia Yang. ICML, 2022.
List 3: Reinforcement Learning
- Brunke, Lukas, Melissa Greeff, Adam W. Hall, Zhaocong Yuan, Siqi Zhou, Jacopo Panerati, and Angela P. Schoellig. “Safe learning in robotics: From learning-based control to safe reinforcement learning.” Annual Review of Control, Robotics, and Autonomous Systems 5 (2022): 411-444.
- Chen, Lili, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Misha Laskin, Pieter Abbeel, Aravind Srinivas, and Igor Mordatch. “Decision transformer: Reinforcement learning via sequence modeling.” Advances in neural information processing systems 34 (2021): 15084-15097.
- Fawzi, Alhussein, Matej Balog, Aja Huang, Thomas Hubert, Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov et al. “Discovering faster matrix multiplication algorithms with reinforcement learning.” Nature 610, no. 7930 (2022): 47-53.
- Vinyals, Oriol, Igor Babuschkin, Wojciech M. Czarnecki, Michaël Mathieu, Andrew Dudzik, Junyoung Chung, David H. Choi et al. “Grandmaster level in StarCraft II using multi-agent reinforcement learning.” Nature 575, no. 7782 (2019): 350-354.
- Mai, Vincent, Kaustubh Mani, and Liam Paull. “Sample Efficient Deep Reinforcement Learning via Uncertainty Estimation.” In International Conference on Learning Representations. 2021.
- Zhang, Ruohan, Faraz Torabi, Lin Guan, Dana H. Ballard, and Peter Stone. “Leveraging human guidance for deep reinforcement learning tasks.” arXiv preprint arXiv:1909.09906 (2019).
List 4: Machine Learning and Security
- He, Xinlei, Savvas Zannettou, Yun Shen, and Yang Zhang. “You Only Prompt Once: On the Capabilities of Prompt Learning on Large Language Models to Tackle Toxic Content.” In 2024 IEEE Symposium on Security and Privacy (SP). IEEE Computer Society, 2024.
- Zou, Andy, Zifan Wang, J. Zico Kolter, and Matt Fredrikson. “Universal and transferable adversarial attacks on aligned language models.” arXiv preprint arXiv:2307.15043 (2023).
- Gehman, Samuel, Suchin Gururangan, Maarten Sap, Yejin Choi, and Noah A. Smith. “RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models.” In Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 3356-3369. 2020.
- Qi, Xiangyu, Yi Zeng, Tinghao Xie, Pin-Yu Chen, Ruoxi Jia, Prateek Mittal, and Peter Henderson. “Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!.” arXiv preprint arXiv:2310.03693 (2023).
- Wei, Jerry, Da Huang, Yifeng Lu, Denny Zhou, and Quoc V. Le. “Simple synthetic data reduces sycophancy in large language models.” arXiv preprint arXiv:2308.03958 (2023).
- Lahnala, Allison, Charles Welch, Béla Neuendorf, and Lucie Flek. “Mitigating Toxic Degeneration with Empathetic Data: Exploring the Relationship Between Toxicity and Empathy.” In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 4926-4938. 2022.
- Carlini, Nicholas, Milad Nasr, Christopher A. Choquette-Choo, Matthew Jagielski, Irena Gao, Anas Awadalla, Pang Wei Koh et al. “Are aligned neural networks adversarially aligned?.” arXiv preprint arXiv:2306.15447 (2023).
List 5: Machine Learning and Software
- ProGraML: A Graph-based Program Representation for Data Flow Analysis and Compiler Optimizations, Chris Cummins, Zacharias V. Fisches, Tal Ben-Nun, Torsten Hoefler, Michael O’Boyle, Hugh Leather. ICML, 2021.
- Scalable Deep Learning via I/O Analysis and Optimization, Sarunya Pumma, Min Si, Wu-chun Feng, Pavan Balaji. TOPC, 2019.
- Iterative Machine Learning (IterML) for Effective Parameter Pruning and Tuning in Accelerators, Xuewen Cui, Wu-chun Feng. Computing Frontiers, 2019.
- An Oracle for Guiding Large-Scale Model/Hybrid Parallel Training of Convolutional Neural Networks, Albert Njoroge Kahira et al. HPDC, 2021.
- Why Globally Re-shuffle? Revisiting Data Shuffling in Large Scale Deep Learning, Truong Thao Nguyen et al. IPDPS, 2022.
List 6: Online Learning
- Seldin, Y., Bartlett, P. L., Crammer, K., and Abbasi-Yadkori, Y. Prediction with limited advice and multiarmed bandits with paid observations, In International Conference on Machine Learning, 2014.
- Altschuler, J. M. and Talwar, K. Online learning over a finite action set with limited switching, Proceedings of the 31st Conference On Learning Theory, PMLR 75:1569-1573, 2018.
- Arora, R., Marinov, T. V., and Mohri, M. Bandits with feedback graphs and switching costs, Advances in Neural Information Processing Systems, 32, 2019.
- Shi, M., Lin, X., and Jiao, L. Power-of-2-arms for bandit learning with switching costs, In Proceedings of the Twenty-Third International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing, pp. 131–140, 2022.
- Duo Cheng, Xingyu Zhou, and Bo Ji, Understanding the Role of Feedback in Online Learning with Switching Costs, Proceedings of ICML 2023, Honolulu, HI, July 2023.
List 7: Spatiotemporal Data Mining
- Hamdi, A., Shaban, K., Erradi, A. et al. Spatiotemporal data mining: a survey on challenges and open problems, Artif Intell Rev 55, 1441–1488 (2022).
- Wang, Senzhang, Jiannong Cao, and Philip S. Yu. Deep learning for spatio-temporal data mining: A survey, IEEE Transactions on Knowledge and Data Engineering 34, no. 8 (2020): 3681-3700.
- Liang Zhao, Jiangzhuo Chen, Feng Chen, Fang Jin, Wei Wang, Chang-Tien Lu, and Naren Ramakrishnan. Online flu epidemiological deep modeling on disease contact network, GeoInformatica, Vol. 24, pp. 443–475, 2020.
- Qianyue Hao, Lin Chen, Fengli Xu, and Yong Li. Understanding the urban pandemic spreading of covid-19 with real world mobility data, In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 3485–3492, 2020.
- Bai, Lei, Lina Yao, Can Li, Xianzhi Wang, and Can Wang. Adaptive graph convolutional recurrent network for traffic forecasting, Advances in neural information processing systems 33 (2020): 17804-17815.
- Li, Kunchang, Yali Wang, Peng Gao, Guanglu Song, Yu Liu, Hongsheng Li, and Yu Qiao. Uniformer: Unified transformer for efficient spatiotemporal representation learning, arXiv preprint arXiv:2201.04676 (2022).
Grading Scale
The exam will ultimately be graded on a scale of Pass/Fail, in accordance with GPC policies.
Use of Generative AI Tools
The use of any generative AI tools (e.g., LLMs) during the written exam is strictly prohibited for any purpose, including but not limited to writing, editing, idea generation, grammar correction, or polishing answers. Violations will result in an automatic Fail grade for the qualifier exam. All submissions will be monitored, and students may be asked to explain their work if AI use is suspected. Your work must be entirely your own. No exceptions.