CS 6804 2024 Spring

CS 6804: Trustworthiness of Large Language Models

Note: This page will not be updated after January 15, 2024. The most up-to-date course syllabus and materials can be found on Canvas.

Time and Location: Monday and Wednesday, 5:30 PM - 6:45 PM. DDS 180.

Instructor: Xuan Wang

  • Office: Data and Decision Sciences 376
  • Phone: (540) 231-4061
  • Email: xuanw [at] vt [dot] edu
  • Office Hours: Monday 4:00 PM - 5:00 PM (by appointment)

Course Description

Large Language Models (LLMs) and the applications they power have received tremendous attention, especially since ChatGPT was released in November 2022. This course provides an introduction to state-of-the-art methods and tools for bolstering the trustworthiness of LLMs, spanning both their theoretical underpinnings and practical implementations. It begins with an introduction to LLMs and a retrieval-augmented generation question-answering application. Various LLM trustworthiness metrics are then discussed in detail, covering relevance, groundedness, confidence, calibration, uncertainty, explainability, privacy, fairness, toxicity, and adversarial attacks. Applications in critical domains such as healthcare, education, and security are also covered. Finally, students will work on a course project focusing on open research questions in trustworthy LLMs. The course materials are adapted from Stanford CS 329T: Trustworthy Machine Learning - Large Language Models and Applications.
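To make the retrieval-augmented generation (RAG) question-answering application mentioned above concrete, here is a minimal, illustrative Python sketch of the RAG pattern. It is not the course's reference implementation: the toy corpus, the bag-of-words retriever, and the call_llm placeholder are assumptions made purely for illustration; in practice, an embedding-based retriever over a document index and a real LLM API would fill these roles.

```python
# Minimal, illustrative RAG question-answering sketch (not the course's reference code).
# The corpus, the lexical retriever, and the call_llm placeholder are assumptions;
# a real application would use embedding-based retrieval and an actual LLM API.
from collections import Counter
import math

CORPUS = [
    "Retrieval-augmented generation grounds an LLM's answer in retrieved documents.",
    "Calibration measures how well a model's confidence matches its accuracy.",
    "Adversarial prompts can induce aligned language models to produce unsafe outputs.",
]

def bow(text):
    """Bag-of-words term counts for a simple lexical retriever."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, k=2):
    """Return the k corpus passages most similar to the question."""
    q = bow(question)
    return sorted(CORPUS, key=lambda doc: cosine(q, bow(doc)), reverse=True)[:k]

def call_llm(prompt):
    """Hypothetical placeholder for an LLM API call; swap in a real client."""
    return f"[LLM answer conditioned on a prompt of {len(prompt)} characters]"

def answer(question):
    """Build a grounded prompt from retrieved context and query the (placeholder) LLM."""
    context = "\n".join(retrieve(question))
    prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}\nAnswer:"
    return call_llm(prompt)

if __name__ == "__main__":
    print(answer("What does retrieval-augmented generation do?"))
```

The key design point the course builds on is visible even in this toy version: the model is asked to answer from retrieved context rather than from its parametric memory alone, which is what the groundedness and confidence topics below evaluate.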

Prerequisites

Students should have experience with machine learning, data analytics, and deep learning. Strong programming skills in a high-level language such as Python, along with experience in a framework for rapid deep learning prototyping (e.g., PyTorch, TensorFlow, or Keras), are essential for implementing and experimenting with the concepts covered in this course. While not mandatory, familiarity with natural language processing would be advantageous.

Course Format

The course is a role-playing paper reading seminar that is structured around reading, presenting, and discussing weekly papers. Each class will involve the presentation and discussion of one paper. Each student will have a unique, rotating role per week. This role defines the lens through which each student reads the paper and determines what they prepare for the group in-class discussion. All students, irrespective of their role, are expected to have read the paper before each class and come to class ready to discuss. There will be no exams or traditional assignments. Instead, students will work on a course project focusing on open research questions in trustworthy LLMs.

Presentation Roles: This seminar is organized around the different “roles” students play each week; these roles define the lens through which each student reads the paper. Students will be divided into two groups, one group presenting on Mondays and the other on Wednesdays. In a given class session, students in the presenting group will each take on a rotating role (described below): Presenter, Reviewer, Archaeologist, Academic Researcher, Industry Expert, or Host. The presenting group should create a formal presentation, i.e., prepare slides for the in-class discussion. For each student in the presenting group, their assigned role determines what they should include in the slides.

Depending on course enrollment, the roles may be adjusted: for example, roles may be removed or made optional if enrollment decreases, or more students may be assigned to each role if enrollment increases.

  • Presenter (1-2 students): Create the main presentation, around 30 minutes long, describing the motivation, problem definition, method, and experimental findings of the paper. Each student should serve as the presenter at least once.

  • Reviewer (1-5 students): Complete a full review of the paper, critical but not necessarily negative. Follow the guidelines for NeurIPS reviewers (under “Review Content”): answer questions 1-6, and assign an Overall score (question 9) and a Confidence score (question 10). Skip the rest of the review, including the summary. You may answer a question with N/A if, for example, you really liked the paper and cannot think of any weaknesses, but use this option sparingly. This role does not require covering related work, and your review need not be an exhaustive list of every argument you can think of; the goal is to strengthen your critical thinking. The instructor reserves the right to contact students who overuse the N/A option.

  • Archaeologist (1-5 students): Determine where this paper sits in the context of previous and subsequent work. Find and report on one older paper that has substantially influenced the current paper and one newer paper that cites it. Create 3-5 slides for each paper, summarizing key points such as the research problem, methodological innovation, and main results.

  • Academic Researcher (1-3 students): Propose an imaginary follow-up project that has become possible thanks to the existence and success of the current paper.

  • Industry Expert (1-3 students): Propose a new application or company product based on the method in the paper (one not already discussed in class), and discuss at least one positive and one negative impact of this application. Convince your industry boss that it is worth investing time and money to implement this paper. Your arguments should be tailored to the chosen industry market.

  • Host (1-2 students): Host the whole discussion in class, including introducing the presenters and answering all the questions posted by the non-presenting group on Piazza.

Non-presenter Assignment:

  • If you are not in the presenting group during a class session, submit at least one question on Piazza non-anonymously by 11:59 PM EST the day before class. The question can be about the paper (e.g., something you’re confused about) or something you’d like to hear discussed further. Questions that open debates and push the in-class discussion to explore different viewpoints are a plus.

  • After class and before the end of the day (11:59 PM EST), provide constructive feedback to the presenting group on Piazza non-anonymously. You may focus on one or more reading roles, or on the presentation holistically. Evaluate the clarity of the presentation, the strength of the arguments, and the quality of the visuals, if any, highlighting both strengths and areas for improvement. This feedback will be shared after each presentation.

Everyone, every week (Optional): After each class session, you may post your thoughts on Piazza: for example, which parts you enjoyed reading, which results and insights you found interesting, a missing result the paper could have included, or any useful additional links and resources. Whenever you agree with a student’s post, make sure to endorse their answer. You can also post a reply with your additional thoughts.

Final Project: The main goal of the project is to engage students in research on trustworthy LLMs. In particular, students should try to extend papers from topics covered in class and present the research outcomes as a research paper in a standard conference paper format (NeurIPS or ACL). Students are encouraged to work in groups of no more than four members, keeping in mind that the work produced should be proportional to the number of members in a team. Groups are required to include a “contributions” section in the final project report, listing each member’s contributions in detail. Projects should include a written report accompanied by the code (in a zip file or on GitHub). In addition, groups will present their final projects during the last two class sessions. A PowerPoint or LaTeX final presentation is required.

Technology: Piazza will be used for announcements, general questions, and discussions.

Grading Policy

Readings: 60 points: Each student will be in the presenting role for 10 classes and the non-presenting role for the other 10 classes. You can earn up to 4 points each time you present (all presenting roles are considered equal); you will receive full credit if you do a thorough job of undertaking your role and present it clearly and compellingly. When you aren’t presenting, you can earn up to 2 points by completing the non-presenting assignment and participating in class. In total, that is 10 × 4 = 40 presenting points plus 10 × 2 = 20 non-presenting points.

Final Project: 40 points divided into the following categories:

  • Proposal: 5 points.
  • In-class presentation: 15 points; your final presentation should be clear to the audience and provide a solid review of your work as if you were presenting at a conference.
  • Final report: 20 points:
    • Novelty: 5 points; your paper should propose something new (a new method, application, or perspective).
    • Clarity: 10 points; your paper should be readable, contain clear and well-defined motivation and contribution statements, and make appropriate connections to related work. In general, your project report should follow standard machine learning conference paper formatting and style.
    • Code: 5 points; the code accompanying your project should be well-documented and your experimental results should be reproducible. Your repository should include a README file with full instructions on how to run the code. Moreover, your code should be easy to run with one simple command; if there are multiple steps involved, please provide a bash script or an equivalent single entry point (see the illustrative sketch after this list).
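As a rough illustration of the “one simple command” expectation, a project repository might expose a single entry point that chains its steps. The file name (run.py), the step functions, and the default paths below are hypothetical, not a required layout; a short bash script that runs the same steps in order is equally acceptable.

```python
# Hypothetical single-command entry point (run.py) for a course project repository.
# The step functions and default paths are illustrative assumptions, not a required layout;
# a short bash script that runs the same steps in order works just as well.
import argparse

def prepare_data(data_dir):
    """Download or preprocess the datasets used in the experiments."""
    print(f"Preparing data in {data_dir} ...")

def run_experiments(config_path):
    """Train and evaluate the models described in the report."""
    print(f"Running experiments with config {config_path} ...")

def make_figures(output_dir):
    """Regenerate the tables and figures reported in the paper."""
    print(f"Writing figures to {output_dir} ...")

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Reproduce all project results with one command.")
    parser.add_argument("--data-dir", default="data")
    parser.add_argument("--config", default="configs/default.yaml")
    parser.add_argument("--output-dir", default="results")
    args = parser.parse_args()

    prepare_data(args.data_dir)
    run_experiments(args.config)
    make_figures(args.output_dir)
```

With a layout like this, a single command such as `python run.py` reproduces everything, which is the spirit of the requirement.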

Late and Missed Work

All assignments are due on the date assigned at the listed time. No late assignments will be accepted.

Course Schedule

This is a tentative lecture schedule. Please check the schedule on Canvas for recent updates. The papers can be found in this folder.

Date | Topic and Papers
01/17 (Wed) | Introduction
01/22 (Mon) | Introduction
 | Groundedness
01/24 (Wed) | Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
01/29 (Mon) | TRUE: Re-evaluating Factual Consistency Evaluation
01/31 (Wed) | Measuring Reliability of Large Language Models through Semantic Consistency
02/05 (Mon) | RARR: Researching and Revising What Language Models Say, Using Language Models
02/07 (Wed) | Do Language Models Know When They’re Hallucinating References?
02/12 (Mon) | SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
02/14 (Wed) | The Internal State of an LLM Knows When It’s Lying
 | Confidence
02/19 (Mon) | Teaching Models to Express Their Uncertainty in Words
02/21 (Wed) | Reducing Conversational Agents’ Overconfidence through Linguistic Calibration
02/26 (Mon) | Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation
02/28 (Wed) | Generating with Confidence: Uncertainty Quantification for Black-Box Large Language Models
03/04 (Mon) | Spring Break (No Class)
03/06 (Wed) | Spring Break (No Class)
03/11 (Mon) | Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback
 | Explainability
03/13 (Wed) | Axiomatic Attribution for Deep Networks
03/18 (Mon) | Representer Point Selection for Explaining Deep Neural Networks
03/20 (Wed) | The Explanation Game: Explaining Machine Learning Models Using Shapley Values
03/25 (Mon) | Estimating Training Data Influence by Tracing Gradient Descent
03/27 (Wed) | Understanding Black-box Predictions via Influence Functions
04/01 (Mon) | Influence Patterns for Explaining Information Flow in BERT
04/03 (Wed) | Studying Large Language Model Generalization with Influence Functions
04/08 (Mon) | Universal and Transferable Adversarial Attacks on Aligned Language Models
 | Invited Talks
04/10 (Wed) | Yu Meng (University of Virginia)

Title: Synthetic Supervision Generation with Large Language Models

Abstract: Large language models (LLMs) have demonstrated remarkable text generation capability, and an emerging machine learning paradigm for NLP tasks is to replace manually labeled training data with synthetic data generated by LLMs. In this talk, I will present my past research on using LLMs as training data generators for zero-/few-shot learning in NLP. In the first part, I will introduce a simple method to prompt LLMs to generate training data for different types of NLP tasks. To overcome noise in the synthetic training data, two regularization methods are employed. The resulting model trained on synthetic data demonstrates strong performance, even outperforming models trained with human annotations. In the second part, I will discuss how to extend the zero-shot training data generation method to few-shot scenarios by effectively leveraging a small amount of manually annotated data. Collectively, the methods introduced in this talk represent early attempts to use LLMs for high-quality synthetic supervision generation.

Bio: Yu Meng is an assistant professor in the Department of Computer Science at the University of Virginia. His research focuses on text representation learning and weakly-supervised learning, with 40+ papers published in machine learning and NLP conferences (NeurIPS, ICML, ICLR, ACL, EMNLP, NAACL, etc.) and 8 tutorials delivered at AI conferences (KDD, WWW, AAAI, etc.). He received the Google PhD Fellowship (2021). He has served as an area chair/action editor for ML/LLM venues including ICML, NeurIPS, TMLR, and COLM.
04/15 (Mon) | Jessica Newman (UC Berkeley)

Title: Bridging the Gap Between AI Trustworthiness and Risk Management

Abstract: In this lecture, Jessica Newman will discuss the history and landscape of AI trustworthiness and risk management within policy debates in the United States and Europe. In particular, she will discuss the US National Institute of Standards and Technology (NIST) AI Risk Management Framework (AI RMF) and NIST’s characteristics of trustworthiness for artificial intelligence. She will then introduce the taxonomy she developed with her team at UC Berkeley, reviewing some of the 150 properties of trustworthiness they document, as well as how they connect those properties to particular stages of the AI lifecycle and to particular sub-categories of the AI RMF.

Bio: Jessica Newman is the Founder and Director of the AI Security Initiative, housed at the UC Berkeley Center for Long-Term Cybersecurity. She is also the Founding Co-Director of the UC Berkeley AI Policy Hub and Co-Director of the university research group, the Algorithmic Fairness and Opacity Group (AFOG). Through these programs, she leads partnerships with the public and private sector, supports graduate student researchers, facilitates university events, and publishes and speaks widely on the governance, policy, and global security implications of artificial intelligence. Jessica has published more than 100 papers and articles on the societal implications of AI and emerging technologies in outlets including Brookings TechStream, Tech Policy Press, and The Georgetown Journal of International Affairs. She serves on several international AI governance and standards bodies including the OECD Expert Group on AI Risk & Accountability and the IEEE Working Group on Recommended Practice for Organizational Governance of Artificial Intelligence.
04/17 (Wed) | Haohan Wang (University of Illinois Urbana-Champaign)

Title: Guardian of Trust in Language Models: Automatic Jailbreak and Systematic Defense

Abstract: Large Language Models (LLMs) excel in Natural Language Processing (NLP) with human-like text generation, but their misuse has raised significant concerns. In this talk, we introduce an innovative system designed to address these challenges. Our system leverages LLMs to play different roles, simulating various user personas to generate “jailbreaks” – prompts that can induce LLMs to produce outputs contrary to ethical standards or specific guidelines. Utilizing a knowledge graph, our method efficiently creates new jailbreaks, testing the LLMs’ adherence to governmental and ethical guidelines. Empirical validation on diverse models, including Vicuna-13B, LongChat-7B, Llama-2-7B, and ChatGPT, has demonstrated its efficacy. The system’s application extends to Visual Language Models, highlighting its versatility in multimodal contexts.

The second part of our talk shifts focus to defensive strategies against such jailbreaks. Recent studies have uncovered various attacks that can manipulate LLMs, including manual and gradient-based jailbreaks. Our work delves into the development of robust prompt optimization as a novel defense mechanism, inspired by principled solutions from trustworthy machine learning. This approach involves system prompts – parts of the input text inaccessible to users – and aims to counter both manual and gradient-based attacks effectively. Despite existing defenses, adaptive attacks like GCG remain a challenge, necessitating a formalized defensive objective. Our research proposes such an objective and demonstrates how robust prompt optimization can enhance the safety of LLMs, safeguarding against realistic threat models and adaptive attacks.

Bio: Haohan Wang is an assistant professor in the School of Information Sciences at the University of Illinois Urbana-Champaign. His research focuses on the development of trustworthy machine learning methods for computational biology and healthcare applications, such as decoding the genomic language of Alzheimer’s disease. In his work, he uses statistical analysis and deep learning methods, with an emphasis on data analysis using methods least influenced by spurious signals. Wang earned his PhD in computer science through the Language Technologies Institute of Carnegie Mellon University. In 2019, Wang was recognized as the Next Generation in Biomedicine by the Broad Institute of MIT and Harvard for his contributions to handling confounding factors with deep learning.
04/22 (Mon) | Ziang Xiao (Johns Hopkins University)

Title: AI-mediated Human-Information Interaction

Abstract

Bio: Ziang Xiao is an Assistant Professor in the Department of Computer Science at Johns Hopkins University. His main research areas are human-computer interaction and natural language processing. Ziang’s research is motivated by the fundamental question of understanding humans at scale. His goal is to study human-AI interaction to expand our knowledge about our behaviors and decision-making processes. His recent projects center around three topics: AI for social science, human-centered model evaluation, and human-information interaction.
04/24 (Wed) | Boxin Wang (NVIDIA)

Title: DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

Abstract: Generative Pre-trained Transformer (GPT) models have exhibited exciting progress in their capabilities, capturing the interest of practitioners and the public alike. Yet, while the literature on the trustworthiness of GPT models remains limited, practitioners have proposed employing capable GPT models for sensitive applications such as healthcare and finance – where mistakes can be costly. To this end, this work proposes a comprehensive trustworthiness evaluation for large language models with a focus on GPT-4 and GPT-3.5, considering diverse perspectives – including toxicity, stereotype bias, adversarial robustness, out-of-distribution robustness, robustness on adversarial demonstrations, privacy, machine ethics, and fairness. Based on our evaluations, we discover previously unpublished vulnerabilities to trustworthiness threats. For instance, we find that GPT models can be easily misled to generate toxic and biased outputs and leak private information in both training data and conversation history. We also find that although GPT-4 is usually more trustworthy than GPT-3.5 on standard benchmarks, GPT-4 is more vulnerable given jailbreaking system or user prompts, potentially because GPT-4 follows (misleading) instructions more precisely. Our work illustrates a comprehensive trustworthiness evaluation of GPT models and sheds light on the trustworthiness gaps. Our benchmark is publicly available at https://decodingtrust.github.io/.

Bio: Boxin Wang is a research scientist at NVIDIA. He obtained his Ph.D. from the Computer Science Department of the University of Illinois at Urbana-Champaign (UIUC). He received the NeurIPS Outstanding Paper Award, the NeurIPS Scholar Award, and the Yunni & Maxine Pao Memorial Fellowship, and was selected as a Two Sigma Fellowship finalist and a Norton Labs Graduate Fellowship finalist. He has held research internships at Google, Microsoft, and NVIDIA. His research interests are scalable and trustworthy large language models (LLMs), including scaling LLMs with retrieval augmentation, exploring the vulnerabilities of existing state-of-the-art LLMs, and designing robust, private, and generalizable models for social good. Additional information is available at https://boxin.wang.
04/29 (Mon) | Project Presentations
05/01 (Wed) | Project Presentations

Accommodation Statement

Virginia Tech welcomes students with disabilities into the University’s educational programs. The University promotes efforts to provide equal access and a culture of inclusion without altering the essential elements of coursework. If you anticipate or experience academic barriers that may be due to disability, including but not limited to ADHD, chronic or temporary medical conditions, being deaf or hard of hearing, learning disability, mental health, or vision impairment, please contact the Services for Students with Disabilities (SSD) office (540-231-3788, ssd@vt.edu, or visit ssd.vt.edu). If you have an SSD accommodation letter, please meet with the instructors privately during office hours as early in the semester as possible to deliver your letter and discuss your accommodations. You must give the instructors reasonable notice to implement your accommodations, which is generally 5 business days (10 business days for final exams).

Academic Integrity Statement

The tenets of the Virginia Tech Graduate Honor Code will be strictly enforced in this course, and all assignments shall be subject to the stipulations of the Graduate Honor Code. Any suspected violations of the Honor Code will be promptly reported to the honor system. Honesty in your academic work will develop into professional integrity. The faculty and students of Virginia Tech will not tolerate any form of academic dishonesty. For more information on the Graduate Honor Code, please refer to the GHS Constitution.

Absence Policy

Regular class attendance is expected of all students. However, attendance will not be taken and will not be used in determining your final course grade in this class.

VT Principles of Community Statement