BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//132.216.98.100//NONSGML kigkonsult.se iCalcreator 2.20.4//
BEGIN:VEVENT
UID:20251120T205059EST-9347RcMAuE@132.216.98.100
DTSTAMP:20251121T015059Z
DESCRIPTION:Title: What Can Statistics Offer to Language Models: Watermarking and Evaluation\n\nAbstract: Large language models (LLMs) have transformed how we generate and process information\, yet two foundational challenges remain: ensuring the authenticity of their outputs and accurately evaluating their true capabilities. In this talk\, I argue that both challenges are\, at their core\, statistical problems\, and that statistical thinking can play an important role in advancing reliable and principled research on large language models. I will present two lines of work that approach these problems from a statistical perspective.\n\nThe first part introduces a statistical framework for language watermarks\, which embed imperceptible signals into model-generated text for provenance verification. By formulating watermark detection as a hypothesis testing problem\, this framework identifies pivotal statistics\, provides rigorous Type I error control\, and derives optimal detection rules that are both theoretically grounded and computationally efficient. It clarifies the theoretical limits of existing methods\, such as the Gumbel-max and inverse-transform watermarks\, and guides the design of more robust and powerful detectors. The second part focuses on language model evaluation\, where I study how to quantify the unseen knowledge that models possess but may not reveal through limited queries. To that end\, I introduce a statistical pipeline\, based on the smoothed Good–Turing estimator\, to estimate the total amount of a model’s knowledge beyond what is observed in benchmark datasets. The findings reveal that even advanced LLMs often articulate only a fraction of their internal knowledge\, suggesting a new perspective on evaluation and model competence. Together\, these projects represent an ongoing effort to develop statistical foundations for trustworthy and reliable language models\, with applications ranging from watermark detection to model evaluation.\n\n🔗 Zoom: https://mcgill.zoom.us/j/85469273736\n Meeting ID: 854 6927 3736\n
DTSTART:20251124T163000Z
DTEND:20251124T173000Z
LOCATION:Room 1104\, Burnside Hall\, CA\, QC\, Montreal\, H3A 0B9\, 805 rue Sherbrooke Ouest
SUMMARY:Xiang Li (University of Pennsylvania)
URL:/mathstat/channels/event/xiang-li-university-pennsylvania-369123
END:VEVENT
END:VCALENDAR