I’m a scientist and startup founder working on artificial intelligence, from its foundations to its applications, and I’ve co-founded startups at the frontier of AI.

My co-authored research papers have received awards and challenged critical paradigms in AI.

My research spans decision making under uncertainty, reinforcement learning, preference modeling, social choice, conversational recommenders, and learning theory.

Selected Publications

Non-delusional Q-learning and value-iteration

Tyler Lu, Dale Schuurmans, Craig Boutilier

Best Paper Award

  • We identify a fundamental source of error in Q-learning and other forms of dynamic programming with function approximation. Delusional bias arises when the approximation architecture limits the class of expressible greedy policies. Since standard Q-updates make globally uncoordinated action choices with respect to the expressible policy class, inconsistent or even conflicting Q-value estimates can result, leading to pathological behaviour such as over/under-estimation, instability and even divergence. To solve this problem, we introduce a new notion of policy consistency and define a local backup process that ensures global consistency through the use of information sets---sets that record constraints on policies consistent with backed-up Q-values. We prove that both the model-based and model-free algorithms using this backup remove delusional bias, yielding the first known algorithms that guarantee optimal results under general conditions. These algorithms furthermore only require polynomially many information sets (from a potentially exponential support). Finally, we suggest other practical heuristics for value-iteration and Q-learning that attempt to reduce delusional bias.
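The core of delusional bias is that per-state `max` backups can jointly assume action choices that no single policy in the approximator's greedy class realizes. A minimal sketch of this mismatch, using a hypothetical two-state, two-action example of my own (not taken from the paper), with a one-parameter linear approximator:

```python
# Toy illustration of delusional bias (hypothetical example, not from
# the paper): with Q(s, a) = theta * phi(s, a), the set of greedy
# policies expressible by some theta is strictly smaller than the set
# of all deterministic policies, yet uncoordinated per-state max
# backups implicitly assume any combination of actions is available.
import itertools

# Features phi(s, a) for two states and two actions.
phi = {
    ("s1", "a"):  1.0, ("s1", "b"): -1.0,
    ("s2", "a"): -1.0, ("s2", "b"):  1.0,
}

def greedy_policy(theta):
    """Greedy action at each state under Q(s, a) = theta * phi(s, a)."""
    return tuple(
        max(("a", "b"), key=lambda act: theta * phi[(s, act)])
        for s in ("s1", "s2")
    )

# Policies expressible as the greedy policy of SOME weight theta:
# the sign of theta fully determines the greedy action at every state.
expressible = {greedy_policy(t) for t in (-1.0, 1.0)}
all_policies = set(itertools.product(("a", "b"), repeat=2))

# "Take a in both states" is realized by no theta, yet independent
# max_a backups at s1 and s2 may each select action a.
print(sorted(expressible))
print(("a", "a") in expressible)
```

The information-set machinery in the paper exists precisely to record which backed-up values are jointly consistent with some expressible policy, ruling out combinations like the one above.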

Optimal Social Choice Functions: A Utilitarian View

Craig Boutilier, Ioannis Caragiannis, Simi Haber, Tyler Lu, Ariel Procaccia, Or Sheffet

Prominent Paper Award 2022

  • We adopt a utilitarian perspective on social choice, assuming that agents have (possibly latent) utility functions over some space of alternatives. For many reasons one might consider mechanisms, or social choice functions, that only have access to the ordinal rankings of alternatives by the individual agents rather than their utility functions. In this context, one possible objective for a social choice function is the maximization of (expected) social welfare relative to the information contained in these rankings. We study such optimal social choice functions under three different models, and underscore the important role played by scoring functions. In our worst-case model, no assumptions are made about the underlying distribution and we analyze the worst-case distortion—or degree to which the selected alternative does not maximize social welfare—of optimal (randomized) social choice functions. In our average-case model, we derive optimal functions under neutral (or impartial culture) probabilistic models. Finally, a very general learning-theoretic model allows for the computation of optimal social choice functions (i.e., ones that maximize expected social welfare) under arbitrary, sampleable distributions. In the latter case, we provide both algorithms and sample complexity results for the class of scoring functions, and further validate the approach empirically.

  • This paper has made outstanding contributions to computational social choice by introducing a fundamentally novel approach to voting, now referred to as implicit utilitarian voting. It also initiated the prolific line of research on distortion, leading to survey and tutorial coverage of the topic as well as inclusion in graduate course material. Among its many contributions, the paper includes breakthrough results such as polynomial-time computability of the optimal randomized voting rule and a near-tight analysis of its distortion. It also provides an average-case analysis and learning-theoretic results, thus opening doors to a multi-faceted treatment of the subject.
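Distortion on a given utility profile is the ratio between the best achievable social welfare and the welfare of the alternative the ordinal rule selects. A small sketch with numbers of my own choosing (illustrative only, not from the paper), where each voter's utilities are normalized to sum to 1:

```python
# Toy distortion calculation (hypothetical utilities, not from the
# paper). Two voters mildly prefer A; one voter strongly prefers B.
# Plurality, which sees only rankings, picks A, but B maximizes
# social welfare.
utilities = [
    {"A": 0.51, "B": 0.49},  # ranks A > B
    {"A": 0.51, "B": 0.49},  # ranks A > B
    {"A": 0.0,  "B": 1.0},   # ranks B > A
]

def social_welfare(x):
    return sum(u[x] for u in utilities)

# Plurality sees only the rankings: A has two first-place votes.
plurality_winner = "A"
best = max(("A", "B"), key=social_welfare)

# Distortion of the selected alternative on this profile.
distortion = social_welfare(best) / social_welfare(plurality_winner)
print(best, round(distortion, 2))  # B wins on welfare; distortion ~1.94
```

Worst-case distortion, as studied in the paper, takes the maximum of this ratio over all utility profiles consistent with the observed rankings.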

Does Unlabeled Data Provably Help? Worst-case Analysis of the Sample Complexity of Semi-Supervised Learning

Shai Ben-David, Tyler Lu, David Pal

  • We study the potential benefits to classification prediction that arise from having access to unlabeled samples. We compare learning in the semi-supervised model to the standard, supervised PAC (distribution free) model, considering both the realizable and the unrealizable (agnostic) settings.

    Roughly speaking, our conclusion is that access to unlabeled samples cannot provide sample size guarantees that are better than those obtainable without access to unlabeled data, unless one postulates very strong assumptions about the distribution of the labels.

    In particular, we prove that for basic hypothesis classes over the real line, if the distribution of unlabeled data is ‘smooth’, knowledge of that distribution cannot improve the labeled sample complexity by more than a constant factor (e.g., 2). We conjecture that a similar phenomenon holds for any hypothesis class and any unlabeled data distribution. We also discuss the utility of semi-supervised learning under the common cluster assumption concerning the distribution of labels, and show that even in the most accommodating cases, where data is generated by two uni-modal label-homogeneous distributions, common SSL paradigms may be misleading and may result in poor prediction performance.
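The failure mode in the two-unimodal-classes case can be seen numerically: the lowest-density point of the unlabeled mixture need not coincide with the error-minimizing threshold. A sketch with parameters of my own choosing (one narrow and one wide Gaussian class, equal priors; illustrative, not the paper's construction):

```python
# Sketch (hypothetical parameters): two unimodal, label-homogeneous
# classes where the "cut in a low-density region" SSL heuristic picks
# a threshold far from the accuracy-optimal one.
import math

def gauss(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (
        sigma * math.sqrt(2 * math.pi)
    )

f0 = lambda x: gauss(x, 0.0, 1.0)  # class 0: narrow
f1 = lambda x: gauss(x, 4.0, 3.0)  # class 1: much wider
grid = [i / 1000 for i in range(4001)]  # search between the two modes

# Low-density cut: minimum of the unlabeled mixture density.
low_density_cut = min(grid, key=lambda x: 0.5 * f0(x) + 0.5 * f1(x))
# Error-minimizing cut (equal priors): where the class densities cross.
bayes_cut = min(grid, key=lambda x: abs(f0(x) - f1(x)))

print(low_density_cut, bayes_cut)  # the two thresholds visibly disagree
```

Because the wide class leaks density across the crossing point, the mixture's density trough sits well to the right of the Bayes threshold, so a density-based SSL split misclassifies a sizable mass of the narrow class's right tail.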

On the Value of Using Group Discounts under Price Competition

Reshef Meir, Tyler Lu, Moshe Tennenholtz, Craig Boutilier

AAAI Honorable Mention Award 2013

  • The increasing use of group discounts has provided opportunities for buying groups with diverse preferences to coordinate their behavior in order to exploit the best offers from multiple vendors. We analyze this problem from the viewpoint of the vendors, asking under what conditions a vendor should adopt a volume-based price schedule rather than posting a fixed price, either as a monopolist or when competing with other vendors. When vendors have uncertainty about buyers’ valuations specified by a known distribution, we show that a vendor is always better off posting a fixed price, provided that buyers’ types are i.i.d. and that other vendors also use fixed prices. We also show that these assumptions cannot be relaxed: if buyers are not i.i.d., or other vendors post discount schedules, then posting a schedule may yield higher profit for the vendor. We provide similar results under a distribution-free uncertainty model, where vendors minimize their maximum regret over all type realizations.
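The fixed-price side of the comparison reduces to a one-dimensional optimization. A minimal sketch under a toy model of my own (a monopolist facing buyers with i.i.d. Uniform[0,1] valuations and zero cost; not the paper's general setting):

```python
# Toy fixed-price benchmark (hypothetical model, not from the paper):
# with i.i.d. Uniform[0,1] valuations, expected per-buyer profit at a
# posted price p is p * Pr[valuation >= p] = p * (1 - p),
# maximized at p = 1/2.
def expected_profit(p):
    return p * (1.0 - p)  # p * Pr[buyer's valuation >= p]

grid = [i / 1000 for i in range(1001)]
best_price = max(grid, key=expected_profit)
print(best_price, expected_profit(best_price))  # 0.5 0.25
```

The paper's result says that, when buyers' types are i.i.d. and rival vendors also post fixed prices, no volume-based schedule improves on this kind of fixed-price optimum, while dropping either assumption can make a schedule strictly better.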