Owain Evans
Associate, CHAI, UC Berkeley
Verified email at philosophy.ox.ac.uk - Homepage
Title
Cited by
Year
The malicious use of artificial intelligence: Forecasting, prevention, and mitigation
M Brundage, S Avin, J Clark, H Toner, P Eckersley, B Garfinkel, A Dafoe, ...
arXiv preprint arXiv:1802.07228, 2018
Cited by: 1262*, Year: 2018
Beyond the imitation game: Quantifying and extrapolating the capabilities of language models
A Srivastava, A Rastogi, A Rao, AAM Shoeb, A Abid, A Fisch, AR Brown, ...
arXiv preprint arXiv:2206.04615, 2022
Cited by: 1196*, Year: 2022
When will AI exceed human performance? Evidence from AI experts
K Grace, J Salvatier, A Dafoe, B Zhang, O Evans
Journal of Artificial Intelligence Research 62, 729-754, 2018
Cited by: 1141*, Year: 2018
TruthfulQA: Measuring how models mimic human falsehoods
S Lin, J Hilton, O Evans
arXiv preprint arXiv:2109.07958, 2021
Cited by: 1115, Year: 2021
Trial without error: Towards safe reinforcement learning via human intervention
W Saunders, G Sastry, A Stuhlmueller, O Evans
arXiv preprint arXiv:1707.05173, 2017
Cited by: 316, Year: 2017
Help or hinder: Bayesian models of social goal inference
T Ullman, C Baker, O Macindoe, O Evans, N Goodman, J Tenenbaum
Advances in neural information processing systems 22, 2009
Cited by: 228, Year: 2009
Teaching models to express their uncertainty in words
S Lin, J Hilton, O Evans
arXiv preprint arXiv:2205.14334, 2022
Cited by: 211, Year: 2022
The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"
L Berglund, M Tong, M Kaufmann, M Balesni, AC Stickland, T Korbak, ...
arXiv preprint arXiv:2309.12288, 2023
Cited by: 167*, Year: 2023
The malicious use of artificial intelligence: Forecasting, prevention, and mitigation
M Brundage, S Avin, J Clark, H Toner, P Eckersley, B Garfinkel, A Dafoe, ...
arXiv.org, vol. cs.AI, 2018
Cited by: 163*, Year: 2018
Learning the Preferences of Ignorant, Inconsistent Agents
O Evans, A Stuhlmüller, ND Goodman
Proceedings of the 30th AAAI Conference on Artificial Intelligence (AAAI-2016), 2016
Cited by: 143, Year: 2016
Truthful AI: Developing and governing AI that does not lie
O Evans, O Cotton-Barratt, L Finnveden, A Bales, A Balwit, P Wills, ...
arXiv preprint arXiv:2110.06674, 2021
Cited by: 105, Year: 2021
Agent-Agnostic Human-in-the-Loop Reinforcement Learning
D Abel, J Salvatier, A Stuhlmüller, O Evans
arXiv:1701.0407, 2017
Cited by: 83, Year: 2017
AI progress measurement
P Eckersley, Y Nasser, Y Bayle, O Evans, G Gebhart, D Schwenk
Electronic Frontier Foundation, 2017
Cited by: 51*, Year: 2017
Constructing and adjusting estimates for household transmission of SARS-CoV-2 from prior studies, widespread-testing and contact-tracing data
M Curmei, A Ilyas, O Evans, J Steinhardt
International Journal of Epidemiology 50 (5), 1444-1457, 2021
Cited by: 41*, Year: 2021
Active Reinforcement Learning: Observing Rewards at a Cost
D Krueger, J Leike, O Evans, J Salvatier
NIPS 2016 Workshop, 2016
Cited by: 37*, Year: 2016
Learning the Preferences of Bounded Agents
O Evans, A Stuhlmüller, ND Goodman
Advances in Neural Information Processing Systems (Bounded Optimality Workshop), 2015
Cited by: 37, Year: 2015
Taken out of context: On measuring situational awareness in LLMs
L Berglund, AC Stickland, M Balesni, M Kaufmann, M Tong, T Korbak, ...
arXiv preprint arXiv:2309.00667, 2023
Cited by: 35*, Year: 2023
How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions
L Pacchiardi, AJ Chan, S Mindermann, I Moscovitz, AY Pan, Y Gal, ...
arXiv preprint arXiv:2309.15840, 2023
Cited by: 31, Year: 2023
Modeling Agents with Probabilistic Programs
O Evans, A Stuhlmüller, J Salvatier, D Filan
agentmodels.org, 2017
Cited by: 28*, Year: 2017
Forecasting future world events with neural networks
A Zou, T Xiao, R Jia, J Kwon, M Mazeika, R Li, D Song, J Steinhardt, ...
Advances in Neural Information Processing Systems 35, 27293-27305, 2022
Cited by: 22, Year: 2022