Keeping AI Beneficial and Safe for Humanity

--

Fireside chat with Rosie Campbell, The Center for Human-Compatible AI

Just as a plane flying a single degree off course can end up hundreds of miles from its target and potentially crash, Artificial Intelligence (AI) developed without dedicated effort and oversight could take us somewhere we’d prefer to avoid.

As we develop increasingly capable and self-sustaining AI systems, how can humans ensure that these systems remain beneficial and safe for humanity?

To discuss the significant impact of developments in AI on the future of humanity, we invited Rosie Campbell, Assistant Director of the Center for Human-Compatible AI (CHAI) at UC Berkeley, to our AI & Emerging Tech Meetup to share an insider’s perspective on how CHAI is approaching this problem using machine learning, decision theory, and human-robot interaction.

1. Let’s start with how you got into this field and what brought you to the Bay Area.

Before moving to Berkeley, I worked as a Research Engineer at the BBC’s Research and Development department in the UK. Not many people realize the BBC has an R&D department but it’s actually responsible for a lot of the advances in broadcast technology over the last century. As a research engineer my role was to build prototypes and test ideas for applying novel technology to broadcasting and the media.

My most recent project involved exploring how AI might help automate the coverage of live events. Alongside this, I was studying for a doctorate in Visual Computing and Machine Learning. As I learnt more about AI, I started to have some concerns about the ethics and safety of such potentially powerful systems. I’ve always been interested in the effects of technology on society and founded a Futurist community group to explore these issues. My undergraduate degree was in Physics and Philosophy and my Master’s was in Computer Science, so AI safety and ethics seemed to be the intersection of all my interests.

I eventually decided I’d like to work more directly on AI safety. I had a coaching session with 80,000 Hours, and they suggested a role at the Center for Human-Compatible AI (CHAI) at UC Berkeley. I was pretty unsure at first because it was on the other side of the world, and it was more of an operations/managerial role than a technical one. However, I was excited about their mission, so I decided to apply. After coming over for an interview I fell in love with Berkeley, and so decided to leave my PhD and BBC job and join CHAI. After waiting 7 months for my visa, I finally came over in January this year!


2. Tell us more about CHAI and your role there.

CHAI is a multi-institution research team based at UC Berkeley with a number of branches at other US schools. It was founded by Prof. Stuart Russell, who has been a pioneer in AI for decades - his book Artificial Intelligence: A Modern Approach is the standard textbook for many AI courses. The Executive Director is Dr. Mark Nitzberg, who is also the Head of Strategic Outreach at the Berkeley AI Research Lab, and has built numerous companies in the field of AI and Computer Vision.

CHAI’s focus is technical AI safety, and most of our researchers are from Computer Science and Engineering, but our work has a lot of overlap with social science as well. We are a mix of faculty, postdocs, graduate students, and a few members of staff.

I’m the Assistant Director, which in startup terms is something akin to a COO. This means that I handle most things aside from actually doing the research. This includes helping define our long-term and short-term strategy, fundraising, recruitment, event organizing, general operations, project/program management, outreach, comms, and more.

3. CHAI’s mission is to ‘reorient the field of AI towards beneficial systems’. What does this mean?

One analogy that Stuart Russell uses that I find helpful is bridges. When we ask a civil engineer to build a bridge, we don’t have to specify ‘make sure it’s safe’ or ‘make sure it doesn’t fall down’. These concepts are built-in when we talk about bridges. Similarly, CHAI would like to get the field of AI to the point where if we ask a software engineer to build an AI system, we don’t have to specify things like value alignment, ethics, and human-compatibility — they should be built right into the definition of AI. If AI is not beneficial to humans, it’s not actually achieving its purpose. Yet we currently have no guarantees that the systems that are in development at the moment are going to be beneficial, and some good reason to believe they won’t be by default — just as a bridge built without the right engineering expertise likely wouldn’t be safe.

4. Some people are quite skeptical about superintelligence and the need for AI safety research. What would you say about this?

Firstly, I think ‘superintelligence’ is not an ideal term here because it’s usually used to mean ‘smarter than human’ AI. While such a system would certainly raise safety issues:

“I’m not sure we need to have ‘smarter than human’ AI for a system to be dangerous. Any system that is sufficiently competent could be dangerous, even if it doesn’t resemble something that we would recognise as human-like. So even if you believe human-like AI is many years away, you might still think there is a reason to be concerned with AI safety.”

Having said that, there are a lot of reasonable objections to working on AI safety.

  1. Andrew Ng said it’s like “worrying about overpopulation on Mars”, i.e., it’s way too early for AI safety to be a concern. While I understand the sentiment, I imagine people in the industrial revolution might have said the same thing if you had warned them about climate change, yet we would probably be in a much better place if people had started working on solutions to climate change much earlier. (In fact, scientists were already warning about the CO2-induced greenhouse effect in the late 19th century.) There’s also a lot of uncertainty about exactly when we will achieve powerful AI, so assuming that it’s definitely far off could be risky.
  2. Some people ask: How can we do useful AI safety work when we don’t know if AGI will look anything like current systems? An analogy that is often used is: would people have been able to make meaningful safety suggestions to the future of transport when the main mode was horses and carts, not cars? Some arguments made recently by DeepMind researchers suggest that while they wouldn’t have got everything right, it seems plausible that they might have been able to invent certain features like safety belts, pedestrian-free roads, an agreement about which side of the road to drive on, and some sort of turn-taking signal system at busy intersections.
  3. Some people argue that focusing on AI safety distracts from current problems in AI like algorithmic bias or fairness and accountability. Again, I definitely understand this sentiment but I see these issues as part of the same spectrum of ‘human-compatible, beneficial AI’ that does what we actually want it to do. I’m optimistic that progress in one area may provide insights into the other. So I don’t see this as an ‘either/or’ and think it makes sense for people to work on different areas of the problem.
  4. I think a lot of AI researchers in particular are worried about ‘over-hyping’ the power of AI, which leads to people being disillusioned at the progress, which leads to funding drying up, which leads to another ‘AI winter’ like we’ve seen in previous decades. People claiming that safety is important might make it sound like AI is much more advanced than it is. To me, the solution is not to stop talking about safety but to just ensure that we also accurately convey the full picture, including the current state of AI capabilities. This can be tricky because the media likes to sensationalize AI.

In general, I think there’s actually less of a disagreement than is often portrayed. To paraphrase Scott Alexander:

“The ‘skeptic’ position seems to be that, although we should probably get some people working on the problem, we shouldn’t panic. The ‘believers’, meanwhile, insist that although we shouldn’t panic, we should probably get some people working on the problem.”

5. What is the main problem that CHAI is trying to solve?

The main one, which people may have heard of, is the concept of AI alignment. This is the idea that AI will by default do what we say, not what we mean. We know from stories about genies, wishes, and King Midas that this is not always the same thing!

If we ask a human to try to cure cancer, we can be pretty certain they have enough context and background assumptions that they won’t do it by eliminating all humans or by inducing tumors in all humans so that more medical trials can be run in parallel. An AI system by default won’t have the same context and assumptions, will take our instructions literally, and could take an approach that technically achieves the goal but in a way that we very much don’t endorse.

This can be seen as a problem of ‘misspecified objective functions’: it’s hard to reliably specify what we actually want. We can think of less extreme examples as well, such as a cleaning robot that throws out your favorite jewelry just because it happened to fall on the floor.
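To make the cleaning-robot example a little more concrete, here is a minimal sketch (my own toy illustration, not CHAI code; the item values and reward functions are invented) of how a proxy objective that only counts items removed from the floor can diverge badly from what the owner actually cares about:

```python
# A toy illustration (not CHAI code) of a misspecified objective function.
# The intended goal is "tidy the room without destroying valuables", but the
# reward we actually wrote down only counts items removed from the floor.

from dataclasses import dataclass

@dataclass
class Item:
    name: str
    value_to_owner: float  # what the human actually cares about (hidden from the robot)

floor = [Item("candy wrapper", 0.0), Item("old receipt", 0.0),
         Item("diamond ring", 5000.0)]

def misspecified_reward(removed):
    # Proxy objective: +1 per item removed from the floor. No notion of value.
    return len(removed)

def true_human_utility(removed):
    # What we actually wanted: a tidy floor, minus the value of anything we lose.
    return len(removed) - sum(item.value_to_owner for item in removed)

# The robot "optimizes" the proxy: removing everything maximizes its reward.
plan = list(floor)  # throw it all away
print("proxy reward:      ", misspecified_reward(plan))   # 3
print("true human utility:", true_human_utility(plan))    # -4997.0
```

The robot isn’t malicious; it is simply maximizing exactly the number we told it to maximize.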

DeepMind researcher Vika Krakovna has compiled a list of examples of AI behaving in unexpected ways due to misspecified objective functions. It’s quite scary how even in simple scenarios, these systems exploit loopholes or come up with creative solutions to maximize their reward that we would never have even thought of. A couple of highlights include:

  • An artificial-life simulated evolution system with the goal of creating very fast movement: rather than evolving legs and running, or wheels, it evolved to be very very tall and then fell over.
  • A circuit designed by a genetic algorithm that included seemingly pointless disconnected wires, but which actually turned out to be exploiting quirks of the hardware and the temperature of the environment to work.

This is one of the reasons I’m worried about powerful AI - even if we think we’ve anticipated all the ways it could go wrong, chances are we haven’t!

“It’s important to note that these systems do not need to be malicious or in any way anthropomorphic to be dangerous; they simply need to be competent at doing what we tell them. It’s just that what we tell them is often not what we really want.”

Although this makes up the majority of our research, there are also other problems we consider important for AI safety. One example is coordination: What happens when you have multiple agents and humans in an environment together? How can we incentivize them to cooperate and coordinate? We have some work on ‘open source game theory’ that attempts to make progress on this.
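As a flavor of what ‘open source game theory’ is getting at, here is a toy sketch (purely illustrative, not CHAI’s research code; the agent names are invented) in which two agents play a one-shot Prisoner’s Dilemma but can read each other’s source code before acting. An agent that cooperates only with exact copies of itself can sustain cooperation without being exploitable by defectors:

```python
# A toy flavor of "open source game theory" (illustrative only): agents play a
# one-shot Prisoner's Dilemma, but each agent can inspect the other's source
# code before choosing an action.

import inspect

def clique_bot(opponent_source: str) -> str:
    # Cooperate only with agents whose source code is identical to mine.
    return "C" if opponent_source == inspect.getsource(clique_bot) else "D"

def defect_bot(opponent_source: str) -> str:
    return "D"

PAYOFFS = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
           ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def play(agent_a, agent_b):
    a = agent_a(inspect.getsource(agent_b))
    b = agent_b(inspect.getsource(agent_a))
    return PAYOFFS[(a, b)]

print(play(clique_bot, clique_bot))  # (3, 3): mutual cooperation is stable
print(play(clique_bot, defect_bot))  # (1, 1): defectors cannot exploit it
```

Real work in this area is much more subtle (for example, handling agents that are behaviorally equivalent but not textually identical), but the sketch shows how access to source code changes the strategic picture.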

6. Can you give us a real-world example of the ‘misspecified objective function’ problem that affects us today?

A very topical one is the current backlash against social media. Social media companies developed systems designed to optimize for something like ‘engagement’, which they measured with proxy metrics like ‘clicks’ or ‘time spent on site’.

These systems apparently discovered that more outrageous content generated more engagement, so people were fed more and more extreme content. This polarized their views and made them more easily outraged, which in turn made them more likely to click on yet more content.

Not only was the content optimized in a way we probably don’t endorse, but the system actually learnt how to manipulate us to make us more likely to click, and therefore make it easier for it to achieve its own objective function. I doubt that most people would approve of a robot manipulating humans to make its own task easier! But this is what happens when a system ruthlessly maximizes a reward function that is too simplistic and doesn’t reflect our actual values and preferences.
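Here is a stylized simulation of that dynamic (the numbers and the user model are invented for illustration): a recommender greedily maximizes predicted clicks, and because its choices also make the user more click-prone, the proxy metric climbs while the value actually delivered to the user stays low:

```python
# A stylized simulation (illustrative only; the behavior model and numbers are
# invented) of optimizing a proxy metric. "Engagement" is measured by click
# probability; more outrageous content gets more clicks AND makes the user
# more click-prone, so a greedy recommender drifts toward extreme content.

# Candidate items: (outrage_level in [0, 1], value_to_user in [0, 1])
catalog = [(o / 10, 1.0 - o / 10) for o in range(11)]

def click_prob(outrage_level, susceptibility):
    # Assumed user model: outrage drives clicks, amplified by susceptibility.
    return min(1.0, 0.2 + 0.6 * outrage_level * susceptibility)

susceptibility = 1.0
total_clicks = 0.0
total_value = 0.0

for step in range(20):
    # Greedy proxy optimization: pick the item with the highest expected clicks.
    outrage, value = max(catalog, key=lambda item: click_prob(item[0], susceptibility))
    total_clicks += click_prob(outrage, susceptibility)
    total_value += value
    # Side effect nobody asked for: the system's choices change the user,
    # making future "engagement" easier to obtain.
    susceptibility = min(2.0, susceptibility + 0.05 * outrage)

print(f"proxy metric (expected clicks): {total_clicks:.1f}")
print(f"value actually delivered      : {total_value:.1f}")
```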

It’s examples like this that, to me, help illustrate how near-term concerns around AI ethics, fairness, transparency, etc. are actually on the same spectrum as longer-term concerns about aligned AI, and why I believe that making progress on one end may help the other.

7. How does CHAI plan to solve the problem of AI (mis)alignment?

CHAI has the advantage of being one of the few academic institutions focused on this problem. This means we are in a unique position to leverage the academic community and process. We plan to do this in three main ways:

  1. Producing and disseminating high quality technical research that makes progress on the problem. I’ve discussed this further below in section 9.
  2. Building an academic community and pipeline of future researchers, by holding events, running courses, and hosting interns and visiting researchers.
  3. Contributing thought-leadership to the public conversation on these issues, and helping to inform policy and strategy decisions. We do this by giving talks, writing articles and advising influential people and organizations.

8. Can you give a concrete example of how CHAI has made progress on beneficial AI?

Probably the best-known research finding from CHAI is ‘Cooperative Inverse Reinforcement Learning’ (CIRL). This gave us a number of useful insights, one of which is how we might achieve “corrigibility” in an AI system, i.e., the ability to correct it or turn it off if it starts doing something we don’t like. This was explored in the paper ‘The Off-Switch Game’.

This isn’t trivial — you might think you can just pull the plug on any agent that starts behaving in an undesirable way. However, if the agent is competent, it will have anticipated this and potentially taken steps to disable its off-switch or otherwise prevent you from turning it off. Why might it do this? If it has been programmed to receive maximum reward by completing a certain task (even a simple one like ‘fetch the coffee’), it knows that it won’t be able to complete the task if it gets turned off. So ‘survival’ is an instrumental goal for agents that want to achieve some other final goal.

In CIRL, both the human and the agent maximize a shared reward function, but only the human knows what that reward function actually is. The agent has a probability distribution over the possible reward functions, so it acts under some uncertainty. Because it is maximizing the reward function it shares with the human, it can gain information about the true reward by observing the human. Therefore, if it sees the human is about to turn it off, it treats this as useful information about the true reward function: being switched off is evidence that it must be doing something wrong, so allowing the shutdown is in its own interest too.
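Here is a minimal numerical sketch of that incentive (the structure mirrors the Off-Switch Game setup as I understand it, but the numbers and code are my own illustration, not from the paper). Because the robot is uncertain about the true reward and shares it with the human, letting a rational human decide whether to press the off switch has higher expected value than acting unilaterally or shutting down pre-emptively:

```python
# A minimal numerical sketch of the off-switch incentive (numbers invented).
# The robot is unsure whether its proposed action is actually good for the
# human. It can: act now, switch itself off, or defer (propose the action and
# let the human decide whether to press the off switch).

# Robot's belief over the true utility U(a) of its proposed action to the human.
belief = [(-10.0, 0.3), (1.0, 0.4), (4.0, 0.3)]  # (utility, probability)

expected_u = sum(u * p for u, p in belief)

value_act_now    = expected_u  # act unilaterally, ignoring the human
value_switch_off = 0.0         # do nothing
# If the human is rational, they allow the action exactly when U(a) > 0,
# and press the off switch otherwise (which yields 0).
value_defer = sum(max(u, 0.0) * p for u, p in belief)

print(f"E[U] if robot acts unilaterally  : {value_act_now:+.2f}")
print(f"E[U] if robot switches itself off: {value_switch_off:+.2f}")
print(f"E[U] if robot defers to the human: {value_defer:+.2f}")
# Deferring is (weakly) best: because the robot shares the human's reward and
# is uncertain about it, keeping the off switch usable has higher expected value.
```

Notably, the incentive to defer comes from the robot’s uncertainty: an agent that is certain it already knows the reward has no reason to keep the off switch usable.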

9. For a long time, tech has been the realm of “tech bros”. How does the technical part of AI intersect with non-technical and what contributions can non-technical folks make to this field?

Perhaps the most obvious one is philosophy. In order to ensure an AI system is aligned with and beneficial to humans, we need to understand how to deal with the fact that humans often have incompatible or competing values and preferences, the role of altruism, the consequences of weakness of will, various bugs in simplistic utilitarian approaches, etc. Thankfully, these are questions philosophers have been studying for millennia! So we find it very valuable to engage with moral philosophers and applied ethicists.

“An AI system cannot invert actual human behavior to identify preferences unless it has some sort of model of how preferences produce behavior in humans. We therefore expect to gain many useful insights from studying fields like cognitive science and psychology which engage with this question.”
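As a concrete (and deliberately trivial) sketch of what such a behavior model buys you, here is a tiny Bayesian preference inference assuming a Boltzmann-rational choice model, an assumption commonly used in the inverse reinforcement learning literature; the options, values, and observations are invented:

```python
# A minimal sketch of why a behavior model matters when inferring preferences
# from behavior. Illustrative assumption: the human is "noisily rational" and
# picks options with probability proportional to exp(value / temperature).

import math

options = ["salad", "cake"]
# Two candidate preference hypotheses the AI is considering.
hypotheses = {
    "prefers healthy": {"salad": 2.0, "cake": 0.0},
    "prefers tasty":   {"salad": 0.0, "cake": 2.0},
}
prior = {"prefers healthy": 0.5, "prefers tasty": 0.5}

def choice_prob(choice, values, temperature=1.0):
    # Boltzmann-rational behavior model: better options are chosen more often,
    # but mistakes and weakness of will are possible.
    weights = {o: math.exp(values[o] / temperature) for o in options}
    return weights[choice] / sum(weights.values())

observed_choices = ["cake", "salad", "salad"]  # what the AI saw the human do

posterior = dict(prior)
for choice in observed_choices:
    for h, values in hypotheses.items():
        posterior[h] *= choice_prob(choice, values)
total = sum(posterior.values())
posterior = {h: p / total for h, p in posterior.items()}

print(posterior)  # leans toward "prefers healthy", without being certain
```

The point is that the inference only works because we assumed something about how preferences turn into choices; get that model badly wrong and the inferred preferences will be wrong too.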

Another question we have is how we can design the correct incentives for AI systems, and also how different AI systems will interact with each other as well as humans. These are questions that have been studied with respect to humans in fields like economics and law, via topics like game theory, decision theory, and mechanism design.

If we can anticipate how AI will be developed and when, we have a much better chance of designing safety systems that work. It’s likely that studying evolution and neuroscience to understand how human intelligence evolved could help us with this kind of forecasting.

Finally, we suspect that even if we solve the technical problems of AI safety, there will still be the social challenges of how to get our recommendations adopted. For this, fields like politics may come in very handy.

10. What do you consider your biggest achievements so far?

I think in general it’s that we’ve shown through our initial research that there are ways to make progress on the problem of AI safety and alignment. CHAI’s creation was a bit of an experiment - when it was founded in 2016 it wasn’t necessarily clear that this would be possible, for some of the reasons we’ve already discussed. And although our work is very preliminary, we are now more confident that this is a promising research approach and we’re excited about encouraging others to take an interest in the field.

There are two main hurdles:

  1. Media sensationalism. The media often portrays concerns about AI as worrying about Skynet or the Terminator or some other malevolent system. This is a strawman - we are not worried about conscious AI turning evil or wanting revenge on humans or anything!
  2. Disagreements/controversy in the AI community. Many influential people don’t think safety is worth worrying about. This can make it difficult to get the problem taken seriously. However, we’re very keen to engage with these arguments and even have some skeptics as faculty affiliates. We are committed to scientific and technical rigor and updating our thinking based on evidence.

11. What’s next for CHAI and how can folks support your mission?

Mostly more of the same, continuing to produce research, continuing to refine our strategy, and trying to grow our team and the field at a sustainable pace. We are going to be putting some more effort into comms in the near future so keep an eye out for that. Our next big thing will be our Annual Workshop in May, which is a gathering of technical researchers and social scientists across academia, industry and nonprofits who are interested in the problem of AI alignment. We are also in talks with a couple of other universities who are interested in setting up CHAI branches.

If people would like to support CHAI’s mission, there are a number of things you can do. If you’re talented in research or ML/AI, you could apply for one of our open positions. In particular, we are taking applications for our 2019 summer internship program until December 21. This is a great choice for people who want to test the waters or who don’t come from a directly relevant background, but still have strong technical skills.

Alternatively, you can support us financially by donating to CHAI.

12. How can people learn more about AI safety and CHAI?

If you’re new to the concept of AI safety and alignment, first watch Stuart’s TED talk. Then, I highly recommend Max Tegmark’s book ‘Life 3.0’ and Nick Bostrom’s ‘Superintelligence’. I’d suggest Life 3.0 first as it gives a nice overview of the issues, whereas Superintelligence is more of a deep dive.

One of our students, Rohin Shah, has started a newsletter on AI alignment which is a great way to keep up with the field.

If you’re interested in CHAI in particular, you can go to our website humancompatible.ai and follow us on Twitter at @CHAI_Berkeley and @RosieCampbell.

Mia Dand is the CEO of Lighthouse3.com, a Strategic Research & Advisory firm based in the San Francisco Bay Area. Mia is an experienced marketing leader who helps F5000 companies innovate at scale with AI and other emerging technologies. For more insightful discussions and the latest AI ethics research, you can follow her on Twitter @MiaD.
