[Overseas DS] Can We Trust AI If We Don't Know How It Works?

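[Overseas DS] presents the views of industry experts as reported by leading data science publications abroad. Our institute, the Data Science Management Research Institute (GIAI R&D Korea), runs this series under a content partnership that requires the original English text to be published alongside.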

For people to trust something, it has to be predictable and it has to follow accepted ethical norms. AI still has many problems to solve before it satisfies both conditions.

“Flip-Flopping” Models Are Hard to Trust

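Even models built by the same company do not always perform consistently. Take OpenAI: there is no need to compare GPT-3 with GPT-4, because the latest GPT-4 alone, set against its own earlier version, shows lower accuracy on math problems. Industry experts explain that because large language models are built with reinforcement learning from human feedback (RLHF), fine-tuning can improve or degrade performance in areas nobody expected. No reliable fix exists yet. Adding more data or more parameters does not solve the problem, and unless the training algorithm changes it is likely to remain a chronic weakness of generative AI.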

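The lack of consistent decision criteria is more dangerous in fields where mistakes are costly, such as medicine, finance, the military, and autonomous driving. Demand for explainable AI (XAI) is high, but improving interpretability while keeping today's complex model architectures is not easy. Cutting the number of parameters to improve interpretability also means giving up some accuracy. AI has not yet been tamed enough to be used for important decisions, yet it has already penetrated deep into our lives, so appropriate regulation should be established industry by industry. This is why it is too early to trust AI.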

Decision-Making Must Also Account for How Others Perceive It

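Trust depends not only on predictability but also on normative or ethical motivations. Human values are shaped by common experience, and moral reasoning is formed dynamically by ethical standards and by others' perceptions. Unlike humans, AI does not adjust its behavior according to how others perceive it or whether it adheres to ethical norms. This is because AI's “consciousness” is set statically by its training data and is not influenced by subtle social interactions. Researchers are working to build ethics into AI, but this is proving difficult. Most human drivers would try to avoid hitting a child, but the AI in a self-driving car also has an ethical duty to protect the driver, which makes it hard to set priorities.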

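The U.S. Department of Defense requires either that only a human act on an AI system's recommendation or that, when an AI system makes decisions on its own, a human supervise it. People in the field, however, predict that this guidance will lose its force over the long term. As companies and governments adopt more AI systems, decision-making structures in which several systems are nested become unavoidable. Room for human intervention shrinks accordingly, so the approach, they argue, is not an effective safeguard.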

Wise Questions That Span the Present and the Future Are the Shortcut to Trust

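Will the day come when humans can trust AI? It is hard to give a precise answer, but it is clear that the question should not be considered only at today's level of technology. As noted above, the question has to be approached by weighing together the direction of algorithmic progress, the widening range of applications, and the possibility of another singularity. In other words, the right answer can be found only after recognizing that the context of the question is dynamic.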

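Whether a car or a person is in control, only a few seconds pass between spotting the child and the collision. That is too short for a human to intervene, and the AI's choice is unstable. Today's self-driving cars can control only their own vehicle, so a tragic accident is hard to avoid. But once self-driving cars can coordinate with each other, or with objects around the child, the chances of survival for both driver and child can be raised at least a little. Of course, we must consider not only optimistic futures but also self-driving systems that are vulnerable to hacking. Still, if the question stays in the present tense, we will have no choice but to live in a present where the barrier to trust only grows higher.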

How Can We Trust AI If We Don’t Know How It Works?

Trust is built on social norms and basic predictability. AI is typically not designed with either.

There are alien minds among us. Not the little green men of science fiction, but the alien minds that power the facial recognition in your smartphone, determine your creditworthiness and write poetry and computer code. These alien minds are artificial intelligence systems, the ghost in the machine that you encounter daily.

But AI systems have a significant limitation: Many of their inner workings are impenetrable, making them fundamentally unexplainable and unpredictable. Furthermore, constructing AI systems that behave in ways that people expect is a significant challenge.

If you fundamentally don’t understand something as unpredictable as AI, how can you trust it?

WHY AI IS UNPREDICTABLE
Trust is grounded in predictability. It depends on your ability to anticipate the behavior of others. If you trust someone and they don’t do what you expect, then your perception of their trustworthiness diminishes.

Many AI systems are built on deep learning neural networks, which in some ways emulate the human brain. These networks contain interconnected “neurons” with variables or “parameters” that affect the strength of connections between the neurons. As a naïve network is presented with training data, it “learns” how to classify the data by adjusting these parameters. In this way, the AI system learns to classify data it hasn’t seen before. It doesn’t memorize what each data point is, but instead predicts what a data point might be.
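The learning process described above can be illustrated with a minimal sketch. It is not how any production system is built: it assumes a single artificial neuron (logistic regression) trained by gradient descent, and the names `train_neuron` and `predict`, the toy data, and the learning rate are all hypothetical choices made for illustration.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Return the neuron's estimate that input x belongs to class 1."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_neuron(samples, labels, epochs=2000, lr=0.5, seed=0):
    """Adjust the parameters (weights and bias) to reduce prediction error."""
    rng = random.Random(seed)
    # A "naive" neuron starts with small random parameters.
    weights = [rng.uniform(-0.1, 0.1) for _ in samples[0]]
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = predict(weights, bias, x) - y
            # Nudge each parameter against the direction of the error.
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


# Hypothetical toy data: the label is 1 only when both inputs are high.
train_x = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
train_y = [0, 0, 0, 1]

weights, bias = train_neuron(train_x, train_y)

# Inputs the neuron never saw during training.
high_estimate = predict(weights, bias, (0.95, 0.9))
low_estimate = predict(weights, bias, (0.1, 0.2))
print(f"estimate for (0.95, 0.9): {high_estimate:.3f}")
print(f"estimate for (0.1, 0.2): {low_estimate:.3f}")
```

Even here the trained parameters are just three numbers with no attached rationale. Scaling the same idea to very large parameter counts is what makes the decisions so hard to inspect.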

Many of the most powerful AI systems contain trillions of parameters. Because of this, the reasons AI systems make the decisions that they do are often opaque. This is the AI explainability problem – the impenetrable black box of AI decision-making.

Consider a variation of the “Trolley Problem.” Imagine that you are a passenger in a self-driving vehicle, controlled by an AI. A small child runs into the road, and the AI must now decide: run over the child or swerve and crash, potentially injuring its passengers. This choice would be difficult for a human to make, but a human has the benefit of being able to explain their decision. Their rationalization – shaped by ethical norms, the perceptions of others and expected behavior – supports trust.

In contrast, an AI can’t rationalize its decision-making. You can’t look under the hood of the self-driving vehicle at its trillions of parameters to explain why it made the decision that it did. AI fails the predictive requirement for trust.

EXPECTATIONS
Trust relies not only on predictability, but also on normative or ethical motivations. You typically expect people to act not only as you assume they will, but also as they should. Human values are influenced by common experience, and moral reasoning is a dynamic process, shaped by ethical standards and others’ perceptions.

Unlike humans, AI doesn’t adjust its behavior based on how it is perceived by others or by adhering to ethical norms. AI’s internal representation of the world is largely static, set by its training data. Its decision-making process is grounded in an unchanging model of the world, unfazed by the dynamic, nuanced social interactions constantly influencing human behavior. Researchers are working on programming AI to include ethics, but that’s proving challenging.

The self-driving car scenario illustrates this issue. How can you ensure that the car’s AI makes decisions that align with human expectations? For example, the car could decide that hitting the child is the optimal course of action, something most human drivers would instinctively avoid. This issue is the AI alignment problem, and it’s another source of uncertainty that erects barriers to trust.

CRITICAL SYSTEMS AND TRUSTING AI
One way to reduce uncertainty and boost trust is to ensure people are in on the decisions AI systems make. This is the approach taken by the U.S. Department of Defense, which requires that for all AI decision-making, a human must be either in the loop or on the loop. In the loop means the AI system makes a recommendation but a human is required to initiate an action. On the loop means that while an AI system can initiate an action on its own, a human monitor can interrupt or alter it.
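The difference between the two modes can be made concrete with a short sketch. It is only an illustration of the distinction drawn above, not a description of any Department of Defense system: the `Recommendation` type, the function names, and the callback signatures are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Recommendation:
    """Hypothetical output of an AI system: a proposed action."""
    action: str
    confidence: float


def in_the_loop(
    rec: Recommendation,
    human_approves: Callable[[Recommendation], bool],
) -> Optional[str]:
    """The AI only recommends; nothing happens unless a human initiates it."""
    return rec.action if human_approves(rec) else None


def on_the_loop(
    rec: Recommendation,
    human_override: Callable[[Recommendation], Optional[str]],
) -> str:
    """The AI initiates the action; a human monitor may interrupt or alter it.

    The monitor returns None to let the action stand, or a replacement action.
    """
    replacement = human_override(rec)
    return rec.action if replacement is None else replacement


rec = Recommendation(action="reroute_power", confidence=0.62)

# In the loop: a cautious human declines, so no action is taken.
blocked = in_the_loop(rec, human_approves=lambda r: r.confidence > 0.9)

# On the loop: the action proceeds unless the monitor steps in.
unchanged = on_the_loop(rec, human_override=lambda r: None)
altered = on_the_loop(rec, human_override=lambda r: "hold_and_alert")

print(blocked, unchanged, altered)
```

In the on-the-loop case the default outcome is that the AI's action goes ahead, so safety depends on the monitor reacting in time.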

While keeping humans involved is a great first step, I am not convinced that this will be sustainable long term. As companies and governments continue to adopt AI, the future will likely include nested AI systems, where rapid decision-making limits the opportunities for people to intervene. It is important to resolve the explainability and alignment issues before the critical point is reached where human intervention becomes impossible. At that point, there will be no option other than to trust AI.

Avoiding that threshold is especially important because AI is increasingly being integrated into critical systems, which include things such as electric grids, the internet and military systems. In critical systems, trust is paramount, and undesirable behavior could have deadly consequences. As AI integration becomes more complex, it becomes even more important to resolve issues that limit trustworthiness.

CAN PEOPLE EVER TRUST AI?
AI is alien – an intelligent system into which people have little insight. Humans are largely predictable to other humans because we share the same human experience, but this doesn’t extend to artificial intelligence, even though humans created it.

If trustworthiness has inherently predictable and normative elements, AI fundamentally lacks the qualities that would make it worthy of trust. More research in this area will hopefully shed light on this issue, ensuring that AI systems of the future are worthy of our trust.

이시호, Senior Researcher

The world is many-sided. I believe that building up expertise means being able to see all of those sides, and I am building mine day by day. I will share what I have built up with you.