From AI Hallucinations to Machine Reasoning

The Story of King Sejong Throwing a MacBook

King in royal attire throwing a MacBook in anger

Once upon a time, someone asked a very intelligent AI a mischievous question: “Tell me about the incident where King Sejong threw a MacBook Pro.” Without a moment’s hesitation, the AI spun a tale: “According to the Annals of the Joseon Dynasty, King Sejong, while writing the first draft of Hunminjeongeum, threw a MacBook Pro at an official in anger.”

Of course, it was a complete fabrication. This phenomenon, where AI confidently tells lies that sound plausible, is referred to as hallucination. This issue has been the biggest obstacle to AI becoming a reliable partner in our society.

The first hero that emerged to tackle this problem was Retrieval-Augmented Generation (RAG). It was like telling the AI, “Don’t imagine on your own; refer to this encyclopedia for your answers.” Thanks to RAG, companies could finally trust and use AI.

But the story doesn’t end here. RAG was not a perfect solution. This article captures the journey of AI moving beyond hallucinations and into the realm of true machine reasoning (context engineering), where it can think for itself.

RAG, An Indispensable Crutch

The Key to the Era of Enterprise AI: RAG

When large language models (LLMs) first appeared, companies hesitated despite the seemingly infinite possibilities they offered. The reason was hallucination: plausible-sounding falsehoods made AI risky to deploy in critical tasks. An incorrect number in a financial report or a fabricated precedent in a legal document could lead to serious consequences.

At that moment, RAG emerged as a savior. The principle of RAG is simple:

Retrieval: When a user asks a question, it first retrieves relevant information from the company’s internal documents or trusted databases.
Generation: Then, based on the retrieved information, the AI generates a response.

User question -> External knowledge base search -> Search results + question -> LLM response generation
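The flow above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular product implements RAG: the three-sentence `KNOWLEDGE_BASE`, the bag-of-words `cosine` scoring, and the prompt wording are all assumptions made for the example, and the final call to an LLM is left out because it depends on the model provider.

```python
import math
import re
from collections import Counter

# Hypothetical mini knowledge base; a real system would index internal documents.
KNOWLEDGE_BASE = [
    "King Sejong promulgated Hunminjeongeum, the Korean alphabet, in 1446.",
    "The MacBook Pro is a laptop computer first released by Apple in 2006.",
    "The Annals of the Joseon Dynasty record the reigns of Joseon kings.",
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def retrieve(question, docs, k=2):
    """Retrieval step: return the k documents most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]

def build_prompt(question, passages):
    """Generation step input: search results + question, ready to send to an LLM."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the sources below. "
        "If they do not contain the answer, say so.\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )

question = "When did King Sejong create Hunminjeongeum?"
passages = retrieve(question, KNOWLEDGE_BASE)
prompt = build_prompt(question, passages)
print(prompt)
```

The instruction to answer only from the listed sources is what discourages the model from inventing a MacBook incident. Real systems typically replace the word-count similarity with embedding search.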

This approach was like magic for companies.

Reduced Hallucinations: By referencing verified materials, the likelihood of AI lying significantly decreased.
Up-to-date Information: New information could be reflected simply by updating the knowledge base, without expensive retraining of the model.
Cost Efficiency: By grounding a general-purpose model in internal documents instead of training a new one, companies could create specialized expert AI at a lower cost.
Reliability: By showing the sources that underlie the answers, people could verify and trust the AI’s responses.

Even major companies like Microsoft and Google highlighted RAG as a core feature of their cloud services. RAG thus became a key contributor to transforming AI from a laboratory novelty into a true ‘enterprise solution’ that creates real business value.

The Imperfect First Hero

However, RAG did not completely solve the hallucination problem. Its limitations became apparent, especially in the legal field, where high accuracy is crucial.

A study by a research team at Stanford University tested popular legal AI services on the market, and the results were shocking. Services that advertised “no hallucinations” showed hallucinations in as many as 33% of cases. In the legal field, where the outcome of lawsuits can hinge on accuracy, this is an unacceptable figure.

Why did this happen? It can be summarized by the saying, “Garbage in, garbage out.”

Inaccurate Retrieval: If the search engine misinterprets the intent of the question and retrieves irrelevant materials, the AI can only respond based on that incorrect information.
Fragmented Context: The method of storing documents in fixed-size chunks often leads to missing important context between sentences.
Outdated Knowledge: If the database contains outdated laws or obsolete policies, the AI may cite them without realizing they are no longer valid.
Lack of Reasoning Ability: Most importantly, RAG only ‘feeds’ the AI correct information but does not develop its ability to synthesize multiple pieces of information into complex conclusions.
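The ‘Fragmented Context’ failure mode in the list above is easy to reproduce. The sketch below is a toy illustration: the `policy` text and the 36-character chunk size are made up for the example, and real systems usually chunk by tokens rather than characters.

```python
def chunk(text, size):
    """Split text into consecutive fixed-size character windows."""
    return [text[i:i + size] for i in range(0, len(text), size)]

# Hypothetical policy text: a rule followed by its exception.
policy = "Refunds are allowed within 30 days. This does not apply to sale items."
chunks = chunk(policy, size=36)

for i, c in enumerate(chunks):
    print(i, repr(c))

# No single chunk contains both the rule and its exception.
both = [c for c in chunks if "30 days" in c and "sale items" in c]
print("chunks holding rule and exception together:", len(both))
```

If the retriever returns only the chunk with the rule, the model never sees the exception and can confidently give a wrong answer about sale items, even though the knowledge base itself was correct.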

A Ray of Hope in the Medical Field

Medical professionals discussing AI analysis results

However, the story is not all despair. Unlike the legal field, RAG has achieved remarkable success in highly controlled environments.

In a medical study, RAG technology was used to assess surgical suitability. A small number of well-defined official medical guidelines were used as the AI’s ‘encyclopedia’. The results were astonishing.

Human Specialist Accuracy: 86.6%
Pure AI (GPT-4) Accuracy: 92.9%
RAG + AI Accuracy: 96.4%

The AI combined with RAG was not only more accurate than human doctors; it also did not produce a single hallucination, and it generated its responses 30 times faster.

What is the difference between these two cases? It is the ‘quality of knowledge’. The data handled by legal AI is vast and unrefined, while the medical study used highly controlled and refined knowledge.

Here, we learn an important lesson. The true competitive edge in the AI era lies not in flashy AI models but in ‘knowledge curation’: how well we organize and manage the data fed to the AI.

Evolution Towards Smarter Tools: Advanced RAG

To overcome the limitations of early RAG, people began to develop RAG into smarter and more sophisticated systems. It evolved beyond simple ‘search and generate’ to possess the ability to think and correct itself.

Infusing Relationships into Knowledge: Graph RAG

Traditional RAG treated knowledge as a pile of disconnected text fragments. However, there are important ‘relationships’ hidden between pieces of information, such as “Elon Musk is the CEO of Tesla.”

Nodes like ‘Elon Musk’ and ‘Tesla’ connected by the edge ‘CEO’

The technology that expresses these relationships is called a Knowledge Graph. Advanced RAG utilizes this knowledge graph. When a question is posed, instead of simply retrieving a text fragment, it retrieves the entire network of relationships among relevant people, places, and events and presents it to the AI. This allows the AI to understand much deeper context and perform complex reasoning, akin to showing a detective the entire web of relationships instead of just fragmented evidence.
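A rough sketch of this idea follows. It stores facts as (subject, relation, object) triples and retrieves the neighborhood around an entity instead of a single text fragment. Only the first triple comes from the text above. The other triples, the function names `build_graph` and `neighborhood`, and the hop limit are illustrative assumptions. Production Graph RAG systems also need entity extraction and a graph database, which are omitted here.

```python
from collections import defaultdict

# (subject, relation, object) triples; only the first one appears in the article.
TRIPLES = [
    ("Elon Musk", "CEO", "Tesla"),
    ("Tesla", "manufactures", "Model 3"),
    ("Tesla", "headquartered in", "Austin"),
    ("Elon Musk", "founder of", "SpaceX"),
]

def build_graph(triples):
    """Index every triple under both of its endpoint entities."""
    graph = defaultdict(list)
    for s, r, o in triples:
        graph[s].append((s, r, o))
        graph[o].append((s, r, o))
    return graph

def neighborhood(graph, entity, hops=1):
    """Collect all triples reachable from `entity` within `hops` steps."""
    seen, frontier, facts = {entity}, [entity], []
    for _ in range(hops):
        next_frontier = []
        for node in frontier:
            for triple in graph.get(node, []):
                if triple not in facts:
                    facts.append(triple)
                for other in (triple[0], triple[2]):
                    if other not in seen:
                        seen.add(other)
                        next_frontier.append(other)
        frontier = next_frontier
    return facts

graph = build_graph(TRIPLES)
facts = neighborhood(graph, "Elon Musk", hops=2)

# Serialize the subgraph as context for the LLM prompt.
context = "\n".join(f"{s} --{r}--> {o}" for s, r, o in facts)
print(context)
```

With `hops=1` the model would only learn that Elon Musk leads Tesla and founded SpaceX. With `hops=2` it also receives what Tesla makes and where it is based, which is the ‘web of relationships’ a multi-step question needs.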

Doubting and Correcting Itself: Critical RAG

Smart individuals question and review their thoughts. There have been attempts to teach AI this ability, known as Self-RAG and Corrective RAG (CRAG).

Self-RAG: This AI asks itself questions. “Is a search really necessary for this question?”, “Is the information I found relevant to the question?”, “Is my answer based on the information I retrieved?” By critically reflecting on itself, it enhances the quality of its responses.
Corrective RAG (CRAG): This AI is a more pragmatic problem solver. If the initially retrieved information is unsatisfactory, it does not give up but takes corrective action.
- If it judges the retrieved information to be wrong, it discards it and searches the web for new information.
- If it judges the retrieved information to be ambiguous, it combines what it originally found with the web search results to produce the best answer.
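The branching behavior described above can be sketched as a small control flow. This is a simplification under stated assumptions: the word-overlap `relevance` score stands in for a trained retrieval evaluator, the `upper` and `lower` thresholds are arbitrary, and `fake_web_search` is a stub that only returns placeholder text.

```python
def relevance(question, passage):
    """Toy evaluator: fraction of question words that appear in the passage."""
    q = set(question.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0

def corrective_retrieve(question, passages, web_search, upper=0.6, lower=0.2):
    """Decide whether to keep, replace, or supplement the retrieved passages."""
    best = max((relevance(question, p) for p in passages), default=0.0)
    if best >= upper:
        # Retrieval looks good: use it as is.
        return "correct", list(passages)
    if best <= lower:
        # Retrieval looks wrong: discard it and fall back to web search.
        return "incorrect", web_search(question)
    # Retrieval is ambiguous: combine it with web search results.
    return "ambiguous", list(passages) + web_search(question)

def fake_web_search(question):
    """Stub standing in for a real web search call."""
    return [f"web result for: {question}"]

question = "refund policy for sale items"
print(corrective_retrieve(question, ["refund policy for sale items is final"], fake_web_search))
print(corrective_retrieve(question, ["weather today"], fake_web_search))
print(corrective_retrieve(question, ["refund policy overview"], fake_web_search))
```

The three calls exercise the three branches in order: the first passage covers every question word, the second covers none, and the third covers two of five and so falls between the thresholds.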

Always Keeping Information Up-to-Date: Dynamic Knowledge Base

If the information in the world keeps changing, an outdated AI knowledge base is useless. However, updating the entire massive database every time is highly inefficient.

The technology that addresses this issue is Incremental Learning, which for a knowledge base amounts to incremental updating. Instead of overhauling everything, it selectively updates only the newly added or changed parts. This allows the AI to maintain the most current information.
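One simple way to detect ‘only the newly added or changed parts’ is to store a content hash per document and reprocess a document only when its hash changes. The sketch below assumes an in-memory dictionary as the index and hypothetical document IDs. In a real system the line that stores the text would also recompute embeddings for that document.

```python
import hashlib

def fingerprint(text):
    """Stable content hash used to detect changed documents."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def incremental_update(index, documents):
    """Bring `index` in line with `documents`, touching only what changed.

    index:     dict of doc_id -> {"hash": str, "text": str}, modified in place
    documents: dict of doc_id -> current text
    Returns the lists of re-indexed and removed document IDs.
    """
    changed = []
    for doc_id, text in documents.items():
        digest = fingerprint(text)
        if index.get(doc_id, {}).get("hash") != digest:
            # New or modified document: re-index just this one.
            index[doc_id] = {"hash": digest, "text": text}
            changed.append(doc_id)
    removed = [doc_id for doc_id in index if doc_id not in documents]
    for doc_id in removed:
        # Deleted at the source: drop it so stale content cannot be cited.
        del index[doc_id]
    return changed, removed

index = {}
print(incremental_update(index, {"law-1": "old statute", "policy-7": "v1"}))
print(incremental_update(index, {"law-1": "amended statute", "policy-7": "v1"}))
print(incremental_update(index, {"law-1": "amended statute"}))
```

The second call re-indexes only `law-1`, and the third removes `policy-7`, so an obsolete policy does not linger in the knowledge base.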

The emergence of these advanced RAG technologies shows that RAG is evolving from a passive tool into an active ‘Agent’ capable of strategizing, critiquing information, and correcting its own actions. Now, the core competitiveness in the AI market lies not in having the best AI model but in how smoothly one can orchestrate all these complex components.

The Ultimate Goal: Teaching AI to Think

No matter how good the information provided is, if AI lacks the ability to think for itself, the hallucination problem will not be fully resolved. The ultimate goal of AI development is not just to ‘give’ AI knowledge but to teach it ‘how to think’.

A Self-Learning Reasoner: STaR

Image of a brain structure resembling a chess master contemplating multiple moves

When people solve difficult problems, they do not just blurt out the answer but explain the reasoning with “because…”. The Self-Taught Reasoner (STaR) methodology teaches AI this process.

The learning method of STaR is special:

Rationale Generation: First, the AI is asked to write out its reasoning process for a large number of problems.
Learning from Successful Experiences: Among these, only the ‘successful’ reasoning processes that led to correct answers are selected, and the AI is trained on them.
Learning from Failure: If the AI gets a problem wrong, it is given the correct answer as a hint and asked to work ‘backwards’ to a reasoning process that leads to that answer, similar to having a student write an error correction note.

Through this repeated process, the AI gradually develops the ‘thinking power’ to logically solve even difficult problems.
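The three steps above amount to a data-collection loop followed by fine-tuning. The sketch below shows one pass of that loop with a toy stand-in for the model. `toy_generate` and `toy_rationalize` are hypothetical stubs invented for illustration, and the fine-tuning step itself is not shown.

```python
def star_collect(problems, generate, rationalize):
    """One STaR-style pass: keep only rationales that reach the correct answer.

    problems:    list of (question, gold_answer)
    generate:    question -> (rationale, answer)
    rationalize: (question, gold_answer as hint) -> (rationale, answer)
    """
    dataset = []
    for question, gold in problems:
        rationale, answer = generate(question)
        if answer == gold:
            # Learning from success: keep the reasoning that worked.
            dataset.append((question, rationale, gold, "generated"))
            continue
        # Learning from failure: retry with the correct answer as a hint.
        rationale, answer = rationalize(question, gold)
        if answer == gold:
            dataset.append((question, rationale, gold, "rationalized"))
    return dataset  # the model would be fine-tuned on this, then the loop repeats

def toy_generate(question):
    """Toy 'model' that treats every operator as addition."""
    a, _, b = question.split()
    a, b = int(a), int(b)
    return f"{a} plus {b} is {a + b}", a + b

def toy_rationalize(question, hint):
    """Toy 'model' that can explain multiplication once it sees the answer."""
    a, op, b = question.split()
    a, b = int(a), int(b)
    if op == "*" and a * b == hint:
        return f"{a} groups of {b} make {hint}", hint
    return "no explanation found", None

problems = [("2 + 3", 5), ("4 * 6", 24), ("9 - 4", 5)]
dataset = star_collect(problems, toy_generate, toy_rationalize)
for row in dataset:
    print(row)
```

The addition problem is kept as a generated rationale and the multiplication problem as a rationalized one. The subtraction problem is dropped, because neither attempt reaches the correct answer.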

An Explorer Learning from Failure: SoS

When we learn something, we do not only learn the path to the correct answer. We also explore wrong paths and encounter dead ends, enhancing our problem-solving abilities. However, existing AI has had no opportunity to experience these ‘beneficial mistakes’ as it only learns from model answers.

Stream-of-Search (SoS) focuses on this aspect. SoS teaches the AI not only the correct paths but also the entire process of failed attempts, dead ends, and going back to find alternative methods.

By learning the entire process of trial and error, the AI becomes a much more flexible and powerful problem solver. It does not merely memorize answers but learns the ‘strategy’ to find answers.
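To make ‘the entire process of trial and error’ concrete, the sketch below runs a depth-first search on a toy number puzzle and records every step as text, including dead ends and backtracking. The puzzle (reach a target using ×2 and +3), the depth limit, and the log format are assumptions made for illustration. The resulting string is the kind of trajectory a model could be trained on instead of only the final solution path.

```python
def search_stream(start, target, max_depth=3):
    """Depth-first search that logs every step, including failed attempts."""
    log = []

    def dfs(value, depth):
        log.append(f"at {value}")
        if value == target:
            log.append("goal reached")
            return True
        if depth == max_depth or value > target:
            log.append(f"dead end at {value}, backtrack")
            return False
        for name, nxt in (("*2", value * 2), ("+3", value + 3)):
            log.append(f"try {value}{name}={nxt}")
            if dfs(nxt, depth + 1):
                return True
        log.append(f"options exhausted at {value}, backtrack")
        return False

    solved = dfs(start, 0)
    return solved, "\n".join(log)

solved, stream = search_stream(start=2, target=10)
print(stream)
```

The clean solution is simply 2 → 4 → 7 → 10, but the recorded stream also contains the detours through 16, 11, and 14. Those detours are the part that ordinary answer-only training data throws away.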

The Future of Hybrid AI: Combining Knowledge and Thinking

While advanced RAG provides AI with declarative knowledge about ‘what’ it needs to know, STaR and SoS teach it procedural knowledge about ‘how’ to think.

Future AI will become ‘Agent AI’ that combines these two aspects. When faced with complex problems, this AI will first break down the problem into smaller steps through internal reasoning (SoS), accurately retrieve the necessary external knowledge (RAG) for each step, and then synthesize it through internal monologue (STaR) to decide on the next action.

We are now moving beyond merely creating a vast encyclopedia to creating better ‘thinkers’. Of course, deeper thinking comes with more time and cost, known as the ‘cost of thought’. In the future, the efficiency of thought will be as important as AI performance.

The Path of AI in South Korea: Engine or Tuner?

In this massive technological flow, what path should the South Korean AI industry take?

Let’s Become the World’s Best ‘Tuner’: The Brabus Strategy

Tuning image of a Mercedes and Brabus G-Class

The global AI market is a battleground where giant companies from the US and China create ‘engines’ (foundation models) with massive capital. It is realistically very difficult for us to jump directly into this competition.

So what is our path? It is to become the world’s best ‘tuner’.

The automotive tuning company ‘Brabus’ does not create Mercedes engines directly. Instead, it takes powerful Mercedes engines, pushes their performance to the limit, and completely redesigns everything to create a new luxury product that surpasses the original.

The ‘Brabus strategy’ in AI means taking a powerful general-purpose AI (the engine) from OpenAI or Google and combining it with specialized knowledge and data from industries where we are globally competitive (law, healthcare, manufacturing, finance, etc.) to build the world’s best ‘Vertical AI’.

This strategy is already becoming a reality. South Korean startups are pioneering the global market with this ‘Brabus’ strategy, achieving remarkable results in various fields such as cybersecurity, medical imaging analysis, legal research, and manufacturing.

Company Name	Industry (Vertical)	Core Focus
S2W	Cybersecurity	Dark web threat analysis
Lunit	Medical AI	Cancer imaging analysis
AirsMedical	Medical AI	MRI image enhancement
BHSN	Legal AI	Legal research
LinkAlpha	Financial AI	Hedge fund automation
MakinaRocks	Manufacturing AI	Predictive maintenance for industrial robots
Upstage	General AI (Verticalized)	Small large language model (sLLM) ‘Solar’
FuriosaAI	AI Semiconductors	NPU (Neural Processing Unit)

These companies are moving away from the competition of general-purpose chatbots, digging deep into their respective fields to create true value that no one can match.

Our Own Engine: Its Precious Value

However, this does not mean that we do not need our own ‘engine’. Naver’s ‘HyperCLOVA X’ and LG’s ‘Exaone’ play very important roles.

Naver HyperCLOVA X: An AI that understands Korean and Korean culture better than any other. It provides services optimized for our culture and serves as a strong backbone for the domestic AI ecosystem.
LG Exaone: It demonstrates world-class performance in reasoning tasks such as mathematics and coding, particularly in the B2B AI sector, and shows what a domestic engine is capable of.

These domestic engines help vertical AI startups that play the ‘tuner’ role reduce their dependence on foreign technologies and create a healthy ‘coexistence ecosystem’ where they grow together. AI sovereignty may mean not just having our own engine but also being able to make the best use of the world’s best engines to create AI products of the highest quality.

Conclusion: Beyond Answers, Towards Correct Thinking

Our journey, which began with the small lie of ‘King Sejong throwing a MacBook’, has traversed profound changes in AI technology.

We have witnessed AI evolve from a system that merely finds ‘accurate answers’ (RAG) to one that arrives at those answers through ‘correct reasoning’ (machine reasoning). This shift in focus from results to processes will be the most significant change defining the future AI era.

This journey suggests that the day we meet truly competent and trustworthy AI partners is not far off.