
No matter how powerful a platform is, it is difficult to fully unleash AI’s potential if the “development environment” developers face daily is not organically integrated with AI. However advanced an intelligent platform becomes, its full potential is realized only when developers can easily use its functions and collaborate seamlessly with AI within their everyday work environment.

To overcome the limitations of “vibe coding” and meet the requirements of complex software development, AI-based workflows become all the more important. This means that AI must move beyond piecemeal code generation or assistance with specific functions to become the core of an ‘innovative development environment’ that encompasses the entire Software Development Lifecycle (SDLC).

These workflows are essential for improving software quality and integrating AI into various development activities that occur outside the Integrated Development Environment (IDE). Kakao’s efforts to enhance the development environment through AI integration have already been thoroughly explored in this part.

In this chapter, we will discuss how Kakao evaluates AI’s impact across the SDLC stages within its development environments.

Assessment and continuous improvement of AI models and collaborative processes

AI models and the processes through which humans and AI collaborate are like a living organism. Once created, they do not remain in optimal condition indefinitely; they must be constantly evaluated and improved in line with changes in the surrounding environment and new requirements.

As the long-standing adage in business administration goes, “What cannot be measured cannot be managed, and what cannot be managed cannot be improved.” Accordingly, Kakao continuously measures and evaluates the performance of its AI models and the efficiency of its collaboration processes, relentlessly pursuing a cycle of improvement based on these assessments.

AI model performance: how to evaluate and improve it?

Diversification of quantitative evaluation indicators

When evaluating the performance of AI models, we don’t solely rely on a single metric like accuracy. Depending on the type of model and the nature of the applied task, various quantitative indicators—such as accuracy, recall, F1 score, response latency, and throughput—are set and measured periodically.
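For illustration only, the following is a minimal sketch of what such a periodic measurement job could look like, assuming a Python evaluation script with scikit-learn available; the function names, the evaluation set, and the macro-averaged precision/recall/F1 choice are assumptions, not a description of Kakao’s actual pipeline.

```python
# Minimal sketch: periodic quantitative evaluation of a classification-style model.
# Assumes scikit-learn is installed; names and metric choices are illustrative.
import time
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def evaluate_model(predict_fn, samples, labels):
    """Run the model over an evaluation set and collect quality and latency metrics."""
    predictions, latencies = [], []
    for sample in samples:
        start = time.perf_counter()
        predictions.append(predict_fn(sample))
        latencies.append(time.perf_counter() - start)

    return {
        "accuracy": accuracy_score(labels, predictions),
        "precision": precision_score(labels, predictions, average="macro"),
        "recall": recall_score(labels, predictions, average="macro"),
        "f1": f1_score(labels, predictions, average="macro"),
        "avg_latency_s": sum(latencies) / len(latencies),
        "throughput_rps": len(samples) / sum(latencies),
    }
```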

For example, in the case of code generation AI, whether the generated code actually runs, whether it contains bugs, and whether it complies with coding conventions are used as additional evaluation indicators. For LLM-based text summarization models, automatic evaluation metrics such as ROUGE and BLEU are referenced.
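As an illustration of the summarization case, here is a minimal sketch using the open-source rouge-score package; the reference and generated texts are invented examples rather than real evaluation data.

```python
# Minimal sketch: reference-based summary scoring with ROUGE.
# Assumes the rouge-score package (pip install rouge-score); texts are illustrative.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

reference = "The incident was resolved after the cache configuration was rolled back."
generated = "Rolling back the cache configuration resolved the incident."

scores = scorer.score(reference, generated)
for name, score in scores.items():
    print(f"{name}: precision={score.precision:.3f} "
          f"recall={score.recall:.3f} f1={score.fmeasure:.3f}")
```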

Recognizing the importance of qualitative assessment

The actual usefulness and user experience of AI models, which are difficult to grasp with quantitative indicators alone, are complemented by qualitative evaluation. Through satisfaction surveys, in-depth interviews, and A/B tests with actual users, we carefully assess how natural the results generated by the AI model are, how well they match the user’s intention, and how effectively they assist actual work.
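For the A/B test portion, a minimal sketch of a two-proportion z-test on task success rates is shown below, assuming SciPy is available; the counts are invented and the statistical design is an illustrative assumption, not Kakao’s actual experiment setup.

```python
# Minimal sketch: two-proportion z-test comparing task success rates between
# a baseline flow (variant A) and an AI-assisted flow (variant B).
# Numbers are illustrative; only the standard library and SciPy are assumed.
from math import sqrt
from scipy.stats import norm

def ab_test(success_a, total_a, success_b, total_b):
    p_a, p_b = success_a / total_a, success_b / total_b
    pooled = (success_a + success_b) / (total_a + total_b)
    se = sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - norm.cdf(abs(z)))  # two-sided test
    return p_a, p_b, z, p_value

p_a, p_b, z, p_value = ab_test(success_a=412, total_a=600, success_b=455, total_b=610)
print(f"A={p_a:.2%}, B={p_b:.2%}, z={z:.2f}, p={p_value:.4f}")
```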

Notably, an in-house group of domain experts periodically reviews the quality of AI results and provides feedback.

Continuous optimization of prompt engineering

The performance of generative AI models, such as LLMs, is highly dependent on the quality of prompts entered by users. Therefore, Kakao studies optimal prompt patterns for various business scenarios and continuously improves prompts based on feedback obtained during actual use.
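As a rough sketch of what comparing prompt variants against the same evaluation set could look like, consider the following; the variant templates, llm_call, and score_response are hypothetical placeholders rather than Kakao’s actual prompt registry or scoring method.

```python
# Minimal sketch: score competing prompt variants on the same evaluation set
# and keep the best one. All names and templates are illustrative placeholders.
from dataclasses import dataclass

@dataclass
class PromptVariant:
    name: str
    template: str

VARIANTS = [
    PromptVariant("v1-plain", "Summarize the following ticket:\n{ticket}"),
    PromptVariant("v2-structured",
                  "Summarize the ticket below as 'problem / cause / action':\n{ticket}"),
]

def pick_best_variant(llm_call, score_response, tickets):
    """Score each variant on the same tickets and return the winner and all scores."""
    results = {}
    for variant in VARIANTS:
        scores = [
            score_response(llm_call(variant.template.format(ticket=t)), t)
            for t in tickets
        ]
        results[variant.name] = sum(scores) / len(scores)
    return max(results, key=results.get), results
```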

Systematic management of model retraining and fine-tuning cycles

As data distribution changes over time or new types of requirements emerge, the performance of AI models can degrade. To prevent this, Kakao monitors model performance in real time and manages a systematic cycle of retraining or fine-tuning the model with new data as soon as performance drops below a certain level or significant changes are detected.
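A minimal sketch of such a threshold-based trigger over a rolling window of evaluation scores might look like the following; the baseline, window size, drop threshold, and trigger_retraining hook are illustrative assumptions, not the actual monitoring setup.

```python
# Minimal sketch: trigger retraining when the rolling average of recent
# evaluation scores falls too far below the accepted baseline.
from collections import deque

class RetrainingMonitor:
    def __init__(self, baseline_f1, window=20, max_drop=0.05):
        self.baseline_f1 = baseline_f1
        self.max_drop = max_drop
        self.recent = deque(maxlen=window)

    def record(self, f1_score):
        """Record a new evaluation result; return True if retraining should start."""
        self.recent.append(f1_score)
        if len(self.recent) < self.recent.maxlen:
            return False  # not enough evidence yet
        rolling_avg = sum(self.recent) / len(self.recent)
        return rolling_avg < self.baseline_f1 - self.max_drop

monitor = RetrainingMonitor(baseline_f1=0.87)
# Called from the periodic evaluation job (hypothetical hook):
# if monitor.record(latest_f1):
#     trigger_retraining()
```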

AI collaboration processes: how to diagnose and evolve them?

Just as important as the performance of the AI model itself is how AI and humans work together—that is, the efficiency of the collaboration process. Kakao evaluates and improves AI collaboration processes in the following ways:

User experience (UX)-centered tool evaluation

We periodically conduct usability tests and surveys on the AI-based collaboration tools that developers and business users actually use (e.g., AI coding assistants, data analysis platforms, AI agent services) to evaluate their intuitiveness, convenience, and actual contribution to work, and we derive improvements from the results.

Identifying and improving workflow bottlenecks

We identify delays or inefficiencies that occur in AI agent workflows or human-AI interactions, and improve them through process redesign or expansion of automation.
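A minimal sketch of locating such bottlenecks from step-level timing logs of an agent workflow is shown below; the log format and step names are assumed for illustration only.

```python
# Minimal sketch: rank workflow steps by average duration to surface bottlenecks.
# The (step_name, duration_seconds) log format is an illustrative assumption.
from collections import defaultdict

def find_bottlenecks(step_logs, top_n=3):
    """step_logs: iterable of (step_name, duration_seconds) tuples."""
    totals, counts = defaultdict(float), defaultdict(int)
    for step, duration in step_logs:
        totals[step] += duration
        counts[step] += 1
    averages = {step: totals[step] / counts[step] for step in totals}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

logs = [("retrieve_context", 1.8), ("generate_draft", 6.4),
        ("human_review_wait", 540.0), ("apply_patch", 2.1)]
print(find_bottlenecks(logs))  # waiting for human review dominates -> redesign candidate
```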

Diversification of feedback collection channels

In addition to formal evaluation processes, we provide various feedback channels (e.g., anonymous bulletin boards, regular meetings) so that users can report difficulties encountered during AI collaboration or suggest improvement ideas at any time.