What You Need Is Not an AI Agent, But an AI-Friendly Workflow

The concept of AI agents is gaining popularity, and some people see them as a silver bullet for solving problems with AI. The hype around AI agents suggests that if you have one, many challenges can be easily tackled. However, there are also those who argue that AI agents are overhyped and lack real, viable applications.

A frequently cited example comes from Andrew Ng, who used a multi-agent setup to improve translation quality. In this scenario, three agents work together: a direct-translation agent, a review agent, and a paraphrasing agent, and the combination significantly boosts translation quality. But the improvement doesn't actually require three agents. I've previously proposed a method that uses prompts to guide a large language model (LLM) through the same direct translation, reflection, and paraphrasing steps, and it produces similarly high-quality output.

The key point here is that leveraging LLMs to solve problems often comes down to a method called "Chain of Thought" (CoT). CoT helps enhance the quality of outputs by breaking down complex reasoning into steps. The translation quality improves not because agents are involved but because of the CoT process. Whether you use separate agents for each step or organize them into a single process, the outcome is not fundamentally different.
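
To make this concrete, here is a minimal sketch of that prompt-driven chain, assuming the OpenAI Python SDK; the model name and prompt wording are illustrative, not any particular product's implementation.

```python
# A minimal sketch of the draft -> reflect -> refine chain: one model, three
# chained prompts. Assumes the OpenAI Python SDK; the model name and prompt
# wording are illustrative.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",  # any capable chat model works here
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def translate(text: str, target_lang: str) -> str:
    # Step 1: direct translation
    draft = ask(f"Translate the following text into {target_lang}:\n\n{text}")
    # Step 2: reflection -- critique the draft against the source
    critique = ask(
        f"Here is a source text and a draft {target_lang} translation.\n"
        f"List concrete problems with accuracy, fluency, and style.\n\n"
        f"Source:\n{text}\n\nDraft:\n{draft}"
    )
    # Step 3: paraphrase -- rewrite the draft using the critique
    return ask(
        f"Rewrite the draft translation into natural {target_lang}, fixing the "
        f"issues listed.\n\nSource:\n{text}\n\nDraft:\n{draft}\n\nIssues:\n{critique}"
    )
```

One call to translate() runs all three steps in a single process; no agent framework is needed.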

For most AI applications, the core principle is similar: if you want to use AI effectively, it's not about having an agent; it's about designing an AI-friendly workflow.

How to Design an AI-Friendly Workflow

So, how do you design a workflow that works well with AI? I believe several key factors are worth considering:

1. Avoid Limiting AI Solutions to Human Methods

Sometimes we anthropomorphize AI too much, applying human problem-solving approaches directly to AI systems. That can work, but it's often not the best way to leverage AI. For example, professional translators produce high-quality translations in one pass, without explicitly going through direct translation, reflection, and paraphrasing. Early AI translation prompts copied this one-pass approach, simply translating directly, and the results were often awkward. Once we recognize the value of breaking the task into explicit steps, as CoT does, we can design workflows that suit AI far better.

I've also seen AI agent projects that try to mimic human software development roles: project managers, product managers, architects, developers, and testers. Mirroring human job divisions like this is overly anthropomorphic and doesn't fit how AI works most effectively, which is why such ideas usually stay in academic papers without much practical impact. By contrast, tools like GitHub Copilot, which assist with code generation, fit naturally into existing software workflows and deliver a significant productivity boost.

2. Don't Rely Solely on AI for Decisions—Use AI to Assist

One of last year's trending projects was AutoGPT, which lets you enter a task and have GPT-4 break it down, create a plan, invoke external tools (like Google Search), and even execute code to complete it. It was one of the pioneering AI agent projects. The hype around it has since faded, because current AI is not yet capable of making reliable decisions for open-ended tasks: runs tended to burn through large numbers of tokens with little practical outcome.

Nowadays, the mainstream approach to AI is using it as a "copilot"—it assists humans rather than making all decisions. Alternatively, you can design workflows where AI handles specific, well-defined tasks without having to make complex decisions. Take, for instance, a workflow for handling negative customer reviews:

  1. The system scrapes review data.

  2. AI analyzes the sentiment of each review and flags negative ones.

  3. AI drafts responses for these negative reviews (which can be reviewed by humans).

This workflow is ideal for AI because it relies on AI for simpler tasks—sentiment analysis and response generation—without requiring complex, overarching decisions. As a result, it improves efficiency while maintaining reliability.
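
Here is a rough sketch of that workflow, assuming the OpenAI Python SDK; the scraping step is taken as given (a list of review strings), and the model name and prompts are placeholders.

```python
# Sketch of the review workflow: AI handles two narrow, well-defined tasks
# (classify sentiment, draft a reply) and a human approves the result before it
# is sent. Assumes the OpenAI Python SDK; model name and prompts are placeholders.
from openai import OpenAI

client = OpenAI()

def chat(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def handle_reviews(reviews: list[str]) -> list[dict]:
    """Step 1 (scraping) is assumed done; `reviews` is the scraped text."""
    drafts = []
    for review in reviews:
        # Step 2: sentiment analysis -- a single narrow question
        label = chat(
            "Answer with exactly one word, positive or negative:\n" + review
        ).strip().lower()
        if "negative" not in label:
            continue
        # Step 3: draft a reply; a human reviews it before sending
        reply = chat(
            "Write a short, polite reply to this negative customer review, "
            "apologizing and offering to follow up:\n" + review
        )
        drafts.append({"review": review, "reply_draft": reply, "approved": False})
    return drafts
```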

3. Combine AI Models and Tools Across Domains for Better Solutions

The recent surge of interest in AI is largely driven by the emergence of LLMs. These models are powerful and versatile, capable of basic reasoning, and easy to use, whether via chatbot interfaces or API calls. This accessibility has allowed non-specialists like me to use AI without deep expertise in AI engineering. Previously, AI had a much higher barrier to entry: you needed data preprocessing, model training, and parameter tuning.

However, this has led to a tendency to over-rely on LLMs and neglect the potential of combining them with other domain-specific AI tools. By integrating various AI models and tools into a well-designed workflow, you can create more efficient solutions.
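
As a small illustration, a lightweight domain-specific model can do the bulk screening locally, so the more expensive LLM is only called where it actually adds value. The sketch below assumes the Hugging Face transformers library and its default sentiment-analysis pipeline.

```python
# Combine a small, cheap classifier with an LLM: the local model screens every
# review, and only flagged ones are passed on to an LLM for reply drafting.
# Assumes the Hugging Face transformers library; the default pipeline model is
# used for illustration.
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")  # small model, runs locally

def needs_llm_reply(review: str) -> bool:
    result = sentiment(review)[0]           # e.g. {"label": "NEGATIVE", "score": 0.98}
    return result["label"] == "NEGATIVE"
```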

4. Return to the Root Problem—AI Is Just a Tool

The mistakes above all stem from overemphasizing trendy concepts while neglecting the core problem at hand. AI is a tool, not an end in itself. The "first principles" thinking popularized by Elon Musk emphasizes breaking a problem down to its most fundamental elements, analyzing them, and building the optimal solution from there. Applying first principles involves three steps:

  1. Clearly define the core problem you want to solve.

  2. Break down the problem into its essential components.

  3. Create a solution from the ground up.

This same thinking is useful for using AI effectively—designing a workflow that allows AI to do what it does best in service of your real goals.

Two Examples of AI-Optimized Workflows

1. Converting PDFs to Markdown

If you've ever translated a PDF, you know that preparing content for translation—structuring it into Markdown format—makes a huge difference in the quality of the output. However, extracting content from a PDF is challenging since it’s primarily designed for printing rather than structured data. Handling figures and tables further complicates the process.

I recently saw a project called PDFGPT, which approaches this problem in an elegant way. It uses GPT-4 and PyMuPDF as part of a workflow:

  1. A PDF library (PyMuPDF) detects images, figures, and tables, extracting them as separate images.

  2. Each page is converted into an image, with red boxes marking the figures and tables.

  3. GPT-4, with its visual capabilities, interprets these marked images and generates corresponding Markdown content.

An LLM alone would struggle with this task because of context-length limits and the difficulty of handling visual content as plain text. By combining PyMuPDF with a vision-capable model in a suitable workflow, the project converts PDFs to Markdown with very good results.
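
The core step is straightforward to sketch. The snippet below is not PDFGPT's code, only an illustration of the idea: render a page to an image with PyMuPDF and ask a vision-capable model to transcribe it as Markdown. The model name and prompt are assumptions.

```python
# Render a PDF page with PyMuPDF, then ask a vision-capable model for Markdown.
# Illustrative only; not the PDFGPT implementation.
import base64

import fitz  # PyMuPDF
from openai import OpenAI

client = OpenAI()

def page_to_markdown(pdf_path: str, page_number: int = 0) -> str:
    doc = fitz.open(pdf_path)
    pix = doc[page_number].get_pixmap(dpi=150)           # rasterize the page
    b64 = base64.b64encode(pix.tobytes("png")).decode()
    resp = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable chat model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Transcribe this page as Markdown, preserving headings and tables."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content
```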

2. Translating Comics

Comics are another interesting challenge—translating speech bubbles into another language requires extracting the text, translating it, and placing it back into the original layout. The key challenges are:

  • Speech bubbles are irregular and sometimes overlap, making text extraction difficult.

  • A straightforward translation of extracted text may not be coherent without understanding the comic's context.

  • After translation, the original text must be removed and replaced in the correct positions.

A project called comic-translate handles this problem quite well by designing a tailored workflow:

  1. A specialized model detects and locates the speech bubbles.

  2. OCR extracts the text from these bubbles.

  3. Another model removes the original text from the bubbles.

  4. GPT-4 translates the text based on visual content.

  5. The translated text is re-inserted into the original bubbles.

If you set aside fine-tuning the translation quality for a moment, this can run as a fully automated workflow: highly efficient and low-cost, with the GPT-4 API being the most expensive part at about $0.02 per page. Adding human oversight to refine the translations boosts quality further without sacrificing much efficiency.
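
To show the shape of such a workflow, here is a skeleton with each stage as a pluggable component; the names and signatures are illustrative, not comic-translate's actual API.

```python
# Skeleton of the comic-translation workflow. Each stage is supplied as a
# callable (bubble detector, OCR engine, inpainting model, translator,
# typesetter); the names and types here are illustrative.
from dataclasses import dataclass
from typing import Callable

Box = tuple[int, int, int, int]  # (x, y, w, h) of a speech bubble

@dataclass
class ComicTranslationPipeline:
    detect: Callable[[bytes], list[Box]]                      # 1. locate speech bubbles
    ocr: Callable[[bytes, Box], str]                          # 2. extract text per bubble
    erase: Callable[[bytes, list[Box]], bytes]                # 3. remove the original text
    translate: Callable[[str, bytes], str]                    # 4. translate with page context
    typeset: Callable[[bytes, list[tuple[Box, str]]], bytes]  # 5. re-insert translated text

    def run(self, page: bytes) -> bytes:
        boxes = self.detect(page)
        texts = [self.ocr(page, box) for box in boxes]
        cleaned = self.erase(page, boxes)
        translated = [self.translate(text, page) for text in texts]
        return self.typeset(cleaned, list(zip(boxes, translated)))
```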

Conclusion

To make the most of AI, the key is to design workflows that fit AI's strengths. Whether it involves an AI agent, an LLM, or another model is secondary. The true focus should always be on solving the core problem effectively, using AI as a tool rather than an end in itself.