Engineering at Anthropic: A Practical Guide to Building Effective LLM Agents

In a recent blog post, Anthropic shared their insights on building effective Large Language Model (LLM) agents, drawing from their experience working with numerous teams across various industries. The key takeaway is that simplicity and composability trump complex frameworks when it comes to successful agent implementation. This post breaks down Anthropic’s approach to agentic systems, offering practical advice for developers looking to leverage the power of LLMs.

Defining Agents: Workflows vs. Autonomous Agents

Anthropic distinguishes between two types of agentic systems: workflows and agents. Understanding this distinction is crucial for choosing the right approach for your application.

Workflows: These are systems where LLMs and tools are orchestrated through predefined code paths. They offer predictability and consistency for well-defined tasks.
Agents: These are systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks. They provide flexibility and model-driven decision-making at scale.
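
To make the distinction concrete, here is a minimal sketch of both styles. It assumes a hypothetical `LLM` callable (prompt in, text out) in place of a real model API, and the tool names and the `DONE` convention are illustrative choices, not something the Anthropic post prescribes.

```python
from typing import Callable, Dict

# Hypothetical stand-in for a real model call: takes a prompt, returns text.
LLM = Callable[[str], str]


def summarize_then_translate(llm: LLM, text: str) -> str:
    """Workflow: the code fixes the order of the steps in advance."""
    summary = llm(f"Summarize: {text}")
    return llm(f"Translate to French: {summary}")


def run_agent(llm: LLM, tools: Dict[str, Callable[[str], str]],
              task: str, max_steps: int = 5) -> str:
    """Agent: the model chooses the next tool at each step."""
    observation = task
    for _ in range(max_steps):  # hard cap so the loop always terminates
        decision = llm(
            f"Task: {task}\nLast result: {observation}\n"
            f"Reply with one of {sorted(tools)} or DONE."
        )
        # Stop when the model says DONE or names a tool we do not have.
        if decision == "DONE" or decision not in tools:
            break
        observation = tools[decision](observation)
    return observation
```

In the workflow the control flow lives in the code; in the agent it lives in the model's replies, which is where both the flexibility and the extra cost come from.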

When to Use (and Not Use) Agents

The post emphasizes the importance of starting with the simplest solution possible. Agentic systems introduce complexity and often trade latency and cost for improved task performance.

Consider whether the added complexity is truly necessary. Optimizing single LLM calls with retrieval and in-context examples might be sufficient for many applications.
Workflows are suitable for well-defined tasks requiring predictability.
Agents are the better option when flexibility and model-driven decision-making are needed.

Frameworks: Use with Caution

While frameworks like LangGraph, Amazon Bedrock’s AI Agent framework, Rivet, and Vellum can simplify initial implementation, they can also introduce unnecessary abstraction and complexity.

Frameworks simplify tasks like calling LLMs, defining tools, and chaining calls.
They can obscure the underlying prompts and responses, making debugging more difficult.
Start by using LLM APIs directly, since many patterns take only a few lines of code. If you do adopt a framework, make sure you understand the underlying code.
Incorrect assumptions about the framework’s inner workings are a common source of error.
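
One way to keep prompts and responses visible when calling a model directly is a thin wrapper that records every exchange. The sketch below assumes a hypothetical `LLM` callable standing in for a direct API call; `with_transcript` is an illustrative helper, not part of any library.

```python
from typing import Callable, List, Tuple

LLM = Callable[[str], str]  # hypothetical stand-in for a direct API call


def with_transcript(llm: LLM, transcript: List[Tuple[str, str]]) -> LLM:
    """Wrap a model call so every prompt/response pair is recorded."""
    def traced(prompt: str) -> str:
        response = llm(prompt)
        transcript.append((prompt, response))  # nothing is hidden from the caller
        return response
    return traced
```

Because the transcript is a plain list, it can be printed or diffed when a chain of calls misbehaves.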

Building Blocks, Workflows, and Agents: A Progressive Approach

Anthropic outlines a progressive approach to building agentic systems, starting with the foundational building block and gradually increasing complexity.

Augmented LLM: The basic building block is an LLM enhanced with augmentations such as retrieval, tools, and memory. Focus on tailoring these capabilities to your use case and providing a well-documented interface.
Prompt Chaining: Decompose a task into a sequence of steps, where each LLM call processes the output of the previous one. Use programmatic checks (“gates”) to ensure the process stays on track. Ideal for tasks easily decomposed into fixed subtasks.
Routing: Classify an input and direct it to a specialized follow-up task. Useful for complex tasks with distinct categories that are better handled separately.
Parallelization: Run LLMs simultaneously on a task and aggregate their outputs programmatically. This manifests in two variations: sectioning (breaking a task into independent subtasks) and voting (running the same task multiple times for diverse outputs).
Orchestrator-Workers: A central LLM dynamically breaks down tasks, delegates them to worker LLMs, and synthesizes their results. Well-suited for complex tasks where subtasks aren’t predefined.
Evaluator-Optimizer: One LLM generates a response while another provides evaluation and feedback in a loop. Effective when clear evaluation criteria exist and iterative refinement provides measurable value.
Autonomous Agents: Agents plan and operate independently, potentially returning to the human for further information or judgment. It is crucial that they gain “ground truth” from the environment at each step (such as tool call results or code execution) and include stopping conditions. Suitable for open-ended problems where the required steps are unpredictable.
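
Three of the workflow patterns above can be sketched in a few lines each. The code assumes a hypothetical `LLM` callable (prompt in, text out). The prompts, the length-based gate, the `general` fallback label, and majority voting as the aggregation rule are all illustrative assumptions rather than details from the Anthropic post.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

LLM = Callable[[str], str]  # hypothetical stand-in for a real model call


def chain_with_gate(llm: LLM, topic: str, max_outline_chars: int = 500) -> str:
    """Prompt chaining: step 2 consumes step 1's output, with a gate between."""
    outline = llm(f"Write an outline about: {topic}")
    # Gate: a plain programmatic check that stops the chain if step 1 went wrong.
    if not outline.strip() or len(outline) > max_outline_chars:
        raise ValueError("outline failed the gate")
    return llm(f"Write a draft from this outline:\n{outline}")


def route(classifier: LLM, handlers: Dict[str, LLM], query: str,
          default: str = "general") -> str:
    """Routing: classify the input, then hand it to a specialized handler."""
    label = classifier(f"Classify this query: {query}").strip().lower()
    # Unknown labels fall back to the default handler, which must exist.
    return handlers.get(label, handlers[default])(query)


def vote(llm: LLM, prompt: str, n: int = 3) -> str:
    """Parallelization (voting): run the same prompt n times, keep the majority."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        answers = list(pool.map(llm, [prompt] * n))
    return Counter(answers).most_common(1)[0][0]
```

Each function is ordinary control flow around model calls, which is why these patterns are easy to build without a framework.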

Practical Applications

The post highlights the practical value of AI agents in customer support and coding, emphasizing that agents are most valuable for tasks that require conversation and action, have clear success criteria, enable feedback loops, and integrate meaningful human oversight.

Customer Support: Combines chatbot interfaces with tool integration to access customer data, order history, and knowledge base articles, enabling programmatic actions like issuing refunds.
Coding Agents: Leverage automated testing for verification, enabling iterative solution refinement based on test results. Human review remains crucial for broader system alignment.
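
The coding-agent loop can be sketched as generate, test, and retry with feedback. This is a minimal sketch under stated assumptions: `generate` is a hypothetical stand-in for the code-writing model and `run_tests` for an automated test suite; neither name comes from the Anthropic post.

```python
from typing import Callable, Optional, Tuple

# Hypothetical stand-ins: `generate` plays the code-writing model and
# `run_tests` plays an automated test suite returning (passed, feedback).
Generate = Callable[[str, Optional[str]], str]
RunTests = Callable[[str], Tuple[bool, str]]


def refine_until_green(generate: Generate, run_tests: RunTests,
                       task: str, max_rounds: int = 3) -> Tuple[str, bool]:
    """Regenerate a solution from test feedback until tests pass or rounds run out."""
    feedback: Optional[str] = None
    candidate = ""
    for _ in range(max_rounds):
        candidate = generate(task, feedback)
        passed, feedback = run_tests(candidate)
        if passed:
            return candidate, True
    # Tests never passed: return the last attempt so a human can review it.
    return candidate, False
```

The boolean in the result makes the hand-off explicit: a `False` means the loop gave up and the candidate still needs human review.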

Prompt Engineering Your Tools

The post emphasizes well-defined tools and clear tool documentation. Investing in a good agent-computer interface (ACI) is as crucial as investing in human-computer interfaces (HCI).

Give the model enough tokens to “think” before it writes itself into a corner.
Keep the format close to what the model has seen naturally occurring in text.
Avoid formatting “overhead” such as keeping accurate line counts or string escaping.
Test how the model uses the tools and iterate.
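Poka-yoke your tools: change the arguments so that mistakes are harder to make, for example by requiring absolute file paths instead of relative ones.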
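
As an illustration of poka-yoke in a tool interface, the sketch below defines a file-reading tool that only accepts absolute paths and returns an actionable error otherwise. The tool name, the JSON-Schema-like description, and the error wording are hypothetical and not taken from any particular API.

```python
from pathlib import Path

# Hypothetical tool description in a JSON-Schema-like shape; the names and
# wording are illustrative and not taken from any particular API.
READ_FILE_TOOL = {
    "name": "read_file",
    "description": (
        "Read a UTF-8 text file. `path` must be absolute, "
        "for example /home/user/project/notes.txt."
    ),
    "parameters": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
}


def read_file(path: str) -> str:
    """Reject relative paths up front instead of guessing the working directory."""
    p = Path(path)
    if not p.is_absolute():
        # Return an error the model can act on rather than raising.
        return f"ERROR: path must be absolute, got {path!r}"
    if not p.is_file():
        return f"ERROR: no such file: {path}"
    return p.read_text(encoding="utf-8")
```

The description shows the model an example of valid input, and the validation turns a silent wrong-directory bug into a message the model can correct on its next call.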

Conclusion

Anthropic’s core message is clear: prioritize simplicity, transparency, and a well-crafted agent-computer interface. Start with simple prompts, optimize them, and add multi-step agentic systems only when simpler solutions fall short. By following these principles, you can build powerful, reliable, and trusted agents.

Source: Anthropic
