Bloom: Open-Source Automated Behavior Evaluation for Large Language Models

Bloom is an open-source tool for automating the evaluation of behaviors exhibited by large language models (LLMs). It provides a scaffolded system that takes an evaluation configuration (a "seed") as input; the seed defines the target behavior, example transcripts, and the desired interaction types. From this input, Bloom generates an evaluation suite that attempts to elicit the specified behavior. Unlike fixed elicitation techniques, the suite is generated dynamically and tailored to the behavior under study. The sections below break down Bloom and its capabilities.

Key Features and Functionality

Behavior-Driven Evaluation: Users specify the behaviors they want to evaluate, such as sycophancy, political bias, or self-preservation. Bloom then generates interactions designed to elicit those behaviors.
Seed Configuration: The evaluation is driven by a "seed" configuration file (seed.yaml) that defines the target behavior, examples, interaction parameters, and model configurations. To keep results reproducible, Bloom evaluations should always be cited together with their full seed configuration.
Modular Pipeline: Bloom's pipeline is divided into four distinct stages:
- Understanding Stage: Analyzes the target behavior and the provided examples to understand the underlying mechanisms and motivations.
- Ideation Stage: Generates diverse base evaluation scenarios and variations of them. This stage determines how comprehensive and varied the test suite is.
- Rollout Stage: Runs the evaluation scenarios against the target model, simulating conversations or environments.
- Judgment Stage: Scores each rollout for the presence of the target behavior and for other configurable qualities. This stage also includes a meta-judgment that analyzes the evaluation suite as a whole.
Interactive Viewer: An interactive viewer lets you browse transcripts, view conversation flows, see judgment scores, and filter or search transcripts, which supports detailed analysis of the results.
LiteLLM Integration: Bloom uses LiteLLM to handle API calls to LLM providers, giving a unified interface to services such as OpenAI, Anthropic, OpenRouter, and Amazon Bedrock.
Weights & Biases Integration: Bloom integrates with Weights & Biases (wandb) for running and tracking large-scale experiments.
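The four-stage pipeline described above can be pictured as a chain of functions. The sketch below is illustrative only and is not Bloom's actual code: the names `run_pipeline`, `meta_summary`, `Rollout`, and `Judgment` are hypothetical, each stage is passed in as a plain callable, and the integer score scale is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rollout:
    scenario: str
    transcript: str


@dataclass
class Judgment:
    scenario: str
    score: int  # assumed scale: higher means more of the target behavior


def run_pipeline(
    behavior: str,
    examples: List[str],
    understand: Callable[[str, List[str]], str],
    ideate: Callable[[str], List[str]],
    rollout: Callable[[str], str],
    judge: Callable[[str, str], int],
) -> List[Judgment]:
    """Chain the four stages; every stage is an injected (hypothetical) callable."""
    analysis = understand(behavior, examples)                # Understanding
    scenarios = ideate(analysis)                             # Ideation
    rollouts = [Rollout(s, rollout(s)) for s in scenarios]   # Rollout
    # Judgment: score each transcript for the target behavior
    return [Judgment(r.scenario, judge(behavior, r.transcript)) for r in rollouts]


def meta_summary(judgments: List[Judgment]) -> Dict[str, float]:
    """A stand-in for the meta-judgment: simple statistics over the whole suite."""
    scores = [j.score for j in judgments]
    mean = sum(scores) / len(scores) if scores else 0.0
    return {"n": len(scores), "mean": mean, "max": max(scores, default=0)}
```

Passing the stages in as callables keeps the sketch independent of any model provider; in a real run, each callable would wrap one or more LLM calls.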

Getting Started with Bloom

To start using Bloom, follow these steps:

Install Dependencies: Clone the repository and install the required Python packages with uv pip install -r requirements.txt.
Configure API Keys: Add API keys for the LLM providers you want to use (e.g., OpenAI, Anthropic) to a .env file.
Define Target Behavior: Define the behavior you want to evaluate in behaviors/behaviors.json.
Provide Examples (Optional): Add example transcripts of the target behavior to the behaviors/examples/{your-behavior-name}/ directory.
Configure seed.yaml: Customize seed.yaml, setting parameters such as the target behavior, examples, number of evaluations, target model, and diversity.
Run the Pipeline: Run the pipeline locally with python bloom.py --debug.
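Before running the pipeline, it can help to sanity-check the configuration and API keys from the steps above. The sketch below is a hypothetical pre-flight check, not part of Bloom: the field names (`behavior`, `num_evals`, `target_model`, `diversity`) are illustrative stand-ins for the parameters listed above and are not the real seed.yaml schema, and the 0-to-1 range for diversity is an assumption. The seed is taken as an already-parsed dictionary.

```python
from typing import Any, Dict, List, Mapping

# Hypothetical field names; consult the real seed.yaml for the actual schema.
REQUIRED_FIELDS = ("behavior", "num_evals", "target_model")


def check_seed(seed: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in seed]

    num_evals = seed.get("num_evals")
    if num_evals is not None and (not isinstance(num_evals, int) or num_evals < 1):
        problems.append("num_evals must be a positive integer")

    # Assumption: diversity is a fraction between 0 and 1.
    diversity = seed.get("diversity")
    if diversity is not None and (
        not isinstance(diversity, (int, float)) or not 0 <= diversity <= 1
    ):
        problems.append("diversity must be between 0 and 1")
    return problems


def missing_api_keys(env: Mapping[str, str], names: List[str]) -> List[str]:
    """Return the names of environment variables that are unset or empty."""
    return [name for name in names if not env.get(name)]
```

For example, `missing_api_keys(os.environ, ["OPENAI_API_KEY", "ANTHROPIC_API_KEY"])` lists whichever of those two keys is not set; which keys are needed depends on the providers chosen.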

Advanced Features

Intelligent Ideation Batching: The ideation stage batches its work so that each API call generates multiple scenarios, which reduces the number of calls.
Resuming Sweeps: Bloom can resume interrupted Weights & Biases sweeps, which is useful for large-scale experiments that may hit API failures or timeouts.
Extended Thinking/Reasoning Effort: Bloom supports extended thinking for Claude and OpenAI models, allowing more detailed reasoning during evaluation.
Evaluating Multiple Target Models: Bloom can evaluate multiple target models on identical evaluation scenarios.
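Two of the ideas above, batching scenario generation and reusing one scenario set across several target models, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Bloom's implementation: `plan_batches` and `evaluate_targets` are hypothetical names, and each target model is represented by a plain function from scenario text to response text.

```python
from typing import Callable, Dict, List


def plan_batches(total: int, per_call: int) -> List[int]:
    """Split `total` scenarios into per-call batch sizes of at most `per_call`."""
    if total < 0 or per_call < 1:
        raise ValueError("total must be >= 0 and per_call must be >= 1")
    full, rest = divmod(total, per_call)
    return [per_call] * full + ([rest] if rest else [])


def evaluate_targets(
    scenarios: List[str],
    targets: Dict[str, Callable[[str], str]],
) -> Dict[str, List[str]]:
    """Run every target on the same scenario list so results are directly comparable."""
    return {name: [respond(s) for s in scenarios] for name, respond in targets.items()}
```

With 10 scenarios and 4 per call, `plan_batches(10, 4)` gives three calls of sizes 4, 4, and 2 instead of ten single-scenario calls. Generating the scenarios once and passing the same list to every target keeps the comparison between models fair.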

Conclusion

Bloom is a useful tool for researchers and developers who want to evaluate LLM behavior in a structured, automated way. Its modular pipeline, interactive viewer, and integration with LiteLLM and Weights & Biases make it a practical platform for understanding and mitigating potential risks associated with LLMs. By providing a configurable and reproducible evaluation framework, Bloom contributes to building safer and more reliable AI systems.

Source: safety-research/bloom
