SensorLM: Bridging the Gap Between Wearable Sensor Data and Natural Language Understanding

Wearable devices generate massive amounts of data, but understanding the context behind the raw numbers remains a challenge. Google Research introduces SensorLM, a family of sensor-language foundation models trained on nearly 60 million hours of wearable sensor data, an unprecedented scale, from over 103,000 individuals. SensorLM aims to bridge the gap between raw sensor data and human-understandable language, opening doors to personalized health and wellness insights. This blog post summarizes the key aspects of the SensorLM research, including its training methodology, capabilities, and potential applications.

The Challenge: Making Sense of Sensor Data

Wearable devices like smartwatches and fitness trackers continuously collect data on our heart rate, steps, sleep patterns, and activity levels. This data holds immense potential for personalized health and wellness. However, the raw data often lacks context. Knowing that your heart rate is 150 bpm is less useful without knowing if it’s from “a brisk uphill run” or “a stressful public speaking event.” Manually annotating enough data to train models to understand this context is prohibitively expensive. SensorLM addresses this challenge by learning the intricate connections between sensor signals and human language directly from a massive dataset.

SensorLM: Learning to “Speak” Sensor Data

SensorLM is a family of sensor-language foundation models pre-trained on nearly 60 million hours of multimodal sensor data from over 103,000 individuals. This allows SensorLM to:

Interpret and generate nuanced, human-readable descriptions from high-dimensional wearable data.
Translate complex sensor data into meaningful natural language descriptions across statistical, structural, and semantic dimensions.
Set a new state of the art in sensor data understanding.

Training SensorLM: A Novel Hierarchical Pipeline

To overcome the annotation bottleneck, the researchers developed a novel hierarchical pipeline that automatically generates descriptive text captions. Key aspects include:

Dataset Creation: Using de-identified data from 103,643 people across 127 countries who consented to its use for research. Data was collected from Fitbit or Pixel Watch devices between March and May 2024. The dataset consists of nearly 2.5 million person-days of data; at 24 hours per person-day, that corresponds to the roughly 60 million hours cited above.
Automatic Caption Generation: The hierarchical pipeline calculates statistics, identifies trends, and describes events directly from the sensor data. This automates the creation of captions, making a dataset of unprecedented size possible.
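
As a rough illustration of the three caption levels (statistics, trends, events), the sketch below turns a short heart-rate and step series into one sentence per level. The function names, thresholds, and sentence templates are assumptions made for this example; the source does not describe the actual pipeline at this level of detail.

```python
from statistics import mean

def statistical_caption(hr):
    """Level 1: summary statistics computed directly from the signal."""
    return f"Heart rate averaged {mean(hr):.0f} bpm (min {min(hr)}, max {max(hr)})."

def structural_caption(hr, flat_band=5):
    """Level 2: coarse trend, comparing the first and second half of the window."""
    half = len(hr) // 2
    delta = mean(hr[half:]) - mean(hr[:half])
    if delta > flat_band:
        trend = "rose"
    elif delta < -flat_band:
        trend = "fell"
    else:
        trend = "stayed roughly flat"
    return f"Heart rate {trend} over the window."

def semantic_caption(hr, steps, hr_threshold=120, step_threshold=100):
    """Level 3: a toy event label based on hypothetical thresholds."""
    active = [i for i, (h, s) in enumerate(zip(hr, steps))
              if h >= hr_threshold and s >= step_threshold]
    if not active:
        return "No vigorous activity detected."
    # Reports the first and last flagged minute; gaps in between are ignored.
    return f"Vigorous activity from minute {active[0]} to minute {active[-1]}."

def hierarchical_caption(hr, steps):
    """Join the three levels into one caption for a sensor window."""
    return " ".join([
        statistical_caption(hr),
        structural_caption(hr),
        semantic_caption(hr, steps),
    ])

# Made-up per-minute readings for illustration only.
hr = [70, 72, 75, 130, 140, 150]
steps = [0, 5, 10, 120, 130, 140]
caption = hierarchical_caption(hr, steps)
print(caption)
```

Because each sentence is computed from the signal rather than written by a person, captions of this kind can be produced for every window in a dataset without manual annotation.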

SensorLM Architecture: Combining Contrastive and Generative Learning

The SensorLM architecture unifies prominent multimodal pre-training strategies:

Contrastive Learning: The model learns to match sensor data segments with corresponding text descriptions, enabling it to discriminate between different activities and states.
- Example: distinguishing a “light swim” from a “strength workout.”
Generative Pre-training: The model learns to generate text captions directly from sensor data, allowing it to produce rich, context-aware descriptions.
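
The two objectives can be sketched as loss functions. The source does not give formulas, so the symmetric InfoNCE form of the contrastive loss, the temperature value, and the `caption_weight` used to combine the two terms are assumptions borrowed from common multimodal practice, not SensorLM's confirmed training recipe. Embeddings and logits here are plain Python lists standing in for encoder and decoder outputs.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def log_softmax(row):
    """Numerically stable log-softmax over a list of scores."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]

def contrastive_loss(sensor_emb, text_emb, temperature=0.1):
    """Symmetric InfoNCE: the i-th sensor segment should match the i-th caption."""
    s = [normalize(v) for v in sensor_emb]
    t = [normalize(v) for v in text_emb]
    n = len(s)
    # Pairwise cosine similarities, scaled by the temperature.
    logits = [[sum(a * b for a, b in zip(s[i], t[j])) / temperature
               for j in range(n)] for i in range(n)]
    sensor_to_text = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    text_to_sensor = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return 0.5 * (sensor_to_text + text_to_sensor)

def caption_loss(token_logits, target_ids):
    """Mean next-token cross-entropy for one caption."""
    total = sum(log_softmax(scores)[target]
                for scores, target in zip(token_logits, target_ids))
    return -total / len(target_ids)

def combined_loss(sensor_emb, text_emb, token_logits, target_ids, caption_weight=1.0):
    """Contrastive term plus a weighted generative term (weighting is an assumption)."""
    return (contrastive_loss(sensor_emb, text_emb)
            + caption_weight * caption_loss(token_logits, target_ids))

# Toy batch: correctly paired embeddings should score a lower loss than mismatched ones.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

The contrastive term pulls matching sensor and text embeddings together, which supports retrieval and classification, while the caption term trains a decoder to produce text conditioned on the sensor input.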

Key Capabilities and Scaling Behaviors

SensorLM demonstrates several key capabilities, including:

Zero-Shot Sensor Understanding: Accurately classifying sensor data into one of 20 activities without any fine-tuning.
Few-Shot Learning: Quickly learning from a limited number of examples.
Sensor-Text Alignment and Retrieval: Enabling cross-modal understanding between sensor data and language descriptions, so that descriptions can be queried with sensor input and specific sensor patterns can be found with natural language.
Sensor Caption Generation: Generating coherent and factually correct captions directly from sensor data, surpassing generic language models.
Scaling Behavior: Performance consistently improves with more data, larger model sizes, and increased computation.
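
Zero-shot classification and cross-modal retrieval both reduce to comparing embeddings in a shared space. The sketch below assumes a sensor encoder and a text encoder already exist and replaces their outputs with hand-written 3-dimensional vectors; the helper names and the vectors are hypothetical and only illustrate the mechanism.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def zero_shot_classify(sensor_embedding, label_embeddings):
    """Pick the label whose text embedding is closest to the sensor embedding."""
    return max(label_embeddings,
               key=lambda label: cosine(sensor_embedding, label_embeddings[label]))

def retrieve(query_embedding, candidates, top_k=1):
    """Rank candidates (id -> embedding) by similarity to the query embedding."""
    ranked = sorted(candidates,
                    key=lambda cid: cosine(query_embedding, candidates[cid]),
                    reverse=True)
    return ranked[:top_k]

# Hypothetical text embeddings for three activity descriptions.
label_embeddings = {
    "light swim": [0.9, 0.1, 0.0],
    "strength workout": [0.1, 0.9, 0.1],
    "sleep": [0.0, 0.1, 0.9],
}
# Hypothetical sensor embeddings for two stored segments.
sensor_library = {
    "segment_a": [0.8, 0.2, 0.0],
    "segment_b": [0.1, 0.1, 0.9],
}

sensor_segment = [0.2, 0.8, 0.1]
predicted = zero_shot_classify(sensor_segment, label_embeddings)        # sensor -> label
closest_texts = retrieve(sensor_segment, label_embeddings, top_k=2)     # sensor -> text
matching_segments = retrieve(label_embeddings["sleep"], sensor_library) # text -> sensor
print(predicted, closest_texts, matching_segments)
```

No classifier is trained here: adding a new activity only requires embedding its text description, which is what makes the zero-shot setting possible.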

Conclusion

SensorLM represents a significant step toward understanding wearable sensor data through natural language. By bridging the gap between raw data and human-understandable language, SensorLM paves the way for personalized health insights, digital health coaches, clinical monitoring tools, and personal wellness applications. Future plans include scaling pre-training data into new domains such as metabolic health and sleep analysis.

Source: Google Research
