In a previous post on Meshtastic data processing we explored how individual data packets are received from an MQTT broker and how those messages are initially handled. This article builds on that foundation and focuses on data transformation, a critical step that ensures received data can be correctly interpreted and used within an application. To revisit the previous post, see:
As part of Lemuridae Labs' processing pipeline, several steps occur between receiving raw IoT messages and making them actionable. This post breaks down those steps, highlighting challenges and key considerations in normalizing, decrypting, and decoding Meshtastic data for further processing.
The goal of this article is to provide insight into:
By understanding these transformation processes, developers working with MQTT-based IoT networks, and Meshtastic in particular, can design more robust and scalable data handling mechanisms.
Data Transformation Overview
In data integration it is normal to receive data in one or more formats that do not directly align to the internal system’s requirements. This may be due to differences in field encoding, gaps in data that must be filled, or simply data structures that are optimized for transport and not processing.
Although data transformation is seen as a mechanical process, issues often arise when there are assumptions made in data fields, structures, units of measure, and ultimately, meaning. Data relationships often impact data streams, and understanding the “what” of data is critical.
When getting repeated records for the same base, are they considered updates or replacements? When an update isn’t heard for some time, does it imply a passive delete? Does receiving a particular attribute on a record imply another, or are there rules in how the data is structured beyond simple structural data formats?
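One possible answer to these questions can be sketched in code. The example below assumes a policy where repeated records are merged as updates and a node that stays silent past a threshold is passively deleted. The `NodeTable` class, its method names, and the two-hour threshold are all hypothetical choices for illustration, not values taken from Meshtastic or from our pipeline.

```python
import time
from typing import Dict, List, Optional

# Assumed threshold for illustration only; not a Meshtastic-defined value.
STALE_AFTER_SECONDS = 2 * 60 * 60


class NodeTable:
    """Hypothetical in-memory store that merges updates and expires silent nodes."""

    def __init__(self, stale_after: float = STALE_AFTER_SECONDS) -> None:
        self.stale_after = stale_after
        self.records: Dict[str, dict] = {}
        self.last_heard: Dict[str, float] = {}

    def apply(self, node_id: str, fields: dict, now: Optional[float] = None) -> dict:
        """Treat an incoming record as an update: only supplied fields change."""
        now = time.time() if now is None else now
        record = self.records.setdefault(node_id, {})
        # Fields that are absent (None) leave the previously known value intact.
        record.update({k: v for k, v in fields.items() if v is not None})
        self.last_heard[node_id] = now
        return record

    def expire(self, now: Optional[float] = None) -> List[str]:
        """Treat prolonged silence as a passive delete and return the removed IDs."""
        now = time.time() if now is None else now
        stale = [n for n, t in self.last_heard.items() if now - t > self.stale_after]
        for node_id in stale:
            del self.records[node_id]
            del self.last_heard[node_id]
        return stale
```

A replacement policy would instead overwrite the whole record in `apply`. Which behavior is right depends on what the sender intends, which is the point of the questions above.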
Although it is generally straightforward to transform a data record into a new structure or format, as records become more complex the data produced by the transformation must track the actual intent of the data, for both the sender and the receiver. This understanding is critical before decoding the first byte of data, to ensure the effort will be successful.
IoT data streams often involve a diversity of vendors, manufacturers, and hardware devices. Even within the same vendor, different devices often behave differently, in ways that are minor but still enough to impact the integrity of the transformation process.
Receiving Data from Meshtastic
Receiving the MQTT data from Meshtastic is relatively straightforward, as with any other MQTT-based integration. A client connects and authenticates to the MQTT service, either on a standard or encrypted port, and subscribes to one or more topics.
For our process, we connect to the public Meshtastic MQTT service and subscribe to a range of topics, so as to receive messages from different areas and groups around the world.
The connection and topic subscriptions ensure that all published messages are received along with the topic they were published on.
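Subscribing to a range of topics relies on MQTT's topic filter wildcards: `+` matches exactly one topic level and `#` matches any number of trailing levels. The following is a minimal sketch of that matching rule, useful for reasoning about which published topics a subscription will receive. The function name `topic_matches` is our own, and the sketch ignores the special handling of topics beginning with `$`.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic filter matches a concrete topic name."""
    filter_parts = topic_filter.split("/")
    topic_parts = topic.split("/")
    for i, part in enumerate(filter_parts):
        if part == "#":
            # Multi-level wildcard: only valid as the last level; it matches
            # the parent level and everything beneath it.
            return i == len(filter_parts) - 1
        if i >= len(topic_parts):
            return False
        if part != "+" and part != topic_parts[i]:
            return False
    # Without a trailing '#', the filter and topic must have the same depth.
    return len(filter_parts) == len(topic_parts)
```

For example, the filter `msh/US/#` matches `msh/US/MD` and anything published beneath it, while `msh/+/MD` matches `msh/US/MD` but not deeper topics.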
The general mechanics of MQTT are beyond the scope of this article, but additional information can be found at: https://mqtt.org/
With the subscription process, the Lemuridae Labs application will receive an encoded message, either text or binary, and will then need to transform the message accordingly.
Message decoding is addressed further below in this article.
Note that although Meshtastic allows a user to create different channels to talk and share information on, these are all shared across the same basic topics in the MQTT broker. The topics are structured as a basic group scope, such as msh/US/MD for the state of Maryland in the US, and sub-topic path values identify the sending node and other characteristics.
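As a rough illustration of working with this topic structure, the sketch below splits a received topic into its group scope and the remaining sub-topic path. The helper `parse_topic`, the `TopicInfo` type, the sample topic string, and the assumption that the sending node appears as a final segment beginning with `!` are all illustrative; verify the exact layout against the topics your broker actually delivers.

```python
from typing import NamedTuple, Optional, Tuple


class TopicInfo(NamedTuple):
    scope: str                  # group scope, e.g. "msh/US/MD"
    remainder: Tuple[str, ...]  # sub-topic path values after the scope
    node_id: Optional[str]      # assumed: last segment, when it starts with "!"


def parse_topic(topic: str, scope: str) -> Optional[TopicInfo]:
    """Split a topic into scope and sub-topic path; None if outside the scope."""
    scope = scope.rstrip("/")
    prefix = scope + "/"
    if not topic.startswith(prefix):
        return None
    parts = tuple(p for p in topic[len(prefix):].split("/") if p)
    # Assumption: the sending node is the final path segment, prefixed with "!".
    node_id = parts[-1] if parts and parts[-1].startswith("!") else None
    return TopicInfo(scope, parts, node_id)
```

Called with the hypothetical topic `msh/US/MD/2/e/LongFast/!abcd1234` and scope `msh/US/MD`, this returns the four trailing path values and `!abcd1234` as the node identifier.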
Data Decryption
When messages are published to Meshtastic channels, they may be encrypted. Published data carries a specific channel identifier, and the channel may be plain text, encrypted with a common shared key, or encrypted with a private channel key. This may be summarized below:
Meshtastic encrypts with AES in CTR mode, using a 128-bit or 256-bit key depending on the length of the channel key. This is a secure and efficient way to send and share information, and it is well supported on devices and clients.
From a process perspective, when Lemuridae Labs receives a message it checks whether the contents are plain text or encrypted, and if encrypted it attempts to decrypt them with the common shared key. The output of the decryption is then checked to see whether it is valid data. If the output is not a valid data structure, the message was presumably transmitted with a private channel key, and it is discarded.
In the event that the message is either plain text or successfully decrypted via the common shared key, the message is prepared for the binary data decoding process.
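The decision flow described in this section can be sketched as a small function. This is an outline under stated assumptions, not our production code: `prepare_payload` is a hypothetical name, and the decryption and validity checks are passed in as callables so the sketch stays independent of any particular crypto or protobuf library. In practice the first callable would wrap an AES-CTR decryption with the shared key, and the second would attempt a protobuf parse.

```python
from typing import Callable, Optional


def prepare_payload(
    payload: bytes,
    is_encrypted: bool,
    decrypt_with_shared_key: Callable[[bytes], bytes],
    looks_valid: Callable[[bytes], bool],
) -> Optional[bytes]:
    """Return bytes ready for binary decoding, or None if the message is discarded."""
    if not is_encrypted:
        # Plain-text messages go straight to the decoding step.
        return payload
    candidate = decrypt_with_shared_key(payload)
    if looks_valid(candidate):
        return candidate
    # Decryption produced garbage: most likely a private channel key was used.
    return None
```

CTR mode has no built-in integrity check, so decrypting with the wrong key still yields bytes. That is why the validity check on the output is what distinguishes shared-key traffic from private-channel traffic.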
Meshtastic Binary Decoding
Meshtastic data published to MQTT servers is, by default, encoded in a binary format called Protocol Buffers (Protobuf). Protobuf was created by Google for compact, efficient exchange of information between servers, and it allows a data exchange to be built from a schema. The protobuf schema is precise, defining permitted data types and structures, which allows data to be structurally validated at receive and decode time.
A source of information for protobuf can be found at:
For Meshtastic, the protobuf schema files are available as part of the open source effort, and can be found at:
https://github.com/meshtastic/protobufs
Although the specific protobuf files won’t be detailed in this post, a key point is that all data exchanged between devices is also routed via MQTT if configured, and thus the protobuf schema files act as a comprehensive interface standard for all Meshtastic messages.
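To make the binary format less opaque, here is a minimal sketch of walking the protobuf wire format by hand. It only splits a message into (field number, wire type, raw value) triples and checks that the bytes are structurally well formed. It knows nothing about the Meshtastic schema, and in a real pipeline the classes generated from the schema files would do this work. The function names `read_varint` and `iter_fields` are our own.

```python
from typing import Iterator, Tuple, Union


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear marks the last byte
            return result, pos
        shift += 7
        if shift >= 70:  # a varint is at most 10 bytes long
            raise ValueError("varint too long")


def iter_fields(buf: bytes) -> Iterator[Tuple[int, int, Union[int, bytes]]]:
    """Yield (field_number, wire_type, value) for each top-level field."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # VARINT
            value, pos = read_varint(buf, pos)
        else:
            if wire_type == 1:  # I64: fixed 8 bytes
                length = 8
            elif wire_type == 5:  # I32: fixed 4 bytes
                length = 4
            elif wire_type == 2:  # LEN: length-prefixed bytes
                length, pos = read_varint(buf, pos)
            else:
                raise ValueError(f"unsupported wire type {wire_type}")
            end = pos + length
            if end > len(buf):
                raise ValueError("truncated field")
            value = buf[pos:end]
            pos = end
        yield field_number, wire_type, value
```

A parse that raises `ValueError` here is one cheap signal that a payload is not valid protobuf data, although random bytes can occasionally pass a purely structural check.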
Field Calculations and Conversions
Reviewing the Meshtastic protobuf files is worthwhile, and the comments and notes are informative. This helps to understand the encoding of data being received, and helps to ensure the transformation process is properly configured. For example, looking at a latitude protobuf field:
/*
* The new preferred location encoding, multiply by 1e-7 to get degrees
* in floating point
*/
optional sfixed32 latitude_i = 1;
With this we can see that the latitude_i field is optional, and has the sfixed32 data type. Going to the protobuf reference, this data type is defined as:
i32 := sfixed32 | fixed32 | float;
encoded as 4-byte little-endian;
memcpy of the equivalent C types (u?int32_t, float)
The i32 wire type covers three data types, and sfixed32 is the signed integer among them. Combined with the comment above the field definition, this tells us that the latitude is transmitted as a 4-byte little-endian signed integer, and that multiplying the integer by 1e-7 yields the latitude in floating-point degrees.
Some fields have a simpler encoding, while others are enumerations with a list of possible values.
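The conversion can be shown directly on the raw bytes. This is a small sketch using only the standard library; `decode_latitude_i` is a hypothetical helper name, and a generated protobuf class would normally hand you the integer already unpacked. The key detail is that the four bytes are read as a signed integer (`<i`), not as a float, before scaling.

```python
import struct


def decode_latitude_i(raw: bytes) -> float:
    """Convert the 4-byte sfixed32 payload of latitude_i into degrees."""
    if len(raw) != 4:
        raise ValueError("sfixed32 values are exactly 4 bytes")
    # "<i" = little-endian signed 32-bit integer, matching sfixed32.
    (latitude_i,) = struct.unpack("<i", raw)
    # Per the schema comment: multiply by 1e-7 to get degrees.
    return latitude_i * 1e-7
```

For instance, the integer 390000000 packed as four little-endian bytes decodes to approximately 39.0 degrees, and negative integers give southern latitudes.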
Application Normalized Output
Data normalization is a standard process when integrating data between applications, whether receiving data from one source or many. In this case, the Meshtastic data structures, although optimized for the “over the air” data exchanges, are not optimal for the subsequent processing and data storage within the Mesh Stats application.
For this last step, the received and decoded data is converted into internal data structures focused on the action or event received. Information such as position locations, node information, chat messages, and a range of other messages is converted into internal data structures that aid the ongoing processing steps.
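As an illustration of what such an internal structure might look like, the sketch below converts a decoded position message into an event record. The `PositionEvent` class, its fields, and the `normalize_position` helper are hypothetical and do not reflect the actual Mesh Stats data model. The `longitude_i` key is assumed to mirror the `latitude_i` field shown earlier, and the decoded message is represented as a plain dictionary for simplicity.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class PositionEvent:
    """Hypothetical internal record for a position report."""
    node_id: str
    latitude: float        # degrees
    longitude: float       # degrees
    received_at: datetime  # when our system received the message


def normalize_position(
    node_id: str, decoded: dict, received_at: datetime
) -> Optional[PositionEvent]:
    """Map decoded wire fields to the internal structure; None if incomplete."""
    lat_i = decoded.get("latitude_i")
    lon_i = decoded.get("longitude_i")  # assumed counterpart of latitude_i
    if lat_i is None or lon_i is None:
        # Both fields are optional on the wire; skip reports without a fix.
        return None
    return PositionEvent(node_id, lat_i * 1e-7, lon_i * 1e-7, received_at)
```

The unit conversion and the receive timestamp are applied once at this boundary, so later processing steps never need to know about the wire encoding.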
To be clear, it is possible to have an application operate solely on an external system’s structures and data formats. However, this is often limiting, as anything from database identifiers and timestamps to converted data fields requires some level of structural change. Rather than basing a system on an external data interface, it is generally easier and more efficient to design around the requirements of the application’s own processing and storage.
Summary
In this article we have decomposed the transformation block into several detailed steps, and walked through the process of consuming data from the Meshtastic MQTT broker and transforming it into usable data within our system. The next article will discuss the filtering performed on inbound messages, and later articles will discuss data enrichment and ultimately the processing performed. Finally, we will discuss the various visualizations and integration of this data into other AI-augmented processes.
We are always interested in feedback and questions, and are happy to provide additional details and insights in future articles as we continue to decompose additional Meshtastic data processing and analysis activities.