Design principles for unified edge device architecture

This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Community Hub.

The act of designing, deploying, and maintaining an IoT solution involves the close examination of many criteria, ranging from the business needs of the organization to the realities of supply chain influences over the lifecycle of the components involved. Some factors to consider are obvious, such as dashboards and telemetry data, while others are more esoteric and not always top of mind when beginning the design of a solution. In this series of articles, we hope to cover a breadth of considerations as they pertain to the device(s) needed to provide data or effect changes on premises. Whereas the Well Architected Framework for IoT pays attention to the holistic solution, ranging from device concerns to data ingestion to the cloud as well as cloud-side architecture, these complementary documents focus more directly on the device and its role in the solution. In this article, we use the term Unified Edge Device Architecture to represent a methodology of designing, deploying, and maintaining IoT devices as part of an IoT solution.

This series of articles will be structured as follows:

Design Principles (this document). This article's intent is to formulate a set of pillars key to a cohesive IoT Unified Edge Device Architecture. The intended audience for this initial article is both device makers as well as enterprises utilizing IoT solutions. The rest of this series of articles will best serve device makers by providing deeper context and implementation detail.
Device Management. This article will focus on the techniques and technologies to consider when managing the set of software running on the device. This includes firmware, operating systems, runtimes, applications, and container-based workloads.
Connectivity Management at Scale. This article will discuss connectivity topics from devices retrieving data from sensors on premises and connecting to the cloud.
Data Management at Scale. This article addresses the needs from a device perspective for managing data. This will range from AI/ML models to data aggregation, encryption, and transfer protocols and methods.
Security Management at Scale. This article will cover security topics ranging from a hardware root of trust to provide a device identity through ongoing security monitoring.

Design Principles

Design Criteria

Designing an IoT device to be effective in production and at scale requires a careful examination of many different facets, ranging from supply chain to power consumption to communication standards as well as lifecycle. This paper is not intended to be a comprehensive list of concerns that, if addressed, will yield the perfect device for a project. Instead, this paper attempts to introduce some typical concerns device makers face and a methodology for making your own assessment as to what constitutes an appropriate design for your use case.

When formulating a set of criteria that needs to be satisfied for your device design, it is instructive to consider the personas of those who will be impacted by the device. Once you identify those personas, visit each of the design pillars and ask questions from the perspective of those personas. Each will surface topics for which you need to provide a reasonable resolution within your design specification.

One way of identifying these personas is to consider the lifecycle of an IoT device. We view this as a cycle consisting of 5 stages.

Plan. Group devices and control access according to your organization’s needs.
Provision. Securely authenticate devices, on-board for management and provision for service.
Configure. Provide updates, configuration, and applications to assign the purpose of each device.
Monitor. Monitor device inventory, health and security while providing proactive remediation of issues.
Retire. Replace or decommission devices after failure, upgrade cycle or service lifetime.

Much of the service life of a device will be in a virtuous cycle of configure/monitor within the overall lifecycle of an IoT device. This represents the ongoing steady state of the operation of the device (providing telemetry, receiving and applying updates, etc.).
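The lifecycle above can be sketched as a small state machine. This is only an illustration: the stage names come from the list above, but the exact set of allowed transitions (for example, that a device can only be retired from the Monitor stage) is an assumption made for the sketch, and names such as `ALLOWED` and `advance` are hypothetical.

```python
from enum import Enum


class Stage(Enum):
    PLAN = "plan"
    PROVISION = "provision"
    CONFIGURE = "configure"
    MONITOR = "monitor"
    RETIRE = "retire"


# Assumed transitions for illustration. The configure <-> monitor pair models
# the steady-state cycle in which a device spends most of its service life.
ALLOWED = {
    Stage.PLAN: {Stage.PROVISION},
    Stage.PROVISION: {Stage.CONFIGURE},
    Stage.CONFIGURE: {Stage.MONITOR},
    Stage.MONITOR: {Stage.CONFIGURE, Stage.RETIRE},
    Stage.RETIRE: set(),
}


def advance(current: Stage, target: Stage) -> Stage:
    """Return the target stage if the transition is allowed, else raise."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


# A device that is planned, provisioned, cycles through configure/monitor
# twice, and is then retired.
stage = Stage.PLAN
for nxt in (Stage.PROVISION, Stage.CONFIGURE, Stage.MONITOR,
            Stage.CONFIGURE, Stage.MONITOR, Stage.RETIRE):
    stage = advance(stage, nxt)
```

A real design may need additional transitions, such as returning to Provision when a device is redeployed elsewhere in the field.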

As you can imagine, multiple personas start to become apparent when considering these various phases. For example, during the Plan phase, supply chain realities need to be considered relative to the hardware BOM (bill of materials), deployments need to be planned as it relates to how devices are to be grouped, amongst others. During the Provision phase, special consideration needs to be placed on establishing a device identity that can be trusted and how to associate that with the IoT cloud solution.

Operations. How will the device be installed and provisioned? What is required from the device to enable those goals? This can be as basic as requiring a communication path to the device or as involved as understanding what is required from a security perspective to ensure that these controls are limited to those tasked with operating the devices at scale (both on-site with the device and remotely across a fleet of devices).
Workload developer. What agents will need to run on the device to satisfy its function? What does that imply from a hardware BOM (bill of materials) perspective? How will the application be deployed and serviced over time? For example, will this require the Azure IoT Edge runtime to run and manage a container-based workload?
Security expert. Every facet of this design can potentially introduce security concerns to address. Will the device have a hardware root of trust? What certificates are necessary to ensure a secure connection with the cloud? How will those certificates be renewed over time? How can those certificates be revoked? Is there a need for secure storage on the device? Which secure protocols best match the deployment environment coupled with the workload transmission rates required of this device?

These are just 3 examples of personas that you may wish to consider when formulating the criteria for your design. It would be beneficial to read the remainder of this paper with your set of personas in mind.

Device Management

Device management is a crucial process for keeping the devices in a network working properly. It includes device provisioning and authentication, operating system updates, device configuration, application updates, and the ability to monitor and perform diagnostics on the IoT environment.

Device Management usually has many aspects, but it can be categorized into some main “areas”:

Device & Platform Management Itself. Maintain the device platform itself. This could include functionality or security updates. Keeping the OS up to date and secure is critical for IoT devices. Weaknesses and vulnerabilities in deployed devices will be exploited by malicious actors.
Application management (or Process management). This is to ensure that the correct applications and versions are running on the system and provide a method to deliver new or updated applications to a fleet of devices. These are normally running on top of the device platform layer.
Provisioning and authentication. This is to identify the device and create/sustain a relationship with the IoT solution. This includes the initial state and re-provisioning as appropriate if the device needs to be re-deployed elsewhere in the field.
Configuration and control. This is intentionally broad and is defined from the scope of the required function of the device. This could include updates to the OS settings, application settings, security settings, etc.
Monitoring and diagnostics. This area focuses on the health of the device both from a foundational perspective (does the device provide a heartbeat, are all relevant processes/services running, are peripherals or sensors still viable and providing data, etc.) and a functional perspective (ongoing telemetry gathered by the application).

These main “areas” separate app deployment from infrastructure management. This allows continuous app deployment and management to be on a different release cadence from platform management. This is sometimes referred to as the data plane and control plane.

It can also be seen in certain Industry sectors where Operational Technology (OT) is separated from Information/Infrastructure Technology (IT).
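As a rough illustration of keeping application and platform management on separate cadences, the sketch below compares a desired state with the state a device reports, one layer at a time. It is loosely modeled on the general desired/reported pattern; the component names, version strings, and the `pending_updates` helper are all hypothetical.

```python
def pending_updates(desired: dict, reported: dict) -> dict:
    """Return the components whose reported version differs from the desired one."""
    return {
        name: version
        for name, version in desired.items()
        if reported.get(name) != version
    }


# Platform and application layers are tracked separately, so each layer can
# be rolled out on its own release cadence. All values are made up.
desired = {
    "platform": {"os": "5.15.2", "firmware": "1.4.0"},
    "applications": {"telemetry-agent": "2.3.1", "vision-module": "0.9.0"},
}
reported = {
    "platform": {"os": "5.15.2", "firmware": "1.3.7"},
    "applications": {"telemetry-agent": "2.3.1"},
}

for layer in desired:
    print(layer, pending_updates(desired[layer], reported.get(layer, {})))
```

Here the platform layer has one outstanding firmware update and the application layer has one module that has not been deployed yet; the two can be scheduled independently.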

Platforms for IoT

One of the key components of an IoT system is the device, and with it comes a weighty decision: choosing the operating system or runtime that will run the application and that can integrate with a management tool. Microsoft offers different OSes across the Azure IoT portfolio, from Windows for IoT to Azure RTOS to Azure Sphere. At this point, you may be wondering which operating system best suits your needs. To facilitate the decision-making process, consider the following questions:

What level of reliability and long-term support is needed? Each device will be used for a specific purpose, whether in products such as ATMs, vending machines, autonomous cars, or as part of complex manufacturing processes, so each product will have a certain lifecycle. It is important to know the support that the hardware and the operating system will have in the future. If it is a specialized system, it is also convenient to know what type of security certifications the operating system provides and how easy it is to integrate with user interfaces.
What are your performance requirements? The device on which the operating system runs must have enough memory and storage for the operating system itself and for the applications that will perform specific tasks. Some tasks require more processing capacity than others, and some may require lower latency at the cost of higher energy consumption.
Will this OS bring security to the device? As the number of IoT devices increases, networks become more vulnerable. A compromised device could result in stolen information, system hacking and malicious actions by the perpetrators. There are several layers of security that can help contain attacks, among them is having an operating system robust in security features, such as multi-layer defenses, renewable security systems, data encryption capacity, authentication based on certificates, among others.
Does this OS offer scalability? For some solutions it is easy to provide the estimated time when their use will be most in demand, so the use of an operating system that is designed to cover future demands is recommended.
What application platforms/libraries, such as container environments or middleware, are viable on the OS or runtime, and are they well suited to the needs of the application to be written for the IoT device?

Device Management providers

Once the platform (hardware + OS) on which the IoT solution will run has been determined, we can choose a platform to manage the solution that is compatible with it and fits our requirements. Assess what aspects of the DM provider are important to your organization, including hosting and pricing. Consider compatibility with your devices, the ability to deploy devices at scale, authentication mechanisms the provider supports, integration with the cloud, integration with data analysis and visualization services, data transfer protocols supported, physical location of the devices, frequency, and type of updates.

The tools that exist for managing devices are varied and will depend on the requirements of each IoT solution and the technical capabilities of the devices.

IoT Hub is a PaaS (platform as a service) that allows organizations to on-board and manage devices that run on a broad range of operating systems, including Windows IoT Enterprise, Linux, Sphere OS, Azure RTOS, and Raspbian, among others. Its features include provisioning devices and connecting them to the Azure cloud, enabling over-the-air updates, grouping and nesting devices, and monitoring the health and security of the IoT network. Learn more about IoT Hub at IoT Concepts and IoT Hub | Microsoft Docs.

When the intention is to provision large numbers of devices just-in-time and without human intervention, automatically load-balance devices across multiple hubs and regions, reprovision devices, or roll keys en masse, you might want to use DPS (Device Provisioning Service), a service that helps IoT Hub provision devices in a scalable and secure way.
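To make the idea of load-balancing devices across multiple hubs concrete, here is a minimal sketch of a deterministic, weight-proportional assignment. It is not how DPS is implemented and does not use any DPS API; the hub names, weights, registration IDs, and the `assign_hub` function are assumptions made purely for illustration.

```python
import hashlib


def assign_hub(registration_id: str, hub_weights: dict) -> str:
    """Deterministically map a device to a hub, in proportion to hub weights."""
    total = sum(hub_weights.values())
    if total <= 0:
        raise ValueError("at least one hub must have a positive weight")
    # Hash the ID so the same device always lands on the same hub.
    digest = hashlib.sha256(registration_id.encode("utf-8")).digest()
    point = int.from_bytes(digest[:8], "big") % total
    # Walk the hubs in a stable order until the point falls inside a hub's share.
    for hub, weight in sorted(hub_weights.items()):
        if point < weight:
            return hub
        point -= weight
    raise AssertionError("unreachable: point is always below the total weight")


# Hypothetical hubs: two equally weighted regions and one that is drained.
hubs = {"hub-east": 1, "hub-west": 1, "hub-maintenance": 0}
print(assign_hub("device-0001", hubs))
```

Because the assignment depends only on the device's ID and the weights, a device that reprovisions with unchanged weights is sent back to the same hub, and a hub with weight zero receives no new devices.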

Connectivity Management at Scale

Two broad categories of connectivity will be considered: short range and wide area. Short range is often a local RF (Radio Frequency) source, and examples include Bluetooth, Bluetooth Low Energy (BLE), 802.15.4-based protocols, proprietary sub-GHz, and LoRa, among others. There are techniques that increase the base RF range by creating networks, typically either a simple repeater scheme or a mesh network. Examples of mesh network-based systems include Zigbee and BLE Mesh; LoRaWAN, by contrast, extends LoRa coverage through a star-of-stars arrangement of gateways rather than a mesh. Range can be extended at the cost of network latency. It takes time for the packets to propagate between repeaters regardless of protocol or topology.

Managing connectivity is much easier if the radios themselves are a viable choice for the task at hand. In radios, there are two general principles that are good to know. First, the faster the data rate the lower the radio sensitivity. Therefore, faster is not always better as it will have a shorter range if all other factors are equal. The second is that the lower the frequency the better the RF signal will propagate and the better it will be at going through objects. There are many other factors that influence RF choices including interference, low power considerations, and compliance issues just to name a few.
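Both principles can be seen in a back-of-the-envelope link budget. The sketch below uses the standard thermal-noise sensitivity estimate and the free-space path loss formula; it assumes that a faster data rate needs a wider receiver bandwidth, and the noise figure, required SNR, transmit power, and bandwidths are illustrative values, not figures for any particular radio. Free-space loss ignores walls, interference, and antenna gains, so the absolute ranges it produces are far more optimistic than real deployments, and it does not model how well a signal passes through objects.

```python
import math

THERMAL_NOISE_DBM_PER_HZ = -174.0  # thermal noise density at about room temperature


def sensitivity_dbm(bandwidth_hz, noise_figure_db=6.0, required_snr_db=10.0):
    """Estimate receiver sensitivity; a wider bandwidth raises the noise floor."""
    return (THERMAL_NOISE_DBM_PER_HZ + 10 * math.log10(bandwidth_hz)
            + noise_figure_db + required_snr_db)


def free_space_range_m(tx_power_dbm, rx_sensitivity_dbm, frequency_hz):
    """Distance at which free-space path loss uses up the whole link budget."""
    link_budget_db = tx_power_dbm - rx_sensitivity_dbm
    # FSPL(dB) = 20*log10(d_m) + 20*log10(f_Hz) - 147.55, solved for d_m.
    return 10 ** ((link_budget_db + 147.55 - 20 * math.log10(frequency_hz)) / 20)


# Principle 1: a narrowband (slower) link is more sensitive than a wideband one.
slow = sensitivity_dbm(bandwidth_hz=125e3)
fast = sensitivity_dbm(bandwidth_hz=2e6)
print(f"sensitivity: {slow:.1f} dBm (125 kHz) vs {fast:.1f} dBm (2 MHz)")

# Principle 2: with everything else equal, a lower frequency reaches farther.
for freq in (868e6, 2.4e9):
    reach = free_space_range_m(tx_power_dbm=14.0, rx_sensitivity_dbm=fast,
                               frequency_hz=freq)
    print(f"{freq / 1e6:.0f} MHz: free-space range of roughly {reach:.0f} m")
```

In this model the range scales inversely with frequency, so the sub-GHz link reaches a little under three times as far as the 2.4 GHz link for the same budget.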

Despite repeaters or a mesh extending the basic range, these networks are not global in the way that telecommunication networks or the internet are. How each network type manages connectivity will be different. Indeed, there are many different management schemes, often dependent on which protocol is being used. In the simplest case, a single protocol is used for all devices on the network. More often, an installation will want to use multiple protocols, as real-world installs are usually a mix of battery-powered devices, which are optimized to spend as much time as possible in a low-power sleep mode, and mains-powered devices. To deal with the complexity introduced by various protocols with different control mechanisms, connectivity management platforms arose. A connectivity management platform can be defined as a platform used by a service provider to provision devices or users on a wireless network and manage them. Several companies currently offer connectivity management platforms. Connectivity management at scale blurs into device management at scale. In a perfect world, you would have a single pane of glass that shows both, as devices and their connectivity are intertwined and having one without the other is not useful.

Some of the tasks that should be considered are provisioning devices onto the network, managing network connectivity issues (e.g., data consumption or cellular subscriptions), how diverse types of networks will be handled (e.g., separate tools for separate networks or try and get everything all under one tool), and geographic scope (e.g., local networks or larger regions or even worldwide). Note that as the exposure of a device increases the security needs will increase also. Networks that are local and not internet connected directly do not have the attack surface of internet-facing devices.

The above considerations are fairly high level but provide a good starting point for factors to consider. Digging deeper into the specifics of the exact problem being solved will begin to dictate design choices. In many cases, contradictory design goals will force compromises between range, performance, power needs, and latency.

Data Management at Scale

Some would say that data management is the primary job of an IoT device. IoT solutions use ingested data to direct all their reporting and alerting, and to drive the insights needed to trigger actions. Data is at the core of every IoT solution. From the device perspective, there are multiple facets to consider for its participation in the data estate. In many cases, the device in question will be gathering the data, either from directly attached sensors or as the first IP (Internet Protocol) capable device receiving telemetry from non-IP capable sensors, such as those connected over Bluetooth or serial.

The IoT device may need to act on the data as it receives it. It may also need to be able to analyze data gathered while offline. It is helpful to think of the needs of the data in 3 modes:

Hot path. Loosely speaking, this refers to data where the use case demands recognizing, in real time, that immediate action needs to be taken. This could be an extreme temperature, vibration reading, etc. The anomaly detection algorithm could be as simple as testing discrete data values against an acceptable range, or it could be more complex and require richer analysis of the data on the device itself.
Cold path. This typically refers to analyzing greater amounts of data (weeks, months, years) as a batch to determine trends over time. This larger payload of data is typically handled on servers or cloud, for reporting or even used to train or retrain machine learning models.
Warm path. The warm path is typically described as handling shorter durations of data than the cold path, with small analytics and batch processing performed on the data. This is more likely than the cold path to run on the device, but either or both could still be server or cloud hosted depending on the use case.

The different paths described above may be satisfied by different techniques and technologies and must be weighed against the capabilities of the device in question (RAM, processing power, etc.).
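A minimal sketch of the hot and warm paths on a constrained device might look like the following. The hot path is the simplest case mentioned above, a range check on each reading, and the warm path is a summary over a small rolling window. The class name, thresholds, window size, and sample readings are all hypothetical.

```python
from collections import deque
from statistics import mean


class TelemetryProcessor:
    """Range-check each reading (hot path) and summarize a rolling window (warm path)."""

    def __init__(self, low, high, window_size=5):
        self.low = low
        self.high = high
        # A bounded deque keeps memory use fixed on a constrained device.
        self.window = deque(maxlen=window_size)

    def ingest(self, value):
        """Hot path: return True if the reading is outside the acceptable range."""
        self.window.append(value)
        return not (self.low <= value <= self.high)

    def summary(self):
        """Warm path: aggregate the most recent readings, or None if there are none."""
        if not self.window:
            return None
        return {
            "count": len(self.window),
            "min": min(self.window),
            "max": max(self.window),
            "mean": mean(self.window),
        }


# Hypothetical temperature readings with an acceptable range of 0 to 80.
processor = TelemetryProcessor(low=0, high=80, window_size=3)
for reading in (20, 25, 95, 30):
    if processor.ingest(reading):
        print(f"alert: reading {reading} is out of range")
print(processor.summary())
```

The window summary, rather than every raw reading, is what such a device might forward for cold-path analysis when bandwidth or storage is limited.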

However, merely acting upon data is only part of the considerations necessary when planning data management on a device. Equal consideration must be paid to the data's security, both in transit and at rest, privacy controls and storage. Data encryption becomes a vital topic. You should design with the assumption that data in transit can be intercepted and thus protocol selection needs to be intentional and well thought out. If data is to be stored locally (warm path or even cold path), the design needs to consider what techniques are required to satisfy your security needs. If the device itself can be stolen, for example, you would not want sensitive data stored as plaintext on some removable storage. This could mean you may want to design for encrypting data before persisting it or even leveraging secure storage options. There are many tradeoffs to be considered in your design. While encryption does not have to impose a huge compute load, it does not have a zero cost either. What design tradeoffs make sense to satisfy your use cases? Learn more about Azure Confidential Computing by visiting Confidential Computing on Azure.

The design of your IoT device needs to consider how to transfer data both “to” the device (from sensors, etc.) and “from” the device (to the cloud). What requirements are necessary to satisfy your use case? Azure IoT Hub only accepts secure methods of data transfer from devices, which implies that the device must support one of those protocols. From a pattern perspective, your device may then well be acting as a translation gateway. More discussion of translation patterns can be found here: Gateways for downstream devices - Azure IoT Edge. Note that IoT Edge is not required to fulfill the pattern per se, but it does provide a great framework for doing so.
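The translation step of such a gateway can be sketched as follows: a compact binary frame from a non-IP sensor is decoded and re-expressed as a JSON message suitable for sending upstream. The 8-byte frame layout, field names, and scaling factors are invented for this example, and the actual transmission to the cloud over a secure protocol is deliberately left out.

```python
import json
import struct
from datetime import datetime, timezone

# Hypothetical little-endian sensor frame: uint16 sensor id, int16 temperature
# in hundredths of a degree Celsius, uint16 relative humidity in hundredths of
# a percent, uint16 battery voltage in millivolts.
FRAME = struct.Struct("<HhHH")


def translate(frame: bytes, received_at: datetime) -> str:
    """Decode one binary sensor frame and return it as a JSON telemetry message."""
    sensor_id, temp_raw, humidity_raw, battery_mv = FRAME.unpack(frame)
    message = {
        "sensorId": sensor_id,
        "temperatureC": temp_raw / 100,
        "humidityPercent": humidity_raw / 100,
        "batteryMv": battery_mv,
        "receivedAt": received_at.isoformat(),
    }
    return json.dumps(message, sort_keys=True)


# Simulate a frame arriving from a serial or Bluetooth attached sensor.
sample = FRAME.pack(7, -550, 4512, 3000)
print(translate(sample, datetime.now(timezone.utc)))
```

Keeping the decoding in one place like this means a new sensor type only needs a new frame definition, while the upstream message shape stays stable for the cloud side.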

Security Management at Scale

IoT security breaches can have a significant impact on an organization's revenue, legal exposure, and costs, among other factors. Devices can be rendered useless, attackers can take control of devices and use them for malicious purposes, data could be polluted, and confidential data could be stolen. Securing edge devices at scale, including data protection, connectivity, communication, and lifecycle management of the related security attributes once edge devices are out in the wild, is always challenging and is a top priority for customers.

Microsoft builds security for the enterprise and the technology you already have, not just for Microsoft technology. This is possible because we constantly invest in our network of solution integration and MDR/MSSP partners and work closely with external organizations such as NIST (National Institute of Standards and Technology), CIS (Center for Internet Security), The Open Group, CERTs, ISACs (Information Sharing and Analysis Centers), law enforcement agencies (for botnet takedowns), and others to bring in best-in-class security solutions that are secure from the silicon up. Before diving into the solution aspects of security management at scale, we recommend adopting a Zero Trust Framework mindset from the outset and gathering key requirements for your product or devices.

Luckily, there are plenty of resources available to device manufacturers and solution integrators to help define requirements for a safe and secure IoT solution. A great starting point, Essential Properties of Secure Connected Devices, can help identify foundational security requirements for your IoT device. Additionally, several best practices both from a product engineering perspective as well as a deployment and configuration perspective can be found in Windows 10 Secured-core PCs | Microsoft Learn.

Because security is a broad topic, it is helpful to consider multiple perspectives that can help inform your approach, such as whether your target use case is a new (greenfield) or existing (brownfield) deployment. If you are working with an existing deployment, what special considerations might you need to think about given the constraints introduced by those legacy systems? If they are already connected, how are they currently managed, and how might that need to evolve or be adapted to address your new solution? Given the connectivity requirements of the IoT device (such as BLE, Zigbee/Matter, Wi-Fi, Ethernet, or cellular), pay attention to practices for securing those exchanges. If your device may store data locally, such as PII (personally identifiable information), encryption of that data at rest becomes an important design criterion. If that data is transmitted off the IoT device, you would again want to ensure that it is encrypted to prevent plaintext interception of sensitive information. A highly secure device begins with the “hardware root of trust.” Without a foundation upon which to build, any stack is subject to attack. What, if any, is a viable approach to a hardware root of trust for your device design (a TPM, or Trusted Platform Module, or something else)?

All of these are important considerations to manage, including how to control the security configuration state (firmware updates, policy updates, etc.) at scale. For your design, would this fall naturally under your Device Management design, or do you need to consider approaches specifically designed around security? As with all the other topic areas raised in this article, costs must be considered. What tradeoffs are reasonable to make to achieve the business objective of the IoT device?

Conclusion

The act of designing a holistic IoT solution involves many topic areas – as considered in the Well Architected Framework for IoT (WAF for IoT). This article has focused on topics more germane to the IoT devices that comprise the solution. Once a set of criteria has been generated given all the considerations introduced in each section, device manufacturers can then use standard DevOps processes and Azure device SDKs to create a compelling IoT device that will integrate easily into an IoT solution. IoT solution integrators can use these devices with confidence knowing that their IoT device partners have designed their products with this holistic vision driving their design criteria.

This article acts as the introduction to a series of articles that take a more thorough look at each section mentioned above. The intent is to highlight a methodology for considering multiple aspects relevant to IoT device design, not to provide a complete checklist. It is our belief that an ecosystem of partners that share an understanding of how to design and integrate IoT devices into an IoT solution will accelerate adoption of both the devices and the solutions that use them.