By Utpal Mangla (IBM), Luca Marchi (IBM), Shikhar Kwatra (AWS)
Smart edge devices across the world continuously interact and exchange information through machine-to-machine communication protocols. A multitude of similar sensors and endpoints generate vast amounts of data that must be analyzed in real time and used to train complex deep learning models. From autonomous vehicles running computer vision algorithms to smart assistants like Alexa running natural language processing models, deep learning has extended to a wide range of applications.
Using deep learning, such edge devices can aggregate and interpret unstructured information (audio, video, text) and take corrective actions, although this comes at the cost of higher power consumption and greater compute demands. Deep learning models require substantial computation time and resources to process the streams of information generated by these devices and sensors, and that information can scale very quickly. Hence, bandwidth and network requirements must be managed carefully to keep such deep learning deployments scalable.
If machine learning and deep learning models are not deployed directly on the edge devices, these devices have to offload the intelligence to the cloud. Without a good network connection and sufficient bandwidth, this can be quite costly.
However, because these edge and IoT devices are designed to run on low power, deep learning models need to be tuned and packaged efficiently to run smoothly on them. Furthermore, many existing deep learning models and complex ML use cases rely on third-party libraries that are difficult to port to such low-power devices.
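One common way to "tune and package" a model for low-power hardware is post-training quantization: storing float32 weights as 8-bit integers plus a scale and a zero point, which cuts weight storage roughly fourfold (1 byte instead of 4 per weight). The sketch below is a plain-Python illustration of per-tensor affine (asymmetric) int8 quantization, chosen here as an example technique rather than taken from the authors; the function names and sample weights are hypothetical, and production toolchains such as the TensorFlow Lite converter implement their own, more elaborate schemes.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float, int]:
    """Per-tensor affine quantization: real_value ~= scale * (q - zero_point)."""
    # Include 0.0 in the range so that zero is exactly representable.
    w_min = min(min(weights), 0.0)
    w_max = max(max(weights), 0.0)
    scale = (w_max - w_min) / 255 or 1.0  # 256 int8 levels; guard against all-zero input
    zero_point = round(-128 - w_min / scale)
    # Round to the nearest level, shift by the zero point, and clamp to the int8 range.
    q = [max(-128, min(127, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point


def dequantize_int8(q: list[int], scale: float, zero_point: int) -> list[float]:
    """Map int8 codes back to approximate float values."""
    return [scale * (v - zero_point) for v in q]


if __name__ == "__main__":
    weights = [-0.8, -0.1, 0.0, 0.3, 1.2]  # hypothetical layer weights
    q, scale, zp = quantize_int8(weights)
    print("int8 codes:", q, "scale:", scale, "zero_point:", zp)
    approx = dequantize_int8(q, scale, zp)
    print("max abs error:", max(abs(a - b) for a, b in zip(weights, approx)))
```

The reconstruction error for in-range values stays within half a quantization step (`scale / 2`), which is usually tolerable for inference while greatly reducing memory and bandwidth.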
The global edge computing in manufacturing market is projected to reach USD 12,460 million by 2028, up from USD 1,531.4 million in 2021, at a CAGR of 34.5% during 2022-2028.
Meanwhile, the global edge artificial intelligence (AI) chips market was valued at USD 1.8 billion in 2019 and is expected to grow at a compound annual growth rate (CAGR) of 21.3% from 2020 to 2027.
Recent experiments have shown that moving complex models onto edge devices can effectively reduce power consumption. However, pushing such deep learning models to the edge did not necessarily meet latency requirements. Hence, running inference at the edge remains use-case dependent and may not work for every task at this point.
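Why the choice is use-case dependent can be made concrete with a back-of-the-envelope latency model. The sketch below is illustrative only: the `Workload` and `Link` types, the formula (upload time plus round-trip time plus remote compute time for the cloud path), and every number in the usage example are hypothetical assumptions, and it deliberately ignores energy, queuing, and monetary cost.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    input_bytes: int        # payload that must be uploaded if inference is offloaded
    edge_compute_s: float   # estimated inference time on the edge device
    cloud_compute_s: float  # estimated inference time on a cloud accelerator


@dataclass
class Link:
    bandwidth_bps: float    # uplink bandwidth in bits per second
    rtt_s: float            # network round-trip time in seconds


def edge_latency(w: Workload) -> float:
    # Local inference: no network transfer, only on-device compute.
    return w.edge_compute_s


def cloud_latency(w: Workload, link: Link) -> float:
    # Offloaded inference: upload time + round trip + remote compute.
    transfer_s = (w.input_bytes * 8) / link.bandwidth_bps
    return transfer_s + link.rtt_s + w.cloud_compute_s


def choose_placement(w: Workload, link: Link, deadline_s: float) -> tuple[str, float, bool]:
    """Pick the lower-latency placement and report whether it meets the deadline."""
    candidates = {"edge": edge_latency(w), "cloud": cloud_latency(w, link)}
    placement = min(candidates, key=candidates.get)
    latency = candidates[placement]
    return placement, latency, latency <= deadline_s


if __name__ == "__main__":
    # Hypothetical numbers: a 200 KB camera frame, 120 ms on-device vs 10 ms in the cloud.
    frame = Workload(input_bytes=200_000, edge_compute_s=0.12, cloud_compute_s=0.01)
    print(choose_placement(frame, Link(bandwidth_bps=2e6, rtt_s=0.05), deadline_s=0.2))
    print(choose_placement(frame, Link(bandwidth_bps=100e6, rtt_s=0.05), deadline_s=0.1))
```

With a 2 Mbit/s uplink, uploading the 200 KB frame alone takes 0.8 s, so local inference wins; with a 100 Mbit/s uplink the same upload takes 16 ms, and the cloud path (76 ms in total) beats the 120 ms edge path and is the only one that meets a 100 ms deadline.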
We now need to build deep learning model compilers capable of optimizing these models into versioned executable code for the target platform. These compilers must also be compatible with existing deep learning frameworks, including but not limited to MXNet, Caffe, TensorFlow, and PyTorch. Lightweight, low-power messaging and communication protocols will also enable swarms of such devices to collaborate continuously and solve multiple tasks at an accelerated pace by parallelizing them.
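To illustrate why lightweight messaging matters on constrained links, the sketch below compares a fixed-layout binary payload with the same sensor reading encoded as JSON, using only Python's standard `struct` and `json` modules. The field layout, function names, and values are hypothetical, and this covers message encoding only; a real messaging protocol would also handle transport, addressing, and delivery guarantees.

```python
import json
import struct

# Hypothetical fixed layout for one sensor reading, little-endian, no padding:
#   H = uint16 device id, I = uint32 Unix timestamp (s), f = float32 temperature (deg C)
READING_FORMAT = "<HIf"


def encode_reading(device_id: int, timestamp: int, temperature: float) -> bytes:
    """Pack one reading into a 10-byte binary message."""
    return struct.pack(READING_FORMAT, device_id, timestamp, temperature)


def decode_reading(payload: bytes) -> tuple[int, int, float]:
    """Unpack a binary message back into its fields."""
    return struct.unpack(READING_FORMAT, payload)


if __name__ == "__main__":
    binary = encode_reading(17, 1_700_000_000, 21.5)
    text = json.dumps({"device_id": 17, "timestamp": 1_700_000_000, "temperature": 21.5}).encode()
    print(len(binary), "bytes as packed binary vs", len(text), "bytes as JSON")
    print(decode_reading(binary))
```

The packed form is several times smaller than the JSON text, which adds up when thousands of devices report continuously; the trade-off is that sender and receiver must agree on the layout in advance.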
For instance, TensorFlow Lite can be used for on-device inference. First, we pick an existing versioned model or develop a new one. Then, in the conversion stage, the TensorFlow model is converted into a compact FlatBuffer file (.tflite), optionally applying optimizations such as quantization. During the deployment stage, the .tflite file is loaded onto the edge device. In this manner, audio classification, object detection, and gesture recognition models have been successfully deployed on edge devices.
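The three stages above (model, conversion, deployment) can be sketched with TensorFlow's Python API. This is a minimal sketch under stated assumptions, not the authors' pipeline: it assumes TensorFlow 2.x is installed, and the tiny `TinyClassifier` module is a hypothetical stand-in for an existing versioned model. The model is exported as a SavedModel first because `TFLiteConverter.from_saved_model` is the most version-stable conversion entry point.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


# Stage 1 (model): a tiny hypothetical classifier standing in for an existing,
# versioned model. A fixed input signature gives TFLite static tensor shapes.
class TinyClassifier(tf.Module):
    def __init__(self):
        super().__init__()
        self.w = tf.Variable(tf.random.normal([4, 3]), name="w")
        self.b = tf.Variable(tf.zeros([3]), name="b")

    @tf.function(input_signature=[tf.TensorSpec(shape=[1, 4], dtype=tf.float32)])
    def __call__(self, x):
        return tf.nn.softmax(tf.matmul(x, self.w) + self.b)


work_dir = tempfile.mkdtemp()
model = TinyClassifier()
saved_model_dir = os.path.join(work_dir, "saved_model")
tf.saved_model.save(model, saved_model_dir, signatures=model.__call__.get_concrete_function())

# Stage 2 (conversion): serialize the SavedModel into a TFLite FlatBuffer.
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # post-training quantization where applicable
tflite_model = converter.convert()  # returns the FlatBuffer as bytes

tflite_path = os.path.join(work_dir, "model.tflite")
with open(tflite_path, "wb") as f:
    f.write(tflite_model)

# Stage 3 (deployment): load the .tflite file with the interpreter and run one inference.
interpreter = tf.lite.Interpreter(model_path=tflite_path)
interpreter.allocate_tensors()
input_index = interpreter.get_input_details()[0]["index"]
output_index = interpreter.get_output_details()[0]["index"]

sample = np.random.rand(1, 4).astype(np.float32)
interpreter.set_tensor(input_index, sample)
interpreter.invoke()
probs = interpreter.get_tensor(output_index)
print("class probabilities:", probs)
```

On an actual edge device, the .tflite file would be copied over and executed with a slimmer TensorFlow Lite runtime rather than the full TensorFlow package; the interpreter steps stay the same.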
Hence, deep learning models migrated with specific libraries and frameworks (such as TensorFlow Lite, or the Arm Compute Library for embedded systems) can be optimized to run on IoT devices. To scale this further efficiently, we need two things: first, the ability to compile and optimize deep learning models directly into executable code for the target device, and second, an extremely lightweight OS that enables multitasking and efficient communication between different tasks.
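On a microcontroller, the multitasking and inter-task communication described above would come from a lightweight OS scheduler with message queues. As an analogy only, the sketch below uses Python's `asyncio` to show the same pattern: two cooperatively scheduled tasks, a hypothetical sensor producer and a hypothetical inference consumer, exchanging messages through a bounded queue. The threshold "model" is a placeholder, not a real inference step.

```python
import asyncio
import random


async def sensor_task(queue: asyncio.Queue, n_readings: int) -> None:
    """Producer: emits simulated sensor readings into the shared queue."""
    for i in range(n_readings):
        await queue.put((i, random.random()))  # blocks while the queue is full
        await asyncio.sleep(0)  # yield control so other tasks can run
    await queue.put(None)  # sentinel: no more readings


async def inference_task(queue: asyncio.Queue, results: list) -> None:
    """Consumer: a placeholder for on-device inference on each reading."""
    while True:
        item = await queue.get()
        if item is None:
            break
        idx, value = item
        # Hypothetical threshold rule standing in for a real model.
        results.append((idx, "alert" if value > 0.8 else "normal"))


async def main(n_readings: int = 10) -> list:
    queue: asyncio.Queue = asyncio.Queue(maxsize=4)  # bounded queue keeps memory use small
    results: list = []
    await asyncio.gather(sensor_task(queue, n_readings), inference_task(queue, results))
    return results


if __name__ == "__main__":
    for idx, label in asyncio.run(main()):
        print(idx, label)
```

The bounded queue provides back-pressure: if inference falls behind, the sensor task pauses instead of buffering unbounded data, which mirrors how message queues are typically sized on memory-constrained devices.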