Towards Decomposed Data Analytics in Fog Enabled IoT Deployments

Mohit Taneja, Nikita Jalodia, and Alan Davy

September 18, 2018

With the exponential growth of technology, the future of all activities involves an omnipresence of widely connected devices, better known as the ‘Internet of Things (IoT)’. In its report [1], McKinsey estimates a base of 1 trillion interconnected IoT devices by 2025, while a June 2017 publication by Cisco [2] indicates that we have already reached the Zettabyte Era and that the number of devices connected to the Internet is growing exponentially.

The increasing range of real-world IoT deployments increases the number of data sources, and thereby intensifies the challenges already faced in the Big Data space [3], particularly that of moving data from one end of the network infrastructure (i.e. data sources such as sensors and IoT devices at the edge) to the other (i.e. centralized data centres in the cloud). Sending the entire data set across the infrastructure becomes unrealistic, specifically in scenarios with constrained network bandwidth and low or no internet connectivity. Approaches that collect data and process it near its source present a more practical alternative in such scenarios. They are beneficial for a number of reasons; video, for example, can claim considerable network resources in transit, such as storage at each node from source to destination.

While IoT deployments vary across use cases, their most common underlying aim is to analyse the data generated by the devices to achieve a specific objective.

Fog Computing, IoT and Decomposition of Data Analytics Computing Programs

In existing approaches to data analytics in IoT, all data from an IoT deployment is collected at a centralized location, such as servers in a data centre (i.e. the cloud), and is then subjected to the desired data analytics model to generate value. Data in these IoT deployments moves from ‘things’ to the cloud, and along this continuum it passes through a number of network devices such as routers and gateways. Each of these devices is a potential candidate to host partial analytics capability, analysing the data and then sending the calculated partial results to the cloud instead of the raw data [4]. The edge of the network in such deployments can host what we call ‘decomposed analytic computing units’ (Figure 1), both to reduce the amount of data transferred to the cloud and to improve the quality of analytics results by having localized contextual information at hand while performing analytics operations.
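As an illustration of sending partial results instead of raw data, the following is a minimal sketch in Python, assuming the analytics task is a global mean and variance over sensor readings. The names PartialStats, summarize_at_fog and combine_at_cloud, and the sample readings, are hypothetical and are not taken from [4] or [5]; each fog node reduces its readings to three numbers, and the cloud merges them.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class PartialStats:
    """Partial result a fog node sends upstream instead of raw readings."""
    count: int
    total: float
    total_sq: float


def summarize_at_fog(readings: Iterable[float]) -> PartialStats:
    """Decomposed unit on a fog node: reduce raw readings to three numbers."""
    count, total, total_sq = 0, 0.0, 0.0
    for x in readings:
        count += 1
        total += x
        total_sq += x * x
    return PartialStats(count, total, total_sq)


def combine_at_cloud(partials: List[PartialStats]) -> dict:
    """Cloud-side unit: merge partial results into a global mean and variance."""
    n = sum(p.count for p in partials)
    if n == 0:
        raise ValueError("no readings received")
    total = sum(p.total for p in partials)
    total_sq = sum(p.total_sq for p in partials)
    mean = total / n
    variance = total_sq / n - mean * mean  # population variance
    return {"count": n, "mean": mean, "variance": variance}


# Hypothetical readings observed by two fog nodes.
fog_a = summarize_at_fog([20.0, 22.0, 24.0])
fog_b = summarize_at_fog([26.0, 28.0])
result = combine_at_cloud([fog_a, fog_b])
print(result)
```

In this sketch each node uploads three numbers regardless of how many readings it observed, which is where a bandwidth saving would come from. The sum-of-squares formula is chosen for brevity; it can lose precision on large or poorly scaled data.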

Fog computing has recently emerged as a potential architecture for scaling IoT network applications. It aims to provide computing resources and services closer to the end devices at the edge of the network, along the things-to-cloud continuum, and thus appears to be a natural fit for the desired decomposition of data analytics programs in the IoT ecosystem. Depending on the IoT deployment, a fog node can range from a dedicated industrial router or gateway to a smartphone, a smartwatch, and so on.

Beyond decomposing data analytics and machine learning programs to run on resource-constrained devices along the things-to-cloud continuum, a longer-term vision is for the decomposition itself to be automated and to happen dynamically at runtime.

Note that the infrastructure architecture considered here is the most common and widely used one: the three-tier IoT-Fog-Cloud architecture (with multi-tier fog).

Challenges

There are a number of challenges associated with decomposing computing programs to run between the edge/fog and the cloud. The major ones include:

Decomposition methods: Existing methods for distributing operations onto homogeneous nodes are insufficient for fog-assisted IoT settings because fog and cloud nodes are heterogeneous. Moreover, existing distributed processing frameworks such as MapReduce are not directly applicable to such settings: the cloud has mostly homogeneous nodes with well-structured network topologies and reliable network connectivity, whereas fog-assisted IoT deployments are highly variable environments.
In such settings, deciding which part of a computing program to decompose becomes crucial. For example, if the program contains a recursive function that is called again and again, decomposing it might not be a good idea, as the repeated calls across nodes would generate communication overhead. How to define an atomic computing unit for a program is equally important. The applicability of existing methods, and the modifications they require for such settings, need to be studied carefully.
System performance: Another key metric to track is the effect of such decomposition on overall system performance, i.e. whether resource consumption across the infrastructure increases, decreases or becomes more balanced compared to the centralized cloud solution.
Quality of analytics: Because the data is now processed into partial results that are then combined into the overall analytical result, it is important to examine how decomposition affects the quality of analytics.
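The quality-of-analytics concern can be made concrete with a small sketch in Python. It assumes, purely for illustration, that the analytics task is a median and that each fog node uploads only its local median; the readings are made-up values and do not come from any experiment in the cited work. A median does not combine exactly from partial results, so the decomposed answer can differ from the centralized one.

```python
from statistics import median

# Hypothetical readings held by three fog nodes of unequal size.
node_readings = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0, 7.0, 8.0],
    [100.0],
]

# Centralized baseline: all raw data reaches the cloud.
all_readings = [x for node in node_readings for x in node]
exact_median = median(all_readings)

# Decomposed variant: each node sends only its local median,
# and the cloud takes the median of those partial results.
local_medians = [median(node) for node in node_readings]
approx_median = median(local_medians)

# Gap between the decomposed and the centralized answer.
error = abs(approx_median - exact_median)
print(exact_median, approx_median, error)
```

Here the centralized median is 5.0 while the median of local medians is 6.0. Whether such a gap is acceptable, or whether richer partial results are needed, depends on the use case.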

Initial exploratory work by the authors of [5] shows that such decompositions can reduce bandwidth consumption and can significantly decrease the associated costs. For further development, however, all of the above points need to be carefully evaluated and studied in order to design and develop efficient distributed algorithms for decomposition in fog-assisted IoT deployments across a wide variety of use cases.

But why do we really need to decompose computing units? Why not run the whole computing program on the edge/fog device instead?

The justification lies in resource constraints. In contrast to the cloud, which can be thought of as ‘resource rich’, fog devices are resource constrained, and their resources cannot be scaled dynamically (up/down or horizontally/vertically). Fog devices are also already performing their fundamental computing or network operation (e.g. a router acting as a fog device is already forwarding packets to their destinations), and these operations already consume part of the available resources (CPU, RAM and bandwidth). Deploying a complete data analytics program or algorithm on such a device might lead to full utilization of its resources as the workload or data input increases, and could also degrade its fundamental network operation. Hence, careful placement of computing operations is needed for efficient overall system performance, and the approach of decomposed computing units seems well suited to an IoT environment with fog assistance.
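The placement concern above can be sketched as a simple first-fit check in Python. This is only an illustrative assumption, not the placement method of [4]: the FogNode and ComputeUnit types, the place function, the 20% CPU reserve and all capacity figures are hypothetical. A unit is placed on a fog node only if the node keeps enough CPU headroom for its primary role; otherwise it falls back to the cloud.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FogNode:
    name: str
    cpu_free: float     # fraction of CPU not used by the node's primary role
    ram_free_mb: float  # RAM not used by the node's primary role


@dataclass
class ComputeUnit:
    name: str
    cpu_need: float
    ram_need_mb: float


def place(unit: ComputeUnit, nodes: List[FogNode],
          reserve: float = 0.2) -> Optional[str]:
    """First-fit placement that keeps `reserve` CPU free on the chosen node.

    Returns the node name, or None if the unit should fall back to the cloud.
    """
    for node in nodes:
        fits_cpu = node.cpu_free - unit.cpu_need >= reserve
        fits_ram = node.ram_free_mb >= unit.ram_need_mb
        if fits_cpu and fits_ram:
            node.cpu_free -= unit.cpu_need
            node.ram_free_mb -= unit.ram_need_mb
            return node.name
    return None


# Hypothetical two-node fog tier and three units of an analytics program.
nodes = [FogNode("router", 0.5, 256.0), FogNode("gateway", 0.75, 1024.0)]
units = [
    ComputeUnit("filter", 0.25, 64.0),
    ComputeUnit("aggregate", 0.25, 128.0),
    ComputeUnit("full_model", 0.75, 2048.0),
]
placement = {u.name: place(u, nodes) for u in units}
print(placement)
```

With these made-up figures the two small units fit on the fog tier, while the complete model fits on neither node and is left to the cloud. A real deployment would also have to account for bandwidth and changing load.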

Conclusion

It might be argued that it is more desirable to develop cloud-centric solutions, with a sufficiently large pool of resources on hand, than to design fully distributed computing programs and algorithms that bring additional complexity due to the need for communication over a network. Yet there are strong reasons for developing distributed data analytics solutions in fog-assisted IoT settings:

In many industrial settings and IoT deployments, the data is collected and stored in a decentralized manner. When data generation and storage are themselves distributed, it is more desirable to also process and analyse the data in a distributed fashion, to avoid the bottleneck of transferring it to the centralized cloud.
The number of data centres is unlikely to grow at the same rate as the number of devices at the network edge, since traditional data centres consume a lot of power and global network bandwidth, and have begun to raise concerns about their growing carbon footprint.
Undeniably, the computing capabilities of devices such as smartphones have increased significantly in the last decade; this can be seen simply in the RAM capacity of smartphones now compared to a couple of years ago, and the same holds for simple Raspberry Pi devices. While the growth in the resource capabilities of distributed devices still lags behind the rate of data production and expansion over the past decade, a consortium of stronger devices at the network edge makes the network better equipped to explore fine-grained distributed computing solutions for data analytics in the IoT domain.

Overall, keeping the challenges in mind, the decomposition of analytics programs in fog-assisted IoT environments looks promising as a way to design efficient distributed data analytics solutions and to make the edge of the network smarter, in line with the vision of distributed computing in future networks.

Acknowledgement

This work has emanated from research conducted with the financial support of Science Foundation Ireland (SFI) and is co-funded under the European Regional Development Fund under Grant Number 13/RC/2077.

References

[1] J. Manyika et al., "Unlocking the potential of the Internet of Things," McKinsey & Company, June 2015.
[2] Cisco, "The Zettabyte Era: Trends and Analysis," Cisco, June 2017.
[3] B. Tang et al., "Incorporating Intelligence in Fog Computing for Big Data Analysis in Smart Cities," IEEE Transactions on Industrial Informatics, vol. 13, no. 5, pp. 2140-2150, October 2017.
[4] M. Taneja and A. Davy, "Resource aware placement of IoT application modules in Fog-Cloud Computing Paradigm," in 2017 IFIP/IEEE Symposium on Integrated Network and Service Management (IM), Lisbon, 2017, pp. 1222-1228.
[5] T.-C. Chang et al., "Decomposing Data Analytics in Fog Networks," in Proceedings of the 15th ACM Conference on Embedded Network Sensor Systems (SenSys '17), (New York, NY, USA), pp. 35:1–35:2, ACM, 2017.

Mohit Taneja is currently pursuing his Ph.D. in the Department of Computing and Mathematics at the Emerging Networks Lab Research Unit in Telecommunications Software and Systems Group, Waterford Institute of Technology, Ireland. He joined in 2015 as a Masters Student, and has since been working as a part of the Science Foundation Ireland funded CONNECT Research Centre. His current research interests include Fog and Cloud Computing, Internet of Things (IoT), Distributed Systems, and Distributed Data Analytics. His research focuses on decomposing data analytics and machine learning programs for fog enabled IoT systems towards effective resource and service management to support and meet the requirements for real-time IoT analytics. He received his Bachelor’s Degree in Computer Science and Engineering from The LNM Institute of Information Technology, Jaipur, India in 2015.

Nikita Jalodia is currently pursuing her Ph.D. in the Department of Computing and Mathematics at the Emerging Networks Lab Research Unit in Telecommunications Software and Systems Group, Waterford Institute of Technology, Ireland. She joined in July 2017, and has since been working as a part of the Science Foundation Ireland funded CONNECT Research Centre. Her current research interests include Internet of Things (IoT), Fog and Cloud Computing, Machine Learning, Virtualised Telecom Networks, and Network Function Virtualization (NFV). She received her Bachelor’s Degree in Computer Science and Engineering from The LNM Institute of Information Technology, Jaipur, India in 2017, with a specialization in Big Data and Analytics with IBM. She has also previously worked as a developer at Sapient Global Markets, India.

Alan Davy completed his Ph.D. studies at Waterford Institute of Technology in 2008. He is currently Research Unit Manager of the Emerging Networks Laboratory with the Telecommunications Software & Systems Group of Waterford Institute of Technology. His current research interests include Virtualised Telecom Networks, Fog and Cloud Computing, Molecular Communications and TeraHertz Communication.
