Cloud computing has significantly changed the IT landscape. Today it is possible for small companies or even single individuals to access virtually unlimited resources in large data centres for running computationally demanding tasks. This has triggered the rise of “big data” applications, which operate on large amounts of data. These include traditional batch-oriented applications, such as data mining, data indexing, log collection and analysis, and scientific applications, as well as real-time stream processing, web search and advertising.
To support big data applications, parallel processing systems, such as MapReduce, adopt a partition/aggregate model: a large input data set is distributed over many servers, and each server processes a share of the data. Locally generated intermediate results must then be aggregated to obtain the final result. An open challenge of the partition/aggregate model is that it results in high contention for network resources in data centres when a large amount of data traffic is exchanged between servers. Facebook reports that, for 26% of processing tasks, network transfers are responsible for more than 50% of the execution time. This is consistent with other studies, which show that the network is often the bottleneck in big data applications.
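The partition/aggregate model can be made concrete with a minimal Python sketch that uses word counting as the example task. The text does not name a task; word counting is our choice. The function names `partition`, `process_share`, `aggregate` and `run_job` are hypothetical and are not the API of MapReduce or any other framework. Servers are simulated as in-memory lists, and a comment marks the point where intermediate results would cross the network.

```python
from collections import Counter
from typing import Dict, List


def partition(records: List[str], num_servers: int) -> List[List[str]]:
    """Distribute input records round-robin over num_servers shares (hypothetical helper)."""
    shares: List[List[str]] = [[] for _ in range(num_servers)]
    for i, record in enumerate(records):
        shares[i % num_servers].append(record)
    return shares


def process_share(share: List[str]) -> Counter:
    """Local processing on one server: count the words in its share of the data."""
    counts: Counter = Counter()
    for line in share:
        counts.update(line.split())
    return counts


def aggregate(partials: List[Counter]) -> Dict[str, int]:
    """Aggregation step: merge the intermediate results produced by all servers."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts rather than replacing them
    return dict(total)


def run_job(records: List[str], num_servers: int) -> Dict[str, int]:
    shares = partition(records, num_servers)
    partials = [process_share(share) for share in shares]
    # In a real cluster, each partial result would now be sent over the
    # data-centre network to the aggregator; this transfer is the traffic
    # that causes the contention described above.
    return aggregate(partials)
```

For example, `run_job(["a b a", "b c", "a c c"], 2)` returns `{"a": 3, "b": 2, "c": 3}`, and the result does not depend on how many servers the input is spread over.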
Improving the performance of such network-bound applications in data centres has attracted much interest from the research community. Various solutions focus on reducing bandwidth usage or increasing network bandwidth, each of which has its own drawbacks. In contrast, we argue that the problem can be solved more effectively by providing data centre tenants with efficient, easy and safe control of network operations. Instead of over-provisioning, we focus on optimising network traffic by exploiting application-specific knowledge. We term this approach “network-as-a-service” (NaaS) because it allows tenants to customise the service that they receive from the network.
The NaaS model has the potential to revolutionise current cloud computing offerings by increasing the performance of tenants’ applications, through efficient in-network processing, while reducing development complexity. It aims to combine distributed computation and network communication in a single, coherent abstraction, providing a significant step towards the vision of “the data centre is the computer”.
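The potential benefit of in-network processing can be illustrated with a toy model. It assumes, hypothetically, a switch on the path to the aggregator that can merge word-count partial results before forwarding them. Merging in the network is safe here only because of application-specific knowledge: the counts are combined by addition, which is commutative and associative. The function names are illustrative assumptions, not part of any NaaS interface, and the number of (key, count) entries serves as a rough proxy for bytes on the aggregator's link.

```python
from collections import Counter
from typing import List


def forward_to_aggregator(partials: List[Counter], merge_in_network: bool) -> List[Counter]:
    """Return the messages that reach the aggregator over its ingress link."""
    if not merge_in_network:
        # Plain forwarding: every server's partial result crosses the link unchanged.
        return list(partials)
    # Hypothetical aggregating switch: combine partials before the last hop,
    # relying on the application-specific fact that counts are simply summed.
    merged: Counter = Counter()
    for partial in partials:
        merged.update(partial)
    return [merged]


def link_entries(messages: List[Counter]) -> int:
    """Number of (key, count) entries carried on the link, a proxy for bytes."""
    return sum(len(message) for message in messages)


def final_result(messages: List[Counter]) -> Counter:
    """What the aggregator computes from whatever messages it receives."""
    total: Counter = Counter()
    for message in messages:
        total.update(message)
    return total
```

With three servers reporting `{"a": 3, "b": 1}`, `{"a": 1, "c": 2}` and `{"a": 2, "b": 5}`, plain forwarding delivers six entries to the aggregator, while in-network merging delivers three. The final counts are identical in both cases.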
Partners: Citrix Systems, NetApp, Netronome Systems, Xilinx Research Labs