
Flume

Because it offers a cost-effective way to store and analyze data, Hadoop has become the backbone of many enterprise Big Data systems. But before data can be stored, it must first be brought from its source into Hadoop. Apache Flume is a framework that addresses this problem.

Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating and moving large amounts of log data from many different sources to a centralized data store.

Apache Flume is not restricted to log data aggregation. Because data sources are customizable, Flume can also be used to transport massive quantities of event data. This includes, but is not limited to, network traffic data, social-media-generated data, and email messages.

The following diagram depicts the different components of Flume and what role each of these components performs.

image

Because of this simple but powerful architecture, Flume can move anything that can be read as a byte array from any source system to any sink system. If we can open an input stream to the data generated by a system, that system can serve as a Flume source. If we can open an output stream to a system, that system can serve as a Flume sink. Flume provides many source and sink implementations; users can also implement their own custom sources or sinks when that is more convenient.
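The idea that any readable stream can feed a source and any writable stream can back a sink can be illustrated with a small sketch. This is not Flume's API (Flume itself is written in Java); the names `Event`, `Channel`, `run_source`, and `run_sink` are hypothetical and chosen only for illustration. The channel, the buffer that sits between a source and a sink inside a Flume agent, is modeled here as a plain in-memory queue.

```python
import io
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Event:
    """One unit of data: an opaque byte payload plus optional headers."""
    body: bytes
    headers: dict = field(default_factory=dict)


class Channel:
    """In-memory buffer that decouples the source from the sink (illustrative only)."""

    def __init__(self):
        self._queue = deque()

    def put(self, event):
        self._queue.append(event)

    def take(self):
        # Returns None when the channel is empty.
        return self._queue.popleft() if self._queue else None


def run_source(input_stream, channel):
    """Read a binary input stream line by line and put each line on the channel."""
    for line in input_stream:
        channel.put(Event(body=line.rstrip(b"\n")))


def run_sink(channel, output_stream):
    """Drain the channel and write each event body to the output stream."""
    count = 0
    while (event := channel.take()) is not None:
        output_stream.write(event.body + b"\n")
        count += 1
    return count


# Any readable byte stream can play the source system,
# and any writable byte stream can play the destination.
source_data = io.BytesIO(b"GET /index.html\nGET /about.html\n")
destination = io.BytesIO()
channel = Channel()

run_source(source_data, channel)
delivered = run_sink(channel, destination)
print(delivered, destination.getvalue())
```

Swapping `io.BytesIO` for an open file or a socket file object would leave `run_source` and `run_sink` unchanged, which is the point of treating every payload as a byte array.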

Flume can also create multi-hop flows, in which multiple Flume agents form a chain to transfer data from the source system to the destination system. This makes complex flows possible: data from multiple Flume agents is consolidated by an intermediate hop of Flume agents and then stored in the destination system.

image

*Note that in a multi-hop flow, the sink of each agent and the source of the next agent in the chain must use the same RPC type, either Avro or Thrift.
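A multi-hop flow with consolidation can be sketched in the same spirit. The `Agent` class below is a hypothetical toy model, not Flume code: `receive` stands in for the role an Avro (or Thrift) source plays at each hop, the forwarding step in `flush` stands in for the matching sink, and a direct method call replaces the network RPC. Two first-tier agents forward their events to one collector agent, which stores them.

```python
from collections import deque


class Agent:
    """Toy stand-in for a Flume agent: receive, buffer, then forward or store."""

    def __init__(self, name, next_hop=None):
        self.name = name
        self.next_hop = next_hop   # downstream Agent, or None for the last hop
        self.buffer = deque()      # plays the role of the channel
        self.stored = []           # stands in for the destination system

    def receive(self, body: bytes):
        """Entry point of the agent (the role of the source at this hop)."""
        self.buffer.append(body)

    def flush(self):
        """Drain the buffer: forward to the next hop, or store at the final hop."""
        while self.buffer:
            body = self.buffer.popleft()
            if self.next_hop is not None:
                # Role of the sink: hand the event to the next agent's source.
                self.next_hop.receive(body)
            else:
                self.stored.append(body)


# Two first-tier agents consolidate into a single collector agent.
collector = Agent("collector")
web1 = Agent("web1", next_hop=collector)
web2 = Agent("web2", next_hop=collector)

web1.receive(b"web1: login ok")
web2.receive(b"web2: login failed")

# Flush upstream agents first so the collector sees their events.
for agent in (web1, web2, collector):
    agent.flush()

print(collector.stored)
```

In a real deployment each hop is a separate process, usually on a separate machine, so the hand-off between a sink and the next source crosses the network rather than being a function call.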

 

Abhijeet Shingate (Big Data Architect)

 

http://flume.apache.org/FlumeUserGuide.html
