The “Data is oil” analogy is no longer valid. Since the dot-com era of the late 1990s and early 2000s, the internet has driven a flood of data in the form of tweets, likes, comments, and IoT sensor readings.
Unlike oil, which is a finite commodity, water keeps replenishing and never stops flowing. Hence, the ‘Data is Water’ analogy is a much better fit for understanding data engineering processes.
Let’s dive into the Data Engineering Sea to follow the journey from data generation to its final destination, i.e., use in homes, factories, and irrigation.
- Generation: The source of water is rain, rivers, or underground reservoirs, similar to how users generate social media posts, clickstreams, and orders.
- Storage: Incoming water is held in large reservoirs. In enterprises, data is likewise stored at every step of the process, fed by ingestion pipelines. In Data Engineering, storage takes the form of a Database, Data Warehouse, or Data Lake (we will cover these terms in detail in later blogs).
- Ingestion: Engineers (or Plumbers, as we like to call ourselves) create pipelines to pull the water from different sources into storage systems.
- Transformation: Water stored in the reservoirs is not fit for its final destination until it is cleaned, filtered, and chlorinated, and waste is removed. The transformation step does the same for data. Commonly used tools include Spark and SQL.
- Serving: Cleaned water then proudly flows to its end uses through different pipes. This step is often called the Load step (as in ETL), which loads the cleaned data into its destinations.
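To make the five stages concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the sample events, the table names (`raw_events`, `clean_events`), and the function names are all hypothetical, and an in-memory SQLite database stands in for the reservoir. A real pipeline would more likely use a warehouse or a data lake, with a tool such as Spark for the transformation.

```python
import sqlite3

# Generation: hypothetical raw events, as a source system might emit them.
RAW_EVENTS = [
    {"user_id": "alice", "action": "click", "amount": "10.5"},
    {"user_id": "bob", "action": "ORDER", "amount": "20"},
    {"user_id": None, "action": "click", "amount": "3"},        # missing user
    {"user_id": "alice", "action": "order", "amount": "oops"},  # bad amount
    {"user_id": "bob", "action": "order", "amount": "5"},
    {"user_id": "alice", "action": "order", "amount": "7.5"},
]


def ingest(conn, events):
    """Ingestion + storage: pull events from the source into a raw table."""
    conn.execute(
        "CREATE TABLE raw_events (user_id TEXT, action TEXT, amount TEXT)"
    )
    conn.executemany(
        "INSERT INTO raw_events VALUES (:user_id, :action, :amount)", events
    )


def transform(conn):
    """Transformation: filter out bad rows and normalize the rest with SQL."""
    conn.execute(
        """
        CREATE TABLE clean_events AS
        SELECT user_id,
               LOWER(action) AS action,
               CAST(amount AS REAL) AS amount
        FROM raw_events
        WHERE user_id IS NOT NULL       -- drop rows with no user
          AND amount GLOB '[0-9]*'      -- keep amounts that start with a digit
        """
    )


def serve(conn):
    """Serving: expose an aggregate that a report or dashboard could read."""
    rows = conn.execute(
        """
        SELECT user_id, SUM(amount) AS total
        FROM clean_events
        WHERE action = 'order'
        GROUP BY user_id
        ORDER BY user_id
        """
    ).fetchall()
    return dict(rows)


def run_pipeline(events):
    """Run ingestion, transformation, and serving against a fresh database."""
    conn = sqlite3.connect(":memory:")
    try:
        ingest(conn, events)
        transform(conn)
        return serve(conn)
    finally:
        conn.close()


if __name__ == "__main__":
    print(run_pipeline(RAW_EVENTS))
```

With the sample events above, the script prints the order totals per user: `{'alice': 7.5, 'bob': 25.0}`. The row with no user and the row with an unparseable amount are the "waste" removed during transformation, while the raw table keeps the unfiltered water in case it is needed again.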
In conclusion, the world of Data Engineering can be as vast and deep as the ocean, but hopefully, our water analogy has made it a bit more navigable. Just as water undergoes various processes before it reaches our taps, data too must be generated, ingested, stored, transformed, and finally served for various uses. As we continue to explore this Data Engineering Sea, we’ll encounter more terms and concepts like Data Swamps, Data Lakehouses, Data Streams, and more. But remember, no matter how complex it may seem, it all flows back to the basic processes we’ve discussed today. So, let’s dive in together and make a splash in this exciting field of Data Engineering!
References
Joe Reis and Matt Housley, “Fundamentals of Data Engineering”