AI Creator's Path News: Revolutionizing data integration! Build a metadata-driven ETL framework with Azure ADF to automate and streamline data processing. #AzureADF #DataIntegration #ETL
Will data organization become dramatically easier? The latest data utilization techniques using the magical blueprint "metadata"
Hello! I'm John, an AI technology commentary writer.
Where is data stored in your company? Excel files, in-house databases, cloud services like Salesforce... Data is scattered all over the place, and many people are worried that "I want to analyze it all at once, but the preparation is too much work!"
In the past, experts had to manually create a "data path" each time new data was connected, which was time-consuming and costly, and made it very difficult to respond to changes.
But what if there was a dream system that, once created, could automatically organize any data simply by changing a few simple instructions?
In fact, in the world of AI and cloud computing, a smart way of using data called "metadata-driven" is attracting attention. This time, I will explain this new approach in a way that anyone can understand, using the analogy of cooking!
What is "metadata-driven"? A new way of using data
First, the basis of data utilization is a process called "ETL": Extract, Transform, and Load. In other words, it refers to the process of pulling raw data out of the various locations where it is stored, preparing it so that it can be easily analyzed, and delivering it to a specific destination.
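To make the three steps concrete, here is a toy ETL run in Python. All of the names and data are illustrative, not from the original case study; the point is only the extract → transform → load shape.

```python
# A toy ETL run: extract raw records, transform them, load them into a target.
# Real pipelines read from databases or files; this is a minimal sketch.

def extract(source):
    """Extract: pull raw records out of a source."""
    return list(source)

def transform(rows):
    """Transform: tidy names and drop invalid rows."""
    return [
        {"name": r["name"].strip().title(), "amount": r["amount"]}
        for r in rows
        if r.get("amount", 0) > 0
    ]

def load(rows, target):
    """Load: deliver the prepared rows to the target store."""
    target.extend(rows)
    return len(rows)

raw = [{"name": " alice ", "amount": 10}, {"name": "bob", "amount": 0}]
warehouse = []
loaded = load(transform(extract(raw)), warehouse)
print(loaded)  # 1 row survives the transform
```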
In the conventional way, this ETL process was custom-made for each type of data. This is like preparing a dedicated kitchen and cooking utensils from scratch every time just for a specific ingredient (data). Every time a new ingredient is added, you have to build a new kitchen...it's a lot of work just to think about it.
That's where the "metadata-driven" approach comes in.
In a word, metadata is "data about data": the "instructions" and "blueprints." It contains all the information about which data to fetch (where it comes from), how to process it (how), and where to deliver it (where it goes).
In this approach, we first build a single, flexible, high-performance "giant central kitchen." Then, when we want to handle new data (ingredients), instead of rebuilding the kitchen, we just rewrite the "recipe (metadata)." This way, you can keep serving new dishes!
In the case study that this article is based on, the role of this "central kitchen" is played by a cloud service called "Azure Data Factory (ADF)" provided by Microsoft.
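What might one "recipe" look like? Here is a hypothetical metadata record sketched as a Python dictionary; in the case study it would be a row in an Azure SQL Database control table. Every field name here is an assumption for illustration, not the actual schema from the original article.

```python
# One hypothetical "recipe" (metadata) row. All field names are illustrative.
recipe = {
    "task_id": 101,
    "source_type": "SqlServer",        # where the ingredient comes from
    "source_object": "dbo.Orders",
    "transform": "dedupe_by_order_id", # how to cook it
    "sink_type": "AzureBlob",          # which plate to serve it on
    "sink_path": "curated/orders/",
    "enabled": True,
}

# The "central kitchen" only reads rows like this; adding a new integration
# means inserting another row, not writing a new pipeline.
print(recipe["source_object"], "->", recipe["sink_path"])
```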
The key to the system is the "blueprint" and "specialized workers"
So how exactly does this "metadata-driven" system work? Let's take a look at the big picture using the cooking analogy from earlier.
This mechanism consists of three main elements:
- Metadata (Recipes): A recipe that contains every instruction: which ingredients (data) to fetch from which warehouse (database), how to cook them (transform), and which plate (destination storage) to serve them on. These recipes are stored in a database (the recipe book) called Azure SQL Database.
- Parent Pipeline (Head Chef): The head chef is the general director who reads the recipe and directs the entire process. The head chef does not cook himself. His job is to look at the recipe and give instructions to each specialized chef, such as "Let's ask Mr. A to do this task and Mr. B to do that task."
- Child Pipeline (Professional Chef): These are the "specialist chefs" who receive instructions from the head chef and perform the actual work. Each has their own specialty, such as a "vegetable cutting expert," a "meat grilling expert," and a "plating expert." For example, you could imagine a specialist chef who "transfers data from one database to another" and a specialist chef who "reads data from a file."
The great thing about this system is that when a new data integration is needed, we just update the "recipe (metadata)." The head chef and the specialist chefs simply follow the new recipe as it is handed to them, so there is no need to remodel the kitchen. This is the secret to flexible, efficient data integration.
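The head-chef/specialist pattern can be sketched in a few lines of Python. In real Azure Data Factory this is typically a Lookup activity feeding a ForEach that calls Execute Pipeline; the function and field names below are illustrative stand-ins, not the actual pipelines from the case study.

```python
# Parent/child sketch: the parent ("head chef") reads recipes and hands
# each one to the matching specialist child pipeline. Names are illustrative.

def copy_db_to_db(recipe):   # specialist chef: database-to-database copy
    return f"copied {recipe['source']} to {recipe['sink']}"

def load_file(recipe):       # specialist chef: file ingestion
    return f"loaded file {recipe['source']} into {recipe['sink']}"

CHILDREN = {"db_copy": copy_db_to_db, "file_load": load_file}

def parent_pipeline(recipes):
    """Head chef: does no cooking, only reads recipes and delegates."""
    results = []
    for r in recipes:
        child = CHILDREN[r["task_type"]]  # pick the right specialist
        results.append(child(r))
    return results

recipes = [
    {"task_type": "db_copy", "source": "dbo.Orders", "sink": "dbo.OrdersCopy"},
    {"task_type": "file_load", "source": "sales.csv", "sink": "dbo.Sales"},
]
for line in parent_pipeline(recipes):
    print(line)
```

Notice that adding a third integration only means appending to `recipes`; neither the parent nor the children change.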
Other important roles that support the system
There are several other important mechanisms in place to ensure the smooth running of this "central kitchen."
Security (treasury)
Sensitive information such as database passwords should never be written directly into recipes. Instead, we use a locked digital safe called "Azure Key Vault." The recipe contains only the name of the secret, and the actual password is stored securely inside the vault, keeping it out of the metadata entirely.
Flexible execution (adjusted business hours)
This kitchen can be run in several different ways:
- "Scheduled execution": runs automatically at a fixed time every day
- "On-demand execution": runs whenever you need it, at the push of a button
- "Event execution": starts cooking automatically when new ingredients arrive
This way you can be flexible and adapt to your business needs.
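All three modes end up invoking the same pipeline; only the reason for starting differs. ADF models these as schedule, manual, and event-based triggers. The sketch below is purely illustrative, not ADF's actual API.

```python
# Three ways to start the same "kitchen": every trigger ultimately calls
# the same pipeline; only the trigger type differs. Names are illustrative.

def run_pipeline(trigger):
    return f"pipeline started by {trigger} trigger"

def scheduled_run():       # e.g. every day at 02:00
    return run_pipeline("schedule")

def on_demand_run():       # a button press or manual API call
    return run_pipeline("manual")

def event_run(event):      # e.g. a new file landing in storage
    return run_pipeline(f"event:{event}")

print(scheduled_run())
print(on_demand_run())
print(event_run("blob_created"))
```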
Monitoring and logging (quality control)
All cooking processes are recorded, and if a problem occurs (for example, if the cooking fails), an alert is sent immediately. This allows the system to operate stably and maintain high quality.
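The log-everything-and-alert-on-failure idea can be shown with a small wrapper. ADF provides this through its monitoring pane and Azure Monitor alerts; the wrapper and the alert list below are illustrative stand-ins.

```python
# "Quality control" sketch: every task run is logged, and a failure
# immediately pushes an alert. The alert list stands in for a real
# channel (email, Teams, etc.); all names are illustrative.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("kitchen")

alerts = []  # stand-in for an alerting channel

def run_with_monitoring(task_name, task):
    try:
        result = task()
        log.info("task %s succeeded", task_name)
        return result
    except Exception as exc:
        log.error("task %s failed: %s", task_name, exc)
        alerts.append(f"ALERT: {task_name} failed ({exc})")
        raise

def bad_task():
    raise RuntimeError("oven broke")

try:
    run_with_monitoring("grill_meat", bad_task)
except RuntimeError:
    pass

print(alerts[0])  # ALERT: grill_meat failed (oven broke)
```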
Easy operation screen (order system)
Furthermore, a simple web screen is provided that lets even non-experts add or change "recipes (metadata)." On-site staff can quickly add data integrations themselves, greatly improving overall efficiency.
Author's thoughts
After reading the original article, the thing I felt most was that "designing at the beginning makes the future easier." The metadata-driven system requires you to design the entire central kitchen first, so it may be a bit of a hassle. However, once you have a solid foundation, it will be surprisingly easy and quick to respond when the amount of data increases or the requirements change later.
In the future, data will continue to increase and become more diverse. In that environment, I feel that the idea of using a flexible and highly scalable "system" like this, rather than dealing with each piece manually, will definitely become mainstream in the world of data utilization.
This article is based on the following original articles and is summarized from the author's perspective:
Designing a metadata-driven ETL framework with Azure ADF: An architectural perspective
