The road to becoming an AI creator | Article introduction: "There's a shortage of data engineers"? Accelerate AI development with Databricks Lakeflow Designer! Revolutionize data utilization with no code. #Lakeflow #NoCodeETL #DataEngineering
Will it dramatically change how AI uses data? What is Databricks Lakeflow Designer? A complete guide for beginners
Hello everyone! I'm John, a familiar face in AI technology commentary. Have you been hearing the phrase "data is the new oil" a lot recently? Indeed, data is essential fuel for the growth of modern business. However, just as crude oil cannot be used as it is, data must be "refined" or it will go to waste. This "refining" process is called data engineering (a series of tasks to collect, process, and make data usable), and in reality it is a major obstacle for many companies aiming to utilize AI.
"Isn't it difficult unless you're a specialized engineer?" "We don't have enough manpower..." I'm sure there are many people who have such concerns. But what if, as if by magic, AI could help you with this data preparation work, even if you don't have any programming knowledge? Today, we're going to introduce a new secret weapon announced by Databricks that may make such a dream come true: "Databricks Lakeflow DesignerWe will thoroughly explain "AI" in an easy-to-understand manner even for those who are new to AI!
What is Databricks Lakeflow Designer? ~A new way to use data~
First of all, what exactly is this "Databricks Lakeflow Designer"? In a word, it is a no-code data preparation tool developed by Databricks, a world leader in data and AI. Sounds difficult? No, don't worry!
To put it in perspective, Lakeflow Designer is like a super-competent data organization assistant. All you have to do is give it rough instructions like "I want this kind of data...", and it will collect the data and polish it until it shines. In technical terms, this data preparation process is called ETL, which consists of the following three steps (a small code sketch of these steps follows the list):
- Extract: First, we collect information (data) that is scattered across in-house databases, cloud storage, websites, and so on. It's like sourcing ingredients from a field or a market.
- Transform: Next, we organize the collected information so that it is easy to use, remove unnecessary information, and give it shape. This is the most time-consuming part, and it is the process of "refining" the data. It's similar to washing food, peeling it, and cutting it into bite-sized pieces.
- Load: Finally, the neatly organized information is stored in a designated location (such as a data lakehouse or data warehouse) so that it can be used immediately by analysis tools and AI models. It is like arranging cooked food neatly on a plate.
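To make these three steps a little more concrete, here is a minimal PySpark sketch of a hand-written ETL job: the kind of work a data engineer would normally code by hand, and which a tool like Lakeflow Designer aims to let you assemble visually instead. The storage path, table name, and column names below are illustrative assumptions, not anything defined by the product.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("minimal-etl-sketch").getOrCreate()

# Extract: read raw order data from cloud storage (hypothetical path).
raw_orders = spark.read.option("header", True).csv("s3://example-bucket/raw/orders/")

# Transform: drop rows without an ID, deduplicate, and normalize types.
clean_orders = (
    raw_orders
    .filter(F.col("order_id").isNotNull())
    .dropDuplicates(["order_id"])
    .withColumn("order_ts", F.to_timestamp("order_ts"))
    .withColumn("amount", F.col("amount").cast("double"))
)

# Load: store the refined data as a table that analysts and AI models can query.
clean_orders.write.mode("overwrite").saveAsTable("sales.clean_orders")
```

Even this tiny example hints at why ETL eats up so much engineering time: real pipelines multiply these steps across many sources, formats, and quality rules.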
The great thing about Lakeflow Designer is that it lets you build this ETL flow, that is, the data pipeline (the path the data takes and the processing steps along the way), with no code (a way of developing systems and applications without writing programming code). It's amazing: you can create complex data pipelines just by dragging and dropping components on the screen, or in some cases simply by telling the AI in natural language how you want the data processed.
Why do we need Lakeflow Designer now? - Data engineering challenges in the age of AI -
So why are tools like this attracting attention now? It is deeply related to a major issue known as the "data engineering bottleneck."
AI, especially the recently popular generative AI, requires large amounts of high-quality data to learn intelligently and make accurate decisions. However, the data engineering work needed to prepare this high-quality data is highly specialized and takes a lot of time and effort. Data engineers are in high demand, and when their workload overflows, data cannot be prepared even when a company wants to move ahead with new AI projects; as a result, the pace of AI development slows down. This is the "bottleneck." In fact, in one survey, many respondents said that data engineers' inability to keep up with the workload is an obstacle to advancing AI projects.
Of course, there have long been low-code and no-code data processing tools that non-experts can use, but according to Bilal Aslam, senior director of product management at Databricks, these tools often lack the governance (managing and controlling data quality and security) and scalability (the ability to handle growing data volumes and workloads) that companies need. In other words, even if pipelines were easy to create, there were often doubts about whether they were reliable enough to be trusted with a company's important data.
Lakeflow Designer was created to solve this dilemma. It aims to enable data analysts and business users who do not have specialized programming skills to build data pipelines that can be used safely and reliably in production environments. This reduces the burden on data engineers and accelerates the use of AI across the enterprise.
What's great about Lakeflow Designer? Unique features
Lakeflow Designer has several compelling features that set it apart from previous tools:
- Intuitive and easy-to-understand no-code operation: Even those without programming experience can design data pipelines visually by dragging and dropping components on the screen. It's a truly "visually understandable" interface.
- Equipped with a smart AI assistant: Amazingly, generative AI can help with pipeline creation. Users can simply tell the AI in natural language (the language we normally speak), "I want this kind of data" or "I want it processed like this," and the AI interprets that and suggests a pipeline design. This is revolutionary!
- Enterprise-grade reliability and scalability: It may look simple, but at its heart is Apache Spark, a powerful open-source engine for fast processing of large amounts of data. Furthermore, data security and quality are managed by Unity Catalog (Databricks' unified data governance solution). This lets you build reliable and scalable data pipelines that can withstand real business demands; it is not just a toy tool.
- Make team collaboration easier: It is designed to make it easy for data analysts and data engineers to work together. Engineers can review and modify pipelines created by analysts, and conversely, analysts can reuse parts created by engineers.
- Transparency and control over the development process: Pipelines you create can be linked with Git (a distributed version control system for recording and tracking the change history of source code and other files), and the tool also supports DevOps workflows (the practice and culture of development and operations teams working closely together to deliver business value rapidly and continuously). The history of who changed what and when (lineage), access control, audit trails, and so on are properly managed, so you can use it with peace of mind.
How does Lakeflow Designer work? A look behind the scenes
You may be wondering, "How can AI help you with no coding?" Here, we will explain the technical aspects of Lakeflow Designer as simply as possible.
First, as mentioned above, the user operates through a graphical and intuitive interface. Here, the user selects the data source and specifies the type of processing they want to perform. At this point, the AI assistant understands the user's intent, suggests appropriate processing components, and assists with the configuration.
One of the major features of Lakeflow Designer is "declarative pipelines." This is an approach where you declare (define) what data you ultimately want (the "what") rather than giving detailed instructions on how to process it (the "how"), and the system (Lakeflow) works out the optimal way to achieve it. It's as if you declared that you wanted delicious curry rice, and an excellent AI chef took care of everything from selecting the ingredients to the cooking procedure and the heat. Databricks has also donated this declarative pipeline technology to the Apache Spark open source project, and it is expected to spread as an industry standard.
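To give a feel for what "declarative" looks like in code, here is a minimal sketch written in the style of Databricks' declarative pipeline API (Delta Live Tables, which is evolving into Lakeflow Declarative Pipelines). You only declare the tables you want and the quality rules they must satisfy; the engine decides execution order, incremental processing, and error handling. The storage path, table names, and columns are illustrative assumptions.

```python
import dlt  # Databricks' declarative pipeline module, available inside a pipeline
from pyspark.sql import functions as F
# `spark` is predefined in Databricks notebooks and pipelines.

# Declare the raw table: "I want the raw orders available as a table."
@dlt.table(comment="Raw orders ingested from cloud storage")
def raw_orders():
    return spark.read.option("header", True).csv("s3://example-bucket/raw/orders/")

# Declare the cleaned table: "I want cleaned orders, and every row must have an order_id."
@dlt.table(comment="Cleaned orders ready for analytics")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")
def clean_orders():
    return (
        dlt.read("raw_orders")
        .dropDuplicates(["order_id"])
        .withColumn("amount", F.col("amount").cast("double"))
    )
```

Notice that there is no code for ordering the steps or handling failures: that is the "how" that the system takes care of once you have declared the "what."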
What actually processes the data behind this "declaration" is Apache Spark. Spark excels at distributed processing of huge amounts of data and can perform complex calculations at high speed. In other words, behind Lakeflow Designer's easy-to-use interface, this powerful engine is working at full throttle to process large amounts of data efficiently.
In addition, governance aspects such as data quality, security, and access management are handled by Unity Catalog. This allows centralized management of who can access which data and how it is used, so data usage is kept safe and well controlled across the entire company. It's as reassuring as a strict librarian carefully managing a library's lending and collection.
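To give a rough feel for what that centralized control looks like, here is a small, hedged example of Unity Catalog-style permission management expressed as SQL run from a Databricks notebook (where the `spark` session is predefined). The catalog, schema, table, and group names are hypothetical; the point is simply that access rules live in one central place rather than inside each pipeline.

```python
# Grant an analyst group read-only access to the cleaned orders table.
# (Hypothetical catalog/schema/table and group names.)
spark.sql("GRANT USE CATALOG ON CATALOG main TO `data_analysts`")
spark.sql("GRANT USE SCHEMA ON SCHEMA main.sales TO `data_analysts`")
spark.sql("GRANT SELECT ON TABLE main.sales.clean_orders TO `data_analysts`")

# Review who has been granted what on this table (useful for audits).
spark.sql("SHOW GRANTS ON TABLE main.sales.clean_orders").show(truncate=False)
```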
Lakeflow Designer is actually part of a larger product called "Lakeflow", which also includes other modules such as:
- Lakeflow Connect: A function to easily import data from various data sources. A wide range of no-code data connectors are also available.
- Lakeflow Declarative Pipelines: Declarative pipeline creation and management capabilities integrated with Lakeflow Designer.
- Lakeflow Jobs: A function for scheduling and monitoring the data pipeline you have created.
These modules work together to enable smooth, end-to-end data collection, from importing data to converting and processing it, and preparing it for use.
Who makes it? What kind of company is Databricks?
Let's take a moment to talk about Databricks, the company that developed this groundbreaking tool. Databricks was founded by the original creators of Apache Spark and is one of the leading companies in the data and AI fields.
The "lakehouse platform" they advocate (a new data management architecture that combines the flexibility and large capacity of a data lake with the structure and reliability of a data warehouse) has been adopted by many companies around the world and has revolutionized the way data is used and AI is developed. Databricks' mission is to simplify the complex world of data and enable everyone to extract value from it (to democratize data). Lakeflow Designer is a product that truly embodies that mission.
The company's talented team, including Bilal Aslam, Senior Director of Product Management, is leading the development of these innovative products. Databricks is also highly regarded in terms of corporate reliability and technical capabilities.
What can you do with Lakeflow Designer? - Specific use cases and future possibilities -
So, what exactly can you do with Lakeflow Designer? And how will it change the way we use data and develop AI?
A revolution for data analysts and business users
The biggest benefit is that data analysts can now do the data preparation work themselves, work that previously had to be handed off to data engineers.
- The data needed for analysis is available right away, which speeds up decision making.
- With no waiting on data engineers, lead times for analysis projects are significantly shortened.
- By working with the data yourself, you gain a deeper understanding of it, which can lead to better analyses and insights.
Specific use cases
Experts believe Lakeflow Designer is particularly effective in less complex but important use cases, such as:
- Profit margin tracking by region and product: Organize sales data and visualize profitability in near real time.
- Compliance Reporting: Automatically collect and process data required for reporting to regulatory authorities.
- Aggregation of Key Performance Indicators (KPIs): Aggregate data from various systems and calculate KPIs for dashboards.
- Data retention monitoring and archiving: Automatically identify and archive outdated data.
- Data preparation for customer segmentation: Grouping customer data according to specific conditions for marketing purposes (cohort analysis).
Of course, custom development is also supported, so it is expected to address even more complex needs in the future.
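As a taste of the first use case above (profit margin tracking by region and product), here is a minimal PySpark aggregation of the kind a Lakeflow Designer pipeline might assemble visually. The table and column names are illustrative assumptions, and `spark` is assumed to be the predefined session in a Databricks notebook.

```python
from pyspark.sql import functions as F

# Assumes a cleaned sales table with revenue and cost recorded per order line.
sales = spark.read.table("sales.order_lines")

margin_by_region_product = (
    sales
    .groupBy("region", "product")
    .agg(
        F.sum("revenue").alias("revenue"),
        F.sum("cost").alias("cost"),
    )
    .withColumn("profit_margin", (F.col("revenue") - F.col("cost")) / F.col("revenue"))
)

# Publish as a table for dashboards and near-real-time monitoring.
margin_by_region_product.write.mode("overwrite").saveAsTable("sales.margin_by_region_product")
```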
The "Unsung Heroes" of AI Development
When developing an AI model, especially a machine learning model, a data preparation step called feature engineering (extracting and processing characteristic information from the raw data so that the model can learn from it more easily) is extremely important. Lakeflow Designer streamlines the creation of pipelines for feature engineering, helping AI developers focus on building the models themselves. This accelerates the AI development cycle and allows AI to be put to work in the business more quickly.
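To make "feature engineering" concrete, here is a small sketch that turns raw order history into per-customer features (how recently, how often, and how much each customer buys) that a machine learning model could consume. The table and column names are again illustrative assumptions.

```python
from pyspark.sql import functions as F

# Assumes the cleaned orders table from earlier; `spark` is the predefined session.
orders = spark.read.table("sales.clean_orders")

# Classic RFM-style features: recency, frequency, and monetary value per customer.
customer_features = (
    orders
    .groupBy("customer_id")
    .agg(
        F.max("order_ts").alias("last_order_ts"),
        F.count("order_id").alias("order_count"),
        F.sum("amount").alias("total_spend"),
    )
    .withColumn("days_since_last_order",
                F.datediff(F.current_date(), F.col("last_order_ts")))
)

# Store the features where model training jobs can pick them up.
customer_features.write.mode("overwrite").saveAsTable("ml.customer_features")
```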
A look into the future: Towards the "Canva of ETL"
Michael Ni, principal analyst at Constellation Research, called Lakeflow Designer "the Canva of ETL" (Canva is a popular graphic design tool that lets anyone easily create professional-looking designs). The suggestion is that, just as Canva did for design, Lakeflow Designer could make data pipeline development, which previously required specialized knowledge, accessible and intuitive for anyone. The democratization of data engineering will continue to advance.
What makes it different from other tools? Comparison with competitors
There are many different tools in the world of data processing. What are the features of Lakeflow Designer compared to them? In particular, it is often compared to "Openflow" provided by Snowflake, another major data cloud company.
Differences in philosophy with Snowflake Openflow
According to analysts, Databricks' Lakeflow and Snowflake's Openflow have different approaches and philosophies towards data engineering.
- Databricks Lakeflow (including Designer): Data engineering functions are integrated into an open orchestration platform centered on Apache Spark, a philosophy that emphasizes flexibility and openness. It is also characterized by a high level of functional maturity, since existing technologies such as Delta Live Tables, Databricks Workflows, and the technology of Arcion (acquired in 2023 and now part of Lakeflow Connect) have been evolved and integrated into it.
- Snowflake Openflow: Provides declarative workflow control that leverages the deep native capabilities of the Snowflake platform, a philosophy that emphasizes integration and simplicity. It is considered a relatively new offering.
Michael Ni describes the difference as "one side prefers flexibility and openness, the other side prefers integration and simplicity."
Lakeflow Designer's unique strengths
Lakeflow Designer's strengths include:
- Close integration with Databricks' powerful ecosystem (Spark, Delta Lake, Unity Catalog, etc.).
- An AI assistant that doesn't just generate code but understands the context of the data and helps with pipeline building.
- A design that offers both a no-code environment for data analysts and a pro-code development environment for data engineers (described below), facilitating collaboration among users of different skill levels.
Things you should know before installing: Precautions and considerations
Lakeflow Designer seems like a dream tool, but there are a few things to keep in mind when considering its use.
- Not a panacea: As Matt Aslett, director of software research at ISG, points out, highly complex data integration, transformation, and special system integrations will still require the advanced expertise of data engineers. Lakeflow Designer is better seen as a tool that makes collaboration between analysts and engineers smoother.
- Understanding the data itself is essential: No-code tools lower the technical hurdle, but it's still important to have a fundamental understanding of what data you're working with, what it means to your business, and what outcomes you want to achieve.
- Collaboration within the organization: When introducing a new tool, it is important to set rules within the organization, such as who will use it, how, and where responsibilities lie. Lakeflow Designer is designed to promote collaboration, but it is more effective when the organization makes a deliberate effort to use it well.
- Awareness that it is a preview version: At the time of writing, Lakeflow Designer is available as a preview. This means improvements and feature additions are still being made ahead of the official release. If you are considering full-scale, company-wide deployment, we recommend thoroughly checking the functionality, stability, and support of the generally available release.
What do the experts think? ~Analysts' opinions~
What do industry experts think about this new tool? Here are some opinions.
Michael Ni, principal analyst at Constellation Research, has given Lakeflow Designer a very positive review.
"Lakeflow Designer addresses a critical data management problem: data engineering bottlenecks are killing AI momentum. … Lakeflow Designer opens the door wide by putting the power of no-code tools in the hands of analysts, while keeping it enterprise-safe."
He also praised its innovation and robustness, saying, "It's the Canva of ETL. It enables instant, visual, AI-assisted development of data pipelines. And under the hood, it runs machine-scale Spark SQL secured by Unity Catalog."
Meanwhile, Matt Aslett, ISG's director of software research, offers a more measured perspective.
While he expects the new tools to reduce the burden on data engineering teams, he noted that "for use cases with more complex integration or transformation requirements that require additional expertise, data analysts will still likely work collaboratively with data engineering teams."
Aslett also commented on Lakeflow's maturity, explaining that it is built on a foundation of existing technologies, saying, "The Connect feature was acquired along with Arcion in 2023. The Declarative Pipelines feature is an evolution of DLT (Delta Live Tables), and Jobs is an evolution of Databricks Workflows."
Taking all these opinions into consideration, it seems that while Lakeflow Designer has the potential to dramatically change the way data engineering is done, it is not an all-purpose tool and collaboration with existing experts remains important.
Latest Updates and Future Roadmap
Databricks Lakeflow Designer is an exciting new technology that was just announced at the recent Data + AI Summit.
- Lakeflow Designer is now available in preview: Users can already try out Lakeflow Designer as a preview, and it will be refined based on feedback.
- Lakeflow as a whole moves to general availability (GA): Lakeflow's core modules, namely Lakeflow Connect, Lakeflow Declarative Pipelines, and Lakeflow Jobs, are becoming generally available, while Designer itself remains in preview.
- Pro-code IDE for data engineers also announced: Alongside Lakeflow Designer, Databricks announced a new integrated development environment (IDE) for data engineers, enabling experienced engineers to efficiently develop, debug, and manage more complex pipelines. Michael Ni analyzed the simultaneous launch of this no-code tool (Lakeflow Designer) and pro-code tool (the new IDE), saying, "It's a strategic move to target both ends of pipeline maturity: low-code for moving quickly and a full IDE for extending and maintaining the pipeline."
These moves show Databricks’ strong commitment to meeting all data engineering needs.
Summary: The key to unlocking the future of data utilization
We have now taken a closer look at Databricks Lakeflow Designer. This new tool significantly lowers the barrier to entry for data preparation, a task traditionally reserved for experts, and puts data in the hands of more people, including data analysts and business users.
Eliminating data engineering bottlenecks, accelerating AI development, and fostering a data-driven culture across the enterprise: Lakeflow Designer has great potential in all of these areas. It may be the key to promoting the democratization of data utilization and taking business to a new stage in the AI era.
Of course, this is a technology that has just been announced, so we need to keep an eye on how it evolves in the future. However, the concept and Databricks' technical capabilities seem to hold great promise. Be sure to keep an eye on it to see what kind of revolution this "Canva for ETL" will bring to the world of data!
Frequently asked questions (FAQ)
- Q1: What is ETL? It seems complicated with all the technical terms...
- A1: ETL is an abbreviation for "Extract, Transform, Load," a series of steps to collect data, clean it up so that it can be used, and store it. In cooking terms, it's like gathering ingredients from the field (extract), washing and cutting them (transform), and serving them on a plate (load). Lakeflow Designer helps you do this without programming.
- Q2: Does "no code" really mean you don't have to write programs?
- A2: Yes, you don't need to write a program for basic operations. You can create a data processing flow (pipeline) just by dragging and dropping parts on the screen and issuing simple instructions. It's like building something with Lego blocks.
- Q3: Who is Lakeflow Designer for? Do I have to be a data scientist to use it?
- A3: No, Lakeflow Designer is targeted at data analysts and business users who have had difficulty with programming in the past. Of course, data scientists and data engineers can also use it to rapidly prototype and streamline some of their work.
- Q4: With Lakeflow Designer, do we no longer need data engineers?
- A4: Not necessarily. Lakeflow Designer can automate and simplify many routine data preparation tasks, but highly complex data processing, system-wide design, advanced tuning, etc. still require the expertise of data engineers. Rather, data engineers can focus on more strategic tasks and collaborate more smoothly with analysts.
- Q5: What benefits does this bring to AI development?
- A5: To make AI models smarter, a large amount of "clean and easy-to-use data" is required. Lakeflow Designer allows you to quickly and efficiently prepare data for AI. As a result, you can speed up the AI model development cycle and deploy AI that creates business value more quickly.
Related links
- Databricks Official Blog: Lakeflow Designer Announcement
- Databricks Lakeflow product page
- InfoWorld article: Databricks targets AI bottlenecks with Lakeflow Designer
- Databricks Press Release: Announcing Lakeflow Designer
This article is intended to provide information about Databricks Lakeflow Designer and does not recommend the use of any specific product. If you are considering introducing it, please conduct sufficient research and consideration at your own risk.