The secret to making AI smarter! How do you use "various types of data"?
Hello and welcome to the world of AI and data! I'm John.
Recently, the word "AI" has been appearing a lot in the news and on the Internet. Some people may think it sounds difficult, but it's actually a very interesting technology that will make our lives more convenient.
But did you know that data is actually extremely important for AI to work smartly? Today, I'll explain, in a way that even beginners can follow, what kind of data AI "eats," how that data makes it smarter, and the latest ideas for preparing data so that AI can use it easily!
Are there different types of data? Data that AI is good at and data that it is not good at
First of all, although we lump everything together under the word "data," there are actually many different types.
- Organized data (structured data): Data with fixed rows and columns, like an Excel spreadsheet, so you can see at a glance which information is where. Examples include a store's sales data or a customer list.
- Mixed data (unstructured data): Data without a fixed form, such as email text, social media posts, meeting audio, or photos and videos taken on a trip. In fact, most of the data in the world is this kind of "mixed data."
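To make the difference concrete, here is a minimal Python sketch. The sales rows and the email are made-up examples: with organized data, a question like "how many notebooks did we sell?" is a simple column lookup, while with mixed data all you have is raw text to search (or to hand over to AI).

```python
import csv
import io

# Organized (structured) data: fixed columns, so every value has a known place.
# These rows are made-up examples for illustration.
SALES_CSV = """date,product,units
2024-05-01,notebook,12
2024-05-01,pen,30
2024-05-02,notebook,7
"""

# Mixed (unstructured) data: free text with no fixed form (also a made-up example).
EMAIL = "Hi, I bought a notebook last week but the cover was torn. Can I exchange it?"

def total_units(csv_text: str, product: str) -> int:
    """Structured data: answering a question is a lookup by column name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(int(row["units"]) for row in reader if row["product"] == product)

def mentions(text: str, word: str) -> bool:
    """Unstructured data: there is no 'product' column, only raw text to search."""
    return word.lower() in text.lower()

print(total_units(SALES_CSV, "notebook"))  # 19
print(mentions(EMAIL, "notebook"))         # True
```

Finding out *what* the customer wants (an exchange because of a torn cover) is where AI comes in; a plain text search can only tell you that a word appears.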
Recent AI has become good at finding important information in this "mixed data." In particular, "multimodal AI," a technology that combines and understands multiple types of data such as text, images, and voice, is attracting attention. One example is an AI that understands product photos together with their descriptions and suggests recommended products!
A major crisis in the age of AI? Data is fragmented and difficult to use!
But there's one problem... Within companies and organizations, important data is stored in various places (sometimes called "data silos"), and much of this data is "mixed data" that is difficult for AI to use directly.
Data analysis experts work hard every day, like treasure hunters, to find the treasure (valuable information) that AI can use in this scattered data and to organize it into a form that AI can easily consume. This is a very time-consuming and laborious task.
A common mistake is thinking that preparing data is a one-time job. For example, if you hard-code every product type into the program at the start, you have to rewrite the program every time a new product appears, which is very troublesome. The ideal system is more flexible: the AI itself recognizes from the product description that "This is a new type of product!"
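Here is a minimal sketch of the two approaches, using made-up product types. The flexible version is only a stand-in for the AI described above: it uses simple keyword matching, keeps the categories as data instead of code, and flags anything it doesn't recognize as a possible new type rather than crashing.

```python
# Rigid approach: product types are written directly into the program.
# Supporting a new type means editing and redeploying this function.
def classify_rigid(description: str) -> str:
    if "notebook" in description:
        return "stationery"
    if "headphones" in description:
        return "electronics"
    raise ValueError("unknown product type")  # fails on anything new

# More flexible approach: categories live in data, not code, and anything
# unrecognized is flagged for review. In a real system, an AI model would take
# the place of this simple keyword matching (the categories here are hypothetical).
CATEGORIES = {
    "stationery": ["notebook", "pen"],
    "electronics": ["headphones", "charger"],
}

def classify_flexible(description: str, categories: dict[str, list[str]]) -> str:
    text = description.lower()
    for category, keywords in categories.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "NEW_TYPE?"  # a candidate new product type

print(classify_flexible("Wireless headphones with case", CATEGORIES))  # electronics
print(classify_flexible("Smart water bottle", CATEGORIES))            # NEW_TYPE?

# Adding a new type is now a data change, not a code change.
CATEGORIES["kitchen"] = ["bottle", "mug"]
print(classify_flexible("Smart water bottle", CATEGORIES))            # kitchen
```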
"Data Cooking" to Make AI Truly Smart
So how can AI make effective use of all these kinds of data? The key is to build a flexible data flow (a "data pipeline") that holds up well when things change.
For example, let's consider the case of analyzing inquiry emails from customers.
- First, AI roughly classifies the content of the email, for example as "This is a question about the product" or "This may be a complaint." At this stage, we use AI that needs relatively light processing. It relies on natural language processing (NLP), the technology that allows AI to understand the language we normally use.
- For more detailed analysis, you can use a more powerful AI, a large language model (LLM) like the one behind ChatGPT, to work out how the customer felt when writing the email (angry, happy, and so on).
- Finally, to put the analysis results to use in actual business, you may need a special kind of database called a vector database, which stores data as lists of numbers (vectors) so that similar items can be found quickly.
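The three steps above can be sketched as a small Python pipeline. Everything here is a hypothetical stand-in: `quick_classify` uses keyword rules in place of a lightweight NLP model, `analyze_feelings` is a placeholder for the spot where a real system would call an LLM, and `TinyVectorStore` plays the role of a vector database, using toy word-count "embeddings" and cosine similarity.

```python
import math

# Step 1 stand-in: cheap keyword rules instead of a lightweight NLP model.
def quick_classify(email: str) -> str:
    text = email.lower()
    if any(w in text for w in ("broken", "refund", "terrible", "late")):
        return "possible complaint"
    if "?" in text:
        return "product question"
    return "other"

# Step 2 placeholder: a real system would send the email to an LLM here.
# This stub only looks for a few emotion words so the example runs offline.
def analyze_feelings(email: str) -> str:
    text = email.lower()
    if any(w in text for w in ("angry", "terrible", "unacceptable")):
        return "angry"
    if any(w in text for w in ("thanks", "love", "great")):
        return "happy"
    return "neutral"

# Step 3 stand-in for a vector database: each email becomes a vector, and we
# look up the most similar stored email by cosine similarity.
VOCAB = ("refund", "broken", "delivery", "price")  # toy "embedding" dimensions

def embed(email: str) -> list[float]:
    text = email.lower()
    return [float(text.count(word)) for word in VOCAB]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class TinyVectorStore:
    def __init__(self) -> None:
        self.items: list[tuple[str, list[float]]] = []

    def add(self, email: str) -> None:
        self.items.append((email, embed(email)))

    def most_similar(self, query: str) -> str:
        q = embed(query)
        return max(self.items, key=lambda item: cosine(q, item[1]))[0]

store = TinyVectorStore()
store.add("The delivery was late again.")
store.add("My blender arrived broken, I want a refund.")

new_email = "This is terrible, the lamp is broken. I want a refund!"
print(quick_classify(new_email))      # possible complaint
print(analyze_feelings(new_email))    # angry
print(store.most_similar(new_email))  # My blender arrived broken, I want a refund.
```

The design point is the split itself: the cheap step runs on every email, and only the emails that need it go on to the more expensive analysis.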
As you can see, the type of AI you use and the amount of computing power you need will vary depending on the type of data and the stage of analysis. The key to using AI is to skillfully combine these.
A sneak peek at the list of things you need for different AI jobs!
Depending on the type of work that AI does, the required "data storage location (storage)," "computing power (computing)," and "data path (network)" will change. This will be a bit technical, but let's look at some typical examples to see what the differences are.
Type of AI job | Where to put the data (storage) | Data path (network) | Computing power (computing) | How it scales as work increases |
---|---|---|---|---|
Understanding and classifying words in real time (e.g. spam filters) | A special place that stores the meaning of words, kept where it can be accessed quickly | Speed is important (low latency) | A brain that is good at AI processing (GPU), plus a regular brain (CPU) with plenty of memory for holding information temporarily | Increase the number of processes so that many requests can be handled at the same time. The more words there are, the more memory is needed. |
Analyzing large amounts of text data (e.g. analysis of customer feedback) | A place to store the text, a special place that stores the meaning of words, and a place that organizes related information | A wide road that can send a lot of data at once | Powerful brains for training the AI (GPUs and TPUs), plus brains for preparing the data (CPUs) | The more data there is, the more storage is required; the more complex the AI, the higher the computing costs. |
Analyzing videos and images (e.g. automatic detection of defective products) | A place to store many large videos and images, plus a place that temporarily keeps frequently used data for quick access | An extremely wide road that lets video be sent and received smoothly | Super-powerful brains (GPUs) for training the AI, and dedicated brains (GPUs) for making AI decisions | Video data can quickly fill up storage. Processing it in batches can help balance the computing load. |
Predicting the future and detecting unusual behavior (e.g. stock price prediction, unauthorized access detection) | Data organized by time, with old and new data managed efficiently | A predictable, stable road, with data processed in batches at regular intervals | Mainly regular brains (CPUs); the longer the prediction period, the more memory is required | Processing data period by period is efficient; the longer the forecast period, the more calculation is required. |
*These are just typical examples; in reality, many more factors are involved.
Looking at it this way, you can see that different types of AI require completely different preparations, right? That's why it's important to carefully consider the "tools" that are appropriate for each type of AI.
You can start today! 4 of the latest techniques for preparing AI data
If you're thinking, "Hmm, that seems like a lot of work...", don't worry! Lately, many convenient methods have appeared that can help you with this data preparation.
- Let your AI assistant help you! AI assistants are now available that act like a good secretary: they tell you how your data is structured and automatically write programs for data analysis (such as SQL queries). Using them makes the initial research and preparation of data much easier.
- Clean and process your data in one place! Going back and forth between different tools is a pain. Where possible, clean your data and convert it into an AI-friendly form (a step called feature engineering) right in the central location where the data is stored (the core data platform). This reduces the hassle of moving data and improves efficiency.
- Automate the tedious preparation! Build your data preparation procedure once and have it run automatically, again and again. That way you don't have to repeat the same work every time and can focus on more important things, like improving your AI models.
- Get more power only when you need it: use serverless! With "serverless" computing, you can use a lot of computing power only when there is a lot of data to process, and it scales back down when you're done. That's a dream come true: you keep costs down while still processing powerfully when you need to. Another great thing is that you don't have to manage the servers yourself.
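As a minimal sketch of "define the preparation once, then rerun it automatically," here is a Python pipeline built from small cleaning steps. The step names and sample records are hypothetical; in practice, a scheduler or the data platform itself would trigger the run whenever a new batch arrives.

```python
from typing import Callable

Record = dict[str, str]
Step = Callable[[list[Record]], list[Record]]

def drop_empty_text(records: list[Record]) -> list[Record]:
    """Remove records whose text field is missing or blank."""
    return [r for r in records if r.get("text", "").strip()]

def normalize_text(records: list[Record]) -> list[Record]:
    """Collapse extra whitespace and lowercase the text so later steps see one form."""
    return [{**r, "text": " ".join(r["text"].split()).lower()} for r in records]

def add_length_feature(records: list[Record]) -> list[Record]:
    """A tiny feature-engineering step: add the word count as a new field.
    (Stored as a string so every field has the same type.)"""
    return [{**r, "word_count": str(len(r["text"].split()))} for r in records]

# The procedure is written once...
PIPELINE: list[Step] = [drop_empty_text, normalize_text, add_length_feature]

def run_pipeline(records: list[Record], steps: list[Step] = PIPELINE) -> list[Record]:
    for step in steps:
        records = step(records)
    return records

# ...and reused for every new batch without redoing the work by hand.
batch = [
    {"id": "1", "text": "  Great   PRODUCT, thanks! "},
    {"id": "2", "text": "   "},
]
print(run_pipeline(batch))
# [{'id': '1', 'text': 'great product, thanks!', 'word_count': '3'}]
```

Keeping each step small and separate makes it easy to add, remove, or reorder steps when the data changes.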
These techniques can actually be used for both "organized data" and "mixed data." Recent data platforms are smart enough to handle "mixed data" such as images, audio, and text as if it were organized data.
The future of AI data utilization will be even more amazing!
In the future, AI data utilization will become smarter and easier.
The goal is to handle different types of data in one central location. That eliminates the need to move data around and lets you use AI more efficiently. For example, you can easily combine customer inquiry emails (mixed data) with each customer's past purchase history (organized data) and analyze them together.
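To illustrate that kind of combination, here is a minimal Python sketch that joins each email with the sender's purchase history. The customer IDs, fields, and records are made up, and the email categories are assumed to come from an earlier AI classification step.

```python
# Organized data: purchase history keyed by customer ID (made-up records).
PURCHASES = {
    "C001": [{"product": "blender", "amount": 89.0}, {"product": "kettle", "amount": 35.0}],
    "C002": [{"product": "lamp", "amount": 20.0}],
}

# Mixed data after AI has processed it: each email now carries a customer ID and
# a category. The categories are assumed to come from an earlier AI step.
EMAILS = [
    {"customer_id": "C001", "category": "possible complaint", "text": "My blender broke."},
    {"customer_id": "C002", "category": "product question", "text": "Is the lamp dimmable?"},
    {"customer_id": "C003", "category": "product question", "text": "Do you ship abroad?"},
]

def enrich(emails: list[dict], purchases: dict[str, list[dict]]) -> list[dict]:
    """Attach each customer's order count and total spend to their email."""
    rows = []
    for email in emails:
        history = purchases.get(email["customer_id"], [])  # unknown customer: no history
        rows.append({
            **email,
            "orders": len(history),
            "total_spent": sum(p["amount"] for p in history),
        })
    return rows

# Example use: handle complaints from the highest-spending customers first.
complaints = [r for r in enrich(EMAILS, PURCHASES) if r["category"] == "possible complaint"]
for row in sorted(complaints, key=lambda r: r["total_spent"], reverse=True):
    print(row["customer_id"], row["orders"], row["total_spent"])  # C001 2 124.0
```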
AI itself will also keep evolving and will be able to quickly help with tasks during data analysis, such as summarizing and classifying data or transcribing audio. For example, Google Cloud's BigQuery service has a function, AI.GENERATE_TABLE(), that works like a magic spell: call it, and AI extracts information from various types of data (multimodal data) and converts it into tabular data that is easy to analyze!
AI, together with the "mixed data" and the many other kinds of data it consumes, will greatly change the world of data analysis. What matters is an "architecture" that lets you flexibly swap tools and methods to suit the situation. As AI becomes more deeply involved in our work and lives, this flexibility will become even more important.
A word from John
When you hear the word AI, it might sound like a "magic box." But behind the scenes, there's a lot of data, and a lot of hard work to help AI understand that data. I hope this article has given you a little insight into that world. The more you learn about AI and data, the deeper and more fascinating it becomes!
This article is based on the following original article and summarizes it from the author's perspective:
Building an analytics architecture for unstructured data and multimodal AI