
Data Intelligence Platform: The New Databricks Avatar Is Set to Revolutionize Data Platforms

Jun 14, 2024

Saikat Dutta
Senior Specialist Data Engineer, Data & Analytics Practice, LTM

When OpenAI launched ChatGPT, it created a sudden buzz around Generative AI and how it will disrupt the way we interact with technology. Many see this as the most significant disruption since the advent of the internet or cloud computing.

Data platforms are no exception. Soon after the Generative AI wave began, many companies started to invest in AI-enabled code assistants, conversational bots for data discovery, and AI-assisted pipeline design and development. However, these efforts were still limited by many challenges, including skill gaps, data quality issues, governance, and understanding data semantics.

Let's explore how Databricks is becoming a Data Intelligence Platform

On November 15, 2023, Databricks set out the direction in which it believes data platforms should move to solve these challenges, coining the term Data Intelligence Platforms for these future data platforms. This is a significant step beyond its Lakehouse architecture, which it proposed in 2020.

Figure 1: Data Intelligence Engine, Image source: Databricks (https://cms.databricks.com/sites/default/files/inline-images/blog-marketecture-1.png)

The core ideas behind the Data Intelligence Platform, as shared by Databricks CEO Ali Ghodsi, are as follows:

  1. Natural language access to data
  2. Automatic reading of the semantic catalog and data discovery
  3. Automated management
  4. Enhanced security
  5. Support for enhanced AI workloads

Now, let's explore how all of these components work together under the hood.

1 Architecture

The Databricks Data Intelligence Platform is centered around the core data lake, where raw data is stored in an open format. Delta Lake, built on top of it, gives the data lake atomicity, consistency, isolation, and durability (ACID) properties. The Unity Catalog offers unified governance capabilities and automatic loading abilities.
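To make the ACID idea concrete, here is a minimal sketch of how an ordered commit log over immutable data files can give atomic, isolated writes. It is a toy illustration only, not Delta Lake's actual implementation or API: the class `ToyCommitLog`, its methods, and the file names are hypothetical, and crash safety is ignored.

```python
import json
import os
import tempfile


class ToyCommitLog:
    """Toy append-only commit log (hypothetical; not Delta Lake's real code)."""

    def __init__(self, log_dir):
        self.log_dir = log_dir
        os.makedirs(log_dir, exist_ok=True)

    def _path(self, version):
        # Zero-padded names keep commits in version order.
        return os.path.join(self.log_dir, f"{version:020d}.json")

    def latest_version(self):
        versions = [int(name.split(".")[0]) for name in os.listdir(self.log_dir)]
        return max(versions, default=-1)

    def commit(self, read_version, actions):
        """Try to publish `actions` as version read_version + 1.

        Returns the new version, or None if another writer got there first.
        """
        target = read_version + 1
        try:
            # Mode "x" fails if the file already exists, so only one writer
            # can ever create a given version.
            with open(self._path(target), "x") as handle:
                json.dump(actions, handle)
        except FileExistsError:
            return None
        return target

    def snapshot(self):
        """Replay every commit to find the data files that are currently live."""
        live = set()
        for version in range(self.latest_version() + 1):
            with open(self._path(version)) as handle:
                for action in json.load(handle):
                    if action["op"] == "add":
                        live.add(action["file"])
                    elif action["op"] == "remove":
                        live.discard(action["file"])
        return live


log = ToyCommitLog(tempfile.mkdtemp())
v0 = log.commit(-1, [{"op": "add", "file": "part-0.parquet"}])
# Two writers both read version 0 and race to publish version 1.
first = log.commit(0, [{"op": "add", "file": "part-1.parquet"}])
second = log.commit(0, [{"op": "remove", "file": "part-0.parquet"}])
```

The losing writer gets `None` back and would have to re-read the log and retry, so readers only ever see whole commits, never a half-applied change.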

Additionally, the Data Intelligence Platform uses the Data Intelligence Engine (DatabricksIQ) across its components. This engine optimizes storage within the data lake and generates and reads metadata in the Unity Catalog.

Figure 2: Data Intelligence Platform Architecture, Image source: Databricks (https://cms.databricks.com/sites/default/files/inline-images/blog-marketecture-1.png)

This knowledge of the data and metadata is then used to optimize computation, ensure data quality, translate natural-language text into structured query language (SQL) and code, and train, fine-tune, and deploy large language models (LLMs) and AI apps.

Let's look at some capabilities that empower the Databricks Data Intelligence Platform.

1.1 AI documentation using Unity Catalog

Databricks has introduced AI-generated documentation in Unity Catalog to simplify an organization's documentation, data discovery, and metadata management.

Let's face it: who loves documentation? No one, right? With the new AI documentation, the Unity Catalog can generate plain English documentation for all the tables and columns.

Humans stay in the loop: users can review, edit, or accept the auto-generated metadata. This keeps the descriptions aligned with the specific use case and domain knowledge.

1.2 Semantic search

Semantic search enables users to search across the data landscape and returns the data assets most relevant to their query. It is powered by the English descriptions that AI auto-generates in the catalog.

Information discovery has always been a challenge in big organizations. A data engineer spends plenty of time explaining the meaning of specific data or simply finding the table that answers a colleague's question. Now, users can search the data themselves.
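As a rough illustration of searching over catalog descriptions, the sketch below ranks tables by how closely their plain-English descriptions match a query. Everything here is hypothetical: the table names and descriptions are invented, and simple word-overlap cosine similarity stands in for whatever semantic model the platform actually uses, so it matches keywords and not meaning.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase the text and keep alphabetic words only.
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two word-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, descriptions, top_k=3):
    """Return table names ranked by similarity of description to query."""
    q = Counter(tokenize(query))
    scored = [
        (cosine(q, Counter(tokenize(text))), name)
        for name, text in descriptions.items()
    ]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [name for score, name in ranked[:top_k] if score > 0]


# Hypothetical catalog: table name -> description.
catalog = {
    "sales.orders": "One row per customer order with order date and total amount",
    "hr.employees": "Employee master data including hire date and department",
    "sales.refunds": "Refunds issued to customers, linked to the original order",
}
results = search("customer order amount", catalog)
```

The better the descriptions, the better the ranking, which is why the auto-generated documentation matters for discovery.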

1.3 Databricks Assistant

Databricks Assistant is a context-aware AI assistant that can automatically generate SQL queries or Python code. It can also explain existing code, format it, and fix errors. Further, Databricks Assistant leverages Unity Catalog metadata to understand the data in tables and columns. It even understands the descriptions of popular data assets and provides personalized responses.
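One way to picture how catalog metadata can ground a code assistant is to fold table and column descriptions into the prompt sent to a language model. The sketch below is an assumption about the general technique, not a description of how Databricks Assistant works internally; `build_prompt` and the sample metadata are hypothetical.

```python
def build_prompt(question, tables):
    """Combine a user question with table and column descriptions."""
    lines = ["You write SQL. Use only the tables and columns listed below.", ""]
    for table, info in tables.items():
        lines.append(f"Table {table}: {info['comment']}")
        for column, comment in info["columns"].items():
            lines.append(f"  - {column}: {comment}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


# Hypothetical metadata, shaped like table and column comments in a catalog.
metadata = {
    "sales.orders": {
        "comment": "One row per customer order",
        "columns": {
            "order_id": "Unique order identifier",
            "order_ts": "Time the order was placed (UTC)",
            "total_amount": "Order value in USD",
        },
    }
}
prompt = build_prompt("What was total revenue last month?", metadata)
```

With the column meanings in the prompt, a model has a better chance of picking `total_amount` and `order_ts` than it would from the column names alone.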

Databricks Assistant can generate charts from a previously defined Lakeview dataset. Users describe what they want to learn from the data, and the Assistant generates the chart. The Assistant can also be used to edit the charts.

N.B. This is still in preview, and users should always review visualizations generated by the Assistant to verify correctness.

1.4 Auto optimization

DatabricksIQ is Databricks's new Data Intelligence Engine, which is deeply integrated into all its products. It powers the auto-optimization of different services within the Databricks Data Intelligence Platform. For example, it can automatically index columns and partition the data. This improves the lakehouse's performance and lowers total cost of ownership (TCO).

1.5 Serverless compute

At its Data Engineering in the Age of AI event, Databricks showed how serverless compute can run jobs from within workflows. Once serverless compute is selected, a user can choose to run the job as quickly or as cheaply as possible.

Databricks then handles all the administration and scaling of the serverless compute, optimizing it through its Data Intelligence Platform.

N.B. This has only been announced and is just entering preview.

1.6 Run generative AI functions

Generative AI functions can be executed from within the Databricks platform. For example, the ai_generate_text function can be called from within a Databricks notebook using SQL. This function can call a pre-built LLM (such as an OpenAI model) to generate descriptions automatically, among other tasks.
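As a hedged sketch of what such a call might look like, the Python helper below assembles the SQL statement as a string. It assumes the preview signature of ai_generate_text takes the prompt, a model name, and then key/value options such as an API key read through secret(); treat that argument layout as an assumption to verify against the current Databricks documentation. The model name, secret scope, secret key, and helper names are all illustrative.

```python
def sql_literal(value):
    # Spark SQL string literals use backslash escapes by default.
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def ai_generate_text_sql(prompt, model="openai/gpt-3.5-turbo",
                         secret_scope="llm", secret_key="openai_api_key"):
    """Build a SELECT that calls ai_generate_text (argument layout assumed)."""
    args = [
        sql_literal(prompt),
        sql_literal(model),
        "'apiKey'",
        # The API key is read from a secret scope, never inlined in the query.
        f"secret({sql_literal(secret_scope)}, {sql_literal(secret_key)})",
    ]
    return f"SELECT ai_generate_text({', '.join(args)}) AS generated_text"


query = ai_generate_text_sql("Describe a table that stores customer orders")
# In a Databricks notebook, this string would be passed to spark.sql(query).
```

Keeping the key in a secret scope means the query text can be shared or logged without exposing credentials.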

Conclusion and way forward

The Data Intelligence Platform is an ongoing effort, and Databricks has shared its roadmap for building it out. Many of the above services are still in preview, and new capabilities are added regularly.

The deep knowledge of the data and metadata in the Unity Catalog is the main differentiator in Databricks's bid to become the leading Data Intelligence Platform. This deep context allows Databricks to tailor queries and code to the custom use case and specific business data.

Databricks integrates DatabricksIQ with Mosaic AI to enable businesses to create custom AI applications specific to their data. It is building support for end-to-end retrieval-augmented generation (RAG) systems, training custom models or pretraining existing models on customers' business and domain-specific data, serverless abstractions, and end-to-end MLOps.

With such a detailed roadmap, the Databricks Data Intelligence Platform certainly looks to be one of the forerunners in democratizing AI and data access.

References:

  1. Data Intelligence Platforms, by Michael Armbrust, Adam Conway, Ali Ghodsi, Naveen Rao, Arsalan Tavakoli-Shiraji, Patrick Wendell, Reynold Xin and Matei Zaharia, November 15, 2023, Platform Blog: https://www.databricks.com/blog/what-is-a-data-intelligence-platform
  2. Data Engineering in the Age of AI: https://www.databricks.com/resources/demos/videos/data-engineering/databricks-data-intelligence-platform
  3. DatabricksIQ, April 19, 2024: https://docs.databricks.com/en/databricksiq/index.html
