Azure Databricks is an analytics platform powered by Apache Spark. Spark is a unified analytics engine that can work with most major databases, data caching services, and data warehouse providers. Beyond this broad compatibility, companies use Spark because in-memory computing and other optimizations make its analytics very fast. Azure Databricks enables companies to integrate their data analytics solutions into their existing Azure infrastructure. In this lab, you'll load data into Azure Data Lake Storage Gen2 and use Databricks to interact with that data through a Databricks workspace and cluster that you'll configure.
Learning Objectives
Upon completion of this lab you will be able to:
- Load data into Azure Data Lake Storage Gen2
- Create and manage a Databricks workspace
- Create and manage a Databricks cluster
- Mount data into a Databricks workspace from Azure Data Lake Storage Gen2
- Interact with data using Databricks
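As a rough preview of the mounting objective above, the sketch below builds the two inputs a mount typically needs: the storage URI and a set of authentication settings. It assumes service principal (OAuth) authentication, which may differ from the method this lab uses. The helper names `abfss_uri` and `oauth_mount_configs`, and every value passed to them, are hypothetical placeholders.

```python
def abfss_uri(container: str, account: str, path: str = "") -> str:
    """Build a URI of the form abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def oauth_mount_configs(client_id: str, client_secret: str, tenant_id: str) -> dict:
    """Return Hadoop ABFS settings for service principal (OAuth) access.

    Assumption: the lab may authenticate differently (for example, with an access key).
    """
    return {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.client.secret": client_secret,
        "fs.azure.account.oauth2.client.endpoint":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


# Placeholder values; substitute the names from your own environment.
source = abfss_uri("my-container", "mystorageaccount")
configs = oauth_mount_configs("app-id", "app-secret", "tenant-id")

# Inside a Databricks notebook (where `dbutils` is predefined), the mount call
# would look roughly like this:
# dbutils.fs.mount(source=source, mount_point="/mnt/lab-data", extra_configs=configs)
```

The `dbutils.fs.mount` call is left commented out because `dbutils` exists only inside a Databricks notebook; the two helpers run anywhere.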
Intended Audience
This lab is intended for:
- Azure administrators
- Cloud engineers and solutions architects
- Data engineers
- Anyone with a need to visualize and analyze data in Azure
Prerequisites
The following will help you get the most out of this lab:
- Basic familiarity with the Azure Portal (helpful, but not required)
- The videos on using Azure Databricks to interact with ADLS data
Updates
March 1st, 2024 - Migrated to Azure Data Lake Storage Gen2
December 5th, 2023 - Updated screenshots and instructions to reflect the latest UI
March 7th, 2023 - Updated screenshots and instructions to reflect the latest UI
May 9th, 2022 - Updated screenshots and instructions for clarity
November 3rd, 2021 - Updated instructions to resolve the login issue with Azure Databricks
October 23rd, 2021 - Provided a workaround for an Azure Active Directory issue that initially prevents logging in to Databricks
September 7th, 2021 - Updated instructions and images to reflect the latest portal experience
June 15th, 2021 - Updated the instructions to reflect the latest portal experience
June 22nd, 2020 - Clarified the format of the Azure Data Lake Storage URL and included a screenshot to avoid confusion
Environment before
Environment after
Matt has worked for multiple Fortune 500 companies as a DevOps Engineer and Solutions Architect. He is an AWS Certified DevOps Engineer - Professional and an AWS Certified Solutions Architect - Associate. He enjoys reading and learning new technologies.