Learn about the big data ecosystem and the power of Apache Spark for data wrangling and transformation.
Spark and Data Lakes, a Udacity course, covers the big data ecosystem, data lakes, and the Spark framework. It begins with the purpose and evolution of data lakes, a comparison of Spark and Hadoop, and the features of lakehouse architecture. It then turns to Spark essentials: data wrangling with functional programming, processing data with Spark DataFrames and Spark SQL, and working with common formats such as CSV and JSON. The AWS portion of the course uses Amazon S3 for distributed data storage and configures AWS Glue to run Spark jobs. Learners also ingest and organize data in a lakehouse architecture on AWS, use Spark and AWS Glue for ELT processes, create a Glue Data Catalog and tables, and run ad-hoc queries with Amazon Athena. The course concludes with a hands-on project in which learners act as data engineers for the STEDI team and build a data lakehouse solution for sensor data: building an ELT pipeline, processing data with Spark and AWS Glue, and loading the analytics tables back into the lakehouse.
by Udacity
Learn to design data models, build data warehouses and data lakes, automate data pipelines, and mana...
by Udacity
Master job-ready Azure skills like designing data models and utilizing other in-demand components of...
by Udacity
Master how to work with big data and build machine learning models at scale using Spark!
by Udacity
Learn how to plan, design and implement enterprise data infrastructure solutions and create the blue...
by Udacity
In this course, we introduce the characteristics of medical data and associated data mining challeng...
by Udacity
Learn how to stream data to unlock key insights in real-time.