Building the query pipeline for meaningful data


Hadoop experts who know how to retrieve meaningful information from the deepening wells of Big Data will be able to sift through a growing list of job opportunities at organizations of all kinds.

“Hadoop is a critical foundational skill,” says Marilson Campos, principal data architect at Zoosk, a San Francisco-based online dating company operating in more than 80 countries. Marilson teaches the popular course Hadoop: Distributed Processing of Big Data, which begins Saturday, Oct. 6.

“I teach because it’s really relevant moving forward,” he says.

What used to be the realm of the back-end developer is now moving into the mainstream. Organizations generate a tremendous amount of data these days and need a technology that helps them uncover important insights from that data.

“It’s not uncommon for CEOs to have access to Hadoop SQL tools so they can clearly see simple things about their data and make better business decisions,” Marilson says. “It’s across the organization. The tool most being used is SQL-based.”

The Apache Hadoop ecosystem of open-source software utilities includes libraries, a distributed file system, a platform management system, and a programming model for large-scale data processing.
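To give a sense of what that programming model (MapReduce) looks like, here is a minimal single-machine sketch in plain Python. It does not use Hadoop itself; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical and chosen only for illustration. On a real cluster the framework would run the map and reduce steps in parallel across many machines and handle the grouping step between them.

```python
from collections import defaultdict


def map_phase(line):
    # Map step: emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle step: group all values by key, which the framework
    # normally does between the map and reduce steps.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Reduce step: combine all counts for one word into a total.
    return key, sum(values)


def word_count(lines):
    # Run the three steps in sequence on an in-memory list of lines.
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big insights"]))
```

The point of the split is that each map call looks at one line and each reduce call looks at one key, so the work can be distributed without the steps needing to know about each other.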

In the overview course, students practice with everything from the Hadoop Distributed File System (HDFS), the MapReduce framework, and HBase to Spark SQL, Spark libraries, Hive queries, and ZooKeeper.

Students work through problems together on their laptops in class. While it’s important for Hadoop students to have some basic knowledge of reading code, only a small percentage of Marilson’s students are developers. Some understanding of databases, SQL, and parallel or distributed computing is recommended.

“At the end of the class, students will be able to use Hadoop to solve real problems,” Marilson says. “They’ll be able to load a data set, create a table and query that table.”
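The three steps mentioned here (load a data set, create a table, query that table) can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` module as a stand-in so it runs on a laptop without a cluster, and the sample data, the table name `daily_signups`, and the function `load_and_query` are all hypothetical. In a Hadoop setting the same steps would typically be written in Hive or Spark SQL against files stored in HDFS.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for a file that would live in HDFS.
SAMPLE_CSV = """user_id,country,signups
1,US,3
2,BR,5
3,US,2
"""


def load_and_query(csv_text):
    # Step 1: load the data set (here, parse CSV text into row dicts).
    rows = list(csv.DictReader(io.StringIO(csv_text)))

    conn = sqlite3.connect(":memory:")
    try:
        # Step 2: create a table whose columns match the data set.
        conn.execute(
            "CREATE TABLE daily_signups ("
            "user_id INTEGER, country TEXT, signups INTEGER)"
        )
        conn.executemany(
            "INSERT INTO daily_signups VALUES (:user_id, :country, :signups)",
            rows,
        )
        # Step 3: query the table with ordinary SQL.
        cursor = conn.execute(
            "SELECT country, SUM(signups) FROM daily_signups "
            "GROUP BY country ORDER BY country"
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for country, total in load_and_query(SAMPLE_CSV):
        print(country, total)
```

The query itself is plain SQL, which is the skill the article says carries over to SQL-based Hadoop tools.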

Students may want to take the Cloudera Certified Professional (CCP) data engineer exam after completion of the course.

Top 5 things students should know about Hadoop:

  • How Hadoop works.
  • How to optimize applications using the platform.
  • How to leverage SQL skills to develop Hadoop applications using Hive.
  • How to understand the basics of code in languages such as Java and Python.
  • How to run programs in Hadoop to learn the inner workings of the platform.

 

Learn more about our Database and Data Analytics courses and certificate program.

INSTRUCTOR BIO: MARILSON CAMPOS, B.A., principal data architect at Zoosk, an online dating site, has been designing complex big data systems for more than a decade. He specializes in developing and implementing architectures for building machine learning models at massive scale. Previously he held leadership positions at a variety of startups including Rocket Fuel, Inc., BuzzLogic, Whojini, and GlobalEnglish. Coauthor of Leading and Managing in Silicon Valley, he has a bachelor’s degree in Computer Science from Universidade do Estado do Rio de Janeiro, has studied at UC Berkeley and UC Santa Cruz, and holds numerous technical certificates from Cloudera, MIT, Stanford, and Oracle.
