Big Data and Spark Revolution

Description:

The continued rise in the volume and diversity of available data presents both opportunities and challenges for businesses to source information that is relevant, targeted and timely.

Big Data uniquely enables us to see, understand and be aware of the competitive landscape in new and increasingly detailed ways. Leveraging this data can enable organizations to target markets, engage with new prospects, compete more effectively, and close sales.

One of the most prominent frameworks for processing big data is Apache Hadoop, which was developed for distributed processing of large data sets across clusters of computers. The framework is written in Java, but language bindings exist for most commonly used languages.

This seminar will provide you with an introduction to Big Data technologies and help you identify the benefits of Big Data for your business.

In addition, we’ll introduce Apache Spark and related projects. Apache Spark addresses the need for speed and versatility by offering an “open source data analytics cluster computing framework” that supports different types of data analysis within the same technology stack: fast interactive queries, streaming analysis, graph analysis and machine learning.

The primary audience for this course is architects, developers, CTOs, engineering managers and similar roles. No prior Hadoop experience is required.

Prerequisites:

  • Knowledge of and experience with RDBMS and information systems

Demystifying Big Data 

  • History of Database Systems
  • Exploration of Data
  • The CAP Theorem
  • Replication, Clustering and Sharding
  • Cloud Computing

Hadoop

  • Introduction to Hadoop
  • Hadoop Origin
  • Hadoop Distributed File System (HDFS)
  • Distributed Parallel Processing (MapReduce)
  • YARN (Yet Another Resource Negotiator)
  • Hadoop Eco-System
      • Sqoop
      • Flume
      • ZooKeeper
      • Oozie
  • Other Projects
      • Kafka
      • Parquet
      • Avro
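
As a rough illustration of the MapReduce model named in the list above, the following plain-Python sketch walks through the map, shuffle and reduce steps on a word count. It does not use Hadoop itself; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and chosen only for this example, and everything runs in memory on a single machine rather than across a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on an in-memory list of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, used only for illustration.
    sample = ["big data and spark", "spark and hadoop"]
    print(word_count(sample))
```

In a real cluster, the map and reduce steps would run in parallel on different nodes and the shuffle would move data across the network; this sketch only shows how the three steps fit together.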

NoSQL

  • Key-Value Store
      • HBase
      • Cassandra
      • DynamoDB
  • Document Store
      • MongoDB
      • Couchbase
  • Graph Databases
      • Neo4j

In-Memory Technologies

  • Redis
  • VoltDB
  • Apache Spark
  • SAP HANA
  • Oracle In-Memory Database

Search, Indexing and Log Analysis

  • Elasticsearch
  • Splunk

YesSQL!

  • Hive
  • Impala
  • Drill
  • Phoenix
  • Couchbase N1QL
  • Spark SQL

Oracle Lines Up

  • Big Data Appliance
  • Big Data SQL
  • JSON API
  • Oracle sharding
  • Spatial and Graph

Spark Introduction

  • Big Data concepts
  • Eco-system Overview: Sqoop, Impala, Hive, Flume
  • Spark Basics
  • Spark Data Model and Operations: RDDs, Actions, Transformations
  • Spark Execution Models Overview: YARN, Spark Standalone
  • DataFrames, Spark SQL, MLlib
  • Spark Streaming

Lecturer: David Yahalom

David is CTO at Naya Technologies and a senior lecturer at Naya Academy. He is an expert in databases and Big Data.

  • Start date: on demand
  • Days and hours: 9:00-16:30
  • Academic hours: 16
  • Course level: Advanced
  • Language of instruction: Hebrew/English