Big Data (Hadoop)

About Big Data (Hadoop)

Big Data refers to information assets whose volume, velocity and variety require particular technologies and analytical methods to generate value, and which generally exceed the capabilities of a single machine, calling for parallel processing.

Hadoop is an open-source, Java-based framework for storing and processing big data. Data is stored on inexpensive commodity servers that run as clusters, and Hadoop's distributed file system enables concurrent processing and fault tolerance. It provides massive storage for any kind of data, enormous processing power and the ability to handle a virtually limitless number of concurrent tasks or jobs.
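The "concurrent processing" described above is usually illustrated with MapReduce's classic word-count job. Below is a minimal sketch of that idea in plain Python, simulated in-process rather than on a cluster; the `mapper` and `reducer` functions and the sample `docs` input are illustrative, not part of any Hadoop API. With Hadoop Streaming, scripts with this shape would read from stdin and run in parallel across many nodes.

```python
# Minimal sketch of Hadoop's MapReduce model, simulated in plain Python
# (no cluster required). On a real cluster, many mapper instances run in
# parallel over different blocks of the input, and the framework sorts
# map output by key before handing it to the reducers.
from itertools import groupby
from operator import itemgetter

def mapper(lines):
    """Map phase: emit a (word, 1) pair for every word seen."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reducer(pairs):
    """Reduce phase: sum the counts for each word.
    Sorting here mimics Hadoop's shuffle-and-sort step, which guarantees
    that all pairs sharing a key reach the same reducer, grouped together."""
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "the fox"]
    counts = dict(reducer(mapper(docs)))
    print(counts)
    # {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

The same two functions, unchanged in spirit, scale from this three-line input to terabytes, because each phase only ever looks at one record (or one key group) at a time.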

The main difference between the two is that Big Data is treated as an asset, which can be valuable, whereas Hadoop is the program that brings out the value from that asset. Big Data is raw and unsorted, whereas Hadoop is designed to manage and handle such complicated, sophisticated data.

Who Uses

Zillow, Redfin and Trulia use Hadoop and big data to democratize real estate data for consumers through customer analysis. More than 350 large companies use Hadoop in their stack, including:

  • Netflix
  • Uber
  • Twitter
  • Spotify
  • Shopify
  • Airbnb

SCOPE

Big Data is a fast-growing field with exciting opportunities for professionals in all industries and across the globe. With demand for skilled big data professionals continuing to rise, now is a great time to enter the job market. Big Data is influencing the IT industry as few technologies or trends have before; it can help companies improve their decision-making and compete on another level.

Eligibility

While there are no strict requirements for learning Big Data Hadoop, basic knowledge in the following areas will make it easier to grasp the course:

  • Computer programming skills
  • SQL knowledge
  • Linux

NOTE: People who hold a bachelor's or master's degree in science, mathematics, engineering, finance, economics or statistics can grasp Big Data Hadoop with ease.

  • Experienced Faculty
  • Certification
  • Placement Assistance

JOB OPPORTUNITIES

A few popular Big Data job titles are listed below:

  • Hadoop / Big Data Developer
  • Hadoop Administrator
  • Data Engineer
  • Big Data Analyst
  • Machine Learning Engineer
  • Software Development Engineer
  • Big Data Engineer
  • Big Data Consultant

Course Syllabus

  • Introduction to Big Data
  • Characteristics
  • The Why, How and What of Big Data
  • Existing OLTP, ETL, DWH and OLAP systems
  • Introduction to Hadoop Ecosystem
  • Architecture-HDFS
  • Sharding, Distribution and Replication factor (SDR)
  • Daemons
  • MapReduce (MRv1) and YARN
  • Hadoop v1 and v2
  • Hadoop Data federation
  • Prerequisite for Installation
  • Single-node, pseudo-distributed and multi-node clusters
  • Virtual machines using Linux (Ubuntu/CentOS)
  • Installation of Hadoop in the cloud (Azure/AWS)
  • Installation of Java, SSH and Eclipse
  • Installation and configuration of Hadoop, HDFS daemons and YARN daemons
  • High Availability (Active and Standby)
  • Automatic and manual failover
  • Hadoop fs shell commands
  • Writing data to HDFS
  • Reading data from HDFS
  • Rack awareness policy and Replica placement Strategy
  • Failure Handling
  • Namenode
  • Datanode
  • Block-Safe mode
  • Rebalancing and load optimization
  • Troubleshooting and error rectification
  • Hadoop fs shell commands: Unix and Java basics
  • Assessment 1
  • Introduction to MapReduce
  • Architecture of MapReduce
  • Executing MapReduce in YARN
  • Application Master, Resource Manager and Node Manager
  • Input format, input split and key-value pairs
  • Classes and methods of the MapReduce paradigm
  • Mapper
  • Reducer
  • Partitioner
  • Custom and Default partition
  • Shuffle and Sort
  • Combiner and Scheduler
  • Application Master/Manager
  • Container and Node Manager
  • MapReduce hands-on
  • Word count program / log analytics
  • Hadoop streaming in R/Python
  • Data processing transformations
  • Map-only jobs and uber jobs
  • Inverted index and searches
  • MR Programs 2
  • Structured and Unstructured Data handling
  • Optimizing using Combiner
  • Partitioner
  • Single and multiple column
  • Inverted Index
  • XML: semi-structured data
  • Map-side joins
  • Reduce-side joins
  • Introduction to Hive Data warehouse
  • Installation of Hive and the metastore database
  • Configuring the metastore to use MySQL
  • Hive QL commands
  • Manipulation and analytical functions in Hive
  • Managed and external tables
  • Partitioning and Bucketing
  • Complex data types and Unstructured data
  • Advanced HQL commands
  • UDF and UDAF
  • Integration with HBase
  • SerDe / regular expressions
  • Introduction to Pig
  • Installation; bags and collections
  • Commands and Scripts
  • Pig UDF
  • JSON to AVRO file conversion
  • Converting compressed Parquet files to uncompressed
  • AVRO schema and data file
  • ORC file
  • Assessment 2
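The syllabus block above includes the partitioner and the shuffle-and-sort step, which decide how map output is routed to reducers. A minimal sketch of that routing in plain Python follows; the function names `partition` and `shuffle` are illustrative, and CRC32 stands in for Hadoop's key hash (Python's built-in `hash` is randomized per run for strings, so it is not a stable choice).

```python
# Sketch of how a MapReduce partitioner routes map output to reducers.
# Hadoop's default HashPartitioner sends each key to hash(key) mod
# numReducers, so every value for a given key lands on the same reducer.
import zlib

def partition(key: str, num_reducers: int) -> int:
    """Stable stand-in for the default hash partitioner."""
    return zlib.crc32(key.encode()) % num_reducers

def shuffle(pairs, num_reducers):
    """Group (key, value) pairs into one bucket per reducer."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    return buckets

if __name__ == "__main__":
    pairs = [("apple", 1), ("banana", 1), ("apple", 2), ("cherry", 1)]
    for i, bucket in enumerate(shuffle(pairs, 3)):
        print(f"reducer {i}: {bucket}")
```

A custom partitioner in Hadoop is the same idea with a different `partition` body, for example routing by the first letter of the key to control which reducer writes which output file.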

  • Introduction to NOSQL
  • ACID / CAP / BASE
  • Key-value stores
  • MapReduce
  • Column family stores
  • HBase
  • Document stores
  • MongoDB
  • Graph DB
  • Neo4j
  • Introduction to HBASE and installation
  • The HBase Data Model
  • The HBase Shell
  • HBase Architecture
  • Schema Design
  • The HBase API
  • HBase Configuration and Tuning
  • Ingesting data from RDBMS
  • Introduction to Sqoop and installation
  • Importing and exporting data from and to RDBMS
  • Bulk loading, incremental load, split-by and conditional queries
  • Sqoop validation and jobs
  • Ingest streaming data
  • Flume Architecture
  • Agent, source, sink and channel
  • Ingesting log files
  • Collecting data from Twitter for sentiment analysis
  • Assessment 3
  • Integrate With ETL
  • Talend Big Data edition: big data components
  • Big data Analytics
  • Dimensional modelling
  • Data Visualization
  • Tableau: Hive and Spark SQL connectors
  • Spark Core and components
  • Spark shell
  • Creating RDDs from HDFS/local files
  • Creating new RDDs; transformations on RDDs
  • Lineage graph (DAG)
  • Actions on RDD
  • RDD persist and cache; lazy evaluation of RDDs
  • Hands on and core concepts of map() transformation
  • Hands on and core concepts of filter() transformation
  • Hands on and core concepts of flatMap() transformation
  • Comparing map and flatMap transformations
  • Hands on and core concepts of reduce() action
  • Hands on and core concepts of fold() action
  • Hands on and core concepts of aggregate() action
  • Basics of Accumulator
  • Hands on and core concepts of collect() action
  • Hands on and core concepts of take() action
  • Apache Spark Execution Model
  • How Spark executes a program
  • Concepts of RDD partitioning
  • RDD data shuffling and performance issues
  • DataFrames and Datasets
  • Spark SQL
  • PySpark
  • Spark jobs
  • Building Scala programs using SBT/Maven
  • spark-submit and Spark applications
  • Kafka: publisher/subscriber model
  • Consumers and producers
  • HUE
  • Monitoring and scheduling
  • Zeppelin
  • Oozie: workflow and coordinator
  • Distribution Installation on cloud or Sandbox
  • Cloudera (Cloudera Manager)
  • Hortonworks (Ambari server)
  • MapR (MCS)
  • Introduction to data science: machine learning, statistical analysis and sentiment analysis
  • Multi-node cluster setup: high availability, Hadoop data federation, commissioning and decommissioning, automatic and manual failover, ZooKeeper failover controller
  • Use cases, case studies and proofs of concept; working on different distributions
  • CCA Spark and Hadoop Developer Exam (CCA175)
  • CCP Data Engineer (DE575)
  • HDPCD Certification
  • HDP Certified Apache Spark Developer
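The Spark portion of the syllabus covers the core RDD transformations and actions (map, filter, flatMap, reduce, fold, take, collect). Their semantics have close plain-Python analogues, sketched below on a local list without a Spark installation; the variable names are illustrative, and note that in real Spark these operations run lazily and in parallel across partitions.

```python
# Plain-Python analogues of the Spark RDD operations listed in the syllabus.
# This only mirrors their semantics locally; Spark evaluates transformations
# lazily and distributes the work across cluster partitions.
from functools import reduce

data = [1, 2, 3, 4, 5]

# map(): one output element per input element
squares = list(map(lambda x: x * x, data))            # [1, 4, 9, 16, 25]

# filter(): keep only elements matching a predicate
evens = list(filter(lambda x: x % 2 == 0, data))      # [2, 4]

# flatMap(): map then flatten - zero or more outputs per input,
# unlike map(), which always emits exactly one
lines = ["a b", "c"]
words = [w for line in lines for w in line.split()]   # ['a', 'b', 'c']

# reduce(): combine elements pairwise into a single value
total = reduce(lambda a, b: a + b, data)              # 15

# fold(): like reduce but with an initial "zero" value
# (in Spark the zero is applied per partition, so it should be neutral)
folded = reduce(lambda a, b: a + b, data, 10)         # 25

# take(n) / collect(): in Spark these pull results back to the driver;
# locally they are just slicing and materializing the list
first_three = data[:3]                                # [1, 2, 3]
```

Rewriting each line against a real `SparkContext` (e.g. `sc.parallelize(data).map(...)`) is a useful first exercise once a Spark shell is available.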