HADOOP Interview Questions and Answers

Part 5 : Hive II

Q. If you run a SELECT * query in Hive, why does it not run MapReduce?
It is an optimization technique. The hive.fetch.task.conversion property lets Hive use a FETCH task to minimize the latency of MapReduce overhead. For simple SELECT, FILTER and LIMIT queries, this property skips MapReduce and uses a FETCH task instead, so Hive can execute the query without running a MapReduce job.

By default it is set to “minimal”, which optimizes only SELECT *, FILTER on partition columns and LIMIT queries, whereas the other value, “more”, optimizes general SELECT, FILTER and LIMIT queries (including TABLESAMPLE and virtual columns).
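
A minimal sketch of what this looks like in a Hive session (the table name emp and the partition column country are hypothetical):

SET hive.fetch.task.conversion=more;
-- Served by a FETCH task; no MapReduce job is launched
SELECT * FROM emp WHERE country = 'IN' LIMIT 10;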

Q. If you run Hive as a server, what are the available mechanisms for connecting to it from an application?
You can connect to the Hive server in the following ways:
1. Thrift Client: Using Thrift, you can call Hive commands from various programming languages, e.g. C++, Java, PHP, Python and Ruby.
2. JDBC Driver: Hive supports a Type 4 (pure Java) JDBC driver (see the connection example below).
3. ODBC Driver: Hive supports the ODBC protocol.
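
For example, the Beeline shell that ships with Hive connects through the JDBC driver. This sketch assumes HiveServer2 listening on its default port 10000; the user name hive is illustrative:

$ beeline -u jdbc:hive2://localhost:10000/default -n hive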

Q. What is a SerDe? Why do you use it? What are the different SerDe formats?
The SerDe interface allows you to instruct Hive how a record should be processed. A SerDe is a combination of a Serializer and a Deserializer (hence Ser-De). The Deserializer interface takes a string or binary representation of a record and translates it into a Java object that Hive can manipulate. The Serializer, in turn, takes a Java object that Hive has been working with and turns it into something that Hive can write to HDFS or another supported system. Commonly, Deserializers are used at query time to execute SELECT statements, and Serializers are used when writing data, such as through an INSERT-SELECT statement. Commonly used SerDes include LazySimpleSerDe (the default for delimited text), OpenCSVSerde, JsonSerDe, AvroSerDe, and the SerDes built into the RCFile, ORC and Parquet formats.
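
As a rough sketch, a SerDe is attached to a table with the ROW FORMAT SERDE clause. The table and columns here are hypothetical, and OpenCSVSerde assumes a Hive release that ships this built-in SerDe (0.14 or later):

CREATE TABLE csv_orders (id STRING, customer STRING, amount STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE;
-- OpenCSVSerde parses CSV input and exposes every column as STRING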

Q. How do you start the Hive Thrift server?
To start the Thrift server, use the command below:
$ hive --service hiveserver
Note that in newer Hive releases the original HiveServer has been removed in favour of HiveServer2, which you start with hive --service hiveserver2.

Q. What types of file formats are supported in Hive?
File Format
A file format is the way in which information is stored or encoded in a computer file. In Hive it refers to how records are stored inside the file.
TEXTFILE
TEXTFILE is the most common input/output format used in Hadoop. In Hive, if we define a table as TEXTFILE, it can load plain-text data such as CSV (comma-separated values), tab- or space-delimited files, and JSON data.
SEQUENCEFILE
Sequence files are flat files consisting of binary key-value pairs. When Hive converts queries to MapReduce jobs, it decides on the appropriate key-value pairs for each record. Sequence files are a splittable binary format, and their main use is to combine two or more smaller files into a single sequence file.
RCFILE
RCFILE stands for Record Columnar File, another binary file format that offers a high compression rate on top of the rows.
RCFILE is used when we want to perform operations on multiple rows at a time.
RCFILEs are flat files consisting of binary key/value pairs and share many similarities with SEQUENCEFILE, but an RCFILE stores the columns of a table in a columnar manner.
ORCFILE
ORC stands for Optimized Row Columnar, which means it can store data more efficiently than the other file formats. ORC can reduce the size of the original data by up to 75%, and as a result the speed of data processing also increases.
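
A minimal sketch of how the file format is chosen at table-creation time (the table names and columns are hypothetical):

CREATE TABLE logs_text (ts STRING, msg STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

CREATE TABLE logs_orc (ts STRING, msg STRING)
STORED AS ORC;

-- Convert the text data to ORC by copying it across
INSERT INTO TABLE logs_orc SELECT ts, msg FROM logs_text;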

Q. What is HCatalog?
HCatalog is a table and storage management layer for Hadoop. It assists integration with other tools and supplies read and write interfaces for Pig, Hive and MapReduce (see the Pig sketch below).
It provides a shared schema and data types for Hadoop tools, so you do not have to explicitly define the data structures in each program.
It provides APIs and a web-service wrapper for accessing metadata in the Hive metastore.
It also integrates with Sqoop, a tool designed to transfer data back and forth between Hadoop and relational databases such as SQL Server and Oracle.
HCatalog also exposes a REST interface (WebHCat) so that you can create custom tools and applications to interact with Hadoop data structures and access table information externally.
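
As a rough sketch of the Pig read interface (the table default.web_logs and its status column are hypothetical; the loader class path assumes a newer Hive release where HCatalog lives under org.apache.hive.hcatalog):

-- run with: pig -useHCatalog
A = LOAD 'default.web_logs' USING org.apache.hive.hcatalog.pig.HCatLoader();
B = FILTER A BY status == 404;
DUMP B;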
