BigData – Basics of Apache Hive

Apache Hive is one of the components of the Hadoop ecosystem. Hive is a data warehouse system that provides an SQL-like interface, HiveQL, for querying big data.

Features of Hive:

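  • Supports various data formats, such as plain text, SequenceFile, ORC, Parquet and Avro.
  • Can access data stored in HDFS or HBase.
  • Supports ad-hoc queries to analyze and summarize data.
  • Is scalable and fault tolerant.
  • Converts HiveQL queries into MapReduce jobs (newer versions can also run them on Tez or Spark).
  • Is easy to learn if you are familiar with SQL.
  • Is not designed for real-time queries: queries run as batch jobs and have high latency.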

So let's begin:

Step 1. To start Hive on the Hadoop cluster, open a terminal, type hive and press Enter. This opens the Hive shell, whose prompt is hive>.

[Screenshot: Hive Shell]
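Here is a rough sketch of a first session, assuming the classic hive CLI (newer distributions may ship Beeline instead). You type the statements at the hive> prompt, and each one ends with a semicolon.

```sql
-- List the existing databases (the "default" database always exists).
SHOW DATABASES;

-- Leave the Hive CLI and return to the OS shell.
quit;
```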

Step 2. Create a database:
A database is created with the command CREATE DATABASE <name>;
Create a database called retail_db and switch to it with USE retail_db; otherwise, tables will be created in the default database.

CREATE DATABASE retail_db;
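A slightly more defensive version of this step might look like the sketch below. Creating a database does not make it the current one, so USE is needed before you create tables in it.

```sql
-- Does nothing (instead of failing) if retail_db already exists.
CREATE DATABASE IF NOT EXISTS retail_db;

-- Make retail_db the current database so that new tables are created
-- in it rather than in "default".
USE retail_db;

-- Show the database's HDFS location and owner.
DESCRIBE DATABASE retail_db;
```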

A table can be created as follows:

[Screenshot: Table Structure in Hive]

What is required: you have to provide a table name, e.g. customer;

column names with suitable data types, e.g. cust_id (column name) of type int (data type);

the format of your data, i.e. the field delimiter: ',' if the file is comma-separated or '\t' if it is tab-separated. Many other formats are available.
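The table-structure screenshot is not reproduced here, so the statement below is a hedged reconstruction of such a definition. Only the table name customer, the int column cust_id and the comma delimiter come from the text. cust_name and city are hypothetical columns; replace them with the actual fields of your file.

```sql
USE retail_db;

-- Hypothetical layout: cust_name and city are placeholders.
CREATE TABLE IF NOT EXISTS customer (
  cust_id   INT,
  cust_name STRING,
  city      STRING
)
-- Plain-text file, one record per line, fields separated by commas.
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

-- Check the resulting schema.
DESCRIBE customer;
```

For a tab-separated file, change the delimiter to FIELDS TERMINATED BY '\t'.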

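Step 3. Prepare the data: open a text editor such as Notepad and enter one record per line, separating the fields with the delimiter declared in the table definition.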

[Screenshot: Raw data]

File name: employ_data.txt

Step 4. You can load your data into Hive tables in two ways:

(i) From your local file system:

Save the file as employ_data.txt.

Command: load data local inpath '/home/employ_data.txt' into table customer;

Info: '/home/employ_data.txt' is the path where you saved the file. Hive picks the file up from the local file system and copies it into HDFS.

customer is the name of the table into which you want to load your data.
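The same load, written as a small script, might look like this. It assumes the customer table already exists in retail_db and that the file is at /home/employ_data.txt, as in the text.

```sql
USE retail_db;

-- LOCAL: the path is on the local file system of the machine running
-- the Hive CLI. Hive copies the file into the table's HDFS directory.
LOAD DATA LOCAL INPATH '/home/employ_data.txt' INTO TABLE customer;

-- Quick check that the rows arrived.
SELECT COUNT(*) FROM customer;
```

INTO TABLE appends to any existing data. Writing OVERWRITE INTO TABLE customer instead replaces the table's current contents.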

(ii) From the Hadoop file system, i.e. HDFS:

How do you upload your file from the local system to HDFS? Easy: go to the command line and copy it with hdfs dfs -put.

[Screenshot: Upload data file to Hadoop]
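You can also upload without leaving Hive, because the Hive CLI forwards dfs commands to the Hadoop file system shell. This is only a sketch: /user/hive/input is a hypothetical HDFS directory. From an ordinary terminal, the same commands would be run as hdfs dfs -mkdir, hdfs dfs -put and hdfs dfs -ls.

```sql
-- Create a hypothetical target directory in HDFS (no error if it exists).
dfs -mkdir -p /user/hive/input;

-- Copy the local file into HDFS.
dfs -put /home/employ_data.txt /user/hive/input/;

-- Confirm the file is there.
dfs -ls /user/hive/input;
```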

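Then populate the Hive table with the data. Without the LOCAL keyword, the path refers to HDFS, and Hive moves the file into the table's directory instead of copying it.

[Screenshot: Loading Data into Hive Tables]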
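As a sketch, the load from HDFS differs from the local load only in the missing LOCAL keyword. The HDFS path /user/hive/input/employ_data.txt is hypothetical and must match wherever the file was actually uploaded.

```sql
USE retail_db;

-- No LOCAL keyword: the path is an HDFS path.
-- Hive moves (not copies) the file into the table's directory.
LOAD DATA INPATH '/user/hive/input/employ_data.txt' INTO TABLE customer;

-- The source directory no longer contains the file.
dfs -ls /user/hive/input;
```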

Step 5. View the data: all you need is a simple SQL statement, SELECT * FROM <table name>;

[Screenshot: Data in Hive]
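A few variations on the SELECT statement are handy in practice. hive.cli.print.header is a Hive configuration property that makes the CLI print column names above the results.

```sql
USE retail_db;

-- Print column names in the CLI output.
SET hive.cli.print.header=true;

-- Every row of the table.
SELECT * FROM customer;

-- Only the first 10 rows, which is safer on large tables.
SELECT * FROM customer LIMIT 10;
```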

Other things you can do in Hive:

(i) There are many built-in functions, such as MIN, MAX, AVG and SUM.
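The MAX example might be written as below, using the cust_id column from the table definition. AVG and SUM are used the same way on numeric columns. Depending on the configured execution engine, Hive usually runs such a query as a MapReduce, Tez or Spark job.

```sql
USE retail_db;

-- Largest customer id in the table.
SELECT MAX(cust_id) FROM customer;

-- Several aggregates in a single query.
SELECT MIN(cust_id), MAX(cust_id), COUNT(*) FROM customer;
```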

[Screenshot: Max() function example]

[Screenshot: Output of Max()]

(ii) You can check which databases and tables exist and drop the ones you no longer need.

[Screenshot: View DBs]
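Listing databases and tables could look like this:

```sql
-- All databases.
SHOW DATABASES;

-- Tables in a specific database.
SHOW TABLES IN retail_db;

-- Or switch to it and list its tables.
USE retail_db;
SHOW TABLES;
```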

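To drop a database, use the command DROP DATABASE <database name>; A database that still contains tables can only be dropped if you add CASCADE: DROP DATABASE <database name> CASCADE;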

[Screenshot: Dropping DB in Hive]
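The statements below sketch the usual clean-up options; run only the ones you need. Switching to default first is a precaution, so that the database being dropped is not the current one.

```sql
USE default;

-- Drop one table. For a managed (internal) table this also deletes its data.
DROP TABLE IF EXISTS retail_db.customer;

-- Drop a database. Without CASCADE this fails if it still contains tables.
DROP DATABASE IF EXISTS retail_db;

-- Drop a database together with all of its tables.
DROP DATABASE IF EXISTS retail_db CASCADE;
```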

(iii) Use shell commands in Hive: just put ! before the command and end it with a semicolon, for example:

pwd: view the current directory.

clear: clear the terminal screen.
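At the hive> prompt, the two commands above would be typed as follows:

```sql
-- Print the current working directory.
!pwd;

-- Clear the terminal screen.
!clear;
```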

[Screenshot: Shell command example]

Happy learning!