Apache Hive is one of the components of the Hadoop ecosystem. Hive is a data warehouse that provides a SQL-like interface for querying big data.
Features of Hive:
- Supports various data formats.
- Can access data stored in HDFS or HBase.
- Supports ad-hoc queries to analyze and summarize data.
- Hive is scalable and fault tolerant.
- Converts HiveQL queries into MapReduce jobs.
- Easy to learn if you are familiar with SQL.
- Not designed for real-time queries.
So let's begin:
Step 1. Start Hive on the Hadoop cluster: open a terminal, type hive, and press Enter.
Step 2. Create a database:
A database can be created using the command “CREATE DATABASE <name>;”
Create a database named “retail_db” and select it with the “USE retail_db” command; otherwise, tables will be created in the default database.
CREATE DATABASE retail_db;
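As a rough sketch of this step, the statements below create the database, switch to it, and list the databases to confirm the result. The IF NOT EXISTS clause is an optional addition here, not part of the original command; it only avoids an error when the database already exists.

```sql
-- Create the database only if it is not already there
CREATE DATABASE IF NOT EXISTS retail_db;

-- Make retail_db the current database so that new tables land in it
USE retail_db;

-- List all databases to confirm that retail_db exists
SHOW DATABASES;
```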
A table can be created as follows:
What is required: a table name, e.g. “customer”;
column names with suitable data types, e.g. cust_id (column name) INT (data type);
and the format of your data, i.e. the field delimiter: ',' if the file is comma separated or '\t' if it is tab separated. Many other formats are available.
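Putting those three pieces together, a table definition might look like the sketch below. Only the table name customer and the cust_id INT column come from the text above; cust_name and city are hypothetical columns added for illustration, and the comma delimiter assumes a comma-separated file.

```sql
-- cust_name and city are illustrative columns, not taken from the original data
CREATE TABLE IF NOT EXISTS customer (
  cust_id   INT,
  cust_name STRING,
  city      STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Check the resulting column names and types
DESCRIBE customer;
```

With this definition, each line of the data file would hold three comma-separated values in the same order as the columns. For tab-separated data, replace ',' with '\t' in the FIELDS TERMINATED BY clause.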
Step 3. Prepare the data: open a text file in any editor (e.g. Notepad) and enter the records, one row per line, using the delimiter chosen for the table.
Step 4. You can load your data into Hive tables in two ways:
(i) From your local file system:
Save the file as employ_data.txt.
Command: LOAD DATA LOCAL INPATH '/home/employ_data.txt' INTO TABLE customer;
Info: '/home/employ_data.txt' = the path where you saved your file. Hive picks the file up from the local system and copies it into HDFS.
customer = the name of the table into which you want to load your data.
(ii) From the Hadoop file system, i.e. HDFS:
How do you upload your file from the local system to HDFS? Easy: go to the command line:
Populating the Hive table with data.
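The two loading routes can be sketched as follows, assuming the customer table already exists. The HDFS directory /user/data is a hypothetical location chosen for this example. The dfs command shown here runs HDFS operations from inside the Hive shell; from an ordinary terminal the equivalent upload would be hdfs dfs -put. Pick one of the two options, since running both appends the same rows twice.

```sql
-- Option 1: copy the file from the local file system into the table
LOAD DATA LOCAL INPATH '/home/employ_data.txt' INTO TABLE customer;

-- Option 2: upload the file to HDFS first, then load it from there
-- /user/data is a hypothetical HDFS directory
dfs -mkdir -p /user/data;
dfs -put /home/employ_data.txt /user/data/;
dfs -ls /user/data;
LOAD DATA INPATH '/user/data/employ_data.txt' INTO TABLE customer;
```

Without LOCAL, Hive moves the file from its HDFS location into the table's directory rather than copying it, so it no longer appears under /user/data afterwards. Writing OVERWRITE INTO TABLE instead of INTO TABLE replaces the existing table contents rather than appending to them.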
Step 5. View data: to view the data, use a simple SQL statement: “SELECT * FROM <table name>;”
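For the customer table used in this walkthrough, that might look like the following. The LIMIT clause is an optional addition that keeps the output short on large tables.

```sql
-- Show every column for the first 10 rows of the table
SELECT * FROM customer LIMIT 10;
```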
Other things you can do in Hive:
(i) There are many built-in functions such as MIN, MAX, AVG, SUM, etc.
(ii) You can check which databases and tables exist, and you can drop them.
To drop a database, use the command “DROP DATABASE <database name>;”
(iii) Use shell commands in Hive. Just put “!” before the command.
pwd: shows the current directory.
clear: clears your command line.
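A short sketch of these three points, assuming the customer table and retail_db database from the earlier steps. The shell-command form with a leading ! and a trailing semicolon applies to the Hive CLI; other clients such as Beeline treat ! differently.

```sql
-- Built-in aggregate functions over the cust_id column
SELECT MIN(cust_id), MAX(cust_id), COUNT(*) FROM customer;

-- List the existing databases and the tables in the current one
SHOW DATABASES;
SHOW TABLES;

-- Drop the table first, then the database that contained it
DROP TABLE IF EXISTS customer;
DROP DATABASE IF EXISTS retail_db;

-- Shell commands run from the Hive CLI
!pwd;
!clear;
```

DROP DATABASE fails on a database that still contains tables unless CASCADE is appended, which is why the table is dropped first here. Both DROP statements delete data, so run them only on a practice setup.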
Happy learning!