BigData Interview Questions- Hive 1

Question: What is the Hive Metastore?
Answer: The Hive Metastore is a database that stores the metadata of your Hive tables, such as table names, column names, data types, table locations, and the number of buckets in each table.
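
One way to look at this metadata from a Hive session is the DESCRIBE FORMATTED statement. A minimal sketch, where table_name is only a placeholder:

```sql
-- Prints the columns and data types, followed by table details such as
-- the storage location and the number of buckets.
DESCRIBE FORMATTED table_name;
```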

Question: What are the different modes in which Hive can run?
Answer: Hive can run in the following modes:
(i) Local mode
(ii) Distributed mode
(iii) Pseudo-distributed mode
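
As a rough illustration, local mode can be requested from within a Hive session with settings like the ones below. This is a sketch only; whether these properties apply depends on the Hive and Hadoop versions and the execution engine in use.

```sql
-- Run MapReduce jobs in-process on the local machine instead of on the cluster.
SET mapreduce.framework.name=local;

-- Alternatively, let Hive decide automatically to run small jobs locally.
SET hive.exec.mode.local.auto=true;
```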

Question: Is it possible to create multiple tables in Hive for the same data?
Answer: Yes. Hive tables are created on top of data that is stored in HDFS, so one data file can have multiple schemas. The definition of each table is saved in the Hive metastore.
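
A minimal sketch of two tables defined over the same files. The HDFS directory, table names, and columns are hypothetical and chosen only for illustration:

```sql
-- First schema: read every line of the files as a single string.
CREATE EXTERNAL TABLE sales_raw (line STRING)
LOCATION '/data/sales';

-- Second schema: read the same files as comma-separated columns.
CREATE EXTERNAL TABLE sales_parsed (
  order_id INT,
  customer STRING,
  amount   DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/sales';
```

Both tables are declared EXTERNAL here so that dropping either one removes only its metastore entry and leaves the shared files in place.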

Question: How do you skip header rows in a Hive table?
Answer: When creating the table, add a TBLPROPERTIES clause that sets skip.header.line.count to the number of header lines to skip, for example:
TBLPROPERTIES ("skip.header.line.count"="3");
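
In context, the property sits at the end of the CREATE TABLE statement. A sketch, assuming a hypothetical comma-separated text file under /data/employees:

```sql
CREATE EXTERNAL TABLE employees (
  id   INT,
  name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/employees'
-- Ignore the first three lines of each file when the table is read.
TBLPROPERTIES ("skip.header.line.count"="3");
```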

Question: Are multi-line comments supported in Hive scripts?
Answer: No, multi-line comments are not supported in Hive scripts. Each comment line must start with its own "--".

Question: How do you implement a UDF in Hive?
Answer:
1. Write a Java class that extends org.apache.hadoop.hive.ql.exec.UDF.
2. Implement at least one evaluate() method.
3. Add the JAR to the Hive session: ADD JAR /path/to/hive_example.jar;
4. Register the function: CREATE TEMPORARY FUNCTION Uppercase AS 'com.hadoopUdf.hive.Uppercase';

Question: What are skewed tables in Hive?
Answer: A skewed table is a special type of table in which the values that appear very often (heavy skew) are split out into separate files, and the rest of the values go to another file.
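
A sketch of how such a table might be declared. The table, the columns, and the list of heavily skewed values are hypothetical:

```sql
CREATE TABLE page_views (
  user_id STRING,
  url     STRING
)
-- Rows whose user_id is one of these frequent values are stored separately.
SKEWED BY (user_id) ON ('1', '5', '6')
-- Optional clause: keep each skewed value in its own directory.
STORED AS DIRECTORIES;
```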

Question: What is the difference between ORDER BY, SORT BY, DISTRIBUTE BY, and CLUSTER BY in Hive?
Answer: ORDER BY x: guarantees global ordering, but does this by pushing all data through just one reducer. This is basically unacceptable for large datasets. You end up with one sorted file as output.
SORT BY x: orders data at each of N reducers, but each reducer can receive overlapping ranges of data. You end up with N or more sorted files with overlapping ranges.
DISTRIBUTE BY x: ensures that all rows with the same value of x go to the same reducer, so the N reducers receive non-overlapping sets of x values, but it doesn't sort the output of each reducer. You end up with N or more unsorted files with non-overlapping sets of values.
CLUSTER BY x: ensures each of N reducers gets a non-overlapping set of x values, then sorts by x within each reducer. This does not give you global ordering; it is the same as doing DISTRIBUTE BY x and SORT BY x. You end up with N or more sorted files with non-overlapping sets of values.
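
The four clauses side by side, as a sketch against a hypothetical table orders(customer_id INT, amount DOUBLE):

```sql
-- ORDER BY: all rows pass through a single reducer.
SELECT customer_id, amount FROM orders ORDER BY customer_id;

-- SORT BY: each reducer sorts only the rows it happens to receive.
SELECT customer_id, amount FROM orders SORT BY customer_id;

-- DISTRIBUTE BY: rows with the same customer_id go to the same reducer, unsorted.
SELECT customer_id, amount FROM orders DISTRIBUTE BY customer_id;

-- CLUSTER BY: shorthand for DISTRIBUTE BY customer_id SORT BY customer_id.
SELECT customer_id, amount FROM orders CLUSTER BY customer_id;

-- The long form also allows distributing on one column and sorting on another.
SELECT customer_id, amount FROM orders
DISTRIBUTE BY customer_id SORT BY amount DESC;
```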

Question: How can the columns of a table in Hive be written to a file?
Answer: Run the HiveQL DESCRIBE statement from the shell and pipe its output through awk to keep only the first field (the column name), then redirect the result to a file.
hive -S -e "describe table_name;" | awk -F" " '{print $1}' > ~/output

Question: What is the difference between DESCRIBE and DESCRIBE EXTENDED?
Answer: DESCRIBE DATABASE: displays the name of the database, its root location on the file system, and its comment, if any.
DESCRIBE DATABASE EXTENDED: gives the same details and additionally shows the database properties (DBPROPERTIES).
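
For reference, the two statements look like this, where sales_db is a hypothetical database name:

```sql
-- Name, comment, and root location of the database.
DESCRIBE DATABASE sales_db;

-- The same details plus any properties set on the database.
DESCRIBE DATABASE EXTENDED sales_db;
```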

Question: What are the different Complex Data Types available in Hive?
Answer:
arrays: ARRAY<data_type>
maps: MAP<primitive_type, data_type>
structs: STRUCT<col_name : data_type [COMMENT col_comment], …>
union: UNIONTYPE<data_type, data_type, …>
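
A sketch of a table that uses these types, along with the usual ways of reading array, map, and struct values. The table and column names are hypothetical:

```sql
CREATE TABLE employee_profiles (
  name    STRING,
  skills  ARRAY<STRING>,
  scores  MAP<STRING, INT>,
  address STRUCT<city:STRING, zip:STRING>,
  misc    UNIONTYPE<INT, STRING>
);

SELECT
  name,
  skills[0],       -- first array element (indexes start at 0)
  scores['hive'],  -- map value for the key 'hive'
  address.city     -- struct field
FROM employee_profiles;
```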
