hadoop - Determining Bucketing Configuration on Hive Table


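I am curious if someone could provide a little more clarification on how to configure the bucketing property on a Hive table. I see that it helps with joins, and I believe I read that you should put it on a column you will use for a join, but I could be wrong. I am also curious how to determine the number of buckets to choose.

If anyone could give a brief explanation, and point to some documentation, on how to determine each of these things, that would be great.

Thanks in advance for your assistance.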

Craig

The following are a few suggestions to consider while designing buckets.

  1. Buckets are created on critical columns, either a single column or a set of columns. This implies that these columns are the primary columns in various join conditions. The concept of bucketing is to hash this set of columns and store the rows in such a way that they can be accessed from HDFS faster, so retrieval is fast. It is advised not to bucket on join columns that are not critical merely because you think it will improve performance.
  2. The number of buckets should be a power of 2. The number of buckets determines the number of reducers that run when the table is populated, which in turn determines the final number of files in which the data is stored. The number of buckets has to be designed keeping in mind the size of the data you are handling, and with the aim of avoiding a large number of small files in HDFS in favor of a small number of big files, which improves Hive query retrieval speed and other optimizations.
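
As a rough illustration of the two points above, here is a minimal Python sketch of how a bucket count could be derived from the table size, and how a row lands in a bucket. In Hive itself the bucketing column and count are declared with `CLUSTERED BY (col) INTO n BUCKETS`. The function names `suggest_bucket_count` and `bucket_for` are hypothetical, the 256 MB target file size is only an assumption to tune for your cluster, and the integer key stands in for whatever hash Hive computes for the column (that depends on the column type and Hive version).

```python
import math


def suggest_bucket_count(table_size_bytes, target_bucket_bytes=256 * 1024 * 1024):
    """Return a power-of-two bucket count for a table of the given size.

    target_bucket_bytes is an assumed per-bucket file size (256 MB here),
    chosen to avoid many small files in HDFS; it is not a Hive default.
    """
    if table_size_bytes <= 0:
        return 1
    # Buckets needed so that each file is at most the target size.
    needed = math.ceil(table_size_bytes / target_bucket_bytes)
    # Round up to the next power of two (1, 2, 4, 8, ...).
    return 1 << (needed - 1).bit_length()


def bucket_for(key_hash, num_buckets):
    """Map a hash value to a bucket index in [0, num_buckets)."""
    # Clear the sign bit so negative hashes still give a valid index.
    return (key_hash & 0x7FFFFFFF) % num_buckets


if __name__ == "__main__":
    buckets = suggest_bucket_count(50 * 1024**3)  # a 50 GB table
    print("suggested buckets:", buckets)
    # Rows with the same join key always fall into the same bucket,
    # which is what lets two tables bucketed the same way be joined
    # bucket by bucket.
    print("bucket for key 35:", bucket_for(35, buckets))
```

If two tables are bucketed on the same join column and one bucket count is a multiple of the other, matching keys can be found by comparing only the corresponding buckets, which is one reason powers of 2 are convenient.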
