hadoop - Determining Bucketing Configuration on Hive Table
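I'm curious if someone could provide a little more clarification on how to configure the bucketing property on a Hive table. I see that it helps with joins, and I believe I read that you should put it on a column you will use for a join. I could be wrong. I'm also curious how to determine the number of buckets to choose.
If anyone can give a brief explanation, and some documentation on how to determine either of these things, that would be great.
Thanks in advance for the assistance.
Craig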
The following are a few suggestions to consider while designing buckets.
- Buckets are generally created on the most critical columns, either a single column or a set of columns. In practice this means the columns that are the primary columns in your various join conditions. The concept of bucketing is to hash this set of columns and store the rows in such a way that they can be accessed from HDFS faster, so retrieval is fast. It is advised not to bucket on join columns that are not critical or that you do not think will improve performance.
- The number of buckets should be a power of 2. The number of buckets determines the number of reducers that run, which in turn determines the final number of files in which the data is stored. So the number of buckets has to be chosen keeping in mind the size of the data you are handling: the aim is to avoid a large number of small files in HDFS and instead have a small number of big files, which improves Hive query retrieval speed and other optimizations.
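
To make the two suggestions above a bit more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Hive's actual code: `bucket_for` treats an integer key as its own hash value and takes it modulo the bucket count, and `suggest_num_buckets` is a hypothetical helper that picks the smallest power of 2 that keeps each bucket file at or below a target size. The 256 MB default target is an assumption chosen for the example, not a Hive setting; tune it to your HDFS block size and workload.

```python
def bucket_for(key: int, num_buckets: int) -> int:
    """Map an integer key to a bucket index in [0, num_buckets).

    Simplification: the integer is used as its own hash value. The mask
    keeps the value non-negative before taking the modulo.
    """
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    return (key & 0x7FFFFFFF) % num_buckets


def suggest_num_buckets(table_bytes: int,
                        target_file_bytes: int = 256 * 1024**2) -> int:
    """Smallest power of 2 such that table_bytes / buckets <= target_file_bytes.

    target_file_bytes is an assumed per-bucket file size (default 256 MB),
    not a value taken from Hive.
    """
    if table_bytes <= 0 or target_file_bytes <= 0:
        raise ValueError("sizes must be positive")
    # Integer ceiling division: how many files of the target size are needed.
    needed = -(-table_bytes // target_file_bytes)
    # Round up to the next power of 2 (1 stays 1, 3 becomes 4, 40 becomes 64).
    return 1 << (needed - 1).bit_length()


if __name__ == "__main__":
    buckets = suggest_num_buckets(10 * 1024**3)  # a table of roughly 10 GB
    print("suggested buckets:", buckets)
    # Rows with the same join key always land in the same bucket, which is
    # what lets a join read only the matching bucket files.
    for user_id in (10, 74, 10):
        print(user_id, "->", bucket_for(user_id, buckets))
```

Because equal keys always map to the same bucket index, two tables bucketed on the same join column with compatible bucket counts can be joined bucket by bucket. Fewer, larger buckets also mean fewer, larger files in HDFS.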