hadoop - Determining Bucketing Configuration on Hive Table

June 15, 2011

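I'm curious if someone could provide a little more clarification on how to configure the bucketing property on a Hive table. I see that it helps with joins, and I believe I read that you should put it on a column that you will use in a join. I could be wrong. I'm also curious how to determine the number of buckets to choose.

If someone could give a brief explanation, and point to documentation on how to determine these things, that would be great.

Thanks in advance for your assistance.

Craig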

The following are a few suggestions to consider while designing buckets.

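Buckets are created on critical columns, either a single column or a set of columns, which implies that these columns will be the primary columns in various join conditions. The concept of bucketing is to hash this set of columns and store the rows so that rows falling into the same bucket end up in the same file in HDFS, which makes them faster to access. Thus retrieval is faster. It is advised not to bucket on every join column, only on the critical ones that you expect to improve performance.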
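A simplified sketch of why bucketing on the join column helps, written in Python for illustration. It is loosely modeled on how Hive assigns rows to buckets (hash the column, then take it modulo the bucket count) but is not Hive's actual hash function: it assumes integer keys within the 32-bit range and uses the key value itself as the hash. The helper names `bucket_id`, `bucketize` and `bucketed_join` are hypothetical. Because both tables are bucketed on the same key with the same bucket count, a key can only find matches in the bucket with the same number on the other side, so the join can proceed bucket by bucket.

```python
from collections import defaultdict


def bucket_id(key: int, num_buckets: int) -> int:
    # Simplified bucket assignment (assumption, not Hive's exact hash):
    # use the integer key as its own hash, clear the sign bit so the
    # result is non-negative, then take it modulo the bucket count.
    return (key & 0x7FFFFFFF) % num_buckets


def bucketize(rows, key_index, num_buckets):
    # Group rows by the bucket of their key column, analogous to
    # writing one file per bucket.
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_id(row[key_index], num_buckets)].append(row)
    return buckets


def bucketed_join(left, right, num_buckets):
    # Inner join on column 0 of both tables, processed bucket by bucket.
    # A key maps to the same bucket number in both tables, so bucket b
    # on the left only needs to be compared with bucket b on the right.
    left_buckets = bucketize(left, 0, num_buckets)
    right_buckets = bucketize(right, 0, num_buckets)
    result = []
    for b in range(num_buckets):
        # Build a lookup table from the right-hand bucket only.
        lookup = defaultdict(list)
        for r in right_buckets.get(b, []):
            lookup[r[0]].append(r)
        for l in left_buckets.get(b, []):
            for r in lookup.get(l[0], []):
                result.append((l[0], l[1], r[1]))
    return result
```

Only one bucket of the right-hand table needs to be held in memory at a time, which is roughly the intuition behind Hive's bucketed map joins. In this sketch both tables must use the same `num_buckets`; with different counts, matching keys would no longer share a bucket number.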
The number of buckets should preferably be a power of 2. The number of buckets determines the number of reducers run when the table is populated, and that in turn determines the final number of files in which the data is stored. So the number of buckets has to be designed with the size of the data you are handling in mind: avoid a large number of small files in HDFS as well as a small number of very large files, while improving Hive query retrieval speed and enabling bucket-based optimizations.
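As a rough illustration of sizing, the hypothetical helper `suggest_num_buckets` below estimates a bucket count from the expected table size and a target size per bucket file, then rounds up to a power of two as suggested above. The 256 MiB target in the example is an assumption, not a Hive default; choose a value that suits your cluster (for instance, a multiple of the HDFS block size). The estimate also assumes keys hash evenly, so each bucket holds about `table_bytes / num_buckets` bytes; skewed keys will produce uneven files.

```python
import math


def suggest_num_buckets(table_bytes: int, target_bucket_bytes: int) -> int:
    # Hypothetical sizing helper: how many buckets keep each bucket file
    # at or below the target size, rounded up to a power of two.
    if table_bytes <= 0 or target_bucket_bytes <= 0:
        raise ValueError("sizes must be positive")
    raw = math.ceil(table_bytes / target_bucket_bytes)
    # Smallest power of two >= raw (raw == 1 gives 1).
    return 1 << (raw - 1).bit_length()


# Example: a 10 GiB table with an assumed 256 MiB target per bucket file.
print(suggest_num_buckets(10 * 1024**3, 256 * 1024**2))
```

Rounding up keeps each file at or below the target; rounding down instead would give fewer, larger files. Either way, changing the bucket count later with ALTER TABLE only updates Hive's metadata and does not reorganize data already written, so it is worth sizing for expected growth.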
