Each run of the same Hadoop SequenceFile creation routine creates a file with a different CRC. Is it OK?

- June 15, 2015

I have simple code that creates a Hadoop SequenceFile. Each run of the code leaves two files in the working directory:

   mysequencefile.txt    .mysequencefile.txt.crc

After each run the sizes of both files remain the same, but the contents of the .crc file are different!

Is this a bug or expected behaviour?

This is confusing, but it is expected behaviour.
According to the SequenceFile format, each SequenceFile has a sync block that is 16 bytes long. In block-compressed SequenceFiles the sync block is repeated after each block of records; in uncompressed or record-compressed SequenceFiles it is repeated after several records, or after one very long record.
The thing is, the sync block is a sort of random value. It is written in the header, which is how the reader recognizes it. It stays the same within one SequenceFile, but it can be (and actually is) different from one SequenceFile to another.
So the files are logically the same but differ at the binary level. A CRC is a checksum over the binary content, so it differs between the two files too.
I haven't found any way to set the sync block manually. If someone finds a way, please write it here.
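
To make the effect concrete, here is a minimal sketch in plain Java that needs no Hadoop on the classpath. It is a toy model, not the real SequenceFile layout: the class `SyncMarkerDemo`, its methods, and the record strings are all made up for illustration. It builds two byte arrays holding the same records but each with its own random 16-byte marker, then compares sizes, bytes, and CRC32 values.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.zip.CRC32;

// Toy model only: this does NOT reproduce the real SequenceFile byte layout.
final class SyncMarkerDemo {
    // Sync block length in bytes, as stated in the answer above.
    static final int SYNC_SIZE = 16;

    // A fresh random marker, standing in for the one a writer picks per file.
    static byte[] newSyncMarker() {
        byte[] sync = new byte[SYNC_SIZE];
        new SecureRandom().nextBytes(sync);
        return sync;
    }

    // Hypothetical "file": the marker as a header, then each record followed by the marker.
    static byte[] buildFile(byte[] sync, String... records) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(sync, 0, sync.length);
        for (String record : records) {
            byte[] bytes = record.getBytes(StandardCharsets.UTF_8);
            out.write(bytes, 0, bytes.length);
            out.write(sync, 0, sync.length);
        }
        return out.toByteArray();
    }

    // CRC32 over the raw bytes, i.e. a purely binary checksum.
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] first = buildFile(newSyncMarker(), "key1=value1", "key2=value2");
        byte[] second = buildFile(newSyncMarker(), "key1=value1", "key2=value2");
        System.out.println("same size:  " + (first.length == second.length));
        System.out.println("same bytes: " + Arrays.equals(first, second));
        System.out.println("crc32:      " + Long.toHexString(crc32(first))
                + " vs " + Long.toHexString(crc32(second)));
    }
}
```

Running it several times should show the same pattern as in the question: the sizes match while the bytes differ, and the checksums practically always differ with them. To check that two real SequenceFiles are logically equal, comparing the records read back from each file is a more meaningful test than comparing the .crc files.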
