Each run of the same Hadoop SequenceFile creation routine creates a file with a different CRC. Is that OK?


I have simple code that creates a Hadoop SequenceFile. Each run of the code leaves 2 files in the working directory:

   mysequencefile.txt    .mysequencefile.txt.crc 

After each run the sizes of both files remain the same, but the contents of the .crc file are different!

Is this a bug or expected behaviour?

This is confusing, but it is expected behaviour.
According to the SequenceFile format, each SequenceFile has a sync-block (sync marker) that is 16 bytes long. The sync-block repeats after each block in block-compressed SequenceFiles, and after some number of records (or after 1 very long record) in uncompressed or record-compressed SequenceFiles.
The thing is, the sync-block is a sort of random value. It is written in the file header, which is how the reader recognizes it. It stays the same within 1 SequenceFile, but it can (and does) differ from 1 SequenceFile to another.
So the files are logically the same but binary different. The CRC is a checksum over the binary content, so it differs between the 2 files too.
I haven't found any way to set the sync-block manually. If anyone finds a way, please write it here.
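
To illustrate the effect without a Hadoop installation, here is a minimal sketch in plain Java. It is a toy model, not Hadoop's actual writer or on-disk layout: the class `SyncMarkerDemo`, its methods, and the simplified "magic bytes + marker + records" layout are all made up for this example. The only point it demonstrates is that embedding a random 16-byte marker keeps the size the same while changing the bytes, and therefore the checksum.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.zip.CRC32;

// Toy model only: NOT the real SequenceFile layout or Hadoop API.
class SyncMarkerDemo {
    static final int SYNC_SIZE = 16; // marker length stated in the answer

    // Hypothetical stand-in for the per-file marker: 16 random bytes.
    static byte[] randomSync() {
        byte[] sync = new byte[SYNC_SIZE];
        new SecureRandom().nextBytes(sync);
        return sync;
    }

    // Simplified file: magic bytes, marker in the header,
    // then every record followed by the same marker.
    static byte[] buildFile(byte[] sync, String[] records) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] magic = "SEQ".getBytes(StandardCharsets.US_ASCII);
        out.write(magic, 0, magic.length);
        out.write(sync, 0, sync.length);
        for (String record : records) {
            byte[] bytes = record.getBytes(StandardCharsets.UTF_8);
            out.write(bytes, 0, bytes.length);
            out.write(sync, 0, sync.length);
        }
        return out.toByteArray();
    }

    // CRC-32 over the raw bytes, i.e. a purely binary checksum.
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        String[] records = {"key1=value1", "key2=value2"};
        // Two "runs" of the same routine with the same logical content.
        byte[] run1 = buildFile(randomSync(), records);
        byte[] run2 = buildFile(randomSync(), records);
        System.out.println("same size:  " + (run1.length == run2.length));
        System.out.println("same bytes: " + Arrays.equals(run1, run2));
        System.out.printf("crc run1: %08x%ncrc run2: %08x%n",
                crc32(run1), crc32(run2));
    }
}
```

If two runs need to be compared, it is therefore more reliable to read the records back and compare keys and values than to compare the files or their .crc companions byte by byte.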

