list - C# Poor dictionary performance when adding elements -

i have big chunck of data containing ~1.5 million entries. each entry instance of class this:

public class element {     public guid id { get; set; }     public string name { get; set; }     public property p... p1... p2...  }

i have list of guids (~4 millions) need names based on list of instances of element class.

i'm storing element objects in dictionary, takes ~90 seconds populate data. there way improve performance when adding items dictionary? data doesn't have duplicates know dictionary checks duplicates when adding new item.

the structure doesn't need dictionary if there's better one. tried put element objects in list performed lot better (~9sec) when adding. when need item guid takes more 10min find 4million elements. tried using list.find() , manually iterating through list.

also, if instead of using system.guid convert them string , store string representation on data structures whole both operations of populating dictionary , filling names on other list takes 10s, application consumes 1.2gb of ram, instead of 600mb when store them system.guid.

any ideas on how perform better?

your problem perhaps connected "sequential" guid, like:

c482fbe1-9f16-4ae9-a05c-383478ec9d11 c482fbe1-9f16-4ae9-a05c-383478ec9d12 c482fbe1-9f16-4ae9-a05c-383478ec9d13 c482fbe1-9f16-4ae9-a05c-383478ec9d14 c482fbe1-9f16-4ae9-a05c-383478ec9d15

the dictionary<,> has problem those, because have same gethashcode(), has tricks change search time o(1) o(n)... can solve using custom equality comparer calculates hash in different way, like:

public class reverseguidequalitycomparer : iequalitycomparer<guid> {     public static readonly reverseguidequalitycomparer default = new reverseguidequalitycomparer();      #region iequalitycomparer<guid> members      public bool equals(guid x, guid y)     {         return x.equals(y);     }      public int gethashcode(guid obj)     {         var bytes = obj.tobytearray();          uint hash1 = (uint)bytes[0] | ((uint)bytes[1] << 8) | ((uint)bytes[2] << 16) | ((uint)bytes[3] << 24);         uint hash2 = (uint)bytes[4] | ((uint)bytes[5] << 8) | ((uint)bytes[6] << 16) | ((uint)bytes[7] << 24);         uint hash3 = (uint)bytes[8] | ((uint)bytes[9] << 8) | ((uint)bytes[10] << 16) | ((uint)bytes[11] << 24);         uint hash4 = (uint)bytes[12] | ((uint)bytes[13] << 8) | ((uint)bytes[14] << 16) | ((uint)bytes[15] << 24);          int hash = 37;          unchecked         {             hash = hash * 23 + (int)hash1;             hash = hash * 23 + (int)hash2;             hash = hash * 23 + (int)hash3;             hash = hash * 23 + (int)hash4;         }          return hash;     }      #endregion }

then declare dictionary this:

var dict = new dictionary<guid, element>(reverseguidequalitycomparer.default);

a little test see difference:

private static void increment(byte[] x) {     (int = x.length - 1; >= 0; i--)     {         if (x[i] != 0xff)         {             x[i]++;             return;         }          x[i] = 0;     } }

and

// can try timing program default gethashcode: //var dict = new dictionary<guid, object>(); var dict = new dictionary<guid, object>(reverseguidequalitycomparer.default); var hs1 = new hashset<int>(); var hs2 = new hashset<int>();  {     var guid = guid.newguid();      stopwatch sw = stopwatch.startnew();      (int = 0; < 1500000; i++)     {         hs1.add(reverseguidequalitycomparer.default.gethashcode(guid));         hs2.add(guid.gethashcode());         dict.add(guid, new object());         var bytes = guid.tobytearray();         increment(bytes);         guid = new guid(bytes);     }      sw.stop();      console.writeline("milliseconds: {0}", sw.elapsedmilliseconds); }  console.writeline("reverseguidequalitycomparer distinct hashes: {0}", hs1.count); console.writeline("guid.gethashcode() distinct hashes: {0}", hs2.count);

with sequential guid difference in number of distinct hash codes staggering:

reverseguidequalitycomparer distinct hashes: 1500000 guid.gethashcode() distinct hashes: 256

now... if don't want use tobytearray() (because allocates useless memory), there solution uses reflection , expression trees... should work correctly mono, because mono "aligned" implementation of guid 1 of microsoft in 2004, ancient time :-)

public class reverseguidequalitycomparer : iequalitycomparer<guid> {     public static readonly reverseguidequalitycomparer default = new reverseguidequalitycomparer();      public static readonly func<guid, int> gethashcodefunc;      static reverseguidequalitycomparer()     {         var par = expression.parameter(typeof(guid));         var hash = expression.variable(typeof(int));          var const23 = expression.constant(23);          var const8 = expression.constant(8);         var const16 = expression.constant(16);         var const24 = expression.constant(24);          var b = expression.convert(expression.convert(expression.field(par, "_b"), typeof(ushort)), typeof(uint));         var c = expression.convert(expression.convert(expression.field(par, "_c"), typeof(ushort)), typeof(uint));         var d = expression.convert(expression.field(par, "_d"), typeof(uint));         var e = expression.convert(expression.field(par, "_e"), typeof(uint));         var f = expression.convert(expression.field(par, "_f"), typeof(uint));         var g = expression.convert(expression.field(par, "_g"), typeof(uint));         var h = expression.convert(expression.field(par, "_h"), typeof(uint));         var = expression.convert(expression.field(par, "_i"), typeof(uint));         var j = expression.convert(expression.field(par, "_j"), typeof(uint));         var k = expression.convert(expression.field(par, "_k"), typeof(uint));          var sc = expression.leftshift(c, const16);         var se = expression.leftshift(e, const8);         var sf = expression.leftshift(f, const16);         var sg = expression.leftshift(g, const24);         var si = expression.leftshift(i, const8);         var sj = expression.leftshift(j, const16);         var sk = expression.leftshift(k, const24);          var body = expression.block(new[]         {             hash         },         new expression[]         {             expression.assign(hash, expression.constant(37)),             expression.multiplyassign(hash, const23),             expression.addassign(hash, expression.field(par, "_a")),             expression.multiplyassign(hash, const23),             expression.addassign(hash, expression.convert(expression.or(b, sc), typeof(int))),             expression.multiplyassign(hash, const23),             expression.addassign(hash, expression.convert(expression.or(d, expression.or(se, expression.or(sf, sg))), typeof(int))),             expression.multiplyassign(hash, const23),             expression.addassign(hash, expression.convert(expression.or(h, expression.or(si, expression.or(sj, sk))), typeof(int))),             hash         });          gethashcodefunc = expression.lambda<func<guid, int>>(body, par).compile();     }      #region iequalitycomparer<guid> members      public bool equals(guid x, guid y)     {         return x.equals(y);     }      public int gethashcode(guid obj)     {         return gethashcodefunc(obj);     }      #endregion      // comparison purpose, not used     public int gethashcodesimple(guid obj)     {         var bytes = obj.tobytearray();          unchecked         {             int hash = 37;              hash = hash * 23 + (int)((uint)bytes[0] | ((uint)bytes[1] << 8) | ((uint)bytes[2] << 16) | ((uint)bytes[3] << 24));             hash = hash * 23 + (int)((uint)bytes[4] | ((uint)bytes[5] << 8) | ((uint)bytes[6] << 16) | ((uint)bytes[7] << 24));             hash = hash * 23 + (int)((uint)bytes[8] | ((uint)bytes[9] << 8) | ((uint)bytes[10] << 16) | ((uint)bytes[11] << 24));             hash = hash * 23 + (int)((uint)bytes[12] | ((uint)bytes[13] << 8) | ((uint)bytes[14] << 16) | ((uint)bytes[15] << 24));              return hash;         }     } }

other solution, based on "undocumented working" programming (tested on .net , mono):

public class reverseguidequalitycomparer : iequalitycomparer<guid> {     public static readonly reverseguidequalitycomparer default = new reverseguidequalitycomparer();      #region iequalitycomparer<guid> members      public bool equals(guid x, guid y)     {         return x.equals(y);     }      public int gethashcode(guid obj)     {         guidtoint32 gtoi = new guidtoint32 { guid = obj };          unchecked         {             int hash = 37;              hash = hash * 23 + gtoi.int32a;             hash = hash * 23 + gtoi.int32b;             hash = hash * 23 + gtoi.int32c;             hash = hash * 23 + gtoi.int32d;              return hash;         }     }      #endregion      [structlayout(layoutkind.explicit)]     private struct guidtoint32     {         [fieldoffset(0)]         public guid guid;          [fieldoffset(0)]         public int int32a;         [fieldoffset(4)]         public int int32b;         [fieldoffset(8)]         public int int32c;         [fieldoffset(12)]         public int int32d;     } }

it uses structlayout "trick" superimpose guid bunch of int, write guid , read int.

why guid.gethashcode() has problems sequential ids?

very easy explain: reference source, gethashcode() is:

return _a ^ (((int)_b << 16) | (int)(ushort)_c) ^ (((int)_f << 24) | _k);

so _d, _e, _g, _h, _i, _j bytes aren't part of hash code. when incremented guid first incremented in _k field (256 values), on overflow in _j field (256 * 256 values, 65536 values), on _i field (16777216 values). not hashing _h, _i, _j fields hash of sequential guid show 256 different values non-huge range of guid (or @ maximum 512 different values, if _f field incremented once, if start guid similar 12345678-1234-1234-1234-aaffffffff00, aa (that "our" _f) incremented ab after 256 increments of guid)

Search This Blog

Post

list - C# Poor dictionary performance when adding elements -

Comments

Post a Comment

Popular posts from this blog

Fail to load namespace Spring Security http://www.springframework.org/security/tags -

sql - MySQL query optimization using coalesce -

Maven Javadoc 'Cannot find default setter' and fails -