I'm trying to load two lists, each of 500,000 unique integer numbers, into a dictionary (TDictionary). The dictionary gets cleared between the list loads, so it never contains more than 500,000 numbers. For some reason it takes 8 seconds on my computer to load the first list of 500,000 numbers into the dictionary, whereas it only takes 0.1 seconds to load the second list. Does anyone know why that is?
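The question doesn't include the loading code itself. A minimal sketch of how such a load might look and be timed — assuming one integer per line in the list files, and using Boolean as a placeholder value type since the real value type isn't shown:

```delphi
uses
  System.SysUtils, System.Classes, System.Diagnostics,
  System.Generics.Collections;

procedure TimeLoad(const FileName: string);
var
  Lines: TStringList;
  Dic: TDictionary<Int64, Boolean>;
  Sw: TStopwatch;
  S: string;
begin
  Lines := TStringList.Create;
  Dic := TDictionary<Int64, Boolean>.Create;
  try
    Lines.LoadFromFile(FileName);   // assumes one number per line
    Sw := TStopwatch.StartNew;
    for S in Lines do
      Dic.AddOrSetValue(StrToInt64(S), True);
    Sw.Stop;
    Writeln(Format('%s: %d entries loaded in %d ms',
      [FileName, Dic.Count, Sw.ElapsedMilliseconds]));
  finally
    Dic.Free;
    Lines.Free;
  end;
end;
```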
These are the lists I use: and Both lists compressed:

Hello, I guess on the first run the dictionary has to allocate all that memory, and since your code didn't look like you told it up front how many entries you're going to have, it is allocating it bit by bit. AFAIK TDictionary should have some Capacity property, or even a capacity parameter in one of the constructor variants.
Set it to 500000 and check the speed again. I guess you'll be amazed then (at least that holds for TStringList, TObjectList etc.). Greetings, Markus.

Hi Markus, thanks for the response. I tried replacing dic := TDictionary.Create; with dic := TDictionary.Create(500000); — no significant difference in performance. I also tried switching the load order of the lists, but that didn't change anything either. The first list (1.txt) still takes around 8 s to load, even though it gets loaded after the second list.
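For reference, the capacity suggestion above amounts to passing the expected count to the constructor so the internal table is allocated once up front instead of growing and rehashing repeatedly during the load (sketch; key/value types assumed):

```delphi
uses
  System.Generics.Collections;

var
  Dic: TDictionary<Int64, Boolean>;
begin
  // Preallocate room for 500,000 entries so the internal table is not
  // grown and rehashed repeatedly while the list is loaded.
  Dic := TDictionary<Int64, Boolean>.Create(500000);
  try
    // ... load the numbers ...
  finally
    Dic.Free;
  end;
end;
```

As the thread goes on to show, preallocation removes only the growth cost; it does not help when the slowdown comes from hash collisions.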
Your first list may just be a pathological case where practically every number you add causes a hash collision, which requires a lot more work for the code to find a free slot in the internal list of items. If that is the case, looking up keys from the first list should also be significantly slower than looking up keys from the second list. The performance of a hash table depends a lot on the distribution of the keys you add to it, on the hash table implementation, and on the hashing algorithm used.
Peter Below, TeamB.

Yes, that's probably it then. Do you know if there's a way to change the hashing algorithm used by TDictionary? Since all the numbers in the list are 32-bit integers, I tried replacing the Int64-keyed TDictionary with an Integer-keyed one. That resulted in a 100x (!) performance improvement when handling the 1.txt list: instead of 8 s, it only took 0.08 s to load the numbers into the dictionary. I can't be sure that all numbers stay within the 32-bit boundary, though, so unfortunately it's not a working solution for me.
You have to create the dictionary with an IEqualityComparer&lt;TKey&gt; implementation instead of relying on the default comparer the class will use. If you look at the source for TDictionary&lt;TKey,TValue&gt;.Hash (which is not virtual), it relies on the comparer to get the hashed key:

```delphi
function TDictionary<TKey,TValue>.Hash(const Key: TKey): Integer;
const
  PositiveMask = not Integer($80000000);
begin
  // Double-Abs to avoid -MaxInt and MinInt problems.
  // Not using compiler-Abs because we must get a positive integer;
  // for compiler, Abs(Low(Integer)) is a null op.
  Result := PositiveMask and ((PositiveMask and FComparer.GetHashCode(Key)) + 1);
end;
```

The construction of the dictionary would then look something like this (the value type below is a placeholder for whatever your dictionary stores):

```delphi
FDictionary := TDictionary<Int64, Boolean>.Create(750000,
  TEqualityComparer<Int64>.Construct(
    function(const L, R: Int64): Boolean
    begin
      Result := L = R;
    end,
    function(const Value: Int64): Integer
    begin
      Result := Value; // truncate to 32 bits; switch off range and overflow checks!
    end));
```

The default comparer uses this:

```delphi
function GetHashCodeI8(Inst: Pointer; const Value: Int64): Integer;
begin
  Result := THashBobJenkins.GetHashValue(Value, SizeOf(Value), 0);
end;
```

THashBobJenkins can be found in the System.Hash unit. It implements a generic hash function with quite a bit of computation going on. The distribution it creates seems to be suboptimal for your data.

Peter Below, TeamB.
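A note on the truncating hash closure above: narrowing an Int64 to an Integer only compiles and runs cleanly with overflow and range checks disabled. A self-contained variant (function names are mine, not from the thread) might look like:

```delphi
{$OVERFLOWCHECKS OFF}
{$RANGECHECKS OFF}
uses
  System.Generics.Collections, System.Generics.Defaults;

// For keys known to fit in 32 bits, the low half of the Int64 is already a
// perfect (collision-free) hash, and far cheaper than a general hash function.
function MakeTruncatingComparer: IEqualityComparer<Int64>;
begin
  Result := TEqualityComparer<Int64>.Construct(
    function(const L, R: Int64): Boolean
    begin
      Result := L = R;
    end,
    function(const Value: Int64): Integer
    begin
      Result := Integer(Value and $FFFFFFFF); // keep only the low 32 bits
    end);
end;
```

Keys that differ only in their upper 32 bits would all collide under this hash, which is why it is only safe when the data is known to fit in 32 bits — exactly the constraint the original poster couldn't guarantee.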
Well, it is suboptimal for quite a lot of data then, and it is very slow as well. We switched over to MurmurHash2 and got very good results.

Hello, if this is a general thing it might warrant creating a feature-request QP issue. Maybe EMBT could provide several default hash algorithms, selectable via an optional constructor parameter, with the current default used otherwise. Greetings, Markus.
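The thread doesn't show the MurmurHash2 code that was used. Plugging a different hash into TDictionary goes through the same IEqualityComparer route described earlier; a sketch of the classic 32-bit MurmurHash2 wired up as a comparer for Int64 keys (overflow/range checks must be off so the multiplications wrap) could look like:

```delphi
{$OVERFLOWCHECKS OFF}
{$RANGECHECKS OFF}
uses
  System.Generics.Collections, System.Generics.Defaults;

// Classic 32-bit MurmurHash2 over an arbitrary buffer.
function MurmurHash2(const Data; Len: Integer; Seed: Cardinal): Cardinal;
const
  M = $5BD1E995;
  R = 24;
var
  H, K: Cardinal;
  P: PByte;
begin
  H := Seed xor Cardinal(Len);
  P := PByte(@Data);
  while Len >= 4 do
  begin
    K := PCardinal(P)^;
    K := K * M;
    K := K xor (K shr R);
    K := K * M;
    H := H * M;
    H := H xor K;
    Inc(P, 4);
    Dec(Len, 4);
  end;
  // Mix in the trailing 1..3 bytes (mirrors the reference fall-through).
  if Len >= 3 then H := H xor (Cardinal(P[2]) shl 16);
  if Len >= 2 then H := H xor (Cardinal(P[1]) shl 8);
  if Len >= 1 then
  begin
    H := H xor Cardinal(P[0]);
    H := H * M;
  end;
  // Final avalanche.
  H := H xor (H shr 13);
  H := H * M;
  H := H xor (H shr 15);
  Result := H;
end;

// Comparer that feeds Int64 keys through MurmurHash2.
function MakeMurmurComparer: IEqualityComparer<Int64>;
begin
  Result := TEqualityComparer<Int64>.Construct(
    function(const L, R: Int64): Boolean
    begin
      Result := L = R;
    end,
    function(const Value: Int64): Integer
    begin
      Result := Integer(MurmurHash2(Value, SizeOf(Value), 0));
    end);
end;
```

Usage would then mirror Peter's construction, e.g. `Dic := TDictionary<Int64, Boolean>.Create(500000, MakeMurmurComparer);` (value type again a placeholder).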