Angband.oook.cz
Angband.oook.cz
AboutVariantsLadderForumCompetitionComicScreenshotsFunniesLinks

Go Back   Angband Forums > Angband > Development

Reply
 
Thread Tools Display Modes
Old March 26, 2016, 10:12   #31
calris
Adept
 
Join Date: Mar 2016
Posts: 194
calris is on a distinguished road
Quote:
Originally Posted by AnonymousHero View Post
Since you mentioned it: On the subject of realloc(), I think mem_realloc() is buggy. I you see [url=https://github.com/angband/angband/blob/b662e8927190fee91d1450fa999c20a205762866/src/z-virt.c#L76]L76: This seems incorrect in that the addition here happens before the NULL check... thus ensuring that the "NULL" check can never be triggered. (It's usually not a problem on Linux due to overcommit, but malloc() can sometimes return NULL even if overcommit is allowed.)

It also seems to be attempting some sort of "poisoning"/size tracking to protect against corruption, but this is empatically not what one should be doing these days. Just use UBSAN and ASAN. They do this fully automatically and work for all memory accesses, including regular array indexing.
Just had a closer look at mem_realloc() and my immediate thought is WTF!

Why on earth are we attempting to realloc starting NOT the pointer to the memory we are reallocating?

And seriously, why do we have z-virt.c - I saw something in trac about nuking them. Seriously, custom wrappers around standard C functions are evil as.
calris is offline   Reply With Quote
Old March 26, 2016, 10:22   #32
AnonymousHero
Veteran
 
AnonymousHero's Avatar
 
Join Date: Jun 2007
Posts: 1,367
AnonymousHero is on a distinguished road
Quote:
Originally Posted by calris View Post
Ah, now I get it - Hash the string into, say, a u32 and just shove it into an array (which we can realloc be some fixed chunk size) and forget about sorting, just do a linear search. So all we need is either
  • A hash algorithm which guarantees no duplicate keys for non-equal strings, or
  • Some way to manage potential duplicate keys.

I'm open to any ideas you have to offer.
You don't even need any of that. Even if you get a few hash coollisions, the only consequence is that you'll occasionally be checking using strcmp(). That doesn't really matter.

(If you really want to avoid duplicates, you could use the quark_* functions instead, but they seem to be more optimized for the case where you're *usually* adding an unknown string, so YMMV.)

Wrt. choice of hash function: You can get arbitrarily advanced here, but FNV seems to be popular based on their list of users. Personally, I'd just go with the dead-simple Java String#hashCode() implementation (the formula is in the documentation). It's not great by any means (particularly if you have lots of non-ASCII characters, apparently), but until there's evidence that something better is needed, I wouldn't bother doing anything more advanced.
AnonymousHero is offline   Reply With Quote
Old March 26, 2016, 10:25   #33
AnonymousHero
Veteran
 
AnonymousHero's Avatar
 
Join Date: Jun 2007
Posts: 1,367
AnonymousHero is on a distinguished road
Quote:
Originally Posted by calris View Post
Just had a closer look at mem_realloc() and my immediate thought is WTF!

Why on earth are we attempting to realloc starting NOT the pointer to the memory we are reallocating?
That was my thought as well, but after pondering it for a bit I came to the conclusion that it must be because it's reserving extra space for size tracking/poisoning.

Quote:
Originally Posted by calris View Post
And seriously, why do we have z-virt.c - I saw something in trac about nuking them. Seriously, custom wrappers around standard C functions are evil as.
Yeah, it's kind of silly, though it does provide a more convenient interface if you really just want to crash on OOM instead of having to sprinkle huge amounts of "if != null" checks all over the place.

("Real" realloc() also has an extremely error-prone interface which it looks like mem_realloc() mitigates at least somewhat.)

Last edited by AnonymousHero; March 26, 2016 at 10:33.
AnonymousHero is offline   Reply With Quote
Old March 26, 2016, 10:34   #34
calris
Adept
 
Join Date: Mar 2016
Posts: 194
calris is on a distinguished road
Quote:
Originally Posted by AnonymousHero View Post
You don't even need any of that. Even if you get a few hash coollisions, the only consequence is that you'll occasionally be checking using strcmp(). That doesn't really matter.
OK, I'll admit, I don't get it... I've never really used hashing...

I thought that the point of a hash was to map a complex key (like a string) into a simple numerical key which you can store in a linear array and just do a basic linear search.

But if you map a string to a number, how will you know you have a collision?

From what I can think, you hash the string, lookup the key, then check the string using strcmp() to test for a collision. But then what?
calris is offline   Reply With Quote
Old March 26, 2016, 11:00   #35
calris
Adept
 
Join Date: Mar 2016
Posts: 194
calris is on a distinguished road
Quote:
Originally Posted by calris View Post
I thought that the point of a hash was to map a complex key (like a string) into a simple numerical key which you can store in a linear array and just do a basic linear search.
OK, I've done some research

The hash function maps the string into an index into a hash table (array) of some predefined size. The array needs to hold lists (linked lists, or sub-arrays than we can resize) of structures which contain the string key and the data (in this use case it's simply a 'sound id').

I'm still not getting how this is not going to result in the hash table and it's associated lists being splattered all over memory, putting us right back to our previous cache-miss problem. I will freely admit that a good hash function with minimal collisions will result in less strcmp() operations that the tree.
calris is offline   Reply With Quote
Old March 26, 2016, 11:27   #36
Pete Mack
Prophet
 
Join Date: Apr 2007
Location: Seattle, WA
Posts: 5,391
Donated: $40
Pete Mack is on a distinguished road
Why is anyone worrying about cache misses in Angband???? Yeah, it's important in other situations, but not in a text-based adventure game for crying out loud.
Pete Mack is offline   Reply With Quote
Old March 26, 2016, 11:29   #37
calris
Adept
 
Join Date: Mar 2016
Posts: 194
calris is on a distinguished road
Quote:
Originally Posted by AnonymousHero View Post
Yeah, it's kind of silly, though it does provide a more convenient interface if you really just want to crash on OOM instead of having to sprinkle huge amounts of "if != null" checks all over the place.
Good point

Quote:
("Real" realloc() also has an extremely error-prone interface which it looks like mem_realloc() mitigates at least somewhat.)
How so? The only thing I can think of is that the results of passing a size of zero is implementation defined. z-term.c returns NULL on size == 0 (which is nice, since no matter WHAT compiler we use, or what system we run on, we know the behaviour)

I'm now thinking this alone is worth keeping z-virt.c around for
calris is offline   Reply With Quote
Old March 26, 2016, 11:34   #38
calris
Adept
 
Join Date: Mar 2016
Posts: 194
calris is on a distinguished road
Quote:
Originally Posted by Pete Mack View Post
Why is anyone worrying about cache misses in Angband???? Yeah, it's important in other situations, but not in a text-based adventure game for crying out loud.
Because some of us a purists

Seriously, I don't really care what is used - I just needed a data structure that was effective and reasonably efficient at converting sound name strings to more usable sound id's. I'm familiar with AVL trees, I've used them before, I'm comfortable using them. I haven't used hash tables before, so I never gave them a thought. But Angband is a community development effort, so I will happily bow to the community consensus.
calris is offline   Reply With Quote
Old March 26, 2016, 12:44   #39
AnonymousHero
Veteran
 
AnonymousHero's Avatar
 
Join Date: Jun 2007
Posts: 1,367
AnonymousHero is on a distinguished road
Quote:
Originally Posted by Pete Mack View Post
Why is anyone worrying about cache misses in Angband???? Yeah, it's important in other situations, but not in a text-based adventure game for crying out loud.
I'm don't particularly care very much about the performance aspects; I do care about the simplicity. (I just thought I'd explain about about the performance aspects.)

That said, you usually do want to choose a performant data structure when it doesn't incur (code) complexity (as in this case)... lest you actually do end up having to optimize and finding that you have a very flat profile.
AnonymousHero is offline   Reply With Quote
Old March 26, 2016, 12:47   #40
AnonymousHero
Veteran
 
AnonymousHero's Avatar
 
Join Date: Jun 2007
Posts: 1,367
AnonymousHero is on a distinguished road
Quote:
Originally Posted by calris View Post
OK, I'll admit, I don't get it... I've never really used hashing...

I thought that the point of a hash was to map a complex key (like a string) into a simple numerical key which you can store in a linear array and just do a basic linear search.

But if you map a string to a number, how will you know you have a collision?

From what I can think, you hash the string, lookup the key, then check the string using strcmp() to test for a collision. But then what?
Collisions don't matter in this case since we wouldn't use the hash key to *index* into the array (as you would in e.g. a hash table). In that case you need to care about collisions. What I'm suggesting isn't really a hash table. It's just storing an extra "int" in the struct as a simple way to optimize the strcmp() away in most cases.
AnonymousHero is offline   Reply With Quote
Reply


Currently Active Users Viewing This Thread: 1 (0 members and 1 guests)
 
Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump

Similar Threads
Thread Thread Starter Forum Replies Last Post
Reworking the entire magic system (yay)! TricksterWolf Development 8 September 20, 2015 21:22
Mk build system? CJNyfalt Development 11 March 25, 2015 08:25
Resist system Jungle_Boy Development 65 August 30, 2011 04:10
Combat System Sirridan Development 9 July 14, 2009 08:11
RFC: Middle Earth map for Un in ASCII; is it readable? Bandobras Variants 13 November 25, 2007 04:15


All times are GMT +1. The time now is 20:13.


Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2019, vBulletin Solutions Inc.