Custom VoxelTypes
olive:
--- Quote from: d3x0r on November 12, 2014, 10:00:54 am ---...
Also, while considering making a container for voxels, I needed to know what a voxel was, so I reimplemented Sector->Data, OtherInfos, TempInfos as VoxelData *Data;
https://github.com/d3x0r/Blackvoxel/blob/group_voxel_datas/src/ZVoxelSector.h#L203
--- End quote ---
Hum, I wouldn’t recommend doing this.
The storage was designed this way for memory bandwidth efficiency.
Packing these data together will waste a significant amount of memory bandwidth.
This is a weird side effect of how modern processors access memory, with prefetch mechanisms and wide buses: processors can't fetch just a byte or a short.
In practice, in Blackvoxel, the different pieces of information about a voxel are rarely used at the same time. Most massive processing operations work on a single field.
With packed information, the processor will also fetch adjacent data belonging to the other fields, and that data won't be used.
So, in this case, it's better to keep each field in its own array. ;)
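To put rough numbers on this, here is a minimal C++ sketch (hypothetical, not Blackvoxel's actual code) that counts the cache lines pulled in when a pass reads only the voxel type of a 16384-voxel sector, once with each field in its own array and once with the fields packed in a struct. It assumes 64-byte cache lines and arrays starting on a line boundary; `PackedVoxel`, `LinesTouched` and the other names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: sector size and field widths follow the thread,
// the 64-byte cache line is an assumption.
constexpr std::size_t kSectorVoxels = 16384;
constexpr std::size_t kCacheLine = 64;

struct VoxelExtension;  // opaque placeholder for per-voxel instance data

// Packed layout (one struct per voxel). On a typical 64-bit build the
// compiler pads this to 16 bytes so that `other` is 8-byte aligned.
struct PackedVoxel {
  std::uint16_t type;
  std::uint16_t temperature;
  VoxelExtension* other;
};

// Distinct cache lines touched when reading `field_bytes` at the start of
// each of `count` consecutive elements spaced `stride` bytes apart, with the
// array assumed to start on a cache-line boundary.
std::size_t LinesTouched(std::size_t count, std::size_t stride,
                         std::size_t field_bytes) {
  assert(field_bytes > 0 && stride >= field_bytes);
  std::size_t lines = 0;
  bool have_last = false;
  std::size_t last_line = 0;
  for (std::size_t i = 0; i < count; ++i) {
    const std::size_t first = (i * stride) / kCacheLine;
    const std::size_t last = (i * stride + field_bytes - 1) / kCacheLine;
    // Lines are visited in non-decreasing order, so comparing with the
    // previously counted line is enough to count each line once.
    for (std::size_t line = first; line <= last; ++line) {
      if (!have_last || line != last_line) {
        ++lines;
        last_line = line;
        have_last = true;
      }
    }
  }
  return lines;
}

// Type-only scan with separate arrays: only the 2-byte type array is read.
std::size_t SeparateArraysTypeScanLines() {
  return LinesTouched(kSectorVoxels, sizeof(std::uint16_t), sizeof(std::uint16_t));
}

// Type-only scan over packed structs: every struct's line is pulled in.
std::size_t PackedTypeScanLines() {
  return LinesTouched(kSectorVoxels, sizeof(PackedVoxel), sizeof(std::uint16_t));
}
```

On a typical 64-bit build, where `PackedVoxel` is padded to 16 bytes, the packed scan touches 4096 lines against 512 for the separate type array, i.e. eight times the traffic for the same useful data.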
The Blackvoxel Team
d3x0r:
--- Quote from: olive on November 18, 2014, 02:31:29 am ---
Hum, I wouldn’t recommend doing this.
The storage was designed this way for memory bandwidth efficiency.
Packing these data together will waste a significant amount of memory bandwidth.
This is a weird side effect of how modern processors access memory, with prefetch mechanisms and wide buses: processors can't fetch just a byte or a short.
--- End quote ---
Sectors are 16384 indexes... voxel types are 2 bytes each... so 32 KB, or 8 pages of 4 KB.
TempInfos is another short, for block temperature; this should probably go in an infos structure, since in general blocks don't change behavior based on temperature...
OtherInfos is 4 or 8 bytes per voxel: a pointer (VoxelExtension*) to the voxel's instance data...
Packed together, that's 12 bytes per voxel in a 64-bit build, i.e. 341 voxels per 4 KB page with 4 bytes left over, or 8 bytes and 512 voxels per page in a 32-bit build...
The total is 32 KB + 32 KB + 64 KB (32-bit) or 128 KB (64-bit). (I've been running a 64-bit build, because I usually have more sectors of smaller voxels.)
So 128 KB or 192 KB, i.e. 32 or 48 pages of 4096 bytes, per sector...
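The arithmetic above can be checked with a small C++ sketch. It only restates the sizes given in the post (16384 voxels, 2-byte types and TempInfos, 4- or 8-byte OtherInfos pointers, 4 KB pages); `SectorFootprint`, `Footprint` and `PackedVoxelsPerPage` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Sizes from the thread: 16384 voxels, 2-byte type, 2-byte TempInfos,
// pointer-sized OtherInfos, 4 KB pages.
constexpr std::size_t kSectorVoxels = 16384;
constexpr std::size_t kPageBytes = 4096;

struct SectorFootprint {
  std::size_t type_bytes;
  std::size_t temp_bytes;
  std::size_t other_bytes;

  std::size_t TotalBytes() const { return type_bytes + temp_bytes + other_bytes; }
  // Whole 4 KB pages needed, rounded up.
  std::size_t Pages() const { return (TotalBytes() + kPageBytes - 1) / kPageBytes; }
};

// pointer_bytes is 4 for a 32-bit build and 8 for a 64-bit build.
SectorFootprint Footprint(std::size_t pointer_bytes) {
  assert(pointer_bytes == 4 || pointer_bytes == 8);
  return {kSectorVoxels * 2, kSectorVoxels * 2, kSectorVoxels * pointer_bytes};
}

// Voxels per page if the three fields were packed with no padding at all.
std::size_t PackedVoxelsPerPage(std::size_t pointer_bytes) {
  return kPageBytes / (2 + 2 + pointer_bytes);
}
```

The packed per-page figure assumes no padding; a plain C++ struct with these three members would typically be padded to 16 bytes on a 64-bit build, giving 256 voxels per page instead of 341.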
-------------------
I made a branch at that point... I wasn't sure how much of a performance hit it would be... I'm aware of caches... *see below*
I guess, given the defined scope of the project: the renderer only needs the voxel type, so it only has to fill 8 pages instead of 48, given no custom rendering for custom voxels (color shading)... although maybe the renderer uses TempInfos to determine the texture, which makes 16 pages; even then, if blocks are rarely shaded, the temperature isn't always checked, so it could be less...
The voxel reactor uses (temperature? Does temperature get modified except by time and water? Does it dissipate? Does each VoxelType have a thermal coefficient? :) ) and OtherInfos... Unless a highly static level with non-extended voxels is used, OtherInfos will definitely be cached in an active environment, so that's 8 + 32 = 40 pages loaded even without temperature... so there's no real savings at this level...
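The per-consumer page counts in this reasoning can be sketched the same way, assuming each field lives in its own array and a 64-bit build by default; `Field` and `PagesForPass` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kSectorVoxels = 16384;
constexpr std::size_t kPageBytes = 4096;

// Hypothetical bit flags naming the per-voxel arrays a pass reads.
enum Field : unsigned { kType = 1u << 0, kTemp = 1u << 1, kOther = 1u << 2 };

// Pages of one sector that a full pass must pull in when each field lives
// in its own array. pointer_bytes defaults to 8 (64-bit build), so
// OtherInfos costs 16384 * 8 bytes = 128 KB = 32 pages.
std::size_t PagesForPass(unsigned fields, std::size_t pointer_bytes = 8) {
  assert(pointer_bytes == 4 || pointer_bytes == 8);
  std::size_t bytes = 0;
  if (fields & kType) bytes += kSectorVoxels * 2;               // VoxelType
  if (fields & kTemp) bytes += kSectorVoxels * 2;               // TempInfos
  if (fields & kOther) bytes += kSectorVoxels * pointer_bytes;  // OtherInfos
  return (bytes + kPageBytes - 1) / kPageBytes;
}
```

With a fully packed layout every pass touches all 48 pages regardless of which fields it reads; with separate arrays the saving depends on how many fields a pass needs: 8 pages against 48 for a type-only pass, but only 40 against 48 once OtherInfos is involved.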
Feature: from a voxel-is-the-center-of-the-universe view, if I'm given an offset to my data, I could get to a nearby voxel's (extended) data... I dunno, I guess I still need the sector reference...
So yeah... a highly static map benefits from only needing 32 KB loaded to work with a sector (plus another page for the sector itself, and another for the voxel type manager)... but then again, a 1 MB cache can hold 32 such sectors, which is only a few more than the block of 9 around the player at any point, so it still has to scroll through memory...
....
When processing a sequence of blocks, only 4-6 pages are used (the center, left/right, above/below, forward/backward), especially on sector boundaries... I dunno; the working set is still the same... With the current scheme this is tripled, because the mirroring OtherInfos and TempInfos pages may be used too: 12-18 working-set pages...
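To see where the "mirror" pages come from, a hypothetical C++ sketch can count the distinct 4 KB pages a pass touches in each separate array for a given set of voxel indices. It assumes each array starts on a page boundary and a 64-bit build (8-byte OtherInfos); the real index layout of a sector is not modeled, and `PagesTouched` and `WorkingSetPages` are made-up names.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

constexpr std::size_t kPageBytes = 4096;

// Distinct 4 KB pages touched in one array whose elements are
// `element_bytes` wide, for the given voxel indices. Assumes the array
// starts on a page boundary; with power-of-two element sizes no element
// straddles two pages.
std::size_t PagesTouched(const std::vector<std::size_t>& indices,
                         std::size_t element_bytes) {
  assert(element_bytes > 0);
  std::set<std::size_t> pages;
  for (std::size_t index : indices) {
    pages.insert(index * element_bytes / kPageBytes);
  }
  return pages.size();
}

// Working set when types (2 bytes), TempInfos (2 bytes) and OtherInfos
// (8-byte pointers, 64-bit build) are all read for the same voxels:
// each separate array contributes its own pages.
std::size_t WorkingSetPages(const std::vector<std::size_t>& indices) {
  return PagesTouched(indices, 2) + PagesTouched(indices, 2) +
         PagesTouched(indices, 8);
}
```

For three adjacent indices the type array needs one page, but the full working set needs three, one per array. Because the 8-byte pointer array covers only 512 voxels per page, versus 2048 for the 2-byte arrays, widely spread neighbors can add more OtherInfos pages than type pages.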
It resulted in more efficient assembly, because there was only one pointer and the other offsets were computed from it... I do remember noting that in the other place I was looking at...
It may arguably be better to allocate them all together, putting all the voxel types, then all the TempInfos, then all the OtherInfos in sequence instead of interlaced, but still behind a single pointer...
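A minimal C++ sketch of that idea, under assumptions: one allocation and one pointer, with all voxel types first, then all TempInfos, then all OtherInfos. Each field stays contiguous for single-field scans, while the compiler derives every field address from the single base pointer. `SectorStorage` and its accessors are hypothetical names, not Blackvoxel's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

struct VoxelExtension;  // opaque placeholder for per-voxel instance data

// Hypothetical sketch: one heap block, one pointer, fields not interlaced.
// All voxel types come first, then all temperatures, then all extension
// pointers, so each field stays contiguous for single-field scans.
class SectorStorage {
 public:
  static constexpr std::size_t kVoxels = 16384;

  // make_unique value-initializes the block, so every field starts at zero.
  SectorStorage() : data_(std::make_unique<Fields>()) {}

  std::uint16_t& Type(std::size_t i) { assert(i < kVoxels); return data_->types[i]; }
  std::uint16_t& Temp(std::size_t i) { assert(i < kVoxels); return data_->temps[i]; }
  VoxelExtension*& Other(std::size_t i) { assert(i < kVoxels); return data_->others[i]; }

  // Byte offsets of the second and third arrays from the single base pointer.
  static constexpr std::size_t TempOffset() { return offsetof(Fields, temps); }
  static constexpr std::size_t OtherOffset() { return offsetof(Fields, others); }

 private:
  struct Fields {
    std::uint16_t types[kVoxels];
    std::uint16_t temps[kVoxels];
    VoxelExtension* others[kVoxels];
  };
  std::unique_ptr<Fields> data_;  // the only pointer the sector holds
};
```

One trade-off: because the three arrays share a single heap block, an index that runs past the end of the type array lands inside the TempInfos array, which is still valid memory, so Valgrind's Memcheck would not report it. The `assert` checks in the accessors are one way to catch such errors in debug builds.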
What is lost is gained in other ways.... and I don't notice a particular performance hit from this change...
--------------
Once upon a time I got to use a logic analyzer that decoded the pins interfacing a 386 to the motherboard... I got to see the memory reads/writes that fetched instruction blocks... so there's the clearing of the cache on calls and backward jumps... forward jumps are better handled by setting an ignore flag and processing the prefetch cache anyway... kind of like how ARM handles short jumps... but anyway... it's just much wider now... and there's burst mode...
olive:
--- Quote from: d3x0r on November 18, 2014, 04:11:21 am ---
Sectors are 16384 indexes... voxel types are 2 bytes... so 32k or 8 4k pages.
[...]
--- End quote ---
The problem appears more clearly when looking at the data needed by each part of Blackvoxel involved in massive processing:
- The renderer uses nothing but VoxelType and FaceCulling info, and that will be reduced further in the future.
- The MVI mostly uses VoxelType data. That may sound surprising, but in fact most scanned voxels aren't active, and even among active voxels only a few use Extensions (though this may vary depending on the nature of the voxel).
- The sector loader needs all the data, but each field is processed independently, so here again individual per-field arrays are better.
- Some temperature processing is planned for the future; that's why the TempInfo table is there. It will probably use TempInfo and VoxelType, with a slower cycle than the MVI. Not sure when we will do it.
- Some pressure information could be added in the future, once most processors have at least 8 true cores and matching memory bandwidth.
The idea of accessing different data through a single pointer could be interesting, and we could certainly implement it cleanly with limited changes. But I couldn't tell you the order of magnitude of the gains (because of the caches). And we haven't yet thought about how we could keep Valgrind detecting MVI memory errors.
--- Quote ---and I don't notice a particular performance hit from this change...
--- End quote ---
This does not surprise me at all. :)
Some Core i7s have more than 30 GB/s of memory bandwidth.
The Core 2 Quad is around 10 GB/s.
Core i3 and recent AMD processors are in between.
And an old Intel Atom (single-channel DDR2) in a netbook is around 4 GB/s. (Yes, Blackvoxel can run on it with some settings adjustments.)
(Note that these numbers can vary slightly depending on sources and processor generations.)
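As rough arithmetic only, a short C++ sketch can relate these bandwidth figures to the sector sizes discussed earlier in the thread. It reads the figures as gigabytes per second and ignores latency, caches and contention between cores; `MicrosecondsToStream` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>

// Rough, hypothetical estimate: microseconds needed to stream `bytes` from
// memory at a sustained bandwidth given in gigabytes per second. Ignores
// latency, caching and contention from other cores.
double MicrosecondsToStream(std::size_t bytes, double gigabytes_per_second) {
  assert(gigabytes_per_second > 0.0);
  return static_cast<double>(bytes) / (gigabytes_per_second * 1e9) * 1e6;
}
```

At 4 GB/s, streaming a sector's 32 KB type array takes about 8.2 µs of pure transfer, against about 49 µs for the full 192 KB of a 64-bit sector.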
So the idea is to save memory bandwidth, both to run on modest machines and to keep room for future features: I'm afraid Moore's law is now dead.
What is weird about memory bandwidth is its shared nature. All cores share the memory controller, so using too much memory bandwidth on one core can reduce efficiency on the other cores.
During Blackvoxel tuning we often found that some changes behaved differently depending on the hardware.
It's not easy to minimize these side effects. That's why we ran a lot of tests and had to correct a lot of things.
--- Quote ---Once upon a time I got to use a logic analyzer that decoded the pins interfacing a 386 to the motherboard... I got to see the memory reads/writes that fetched instruction blocks... so there's the clearing of the cache on calls and backward jumps... forward jumps are better handled by setting an ignore flag and processing the prefetch cache anyway... kind of like how ARM handles short jumps... but anyway... it's just much wider now... and there's burst mode...
--- End quote ---
That reminds me of the good old days when we worked with the 68000, wire-wrapped boards, TTL logic circuits, EPROMs, PIAs, ACIAs and all that amazing stuff. We built our own computer.
I think you are a good programmer, because you have learned how the computer's hardware works under the hood. That's why you can understand and do optimisations. :)