I apologise for the messiness of the project... I haven't had time to clean it up yet!
For Linux and OS X specifics, all you would need to change is how caches are allocated (in Win32 it's through the VirtualAlloc function)... see the allocNewCacheByC8PC function in Chip8Engine_CacheHandler.cpp. All the VirtualAlloc call does is give you a block of memory that has executable privileges set, which memory from the new operator or malloc doesn't have. So if you can find a replacement for that, you're good to go. I believe SDL runs under Linux and OS X as well, so that's good. In my setup, however, I have used the static link files (.lib) with the header directory included, plus the SDL2.dll etc. files... I'm not sure how that works under Linux/OS X.
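As a rough sketch of what such a replacement might look like on POSIX systems, mmap can hand back memory that is readable, writable and executable. The names allocExecutableBlock and freeExecutableBlock are made up for illustration and are not part of the project, and some hardened systems may refuse a mapping that is writable and executable at the same time.

```cpp
#include <sys/mman.h>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not in the project): returns a block that is
// readable, writable and executable, or nullptr if the mapping fails.
uint8_t* allocExecutableBlock(std::size_t size) {
    assert(size > 0);  // mmap rejects zero-length mappings
    void* p = mmap(nullptr, size,
                   PROT_READ | PROT_WRITE | PROT_EXEC,
                   MAP_PRIVATE | MAP_ANON, -1, 0);
    return p == MAP_FAILED ? nullptr : static_cast<uint8_t*>(p);
}

// Hypothetical counterpart: releases a block obtained from the helper above.
void freeExecutableBlock(uint8_t* block, std::size_t size) {
    if (block != nullptr) {
        munmap(block, size);
    }
}
```

Something like allocExecutableBlock(4096) would then stand in for the VirtualAlloc call, with the emitted x86 bytes written into the returned block.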
With the blocks of code... I'm not entirely sure how other emulators handle recompiling code, but here's how I have done it. (You are correct that blocks always end in a jump/branch... this is actually a very important concept! Well done for getting this!)
Initially, as soon as you run the program, a single cache is set up with no code in it - this produces an OUT_OF_CODE interrupt as soon as it tries to run it, which invokes the translatorLoop() function in Chip8Engine.cpp. This function is basically responsible for turning C8 opcodes into x86 opcodes, and it does so 16 opcodes at a time (ignoring the other condition it checks for now), then transfers control back to the emulation loop. So in a way, yes, it is kind of a lazy way of doing it. I chose 16 so it was easier to debug, but it could easily be 128 or something. However, it would also be quite easy to change this so that it compiles up until a jump is encountered (i.e. have a variable that is set to 1 when a jump opcode is translated, then exit the translator loop by checking this variable).
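To make the "stop at a jump" idea concrete, here is a minimal sketch of a translator loop that ends either after a fixed number of opcodes or as soon as a jump has been translated. It is not the project's translatorLoop(): TranslateResult, endsBlock and translateBlock are hypothetical names, the program is modelled as a plain vector of C8 opcodes, and the actual x86 emission is left as a comment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type, for illustration only.
struct TranslateResult {
    std::size_t translated;  // number of C8 opcodes consumed
    bool hit_jump;           // true if the loop stopped because of a jump
};

// True for C8 opcodes that always transfer control:
// 1NNN (jump), 2NNN (call), BNNN (jump to V0 + NNN), 00EE (return).
bool endsBlock(uint16_t op) {
    uint16_t top = op & 0xF000;
    return top == 0x1000 || top == 0x2000 || top == 0xB000 || op == 0x00EE;
}

// Translates up to max_opcodes opcodes starting at index start,
// stopping early once a jump opcode has been translated.
TranslateResult translateBlock(const std::vector<uint16_t>& program,
                               std::size_t start, std::size_t max_opcodes) {
    assert(start <= program.size());
    TranslateResult result{0, false};
    for (std::size_t i = start;
         i < program.size() && result.translated < max_opcodes; ++i) {
        // ... emit the x86 code for program[i] here ...
        ++result.translated;
        if (endsBlock(program[i])) {
            result.hit_jump = true;  // the "set to 1 on a jump" variable
            break;
        }
    }
    return result;
}
```

Passing 16 as max_opcodes mimics the fixed batch size described above, while a very large value makes the jump the only stopping condition.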
In the end, it doesn't really matter. After those 16 opcodes have been recompiled, they will never be touched again. When the cache runs the next time and gets to the end of the 16 opcodes, it will again raise an OUT_OF_CODE interrupt, which takes us back into the translator loop. This time it processes the next 16 opcodes.
So eventually it will recompile all of the code, just 16 opcodes at a time. The caches are separated by jumps, however, and this is marked by the stop_write_flag struct member of CACHE_REGION, which is set to 1 every time a jump is reached (see the 0x1NNN opcode in Chip8Engine_Dynarec.cpp for example; it contains cache->setStopWriteFlagCurrent()). If this flag is set, then the translator loop will most likely allocate a new cache. The cache selection logic is handled by the following (within translatorLoop()):
// Select the right cache to use & switch
int32_t cache_index = cache->getCacheWritableByC8PC(C8_STATE::cpu.pc);
cache->switchCacheByIndex(cache_index);
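The sketch below is only a guess at the kind of logic such a lookup could perform, not what getCacheWritableByC8PC actually does. It assumes a cache is reusable when it ends exactly at the current C8 PC and has not been sealed by a jump, and that a fresh cache is created otherwise. CacheRegion, appendOpcode, findWritableCache and selectCache are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical simplified stand-in for the project's CACHE_REGION.
struct CacheRegion {
    uint16_t c8_start_pc;   // C8 address of the first opcode in this cache
    uint16_t c8_end_pc;     // C8 address just past the last translated opcode
    bool stop_write_flag;   // set once a jump has been translated
};

// Records one more translated opcode (C8 opcodes are 2 bytes each).
void appendOpcode(CacheRegion& cache) {
    assert(!cache.stop_write_flag);  // a sealed cache must not grow
    cache.c8_end_pc += 2;
}

// Returns the index of a cache that ends at pc and is still writable, or -1.
int32_t findWritableCache(const std::vector<CacheRegion>& caches, uint16_t pc) {
    for (std::size_t i = 0; i < caches.size(); ++i) {
        if (caches[i].c8_end_pc == pc && !caches[i].stop_write_flag) {
            return static_cast<int32_t>(i);
        }
    }
    return -1;
}

// Reuses a writable cache if one exists, otherwise starts a new one at pc.
int32_t selectCache(std::vector<CacheRegion>& caches, uint16_t pc) {
    int32_t index = findWritableCache(caches, pc);
    if (index < 0) {
        caches.push_back(CacheRegion{pc, pc, false});
        index = static_cast<int32_t>(caches.size() - 1);
    }
    return index;
}
```

Under these assumptions, translating 16 opcodes from 0x200 and asking again at 0x220 returns the same cache, whereas setting stop_write_flag first forces a new one.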
If you are worried about the speed overhead or anything else like that, then I can tell you it is not significant for the C8. I wrote this dynarec emulator more as a proof of concept that it works than with a focus on speed. But that is a good question to ask other developers... how much code they recompile at one time.
Phew, sorry if it doesn't make sense haha... it's 1am here and I need some sleep.
I also tried to get your emulator working, and I got it to almost build, but SDL2_mixer is not playing nicely with VS2015 for some reason (it's not a problem with your code though).
Blitz on my emulators is just a block moving right, so... I didn't even know it was a game! I'll take a look when I get it working.