multithreading - Does Intel SFENCE have release semantics?


It seems the accepted definition of acquire and release semantics is this (quoted from http://msdn.microsoft.com/en-us/library/windows/hardware/ff540496(v=vs.85).aspx):

An operation has acquire semantics if other processors will always see its effect before any subsequent operation's effect. An operation has release semantics if other processors will see every preceding operation's effect before the effect of the operation itself.

I have briefly read about the existence of half memory barriers, which supposedly come in the flavors of acquire barriers and release barriers, following the same semantics described above.

Looking for a real example of hardware instructions, I came across SFENCE. This blog (http://peeterjoot.wordpress.com/2009/12/04/intel-memory-ordering-fence-instructions-and-atomic-operations/) says it is a form of release fence/barrier:

Intel provides a bidirectional fence instruction MFENCE, an acquire fence LFENCE, and a release fence SFENCE.

However, reading the definition of SFENCE, it doesn't seem to provide release semantics, in that it doesn't synchronize with loads at all? Release semantics, as I understand it, defines an ordering with respect to all memory operations (loads and stores).

LFENCE does not have acquire semantics; SFENCE does not have release semantics. There's a reason for that: having a stand-alone fence instruction with acquire semantics, or release semantics, turns out to be useless. Acquire/release is good, but it must be tied to a memory operation.

For example, consider the common idiom for sending data between two threads:

  1. Processor A writes to a buffer.
  2. Processor A writes "true" to a flag.
  3. Processor B waits until the flag is true.
  4. Processor B reads the buffer.

Note that processor A must ensure that the write to the flag is seen after the writes to the buffer. Suppose we had an "RFENCE" instruction that is a release fence. If we put that instruction after step 1, it does no good, because the write in step 2 is allowed to appear to migrate upwards over the RFENCE and over step 1.

A similar argument shows that an "AFENCE" instruction for acquire is equally useless for ensuring that the read of the flag in step 3 does not appear to migrate downwards across step 4.

Itanium solved the problem elegantly by providing write-with-release and load-with-acquire instructions that tie the fence to the memory operation.

Back to IA-32 and Intel64: if the program does not use "non-temporal" instructions, the remaining instructions behave as if every load is an "acquire" and every store is a "release". See Section 8.2.3 (and its subsections) of the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3A. If there are "non-temporal" stores involved, you have several ways to enforce a fence:

  • Use SFENCE.
  • Use MFENCE, which is overkill.
  • Use a LOCK-prefixed instruction (such as "LOCK INC") to write the flag. LOCK-prefixed instructions implicitly act as an MFENCE.
  • Use XCHG, which acts as if it has an implicit LOCK prefix, to write the flag.

For example, if in the earlier idiom the buffer is written using non-temporal stores, have processor A issue an SFENCE or MFENCE between steps 1 and 2. Or use XCHG to write the flag.
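As a rough sketch of that fix, the producer below writes the buffer with the `_mm_stream_si32` intrinsic (a non-temporal store) and then either issues `_mm_sfence` before an ordinary flag store, or writes the flag with an atomic exchange, which x86 compilers commonly (but not necessarily) compile to XCHG. The names `nt_buffer`, `nt_ready`, `nt_producer`, `nt_consumer` and `run_nt_handoff` are made up for illustration, and the fallback to ordinary stores on non-x86-64 targets is an assumption added only so the sketch builds elsewhere.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

#if defined(__x86_64__) || defined(_M_X64)
#include <immintrin.h>  // _mm_stream_si32 (MOVNTI), _mm_sfence (SFENCE)
#define HAVE_NT_STORES 1
#else
#define HAVE_NT_STORES 0  // assumption: use ordinary stores on other targets
#endif

int nt_buffer[4];                    // payload written in step 1
std::atomic<bool> nt_ready{false};   // flag written in step 2

// Processor A. use_xchg selects how the flag is written.
void nt_producer(bool use_xchg) {
    for (int i = 0; i < 4; ++i) {
#if HAVE_NT_STORES
        _mm_stream_si32(&nt_buffer[i], i + 1);  // step 1: non-temporal store
#else
        nt_buffer[i] = i + 1;                   // step 1: ordinary store
#endif
    }
    if (use_xchg) {
        // Step 2 as an atomic exchange; typically an XCHG on x86.
        nt_ready.exchange(true, std::memory_order_seq_cst);
    } else {
#if HAVE_NT_STORES
        _mm_sfence();  // SFENCE between steps 1 and 2
#endif
        nt_ready.store(true, std::memory_order_release);  // step 2
    }
}

// Processor B: wait for the flag (step 3), then read the buffer (step 4).
int nt_consumer() {
    while (!nt_ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    int sum = 0;
    for (int i = 0; i < 4; ++i) sum += nt_buffer[i];
    return sum;
}

// Runs one hand-off and returns what processor B read.
int run_nt_handoff(bool use_xchg) {
    for (int i = 0; i < 4; ++i) nt_buffer[i] = 0;
    nt_ready.store(false, std::memory_order_relaxed);
    int result = 0;
    std::thread b([&] { result = nt_consumer(); });
    std::thread a(nt_producer, use_xchg);
    a.join();
    b.join();
    return result;
}
```

Calling `run_nt_handoff(false)` exercises the SFENCE path and `run_nt_handoff(true)` the exchange path. The explicit `_mm_sfence` is there because a C++ release store on x86 is normally just a plain store, which by itself does nothing about earlier non-temporal stores.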

All of the above remarks apply to the hardware. When using a high-level language, be sure that the compiler does not damage the critical ordering of events. The C++11 atomic operations library exists so that you can tell the compiler and the hardware what you intend.
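Here is a minimal sketch of the four-step idiom written with C++11 atomics, where the release is tied to the flag store and the acquire is tied to the flag load. The names `buffer`, `ready`, `producer`, `consumer` and `run_handoff` are hypothetical, chosen just for this illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int buffer[4];                    // plain, non-atomic payload
std::atomic<bool> ready{false};   // the flag from steps 2 and 3

// Processor A: steps 1 and 2.
void producer() {
    for (int i = 0; i < 4; ++i) buffer[i] = i + 1;   // step 1
    // Store-with-release: the buffer writes cannot appear to move below it.
    ready.store(true, std::memory_order_release);    // step 2
}

// Processor B: steps 3 and 4.
int consumer() {
    // Load-with-acquire: the buffer reads cannot appear to move above it.
    while (!ready.load(std::memory_order_acquire)) { // step 3
        std::this_thread::yield();
    }
    int sum = 0;
    for (int i = 0; i < 4; ++i) sum += buffer[i];    // step 4
    return sum;
}

// Runs one hand-off and returns what processor B read.
int run_handoff() {
    for (int i = 0; i < 4; ++i) buffer[i] = 0;
    ready.store(false, std::memory_order_relaxed);
    int result = 0;
    std::thread b([&] { result = consumer(); });
    std::thread a(producer);
    a.join();
    b.join();
    return result;
}
```

With this pairing, `run_handoff()` returns 10 (1 + 2 + 3 + 4), because the consumer only reads the buffer after it has observed the flag. The memory-order arguments constrain the compiler as well as the hardware, which is the point of using the library instead of raw fence instructions.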

