multithreading - Does Intel SFENCE have release semantics?
It seems that the accepted definition of acquire and release semantics is something like this (quoted from http://msdn.microsoft.com/en-us/library/windows/hardware/ff540496(v=vs.85).aspx):
An operation has acquire semantics if other processors will always see its effect before any subsequent operation's effect. An operation has release semantics if other processors will see every preceding operation's effect before the effect of the operation itself.
I have briefly read about the existence of half memory barriers, and supposedly they come in the flavor of acquire barriers and release barriers, following the same semantics as described above.
Looking for a real example of hardware instructions, I came across SFENCE. This blog (http://peeterjoot.wordpress.com/2009/12/04/intel-memory-ordering-fence-instructions-and-atomic-operations/) says that it is a form of release fence/barrier:
Intel provides a bidirectional fence instruction MFENCE, an acquire fence LFENCE, and a release fence SFENCE.
However, reading the definition of SFENCE, it doesn't seem to provide release semantics, in that it doesn't synchronize with loads at all. Release semantics, as I understand it, defines an ordering with respect to all memory operations (loads and stores).
LFENCE does not have acquire semantics; SFENCE does not have release semantics. There's a good reason for that: a stand-alone fence instruction with acquire semantics, or release semantics, turns out to be useless. Acquire/release is good, but it must be tied to a memory operation.
For example, consider the common idiom for sending data between two threads:
1. Processor A writes to a buffer.
2. Processor A writes "true" to a flag.
3. Processor B waits until the flag is true.
4. Processor B reads the buffer.
Note that processor A must ensure that the write to the flag is seen after the writes to the buffer. Suppose we had an "RFENCE" instruction that was a release fence. If we put that instruction after step 1, it would do no good, because the write in step 2 would be allowed to appear to migrate up over the RFENCE and over step 1.
A similar argument shows that an "AFENCE" instruction for acquire would be equally useless for ensuring that the read of the flag in step 3 does not appear to migrate downwards across step 4.
Itanium solved the problem elegantly by providing write-with-release and load-with-acquire instructions that tie the fence to the memory operation.
Back to IA-32 and Intel64: if a program does not use "non-temporal" instructions, the remaining instructions behave as if every load is an "acquire" and every store is a "release". See Section 8.2.3 (and its subsections) of the Intel® 64 and IA-32 Architectures Software Developer's Manual, Vol. 3A. If there are "non-temporal" stores involved, then you have several ways to enforce a fence:
- Use SFENCE.
- Use MFENCE, though that is overkill.
- Use a LOCK-prefixed instruction (such as "LOCK INC") to write the flag. LOCK-prefixed instructions implicitly have MFENCEs.
- Use XCHG, which acts as if it has an implicit LOCK prefix, to write the flag.
For example, if in the earlier idiom the buffer was written using non-temporal stores, have processor A issue an SFENCE or MFENCE between steps 1 and 2. Or use XCHG to write the flag.
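As a rough illustration of placing SFENCE between steps 1 and 2, here is a minimal C++ sketch using compiler intrinsics. It assumes an x86-64 target with SSE2 and a compiler that provides `_mm_stream_si32` and `_mm_sfence`. The names `nt_buffer`, `nt_ready`, `nt_producer`, `nt_consumer`, and `nt_handoff` are made up for this example and do not come from the original discussion.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <xmmintrin.h>  // _mm_sfence
#include <emmintrin.h>  // _mm_stream_si32 (SSE2 non-temporal store)

// Hypothetical names for illustration only.
alignas(64) static int nt_buffer[16];
static std::atomic<bool> nt_ready{false};

// Processor A: write the buffer with non-temporal stores, fence, then set the flag.
void nt_producer() {
    for (int i = 0; i < 16; ++i) {
        _mm_stream_si32(&nt_buffer[i], i);            // step 1: non-temporal stores
    }
    _mm_sfence();                                     // order the NT stores before the flag write
    nt_ready.store(true, std::memory_order_release);  // step 2: an ordinary store on x86
}

// Processor B: wait for the flag, then read the buffer.
int nt_consumer() {
    while (!nt_ready.load(std::memory_order_acquire)) {  // step 3
        std::this_thread::yield();
    }
    int sum = 0;
    for (int i = 0; i < 16; ++i) sum += nt_buffer[i];    // step 4
    return sum;
}

// Runs one producer and one consumer thread and returns what the consumer read.
int nt_handoff() {
    nt_ready.store(false, std::memory_order_relaxed);
    int result = 0;
    std::thread b([&result] { result = nt_consumer(); });
    std::thread a(nt_producer);
    a.join();
    b.join();
    return result;
}
```

Calling `nt_handoff()` should return the sum of the values written in step 1. Without the `_mm_sfence()` call, the hardware would be allowed to make the flag visible before the non-temporal stores.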
All of the above remarks apply to the hardware. When using a high-level language, be sure that the compiler does not damage the critical ordering of events. The C++11 atomic operations library exists so that you can tell both the compiler and the hardware what you intend.
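The four-step idiom from earlier might be written with C++11 atomics roughly as follows. This is only a sketch: the names `buffer`, `ready`, `producer`, `consumer`, and `run_handoff` are hypothetical, and the buffer is assumed to be written with ordinary (not non-temporal) stores. The release is tied to the store of the flag and the acquire to the load of the flag, which is the pairing argued for above.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names for illustration only.
static int buffer[4];
static std::atomic<bool> ready{false};

// Processor A: fill the buffer, then publish the flag with a store-release.
void producer() {
    for (int i = 0; i < 4; ++i) buffer[i] = i + 1;   // step 1
    ready.store(true, std::memory_order_release);    // step 2
}

// Processor B: spin on the flag with a load-acquire, then read the buffer.
int consumer() {
    while (!ready.load(std::memory_order_acquire)) { // step 3
        std::this_thread::yield();
    }
    int sum = 0;
    for (int i = 0; i < 4; ++i) sum += buffer[i];    // step 4
    return sum;
}

// Runs one producer and one consumer thread and returns what the consumer read.
int run_handoff() {
    ready.store(false, std::memory_order_relaxed);
    int result = 0;
    std::thread b([&result] { result = consumer(); });
    std::thread a(producer);
    a.join();
    b.join();
    return result;
}
```

On IA-32 and Intel64 a compiler would typically implement both the store-release and the load-acquire as plain MOV instructions, but the memory-order arguments also stop the compiler from reordering the buffer accesses across the flag accesses.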