cuda - The behavior of the __CUDA_ARCH__ macro
In host code, it seems that the __CUDA_ARCH__ macro won't generate different code paths; instead, it generates code for exactly the code path of the current device.
However, if __CUDA_ARCH__ is used within device code, it generates a different code path for each of the devices specified in the compilation options (/arch).
Can anybody confirm that this is correct?
__CUDA_ARCH__, when used in device code, carries a number that reflects the code architecture currently being compiled.
It is not intended to be used in host code. From the nvcc manual:
"This macro can be used in the implementation of GPU functions for determining the virtual architecture for which it is currently being compiled. The host code (the non-GPU code) must not depend on it."
Usage of __CUDA_ARCH__ in host code is therefore undefined (at least by CUDA).
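To see this for yourself, here is a minimal sketch, not taken from the nvcc manual. The names compiled_for and report_arch are made up for illustration. It assumes the file is built with nvcc for at least one architecture that matches the installed GPU. The #else branch only handles the case where the macro is not defined (the host compilation pass); it never reads the macro's value there.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns the virtual architecture this copy of the
// function was compiled for, or 0 when __CUDA_ARCH__ is not defined
// (which is the case during the host compilation pass).
__host__ __device__ int compiled_for()
{
#ifdef __CUDA_ARCH__
    return __CUDA_ARCH__;   // e.g. 700 when compiling for compute_70
#else
    return 0;               // host pass: the macro is not defined
#endif
}

// Hypothetical kernel: stores the value seen by the device code path.
__global__ void report_arch(int *out)
{
    *out = compiled_for();
}

int main()
{
    int *d_arch = nullptr;
    int h_arch = -1;

    if (cudaMalloc(&d_arch, sizeof(int)) != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    report_arch<<<1, 1>>>(d_arch);

    // cudaMemcpy waits for the kernel to finish and reports launch errors.
    cudaError_t err = cudaMemcpy(&h_arch, d_arch, sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_arch);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    std::printf("host code path:   %d\n", compiled_for());
    std::printf("device code path: %d\n", h_arch);
    return 0;
}
```

If you build this for several architectures at once, each device compilation pass gets its own value of __CUDA_ARCH__, and the value printed for the device path should be the one from the code variant the runtime picked for your GPU. The value you actually get depends on your compile options and hardware.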