Why does the Matlab signrank function return the same signed rank statistic value when the signs of the data points are flipped?
I have a sequence of data points stored in a vector x. I use signrank(x) to do the signed rank test.
Matlab says:
"When you use the test for one sample, then W is the sum of the ranks of positive differences between the observations and the hypothesized median value M0 (which is 0 when you use signrank(x) and m when you use signrank(x,m))."
So I think the results of signrank(x) and signrank(-x) should be different. But I have experimented with some examples, and I get the same signed rank statistic value for x and -x. How is the signed rank statistic defined in the Matlab signrank function?
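To make the expectation concrete, here is a minimal sketch in base Matlab of W as the documentation describes it. It does not call signrank itself. The data vector is made up for illustration and is assumed to contain no zeros and no tied absolute values, so a plain sort gives the ranks.

```matlab
% Hypothetical data: no zeros and no tied absolute values (an assumption).
x = [1.2 -0.4 2.5 -3.1 0.7];

% Rank the absolute values; without ties the sort order gives the ranks.
[~, order] = sort(abs(x));
ranks = zeros(size(x));
ranks(order) = 1:numel(x);

% W as documented: the sum of the ranks of the positive differences (M0 = 0).
W_x    = sum(ranks(x > 0));    % statistic for x
W_negx = sum(ranks(-x > 0));   % statistic for -x (the ranks are unchanged)

fprintf('W(x) = %d, W(-x) = %d\n', W_x, W_negx);
```

This prints W(x) = 9, W(-x) = 6, so by the documented definition the two statistics would differ.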
Thanks!
Thanks! So the statistic is the minimum of the sum of the ranks of the positive differences and the sum of the ranks of the negative differences. I don't understand why it takes the minimum. Do you?
Interesting question, and thanks for the link to the Matlab code. Yes, it had me scratching my head for a few minutes too: they compute it in a rather curly manner, presumably for computational efficiency. Surprisingly, it is the same signed rank statistic that I posted previously.
Here's how it works (I've pasted the relevant few lines of code below for reference).
Let me denote by P the sum of the positive ranks (the ranks corresponding to positive scores), by N the sum of the negative ranks, and by A the sum of all the ranks, so that A = P + N. (BTW, note that what I've denoted "N" is the variable "w" in the actual code, and that lowercase n is the number of data points.)
By the arithmetic series, A = n*(n+1)/2. That said, the line min(w,(n+1)*n/2-w) is returning either N or P (= A - N), whichever is the minimum.
But look at the last line of the code pasted below. Since n*(n+1)/4 = A/2, the numerator is therefore min(N,P) - A/2.
Now if N is the minimum then it returns N - (P+N)/2, which equals -(P - N)/2.
However, if P is the minimum then it returns P - (P+N)/2, which equals -(N - P)/2.
So in either case it is returning the negative of half the absolute difference of the positive and negative rank sums, -|P - N|/2, which is, up to that constant factor, precisely what I posted in the simplified form,

| sum{ sign(xi) rank(|xi|) } |

BTW, the reason why they use the negative of the absolute difference there is that it saves them having to find the complementary CDF later.
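The algebra above can be checked numerically. The following is a minimal sketch in base Matlab, not the signrank source. The data vector is hypothetical and is assumed to contain no zeros and no ties, so a plain sort gives the ranks. The names numer_x, numer_negx and signed_sum are mine.

```matlab
x = [1.2 -0.4 2.5 -3.1 0.7];   % hypothetical data, no zeros or ties assumed
n = numel(x);

% Ranks of the absolute values (valid only because there are no ties).
[~, order] = sort(abs(x));
ranks = zeros(1, n);
ranks(order) = 1:n;

P = sum(ranks(x > 0));   % sum of positive ranks
N = sum(ranks(x < 0));   % sum of negative ranks ("w" before the min is taken)
A = n*(n+1)/2;           % sum of all ranks, equal to P + N

numer_x    = min(N, A - N) - A/2;     % numerator of z for x
numer_negx = min(P, A - P) - A/2;     % for -x the roles of P and N swap
signed_sum = sum(sign(x) .* ranks);   % sum{ sign(xi) rank(|xi|) } = P - N

fprintf('%g %g %g\n', numer_x, numer_negx, -abs(signed_sum)/2);
```

Here P = 9 and N = 6, and all three printed values come out as -1.5, which is why flipping the signs of the data leaves the statistic unchanged.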
A snippet of the signrank code for reference:

    w = sum(tierank(neg));
    w = min(w, n*(n+1)/2-w);
    ...
    z = (w-n*(n+1)/4) / sqrt((n*(n+1)*(2*n+1) - tieadj)/24);

Edit:
"Why take the absolute value? z should be asymptotically normal, so shouldn't there be no absolute value taken?"
My understanding is that it's not normal, it's "folded normal". That is, it is folded onto the positive half-line. That's why the p-value is calculated as,

    p = 2*(1 - normcdf(z,0,1));

(Aside: I know that in the actual code they use the negative of "z" to avoid requiring the CDF complement there, but it's the same thing.)
The p-value is multiplied by 2 to account for the folded distribution. Conveniently, this works out the same as calling it a "two tailed" p-value.
Think for a moment about what would happen if we didn't use the absolute value here. Say we took P - N, and N was greater than P. In that case z would be negative, and the p-value, 2*(1-normcdf(z,0,1)), would evaluate to greater than one, which can't be a good idea. :)
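A small sketch of that last point, under stated assumptions: the z value is made up, and the standard normal CDF is written with base Matlab's erfc (the handle Phi is my own name) so that no toolbox is needed. It should agree with normcdf(z,0,1).

```matlab
% Standard normal CDF via erfc; intended to match normcdf(z,0,1).
Phi = @(z) 0.5 * erfc(-z / sqrt(2));

z_signed = -1.3;   % hypothetical z built from P - N in a case where N > P

p_folded = 2 * (1 - Phi(abs(z_signed)));   % absolute value taken first
p_naive  = 2 * (1 - Phi(z_signed));        % signed z used directly

fprintf('folded: %.4f, naive: %.4f\n', p_folded, p_naive);
```

With the absolute value the result stays between 0 and 1. With the signed, negative z it exceeds 1, and by symmetry of the normal distribution the two values always add up to 2.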