python - Calculating Midprice in pandas


I'm working with intraday time and quote data in pandas, and I'm struggling to find a way to calculate a weighted mid-price. I have the data represented as 4 DataFrames (bid_price, bid_quantity, ask_price, ask_quantity), with the columns of each DataFrame being individual instruments and the index being timestamps. A single bid price is referenced as:

bid_price['aapl'][datetime(2013,1,1,9,30,0,0)] 

The midpoint formula I wish to apply depends on the bid/ask spread of the instrument. If the current spread is wider than the minimum tick increment, the midpoint is the simple average of the bid and ask prices at that moment. If the spread is equal to the minimum, the midpoint is weighted based on the bid and ask quantity.

Here is my current code:

def get_midprice(bid_price, bid_quantity, ask_price, ask_quantity, tick_increment=0.01):
    if (ask_price - bid_price) > tick_increment:
        return (ask_price + bid_price) / 2
    else:
        return ((bid_price * ask_quantity) + (ask_price * bid_quantity)) / (bid_quantity + ask_quantity)

This works on a single datapoint, and on a previous version of pandas it also worked when passed the 4 DataFrames. Now, passing the 4 DataFrames raises an exception:

    raise ValueError("Cannot call bool() on DataFrame.")
ValueError: Cannot call bool() on DataFrame.

I believe this is due to this change: https://github.com/pydata/pandas/pull/1073

The problem can be solved by looping, but on a large dataset that is slow. Is there a better way?

As I tried to convey in the comments, you can't vectorize the if branch the way you're trying to, and while your code wouldn't have raised an exception in the past, it wasn't doing what you wanted it to. That's why arrays (and DataFrames) now error out when bool() is called on them: to avoid exactly this bug.
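To make the failure concrete, here is a minimal sketch with a made-up spread frame (the column names and values are purely illustrative). Current pandas words the message differently from the one in the question, but it is still a ValueError:

```python
import pandas as pd

# Made-up spreads; multiples of 0.25 so the comparison is exact in floating point.
spread = pd.DataFrame({"AAPL": [0.25, 0.50], "MSFT": [0.75, 0.25]})
mask = spread > 0.25  # element-wise comparison: a DataFrame of booleans

try:
    if mask:  # an if statement needs ONE True/False, but mask holds four
        pass
    raised = False
except ValueError:
    raised = True

print(raised)
# Explicit reductions are fine, but they answer a different question
# ("is any/every spread wide?"), not "which branch for each cell?".
print(mask.any().any(), mask.all().all())
```

So an if statement can only ever pick one branch for the whole frame, which is not the per-cell choice the midpoint rule needs.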

One way around this would be an apply-elementwise function which builds a new DataFrame by applying a function on (effectively) the zipped corresponding elements. There may be one, although I haven't used it. (I'd support adding one. It's handy sometimes, and in our homegrown n-dimensional C# library we have an apply-to-matched-elements function.)

Usually when I needed to do this pre-pandas, I computed both branches and combined the results (taking advantage of the fact that False ~ 0 and True ~ 1):
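For what it's worth, something like that can be approximated with np.vectorize. This is only a sketch: apply_elementwise is a hypothetical helper name, not a pandas function, it assumes all frames share the same index and columns, and np.vectorize is essentially a Python-level loop, so don't expect it to be fast. The tick increment of 0.25 and the quotes are made up.

```python
import numpy as np
import pandas as pd

def midprice_scalar(bid_price, bid_quantity, ask_price, ask_quantity, tick_increment=0.01):
    """The scalar rule from the question, unchanged."""
    if (ask_price - bid_price) > tick_increment:
        return (ask_price + bid_price) / 2
    return ((bid_price * ask_quantity) + (ask_price * bid_quantity)) / (bid_quantity + ask_quantity)

def apply_elementwise(func, *frames, **kwargs):
    """Hypothetical helper: call func on matching cells of identically labelled frames."""
    arrays = [frame.to_numpy() for frame in frames]
    vectorized = np.vectorize(lambda *cells: func(*cells, **kwargs), otypes=[float])
    result = vectorized(*arrays)
    return pd.DataFrame(result, index=frames[0].index, columns=frames[0].columns)

# Made-up quotes; prices are multiples of 0.25 so the spread test is exact.
idx = pd.to_datetime(["2013-01-01 09:30:00", "2013-01-01 09:30:01"])
bid_price = pd.DataFrame({"AAPL": [100.00, 100.00]}, index=idx)
ask_price = pd.DataFrame({"AAPL": [100.25, 100.75]}, index=idx)
bid_quantity = pd.DataFrame({"AAPL": [300, 100]}, index=idx)
ask_quantity = pd.DataFrame({"AAPL": [100, 100]}, index=idx)

mid = apply_elementwise(
    midprice_scalar, bid_price, bid_quantity, ask_price, ask_quantity,
    tick_increment=0.25,
)
print(mid)
```

The first row has a spread equal to the tick, so it takes the quantity-weighted branch; the second row is wider, so it takes the simple average.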

>>> import numpy as np
>>> a = np.arange(10)
>>> a > 3
array([False, False, False, False,  True,  True,  True,  True,  True,  True], dtype=bool)
>>> w = a > 3
>>> (a**2) * w + (1000) * (1-w)
array([1000, 1000, 1000, 1000,   16,   25,   36,   49,   64,   81])

But in both numpy and pandas you can use where, so one version of your code would be:

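def get_midprice(bp, bq, ap, aq, ti):
    # simple average, used when the spread is wider than the tick increment
    above = (ap + bp) / 2
    # quantity-weighted midpoint, used otherwise
    not_above = ((bp * aq) + (ap * bq)) / (bq + aq)
    use_above = (ap - bp) > ti
    # where() keeps the caller's values where the condition is True
    # and takes the second argument's values elsewhere
    combined = above.where(use_above, not_above)
    return combined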
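Here is a quick check of the where approach on made-up quotes (instruments, prices, quantities and the 0.25 tick are all invented for the example). Keep in mind that DataFrame.where(cond, other) keeps the caller's values where cond is True and takes other elsewhere, so the simple average has to be the caller. numpy's np.where(cond, x, y) takes the condition first and returns a plain array:

```python
import numpy as np
import pandas as pd

def get_midprice(bp, bq, ap, aq, ti):
    above = (ap + bp) / 2                            # simple average
    not_above = ((bp * aq) + (ap * bq)) / (bq + aq)  # quantity-weighted
    use_above = (ap - bp) > ti
    # keep `above` where the spread is wide, else fall back to `not_above`
    return above.where(use_above, not_above)

idx = pd.to_datetime(["2013-01-01 09:30:00", "2013-01-01 09:30:01"])
bid_price = pd.DataFrame({"AAPL": [100.00, 100.00], "MSFT": [50.00, 50.00]}, index=idx)
ask_price = pd.DataFrame({"AAPL": [100.25, 100.75], "MSFT": [50.50, 50.25]}, index=idx)
bid_quantity = pd.DataFrame({"AAPL": [300, 100], "MSFT": [200, 100]}, index=idx)
ask_quantity = pd.DataFrame({"AAPL": [100, 100], "MSFT": [200, 300]}, index=idx)

mid = get_midprice(bid_price, bid_quantity, ask_price, ask_quantity, 0.25)
print(mid)

# The numpy spelling of the same selection (loses the index and columns).
above = (ask_price + bid_price) / 2
not_above = ((bid_price * ask_quantity) + (ask_price * bid_quantity)) / (bid_quantity + ask_quantity)
mid_np = np.where((ask_price - bid_price) > 0.25, above, not_above)
```

With real prices quoted in cents, the comparison against a tick of 0.01 is subject to floating-point rounding, so you may want a small tolerance there; I sidestepped that here by using multiples of 0.25.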

The downside of this approach is that you have to compute both branches, and it uses a bit more memory. In practice this seldom causes me problems, but YMMV. Note that one minor advantage of using multiplication (even though it's a little slower) instead of where is that it'll work when being passed scalars too.

Finally, you might consider changing your format to keep the information together, possibly using a hierarchical MultiIndex, but I don't have much experience there.
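A minimal sketch of the multiplication variant, to show the scalar case (get_midprice_mask is just a name I made up, and the numbers are invented):

```python
import pandas as pd

def get_midprice_mask(bp, bq, ap, aq, ti):
    above = (ap + bp) / 2
    not_above = ((bp * aq) + (ap * bq)) / (bq + aq)
    w = (ap - bp) > ti  # a plain bool for scalars, a boolean DataFrame for frames
    # True acts as 1 and False as 0, so exactly one term survives per element
    return above * w + not_above * (1 - w)

# Scalars work because bool arithmetic is ordinary Python arithmetic.
tight = get_midprice_mask(100.00, 300, 100.25, 100, 0.25)
wide = get_midprice_mask(100.00, 100, 100.75, 100, 0.25)

# DataFrames go through the same expression element-wise.
bid_price = pd.DataFrame({"AAPL": [100.00, 100.00]})
ask_price = pd.DataFrame({"AAPL": [100.25, 100.75]})
bid_quantity = pd.DataFrame({"AAPL": [300, 100]})
ask_quantity = pd.DataFrame({"AAPL": [100, 100]})
mid = get_midprice_mask(bid_price, bid_quantity, ask_price, ask_quantity, 0.25)
print(tight, wide)
print(mid)
```

One caveat with this form: if the branch you are not using evaluates to NaN or inf (say both quantities are zero), multiplying it by 0 still gives NaN, whereas where simply discards it.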
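I haven't used this in anger, so treat it as a rough sketch of what that layout could look like rather than a recommendation: pd.concat with a dict puts the dict keys on an outer column level, giving (field, instrument) column pairs. The data is made up.

```python
import pandas as pd

idx = pd.to_datetime(["2013-01-01 09:30:00", "2013-01-01 09:30:01"])
bid_price = pd.DataFrame({"AAPL": [100.00, 100.00], "MSFT": [50.00, 50.00]}, index=idx)
ask_price = pd.DataFrame({"AAPL": [100.25, 100.75], "MSFT": [50.50, 50.25]}, index=idx)
bid_quantity = pd.DataFrame({"AAPL": [300, 100], "MSFT": [200, 100]}, index=idx)
ask_quantity = pd.DataFrame({"AAPL": [100, 100], "MSFT": [200, 300]}, index=idx)

# One frame whose columns are a two-level MultiIndex: (field, instrument).
quotes = pd.concat(
    {
        "bid_price": bid_price,
        "bid_quantity": bid_quantity,
        "ask_price": ask_price,
        "ask_quantity": ask_quantity,
    },
    axis=1,
)

# Selecting on the outer level gives back a per-instrument frame for that field.
asks = quotes["ask_price"]

# A single value: row label first, then the (field, instrument) pair.
one_bid = quotes.loc[idx[0], ("bid_price", "AAPL")]

# All four fields for one instrument, via a cross-section on the inner level.
aapl = quotes.xs("AAPL", axis=1, level=1)
print(aapl)
```

The vectorized midpoint code doesn't need to change: you'd pass quotes["bid_price"], quotes["bid_quantity"] and so on in place of the four separate frames.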

