
Why publish this if it works? One article, a fleeting bit of fame, and a truckload of copycats. It would be more impressive if the article had read: "Researchers, upon discovering that tweets predict the stock market, make $100mm before disclosing the research to the public." No more need for university grants, and much more believable findings. As an aside, data analysis can be tricky. I'm pretty wary of loosely defined research objectives. For example, why three days? Why those 72 words? Over-fitting is a real problem in prediction-based work.


I've always been fascinated by the romantic idea of writing my own trading engine.

So I did some research, and most people who have written one will tell you that in cases like this, a model trained on past data doesn't carry over well to current and future data.

The stock market of 2011 is not the market of 2008.

But what do I know, not like I've actually done it :)


A good place to start: http://www.collective2.com/

You can rent your trading strategies to others, or rent someone else's strategy.

Also some good info/tools regarding automation.


I haven't looked closely at this site, but it seems like it is almost certain to devolve into a textbook example of adverse selection.

If you have a good strategy, you won't rent it out, you'll trade it yourself. Why risk others frontrunning you? Of course, you might post a historically good strategy and front-run it. Or the strategies might just be risky ones that look good for a short time (encouraging people to rent them) but carry catastrophic risks the creator doesn't want to take on.

I can't see a single reason why someone would post a good strategy here.


(Preface: I pretty much know nothing about stock markets and trading.)

Isn't there a whole industry that revolves around paying people to provide you with their trading strategies? How is that considerably different than what's happening on Collective2?

Having a good strategy doesn't mean you have the money to actually trade on it, or that you can slowly build up a trading bankroll using that same strategy.

Then there are strategies that only yield modest returns. Why not make some money on top of that by renting them out? If you let a dozen people use your strategy, does acting on that information give you much of an advantage? I would guess that depends on how much money those people are trading on your strategy.

I would also guess there aren't many big players renting strategies on Collective2. It's an interesting concept, and I think the fact that they've been active since 2003 somewhat validates the idea.

The best thing about it seems to be the ease of using an automated trading agent. I don't know how easy it is to do that elsewhere, but one reason to put your strategy on Collective2 (I'm guessing you can keep it private) would be to use their automation facilities.


There are several whole industries revolving around paying people for help with trading strategies. Most have very specialized economics.

A hedge fund requires capital to operate, and the owners can't necessarily cover fixed costs (salaries, etc.) with their own personal capital. I'd be surprised if many of the strategies on collective2 fit this model.

Investment advisers often fine-tune a strategy to match your particular risks - e.g., helping Southwest Airlines hedge its exposure to fuel prices, or Apple its exposure to the RMB. Since Southwest is already effectively short oil by virtue of being an airline, a strategy of going long oil evens it out. It wouldn't make sense for me to trade this strategy, since I don't have an intrinsic short position in oil (plus the alpha in Southwest's strategy comes from selling flights, not oil).

If you let a dozen people use your strategy, does acting on that information give you much of an advantage?

Buy $10k of some low-volume stock. Have a few other people pile on and buy the same stock (after you). The price will go up a few cents. Then you sell, probably to the same people buying from you. This is called frontrunning. If you didn't frontrun, you'd bear the risk that one of your renters buys the shares before you do, driving up the price before you purchase. Less of an issue with GOOG, admittedly.


While I agree with your points, I think it's possible there are trading strategies that require a certain amount of upfront capital (say $1 million). So you might know it's a winning strategy after a couple of renters use it successfully, but need to rent it out until you have enough money to trade it yourself.


That's the premise of a hedge fund. A strategy generating 9% alpha, but requiring $1 million a year in fixed costs to run (office space, salaries, market data), might need $30-100 million in operating capital.

It's hard to see how those economics would apply to collective2.com.
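As a back-of-the-envelope check on where a range like $30-100 million could come from (my assumptions, not the parent's): fixed costs have to be covered by fees on assets under management, so assuming a conventional 2-and-20 fee structure:

```python
# Breakeven AUM under assumed fees; the fee numbers are my assumptions.
fixed_costs = 1_000_000          # office, salaries, market data ($/yr)
mgmt_fee    = 0.02               # conventional 2% management fee
perf_fee    = 0.20 * 0.09        # 20% performance fee on 9% alpha

breakeven_mgmt_only = fixed_costs / mgmt_fee
breakeven_with_perf = fixed_costs / (mgmt_fee + perf_fee)

print(f"breakeven AUM, management fee only: ${breakeven_mgmt_only:,.0f}")
print(f"breakeven AUM, with performance fee: ${breakeven_with_perf:,.0f}")
```

That puts breakeven somewhere around $26-50 million depending on fee structure, which is roughly consistent with the range above.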



Correction--it worked. The authors chose a two-year-old sample during which the Dow Jones fell 30.7%. I seriously doubt this will have any predictive power outside of that sample.


Fitting a model to "predict" events that have already happened isn't anywhere near as hard as actually predicting events that have yet to happen.


I'm not so sure about that. It's pretty hard to predict the past, too.

Anyone doing serious research into this will first partition the past data into training and test sets (and sometimes a separate validation set).

So the idea is to fit a model on one set of past data (the 'training' set), check that it works, and then, for the final evaluation, run it on the never-before-seen, never-used 'test' data.

If you have a model trained on 2009, and it also does a great job the first time you run it on the Q1 2010 data that you've never looked at before, I'm now interested, even though every data point is in the past.

I imagine they had to do something like this to pass review.


If you have a model trained on 2009, and it also does a great job the first time you run it on the Q1 2010 data that you've never looked at before, I'm now interested, even though every data point is in the past.

So, you have a model trained on 2009. You try it on the Q1 2010 data and...it doesn't work. Damn. So you throw it out, go back to the drawing board, and try again. And again. And ag...hey, this one works! Trained on 2009 data, and it predicts Q1 2010 perfectly!

Do you trust this model to predict Q2 2010?
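That selection effect is easy to simulate (a toy sketch with made-up data, not anyone's actual workflow): keep rerolling random "models" until one nails the held-out quarter, then see how the survivor does on the next one.

```python
# Keep trying random "models" until one looks great on Q1, then check Q2.
import random

random.seed(1)
q1 = [random.choice([+1, -1]) for _ in range(60)]  # Q1 2010 directions
q2 = [random.choice([+1, -1]) for _ in range(60)]  # Q2 2010 directions

def accuracy(pred, actual):
    return sum(p == a for p, a in zip(pred, actual)) / len(actual)

# "Go back to the drawing board and try again" until Q1 looks great...
for attempts in range(1, 10_001):
    model = [random.choice([+1, -1]) for _ in range(60)]  # canned predictions
    if accuracy(model, q1) >= 0.65:
        break

# ...but the surviving model has no edge on the untouched Q2 data.
print(f"attempt {attempts}: Q1 accuracy {accuracy(model, q1):.0%}, "
      f"Q2 accuracy {accuracy(model, q2):.0%}")
```

Each random model has only a percent-or-so chance of hitting 65% on Q1 by luck, but given enough retries one always does, and its Q2 accuracy reverts toward coin-flip territory.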


Obviously if there has been a 'meta' process of refinement, such that the test set has been used in model development, then it's not a clean test set any more and shouldn't be regarded as such. That's a pitfall researchers have to watch out for and be sure to avoid, and good researchers are well aware of it.

That's why I mentioned the validation sets, and that the test set must never have been looked at or used before.

But the point stands - if the method works on a clean test set, even if the test set is in the past, then it should be taken seriously.

Would I trust such a model to predict the stock market in Q2 2010? No, because my prior belief is that the stock market is very hard to predict, so I would need very strong evidence to the contrary. But that has nothing to do with having confidence in models that have been tested on historical data.


Right, and they do this for a period from February 2008 to December 2008. We do the same here at the Federal Reserve when developing models.

Sure, the data is historical, but your model doesn't distinguish between "old" and "new" data. If your model predicts the test data well (out-of-sample forecasting, right?), then you have something interesting.


Right, that was my point. It worked for a specially selected sample during which a bunch of correlated macro factors were in play. I commented a bit lower with more details.



