Saturday, April 20, 2013

Kaggle answers the question of "can and will market participants use transparency"

One of economists' and Wall Street lobbyists' favorite objections to requiring the banks to provide ultra transparency is that market participants won't know what to do with this much data and it will confuse them.

After all, your humble blogger is talking about having each bank disclose on an ongoing basis its current global asset, liability and off-balance sheet exposure details.

Since these details are tracked by information systems, what we are talking about with this disclosure is what I call the "Mother of All Financial Databases".

Is this database large?  Yes, but not so large that we are talking about orders of magnitude bigger than Walmart's database.

Will market participants use this database?  Of course, there is money to be made.

Can market participants use this database?

Rather than rely on your humble blogger's "yes" response, I will let Kaggle answer "yes" for me.

For those readers who are not familiar with Kaggle, an article in The Atlantic describes it as:
an on-line platform for data-mining and predictive-modeling competitions. A company arranges with Kaggle to post a dump of data with a proposed problem, and the site's community of computer scientists and mathematicians -- known these days as data scientists -- take on the task, posting proposed solutions. 
Importantly, competitors don't just get one crack at the problem; they can revise their submissions until a deadline, driving themselves and the community towards better solutions. 
"The level of accuracy increases, and they all tend to converge on the same solution," explains Anthony Goldbloom, Kaggle's co-founder and CEO.
Is there any reason to believe that market participants will use Kaggle to analyze the Mother of All Financial Databases?
Companies as varied as MasterCard, Pfizer, Allstate, and Facebook (not to mention NASA) have all created competitions. 
GE sponsored a contest to give airline pilots tools to make more efficient flight plans en route. 
Health technology company Practice Fusion funded another challenge to identify patients with Type 2 diabetes based on de-identified medical records. 
Prizes for the winning solution have ranged from $3,000 to $250,000. A $3 million prize, offered by the Heritage Provider Network for the best prediction of which patients will be admitted to a hospital within the next year, based on historical claims data, closed last week, and the winner will be announced in June at the Health Datapalooza.
Do you think a hedge fund would be willing to pay, say, $10 million to the winner of a Kaggle competition if the hedge fund makes $1 billion?

But how do we know that Kaggle is up to the challenge of assessing the Mother of All Financial Databases?
The key to Kaggle is the community: 85,000 data scientists (who knew there were that many data scientists in the world!) have entered competitions, and each is ranked according to their skill and results in competitions. 
Xavier Conort, a French actuary living in Singapore, holds the Number One spot (he's won 6 prizes and come in the top 10 percent a dozen times). 
As I'm writing this, Joshua Moskowitz, an American who joined 9 minutes ago, is at the other end of the pecking order. Just wait till Joshua starts competing, though; he could be challenging Xavier in a matter of months. 
That everyone-has-a-chance ethos means that any competitor, no matter how isolated they may be, can judge their talents against the top rank of their field. 
What's more, in the company's forums competitors can swap techniques and hone their skills. Goldbloom says that a good programmer can work their way up the ladder fairly quickly, by scoring well in two or three competitions.
In short, nothing stops a data scientist from entering a Kaggle competition, trying to win it, and collecting the prize money from the hedge fund.
