Ridiculous MySQL Behavior

I’ve been messing with this for a few hours until I finally found this nugget on MySQL:



A view definition is “frozen” by certain statements:

If a statement prepared by PREPARE refers to a view, the view definition seen each time the statement is executed later will be the definition of the view at the time it was prepared. This is true even if the view definition is changed after the statement is prepared and before it is executed. Example:

CREATE VIEW v AS SELECT RAND();
PREPARE s FROM 'SELECT * FROM v';
ALTER VIEW v AS SELECT NOW();
EXECUTE s;
The result returned by the EXECUTE statement is a random number, not the current date and time.


Why on earth would anyone do that? Allow me to elaborate…

Unhappy Bits about Bitcoins

I wandered across bitcoin not too long ago, during some random web crawling, and downloaded it in May. I installed it, ran it, realized I was behind a firewall, killed it, uninstalled it and forgot about it for a couple of weeks until this Wired article came out and sent the whole world a’twitter about bitcoin again.

The Wired article, in short, talks about an underground website that sells illicit drugs and whose sole allowable currency is the Bitcoin. The website itself is shrouded in anonymity in the Tor network, which is itself an excellent little piece of technology that I’m planning on running out of space to describe here just now, but you should look into it.

The Bitcoin spiked in popularity. You can buy and sell Bitcoins in open marketplaces such as Mt Gox (whatever that means) or Lillion Transfer if you’re using some more international currencies, or you can use them directly on sites that take them, such as this alpaca sock store. Prices quickly went from a few dollars to around $30, although they’ve now backed off a bit to around $20/BTC (Bitcoin).

Ok, so where are we? We can buy cocaine and alpaca socks with Bitcoins. Great. But what ARE they, again? How can you get some, and should you care?

Mosaics and More Algorithm Love

My mom (whose website I should update) recently celebrated her birthday. My mom is an avid shutterbug, and abuses the digital camera we got for her a handful of Christmases ago to the tune of 10,000 photos a year, give or take. Our basement is piled with boxes of images just begging to be scanned, cataloged, sorted, and all that from back when mom was a film user (you all remember film, right?). I daydream of software that will actually be useful to that endeavor — to say nothing of how fun it would be to digitize the piles of super-8mm movie film that goes along with it — but so far we haven’t made much of a dent.

In the meantime, though, I have a hard drive which contains about 150,000 photos from my mom’s library… basically a full off-site backup in case horror happens to her computer. (ObNote: This is good practice, boys and girls. You should all go about giving hard drives away, with complete backups of your stuff in case of a disaster… you’ll thank me when the revolution comes.) Anyway, I was looking around for a way to leverage this wealth of digital media for a birthday present and decided to go with a photomosaic.

Hey, Happy Technologist got Published!

I’m rather excited about the fact that, after only six actual posts on the new blog, one got picked up for publication. The slightly altered version is available at Technology First, and you can get the full newspaper in .pdf too.

Let’s hope this will be motivation to write more!

Musings from a few weeks of data mining

Ok, I’m still no expert data miner by a large margin, but I’ve learned a LOT in just a few weeks of playing with the Heritage Health Prize data. The folks on the Kaggle/HHP Chat Board are pretty helpful, and the internet is full of useful information. I’ve taken to using Excel and MySQL far more than any mining-specific tools. I have been interested in R and RapidMiner, and I’ve been able to set up a few basic models with those tools. One thing I’ve been very happy with is the wealth of online tutorials available for just about everything. My resident 16-year-old has been using them for a while to pick up piano and guitar songs, but I haven’t had much use for them until now; I’m pleased to report that the quality of these free online video and web tutorials is pretty high. I have a list started as a del.icio.us tag set if you want to see what I’ve been watching.

I’ve made 9 submissions (the first two or three of which I don’t count; let’s call those ‘test’ submissions). The 9th actually had a worse score than the 8th. Now that interests me. On my tests, which include several different sampling and “cross validation” methods on the two years of available data, my score on each submission improved from the last… not much in this last case, but enough for me to feel reasonable in submitting the algorithm. Why, then, did my result against the real data using the same algorithm go backwards? One possibility is that I’ve been overfitting the data. Basically, my algorithm makes assumptions that are either unnecessary or only applicable to the sample data, and they don’t hold true for the final data. At the tolerances we’re dealing with, it’s still possible that this is just a random selection bias issue, but it’s still interesting, and it’s a common and very important problem in statistical data mining: how can you know when you’ve overfit? When do you know that you’re “trying too hard,” as it were? 🙂
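To make the overfitting idea concrete, here is a minimal Python sketch. It has nothing to do with my actual HHP models: the data is synthetic (a straight line plus noise), and the names `fit_line`, `fit_memorizer`, and `compare` are made up for illustration. It trains a simple model and a "memorize everything" model on half the rows, then scores both on the half they never saw.

```python
import math
import random


def rmse(predict, rows):
    """Root-mean-square error of predict(x) against the known y values."""
    return math.sqrt(sum((predict(x) - y) ** 2 for x, y in rows) / len(rows))


def fit_line(rows):
    """Ordinary least squares for y = a*x + b (the 'simple' model)."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return lambda x: a * x + b


def fit_memorizer(rows):
    """1-nearest-neighbour: replays the y of the closest training x."""
    return lambda x: min(rows, key=lambda row: abs(row[0] - x))[1]


def compare(seed=0, n=400):
    """Return {model name: (train RMSE, holdout RMSE)} on synthetic data."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        # Assumed ground truth for the illustration: y = 2x plus unit noise.
        rows.append((x, 2.0 * x + rng.gauss(0, 1.0)))
    train, holdout = rows[: n // 2], rows[n // 2:]

    results = {}
    for name, fit in (("line", fit_line), ("memorizer", fit_memorizer)):
        model = fit(train)
        results[name] = (rmse(model, train), rmse(model, holdout))
    return results


if __name__ == "__main__":
    for name, (train_err, holdout_err) in compare().items():
        print(f"{name:10s} train={train_err:.3f} holdout={holdout_err:.3f}")
```

The memorizer looks perfect on the rows it trained on, since it just replays them, but it does worse than the plain line on the held-out rows because it has memorized the noise too. A training score that keeps improving while the held-out score stalls or slips is the usual warning sign.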