I have this thesis that goes a bit like this: phase 1 of the modern world wide web was about linking documents (hyperlinks), phase 2 was about enabling users to create content for the web (user generated content/social media), phase 3 was about linking the creators of that content (people – social nets), and phase 4 is about data (making sense of sum of phases 1-3).

So now that we have these tools to create and consume and link together tons and tons of content, how do we filter it all? That is where I think we need to focus on a higher level, and see what the problems are, and how to make simplified services to solve them.

We have petabytes of data floating around the web, many companies have petabytes internally of data that’s been collected throughout I-III but have no efficient means of digesting this and consuming it. We also have more and more efficient computing power idling away in servers all across the globe. So now what we need is an INTUITIVE way for people to consume large batches of this data and have it spit into usable results.

What I’m suggesting is that whomever builds a yahoo pipes (but easier to use) interface for hadoop level data processing could make a killing. Imagine customers load up via their terminals (web browsers) terabytes of data, design the process and algorithms, and the system distributes the computations wherever capacity is available. This is how you solve complex problems. This is how genomes get cracked in hours, and for much lower cost.

Why did I suggest hadoop as the possible basis for this? Its open source, its free, and its able to run across a myriad of machines already out there. No need to reinvent the wheel.

So build an interface, charge by the computing cycle, and profit.