I have this thesis that goes a bit like this: phase 1 of the modern world wide web was about linking documents (hyperlinks), phase 2 was about enabling users to create content for the web (user-generated content/social media), phase 3 was about linking the creators of that content (people, i.e. social networks), and phase 4 is about data (making sense of the sum of phases 1-3).
So now that we have these tools to create, consume, and link together tons and tons of content, how do we filter it all? That is where I think we need to step back to a higher level, see what the problems are, and figure out how to build simplified services to solve them.
We have petabytes of data floating around the web, and many companies hold petabytes internally that have been collected throughout phases 1-3 but have no efficient means of digesting and consuming it. We also have more and more computing power idling away in servers all across the globe. So now what we need is an INTUITIVE way for people to feed in large batches of this data and have usable results spit back out.
What I’m suggesting is that whoever builds a Yahoo Pipes-style (but easier to use) interface for Hadoop-level data processing could make a killing. Imagine customers loading terabytes of data via their terminals (web browsers), designing the process and algorithms, and letting the system distribute the computations wherever capacity is available. This is how you solve complex problems. This is how genomes get cracked in hours, and at much lower cost.
Why did I suggest Hadoop as the possible basis for this? It's open source, it's free, and it already runs across a myriad of machines out there. No need to reinvent the wheel.
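To make concrete what such an interface would be abstracting away: Hadoop jobs boil down to a mapper and a reducer that the framework shuffles between. Here is a toy, single-machine sketch of that map/reduce model in Python (not Hadoop's actual Java API; the document list, mapper, and reducer are illustrative stand-ins):

```python
from collections import defaultdict

def mapper(doc):
    # Emit (word, 1) for every word in the document.
    for word in doc.split():
        yield word.lower(), 1

def reducer(word, counts):
    # Combine all partial counts for one key.
    return word, sum(counts)

def map_reduce(docs, mapper, reducer):
    # Shuffle phase: group intermediate values by key.
    groups = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            groups[key].append(value)
    # Reduce phase: fold each group down to one result per key.
    return dict(reducer(k, v) for k, v in groups.items())

docs = ["the web links documents", "the web links people"]
print(map_reduce(docs, mapper, reducer))
# → {'the': 2, 'web': 2, 'links': 2, 'documents': 1, 'people': 1}
```

A GUI of the kind proposed here would let a user wire up the mapper and reducer steps visually, while Hadoop handles the distribution of the shuffle and reduce across machines.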
So build an interface, charge by the computing cycle, and profit.
You might take a look at Cascading (http://www.cascading.org/); it sounds very similar to what you propose.
This does look interesting, but it isn't a visual interface; it seems to be a programmatic interface (API). A GUI on top of this would be exactly what I suggested.
You mean a little like spreadsheet analytics with Apache Hadoop?
http://datameer.com/
Presentation seen there: http://berlinbuzzwords.wikidot.com/local–files…
take a look at http://www.wappwolf.com
a B2B workflow engine connecting apps into a workflow, combined with Web 2.0 social elements like voting, ranking, tagging, commenting, etc.
global launch in Sept '10 @demo.com
Harald, how is this related to Hadoop?
Yes, but with an easier-to-use interface. That site isn't loading for me (datameer.com).
Very interesting slide deck, btw.
We are using another open source workflow engine to get http://www.wappwolf.com up and running! It is really good looking… (Open Source Workflow Engines in Java: Apache ODE)