Flow and KNIME

Long term, I want to build an IDE for Flow that lets you build data pipelines graphically and runs the code incrementally in real time as you edit, re-running only the nodes downstream of any change you make. I also want Flow to support user-definable views for different data types, so that you can visualize data moving through the pipeline in different ways. I had a big plan for how this was all going to fit together, but it turns out somebody has already beaten me to it, at least for the "big picture" of how the IDE would work, in the form of KNIME, and they've done a beautiful job of it. (Note that Flow still has a much broader goal of solving the implicit parallelization problem in the general case, but KNIME at least implements a very Flow-like IDE for handling and visualizing big data pipelines.)

http://www.knime.org/
http://www.knime.org/features
http://www.knime.org/screenshots

KNIME lets you build a data analysis pipeline, complete with data normalization and filtering, inference/classification and visualization steps. It caches data at each node in the workflow (so changes to the pipeline only result in the minimum necessary recalculation), and keeps track of which experimental variables produced which results. It intelligently makes use of multiple cores on your machine wherever possible. It incorporates the entire Weka machine learning framework. It lets you add your own visualizers for different data types. It cross-links the highlighting of data points between different tables and views, so that if you select a data point in one view, it selects it in all other views. It reads and writes a large number of different data formats and can read from / write to a live database. You can call out to R at any point if you have existing R code you need to run on a piece of data.
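To make the per-node caching idea concrete, here is a minimal Python sketch of a pipeline that only recomputes what sits downstream of a change. This is not KNIME's API or its implementation, and it is not Flow code either: the `Node` class, its `set_func`, `invalidate` and `value` methods, and the toy load/normalize/filter/summary steps are all hypothetical names made up for illustration.

```python
class Node:
    """One pipeline step with a cached output (hypothetical, not KNIME's API)."""

    def __init__(self, name, func, inputs=()):
        self.name = name
        self.func = func              # computes this node's output from its inputs
        self.inputs = list(inputs)    # upstream nodes
        self.downstream = []          # nodes that consume this node's output
        self.cache = None
        self.dirty = True             # True means the cache is stale or empty
        self.runs = 0                 # how many times func has actually executed
        for parent in self.inputs:
            parent.downstream.append(self)

    def invalidate(self):
        """Mark this node and everything downstream of it as stale."""
        self.dirty = True
        for child in self.downstream:
            child.invalidate()

    def set_func(self, func):
        """Simulate editing a node: swap its function and invalidate downstream."""
        self.func = func
        self.invalidate()

    def value(self):
        """Return the cached output, recomputing only if this node is stale."""
        if self.dirty:
            args = [parent.value() for parent in self.inputs]
            self.cache = self.func(*args)
            self.dirty = False
            self.runs += 1
        return self.cache


# A toy four-step pipeline: load -> normalize -> filter -> summary.
load = Node("load", lambda: [3.0, 1.0, 2.0, 10.0])
normalize = Node("normalize", lambda xs: [x / max(xs) for x in xs], [load])
keep = Node("filter", lambda xs: [x for x in xs if x >= 0.25], [normalize])
summary = Node("summary", lambda xs: len(xs), [keep])

first = summary.value()   # first evaluation runs every node once

# Edit the filter step only: lower the threshold.
keep.set_func(lambda xs: [x for x in xs if x >= 0.15])
second = summary.value()  # re-runs filter and summary; load and normalize hit the cache

for node in (load, normalize, keep, summary):
    print(node.name, "ran", node.runs, "time(s)")
```

After the edit, only the filter node and the summary node run a second time; the load and normalize nodes are served from their caches. A real tool would also need to handle things this sketch ignores, such as persisting caches to disk and scheduling independent branches in parallel.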

In other words, KNIME does basically everything that anybody who works with data does every day, and keeps it all tied together in a nice workflow framework with built-in data visualization, smart caching, smart parallel job launching and so on.
