Channel: The Flow Programming Language

Flow syntax and semantics

I'm cross-posting the following from the "flowlang" Google Group, with minor modifications.

--

It might be helpful to think about the creation of a syntax for Flow that maps onto the semantics described in the Flow Manifesto as follows:
  • Take a very simple purely functional programming language syntax -- maybe a subset of Haskell, or a functional subset of Python.
  • Add one operator, a scatter/push operator, say "->", that gives the language more of an imperative feel.
  • [You can also add timestamps, e.g. x(t) = x(t-1) * 2. These also give the language more of an imperative feel, but for brevity I won't talk about them here. They're basically just a convenient way of aliasing immutable variables that has semantic value to both the user and the compiler.]
Then assignment, "=", is a pull / gather operation: y = f(x) pulls a value from x, applies f, and then pulls the computed value into y, as expected.

The "->" operator pushes / scatters values to locations, which is typical of the imperative style of programming. For example, take the following Java code:

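int[] ys = new int[] {1, 5, 2, 9, 1, 1, 4, 2};
// Find the largest value so the histogram gets one bin per possible value
// (starting from 0 assumes the values are non-negative).
int maxVal = 0;
for (int i = 0; i < ys.length; i++)
    maxVal = Math.max(maxVal, ys[i]);
// Scatter: increment the bin indexed by each value.
int[] hist = new int[maxVal + 1];
for (int y : ys)
    hist[y]++;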


You would do something like this in Flow (making up the syntax):

ys = {1, 5, 2, 9, 1, 1, 4, 2}
for y in ys
    1 -> ones[y]      // each entry ones[y] is an unordered collection of integers (a bag: duplicates are kept, every element is 1)
hist[i] = sum ones[i] // implicit iteration: "for all i", i.e. hist = map sum ones
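
As a rough sketch of what the Flow snippet above might mean, here is a plain-Java version that keeps the two phases separate: a push phase that scatters 1s into per-key collections, and a pull phase that sums each collection. The class and method names (ScatterHistogram, scatterOnes, sumBins) are made up for illustration, and a sparse map stands in for the indexed collection "ones", so empty bins are simply absent. This is only one possible reading of the semantics, not how a Flow compiler would have to implement it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ScatterHistogram {
    // Push phase, the analogue of "1 -> ones[y]": each key y collects a bag of 1s.
    // An ArrayList is used as the bag; its element order is never relied upon.
    static Map<Integer, List<Integer>> scatterOnes(int[] ys) {
        Map<Integer, List<Integer>> ones = new TreeMap<>();
        for (int y : ys) {
            ones.computeIfAbsent(y, k -> new ArrayList<>()).add(1);
        }
        return ones;
    }

    // Pull phase, the analogue of "hist[i] = sum ones[i]" for all i.
    static Map<Integer, Integer> sumBins(Map<Integer, List<Integer>> ones) {
        Map<Integer, Integer> hist = new TreeMap<>();
        for (Map.Entry<Integer, List<Integer>> e : ones.entrySet()) {
            int sum = 0;
            for (int one : e.getValue()) {
                sum += one;
            }
            hist.put(e.getKey(), sum);
        }
        return hist;
    }

    public static void main(String[] args) {
        int[] ys = {1, 5, 2, 9, 1, 1, 4, 2};
        System.out.println(sumBins(scatterOnes(ys))); // prints {1=3, 2=2, 4=1, 5=1, 9=1}
    }
}
```

Note that the bag really does have to keep duplicates: if ones[y] were a mathematical set, the three 1s pushed to ones[1] would collapse into a single element and the sum would be wrong.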


This histogram example is one I keep coming back to in my own mind because it's a common and minimal test case for the "push" model of programming. Basically you're scattering counts into a histogram at indices corresponding to list values, which is not in general thread-safe and is therefore not parallelizable in imperative languages without extra work.

In the Java case I just incremented the histogram values directly; in the Flow case I output a stream of 1s, which were then collected by mapping "sum" over the collection ones. The key thing to realize here is that ones[i] is constrained (by being the target of the "->" operator) to be an unordered collection, because otherwise you would get a race condition if you were pushing different values that had to stay in order relative to the input. It doesn't matter here whether it is ordered or not, of course, because everything that is getting pushed is a 1 -- but in more complicated cases it can matter. (If you force the target of a "->" operator to be ordered, then the compiler can still parallelize, but it will have to do some extra work to keep things in order.)

For smaller lists, i.e. ys.length < L for some threshold L, the compiler can just run this as a single-threaded program.  For larger lists, the compiler is free to parallelize this in a lot of different ways, for example:
  1. It can put a lock on each bin, ones[i], to prevent race conditions.
  2. By realizing that the "sum" operator is just folded addition, and by realizing that addition is associative and commutative, the compiler can build one copy of the "ones" array of collections in each thread's TLS (thread local storage), and then combine these separate copies at the end.
(The second version has higher memory requirements but is lock-free until the end. It incurs a bit more time at the combine stage. This can all be calculated as a Big-Oh complexity function of the input size, input value distribution, etc., and the compiler can switch between implementations as needed.)
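
To make the two strategies above a little more concrete, here is a minimal hand-written Java sketch of each, with the "sum" already folded into integer counts. The names (ParallelHistogram, lockedBins, perThreadCopies, numBins) are made up for illustration. The second method relies on the three-argument IntStream.collect, which creates a private partial histogram per chunk of work rather than literally one per thread, so treat it as an approximation of the thread-local-storage idea rather than an exact rendering.

```java
import java.util.Arrays;

class ParallelHistogram {
    // Strategy 1: one shared histogram, with a separate lock guarding each bin.
    static int[] lockedBins(int[] ys, int numBins) {
        int[] hist = new int[numBins];
        Object[] locks = new Object[numBins];
        for (int i = 0; i < numBins; i++) {
            locks[i] = new Object();
        }
        Arrays.stream(ys).parallel().forEach(y -> {
            synchronized (locks[y]) { // only writers to the same bin contend
                hist[y]++;
            }
        });
        return hist;
    }

    // Strategy 2: private partial histograms, combined at the end.
    // Correct because addition is associative and commutative.
    static int[] perThreadCopies(int[] ys, int numBins) {
        return Arrays.stream(ys).parallel().collect(
            () -> new int[numBins],            // fresh private histogram
            (hist, y) -> hist[y]++,            // lock-free local increment
            (a, b) -> {                        // combine stage: add b into a
                for (int i = 0; i < numBins; i++) {
                    a[i] += b[i];
                }
            });
    }

    public static void main(String[] args) {
        int[] ys = {1, 5, 2, 9, 1, 1, 4, 2};
        System.out.println(Arrays.toString(lockedBins(ys, 10)));      // [0, 3, 2, 0, 1, 1, 0, 0, 0, 1]
        System.out.println(Arrays.toString(perThreadCopies(ys, 10))); // [0, 3, 2, 0, 1, 1, 0, 0, 0, 1]
    }
}
```

The trade-off is visible in the code: the first version allocates one histogram but takes a lock on every increment, while the second allocates several histograms and pays for an extra pass over the bins each time two partial results are merged.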

There are other optimizations that can be performed, e.g. the compiler can perform something similar to "hoisting" by realizing that you're just pumping the ones through an unordered collection and then into a sum function, and that this can be converted to an increment operation.
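
As an illustration of what the result of that optimization could look like, the sketch below never materializes the collection of 1s; each push is folded straight into a running total for its key. The class name FusedHistogram is made up, and Map.merge is just one convenient way to express "increment the count for this key" in Java, not something the Flow design prescribes.

```java
import java.util.Map;
import java.util.TreeMap;

class FusedHistogram {
    // Fused form: pushing a 1 to key y and later summing the pushed values
    // is replaced by adding 1 to a running count for y.
    static Map<Integer, Integer> histogram(int[] ys) {
        Map<Integer, Integer> hist = new TreeMap<>();
        for (int y : ys) {
            hist.merge(y, 1, Integer::sum);
        }
        return hist;
    }

    public static void main(String[] args) {
        int[] ys = {1, 5, 2, 9, 1, 1, 4, 2};
        System.out.println(histogram(ys)); // prints {1=3, 2=2, 4=1, 5=1, 9=1}
    }
}
```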

UPDATE:

The "->" operator should be able to support arbitrary MapReduce-style mapping, scattering, shuffling, grouping by key and reducing.  For example, suppose each y in ys is actually the id of a Person record of some form, and you want to group together everybody who has the same first name. You should be able to do something like the following:

for y in ys
    p = persons[y]
    p -> firstNameGroups[p.firstName]

It might be more helpful in general to extend "->" to take (key,value) pairs, and group by keys:

for y in ys
    p = persons[y]
    (p.firstName, p) -> firstNameGroups

...or, for the histogram example:

for y in ys
    (y, 1) -> ones
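
For comparison, here is a small Java sketch of the (key, value) form of the push. Everything in it is hypothetical: the Person class, the push helper and the sample data are invented for illustration, and a sorted map of lists stands in for whatever grouped collection Flow would actually use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class KeyedPush {
    // Hypothetical stand-in for the Person record mentioned in the text.
    static final class Person {
        final String firstName;
        final String lastName;

        Person(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    // Rough analogue of "(key, value) -> target": add value to the group stored under key.
    static <K, V> void push(Map<K, List<V>> target, K key, V value) {
        target.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // Analogue of the loop: p = persons[y]; (p.firstName, p) -> firstNameGroups
    static Map<String, List<Person>> groupByFirstName(List<Person> persons, int[] ys) {
        Map<String, List<Person>> firstNameGroups = new TreeMap<>();
        for (int y : ys) {
            Person p = persons.get(y);
            push(firstNameGroups, p.firstName, p);
        }
        return firstNameGroups;
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
            new Person("Ann", "Lee"),
            new Person("Bob", "Ray"),
            new Person("Ann", "Cho"));
        Map<String, List<Person>> groups = groupByFirstName(persons, new int[] {0, 1, 2});
        for (Map.Entry<String, List<Person>> e : groups.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue().size()); // Ann: 2, then Bob: 1
        }

        // The histogram example uses the same helper with (y, 1) pairs.
        Map<Integer, List<Integer>> ones = new TreeMap<>();
        for (int y : new int[] {1, 5, 2, 9, 1, 1, 4, 2}) {
            push(ones, y, 1);
        }
        System.out.println(ones.get(1).size()); // prints 3
    }
}
```

Both uses go through the same push helper, which is the point of the (key, value) form: grouping people and counting values differ only in what is pushed and how the groups are reduced afterwards.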

