Pipeline

Pipeline #

We will see in this section how functional languages provide a convenient syntax to compose functions that modify lists.

This lets us apply a sequence of operations (filter, transform, aggregate, etc.) to a stream of incoming data, analogously to Unix-like pipes. This style of programming also pairs well with functors and monads, which we will see in later sections.

Thanks to lazy evaluation, each operation in the pipeline can start feeding partial results to the next one, even if it has not finished processing its input list.

Adoption. A variety of imperative languages (notably C++, C#, Java or Rust) have borrowed or adapted this pattern. For instance, pipelines have become an important part of Java programming, since the release of Java Stream API (2014).

This pattern has also been adopted by some scripting languages for statistical analysis, such as R (natively) or Matlab (via libraries).

In Haskell #

Operations on a list #

We have encountered several higher-order functions commonly used to specify operations on lists: map, filter, takeWhile, all, any, foldr, foldl, find, etc.

Example. Recall the product type that we used to model a playing card:

data Card = CardC { suit :: Suit, rank :: Int }

Then the function

filter (\c -> suit c == Spades)

has type

[Card] -> [Card]

It retains all cards with suit Spades.

Composition #

Example (continued). The following function sums the ranks of all spades in a list of card.

data Card = CardC { suit :: Suit, rank :: Int }

myPipeline :: [Card] -> Int
myPipeline = sum . map rank . filter (\c -> suit c == Spades)

myPipeline is the composition of three functions:

  • the function filter (\c -> suit c == Spades) seen above,
  • the function map rank, which maps each retained card to its rank,
  • the function sum, which aggregates these ranks.

Evaluation #

Thanks to lazy evaluation, each function in a pipeline may process its input on the fly, without waiting for termination of the previous function in the pipeline.

Example (continued). In the example above, let us use:

  • f for the first function in the pipeline (i.e filter (\c -> suit c == Spades)),
  • g for the second function in the pipeline (i.e map rank),
  • h for the third function in the pipeline (i.e sum).

so that myPipeline can be defined as

myPipeline = h . g . f

For a list of cards cs, a naive evaluation strategy would:

  • first evaluate f cs,
  • then apply g to the resulting list,
  • the apply h to the resulting list.

Instead, thanks to lazy evaluation, the function g can process cards on the fly, as soon as they are output by f (and similarly for h and g).

Simplified syntax #

Forward composition #

Motivation. Consider a composition $c$ of functions.
For instance

$\qquad c = h \circ g \circ f$

Computationally, this reads from right to left :

$\qquad$ “apply $f$, then $g$, then $h$.”

Instead, when we write code, it can be convenient to use an expression that mimics the order of execution.

For instance, we could define a reverse composition symbol $\gtrdot$, and write:

$\qquad c = f \gtrdot g \gtrdot h$

Syntax. The Haskell module Control.Category provides a reverse composition function >>>.
It can be defined as follows:

(>>>) :: (a -> b) -> (b -> c) -> a -> c
f >>> g = g . f

This lets us write

$\qquad$ f >>> g >>> h

instead of

$\qquad$ h . g . f

Example(continued). The pipeline above can alternatively be defined as

import Control.Category

myPipeline :: [Card] -> Int
myPipeline =
  filter (\c -> suit c == Spades)
    >>> map rank
    >>> sum

Let

maybeHead :: [a] -> Maybe a

map a nonempty list to its head (and the empty list to Nothing).

Simplify the following function:

myPipe :: [Int] -> Maybe Int
myPipe =
  filter even
    >>> map (+ 1)
    >>> filter even
    >>> maybeHead
myPipe :: [Int] -> Maybe Int
myPipe _ = Nothing

Forward application #

A similar syntax can be used to chain function applications.

Motivation.

$\qquad h(g(f(x)))$

means

$\qquad$ “start from $x$, then apply $f$ to it, then apply $g$ to the result, and then apply $h$ to the result”.

Instead, it can be more convenient to first write $x$, then $f$, then $g$, then $h$.

Syntax. The module Data.Function provides a reverse application function &.
It can be defined as:

(&) :: a -> (a -> b) -> b
x & f = f x

The function & associates to the left, so that x & f & g stands for (x & f) & g.

This let us write

$\qquad$ x & f & g & h

instead of

$\qquad$ h (g (f x))

Example(continued). The pipeline above can alternatively be defined as

import Data.Function

myPipeline :: [Card] -> Int
myPipeline cs =
  cs
    & filter (\c -> suit c == Spades)
    & map rank
    & sum

Let

maybeHead :: [a] -> Maybe a
maybeLast :: [a] -> Maybe a

map a nonempty list to its head and last element respectively (and the empty list to Nothing).

Evaluate the following expressions.

  1. [2,1,4,7] & filter even
              & map (+ 1)
              & maybeLast
    
  2.  [-5,4,2,-1] & filter (>= 0)
                 & map (\x -> -x)
                 & filter (> -2)
                 & all (not . even)
    
  3.  [
       ("Bolzano", "BZ", 39100),
       ("Rovereto", "TN", 38068),
       ("Merano", "BZ", 39012),
       ("Bressanone", "BZ", 39042)
     ]
       & filter (\(_, prov, zip) -> prov == "BZ" && zip /= 39100)
       & map (\(name, _, _) -> name)
       & maybeHead
    
  1. Just 5
  2. True
  3. Just "Merano"