Introduction
We have now completed the core of the application, but in order to use it, we still have to stich the different parts together. We will also take this opportunity to smooth out some rough edges in the current implementation. Although we are not going to go deep into advanced functional programming, we would like to sketch basic ideas that will allow for a more elegant solution. You shouldn't need prior knowledge of the underlying abstract concepts to follow the rest of the tutorial.
Computational Model
While it is quite clear what to do, there are many different ways to implement the processing steps. In our case we will try to follow the monadic model of computation that we can represent as follows:
read input data > error


v
read formulas > error


v
parse formulas > error


v
evaluate formulas > error


v
save output > error


v
success
What this means is that we define the sequence of computations in such a way that it will break automatically if an error occurs. In this way we don't need to write error handling code for each step, we will "collect" the errors and handle them in once place.
Side Note
If you are not familiar with
for
expressions in Scala, this is just some syntactic sugar that converts a series of nestedflatMap
andmap
calls to a sequence in afor
block. The following example should clarify this idea:val A: Option[Int] = Some(2) val B: Option[Int] = Some(3) val result = A.flatMap(a => B.map(b => a * b)) result == Some(6)
It looks much simpler if we rewrite it like this:
val result = for { a < A b < B } yield a * b
Implementation Idea
From the above example it is clear that we need a container type that can propagate possible errors occuring at any step in the computation sequence. Which standard Scala type is best suitable for the job? Option
is not suitable, because it can't hold additional information like an error message. Try
also doesn't quite fit, because we have built our parsing logic without throwing exceptions. We are left with Either
, so we can decide to specialize it like Either[String, T]
, in order to be able to output error messages to the user.
An initial sketch of the program then is going to look somewhat like this:
val result = for {
input < Data.loadInput(arguments.input)
formulas < Data.loadFormula(arguments.formula)
trees < parseFormulas(formulas)
output < process(input, trees)
msg < Data.writeOutput(arguments.output, output)
} yield msg
This compositional style has the added advantage that we can build up the solution stepbystep, without needing the full sequence of computations as shown above. Before we define the missing functions above, we have to make sure that everything we have developed so far is compatible with Either[String, T]
, as we show next.
House Cleaning
We can start updating the main function by considering the csv loading functions only:
def main(args: Array[String]): Unit = {
val arguments = parseCliArguments(args)
val result = for {
input < Data.loadInput(arguments.input)
formulas < Data.loadFormula(arguments.formula)
} yield formulas
}
It is still possible to compose them, because both return Try
, as we shall see next.
I/O Functions
When we started, we created functions that load csv files and convert the data to the corresponding row format of the input data / formula definition rows:
def loadInput(path: String): Try[List[InputDataRow]] = loadCsv[InputDataRow](path)
def loadFormula(path: String): Try[List[FormulaRow]] = loadCsv[FormulaRow](path)
Let us first observe that if the functions fail due to a wrong file path, permission problems, corrupted files, etc., then the only thing we need to do is output the corresponding error message. We don't need to handle the full error object. This means that we can freely convert Try
to Either[String, T]
as follows:
def loadInput(path: String): Either[String, List[InputDataRow]] = loadCsv[InputDataRow](path)
def loadFormula(path: String): Either[String, List[FormulaRow]] = loadCsv[FormulaRow](path)
Looks like we need to update the loadCsv
function as well. Luckily for us, Try
has a convenience method toEither
that we are going to use. Unfortunately, we note that it returns Either[Throwable, T]
. This means that we have to convert it to Either[String, T]
explicitly:
.toEither match {
case Left(error) => Left(error.getMessage)
case Right(msg) => Right(msg)
}
There could be a more elegant way to solve this, but we are willing to make a compromise for comprehisibility's sake. The full listing of loadCsv
looks like this:
def loadCsv[T](path: String)(implicit ct: ClassTag[T]): Either[String, List[T]] = {
val mapper = new CsvMapper with ScalaObjectMapper
mapper.registerModule(DefaultScalaModule)
loadFileContent(path).map{content =>
mapper
.readerFor(ct.runtimeClass)
.`with`(headerSchema)
.readValues[T](content)
.asInstanceOf[java.util.Iterator[T]]
.asScala
.toList
}
}.toEither match {
case Left(error) => Left(error.getMessage)
case Right(msg) => Right(msg)
}
In order to proceed, we will define a stub for the FormulaParserApp.process
function that for now will serve as a 'glue' between the input and output only:
def process(input: List[InputDataRow], trees: List[FormulaRow]): Either[String, List[String]] = Right(List())
With this change, we can update the main function as follows:
def main(args: Array[String]): Unit = {
val arguments = parseCliArguments(args)
val result = for {
input < Data.loadInput(arguments.input)
formulas < Data.loadFormula(arguments.formula)
} yield process(input, formulas)
result match {
case Left(errorMessage) => println(s"Error: $errorMessage")
case Right(msg) => println(s"Success: $msg")
}
}
We are still missing a write function for the output data, so let's add one:
def writeOutput(path: String, data: List[String]): Either[String, String] = Try {
val bw = new BufferedWriter(new FileWriter(new File(path)))
bw.write(data.mkString("\n"))
bw.close()
s"Wrote ${data.size} lines"
}.toEither match {
case Left(error) => Left(error.getMessage )
case Right(msg) => Right(msg)
}
Since this represents the final stage of the computation, we may as well update the main function:
val result = for {
input < Data.loadInput(arguments.input)
formulas < Data.loadFormula(arguments.formula)
output < process(input, formulas)
msg < Data.writeOutput(arguments.output, output)
} yield msg
Now that we have defined the beginning and the end of this simple computational pipeline, let us focus on tidying up some earlier code.
Parsing All Formulas
The critical step in this tutorial is obviously the parsing process. We will start by defining a stub for the parsing function based on a compatible signature:
def parse(formulas: List[FormulaRow]): Either[String, List[(Long, FormulaAST)]] = Right(List())
In order to feed the resulting formula ASTs to the processing function, it should accept this type as well:
def process(input: List[InputDataRow], trees: List[(Long, FormulaAST)]): Either[String, List[String]] = Right(List())
With this, we are now ready to declare the full computational pipeline:
val result = for {
input < Data.loadInput(arguments.input)
formulas < Data.loadFormula(arguments.formula)
trees < parse(formulas)
output < process(input, trees)
msg < Data.writeOutput(arguments.output, output)
} yield msg
Parse Step
What we need to do next is obviously iterate over the elements of List[FormulaRow]
and parse each formula:
val trees = formulas.map(row => FormulaParser.parse(row.formula))
In addition, we also need the formula ids for the evaluation step, so we can extend the above line to:
val trees = formulas.map(row => FormulaParser.parse(row.formula).map(ast => (row.id, ast)))
Let's stop for a moment and look back at the FormulaParser.parse
function, did we forget something?
def parse(formulaExpression: String): Either[String, FormulaAST] =
// Unfortunately ParseResult does not provide flatMap
FormulaLexer.tokenize(formulaExpression) match {
case FormulaLexer.Success(tokens, _) =>
formula(FormulaTokenReader(tokens.asInstanceOf[List[FormulaToken]])) match {
case Success(formulaAST, _) => Right(formulaAST)
case Failure(msg, _) => Left(msg)
case Error(msg, _) => Left(msg)
}
case FormulaLexer.Failure(msg, _) => Left(msg)
case FormulaLexer.Error(msg, _) => Left(msg)
}
This doesn't look like the most elegant code ever written. The reason is that the Scala parser combinator library returns a ParseResult[T]
object that doesn't have a flatMap
function. We did not consider composition at the time, but now we could improve on this.
First, take a look at tokenize
function of FormulaLexer
:
def tokenize(formula: String): ParseResult[List[FormulaToken]] = parse(tokens, formula)
We have to convert it to def tokenize(formula: String): Either[String, List[FormulaToken]]
, so let's match the different types that ParseResult
can return:
def tokenize(formula: String): Either[String, List[FormulaToken]] = parse(tokens, formula) match {
case Success(tokens, _ ) => Right(tokens)
case Failure(msg, _) => Left(msg)
case Error(msg, _) => Left(msg)
}
With this we can also update the FormulaParser.parse
function that now looks like this:
def parse(formulaExpression: String): Either[String, FormulaAST] =
FormulaLexer
.tokenize(formulaExpression)
.flatMap(tokens => formula(FormulaTokenReader(tokens)) match {
case Success(ast, _) => Right(ast)
case Failure(msg, _) => Left(msg)
case Error(msg, _) => Left(msg)
})
Now that we can produce a list of parsed formulas in the form List[Either[(Long, FormulaAST)]]
, we need to make a design decision. How do we handle formulas that fail to parse? If a formula is invalid (syntax error), it will produce Left[String]
, but we can still retain a mix of errors and correctly parsed formulas even when multiple formulas in the configuration are invalid. Do we ignore the failed formulas and how do we notify the user?
Since the definition of the formulas is not the job of our application, we can now decide not to process the data if we encounter faulty formulas. This means that the application will have to terminate with an error message about the syntax error when the first such case is encountered. If all formulas are correct, then we proceed to compute the results.
Up until now we have automatically handled the error case by supplying Either[String, T]
as the return type of the individual computational steps. But the parsing of the formulas occurs inside the FormulaParserApp.parse
function, where we have defined the perfectly valid list of results val trees: List[Either[String]]'. We need to do some extra work to look for errors and pass the first one as the result of the
FormulaParserApp.parse` function.
Scala collections provide a very convenient function to get the first element of the type we are looking for in a List
:
val errorOption: Option[String] = trees collectFirst { case Left(error) => error }
We can also collect all correctly parsed formulas as follows:
val correctASTs = trees.collect { case Right(r) => r}
The only thing that remains to do is to convert errorOption
into an Either[String, List[]]
that the FormulaParserApp.parse
function can return:
val result: Either[String, List[(Long, FormulaAST)]] = errorOption.toLeft(correctASTs)
If errorOption
is filled, toLeft
will return Left(errorMessage)
, on the other hand it will return toRight(List(...))
if errorOption
is None
.
We can now finalize the parse
function by sorting the list of formula syntax trees as follows:
def parse(formulas: List[FormulaRow]): Either[String, List[(Long, FormulaAST)]] = {
val trees = formulas.map(row => FormulaParser.parse(row.formula).map(ast => (row.id, ast)))
trees
.collectFirst { case Left(error) => error }
.toLeft(trees.collect {case Right(r) => r}.sortBy(_._1))
}
Now if there is a bad formula definition anywhere in the formula configuration file, the program is going to output an error message and halt.
As a final touch, we might want to clarify the error messages a bit in the FormulaLexer.tokenizer
case Failure(msg, input) => Left(s"Lexer Error: '$msg'. Input: $input")
case Error(msg, input) => Left(s"Lexer Error: '$msg'. Input $input")
and FormulaParser.parse
functions:
case Failure(msg, input) => Left(s"Parser Error: '$msg'. Input: $input")
case Error(msg, input) => Left(s"Parser Error: '$msg'. Input: $input")
Formula Evaluation
We have come a long away, but we still need to complete the final computational step: evaluating the parsed formulas for every row of input data. We have already stubbed the FormulaParserApp.process
function, now let's think about the design.
First, we have to convert the InputDataRow
object to a compatible Map[Int, Double]
that the FormulaEvaluator.evaluate
function accepts. The most obvious way to do this is to define a member function InputDataRow.toMap
of the InputDataRow
class. If we take a look at the data, we will notice that not all fields have numerical values:
vin  readout_date  brand  model  mileage  coolant_temperature  oil_pressure  ignition_cycles  tyre_pressure 

1GKS1AECXFR441593  24.04.2018  Volkswagen  Eurovan  173888  29.91  45.04  813  3.78 
SCBBR93W678635158  17.07.2010  Suzuki  X90  29902  92.99  12.18  665  2.35 
WAUSG78E26A094153  04.01.2001  Hyundai  Equus  150406  107.78  22.49  2550  1.4 
Since our numbering scheme for the fields starts from the beginning:
field  number 

vin  1 
readout_date  2 
brand  3 
model  4 
mileage  5 
coolant_temperature  6 
oil_pressure  7 
ignition_cycles  8 
tyre_pressure  9 
we can see that we should convert only the numerical fields:
def inputMap: Map[Int, Double] = Map(
5 > mileage.toDouble,
6 > coolant_temperature,
7 > oil_pressure,
8 > ignition_cycles.toDouble,
9 > tyre_pressure
)
With this detail out of the way, we can now think about the format of the output row. As we have explained in the beginning, we are going to use the formula ids to define the columns of our output table. Each row in the output table will contain formula results calculated from the corresponding row in the input table:
1  2  3  ...  n 

row 1 formula 1  row 1 formula 2  row 1 formula 3  ...  row 1 formula n 
...  ...  ...  ...  ... 
row n formula 1  row n formula 2  row n formula 3  ...  row n formula n 
Let us start by creating the csv header. This is just a matter of joining the formula ids that we already have in the formula list:
val header = formulas.map(_._1.toString).mkString(",")
Next, we can evaluate each formula for the given row input:
val output = input.map(row =>
formulas.map(formulaPair => FormulaEvaluator.evaluate(formulaPair._2, row.inputMap).mkString).mkString(",")
)
Note that we convert the Option[Double]
result of FormulaEvaluator.evaluate
to a String
in such a way, that we are going to produce an empty string if the option is empty.
Finally, we can stitch together the process
function as follows:
def process(input: List[InputDataRow], formulas: List[(Long, FormulaAST)]): Either[String, List[String]] = {
val header = formulas.map(_._1.toString).mkString(",")
val output = input.map(row =>
formulas.map(formulaPair => FormulaEvaluator.evaluate(formulaPair._2, row.inputMap).mkString).mkString(",")
)
Right(header :: output)
}
Summary
Congratulations! We have now a standalone command line application that can evaluate arbitrary algebraic expressions for arbitrary tabular input. The knowledge that we have gained in the past chapters culminated in designing a functional computational pipeline that is defined in a succint manner and handles failure automatically. This, however, just defines a basic example for what is possible to achieve with a custom parser. It can get really interesting when thinking about designing more programming constructs such as logical expressions, functions and more! These fancy features are discussed in the final chapter of this tutorial.