Blog

Formula Parsing with Scala VI: Formula Evaluation

Formula Parsing with Scala VI: Formula Evaluation

Introduction

In the last section of the tutorial we did the heavy lifting - implementing the parsing logic for constructing abstract syntax trees (ASTs). Each such tree corresponds to a function that we can reuse to calculate the result of the corresponding formula for different inputs.

As usual, let's look at an example to understand how to go about solving this problem. The task is to calculate the formula ($x + 2) * ($y - 3) for specific values of $x and $y (we can think of this as a function f(x, y) = (x + 2) * (y - 3)). Our starting point is the graphical representation of the corresponding AST:

     *
    / \
   /   \
  +     -
 / \   / \
x   2 y   3

In order to compute the result for concrete values like x = 1 and y = 5, we first need to substitute the terminal variables with their values:

     *
    / \
   /   \
  +     -
 / \   / \
1   2 5   3

Then we evaluate the operations that are defined in the parent nodes of the terminals and replace them by the result:

  *
 / \
3   2

We can terminate when we reach the root node:

6

Our general strategy for the evaluation is then to traverse the tree and perform all operations defined in the nodes with the values in their terminal child nodes until we reach the root node of the tree.

Evaluation Function

As a starting point for the evaluation function, let's consider its signature: we have to pass the AST, the values for the input variables, and we need to output a number. We also would like to handle the case when something goes wrong (division by zero, etc.). Since we are not interested in error information, we can simple return Option[Double].

Let's add a first draft of the evaluation function:

import de.argodis.tutorial.scalaparser.parser.nodes.FormulaAST

object FormulaEvaluator {
  def evaluate(formulaTree: FormulaAST, input: Map[Int, Double]): Option[Double] = {
    None
  }
}

As we discussed above, we are going to evaluate the tree by traversing it, so here is a recursion that does just that:

def evaluate(formulaTree: FormulaAST, input: Map[Int, Double]): Option[Double] = {
  // Calculate numerical expressions
  def calculate(node: FormulaAST): Option[Double] = node match {
    case _ => None
  }
  // Run the calculation
  calculate(formulaTree)
}

Evaluating Terminals

We can start building up the formula evaluator by considering the terminals (constants and variables) first:

// Calculate numerical expressions
def calculate(node: FormulaAST): Option[Double] = node match {
  case Constant(value) => Some(value)
  case Variable(id) => input.get(id)
  case _ => None
}

Now we can test this first version

import org.scalatest.{FunSuite, Matchers}

class TestFormulaEvaluator extends FunSuite with Matchers {

  test("evaluate expression '1'") {
    FormulaParser
      .parse("1")
      .map(ast => FormulaEvaluator.evaluate(ast, Map()))
      .map(result => result shouldBe Some(1.0))
  }

  test("evaluate expression '$1' where '$1 = -4.25' ") {
    FormulaParser
      .parse("$1")
      .map(ast => FormulaEvaluator.evaluate(ast, Map(1 -> -4.25)))
      .map(result => result shouldBe Some(-4.25))
  }

}

Handling Bad Calculations

Before we finalize the evaluator, let's think back what out task is: compute formulas from input data. We should never trust raw input data as it is, due to human error, incompleteness, etc. It might very well happen that data is missing, so our formulas are not going to be calculated correctly. We would like to have a robust formula evaluation design that is able to handle such eventualities. We are going to build it according to the best effort principle: if an error occurs, we will return a non-result (None), while a successful evaluation is going to return the value of interest. We are not interested in what kind of error has occurred, therefore an Option[Double] seems to be the best way to model our requirement. This means that all calculations should not "crash" when faced with bad data and they should return an Option[Double].

Arithmetic Errors

Another class of problems that we might encounter, even when the input data is there, are arithmetic errors - division by zero, overflow, etc. The Scala floating point type scala.Double represents such corner cases as scala.Double.NaN, scala.Double.PositiveInfinity, etc. We are not concerned with the exact type of error, so we have to convert such a value to the None type. To this end, let's create a small conversion function for these cases:

  // Handles problematic floating point values
  private def sanitize(x: Double): Option[Double] = x match {
    case value if java.lang.Double.isNaN(value) => None
    case Double.PositiveInfinity => None
    case Double.NegativeInfinity => None
    case value => Some(value)
  }

Now let's test a few cases:


  test("convert a proper double to Some(value)") {
    FormulaEvaluator.sanitize(0.25)  shouldBe Some(0.25)
  }

  test("convert a scala.Double.NaN to None") {
    FormulaEvaluator.sanitize(scala.Double.NaN)  shouldBe None
  }

  test("convert a positive infinity to None") {
    val result = Some(1.25).map(value => value / 0.0)
    FormulaEvaluator.sanitize(result.get) shouldBe None
  }

  test("convert a negative infinity to None") {
    val result = Some(1.25).map(value => - value / 0.0)
    FormulaEvaluator.sanitize(result.get) shouldBe None
  }
Safe Operations for Wrapped Input Values

Now we can present a higher-order function that enables us to easily handle arbitrary binary operations with arguments of type Option[Double]. We are going to use it to safely evaluate the arithmetic operator nodes of the abstract syntax tree:

  // Take options as input, return an option as output
  def safeOp(left: Option[Double], right: Option[Double], op: (Double, Double) => Double): Option[Double] = {
    val result = for {
      x <- left
      y <- right
    } yield op(x, y)
    // We don't want to deal with NaN, Infinity, etc
    result.flatMap(sanitize)
  }

More tests are in order:

  test("safeOp returns a result when both arguments are defined") {
    FormulaEvaluator.safeOp(Some(1), Some(1), _ + _) shouldBe Some(2)
  }

  test("safeOp returns None when the left argument is None") {
    FormulaEvaluator.safeOp(None, Some(1), _ + _) shouldBe None
  }

  test("safeOp returns None when the right argument is None") {
    FormulaEvaluator.safeOp(Some(1), None, _ + _) shouldBe None
  }

  test("safeOp returns None when both arguments are None") {
    FormulaEvaluator.safeOp(None, None, _ + _) shouldBe None
  }

  test("safeOp returns None when a division by zero occurs") {
    FormulaEvaluator.safeOp(Some(0.25), Some(0.0), _ / _) shouldBe None
  }

Evaluating the Abstract Syntax Tree

The final piece of the puzzle is how to evaluate the operator tree nodes. Our effort so far is now paying off, since the only thing we need to do is to match the various operator nodes and simply return the corresponding safeOp:

    // Calculate numerical expressions
    def calculate(node: FormulaAST): Option[Double] = node match {
      case Constant(value) => Some(value)
      case Variable(id) => input.get(id)
      case OperatorAdd(left, right) => safeOp(calculate(left), calculate(right), _ + _)
      case OperatorSubtract(left, right) => safeOp(calculate(left), calculate(right), _ - _)
      case OperatorMultiply(left, right) => safeOp(calculate(left), calculate(right), _ * _)
      case OperatorDivide(left, right) => safeOp(calculate(left), calculate(right), _ / _)
      case _ => None
    }

That's it! We now have a fully functional arithmetic formula parser and evaluator for arbitrary expressions. Let's take a moment to feel our might by adding a few tests:

  test("evaluate expression '$1 - 1'") {
    FormulaParser
      .parse("$1 - 1")
      .map(ast => FormulaEvaluator.evaluate(ast, Map(1 -> 1)))
      .map(result => result shouldBe Some(0.0))
  }

  test("evaluate expression '$1 - $1'") {
    FormulaParser
      .parse("$1 - $1")
      .map(ast => FormulaEvaluator.evaluate(ast, Map(1 -> 1)))
      .map(result => result shouldBe Some(0.0))
  }

  test("evaluate expression '$1 - $2'") {
    FormulaParser
      .parse("$1 - $2")
      .map(ast => FormulaEvaluator.evaluate(ast, Map(1 -> 1, 2 -> 1)))
      .map(result => result shouldBe Some(0.0))
  }

  test("evaluate expression with a missing second argument '$1 - $2'") {
    FormulaParser
      .parse("$1 - $2")
      .map(ast => FormulaEvaluator.evaluate(ast, Map(1 -> 1)))
      .map(result => result shouldBe None)
  }

  test("evaluate expression '($1 + $2) * ($3 + $4)'") {
    FormulaParser
      .parse("($1 + $2) * ($3 + $4)")
      .map(ast => FormulaEvaluator.evaluate(ast, Map(1 -> 1, 2 -> 1, 3 -> 3, 4 -> 2)))
      .map(result => result shouldBe Some(10.0))
  }

Feel free to test more complex expressions as well! In the next chapter we are going to tie everything together, in order to complete the command-line based application that we can use with arbitrary data.

Summary

In this chapter we have learned how to evaluate the abstract syntax trees for specific inputs. This means that we are finally in position to apply them as functions to the full input data set we started with. In the next part of the tutorial we are going to complete the application by defining a sequence of operations and adapting the various parts we have developed so far into a seamless computational model.

Continue on to Tying Everything Together

0 Comments 0 Comments
0 Comments 0 Comments