Building a JSON validator with Sylver - Part 2/3: Intuitive JSON AST queries

In Part 1, we used Sylver's meta language to build a specification for the JSON format. But an AST, by itself, is not of much use. In this next tutorial, we'll continue building our JSON configuration validator. To this end, we'll learn how to use Sylver's query REPL (Read Eval Print Loop) to identify the parts of our JSON code that do not comply with a set of increasingly complex rules. In the next and last part, we'll learn how to package queries into a rule set to share and reuse them easily.

If you have already installed Sylver and sylver --version reports a version lower than 0.1.3, please go to https://sylver.dev to download a fresh copy of the software.

Prelude

We will reuse the json.syl spec file built in the previous tutorial. Since the config.json file that we used to test our parser contains a valid configuration, we'll need to create a new JSON document (invalid_config.json) containing an incorrect configuration:

{
    "variables": [
        {
            "name": "date of birth",
            "description": "Customer's date of birth",
            "type": "datetime"
        },
        {
            "name": "activity",
            "description": "A short text describing the customer's profession",
            "type": "string"
        },
        {
            "name": "country",
            "description": "Customer's country of residence",
            "type": "string",
            "values": ["us", "fr", "it" ]
        }
    ]
}

This file represents the configuration of an imaginary piece of software that handles a database of customers. Each customer profile is described using a configurable set of variables.

We want to validate these variable declarations using the following rules:

  1. Variable descriptions must contain at most 35 characters
  2. The name of a variable must be a single lowercase word
  3. If the values field is set, the type field should be absent (as it matches the type of the values)

Let's parse this file and start a REPL to query our AST!

Basic REPL usage

Loading the AST of invalid_config.json is as simple as invoking:

sylver query --spec=json.syl --files=invalid_config.json

The files argument accepts one or several file names or quoted glob patterns.

You can exit the REPL by typing :quit at the prompt.

:print invalid_config.json and :print_ast invalid_config.json can be used to visualize one of the loaded files, or the corresponding AST.

Query language basics

Syntax queries are of the form match <NodeType> <binding>? (when <boolean expression>)?. The node binding and the when [...] clause are optional. NodeType represents either a node type as it appears when printing the AST or a placeholder (_) that matches every node. The whole part following the match keyword is called a query pattern.

In the REPL, queries must be followed by a ;

The most straightforward query, returning every node in the AST, is written as follows:

match _;

A slightly more advanced query to return every String node in our document:

match String;

If we only wish to retrieve the string literals above a certain length, we can add a when clause:

match String str when str.text.length > 35;

This query matches only the string literals whose text representation (quotes included) contains more than 35 characters. In our document, there is only one match, on line 10.

The node type can be tested using the is keyword. For instance, to retrieve any node whose direct parent is an Object:

match _ node when node.parent is Object;

This query returns the member list of every object in our document.

The is keyword can also test a node against a full query pattern surrounded by curly braces. For example, we can retrieve every node whose parent is a member with key name:

match _ node when node.parent is {
    Member m when m.key.text == '"name"'
};

String literals can be single or double quoted.

Now that we know how to write basic queries, let's try to find the nodes that violate our rules.

Rule 1: variable descriptions must contain at most 35 characters

Except for the && operator in boolean expressions, this rule only uses features that appeared in the previous section, so you can test yourself by trying to write it without looking at the following block of code!

match String desc when desc.text.length > 37 && desc.parent is {
   Member m when m.key.text == '"description"'
};

This query should return a single node on line 10 that is indeed longer than the specified length. Note that we check that the text length is above 37 instead of 35 because of the surrounding quotes.
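As a cross-check outside Sylver, the same rule can be sketched in plain Python with the standard json module. This is only an illustration and not part of Sylver: the helper name long_descriptions is hypothetical, and the document is inlined in abbreviated form. Because json.loads strips the surrounding quotes, the threshold here is 35 rather than 37.

```python
import json

# Abbreviated copy of invalid_config.json (only the field this rule needs).
DOC = """
{
    "variables": [
        {"description": "Customer's date of birth"},
        {"description": "A short text describing the customer's profession"},
        {"description": "Customer's country of residence"}
    ]
}
"""

def long_descriptions(config, limit=35):
    """Return the descriptions longer than `limit` characters (quotes excluded)."""
    return [
        var["description"]
        for var in config["variables"]
        if len(var.get("description", "")) > limit
    ]

violations = long_descriptions(json.loads(DOC))
for text in violations:
    # len(text) + 2 is the quoted length that the Sylver query compares to 37.
    print(len(text), len(text) + 2, text)
```

Only the description of the activity variable should be reported, which agrees with the single match returned by the query.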

Rule 2: variable names should be a single lowercase word

match String s when !s.text.matches(`"[a-z]+"`) && s.parent is {
   Member m when m.key.text == '"name"'
};

This query returns a single node corresponding to the invalid date of birth name.

Apart from the boolean ! operator, this rule demonstrates the use of the matches method on text values. Unsurprisingly, it returns true when the text matches the regex literal given as an argument. As in spec files, regex literals are delimited by backticks.
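To experiment with the pattern before running it in the REPL, here is a minimal Python sketch that applies the same regex to the quoted names of our document. It assumes whole-string matching (re.fullmatch); whether Sylver's matches method behaves exactly this way is an assumption, although for these three names a partial match would give the same result. The helper is_valid_name is hypothetical.

```python
import re

# Same pattern as the Sylver regex literal, applied to the quoted text.
NAME_PATTERN = re.compile(r'"[a-z]+"')

def is_valid_name(quoted_text):
    """True when the whole quoted literal is a single lowercase word."""
    # Assumption: whole-string matching; Sylver's exact semantics may differ.
    return NAME_PATTERN.fullmatch(quoted_text) is not None

# Names from invalid_config.json, quotes included (as in the AST text).
names = ['"date of birth"', '"activity"', '"country"']
invalid = [name for name in names if not is_valid_name(name)]
print(invalid)
```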

Rule 3: fields type and values should be mutually exclusive

For this rule, we'll use array quantifying expressions of the form:

<quantifier> <array value> match <query pattern>

Where the quantifier is any of the following keywords: no, any, all. Array quantifying expressions return true when none, at least one, or all of the values in the given array match the query pattern, respectively. Using this new tool, we can find the Object nodes for which there is at least one child member with key type and one child member with key values:

match Object n when
    any n.members.children match {  
        Member m when m.key.text == '"type"' 
    }
    && any n.members.children match { 
        Member m when m.key.text == '"values"' 
    };
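For readers more familiar with general-purpose languages, the quantifiers behave much like Python's built-in any and all, with no corresponding to not any. The sketch below is an analogy under that assumption, not Sylver code: has_key and violates_rule_3 are hypothetical helpers, and the variables are plain dictionaries rather than AST nodes. On our document, we would expect only the country variable to be flagged.

```python
def has_key(members, key):
    """Counterpart of `any ... match { Member m when m.key.text == key }`."""
    return any(name == key for name in members)

def violates_rule_3(variable):
    """True when a variable declares both `type` and `values`."""
    members = list(variable.keys())
    return has_key(members, "type") and has_key(members, "values")

# Two of the variables from invalid_config.json, descriptions omitted.
variables = [
    {"name": "activity", "type": "string"},
    {"name": "country", "type": "string", "values": ["us", "fr", "it"]},
]
offenders = [v["name"] for v in variables if violates_rule_3(v)]
print(offenders)

# The other two quantifiers map onto Python built-ins in the same way:
no_values = not any(k == "values" for k in variables[0])     # like `no`
all_strings = all(isinstance(k, str) for k in variables[1])  # like `all`
```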

Conclusion

We now have queries to identify violations of our business rules, but opening the REPL and pasting the queries whenever we want to validate a document isn't very practical. So, in the final part of this series, we'll learn how to package queries into a Sylver rule set to consume, distribute and share them more conveniently!