Building a JSON validator with Sylver - Part 2/3: Intuitive JSON AST queries
In Part 1, we used Sylver's meta language to build a specification for the JSON format. But an AST, by itself, is not of much use. In this next tutorial, we'll continue building our JSON configuration validator. To this end, we'll learn how to use Sylver's query REPL (Read Eval Print Loop) to identify the parts of our JSON code that do not comply with a set of increasingly complex rules. In the next and last part, we'll learn how to package queries into a rule set to share and reuse them easily.
If you have already installed sylver and sylver --version doesn't output a version number >= 0.1.3, please go to https://sylver.dev to download a fresh copy of the software.
Prelude
We will reuse the json.syl spec file built in the previous tutorial. Since the config.json file that we used to test our parser contains a valid configuration, we'll need to create a new JSON document (invalid_config.json) containing an incorrect configuration:
{
    "variables": [
        {
            "name": "date of birth",
            "description": "Customer's date of birth",
            "type": "datetime"
        },
        {
            "name": "activity",
            "description": "A short text describing the customer's profession",
            "type": "string"
        },
        {
            "name": "country",
            "description": "Customer's country of residence",
            "type": "string",
            "values": ["us", "fr", "it"]
        }
    ]
}
This file represents the configuration of an imaginary application that manages a database of customers. Each customer profile is described using a configurable set of variables.
We want to validate these variable declarations using the following rules:
- Variable descriptions must contain at most 35 characters
- The name of a variable must be a single lowercase word
- If the values field is set, the type field should be absent (as it matches the type of the values)
Let's parse this file and start a REPL to query our AST!
Basic REPL usage
Loading the AST of invalid_config.json is as simple as invoking:
sylver query --spec=json.syl --files=invalid_config.json
The files argument accepts one or several file names or quoted glob patterns.
You can exit the REPL by typing :quit at the prompt.
The :print invalid_config.json and :print_ast invalid_config.json commands display one of the loaded files or the corresponding AST, respectively.
Query language basics
Syntax queries are of the form match <NodeType> <binding>? (when <boolean expression>)?. The node binding and the when [...] clause are optional. NodeType represents either a node type as it appears when printing the AST or a placeholder (_) that matches every node. The whole part following the match keyword is called a query pattern.
In the REPL, queries must be followed by a ;
The most straightforward query, returning every node in the AST, is written as follows:
match _;
A slightly more advanced query returns every String node in our document:
match String;
If we only wish to retrieve the string literals above a certain length, we can add a when clause:
match String str when str.text.length > 35;
This query matches only the string literals whose text representation (quotes included) contains more than 35 characters. In our document, there is only one match on line 10.
The node type can be tested using the is keyword. For instance, to retrieve any node whose direct parent is an Object:
match _ node when node.parent is Object;
This query returns the members list of every object in our document.
The is keyword can also test a node against a full query pattern surrounded by curly braces. So, for example, we can retrieve every node whose parent is a member with key name:
match _ node when node.parent is {
    Member m when m.key.text == '"name"'
};
String literals can be single or double quoted.
Now that we know how to write basic queries, let's try to find the nodes that violate our rules.
Rule 1: variable descriptions must contain at most 35 characters
Except for the && operator in boolean expressions, this rule only uses features that appeared in the previous section, so you can test yourself by trying to write it without looking at the following block of code!
match String desc when desc.text.length > 37 && desc.parent is {
    Member m when m.key.text == '"description"'
};
This query should return a single node on line 10 that is indeed longer than the specified length. Note that we check that the text length is above 37 instead of 35 because of the surrounding quotes.
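The offset of 2 can be double-checked outside of Sylver. The following Python sketch is only an illustration, not part of the Sylver toolchain: it rebuilds the quoted text of each description from invalid_config.json and compares its length to the limit. The names literal_text, MAX_CHARS and QUOTES are made up for this example, and the sketch assumes that the node text is exactly the value wrapped in two double quotes (none of our descriptions need escaping).

```python
# Descriptions copied from invalid_config.json, in document order.
descriptions = [
    "Customer's date of birth",
    "A short text describing the customer's profession",
    "Customer's country of residence",
]

MAX_CHARS = 35  # limit stated by rule 1
QUOTES = 2      # assumption: node text = value + the two surrounding quotes


def literal_text(value: str) -> str:
    """Rebuild the source text of a string literal (no escapes needed here)."""
    return '"' + value + '"'


# A description violates rule 1 when its quoted text exceeds 35 + 2 characters.
too_long = [
    d for d in descriptions
    if len(literal_text(d)) > MAX_CHARS + QUOTES
]

for d in descriptions:
    print(len(d), len(literal_text(d)), d)
print("violations:", too_long)
```

Only the second description (49 characters, 51 with quotes) exceeds the limit, which is consistent with the single match on line 10.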
Rule 2: variable names should be a single lowercase word
match String s when !s.text.matches(`"[a-z]+"`) && s.parent is {
    Member m when m.key.text == '"name"'
};
This query returns a single node corresponding to the invalid date of birth name.
Apart from the boolean ! operator, this rule demonstrates the use of the matches method on text values. Unsurprisingly, it returns true when the text matches the regex literal given as an argument. As in spec files, regex literals are delimited by backticks.
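To see why only one name is rejected, here is a small Python sketch that applies the same pattern to the three quoted names. It is an illustration under an assumption: this article does not say whether matches requires the whole text to match or only part of it, so the sketch uses Python's re.fullmatch and then checks that a partial search gives the same answer for our data. The helper is_valid_name is hypothetical.

```python
import re

# Same pattern as in the query: a quote, one or more lowercase letters, a quote.
NAME_PATTERN = re.compile(r'"[a-z]+"')

# Text of the three name literals, quotes included.
names = ['"date of birth"', '"activity"', '"country"']


def is_valid_name(text: str) -> bool:
    """Assumption: the whole literal must match the pattern."""
    return NAME_PATTERN.fullmatch(text) is not None


invalid = [n for n in names if not is_valid_name(n)]
print(invalid)

# With partial-match semantics the verdict is the same here: no substring of
# the first name is a run of lowercase letters enclosed in quotes.
print(NAME_PATTERN.search('"date of birth"'))
```

The spaces are what make the first name invalid: the pattern leaves no room for anything but lowercase letters between the two quotes.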
Rule 3: fields type and values should be mutually exclusive
For this rule, we'll use array quantifying expressions of the form:
<quantifier> <array value> match <query pattern>
Where the quantifier is any of the following keywords: no, any, all.
Array quantifying expressions return true when none, any, or all of the values in the given array match the query pattern, respectively.
Using this new tool, we can find the Object nodes for which there is at least one child member with key type and one child member with key values:
match Object n when
    any n.members.children match {
        Member m when m.key.text == '"type"'
    }
    && any n.members.children match {
        Member m when m.key.text == '"values"'
    };
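For readers more familiar with general-purpose languages, the three quantifiers behave much like Python's any, all and not any over a list. The sketch below is an analogy only, not Sylver code: it models each variable object as a dictionary and each member as a (key, value) pair, and the helpers any_match, all_match, no_match and key_is are hypothetical names introduced for this example.

```python
# The three variable objects from invalid_config.json.
variables = [
    {"name": "date of birth",
     "description": "Customer's date of birth",
     "type": "datetime"},
    {"name": "activity",
     "description": "A short text describing the customer's profession",
     "type": "string"},
    {"name": "country",
     "description": "Customer's country of residence",
     "type": "string",
     "values": ["us", "fr", "it"]},
]


def any_match(items, pred):
    """Analogue of `any`: at least one item satisfies the predicate."""
    return any(pred(i) for i in items)


def all_match(items, pred):
    """Analogue of `all`: every item satisfies the predicate."""
    return all(pred(i) for i in items)


def no_match(items, pred):
    """Analogue of `no`: no item satisfies the predicate."""
    return not any(pred(i) for i in items)


def key_is(name):
    """Predicate on a (key, value) member: does the key equal `name`?"""
    return lambda member: member[0] == name


# Rule 3: an object must not have both a `type` and a `values` member.
violations = [
    v["name"] for v in variables
    if any_match(v.items(), key_is("type"))
    and any_match(v.items(), key_is("values"))
]
print(violations)
```

In this model, only the country object declares both fields, so it is the one object in our document that breaks rule 3.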
Conclusion
We now have queries to identify violations of our business rules, but opening the REPL and pasting the queries whenever we want to validate a document isn't very practical. So, in the final part of this series, we'll learn how to package queries into a Sylver rule set to consume, distribute and share them more conveniently!