RL_JSON

This package adds a command [json] to the interpreter, and defines a new Tcl_Obj type to store the parsed JSON document. The [json] command directly manipulates values whose string representation is valid JSON, in a similar way to how the [dict] command directly manipulates values whose string representation is a valid dictionary. It is similar to [dict] in performance.

Also provided is a command [json template] which generates JSON documents by interpolating values into a template from a supplied dictionary or variables in the current call frame. The templates are valid JSON documents containing string values which match the regex "^~[SNBJTL]:.+$". The second character determines what the resulting type of the substituted value will be:

  • S: A string.
  • N: A number.
  • B: A boolean.
  • J: A JSON fragment.
  • T: A JSON template (substitutions are performed on the inserted fragment).
  • L: A literal - the resulting string is simply everything from the fourth character onwards (this allows literal strings to be included in the template that would otherwise be interpreted as the substitutions above).

None of the first three characters for a template may be escaped.

The value inserted is determined by the characters following the substitution type prefix. When interpolating values from a dictionary they name keys in the dictionary which hold the values to interpolate. When interpolating from variables in the current scope, they name scalar or array variables which hold the values to interpolate. In either case if the named key or variable doesn't exist, a JSON null is interpolated in its place.
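As a minimal sketch of the variable-based form described above: the variable names are made up for illustration, and loading the package with `package require rl_json` is an assumption about how it is installed (some builds may also need the command imported from a namespace before it can be called as plain [json]).

```tcl
package require rl_json   ;# assumption: package name as installed

# Scalar and array variables in the current scope supply the values.
set name "Widget"
array set dims {w 10}     ;# dims(h) is deliberately left undefined

set doc [json template {
    {
        "name":   "~S:name",
        "width":  "~N:dims(w)",
        "height": "~N:dims(h)",
        "note":   "~L:~S:name"
    }
}]

puts $doc
# dims(h) does not exist, so a JSON null is interpolated for "height".
puts [json get $doc height ?type]
# The ~L: prefix keeps the rest of the string as a literal.
puts [json get $doc note]
```

Passing a dictionary as a second argument works the same way, except that the names after the prefix are looked up as dictionary keys rather than as variables.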

Quick Reference

  • [json get json_val ?key ... ?modifier??] - Extract the value of a portion of the json_val, returns the closest native Tcl type (other than JSON) for the extracted portion.
  • [json parse json_val] - A deprecated synonym for [json get json_val].
  • [json get_typed json_val ?key ... ?modifier??] - Extract the value of a portion of the json_val, returns a two element list: the first being the value that would be returned by [json get] and the second being the JSON type of the extracted portion.
  • [json extract json_val ?key ... ?modifier??] - Extract the value of a portion of the json_val, returns the JSON fragment.
  • [json exists json_val ?key ... ?modifier??] - Tests whether the supplied key path and modifier resolve to something that exists in json_val.
  • [json set json_variable_name ?key ...? value] - Updates the JSON value stored in the variable json_variable_name, replacing the value referenced by key ... with the JSON value value.
  • [json unset json_variable_name ?key ...?] - Updates the JSON value stored in the variable json_variable_name, removing the value referenced by key ...
  • [json normalize json_val] - Return a "normalized" version of the input json_val - all optional whitespace trimmed.
  • [json template json_val ?dictionary?] - Return a JSON value by interpolating the values from dictionary into the template, or from variables in the current scope if dictionary is not supplied, in the manner described above.
  • [json new type value] - Return a JSON fragment of type type and value value.
  • [json fmt type value] - A deprecated synonym for [json new type value].
  • [json foreach varlist1 json_val1 ?varlist2 json_val2 ...? script] - Evaluate script in a loop in a similar way to the [foreach] command. In each iteration, the values stored in the iterator variables in varlist are the JSON fragments from json_val. Supports iterating over JSON arrays and JSON objects. In the JSON object case, varlist must be a two element list, with the first specifying the variable to hold the key and the second the value. In the JSON array case, the rules are the same as the [foreach] command.
  • [json lmap varlist1 json_val1 ?varlist2 json_val2 ...? script] - As for [json foreach], except that it is collecting - the result from each evaluation of script is added to a list and returned as the result of the [json lmap] command. If the script results in a TCL_CONTINUE code, that iteration is skipped and no element is added to the result list. If it results in TCL_BREAK the iterations are stopped and the results accumulated so far are returned.
  • [json pretty json_val] - Returns a pretty-printed string representation of json_val. Useful for debugging or inspecting the structure of JSON data.
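
A few of the commands above, sketched together. This is an illustration under assumptions rather than a reference: the package name in `package require` and the sample data are not taken from this document.

```tcl
package require rl_json   ;# assumption: package name as installed

set doc {{"items": [1, 2, 3]}}

# [json set] and [json unset] take a variable name, like [dict set] and [dict unset].
json set doc name {"example"}    ;# the new value must itself be JSON text
json unset doc items

# Iterating over an object needs a two-element varlist: key, then value fragment.
json foreach {key value} $doc {
    puts "$key => $value"
}

# [json lmap] collects the script results. The loop variable holds a JSON
# fragment, so [json get] converts it to a native Tcl value first.
set doubled [json lmap elem {[1, 2, 3]} {
    expr {[json get $elem] * 2}
}]
puts $doubled

# Strip optional whitespace from a document.
puts [json normalize { { "a" : [ 1, 2 ] } }]
```

Note that the value passed to [json set] is `{"example"}`, quotes included: a bare Tcl string such as `example` is not valid JSON.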

Paths

The commands [json get], [json get_typed], [json extract] and [json exists] accept a path specification that names some subset of the supplied json_val. The rules are similar to the equivalent concept in the [dict] command, except that the paths used by [json] allow indexing into JSON arrays by the integer key (or a string matching the regex "^end(-[0-9]+)?$"), and that the last element can be a modifier:

  • ?type - Returns the type of the named fragment.
  • ?length - When the path refers to an array, this returns the length of the array. When the path refers to a string, this returns the number of characters in the string. All other types throw an error.
  • ?size - Valid only for objects, returns the number of keys defined in the object.
  • ?keys - Valid only for objects, returns a list of the keys in the object.

A literal value that would match one of the above modifiers can be used as the last element in the path by doubling the ?:

json get {
    {
        "foo": {
            "?size": "quite big"
        }
    }
} foo ??size

Returns "quite big"
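
The path rules and modifiers above can be sketched against the small document used later in the Parsing benchmark. The `package require` line is an assumption about how the package is installed; the comments describe what each call asks for rather than asserting exact output.

```tcl
package require rl_json   ;# assumption: package name as installed

set doc {
    {
        "foo": "bar",
        "baz": ["str", 123, 123.4, true, false, null, {"inner": "obj"}]
    }
}

puts [json get $doc baz 0]            ;# first array element
puts [json get $doc baz end inner]    ;# "end" addresses the last element
puts [json get $doc baz end-1 ?type]  ;# type of the second-to-last element
puts [json get $doc baz ?length]      ;# number of elements in the array
puts [json get $doc ?keys]            ;# keys of the top-level object
puts [json get $doc baz end ?size]    ;# number of keys in the nested object
puts [json exists $doc baz end nope]  ;# tests a key that is not present
```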

Examples

Produce a JSON value from a template:

~~~tcl
json template {
    {
        "thing1": "~S:val1",
        "thing2": ["a", "~N:val2", "~S:val2", "~B:val2", "~S:val3", "~L:~S:val1"],
        "subdoc1": "~J:subdoc",
        "subdoc2": "~T:subdoc"
    }
} {
    val1 hello
    val2 1e6
    subdoc {
        {
            "thing3": "~S:val1"
        }
    }
}
~~~

Result:

~~~json
{"thing1":"hello","thing2":["a",1000000.0,"1e6",true,null,"~S:val1"],"subdoc1":{"thing3":"~S:val1"},"subdoc2":{"thing3":"hello"}}
~~~

Performance

Good performance was a requirement for rl_json, because it is used to handle large volumes of data flowing to and from various JSON-based REST APIs. Of the options I've tried, it is generally the fastest for working with JSON values in Tcl, with the next closest being yajltcl. These benchmarks report median times in microseconds and produce quite stable results between runs. Benchmarking was done on a MacBook Air with an Intel i5 3427U CPU, running 64-bit Ubuntu 14.04 and Tcl 8.6.3 built with -O3 optimization.

Parsing

This benchmark compares the relative performance of extracting the field containing the string "obj" from the JSON doc:

{
	"foo": "bar",
	"baz": ["str", 123, 123.4, true, false, null, {"inner": "obj"}]
}

The compared methods are:

| Name | Notes | Code |
|------|-------|------|
| old_json_parse | Pure Tcl parser | dict get [lindex [dict get [json_old parse [string trim $json]] baz] end] inner |
| rl_json_parse | | dict get [lindex [dict get [json parse [string trim $json]] baz] end] inner |
| rl_json_get | Using the built-in accessor method | json get [string trim $json] baz end inner |
| yajltcl | | dict get [lindex [dict get [yajl::json2dict [string trim $json]] baz] end] inner |
| rl_json_get_native | | json get $json baz end inner |

The use of [string trim $json] is to defeat the caching of the parsed representation, forcing it to reparse the string each time since we're measuring the parse performance here. The exception is the rl_json_get_native test which demonstrates the performance of the cached case.
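
A rough way to observe the difference between the reparsing and cached cases is Tcl's built-in [time] command. This is only a sketch: it is not the harness behind the figures below, [time] reports a mean per iteration rather than a median, and the `package require` line is an assumption about how the package is installed.

```tcl
package require rl_json   ;# assumption: package name as installed

set json {{"foo": "bar", "baz": ["str", 123, 123.4, true, false, null, {"inner": "obj"}]}}

# [string trim] is used, as in the benchmark, to hand [json get] a fresh
# value so that the string has to be parsed again on every iteration.
puts "reparse: [time {json get [string trim $json] baz end inner} 10000]"

# Reusing $json lets later calls work from the cached parsed representation.
puts "cached:  [time {json get $json baz end inner} 10000]"
```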

-- parse-1.1: "Parse a small JSON doc and extract a field" --------------------
                   | This run
    old_json_parse |  241.595
     rl_json_parse |    5.540
       rl_json_get |    4.950
           yajltcl |    8.800
rl_json_get_native |    0.800

Generating

This benchmark compares the relative performance of various ways of dynamically generating a JSON document. Although all the methods produce the same string, only the "template" and "template_dict" variants handle nulls in the general case - the others manually test for null only for the one field that is known to be null, so the performance of these variants would be worse in a real-world scenario where all fields would need to be tested for null.

The JSON doc generated in each case is the one produced by the following JSON template (where a(not_defined) does not exist and results in a null value in the produced document):

{
	"foo": "~S:bar",
	"baz": [
		"~S:a(x)",
		"~N:a(y)",
		123.4,
		"~B:a(on)",
		"~B:a(off)",
		"~S:a(not_defined)",
		"~L:~S:not a subst",
		"~T:a(subdoc)",
		"~T:a(subdoc2)"
	]
}

The produced JSON doc is:

{"foo":"Bar","baz":["str\"foo\nbar",123,123.4,true,false,null,"~S:not a subst",{"inner":"Bar"},{"inner2":"Bar"}]}

The code for these variants is too long to include in this table; refer to bench/new.bench for the details.

| Name | Notes |
|------|-------|
| old_json_fmt | Pure Tcl implementation, builds JSON from type-annotated Tcl values |
| rl_json_new | rl_json's [json new], API compatible with the pure Tcl version used in old_json_fmt |
| template | rl_json's [json template] |
| yajltcl | yajltcl's type-annotated Tcl value approach |
| template_dict | As for template, but using a dict containing the values to substitute |
| yajltcl_dict | As for yajltcl, but extracting the values from the same dict used by template_dict |

-- new-1.1: "Various ways of dynamically assembling a JSON doc" ---------------
                 | This run
    old_json_fmt |   49.450
     rl_json_new |   10.240
        template |    4.520
         yajltcl |    7.700
   template_dict |    2.500
    yajltcl_dict |    7.530

Under the Hood

Older versions used the yajl C library to parse the JSON string and properly quote generated strings when serializing JSON values, but currently a custom-built parser and string quoter are used, removing the libyajl dependency. JSON values are parsed to an internal format using Tcl_Objs and stored as the internal representation for a new type of Tcl_Obj. Subsequent manipulation of that value uses the internal representation directly.

License

Copyright 2015 Ruby Lane. Licensed under the same terms as the Tcl core.