Prototype TDE templates

Test SQL and Optic Against TDE Template Without Installing

personClever Llamas
calendar_today2023-11-06

Anyone who has worked with TDE templates understands that to fully prototype them you need to install them into the database. The reason being that when it is installed, the related content is indexed and made available via the various APIs - Optic API, SQL, SPARQL (and the cts functions that interface with those.).

In many cases, this is OK. However, when you are prototyping, you may not want to install the template into the database and wait for re-indexing. This is a toolbox article designed to help you prototype SQL and Optic from TDE templates without installing them and waiting for indexing.

We already have tde:node-data-extract(). Why is that not enough?

This is a great tool and we will use it as part of the solution. It helps us understand that there is data extracted. It does not however help us understand for sure if the data and datatypes as extracted are fit-for purpose against queries. Furthermore, we might also be looking at the best options for data structure based on what is available via TDE as well as OPTIC or SQL. What we need is a way to change data, template and queries concurrently when prototyping.

The simplified approach

The approach is to use the tde:node-data-extract() function to extract the data from the template and then send the data to into op:from-literals(). The actual format is slightly different whe using javascript vs xQuery.

import module namespace op="http://marklogic.com/optic"
     at "/MarkLogic/optic.xqy";
(:
  My standard setup for documents for testing  - sometimes JSON, sometimes XML 
  In this example, We have the records for Adam and Jenny
:)
let $docs := for $json in xdmp:unquote('[{"llama": {"id": 19, "name": "Adam", "type": "llama", "birthDate": "2004-12-10", "farm": 1, "pasture": 2}}, {"llama": {"id": 20, "name": "Jenny", "type": "llama", "birthDate": "2000-08-11", "farm": 1, "pasture": 2}}]')//object-node("llama")/.. 
  return document{$json}
  
let $template := <template xmlns="http://marklogic.com/xdmp/tde">
<description>TDE Extraction-sample</description>
<context>/llama</context>
<rows>
  <row>
    <schema-name>toolbox</schema-name>
    <view-name>tdePrototype</view-name>
    <columns>
      <column>
        <name>id</name>
        <scalar-type>int</scalar-type>
        <val>./id</val>
      </column>
      <column>
        <name>name</name>
        <scalar-type>string</scalar-type>
        <val>./name</val>
      </column>
      <column>
        <name>entityType</name>
        <scalar-type>string</scalar-type>
        <val>./type</val>
      </column>
      <column>
        <name>birthdate</name>
        <scalar-type>date</scalar-type>
        <val>./birthDate</val>
      </column>
      <column>
        <name>farmId</name>
        <scalar-type>string</scalar-type>
        <val>./farm</val>
      </column>
    </columns>
    </row>
  </rows>
</template>

(: massage the results of node-data-extract to work as op:from-literals and do a little query:)
return op:from-literals(xdmp:to-json(tde:node-data-extract($docs, $template))//*:data ! xdmp:from-json(.), "llama")
  => op:where(op:eq(op:view-col("llama", "birthdate"), "2004-12-10"))
  => op:select(("name", "id", "birthdate"), "")
  => op:result()
<plan:column type="column-def" column="birthdate" column-number="2" column-index="2" static-type="STRING"></plan:column>

String Match

This may look like it was useful. However, look closely at the date in the statement:

=> op:where(op:eq(op:view-col("llama", "birthdate"), "2004-12-10"))

This is just a string match. This is intresting, but not what We want. We want to be able to prototype against the datatypes that are defined in the TDE template.

Don't be fooled based on strings that happen to appear to work.

  => op:where(op:gt(op:view-col("llama", "birthdate"), "2004-11-10"))

String Sort Order This will give an expected result as well when using op:gt(). However, that is just because the date format in string order happens to work ('2004-11' being sorter before '2004-12').

This string ordering (and not bitrhdate as a typed date) can be proven with another simple example:

  => op:where(op:lt(op:view-col("llama", "birthdate"), "llama"))

Llama vs numeric For this, I also get results. This is because the string 'llama' is sorted after strings starting with numbers.

Apply datatypes from TDE template

The solution to bringing this to a more complete tool for prototyping is to apply datatypes to the literal values.

Using op:from-literals()

This simple example articulates the approach for xQuery and maps:

(: See previous sample for data and template:)
let $xs := op:prefixer("http://www.w3.org/2001/XMLSchema#")
let $literals := for $row in xdmp:to-json(tde:node-data-extract($docs, $template))//*:data ! xdmp:from-json(.)
  let $map := map:new()
  let $_ := for $column in $template//tde:column
    let $name := $column/tde:name/text()
    let $type := $column/tde:scalar-type/text()
    return map:put($map, $name, sem:typed-literal(xs:string(map:get($row, $name)), $xs($type)))
  return $map

return op:from-literals($literals)
 =>op:result()

For the above, the format for Javascript would be different (slightly different payload for op.fromLiterals()). However, we can take care of that by digging a by using SPARQL VALUES to represent our data.

Using op:from-sparql()

If you were to pull back some layers of the Optic API, you would notice that op:from-literals() is actually using SPARQL under the hood. Because of that, We can skip op:from-literals() and instead use op:from-sparql() using SPARQL VALUES to generate the data and assign the datatypes. This approach is also the same regardless of xQuery and Javascript.

There is one subtle difference that makes me lean towards this approach - if there are no documents, this code will not fail, in the case of op:from-literals(), a little more work would be needed to make it a bit more robust.

(: See previous sample for data and template:)

(: generate the rows:)
let $rows := string-join(
  for $row in xdmp:to-json(tde:node-data-extract($docs, $template))//*:data ! xdmp:from-json(.)
  return '(' || string-join(for $column in $template//tde:column
    let $name := $column/tde:name/text()
    let $type := $column/tde:scalar-type/text()
    return '"' || map:get($row, $name) || '"^^xs:' || $type
  , " ") || ')'
, "&#13;")

(:create the select statements:)
let $select := '(' || string-join(for $column in $template//tde:column 
  return '?' || $column/tde:name/text(), " ") || ')'
  
(: generate the SPARQL queries:)     
let $sparql := 
'prefix xs: <http://www.w3.org/2001/XMLSchema#>
SELECT * WHERE { 
     VALUES ' || $select || ' {
     '|| $rows ||'
  }
}'

return op:from-sparql($sparql)
  => op:result()
<plan:column type="column-def" column="birthdate" column-number="2" column-index="2" static-type="DATE"></plan:column>

SPARQL VALUES representation of the query

prefix xs: <http://www.w3.org/2001/XMLSchema#>
SELECT * WHERE { 
     VALUES (?id ?name ?entityType ?birthdate ?farmId) {
     ("19"^^xs:int "Adam"^^xs:string "llama"^^xs:string "2004-12-10"^^xs:date "1"^^xs:string)
     ("20"^^xs:int "Jenny"^^xs:string "llama"^^xs:string "2000-08-11"^^xs:date "1"^^xs:string)
  }
}

Full SPARQL-based Example:

import module namespace op="http://marklogic.com/optic"
     at "/MarkLogic/optic.xqy";
     
import module namespace oxs="http://marklogic.com/optic/expression/xs"
  at "/MarkLogic/optic/optic-xs.xqy";     
(:
  My standard setup for documents for testing  - sometimes JSON, sometimes XML 
  In this example, We have the records for Adam and Jenny
:)
let $docs := for $json in xdmp:unquote('[{"llama": {"id": 19, "name": "Adam", "type": "llama", "birthDate": "2004-12-10", "farm": 1, "pasture": 2}}, {"llama": {"id": 20, "name": "Jenny", "type": "llama", "birthDate": "2000-08-11", "farm": 1, "pasture": 2}}]')//object-node("llama")/.. 
  return document{$json}
  
let $template := <template xmlns="http://marklogic.com/xdmp/tde">
<description>TDE Extraction-sample</description>
<context>/llama</context>
<rows>
  <row>
    <schema-name>toolbox</schema-name>
    <view-name>tdePrototype</view-name>
    <columns>
      <column>
        <name>id</name>
        <scalar-type>int</scalar-type>
        <val>./id</val>
      </column>
      <column>
        <name>name</name>
        <scalar-type>string</scalar-type>
        <val>./name</val>
      </column>
      <column>
        <name>entityType</name>
        <scalar-type>string</scalar-type>
        <val>./type</val>
      </column>
      <column>
        <name>birthdate</name>
        <scalar-type>date</scalar-type>
        <val>./birthDate</val>
      </column>
      <column>
        <name>farmId</name>
        <scalar-type>string</scalar-type>
        <val>./farm</val>
      </column>
    </columns>
    </row>
  </rows>
</template>

(: This time, we actully add the datatypes dynamically as we go by referencing the TDE template and creating  a typed literal:)
(:rows:)
let $rows := string-join(
  for $row in xdmp:to-json(tde:node-data-extract($docs, $template))//*:data ! xdmp:from-json(.)
  return '(' || string-join(for $column in $template//tde:column
    let $name := $column/tde:name/text()
    let $type := $column/tde:scalar-type/text()
    return '"' || map:get($row, $name) || '"^^xs:' || $type
  , " ") || ')'
, "&#13;")

(:select:)
let $select := '(' || string-join(for $column in $template//tde:column 
  return '?' || $column/tde:name/text(), " ") || ')'
  
(:SPARQL:)     
let $sparql := 
'prefix xs: <http://www.w3.org/2001/XMLSchema#>
SELECT * WHERE { 
     VALUES ' || $select || ' {
     '|| $rows ||'
  }
}'

return op:from-sparql($sparql, "llama")
  => op:where(op:gt(op:view-col("llama", "birthdate"), oxs:date("2004-12-09")))
  => op:select(("name", "id", "birthdate"), "")
  => op:result()

From SPARQL

A sample using showing prototyping SQL conditions is as follows - which would yield the same result:

  => op:where(op:sql-condition("llama.birthdate BETWEEN '2004-10-22' AND '2004-12-30'"))

Portable Example

Once I've worked out the details of my TDE template and queries, I may want to test it on some existing data on an environment with more content. Ths can be done by shipping the TDE template and code to someone with access to a server. The major difference is that the code below gets a random sample of data rather than the few documents i used for prototyping. This could be provided in a 1-tab workspace, for example.

The overall layout is as follows:

  • Define window size
  • Define the query in order to isolate some content in the database
  • Define the Optic query that we want to run against our sample data
  • Define the TDE template that we want to use

The code below the declared variables will:

  1. Isolate documents in a random order based on the query and limit provided
  2. Generate SPARQL from the extracted data to use as Optic input
  3. Create the Optic plan, run it against the Optic query and return the results
import module namespace json="http://marklogic.com/xdmp/json"
 at "/MarkLogic/json/json.xqy";
import module namespace op="http://marklogic.com/optic"
     at "/MarkLogic/optic.xqy";
declare namespace tde ="http://marklogic.com/xdmp/tde";

declare variable $SAMPLE-SIZE := 600;
declare variable $ISOLATION-QUERY := cts:directory-query("/llama/name/", "infinity");
declare variable $OP-CODE := function($row){$row   
  => op:where(op:sql-condition("birthdate between '2004-02-22' and '2022-02-22'"))
  => op:select(("name", "id", "birthdate"), "")
};
declare variable $TEMPLATE := <template xmlns="http://marklogic.com/xdmp/tde">
  <description>TDE Extraction-sample</description>
  <context>/llama</context>
  <rows>
    <row>
      <schema-name>toolbox</schema-name>
      <view-name>tdePrototype</view-name>
      <columns>
        <column>
        <name>id</name>
        <scalar-type>int</scalar-type>
        <val>./id</val>
        </column>
        <column>
        <name>name</name>
        <scalar-type>string</scalar-type>
        <val>./name</val>
        </column>
        <column>
        <name>entityType</name>
        <scalar-type>string</scalar-type>
        <val>./type</val>
        </column>
        <column>
        <name>birthdate</name>
        <scalar-type>date</scalar-type>
        <val>./birthDate</val>
        </column>
        <column>
        <name>farmId</name>
        <scalar-type>string</scalar-type>
        <val>./farm</val>
        </column>
      </columns>
      </row>
    </rows>
</template>;


declare function local:generate-sparql($docs){
  let $select := '(' || string-join(for $column in $TEMPLATE//tde:column 
    return '?' || $column/tde:name/text(), " ") || ')'

  let $rows := string-join(
    for $row in xdmp:to-json(tde:node-data-extract($docs, $TEMPLATE))//*:data ! xdmp:from-json(.)
    return '(' || string-join(for $column in $TEMPLATE//tde:column
      let $name := $column/tde:name/text()
      let $type := $column/tde:scalar-type/text()
      return '"' || map:get($row, $name) || '"^^xs:' || $type
    , " ") || ')'
  , "&#13;")

  return 
  'prefix xs: <http://www.w3.org/2001/XMLSchema#>
  SELECT * WHERE { 
       VALUES ' || $select || ' {
       '|| $rows ||'
    }
  }'
};

let $docs := cts:search(doc(), $ISOLATION-QUERY, "score-random")[1 to $SAMPLE-SIZE]
let $sparql := local:generate-sparql($docs)
return op:from-sparql($sparql)
  => $OP-CODE()

Conclusion

Hopefully the last example is a useful illustration of how to prototype both locally and with with live data in a portable way with no need to install the TDE template.

Need Some Help?


Looking for more information on this subject or any other topic related to MarkLogic? Contact us (info@cleverllamas.com) to find out how we can assist you with consulting or training!