Simple Schema
This quickstart guides you through creating a Simple Schema and shows how it can be used. Simple Schemas are a much easier way of defining serialization schemas, as they can be based on the widely adopted and probably familiar (though complex) formats Avro and JSON Schema. Especially for large schemas, Simple Schemas help you define your data shape much more quickly.
Defining a Simple Schema
For this quickstart, the following yaml file that defines the simple schema will be used.
Note: Session Id has two additional arguments: required and repeated. This is necessary if you want to mark it as the keyField.
To install this schema into STRM Privacy, it needs to be attached to a Data Contract.
Creating a Data Contract
This quickstart requires a Data Contract with this schema that defines the following:
- SessionId is the keyField, the attribute that ties events together.
- UserName is a PII field, purpose 1.
These names are currently the avroNames, not the names in the simple schema.
Checking strm create data-contract --help tells us what we need to create.
$ strm create data-contract quickstart/demo-data-contract/1.0.0 \
--schema-definition simple-schema.yaml \
--contract-definition contract.json
DATA CONTRACT                         STATE   PUBLIC   KEY FIELD   # PII FIELDS   # VALIDATIONS
quickstart/demo-data-contract/1.0.0   DRAFT   false    SessionId   1              0
--schema-definition: the schema definition file that was defined in the previous section
--contract-definition: the definition of the data contract
Inspecting the schema
To get the schema from the Data Contract, execute the following command:
$ strm get data-contract quickstart/demo-data-contract/1.0.0 --output json | jq '.dataContract.schema'
{
  "schema": {
    "ref": {
      "handle": "quickstart",
      "name": "demo-data-contract",
      "version": "1.0.0",
      "schemaType": "AVRO"
    },
    "state": "DRAFT",
    "definition": "{\"type\":\"record\",\"name\":\"Clicks\",\"namespace\":\"com.....
    "simpleSchema": {
      "name": "Clicks",
      "avroName": "Clicks",
      "nodes": [
        {
          "type": "STRING",
          "name": "Session Id",
          "avroName": "SessionId",
          "doc": "the string value that connects events to a single sequence"
        },
        {
          "type": "STRING",
          "name": "User Name",
          "avroName": "UserName",
          "doc": "we use a data contract to define that this is private"
        },
        {
          "type": "STRING",
          "name": "url",
          "avroName": "url",
          "doc": "the url on the website"
        },
        {
          "type": "NODE",
          "name": "mouse positions",
          "avroName": "mousepositions",
          "doc": "price in UK pounds",
          "nodes": [
            {
              "type": "INTEGER",
              "name": "x",
              "avroName": "x",
            },
            {
              "type": "INTEGER",
              "name": "y",
              "avroName": "y",
            }
          ]
        }
      ]
    }
  },
...
Name and avroName
The avroName fields look similar to the original name. These names, however, are subject to the Avro name constraints, whereas the name attributes in a Simple Schema are not.
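To make the difference between name and avroName concrete, here is a minimal Python sketch of one way such a derivation could work. The helper to_avro_name is hypothetical and is not the heuristic STRM Privacy actually uses; it only assumes the Avro rule that a name must match [A-Za-z_][A-Za-z0-9_]*, and it happens to reproduce the avroName values shown in the output above.

```python
import re

# Avro names must start with a letter or underscore, followed by
# letters, digits or underscores.
AVRO_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def to_avro_name(name: str) -> str:
    """Hypothetical helper: derive an Avro-compatible name from a free-form name."""
    # Drop every character that is not allowed in an Avro name.
    candidate = re.sub(r"[^A-Za-z0-9_]", "", name)
    # An Avro name may not be empty or start with a digit; prefix an underscore.
    if not candidate or candidate[0].isdigit():
        candidate = "_" + candidate
    return candidate


for original in ["Session Id", "User Name", "url", "mouse positions"]:
    derived = to_avro_name(original)
    assert AVRO_NAME.match(derived)
    print(f"{original!r} -> {derived!r}")
```

If you set avroName explicitly in your Simple Schema, a check like AVRO_NAME.match can help confirm that the value is valid before you create the Data Contract.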
The generated schema
The schema response contains an actual Avro avsc definition string, which has been generated from the provided Simple Schema. This definition is what STRM Privacy actually uses; the Simple Schema is only a way to create it.
If you're interested in the actual Avro Schema, you can make it a little more visible by using some jq magic.
$ strm get data-contract quickstart/demo-data-contract/1.0.0 --output json | jq '.dataContract.schema.definition | fromjson | .fields[1]'
{
"name": "SessionId",
"type": "string",
"doc": "the string value that connects events to a single sequence"
}
View the entire generated Avro schema with:
$ strm get data-contract quickstart/demo-data-contract/1.0.0 --output json | jq '.dataContract.schema.definition | fromjson'
Avoiding this complexity is exactly why we created Simple Schema. We expect that a large majority of use cases on STRM Privacy will use Simple Schemas.
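If you would rather inspect the generated Avro schema from Python than with jq, the same two-step decoding can be sketched as follows. The definition attribute is itself a JSON document stored as a string, so it has to be parsed a second time, which is what fromjson does in the jq commands. The response_text value below is a hypothetical, heavily shortened stand-in for the real CLI output, and avro_schema is an illustrative helper, not part of any STRM Privacy library.

```python
import json

# Hypothetical, shortened stand-in for the output of
# `strm get data-contract ... --output json`.
response_text = json.dumps({
    "dataContract": {
        "schema": {
            # The Avro definition is embedded as a JSON *string*.
            "definition": json.dumps({
                "type": "record",
                "name": "Clicks",
                "fields": [
                    {"name": "SessionId", "type": "string"},
                    {"name": "UserName", "type": "string"},
                ],
            })
        }
    }
})


def avro_schema(response_json: str) -> dict:
    """Parse the CLI response, then parse the embedded Avro definition string."""
    response = json.loads(response_json)
    definition = response["dataContract"]["schema"]["definition"]
    return json.loads(definition)


schema = avro_schema(response_text)
field_names = [f["name"] for f in schema["fields"]]
print(field_names)
```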
Avro namespace
The Avro definition contains a namespace attribute, which can be extracted with the following command:
$ strm get data-contract quickstart/demo-data-contract/1.0.0 --output json | jq '.dataContract.schema.definition | fromjson | .namespace'
"quickstart.DemoDataContract.v1_0_0"
It is possible to override this in the Simple Schema yaml file, but when it is absent, it is created from the data contract's <handle>/<name>/<version>.
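As an illustration of that default, the following Python sketch reproduces the namespace shown above from the data contract reference. The function default_namespace is hypothetical, and its rules (handle kept as-is, name converted from kebab-case to PascalCase, version prefixed with v and dots replaced by underscores) are inferred from this single example only; the actual derivation in STRM Privacy may differ for other inputs.

```python
def default_namespace(ref: str) -> str:
    """Hypothetical reconstruction of the default Avro namespace for a data contract."""
    handle, name, version = ref.split("/")
    # demo-data-contract -> DemoDataContract
    pascal_name = "".join(part[:1].upper() + part[1:] for part in name.split("-"))
    # 1.0.0 -> v1_0_0 (dots would otherwise split the namespace into segments)
    safe_version = "v" + version.replace(".", "_")
    return f"{handle}.{pascal_name}.{safe_version}"


print(default_namespace("quickstart/demo-data-contract/1.0.0"))
```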
Its value will generally only be interesting when you want to use generated programming language code to create events. In the example that follows, Python generated code is used to create some events.
Send some data with Python
python3 -m venv venv
. venv/bin/activate
strm get schema-code quickstart/demo-data-contract/1.0.0 --language=python
unzip python-avro-demo-data-contract-1.0.0.zip
cd python-avro-demo-1.0.0/
make install
pip install strmprivacy-driver
- strm get schema-code: generates Python code that knows how to serialize data that is valid for the Simple Schema you just created.
- make install: you could also do make dev-install. See the Makefile contents.
- pip install strmprivacy-driver: the STRM Privacy Python driver is accidentally excluded.
Next, follow along with the full Python example. The difference in this quickstart, however, is that the example sender_async.py code needs to be modified to use the schema code that you just generated.
Continuing, create the following two streams:
strm create stream demo
strm create stream --derived-from demo --levels 1
Create a few environment variables that can be used to start up the Python code.
clientId=$(strm get stream demo --output json | jq -r '.streamTree.stream.credentials[0].clientId')
clientSecret=$(strm get stream demo --output json | jq -r '.streamTree.stream.credentials[0].clientSecret')
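The same credentials can also be read from Python instead of jq. The sketch below only mirrors the JSON path used in the two commands above (.streamTree.stream.credentials[0]); first_credentials is a hypothetical helper, and the sample JSON with its placeholder values is made up for illustration.

```python
import json
from typing import Tuple


def first_credentials(stream_json: str) -> Tuple[str, str]:
    """Return (clientId, clientSecret) of the first credentials entry of a stream."""
    credentials = json.loads(stream_json)["streamTree"]["stream"]["credentials"][0]
    return credentials["clientId"], credentials["clientSecret"]


# Hypothetical stand-in for the output of `strm get stream demo --output json`.
sample = json.dumps({
    "streamTree": {
        "stream": {
            "credentials": [
                {"clientId": "example-id", "clientSecret": "example-secret"}
            ]
        }
    }
})

client_id, client_secret = first_credentials(sample)
print(client_id)
```

In practice, the input string could come from running the CLI command via subprocess.run with capture_output=True and text=True, and reading its stdout.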
Next, start sending events with this Python code. The HTTP status code 204 indicates that the event was accepted and processed by the STRM Privacy Event Gateway with no issues.
python3 sender.py $clientId $clientSecret
INFO:__main__:Event sent, response 204
INFO:__main__:Event sent, response 204
Since Python 3.10, you'll encounter a warning from the HTTP client on the empty 204 response:
charset_normalizer:Encoding detection on empty bytes, assuming utf_8 intention.
This is unfortunate, but can be safely ignored.
With the Web Socket endpoint, it is possible to consume the events of the input stream.
$ strm listen web-socket demo | jq
{
  "strmMeta": {
    "eventContractRef": "quickstart/demo-data-contract/1.0.0",
    "nonce": 1606491963,
    "timestamp": 1640870686866,
    "keyLink": "08ad5b5c-f71a-46ea-88b3-41e2facb6211",
    "consentLevels": [
      3
    ]
  },
  "SessionId": "session-0",
  "UserName": "ASSPO2VhVDtRvZD+8yDkrJwwm8wEvnuXtSD6",
  "url": "url-0",
  "MousePositions": [
    { "x": 353, "y": 188 },
    { "x": 60, "y": 938 }
  ]
}
- The field UserName is encrypted in the events that are received here, as we are listening via web socket to the encrypted stream.
The data that is received contains the avroName attributes, and not the original name. If you are interested in receiving the original Simple Schema names, please contact us.
Regarding the derived stream, UserName is now decrypted, and only events with at least consent for purpose 1 are processed.
With the web socket endpoint, it is also possible to listen on the derived stream for purpose 1. Here too, only events where the data subject consented to purpose 1 will be received.
$ strm listen web-socket demo-1
{
  "strmMeta": {
    "eventContractRef": "quickstart/demo-data-contract/1.0.0",
    "nonce": -2105288911,
    "timestamp": 1640870774948,
    "keyLink": "08ad5b5c-f71a-46ea-88b3-41e2facb6211",
    "billingId": "strmquickstart1585470330",
    "consentLevels": [
      1
    ]
  },
  "SessionId": "session-0",
  "UserName": "user-7",
  "url": "url-2",
  "PrijsInGb": 3.3405764,
  "MousePositions": [
    { "x": 252, "y": 992 },
    { "x": 940, "y": 265 }
  ]
}
- UserName is now decrypted
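The selection of events for the derived stream can be pictured with a small Python sketch. This is only an illustration and rests on an assumption: it treats consentLevels as a plain list of consented purposes and keeps an event when purpose 1 is in that list. The real filtering happens inside STRM Privacy and may follow different consent semantics; the has_consent helper and the sample events are hypothetical.

```python
def has_consent(event: dict, purpose: int) -> bool:
    """Assumption: an event qualifies when the purpose appears in its consentLevels."""
    return purpose in event["strmMeta"]["consentLevels"]


# Hypothetical events, reduced to the attributes that matter here.
events = [
    {"strmMeta": {"consentLevels": [3]}, "SessionId": "session-0"},
    {"strmMeta": {"consentLevels": [1]}, "SessionId": "session-0"},
    {"strmMeta": {"consentLevels": [1, 3]}, "SessionId": "session-1"},
]

# Only events with consent for purpose 1 would reach the derived stream.
derived = [event for event in events if has_consent(event, 1)]
print(len(derived))
```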
Simple Schema Reference
A simple schema is defined via the following Protobuf definitions.
message SimpleSchemaDefinition {
  string name
  string avro_name
  string namespace
  string doc
  repeated SimpleSchemaNode nodes
}

message SimpleSchemaNode {
  SimpleSchemaNodeType type
  string name
  string avro_name
  bool repeated
  bool required
  repeated SimpleSchemaNode nodes
  string doc
}

enum SimpleSchemaNodeType {
  STRING
  BOOLEAN
  FLOAT
  INTEGER
  LONG
  NODE
}
SimpleSchemaDefinition:
- name: the name of the top level entity.
- avro_name: the Avro compatible name. This field is heuristically derived from name unless you explicitly set its value in the SimpleSchema you provide. In that case it’s up to you to make sure it is valid.
- namespace: the namespace of the top level entity. This affects generated source code. It is generated from <handle>/<name>/<version> unless you explicitly set it, in which case you must make sure it’s a valid Avro namespace.
- doc: optional documentation that describes the meaning of the attribute or event.
- nodes: a list of SimpleSchemaNode entities describing attributes of the event.
SimpleSchemaNode:
- type: the SimpleSchemaNodeType of an attribute. Simple ones are STRING etc. The type can also be NODE, in which case the schema has a nested structure. In that case the nodes field must hold at least 1 child SimpleSchemaNode.
- name: the name of the attribute.
- repeated: whether the entry is repeated, so it is either a single node or a list of nodes.
- required: whether this attribute is required. When required, your sending software must fill in this value.
- nodes: when type is equal to NODE, you can add 1 or more SimpleSchemaNodes here.
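The node structure and the rule that a NODE must hold at least 1 child can be sketched in Python as follows. These dataclasses only mirror the Protobuf definitions above for illustration; they are not the generated classes or any official STRM Privacy API, and the validate function is a hypothetical helper that checks nothing beyond the NODE rule.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class SimpleSchemaNodeType(Enum):
    STRING = "STRING"
    BOOLEAN = "BOOLEAN"
    FLOAT = "FLOAT"
    INTEGER = "INTEGER"
    LONG = "LONG"
    NODE = "NODE"


@dataclass
class SimpleSchemaNode:
    type: SimpleSchemaNodeType
    name: str
    avro_name: str = ""
    repeated: bool = False
    required: bool = False
    nodes: List["SimpleSchemaNode"] = field(default_factory=list)
    doc: str = ""


def validate(node: SimpleSchemaNode) -> None:
    """Raise ValueError when a NODE (at any depth) has no child nodes."""
    if node.type is SimpleSchemaNodeType.NODE:
        if not node.nodes:
            raise ValueError(f"{node.name!r}: a NODE needs at least 1 child")
        for child in node.nodes:
            validate(child)


# The nested attribute from this quickstart: a node with two INTEGER children.
mouse_positions = SimpleSchemaNode(
    type=SimpleSchemaNodeType.NODE,
    name="mouse positions",
    repeated=True,  # assumption: the events above carry a list of positions
    nodes=[
        SimpleSchemaNode(SimpleSchemaNodeType.INTEGER, "x"),
        SimpleSchemaNode(SimpleSchemaNodeType.INTEGER, "y"),
    ],
)
validate(mouse_positions)
print(len(mouse_positions.nodes))
```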