Simple Schemas

To reduce the complexity of creating schemas in formats such as Avro (.avsc) or JSON Schema, STRM Privacy created a new format called Simple Schemas. It is a YAML-based format¹, designed to be readable and understandable by many people in your organization.

A Simple Schema is composed of only the data fields that you require. The strmMeta section is omitted from the Simple Schema representation, because the Simple Schema is translated into an Avro avsc definition for you to use when serializing data. Since Avro is well-equipped for serializing and deserializing data, there was no need to create yet another serialization format.

note

A Simple Schema is a representation of your schema; it is not a serialization format.

The quickstart Simple Schema example guides you through a hands-on walkthrough that shows the technical details of using Simple Schemas.

A Simple Schema defines a list of nodes. Each node is an entity with the following attributes:

  • name
    required
    the name that you can use to access the entity.
  • avro_name
    optional
    the Avro name of the entity, which must conform to the naming rules for Avro. It is derived from name unless it is set explicitly. Only set it if you need to override the derived name, and make sure the value is a valid Avro name.
  • type
    required
    the type of the field: an integer, string, float, or node.
  • repeated
    optional
    defines whether the field can occur more than once (i.e. is a list). Defaults to false.
  • required
    optional
    defines whether the sender must fill in this field value. Defaults to false.
  • doc
    optional
    documents the purpose of the field.
  • nodes
    optional
    holds child nodes for nested data structures. This is only valid when the type is NODE.
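
The attribute rules above can be sketched as a small Python model. This is an illustration only, not part of any STRM Privacy tooling: the Node class, the problems method and the NODE_TYPES set are hypothetical names, and FLOAT is assumed to be spelled like the STRING, INTEGER and NODE values used elsewhere on this page. The Avro name pattern follows the Avro specification (a letter or underscore, followed by letters, digits or underscores). How avro_name is derived from name is not described here, so the sketch only validates an explicitly set value.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Avro names: a letter or underscore, then letters, digits or underscores.
AVRO_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# Assumed spellings of the type values; FLOAT is an assumption.
NODE_TYPES = {"INTEGER", "STRING", "FLOAT", "NODE"}


@dataclass
class Node:
    """Hypothetical in-memory model of one Simple Schema node."""
    name: str                          # required
    type: str                          # required
    avro_name: Optional[str] = None    # optional override of the Avro name
    repeated: bool = False             # optional, defaults to false
    required: bool = False             # optional, defaults to false
    doc: Optional[str] = None          # optional documentation
    nodes: List["Node"] = field(default_factory=list)  # children, NODE only

    def problems(self) -> List[str]:
        """Return rule violations for this node and all of its children."""
        found = []
        if not self.name:
            found.append("name is required")
        if self.type not in NODE_TYPES:
            found.append(f"{self.name}: unknown type {self.type!r}")
        if self.nodes and self.type != "NODE":
            found.append(f"{self.name}: nodes is only valid when type is NODE")
        if self.avro_name is not None and not AVRO_NAME.fullmatch(self.avro_name):
            found.append(f"{self.name}: {self.avro_name!r} is not a valid Avro name")
        for child in self.nodes:
            found.extend(child.problems())
        return found


# A nested, repeated node that follows all of the rules.
mouse = Node(
    name="mouse positions",
    type="NODE",
    repeated=True,
    nodes=[Node(name="x", type="INTEGER"), Node(name="y", type="INTEGER")],
)
print(mouse.problems())  # prints []
```

Keeping the defaults in the model mirrors the list above: a node that only sets name and type is neither repeated nor required.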

An example of a Simple Schema:

name: Clicks
nodes:
  - name: SessionId
    type: STRING
    doc: the string value that connects events to a single sequence
    required: true
    repeated: false
  - name: User Name
    type: STRING
    doc: we use a data contract to define that this is private
  - name: url
    type: STRING
    doc: the URL of the current page
  - name: mouse positions
    repeated: true
    type: NODE
    nodes:
      - name: x
        type: INTEGER
      - name: "y"
        type: INTEGER

caution

YAML accepts many spellings of the boolean value true. The "y" in the example above is quoted because YAML would otherwise resolve an unquoted y to true.
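
To guard against this when generating node names, a helper along the following lines could quote the risky ones. It is a sketch under assumptions: quote_if_boolean and YAML_11_BOOLEANS are hypothetical names, the list is the set of boolean spellings defined for YAML 1.1, and which of them a particular parser actually resolves to a boolean varies.

```python
# Unquoted scalars that YAML 1.1 defines as booleans.
YAML_11_BOOLEANS = {
    "y", "Y", "yes", "Yes", "YES",
    "n", "N", "no", "No", "NO",
    "true", "True", "TRUE",
    "false", "False", "FALSE",
    "on", "On", "ON",
    "off", "Off", "OFF",
}


def quote_if_boolean(name: str) -> str:
    """Wrap the name in double quotes if YAML 1.1 could read it as a boolean."""
    return f'"{name}"' if name in YAML_11_BOOLEANS else name


for field_name in ("x", "y"):
    print(f"- name: {quote_if_boolean(field_name)}")
```

For the two coordinate fields in the example, this leaves x as it is and emits "y" in quotes.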

¹ The shortcomings and challenges of YAML are well known, but readability and simplicity were the major motivations for using YAML for Simple Schemas.