Standard Data Exchange formats

The Data Exchange library allows you to flexibly map any JSON, XML format, and CSV or Excel sheet onto appropriate identifiers in your AIMMS model. Although not difficult, creating such mappings may considerable time to create. In this section we will describe how to create standard Data Exchange formats for JSON, XML, CSV and Excel based on the identifiers in your model, just by specifying which identifiers to take together in a single table through annotations specific to the Data Exchange library.

In this section we’ll describe two standard formats that can be generated by the Data Exchange library:

  • a row-based format, with multiple dataset and tables

  • a document-based format

The row-based format

To introduce the concepts behind the row-based standardized Data Exchange formats offered by the Data Exchange library, we’ll describe a generic pattern used by many customers to set up application databases with their models, which we will use as a base for constructing the standardized formats.

Commonly used principles in setting up application databases

Application databases for AIMMS models, typically group identifiers with an identical domain into a single relational table, with a separate column in the primary key of the table for each index in the shared index domain and a non-primary column for every multi-dimensional identifier. To allow for storing multiple datasets/scenarios in such a table, typically an additional dataset/scenario key is added to the primary key of the table. Typically, the dataset/scenario key is shared among multiple tables.

When exchanging data with such an application database, typically, the data for an dataset/scenario is read from, or written to, the tables holding the data for these datasets/scenarios in its entirety. Replacing the data of individual rows or columns within a dataset/scenario, may lead to consistency problems with application instances that have already read the data in the dataset/scenario prior to updating individual rows/columns. Thus, even when only some data is changed, creating a complete new dataset seems a much easier and safer approach, especially as it is fairly easy to clean up old and, potentially, replaced datasets. In case updating individual rows and columns is essential, the use of CDM seems to be the more appropriate solution to make sure that data between multiple sessions is properly synced.

Design principles of standard Data Exchange format

Thus, we arrive a common and generic framework that will allow to exchange almost any data with an AIMMS model:

  • datasets to organize a number of tables holding data describing a particular functional aspect of an application

  • tables to hold the data of one or more instances of datasets for a number of AIMMS identifiers with an identical domain, where the columns either hold the dataset instance, and the indices and values of the identifiers represented in the table.

Once all potential tables and datasets have been specified, it is fairly easy to generate Data Exchange mappings to read or write JSON, XML, CSV or Excel files that contain the data of single tables, all tables in a single datasets, or all data associated with all datasets.

Example

The following JSON data contains the data of an instance of two datasets, each containing two tables.

{
  "dataSets": {
    "DataSet1": {
      "instance" : "data of 07-09-2020"
      "tables": {
        "Table1": [
          { "i":"i1", "j":1, "pn": 10.0, "ps": "a value"},
          { "i":"i1", "j":2, "pn": 20.1, "ps": "another value"}
        ],
        "Table2": [
          { "i":"i1", "j":1, "k":3, "qn": 10.0, "qs": "a value"},
          { "i":"i1", "j":2, "k":4, "qn": 20.1, "qs": "another value"}
        ]
      }
    },
    "DataSet2": {
      "instance" : "data of 07-09-2020"
      "tables": {
        "Table3": [
          { "i":"i1", "rn": 10.0, "rs": "a value"},
          { "i":"i2", "rn": 20.1, "rs": "another value"}
        ],
        "Table4": [
          { "i":"i1", "k":3, "sn": 10.0, "ss": "a value"},
          { "i":"i1", "k":4, "sn": 20.1, "ss": "another value"}
        ]
      }
    }
  }
}

Uses of the row-based format

The standard Data Exchange format discussed above is flexible enough to support a range of scenarios for integrating an AIMMS model into the wider IT landscape:

  • The format allows a standardized approach from calling external APIs from within an AIMMS model using the Asynchronous HTTP Client. When calling web services to call Python or R scripts, e.g., to apply ML/AI algorithms to data passed from the model, or to retrieve results from applying ML/AI algorithms to external data retrieved from these scripts, the format can be easily read into, or generated from, Pandas in Python or Dataframes in R.

  • The format would be the natural candidate to call AIMMS models through a REST API, as it can be readily generated from any environment.

  • Based on the concepts of datasets and tables, it easy to generate an application database from the model annotations, and to create a web service that allows data exchange with such an application database using the standard format.

Generating the dataset mapping files from annotations

To create the mapping between multi-dimensional identifiers and datasets, tables and column names, you can use the following model annotations:

  • dex::Dataset

  • dex::TableName

  • dex::AutoTablePrefix

  • dex::ColumnName

  • dex::SuffixList

  • dex::ExtraAttributeList

  • dex::RowFilter

Through the dex::TableName annotation you can indicate for multi-dimensional identifiers and/or sections of multi-dimensional identifiers, to which table they should belong. The Data Exchange library will verify that all identifiers share a common index domain, and return an error if this is not the case. You can use the dex::ColumnName annotation to indicate a column name for multi-dimensional identifiers and indices. If you don’t specify an explicit column name, the Data Exchange library will use the identifier name as the implicit column name. Instead of using annotations, you can also directly set the column name for specific identifiers via the identifier dex::ColumnName. If you want to set the column name for columns in a table associated with indices, you can either create a separate index node, and use the dex::ColumnName attribute, of assign the column name to be used for the index to the string parameter dex::ColumnName.

Alternatively to creating table names yourself through the dex::TableName annotation, you can also let the Data Exchange library create tables names by specifying the dex::AutoTablePrefix annotation. For every identifier, with this annotation set, for which you didn’t specify an explicit table name, the Data Exchange library will auto-generate a table name, by starting with the dex::AutoTablePrefix, followed by all indices in the declaration of the identifier, separated by underscores. This will create tables where all identifiers with the same collection of indices will end up in the same table. All scalar identifiers will be assigned to the table dex::AutoTablePrefix followed by _scalar. Through the global option dex::PrefixAutoTableWithDataset you can prefix the generated table names with the specified dataset name, to prevent potential name clashes when the same table name is generated for multiple data categories.

By assigning the dex::Dataset annotation to specific identifiers or sections of identifiers, the Data Exchange library will deduce the mapping between datasets and tables. Typically one would assign the dex::TableName and dex::Dataset to sections of identifiers with identical domains. If any identifier is both mapped to a table and a dataset, the combination will be assigned to dex::DatasetTableMapping. Instead of using the dex::Dataset annotation, you can also assign 1 to specific combinations of tables and datasets in the identifier dex::DatasetTableMapping directly.

Through the dex::SuffixList annotation you can specify the extra suffices (next to the level value) that you want to add the to the set of columns of the table to which the identifier itself is added. The format of the of the dex::SuffixList is as follows

<suffix-1>[=<suffix-1-columnname>][;<suffix-2>[=<suffix-2-columnname>]];...

If you do not explicitly specify column names in the semi-colon-separated list of suffices, the column names will be <identifier>.<suffix>.

With the dex::ExtraAttributeList annotation you can specify any additional mapping attributes that you want to have added to the mapping generated for a specific identifier. The value of the dex::ExtraAttributeList annotation is a semi-colon-separated list.

<annotation>=<value>[;<annotation>=<value];...

where <value> is the literal text that you want assign to the annotation <annotation>. If you want to set the extra attributes for columns in a table associated with indices, you can either create a separate index node, and use the dex::ExtraAttributeList attribute, of assign the column name to be used for the index to the string parameter dex::ExtraAttributeList. When you add name-binds-to and name-regex attributes to an multi-dimensional identifier, the Data Exchange library will pick this up, and use the dimension reduced by one to compare with the number of bound indices of each row, because using the name-binds-to index will cause the values of the last index to appear as columns in the generated table. If you have specified multiple identifiers with a name-binds-to attribute over the same set, you should use the name-regex-prefix or name-regex-postfix attributes to distinguish which columns are associated with which identifier.

You can use the annotation dex::RowFilter to specify an identifier that should serve as a write-filter attribute for the rows being generated in the mapping. The identifier should have the same indices as all identifiers in the table. The RowFilter annotation should be the same for all identifiers in a specific table. If all identifiers in a table are contained in a single section in your model, you can best add the annotation to that section, in which case all identifiers in the section will inherit it. With the row filter you can limit the number of rows being generated when writing a file using the mapping.

To generate all annotation-based mapping, you can call the procedure dex::GenerateDatasetMappings(). This will generate Data Exchange mappings in the subfolder Mappings/Generated in the main project folder. The following mappings will become available for every <dataset> and <table>:

Mapping

Description

JSONDataset

all tables for all datasets in a single JSON file

XMLDataset

all tables for all datasets in a single XML file

Generated/<dataset>-Excel

all tables for dataset <dataset> in a single Excel file (one sheet per table; only if data is available for that sheet)

Generated/<dataset>-Excel-all-sheets

all tables for dataset <dataset> in a single Excel file (one sheet per table; regardless whether data is available for that sheet)

Generated/<dataset>-<table>-JSON-Sparse

table <table> in dataset <dataset> in a single sparse JSON file (only non-default data)

Generated/<dataset>-<table>-JSON-Dense

table <table> in dataset <dataset> in a single dense JSON file (also default data)

Generated/<dataset>-<table>-JSON-RowOriented

table <table> in dataset <dataset> in a single row-oriented JSON file (array of row arrays)

Generated/<dataset>-<table>-JSON-ColumnOriented

table <table> in dataset <dataset> in a single column-oriented JSON file (array of column arrays)

Generated/<dataset>-<table>-XML-Sparse

table <table> in dataset <dataset> in a single sparse XML file (indices as attributes; values as elements; only non-default data)

Generated/<dataset>-<table>-XML-SparseAttribute

table <table> in dataset <dataset> in a single sparse XML file (indices and values as elements; only non-default data)

Generated/<dataset>-<table>-XML-Dense

table <table> in dataset <dataset> in a single dense XML file (indices as attributes; values as elements; also default data)

Generated/<dataset>-<table>-XML-DenseAttribute

table <table> in dataset <dataset> in a single dense XML file (indices and values as elements; also default data)

Generated/<dataset>-<table>-CSV

table <table> in dataset <dataset> in a single CSV file

Adding extra columns

Through the parameter dex::ExtraTableValueColumns you can add extra string columns to a table with the value attribute set to the string value of this parameter. If that string value starts with the # character, it will be replaced with the content of the memory stream whose name matches the string value when writing files using these generated mappings.

For the special case where you want to add the name of the dataset instance, you can also set the string parameter dex::DatasetColumnName to the column name to be added to every generated table to hold the name of the dataset instance. The value attribute of these columns will be set to #<dataset>-instance, where <dataset> is the name of the appropriate dataset for each table. Prior to writing you have to provide the instance name for in the memory stream named #<dataset>-instance.

Limiting the mappings generated

Through the parameter dex::DatasetGeneratorFilter(gt,mm,amt) you can indicate which mappings you want the Data Exchange library to generate. By default, only Excel, CSV and Parquet will be generated. The parameter dex::DatasetGeneratorFilterDomain(gt,mm,amt) holds all the allowed formats, and you can copy the values of it to obtain all possible mappings.

Generating dataset mappings containing external bindings

By default the generated mappings will iterate over the complete domain of all identifiers contained in each table. Via the element parameter dex::ExternalTableBinding(tbl,extbind,IndexIndices) you can define a collection of external bindings for a collection of initial indices of all identifiers contained in table tbl. The value of the element parameter should be the element parameter to which the index is externally bound. Via the parameter dex::LeaveExternalBindsToIntact(tbl,IndexIndices) you can indicate whether to skip the column for the externally bound index (default), or to retain this column in the table.

Generating multiple mappings

You can generated multiple collections of dataset mappings by setting the string parameter dex::MappingPrefix to a prefix that will be used for every generated mapping. You can use this for instance to create a mapping with and without external bindings.

A document-based JSON format

Next to a table-based format, the Data Exchange library can also generate a document-based nested JSON format, where sets are regarded as a collection of objects with attributes, which, in its turn, may refer to sets translated into objects which now displays data defined over both sets. This may lead to JSON documents such as:

{
  "documentDate" : "2021-10-29T10:00:00Z",
  "Countries" : [
    {
        "country" : "The Netherlands",
        "population" : 17651600,
        "capital" : "Amsterdam",
        "Provinces" : [
            {
                "province" : "Drente",
                "capital"  : "Assen",
                "..." : "..."
            },
            "..." : "..."
        ],
        "..." : "..."
    },
    "..." : "..."
  ],
  "..." : "..."
}

This document could be generated on the basis of the following identifiers in the model

  • documentDate

  • CountryPopulation(country)

  • CountryCapital(country)

  • ProvinceCapital(country,province)

where country is an index in the set Countries, and province an index in the set Provinces.

Generating the document mapping files from annotations

To create the mapping between multi-dimensional identifiers and datasets, tables and column names, you can use the following model annotations:

  • dex::Document

  • dex::FieldName

  • dex::SuffixList

  • dex::ExtraAttributeList

Through the annotation dex::Document you can indicate the document name in which you want the identifier to be included.

By default, the field name associated with a particular identifier is identical to the identifier name. With the annotation dex::FieldName you can override this default. You can also set this annotation for the indices in the domain of the multi-dimensional identifiers used in the document and their associated sets.

The suffices dex::SuffixList and dex::ExtraAttributeList can be used in a similar ways as with the dataset mappings.

To generate all annotation-based mapping, you can call the procedure dex::GenerateDocumentMappings(). This will generate Data Exchange mappings in the subfolder Mappings/Generated in the main project folder, one per document mapping.

Creating your own annotation-based formats

The standard formats described in this section make some arbitrary choices in representing the data in various formats. However, they all follow the same pattern for generating them, which you can follow to generate your own annotation-based data exchange formats.

The basis for generating a new annotation-based format is a generator mapping, which is an XML mapping specifying how to generate a Data Exchange mapping for a JSON, XML, CSV or Excel document type, based on the contents of identifiers in the Data Exchange library.

  • The generator mappings for the formats in this section are contained in the Mappings/Generators subfolder of the Data Exchange library.

  • The section Public Section/Dataset Mapping Generators of the Data Exchange library contains the identifiers used by the dataset mappings. It also contain a procedure to read the dataset annotations and fill the dataset-related identifiers, as well as a procedure to generate the dataset mappings based on this data.

  • The section Public Section/Document Mapping Generators of the Data Exchange library contains the identifiers used by the document mappings. It also contain a procedure to read the document annotations and fill the document-related identifiers, as well as a procedure to generate the document mappings based on this data.

To define your own annotation-based formats, you can create a new library, for which you can specify the annotations you want to make available in the model in the file AnnotationDefinitions.xml in the Settings subfolder of your library (taking the AnnotationDefinitions.xml file from the Data Exchange library as an example). Subsequently, you can take any of the given generator mappings, and generating data and procedures as an example to create your own custom annotation-based mapping.