Azure Data Lake Storage
Every AIMMS Cloud account is equipped with an Azure Data Lake Storage account. This storage account can be used for storing application data, for instance, as a collection of Parquet files. Parquet files can serve as a faster alternative to storing application data in a database. Storing multiple scenarios of application data is easily accomplished by storing each scenario as a separate collection of Parquet files in its own directory within a Data Lake Storage file system.
Many applications can work with Parquet files, which makes the format an ideal vehicle for data integration with other applications. Data platforms such as Databricks are built on top of the Parquet format, and Snowflake has integrated support for importing and exporting data through Parquet files. Tools such as PowerBI can directly access Azure Data Lake Storage and work with the data offered in Parquet files.
Because of this ease of use, straightforward integration, performance, and space efficiency, every AIMMS Cloud account has been equipped with an Azure Data Lake Storage account. The Data Exchange library offers an easy-to-use collection of functions to create file systems within this account, and to upload and download collections of files from within AIMMS.
File systems
Within a single Azure Data Lake Storage account, multiple file systems can be created. These file systems can be mounted as an HDFS volume in platforms such as Apache Spark, or be directly connected to PowerBI for reporting.
You can create file systems for multiple purposes, such as
a storage area for the application data and/or scenario data used by the AIMMS application itself. Ideally, such a file system is readable and writable only by the AIMMS application.
an import area where external applications can write datasets to be used by an AIMMS application. This area would need to be writable by external applications.
an export area where the AIMMS application can write datasets to be used by external applications. This area would need to be readable by external applications.
Managing file systems
Once the authorization details of your Data Lake Storage account have been configured, the Data Exchange library offers a number of functions to create, delete, and list the file systems in your Data Lake Storage account. This allows you to create one or more file systems for the different purposes described above.
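As an illustration, a possible workflow could look like the sketch below. The function names used here (dex::dls::CreateFileSystem, dex::dls::ListFileSystems, dex::dls::DeleteFileSystem) and their signatures are assumptions for illustration only; consult the Data Exchange library reference for the exact names available in your version.
! Sketch only: function names and signatures below are assumptions, not the definitive API.
! Create separate file systems for the purposes described above.
dex::dls::CreateFileSystem("app-data");     ! private storage for the AIMMS application
dex::dls::CreateFileSystem("import-area");  ! written to by external applications
dex::dls::CreateFileSystem("export-area");  ! read by external applications
! List all file systems currently present in the storage account
! (sFileSystems is a set declared in your own model).
dex::dls::ListFileSystems(sFileSystems);
! Remove a file system that is no longer needed.
dex::dls::DeleteFileSystem("import-area");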
Transferring files
Once you have a file system ready, the Data Exchange library offers a collection of functions to upload or download a single file, or the contents of an entire directory, to or from a path within a file system in your Data Lake Storage account.
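A possible upload/download workflow is sketched below. The function names and argument orders (dex::dls::UploadFile, dex::dls::DownloadFile, dex::dls::UploadDirectory, dex::dls::DownloadDirectory) are assumptions for illustration, as are the file system and path names; check the Data Exchange library reference for the actual API.
! Sketch only: function names, argument orders, and paths are assumptions.
! Upload a single local file to a path inside the "export-area" file system.
dex::dls::UploadFile("export-area", "results/plan.parquet", "data/plan.parquet");
! Upload the contents of a local directory in one call.
dex::dls::UploadDirectory("export-area", "results", "data/results");
! Download a file or a whole directory written by an external application.
dex::dls::DownloadFile("import-area", "incoming/demand.parquet", "data/demand.parquet");
dex::dls::DownloadDirectory("import-area", "incoming", "data/incoming");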
Support for storing data of generated datasets in DLS as a collection of Parquet files
The Data Exchange library introduced datasets generated from model annotations as a convenient way to store your application data in various row-based formats. Given the large amounts of data typically involved in optimization models, we prefer Parquet for storing application data: it offers compact storage, fast reading and writing, and wide usability because of its open-source nature. To store and exchange application data in an easily shareable way, the Data Exchange library therefore lets you generate all data in a dataset as Parquet files and upload them to Azure Data Lake Storage in a single call, and, conversely, download Parquet files from Azure Data Lake Storage and read the data back into the model.
For any given generated dataset in your model, you can call one of the functions
dex::dls::WriteDataSetByTable(dataset, instance)
dex::dls::WriteDataSetByInstance(dataset, instance)
to let the Data Exchange library generate a collection of Parquet files for the given instance of the given dataset from the current data in your model, and store them in the configured Azure Data Lake Storage account. The two functions differ in the way they organize the data in Azure Data Lake Storage. Both work within a container in the configured account, pointed to by dex::dls::DatasetsByTable and dex::dls::DatasetsByInstance, respectively. Inside this container, the Parquet files are organized as follows:
dex::dls::WriteDataSetByTable: <dataset>/<table>/<instance>.parquet
dex::dls::WriteDataSetByInstance: <dataset>/<instance>/<table>.parquet
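As an example, writing the current model data of a hypothetical generated dataset PlanningData for a scenario instance could look as follows. The dataset, instance, and container names are illustrative, and it is assumed here that dex::dls::DatasetsByTable is a string parameter holding the container name.
! Sketch only: dataset, instance, and container names are illustrative.
! Point the library at the container to use for table-organized datasets.
dex::dls::DatasetsByTable := "datasets-by-table";
! Write all tables of the generated dataset "PlanningData" for the instance
! "scenario-2024-Q1" as Parquet files, i.e. PlanningData/<table>/scenario-2024-Q1.parquet.
dex::dls::WriteDataSetByTable("PlanningData", "scenario-2024-Q1");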
Conversely, you can read back the data associated with an instance of a generated dataset through the functions
dex::dls::ReadDataSetByTable(dataset, instance)
dex::dls::ReadDataSetByInstance(dataset, instance)
You can use these functions to set up a straightforward storage scheme for your application data, which also makes it easy to exchange data with other applications by providing them with a SAS token to your Azure Data Lake Storage container.
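For instance, reading the scenario written in the earlier example back into the model, possibly in a later session, is then a single call (using the same illustrative dataset and instance names):
! Read the Parquet files of instance "scenario-2024-Q1" of dataset "PlanningData"
! back from Azure Data Lake Storage into the model.
dex::dls::ReadDataSetByTable("PlanningData", "scenario-2024-Q1");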