6 Data Export
In the SPARK data export, the data of a project can be downloaded as one or more comma-separated values (.csv) files. These files follow the conventions below.
The filename has the format organizationId_projectId{*}.csv, where the {*} part is empty when a full dataset is exported, and denotes the selection of data in that file when the data is downloaded as separate files (see below). Note that organization identifiers (organizationId) are set by the organization administrator and are not persistent unique identifiers such as ROR identifiers (Research Organization Registry identifiers). Project identifiers are set by the project owner and have to be unique within the organization, but can collide with project identifiers in other organizations.
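The filename convention can be sketched as a small helper. This is only an illustration: the function name export_filename, its parameters, and the example identifiers (acme, sleepStudy) are hypothetical and not part of SPARK.

```python
def export_filename(organization_id: str, project_id: str, selection: str = "") -> str:
    """Build the filename expected for an export (illustrative helper).

    `selection` is the text that takes the place of the {*} placeholder,
    given without its leading underscore. Leave it empty for a full export.
    """
    suffix = f"_{selection}" if selection else ""
    return f"{organization_id}_{project_id}{suffix}.csv"


# Hypothetical identifiers, used only for illustration.
full_export = export_filename("acme", "sleepStudy")
answers_only = export_filename("acme", "sleepStudy", "answer")
print(full_export)   # acme_sleepStudy.csv
print(answers_only)  # acme_sleepStudy_answer.csv
```

This section does not specify whether identifiers may themselves contain underscores. If they can, splitting a filename back into its parts is ambiguous, so it is safer to build filenames from known identifiers than to parse them.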
The data will have the following column names and contents:
- userId: the unique identifier of the user in the project.
- timestamp: the date, time, and timezone in ISO 8601 format (e.g. 2024-05-23T21:47:27+07:00).
- datasourceType: the type of data source for which this row contains information. This can have the following values:
  - answer: a participant’s answer to a question
  - storage_node: manually stored data in a decision tree Data Storage Node
  - storage_question: manually stored data in a Data Storage Question
  - storage_state: manually stored data in a state transition
  - spark_metadata: SPARK behavior metadata
- datasourceProvenance: the data source provenance: contextual information specifying the data origins. This can have the following values:
  - for data of the answer data source type, the identifier of the question set
  - for data stored through a node, question, or state storage specification, the identifier of the relevant data source (e.g. of the third-party API)
  - for SPARK behavior metadata, in the current version of the SPARK, the only valid value is treePath, signifying that the value specified in this row is a Decision Tree Path string.
- datasourceId: the identifier of the relevant data source. This has one of the following values:
  - for data of the answer data source type, the identifier of the question to which the answer belongs.
  - for data of one of the three storage types (storage_node, storage_question, or storage_state), the identifier of that data storage directive.
  - for data of the spark_metadata data source type, the relevant identifier; for a treePath, this is the identifier of the relevant tree.
- value: the relevant value: the actual data point. For Tree Paths, this lists the node identifiers that were traversed from the tree’s root (the initial node, which has the tree identifier as its identifier) to the ultimate leaf (the terminal node), separated by a greater-than sign, for example treeId > firstNodeId > secondNodeId > terminalNodeId.
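As an illustration of the column layout above, the following sketch reads export rows with Python's standard library. It assumes that the file starts with a header row carrying the column names and that fields are comma-delimited; neither detail is confirmed here beyond the column names themselves. The function names and the sample identifiers (u1, q1) are hypothetical.

```python
import csv
import io
from datetime import datetime


def parse_tree_path(value: str) -> list:
    """Split a Decision Tree Path string into its node identifiers."""
    return [node.strip() for node in value.split(">")]


def read_export(text: str) -> list:
    """Parse export CSV text into a list of row dictionaries."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # ISO 8601 timestamps with a UTC offset parse into aware datetimes.
        row["timestamp"] = datetime.fromisoformat(row["timestamp"])
        is_tree_path = (
            row["datasourceType"] == "spark_metadata"
            and row["datasourceProvenance"] == "treePath"
        )
        if is_tree_path:
            row["value"] = parse_tree_path(row["value"])
        rows.append(row)
    return rows


# Sample data made up for illustration; assumes a header row is present.
sample = (
    "userId,timestamp,datasourceType,datasourceProvenance,datasourceId,value\n"
    "u1,2024-05-23T21:47:27+07:00,answer,moodQuestions,q1,4\n"
    "u1,2024-05-23T21:48:02+07:00,spark_metadata,treePath,treeId,"
    "treeId > firstNodeId > secondNodeId > terminalNodeId\n"
)
rows = read_export(sample)
print(rows[1]["value"])
```

All other values are left as strings, since their type depends on the question or storage directive that produced them.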
When downloading the data, users can choose whether they want to download:
- one .csv file;
- a .zip archive containing separate .csv files for each value of datasourceType (in which case the filenames contain, in place of the {*} placeholder (see above), either _answer, _storage_node, _storage_question, _storage_state, or _spark_metadata);
- a .zip archive containing separate .csv files for each value of datasourceProvenance (in which case the filenames contain, in place of the {*} placeholder (see above), either an underscore immediately followed by the question set identifier, e.g. _moodQuestions; or an underscore immediately followed by the data source API, e.g. _heartRate; or _treePath);
- a .zip archive containing separate .csv files for each value of userId (in which case the filenames contain, in place of the {*} placeholder (see above), an underscore immediately followed by the relevant user identifier).
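When the data is downloaded as a .zip archive, the separate files can be recombined into one list of rows. The sketch below is a minimal example under stated assumptions: the files are UTF-8 encoded, each file starts with a header row, and the archive is built in memory here purely for demonstration. The function name read_export_archive, the added sourceFile key, and the example filenames are hypothetical.

```python
import csv
import io
import zipfile


def read_export_archive(archive_bytes: bytes) -> list:
    """Read every .csv member of an export archive into one list of rows."""
    rows = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for name in sorted(archive.namelist()):
            if not name.endswith(".csv"):
                continue
            with archive.open(name) as member:
                # Assumption: the export files are UTF-8 encoded.
                text = io.TextIOWrapper(member, encoding="utf-8", newline="")
                for row in csv.DictReader(text):
                    row["sourceFile"] = name  # remember which file the row came from
                    rows.append(row)
    return rows


# Build a small archive in memory, split by datasourceType (made-up data).
header = "userId,timestamp,datasourceType,datasourceProvenance,datasourceId,value\n"
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "acme_sleepStudy_answer.csv",
        header + "u1,2024-05-23T21:47:27+07:00,answer,moodQuestions,q1,4\n",
    )
    archive.writestr(
        "acme_sleepStudy_spark_metadata.csv",
        header + "u1,2024-05-23T21:48:02+07:00,spark_metadata,treePath,treeId,treeId > n1 > n2\n",
    )

all_rows = read_export_archive(buffer.getvalue())
print(len(all_rows))  # 2
```

Because every file shares the same columns, the combined rows carry the same information as a single full export, whichever split was chosen at download time.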