
The HD-ASCII File Format

History

This component implements the HD-ASCII file format, originally designed and implemented in Matlab by Jan Simon for use in the Heidelberg Gaitlab.

General features

The HD-ASCII file format is a container format for any data organized in matrices. The data types are modeled on the types available in the Matlab environment. Each matrix of data is named, but the matrices appear in no defined order.

Disadvantages

The HD-ASCII file format is used extensively in the Heidelberg Gaitlab, but it is not recommended for Nimue-based applications because of several limitations:

  • The precision of the numbers cannot be defined adequately. This can result in elusive bugs.
  • To find out which data is available, the complete file must be read, because the data has no defined order and there is no header listing the labels.
  • If time series are saved together with additional parameters, the length of the time series cannot be detected, because it is not defined which dimension of a multidimensional matrix corresponds to time.
  • It is not possible to save data from more than one trial, e.g. the data of a complete session.
  • It mixes data types with saved data. Each matrix has a header, and if this header ends with a "0", no data is saved for this matrix.

All of these limitations are overcome by Nimue's default open and flexible .d3d file format.

File extensions and MIME types

To distinguish different sets of data, the Heidelberg Gaitlab defines different file suffixes.

File extension | MIME type | Frames | Description | Fixed set of variables
.glk | | single trial | Lower Body Kinetics (PiG) | X
.glkn | | single trial | Time normalized .glk-files | X
.glm | text/x-glm | single trial | Lower/Upper Body Kinematics (PiG) | X
.glmn | | single trial | Time normalized .glm-files | X
.gle | text/x-gle | single trial | EMG data | X
.glen | | single trial | Time normalized .gle-files | X
.gla | text/x-gla | mean | Lower/upper body kinematics (PiG) average files | X
.pkl | | | Meta Data | X
.glx | text/x-glm | single trial | Data of single gait trials, without time-normalisation, experimental set of variables and parameters, typically an extended set of the data written in glm-files. |
.glxn | | single trial | time normalized .glx |
.gxa or .gxa | text/x-gax | mean | * |
.gaf | text/x-gaf | mean | Foot Kinematics | X
.glf | | single trial | Foot Kinematics | X
.glfn | | mean (Session mean; normalized to 101 frames) | time normalized .glf | X
.gnm | text/x-gnm | | |

If a single trial is split into strides and time-normalized, a separate file is created for each stride. These files are numbered as the following example shows:

xxx.glx –> xxx_00.glxn, xxx_01.glxn, xxx_02.glxn
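The numbering scheme above can be sketched in a few lines of Python. The function name `stride_filenames` is hypothetical, and the pattern (two-digit index counted from 0, original suffix plus a trailing `n`) is inferred only from the example line, so treat it as an assumption.

```python
from pathlib import PurePath


def stride_filenames(trial_file: str, n_strides: int) -> list[str]:
    """Return the names of the per-stride, time-normalized files of one trial."""
    path = PurePath(trial_file)
    # Assumed pattern: <stem>_<two-digit index><original suffix>n, counted from 0.
    return [f"{path.stem}_{i:02d}{path.suffix}n" for i in range(n_strides)]


print(stride_filenames("xxx.glx", 3))
```

For the trial file `xxx.glx` with three strides this yields `xxx_00.glxn`, `xxx_01.glxn` and `xxx_02.glxn`.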

General syntax

Most of the following documentation is a formatted copy of Jan's original documentation.

The HD-ASC files were designed as simple, human-readable data files that are not tied to particular computer types or to particular software for access. Therefore the data is saved in ASCII mode with the 7-bit character set. A file can include an arbitrary number of variables, which can have the types double, string (which is a character array) or string list. Each variable can have an arbitrary number of dimensions of any size, limited only by the physical properties of the computer system. In addition, the first line contains a header with a section for a user-defined control string. The header also defines the number of significant figures for double values.

Details

The order of variables in the file is arbitrary.

Although there is no reason to prefer DOS, ASC-HD style files should use DOS line breaks, which have the ASCII codes [13, 10]. The files end with a trailing line break, as is usual on Unix.
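A minimal Python sketch of these two rules (DOS line breaks, trailing line break) together with the 7-bit restriction is shown here. The function name `encode_asc_hd` is hypothetical and not part of PutASC.

```python
def encode_asc_hd(lines: list[str]) -> bytes:
    """Join lines with DOS line breaks (CR LF), including a trailing one."""
    text = "".join(line + "\r\n" for line in lines)
    # encode() raises UnicodeEncodeError for anything outside 7-bit ASCII.
    return text.encode("ascii")


data = encode_asc_hd(["#!ASCII v4.0 ASC-HD [Digits 6]", "[A]", "2"])
print(data)
```

Building the bytes explicitly avoids depending on the line-break translation of the platform the writer runs on.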

The names of the variables are built from the characters 'a' to 'z', 'A' to 'Z', '0' to '9' and '_', with a leading letter. Dots '.' are allowed to separate sub-variables, so they must not be leading, trailing or adjacent to each other.

Examples of valid names: A, A1, A2_, A_B, A.C, A.D.E, b, bcd

Examples of invalid names: 1A, _B, .C, D., e..f, g.h., j$
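The naming rule can be expressed as a regular expression. The following Python sketch is checked against the examples above. It assumes that the part after a dot may start with any allowed character, since the text only requires a leading letter for the whole name. `is_valid_name` is a hypothetical helper.

```python
import re

# Assumption: only the complete name needs a leading letter; a sub-variable
# after a dot may start with any of the allowed characters.
NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*(?:\.[A-Za-z0-9_]+)*")


def is_valid_name(name: str) -> bool:
    """Check a variable name against the ASC-HD naming rule."""
    return NAME.fullmatch(name) is not None


for candidate in ["A.D.E", "A2_", "e..f", "g.h.", "j$"]:
    print(candidate, is_valid_name(candidate))
```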

In the files the section for each variable starts with a tag line containing the name surrounded by square brackets and followed by the dimensions of the data. The separators between the dimensions determine the type of the data: doubles use the colon ':', strings the dollar '$' and string lists the ampersand '&'. Comments start with a '#' and are allowed only on the tag lines.

The values start on the following line, with a different strategy for each data type:

Character arrays (strings)

The strings are the rows of a character array. In contrast to string lists and double arrays, only [1 x N] or [M x N] character arrays are allowed. The single strings are written as lines and separated by line breaks. Therefore a character array must not contain line breaks (ASCII 10 and 13). The number of characters is equal for all strings of a character array, so strings are padded with trailing spaces (Matlab style). The length of the strings need not be specified, because it is implicitly determined by the number of characters in the lines of the file. Therefore only the number of rows M matters for [M x N] character arrays. In consequence the tag lines "[StringName]$4", "[StringName]$1$4" and "[StringName]$4$1" behave exactly the same.

String lists (Matlab's cell strings)

The strings of a string list can vary in length and no trailing spaces are needed. The single strings of the list are written as lines separated by line breaks. Therefore strings must not contain line breaks (ASCII code 10 or 13). For string lists with more than 2 dimensions the first index is varied first, then the second and so on (so-called columnwise order).

Double arrays

For doubles with 2 dimensions the rows (varying 2nd index) are stored in lines, with the values separated by spaces. For more than 2 dimensions the 2nd dimension remains the number of elements per line, while the trailing dimensions are kept together (see examples and algorithm). This means that row vectors (dimension [1 x N]) are written in a single line, while column vectors (dimension [M x 1]) fill M lines. It is not allowed to write column vectors in a single line, because the number of lines must match the product of all but the 2nd dimension under all circumstances. This lets the reading function skip a value section by jumping over a well-defined number of lines to find the next tag line.

For doubles and string lists at least 2 dimensions are assumed (as usual in Matlab). Scalars and vectors have the size [1 x 1], [N x 1] or [1 x M], respectively. In the tag lines the sizes of [1 x M] row vectors can be abbreviated by omitting the first index. Scalar variables need no dimensions at all, so only a separator for determining the type of data has to be included in the tag line. For scalar doubles even this separator is optional. See the examples for illustration. Additional empty lines are allowed before a tag line, but not inside the value sections.

The IEEE equivalents of NaN (not a number) and +/-Inf (infinity) double values are written as 'NaN' and 'Inf' or '-Inf', respectively.

Examples: doubles

For a scalar double 'A' with value 2 the tag line and data line are:

[A]:1:1
2

The tag line can be abbreviated (all types, separator depends on type):

[A]:1

or (all types, separator depends on type)

[A]:

or just (separator can be omitted for doubles only)

[A]

For row vectors of size [1 x N] the first dimension need not be mentioned. The tag and data lines for 'B' with value [3,4] are:

[B]:2
3 4

The long form of the tag line without abbreviations is valid, too:

[B]:1:2

A double matrix of size [2 x 3] with values [1,2,3;4,5,6] with name 'C':

[C]:2:3
1 2 3
4 5 6
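The tag-line rules (type separators, abbreviated dimensions, optional comment) can be collected in a small parser. The Python sketch below uses hypothetical names (`Tag`, `parse_tag_line`). It also applies the rule for empty arrays given later in this document (a single 0 means [0 x 0]), and for character arrays it reduces the dimensions to the number of rows by taking their product, which is an assumption for any form other than $N, $1$N and $N$1.

```python
import math
import re
from typing import NamedTuple

KINDS = {":": "double", "$": "char", "&": "stringlist"}


class Tag(NamedTuple):
    name: str
    kind: str
    dims: tuple  # (rows,) for char arrays, otherwise at least two dimensions


def parse_tag_line(line: str) -> Tag:
    """Parse one tag line such as '[C]:2:3   # comment'."""
    text = line.split("#", 1)[0].strip()  # drop the comment and outer spaces
    match = re.fullmatch(r"\[([A-Za-z][A-Za-z0-9_.]*)\](.*)", text)
    if match is None:
        raise ValueError(f"not a tag line: {line!r}")
    name, spec = match.groups()
    if spec == "":  # bare '[A]': scalar double
        return Tag(name, "double", (1, 1))
    sep, fields = spec[0], spec[1:]
    if sep not in KINDS:
        raise ValueError(f"unknown type separator in {line!r}")
    parts = fields.split(sep) if fields else []
    if not all(re.fullmatch(r"[0-9]+", p) for p in parts):
        raise ValueError(f"bad dimensions in {line!r}")
    dims = tuple(int(p) for p in parts)
    kind = KINDS[sep]
    if kind == "char":
        # Only the number of rows matters; math.prod(()) is 1 for '[D]$'.
        return Tag(name, kind, (math.prod(dims),))
    if len(dims) == 0:
        dims = (1, 1)  # scalar
    elif len(dims) == 1:
        dims = (0, 0) if dims[0] == 0 else (1, dims[0])  # empty or row vector
    return Tag(name, kind, dims)


print(parse_tag_line("[B]:2"))
print(parse_tag_line("[C]:2:3   # a comment"))
```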

Multi-dimensional doubles

The approach of using the second dimension for the lines and keeping the trailing dimensions together enforces a complex ordering of values. In consequence the array is split into parts along the first dimension.

A generic Matlab algorithm for an array [A] with dimensions [Dims] is:

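% Collapse the trailing dimensions, then transpose so that each column
% of Ax holds all values that belong to one value of the first index.
Ax = transpose(reshape(A, Dims(1), prod(Dims(2:end))));
fprintf(FID, Format, Ax);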

Here [Format] is a format string with [Dims(2)] conversion specifiers like '%g '. Remember that Matlab's fprintf function consumes the values columnwise and that the dimensions of [Ax] do not correlate directly with [Dims].

For reading an additional reshaping is needed:

% Read all values in file order, then undo the transposition used for writing.
Ax = fscanf(FID, '%g', prod(Dims));
Ay = transpose(reshape(Ax, prod(Dims(2:end)), Dims(1)));
A  = reshape(Ay, Dims);

The algorithms implemented in PutASC and GetASC differ from these demonstrations.

For multi-dimensional string lists the procedure is much easier: there is no reason to keep anything together, so they are saved columnwise. This equals the representation in memory in Matlab.

Example: D is a [2 x 3 x 4] array containing the numbers from 1 to 24 in columnwise order:

D(1,1,1) = 1; D(2,1,1) = 2; D(1,2,1) = 3 and so on.

In the ASC-HD file this is written as:

[D]:2:3:4
1 3 5
7 9 11
13 15 17
19 21 23
2 4 6
8 10 12
14 16 18
20 22 24

The number of columns is the 2nd dimension, 3; the number of lines is the product of all other dimensions, 2*4 = 8.
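The same ordering can be reproduced without Matlab. The following pure-Python sketch works on a flat list in columnwise order (first index fastest), as Matlab stores arrays in memory. The function names are hypothetical, and the use of a '%g'-style format with a fixed number of significant digits is an assumption about how the values are printed.

```python
import math


def double_lines(flat, dims, digits=6):
    """Format a double array, given in columnwise order, as ASC-HD value lines."""
    if math.prod(dims) == 0:
        return []  # empty arrays have no value lines
    d1, d2 = dims[0], dims[1]
    rest = math.prod(dims[1:])
    # File order: fix the first index, then run through all trailing indices.
    ordered = [flat[i + d1 * r] for i in range(d1) for r in range(rest)]
    return [" ".join(f"{v:.{digits}g}" for v in ordered[k:k + d2])
            for k in range(0, len(ordered), d2)]


def doubles_from_lines(lines, dims):
    """Inverse of double_lines: return the values in columnwise order."""
    values = [float(tok) for line in lines for tok in line.split()]
    if len(values) != math.prod(dims):
        raise ValueError(f"expected {math.prod(dims)} values, found {len(values)}")
    d1, rest = dims[0], math.prod(dims[1:])
    flat = [0.0] * len(values)
    k = 0
    for i in range(d1):
        for r in range(rest):
            flat[i + d1 * r] = values[k]
            k += 1
    return flat


for line in double_lines(list(range(1, 25)), (2, 3, 4)):
    print(line)
```

Called with the numbers 1 to 24 and the dimensions [2 x 3 x 4], the sketch prints the eight value lines of the example [D] above.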

Examples: strings

A single string is a row vector of characters like 'abc'. It has the dimension [1 x 3], but the number of columns is determined by the number of characters per line. The variable is called 'D':

[D]$
abc

This is equivalent to:

[D]$1
abc

A character array of size [2 x 6] named 'E' with contents ['Du    '; 'hier  '], spaces shown as dots:

[E]$2
Du....
hier..

Pay attention to the 2nd dimension: the length of each line determines the length of the rows! Therefore $N, $1$N and $N$1 are treated identically, in contrast to the other types.

Examples: string lists

The string list {'Du'; 'hier'} has size [2 x 1], and the lengths of the single strings can vary (see Matlab CELLSTR):

[E]&2&1
Du
hier

The string list with size [1 x 2] can have abbreviated dimensions again:

[F]&2
Du
hier

is equivalent to:

[F]&1&2
Du
hier

Spaces and empty strings are allowed in string lists. To illustrate this, each line of the example is surrounded by > and <. G = {'', ' ', 'Hello '} in ASC-HD style:

>[G]&3<
><
> <
>Hello <
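Because empty strings and trailing spaces are data, a reader must remove only the line-break characters and must not strip whitespace or drop empty lines inside a value section. A minimal Python sketch with hypothetical helper names follows; the example text is the list G from above without the > and < markers.

```python
def split_lines(text: str) -> list[str]:
    """Split file text into lines, removing nothing but the line breaks."""
    return text.replace("\r\n", "\n").replace("\r", "\n").split("\n")


def read_string_list(lines: list[str], first: int, count: int) -> list[str]:
    """Return `count` strings starting at index `first`, spaces preserved."""
    values = lines[first:first + count]
    if len(values) != count:
        raise ValueError(f"expected {count} strings, found {len(values)}")
    return values


text = "[G]&3\r\n\r\n \r\nHello \r\n"
lines = split_lines(text)
print(read_string_list(lines, 1, 3))
```

The explicit replacement of CR LF, CR and LF is used here instead of `str.splitlines()`, which would also split on other control characters.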

The strings of multi-dimensional string lists are written in columnwise order: for a list H of size [2 x 3 x 4] the string H(1,1,1) is written first, then H(2,1,1) follows, the next is H(1,2,1) and so on. This is much more straightforward than the procedure for multi-dimensional doubles, because there is no need to join several elements in single lines.

Examples: Empty arrays

Empty arrays of size [0 x 0] get a single 0 dimension. If 'A' is an empty double, 'B' an empty string and 'C' an empty string list, the tag lines look as follows:

[A]:0
[B]$0
[C]&0

No lines with values appear in these cases. Empty arrays with any dimension differing from 0 are stored with the complete dimensions. If 'A' is a double of size [0 x 2 x 3], then:

[A]:0:2:3

and no values appear.

Comments and empty lines

The tag line can be commented with the '#' character. Spaces after the dimensions section are ignored.

[TagName]:2:3   # This is the comment

Empty lines can be inserted before a tag line:

[Tag1]$
String1

[Tag2]&
StringListElement1

Header line

Old style: In ASC-HD v2.0 the header line looks as follows:

#!ASCII v2.0 GaitLabs Heidelberg Standard

or, with the header 'Specific Header' defined in the inputs of PutASC:

#!ASCII v2.0: Specific header

The next version of the ASC-HD syntax is 4.0 (there is no 3.0!), valid since fASC4.00. The standard header line starts with a string equivalent to:

#!ASCII v4.0 ASC-HD [Digits 6]

This means that the file is an ASCII file of the version 4.0 in the ASC-HD style. Double numbers are saved with 6 significant digits. No individual header was defined.

For individual headers any string can be appended to this standard header using a colon ':' as separator:

#!ASCII v4.0 ASC-HD [Digits 6]:Individual part (23-Apr-2006)…

Then GetASC returns the output Header:

'Individual part (23-Apr-2006)…'
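The header variants shown above can be told apart with two regular expressions. The Python sketch below is not the GetASC implementation; `Header` and `parse_header` are hypothetical names, and stripping surrounding spaces from the individual part is an assumption (the v2.0 example has a space after the colon, the v4.0 example has none).

```python
import re
from typing import NamedTuple, Optional


class Header(NamedTuple):
    version: str
    digits: Optional[int]  # None if the header has no [Digits N] part (v2.0)
    individual: str        # user-defined part, '' if absent


def parse_header(line: str) -> Header:
    """Parse the first line of an ASC-HD file."""
    line = line.rstrip("\r\n")
    match = re.fullmatch(r"#!ASCII v(\d+\.\d+)(.*)", line)
    if match is None:
        raise ValueError(f"not an ASC-HD header: {line!r}")
    version, rest = match.groups()
    digits = None
    standard = re.match(r" ASC-HD \[Digits (\d+)\]", rest)
    if standard is not None:
        digits = int(standard.group(1))
        rest = rest[standard.end():]
    # The individual part, if any, follows a colon.
    individual = rest[1:].strip() if rest.startswith(":") else ""
    return Header(version, digits, individual)


print(parse_header("#!ASCII v4.0 ASC-HD [Digits 6]"))
print(parse_header("#!ASCII v2.0: Specific header"))
```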

If new variables are appended to an existing file, the number of digits must not be decreased to avoid a loss of data. Be aware that management of data precision is a critical drawback when using ASCII files.
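The effect of the digits setting can be illustrated in Python. The sketch assumes that "N significant digits" corresponds to a C-style '%.Ng' format, which this document does not state explicitly. `format_double` is a hypothetical helper that also applies the 'NaN', 'Inf' and '-Inf' spellings given earlier.

```python
import math


def format_double(value: float, digits: int = 6) -> str:
    """Format one double with the given number of significant digits."""
    if math.isnan(value):
        return "NaN"
    if math.isinf(value):
        return "Inf" if value > 0 else "-Inf"
    return f"{value:.{digits}g}"  # assumed equivalent of Matlab's '%.6g'


x = 0.1234567891
text = format_double(x, 6)
print(text, float(text) == x)  # the value read back differs from x
```

With 6 digits the value 0.1234567891 is stored as 0.123457, so reading the file back does not reproduce the original number.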

Comments

Disk space is cheap today. Therefore it is recommended to use 'complete' data files: store enough information along with the data to allow the contents to be re-created.

Especially for ASC-HD files, which are designed for long-term accessibility and for exchange with other labs, completeness and reproducibility are the basis of scientific data acquisition. So some meta-variables are a good idea:

Date and time of creation, host name, institution, properties of measurement and computation, examined subject or object, or at least a pointer to a location where this information can be found.

Guidelines for implementations of reading and writing routines

Line breaks deviating from the preferred DOS style should be accepted.

Before appending new variables to a file, check the trailing line break. Extracting single variables from an ASC-HD file is a popular feature, but avoid naively searching for '[' to locate tag lines: the character can be part of the data of a string or string list!

Users will insert errors in ASCII files whenever possible. A fair reading function counts the found elements and displays the name of the variable and the line number of problems.
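These guidelines can be combined into a scanner that locates variables by skipping each value section by its line count instead of searching for '['. The Python sketch below is an illustration under assumptions and not the GetASC algorithm. The names `index_variables` and `value_line_count` are hypothetical, only line counts are checked (not the number of elements per line), and problems are reported with the variable name and a 1-based line number.

```python
import math
import re

TAG = re.compile(r"\[([A-Za-z][A-Za-z0-9_.]*)\](?:([:$&])([0-9:$&]*))?\s*(?:#.*)?")


def value_line_count(sep: str, dims: tuple) -> int:
    """Number of value lines that follow a tag line."""
    if sep == "$":  # character array: one line per row
        return math.prod(dims)
    if len(dims) == 0:
        dims = (1, 1)
    elif len(dims) == 1:
        dims = (0, 0) if dims[0] == 0 else (1, dims[0])
    total = math.prod(dims)
    if sep == "&":  # string list: one line per string
        return total
    return 0 if total == 0 else total // dims[1]  # doubles: all but the 2nd dim


def index_variables(text: str) -> dict:
    """Map each variable name to (index of first value line, number of lines)."""
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # the trailing line break does not start another line
    index = {}
    i = 1  # lines[0] is the header line
    while i < len(lines):
        if lines[i] == "":  # empty lines are allowed before a tag line
            i += 1
            continue
        match = TAG.fullmatch(lines[i])
        if match is None:
            raise ValueError(f"line {i + 1}: expected a tag line, found {lines[i]!r}")
        name, sep, spec = match.group(1), match.group(2) or ":", match.group(3) or ""
        try:
            dims = tuple(int(p) for p in spec.split(sep)) if spec else ()
        except ValueError:
            raise ValueError(f"line {i + 1}: bad dimensions for variable {name}") from None
        count = value_line_count(sep, dims)
        if i + 1 + count > len(lines):
            raise ValueError(f"line {i + 1}: variable {name} needs {count} value lines")
        index[name] = (i + 1, count)
        i += 1 + count
    return index


sample = "#!ASCII v4.0 ASC-HD [Digits 6]\r\n[S]&2\r\n[not a tag]\r\nx\r\n[A]:2\r\n3 4\r\n"
print(index_variables(sample))
```

In the sample, the line `[not a tag]` is an element of the string list S and is skipped as data, so it is not mistaken for a tag line.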

It is not yet decided how to treat characters with more than 7 bits. Think of an additional Unicode type, the TeX-like ["a] for the German Umlaut-a, or the ISO-Latin-1 [&auml;].

fileformats/hdascii.txt · Last modified: 2017/12/22 15:08 by oliver
