
NIML Manual


NeuroImaging Markup Language (NIML)

Base Level Specification

Robert W Cox, PhD -- 21 Feb 2002
**Please note that this document is a DRAFT, and extremely subject to revision**

Introduction

The purpose of this specification is to define a flexible, extensible, and self-describing format for encoding structured data for neuroimaging applications. The largest component of such information is the image data itself, but the images themselves are of limited use unless some auxiliary data (e.g., voxel dimensions, image orientation, timing information) are attached.

Another motivation for this specification is to work towards defining a standard and protocol for neuroimaging applications to exchange smallish pieces of data. If the community moves towards the development of interoperating software tools, it will be important for these applications to share not only the image files, but also to be able to "talk" to each other interactively and to exchange small chunks of information or commands (e.g., "jump to coordinates (32,47,-13)").

This base level specification details how collections of disparate information can be packaged together. The body of this document describes the format for the data.

A C API for reading, writing, and storing information using this standard and protocol is described in the appendices. At this writing, a mostly-complete (but weakly-tested) implementation is available.

Individual data elements (1D or 2D tables of numbers and/or strings) are encoded in an XML-inspired format. An entire data collection consists of a number of data elements grouped together. One or more higher level documents will specify the structure and contents of prototypical neuroimaging data sets, and describe a communications standard for interoperating neuroimaging applications.

[** The higher level documents are only a gleam in my mind's eye.**]
XML note: The software that parses data formatted in the way specified herein is partly an XML processor and partly an application, in the jargon of the XML specification. For details about XML, the best place to start is the annotated XML specification: http://www.xml.com/axml/axml.html . The XML notes herein are intended to provide asides useful to someone who already knows something about XML.
XML note: Except for binary data, it will be possible to encode data in this format in a well-formed XML document (but not DTD-validated, thanks in part to the ni_typedef elements, which allow new NIML element types to be defined in the NIML document itself). Places in this specification where care must be taken to ensure XML well-formedness will be pointed out.
XML note: Documents formed according to this specification will not be fully general XML, since many features of XML (e.g., arbitrary nesting, CDATA, general DTDs, Unicode, entities) will not be supported. This is one reason why software that reads the type of data specified herein is only partly an XML processor.
XML note: Why not use a general XML processor as a front-end to this software (e.g., expat, available at http://www.jclark.com/xml/expat.html)? Mainly because I see a need for binary data to be included, since a typical MRI data set is 10-100 Mbytes. Expansion to a pure text form seems excessive just to conform to the XML specification, especially in standalone neuroimaging applications that otherwise don't care about XML at all. Nor do I think that the XML solution to binary data (reference to an external unparsed entity) is adequate, since that will make it impossible to package up all the data for a neuroimaging data set into one file or one data transmission stream.

Glossary

The definitions of some terms used in this specification are given here. For some terms, the equivalent XML construct is given in parentheses.

  • Application: The program that receives and/or transmits data using this specification.
  • Input processor: The software library that reads data in the format specified by this specification and puts it into data structures accessible by the application.
  • Output processor: The software library that takes data structures from the application and formats them for transmission/storage according to this specification.
  • API: Application Programming Interface; this jargon just means the specification of the functions and data structures that implement a particular set of operations. In this specification, a C language API for NIML input and output processors is defined in the appendices.
  • Element: A unit of input, which may contain attributes, data, and/or other elements.
    • An element starts with the element header (XML: "start tag"), which may contain attributes.
    • An element continues with the data stream (XML: "content"), which may be imported from an external file (external data stream), may be present in the same transmission/file as the element header (internal data stream), or may be missing (empty element).
    • An element finishes with the end of data token (XML: "end tag").
    In this specification, elements are either data elements or group elements, but not both.
  • Data Element: An element that contains attributes and/or data, but does not contain other elements. A data element corresponds to an array of similar data structures in the application. (XML: mixed content with only #PCDATA [parsed character data]).
  • Empty Element: A data element that does not actually contain data. It may contain attributes.
  • Group Element: An element that contains (optionally) attributes and other elements, but does not contain any data that is not itself inside a data element. A group element is used to put together a number of data elements (and maybe other group elements) into a larger data collection. (XML: "mixed content" with only element children).
  • Attribute: A string of the form name=value in an element header. Attributes can be used to define the way in which an element's data stream is to be interpreted (e.g., into floats, ints, etc.). Attributes can also be used to pass arbitrary string information to be attached to an element, in addition to the data stream.
  • Name: when capitalized, "Name" refers to a string of characters that starts with an alphabetic character and continues with a sequence of alphanumeric characters, plus a few allowable special characters.
  • Data Structure: A collection of data organized for in-memory access by a computer program. In C, a struct; in C++, an object instance, etc.
  • Errors: How an NIML input processor deals with errors in the input is not specified here. In the worst case, it is allowed to send a nasty e-mail to your mother telling her how stupid you are, erase your disk drive, and then crash your computer; however, such behavior is not part of this specification. The documentation for a well-written processor interface will explain how it deals with errors.


A Simple Example: One Data Element

A data element consists of a header in angle brackets "<...>", a data stream that follows the ">", and a token "</>" that closes the data stream:
 <vector ni_type=float ni_form=text ni_dimen=3> 1.3 2.2 -3.7 </> 
where the components of the above element are:
< opens the element header
vector gives the type of the data element (almost any string)
ni_type=float says that the data stream for this element should be read into 4-byte floats
ni_form=text says that the data stream is stored in text format (the default)
ni_dimen=3 says that there are 3 floats that follow (default is 1)
> is the end of the header; data stream starts at the next byte
 1.3 2.2 -3.7  is the data stream to be decoded into numerical values
</> signifies the end of the data stream for this element
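
As an illustration of the element layout just described, the following is a minimal sketch of how an application might format such an element into a buffer. The function name niml_write_float_element is hypothetical and is not part of the C API in the appendices; the use of "%g" (6 significant digits) is also an assumption made for brevity.

```c
#include <stdio.h>

/* Hypothetical helper (NOT part of the NIML C API): write a text-form
   data element holding n floats into buf. Returns the number of
   characters written (excluding the NUL), or -1 if buf is too small. */
int niml_write_float_element(char *buf, size_t size, const char *name,
                             const float *v, int n)
{
    int used, k, i;

    /* element header: name plus the three predefined attributes */
    used = snprintf(buf, size, "<%s ni_type=float ni_form=text ni_dimen=%d>",
                    name, n);
    if (used < 0 || (size_t)used >= size) return -1;

    /* data stream: values separated by whitespace */
    for (i = 0; i < n; i++) {
        k = snprintf(buf + used, size - (size_t)used, " %g", v[i]);
        if (k < 0 || (size_t)k >= size - (size_t)used) return -1;
        used += k;
    }

    /* end of data token */
    k = snprintf(buf + used, size - (size_t)used, " </>");
    if (k < 0 || (size_t)k >= size - (size_t)used) return -1;
    return used + k;
}
```

An application that needs the full precision of its floats would use a wider format than "%g".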

Data Element Format

Bytes before the opening "<" are skipped. The "<" marks the start of the element header, which describes the contents of the data element and the data stream that will populate the data element.

XML note: An XML processor should pass through whitespace that appears between elements. This NIML specification says to ignore such whitespace (and anything else between elements, for that matter), which is one reason that NIML processing software is an "XML application" (interpreting the input) as well as an "XML processor" (making the input available to the application).

Element name:
Immediately after the opening "<" is the element name (e.g., this could be used to mark a data structure's type or class name). Later, a mechanism for specifying data element subtypes is given, and a number of predefined subtypes is listed.

Names:
The allowable characters in an element (or attribute) Name are "A-Z", "a-z", "0-9", and the special characters underscore, period, and hyphen ("_", ".", and "-"). The first character in a Name must be alphabetic. The first whitespace or other non-Name character found ends the Name. (Whitespace is defined by XML to be the characters blank, newline, carriage-return, and horizontal tab.) The maximum legal length of a Name is 255 characters. Some examples:

  Z_zzza-...         legal
  _Ethel_            illegal (can't start with "_")
  In:the:beginning   illegal (can't use ":" in a Name)
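
The Name rules above can be sketched as a small check. This is only an illustration; the function name niml_is_name is hypothetical and does not come from the C API in the appendices. Explicit ASCII ranges are used so that the result does not depend on the C locale.

```c
#include <stddef.h>

/* ASCII-only character classes, independent of the C locale */
static int is_ascii_alpha(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

static int is_ascii_digit(unsigned char c)
{
    return c >= '0' && c <= '9';
}

/* Hypothetical helper: returns 1 if s is a legal NIML Name, else 0.
   Rules: first character alphabetic; the rest alphanumeric or one of
   "_", ".", "-"; at most 255 characters. */
int niml_is_name(const char *s)
{
    size_t n;

    if (s == NULL || !is_ascii_alpha((unsigned char)s[0])) return 0;
    for (n = 1; s[n] != '\0'; n++) {
        unsigned char c = (unsigned char)s[n];
        if (!is_ascii_alpha(c) && !is_ascii_digit(c) &&
            c != '_' && c != '.' && c != '-') return 0;
    }
    return n <= 255;
}
```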

XML note: The characters that are allowed in a Name are taken from the XML specification, with the exception of the colon. The XML namespace specification (http://www.w3.org/TR/REC-xml-names/) reserves the use of the colon in Names for namespace identification. XML allows Names to start with underscore "_", but NIML does not. XML does not put a maximum length on a Name. NIML documents will be encoded strictly in 8-bit characters, with the first 128 characters being US-ASCII (no Unicode or UTF-8 for NIML). This restriction means that it would be legal to use one of the ISO-8859-* character sets for non-English languages in an NIML file, but this would raise serious portability issues (since the values from 128..255 are interpreted differently in these different character sets).

Reserved Names:
Element and Attribute Names that start with the characters "ni_" are reserved for expansion of this specification. The following are the reserved Names currently in use:

  Name         Purpose
  ni_type      Attribute: specifies type of data to read
  ni_form      Attribute: specifies format of data stream
  ni_dimen     Attribute: specifies number of values to read
  ni_delta     Attribute: specifies coordinate spacing between data values on a uniform grid
  ni_units     Attribute: specifies units used in ni_delta
  ni_origin    Attribute: specifies coordinate offsets for data values on a uniform grid
  ni_axes      Attribute: specifies axis orientations for data values on a uniform grid
  ni_url       Attribute: specifies external location of data stream for a data element
  ni_typedef   Element: defines a new data element subtype
  ni_name      Attribute: provides a name for an ni_typedef-ed element
  ni_group     Element: provides a way to group multiple elements together
  ni_include   Element: provides a way to read an external file into an input stream

Elements with no data (Empty elements):
The minimal element is an element name with no attributes or data stream. Such a construct could be used as a flag or command to the receiving application. For example,

  <close/>
could be used in a transmission as a command to indicate that the transmission's I/O channel should be closed. The fact that the element header closes with "/>" indicates that there is no internal data stream (i.e., this is an "empty element", in the XML jargon). Note that the "/" character is not a legal Name character, so that the element name ends with the "e" in "close".

XML note: The XML specification allows empty elements to also be of the form "<name></name>", and implies that this form is to be indistinguishable from the form "<name/>". An NIML empty element must follow the latter form, closing the element header with "/>".

Attributes:
Following the element name is a sequence of attributes in the general form "attname=string". For data elements, some of these attributes give information about how to interpret the data stream (internal or external) into data structures. The order of the attributes is not important for the parsing operations. Attributes are separated by whitespace. As mentioned earlier, attnames that start with the characters "ni_" are reserved for expansion of this specification. In addition to the predefined attributes described below, the element header may include other attributes. These will not be interpreted by the input processor, but will be passed through to the application, in the order in which they are encountered in the element header.

XML note: XML requires that no two attributes in the same element have the same attname. NIML does not enforce this requirement, but if you wish to produce a well-formed XML document, then you need to be aware of this restriction. XML allows whitespace to occur around the "=" that separates the attname from the string. NIML does not allow this whitespace; the next character after attname must be "=", and the next character after that must be a Name character or a quote character.

Strings:
Strings that are sequences of Name characters, not necessarily starting with a letter, can be present on the right hand side (RHS) of an attribute (or in a data stream) without being enclosed in quotes. Strings with other characters must be present in a quoted form, using "double quote" or 'single quote' (apostrophe) characters. If the non-whitespace character that starts a String is " (or '), then the string is assumed to be in quoted form, and everything up to the next " (or ') character is included in the string. Whitespace characters, including newlines, are included in the String value, but the quoting characters are not. In keeping with the XML specification, the following end-of-line character sequences will be "normalized" to the Unix-standard single 0x0A byte (LF character):

  Hexadecimal   Character Names   Systems
  0x0D 0x0A     CR LF             Microsoft standard
  0x0D          CR                Macintosh standard
N.B.: This definition of how Strings are to be formatted also applies to String input in the data stream section of a data element.
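
A minimal sketch of the end-of-line normalization described above, operating in place on a NUL-terminated buffer. The function name niml_normalize_eol is hypothetical; how an actual input processor buffers its input is not specified here.

```c
/* Hypothetical helper: rewrite CR LF and lone CR as a single LF (0x0A),
   in place. The result is never longer than the input. Returns s. */
char *niml_normalize_eol(char *s)
{
    char *in = s, *out = s;

    while (*in != '\0') {
        if (*in == '\r') {
            *out++ = '\n';            /* CR or CR LF becomes LF   */
            in++;
            if (*in == '\n') in++;    /* swallow the LF of CR LF  */
        } else {
            *out++ = *in++;           /* everything else is kept  */
        }
    }
    *out = '\0';
    return s;
}
```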

XML note: In XML, attributes must be in a quoted string format. Thus, if an application wishes to write an NIML file to be a well-formed XML file, it should use attributes in the form attname="string", even if the string contains no whitespace. Also, XML specifies that the RHS of most attributes should be normalized by replacing all sequences of contiguous whitespace characters by a single blank. NIML does not require this step; however, all predefined NIML attribute values contain no whitespace.

In keeping with the XML roots of this specification, the following escape sequences representing single characters will be recognized in Strings:

  Escape    Translation        Note
  &lt;      < (less than)      Required
  &gt;      > (greater than)   Required
  &quot;    " (quote)          Required
  &amp;     & (ampersand)      Required
  &apos;    ' (apostrophe)     Optional
Characters marked as Required can only be represented in a String by the escape sequence. Characters marked as Optional can be represented by the escape or by themselves. (Since none of these are Name characters, they can only be present in quoted Strings.)

Some example attributes:

  ni_type=5f.i.S
  ni_type='5float,int,String'
  ni_url="http://zork.bork.gork/fork/spoon/pork.ork#1024..$"
  command="cat fred &gt; 'ethel'"
All but the first have their RHS in quoted form, since these String values contain non-Name characters. In the last one, the RHS String value will be passed to the application as "cat fred > 'ethel'".
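
The escape translation can be sketched as a single left-to-right pass over the String. The function name niml_unescape is hypothetical and is not taken from the C API in the appendices. Because the pass never rescans its own output, a sequence such as "&amp;lt;" decodes to "&lt;" and not to "<".

```c
#include <string.h>

/* Hypothetical helper: decode the five NIML escape sequences in place.
   Unrecognized "&..." text is copied through unchanged. Returns s. */
char *niml_unescape(char *s)
{
    static const struct { const char *esc; char ch; } table[] = {
        { "&lt;", '<' }, { "&gt;", '>' }, { "&quot;", '"' },
        { "&amp;", '&' }, { "&apos;", '\'' }
    };
    const size_t ntab = sizeof(table) / sizeof(table[0]);
    char *in = s, *out = s;

    while (*in != '\0') {
        int matched = 0;
        if (*in == '&') {
            size_t k;
            for (k = 0; k < ntab; k++) {
                size_t len = strlen(table[k].esc);
                if (strncmp(in, table[k].esc, len) == 0) {
                    *out++ = table[k].ch;   /* emit the single character */
                    in += len;              /* skip the whole escape     */
                    matched = 1;
                    break;
                }
            }
        }
        if (!matched) *out++ = *in++;
    }
    *out = '\0';
    return s;
}
```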

XML note: Other XML-defined escapes (such as "&#x3A;" for insertion of a single character specified in hexadecimal) should not be used, since they are not required to be recognized by NIML processor software.

Comma-Separated Substrings as Attribute Values:
In some specific cases, the RHS value of a predefined attribute is described as being a list of comma-separated substrings. An example of such a string (which must be quoted) is "float,int,short". This String can be broken into 3 substrings "float", "int", and "short". This construction is used to specify multiple parameters to attributes that are designed to process them (e.g., ni_dimen). However, when the attribute String value is actually passed to the application, it will not be broken into substrings.

External data streams and the ni_url attribute:
Input bytes that occur between the closing ">" of the element header and the opening "<" of the end token are called the internal data stream. It is also possible for a data element to specify that its data stream shall be read from an external source rather than from the input bytes immediately following the ">" that closes the element header. The external source is specified with the ni_url attribute, as in

 <TheKing ni_url="http://www.elvis.com/" />
This specifies that the contents of the data at the given URL be taken as the data stream for this data element. If ni_url is used, then the data element header must end with "/>", since there can be no internal data stream present after the header if there is an external data stream. An external data stream does not end at its first "</", but continues until the end of the data read from the URL.

The types of URLs that can be specified in a ni_url attribute depend on the input processor. In some processors, there may be no support for such inclusion (e.g., in a socket transmission). In the C API defined in the appendices, the following types of URLs are allowable:

  Form         Meaning
  http://a/b   Absolute reference, fetched by HTTP
  ftp://a/b    Absolute reference, fetched by anonymous FTP
  file:/a/b    Absolute reference to a local file
It is also legal to append a URL fragment specifier of the form "#p..q" at the end of the attribute value. Here, "p" and "q" indicate the first and last bytes of the fetched data to include in the data stream. p and q may be in one of these forms:
  • Decimal numbers, 0 being the first byte of the fetched data.
    • Example: "file:/home/elvis/grot.oog#1024..2047" is the second 1K of the file.
  • The symbol "$" represents the last byte of the fetched data.
    • Example: "ftp://lucy.com/fred.zzz#1024..$" skips the first 1K of the file.
  • The symbol "$-x" represents the xth byte previous to the last byte of the fetched data, where "x" is a decimal number.
    • Example: "http://fred.org/zork.dat#$-4095..$" is the last 4K of the file.
It is an error if the value of p is after the value of q in the fetched data. If no fragment is given, then the data stream is taken from all the fetched data (as if the fragment were #0..$).
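
The fragment rules above can be sketched as follows, given the number of bytes actually fetched. The function names are hypothetical and are not part of the C API in the appendices. Treating endpoints that fall outside the fetched data as an error is an assumption of this sketch; the text above only states that p after q is an error.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Parse one endpoint: a decimal number, "$", or "$-x". Stores the byte
   offset in *val and returns a pointer just past the endpoint, or NULL
   on a syntax error. nbytes is the length of the fetched data. */
static const char *parse_endpoint(const char *s, long nbytes, long *val)
{
    char *end;

    if (*s == '$') {
        *val = nbytes - 1;                     /* "$" is the last byte */
        s++;
        if (*s == '-') {                       /* "$-x" form           */
            if (!isdigit((unsigned char)s[1])) return NULL;
            *val -= strtol(s + 1, &end, 10);
            s = end;
        }
        return s;
    }
    if (!isdigit((unsigned char)*s)) return NULL;
    *val = strtol(s, &end, 10);
    return end;
}

/* Hypothetical helper: resolve "#p..q" into inclusive byte offsets.
   A NULL or empty fragment selects everything, as if it were "#0..$".
   Returns 0 on success, -1 on error. */
int niml_fragment_range(const char *frag, long nbytes, long *first, long *last)
{
    const char *s;

    if (nbytes <= 0) return -1;
    if (frag != NULL && *frag == '#') frag++;
    if (frag == NULL || *frag == '\0') {
        *first = 0;
        *last = nbytes - 1;
        return 0;
    }
    s = parse_endpoint(frag, nbytes, first);
    if (s == NULL || s[0] != '.' || s[1] != '.') return -1;
    s = parse_endpoint(s + 2, nbytes, last);
    if (s == NULL || *s != '\0') return -1;
    /* assumption: out-of-range endpoints are rejected, as is p after q */
    if (*first < 0 || *last >= nbytes || *first > *last) return -1;
    return 0;
}
```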

Use of ni_url may not be wise, especially if it involves fetching data files from another computer system. Using ni_url makes reading the NIML data file dependent on the existence of another file.

An external data stream will be processed as described in the next section. How much of it will be stored into the data structure transmitted to the application will depend on the ni_type and ni_dimen attributes.

End token:
If the data element has an internal data stream, then the end of the data stream is indicated by the bytes "</". (If binary data is being read, then "</" characters inside the specified length binary data will not indicate the end of the data stream.) NIML allows the end token to be the characters "</>" or "</elementname>", where elementname is the name of the element that is being closed.

If the internal data stream runs into the end of file or the transmission closes (e.g., a socket shuts down), this is also taken as a valid end token for any elements that have not yet "closed" (including the current data element and any group elements enclosing it). This rule makes it easy to have a final data element in a file without closing it with proper end tokens. In this way, an NIML file containing image data can conform to the informal convention that the image data is always the very last collection of bytes in the file, regardless of what header information comes before.

XML note: XML requires that elements that have content (i.e., an internal data stream) end with "</elementname>". Also, XML does not consider a document to be properly closed if the file just ends. This means that a well-formed XML version of an NIML file cannot conform to the "image data is last" convention.

Data Stream Interpretation

The following attributes determine how the data stream is interpreted by the input processor.

ni_type Attribute:
This attribute specifies the type or types of the individual data components in the data stream. The following 10 types are available:

  Name      Initial   Size (bytes)
  byte      b         1
  short     s         2
  int       i         4
  float     f         4
  double    d         8
  complex   c         8 (2 floats)
  rgb       r         3 (red grn blu)
  RGBA      R         4 (r g b alpha)
  String    S         arbitrary
  Line      L         arbitrary

An individual type is specified by its name or by the single character of its initial (which is why "String" starts with an uppercase letter, to distinguish it from "short", and why "RGBA" is capitalized while "rgb" is not).

The ni_type attribute may specify a single type, as in the example at the very beginning of this document, or it may specify multiple types, separated by periods "." and with an optional decimal numeric count prepended:

  ni_type=float.int.int  OR  ni_type=f.i.i  OR  ni_type=f.2i  OR  ni_type=f2i
which specifies that the values to be read from the data stream come in triples: 1 float followed by 2 ints, then 1 more float, 2 more ints, etc. In this example, the data stream must come in these units of 3 numbers. The last illustration above shows that when single character abbreviations are used for type names, they do not need to be separated by periods ".".
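
One way to handle all of these variations is to expand the attribute value into a string with one type initial per value. The sketch below does that; the function name niml_expand_type is hypothetical and is not taken from the C API in the appendices. It tries full type names before single initials, and accepts either "." or "," as a separator.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: expand an ni_type value such as "f.2i" into one
   initial per value ("fii"), NUL-terminated, in out[outsize].
   Returns the number of values per row, or -1 on a syntax error or
   if out is too small. */
int niml_expand_type(const char *spec, char *out, int outsize)
{
    static const char *names[] = { "byte", "short", "int", "float", "double",
                                   "complex", "rgb", "RGBA", "String", "Line" };
    static const char initials[] = "bsifdcrRSL";
    int n = 0;

    if (outsize < 1) return -1;
    while (*spec != '\0') {
        int count = 1, code = 0, k;

        if (*spec == '.' || *spec == ',') { spec++; continue; }

        if (isdigit((unsigned char)*spec)) {          /* optional count */
            count = 0;
            while (isdigit((unsigned char)*spec)) {
                count = 10 * count + (*spec++ - '0');
                if (count > outsize) return -1;       /* cannot fit     */
            }
        }
        for (k = 0; k < 10 && code == 0; k++) {       /* full names     */
            size_t len = strlen(names[k]);
            if (strncmp(spec, names[k], len) == 0) {
                code = initials[k];
                spec += len;
            }
        }
        if (code == 0 && *spec != '\0' && strchr(initials, *spec) != NULL)
            code = *spec++;                           /* single initial */
        if (code == 0) return -1;                     /* unknown type   */

        while (count-- > 0) {
            if (n + 1 >= outsize) return -1;
            out[n++] = (char)code;
        }
    }
    out[n] = '\0';
    return n;
}
```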

Aside: Maybe there are too many variations here. Instead of allowing "float" and "f", we should only allow the latter? That would make the NIML processor's job simpler. Since we might eventually write NIML processors in several languages, simplicity is an important goal.

If the ni_type attribute is not present, then the data stream will be interpreted as if ni_type=b.

XML note: The reason that the separator for multiple types is a period "." is that this is a legal Name character, and NIML allows the RHS of an attribute to be unquoted if it consists entirely of Name characters. However, XML requires the RHS of an attribute to be quoted. If the type definition String is quoted, you can also use commas "," as the type separator.

Line Data Values:
The Line type is a special form of String. A Line is the text between the current scanning point of the data stream and the next end-of-line; it does not include the end-of-line character. This input type is designed to make it easy for an application to read and write individual lines of text without using quotes to enclose possible whitespace. For example

  <junk ni_type=3L>
    I am the first Line
    This is Line #2
    And this is Line number 3  </>
The three strings that will be saved are "I am the first Line", "This is Line #2", and "And this is Line number 3", since whitespace at the beginning and end of a Line will be discarded. It is possible (not necessarily wise) to include Line data on a physical line with other values; an example illustrates the processing that results:
  <data ni_type=f.L ni_dimen=2>
     3.0   Hi Bob
     5.7
     This is cool
  </>
The first Line value read is the string "Hi Bob", since the blanks after "3.0" are discarded (being at the start of the Line data). The second Line value read is the string "This is cool", since the end-of-line after the "5.7" is also discarded.

ni_form Attribute (optional):
This attribute specifies the format of the data stream. The possible values are

  ni_form=text   OR   ni_form=binary   OR   ni_form=base64
The first means that the data stream is in text format, the second that it is binary, and the third that it is base64 encoded binary (which allows binary data to be encoded in a pure text format, at the cost of a 33% expansion in size). If the ni_form attribute is not present, then ni_form=text is assumed.

The binary and base64 values may optionally have one of the two strings ".msbfirst" or ".lsbfirst" appended, as in "ni_form=binary.msbfirst". This addition specifies the byte order of the binary data. If the byte order is not specified (here or otherwise), then the receiving program should assume that the binary data is stored in MSB first order ("network order"), as on Sun-Sparc, SGI-MIPS, PowerPC, and HP-PA CPUs (and the opposite of Intel CPUs). If the current CPU does not match the order of the data, then two byte data (i.e., short) ab will be swapped to ba before being passed to the application; four byte data (i.e., float, int) abcd will be swapped to dcba; eight byte data (i.e., double) abcdefgh will be swapped to hgfedcba.
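
The swapping rule can be sketched generically for any value width. Both function names below are hypothetical and are not part of the C API in the appendices.

```c
#include <stddef.h>

/* Hypothetical helper: reverse the bytes of each of count values,
   each width bytes long, in place (ab -> ba, abcd -> dcba, ...). */
void niml_swap_bytes(void *data, size_t width, size_t count)
{
    unsigned char *p = (unsigned char *)data;
    size_t i, j;

    for (i = 0; i < count; i++, p += width) {
        for (j = 0; j < width / 2; j++) {
            unsigned char t = p[j];
            p[j] = p[width - 1 - j];
            p[width - 1 - j] = t;
        }
    }
}

/* Hypothetical helper: returns 1 if this CPU stores the most significant
   byte first, 0 otherwise, by inspecting the first byte of the value 1. */
int niml_host_is_msbfirst(void)
{
    const unsigned short probe = 1;
    return *(const unsigned char *)&probe == 0;
}
```

A reader would call niml_swap_bytes only when the byte order of the data differs from the result of niml_host_is_msbfirst.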

XML note: In XML, there is no way around the fact that "</" closes an element, except by using a CDATA section. Since "]]>" ends the CDATA section, one is still left with the difficulty of including an arbitrary sequence of bytes into an XML document. (In fact, some bytes are not legal anyplace in an XML document, since only valid "characters" are allowed, and not all byte sequences are valid Unicode characters.) If one wants to write an NIML file that is also a well-formed XML document, one must avoid the use of binary data. In general, I would recommend that text encoding be used for most data, and that binary (or base64) be used only for very large data elements (e.g., images).

ni_dimen Attribute (optional, but probably needed):
This attribute specifies how many data elements are to be read from the data stream. One data element corresponds to a complete set of values as specified in the ni_type attribute. If ni_type=fii and ni_dimen=3, then the data stream should contain 3 floats and 6 ints (in order f i i f i i f i i).

If the ni_dimen attribute is not specified, it is equivalent to giving ni_dimen=1. The NIML input processor will not try to guess the number of input values from the data stream.

To read an arbitrary series of bytes from the data stream into a contiguous array, the combination of attributes needed is

  ni_type=b ni_form=binary ni_dimen=num_bytes 
where num_bytes should be replaced by the number of bytes to be read.

A useful way to think of the data specified by the ni_type and ni_dimen attributes is that the data stream defines a 2D array of values. The ni_type attribute specifies the contents of each row in this array, and the ni_dimen attribute specifies how many rows will be read. In the following example, the data element produces a data structure containing the array shown in the table:

  <data ni_type=f.i.S ni_dimen=4>
    3.72 55 "This is row 1" -0.70 444 "I'm row #2" 666.666
    -555 OK-3  0.003 777 "The last row!" </>
  float     int    String
  3.72      55     "This is row 1"
  -0.7      444    "I'm row #2"
  666.666   -555   "OK-3"
  0.003     777    "The last row!"
In the C API (see appendices), the data would end up being stored in 3 arrays, one for each column of this array. The first array would be pointed to by a float *, the second by an int *, and the third by a char **. All of these would be gathered together into one NI_element struct.

N.B.: Although specifying "ni_type=3f ni_dimen=2" and "ni_type=f ni_dimen=6" mean the same thing as far as parsing the data stream goes (6 floats expected), these do not mean the same thing to the application. The first specification is for a 3x2 table of numbers, and the second is for a 1x6 table of numbers. In the C API (see appendices), the data structure returned to the application would be stored differently for these two cases. The first case would produce 3 vectors of length 2; the second case would produce 1 vector of length 6.

Multi-Dimensional Arrays and Related Attributes (optional):
For ease in dealing with multidimensional arrays (e.g., images), it is also legal to specify the ni_dimen attribute's value as a string of more than one integer, separated by commas, as in ni_dimen="128,128,16" (i.e., the attribute value is a list of comma-separated substrings). This means that 128*128*16=262144 values (specified by ni_type) will be read from the data stream, possibly representing a 3D image or a time series of 2D images.
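
Breaking such a value into its substrings and computing the total count can be sketched as follows. The function name niml_parse_dimen is hypothetical and is not taken from the C API in the appendices; rejecting zero or negative dimensions is an assumption of this sketch.

```c
#include <stdlib.h>

/* Hypothetical helper: parse a comma-separated ni_dimen value such as
   "128,128,16" into dims[]. Stores the product of all dimensions in
   *total. Returns the number of dimensions, or -1 on a syntax error
   or if there are more than maxdims of them. */
int niml_parse_dimen(const char *s, long *dims, int maxdims, long *total)
{
    int n = 0;

    *total = 1;
    for (;;) {
        char *end;
        long v;

        if (n >= maxdims) return -1;
        v = strtol(s, &end, 10);
        if (end == s || v <= 0) return -1;   /* no digits, or not positive */
        dims[n++] = v;
        *total *= v;
        if (*end == '\0') return n;          /* finished                   */
        if (*end != ',') return -1;          /* only commas may separate   */
        s = end + 1;
    }
}
```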

The following attributes can be used in conjunction with ni_dimen to specify information that lets the data be interpreted as lying on a regular grid in n-dimensions, where n is the number of values specified in the RHS of ni_dimen. Each of these attributes should have the same number of comma-separated substrings in its RHS value as ni_dimen does.

ni_delta: This should be a set of floating point numbers indicating the spacing between the locations of data values in the grid.

ni_origin: This should be a set of floating point numbers indicating the origin of the locations of data values in the grid.

ni_units: This should be a set of string values that specify the units used in ni_delta and ni_origin. These strings are also not interpreted by the processor in any way, but are simply passed through to the application.

ni_axes: This should be a set of string values that specify the direction/orientation of the coordinate axes. These strings are not interpreted by the processor in any way, but are simply passed through to the application.

Example of a header for an element to hold the data for a 4D image (say from an FMRI experiment):

  <fourD ni_type=short
         ni_dimen="64,64,16,80"
         ni_delta="3.75,3.75,5.0,2.5"
         ni_origin="-120.0,-120.0,-10.0,0.0"
         ni_axes="R-L,A-P,I-S,time"
         ni_units="mm,mm,mm,s">
This would correspond to an experiment with 64x64 images, 16 slices per volume, and 80 volumes gathered in time (64*64*16*80 = 5242880 values). The voxel dimensions are 3.75 mm in plane, with a slice thickness of 5.0 mm, and TR is 2.5 seconds. The first data axis is Right-to-Left, the second is Anterior-to-Posterior, the third is Inferior-to-Superior, and the fourth is time. The (i,j,k,p) voxel in this 4D array is located at the (i+64*j+4096*k+65536*p)th short in the data stream, and is located at coordinates (x,y,z,t) = (-120+3.75*i, -120+3.75*j, -10+5.0*k, 2.5*p), for i=0..63, j=0..63, k=0..15, p=0..79.
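
The indexing and coordinate rules in this example generalize to any number of dimensions: the first index varies fastest in the data stream, and each coordinate is the origin plus the index times the spacing along that axis. A minimal sketch, with hypothetical function names that are not part of the C API in the appendices:

```c
#include <stddef.h>

/* Hypothetical helper: position (counted in values, starting at 0) of
   grid point idx[] in the data stream, for extents dimen[] taken from
   ni_dimen. The first index varies fastest. */
size_t niml_grid_offset(int ndim, const int *dimen, const int *idx)
{
    size_t off = 0, stride = 1;
    int d;

    for (d = 0; d < ndim; d++) {
        off += stride * (size_t)idx[d];
        stride *= (size_t)dimen[d];
    }
    return off;
}

/* Hypothetical helper: coordinates of grid point idx[], using the values
   from ni_origin and ni_delta: coord[d] = origin[d] + delta[d]*idx[d]. */
void niml_grid_coords(int ndim, const double *origin, const double *delta,
                      const int *idx, double *coord)
{
    int d;

    for (d = 0; d < ndim; d++)
        coord[d] = origin[d] + delta[d] * (double)idx[d];
}
```

For the header above, the voxel (i,j,k,p) = (1,2,3,4) is at offset 1 + 64*2 + 4096*3 + 65536*4 = 274561, at coordinates (-116.25, -112.5, 5.0, 10.0).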

Example of a header for an element to hold a single time series of 128 points, with sampling interval of 1.5 seconds:

 <oneD ni_type=float ni_dimen=128 ni_delta=1.5 ni_units=s> 

If ni_dimen is not used, then ni_delta, ni_origin, ni_units, and ni_axes are not broken down by the NIML processor. These attributes, if present, will still be passed to the application as strings.

Other Attributes (optional):
Other attributes may be included in the element header. All attributes will be processed and passed back to the application (as strings) in the order in which they are encountered.

Data stream:
The data stream starts at the next byte after the ">" that closes the element header, unless a "/" character immediately precedes the ">", as in "/>". In that case, there is no data stream present in the input, and this ">" is the end of the data element encoding.

Text data:
If the data stream is in text form, then the data values are read from the stream as follows:

  Type      C format string
  byte      %u (cast to unsigned char)
  short     %d (cast to signed short)
  int       %d
  float     %f
  double    %lf
  complex   %f%f (real part, imaginary part)
  rgb       %u%u%u (each cast to unsigned char)
  RGBA      %u%u%u%u (each cast to unsigned char)
  String    non-whitespace sequence (%s), or "quoted string"
  Line      data up to the next end-of-line
Data values must be separated by at least one whitespace character. If a String contains whitespace, the String must be present in the text data stream in a quoted form.

Recall that Line data is defined as the text from the current scanning point up to the next end-of-line, with leading and trailing whitespace eliminated. If an entirely blank line occurs in the input, then the corresponding Line string would be empty (have zero length). For example:

  <linestuff ni_type=L ni_dimen=3>
     Line 1

     Line 3
  </>
The second line here is the empty string.
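The trimming rule for Line data can be sketched as below. This is a minimal illustration, not the NIML implementation: next_line() is a hypothetical helper, and because the text here does not say how the newline directly after the element header is treated, the sketch assumes scanning begins at the first data line.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Find the next Line value in the text at *cursor.
   On success, *start and *len describe the line with leading and trailing
   whitespace removed (len may be 0 for a blank line), *cursor is moved past
   the end-of-line, and 1 is returned.  Returns 0 when the input is used up. */
int next_line(const char **cursor, const char **start, size_t *len)
{
    assert(cursor != NULL && *cursor != NULL && start != NULL && len != NULL);

    const char *p = *cursor;
    if (*p == '\0') return 0;

    const char *eol = p;                       /* locate the end-of-line  */
    while (*eol != '\0' && *eol != '\n') eol++;

    while (p < eol && isspace((unsigned char)*p)) p++;         /* leading  */
    const char *q = eol;
    while (q > p && isspace((unsigned char)q[-1])) q--;        /* trailing */

    *start  = p;
    *len    = (size_t)(q - p);
    *cursor = (*eol == '\n') ? eol + 1 : eol;
    return 1;
}
```

Called three times on the body of the linestuff example, this yields lengths 6, 0, and 6, so the blank line is reported as an empty string rather than being skipped.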

Binary or base64 data:
If the data stream is in binary or base64 format (as specified by ni_form), then the data values are read from the stream byte-by-byte (after base64 decoding, if needed), with each value taking the number of bytes specified earlier. String and Line data values are not allowed in these forms. This restriction is made so that the number of bytes in the data stream can be computed from the ni_type and ni_dimen attributes (e.g., ni_type=f.i.s and ni_dimen=3 would require a binary data stream to contain exactly (4+4+2)*3=30 bytes, and a base64 data stream to contain 30 bytes after the base64 characters are decoded).
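The byte count in the example above can be computed mechanically. The sketch below is an illustration only: binary_stream_bytes() is a hypothetical helper, the type codes follow the table in Appendix B, and the per-type sizes are assumptions inferred from the (4+4+2) example and from the typedefs in Appendix B.

```c
#include <assert.h>
#include <stddef.h>

/* Type codes as listed in the Appendix B table. */
enum { NI_BYTE, NI_SHORT, NI_INT, NI_FLOAT, NI_DOUBLE,
       NI_COMPLEX, NI_RGB, NI_RGBA, NI_STRING, NI_LINE };

/* ASSUMED bytes per value in a binary stream, indexed by type code.
   0 marks String and Line, which are not allowed in binary/base64 form. */
static const size_t type_size[] = { 1, 2, 4, 4, 8, 8, 3, 4, 0, 0 };

/* Number of bytes a binary data stream must hold for vec_len rows whose
   columns have the given types.  Returns 0 if any type is String or Line
   (or is not a known code), since no fixed size exists in that case. */
size_t binary_stream_bytes(const int *vec_typ, int vec_num, size_t vec_len)
{
    assert(vec_typ != NULL && vec_num > 0);

    size_t row = 0;
    for (int i = 0; i < vec_num; i++) {
        int t = vec_typ[i];
        if (t < NI_BYTE || t > NI_LINE || type_size[t] == 0) return 0;
        row += type_size[t];
    }
    return row * vec_len;
}
```

For ni_type=f.i.s and ni_dimen=3, the type list is { NI_FLOAT, NI_INT, NI_SHORT } and the function returns 30, matching the figure in the text.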

An internal data stream ends with the bytes "</"; an external data stream ends with the end of the URL that was fetched. If the data stream is internal, the data element transmission ends with the next following ">", which allows the closing sequence to be either "</>" or "</elementname>". After the proper ni_dimen number of data values have been read, any data bytes before the closing "</" will be discarded.


Defining Data Element Subtypes

If you just want to transmit/store 3 floats, say, the above format seems excessively complicated. Therefore, a syntax is available to let you declare subtypes of the generic data element that can be used more easily.

XML note: The idea that a ni_typedef element can influence the interpretation of future elements does not violate the XML specification (which is solely concerned with "processors"), but it does not fall within the XML specification either. XML uses the "<!ELEMENT ...>" and "<!ATTLIST ...>" constructs to constrain how elements and their attributes may be formed. Alternatively, an XML Schema can be used to provide control over the form/structure of XML data (http://www.w3.org/TR/xmlschema-1/). The XML-only methods are clumsy and don't suit the NIML needs well; the XML Schema method can specify what is allowed in great detail, but is quite complex and seems like too much to support for the purposes of the neuroimaging community.

An empty element (i.e., its header ends with "/>") with name "ni_typedef" is used to define a subtype. With a ni_typedef element, you specify the ni_type attribute and possibly the ni_dimen attribute that will be used when a subtype element is found. An example specifying both:

 <ni_typedef ni_name=fv3 ni_type=f ni_dimen=3/> 
This defines the new element type fv3 to contain exactly 3 floats in its data stream. An example of such an element:
 <fv3>2.71828 3.1416 666.0</> 
Note that it would still be legal to add the ni_form= attribute to the header of the fv3 element. You can't specify ni_form in the ni_typedef element; that is, you can't force a subtype to be encoded in a particular format.

If the ni_dimen attribute is missing from the subtype definition, then it can be supplied when the subtype is used; for example:

  <ni_typedef ni_name=xyzlist ni_type=3f/>
  <xyzlist ni_dimen=4>1 2 3 4 5 6 7 8 9 10 11 12</>
This subtype is intended to encode a list of 3-tuples of floats; the example produces a 3x4 table of floats. (Recall that if ni_dimen is not supplied, then ni_dimen=1 is assumed.)
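How the 12 numbers in the xyzlist example fall into a 3x4 table can be sketched as follows. The sketch assumes that values arrive one row (one 3-tuple) at a time, which is what the "half filled" row example in Appendix D suggests; split_rows_into_columns() is a hypothetical helper, not an NIML function.

```c
#include <assert.h>
#include <stddef.h>

/* Copy row-ordered values into vec_num separate columns of length vec_len.
   col[c][r] receives the value found in column c of row r. */
void split_rows_into_columns(const float *stream, int vec_num, int vec_len,
                             float **col)
{
    assert(stream != NULL && col != NULL && vec_num > 0 && vec_len > 0);

    for (int r = 0; r < vec_len; r++)
        for (int c = 0; c < vec_num; c++)
            col[c][r] = stream[(size_t)r * (size_t)vec_num + (size_t)c];
}
```

With the stream 1..12, vec_num=3, and vec_len=4, the three columns come out as {1,4,7,10}, {2,5,8,11}, and {3,6,9,12}; that is, each column collects one coordinate of the four 3-tuples.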

Predefined Subtypes:
The following predefined subtypes can be used:

  <ni_typedef ni_name=ni_f1 ni_type=float/> (1 float)
  <ni_typedef ni_name=ni_f2 ni_type=2f   /> (2 floats)
  <ni_typedef ni_name=ni_f3 ni_type=3f   /> (3 floats)
  <ni_typedef ni_name=ni_f4 ni_type=4f   /> (4 floats)

  <ni_typedef ni_name=ni_i1 ni_type=int/>   (1 int)
  <ni_typedef ni_name=ni_i2 ni_type=2i />   (2 ints)
  <ni_typedef ni_name=ni_i3 ni_type=3i />   (3 ints)
  <ni_typedef ni_name=ni_i4 ni_type=4i />   (4 ints)

  <ni_typedef ni_name=ni_irgb  ni_type=i.r/> (int+color)
  <ni_typedef ni_name=ni_irgba ni_type=i.R/> (int+color)

  <ni_typedef ni_name=ni_S ni_type=S/>      (string)
  <ni_typedef ni_name=ni_L ni_type=L/>      (line string)
It is an error to redefine one of these subtypes, to define a new subtype that starts with "ni_", or to redefine a subtype that was previously defined through an explicit ni_typedef element. A user-defined subtype cannot be used in an element until it has been defined previously in the data transmission.


Including External Files to Define Elements

The ni_include data element can be used to specify that a given file should be included; for example:

 <ni_include ni_url="file:/home/elvis/defs.ni"/> 
which says to read the given file into the data transmission at this point. Since this is a data element (with no data stream), it cannot appear inside another data element. If desired (why?), the #p..q fragment specification can be appended to the end of the URL.

One use for the ni_include element would be to read in a set of ni_typedefs at the start of a file that used them heavily.


Contents of an Entire Data Collection

A data file or transmission stream will often contain more than one data element that must be kept together to make a coherent whole. Data elements can be grouped together using the construction

  <ni_group>
    ...elements...
  </ni_group>
where "...elements..." is replaced by one or more data elements, formatted as described earlier. The whitespace between elements will be ignored. Groups may be nested. Attributes may be included in the "<ni_group ...>" header, as with data elements.


Appendix A: Processor and Application Interaction

Most of this specification is concerned with how arbitrary data will be encoded in a (supposedly) self-describing format. However, these Appendices deal with one model of how the input and output processors can interact with the application.

The model presented herein is batch-oriented, in that an entire unit of information is processed at once. For an input processor, a free-standing (not in a group element) data element is turned into a data structure which is fully populated and then returned to the application; a group element is turned into a tree of data structures which are fully populated and the tree is returned to the application. For an output processor, the application must fully fill up a data structure, then call the output processor library to generate the resulting data/group elements.

An alternative model would be stream-oriented processing. For input processing, the application would register functions ("callbacks") to be called when certain structures (e.g., attributes, individual data values) in the input data were encountered. For example, the beginning of a data element would trigger one callback, and the decoding of each input value from the element's data stream would trigger another callback. This would allow the application to get a finer level of control over the handling of the input, without having to have it all decoded and stored before getting access to the decoded values. This specification does not address the development of a stream-oriented API for NIML data.

XML note:
"Batch-oriented" corresponds to "DOM" in XML (http://www.w3.org/TR/DOM-Level-3-Core/).
"Stream-oriented" corresponds to "SAX" in XML (http://www.megginson.com/SAX/index.html).

Nota Bene: The data structures and routines specified in the following appendices have not yet been fully implemented. Thus, they are especially subject to change as experience accumulates. See Appendix F for information on the current status of an implementation of this API.


Appendix B: Internal Representation of a Data Element in C

The information specified by a data element will be read into a C struct of type NI_element which has the following fields:

Field Name and Type Meaning
int type ; First field is always NI_ELEMENT_TYPE
char *name ; Element name
int attr_num; Number of attributes
char **attr_lhs; attr_lhs[i] points to the ith attribute name
char **attr_rhs; attr_rhs[i] points to the ith attribute String
int vec_num; Number of vectors (from ni_type)
int vec_len; Length of vectors (from ni_dimen)
int vec_filled; How many vector rows were filled on input (<=vec_len)
int *vec_typ; vec_typ[i] is the type of the ith vector
void **vec; vec[i] points to the start of the ith vector
int vec_rank; Number of dimensions specified in ni_dimen
int *vec_axis_len; vec_axis_len[i] is the ith dimension count (from ni_dimen)
float *vec_axis_delta; vec_axis_delta[i] is the ith dimension grid spacing (from ni_delta)
float *vec_axis_origin; vec_axis_origin[i] is the ith dimension grid offset (from ni_origin)
char **vec_axis_unit; vec_axis_unit[i] is the ith dimension grid unit string (from ni_units)
char **vec_axis_label; vec_axis_label[i] is the ith dimension axis label (from ni_axes)
Further details on these fields are given below.

type:
The first field is an int which can be used to distinguish the type of this element structure; the value NI_ELEMENT_TYPE here indicates that this is a data element. (For group elements, the corresponding value would be NI_GROUP_TYPE.)

name:
This is a standard NUL-terminated C string. Since an element name must contain at least one character, this will not have zero length.

attr_num:
This is the number of attributes read, including all the ni_* attributes. This may be zero (e.g., for the elements "<ni_f1>3.2</>" and "<quit/>", there are no attributes).

attr_lhs and attr_rhs:
If attr_num is zero, then these pointers will be set to the NULL pointer. Otherwise, attr_lhs[i] will be a pointer to a standard NUL-terminated C string that is the LHS of the ith "attname=string" attribute, and attr_rhs[i] will be a pointer to the ith RHS string, for i from 0 to attr_num-1. Attributes will be stored in the order encountered in the data element header, including the attributes that start with "ni_".

vec_num:
This is the number of types declared in the ni_type attribute; for example, "ni_type=f.2i" would give vec_num=3.
Empty elements: vec_num is zero if there is no data stream.

vec_len:
This is the total number of entries from ni_dimen.

vec_typ:
This array specifies the types of each vector of data read from the data stream, as specified from the ni_type attribute. If vec_num=0, then vec_typ will be the NULL pointer. Otherwise, vec_typ[i] is a code indicating the data type, for i from 0 to vec_num-1:

Name      Code  Macro        vec[i] type
byte      0     NI_BYTE      byte *
short     1     NI_SHORT     short *
int       2     NI_INT       int *
float     3     NI_FLOAT     float *
double    4     NI_DOUBLE    double *
complex   5     NI_COMPLEX   complex *
rgb       6     NI_RGB       rgb *
RGBA      7     NI_RGBA      rgba *
String    8     NI_STRING    char **
Line      9     NI_LINE      char **

The byte, rgb, rgba, and complex types are defined by
  typedef unsigned char             byte ;
  typedef struct { byte r,g,b ; }   rgb ;
  typedef struct { byte r,g,b,a ; } rgba ;
  typedef struct { float r,i ; }    complex ;
Empty elements: vec_typ is NULL.

vec:
This array of arrays actually contains the data interpreted from the data stream, if vec_num is greater than zero. vec[i] is a pointer to an array of the type encoded by vec_typ[i] and of length vec_len, for i from 0 to vec_num-1. For example, if vec_typ[2]==NI_FLOAT, then the proper use of the pointer vec[2] is something like

  int j ;
  float *fv = (float *) vec[2] ;
  for( j=0 ; j < vec_len ; j++ ) do_something( fv[j] ) ;
If vec_typ[4]==NI_STRING, then printing out the jth string would be done like so:
  char **sv = (char **) vec[4] ;
  printf("%s\n",sv[j]) ;
Empty elements: vec is NULL.

vec_rank:
This value is the number of dimensions specified in ni_dimen; some examples:

  ni_dimen=7              implies   vec_rank=1
  ni_dimen="64,64"        implies   vec_rank=2
  ni_dimen="64,64,16,80"  implies   vec_rank=4
Empty elements: vec_rank is set to 0.

vec_axis_len:
This array holds the substring values decoded from ni_dimen. Continuing the examples above:

  vec_axis_len[0] = 7
  vec_axis_len[0] = 64; vec_axis_len[1] = 64;
  vec_axis_len[0] = 64; vec_axis_len[1] = 64; vec_axis_len[2] = 16; vec_axis_len[3] = 80;
Empty elements: vec_axis_len is NULL.
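The relation between ni_dimen, vec_rank, vec_axis_len, and vec_len can be sketched as below. This is an illustration under stated assumptions, not the library's parser: parse_ni_dimen() is a hypothetical helper, and it expects the attribute value with any surrounding quotes already removed.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Parse a ni_dimen value such as "64,64,16,80".
   Stores the axis lengths in axis_len[] (room for max_rank entries),
   writes their product to *vec_len, and returns the rank.
   Returns 0 if the string is malformed or has more than max_rank axes.
   (Overflow of the product is not checked in this sketch.) */
int parse_ni_dimen(const char *dimen, int *axis_len, int max_rank,
                   long *vec_len)
{
    assert(dimen != NULL && axis_len != NULL && vec_len != NULL);

    int rank = 0;
    long total = 1;
    const char *p = dimen;

    for (;;) {
        char *end;
        long n = strtol(p, &end, 10);
        if (end == p || n <= 0 || n > INT_MAX || rank >= max_rank) return 0;
        axis_len[rank++] = (int)n;
        total *= n;
        if (*end == '\0') break;       /* last axis                 */
        if (*end != ',') return 0;     /* unexpected character      */
        p = end + 1;                   /* continue after the comma  */
    }
    *vec_len = total;
    return rank;
}
```

For "64,64,16,80" this returns rank 4 with a total length of 5242880, the number of values in the 4D imaging example earlier in the document.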

vec_axis_delta:
This array holds the values decoded from the ni_delta, if it was present.
Empty elements and elements without ni_delta: vec_axis_delta is NULL.

vec_axis_origin:
This array holds the values decoded from the ni_origin, if it was present.
Empty elements and elements without ni_origin: vec_axis_origin is NULL.

vec_axis_unit:
This array of pointers to C strings holds the values decoded from ni_units (i.e., the substrings that were separated by commas).
Empty elements and elements without ni_units: vec_axis_unit is NULL.

vec_axis_label:
This array of pointers to C strings holds the values decoded from ni_axes (i.e., the substrings that were separated by commas).
Empty elements and elements without ni_axes: vec_axis_label is NULL.
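For reference, the field table above might be transcribed into a C declaration along the following lines. This is a sketch only: the names element_sketch, SKETCH_ELEMENT_TYPE, and is_empty_element() are hypothetical, and the numeric value of the type constant is a placeholder because the real value of NI_ELEMENT_TYPE is not given in this text.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder: the real NI_ELEMENT_TYPE value is not stated in this document. */
enum { SKETCH_ELEMENT_TYPE = 17 };

typedef struct {
    int     type;             /* first field: marks this as a data element  */
    char   *name;             /* element name                               */
    int     attr_num;         /* number of attributes                       */
    char  **attr_lhs;         /* attribute names                            */
    char  **attr_rhs;         /* attribute Strings                          */
    int     vec_num;          /* number of vectors (from ni_type)           */
    int     vec_len;          /* length of vectors (from ni_dimen)          */
    int     vec_filled;       /* rows filled on input (<= vec_len)          */
    int    *vec_typ;          /* type code of each vector                   */
    void  **vec;              /* start of each vector                       */
    int     vec_rank;         /* number of dimensions in ni_dimen           */
    int    *vec_axis_len;     /* dimension counts (from ni_dimen)           */
    float  *vec_axis_delta;   /* grid spacings (from ni_delta)              */
    float  *vec_axis_origin;  /* grid offsets (from ni_origin)              */
    char  **vec_axis_unit;    /* unit strings (from ni_units)               */
    char  **vec_axis_label;   /* axis labels (from ni_axes)                 */
} element_sketch;

/* Check the "Empty elements" conditions listed field by field above. */
int is_empty_element(const element_sketch *e)
{
    assert(e != NULL);
    return e->vec_num == 0 && e->vec_typ == NULL && e->vec == NULL
        && e->vec_rank == 0 && e->vec_axis_len == NULL
        && e->vec_axis_delta == NULL && e->vec_axis_origin == NULL
        && e->vec_axis_unit == NULL && e->vec_axis_label == NULL;
}
```

An element such as "<quit/>" would satisfy is_empty_element(): it has a name but no vectors and no axis information.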


Appendix C: Internal Representation of a Data Group in C

The information specified by a ni_group will be read into a C struct of type NI_group which has the following fields:

Field Name and Type Meaning
int type ; First field is always NI_GROUP_TYPE
int attr_num; Number of attributes
char **attr_lhs; attr_lhs[i] points to the ith attribute name
char **attr_rhs; attr_rhs[i] points to the ith attribute String
int part_num; Number of parts (elements or sub-groups)
int *part_typ; part_typ[i] is the type of the ith part
void **part; part[i] points to the data describing the ith part

part_num:
This is the number of elements or sub-groups encountered between the opening "<ni_group>" and the closing "</ni_group>".

part_typ:
part_typ[i] specifies whether the ith part is a data element (constant NI_ELEMENT_TYPE) or a group itself (constant NI_GROUP_TYPE), for i=0..part_num-1.

part:
If part_typ[i]==NI_ELEMENT_TYPE, then (NI_element *)part[i] is a pointer to a NI_element struct, defined above. If part_typ[i]==NI_GROUP_TYPE, then (NI_group *)part[i] is a pointer to a NI_group struct.
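Because groups may be nested, an application will typically walk the part array recursively. The sketch below counts the data elements in a group tree. It uses cut-down stand-in structs (sketch_element, sketch_group) that keep only the fields needed here, and placeholder values for the two type constants; none of these names belong to the NIML API.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder values; the real constants are not given in this document. */
enum { SKETCH_ELEMENT_TYPE = 1, SKETCH_GROUP_TYPE = 2 };

/* Reduced stand-ins for NI_element and NI_group. */
typedef struct { int type; const char *name; } sketch_element;
typedef struct {
    int    type;        /* always SKETCH_GROUP_TYPE             */
    int    part_num;    /* number of parts                      */
    int   *part_typ;    /* type of each part                    */
    void **part;        /* pointer to each part                 */
} sketch_group;

/* Count the data elements in a group, descending into sub-groups. */
int count_data_elements(const sketch_group *g)
{
    assert(g != NULL && g->type == SKETCH_GROUP_TYPE);

    int n = 0;
    for (int i = 0; i < g->part_num; i++) {
        if (g->part_typ[i] == SKETCH_ELEMENT_TYPE)
            n++;
        else if (g->part_typ[i] == SKETCH_GROUP_TYPE)
            n += count_data_elements((const sketch_group *)g->part[i]);
    }
    return n;
}
```

The same pattern (test part_typ[i], then cast part[i]) applies to any other per-element processing, such as printing names or freeing storage.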


Appendix D: The C API for Input from NIML

Input to the NIML Functions: NI_stream:
Data is provided to the NIML processor through an opaque handle of type NI_stream. ("Opaque" means that the internal components of this type are not visible to the application). A NI_stream for input is a source of bytes that will be scanned to construct data and/or group elements.

Opening a NI_stream for Input:
An application opens an input stream with a function call like so:

  NI_stream ns ;
  ns = NI_stream_open( sname , "r" ) ;
Here, sname is a C string (NUL-terminated) that specifies whence the stream is to derive its data. The following formats for sname are supported:
  • "file:filename"
    This form opens the file "filename" for input, using the C library function fopen().

  • "fd:integer"
    This form does I/O to the pre-opened (by the application) file descriptor given by integer. For example, "fd:0" can be used for input from stdin, and "fd:1" can be used for output to stdout. When NI_stream_close() is called, this file descriptor will not be closed -- the application opened it, so the application can close it.

    • If the application opened a stream FILE *fp with fopen() (or popen(), etc.), it can retrieve the file descriptor integer with C library function fileno(), and use something like sprintf(sname,"fd:%d",fileno(fp)) to construct the first argument to NI_stream_open().

  • "http://hostname/filename"  OR
    "ftp://hostname/filename"
    These forms fetch the given URL and then read data from it. Effectively, these forms are somewhat like "str:", where the input string of bytes comes from an external resource. The entire contents of the URL will be fetched during the NI_stream_open() call and stored in a memory buffer inside the NI_stream structure.

  • "str:string"
    This form uses a copy of the characters that follow "str:" as the source of input bytes. For example:

      "str:<fred ni_type=f ni_dimen=3>1.1 1.2 1.3</>"
    It is also possible to provide the string to be decoded with a pair of calls like
      ns = NI_stream_open( "str:" , "r" ) ;
      NI_stream_setbuf( ns , string ) ;
    
    The call to NI_stream_setbuf() will do nothing if ns was not opened as a "str:" in input ("r") mode. Otherwise, any existing contents of the internal string buffer will be discarded and replaced by a copy of the contents of string.

  • "tcp:hostname:port"
    This form opens a TCP/IP socket to the computer hostname (which can be specified by Internet name or by IP address in the standard dotted form 123.456.789.123), on the port given by port. For example,

      "tcp:127.0.0.1:9999"
    opens a socket to the local computer on port #9999.
    • When opening a socket for input "r", the hostname is actually ignored. In this case, the socket is listening for connection from any Internet host. When opening a socket for output "w", then the socket actually uses the hostname to reach out to try to attach to a listening "r" socket. After the connection is established, the IP address (as a string in dotted form) of the caller can be ascertained from function NI_stream_name(ns).
    • Since sockets are (usually) used for communication between two separate processes in realtime, it is necessary for the application to wait until the socket is properly attached to both processes. At that point the application can read data from the socket.
    • If NI_read_element() (see below) is called with a tcp: NI_stream that is not connected at both ends, it will wait the specified amount of time for the other process to connect. If no connection is made, or if the connection is made but no data is available, NI_read_element() will return NULL.
    • Alternatively, a program can check if a NI_stream is "good" by using the function call
        int msec=5, nn=NI_stream_goodcheck(ns,msec) ; 
      The input msec is the number of milliseconds to wait for the stream to become good. The return value nn is 1 if the stream is potentially capable of reading data (socket is open; or file/string hasn't been used up yet). The return value is 0 if the stream isn't yet ready, but is waiting for connection (socket isn't connected yet). The return value is -1 if an unrecoverable fatal error has happened to the stream (socket connection failed or broke, input file/string was exhausted) such that no more data will be readable. This function can be used in a loop to check for establishment of the connection:
       ns = NI_stream_open( "tcp:anybody:6666" , "r" ) ;
       if( ns == NULL ){ fprintf(stderr,"Can't open socket 6666\n"); exit(1); }
       while(1){
         nn = NI_stream_goodcheck(ns,1) ;
         if( nn == 1 ) break ;  /* good! */
         if( nn <  0 ){ fprintf(stderr,"Can't accept on socket 6666\n"); exit(1); }
         /** could do something else here before trying again **/
       }
       fprintf(stderr,"Socket 6666 connected from address %s\n",NI_stream_name(ns)) ;
      
    • Using NI_stream_goodcheck() to make the connection rather than NI_read_element() offers the ability for the reading process to verify the writing process's IP address against a list of trusted hosts before accepting any input bytes. If you just use NI_read_element() to make the connection, then the first element will be read before you have a chance to check the IP address with NI_stream_name().
    • Unlike other NI_streams, a socket can be used for bi-directional communication. That is, an application can use both NI_read_element() and NI_write_element() with a tcp: stream. Once the connection is made, it doesn't matter which end opened the stream with "r" and which with "w".
    • Given an arbitrary NI_stream value ns, to determine if it is potentially allowed to write or read to that stream, use the function NI_stream_writeable(ns) or NI_stream_readable(ns). These functions return 1 if ns is of a type that allows the given operation, and return 0 if not. These should not be confused with NI_stream_writecheck() and NI_stream_readcheck(), which actually determine if writing/reading will send/return any values at that instant.
    • Note that many port numbers are in use for various semi-standard Internet services. A list of such ports can be found at http://www.iana.org/assignments/port-numbers. It is pretty safe to use ports from 49152 through 65535, since these are all unassigned.

If an error occurs when opening the stream (e.g., filename can't be opened, hostname cannot be found, port number illegal), NI_stream_open() returns (NI_stream)NULL.
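The sname strings described above are ordinary C strings, so they can be assembled with snprintf() before being passed to NI_stream_open(). The helpers below are hypothetical (make_fd_sname() and make_tcp_sname() are not NIML functions). The port range check and the rejection of ':' inside the host name are assumptions of this sketch, chosen because ':' is the field separator in the "tcp:hostname:port" form.

```c
#define _POSIX_C_SOURCE 200809L   /* makes fileno() visible under strict C modes */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "fd:N" for an already-open FILE *.  Returns 0 on success, -1 on failure. */
int make_fd_sname(char *buf, size_t buflen, FILE *fp)
{
    assert(buf != NULL && fp != NULL);

    int fd = fileno(fp);
    if (fd < 0) return -1;
    int n = snprintf(buf, buflen, "fd:%d", fd);
    return (n > 0 && (size_t)n < buflen) ? 0 : -1;
}

/* Build "tcp:host:port".  Returns 0 on success, -1 on a bad argument
   or if the result does not fit in buf. */
int make_tcp_sname(char *buf, size_t buflen, const char *host, int port)
{
    assert(buf != NULL && host != NULL);

    if (host[0] == '\0' || strchr(host, ':') != NULL) return -1;
    if (port < 1 || port > 65535) return -1;
    int n = snprintf(buf, buflen, "tcp:%s:%d", host, port);
    return (n > 0 && (size_t)n < buflen) ? 0 : -1;
}
```

Using snprintf() rather than sprintf() guards against overrunning the sname buffer when the host name is long.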

Closing a NI_stream:
An application closes an input (or output) stream with a call like

 NI_stream_close( ns ) ; 
where ns is a valid NI_stream value that was previously returned from NI_stream_open(). NI_stream_close() has no return value. After this function has been called, the memory associated with ns has been deallocated, and it is illegal for the application to refer to ns again, unless it is reassigned by another call to NI_stream_open().

Reading Data from an NIML Input Stream:
The next block of data can be read from an opened stream using a call like so:

  void *nini ;
  int msec = 1 ;
  nini = NI_read_element( ns , msec ) ;
where ns is a valid NI_stream value that was previously returned from NI_stream_open(), and msec is the number of milliseconds the process should wait for more data to appear in the input stream. Use msec=0 for an immediate return if no data is available.

NULL is returned if a complete element could not be extracted from the input stream. To check if this has failed because the connection was closed, use

  int nn = NI_stream_readcheck(ns,0) ;
  switch( nn ){
     case -1:  /* stream has gone bad */                                 break ;
     case  0:  /* stream is OK, just waiting for data (sockets only) */  break ;
     case  1:  /* stream has data waiting to be read */                  break ;
  }
If NI_read_element() returns NULL and NI_stream_readcheck() then returns -1, then the stream will deliver no more data.

If not NULL, the value returned by NI_read_element() points to a NI_element or to a NI_group data structure. The program can determine which by

  int tt = NI_element_type( nini ) ;
  if( tt == NI_ELEMENT_TYPE ){  /* data element */
    NI_element *nel = (NI_element *) nini ;
    /* do something here */
  } else if( tt == NI_GROUP_TYPE ){  /* group element */
    NI_group *ngr = (NI_group *) nini ;
    /* do something else here, I suppose */
  } else {
    /* this should never occur, unless nini==NULL (tt==-1) */
  }
  • If a free-standing (un-grouped) data element is the first element encountered in the input, then this function will read all its data (waiting, if needed, until the data stream for this element is done), and return a NI_element *.
  • If a group data element is the first element encountered in the input, then this function will read all the elements contained within it (waiting, if needed, until the end token for the group), and return a NI_group * with the appropriate number of parts.

Checking for Available Input:
It can be useful to check if data is available to be read, in order to avoid calling NI_read_element() and waiting for input when there is no input. This function call does just that:

  int cod ;
  cod = NI_stream_readcheck( ns , 1 ) ;
The return value is positive if data can be read from the NI_stream, zero if no data can be read (but the stream is still good), and negative if the stream has failed in some way (e.g., the socket was closed at the other end). This function only checks if at least 1 data byte can be read from the I/O stream represented by ns; it does not check if valid NIML data is present. For socket streams, the second argument is the number of milliseconds the function should wait to check if data is present. For the other stream types, the function will return immediately, since there is no need to synchronize with another process.

Freeing Data from an NI_group or NI_element:
Function call NI_free_element( nini ) can be used to free all the data from a NI_element or NI_group constructed by the NIML functions.

Some application software may wish to move some or all of the data out of an NI_element prior to freeing the NI_element data structure itself. Instead of having to copy arrays, the application can simply copy any pointer from the NI_element to its own storage, and then set the pointer in the NI_element to NULL. For example:

  NI_element *nel ;
  float *fp ;
  int   nfp ;
  nel = NI_read_element(ns,1) ;    /* read a data element       */
  nfp = nel->vec_len ;             /* save length of data array */
  fp  = (float *) nel->vec[0] ;    /* copy data array pointer   */
  nel->vec[0] = NULL ;             /* clear pointer in nel       */
  NI_free_element( nel ) ;         /* free everything else in nel */
This example skips all checking (e.g., if nel==NULL), and assumes that the data structure returned is a data element that contains a float vector as its first entry. In a real application, many more cases would need to be allowed for.

Attributes within an Element:
The application can certainly search for a given attribute name in an element returned by NI_read_element(); however, there is a utility function to do this. For example:

  char *rhs = NI_get_attribute( nel , "idcode" ) ;
will return NULL if there is no attribute with left hand side "idcode"; otherwise it returns a pointer to the right hand side value of the attribute. This pointer points into the element's data structure, so it should not be modified or free()-ed. If you need to make a copy of it, use the strdup() library function. NI_get_attribute() will work with both data and group elements.
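The lookup described here amounts to a linear search over the attr_lhs array. The sketch below shows that search on the raw arrays; find_attribute() is a hypothetical stand-in written for illustration and is not claimed to be how NI_get_attribute() is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the RHS string of the first attribute whose LHS equals name,
   or NULL if there is none.  The returned pointer refers to storage owned
   by the caller's arrays, so it must not be modified or freed. */
const char *find_attribute(int attr_num, char **attr_lhs, char **attr_rhs,
                           const char *name)
{
    assert(name != NULL);

    for (int i = 0; i < attr_num; i++)
        if (strcmp(attr_lhs[i], name) == 0)
            return attr_rhs[i];
    return NULL;   /* also reached when attr_num is 0 and the arrays are NULL */
}
```

Since attributes are stored in the order encountered, a search like this returns the first match if the same name were to appear twice.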

Error Conditions:
The following is a discussion of how an implementation of the C API should handle various error conditions.

  • If the input stream aborts (e.g., premature end of file or socket close) before any input element header is completed, then NI_read_element() will return NULL.
  • NI_read_element() will skip input data bytes until it finds the opening of an element header (the character "<"). If the header contains an error (e.g., an illegal name), then it will be skipped. NI_read_element() will then try to find another good header farther on in the input stream.
  • If a data stream ends prematurely (before the number of data values specified by ni_dimen, ni_binary, or ni_base64), then the data element will still be "full size" as specified by ni_dimen (recall the default length is 1). However, the component values corresponding to the missing data in the NI_element data structure will be set to zero values, since the vec[i] arrays will have been created with calloc(). The vec_filled field in the NI_element struct will be set to the number of rows of the vectors that were fully filled. For example:
      <elvis ni_dimen=3 ni_type=f.i> 3.2 1 4.7 2 3.1 </> 
    would result in vec_len=3 but vec_filled=2, since the last row would only be half filled.
  • If a data stream contains more data than needed, the extra inputs are ignored. There is no indication in the NI_element structure that the data stream was too long.
  • If a text data stream for a data element contains input that cannot be properly decoded into numbers, then the data stream bytes that can't be decoded are skipped until the next whitespace is found, and the corresponding output value is set to zero. For example:
     <vector ni_type=3f> 3.2 z66 7.1 </> 
    decodes to the 3 numbers "3.2", "0.0", and "7.1". No indication of this error is made in the NI_element structure.
  • If a text data stream for a data element is supposed to have a String value, and the String starts with a quote character (" or '), then the String ends with the next matching quote character or with the end token (end of file/transmission, or the "<" character). The latter case, where the matching quote character is missing, is almost surely an error, and may result in a very long String value and the skipping of many other values:
      <junkola ni_type=f.S ni_dimen=3>
        3.2 "This is
        4.7 Bob
        9.3 Dole </>
    
    The first String value starts with the "T" in "This" and ends with the blank after "Dole". The second and third float and String values will never be read.
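Two of the rules above (undecodable text becomes zero, and vec_filled counts only complete rows) can be sketched in a few lines. This is an illustration for float-only text, not the NIML decoder: decode_floats() and rows_filled() are hypothetical helpers, and treating "<" as the end of the data stream follows the description of the closing "</" given earlier.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Decode up to max whitespace-separated values from text into out[].
   A token that cannot be decoded as a number yields 0.0f, and the rest of
   that token is skipped.  Scanning stops at end of string or at '<'.
   Returns the number of values stored. */
int decode_floats(const char *text, float *out, int max)
{
    assert(text != NULL && out != NULL);

    int n = 0;
    const char *p = text;
    while (n < max) {
        while (*p != '\0' && isspace((unsigned char)*p)) p++;
        if (*p == '\0' || *p == '<') break;          /* end of data stream */

        char *end;
        float v = strtof(p, &end);
        out[n++] = (end == p) ? 0.0f : v;            /* zero if undecodable */
        if (end != p) p = end;

        /* skip whatever remains of this token */
        while (*p != '\0' && !isspace((unsigned char)*p) && *p != '<') p++;
    }
    return n;
}

/* Number of completely filled rows, given how many values were read. */
int rows_filled(int values_read, int vec_num)
{
    assert(vec_num > 0 && values_read >= 0);
    return values_read / vec_num;
}
```

On the "vector" example the sketch produces 3.2, 0.0, and 7.1; on the "elvis" example it reads five values, so with two columns only two rows count as filled.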


Appendix E: The C API for Output to NIML

Opening a NI_stream for Output:
The string "w" is supplied as the second argument to NI_stream_open() when a program wants to write to the stream.

  • For "file:" streams, the output file will be erased, if it already exists.
  • For "str:" streams, the stream writes into an internal buffer (always in text mode). This buffer can be accessed later using the function NI_stream_getbuf():
      NI_stream ns = NI_stream_open( "str:" , "w" ) ;
      NI_element *nel ;  /* get this from somewhere */
      NI_write_element( ns , nel , NI_TEXT_MODE ) ;
      printf("%s\n",NI_stream_getbuf(ns)) ;
    
    The function NI_stream_clearbuf(ns) can be used to erase the contents of the "str:" output buffer, so that new elements can be written into that space.
  • For a "tcp:hostname:port" stream, an outgoing call is placed to the specified Internet (IPv4) host on the specified port. Unless another program is already listening at that host with a "r" socket on the same port number, the returned NI_stream will not be ready to write to immediately.
  • You can use a loop with NI_stream_goodcheck() to detect when an outgoing NI_stream socket is properly connected.
  • If you use NI_write_element() on a socket that is not yet connected, that function will attempt to connect. If the connection fails, NI_write_element() will return 0 immediately. It will not wait for an outgoing connection to be established.

Writing Elements to an NIML Output Stream:
The application must first assemble a data element, or a group element containing one or more data elements. Then the element is written to the output stream with function NI_write_element():

  NI_group   *ngr ;
  NI_element *nel ;
  int nbe, nbg ;
  nbe = NI_write_element( ns , nel , NI_TEXT_MODE ) ;
  nbg = NI_write_element( ns , ngr , NI_BINARY_MODE ) ;
  (void) NI_write_element( ns , nel , NI_BASE64_MODE ) ;
The return value is the number of bytes written to the output stream. If 0 is returned, then nothing was written (this will be the case if the output socket isn't yet connected at the reading end). If -1 is returned, then nothing was written and the NI_stream suffered an unrecoverable error (this will be the case if the output socket was connected but the connection was broken: e.g., if the reading application crashed).

For debugging purposes, it is often useful to write an element to standard output (in text form, of course). This can be done with the following code snippet:

  NI_stream nstdout ;
  NI_element *nel ;  /* get this from somewhere */
  nstdout = NI_stream_open( "fd:1" , "w" ) ;
  NI_write_element( nstdout , nel , NI_TEXT_MODE ) ;
  NI_stream_close( nstdout ) ;

Output "str:" streams are always written in text mode, regardless of the third parameter to NI_write_element(). Also, data elements that contain String or Line components will always be written in text mode.

The data elements written by this API will

  • Always use quoted Strings as the RHS values for attributes;
  • Always close an element with "</elementname>" rather than "</>";
  • Always mark binary and base64 data streams with the byte order of the application's CPU.

Assembling Data Elements:
It is perfectly possible for the application to assemble its own data elements prior to calling NI_write_element() to write them. However, the following routines are intended to make it simpler to assemble a data element from data structures and arrays already present in the application.

  • Create a data element:

      NI_element *nel ;
      nel = NI_new_data_element( "elementname" , 6 ) ;
    
    Creates a data element with the given element name and with ni_dimen=6; the second argument specifies the length of the arrays added to the element, using function NI_add_column().
    • If the second argument is 0, then this will be an empty element.
    • If the second argument is negative, then instead of using NI_add_column() to add fixed-length arrays to the data element, you use NI_add_row() to add one row at a time to the data element. See below for details.
    • You cannot mix the use of NI_add_column() and NI_add_row() in a single element. When you create the element with NI_new_data_element(), you must choose which method will be used to actually put data into the NI_element structure.

  • Add an array (column) to a data element, for data elements whose column length was specified as positive in NI_new_data_element().

      float *fff ;
      NI_add_column( nel , NI_FLOAT , fff ) ;
    
    This adds a float column to the data element. The number of values pointed to by fff must match the number of values specified in NI_new_data_element(). This data is copied into the data structure pointed to by nel, and so can be over-written or deleted by the application after this function call. For the data types NI_STRING and NI_LINE, the third argument should be char **. Each NUL-terminated string from this array will be copied into the data element's internal storage.

  • Add a row to a data element, for data elements whose column length was specified as negative in NI_new_data_element().

    • Before you can add a row, you must describe the contents of the row, and its mapping from some C struct. This definition is done with the function NI_define_rowmap_VA(). For example:
        typedef struct { int m,n; float f; char *s; } somestruct ;
        somestruct sss = { 3,2,1.7,"Fourier Transform" } ;
        nel = NI_new_data_element( "something" , -1 ) ;
        NI_define_rowmap_VA( nel ,
                             NI_INT   , offsetof(somestruct,m) ,
                             NI_INT   , offsetof(somestruct,n) ,
                             NI_FLOAT , offsetof(somestruct,f) ,
                             NI_STRING, offsetof(somestruct,s) , -1 ) ;
        NI_add_row( nel , &sss ) ;   /* add 1st row of data */
      
      In this example, the struct type somestruct has four fields. Each field is defined to the element with a pair of int arguments to NI_define_rowmap_VA(). The first member of the pair is a type code, such as NI_INT. The second member of the pair is an offset into the struct type where the data lives. This offset is most conveniently computed using the C standard macro offsetof(). The final argument to NI_define_rowmap_VA() should be -1 (not a legal type code).
    • Function NI_add_many_rows( NI_element *nel, int nrow, int stride, void *dat ) can be used to add nrow rows at a time. Continuing the above example:
        somestruct *ttt = malloc(sizeof(somestruct)*100) ;
        /** fill ttt[i].stuff for i=0..99 **/
        NI_add_many_rows( nel , 100 , sizeof(somestruct) , ttt ) ;
      
      This is more efficient than adding one row at a time, but simpler than converting each field (e.g., m in somestruct) in an array of structs to a column vector and then using NI_add_column().
    • You can also define a rowmap (in the same way) when you want to extract data from an element acquired from NI_read_element(). The corresponding function call (to get the data out) is
        NI_get_row( nel , rr , &sss ) ; 
      where the int input rr is the row index from which the data should be extracted, and &sss is the pointer to the struct into which the data should be placed (at the offsets previously established by a call to NI_define_rowmap_VA()). Of course, in this case, you must call NI_define_rowmap_VA() after you acquire the element from NI_read_element() and before you call NI_get_row(). The number and type of row components must agree with the number and type defined in the data element header. How your program ensures that is beyond the scope of this API (e.g., you could have a convention for various element names to be mapped to corresponding struct types).
    • If you prefer not to use the variable argument list function NI_define_rowmap_VA(), you can instead use the function NI_define_rowmap_AR(), where the type and offset values are input in arrays:
        int typ[4] , off[4] ;
        typ[0] = NI_INT    ; off[0] = offsetof(somestruct,m) ;
        typ[1] = NI_INT    ; off[1] = offsetof(somestruct,n) ;
        typ[2] = NI_FLOAT  ; off[2] = offsetof(somestruct,f) ;
        typ[3] = NI_STRING ; off[3] = offsetof(somestruct,s) ;
        NI_define_rowmap_AR( nel , 4 , typ , off ) ;
      
      In fact, NI_define_rowmap_VA() just assembles type and offset arrays from its inputs, then calls NI_define_rowmap_AR() to do the actual rowmap setup inside the data element struct.
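
The idea behind a rowmap, a table of (type, offset) pairs that lets generic code reach into an arbitrary struct, can be illustrated without the library. The following is a self-contained sketch, not the NIML implementation: the COL_* codes, rowmap_entry, somestruct_map, and fetch_field are hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type codes for this sketch only (not the NI_* constants). */
enum { COL_INT, COL_FLOAT, COL_STRING };

typedef struct { int m, n; float f; char *s; } somestruct;

/* One (type, offset) pair per struct field, as in the rowmap calls above. */
typedef struct { int type; size_t offset; } rowmap_entry;

static const rowmap_entry somestruct_map[] = {
    { COL_INT,    offsetof(somestruct, m) },
    { COL_INT,    offsetof(somestruct, n) },
    { COL_FLOAT,  offsetof(somestruct, f) },
    { COL_STRING, offsetof(somestruct, s) },
};

/* Copy field number col out of the struct at row into dest, using only
   the table. Returns the number of bytes copied, or 0 for a bad type.
   For COL_STRING the pointer itself is copied, not the characters. */
static size_t fetch_field(const rowmap_entry *map, int col,
                          const void *row, void *dest)
{
    const char *src;
    size_t len;
    assert(map != NULL && row != NULL && dest != NULL && col >= 0);
    src = (const char *)row + map[col].offset;
    switch (map[col].type) {
        case COL_INT:    len = sizeof(int);    break;
        case COL_FLOAT:  len = sizeof(float);  break;
        case COL_STRING: len = sizeof(char *); break;
        default:         return 0;
    }
    memcpy(dest, src, len);
    return len;
}
```

Because the table carries the offsets, the same fetch_field() routine works for any struct layout, which is presumably why the API asks for offsetof() values rather than field names.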

  • Specify dimensionality of element data (optional):

      int    nd[2] = {    2 , 3        } ;
      float del[2] = {  1.5 , 2.5      } ;
      float org[2] = { -1.3 , 3.3      } ;
      char *uni[2] = { "mm" , "parsec" } ;
      char *axi[2] = { "x"  , "y"      } ;
    
      NI_set_dimen ( nel , 2,nd ) ;   /* ni_dimen="2,3"       */
      NI_set_delta ( nel , del  ) ;   /* ni_delta="1.5,2.5"   */
      NI_set_origin( nel , org  ) ;   /* ni_origin="-1.3,3.3" */
      NI_set_units ( nel , uni  ) ;   /* ni_units="mm,parsec" */
      NI_set_axes  ( nel , axi  ) ;   /* ni_axes="x,y"        */
    
    These functions set the indicated attributes. The first one that must be used is NI_set_dimen(), if the number of dimensions is more than 1. In the example, this function sets the size of each dimension; these values must multiply out to the same length as given in NI_new_data_element(). If NI_set_dimen() is not used, then the number of dimensions is taken as 1. The number of dimensions is needed for the other functions (NI_set_delta(), etc.) so that they can extract the correct number of values from their input arrays (del, etc.).

  • If you add data to the element using the NI_add_row() interface, then you cannot specify the dimension attributes as above until the last row is added. This restriction is so that the call to NI_set_dimen() can check if the supplied dimensions multiply out to the column length of the element.
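
The consistency check described here (the dimension sizes must multiply out to the column length) amounts to a few lines. The following is a sketch of that check only; dims_match_length is a hypothetical name, and how NI_set_dimen() itself reports a mismatch is not stated in this document.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the nd dimension sizes multiply out to veclen, else 0.
   Non-positive sizes are treated as a mismatch. */
static int dims_match_length(int nd, const int *dims, int veclen)
{
    long long prod = 1;
    assert(nd > 0 && dims != NULL);
    for (int i = 0; i < nd; i++) {
        if (dims[i] <= 0) return 0;
        prod *= dims[i];
    }
    return prod == (long long)veclen;
}
```

With the values used in the example above (column length 6 and sizes 2 and 3), the check passes; with a column length of 5 it would not.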

  • Add other attributes (optional):

     NI_set_attribute( nel , "attname" , "attvalue" ) ; 

A data element created this way can be freed by using NI_free_element().

Assembling Group Elements:
A similar set of functions can be used to assemble a group element.

  • Create a new group element:

      NI_group *ngr ;
      ngr = NI_new_group_element() ;
    

  • Add an element (data or group) to a group element:

      NI_element *neladd ;
      NI_add_to_group( ngr , neladd ) ;
    

A group element created this way can be freed by using NI_free_element().


Appendix F: Implementation Status

[21 Feb 2002] The first implementation of most of the C API exists. It is not very well tested yet.

  • Some limitations:

    • http: and ftp: streams work for reading "r", but not for writing "w". (It isn't clear to me what writing to http: means anyway; maybe a POST method to a CGI script? What to do with the response from the script? That seems to be the subject for a standard communications protocol between neuroimaging applications, which would be one of the nonexistent "higher level documents" I alluded to at the beginning.)
    • The various numeric types and String fields are implemented.
      • Note that String data can only be written in text mode; if you try to write a data element in binary mode and the element has a String column, the data element will be written in text mode anyway.
      • On output, Strings are always "quoted" and use escape sequences to represent the 5 special characters & " ' < > .
      • On input, Strings are "unescaped" and have any end-of-line characters included normalized (as described for Attributes, much earlier).
    • Line fields are not implemented.
    • <ni_typedef .../> works.
      • However, its definitions are global; that is, if one input stream defines a new type, then that will affect the data elements coming in from another input stream in the same application.
    • ni_url and ni_include are not implemented.

  • Miscellaneous utility functions not mentioned previously:

    • void NI_sleep(int msec) will put the process to sleep for msec milliseconds.
    • int NI_clock_time(void) will return the number of milliseconds since it was first called in the application (first call always returns 0).
    • long NI_filesize( char *pathname ) will return the length of the given file in bytes (returns -1 if file not found).
    • int NI_byteorder(void) returns constant NI_LSB_FIRST on LSB-first CPUs and NI_MSB_FIRST on MSB-first CPUs.
    • void NI_swap2( int n, void *ar ) will swap n sets of byte pairs in place in array ar.
    • void NI_swap4( int n, void *ar ) will swap n sets of byte quads in place in array ar.
    • void NI_swap8( int n, void *ar ) will swap n sets of byte octuples in place in array ar.
    • void NI_typedef( char *name, char *type, char *dimen ) is equivalent to
        <ni_typedef ni_type=type ni_dimen=dimen/>
      If the input string dimen is NULL, then the ni_dimen attribute isn't set.
    • char *NI_type_name( int tcode ) returns a pointer to a static string (don't free() this!) with the name corresponding to an individual type code as stored in a NI_element vec_typ field.
    • int NI_type_size( int tcode ) returns the size in bytes [sizeof()] of a single value as stored in a variable with the given type code (e.g., NI_FLOAT).
    • int NI_element_allsize( NI_element *nel ) returns the size in bytes of all the data stored in the columns of a data element. This will return 0 if the element is a group element or is an empty data element.
    • int NI_read_URL( char *url, char **data ) reads the remote file specified in the url string, and returns the data (un-gzipped, if needed) into a newly malloc()-ed space pointed to by *data. The number of bytes is the return value of the function; -1 is returned if an error occurred.
      N.B.: FTP and/or gzipped files are staged through the directory named in environment variable TMPDIR (default /tmp).
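
To make the byte-swapping and byte-order descriptions concrete, here is a self-contained sketch of what such routines typically do. It assumes that "swapping a byte quad" means reversing the order of the 4 bytes in each value; swap4_sketch and cpu_is_lsb_first are hypothetical names, not the library's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of n 4-byte values stored in ar, in place. */
static void swap4_sketch(int n, void *ar)
{
    unsigned char *p = (unsigned char *)ar;
    assert(n >= 0 && (ar != NULL || n == 0));
    for (int i = 0; i < n; i++, p += 4) {
        unsigned char t;
        t = p[0]; p[0] = p[3]; p[3] = t;   /* outer pair */
        t = p[1]; p[1] = p[2]; p[2] = t;   /* inner pair */
    }
}

/* Returns 1 on an LSB-first (little-endian) CPU, 0 on an MSB-first one,
   by looking at the first byte in memory of the 16-bit value 1. */
static int cpu_is_lsb_first(void)
{
    uint16_t probe = 1;
    return *(unsigned char *)&probe == 1;
}
```

A reader receiving binary data marked with the other byte order would apply the swap of the matching width to each column before using the values.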

  • Base64 utility functions:

    • B64_to_base64( int nbin, byte *bin , int *nb64 , byte **b64 ) transforms an array of binary data to a base64 encoded array. The input array is bin[0..nbin-1]. The number of bytes in the newly malloc()-ed output array *b64 is *nb64. There is no ASCII NUL at the end of this array - it is not a C string.
    • B64_to_binary( int nb64, byte *b64 , int *nbin , byte **bin ) does the reverse transformation. It takes a base64 encoded array b64[0..nb64-1] and produces a binary array of length *nbin into newly malloc()-ed output array *bin. Characters that are meaningless to the base64 decoding process (e.g., whitespace) are skipped.
    • B64_set_linelen( int n ) sets the line length of the base64 encoding produced by B64_to_base64. The default is 72 characters per line, but this can be changed to as few as 16 or as many as 76. The actual number of characters per line will be a multiple of 4, since that is how base64 works.
    • These functions aren't actually used in the NIML part of the code, but are supplied for the programmer's convenience.
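
For readers unfamiliar with the encoding, the following self-contained sketch shows standard base64 encoding of a byte array: 3 input bytes become 4 output characters, with '=' padding. It is not the B64_to_base64() code and differs from it in two ways that are assumptions of this sketch: it inserts no line breaks, and it does NUL-terminate its output. The name b64_encode_sketch is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static const char b64_alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encode bin[0..nbin-1] as base64 with '=' padding and no line breaks.
   Returns a malloc()-ed, NUL-terminated string (caller frees), or NULL
   if memory could not be allocated. */
static char *b64_encode_sketch(int nbin, const unsigned char *bin)
{
    size_t nout;
    char *out, *q;
    int i = 0;
    assert(nbin >= 0 && (bin != NULL || nbin == 0));
    nout = 4 * (((size_t)nbin + 2) / 3);
    out = malloc(nout + 1);
    if (out == NULL) return NULL;
    q = out;
    for (; i + 2 < nbin; i += 3) {            /* full 3-byte groups */
        unsigned v = ((unsigned)bin[i] << 16) | ((unsigned)bin[i+1] << 8)
                   | (unsigned)bin[i+2];
        *q++ = b64_alphabet[(v >> 18) & 63];
        *q++ = b64_alphabet[(v >> 12) & 63];
        *q++ = b64_alphabet[(v >>  6) & 63];
        *q++ = b64_alphabet[ v        & 63];
    }
    if (i < nbin) {                           /* 1 or 2 leftover bytes */
        unsigned v = (unsigned)bin[i] << 16;
        if (i + 1 < nbin) v |= (unsigned)bin[i+1] << 8;
        *q++ = b64_alphabet[(v >> 18) & 63];
        *q++ = b64_alphabet[(v >> 12) & 63];
        *q++ = (i + 1 < nbin) ? b64_alphabet[(v >> 6) & 63] : '=';
        *q++ = '=';
    }
    *q = '\0';
    return out;
}
```

A 16-byte input encodes to 22 significant characters followed by "==", which is the arithmetic behind 22-character identifiers derived from a 128-bit hash.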

  • MD5 utility functions:

    • char * MD5_static_array( int n, char * bytes ) MD5 hashes the data in array bytes[0..n-1] and returns a pointer to a statically allocated C string with the ASCII representation of the hash code. (Don't free() this space! Copy it if you need to save it.) The string is 32+1 characters long.
    • char * MD5_malloc_array( int n, char * bytes ) does the same thing, but returns the result in a malloc()-ed string.
    • char * MD5_static_string( char * string ) does the MD5 hash on a C string. MD5_malloc_string() also exists.
    • char * MD5_static_file(char * filename) does the MD5 hash on the contents of a file. MD5_malloc_file() also exists.
    • MD5 functions are used in NIML only for the UNIQ_idcode() purpose. They are mentioned here for the programmer's convenience.

  • Function char * UNIQ_idcode(void) returns a globally unique identifier C string in newly malloc()-ed space. This string will fit in a 32 byte array (at most, including the NUL byte at the end). No two invocations of this function should return the same string. The characters in the string are alphanumeric 'a-z', 'A-Z', '0-9', with '-' and '_' possible as well. For example: "XYZ_qXxxypkMTmm_wSMh0-dEZA".

    • The purpose of this function is to make it possible for an application to tag a set of elements with a unique identifier that can be used later to associate the elements together.
    • If the environment variable IDCODE_PREFIX is set, then its first 3 characters (if alphabetic) are used to form the prefix of the identifier string; otherwise, the prefix is taken to be "XYZ", as in the example. This feature makes it possible for each site to tag their identifiers with some local initials (e.g., "NIH"). The prefix is followed with an underscore, and then the rest of the identifier characters.
    • The identifier is generated as follows: Generate a string from the system identifier information (uname()), the current time of day, the process id, and the number of times the function has been called. Then MD5 hash this string to a 128 bit (16 byte) code. Base64 encode this code to a 22 byte string; replace '/' with '-' and '+' with '_'.
    • Function char * UNIQ_hashcode(char *) can be used to get the MD5+Base64 string encoded from the input string. Unlike UNIQ_idcode(), this function will return the same string if called with the same input. It can be used to produce an idcode string from a filename, for example.
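
The final assembly step described above (prefix, underscore, then the 22 base64 characters with '/' and '+' replaced) can be sketched as follows. This is an illustration only, not the UNIQ_idcode() source: assemble_idcode is a hypothetical name, the hashing and base64 steps are assumed to have been done already, and the choice to fall back to "XYZ" when the prefix has fewer than 3 alphabetic characters is this sketch's reading of the rule.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Build "PRE_" followed by 22 base64 characters, replacing '/' with '-'
   and '+' with '_'. prefix may be NULL; if its first 3 characters are not
   all alphabetic, "XYZ" is used. out must hold at least 27 bytes
   (3 + 1 + 22 characters plus the NUL). */
static void assemble_idcode(const char *prefix, const char *b64_22,
                            char out[27])
{
    char pre[4] = "XYZ";
    assert(b64_22 != NULL && strlen(b64_22) >= 22);
    if (prefix != NULL && isalpha((unsigned char)prefix[0])
                       && isalpha((unsigned char)prefix[1])
                       && isalpha((unsigned char)prefix[2])) {
        memcpy(pre, prefix, 3);
    }
    snprintf(out, 27, "%s_%.22s", pre, b64_22);
    for (char *p = out + 4; *p != '\0'; p++) {
        if (*p == '/')      *p = '-';
        else if (*p == '+') *p = '_';
    }
}
```

A caller following the description above would pass getenv("IDCODE_PREFIX") as the first argument.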

  • Internet host name functions:

    • char * NI_hostname_to_inet(char *hostname) returns the "dotted form" of the IP address for the given host name, in a C string (e.g., input "localhost" will return "127.0.0.1"). The returned string is malloc()-ed. If NULL is returned, the host couldn't be found.
    • void NI_add_trusted_host(char *hostname) lets you add a host to the "trusted list". If the trusted list isn't empty, then socket connections won't be accepted from hosts not on the list. The default trusted list is empty, unless you initialize it by calling this function at least once.
      • You can call this function with input NULL to have the trusted list initialized with the built-in list, which is just "127.0.0.1" and "192.168."
      • The first call to this function always puts these 2 addresses on the trusted list. It also will add to the trusted list any addresses stored in environment variables of the form NIML_TRUSTHOST_xx for xx from 00 to 99. (If 100 addresses isn't enough for you, maybe you are too trusting?)
      • The trusted list is a list of IP addresses stored in dotted form. If input hostname isn't in dotted form, then NI_hostname_to_inet() will be used to find its primary IP address, which will then be stored in the trusted list.
    • int NI_trust_host(char *hostname) returns 1 if hostname is on the trusted list and 0 if it isn't. If hostname isn't in dotted form, then it will be looked up with NI_hostname_to_inet(). If the trusted list was not initialized at all, then all hosts will be trusted (the default insecure condition).
      • A host will be trusted if its IP address string starts with the same character sequence as any of the hosts in the trusted list. Thus, "192.168." means to trust all addresses of the form "192.168.abc.xyz" (this is the reserved private address block 192.168.0.0/16).
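
The prefix-matching rule is simple enough to show in a few lines. The following self-contained sketch assumes the address is already in dotted form and that the list is passed in explicitly; host_is_trusted is a hypothetical name, not the NI_trust_host() source.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if addr (dotted form) starts with any entry of the list, or if
   the list is empty (never initialized, so every host is trusted).
   Returns 0 otherwise. */
static int host_is_trusted(const char *addr,
                           const char *const *list, int nlist)
{
    assert(addr != NULL && nlist >= 0);
    if (nlist == 0) return 1;
    for (int i = 0; i < nlist; i++) {
        if (strncmp(addr, list[i], strlen(list[i])) == 0) return 1;
    }
    return 0;
}
```

Note that a pure string-prefix test means an entry such as "127.0.0.1" also matches "127.0.0.10"; ending an entry with a dot, as in "192.168.", avoids that kind of accidental match.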

  • The implementation is in two files: niml.h and niml.c. The test programs nimltest.c and nisurf.c can be used as samples. It has only been tried on Linux as yet (and not so much, either).

  • Many errors just fail silently. For example, characters that aren't understood on the RHS of a ni_type attribute are simply ignored.

    • Setting a global error code (e.g., NI_errno) like the C library does is a possible cheap solution. (But this idea is not implemented.)
    • XML doesn't specify what to do with errors, either. The number of possible errors when parsing input is nearly endless. Dealing with this is especially tricky if the input/output is an ongoing dialog between two free-standing applications. Ugh.
    • One complex error condition:
      If an input stream does not end, but also does not close the data stream with "</>", after about 10 seconds NI_read_element() will return with the (probably incomplete) element anyway. This event can only happen with sockets: if the sending application hangs (not crashes) while sending the data stream, if the sending computer itself crashes ungracefully (e.g., power failure), or if the network between the two programs itself goes down. It isn't clear what the "right" thing to do in this case is, since there is no simple way to test if a socket connection is fully open or only half-open.


Appendix G: Documentation of API Functions and Structures

Alas, this important section has yet to be written.


Appendix H: Complexity of the NIML Standard

As mentioned earlier in an aside, simplicity is an important consideration, since NIML may end up being re-implemented in a number of languages. For this reason, it may be desirable to define a basic NIML specification which has some features removed. Some candidates for simplification/elimination (NIML Lite, also known as the Shrubbery):

  • Eliminate ni_url and ni_include.
  • Eliminate the multi-dimensional descriptive attributes.
  • Eliminate ni_typedef.
  • Eliminate ni_group.
  • Eliminate base64 input or output.
  • Eliminate Line type data (maybe get rid of some other types as well).
  • Delete support for unquoted Strings on the RHS of attributes or in data streams.
  • Disallow long names and integer counts in ni_type; that is, only allow things like ni_type="ffiii", which is pretty easy to decode (each character between the opening " and the closing " defines one column of the data).
  • Add API function to convert all input numeric types to floats, simplifying the number of data types with which the application has to deal.
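
To show how little code the restricted ni_type form would need, here is a self-contained sketch of a decoder for strings such as "ffiii". It handles only the letters 'i' and 'f' that appear in the example above and rejects everything else; the full letter-to-type table is defined elsewhere in the specification. The SK_* codes and decode_simple_type are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical column codes for this sketch (not the NI_* constants). */
enum { SK_INT = 0, SK_FLOAT = 1 };

/* Decode a one-letter-per-column string such as "ffiii".
   Writes one code per column into codes[0..maxcol-1] and returns the
   column count, or -1 for an unhandled letter or too many columns. */
static int decode_simple_type(const char *spec, int *codes, int maxcol)
{
    int ncol = 0;
    assert(spec != NULL && codes != NULL && maxcol >= 0);
    for (const char *p = spec; *p != '\0'; p++) {
        int code;
        switch (*p) {
            case 'i': code = SK_INT;   break;
            case 'f': code = SK_FLOAT; break;
            default:  return -1;   /* letter not handled in this sketch */
        }
        if (ncol >= maxcol) return -1;
        codes[ncol++] = code;
    }
    return ncol;
}
```

A string with an integer count, such as "f3i", is rejected here, which is exactly the simplification the list item proposes.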

Appendix I: Linguistic Issues

"NI" or "ni" is to be pronounced as the word "knee", but with a high pitch and shortened. For the defining example of this, please see the film Monty Python and the Holy Grail.

"Niml" means "ants" in Arabic. It is also an acronym for

  • National Institute of Modern Languages
  • Northern Illinois Metro League
  • North Insurance Management Ltd
  • New Iconomy Mailing List (?)
  • Non-Indigenous Minority Languages
  • Norwich Investment Management Limited
These results (from 1 minute of Googling) clearly illustrate that all semi-pronounceable acronyms have already been used, over-used, re-used, abused, and used-up.
Created by Robert Cox
Last modified 2006-01-05 11:52
 
