Like the input file formats on the HP1000, output files for recording observing data have had many and varied formats. Initially data were generally recorded in some form of ASCII, with cryptic numbers in some form of header, followed by data written as sets of numbers. These were largely unintelligible to a human reader. They were soon found to rapidly fill large amounts of very expensive disk space, then costing around R1000 / MB. Access was also very slow for large files. This prompted a change to binary formats, for instance a pure binary file for spectra, and to an ASCII header + binary data for pulsars. These used much less space and could be read and written rapidly. This was an efficient local solution.
However, the data were not portable. The outside astronomical world adopted the FITS format for making data transportable. FITS files use headers comprising keywords + parameters, with one keyword + parameter per 80-character line, all stored in one or more 2880-byte blocks. The header is followed by the data as an array in either binary or ASCII format, also stored in multiple 2880-byte blocks. FITS encompasses many different internal formats, and it requires a 96-page manual to define the FITS Standard, at /usr/local/src/fits_standard.ps. Optional internal FITS structures for data storage include random groups (deprecated), binary tables and ASCII tables.
The recording of coordinates in FITS has also evolved. For further discussion, see "Representations of celestial coordinates in FITS", of 1996/09/09, by E W Greisen and M Calabretta, in /usr/local/src/fits/wcs.all.ps
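The header layout described above (80-character lines packed into 2880-byte blocks) can be illustrated with a minimal Python sketch. The function names split_cards and parse_card are hypothetical, the sample lines are illustrative rather than a complete header, and the parser is deliberately simplified: it leaves values as strings and ignores refinements of the FITS Standard such as doubled quotes inside strings and continued long strings.

```python
BLOCK_SIZE = 2880  # bytes per FITS block
CARD_SIZE = 80     # bytes per header line (often called a "card")


def split_cards(header_bytes):
    """Split raw header bytes into 80-character lines."""
    if len(header_bytes) % BLOCK_SIZE != 0:
        raise ValueError("header length is not a multiple of 2880 bytes")
    text = header_bytes.decode("ascii")
    return [text[i:i + CARD_SIZE] for i in range(0, len(text), CARD_SIZE)]


def parse_card(card):
    """Return (keyword, value, comment); the value is left as a string."""
    keyword = card[:8].strip()
    if card[8:10] != "= ":
        # COMMENT, HISTORY, END and blank lines carry no value.
        return keyword, None, card[8:].strip()
    rest = card[10:]
    if rest.lstrip().startswith("'"):
        # Quoted string: find the closing quote first, so that a '/'
        # inside the string is not mistaken for the comment separator.
        start = rest.index("'")
        end = rest.index("'", start + 1)
        value = rest[start + 1:end].rstrip()
        comment = rest[end + 1:].partition("/")[2].strip()
    else:
        value, _, comment = rest.partition("/")
        value, comment = value.strip(), comment.strip()
    return keyword, value, comment


# Illustrative lines only (an assumption, not a real observation header).
sample_cards = [
    "SIMPLE  = T / file does conform to FITS standard",
    "BITPIX  = 8 / number of bits per data pixel",
    "NAXIS   = 0 / number of data axes",
    "OBJECT  = 'TAURUS A' / name of object",
    "END",
]
# Pad every line to 80 characters and the whole header to one 2880-byte block.
sample_block = (
    "".join(c.ljust(CARD_SIZE) for c in sample_cards)
    .ljust(BLOCK_SIZE)
    .encode("ascii")
)
parsed = [parse_card(card) for card in split_cards(sample_block)]
```

Here parsed holds one tuple for each of the 36 lines in the block, including the blank padding lines after END.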
Unfortunately, HP-Kermit cannot handle the large record length of FITS files, so recourse was made to unblocked ASCII formats for porting data from the HP. In the case of spectra, where I had attempted to implement FITS files on the HP, I used what is essentially an unblocked version of FITS. Pulsar data, stored in binary with a cryptic ASCII header, were converted to ASCII. Continuum data, never recorded in binary, were ported off the HP in their existing cryptic ASCII format.
Spectra and drift scans were also exported offsite in a minimalist format of a line of descriptive header, followed by column names, followed by X,Y pairs in two-column format, all written in ASCII. The advantage of this format is its extreme portability, for example being e-mailable and editable with any editor, and its ability to be imported into non-specialist data analysis software such as spreadsheets.
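As an illustration of the minimalist format just described, the following Python sketch writes one descriptive line, one line of column names, and then X,Y pairs in two columns. The function name write_xy, the column names, the number formats and the sample values are all assumptions made for the example; the actual export programs may differ in every detail.

```python
import io


def write_xy(stream, description, x_name, y_name, xs, ys):
    """Write a header line, a column-name line, then X,Y pairs."""
    if len(xs) != len(ys):
        raise ValueError("X and Y columns must have the same length")
    stream.write(description.strip() + "\n")
    # Single-token column names keep the file easy to import elsewhere.
    stream.write(f"{x_name} {y_name}\n")
    for x, y in zip(xs, ys):
        stream.write(f"{x:.5f} {y:.6E}\n")


# Illustrative values only; writing to a StringIO keeps the example
# free of file-system side effects.
buffer = io.StringIO()
write_xy(
    buffer,
    "339.88-1.26 HartRAO 26M 1999-02-13",
    "VLSR_KM/S",
    "TA_K",
    [-43.193, -43.08061],
    [0.336205, 0.342031],
)
xy_text = buffer.getvalue()
```

A file written this way can be read back by a spreadsheet or by any program that accepts whitespace-separated columns after skipping the first two lines.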
What data should be recorded on the NCCS, and in what format?
In theory we could again return to the efficient binary files, with cryptic, minimal housekeeping, of the HP. However, R1000 now buys 10 GB of disk space, computer speeds are up by a factor of 1000, and experience suggests that we should be recording a maximum of housekeeping information about each observation, not a minimum, in order to provide maximum validation of the data against changes in the system.
It is also beneficial to standardize the definitions used for the housekeeping. In this regard, the use of the FITS concept of keywords followed by parameters is a suitable model to follow. The following example shows the housekeeping associated with spectra files currently being ported from the HP1000. Keywords that follow the FITS standard are identified.
BEGIN                           / start of new data set
NAXIS   = 1                     / number of axes in data array (FITS)
NAXIS1  = 256                   / number of values on AXIS1 (FITS)
BUNIT   = 'K'                   / brightness units of data (FITS)
OBJECT  = '339.88-1.26'         / name of object (FITS)
TELESCOP= 'HartRAO 26M'         / telescope (FITS)
OBSERVER= 'S. GOEDHART'         / observer's name (FITS)
DATE-OBS= '1999-02-13'          / date of observation (FITS)
DATE    = '1999-03-17T13:54:17' / date of creation of header (FITS)
CTYPE1  = 'VELO-LSR KM/S'       / coordinate type for axis 1 (FITS)
CTYPE2  = 'ANTENNA_TEMP'        / brightness type of the data
CRVAL1  = -43.193               / value at CRPIX1 (FITS)
CRPIX1  = 1                     / location of ref point on axis (FITS)
CDELT1  = 0.11239               / pixel spacing on NAXIS1 (FITS)
RESTFREQ= 6.668518E+09          / line rest frequency in Hz
RA      = 252.1                 / RA of EQUINOX in degrees
DEC     = -46.059               / DEC of EQUINOX in degrees
RADOWN  = 252.1                 / RA of EQUINOX for down spectrum
DECDOWN = -46.059               / DEC of EQUINOX for down spectrum
EQUINOX = 1950.                 / equinox in years for coords (FITS)
GLII    = 339.883079            / galactic longitude
GBII    = -1.25706087           / galactic latitude
GLIIDOWN= 339.883079            / galactic longitude for down posn
GBIIDOWN= -1.25706087           / galactic latitude for down posn
JULDATE = 2451222.67            / Julian date of SCAN
HA      = -19.87                / hour angle of SCAN in degrees
POL     = 0                     / polarization
TCAL    = 14.2                  / calibration signal value
TCALUNIT= 'K'                   / calibration signal units
TSYS    = 59.0315               / system temperature of SCAN in TCALUNITS
DTSYS   = 2.32639612            / uncertainty in TSYS in TCALUNITS
DUR     = 210.                  / total integration time of SCAN in S
RMS     = 0.142293849           / expected rms noise in data in BUNITS
PSS     = 0.                    / point source sensitivity in Jy/K
BW      = 640000.               / original bandwidth of spectrum in Hz
FROFFSET= 320000.               / frequency offset for fr sw sp in Hz
SCAN    = 31322                 / scan or observation number
ADDED   = 2.                    / number of spectra averaged
FIRST_CH= 1                     / first valid pixel in data
LAST_CH = 256                   / last valid pixel in data
POLYFIT = 3                     / order of polynomial fit to baseline
FOLDED  = 0                     / =0 unfolded, =1 folded freq. sw.sp.
SMOOTHED= 0                     / =0 unsmoothed =1 smoothed
TRANSFRM= 0                     / =0 spectrum =1 transform
ALTDEG  = 75.03                 / telescope altitude (elev) in deg
ATMCORR = 1.0013                / atmospheric atten. corr. applied
PNTCORR = 1.07704318            / pointing correction applied
GAINCORR= 0.0                   / antenna gain correction applied
END                             / end of housekeeping for data set (FITS)
0.336205E+00 0.342031E+00 0.177201E-01 -.336548E+00 / data
-.878676E-02 -.546026E-01 -.264129E+00 -.471806E-03 / next line of data
... repeat to end of data
BEGIN                           / start of next data set
...
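To show how housekeeping of this kind can be used, here is a minimal Python sketch that reads a shortened excerpt of the example into a dictionary and reconstructs the velocity of each channel. It uses the usual FITS linear-axis convention, value(pixel) = CRVAL1 + (pixel - CRPIX1) * CDELT1, with pixels counted from 1. The names parse_housekeeping, convert and axis_values are hypothetical, and the sketch assumes one keyword per line with " / " introducing the comment.

```python
SAMPLE = """\
BEGIN / start of new data set
NAXIS = 1 / number of axes in data array (FITS)
NAXIS1 = 256 / number of values on AXIS1 (FITS)
OBJECT = '339.88-1.26' / name of object (FITS)
CTYPE1 = 'VELO-LSR KM/S' / coordinate type for axis 1 (FITS)
CRVAL1 = -43.193 / value at CRPIX1 (FITS)
CRPIX1 = 1 / location of ref point on axis (FITS)
CDELT1 = 0.11239 / pixel spacing on NAXIS1 (FITS)
EQUINOX = 1950. / equinox in years for coords (FITS)
END / end of housekeeping for data set (FITS)
0.336205E+00 0.342031E+00 0.177201E-01 -.336548E+00 / data
"""


def convert(text):
    """Turn a parameter string into str, int or float as appropriate."""
    if text.startswith("'"):
        return text.strip("'").strip()
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def parse_housekeeping(lines):
    """Collect keyword/parameter pairs up to the END line."""
    header = {}
    for line in lines:
        # Drop the comment; " / " (with spaces) avoids splitting 'KM/S'.
        body = line.split(" / ", 1)[0].strip()
        if body == "END":
            break
        key, sep, value = body.partition("=")
        if sep:  # lines without "=" (such as BEGIN) carry no parameter
            header[key.strip()] = convert(value.strip())
    return header


def axis_values(header, axis=1):
    """Coordinate of every pixel on a linear axis (pixels start at 1)."""
    n = header[f"NAXIS{axis}"]
    ref_value = header[f"CRVAL{axis}"]
    ref_pixel = header[f"CRPIX{axis}"]
    step = header[f"CDELT{axis}"]
    return [ref_value + (pixel - ref_pixel) * step for pixel in range(1, n + 1)]


header = parse_housekeeping(SAMPLE.splitlines())
velocities = axis_values(header)  # km/s, one value per channel
```

With the values in the example, the first channel lies at -43.193 km/s and the channels step by 0.11239 km/s. Reading the data values that follow END is left out of the sketch.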
It should be noted that the housekeeping in the example does not record details of the feedhorn, the receiver, or the polarization that was used (although there is a keyword for the latter), subtleties that we might want to know about.
What are the advantages of the format shown above?
What are the disadvantages of this format?
The FITS file format is the "industry standard" for astronomy. So for comparison with the first example, this is the "primary HDU" of a small area map observed with the HP1000 (in fact a beam pattern, in decibels) after porting to Linux and conversion to FITS format using the FITSIO subroutine package:
SIMPLE  = T                     / file does conform to FITS standard
BITPIX  = -32                   / number of bits per data pixel
NAXIS   = 2                     / number of data axes
NAXIS1  = 112                   / length of data axis 1
NAXIS2  = 101                   / length of data axis 2
EXTEND  = T                     / FITS dataset may contain extensions
COMMENT FITS (Flexible Image Transport System) format defined in Astronomy and
COMMENT Astrophysics Supplement Series v44/p363, v44/p371, v73/p359, v73/p365.
COMMENT Contact the NASA Science Office of Standards and Technology for the
COMMENT FITS Definition document #100 and other FITS information.
CTYPE1  = 'RA '                 / COORD TYPE
CRVAL1  = 85.840                / START RA (DEG)
CDELT1  = -0.040                / RA INTERVAL (DEG)
CTYPE2  = 'DEC '                / COORD TYPE
CRVAL2  = 20.000                / START DEC (DEG)
CDELT2  = 0.040                 / DEC INTERVAL (DEG)
DATE-OBS= '26/03/1985'          / DATE OF OBSERVATION
BUNIT   = 'dB '                 / DATA IN DECIBELS
BSCALE  = 1.0                   / REAL = VALUE*BSCALE + BZERO
BZERO   = 0.0                   /
TELESCOP= 'HART-26M'            / TELESCOPE
OBSERVER= 'M.J.GAYLARD'         /
INSTRUME= '13CM CIRC FEED'      / RECEIVER
OBJECT  = 'TAURUS A'            /
HISTORY beam offsets +0.052, +0.012, ie ~ on axis
HISTORY Fsyn = 23.333MHz, Fobs = 2300 MHz, repeats = 4
HISTORY ambient GAASFET amplifier, narrowband IF
HISTORY unsmoothed median linear beam pattern map
END
This was followed by blocks containing the map data as 32-bit binary floating-point values, as indicated by BITPIX = -32.
While the above looks similar to the spectra example, note that each line is padded with blanks to exactly 80 characters, and the 29 lines shown were followed by 7 blank lines of 80 characters, to produce the HDU comprising 36 lines each of 80 characters, in one 2880-byte block.
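The padding arithmetic described above (29 lines plus 7 blank lines giving 36 lines of 80 characters, or 2880 bytes) can be sketched in Python as follows. The function name build_header_block and the placeholder lines are assumptions for illustration; in practice a package such as FITSIO takes care of this.

```python
CARD_SIZE = 80        # characters per header line
CARDS_PER_BLOCK = 36  # 36 * 80 = 2880 bytes per block


def build_header_block(cards):
    """Pad each line to 80 characters and the header to whole blocks."""
    padded = []
    for card in cards:
        if len(card) > CARD_SIZE:
            raise ValueError(f"line longer than {CARD_SIZE} characters: {card!r}")
        padded.append(card.ljust(CARD_SIZE))
    # Number of blank lines needed to reach the next multiple of 36.
    n_blank = -len(padded) % CARDS_PER_BLOCK
    padded.extend([" " * CARD_SIZE] * n_blank)
    return "".join(padded).encode("ascii")


# 28 placeholder lines plus END give the 29 lines of the example above.
example_cards = [f"HISTORY placeholder line {i}" for i in range(1, 29)] + ["END"]
header_bytes = build_header_block(example_cards)
```

For 29 lines the function appends 7 blank lines, so header_bytes is exactly one 2880-byte block; a header of 37 lines would spill into a second block.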
Note also that in this case a lot of technical information relevant to the data was presented simply as HISTORY, where appropriate keywords and associated parameters would have provided a better solution.
What are the advantages of true FITS format?
What are the disadvantages of true FITS format?
What should we do for the NCCS?
It is clear that for exporting data, at least two formats are needed:
Initially all data are to be recorded in FITS format. These files can then be converted into other file formats as required, e.g. multicolumn ASCII, TEMPO format for pulsars, etc.
A good model to follow in doing this appears to be the SDFITS standard developed for the GBT. A starting point for looking at the documentation on this is: GBT Software Project Note - Documentation Index at http://www.gb.nrao.edu/GBT/MC/doc/index/spn_doc_index/.