6. File format¶
This chapter describes the format of all files Phosphoros needs to know about. It is divided into three sections, describing the format of input files (provided by users), of intermediate products (generated by Phosphoros itself for internal use) and of output files (containing Phosphoros results).
6.1. Input files¶
6.1.1. Catalogs¶
Input catalogs can be either ASCII or FITS tables (Phosphoros will auto-detect the type). They must contain the following columns:
ID: The ID of the source
Filter Fluxes: One column for each filter, containing the Flux in \(\mu\)Jy
Filter Flux Errors: One column for each filter, containing the Flux error in \(\mu\)Jy
and optionally:
SpecZ: The spectroscopic redshift (mandatory for training catalogs)
RA, DEC: Right ascension and declination of the source, in degrees
GAL_EBV: Galactic color excess in the source sky position, in magnitude
Filter Shifts: One column for each filter, containing the filter shift in \(\mathring{\rm A}\)
Phosphoros does not require any specific order or naming of the catalog columns. In ASCII tables, the first line, starting with #, should specify the column names.
Phosphoros internally uses strings for the IDs and double precision floats for all the other columns. If the input catalogs contain columns of different types, Phosphoros will automatically perform the conversion.
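As a rough illustration of the ASCII layout described above, the following Python sketch writes a small catalog whose first line starts with # and lists the column names. The column names (ID, FLUX_G, ...), the whitespace separator and the numerical values are assumptions made for the example only; Phosphoros does not impose any naming or ordering of the columns.

```python
import io


def write_ascii_catalog(stream, columns, rows):
    """Write a whitespace-separated table with a '#'-prefixed header line."""
    stream.write("# " + " ".join(columns) + "\n")
    for row in rows:
        if len(row) != len(columns):
            raise ValueError("row length does not match the number of columns")
        stream.write(" ".join(str(value) for value in row) + "\n")


# Hypothetical column names and made-up values (fluxes in microjansky).
columns = ["ID", "FLUX_G", "FLUXERR_G", "FLUX_R", "FLUXERR_R"]
rows = [
    ("src_1", 12.5, 0.4, 15.1, 0.5),
    ("src_2", 3.2, 0.3, 4.0, 0.3),
]

buffer = io.StringIO()
write_ascii_catalog(buffer, columns, rows)
catalog_text = buffer.getvalue()
print(catalog_text)
```

The same function can write to a real file by passing an open file object instead of the in-memory buffer.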
6.1.2. Auxiliary Data¶
Auxiliary data are ASCII space-separated tables with two columns. The meaning of the columns depends on the data type. Except in particular cases, both columns are parsed by Phosphoros as double precision decimal numbers. Scientific notation (e.g., 0.1234e-56) is allowed.
A dataset file can contain any number of comments, starting with the symbol #.
The name of an auxiliary dataset (e.g., a SED or a filter) used by
Phosphoros is by default the <folder name>/<file name without
extension>. However, if the first line of the file is a one-word
comment (e.g., # TestName), Phosphoros will use it (e.g.,
<folder name>/TestName) as the name of the dataset, in place of
the filename.
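The naming rule and the two-column layout described above can be sketched in Python as follows. This is only an illustration of the rules as stated in this section, not the Phosphoros parser: the function names are hypothetical, only full-line comments are handled, and the folder name is taken to be the immediate parent directory of the file.

```python
from pathlib import PurePosixPath


def read_auxiliary_table(text):
    """Return (comments, rows) from a two-column auxiliary table."""
    comments, rows = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("#"):
            comments.append(stripped[1:].strip())
            continue
        first, second = stripped.split()
        # float() accepts scientific notation such as 0.1234e-56
        rows.append((float(first), float(second)))
    return comments, rows


def dataset_name(path, text):
    """Build <folder name>/<name>, where <name> is the one-word comment on
    the first line if present, otherwise the file name without extension."""
    path = PurePosixPath(path)
    lines = text.splitlines()
    first_line = lines[0].strip() if lines else ""
    name = path.stem
    if first_line.startswith("#"):
        words = first_line[1:].split()
        if len(words) == 1:
            name = words[0]
    return f"{path.parent.name}/{name}"


# Hypothetical file location and content.
example = "# TestName\n3600 0.000242676\n3610 4.93176e-4\n"
comments, rows = read_auxiliary_table(example)
name = dataset_name("AuxiliaryData/Filters/MyGroup/my_filter.txt", example)
print(name, rows)
```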
Below is a list of typical auxiliary data and their formats.
Filter Transmission Curves: the first column contains the wavelength values expressed in \(\mathring{\rm A}\) and the second column the transmission value in the range [0,1].
Note
By default, Phosphoros assumes that filter transmission curves refer to photon-counting systems. To specify that a filter is for an energy-measuring system, users must insert the following header in the filter transmission file:
# <Filter Name>
# FilterType Energy
3600 0.000242676
3610 0.000493176
3620 0.000953069
...
The presence of <Filter Name> is mandatory for energy-measuring systems (but not for photon-counting ones). Moreover, if the keyword FilterType is not found, or if it has a value other than Energy, the filter is treated as a photon-counting system.
SED Templates: the first column contains the wavelength values expressed in \(\mathring{\rm A}\) and the second column the flux expressed in erg/s/cm\(^2\)/\(\mathring{\rm A}\).
Note
In their header, SED templates can also contain physical parameter information. For example:
# PARAMETER : mass=0.1*L+1.3[solar mass]
901.57 0.0000164030
903.65 0.0000163025
905.73 0.0000162024
Reddening Curves: the first column contains the wavelength values expressed in \(\mathring{\rm A}\) and the second column the values of the reddening curve \(k(\lambda)\).
Luminosity Function Curves: the first column contains the luminosity values in erg/s/Hz or the AB magnitude values, and the second column the values of the galaxy number density (typically, but not necessarily, in \({\rm Mpc}^{-3}\,({\rm erg/s/Hz})^{-1}\) or in \({\rm Mpc}^{-3}\)). Note that the format of the file is the same whether magnitudes or luminosities are used.
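The filter-type convention described in the note on filter transmission curves can be sketched as below. This is a minimal Python illustration under stated assumptions, not the Phosphoros implementation: the function name is hypothetical, the keyword match is assumed to be case-sensitive, and the sketch does not verify that the <Filter Name> comment is present.

```python
def filter_type(text):
    """Return 'energy' if a '# FilterType Energy' comment is found,
    otherwise 'photon' (the default described in the note)."""
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped.startswith("#"):
            continue
        words = stripped[1:].split()
        if len(words) >= 2 and words[0] == "FilterType":
            return "energy" if words[1] == "Energy" else "photon"
    return "photon"


# Made-up filter files; MyFilter is a hypothetical filter name.
energy_filter = "# MyFilter\n# FilterType Energy\n3600 0.000242676\n3610 0.000493176\n"
plain_filter = "3600 0.000242676\n3610 0.000493176\n"
other_value = "# MyFilter\n# FilterType Photon\n3600 0.000242676\n"

print(filter_type(energy_filter), filter_type(plain_filter), filter_type(other_value))
```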
6.1.2.1. Axes Priors¶
Axis priors have different formats depending on whether the model parameter is a numerical or a categorical variable.
SED and Reddening Curve Axes Priors: here the first column is a string, representing the name of the SED template or the Reddening Curve accordingly. The second column is a double precision decimal number, representing the prior weight to multiply the likelihood with.
Note
The first column must contain the qualified name of the file as seen by Phosphoros (i.e., including the group information). For example, for a SED template in the directory:
> $PHOSPHOROS_ROOT/AuxiliaryData/SEDs/CosmosEll
the first column should be CosmosEll/<sed name>. The Phosphoros actions display_seds (DS) and display_reddening_curves (DRC) can retrieve these names, as explained in Explore Auxiliary Data.
E(B-V) and Redshift Axes Priors: the first column contains E(B-V) or redshift values accordingly, and the second column contains the prior weight to multiply the likelihood with.
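As an illustration of the SED axis prior layout, the Python sketch below derives qualified names from file paths and formats the two-column rows. The SED file names, the root directory and the weights are invented for the example, and the helper names are hypothetical; in practice the names reported by display_seds should be taken as authoritative.

```python
from pathlib import PurePosixPath


def qualified_name(path, base):
    """Path of the file relative to `base`, without the file extension."""
    relative = PurePosixPath(path).relative_to(base)
    return str(relative.with_suffix(""))


def sed_prior_lines(weights):
    """Format a {qualified name: weight} mapping as two-column prior rows."""
    return [f"{name} {weight}" for name, weight in weights.items()]


# Hypothetical installation path and SED file names.
base = "/data/Phosphoros/AuxiliaryData/SEDs"
first = qualified_name(base + "/CosmosEll/Ell1_A_0.sed", base)
second = qualified_name(base + "/CosmosEll/Ell2_A_0.sed", base)

prior_rows = sed_prior_lines({first: 0.7, second: 0.3})
print("\n".join(prior_rows))
```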
6.1.2.2. Multi-dimensional Priors¶
Multi-dimensional generic priors are FITS files with the following Header Data Units (HDUs), in this specific order:
1. Primary HDU: The primary HDU is intentionally left empty. If it contains any data, they are ignored by Phosphoros.
2. Prior HDU: The prior HDU is an image extension, containing a 4 dimensional array, which keeps the prior weights for each cell of the parameter space. It must have the following characteristics:
extension name : it can be any string, which is used for identifying the parameter space region in sparse grids (see below)
array type : double precision floating point (BITPIX=-64)
first axis : represents redshift
second axis : represents E(B-V)
third axis : represents reddening curve
fourth axis : represents SED
3. Redshift HDU: the redshift HDU is a binary table extension, which keeps the values of the redshift axis knots. It must have the following characteristics:
extension name : Z_region, where region is the name of the related prior HDU
length : The same as the first axis in the related prior HDU
- first column :
Name : Index
Type : 32-bit integer (TFORM=J)
- second column :
Name : Value
Type : double precision floating point (TFORM=D)
4. E(B-V) HDU: the E(B-V) HDU is a binary table extension, which keeps the values of the E(B-V) axis knots. It must have the following characteristics:
extension name : E(B-V)_region, where region is the name of the related prior HDU
length : The same as the second axis in the related prior HDU
- first column :
Name : Index
Type : 32-bit integer (TFORM=J)
- second column :
Name : Value
Type : double precision floating point (TFORM=D)
5. Reddening Curve HDU: the Reddening Curve HDU is a binary table extension, which keeps the values of the reddening curve axis knots. It must have the following characteristics:
extension name : Reddening Curve_region, where region is the name of the related prior HDU
length : The same as the third axis in the related prior HDU
- first column :
Name : Index
Type : 32-bit integer (TFORM=J)
- second column :
Name : Value
Type : string (TFORM=*A, where * is the maximum length)
6. SED HDU: the SED HDU is a binary table extension, which keeps the values of the SED axis knots. It must have the following characteristics:
extension name : SED_region, where region is the name of the related prior HDU
length : The same as the fourth axis in the related prior HDU
- first column :
Name : Index
Type : 32-bit integer (TFORM=J)
- second column :
Name : Value
Type : string (TFORM=*A, where * is the maximum length)
7. Sparse Grids HDUs: to create priors for sparse grids, the set of prior HDU and axis HDUs has to be repeated as many times as there are regions in the sparse grid.
Tip
Do not try to create files of this complex format from
scratch! Phosphoros provides the tool create_flat_grid_prior
(CFGP) that will generate a flat prior FITS file based on
the parameter space of a model grid file (for more info see
Multi-dimensional Priors).
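To make the HDU ordering above easier to check, the following Python sketch lists the extension names expected after the primary HDU for a given set of regions, and compares a prior array shape with the lengths of its axis tables. It does not read or write FITS files; the function names are hypothetical and the region names (Elliptical, Spiral) are only examples.

```python
AXIS_PREFIXES = ("Z", "E(B-V)", "Reddening Curve", "SED")


def expected_extension_names(regions):
    """List the extension names after the (empty) primary HDU, in file order."""
    names = []
    for region in regions:
        names.append(region)  # prior HDU (4-dimensional image extension)
        names.extend(f"{prefix}_{region}" for prefix in AXIS_PREFIXES)
    return names


def axis_lengths_match(prior_shape, axis_lengths):
    """Compare the prior array shape with the row counts of the axis tables.

    Both sequences are given in the order used in this section:
    redshift, E(B-V), reddening curve, SED.
    """
    return tuple(prior_shape) == tuple(axis_lengths)


names = expected_extension_names(["Elliptical", "Spiral"])
print(names)
print(axis_lengths_match((3, 2, 1, 4), [3, 2, 1, 4]))
```

Some FITS libraries expose image axes in the reverse of the FITS axis order, so the shape they report may need to be reversed before it is passed to a check of this kind.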
6.2. Intermediate Products¶
In the standard directory organization of Phosphoros, all intermediate products are stored in the directory (or in sub-directories of):
> $PHOSPHOROS_ROOT/IntermediateProducts/<Catalog Type>
6.2.1. Model Photometry Grid¶
Due to its size, the file containing the grid of modeled photometry is
typically stored in an internal Phosphoros format. Access from the C++
language can be done by using the Phosphoros PhzDataModel
module. Access outside C++ can be performed with the Phosphoros action
display_model_grid (DMG). For more information see the
Investigate model grids section.
Users can also store the model grid file in ASCII using the CLI, by
setting the following option of the compute_model_grid (CMG)
action as:
--output-model-grid-format=TEXT
By default, the file is named as Grid_<Catalog Type>_<parameter
space name>_<IGM prescription>.dat (e.g.,
Grid_Challenge2_Parameter_Space_MADAU.dat) and stored in the
IntermediateProducts/<Catalog Type>/ModelGrids directory. A
different name can however be chosen with the GUI (see
GUI: Generating the model grid) or with the CLI (using the
--output-model-grid option).
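The default naming scheme can be written out as a small Python helper. This is only a sketch of the pattern stated above: the function name is hypothetical, and splitting the example file name into the catalog type Challenge2, the parameter space name Parameter_Space and the IGM prescription MADAU is an assumption about how that example was built.

```python
from pathlib import PurePosixPath


def default_model_grid_path(catalog_type, parameter_space, igm):
    """Default location of the model grid, relative to $PHOSPHOROS_ROOT."""
    file_name = f"Grid_{catalog_type}_{parameter_space}_{igm}.dat"
    return PurePosixPath("IntermediateProducts") / catalog_type / "ModelGrids" / file_name


grid_path = default_model_grid_path("Challenge2", "Parameter_Space", "MADAU")
print(grid_path)
```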
6.2.2. Photometric Zero Point Corrections¶
This file is an ASCII table with two columns. The first column is the qualified name of filters (including the group information) and the second one is the photometric correction value.
By default, the file is named as <Catalog Type>_<parameter space
name>_<average method>.txt (e.g.,
Challenge2_Parameter_Space_WEIGHTED_MEDIAN.txt) and stored in the
IntermediateProducts/<Catalog Type> directory.
Note
The corrections apply to the source flux and not to the magnitude, meaning that the flux of each filter will be multiplied by the provided value.
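The multiplicative nature of the corrections can be illustrated with a short Python sketch. The correction values and fluxes below are made up, the function names are hypothetical, and leaving uncorrected any filter that has no entry in the table is an assumption of this example rather than documented Phosphoros behavior.

```python
def read_zero_point_corrections(text):
    """Parse a two-column table: qualified filter name, correction value."""
    corrections = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        filter_name, value = stripped.split()
        corrections[filter_name] = float(value)
    return corrections


def apply_corrections(fluxes, corrections):
    """Multiply each filter flux by its correction factor.

    Filters without an entry are left unchanged (assumption of this sketch).
    """
    return {name: flux * corrections.get(name, 1.0) for name, flux in fluxes.items()}


# Made-up correction table and fluxes (in microjansky).
table = "DECAM/g 1.25\nDECAM/r 0.5\n"
corrections = read_zero_point_corrections(table)
corrected = apply_corrections({"DECAM/g": 8.0, "DECAM/r": 4.0, "vista/H": 2.0}, corrections)
print(corrected)
```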
6.2.3. Filter Mapping¶
The filter_mapping.txt file is an ASCII file used to map filter
transmission curve files to catalog column names. It is located in the
following directory:
> $PHOSPHOROS_ROOT/IntermediateProducts/<Catalog Type>/
This file looks like:
# Filter, Flux Column, Error Column, Upper Limit/error ratio, Convert from MAG, Filter Shift Column
DECAM/g FLUX_G FLUXERR_G 3 0 NONE
DECAM/i FLUX_I FLUXERR_I 3 0 NONE
DECAM/r FLUX_R FLUXERR_R 3 0 NONE
DECAM/z FLUX_Z FLUXERR_Z 3 0 NONE
EUCLID_DC1/vis FLUX_VIS FLUXERR_VIS 3 0 NONE
vista/H FLUX_H FLUXERR_H 3 0 NONE
vista/J FLUX_J FLUXERR_J 3 0 NONE
vista/Y FLUX_Y FLUXERR_Y 3 0 NONE
and includes 6 columns:
Column 1, Filter: The qualified name of the file containing the filter transmission curve (i.e., the directory name below the AuxiliaryData/Filters directory plus the filter name)
Column 2, Flux Column: The catalog flux column name corresponding to the filter
Column 3, Error Column: The catalog flux error column name corresponding to the filter
Column 4, Upper Limit/error ratio: The number used to recompute flux errors if Upper Limit recompute error flag is equal to, e.g., -99 (see GUI: Mapping filters to column names)
Column 5, Convert from MAG: 0 if the photometry is provided as fluxes, 1 if it is provided as AB magnitudes
Column 6, Filter Shift Column: The name of the catalog column containing the filter shift (if NONE, filter variation correction is not applied to the filter)
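The six-column layout can be read with a few lines of Python. The sketch below assumes whitespace-separated fields, as in the example file shown above, and skips lines starting with #; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FilterMapping:
    filter_name: str               # qualified filter name, e.g. DECAM/g
    flux_column: str               # catalog flux column
    error_column: str              # catalog flux error column
    upper_limit_ratio: float       # upper limit / error ratio
    convert_from_mag: bool         # True if the photometry is in AB magnitudes
    shift_column: Optional[str]    # None when the file contains NONE


def parse_filter_mapping(text):
    """Parse the content of a filter_mapping.txt file into a list of records."""
    mappings = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        name, flux, error, ratio, from_mag, shift = stripped.split()
        mappings.append(
            FilterMapping(name, flux, error, float(ratio), from_mag == "1",
                          None if shift == "NONE" else shift)
        )
    return mappings


content = (
    "# Filter, Flux Column, Error Column, Upper Limit/error ratio, "
    "Convert from MAG, Filter Shift Column\n"
    "DECAM/g FLUX_G FLUXERR_G 3 0 NONE\n"
    "vista/H FLUX_H FLUXERR_H 3 0 NONE\n"
)
mappings = parse_filter_mapping(content)
print(mappings[0])
```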
The error_adjustment_param.txt file is found in the
same directory and looks like:
DECAM/g 1 0 0
DECAM/i 1 0 0
DECAM/r 1 0 0
DECAM/z 1 0 0
EUCLID_DC1/vis 1 0 0
vista/H 1 0 0
vista/J 1 0 0
vista/Y 1 0 0
where Column 1 is the qualified name of the file containing the filter transmission curve, and Columns 2, 3 and 4 are the values of the coefficients \(\alpha_k\), \(\beta_k\), \(\gamma_k\) used to re-calibrate flux errors (see Eq. (3.1)).
The files are automatically generated by the GUI at the Catalog
Setup step. Otherwise, users have to create them at the right place.
6.2.4. Other Products¶
Phosphoros generates other intermediate products when luminosity priors, filter variation correction and Galactic absorption correction are applied. They are the luminosity model grid, the filter variation correction grid and the correction coefficients grid, and they are located, respectively, in the directories:
> IntermediateProducts/<Catalog Type>/LuminosityModelGrids/
> IntermediateProducts/<Catalog Type>/FilterVariationCoefficientGrids/
> IntermediateProducts/<Catalog Type>/GalacticCorrectionCoefficientGrids/
These files are stored by default in binary format, accessible only by the Phosphoros C++ executables. They can also be stored in ASCII format using the CLI, as follows:
in the compute_luminosity_model_grid (or CLMG) action, by setting the option --output-model-grid-format=TEXT
in the Compute Filter Variation Coefficient Grid (or CFVCG) action, by setting the option --output-filter-variation-coefficient-grid-format=TEXT
in the compute_galactic_correction_coeff_grid (or CGCCG) action, by setting the option --output-galactic-correction-coefficient-grid-format=TEXT.
6.3. Results¶
In the standard directory organization, all Phosphoros outputs are stored in the directory:
> $PHOSPHOROS_ROOT/Results/<Catalog Type>/<input catalog name>/
where the name of the input catalog is given without its extension.
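The directory rule above amounts to dropping the extension of the input catalog file name. A minimal Python sketch, with a hypothetical function name, root path and catalog file name:

```python
from pathlib import PurePosixPath


def results_directory(root, catalog_type, input_catalog):
    """Results directory for a catalog: the file extension is dropped."""
    catalog_name = PurePosixPath(input_catalog).stem
    return PurePosixPath(root) / "Results" / catalog_type / catalog_name


# Hypothetical root directory and catalog file name.
results_dir = results_directory("/home/user/Phosphoros", "Challenge2", "Catalogs/test_cat.fits")
print(results_dir)
```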
6.3.1. Output Catalogs¶
Output catalogs can be stored either in FITS or in ASCII format. The
default name is phz_cat, with the extension according to the
format.
In the basic case (i.e., without saving the best model or the 1D PDFs), output catalogs contain the following columns:
ID: the source ID
Z: the best-estimate of redshift (in this case it coincides with the 1DPDF-Peak-Z value)
Posterior-Log: the amplitude of the posterior distribution at the maximum
Likelihood-Log: the amplitude of the likelihood at the maximum
1DPDF-Peak-Z: the redshift at the maximum of the 1D redshift PDF
If Best posterior model is enabled in the GUI (or
--create-output-best-model=YES in the compute_redshift action
of the CLI), these columns are added:
SED, ReddeningCurve, E(B-V) and Z: they are the values corresponding to the maximum of the posterior distribution.
SED-Index: this is the index of the best-model SED template inside the group the SED belongs to.
Scale: the normalized scale factor \(\alpha\) associated with the best model.
If Best likelihood model is enabled (or
--create-output-best-likelihood-model=YES), the columns have the
same names as those above except that they start with LIKELIHOOD-
(e.g., LIKELIHOOD-SED).
Note
If enabled, output catalogs also contain the physical parameter values as estimated from the best-fit model, one column for each physical parameter.
6.3.2. Marginalized 1D PDFs¶
The marginalized 1D PDFs can be either generated as part of output catalogs or as an individual file.
If they are generated as a catalog column in ASCII format, they are a list of comma separated values. If they are generated in FITS format, they are vector columns. In both cases, the axis bins are given as part of the comments of the file.
If the 1D PDFs are generated as an individual file, they are FITS files containing binary table HDUs with two columns, the first of which represents the axis parameter (e.g., redshift) and the second the probability. The name of each HDU is the ID of the corresponding source and it can be used for searching the 1D PDFs. Moreover, the order of the HDUs matches the order of the sources in the input catalog (starting from the first extension HDU).
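For ASCII catalogs, a 1D PDF stored as a comma-separated list can be turned back into numbers as sketched below. The PDF values and the redshift bins are invented; since the layout of the comments holding the axis bins is not described here, the bins are passed explicitly. The grid maximum computed by this sketch is only a rough counterpart of the 1DPDF-Peak-Z column, which Phosphoros may compute differently.

```python
def parse_pdf_column(cell):
    """Convert a comma-separated PDF cell into a list of floats."""
    return [float(value) for value in cell.split(",")]


def peak_of_pdf(bins, pdf):
    """Return the bin value at which the sampled PDF is largest."""
    if len(bins) != len(pdf):
        raise ValueError("bins and PDF must have the same length")
    index = max(range(len(pdf)), key=pdf.__getitem__)
    return bins[index]


# Made-up redshift bins and PDF values.
bins = [0.0, 0.5, 1.0, 1.5]
pdf = parse_pdf_column("0.1,0.6,0.25,0.05")
peak = peak_of_pdf(bins, pdf)
print(peak)
```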
6.3.3. Multi-dimensional Posterior Distribution¶
Multi-dimensional posterior outputs depend on the choice of users to save the full grid or a sampling of the posterior distribution. All the multi-dimensional outputs are stored in the directory:
> $PHOSPHOROS_ROOT/Results/<Catalog Type>/<input catalog name>/posteriors/
a) If Full grid is selected in the GUI (or
--full-PDF-sampling=NO in the compute_redshift action),
Phosphoros produces one FITS file for each source of the catalog,
containing the multi-dimensional posterior distribution. The name of
the file is the ID of the source, with the extension fits. It
contains the following HDUs:
Primary: a 4-dimensional array containing the likelihood or posterior distribution (order of axes: Z, E(B-V), RedCurve, SED)
Z: a single column binary table with the values of the Z axis
E(B-V): a single column binary table with the values of the E(B-V) axis
Reddening Curve: a single column binary table with the values of the Reddening Curve axis
SED: a single column binary table with the values of the SED axis
Note
Phosphoros provides a tool for visualising files of this type, as explained in the Posterior Investigation section.
b) If Sampling is selected in the GUI (or
--full-PDF-sampling=YES in the compute_redshift action),
Phosphoros saves only a sampling of the model parameters. Multiple
FITS files are produced, each with the results of at most ten
thousand sources (this number can be modified using the CLI). The name
of output files is Sample_File_posterior_<n>.fits, where <n> is
the file index. The output files have the following columns:
OBJECT_ID, GRID_REGION_INDEX [1], SED_INDEX,
REDSHIFT, RED_CURVE_INDEX and EB_V. In addition,
Phosphoros creates a FITS file (Index_File_posterior.fits)
containing the object IDs and the file names where the corresponding
outputs are stored.
Note
If enabled, the Sample_File_posterior_<n>.fits files contain
additional columns with the value of physical parameters and of the
model luminosity at 10pc (in the filter used for the SED
normalization).
Footnotes
[1] GRID_REGION_INDEX is the 0-based index of sub-spaces in the parameter space (e.g., Elliptical, Spiral, etc.). The index follows the order in which the sub-spaces have been defined in the GUI or in the CLI. This information is useful, for example, to display the photometry of best-fit models through the display_model_grid action (see Investigate model grids).