globalemu Functions

globalemu.preprocess.process()

process() is used to preprocess the data in the provided directory using the techniques outlined in the globalemu paper. For process() to work, the testing and training data must be saved in the data_location directory in a specific manner. The “labels”, or temperatures (network outputs), should be saved as “test_labels.txt”/“train_labels.txt” and the “data”, or astrophysical parameters (network inputs excluding redshift), as “test_data.txt”/“train_data.txt”.
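For example, with the default data_location the directory would be laid out as follows (a sketch; the file names are fixed by process(), and one model per row is assumed):

data/
    train_data.txt      (astrophysical parameters for the training set)
    train_labels.txt    (corresponding signals/temperatures)
    test_data.txt       (astrophysical parameters for the test set)
    test_labels.txt     (corresponding signals/temperatures)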

class globalemu.preprocess.process(num, z, **kwargs)[source]

Parameters:

num: int
The number of models that will be used to train globalemu. If you wish to use the full training data set then set num = 'full'.
z: np.array
The redshift range that corresponds to the models in the saved “test_labels.txt” and “train_labels.txt” e.g. for the 21cmGEM data this would be np.arange(5, 50.1, 0.1).

kwargs:

base_dir: string / default: ‘model_dir/’
The base_dir is where the preprocessed data and later the trained models will be placed. This should be thought of as the working directory as it will be needed when training a model and making evaluations of trained models.
data_location: string / default: ‘data/’
As discussed above, data_location is where the data to be processed can be found. It must be provided accurately for the code to work and must end in a ‘/’.
xHI: Bool / default: False
If True then globalemu will act as if it is training a neutral fraction history emulator.
AFB: Bool / default: None
If True then globalemu will calculate an astrophysics-free baseline (AFB) and subtract it from the training data signals. The AFB is specific to the global 21-cm signal and, since globalemu is set up to emulate the global signal by default, this parameter defaults to True. If xHI is True then AFB is set to False by default.
std_division: Bool / default: None
If True then globalemu will divide the training data by the standard deviation across the training data. This is recommended when building an emulator to emulate the global signal and is set to True by default. If xHI is True then std_division is set to False by default.
resampling: Bool / default: None
Controls whether the signals will be resampled with higher sampling in regions of large variation across the training data set. Set to True by default, as this is advised when training both neutral fraction and global signal emulators.
logs: list / default: [0, 1, 2]
The indices corresponding to the astrophysical parameters in “train_data.txt” that need to be logged. The default assumes that the first three columns in “train_data.txt” are \({f_*}\) (star formation efficiency), \({V_c}\) (minimum virial circular velocity) and \({f_x}\) (X-ray efficiency).
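A minimal call, assuming the data layout described above (the number of models and the paths are illustrative):

import numpy as np
from globalemu.preprocess import process

z = np.arange(5, 50.1, 0.1)  # redshift range of the saved signals
process(1000, z, base_dir='model_dir/', data_location='data/')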

globalemu.network.nn()

nn() is used to train an instance of globalemu on the preprocessed data in base_dir. All of the parameters for nn() are kwargs and a number of them can be left at their default values; however, you will need to set base_dir and possibly epochs and xHI (see below and the tutorial for details, and the example after the kwargs list).

class globalemu.network.nn(**kwargs)[source]

kwargs:

batch_size: int / default: 100
The batch size used by tensorflow when performing training. Corresponds to the number of samples propagated before the network's weights are updated. Keep the value at ~100 as this will help with memory management and training speed.
epochs: int / default: 10
The number of epochs to train the network for. An epoch corresponds to training on x batches, where x is sufficiently large for every sample to have influenced an update of the network's weights.
activation: string / default: ‘tanh’
The type of activation function used in the neural network's hidden layers. The activation function affects the way that the network learns and updates its weights. The default is a commonly used activation for regression neural networks.
lr: float / default: 0.001
The learning rate acts as a “step size” in the optimization and its value can affect the quality of the emulation. Typical values fall in the range 0.001-0.1.
dropout: float / default: 0
The dropout for the neural network training. globalemu is designed so that you shouldn’t need dropout to prevent overfitting but we leave it as an option.
input_shape: int / default: 8
The number of input parameters (astrophysical parameters plus redshift) for the neural network. The default accounts for 7 astrophysical parameters and a single redshift input.
output_shape: int / default: 1
The number of outputs (temperature) from the neural network. This shouldn’t need changing.
layer_sizes: list / default: [input_shape, input_shape]
The number of hidden layers and the number of nodes in each layer. For example layer_sizes=[8, 8] will create two hidden layers both with 8 nodes (this is the default).
base_dir: string / default: ‘model_dir/’
This should be the same as the base_dir used when preprocessing. It contains the data that the network will work with and is the directory in which the trained model will be saved.
early_stop: Bool / default: False
If early_stop is set to True then the network will stop training if the loss has not changed within the last twenty epochs.
xHI: Bool / default: False
If True then globalemu will act as if it is training a neutral fraction history emulator.
output_activation: string / default: ‘linear’
Determines the output activation function for the network. Modifying this is useful if the emulator output is required to be, for example, strictly positive. If xHI is True then the output activation is set to ‘relu’; otherwise it is ‘linear’. See the tensorflow documentation for more details on the types of activation functions available.
loss_function: Callable / default: None
By default the code uses an MSE loss; however, users are able to pass their own loss function when training the neural network. This should be a function that takes in the true labels (temperatures), the predicted labels and the network inputs and returns some measure of loss. Care needs to be taken to ensure that the correct loss function is supplied when resuming the training of a previous run, as globalemu will not check this. In order for the loss function to work it must be built using the tensorflow.keras backend. An example would be
from tensorflow.keras import backend as K

def custom_loss(true_labels, predicted_labels,
        network_inputs):
    # a mean absolute error loss; network_inputs must appear in
    # the signature even if the loss does not use it
    return K.mean(K.abs(true_labels - predicted_labels))

The function must take in as arguments the true_labels, the predicted_labels and the network_inputs.

resume: Bool / default: False
If set to True then globalemu will look in base_dir for a trained model and loss_history.txt file (which contains the loss recorded at each epoch) and load these in to continue training. If resume is True then you need to make sure all of the kwargs are set with the same values that they had in the initial training for a consistent run. There will be a human-readable file in base_dir called “kwargs.txt” detailing the values of the kwargs that were provided for the initial training run. Anything missing from this file will have had its default value. This file will not be overwritten if resume=True.
random_seed: int or float / default: None
This kwarg sets the random seed used by tensorflow with the function tf.random.set_seed(random_seed). It should be used if you want to have reproducible results but note that it may cause an ‘out of memory’ error if training on large amounts of data (see https://github.com/tensorflow/tensorflow/issues/37252).
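A minimal training run, assuming process() has already written the preprocessed data into base_dir (the epoch count is illustrative):

from globalemu.network import nn

nn(base_dir='model_dir/', epochs=20)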

globalemu.eval.evaluate()

evaluate() is used to make an evaluation of a trained instance of globalemu. It has to be initialised with a set of kwargs, most importantly the base_dir which contains the trained model. Once initialised it can then be used to make predictions and return the predicted signal plus the corresponding redshifts. evaluate() can reproduce a high resolution global 21-cm signal (450 redshift data points) in 1.5 ms.

class globalemu.eval.evaluate(**kwargs)[source]

The class can be initialised with the following kwargs and the following code

predictor = evaluate(**kwargs)

kwargs:

base_dir: string / default: ‘model_dir/’
The base_dir is where the trained model is saved.
model: tensorflow model / default: None
If making multiple calls to the function it is advisable to load the trained model in the script making the calls and then to pass it to evaluate(). This prevents the model being loaded upon each call and leads to a significant increase in speed. You can load a model via,
from tensorflow import keras

model = keras.models.load_model(
    base_dir + 'model.h5',
    compile=False)
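
# the loaded model can then be passed in via the model kwarg
predictor = evaluate(model=model, base_dir=base_dir)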
logs: list / default: [0, 1, 2]
The indices corresponding to the astrophysical parameters that were logged during training. The default assumes that the first three columns in “train_data.txt” are \({f_*}\) (star formation efficiency), \({V_c}\) (minimum virial circular velocity) and \({f_x}\) (X-ray efficiency).
gc: Bool / default: False
Multiple calls to the function can cause runaway memory-related issues (it is worth testing this behaviour before scheduling HPC jobs) and these memory issues can be somewhat alleviated by setting gc=True. This performs a garbage collection after every function call. It is an optional argument set to False by default because it can increase the time taken to perform the emulation.
z: list or np.array / default: Original redshift array
The redshift values at which you want to emulate the 21-cm signal. The default is given by the redshift range that the network was originally trained on (found in base_dir).

Once the class has been initialised you can then make evaluations of the emulator by passing the parameters like so

signal, z = predictor(parameters)

Parameters:

parameters: list or np.array
The combination of astrophysical parameters that you want to emulate a global signal for. They must be in the same order as was used when training and they must fall within the trained parameter space. For the 21cmGEM data the order of the astrophysical parameters is given by: \({f_*, V_c, f_x, \tau, \alpha, \nu_\mathrm{min}}\) and \({R_\mathrm{mfp}}\) (see the globalemu paper and references therein for a description of the parameters). You can pass a single set of parameters or a 2D array of different parameter sets to evaluate. For example, to evaluate 100 sets of 7 parameters the input array should have shape=(100, 7).

Return:

signal: array or float
The emulated signal. If a single redshift is passed to the emulator then the returned signal will be a single float; otherwise the result will be an array. If more than one set of parameters is input then the output will be an array of signals, e.g. 100 input sets of parameters gives signal.shape=(100, len(z)).
z: array or float
The redshift values corresponding to the returned signal. If z was not specified on input then the returned signal and redshifts will correspond to the redshifts that the network was originally trained on.
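A sketch of an end-to-end evaluation, assuming a model trained on the 21cmGEM parameters (the parameter values below are placeholders rather than physically meaningful choices):

import numpy as np
from globalemu.eval import evaluate

predictor = evaluate(base_dir='model_dir/')

# single parameter set in the order f_*, V_c, f_x, tau, alpha, nu_min, R_mfp
params = [1e-3, 46.5, 1.0, 0.06, 1.25, 1.5, 30.0]
signal, z = predictor(params)

# 100 parameter sets at once -> signals.shape == (100, len(z))
batch = np.tile(params, (100, 1))
signals, z = predictor(batch)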

globalemu.plotter.signal_plot()

This function can be used to assess the accuracy of emulation of a test data set given a trained model and produces a figure showing the mean, 95th percentile and worst emulations. Examples of these figures can be found in the MNRAS preprint. The figure will be saved in the provided 'base_dir/'.

class globalemu.plotter.signal_plot(parameters, labels, loss_type, predictor, base_dir, **kwargs)[source]

The class can be initialised with the following parameters and kwargs and the following code

plotter = signal_plot(parameters, labels, loss_type,
                predictor, base_dir, **kwargs)

Parameters:

parameters: list or np.array
The astrophysical parameters corresponding to the testing data.
labels: list or np.array
The signals, corresponding to the input parameters, that we want to predict and subsequently plot the mean, 95th percentile and worst emulations of.
loss_type: str or function
The metric by which we want to assess the accuracy of emulation. The built in loss functions can be accessed by setting this variable to ‘rmse’, ‘mse’ or ‘GEMLoss’. Alternatively, a user defined callable function that takes in the labels and signals can also be provided.
predictor: globalemu.eval object
An instance of the globalemu eval class that will be used to make predictions of the labels from the input parameters.
base_dir: string / default: ‘model_dir/’
The base_dir is where the signal plot will be saved.

kwargs:

rtol: int or float / default: 1e-2
The relative accuracy with which the function finds a signal with a loss equal to the mean loss for all predictions.
atol: int or float / default: 1e-2
The absolute accuracy with which the function finds a signal with a loss equal to the mean loss for all predictions.
figsizex: int or float / default: 5
The size of the figure along the x axis, passed to plt.subplots().
figsizey: int or float / default: 10
The size of the figure along the y axis, passed to plt.subplots().
xHI: Bool / default: False
If True then globalemu will act as if it is evaluating a neutral fraction history emulator.
loss_label: string / default: ‘Loss = {:.3f}’
This kwarg can be used to adjust the loss labels in the plot legends. For example, if we wanted precision to the 4th decimal place we could set loss_label='Loss = {:.4f}'. Equally, if we wanted to change the name of the loss and add units we could have loss_label='RMSE = {:.3f} mK'.
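A sketch of a typical call, assuming test data saved in the manner described for process() (the paths are illustrative):

import numpy as np
from globalemu.eval import evaluate
from globalemu.plotter import signal_plot

predictor = evaluate(base_dir='model_dir/')
parameters = np.loadtxt('data/test_data.txt')
labels = np.loadtxt('data/test_labels.txt')

plotter = signal_plot(parameters, labels, 'rmse',
                predictor, 'model_dir/',
                loss_label='RMSE = {:.3f} mK')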

globalemu.gui_config.config()

This function can be used to generate a configuration file for the GUI that is specific to a given trained model. The file gets saved into the supplied base_dir, which should contain the relevant trained model. The user also needs to supply a path to the data_dir that contains the relevant testing and training data. Additional arguments are described below.

A GUI config file is required to be able to visualise the signals with the GUI and, once generated, the GUI can be run from the command line

globalemu /path/to/base_dir/containing/model/and/config/

class globalemu.gui_config.config(base_dir, paramnames, data_dir, **kwargs)[source]

Parameters:

base_dir: string
The path to the directory containing the trained tensorflow model that the user wishes to visualise with the GUI. Must end in ‘/’.
paramnames: list of strings
This should be a list of parameter names in the correct input order. For example, for the released global signal model this would correspond to something like the list below.
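A plausible list for the 21cmGEM parameter ordering given above (an illustrative reconstruction; adjust the names to match your own model):

paramnames = [r'$\log(f_*)$', r'$\log(V_c)$', r'$\log(f_x)$',
              r'$\tau$', r'$\alpha$', r'$\nu_\mathrm{min}$',
              r'$R_\mathrm{mfp}$']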

LaTeX strings can be provided, as above.

data_dir: string
The path to the training and test data, which is used to set the y limits of the GUI graph and the ranges/intervals of the GUI sliders.

kwargs:

logs: list / default: [0, 1, 2]
The indices corresponding to the astrophysical parameters that were logged during training. The default assumes that the first three columns in “train_data.txt” are \({f_*}\) (star formation efficiency), \({V_c}\) (minimum virial circular velocity) and \({f_x}\) (X-ray efficiency).
ylabel: string / default: ‘y’
The y-axis label for the GUI plot.
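A minimal sketch of generating the config file (the paths are illustrative and paramnames is the list defined above):

from globalemu.gui_config import config

config('model_dir/', paramnames, 'data/')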

globalemu.downloads.download()

download() can be used to download the released trained models for both the global signal and neutral fraction history emulators.

class globalemu.downloads.download(xHI=False)[source]

Parameters:

xHI: Bool / default: False
Setting this equal to True will cause the method model() to download the released neutral fraction history model rather than the released global signal network.
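For example, a minimal sketch using the model() method described above:

from globalemu.downloads import download

download().model()          # released global signal emulator
download(xHI=True).model()  # released neutral fraction history emulator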