Quick Start
===========
Configure Existing Experiment
-----------------------------
The procedure is simple (**NEER**):
- **N**\ avigate to the example directory.
- **E**\ dit the configuration file **da\_solver.inp**
- **E**\ dit the experiment module file **model_.py** if needed.
- **R**\ un the test case file **_Pytest.py**.
Inside each example directory you will find a configuration file **da\_solver.inp** that you can edit. Here is a list of options currently
used by DAPack:
- **filter\_name** : Name of the data assimilation filter. This option will be revised when smoothers are added to DAPack.
As of now, this can be one of the following (case-insensitive):
- **EnKF** : A stochastic (perturbed-observations) version of the ensemble Kalman Filter (EnKF).
- **SqrEnKF** : A deterministic (square root) version of the ensemble Kalman Filter (EnKF).
- **BootStrap\_PF**: Bootstrap particle filter.
- **PF** : Particle filter with a resampling step. This also implements sequential importance resampling (SIR).
- **HMC** : Choose this to use the Hybrid Monte-Carlo (HMC) sampler. It covers each of the following cases:
- Vanilla HMC with manual parameter tuning.
- The No-U-Turn Sampler (NUTS).
- Generalized NUTS for DA (Work-in-progress).
The sampler is determined by the settings of the two variables **Hamiltonian\_integrator** and **mass\_matrix\_strategy**.
- **particle\_filter\_resampling\_strategy**: The strategy used to re-sample states from the prior ensemble. Currently **systematic** and **stratified** are implemented.
- **ensemble\_size**: The number of ensemble members to keep during the assimilation process.
- **initial\_time**: Beginning of the timespan of the experiment.
- **final\_time**: End of the timespan of the experiment.
- **cycle\_length**: Length of the assimilation cycle. Measurements are made at multiples of this interval. The three variables **initial\_time**, **final\_time**, and **cycle\_length**
together define the filter timespan.
- **observation\_operator\_type**: Type of the observation operator. For now, **linear** and **empirical** are available.
Both choices create a sparse matrix with ones corresponding to observed entries and zeros elsewhere. The operator implemented here is very simple, but more will be added.
The design of the linear observation operator depends on the value of **observed\_variables\_jump**.
- **observation\_noise\_type**: Only **Gaussian** observation errors are implemented.
- **observation\_noise\_level**: Standard deviation of observation errors is calculated as the product of **observation\_noise\_level** and
the average magnitude of the signal over the timespan of the assimilation experiment. This is calculated per prognostic variable.
- **observation\_spacing\_type**: Either **fixed** or **random**. If **fixed** is chosen, the observation time points are selected and fixed based on
the variable **observation\_steps\_per\_filter\_steps**. If **random** is chosen, each time point in the filter timespan becomes an actual observation point
(assimilation point) based on a coin flip whose success probability is set in the variable **observation\_chance\_per\_filter\_steps**.
- **observation\_steps\_per\_filter\_steps**: Observation frequency in time. 1 means observations are made at each time instance in the timespan,
2 means observations are made at every other time instance, and so on.
Only used if **observation\_spacing\_type == fixed**
- **observation\_chance\_per\_filter\_steps**: Probability of a time instance in the filter timespan to be picked as an observation point.
Only used if **observation\_spacing\_type == random**
- **observed\_variables\_jump**: This controls the observation frequency over the grid state points. 1 means all prognostic variables are observed at all grid points,
2 means all prognostic variables are observed at every other grid point, and so on.
- **screen\_output**: Yes(True) or No(False). Controls whether to show numerical results on the screen or not.
- **screen\_output\_iter**: How often results are printed to the screen, measured in simulation cycles. Used only if **screen\_output == Yes**.
- **file\_output**: Yes(True) or No(False). Controls whether to save numerical results to files on disk or not.
- **file\_output\_iter**: How often results are written to file, measured in simulation cycles. Used only if **file\_output == Yes**.
- **file\_output\_means\_only**: Yes(True) or No(False). Controls whether to save (to files on disk) the ensemble means (Yes) or the whole ensembles (No).
- **decorrelate**: Yes(True) or No(False). Create and apply a decorrelation operator to the background error covariance matrix. This is known as localization.
- **decorrelation\_radius**: localization distance/radius.
- **periodic\_decorrelation**: if periodic boundaries are used, this should be set to True.
- **read\_decorrelation\_from\_file**: Yes(True) or No(False). Look for an HDF5 file named **Decorr** from which to read the decorrelation matrix.
An exception is raised if the file is not found.
- **background\_errors\_covariance\_method**: This creates a modeled version of the background error covariance matrix. Only two methods are currently implemented:
- **diagonal**: This results in an uncorrelated structure.
- **full**: A full covariance matrix that may be decorrelated if requested by setting **decorrelate=Yes**.
These options necessarily require background errors to be **Gaussian**. More options will be considered.
In both cases, a standard deviation vector is calculated as the product of **background\_noise\_level** and the average magnitude of the signal over the timespan of the
assimilation experiment. This is calculated per prognostic variable.
This vector is then either set as the diagonal of the background error covariance matrix if **diagonal** is chosen, or otherwise the dense (pre-localized) version of the background error covariance
matrix is set to the outer product of this vector with itself.
- **background\_noise\_level**: See the description of **background\_errors\_covariance\_method** above.
- **background\_noise\_type**: **Gaussian** for now.
- **update\_B\_factor**: The background error covariance matrix is updated as a linear combination of the modeled background error covariance matrix,
and a flow-dependent (ensemble-based) version. This factor is multiplied by the flow-dependent version.
1 means flow-dependent version dominates, 0 means modeled version is used, and any other value (between 0 and 1) results in
a hybrid version of the background error covariance matrix.
- **model\_errors\_covariance\_method**: This creates a model error covariance matrix. We assume model errors are Gaussian. This will be investigated further later.
It is constructed in the same way as the background error covariance matrix, using the uncertainty level set in the variable **model\_noise\_level**.
- **model\_error\_steps\_per\_model\_steps**: How often model errors are added, measured in model time steps. The model time step is set in the configuration
file **solver.inp** in the model directory and can be set in the setup function in the model class.
- **model\_noise\_type**: **Gaussian** for now.
- **model\_noise\_level**: See the model error items above.
- **use\_sparse\_packages**: Yes(True) or No(False). Use sparse packages for matrix representation or not.
- **linear\_system\_solver**: Either **lu** or **splu**. Used for solving linear systems when required (to find the effect of :math:`\mathbf{B}^{-1}` on a vector).
The full inverse is never constructed explicitly.
- **Hamiltonian\_integrator**: Symplectic integrator used to propagate the Hamiltonian system used in HMC. Available are **verlet, 2stage, 3stage, 4stage**
- **Hamiltonian\_step\_size**: Step size used in the symplectic Hamiltonian integrator. Serves only as the initial value if NUTS is used.
- **Hamiltonian\_number\_of\_steps**: Number of steps taken by the symplectic integrator between proposed points. This controls the length of the Hamiltonian trajectory.
Not needed for NUTS.
- **Hamiltonian\_burn\_in\_steps**: Number of generated states (to reach convergence) before starting the sampling process.
- **Hamiltonian\_mixing\_steps**: Number of states dropped between retained states after convergence. This is usually useful to reduce correlations and consequently
increase the independence of sampled states.
- **mass\_matrix\_strategy**: The mass matrix is assumed to be diagonal. It can be a multiple of the identity matrix, but it is better to let it depend on the variances of the posterior.
Available choices are **identity, prior\_variances, prior\_precisions, modeled\_variances, modeled\_precisions**. The **\_precisions** variants are recommended.
- **mass\_matrix\_scale\_factor**: This scales the diagonal of the mass matrix.
- **hamiltonian\_sampling\_strategy**: **HMC** or **NUTS**. The former directs the sampler to use a fixed step size and number of steps in the symplectic integrator,
while the latter activates generalized NUTS with dual averaging.
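
Several of the options above reduce to simple arithmetic. The following is a minimal sketch of how the noise level, covariance method, hybrid factor, and observation spacing options could translate into NumPy code. It is not DAPack's implementation: all function names are hypothetical, the trajectory is assumed to be stored with shape ``(n_times, n_variables)``, the ensemble with shape ``(ensemble_size, n_variables)``, and the weight ``1 - update_B_factor`` on the modeled matrix is an assumption chosen to match the documented behavior at 0 and 1.

.. code-block:: python

    import numpy as np

    def noise_std(trajectory, noise_level):
        """Noise level times the time-averaged signal magnitude, per variable."""
        return noise_level * np.mean(np.abs(trajectory), axis=0)

    def modeled_covariance(std, method="diagonal"):
        """Diagonal (uncorrelated) matrix, or the outer product of std with itself."""
        if method == "diagonal":
            return np.diag(std ** 2)
        return np.outer(std, std)

    def hybrid_covariance(B_modeled, ensemble, update_B_factor):
        """Blend the modeled matrix with the ensemble (flow-dependent) covariance.

        Assumption: the modeled part is weighted by (1 - update_B_factor).
        """
        B_ensemble = np.cov(ensemble, rowvar=False)
        return (1.0 - update_B_factor) * B_modeled + update_B_factor * B_ensemble

    def observation_mask(n_times, spacing_type, steps=1, chance=0.5, seed=None):
        """Boolean mask marking which filter time points carry observations."""
        if spacing_type == "fixed":
            # Every `steps`-th time point; starting at index 0 is an assumption.
            return np.arange(n_times) % steps == 0
        if spacing_type == "random":
            rng = np.random.default_rng(seed)
            return rng.random(n_times) < chance  # independent coin flips
        raise ValueError(f"unknown spacing type: {spacing_type!r}")

    trajectory = np.array([[1.0, -2.0], [3.0, 4.0]])
    std = noise_std(trajectory, noise_level=0.1)
    B = modeled_covariance(std, method="diagonal")
    mask = observation_mask(6, "fixed", steps=2)

With ``update_B_factor = 0`` the blend returns the modeled matrix unchanged, and with ``update_B_factor = 1`` it returns the ensemble covariance, which mirrors the description of that option.
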
Add New Model
-------------
This is a major area that we will continue working on to make it much easier. Currently the two easiest options are:
- Add the model to the HyPar package by editing the source code. In this case, you will need to create a directory under the **models** directory,
and a corresponding one under the **examples** directory. Follow the pattern in the other examples; they are almost identical except for
naming. You can also modify the model class file to override the default functions provided by the base classes.
- Write the model class in detail. We will add an example soon.
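
To give a rough idea of what overriding base-class defaults can look like, here is a minimal, self-contained sketch. The ``BaseModel`` class below is a hypothetical stand-in defined only for this illustration; it is not the package's base class, and the method names ``rhs`` and ``step`` are assumptions. Only the idea of a ``setup`` function that sets the model time step comes from the text above. The Lorenz-96 system is used purely as a familiar test model.

.. code-block:: python

    import numpy as np

    class BaseModel:
        """Hypothetical base class; NOT the package's actual base class."""

        def setup(self, dt=0.01):
            self.dt = dt  # model time step

        def rhs(self, state):
            raise NotImplementedError("subclasses define the dynamics")

        def step(self, state):
            """Advance the state by one time step with classical RK4."""
            dt = self.dt
            k1 = self.rhs(state)
            k2 = self.rhs(state + 0.5 * dt * k1)
            k3 = self.rhs(state + 0.5 * dt * k2)
            k4 = self.rhs(state + dt * k3)
            return state + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

    class Lorenz96(BaseModel):
        """Overrides setup and rhs; inherits the default time stepper."""

        def setup(self, dt=0.01, forcing=8.0):
            super().setup(dt)
            self.forcing = forcing

        def rhs(self, x):
            # dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F, periodic in i
            return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + self.forcing

    model = Lorenz96()
    model.setup(dt=0.01, forcing=8.0)
    state = model.step(np.full(5, 8.0))  # x_i = F is a fixed point of the system
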
Add New Filter
--------------
You will need to edit two modules. First, add your assimilation cycle function and any related functions to the module named
:mod:`_Assimilation_Filters`. Second, edit the function :func:`DA_filtering_cycle` inside :mod:`HyPar_DAFiltering`
to add the option corresponding to the name you chose for your filter. If you add statistical monitors to your filter, you may also want to update the function
:func:`DA_filtering_process` inside :mod:`HyPar_DAFiltering` by updating the file output section(s).
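
The two steps above amount to defining a cycle function and then making it selectable by name. The sketch below shows that pattern in isolation. It does not reproduce the signatures of :func:`DA_filtering_cycle` or of any function in :mod:`_Assimilation_Filters`, which are not shown in this document; every name in it (``nudging_cycle``, ``FILTERS``, ``run_filtering_cycle``) is hypothetical, and the toy filter assumes a fully observed state.

.. code-block:: python

    import numpy as np

    def existing_cycle(ensemble, observation):
        """Placeholder standing in for an already available filter."""
        return ensemble

    def nudging_cycle(ensemble, observation, gain=0.5):
        """Toy new filter: pull every member part of the way to the observation."""
        return ensemble + gain * (observation - ensemble)

    # Step 2: register the new cycle function under the chosen filter name.
    FILTERS = {
        "existing": existing_cycle,
        "nudging": nudging_cycle,
    }

    def run_filtering_cycle(filter_name, ensemble, observation):
        """Dispatch on the filter name, ignoring case."""
        try:
            cycle = FILTERS[filter_name.lower()]
        except KeyError:
            raise ValueError(f"unknown filter_name: {filter_name!r}") from None
        return cycle(ensemble, observation)

    ensemble = np.zeros((3, 2))          # 3 members, 2 state variables
    observation = np.array([2.0, 4.0])
    analysis = run_filtering_cycle("Nudging", ensemble, observation)

Lower-casing the name before the lookup matches the case-insensitive handling of **filter\_name** described in the configuration options.
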