Most of my previous work with neural networks has been based on the Keras library, so in implementing QARAC, I initially used what I was most familiar with. Some lower-level parts of the algorithm were implemented in TensorFlow, which for some years has been the default backend for Keras (however, it is now possible to use Keras with a choice of backends again).

I decided to do some local testing before committing large amounts of compute time to training the models, but when I did so, I got the following warning.

```
WARNING:tensorflow:Gradients do not exist for variables ['tf_roberta_model/roberta/pooler/dense/kernel:0', 'tf_roberta_model/roberta/pooler/dense/bias:0', 'qarac_trainer_model/qarac_encoder_model/global_attention_pooling_head/local projection:0', 'qarac_trainer_model/qarac_encoder_model_1/global_attention_pooling_head_1/local projection:0', 'tf_roberta_model_1/roberta/pooler/dense/kernel:0', 'tf_roberta_model_1/roberta/pooler/dense/bias:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss` argument?
```

It looks like the Training Model wasn't able to propagate gradients between its constituent models. This seems to be a feature of the architecture of Keras, in which `Model`s are made up of `Layer`s. Layers are designed to be components of a larger model, and so propagate gradients across their inputs, whereas models, which are intended to be complete systems, do not. From Keras's point of view, I was trying to use models as layers, and it didn't like it.
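
One way to investigate a warning like this is to compute the gradients by hand with `tf.GradientTape` and see which variables come back as `None`. The toy model below is not the QARAC code; its names and shapes are made up, and it simply reproduces the symptom with a branch whose output never reaches the loss:

```python
import tensorflow as tf

# A toy model with one branch whose output never reaches the loss,
# mimicking a situation where some variables receive no gradients.
inputs = tf.keras.Input(shape=(4,))
used = tf.keras.layers.Dense(2, name="used")(inputs)
unused = tf.keras.layers.Dense(2, name="unused")(inputs)  # dead branch
model = tf.keras.Model(inputs, [used, unused])

x = tf.random.normal((8, 4))
y = tf.zeros((8, 2))

with tf.GradientTape() as tape:
    used_out, _ = model(x, training=True)
    loss = tf.reduce_mean(tf.square(used_out - y))  # loss ignores `unused`

# Variables not connected to the loss come back with gradient None.
grads = tape.gradient(loss, model.trainable_variables)
for var, grad in zip(model.trainable_variables, grads):
    if grad is None:
        print(f"No gradient for {var.name}")  # the `unused` kernel and bias
```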

Since HuggingFace models are available in both TensorFlow and PyTorch, I looked to see whether PyTorch would be more suitable for what I wanted to do. I found that PyTorch doesn't make the same distinction that Keras does between layers and models: both are `Module`s, so there would be no problem with propagating gradients between them.
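
As a minimal sketch (this is not the QARAC architecture; the classes and shapes are made up for illustration), one `Module` can be composed of other `Module`s, and gradients flow through the nested parts just as they would through individual layers:

```python
import torch
from torch import nn

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(8, 4)

    def forward(self, x):
        return torch.relu(self.linear(x))

class Trainer(nn.Module):
    def __init__(self):
        super().__init__()
        # Sub-models are just more Modules; there is no layer/model split.
        self.encoder = Encoder()
        self.head = nn.Linear(4, 1)

    def forward(self, x):
        return self.head(self.encoder(x))

model = Trainer()
loss = model(torch.randn(2, 8)).sum()
loss.backward()
# Every parameter, including those inside the nested Encoder, has a gradient.
print(all(p.grad is not None for p in model.parameters()))  # True
```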

The learning curve going from Keras to PyTorch wasn't too steep. The main differences were that the method of a Keras layer that's called `call` is called `forward` in PyTorch, and that there's no direct equivalent of a Keras model's `compile` and `fit` methods, so you have to write a training loop yourself. Also, HuggingFace's PyTorch and TensorFlow models aren't exact drop-in replacements for each other, so adjustments were occasionally needed where one wanted a parameter that the other didn't.
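
Here's what such a training loop can look like. This is a generic sketch standing in for `compile` and `fit`, not QARAC's actual loop; the model, data, and optimiser settings are placeholders:

```python
import torch
from torch import nn

model = nn.Linear(8, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Dummy dataset: ten batches of (inputs, targets).
data = [(torch.randn(16, 8), torch.randn(16, 1)) for _ in range(10)]

for epoch in range(3):
    for inputs, targets in data:
        optimizer.zero_grad()          # clear gradients from the last step
        loss = loss_fn(model(inputs), targets)
        loss.backward()                # backpropagate
        optimizer.step()               # update the parameters
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```

Writing the loop by hand is more verbose than calling `fit`, but it makes explicit exactly where gradients are computed and applied.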

You should learn something new on every project, and that has been one of my key personal goals for QARAC. I didn't envisage that I'd end up learning PyTorch for this project, but the fact that I have done so is welcome and will come in useful for future projects.

There's only one more thing I need before I can train the models, and that's a budget for compute time, or a community hardware grant from HuggingFace.

If you are interested in this project, please contact Playful Technology Limited.