Which is more stable, validation loss or prediction loss?
Generally, your model is no better than flipping a coin. The validation loss is more stable because it is a continuous function: it can distinguish that a prediction of 0.9 for a positive sample is more correct than a prediction of 0.51.
Why is the validation loss more stable in machine learning?
The validation loss is more stable because it is a continuous function: it can distinguish that a prediction of 0.9 for a positive sample is more correct than a prediction of 0.51. For accuracy, you round these continuous predicted probabilities to {0, 1} and simply compute the percentage of correct predictions.
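To make the contrast concrete, here is a minimal sketch that scores two sets of predictions with both metrics. The helper names (`binary_cross_entropy`, `accuracy`), the 0.5 threshold, and the toy labels are assumptions chosen for illustration, not taken from any particular library.

```python
import math

def binary_cross_entropy(y_true, p_pred):
    """Mean binary cross-entropy for labels in {0, 1} and predicted probabilities."""
    eps = 1e-12  # clip to avoid log(0)
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

def accuracy(y_true, p_pred, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    correct = sum(int(p >= threshold) == y for y, p in zip(y_true, p_pred))
    return correct / len(y_true)

# Toy data (hypothetical): same labels, two levels of confidence.
labels = [1, 1, 0, 0]
confident = [0.9, 0.9, 0.1, 0.1]
barely = [0.51, 0.51, 0.49, 0.49]

for name, preds in [("confident", confident), ("barely", barely)]:
    print(name, "accuracy:", accuracy(labels, preds),
          "loss:", round(binary_cross_entropy(labels, preds), 4))
```

Both prediction sets reach the same accuracy, but the loss is clearly lower for the confident one. The flip side is that a prediction drifting from 0.51 to 0.49 changes the loss only slightly while flipping the accuracy contribution of that sample entirely.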
Is the validation accuracy less than the training accuracy?
This is not overfitting, since your validation accuracy is not lower than your training accuracy. In fact, it sounds like your model is underfitting, since your validation accuracy is higher than your training accuracy.
Why does the loss / accuracy fluctuate during the training?
With batch_size=2 the LSTM did not seem to learn properly (the loss fluctuates around the same value and does not decrease). Update 4: To check whether the problem is just a bug in the code, I made an artificial example (2 classes that are not difficult to classify: cos vs arccos). Loss and accuracy during the training for these examples:
Why does validation loss fluctuate after each train step?
In such a case, even though your network is moving toward convergence, you might see a lot of fluctuation in the validation loss after each train step. But if you step back and look at the bigger picture, you can see that your network is actually converging to a minimum, with the fluctuations dying out. (See the attached images for one such example.)
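One way to see the bigger picture is to smooth the per-step validation loss before judging the trend. The sketch below uses a trailing moving average; the `moving_average` helper, the window of 20, and the synthetic loss curve are all assumptions made for illustration.

```python
import random

def moving_average(values, window):
    """Trailing moving average; early entries average over the values seen so far."""
    smoothed = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        smoothed.append(running / min(i + 1, window))
    return smoothed

random.seed(0)
# Synthetic curve (hypothetical): a decaying trend plus per-step noise.
val_loss = [1.0 / (1 + 0.05 * step) + random.uniform(-0.1, 0.1)
            for step in range(200)]
smooth = moving_average(val_loss, window=20)

print("smoothed loss at step 19:", round(smooth[19], 3))
print("smoothed loss at step 199:", round(smooth[-1], 3))
```

The raw values jump around from step to step, while the smoothed curve makes the downward trend easier to read. A larger window smooths more but reacts more slowly to real changes.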
Why is there a gap in validation accuracy?
The gap between accuracy on the training data and on the test data shows that you have overfitted to the training data. Regularization may help. There are a few things to try in your situation. First, try increasing the batch size, which helps mini-batch SGD wander less wildly.
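The effect of batch size on how much SGD wanders can be illustrated by measuring the spread of mini-batch gradient estimates. This is a rough sketch on a hypothetical one-parameter model `y = w * x` with squared error; the data, the function names, and the chosen batch sizes are assumptions for illustration only.

```python
import random
import statistics

random.seed(0)
# Hypothetical data: y is roughly 3 * x plus Gaussian noise.
xs = [random.uniform(-1, 1) for _ in range(2000)]
ys = [3.0 * x + random.gauss(0, 0.5) for x in xs]

def per_sample_grad(w, x, y):
    """Gradient of (w * x - y) ** 2 with respect to w for one sample."""
    return 2.0 * (w * x - y) * x

def minibatch_grad_std(w, batch_size, trials=500):
    """Standard deviation of the mini-batch gradient estimate across random batches."""
    grads = []
    for _ in range(trials):
        idx = random.sample(range(len(xs)), batch_size)
        grads.append(sum(per_sample_grad(w, xs[i], ys[i]) for i in idx) / batch_size)
    return statistics.stdev(grads)

for b in (2, 16, 128):
    print("batch_size:", b, "gradient std:", round(minibatch_grad_std(0.0, b), 3))
```

The spread of the gradient estimate shrinks as the batch grows, which is why each update with a larger batch tends to point in a more consistent direction. Note that this only addresses the noise in the updates; it does not by itself close a gap caused by overfitting.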