Recently I needed to convert a model containing 1D convolution layers, trained with Keras v2, to PyTorch. Since Keras uses channels last $(N, L, C)$ while PyTorch uses channels first $(N, C, L)$, the weight matrices need to be transposed in a consistent way so that the outputs of the two models are the same after flattening.
We can run a simple test to make sure we know how to transpose the weight matrices. First we create a small model in both Keras and PyTorch, then extract the weight matrices from the Keras model and transpose them into PyTorch's format,
which gives output like the following (the exact values depend on the random initialization),
tensor([[0.7114]], grad_fn=<AddBackward0>)
tensor([0.5577], grad_fn=<ViewBackward0>)
The outputs of the two models are DIFFERENT. But why? We transposed the weight matrices, right? The reason is that the flattened features of the two models are not in the same order even though they share the same shape, which causes the code to fail silently: Keras flattens the convolution output in channels-last order $(N, L, C)$, while PyTorch flattens it in channels-first order $(N, C, L)$, so the Dense weights copied from Keras are applied to the wrong features. To make sure the features are in the correct order, we need to permute the tensor back to channels last $(N, L, C)$ before flattening! So the correct forward pass for the PyTorch model should be,
which gives the output,
tensor([0.4469], grad_fn=<ViewBackward0>)
Now both models share the same weight matrices and produce the same output.