Feedback on solution and problem with error calculation

First of all, congratulations to all winners, and thanks to the people at Classiq for organizing this competition. I was assured that posts here are still welcome, and I hope that someone will still read this even though the competition has ended :).

Our solution was an attempt to implement the unitary

U = exp(-i c_1 P_1) exp(-i c_2 P_2) ... exp(-i c_n P_n)

for some combination of 10-qubit Pauli operators P_i with some coefficients c_i, such that the error between U and exp(-iH) as defined was below the threshold of 0.1.
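To make the structure of U concrete, here is a minimal NumPy sketch of such a product of Pauli exponentials. It relies on the identity exp(-i c P) = cos(c) I - i sin(c) P, which holds for any Pauli string because P^2 = I. The function names and the 2-qubit toy terms are hypothetical and chosen only for illustration; they are not the terms or coefficients of our actual 10-qubit solution.

```python
from functools import reduce

import numpy as np

# Single-qubit Pauli matrices.
PAULIS = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}


def pauli_matrix(label):
    """Dense matrix of a Pauli string such as "XZ" (leftmost factor first in the Kronecker product)."""
    return reduce(np.kron, (PAULIS[ch] for ch in label))


def pauli_exponential(c, label):
    """exp(-i c P) for a Pauli string P, using P @ P = I."""
    P = pauli_matrix(label)
    return np.cos(c) * np.eye(P.shape[0]) - 1j * np.sin(c) * P


def product_unitary(terms):
    """U = exp(-i c_1 P_1) exp(-i c_2 P_2) ... exp(-i c_n P_n) for terms [(c_1, P_1), ...]."""
    dim = 2 ** len(terms[0][1])
    U = np.eye(dim, dtype=complex)
    for c, label in terms:
        U = U @ pauli_exponential(c, label)
    return U


# Toy example on 2 qubits (illustrative values only).
terms = [(0.3, "XZ"), (-0.7, "YY"), (0.1, "ZI")]
U = product_unitary(terms)
```

For 10 qubits the same code produces 1024 x 1024 matrices, which is still cheap to handle densely.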

Our method to find this U was a rather crude iteration of optimizing coefficients and truncating terms from the original Hamiltonian (I can elaborate if someone is interested, but it seems irrelevant for my question here). Nonetheless, our final solution came quite close to the 3rd place solution with a circuit depth of 781. However, I have been told that our solution apparently did not meet the error threshold, with a reported error of 0.159, whereas our own calculations give an error of 0.099, which is below the threshold.

Hence, I am trying to understand whether anything is wrong with our way of calculating this error. The Jupyter notebook (incl. additional files) at https://github.com/grossardt/shortcircuit details our calculations.

In the end, we get a transpiled Qiskit circuit consisting only of single-qubit U gates and CX gates. Comparing the unitary that Qiskit generates for this circuit (corrected by the global phase) with exp(-iH) results in the mentioned error of 0.099.

Oddly enough, when saving the circuit from Qiskit to QASM and loading the QASM file again, there is an additional phase that I cannot explain, resulting in an uncorrected error of 0.289. The correct minimal error value of 0.099 is then only obtained when minimizing with respect to the relative global phase.
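For reference, this is roughly what I mean by minimizing over the global phase, as a minimal sketch. It assumes the error is the spectral (operator 2-norm) distance between the two unitaries, which is my assumption for this illustration and not necessarily the exact definition used for verification. It also assumes both matrices are unitary to numerical precision, so that the norm of e^{i phi} U - V equals the largest distance between e^{i phi} and an eigenvalue of U†V. The function names are hypothetical, and the phase is found by a simple grid scan, so the result is only accurate to about half the grid spacing.

```python
import numpy as np


def spectral_error(U, V):
    """Spectral-norm distance ||U - V|| without any phase correction."""
    return float(np.linalg.norm(U - V, ord=2))


def phase_minimized_error(U, V, num_phases=4001):
    """Minimize ||exp(i*phi) U - V|| over phi on a uniform grid.

    For unitary U and V this norm equals max_k |exp(i*phi) - lam_k|, where
    lam_k are the eigenvalues of the unitary matrix U^dagger V, so a single
    eigenvalue computation is enough for the whole scan.
    """
    eigvals = np.linalg.eigvals(U.conj().T @ V)
    phases = np.linspace(-np.pi, np.pi, num_phases)
    # Distance from exp(i*phi) to every eigenvalue, for every grid phase.
    dist = np.abs(np.exp(1j * phases)[:, None] - eigvals[None, :])
    errors = dist.max(axis=1)
    best = int(np.argmin(errors))
    return float(errors[best]), float(phases[best])


# Toy check: V is a random unitary and U differs from it only by a global phase.
rng = np.random.default_rng(0)
A = rng.normal(size=(8, 8)) + 1j * rng.normal(size=(8, 8))
V, _ = np.linalg.qr(A)           # Q factor of a QR decomposition is unitary
U = np.exp(1j * 0.8) * V         # same operator up to a global phase of 0.8

raw = spectral_error(U, V)                 # large, purely due to the phase
best_err, best_phi = phase_minimized_error(U, V)  # close to 0 at phi near -0.8
```

In this toy case the uncorrected error is 2 sin(0.4), although the two matrices describe the same physical operation, which is the kind of effect I see after the QASM round trip.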

However, either way the error threshold seems to be satisfied, so in principle our circuit should qualify as a solution? I would be very grateful if someone could take a look and check whether there is something wrong with our error calculation.

Alternatively, maybe someone from the Classiq team is willing to share some insights into the code used for verifying the submissions?

Thanks a lot!