Check this video for a quick introduction to multi-head attention. Check this code for an implementation of multi-head attention. Check this blog for the theory and details of masked attention.
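
For a rough feel of what those resources cover, here is a minimal sketch of multi-head attention with an optional causal mask, in plain Python. It is not the linked implementation. The function names (`softmax`, `attention`, `multi_head_attention`) are hypothetical, and the learned projections (usually written W_Q, W_K, W_V, W_O) are deliberately left out, so each head simply attends over its own slice of the input features.

```python
import math

def softmax(row):
    # Subtract the max for numerical stability; exp(-inf) evaluates to 0.0.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v, causal=False):
    """Scaled dot-product attention for one head.

    q, k, v are lists of row vectors (one row per token).
    Returns (outputs, weights).
    """
    d = len(q[0])
    outputs, weights = [], []
    for i, qi in enumerate(q):
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        if causal:
            # Masked attention: token i must not attend to later tokens j > i.
            scores = [s if j <= i else float("-inf") for j, s in enumerate(scores)]
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(wj * vj[c] for wj, vj in zip(w, v)) for c in range(len(v[0]))]
        )
    return outputs, weights

def multi_head_attention(x, num_heads, causal=False):
    """Split features into heads, attend per head, concatenate the results.

    Assumption for brevity: no learned projections, so queries, keys and
    values are all the raw feature slice belonging to each head.
    """
    d_model = len(x[0])
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads
    head_outputs, head_weights = [], []
    for h in range(num_heads):
        part = [row[h * d_head:(h + 1) * d_head] for row in x]
        out, w = attention(part, part, part, causal)
        head_outputs.append(out)
        head_weights.append(w)
    # Concatenate the per-head outputs token by token.
    concat = [
        [value for h in range(num_heads) for value in head_outputs[h][i]]
        for i in range(len(x))
    ]
    return concat, head_weights
```

Calling `multi_head_attention(x, num_heads=2, causal=True)` on a list of token vectors returns an output with the same shape as `x`, plus one attention-weight matrix per head. With the mask on, every weight above the diagonal is zero, so the first token can only attend to itself.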