Face swapping Helmut Schmidt with my Dad

No disrespect to any person shown! This is just a way to get attention and make the topic relevant. Learning to deploy this kind of technology works best when it is illustrated through relatable examples. Since my audience is primarily friends and family at this point, I can take something they know (my Dad) and associate it with something the world knows (Helmut Schmidt).

So how does this work at a high level?

  • You take two videos*, a source video (my Dad) and a target video (Helmut Schmidt giving a speech)
  • Split both videos into their individual frames
  • Export only the faces from those frames and have them detected and centered via a histogram of oriented gradients (HOG)
  • Train a neural network for each of the two faces, so they end up having a general concept of the characteristics of those two faces
  • The target footage (Helmut Schmidt) determines which angles of the source footage are needed. This is why it's better to have a wide variety of good-quality material in the source footage (my Dad)
  • The model then tries to replicate every frame from the target footage (Helmut Schmidt) using whatever the source footage (my Dad) allows for
  • After 48 hours of training on a GTX 1070 the results no longer seemed to improve much, so I stopped training.
  • A small section of the target footage is selected and the face in each frame is replaced by the best match the model was able to come up with from the source footage.
  • Exported and merged back into a video, it looks something like the left side here:

Obviously this doesn't win any beauty contests! Given that there were only TWO videos overall, a two-minute clip of my Dad at 720p and twelve minutes of Helmut Schmidt at 1080p, I have to say the results still look relatively acceptable as a proof of concept. This has all been done with the "FakeApp", which simplifies the process via its GUI and preset hyperparameters.
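FakeApp hides the face-extraction step behind its GUI, so here is a rough sketch of the idea behind HOG, for the curious. It only computes the gradient-orientation histogram for one small grayscale patch, which is the building block a HOG-based face detector works with. It is not FakeApp's actual code. The function name `hog_cell_histogram`, the 9 bins and the 0 to 180 degree range are my own assumptions for illustration.

```python
import math


def hog_cell_histogram(cell, bins=9):
    """Unsigned gradient-orientation histogram for one grayscale patch.

    `cell` is a list of rows of pixel intensities. Border pixels are
    skipped so that the central differences are always defined.
    Hypothetical helper for illustration only.
    """
    height, width = len(cell), len(cell[0])
    bin_width = 180.0 / bins
    histogram = [0.0] * bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Central differences approximate the image gradient.
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue  # flat region, nothing to vote with
            # Fold the direction into [0, 180) so opposite directions share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            index = int(angle // bin_width) % bins
            # Each pixel votes for its orientation bin, weighted by edge strength.
            histogram[index] += magnitude
    return histogram


# A patch with a dark left half and a bright right half (a vertical edge).
vertical_edge = [[0, 0, 0, 255, 255, 255] for _ in range(6)]
# Its gradients all point horizontally, so every vote lands in bin 0.
print(hog_cell_histogram(vertical_edge))
```

A real detector computes many of these histograms over a grid of cells, normalizes them, and feeds the result to a classifier that decides where the face is. The sketch above leaves all of that out.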

*This example ONLY works because the environmental conditions in both videos, which determine the look of the footage, are very similar. Usually you would want quite heterogeneous footage of your source actor, who is generally supposed to be able to end up in basically any target footage!