Hello everyone,

So I just started getting into machine learning and it’s been a lot of fun. I took some online courses on Pluralsight to help me get started with TensorFlow, Google’s machine learning library. However, even with those courses, I’m not completely sure how to approach the project I’m looking to undertake. I’m hoping there may be some people here who can point me in the right direction.

I’m actually looking to train an AI to watch YouTube videos of dogs and cats (yes, cat videos…). I want it to not only identify the particular breed of animal (German Shepherd, Collie, etc.), but also learn the general behavior of the animal. From what I understand, a convolutional neural network (CNN) would be ideal for identifying objects in images. However, is it any different for video?
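To make the image-versus-video question concrete: a single image is a grid of pixels, while a clip is a stack of frames, so a model can either run a 2D CNN on each frame separately or use convolutions that also slide along the time axis. Below is a minimal NumPy sketch (not TensorFlow code) of a naive single-channel 3D convolution; the function name `conv3d_valid` and the toy clips are illustrative assumptions. A kernel spanning two frames responds only where pixels change, which is the kind of motion signal a per-frame 2D CNN cannot see on its own. In TensorFlow, `tf.keras.layers.Conv3D`, or `tf.keras.layers.TimeDistributed` wrapped around a 2D CNN and followed by a recurrent layer, are common ways to build this, though which works best for this project is an open question.

```python
import numpy as np

def conv3d_valid(video: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Naive single-channel 3D cross-correlation with 'valid' padding.

    video:  array of shape (T, H, W), i.e. frames x height x width
    kernel: array of shape (kt, kh, kw)
    returns array of shape (T - kt + 1, H - kh + 1, W - kw + 1)
    """
    T, H, W = video.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                patch = video[t:t + kt, y:y + kh, x:x + kw]
                out[t, y, x] = np.sum(patch * kernel)
    return out

# A kernel spanning 2 frames and 1x1 pixels: it computes "next frame minus
# this frame", so it only responds where pixels change over time (motion).
temporal_diff = np.array([-1.0, 1.0]).reshape(2, 1, 1)

# Static clip: the same random 8x8 frame repeated 5 times.
static = np.tile(np.random.default_rng(0).random((8, 8)), (5, 1, 1))

# Moving clip: a bright 2x2 square that shifts one pixel right per frame.
moving = np.zeros((5, 8, 8))
for t in range(5):
    moving[t, 3:5, t:t + 2] = 1.0

static_response = conv3d_valid(static, temporal_diff)
moving_response = conv3d_valid(moving, temporal_diff)
print(static_response.shape)           # (4, 8, 8)
print(np.abs(static_response).max())   # 0.0
print(np.abs(moving_response).max())   # 1.0
```

In a trained network the kernel values are learned rather than hand-set; the hand-made `[-1, 1]` kernel is only there to show why the time dimension matters.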

Now why would I want to do this? Well, I want the AI to have a general knowledge of how a particular breed moves. Not so much its behavior as how it moves, and how it doesn’t. I’m trying to develop an automated solution for drawing and animating animals frame by frame.

My thought is that if an AI can understand the 3D proportions of a particular object or animal, and its movements, then you should be able to direct it to literally draw the subject on screen across multiple frames. You could simply use Photoshop for the canvas. I’m not looking to copy content; that’s easy to do without training an AI. I want it to generalize its knowledge and create original animations and designs based on what it has learned.
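A minimal sketch of what "drawing the subject in multiple frames" could mean once a 3D representation exists: each frame is the 3D pose projected through a pinhole camera onto a 2D canvas. The four keypoints, the focal length, the camera distance, and the use of a simple rotation as a stand-in for real movement are all hypothetical; a real system would need the AI to supply the 3D pose for every frame.

```python
import numpy as np

def rotation_y(angle: float) -> np.ndarray:
    """3x3 rotation matrix about the vertical (y) axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def project(points: np.ndarray, focal: float, distance: float) -> np.ndarray:
    """Pinhole projection of Nx3 points onto the image plane.

    The camera sits at the origin looking down +z; the object is pushed
    `distance` units in front of it. Returns Nx2 image coordinates.
    """
    z = points[:, 2] + distance
    return focal * points[:, :2] / z[:, None]

# Hypothetical stand-in for a learned 3D model: a few 3D keypoints of an
# animal (nose, tail base, a front paw, a back paw), in arbitrary units.
keypoints_3d = np.array([[0.6, 0.3, 0.0],
                         [-0.6, 0.3, 0.0],
                         [0.4, -0.4, 0.1],
                         [-0.4, -0.4, 0.1]])

# "Animate" by turning the model 30 degrees per frame and projecting each pose.
frames = []
for i in range(12):
    pose = keypoints_3d @ rotation_y(np.deg2rad(30 * i)).T
    frames.append(project(pose, focal=500.0, distance=4.0))

print(len(frames), frames[0].shape)   # 12 (4, 2)
```

The 2D points in each frame are what would then be drawn (or used to place strokes) on the canvas.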

So, I have been struggling to come up with a potential solution to this. How much instruction does an AI need to carry out this task? I mean, if I were programming this with a rules-based approach, I would try to use a form of photogrammetry to produce a 3D point cloud from video data, which could then be turned into a 3D model however you like. This is essentially a form of computer vision. Sometimes, when not enough 3D information can be extracted from a series of 2D images, interpolation is used to estimate where detail should be added based on the existing point data. There are a few tools for this sort of operation: MATLAB and OpenCV can handle the reconstruction, and Matplotlib (Python) can plot the resulting point cloud.
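For the rules-based route, the core photogrammetry step is triangulation: recovering a 3D point from its 2D positions in two images taken by cameras with known projection matrices. Here is a minimal NumPy sketch of linear (DLT) triangulation on synthetic, noise-free data; the camera intrinsics, the 1-unit baseline, and the test point are made-up values. OpenCV offers `cv2.triangulatePoints` for the same step. A full pipeline (structure from motion) would also have to match features between frames and estimate the camera poses, which this sketch assumes are already known.

```python
import numpy as np

def triangulate_point(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.

    P1, P2: 3x4 camera projection matrices.
    x1, x2: the point's (u, v) pixel coordinates in each image.
    Returns the 3D point as a length-3 array.
    """
    # Each view gives two linear equations in the homogeneous point X:
    #   u * (P[2] @ X) - P[0] @ X = 0
    #   v * (P[2] @ X) - P[1] @ X = 0
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The least-squares solution of A X = 0 with |X| = 1 is the right
    # singular vector belonging to the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

def project(P, X):
    """Project a 3D point with a 3x4 camera matrix and dehomogenize."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Two synthetic cameras with the same intrinsics K; the second camera is
# shifted 1 unit along x.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

true_point = np.array([0.3, -0.2, 5.0])
x1, x2 = project(P1, true_point), project(P2, true_point)
recovered = triangulate_point(P1, P2, x1, x2)
print(recovered)   # should match true_point up to floating-point error
```

Repeating this for many matched points gives the sparse point cloud; the interpolation step mentioned above would then fill in between those points.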

However, AIs don’t use a hand-coded, rules-based approach. If provided enough data (and the right kind), could one figure out how to get the desired output by itself? Or does it need a certain amount of instruction to get there, i.e. photogrammetry?

I suppose you could train a supervised learning network on certain 3D point cloud data so it knows what to look out for in the videos. I’m not sure I even need 3D point clouds, though. What do you guys think?
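On the supervised idea: one way 3D data could be used is as labels, pairing 2D keypoints (which a detector could extract from video frames) with the matching 3D keypoints, and training a model to "lift" 2D to 3D. The sketch below uses synthetic poses and plain linear least squares as a stand-in for a neural network; the joint count, the deformation modes, the helper names, and the orthographic camera are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n_joints, n_modes = 6, 3

# Hypothetical "animal" 3D shapes: a mean pose plus a few random deformation
# modes. Real 3D labels would come from motion capture or reconstruction.
mean_pose = rng.normal(size=(n_joints, 3))
modes = rng.normal(size=(n_modes, n_joints, 3))

def sample_poses(n):
    """Draw n poses = mean + random weighted sum of modes, shape (n, joints, 3)."""
    weights = rng.normal(size=(n, n_modes))
    return mean_pose + np.einsum("nk,kjd->njd", weights, modes)

def to_2d(poses):
    """Orthographic camera looking down z: keep x, y and drop depth."""
    return poses[:, :, :2].reshape(len(poses), -1)

def to_features(x2d):
    """Append a constant column so the linear model can learn an offset."""
    return np.hstack([x2d, np.ones((len(x2d), 1))])

# Supervised pairs: input = 2D keypoints, label = full 3D keypoints.
train_3d = sample_poses(500)
# rcond=1e-10 discards the numerically-zero singular directions of the
# (rank-deficient) design matrix.
W, *_ = np.linalg.lstsq(to_features(to_2d(train_3d)),
                        train_3d.reshape(500, -1), rcond=1e-10)

# Evaluate on unseen poses from the same distribution.
test_3d = sample_poses(50)
pred_3d = (to_features(to_2d(test_3d)) @ W).reshape(test_3d.shape)
error = np.abs(pred_3d - test_3d).max()
print(error)   # should be close to zero for this exactly-linear toy data
```

The linear model recovers depth here only because the toy poses lie in a small linear subspace; real animal motion is far less tidy, which is where a neural network and much more training data would come in.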

Sorry for the long post... I tried to be brief, but it’s rather complex.