Our product features thousands of books used by schools and by kids learning to read. Our students watch a video of someone reading the book aloud. See the example below:
[url removed, login to view]
We want to add a 'bouncing ball' visual feature to all our videos, so that each word is highlighted as it is read. See this example: [url removed, login to view]
Ideally we'd like to automate the feature, but we're happy to explore other options that achieve our objectives cost-effectively.
Option 1 - process the audio/video and produce a file listing each word spoken and the time at which it was spoken (so we can use that file on our platform to add the bouncing ball).
Option 2 - integrate with some of the new natural language processing APIs, machine intelligence APIs, etc. What new tech or services could we use, and which should we investigate?
Option 3 - manually create the data needed (we'd have to recruit low-cost resources, so this is probably not a solution for this project, but maybe there is a mix of automated and manual work we should consider?).
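
To make the options above more concrete, here is a rough sketch of the kind of word-timing file we have in mind for Option 1, and how our platform might use it to find the word the ball should sit on at a given playback time. Everything in it is an assumption for illustration only: the JSON layout, the field names (word, start, end, confidence), the 0.8 review threshold, and the function names are hypothetical, and the confidence score assumes that whatever automated tool is used reports one. The last function shows one possible automated/manual mix: only low-confidence words are sent for manual checking.

```python
import json
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class WordTiming:
    word: str
    start: float       # seconds from the start of the video
    end: float         # seconds from the start of the video
    confidence: float  # hypothetical 0..1 score from an automated tool


def load_timings(raw_json: str) -> list[WordTiming]:
    """Parse the (assumed) JSON timing file and sort words by start time."""
    items = [WordTiming(**item) for item in json.loads(raw_json)]
    return sorted(items, key=lambda w: w.start)


def word_at(timings: list[WordTiming], t: float):
    """Return the word being spoken at playback time t, or None in a pause."""
    starts = [w.start for w in timings]
    i = bisect_right(starts, t) - 1  # last word that started at or before t
    if i >= 0 and t < timings[i].end:
        return timings[i]
    return None


def needs_review(timings: list[WordTiming], threshold: float = 0.8):
    """Words whose timing a person should check by hand (hybrid approach)."""
    return [w for w in timings if w.confidence < threshold]


# Made-up sample data, not taken from any real video.
SAMPLE = """[
  {"word": "The", "start": 0.50, "end": 0.80, "confidence": 0.98},
  {"word": "cat", "start": 0.85, "end": 1.30, "confidence": 0.95},
  {"word": "sat", "start": 1.40, "end": 1.90, "confidence": 0.62}
]"""

if __name__ == "__main__":
    timings = load_timings(SAMPLE)
    current = word_at(timings, 1.0)
    print(current.word if current else "(pause)")
    print([w.word for w in needs_review(timings)])
```

On the platform, the player would call something like word_at on each time update and move the ball to the returned word. We're open to any other file format that carries the same information.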