Prompt-based Visual Story Generation

  • Utilized DAMO Vilab's Text2Video diffusion model and Meta AI's Text2Audio model (AudioCraft) synergistically with BERT-based sentence embedding and K-means clustering for topic segmentation to transform user-inputted story prompts into visual stories, yielding an average survey rating of 3.69/5 from user feedback.
  • Leveraged Ollama to generate music context aware prompts (fed into AudioCraft) by configuring Mistral 7B with a system prompt.

Keywords: Generative AI, Text2Video/Audio Models, Ollama, Prompt Engineering, LLMs, Diffusion Models

GitHub repository for the project: https://github.com/parnerka/CSCI-544-Project-Fall23-Group27