Virtual Presenters (AI Avatars in-depth)

31 Mar 2024 / 27 minutes to read

Elena Daehnhardt

I am still working on this post, which is mostly complete. Thanks for your visit!

Affiliate disclosure: I am an affiliate of the following products mentioned in this post:
Synthesia.io
Deepbrain AI
Hour One AI
Hey Gen

If you click an affiliate link and subsequently make a purchase, I will earn a small commission at no additional cost to you. This helps me promote tools I like and supports my blogging.

I thoroughly check the affiliated products' functionality and use them myself to ensure high-quality content for my readers. Thank you very much for motivating me to write.

Introduction

This post will briefly introduce AI-powered tools like Synthesia.io that produce incredible avatars.

AI Avatars

AI avatars, also known as virtual humans or digital humans, are computer-generated representations of humans that are increasingly being used in various applications.

How they are created

Avatars are created using artificial intelligence techniques, such as machine learning and deep learning, to simulate human appearance, behaviour, and interaction.

Deep learning is a type of machine learning that uses Artificial Neural Networks to learn from data. Neural networks are inspired by the structure of the human brain, and they can learn to perform complex tasks such as image recognition and natural language processing.

Do you want to know how Deep Learning differs from Machine Learning? Read my earlier post, Deep Learning vs Machine Learning.

One way to create sophisticated AI avatars using deep learning is to use a generative adversarial network (GAN). A GAN consists of two competing neural networks: a generator and a discriminator. The generator creates new data, such as images or videos, while the discriminator tries to tell whether the data it sees is real or generated. Because the two are trained against each other, the generator gradually learns to produce output that the discriminator can no longer distinguish from real data.
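To make the generator/discriminator competition concrete, here is a minimal sketch of the two loss values that drive GAN training. It uses plain Python and made-up discriminator scores instead of real networks, and it assumes the commonly used "non-saturating" generator loss; the function names and the example scores are illustrative only.

```python
import math


def bce(prediction: float, target: float) -> float:
    """Binary cross-entropy for one probability in (0, 1)."""
    eps = 1e-12  # keeps log() away from zero
    p = min(max(prediction, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def discriminator_loss(d_real: float, d_fake: float) -> float:
    """The discriminator wants D(real) close to 1 and D(fake) close to 0."""
    return bce(d_real, 1.0) + bce(d_fake, 0.0)


def generator_loss(d_fake: float) -> float:
    """The generator wants the discriminator to score its fakes as real."""
    return bce(d_fake, 1.0)


# Hypothetical discriminator outputs: the probability that an input is real.
stages = {
    "early training": {"d_real": 0.9, "d_fake": 0.1},  # fakes are easy to spot
    "later training": {"d_real": 0.6, "d_fake": 0.5},  # fakes look convincing
}

for stage, scores in stages.items():
    d_loss = discriminator_loss(scores["d_real"], scores["d_fake"])
    g_loss = generator_loss(scores["d_fake"])
    print(f"{stage}: discriminator loss={d_loss:.3f}, generator loss={g_loss:.3f}")
```

As the fakes become more convincing, the generator's loss falls and the discriminator's loss rises. In a real GAN, both networks update their weights in alternating steps to reduce their own loss.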

I have asked Google Gemini:

Who invented the generative adversarial network? Please give me the first research article.

The concept of Generative Adversarial Networks (GANs) is credited to Ian Goodfellow and his colleagues. Their research article introducing GANs was published in June 2014. While the exact authorship includes multiple researchers, Goodfellow is generally recognized as the main inventor.

When I extended the search results, Gemini also pointed me to Transcript: AI Breakthroughs with Ian Goodfellow and Richard Mallah by Ariel Conn (2017) from the Future of Life Institute, which discusses the significant developments in AI in 2016. In the transcript, you can find links to An introduction to Generative Adversarial Networks (with code in TensorFlow) and the Deep Learning book.

GANs can be used to create AI avatars that are more realistic and lifelike than those made using traditional methods. For example, GANs can create avatars capable of expressing emotions and interacting with their environment. For related research, read the paper by Abinaya and Vadivu (2024), Enhancing the Potential of Machine Learning for Immersive Emotion Recognition in Virtual Environment.

Purposes

AI avatars can be used for a variety of purposes, including:

Marketing and advertising to create engaging and personalized marketing campaigns, answer customer questions, and provide customer service.
Education to create interactive and engaging educational materials and provide personalized tutoring and instruction.
Training and development to create immersive and realistic training simulations and provide personalized feedback and coaching.
24/7 Customer service to handle complex customer inquiries and resolve issues.
Entertainment to create virtual worlds and entertainment experiences, as well as in games, movies, and other forms of media.

As AI technology advances, AI avatars are becoming increasingly sophisticated and lifelike. This is opening up new possibilities for their use in various applications.

Here are some of the benefits of using AI avatars:

Cost-effectiveness: AI avatars can be much cheaper to create and maintain than hiring human actors and booking studios.
Flexibility: AI avatars can be easily customized and updated.
Scalability: AI avatars can be used to create content for a wide range of platforms and devices.
Accessibility: AI avatars can be used to create content that is accessible to people with disabilities.

AI avatars are still in their early stages of development, but they have the potential to revolutionize the way we interact with the world around us. They can potentially make our lives more convenient, entertaining, and informative.

Next, we will explore the leading AI tools that can be used to create avatars today.

Synthesia AI

Synthesia.io is a cloud-based platform that uses artificial intelligence (AI) to create videos with realistic human voices and avatars. According to the company, it is the #1 rated AI video creation platform, with over 50,000 customers worldwide.

Key Features

Create videos in 120+ languages with natural-sounding AI voices
Choose from a variety of AI avatars to represent your brand or message
Add micro-gestures to make your avatars even more lifelike
Edit your videos as easily as a slide deck
Share your videos with anyone on the web or embed them in your website or blog

Use cases for Synthesia AI

Training materials: Create engaging training videos for employees or customers
Product demos and presentations: Showcase your products or services in a dynamic way
E-learning courses: Develop interactive and engaging e-learning courses
Marketing videos: Create high-quality marketing videos that will capture attention
Instructor-led videos: Live-stream or pre-record instructor-led videos

Benefits

Save time and money by creating videos without the need for actors or studios
Reach a wider audience with videos in multiple languages
Personalize your videos with custom avatars and micro-gestures
Make a lasting impression with high-quality, engaging videos

If you are looking for a powerful and easy-to-use AI video creation platform, Synthesia.io is a great option. Its wide range of features and benefits can help you create videos that will make a difference.

Here are some examples of how businesses use Synthesia.io (see their case studies on the page Discover AI video success stories):

Zoom uses Synthesia.io to create interactive training modules at scale. Sales teams now have access to realistic simulations instead of lengthy PDF files.
Electrolux uses Synthesia.io to localize their training videos. They create a video, upload scripts, add elements and avatars, and then localize the original English version into 30+ European languages with a single click. Local trainers can access and adjust the translations if necessary, and the video is ready for deployment.
LATAM Airlines has used Synthesia.io to create 300+ videos, reaching over 16,000 learners, with an 83% reduction in production time. They can now easily create training videos in multiple languages, leading to better knowledge retention and engagement among learners.
Berlitz, a global leader in language education, uses Synthesia.io for language learning materials. It reduced the production time for 1700 micro videos by 70%, lowered resource allocation from a full-time team of 6 to only 2 members, and reduced production cost by a factor of 3. Now, they’re excited about the potential of AI video in further diversifying and scaling their digital learning experiences.
Persado, an AI-powered customer engagement platform based in New York, uses Synthesia.io to create training content efficiently, so sales reps can access bite-sized videos for learning anytime, anywhere. The Persado team values Synthesia’s ease of use, constant improvements, and custom templates for creating training videos at scale.

How is it done?

Synthesia.io creates its avatars using neural networks. To generate a high-quality, personalized avatar, you are asked to provide approximately 15 minutes of footage of yourself standing in front of a green screen. After receiving the footage, Synthesia.io spends approximately two weeks training its models to create a new custom avatar specifically for you.

The technology used by Synthesia.io is proprietary, and only a few details are shared about it. Their work involves a lot of research on photorealistic and controllable neural video synthesis. Synthesia.io works with their co-founders, Prof. Matthias Niessner (TUM) and Prof. Lourdes Agapito (UCL), to conduct foundational research for developing 3D neural rendering techniques to synthesize realistic video. You can find more information about their work on their website, Welcome to Synthesia AI Research.

Alternatives

There are several alternatives to Synthesia.io, each with strengths and weaknesses. Here are a few of the most popular options:

D-ID aims to make communication with digital devices more natural and intuitive. With its conversational agents, users can engage in face-to-face conversations with technology, without the need for typing or clicking.

Creating D-ID agents is a simple process. You select the appearance and voice settings for your agent, upload your text or PDF file to customize it to your specific needs, and provide instructions on how it should behave. You will be given some free credits to try it out and see if you like it.

Personally, I am fond of the selection of voices and avatars available. However, I would love to see the avatars capable of understanding my speech and communicating with me. The technology for this already exists, and it would make me even happier to have this option for my future agents.

My first D-ID avatar, Agent 001

Rephrase.ai is a powerful AI platform that enables users to transform plain text into engaging videos. It has many advanced features, such as adding music, transitions, and effects to your videos. Rephrase.ai is ideal for users looking for a high level of control and customization options for their videos.
Hour One AI is a platform for creating synthetic videos that utilises artificial intelligence. It comes with various features, including the capability to generate videos in various languages. Hour One AI is an excellent choice for companies looking to produce multilingual videos.
Hey Gen is an AI-powered video generator that can transform marketing text into engaging videos. Hey Gen provides a wide range of features, such as customised backgrounds and graphics, to create high-quality marketing videos. It is an excellent choice for businesses looking to produce professional video content.

Fotor AI creates avatars or faces using the respective Web Interface at Avatar Maker and AI Face Generator. Additionally, Fotor offers powerful AI tools to enhance photos, remove backgrounds and unwanted objects, and even generate images from text. Transform blurry photos, change backgrounds, and remove distractions with ease.
Deepbrain AI is an advanced platform that allows users to create realistic-looking AI-generated videos. It offers a wide range of features, including creating videos with custom avatars and micro-gestures. This platform is an excellent option for businesses that need to produce high-quality, captivating videos.

DeepBrain's Template UI for Work Guide

I really enjoy using DeepBrain virtual presenters because of the option to add gestures between specific sentences. The customization options are vast, which makes it a fun and engaging experience. Additionally, you can easily create images and videos from your text to include in your presentation.

BHuman AI Studio is a powerful platform for creating realistic AI-generated videos for e-learning, product demos, and marketing purposes. It provides a range of features that enable users to create videos with custom avatars, backgrounds, and graphics. This platform is an excellent option for businesses and individuals who want to create engaging and informative videos.
Photoleap is an application that uses AI to create avatars from your selfies. It is both an avatar-creating app and a powerful photo editing tool that can transform any photo into a digital artwork. With Photoleap, you can describe anything, click the generate button, and the AI will create an image for you in just a few seconds. The app lets you turn your words into art on your phone. Additionally, you can draw whatever is on your mind and add a short prompt, and the AI will fill in the gaps to create your image. See Transform your selfies into avatars instantly with AI Selfies.

Avatars in Python

Py-Avataaars at pypi

You can create a simple “toy” avatar in Python. There are a few different libraries that you can use to do this, but one of the most popular is called Py-Avataaars. Py-Avataaars is a Python library that provides a simple interface for creating and rendering avatars. It is a Python port of the Avataaars avatar library: rather than using a trained model, it assembles each avatar from predefined vector parts, and you can customise them with various parameters, such as skin colour, hair colour, and hairstyle.

Install it with pip:

pip install py-avataaars

Here is an example of how to create an avatar using Py-Avataaars:

import py_avataaars as pa

avatar = pa.PyAvataaar(
    style=pa.AvatarStyle.CIRCLE,
    skin_color=pa.SkinColor.LIGHT,
    hair_color=pa.HairColor.AUBURN,
    facial_hair_type=pa.FacialHairType.DEFAULT,
    top_type=pa.TopType.LONG_HAIR_CURVY,
    hat_color=pa.Color.RED,
    mouth_type=pa.MouthType.TWINKLE,
    eye_type=pa.EyesType.WINK,
    eyebrow_type=pa.EyebrowType.DEFAULT,
    nose_type=pa.NoseType.DEFAULT,
    accessories_type=pa.AccessoriesType.SUNGLASSES,
    clothe_type=pa.ClotheType.GRAPHIC_SHIRT,
    clothe_color=pa.Color.BLACK,
    clothe_graphic_type=pa.ClotheGraphicType.BEAR,
)

# You can save into PNG or SVG file
avatar.render_svg_file("my_avatar.svg")

This code will create an avatar with the defined parameters and save it as an SVG image called “my_avatar.svg”.

PyAvataaar, an SVG avatar image
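To see what "an avatar built from parameters" means at the file level, here is a toy sketch that assembles a very simple SVG face using only the Python standard library. It is not how Py-Avataaars is implemented; the function name, the colour palettes, and the shapes are all made up for illustration.

```python
from xml.etree import ElementTree as ET

# Hypothetical palettes for this toy example (not Py-Avataaars values).
SKIN_COLORS = {"light": "#FFDBB4", "tanned": "#D08B5B", "dark": "#614335"}
HAIR_COLORS = {"auburn": "#A55728", "black": "#2C1B18", "blonde": "#B58143"}
MOUTHS = {
    "smile": "M80 135 Q100 155 120 135",  # curved line
    "serious": "M80 140 L120 140",        # straight line
}


def build_avatar(skin: str = "light", hair: str = "auburn", mouth: str = "smile") -> str:
    """Return the SVG markup for a toy avatar as a string."""
    svg = ET.Element("svg", {
        "xmlns": "http://www.w3.org/2000/svg",
        "viewBox": "0 0 200 200",
        "width": "200",
        "height": "200",
    })
    # Shapes are layered back to front: background, hair, face, eyes, mouth.
    ET.SubElement(svg, "circle", {"cx": "100", "cy": "100", "r": "100", "fill": "#65C9FF"})
    ET.SubElement(svg, "ellipse", {"cx": "100", "cy": "95", "rx": "68", "ry": "72",
                                   "fill": HAIR_COLORS[hair]})
    ET.SubElement(svg, "circle", {"cx": "100", "cy": "110", "r": "55",
                                  "fill": SKIN_COLORS[skin]})
    for eye_x in ("80", "120"):
        ET.SubElement(svg, "circle", {"cx": eye_x, "cy": "100", "r": "5", "fill": "#262E33"})
    ET.SubElement(svg, "path", {"d": MOUTHS[mouth], "stroke": "#262E33", "stroke-width": "4",
                                "stroke-linecap": "round", "fill": "none"})
    return ET.tostring(svg, encoding="unicode")


if __name__ == "__main__":
    with open("toy_avatar.svg", "w", encoding="utf-8") as svg_file:
        svg_file.write(build_avatar(skin="light", hair="auburn", mouth="smile"))
```

Changing a parameter swaps one shape or colour while the rest of the drawing stays the same, which is the basic idea behind parameterised avatar libraries.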

Stable Diffusion DreamBooth

If you’re looking for more than just simple, static SVG avatars and are willing to put in some effort, check out Pyry Pajunen’s excellent tutorial Easy Realistic Avatars with Stable Diffusion DreamBooth: No-Programming, Step-by-Step Guide (No Third-Party Apps). It explains how to create lifelike avatars with Stable Diffusion DreamBooth, a technique that fine-tunes the Stable Diffusion image model on a handful of your own photos. You can use Google Colab to run the code.

Instruct Pix2Pix

InstructPix2Pix is an incredible AI-powered tool that allows you to edit a picture using plain English instructions. You can write what you want to be changed, for instance, “make the sky red” or “add a cat to the picture,” and the AI will do its best to follow your directions, creating a new image with those changes.

There are two ways to use InstructPix2Pix. Firstly, you can try it out for free via the online demo. For instance, you can play with InstructPix2Pix at HuggingFace Spaces or at replicate.com. All you need to do is upload your picture and type in your instructions. This option is ideal for those who want to give InstructPix2Pix a go without any software installation.

Secondly, for those who are more tech-savvy, you can download the code and run it on your computer. This option offers more control over the editing process.
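For the second route, one possible setup is the Hugging Face diffusers library, which ships an InstructPix2Pix pipeline. The sketch below assumes that `diffusers`, `torch`, and `Pillow` are installed and that the publicly hosted `timbrooks/instruct-pix2pix` checkpoint is used. The `EditSettings` and `edit_image` names and the default values are my own illustrative choices, and the tutorial linked below may use a different setup.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EditSettings:
    """Illustrative defaults; tune them for your own images."""
    model_id: str = "timbrooks/instruct-pix2pix"
    num_inference_steps: int = 20
    guidance_scale: float = 7.5        # how closely to follow the text instruction
    image_guidance_scale: float = 1.5  # how closely to stay to the input image


def edit_image(input_path: str, instruction: str, output_path: str,
               settings: Optional[EditSettings] = None) -> str:
    """Edit one image with a plain-English instruction and save the result."""
    # Heavy imports stay inside the function so this file loads without them.
    import torch
    from diffusers import StableDiffusionInstructPix2PixPipeline
    from PIL import Image

    settings = settings or EditSettings()
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    # The first call downloads the model weights, which takes a while.
    pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
        settings.model_id, torch_dtype=dtype
    ).to(device)

    image = Image.open(input_path).convert("RGB")
    result = pipe(
        instruction,
        image=image,
        num_inference_steps=settings.num_inference_steps,
        guidance_scale=settings.guidance_scale,
        image_guidance_scale=settings.image_guidance_scale,
    )
    result.images[0].save(output_path)
    return output_path


# Example call (needs the libraries above and, ideally, a GPU):
# edit_image("portrait.png", "make the sky red", "portrait_edited.png")
```

Raising `image_guidance_scale` keeps the result closer to the original picture, while raising `guidance_scale` pushes it harder towards the instruction.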

The complete instruct-pix2pix tutorial is at stable-diffusion-art.com.

There are many options for creating fantastic avatars, so I may write more about them in the future.

Best of luck with coding and have fun!

Dangers and ethical considerations

The use of avatars, particularly AI-generated avatars, has numerous beneficial applications across various sectors, including education, healthcare, entertainment, and customer service. However, as with many technologies, there are potential downsides and dangers associated with their use, especially when ethical guidelines are not strictly followed or considered. Below are some of the most dangerous applications of avatars and their related ethical concerns.

Human jobs become obsolete

The first obvious danger for human presenters is that their employment security might be in danger in the near future. Personally, I expect to see virtual presenters working in TV shows and other video content globally in the next two to five years. The technology is already here!

I have asked ChatGPT (GPT-4):

Will virtual avatars substitute human TV presenters in the future?

The question of whether virtual avatars will substitute human TV presenters in the future involves considering technological advancements, audience preferences, and the evolving landscape of media consumption. Here's an analysis of potential scenarios and factors that could influence this outcome:

### Technological Advancements

- **Realism and Interactivity**: As AI and virtual reality technologies advance, virtual avatars become increasingly realistic and interactive. If they reach a point where they can mimic human emotions, reactions, and spontaneity convincingly, they might be seen as viable alternatives to human presenters in certain contexts.
- **Cost and Efficiency**: Virtual avatars can be more cost-effective in the long run, requiring no physical studio space and being available 24/7 without the need for breaks. This economic advantage might persuade media companies to adopt avatars for presenting roles.

### Audience Preferences

- **Human Connection**: Despite technological advancements, many viewers value the human element in media. The authenticity, personality, and emotional depth of human presenters often create a connection that might be challenging for virtual avatars to replicate fully.
- **Acceptance of Technology**: Younger audiences, who are generally more accepting of digital innovations, might be more open to virtual avatars as presenters. The pace at which virtual avatars are adopted could thus depend on demographic shifts and changing viewer attitudes.

### Media Landscape Evolution

- **Niche Applications**: Virtual avatars might first become prevalent in specific segments of the media, such as gaming, virtual reality experiences, or channels dedicated to technology enthusiasts, before seeing broader use.
- **Hybrid Models**: It's possible that rather than a complete substitution, we'll see a hybrid approach where human and virtual presenters coexist, each serving roles that best suit their strengths. For example, avatars could handle routine updates or be used in environments where realism is less critical, while humans focus on in-depth analysis, interviews, and emotionally charged content.

### Ethical and Regulatory Considerations

- **Misinformation and Trust**: Ensuring that virtual avatars don't contribute to the spread of misinformation will be crucial. Their potential use in deepfakes and propaganda raises concerns about trustworthiness and the need for clear labeling and ethical guidelines.
- **Employment and Industry Impact**: The displacement of human jobs by virtual avatars would necessitate discussions on retraining, employment alternatives, and the social responsibilities of broadcasters and tech companies.

### Conclusion

While it's likely that virtual avatars will play an increasing role in TV presenting and other media formats, a complete substitution of human presenters is not guaranteed. Factors such as audience demand for human connection, the nuanced capabilities of human presenters, and ethical considerations might limit the extent to which avatars replace humans. Instead, a more blended approach, where avatars supplement human presenters or serve specific niches, seems a more probable future scenario.

Deepfakes and Misinformation

AI avatars can be used to create deepfakes, which are hyper-realistic video or audio recordings that mimic real people saying or doing things they never actually did. Deepfakes can be used to spread misinformation or false narratives.

The creation and dissemination of deepfakes raise significant ethical issues around consent, truthfulness, and the potential to harm individuals’ reputations, manipulate public opinion, and undermine trust in media and public figures.

Identity Theft and Fraud

AI-generated avatars can be utilized to impersonate individuals for fraudulent purposes, such as scamming, phishing, or accessing restricted information.

The use of AI avatars for impersonation attacks directly concerns privacy invasion, security breaches, and the financial and emotional harm caused to the victims.

AI avatars can be deployed in social engineering attacks to manipulate individuals into divulging confidential information or performing actions against their best interest, leveraging the trust and authority that a seemingly “human” interaction might command.

These practices raise ethical questions about manipulation, consent, and the exploitation of psychological vulnerabilities for malicious purposes.

Bias and Discrimination

If not carefully designed, AI avatars can perpetuate or even exacerbate biases present in their training data, leading to discriminatory practices or reinforcing stereotypes in interactions.

The propagation of bias and discrimination through AI avatars challenges principles of fairness, equality, and justice, particularly affecting marginalized groups.

Privacy Concerns

The development and interaction with AI avatars can involve the collection and analysis of large amounts of personal data, including voice, facial features, and personal preferences.

The use of personal data to create or interact with avatars brings up concerns about privacy, consent, data protection, and the potential for surveillance.

Highly realistic avatars can create unrealistic standards of beauty or behavior, impacting social dynamics and personal relationships, especially among vulnerable populations such as young people.

The concern here revolves around the psychological impact, including issues related to self-esteem, body image, and the nature of social interactions and relationships in a digital age.

Addressing Ethical Concerns

To mitigate these dangers, it is crucial to develop and adhere to strong ethical guidelines and regulatory frameworks. These should prioritize transparency, consent, privacy, fairness, accountability, and the prevention of harm. Additionally, public awareness and education on the potential misuse of such technologies can empower individuals to navigate digital interactions more safely and critically.

Conclusion

AI avatars, computer-generated representations of humans, are rapidly gaining traction across various industries, including education, marketing, and entertainment. Synthesia is one of the most impressive tools for creating avatars, alongside other remarkable applications.

In this post, we’ve explored some of the leading AI applications and techniques for crafting avatars, complemented by links to related research and advanced AI avatar creation methods and libraries accessible to all.

As these technologies continue to evolve, the potential for more realistic and interactive avatars promises to unlock unprecedented opportunities in how we learn, market products, and entertain ourselves.

Remember, as we explore the possibilities offered by AI avatars, we must weigh the ethical implications to ensure the respectful and responsible use of this powerful technology.

Sign up for my newsletter to stay updated on AI avatars and the other innovations I learn about.

AI apps for Video

Try the following fantastic AI-powered applications.

I am affiliated with some of them (to support my blogging at no cost to you). I have also tried these apps myself, and I liked them.

Synthesia.io generates videos from text prompts, creates AI avatars, and much more.

Hour One AI uses text-to-video generator technology that allows you to easily create, manage, and streamline cinematic AI avatar videos.

Hey Gen uses text-to-video generator technology that allows you to easily create, manage, and streamline cinematic AI avatar videos.

vidIQ helps to grow YouTube channels with optimised content and keyword generation.

Deepbrain AI helps to create videos faster with AI-powered video editing that features realistic AI avatars, natural text-to-speech, and powerful text-to-video capabilities.

Pictory.ai creates professional quality videos from your script with realistic AI voices, matching footage and music in a few clicks. Pictory.AI can also convert blog posts into captivating videos and extract highlights from your recordings to create branded video snippets for social media, and much more.

References

1. Artificial Neural Networks

2. Deep Learning vs Machine Learning

3. Transcript: AI Breakthroughs with Ian Goodfellow and Richard Mallah

4. An introduction to Generative Adversarial Networks (with code in TensorFlow)

5. Deep Learning book

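6. Abinaya and Vadivu (2024), Enhancing the Potential of Machine Learning for Immersive Emotion Recognition in Virtual Environment, https://publications.eai.eu/index.php/sis/article/view/5036/2871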

7. Synthesia.io

8. Discover AI video success stories

9. Welcome to Synthesia AI Research

14. Fotor AI Avatar Maker

15. Fotor AI Face Generator

20. Transform your selfies into avatars instantly with AI Selfies

21. Py-Avataaars

22. Easy Realistic Avatars with Stable Diffusion DreamBooth: No-Programming, Step-by-Step Guide (No Third-Party Apps) Pyry Pajunen

23. Google Colab example code using DreamBooth Stable Diffusion

24. HuggingFace Spaces

25. replicate.com

26. Instruct Pix2Pix: Edit and stylize photos with text

About Elena

Elena, a PhD in Computer Science, simplifies AI concepts and helps you use machine learning.