Creating Assets for AR Experiences using Generative AI Tools
April 4, 2024
An introduction to creating assets for Augmented Reality experiences using generative AI.
The Importance of Quality Assets
When creating a truly engaging experience, one of the key factors is, of course, the design and the quality of the assets. Using images, 3D models, video and audio that look just right and capture the atmosphere you want can be the difference between success and failure.
Sometimes, though, you don’t have the budget or the time to create outstanding assets from scratch, even though you do have the ideas and the designs you need to start creating. Luckily, we live in the era of generative AI, and there are some great tools out there to help you create exactly the assets you need.
Different Tools for Different Tasks
The development and adoption of generative AI has been extremely quick and widespread, so we now have a wide range of tools at our disposal for creating almost any kind of asset. Some are affordable, whilst others are more expensive. I’ll go through several tools suited for different tasks, such as image, 3D model and audio creation. Some are more mature than others, so depending on the tool and the type of asset, the quality will vary, and some tasks may require manual work to polish the generated asset.
I’ll cover the tools we use at Hololink that deliver usable results quickly and reliably, as well as the ones we use knowing that the result won’t always be exactly the way we want it.
Without further ado, let’s jump right in!
Tools for Image Creation
Image creation software lets you create an image by writing a text prompt describing what you want to see, what the light should be, the style, etc.
Midjourney
Midjourney is the image creation AI that we mostly use at Hololink. This is because the results live up to our expectations almost every time and can easily be tweaked. With the new --cref (character reference) flag, we are even able to create a human character that can be used consistently across a series of situations, with different clothing, lighting, illustration styles, etc.
When you’ve gotten started with Midjourney, I recommend having a look at the list of parameters you can use to control the output and get an image as close to your vision as possible: https://docs.midjourney.com/docs/parameter-list.
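To illustrate, here is what a prompt combining a character reference with a couple of other parameters might look like. The subject and the image URL are placeholders of my own, and parameter syntax can change between Midjourney versions, so check the parameter list above for current values:

```
/imagine prompt: a friendly museum guide greeting visitors, warm afternoon
light, flat illustration style --cref https://example.com/guide.png --cw 80
--ar 16:9 --stylize 250
```

Here --cw (character weight) controls how closely the result follows the reference character (100 tries to keep face, hair and clothing; lower values focus mainly on the face), --ar sets the aspect ratio, and --stylize controls how strongly Midjourney applies its own aesthetic.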
Midjourney’s prices start at $8/month and go up to $96/month, depending on your needs. So far, the cheapest plan has been sufficient for us, and it’s been worth every cent.
Dall-E

Dall-E is very much like Midjourney and can actually be accessed free of charge through Bing Chat. The results are also very good, and it is much easier to use than Midjourney. But the level of control is limited, so changing parts of an image and re-styling don’t work as well as with Midjourney.
As it is free through Bing Chat, Dall-E might be the tool for your small projects with a very limited budget.
Tools for 360 Image Creation

This is the kind of tool you need to create a spherical image that can be used as a world canvas, letting your user be completely immersed in an environment that surrounds them in all directions. So far, I only know of one generative AI tool for creating this, but there may be others that I am not aware of.
Skybox AI by Blockade Labs
Skybox is a fantastic tool for creating immersive environments. At Hololink we use it mainly for 360 image domes that create a specific atmosphere around the user, but it can also generate meshes for use in Unity or other tools for games and virtual reality.
See some Skybox AI creations in a 360 Hololink here: https://skyboxes.hololink.app, or scan the QR code below if you’re reading this on a desktop or laptop:
Skybox starts at $12/month, and at $60/month you get unlimited generations and the ability to export 3D meshes. For creating content for Hololink, the $12 tier is sufficient.
Tools for Video Creation

The only easily accessible text-to-video tool that I know of is RunwayML. Recently, there has been a lot of talk about OpenAI’s Sora, but since it isn’t available to the public yet, we have to go with some less advanced video generation tools.
RunwayML
RunwayML was founded in 2018 and includes several generative AI features, one of which is the ability to create short video clips (up to 18 seconds long) from text prompts. The results are not exactly cinema-ready, but some of them are stunning and work well as background video for adding atmosphere to short augmented reality experiences.
The video above was generated with RunwayML using the simple prompt: “A spaceship floating in space above the planet Earth”
Pricing starts at $12/month, but be aware that you’re paying for credits, so playing around for too long can take you to a point where you need to buy additional credits or upgrade your plan.
Tools for Animating Images

There are a few tools for animating images. I have chosen two that can animate any image you want, and two that are mainly for animating images of humans or humanoid avatars.
RunwayML
RunwayML, mentioned above, can be used to animate non-human images, but the main use case is creating panning effects or zooming in or out of an image. The effect works well, but depending on your needs, you may find it limited to background video or fancy loading screens. For example, getting human figures to move will often result in slow-motion movements and small AI hallucinations, such as a face morphing out of shape.
Additionally, RunwayML can animate a humanoid image with speech, in the same way as D-ID and Synthesia (both mentioned below), although it is not as specialised, so I suggest using one of the other tools for this kind of task.
Neural Frames
Neural Frames has quite a lot in common with Runway, in that you can generate images and animate them directly in the software. The video output is mostly panning and colour effects, producing interesting, and sometimes psychedelic, results. You can also create and edit your AI video using music.
The image and the animation of the image in the video above were both done in Neural Frames.
D-ID

D-ID is a service that lets you animate a photograph or an illustration of a human and add voice to the character, so that it moves and speaks, either with audio generated in the platform or with audio that you have uploaded yourself. The user interface is very easy and intuitive.
The results are mostly very good and at Hololink we have used several D-ID creations in our projects.
The video above was generated in D-ID, using an avatar created in Midjourney.
The price starts at $4.70/month and goes up to $196/month, giving you between 10 and 100 minutes of video, depending on your subscription. They also offer a short free trial, letting you generate 5 minutes of video.
Synthesia

Synthesia does very much the same thing as D-ID and has been praised for its high-quality avatars and 120 languages with a variety of dialects. Whether you use this or D-ID is up to your specific needs, as both deliver good results.
Prices are $29/month (10 minutes of video) or $89/month (30 minutes of video), with only two fixed tiers and no free trial; if your needs are greater than this, you have to talk to sales. This makes Synthesia less accessible than D-ID.
Tools for 3D Model Creation

When it comes to really good 3D assets, the robots are not as close to becoming our overlords as they are with images. They are actually quite far from it. Some companies have made great progress, and you can definitely create models that are usable in some settings, but we’ll be waiting a bit longer before anyone can create game-ready, high-definition 3D with generative AI.
We haven’t actually used generative AI for 3D models at Hololink for client projects, but I can see that we’re nearing the point where we might start to do this very soon.
Here are some tools that are available right now.
Luma AI Genie
Genie has an easy and simple interface, letting you type a prompt and see four versions of the resulting 3D model at once.
I’ve tried different prompts, and as long as you keep it relatively simple, the results are good enough for simple experiences or prototypes. Once you refine the models, you get something that looks very nice, but I seldom get results so good that I would build an experience around them as the centrepiece.
Meshy

Meshy is a multi-faceted tool that allows you to generate 3D models from a text prompt, but also to generate 3D from an image or create new textures for existing models and meshes.
The text-to-3D is very easy to use, and the results are on par with what you get from Luma AI’s Genie. You simply add your prompt, and you can even choose between different basic styles for the texturing. You then get 4 basic versions of the model, which you can choose to refine.
This helmet was generated with Meshy:
Meshy has a free tier, which gives you 200 credits per month, enough for 8 refined models. After that, pricing starts at $16/month for 1,000 credits.
CSM

CSM does the same things as Meshy, but with a very different interface. There are more options to choose between when generating the model: CSM starts by showing you an image based on your prompt, which you then choose to generate a model from. After generating the model, you can download it in several different formats.
This is a low-res model generated with CSM:
Like the other tools, CSM has a free plan with most of the features. After that, you can choose the $20/month plan or the $60/month plan. Only the most expensive plan has all features, and it is the only plan that gives you ownership rights to your models; the other plans require you to license them as CC (Creative Commons).
This 3D editor allows you to create simple game-style 3D models, such as chairs, staffs, swords, etc. Like Luma AI’s, the models look like they would be great as secondary assets in a low- to mid-poly game or interactive experience. They are much more cartoonish than the models created with the other tools, but this actually makes them easier to use, as the style is clear and the textures aren’t so advanced that they become messy, which can happen with the tools above.
Tools for Audio Creation

There are different types of audio generating tools available, both for music and speech. For music, some tools use text-to-music generation, while others let you choose style, tempo, emotion, etc.
Many of the tools we have already mentioned have speech generators built in, such as RunwayML, D-ID and Synthesia. D-ID actually uses the only stand-alone speech synthesis tool that I’m mentioning here, Elevenlabs.
Music
I’ve chosen 4 different music generating tools that work in different ways.
Suno.ai
This music creation tool literally blew me away. In under a minute, it generates music and lyrics, and it genuinely sounds good 🤯. You simply add a text prompt, either just the text or with style, instrumentation and choice of voice, and it gives you something that isn’t far off, and definitely good to go.
I just discovered Suno.ai when writing this article and I feel confident that we’ll be using it in the future, unless something even better comes along.
Here’s an example of a happy pop song about Hololink that I generated on Suno:
Suno offers a free plan, allowing you to generate 10 songs, and at $8/month on a yearly plan, you get enough credits to create 500 songs with a length of up to 2 minutes.
Beatoven

Beatoven lets you start with a text prompt to create the first version of your track. You can then edit the instrumentation, tempo and genre to get a different version.
The results are quite nice. Every time you generate a track, you get 4 alternative versions to choose between before applying any edits, and you can select a section of the generated track and let the software recompose based on that snippet.
Pricing starts at $6/month for 15 minutes of downloaded music and goes up, depending on how many minutes you need.
There is a free tier which lets you play around as much as you want, but you can’t download the results.
AIVA

AIVA doesn’t start with a text prompt, but lets you begin by adding a style, a chord progression or an influence to create your track quickly and with very little input from your side. You can also use the step-by-step method, which lets you choose a style and then describe your chord progression with a text prompt. This method requires you to download their desktop app, as they don’t support playing these real-time creations in the browser.
Here's a track created with AIVA:
AIVA is priced at three levels: Free, Standard and Pro. Free lets you create and download 3 tracks per month, with the copyright owned by AIVA. Standard costs $11/month, and AIVA owns the copyright here as well. At the Pro level of $33/month, you get the copyright.
Loudly

Loudly does much the same as AIVA, but with a different and more well-designed interface.
You can either generate music directly by choosing genre, instruments, energy and more, or you can start by adding a text prompt and then refining the genre etc afterwards.
The results are nice, but some of the tracks I have created using Loudly sound very much like a quick combination of loops created in GarageBand. Like this one:
Pricing starts with a free tier that lets you generate one song per month; from there, plans start at $5.99 and go up, depending on the number of songs and the types of licence you need for them. There are a lot of options, so I urge you to take a look yourself if you’re thinking of subscribing.
Speech

As mentioned above, many of the tools have built-in speech synthesis, but when it comes to creating really good AI voices in a variety of dialects and languages, with the ability to control most aspects of the voice, we only use one speech tool at Hololink. That isn’t to say that there aren’t other good ones out there, but this is our favourite.
It’s important to say that we have no affiliation with them; I simply like the quality of their output.
Elevenlabs
Although Elevenlabs has only existed for around 2 years, their speech synthesis has become something of a standard for creating believable voices for use in many different situations. The interface is minimalistic, with white backgrounds and black text, but the functionality can be quite advanced, and playing around with the stability settings can be tricky, but fun.
When you add long texts, the AI can start going off the rails, but Elevenlabs has added a “Projects” feature to let you work on longer texts, such as whole books, with consistent narration. This isn’t what we use it for at Hololink, but it is a nice addition to a good tool.
Pricing starts with a free tier and then goes from $5/month to $330/month, depending on how much text you need. They don’t count the number of minutes you create, but instead let you pay for the number of characters you use.
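If you want to generate speech programmatically rather than through the web interface, Elevenlabs also exposes a REST API. The sketch below is based on their public text-to-speech endpoint as I know it; the field names and model ID may have changed since writing, and the voice ID and API key are placeholders you would replace with your own, so treat this as a starting point and check their API docs.

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"


def build_tts_request(voice_id: str, text: str, api_key: str,
                      stability: float = 0.5, similarity_boost: float = 0.75):
    """Build the URL, headers and JSON body for a text-to-speech call.

    Field names follow Elevenlabs' public REST API at the time of
    writing; check their documentation for the current schema.
    """
    url = f"{API_BASE}/text-to-speech/{voice_id}"
    headers = {
        "xi-api-key": api_key,  # your Elevenlabs API key
        "Content-Type": "application/json",
    }
    body = {
        "text": text,
        "model_id": "eleven_multilingual_v2",
        "voice_settings": {
            "stability": stability,  # lower = more expressive, less stable
            "similarity_boost": similarity_boost,
        },
    }
    return url, headers, body


def synthesize(voice_id: str, text: str, api_key: str, out_path: str) -> None:
    """POST the request and write the returned audio bytes to disk."""
    url, headers, body = build_tts_request(voice_id, text, api_key)
    req = urllib.request.Request(url, data=json.dumps(body).encode("utf-8"),
                                 headers=headers, method="POST")
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
```

Because pricing is per character, the length of the `text` you send is exactly what counts against your quota, which makes it easy to estimate costs for a scripted AR experience up front.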
As you can see, there are a lot of options to choose from when creating assets with generative AI. And the options grow every day, as the race to create the next golden AI software company is far from finished, so we can look forward to even more choices in the not so distant future.
My suggestion is to jump right in and try the tools I’ve mentioned above, plus any others you can find, so that you know which ones you want to use for your work. The right tools will make your workflow faster and easier and help you create immersive experiences that make you and your clients stand out.
Once you have the assets and the design, all you have to do is drop them into the Hololink editor and create an AR experience that will turn heads. We’re looking forward to seeing your creations!
Jens, PM @ Hololink