In today’s episode we chat with Jose Sotelo, Co-Founder of Lyrebird and an expert in speech synthesis technology. Serious authorpreneurs are always on the lookout for new tools that can help them share their books with a broader audience. Jose unveils what possibilities lie ahead in the field of speech synthesis technology, why progress in this field is important for authors, and how they can benefit from software like Descript, a tool that has the potential to give the concept of ‘audiobook’ a whole new meaning.

This site contains affiliate links to products that we have used and love, and that we think may be of help to you on your authorpreneur journey. We may receive a commission on sales of these products, which is how this podcast stays independent and free of advertising. Thanks for your support! Click here for a full list of recommended tools and resources. 

Complete Episode Transcripts

Transcript for Strategic Authorpreneur Episode 033: AI Voice Technology with Jose Sotelo

Jose Sotelo: Hey, I’m Jose Sotelo, and welcome to the Strategic Authorpreneur podcast.

Crystal: Hey there, strategic authorpreneurs, I’m Crystal Hunt.

Michele: And I’m Michele Amitrani. We’re here to help you save time, money and energy as you level up your writing career.

Crystal: Welcome to episode 33 of the Strategic Authorpreneur podcast. On today’s show, we’re talking about AI voice technology for authors with special guest Jose Sotelo.

He’s a developer at Lyrebird, which is the company behind what we lovingly refer to as our robot voice doubles. But first, before we dig into the robots, let’s talk a little bit about what our human selves have been up to this past week.

What has happened since the last episode?

Michele: This is one of my favourite parts of the podcast, Crystal: recommending one resource or one book to our listeners/viewers. I already spoke about this book, The Ultimate Guide to Book Marketing by Nicholas Erik. Subtitle: The 80/20 System for Selling More Books. I actually now have the paperback, which you can see if you are on YouTube. And I really devoured this book, and I underlined it as I have rarely done in the past few months.

I really think it’s one of the best book-marketing reads out there. It has been updated and expanded. There is a section about audio book strategy and how to use audio books if you’re wide, for example, or if you are, let’s say, only on Amazon. But let me just say this book is about a bit of everything. It doesn’t give you too much information about too many things. In some regards, it expects you to already know a bit about the business, so I would say it’s not for people who know nothing at all about self-publishing, but it is for self-publishers who want to step up to the next level.

And there is so much information about productivity, how to organize promos, and what the best strategy is if you want to go wide versus Amazon exclusive. I just want to take one second of your time for the contents. There are several sections, but some of them, just to give you an idea, are ‘cracking Amazon algorithms’, ‘traffic’, ‘market research’, ‘promotional strategies’, ‘branding’, ‘newsletter building’, ‘engagement’, ‘optimization’ and ‘launching’. These are all must-knows for self-publishers who really want to do this for a living. So I just want to recommend this book again. It’s not the first time I’ve talked about it; now I’ve finished it and re-read it, and I definitely recommend Nicholas Erik’s book if you’re serious about self-publishing or if you want to level up your authorpreneur business. And what have I been up to, Crystal? I just mentioned wide and Amazon. I’m actually on the brink of deciding where I want to go in that sense: whether I want to publish my books wide or give Amazon the exclusive on my content. It’s a very delicate choice. We’ve spoken about that many times, Crystal, on this podcast and also between ourselves. I think it’s one of those things that polarizes authors the most, if you are self-published.

It’s huge. I would say it’s something that can change the trajectory of your business in the long term. It’s not a choice that you should take lightly. And in my case, what I’m trying to do here is to do just the right amount of research to make a decision without ‘analysis paralysis’, as Nicholas Erik calls it: getting too much information and not choosing anything. I’ve been there, I’m still there. Crystal is actually about to make a decision in that sense, because it’s important to evaluate your options.

So I am in the process of gathering data, and in the next couple of months I’ll probably have an answer to the question, that $60 million question: Are you going to be a wide author? Or are you going to give the exclusive to Amazon for the foreseeable future?

So these are the fun, and at the same time, mind-wrenching things that I’ve been up to in the past week. What have you been up to in the past seven days, Crystal?

Crystal: Quite similar, actually, although I did make the decision for myself of whether or not I was going to be wide.

I made that decision originally about three years ago and it was a three-year plan. I wanted to be exclusive to Amazon for long enough that I could build up some revenue and a little bit of momentum and test out a bunch of stuff and see what directions I wanted to go in for my various series and how that was going to work.

And see how Kindle Unlimited did for writing short and kind of experiment and just gather data. And because I’ve been really focused on nonfiction over the last two years, as opposed to the fiction, I’ve seen the very predictable drop in sales of fiction that happens when you don’t release anything for an extended period of time.

And you are not immune from that just because you are an established author. And so even if you were earning thousands of dollars a month, if you stop releasing things and/or don’t pour a ton of money into your ads and whatever else, then you do lose that momentum over time. In my case, I transferred it onto my nonfiction, which was great, but I’m ready to switch back to really focusing on fiction. And for me, because that momentum has already been disrupted by my choice to focus on the nonfiction, I’m basically starting over. I have a lot of reviews on the books and they’re good and they’re solid and I have a very large mailing list, so when I say starting over, it is not quite starting from zero, but I am definitely starting in a place where there’s not a lot to disrupt at the moment, to be honest.

It’s been two years almost since I really put out a new book in the series that I’ve been working on. I did a couple of short standalone short stories, but nothing that ever really got a lot of traction. So this is an opportunity for me to basically say, okay, beta testing period over, now I’m ready to do this for real.

And so I am making a really solid business plan, and my daughter is coming on board as an author assistant so that I can focus more fully on the writing and not so much on the publishing side. I will be going wide for sure, so that is happening, and I will be reporting back as we build from the ground up, but with a little bit of a head start, I think, in terms of existing audiences and stuff.

So I am excited about that. I’m excited to align my business models with my values in a different way, in that I really do want everybody to have a choice about where they spend their money and how they get the books, and for the books to be available through libraries. I think we’re going to do a whole episode on going wide in the next few weeks, so stay tuned. We’re gathering lots of expert guests and things that will help us with the process and hopefully, by extension, help you decide if you want to make that decision and, if you do, how you want to go about trying to be successful. In the meantime, we are just configuring the plans for 2021, because goodbye 2020 is, I think, an excellent sentiment for this year.

We are done with you and ready to move on to our bright and shiny future. So talking about bright and shiny futures, the AI voice stuff is super interesting. So we are going to dive right into the interview with Jose and see what he has to say. And then we are going to come back and talk a little bit about what we heard and what we think the implications are for each of us and also for authors in general. First, just to clarify, we use a couple of different company names in the interview and to keep you from getting confused, Lyrebird is the name of the software that runs the AI stuff in the background. So those voice doubles it’s in charge of that. Descript is the company and the software that we actually license it through. So Descript has a whole suite of tools that you can use to do podcast transcriptions, and you can edit your podcasts, audio and video, you can do all kinds of cool stuff with Descript that makes producing audio content a lot easier. You can even record right into it actually.

And so there are lots of neat functions in Descript, and the AI voice doubles are one of those functions. So just to clarify that a little bit for you, hopefully that is helpful. Let’s go enjoy the interview with Jose and we’ll meet back here shortly.

About Jose Sotelo

Crystal: I’m curious though, if we can get started with just a bit of an introduction to you: what your role is and how you came to be involved in AI voice technology?

Jose Sotelo: For sure. My name is Jose. I am a researcher at Descript. I’m an AI researcher. The way that I started working on this is that I came to Montreal to study for my master’s. And during my master’s, I was working on something called a generative model, which tries to answer the question: Can a computer create new things? Can a computer make a new poem? Can a computer create a new song? Can a computer make a new movie? Things like that. And so in particular I was working on: can we make the computer speak like me, and copy the way that I speak?

So it turns out that we can do that, and we can do that pretty well using just a few minutes of data. So I started working on this during my master’s, and at the end of my master’s, I started my PhD. But then at the same time, with a few colleagues from university, we launched a demo of these technologies online, using the voices of some politicians.

And actually people were really interested in this. We received calls from investors and calls from companies wanting to use this technology, and we decided to start a company based on that. The company was called Lyrebird. And so we worked on that, and we had some demos online that people could use to copy their own voice and give it a try.

There are many funny videos online on YouTube of people saying, I’m going to prank my friends by calling them using my Lyrebird voice, and I will see if they can figure out that it’s not me. So we were working on that. We really enjoyed it, but we were maybe struggling a bit with finding a way to make these technologies really useful for people. And it was at that point that we got in touch with Andrew, who is the CEO of Descript. Descript is a company that makes the best podcast editing tool, basically. And so in this context, it turns out that the technology that we’d developed was really useful, because it allows people who record their voice to add things to their recordings and to correct things in their recordings in a really easy way. And so that’s how I ended up working with Descript. We really liked working with Andrew and we really liked the idea of the company.

So we decided to join forces and to work with them on this.

Crystal: Nice. We had heard about, I don’t know if you know a lot of the beta testers, but Joanna Penn is an author and indie publisher and she has done a ton of podcasting and stuff. And so I was listening to her podcast probably a year ago and I heard her make an offhand comment about how there was this company that was making this technology that could effectively get you your own AI voice double. And I was like, Ooh, I want that immediately. I want that. That’s so cool. So I emailed someone on your end and said, Hey, can I get in on this action? And then Michele sent the same email, and so we got to play around with the beta, and it was really fun sending out the demo reel. We had a few friends and family and I sent one, I think it had about 23 different samples on it. And the assignment for me, as part of the testing, was: listen to this and say which ones are closest to you, or which one sounds the best or most clear or whatever. And so I was getting input to see what sounds more like me from the outside, because of course we don’t hear our voice from the inside the same way that other people do. So I sent it to a bunch of folks and the response was quite fascinating. It was: “Oh, okay, which one was the robot?” And I said: no, no. They were all the robot voice. None of those were me.

So even in that initial beta testing it was really effective and also really interesting. And they were short samples. So like you said, it worked more or less, better for long samples, but Michele took one of his short stories and he actually had the original beta voice read the short story; he just pasted it in chunks. And it actually did a really solid job. I know that’s not necessarily what it was designed to do, but it definitely functioned, and we were comparing it to some of the voices that are an option on medium.com. A lot of writers use Medium and they have, I think it’s played on AC or something like that, where you can pick one of the voices that’s in there to read your articles, but it’s so much more fun to have it in your own voice.

And so we were really excited just to see that this was an option and, of course, to be able to use Descript to do our podcast transcriptions. That’s the starting place for all of our podcast stuff, and we are just getting into using more and more of the actual editing tools inside Descript to do the cleaning up of things as well. We will do this episode fully in Descript editing mode so that you can see what it can and can’t do. Hopefully you won’t be able to tell, but we will do a bit of a breakdown afterwards of what we did so that you can see how it all works. I have a question about when you’re doing the testing or when you’re training the voices, because now you’re out of beta and anyone who wants one can get their own voice by signing up for Descript Pro.

So how does that work? How do you train your voice? We’re all familiar with training Dragon, which is a dictation software that a lot of writers use, but how does this happen now?

How does Descript work?

Jose Sotelo: Yeah. So the way it works is really simple. What we ask people is to go into the Descript and to record a sample of their voice.

We ask them to record at least 10 minutes of something, and we recommend that they read more, because it works better when you record more of yourself. Up to 90 minutes, that’s when it really works the best. We ask them to read something specific, and the reason we do this is really to prevent people from misusing this technology and from using Descript to copy the voice of someone else without their consent. So that’s the main reason why we ask people to go and repeat something specific.

Crystal: Yeah, that’s great. That’s one of the questions that came up multiple times in our writers’ group as we were talking about this. People were asking, how do you protect your own voice out there if you are someone who does podcasts or does a lot of audio? I teach online a ton, so there are video and audio recordings of me all over the place. That’s definitely something that came up. So it’s great to hear that there is some built-in sort of protection, or efforts to make sure that whoever it is who’s talking into the microphone is the person who owns that voice.

I’m curious, so there’s software that lets you check for plagiarism, like in school, when kids are uploading their essays to hand them in to their teachers, there’s a program you can run them through that looks for duplication. Is there anything like that in the audio world, where you can do like a copy check to see if somebody has duplicated your voice somewhere, or if there’s anything like that in the system?

Jose Sotelo: So there are some solutions that do things like that. For instance, what they’re called in audio, I think, is watermarks. It’s a way to put some information in the audio that you cannot hear, but that a computer would be able to identify.

So you would know that a recording is coming from somewhere and that it’s not, let’s say, an original recording of someone. They have some problems. Personally I don’t like this solution much, even though it’s a solution that we have for sure considered internally. We have found that the best way to deal with this problem of people trying to copy the voice of someone else is really to limit ourselves, to limit our technology, and to ask people to record something.

Because if you can already, let’s say, get me an audio of your friend saying, Oh, I know that this voice is going to be used to create my artificial voice using Descript, then basically you can make them say anything. And so we are not putting someone, let’s say, in danger of losing their identity or having their privacy compromised.

This is something that we really care a lot about. We have cared about this since the first time that we launched the first name of the company. So yeah, it’s one of the values that we guide ourselves by.
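As a rough illustration of the watermark idea Jose mentions above (information placed in the audio that a listener cannot hear but a computer can read back), here is a minimal sketch. It uses least-significant-bit embedding, a textbook technique chosen purely for simplicity; it is not a method described in the interview, it is easily destroyed by re-encoding, and the function names and sample values are made up for this example.

```python
# Toy least-significant-bit (LSB) watermark on 16-bit PCM samples.
# Assumption: samples are plain Python ints in the range -32768..32767.
# This is an illustration only, not how any real product marks its audio.

def embed_watermark(samples, bits):
    """Hide one bit in the lowest bit of each of the first len(bits) samples."""
    if len(bits) > len(samples):
        raise ValueError("not enough samples to hold the watermark")
    marked = list(samples)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the watermark bit.
        # Each sample changes by at most 1 out of 65536 levels.
        marked[i] = (marked[i] & ~1) | bit
    return marked


def extract_watermark(samples, n_bits):
    """Read the hidden bits back from the lowest bit of each sample."""
    return [s & 1 for s in samples[:n_bits]]


audio = [1000, -2001, 305, 4, -17, 90, 12, -3]   # made-up sample values
mark = [1, 0, 1, 1, 0, 0, 1, 0]                  # made-up watermark bits

marked_audio = embed_watermark(audio, mark)
recovered = extract_watermark(marked_audio, len(mark))
```

Here `recovered` equals `mark`, while every sample in `marked_audio` differs from the original by at most one step, which is far too small to hear.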

Crystal: And so I’m curious as well. So there’s lots of words, read or read that look the same on the page and so one of the writerly questions that comes up is always how does the AI know which way to say it? And does it? Does it know which way to say it? Or is that one of the places where you have to watch for things not working quite the way you expect?

Jose Sotelo: So this is a tricky problem. It’s not impossible to solve this problem using machine learning.

And actually we have something in the backend of our program that uses machine learning to try to understand, when you write read, whether you want to say read or read. And it works well, but it’s not perfect. What we are working on right now is something that will allow you to override the initial proposal of our algorithm and put in whatever pronunciation you want to use.

This will also be useful in cases, for instance, that you have some foreign words that may be like, we cannot know the pronunciation or like maybe you have a pet name to your friends that it’s not part of the normal English language and so we cannot really know what’s the right pronunciation, but you would be able to fix it to whatever you want.

What we do right now is that, as I said, we have a machine learning system on the back end that tries to predict the right pronunciation of the word for things like read and read. What it’s doing specifically is something called a part-of-speech tagger. It’s trying to detect what word you want to say: not really just the letters of the word, but the meaning of the word that you are trying to say. And so in that way it’s going to know, for instance, in the read versus read case, if you’re speaking about the present or the past.

That’s how it works. It’s machine learning in the backend.
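To make the read-versus-read idea concrete, here is a deliberately tiny sketch. It is not Descript’s system, which Jose describes as a trained machine learning model; this is a hand-written rule list standing in for a part-of-speech tagger, and the cue words and the function name are assumptions made for this example only.

```python
# Toy stand-in for a part-of-speech tagger: decide whether "read" is
# present tense (sounds like "reed") or past tense (sounds like "red")
# by looking at nearby cue words. A real system learns this from data.

PAST_CUES = {"yesterday", "have", "has", "had", "already", "was", "were", "ago"}
PRESENT_CUES = {"to", "will", "can", "should", "must", "please", "always", "tomorrow"}


def pronounce_read(sentence):
    """Return 'reed' or 'red' for the first 'read' in the sentence."""
    words = [w.strip(".,!?;:").lower() for w in sentence.split()]
    if "read" not in words:
        raise ValueError("sentence does not contain the word 'read'")
    idx = words.index("read")
    # Look backwards first so the cue closest before 'read' wins
    # ("had to read" -> 'to' is found before 'had').
    for w in reversed(words[:idx]):
        if w in PRESENT_CUES:
            return "reed"
        if w in PAST_CUES:
            return "red"
    # Otherwise look at what follows, e.g. "read it yesterday".
    for w in words[idx + 1:]:
        if w in PAST_CUES:
            return "red"
        if w in PRESENT_CUES:
            return "reed"
    return "reed"  # no cue found: fall back to the present tense


examples = [
    "I have read that book.",
    "I want to read that book.",
    "I read that book yesterday.",
    "I had to read it twice.",
]
guesses = [pronounce_read(s) for s in examples]
```

For these four sentences `guesses` comes out as "red", "reed", "red", "reed". Sentences with no cue words show why a hand-written list is not enough and why a user-supplied pronunciation override, like the one Jose says is in development, is useful.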

Crystal: So it can analyze contextual clues then and figure out from what else is around it, which way it’s supposed to be. So how does it work for intonation? If something is a question, does it do the same thing? It knows that there’s a question mark at the end?

So it adjusts, or how does that work?

Jose Sotelo: So that’s really just the magic of machine learning and the way it works is that we just ask you to read that 10 minutes of data, but actually before we have collected thousands and thousands of hours of recordings of other people, and we have trained our model to be able to predict that and also like to predict what is it that makes your voice unique.

And so that’s how we solve those two problems of just how to speak and also how to speak like you. And for instance, for the question marks, you can see that whenever someone is asking a question, there’s an intonation that is specific to that, and it’s a pattern that can be recognized in the speech of everyone.

And so that’s what our algorithm is trying to look for when it’s generating something.

Crystal: Okay. So it gets to use the collaborative amount of wisdom that it already has in the system and then just overlay your own voice and specifics over top of that.

Jose Sotelo: Exactly.

Crystal: One of the things that people had also asked about was whether or not you could record two different voices that were both you, but let’s say you’re doing characters and you want to have, you want to voice all of them, but maybe you want to record one that’s more for your male character and one that’s more for your female character, or maybe you have a kid or a dog or a, some cartoon, something. Is that possible? Can you train multiple voices with the same person or does it need to be separate people?

Jose Sotelo: So it depends what you want to do.

So if what you want to do is to have different artificial voices that sound really different, and that you are going to use, for instance, let’s say that you are an audio engineer working on some cartoons and you have two cartoon characters. Let’s say Rick and Morty, for instance. Rick and Morty are both voiced by the same guy, but they have very different speech patterns. And so in that case, what I would recommend is that you create two different artificial voices, put the data for each in different compositions in Descript, and just go through the process two times. That’s what’s going to give you the best two distinct artificial voices at the end.

That being said, there are actually some people, really creative people, that have gone into Descript and not followed our instructions. They have wondered things like, Oh, but how would the voice of my wife and the voice of myself sound combined? And so they went and each recorded half of the script: the wife recorded half and the husband recorded the other half. At the end of that, they generated something. And the funny thing is that the algorithm was able to merge their voices into a combined voice. So you can hear some patterns that come from the two different people, but you cannot really say exactly what comes from whom. So it’s a bit of a mix between the two people. It’s really funny and it’s really interesting.

Crystal: That’s very cool. What are some of the other ways that are unusual that people have experimented that you noticed, or that could be useful in a creative kind of context?

Jose Sotelo: I’m not sure, but I can tell you some things that people have done with this technology that have surprised me in the past. For instance, there was one guy that recorded himself singing.

All the text that we asked him to read, instead of reading it, he was just singing it. And then the artificial voice for him sounded the way he had recorded it. So it was singing the new text that you throw into the app. And the funny thing is that we never designed the system to work for this, and we never designed this system to replicate singing, but it still worked okay. It wasn’t perfect, but it was okay. People have also used this to, for instance, copy their emotion. If you are going to, let’s say, try to create a voice of yourself but speaking angry all the time, the artificial voice is going to sound angry and it’s going to be really consistent in sounding angry.

So yeah, those kinds of things, people have experimented with as well.

Crystal: Interesting. I can see the potential for doing little cartoon snippets and doing a kind of a sketch, because you can use some of the programs out there. Doodly is one of them, and Animaker. You can do short little animated things to teach people about stuff, but I can definitely see the fun in having some voices that you can pull in that are entertaining, or an instant little musical. A little bit of your very own Glee cartoon or something like that would be highly entertaining.

Jose Sotelo: Yeah. Actually, one of the, let’s say, business lines that we have explored and that we are thinking of doing is really launching a pack of voices with very different cartoon characters so that people could animate stories like that, or maybe even one day video games. So if someone is listening to this and is interested in testing this kind of thing, write an email to us and let us know. We would be really interested in hearing from you, because it’s one of the things that we are considering at the moment.

Crystal: Cool. One of the questions that immediately sprang to mind was the eventual potential for voice actors to license their voice without having to be in the studio live. The option would then be that you could license the use of their voice for your audio book and have their robot voice, their robot double, whatever, read the book for you. I think that’s an interesting idea and definitely something that a lot of authors would be interested in, because right now it costs thousands of dollars to have somebody voice your audio book, but audio books are a really great tool for accessibility.

So if that’s something that you have in the mix of your development options, there are definitely a lot of people who would be really interested. And I think the opportunity to embrace the technology and find ways to make sure that the creators get paid properly is nice. If there’s a system that’s set up, then I think it’s less likely that people do their own thing. That’s definitely something that has come up in conversation every time I’ve talked to anyone in the creative sphere about the idea of that AI voice: the ability to have narrations done like that.

Accessibility

Jose Sotelo: Yeah, exactly. I think you’re completely right on this. And it’s really like one of the important aspects about this is accessibility.

Another aspect that I would say is that this really even opens opportunities to do things that were never done before, because I think a lot of people, especially voice actors, worry that these things will create less work for them or things like that. I think that there is some chance that the work will change. I’m not sure about whether it will increase. Let me give you an example. We have had conversations with some video game makers that have created, like, an unlimited, endless world to explore. But in this world, obviously, you cannot have audio-based characters.

Like you cannot have the characters speak to you, because the recordings that go into the video game cannot be unlimited. They have to be limited, just because the voice actor cannot be there and record himself for 100 hours, or even if it was 100 hours, like that’s not only.

And so using this kind of technology, you could open the world to these kind of applications. And, yeah, we are really looking into this. One of the other ideas in terms of like business development that we have had is something like creating a marketplace for artificial voices, where you, as a voice actor could go and create your artificial voice and then license it to people interested in using it.

So that way, like you would be paid whenever someone would be interested in using your artificial voice, in their content. That’s something that we’re also looking into.

Crystal: I think that has some really cool applications for audio books, because right now, in order to have each character in a book with their own voice, it’s next to impossible in terms of coordination and costs and everything else, it’s really prohibitive.

So there are very small numbers of audio books that have a full cast, and are done effectively. Like a radio play would have been in the old days with actors who play each part. But if you could license voices, for X number of minutes or X number of words, or however it was going to work, you could cast your characters and another problem that we’ve run into with audio books is availability of our voice actors. Because if you’ve got a long series and you have characters that continue from book to book, somebody might take some time off to have a kid, or they might get sick and they can’t record on schedule or something can happen.

And so if there was, like, the backup option of being able to combine those AI voices and license them, then the actors would still get paid, but they wouldn’t be under these crazy deadlines in the same way. Or you’d be able to have consistency: even if there are three years between when you did book one and book six, that would still be a consistent voice over time.

And so that, I think there’s some real potential in terms of applications inside audio books in that way.

Jose Sotelo: Yeah, I think for that, another way in which this technology could be applied to audio books would be that, let’s say that I have a favorite narrator that I prefer my books to be narrated by, but you have a different one.

And so right now it’s very difficult to have, let’s say, several professional narrators read the same book, just because that’s not how audio book companies work today. At most there’s one audio book version for each book. And so in that case, that would open the possibility for me to create my own audiobook in the voice of my preferred narrator. Because it would be so easy and so inexpensive to do that, we would open the door for something that is really not possible today. That’s another way in which we are looking at this technology.

Crystal: Let’s switch gears for a second and talk about how the voice interacts with Descript and how you can combine the text part and the audio part and work with it that way. I think one of the cool features is that you can edit text in a transcript in Descript, and then it will edit the audio and video files to match what you have edited in your text files. So can you tell us a little bit more about how that works and how people could make use of it?

How the voice interacts with Descript

Jose Sotelo: Yeah. So to speak about that, give me just one second to speak about the basic idea of Descript. The basic idea of Descript is that we want people to be able to edit their audio files and their video files in a much easier way compared to what they were doing before. The way that we enable that is by having them look at the transcript. We noticed that a lot of the audio or video that people make is just people speaking, and before Descript it was very painful to edit these recordings, because you would need to go into, let’s say, a WAV file and try to edit it from there. It’s really time consuming and really annoying to do. But instead, if you look at the transcript and you want to remove or edit down something that you recorded, you can just look at the transcript, see the parts that you don’t like, and remove the transcript for those. With Descript, that means the changes are automatically made on your audio or your video, and so these parts are removed. Now the cool thing about Overdub, which is the feature that we have been speaking about, is that not only can you remove parts of your transcript to remove the corresponding part of the audio, but now you can type something new and the software will create the corresponding audio for that specific part that you just typed. Not only that, but it’s going to match it to the things that go before and after so that the transition is natural. And so what that means, basically, is that when you make mistakes during a recording, you don’t need to go back to the studio anymore to record new audio.

You can just stay at your computer and type the changes, and Descript will make those changes for you.
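
The idea Jose describes, deleting words in the transcript to cut the matching audio, can be sketched in a few lines of Python. This is only an illustration, not Descript’s actual implementation: it assumes each transcribed word carries start and end timestamps, and the names `Word` and `kept_ranges` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One transcribed word with its position in the audio (assumed format)."""
    text: str
    start: float  # seconds from the beginning of the recording
    end: float


def kept_ranges(words, deleted):
    """Return the (start, end) audio ranges to keep after deleting words.

    `deleted` is a set of indexes into `words` that the editor removed
    from the transcript. Consecutive kept words are merged into one range.
    """
    ranges = []
    previous_kept = False
    for index, word in enumerate(words):
        if index in deleted:
            previous_kept = False
            continue
        if previous_kept:
            # Extend the current range to the end of this word.
            ranges[-1] = (ranges[-1][0], word.end)
        else:
            # A deleted word (or the start of the file) came before: new range.
            ranges.append((word.start, word.end))
        previous_kept = True
    return ranges


transcript = [
    Word("So", 0.0, 0.3),
    Word("um", 0.3, 0.8),
    Word("welcome", 0.8, 1.4),
    Word("back", 1.4, 1.8),
]

# Deleting "um" from the text leaves two stretches of audio to splice together.
print(kept_ranges(transcript, deleted={1}))
```

An audio editor would then splice the kept ranges together. Generating new audio for typed words, as Overdub does, is a separate speech synthesis step that this sketch does not attempt.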

Crystal: Very handy, as someone who has occasionally made an error on a podcast and would like, in that moment of final editing and production, to be able to just make that go away. It’s definitely handy to have that option.

Jose Sotelo: Exactly. Our goal is to make this thing easier for content producers.

So that they don’t need to go back and do this annoying work and they can just do it while they’re editing their audio in their computer. And it’s really easy, really fast for them.

Crystal: So one of the challenges we’ve come up against is that my cohost for the podcast is Michele Amitrani, and he is Italian.

And so he has somewhat of an accent, and the transcription does reasonably well, but it’s definitely not as good for him as it is for me. And so we do find there’s a certain amount of cleanup to be done, but I’m wondering, does it learn? If I’m editing the transcripts to be accurate to the audio, does it learn from what I’m changing, and will it get smarter as it goes on?

Jose Sotelo: Yeah. So on the backend we use different speech recognition engines and different speech recognition technologies, and these technologies will continue to improve for different accents. As you said, today that’s one of the places where the technology can definitely improve. I would say that it’s going to take some time.

It’s a bit complicated and we have a few things that we are looking at all the time. And I don’t, I cannot promise like when exactly this is going to improve for people with accents, but for sure with time, this is going to improve because at the end, every time we are collecting more data and the systems on the backend will continue to improve.

Crystal: And it is an interesting feedback loop, because we found when he goes in to edit some of the transcripts (I used to do them all, and my daughter did them for a while, and then he’s been doing them), you realize, okay, when I talk too fast the technology can’t quite keep up, so it’s better to slow down a little bit. And once you’ve had to correct things multiple times, you learn as well. So there’s a certain amount, I think, of both directions, where you start to adapt when you realize which things make it difficult. So if somebody was looking for just a handful of tips on what they could do to make their transcriptions from audio recordings better, or, the other way, if they are reading a sample to train their voice, what could you recommend that people keep in mind?

Jose Sotelo: As you said, if you use this and you observe the results, you will notice some things that you can do that will improve the results of the transcription.

I would say things like speaking slowly and trying to say the words as clearly as you can. Also, something really important, specifically for Overdub, is trying to record yourself in a quiet room. Use a good quality microphone. Just the things that you would normally find when you’re looking at how to produce a good quality recording for a podcast; those same things work in the context of transcription and also in the context of Overdub creation.

Crystal: I tend to talk with a lot of emphasis and waving of hands and a fair amount of vocal expression, and if I’m reading something or it’s a story, then I tend to use a lot of expression in my voice. Is that something that makes it easier or harder? If you were recording your sample or training a new voice, should you try and be even about things, or is it better just to be totally expressive and let the computer figure it out?

Jose Sotelo: So I would say in general, it’s better to be as expressive as possible. This is a bit difficult because it’s good to keep a consistent emotion in the recording. That’s going to create the best quality results for a specific emotion, but between the two options, it’s for sure better to be expressive, even if the emotion changes, than to be flat, because the problem is that if you’re flat, then the artificial voice is going to learn this pattern and then it’s going to sound flat.

People usually don’t like that as much. So I would say that it’s better to speak with an expressive intonation.

Crystal: Excellent. And can you add in certain words? For example, our names: my maiden name was Stranaghan, and it’s definitely not spelled the way that it sounds. It’s one of those ones. So can you go in and train your voice to be able to say those things?

Jose Sotelo: So the way it’s going to work is that we are going to have a pronunciation editor, basically.

It’s not there yet, but this is something that we have in mind and hope to release soon, because indeed this is a problem that people come up with all the time. So yeah, this is something that we are going to fix.

Crystal: Very cool. So the same way you would train your spellcheck to recognize certain things, you could basically record a replacement for a single word, and it would learn that or use that, depending.

Jose Sotelo: Yeah. You don’t even need to record it. I think the only thing that you would need is to type the pronunciation, and we are working on the interface. This interface is a bit tricky and you need to do it carefully for it to work well. But yeah, that’s the plan: that it’s easy to use, that it gives you an idea of how your artificial voice is going to pronounce something, and that it makes it easy to change the things that you don’t like.

This is not only important for names and last names; it’s also important for company names and for unusual words that are not part of a normal dictionary.

Crystal: Well, as writers, anyone who writes fantasy can tell you for sure that there isn’t going to be a standardized way of saying most of their places and character names, and all of those things are going to be nonstandard, so that’s definitely a key thing that people would be asking about. And I would think too, with a lot of technical terminology, with a psychology background, there are plenty of words we use in psychology that are very weird to pronounce and not necessarily going to make sense to the average person, never mind the computer that may or may not have the same training.

Jose Sotelo: No, you’re completely right. And, as I said, this is something that even people in the beta requested already. We have received many requests for this specific feature. So this is something that we are looking into right now.

Crystal: So I imagine that there are many people out there who are now like, “Ooh, I could have my own voice double.”

Is that going to cost me an absolute fortune?

How much does it cost?

Jose Sotelo: So in terms of pricing, Overdub costs $30, or around $30, per month. Right now, this is part of the Descript Pro plan. Not only does it include Overdub, but it also includes all the features of the normal Descript plan, which costs $10 and which includes many hours of transcription and advanced features like speaker detection. And if you record a lot of audio, we even have something that will allow you to remove the disfluencies in your speech automatically. So yeah, we are working on all these kinds of features.

Crystal: And for the people who are not necessarily audio fluent out there, disfluencies would be things like: uhm, so, like, mhm, mmm, the sort of spinning circle that your mouth does when your brain is thinking about things.

Jose Sotelo: I am not a native speaker, Crystal, so I hope that you use Descript to help me sound more professional in this recording.

Crystal: That will be good homework for me. I’m going to get in there and I’m going to figure it all out. I’ve been really looking forward to giving it a run for its money, and so I upgraded to the Pro plan last week and I will be playing around in there quite substantially. And as this section is all audio only, not video, I think it’s probably a good place to start, since I do think there are probably some potential issues with video: all of the hand talking that Michele and I do might get us into trouble with the editing feature. As we wave our hands around all the time, it would be a little bit trickier for the computer to splice together those pieces and make it look like it hadn’t been edited. But definitely on our audio, we can have some fun with that and see what we can do. So if people want to go get their own voice double, where should we send them?

Jose Sotelo: So go to Descript.com and download our app. It’s really easy to get started.

You get a seven-day trial for testing your voice and testing all the other features of Descript. Yeah, get started as soon as possible, and if you guys have any questions, please send us an email and we’ll be happy to answer as best as we can.

Crystal: I can speak from experience that the tech support folks, and also just anyone at the company any time we’ve had questions, have been so great about giving us really good answers and getting back to us in a very timely fashion, given how much is going on out there in the world. And so we have really enjoyed working with you guys and I just want to say thank you so much for taking the time to talk to us today.

I know that people are very interested in this and we have a lot of writers in our writer communities who are pretty excited about some of the possibilities this might offer. So we will definitely keep everybody posted about how our editing experience goes with this episode. And we’ll do a debrief section, as always, right following this interview, with Michele and I, so we’ll talk about the actual editing process of using the tool to edit this audio section of the interview for you, so you get the behind-the-scenes look of the behind the scenes. So thank you so much, Jose, for joining us, and we look forward to seeing all of the exciting new developments as they roll out over the next few months.

Jose Sotelo: Yeah. Thank you so much for giving us the opportunity to talk about Descript on your show, Crystal. It was a pleasure to speak with you, and hopefully I was able to answer some of the questions that you had about the internal magic inside Descript.

Post interview discussion

Crystal: It still feels like magic, but in a slightly less cloaked in mystery sort of way, I feel like I can see the edges of what might be happening in the back end, but it definitely still does have that magic feel, which is very cool.

Michele: Yeah, that was amazing. I think the conversation that Crystal had with Jose was interesting on different sides. I’m just going to talk about a couple of things that I picked up and that I want to share with you. One of the things that I believe to be mind-blowing, no less than that, is that when you are an author and you start seeing that you can basically harvest from every single kind of thing around you, the possibilities are basically limitless. And what Crystal and Jose have been talking about has been mind-blowing for me because it’s basically a way for us to go and evolve from the world of words to the world of audio or video in a frankly science-fictiony kind of way. So it’s almost scary to think about it in that way and in that regard, but it’s also very exciting, because between Crystal and Jose, there have been a couple of things that they shared and that I want to underline, that at least I found very meaningful and very important.

One of the things is that this software can potentially do a couple of things for you as an author. One of them is that when, and if, this software reaches a point where it can actually be used as a way for your story to be transformed into an audiobook, for example, it can be done by creating a marketplace of artificial voices.

And again, we’re going to use the word artificial a lot because it seems like a science fiction kind of discussion, but actually it’s something that we tasted and we tried. Crystal mentioned in the interview, if I’m not mistaken, that I tried the software and actually created the audiobook version of a short story, Glass Into Steel, with my robot double.

And although it wasn’t perfect, and of course there is also my accent to account for, it was something, and it was built in just a few minutes and completely for free. That was a beta version, and there were things that Jose said have been fixed. So probably the new version is even better than that.

But if you are an actor right now and you’re looking at us, you could basically go and record your voice, and then potentially, if everything goes well, license it to people interested in using it. And those people would be us: Crystal, me, all the people who are doing beautiful things with words and creating worlds.

I was just having fun with Crystal, saying, imagine if I could get the voice of Patrick Stewart to actually tell the tale of one of my mythological fantasies. It’s far-fetched now, but it’s fun to think of the potential to have something like that.

Because if something like this happens, there are also some legal things that I think need to be addressed before something like this becomes possible. But I think one of the things that comes as a consequence is that you can scale this very quickly. I’m saying the voice of Patrick Stewart just because I like the person. Hello, Patrick Stewart. If the software can do that for one book, maybe in a few minutes, because it’s like a bot and you don’t need months to create your book, then if you have, I don’t know, several dozen books, you can potentially do that in a few hours. And if you have an update, you can just punch a button and everything is updated for you.

And if there is just one fixed cost you have to pay, for example, to license that voice, or if you have to pay that on a monthly basis or whatever it is, it’s still going to be, for me, way more affordable than recording a book with an actor nowadays, which can cost you several hundreds, if not thousands, of dollars. And Crystal has way more experience than me in that regard. She’s been there, she’s done that. So I think she’s going to be super excited to talk about this possibility of creating a marketplace of actors that actually enables us to share our work with a bigger audience. So I would like to know your opinion on that, Crystal, if you will.

Crystal: I will indeed. Interestingly, just after the interview with Jose wrapped up, I got an email in my inbox saying that in my Descript Pro account, because I pay for the Pro level in order to get the updated AI voice, there’s actually a library of voices. So I can now pick from a dozen voices that are in there and use them however I want.

There is a little library. So for me, immediately, my brain is, “Ooh, I could cast different voices as different characters in some of my books, or I could pick a male and a female narrator voice to use for point of view switches.” If you have a book that’s in first person and you want to switch back and forth between points of view, amazing.

I could pick voices just for the dialogue parts and have that worked out, right? It is a little bit cost prohibitive to do that in a traditional audiobook narration setting, because you’ve got to bring everybody into the studio or get them all to record separate pieces and splice them all together.

But if you are able to produce those pieces by yourself, there are all kinds of possibilities. I’m already brainstorming. I have a Rivers End radio podcast show that I am working on, which will go live hopefully in the new year. And so I want to have a radio announcer on that radio show, somebody who’s introducing the books and visitors to the Rivers End library and things like that.

So I am just exploring possibilities of how to use that library of voices in that way. And, I also am really big on accessibility. So I really want all of my stuff to be able to be listened to by folks who have maybe vision challenges or who need to stay mobile and can’t just be sitting all the time, or for whom reading is hard.

So I think having things in audio is really important, both in fiction and nonfiction. So I’m really excited to see if I can use my voice in some of that as well, to connect with people and also just to have a bit of fun, because I think playing around in a creative space and understanding what the possibilities are is really important to explore from the inside out. Like you mentioned, there are some definite concerns around security and safety and identity theft and all kinds of things that are inherent in this technology existing. But it’s going to exist, and so we can educate ourselves about how it works and understand where the risks might be and protect ourselves and our intellectual property, or we can pretend like it’s not happening, in which case we’re not going to have a very good chance of being proactive in protecting ourselves or benefiting from some of the really cool upsides. So I think that will be a fun and interesting thing to be part of the development of as well, and to be in on the ground floor when we are testing it and seeing how it works and really wrapping our minds around the opportunities.

If you want to get your own robot voice, you can. You can go to Descript.com. We’ll put it in the show notes for you so you can link through and go check out the options there. And if you are producing a podcast or you want audiobook versions of your articles, then this might be a good solution for you, and it is actually very affordable, so that was pretty exciting. You would expect it to cost thousands of dollars, but it actually doesn’t. If this is a fun and shiny thing for you, then go on and check it out and broaden your horizons.

We’re going to broaden our own horizons with Michele’s favorite part of the show, the curious jar, which is filled with random questions submitted by listeners.

Have you ever been haunted by one of your characters?

If you have a question for the curious jar, we need you to email it to us at ideas@strategicauthorpreneur.com. Spelling it will be the hardest part. Don’t worry, it’s in the show notes. And now I’m going to pick a question. Tell me when to stop.

Michele: Now.

Crystal: Ooh, orange one. Okay. Let’s see the question. Drum roll please. Oh, this is a good one. Okay. Have you ever been haunted by one of your characters?

Michele: What do you think they mean by haunted?

Crystal: I, yeah, so you want me to go first? I know how I would interpret this one for sure, because I am haunted by one of my characters. When I refer to it as haunted, I don’t mean they walk around my house in ghostly form going, “Ooh.” I mean haunted in the sense that this character first showed up when I was walking through the forest, actually, not too far from my house. He’s a firefighter named Alex Martinez, and he’s one of the Martinez family brothers.

So he first gets introduced in Maybe, and he’s just come off shift at the firehouse and he meets his brother at the bar for the brother’s birthday, and that’s our first exposure to him. But the first time I ever encountered him as a character was this super vivid flash of him on his knees in full turnout gear, holding a baby at the site of this car crash and just breaking down. That has just stuck with me, and it’s been maybe four years, roughly, that I’ve been trying to work up the nerve to write his story, because I know what happens, I know who that baby belongs to, I know all the pieces of it. It is emotionally going to be a bit of a tough one, but still with a really good happy ever after at the end.

But anytime I dig back into the writing, anytime I finish a book and I’m ready to start the next one, I check in with myself: are we ready for this one yet? And it just sits there in the back of my mind. I finally found the image of what I want for the cover, and that took a long time of searching different stock sites and trying to find what fit the vision that I had. So I’m super excited about that. And I think I’m just about ready, but he has been haunting me for quite some time, as has that little girl. So anyway, we will see how all that shakes out, hopefully in the next few months here.

How about you?

Michele: I don’t think that has happened to me yet, to have a vivid vision of somebody fictional that my mind created. I guess I can’t speak for the future. I probably did dream of some of the characters I created, but the problem with me is that I don’t usually remember my dreams, which is something that not all people experience.

Usually, or sometimes, they remember their dreams. I don’t, and I can’t remember any instance in which one of my characters just haunted me, or that I needed to write it down. I’m usually the one haunting the characters that I want to know a bit more in depth. I guess I’m going to be very boring this time and just say no, it never happened to me. But it might.

Crystal: Excellent. There’s always time. You will be haunted one day. It’s in your future. The ghosts of Michele’s past, present, and future. There you go. We could do a little Christmas Carol retelling of literary ghosts. That would be a fun holiday episode for us. I’ll have to work on that. See if we can throw in a few extra robot voices for the different ghosties and see what we can do for you guys.

We would love to hear your answer to the curious jar question. Have you ever been haunted by one of your characters? And if so, tell us about that. What’s that like? You can drop the answer in the comments below or email us at ideas@strategicauthorpreneur.com, and we would love to hear what you have to say.

Michele: And for show notes, links to resources that we mentioned, and for coupons and discounts on tools we love, please visit us at strategicauthorpreneur.com.

Crystal: Be sure to subscribe so you don’t miss out on our next episode. We are going to be talking to marketing expert Fauzia Burke about how you can connect better with your readers and some new strategies and tools for promoting your books online. So don’t miss out on that. And if you’re finding the podcast helpful, we would love it if you dropped by our website and clicked the buy us a coffee button, or left a review anywhere that you are listening to this podcast.

Thank you so much for all of your support and we will look forward to seeing you next week.

Michele: Thank you and see you next week. Bye bye.