Asynchronous Audio Technology w/ Carl Robinson

Tim Marting
Jan 5, 2023
27 min read

Today, my guest is Carl Robinson, CEO and co-founder of Rumble Studio, a startup in Paris on the frontier of asynchronous interview audio technology. Carl, welcome to the podcast. The deeper I dive into your technology, the more groundbreaking it looks, especially once AI is incorporated. Can you walk us through what your company, Rumble Studio, does?

Rumble Studio is an online tool, a SaaS, that helps creators, agencies, and companies create audio content, like podcasts, much more quickly, easily, and affordably. The unique part of our solution is that we don't run live interviews like the one we're doing now, but asynchronous interviews, which are essentially turn-taking interviews. You set the questions upfront and send an invitation to your guest or guests (you can interview as many people as you want at the same time), and then the guests are interviewed by Rumble Studio automatically, in their own time.

You say Rumble Studio interviews them; can you elaborate on that?

We're a young startup, building this in phases. We're at phase one, which means that you can write and record the questions in Rumble Studio and then send them. Right now, the experience for the guest is a Typeform-style questionnaire with the addition of audio capture. A guest clicks the public link, or the link you shared, which goes straight into Rumble Studio. They read the question, or optionally listen to the host actually speaking it, so there's more feeling and detail in the audio, and then they record their answer. When they're happy with it, they move to the next question; if they're not, they can delete it, redo it, or build the answer up in multiple parts. They can go make a cup of tea, think about the answer, and then come back and record it again. There are a number of advantages, but essentially they go through it question by question.

So the core of it is recording audio from the guest, but there's also the ability to capture text, images, and videos, and we're adding more and more of these node types. Right now, it's a static Typeform-style questionnaire; there are no follow-ups. As the host, once you've captured that audio, you go back into the platform and download it or load it into our export mix feature. You see your questions and the guest's answers, and you have the option to record a follow-up comment or some kind of narration. That's how you can simulate a real conversation. In the current version of Rumble Studio, there's no way to ask follow-up questions on the fly while the guest is recording, but that is the vision for Rumble.

That was one of the things I thought of when trying it. It's great because guests can give these thoughtful responses, and I really liked that aspect, but will you be able to monitor a guest's response as they record it and then shoot over a follow-up?

This is where the AI comes in. Phase two, which is what our data scientists are working on right now, is a set of technologies that listen to what the guests say. The guest hears the question and records a first answer. That audio answer is then transcribed into text and run through a series of modules we're developing to extract different features from the speech, in both text form and audio form.

From the text, we can do things like pull out keywords using named entity recognition and analyze those, and we can identify the general topics they're talking about using topic modeling. We can also analyze the audio and do emotion detection and sentiment analysis to figure out how they're feeling. We can measure how fast they're speaking, to coach them to speak more quickly or more slowly. We build up all these characteristics of how they said it and what they said, and then we build the decision-making part on top, which decides how to react.
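The speaking-rate measurement mentioned above can be illustrated with a short sketch. It assumes a transcript and the answer's duration are already available; the function names and the 120 to 180 words-per-minute band are illustrative assumptions, not Rumble Studio's actual values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PaceFeatures:
    """Hypothetical container for pace features of one recorded answer."""
    word_count: int
    duration_s: float
    words_per_minute: float


def measure_pace(transcript: str, duration_s: float) -> PaceFeatures:
    """Estimate speaking rate from a transcript and the answer's duration."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    # Whitespace tokenization is a rough stand-in for a real word count.
    word_count = len(transcript.split())
    wpm = word_count * 60.0 / duration_s
    return PaceFeatures(word_count, duration_s, wpm)


def pace_tip(wpm: float, low: float = 120.0, high: float = 180.0) -> Optional[str]:
    """Return a coaching tip if the pace falls outside an assumed comfortable band."""
    if wpm < low:
        return "Try speaking a little more quickly."
    if wpm > high:
        return "Try slowing down a little."
    return None  # Pace is within the band; no tip needed.


if __name__ == "__main__":
    features = measure_pace("one two three four five six", 3.0)
    print(features.words_per_minute, pace_tip(features.words_per_minute))
```

In a fuller pipeline, this pace figure would be just one feature alongside keywords, topics, and emotion labels feeding the decision step.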

So the follow-up isn't done by the host; it's done by the AI?

Absolutely. That's the plan for Rumble: to be able to generate a follow-up question in real time and put it to the guest. To give you some examples, say a guest gives a particularly short answer, which is not great for a podcast; you can just say, "Could you tell me a bit more, please?" It's super easy and safe. You probably don't even need AI to do that. Or they could mention a keyword that's super hot in the news; maybe they mention Will Smith, so you ask, "What do you think about the Will Smith thing?"
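The two follow-up examples above are simple enough to express as hand-written rules, which is the point about not needing AI for them. This sketch assumes a transcript is available; the 40-word threshold and the hot-keyword list are hypothetical, and plain substring matching stands in for the named entity recognition a real system would use.

```python
from typing import List, Optional


def rule_based_follow_up(transcript: str,
                         hot_keywords: List[str],
                         min_words: int = 40) -> Optional[str]:
    """Pick a follow-up question using two hand-written rules (illustrative only)."""
    # Rule 1: a particularly short answer gets a generic nudge to say more.
    if len(transcript.split()) < min_words:
        return "Could you tell me a bit more, please?"
    # Rule 2: a mention of a topical keyword gets an open question about it.
    lowered = transcript.lower()
    for keyword in hot_keywords:
        if keyword.lower() in lowered:
            return f"What do you think about the {keyword} thing?"
    # No rule fired: let the next scripted question play instead.
    return None
```

Checking length first means a short answer is always nudged for more, even if it mentions a hot topic; swapping the order would be an equally reasonable design choice.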

So these open questions lead the guests to talk a bit more, and in that way, you capture more for the same amount of effort, bearing in mind you've only written one question into Rumble and sent them a link. From that point, Rumble is doing it all for you: asking the follow-up questions, capturing more audio, and making the experience more dynamic for the guests. They don't feel it's just a questionnaire, because it asks them things on the spot that they hadn't prepared. That makes it more spontaneous, which improves the audio for the listener.

It's really unique, considering the way podcasts are done now. I've been deep-diving into APIs and ways to automate workflows, and this technology seems to pair perfectly with that. After you send over the question list, if you automate it properly, your podcast is done and published without you having to do anything else. That's just incredible. So what kinds of organizations and individuals are you seeing this technology help the most today?

We're running experiments with three main segments: individual creators and influencers; marketing agencies that want to create audio content for their customers as quickly and cheaply as possible, as well as offer them new, innovative forms of content; and brands that want to create branded podcasts for themselves, or audio segments they can convert into videos for social media or put on their website. In general, a lot of brands are thinking about podcasts these days because podcasts are super hot.

So we're going to come back to audio and dive a lot more into everything that you're doing in this space, but I want to touch briefly on your time spent in China, where you lived for seven years. Can you tell us a bit about that experience and what initially made you leave the UK, where you're from?

I graduated in science, and then I did a few years in management consultancy in the city of London, which was great, but not really the job for me; that was the push factor. There were a number of pull factors towards China: my friend had moved there and suggested doing a startup, and he said it was super cheap to live there. This was back in 2009, so China was a bit different then. A number of things also happened in my personal life that made me want a career change, and I realized working in the big city wasn't for me. So that's why I moved over there, and I don't regret it for a second because it was a fantastic experience.

What was the entrepreneurial environment like for startups when you went over there? I guess it's probably a bit different now, and you can talk about that if you know anything about it.

There was a lot of energy; there was a lot more going on than I'd thought. I was more exposed to the small Western community of startup entrepreneurs than to the enormous, burgeoning community of Chinese startup entrepreneurs. I think they mix a lot more now, but they were more separate when I was there, and there was no comparison in terms of size. I think starting a startup in China is very difficult, and hats off to anyone who does it successfully, because China is very protective of the way it does business.

I know there are a lot of horror stories of Westerners who set up a business over there, and as soon as they achieve a level of success, the government swoops in, steals it from them, and moves it over to a Chinese partner. You have to set up your business with a Chinese partner; they won't let you just create a business on your own and extract value from their country like that. And the support mechanisms, compared with what I've found in France, London, or the US, just weren't there.

When I was there, they didn't follow through. We actually entered a startup competition and got second place in a hackathon weekend, which was cool. The district of Xi'an said they would support us, but the meetings that followed just weren't organized, and it fell apart. It was clear they weren't particularly committed to the process, and there just wasn't the support framework I've discovered in Europe, to be honest. So it was very difficult.

So what kept you there for those seven years then?

I did a number of different things. When I first got there, I started a translation marketplace with my friends. We were complete noobs; we didn't know what we were doing with startups, and there wasn't much support either. So we were just winging it and learning on the fly, but that takes time. You've got to invest in making the mistakes, and so this dragged on, partly because we hired a very small agency to build the code, which I think was error number one. You really need a CTO in the company who can actually build it.

It was taking so long that we set up a second project, an iPhone app that was my friend's idea. It was a simple healthy-eating app that helped people track their fruit and veg consumption. There were three of us; we put in a thousand dollars each and six months to build it. We got a great freelancer to write the code and a really amazing designer from the US who did an incredible job with the graphics. It all came together because it was a much smaller, more manageable project, and we got a really high-quality product out. It ended up being featured by Apple on the iTunes Store, which back then was probably a lot easier than it is today. It got many downloads, and we were really excited. Then a company contacted us and offered to buy it, and we sold it for a considerable sum.

It gave us loads of confidence that entrepreneurship was possible, and it gave us some money, which goes a lot further in China, so we could keep working on the first startup. That one ended up eating a lot of the money, and we abandoned it after four years. So the first idea lasted four years, and the second lasted about a year and was much more profitable. Then I joined an American startup called Gather Health, which was an incredible experience because I got to work there as a product manager and go through a full startup journey. They were funded, and I was working in a team of 30 split between China and India. That was my career in China.

I got there a little late; the bigger changes happened in the decades before that. So if you really want that feeling of being in a country that's going through massive changes, with lots of opportunities, go to one of the other countries that are developing more quickly. I hear about Vietnam, for example; maybe you'd have a similar experience there today to the one I had in China in 2009.

So, if someone wanted to go to China, would you recommend it today? What advice would you give them?

I don't know if I'd recommend it, not for the same experience I had, anyway. I would absolutely recommend visiting, and if you do, you need to go with someone who speaks Chinese and is a proper guide who knows the city and can take you around. But I don't know about living there now, because the prices have gone way up, the government is way harsher on a lot of things, and there isn't that feeling of an economy and a society changing as quickly. I mean, they're still changing and developing quickly, of course, but not in the same way as in 2009.

What was the city that you were based in?

Beijing.

I know you've been living in France since 2016. Why did you choose France specifically?

For a number of reasons. In Beijing, I met my now-wife, Veronique, who's French. I also have French citizenship because of my family history, so I've got dual nationality; Brexit was a bit of a problem at the time as well, but coming to France was super easy. And I wanted to study, so I went back to school; universities in France are excellent and much cheaper than in the UK. I ended up doing a two-year data science master's in France for a fraction of what it would cost in the UK, and I wouldn't even want to think what it costs in the US.

I'm curious: you say you have dual nationality. Did you speak a little French? Did you pick up Mandarin? How has the language barrier been, traveling to such proud non-English-speaking countries as China and France?

They've got that reputation, but to be honest, especially among the people I meet in circles like the startup world and the tech world, everyone speaks really good English. It's hard for a Brit, or any English speaker, to improve their French. Maybe it sounds like an excuse, and maybe it is, but it's hard work to improve your French when everybody can speak English if you want. So you just speak English, because when you start a conversation in French, they immediately realize it would be easier in English.

Right. It's hard to learn another language if you're English because, one, they want to practice their English, and two, they don't want to waste their time.

Yeah, you're wasting their time in a way, just using them for language practice. So it's hard work. I know it's important to get to a certain level because it shows you're making an effort, and there are certain things, like administration and getting around, where you do need a certain level of French, but you don't need to be a hundred percent fluent.

China was very difficult in the first two years, much harder I'd say, because people really don't speak as much English. I was very reliant on my friends for the first couple of years; then I learned enough to do a lot of stuff on my own, like going to the bank and shopping, but still nowhere near enough to work in Chinese.

With French, you can understand a bit, but Mandarin is completely different.

Definitely. To be honest, I know people who speak fluent Chinese, but it's not fluent enough for them to be competitive in the workplace. So even though they're amazing at speaking Chinese and have invested years of their lives in it, they're still unable to develop their careers the way they want. They end up moving back to the States or wherever, just because of that barrier.

Anyway, back to audio. You've been in the voice technology space for some time now, not only founding your current company, Rumble Studio, but also hosting your popular voice tech podcast. You're no doubt considered a leader and an innovator in the voice tech space. As more and more tools and software come out that cut down the time it traditionally takes to produce and edit audio content, Rumble Studio being one of them, how do you see this industry growing over the next five years?

When I joined the voice tech community a few years ago, there was a lot more buzz. It was newer, and there was a lot of optimism that voice interfaces would be the new mobile apps: you would install or enable these skills, and they would solve a variety of your day-to-day problems. That hasn't really transpired. Certain apps are handy, but generally there are only two or three killer apps on a smart speaker: on-demand music; smart home, like turning lights on and off; and accessing features on your phone, as you can do with Siri. Transactional tasks. We're still a long way off from being able to speak naturally with a device and enjoy some of the higher-level benefits that were promised back then.

Today's trends are moving more towards the enterprise, which didn't happen initially. It was much more consumer-focused at first, with smart speakers coming out and the enterprise taking a back seat. Since then, things have moved towards the enterprise, especially call centers and customer service, and from speaking with other experts in the field, I understand that trend will continue.

That's news to me. Speaking of the audio industry, you were just touching on it, and considering you went back to school in France for a master's in data science, where you learned how to code specifically for AI: how far off do you think we are from consumer voice assistants that you can interact with and teach with your own voice? I want to emphasize the teaching aspect. With Alexa, for example, there are some easy, small customizations she can't do, or she just shuts down. How far are we from being able to teach a voice application through voice commands?

I completely understand the frustration, because I've got the Google stuff set up at home, and I ask it the same things every day, but it still has the same problems. It's incredible. I think we're still a way off, to be honest, because there are many variables they still need to fix: different audio conditions and different accents. I live in a bilingual household, and it's awful with French. I also think investment in the consumer side has declined. I don't think the companies are investing as much now that the hype has died down, and there doesn't seem to be as much money in it. The research will continue for sure, and no doubt we'll get there, but I think we're a few years off from it working flawlessly, and more than that from having a genuine, interesting back-and-forth conversation. There's definitely research still to be done.

Amazon Alexa, which was kind of the leader of this space, came out almost 10 years ago, and it feels like after they released it, they left it behind, with no updates. What would it require? Is it some kind of translation from your voice into code? Because that's got to be pretty complex, to really get to the point of being able to teach and train a voice bot.

It's a huge stack of technologies, from translating the audio of your voice into text to interpreting that text. That's where a lot of the work goes: natural language understanding, figuring out what you mean. Not just the words you said; even with a perfect transcription, you could mean many different things. There's a lot of nuance there, a lot of context. It needs to know who you are personally, where you are, what you were doing at the time, how you're feeling, and what you were trying to achieve two minutes ago, which adds context to what you're trying to do now.

There's a huge amount of stuff that we take for granted as humans; we just wrap it all up and understand what someone wants. Machines really struggle with that. I get disappointed just turning on a light switch. What I want is obvious and unambiguous, and still there are problems. So I think a lot of the hype has died down because people feel a bit disappointed that you can't even do simple things like that reliably. It's not a reliable light switch, and it's embarrassing if you use it in front of people.

I totally get it. Many people probably aren't aware of this, but as podcasters, we transcribe episodes for Google search to put in a post, and you can see from the automatic transcription how terrible it is. It not only misses words but puts abbreviations in the wrong places. It doesn't understand at all what you're saying. So I'm curious about what you were referring to, with organizations now focusing on the enterprise. What is the main focus of that transition?

I'm not entirely sure; it's been a while since I've covered this. I haven't spoken to as many people working in the enterprise on my podcast, because I was more enthused about consumer applications. As I understand it, you have to think of these core technologies as more than just smart speakers. They're not only consumer devices you transact with in your home; natural language understanding, transcription, or emotion detection, for example, can be used in the backend of big enterprise installations like call centers.

To give you an example, there's a company I worked at called Batvoice AI, which makes a dashboard that helps customer service agents do their job better by measuring the emotion, the intent, and various other characteristics of both the agent and the customer they're speaking to. The live dashboard can tell the agent, "This customer is getting more irate," or "This customer is more likely to buy," so they can adapt.

The same thing can be used in IVR, interactive voice response. It's moving from pressing keys through huge menus to the system asking you what you want, you telling it, and then it, hopefully, accurately interpreting what you want and directing you to the right service. Emotion detection is really useful there too, to hear whether the customer is annoyed. What you really don't want is people circumventing your technology.
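To make the shift from key-press menus to spoken requests concrete, here is a toy router that maps a transcribed caller utterance to a service. The intent names and keyword lists are hypothetical, and keyword counting is a deliberately crude stand-in for the natural language understanding a production IVR would use; anything unmatched falls back to a human agent.

```python
from typing import Dict, List

# Hypothetical intents and trigger phrases; a real IVR would use a trained NLU model.
INTENT_KEYWORDS: Dict[str, List[str]] = {
    "billing": ["bill", "invoice", "charge", "refund"],
    "technical_support": ["broken", "error", "not working", "reset"],
    "sales": ["buy", "upgrade", "price", "plan"],
}


def route_call(utterance: str, fallback: str = "human_agent") -> str:
    """Route a transcribed utterance to the intent with the most keyword hits."""
    text = utterance.lower()
    scores = {
        intent: sum(keyword in text for keyword in keywords)
        for intent, keywords in INTENT_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # With no keyword hits at all, hand the caller to a person rather than guess.
    return best if scores[best] > 0 else fallback
```

Falling back to a person when nothing matches is one way to keep callers from trying to circumvent the system.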

So you've talked about the emotional aspect of understanding voice and audio. How is Rumble Studio incorporating that?

We're actually working on the emotion part right now. You want to do basic sentiment analysis and basic emotion detection on the voice. You can classify the voice into five or six basic emotions, and that's just one characteristic we take into account when deciding what to do. I'd say it's similar to the IVR system, but with Rumble, the purpose is to understand the guest well enough to decide whether to give them some coaching to help them answer better, or to offer a new question to move the conversation forward.

For now, it's those two things, and they're the skills you have as a professional podcaster. You're always thinking, "Do I need to encourage the guest more? Do I need to ask them for more information to keep this thought going? Do I need to switch to a different topic to make the episode flow better and bring new ideas into it?" You're always having to make decisions: Is there more to go? Is there more information to be extracted? For that, I think you need to measure a lot of different things about what the guest is saying.
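The decisions described above (encourage the guest, ask for more, or move on) can be sketched as a small policy over the extracted features. Everything here is an assumption for illustration: the emotion labels come from a common basic-emotion vocabulary rather than Rumble Studio's classifier output, and the thresholds are placeholders.

```python
from dataclasses import dataclass
from enum import Enum


class NextStep(Enum):
    ENCOURAGE = "encourage the guest"
    ASK_FOR_MORE = "ask for more detail"
    NEW_TOPIC = "switch to a new question"


# Assumed labels from a hypothetical voice emotion classifier.
LOW_ENERGY_EMOTIONS = {"sadness", "fear"}


@dataclass
class AnswerSignals:
    emotion: str       # top label from the (hypothetical) emotion classifier
    word_count: int    # from the transcript
    new_keywords: int  # entities not yet discussed in the episode


def decide_next_step(signals: AnswerSignals, min_words: int = 40) -> NextStep:
    """Choose how to react to an answer, mirroring a host's in-the-moment choices."""
    # A hesitant-sounding guest gets coaching before anything else.
    if signals.emotion in LOW_ENERGY_EMOTIONS:
        return NextStep.ENCOURAGE
    # A short answer, or one that raised fresh threads, is worth probing further.
    if signals.word_count < min_words or signals.new_keywords > 0:
        return NextStep.ASK_FOR_MORE
    # Otherwise the thought seems complete, so move the conversation forward.
    return NextStep.NEW_TOPIC
```

A hand-ordered rule list like this is easy to inspect; a learned policy could replace it once enough labeled interviews exist.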

I'm not going to give away all the secrets, but we leverage things like concept models as well, to know the hierarchies that already exist, ontologies and things like that, which capture the structure of ideas and concepts. So you can go down the rabbit hole and propose questions from the bank of templated questions we're developing. You can also look online, at search engines and other sources that show you the questions people are interested in hearing answered.

So we can use APIs that surface these questions and reformulate them into a form that works on a podcast. That's how we can move the conversation in a direction that makes sense given the discussion that has already happened, and talk about things people actually want to listen to.
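Going down the rabbit hole through a concept hierarchy can be sketched as a walk from a concept the guest mentioned to a narrower concept not yet covered, filled into a question template. The hierarchy, template, and function name below are hypothetical; a real system might draw on an existing ontology or a question-suggestion API, which this sketch does not attempt to model.

```python
from typing import Dict, List, Optional, Set

# Toy concept hierarchy: each concept maps to narrower concepts (illustrative only).
HIERARCHY: Dict[str, List[str]] = {
    "podcasting": ["audio production", "monetization", "distribution"],
    "monetization": ["branded podcasts", "host-read ads"],
}

# A single hypothetical template from a question bank.
TEMPLATE = "You mentioned {parent}. What's your take on {child}?"


def propose_question(mentioned: List[str], covered: Set[str]) -> Optional[str]:
    """Return a templated question about the first uncovered narrower concept."""
    for parent in mentioned:
        for child in HIERARCHY.get(parent, []):
            if child not in covered:
                return TEMPLATE.format(parent=parent, child=child)
    return None  # Nothing new to explore; fall back to another strategy.
```

Tracking covered concepts keeps the follow-ups from circling back to ground the guest has already addressed.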

That's incredible. So if, in five or ten years, you implement everything you want to do with Rumble Studio, that technology sounds like it could be applicable to other things as well. Do you see it, once fully implemented, moving over into other areas?

For sure. Right now, we're focusing on podcasts because they're super hot, but there are internal podcasts, company communications for bigger companies, client testimonials, and reviews. There could be recruitment interviews; politicians could use it, for example, to engage the public, run ask-me-anything sessions, or answer questions over social media. The ability to have automated conversations with people at scale has inspired the voice technology community for a long time, through these smart-speaker-type applications.

Still, I think it will increasingly be used for engagement with communities, not necessarily to answer customer service questions, but to create engagement and provide new interactive experiences between brands or individuals and their audiences. And because these conversations are automated, you can record that content and use it for your content marketing. This is what I'm really excited about: being able to create content interactively with your audience.

In that same light, you said that audio is the most effective content for building authority, trust, and leads. Can you elaborate on why audio specifically is so effective at doing those things?

Audio is the most intimate form of content. You're directly whispering your words into the person's ear over a long period of time. That's a very influential way of communicating with someone; it's very personal. It gives people time to absorb a lot of ideas, hear all the richness of the tone of your voice, and think about it at the same time. They're consuming and processing simultaneously. It's a much more active experience than watching TV, where you're staring at the screen, having images wash over you, and maybe some of the words go in or not. Podcasting is much more active, and that's why I think it's great for educating audiences and inspiring people: you're really getting the message across, and that's what brands and individuals want to do. They want to communicate their ideas and influence other people, and I think audio is superb at that.

That's such a good point about audio being more active than TV. TV doesn't let your imagination run; it pretty much tells you what you're supposed to imagine, whereas audio lets your brain fill in the picture. So, as more and more businesses adopt this audio content model for advertising and brand awareness, do you see the effectiveness, authority, and trust we were just talking about taking a U-turn?

Do you mean the more that brands use audio content, the more full of ads it's going to become? Is that what you're saying?

Social media like Facebook or Twitter today has a lot of that advertising mindset, in the sense of subliminal messaging, and we were just talking about how audio content carries this authority and trust, and generates leads, with the individual consumer. Does that take a turn if audio gets to the point Facebook or Twitter are at?

That's something brands need to watch out for. Branded podcasts are podcasts created by the brands themselves, as opposed to brands putting ads on other people's podcasts, which is currently a $1 billion-plus industry in the US alone. That exists, and it's growing fast, but another area that is also growing very fast is brands creating their own podcasts. One of the golden rules there is not to stuff it full of ads about yourself and not to talk about yourself too much, because nobody wants to listen to a 30-minute or hour-long ad. It completely breaks that trust and intimacy when you know you're just being sold to.

People want content; they want to learn something; they want value from you. You've got a lot more to gain as a brand or as an influencer by providing that content and building that trust consistently over time, rather than stuffing it full of ads. The feeling you generate in people will stay with them, and when they come to think about their need for a certain product or advice on something, they will naturally look to you. It's at that point that you can influence them. So you have to be very careful. That's not to say you can't put your brand on it at all, or links or ads or anything, but you have to be very careful with people's time and respect it.

With the rise of branded podcasts, as opposed to just advertising thrown into podcasts, do you see a rise in Hollywood-style productions, almost like a book being poured into your ears? A lot of people, instead of publishing books, are going straight to audio, and there's such a cool use case there, since half of what you get in a movie is the sound and the audio. So I feel that's a great opportunity for companies like GE. They don't even have to use their advertising to tell you to come to GE; they just make a cool story.

Totally. We just listened to one, actually. At Rumble, we've got a podcast club where we listen to a podcast every week, and then we get together as a team to discuss it for half an hour, at both a production and a content level. It's really helpful because we get exposed to all these different types of podcasts, and one we listened to recently was Hypnopolis 2, by BMW. It's really cool because it's exactly what you just said: it sounds like a feature film. They've obviously spent a fortune on it. The sound design is unbelievable, and the second series is interactive, so you can play it on a smart speaker or on the web. You choose your own adventure: you hear an amazing chapter, then you decide where you're going to go, and you hear what follows from that choice.

That's incredible. That's like a video game almost.

Exactly. Things are merging, and that's a really good point, because the worlds of advertising, promotion, experiences, games, and content are all merging together. You get to a point where you really can't tell if it's one thing or another. In the case of Hypnopolis, they don't mention BMW at all. There's not a single mention of it; even the intro is pure value, but you know it's made by BMW because it's on their website. So it rubs off; it builds a feeling about the brand, that they're futuristic and all that.

Well, that's what I'm hoping for in the future. Considering the audio space is almost a black hole for search engines, I feel it's a blank canvas on which audio creators, podcasters, and Rumble Studio have an opportunity to create outside of what the web is today, because it isn't subject to the monopoly through which search engines filter you however they want. Do you think that, as a community, podcasters have a responsibility for the proper implementation of branded podcasts, interactive voice adverts, and nano-casting?

I think there's definitely an ethos in the audio community, the podcasting community, to produce good-quality podcasts and not stuff them full of ads. There's also rising competition, so if you're producing a poor-quality podcast, you're not going to get attention, because there's so much to choose from. What I would encourage most is for people to experiment and not just fall in line and do what everyone else has done before.

This is one of the key things I want from everyone at Rumble Studio: to be innovative, to try new formats with the tools we're creating or with tools that already exist, and not to produce more of the same. We've found that a lot of people who aren't directly connected to the audio world have a very fixed view of what a podcast is: basically people talking to each other on a mic. But audio is like video. Think about the number of different types of TV shows, films, and online videos out there; there's no comparison. A YouTube video is nothing like a Hollywood movie, but it's all video.

Audio should be seen the same way, not just as two guys talking to each other on a mic. You can do anything with audio, just like BMW did. You can do a full feature, and there are many other formats, like nano-casting and micro-casting. You can do a two-minute daily update, which can be habit-building, even addictive, and useful for keeping up to date. There are so many possibilities with audio, and as an art form it's at a much earlier stage than video. So the doors are wide open for creators, startup innovators, and anyone else to come along and invent these new formats.

I'm curious because I got into podcasts hunting for educational content, so the last thing in the world I want is to see it become something that's just ads all the time.

I completely agree. I think podcasts are the antithesis of all that clickbaity, performance-marketing-type stuff you get online. They're slower; you give ideas time to develop, but producing them is a time-consuming process. So if we can develop tools that let this content be produced more efficiently by a wider variety of people, we can bring more people into the space and get more ideas into audio. That's a good we can bring to humanity: more people creating valuable content, rather than all these clickbait blogs that don't really offer any value.

So Rumble Studio is an innovator in that space.

I would like to think so. Asynchronous today maybe isn't as dynamic as the conversation we're having right now, but in its place, it allows guests to really think about the ideas they want to present. It allows them to re-record and get everything out. It also gives people who would never be invited onto a podcast, or who would like to appear on one, a voice on audio channels, including people who otherwise couldn't, like people with physical disabilities. With asynchronous, you can record audio with a mic or synthesize it with a synthetic voice or a voice clone, which are now good enough to use in audio content. So once you build this framework of asynchronous conversations, you can plug in these different modules and open it up to a much wider variety of people.

That time for contemplation you mentioned, where a guest actually has an opportunity to sit and think about it, makes it fairer. I spend a long time researching and creating scripts, writing out what I want to say, and then the guest just has to come up with an answer on the spot.

Bouncing back to the growth of podcasting, the US has a huge number of listeners compared to the rest of the world. A few months ago, I saw that about a third of its citizens, over 110 million people, listened to podcasts, while worldwide the figure is only about 400 million. Understanding language barriers yourself, how do you see someone capturing this growth in other countries, in places like Europe, South America, China, and India, without being able to speak those languages? Is there any technology that can take a voice and translate it into other languages for those audiences?

I would love to see Rumble Studio being able to do that one day, especially with synthetic voices. With the asynchronous approach, you can plan out an interview and then invite not just different guests to record answers, but different hosts to record the questions in different languages. So that's possible today: if you set it up correctly, you can already benefit from planning the episode. I suppose you could translate those questions using an online translation tool and then get native speakers to actually speak them.

But when you bring synthetic voices into the mix, you could actually do all of that yourself. Imagine you write your 10 questions, translate them online into Japanese, Korean, and everything else, send them out, and get the answers back. Granted, you probably won't understand what the guest is saying, so you can't review it as closely, but Rumble Studio does transcribe. So once you have the transcription, you could translate it back into English and then release that content. So with asynchronous and synthetic voices, it's possible to release podcasts in foreign languages.

I'm assuming this is a bit further out, but how far off do you think we are from being able to take the conversation you and I are having right now and translate it, with our own voices, into another language?

Yeah, I think we're a little way off from doing it all in real time, but there are definitely models. For example, Veritone, a company that does speech-to-speech technology, can change your voice from one voice to another, but it can also give your voice a Spanish accent. So you can read Spanish with a Spanish accent, said the way a Spanish person would say it. You would speak English and then translate that into Spanish. Text-to-speech will then be able to read that Spanish, not just in a Spanish voice, but in a voice clone of you speaking Spanish. I think it uses transfer learning or something like that to take the characteristics of a native Spanish speaker and combine them with the vocal characteristics of your personal voice. So now you've authentically got a Spanish-speaking you.

There's an influencer called Bryan Barletta from Sounds Profitable whom we're working with. He's using Rumble Studio for the Podscape, a series of interviews with different companies in the podcasting space. He recently did another experiment doing exactly that, translating his voice into Spanish using Veritone, and the results were very convincing. This is still an offline process, but I think Veritone does have a real-time component to that technology. So at some point, maybe all of that can be done on the fly: you could be speaking English into your mic, and the person on the other end could be hearing perfect Spanish a couple of seconds later.

I know the UN has interpreters who speak over the speakers, correct? That would be so ideal, because my vision of the future, especially with travel, is that you have your little AirPods in, you go around and speak to somebody, and it's translated.

Oh, for sure, the Babel Fish.

Exactly. It's really cool that it exists. I thought it was a hypothetical question, but it's actually already been created and implemented today.

It's coming. I haven't seen a real-time one that goes back and forth; that's the next level, but offline, you can do it. One of the things that needs to improve is the amount of voice data required to train one of those voices, because right now you perhaps need to spend a few hours in front of the mic reading certain scripts so they can capture all the different syllables. Another is the cost, because they have to run it through their graphics cards and machine learning infrastructure, which is quite costly. But every year, the barrier to training your own voice gets lower and lower.

I've trained my voice with at least three or four companies, which cost just a few hundred bucks, but every one of them has something wrong with the voice. They either sound exactly like me but are very monotone, or they've got the rhythm of the voice but the fluctuations are off. I haven't found one that's perfect yet. With Rumble Studio, I want to be able to integrate all of these voice clone and text-to-speech companies, because I think more and more companies will train their own branded voices. They'll own them, and then they'll want to come to a platform like Rumble Studio and use those voices. So we need to be able to integrate with those partners.

Excellent. I always ask a few questions at the end, if you could take all your life experiences and turn around and give somebody one piece of advice from those experiences, what would that piece of advice be?

Listen to your gut. I don't want to be cheesy, but if you feel like something's not working for you, change it. Don't keep doing something just because you think you should, or because you originally decided it was your life plan. Look at the evidence, and if you feel there's something in your life you need to change, just do it.

I agree with that completely. Where can people reach out to you or get in contact with Rumble Studio?

You can go to our brand-new website, rumble.studio, or you can sign up for our newsletter at rumble.studio/newsletter. If you want to give the product a try, you can sign up through the site, or you can book a demo with me at rumble.studio/demo, and I'll be happy to run you through it.

Sweet. Carl Robinson, thank you so much, man. I appreciate your time.
