In this episode of the Cherryleaf Podcast, we explore an intriguing development in content creation: AI-generated audio from technical documentation.
We’ve prepared two examples to showcase this technology, and we discuss its potential impact and applications.
Transcript
Speaker 1
This is the Cherryleaf podcast. Hello and welcome to our latest episode. In this episode, we’re going to be exploring an interesting development in content creation, and that is AI-generated audio from technical documentation. We’ve prepared two examples to showcase how this can be used, and we’ll also dive into the potential impacts and situations where it might be applied. We’re talking about a product from Google called NotebookLM.
So let me give you some background. In September 2024, Google announced that NotebookLM would be officially available globally and in over 100 languages. Before that, it was only available in the USA. If you’ve not heard of NotebookLM, it’s a free AI-powered research assistant that allows people to interact with what it calls trusted source content, that is, your own content, so you can get interesting insights from the content that you or your organisation might have.
And the way that it works is you begin by uploading your source content into NotebookLM, and that can be meeting transcripts, book quotations, documents, spreadsheets, chapters from novels, corporate documents and so on. And when you’ve uploaded those documents, NotebookLM then processes them, and it can then transform that content into a variety of different outputs.
It also has a chatbot facility, so you can start to ask questions about your content. And it will give you answers. But in this situation what we want to do is look at some of the content that it can create for you. And it can create things like outlines, study guides and more.
The reason for creating these new documents, according to Google, is to generate new ideas and, as it puts it, connect the dots: to summarise your sources into briefing documents, FAQs and study guides, so that you become more creative and can brainstorm and start to make connections between ideas.
Now let’s talk about the feature that’s causing a stir, and that is the audio overview capability, because NotebookLM lets you listen to a conversation based on your source material. And some people have realised that it could be used to generate podcasts.
I first heard about this on Twitter, where several users were sharing their experiences. Let me quote a few of the reactions they posted. Ethan Mollick, who is very well known as an AI expert, wrote:
Google’s NotebookLM is the current best “wow this is amazing & useful” demo of AI Here I gave it the entire text of my book, it turned it into a podcast, a study guide, FAQ, timeline & quite accurate chat Listen to the first few minutes of the “podcast.” Seriously, just listen.
And Ozgur Ozer tweeted:
I just tried Google’s NotebookLM and it was mind blowing. I only gave it the link to Cursor from and it generated an entire podcast in just a couple of minutes. Now I’m seriously thinking about starting a podcast with it. Despite a few glitches, it’s really impressive and most of the time you can’t even tell it’s AI. That’s just perfect.
And Ryan Morrison shared:
The @GoogleAI NotebookLM is an incredible tool. I gave it a dozen of the most recent pre-print research papers on the TRAPPIST-1 planetary system, PDF versions of the Wikipedia entries for those systems and a NASA overview. In minutes it was able to summarise them and create a 10-minute conversational podcast.
As I mentioned, we decided to test this out ourselves, so let me tell you about what we found. We started by uploading the user manual for Audacity, which is an open source audio editing application. Its manual is under Creative Commons, so it’s perfect for testing. And we wanted to see what NotebookLM could generate without any additional input, just the user guide.
So one of the things it produced was an FAQ, and the FAQ had eight topics within it. I’ll go through the eight that it created. Those were:
The first was: What is Audacity and what are its key features?
The second was: How do I install Audacity?
The third was: What’s the difference between open and import?
The fourth was: How do I record audio using Audacity?
The fifth was: How can I adjust the volume of specific sections in an audio track?
The sixth was: How do I remove unwanted sections from a track?
The seventh was: What’s the purpose of the silence function?
And the final one was: How do I get more assistance?
So there’s a list of the types of questions that people are likely to ask. I think that’s not a bad list. I know FAQs can be controversial because often they’re not frequently asked, and often the answers aren’t actually answers to the questions.
It also produced a study guide, which was essentially a series of questions to test you on your knowledge, and then the answers.
And the questions it asked were more conceptual things like: what is sound and how do we hear it? Explain the concept of sample rates and differentiate between PCM and compressed audio.
That might not be so useful for us, but we’ll talk about that a little bit later.
The third thing it created was a briefing document. NotebookLM created a summarised briefing document that contained the basics about sound, digital audio and Audacity’s functionalities. And finally, the fourth thing we got it to generate was the audio overview.
So let me play a clip of that now.
Speaker 3
Ever record something and just think, man, I wish I could cut out that awkward cough.
Speaker 2
Or like when you’re trying to layer, you know, vocal stylings over a smooth jazz track.
Speaker 3
This is why we have Audacity, and thankfully our listeners sent in the manual for Audacity.
Speaker 2
The ultimate guide.
Speaker 3
The ultimate guide. So we’re diving into the Audacity manual. It is free, which is wonderful. Let’s unlock the secrets.
Speaker 2
It’s free.
Speaker 3
Of this powerful audio editing software.
Speaker 2
I think that free part is what gets a lot of people initially interested, but then what’s so fascinating is how many people stick around because it is so powerful and it’s become this favourite tool, not just for hobbyists but for independent creators, yeah.
Speaker 3
Yeah.
Speaker 3
Absolutely. And it all starts with understanding this thing called digital audio. And I will say, the manual jumps right into sound waves, microphones, which is all well and good, but you know, I’m thinking get to the part where I can make my dog.
Speaker
Yeah.
Speaker 2
Of course, yeah.
Speaker 3
Sound like? Barry White. That’s all I want.
Speaker 2
Exactly, I hear you. And honestly, grasping the basics of digital audio unlocks so much of what makes Audacity really tick. So like take sample rate for example. The manual mentions it’s how frequently Audacity measures a sound wave. But what does that mean for us in the real world?
Speaker 1
OK.
Speaker
Sure. OK.
Speaker 2
You know.
Speaker 3
That’s the part where I’m always like, what does that mean for me? Yeah. Hit me with the why should I care?
Speaker 2
OK. Yeah. Think of it like this. Imagine you’re streaming a movie, a higher sample rate is going to be like streaming it in HD.
Speaker 3
OK. OK.
Speaker 2
It’s that crystal clear picture where you’re catching every single detail. Lower sample rate, think old, blurry YouTube videos. You know, they get the job done, but you’re losing some of the nuances.
Speaker
Right.
Speaker 3
So higher sample rate, better audio quality, but probably bigger file size.
Speaker 2
Yes. You got it.
Speaker 1
I’ll halt it there.
The audio file in total is 11 minutes long. We’ve posted a version of it on the Cherryleaf channel on YouTube. We’ll provide a link in the show notes if you want to listen to the whole thing.
So that’s all generated by AI. The ums, the ahs, in the conversation, the anecdotes and so on.
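As a rough check on the clip’s claim that a higher sample rate means a bigger file, here is a minimal sketch of the arithmetic for uncompressed PCM audio. The function name and the example figures (CD-style 44,100 Hz, 16-bit, stereo) are our own illustration and are not taken from the Audacity manual; compressed formats such as MP3 will not follow this simple formula.

```python
def pcm_size_bytes(sample_rate_hz: int, bit_depth: int, channels: int, seconds: int) -> int:
    """Size of raw, uncompressed PCM audio in bytes (file headers ignored)."""
    bytes_per_sample = bit_depth // 8
    return sample_rate_hz * bytes_per_sample * channels * seconds


# One minute of 16-bit stereo audio at two illustrative sample rates.
one_minute_high = pcm_size_bytes(44_100, 16, 2, 60)
one_minute_low = pcm_size_bytes(22_050, 16, 2, 60)

print(one_minute_high)  # 10584000 bytes, roughly 10.6 MB
print(one_minute_low)   # 5292000 bytes, half the size at half the sample rate
```

With everything else held constant, the size scales in direct proportion to the sample rate, which is the trade-off the two voices describe.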
We wanted to see how far we could push it. Next, we wanted to see how it handled an OpenAPI specification file. We chose Transport for London’s API specification file. This relates to TfL’s API, which provides access to real-time transport data, like timetables and status updates, across the different modes of transport that are available in London.
So just using that JSON file of the endpoints and methods, NotebookLM generated an audio overview. So let me play a part of that.
Speaker 2
Ever get that feeling? Like, uh, you’re just scratching the surface of something, like there’s this whole other level of understanding just waiting to be, you know, unlocked.
Speaker 3
I know that feeling.
Speaker 2
Today, we’re diving deep, really deep into a data set that’s just got layers upon layers of potential. We’re talking about Transport for London’s unified API.
Speaker 3
The data that powers those apps we all use, and I’m telling you when the next bus is going to show up, but it’s way.
Speaker
Yeah.
Speaker 3
More than just that. Oh, way more.
Speaker 2
And that’s where our guide comes in. This document TfLAPI.TXT. It’s a bit, shall we say, dense, but that’s why we’re here to break it all down.
Speaker 3
Think of this document as a road map for developers. It shows you all the data TfL makes available and how to actually access it.
Speaker 2
So we’re going beyond just using a transportation app here. We’re talking about understanding the nuts and bolts of how those apps even work.
Speaker 3
Exactly. At the heart of it all is the API. That’s application programming interface. It’s how your app actually talks to the TfL servers.
Speaker 2
OK, so for someone like me not exactly a coding whiz. Can you give me like a real world example of this conversation in action?
Speaker 3
OK, so imagine you’re a city planner, right? And you’re really focused on pedestrian safety.
Speaker 2
Important stuff.
Speaker 3
This API gives you crazy granular data. Like down to the second, how long a pedestrian crossing stays lit versus the timing of traffic flow at a specific intersection?
Speaker 2
Wow, that’s that’s hyperlocal.
Speaker 3
Right. And with that data and now you can pinpoint areas that need adjustment, maybe that crosswalk timer needs to be longer during rush hour?
Speaker 2
So we’re talking about making those tiny tweaks that could actually make the city safer.
Speaker 3
Exactly. That’s the power of this API. Raw data transformed into actionable insights that can lead to real improvements for everyone.
Speaker 2
Now that’s what I call a deep dive. So we’re talking about way more than just like bus timetables here, right? Right. What kind of stuff? What kind of data can we actually find in this API?
Speaker 3
It’s kind of mind blowing, actually. Hmm. You’ve got real time data feeds, historical records, going way back. We’re even talking hyper local stuff like air quality readings from sensors near bus stops.
Speaker 2
OK, now you’ve got my attention.
Speaker 1
And we’ll stop it here.
If you want to listen to the full six minutes, that’s again available on YouTube.
So just to restate, that audio file was created using only the OpenAPI specification, or Swagger, file.
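To give a feel for what such a file contains, here is a minimal sketch that lists the endpoints and methods in an OpenAPI-style JSON document. The tiny spec below is made up for illustration and is not TfL’s real file; the paths, summaries and the `list_operations` helper are all assumptions of ours. The only thing it relies on is the standard `paths` section, where each path maps HTTP methods to operations.

```python
import json

# Hypothetical, heavily trimmed spec for illustration only (not TfL's actual file).
SPEC_JSON = """
{
  "openapi": "3.0.0",
  "info": {"title": "Example Transport API", "version": "1.0"},
  "paths": {
    "/lines/{id}/status": {
      "get": {"summary": "Get the status of a line"}
    },
    "/stops/{id}/arrivals": {
      "get": {"summary": "Get arrival predictions for a stop"}
    }
  }
}
"""

# A path item can also hold non-operation keys such as "parameters",
# so we only keep keys that are HTTP methods.
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def list_operations(spec: dict) -> list[tuple[str, str, str]]:
    """Return (METHOD, path, summary) for every operation under 'paths'."""
    operations = []
    for path, path_item in spec.get("paths", {}).items():
        for method, operation in path_item.items():
            if method in HTTP_METHODS:
                operations.append((method.upper(), path, operation.get("summary", "")))
    return operations


operations = list_operations(json.loads(SPEC_JSON))
for method, path, summary in operations:
    print(f"{method} {path}: {summary}")
```

A file like this describes what can be requested, not how the data is used in practice, which is worth bearing in mind when judging the scenarios the audio overview came up with.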
After recording the podcast, there was an update to NotebookLM, so we’re adding this into our original recording.
And what has happened is that you can now add public YouTube URLs and audio files and websites to your project.
And let me read from Google’s blog.
Since launching, we’ve continued to add support for a wide range of source materials using the multimodal capabilities in Gemini 1.5. Today, you can now add public YouTube URLs and audio files directly into your notebook, alongside PDFs, Google Docs, Slides, websites and more.
In our early testing, people are using these new source types in interesting ways:
Analyzing videos and lectures: When you upload YouTube videos to NotebookLM, it summarizes key concepts and allows for in-depth exploration through inline citations linked directly to the video’s transcript. It’s great for comparing perspectives across multiple sources on a specific issue, and you can view the videos inside NotebookLM with the embedded YouTube player.
Making connections within audio recordings: You can streamline team projects by adding audio recordings and having NotebookLM search across the transcribed conversations to locate specific information, eliminating the need for listening to long audio files for the important nuggets.
Creating study guides: You can transform class recordings, handwritten notes and lecture slides into comprehensive study guides with a single click. These automatically generated guides consolidate all of the key information for convenient access.
We did another experiment with the Audacity project that we had set up. In addition to the Audacity manual, we added some YouTube videos from a variety of different people who’ve done videos on things like tutorials, tips, and how to make a professional podcast.
And we also included a link to the Audacity Forum to see what would happen.
We asked it to create a user guide on using Audacity to create a professional-sounding podcast. It generated a document of one to one and a half pages that wasn’t particularly useful or good.
We asked it to tell us about some of the bugs in Audacity, and it summarised the various bugs that have been reported in the Audacity Forum, so that was quite useful for looking through a large set of content and summarising the information.
And we asked it to create a tutorial on how to create professional podcasts with Audacity. And this time what it generated was much more useful than our request for a user guide.
We ended up with probably about two pages of content, but quite useful in giving advice on what to do and tips and tricks for configuring the system.
We also got it to generate an audio file and I’ll play a little bit of that. I think the result was better with the extra content, but therein lies a problem with all of this, in that these YouTube videos were not created by Audacity. They were created by individual people, and that raises the question, what about copyright? The videos are copyrighted under presumably standard YouTube licences. Are we able to take those transcripts and summarise them and repurpose them? Or are we breaking the copyright rules by taking other people’s intellectual property that they’ve created and put onto YouTube and using it in NotebookLM?
Let’s play an extract from the audio file it created when it had this additional source content of YouTube videos and the Forum in addition to the Audacity manual.
Speaker 2
Hey everyone and welcome back. Ever feel like your audio editing could use like a boost? Umm, yeah. Today we’re taking a deep dive into Audacity. OK. And trust me, there’s more to this free software than meet.
Speaker 3
Right.
Speaker 2
The eye. We’ve got a mountain of resources here. YouTube tutorials from the pros forum threads bursting with insider tips. Even the official Audacity manual.
Speaker 3
Yeah. Oh wow. And the timing is perfect. The Audacity community is buzzing about the latest version. Oh, really? Yeah. So we’ll make sure to highlight all the exciting new features that are getting people talking.
Speaker 2
Oh. Very cool, very cool. You know, it’s clear you’re someone who loves to find those hidden gems, those power user tricks that can really elevate your work. And that’s what this deep dive is all about. Uncovering the secrets of Audacity to help you edit like a pro.
Speaker 3
Ah. You think of this as distilling the collective wisdom of the most experienced Audacity users out there.
Speaker 2
OK.
Speaker 3
We’ll be focusing on those practical techniques and efficient workflows that can transform your editing process.
Speaker 2
Love it. So let’s jump right into it. One thing that really stood out to me was a tutorial by Mike from Music Radio Creative. He was raving about Audacity’s noise reduction, calling it extremely well made. Now we all know how important clean audio is, right?
Speaker
Hmm.
Speaker 3
Absolutely. Whether it’s background hiss, hums or even those clicks and pops, unwanted noise can really distract your listeners. This is where Audacity shines, yeah.
Speaker
Yeah.
Speaker 2
OK. So Mike actually walks through the entire process step by step. First, you capture a noise profile, basically a sample of the unwanted sound. Then you use the spectral view to visually target the specific frequencies you want to remove. It’s like having an X-ray for your audio.
Speaker
Right. Hmm.
Speaker 3
The spectral view is a perfect example of how Audacity has evolved beyond a basic editor. This level of control used to be exclusive to like expensive professional software. Wow, with a bit of practice you can isolate and remove even the most stubborn noises, leaving your audio crystal clear.
Speaker 1
And let me end it there.
I think that’s an improvement on the first audio file that it created.
You might notice some similarities in the style and voices used between the Audacity and the TfL examples. It will be interesting to see if Google offers more customisation options in the future, so that the tone, and perhaps the voices, can change.
These examples raise some important questions. Is this technology useful for technical communication, developer relations and marketing, or a gimmick?
And we must point out that it did make some mistakes. It said “open appy” instead of OpenAPI. Later on in the TfL API recording, it incorrectly claims that you could predict future accident numbers using historical data. And I’m not sure that the TfL API actually does give information on traffic lights and their timings.
But on the whole, I think this tool is useful, particularly as a first draft generation tool. And it could be especially useful for creating content for blog posts, or developer-focused materials if we’re talking about the API.
And the FAQ could also perhaps be used as web copy on the site, addressing the key questions that people might want to know about a product.
And the study guides have potential for online courses for some testing, maybe some multiple choice questions. That type of thing.
Now NotebookLM isn’t the only tool that can do this.
Steve Metcalfe tweeted about using Claude, another AI tool, to generate similar results. He used Claude to create a custom prompt that could generate a podcast script from his source content, and then he used another app, from ElevenLabs, to generate the audio. With ElevenLabs, you can generate 30 minutes of audio content for $5.00 a month.
We did do some other experiments with NotebookLM. For example, asking it to analyse conversations in a forum to see if it could do some statistical analysis. And it wasn’t as good as ChatGPT at doing more mathematical or statistical type analysis.
Its strengths seem to lie in large bodies of content requiring hundreds of thousands of tokens. That’s the way that AI measures content. It seems to be good in that scenario.
In summary, these AI tools do have some real potential for repurposing the technical documentation content that technical writers create, and using it for marketing, developer relations or training purposes.
The content is possibly good enough to publish as is, but it’s probably safer to use it as a starting point that you take and then polish. Maybe steer it in a unique direction, so that you avoid the content sounding, if it’s audio, or reading, if it’s written text, too similar to other content that NotebookLM is generating.
But what do you think? Let me know your thoughts. You can contact us by email at info@cherryleaf.com. If you’re interested in the training courses we have on using AI in technical communication, or our services in writing content for APIs, you can again contact us or look at our website, www.cherryleaf.com. Thank you for listening.