Under the Hood

CHAPTER 01 / 10 Discussion

Podcast Discoverability and Technical Indexing

The podcast "How to Get Discovered" introduces its technical episode, focusing on how transcript indexing works for search engines. The hosts establish a rule to define all technical terms and acronyms to ensure listener comprehension. The core question for the episode is why a properly structured transcript page differs significantly from a transcript pasted into a regular show notes page in the eyes of a search engine.

how to get discovered· podcast discoverability· technical indexing· show notes· transcripts

00:00 Welcome back to How to Get Discovered. I'm Maya. And I'm Tom! HTGD is the show where we argue about how podcasts get found. Last week was the synthesis episode, and Tom said it was the most useful conversation we'd had, which I have transcribed for my records. Don't! I've transcribed it. Today's episode is the technical one, Under the Hood: how transcript indexing actually works. Why do search engines treat a transcript page differently from a show notes page? What makes a transcript good for search? And what makes a transcript a wall of useless text? I want to set a rule for this episode. Set the rule. The rule is, every time you use a technical word that's not in normal English, I get to make you explain it.

00:48 No three-letter acronyms without a translation. No phrases like "structured data" without a definition. If a listener has to look something up to follow the conversation, we've failed. That's a fair rule. It's a fair rule. I'll try... I may slip. Tom will catch me. I will catch you. And this is gonna come up later: I have something to admit at some point in this episode. Oh good. I'm setting it up early so you can be patient. I'll be patient. Let's get into it. Okay, I want to start with the question that is actually the technical heart of the season, which is: why is a transcript on a properly structured page different from the same transcript pasted into a regular show notes page? Because they're not the same to a search engine! Right, but why not? To a human reader they look identical: same words, same content... maybe

01:47 Maybe the same length. If you scroll past both, they look like text with a podcast player on top. To a human reader? Sure! To a search engine, what's different? To a search engine, almost everything. And the difference isn't visible from the outside. It's in the underlying structure of the page, the bits that the search engine reads that the human doesn't see. And the question is whether those invisible bits actually matter. They do. A lot! And I'm going to spend the next half hour explaining why, in terms you can actually use. So the first thing to understand is that when a search engine looks at a webpage (Google, Bing, the AI search ones... all of them), it doesn't see the page the way YOU see it. You see a layout: you see headings and paragraphs and a player and maybe a sidebar with links to other episodes.

CHAPTER 02 / 10 Discussion

Search Engine Page Interpretation and Metadata

Search engines like Google and Bing interpret webpages differently than humans. Instead of visual layout, they see a document with an underlying structure, including titles, headings, body text, and metadata. Metadata, defined as "data about data," provides search engines with crucial information about the page's purpose, such as whether it's a podcast episode, article, or recipe, enabling specialized indexing and display in search results.

search engines· webpage structure· metadata· Google· Bing· AI search

02:40 The search engine sees, at root, a document. A document with a structure. Some bits are titles, some bits are headings, some bits are body text, some bits are metadata about the page itself, and some bits are signals about what the page is for. Stop! Define metadata. Right. Metadata is data about data; in this case, data about the page. The page has a title. The page has a description. The page has tags that say what kind of page it is. Is it an article? Is it a podcast episode? Is it a recipe? Is it a product? Those tags are metadata, and they're written into the page in a way that the human visitor doesn't see but the search engine does.

03:28 Okay, so a page can tell a search engine "I am specifically a podcast episode page" rather than just "I am some page." Exactly. And that turns out to matter a lot, because the search engine can then treat it as a podcast episode page. It can show it in different ways in search results, it can pull out the transcript and surface specific moments, it can show the episode in podcast-specific carousels and panels, none of which it can do if it just sees a generic page. This is the structured data thing you promised you'd get to in this episode? This is the structured data thing.
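The title and description mentioned above are ordinary HTML elements in the page's head. A minimal sketch in Python, assuming a page generator that builds the head from two strings (the function name `render_head` and the sample text are hypothetical). It covers only the title and description; the "what kind of page is this" declaration is the structured data the hosts turn to next.

```python
from html import escape


def render_head(title: str, description: str) -> str:
    """Build a minimal <head> carrying the page's title and description metadata."""
    # escape() (quote=True by default) makes the values safe both as element
    # text and inside the double-quoted content="..." attribute.
    return "\n".join([
        "<head>",
        f"  <title>{escape(title)}</title>",
        f'  <meta name="description" content="{escape(description)}">',
        "</head>",
    ])


print(render_head(
    "Under the Hood | How to Get Discovered",
    "How transcript indexing works, and why a transcript page differs from show notes.",
))
```

Neither element renders in the page body (the title only appears in the browser tab), which is the "invisible to the visitor, visible to the search engine" split described in the episode.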

CHAPTER 03 / 10 Discussion

Structured Data and Schema.org for Podcasts

Structured data, following standards like Schema.org, allows webpages to communicate their specific content type to search engines. For podcasts, this means marking up a page to explicitly state it's a podcast episode, including its title, description, duration, host, parent show, and transcript. This lets search engines understand the content without guessing. The hosts frame the benefit as legibility to machines, including AI systems, rather than simply higher ranking. Most show notes pages lack this markup because the hosting platforms that generate them never prioritized it.

structured data· schema.org· podcast episodes· search engine optimization· hosting platforms

04:06 Should I define structured data? Yes, please. Structured data is the bits of the page that follow a specific format that search engines have agreed on. There's a standard called Schema.org, which is basically a vocabulary. You can mark up your page to say: this is a podcast episode, here's its title, here's its description, here's its duration, here's its host, here's the show it belongs to, here's the transcript. All in a format the search engine recognizes. So instead of guessing what the page is, it knows! And this is something you add to the page? Something you (or, more realistically, the platform that publishes the page on your behalf) add. Most show notes pages don't have it. Most transcript pages, the kind I keep banging on about, do. Why don't show notes pages have it?

04:59 Mostly because they're generated by hosting platforms that didn't prioritize it. The page exists to host an embedded player and a bit of text. Nobody bothered to mark it up as a specific kind of page with specific properties, so Google sees a page with a player and some text and doesn't know what it is. It indexes it like a blog post that happens to have audio on it. Okay. And the version of the page that has the markup gets treated differently? Gets treated very differently. It shows up differently in search results, it's more likely to be surfaced for specific queries, and, this is the bit that increasingly matters, it's more readable to AI systems that are building their understanding of what podcasts exist on what topics. So it's not about ranking higher, exactly. It's about being legible.
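To make the Schema.org description concrete, here is a sketch of the markup for one episode, serialized as JSON-LD (the script-embedded JSON format for structured data). The type and property names (`PodcastEpisode`, `PodcastSeries`, `partOfSeries`, `associatedMedia`, `AudioObject`, `duration`, `transcript`) come from the Schema.org vocabulary; the function names and URLs are hypothetical. The host is left out because the episode doesn't say which property a given platform or search engine expects for it. Adding markup does not oblige any search engine to use it; which properties each engine reads is its own decision.

```python
import json


def podcast_episode_jsonld(title: str, description: str, series_name: str,
                           audio_url: str, duration_seconds: int,
                           transcript_text: str) -> str:
    """Describe one episode as a Schema.org PodcastEpisode, serialized as JSON-LD."""
    minutes, seconds = divmod(duration_seconds, 60)
    hours, minutes = divmod(minutes, 60)
    episode = {
        "@context": "https://schema.org",
        "@type": "PodcastEpisode",
        "name": title,
        "description": description,
        "partOfSeries": {"@type": "PodcastSeries", "name": series_name},
        # Duration (ISO 8601) and transcript sit on the audio object, where
        # Schema.org defines both properties for media files.
        "associatedMedia": {
            "@type": "AudioObject",
            "contentUrl": audio_url,
            "duration": f"PT{hours}H{minutes}M{seconds}S",
            "transcript": transcript_text,
        },
    }
    return json.dumps(episode, indent=2, ensure_ascii=False)


def as_script_tag(jsonld: str) -> str:
    """Embed JSON-LD in the page so crawlers can read it."""
    # "</" becomes "<\/" (an equivalent JSON escape) so a transcript that
    # happens to contain "</script>" cannot close the tag early.
    return ('<script type="application/ld+json">\n'
            + jsonld.replace("</", "<\\/")
            + "\n</script>")


print(as_script_tag(podcast_episode_jsonld(
    "Under the Hood", "How transcript indexing works.", "How to Get Discovered",
    "https://example.com/audio/episode-8.mp3", 1080,
    "Welcome back to How to Get Discovered...",
)))
```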

CHAPTER 04 / 10 Discussion

Transcript Structure and Addressable Moments

An unstructured "wall of text" transcript, even if accurate, is seen by search engines as one undifferentiated block, limiting its discoverability to broad topics. A structured transcript, however, breaks content into sections with headings, paragraphs, and crucial timestamps. These timestamps create "addressable moments," allowing search engines to index specific parts of an episode, enabling users to land directly on a relevant segment when searching for a particular question, rather than just the episode page.

transcript structure· timestamps· search indexing· specific moments· freelance topics

05:50 It's about being legible. That's a really good way to put it. The unstructured page is legible to humans and illegible to machines. The structured page is legible to both. Now, the second thing: a transcript on a page isn't the same as a transcript that works, and the difference is mostly about structure. Imagine you've got an hour-long podcast. You run it through a machine transcription service, and you get back a text file. The text file is, let's be generous, about 10,000 words... maybe 12,000. It's mostly accurate; the speakers are roughly attributed. Sounds reasonable! Now imagine you paste that text file onto a web page under an embedded player.

06:35 What does a search engine see? 10,000 words of text. Right! But more specifically, 10,000 words of text with no structure: no headings, no paragraphs in any meaningful sense, probably no timestamps, probably no clear topic breaks. Just a wall. Okay. To a search engine, that wall is (and I'm going to oversimplify here, but this is roughly true) one big undifferentiated noun. It's a page that's about something, but the search engine can only get the broadest sense of what. It might pick up some keywords; it might notice that "freelance" appears a lot. But it can't tell which paragraph is about rate negotiation and which paragraph is about chasing invoices. And the structured version?

07:24 The structured version breaks the transcript into actual sections, with actual headings, with actual paragraph breaks, and, crucially, with timestamps that link to specific moments. So the section about rate negotiation has a heading that says Rate Negotiation. This section has timestamps. This section can be linked to directly. There's a URL that takes you to that specific moment. Stop! Why does that last bit matter? Because the search engine doesn't just want to index "this page is about freelance topics." It wants to index "this specific moment in this episode is the answer to this specific question," which it can only do if the moment has its own address, its own URL. So instead of one page about freelancing, the same episode becomes, what, dozens of addressable moments?

08:19 That's exactly right. The episode becomes, let's say, 15 or 20 distinct moments, each with its own URL, each indexable independently, each able to surface for its own query. So when somebody Googles "how do I negotiate a freelance rate," they don't land on the show's homepage. They don't even land on the episode page. They land on a specific moment in the specific episode where you talk about rate negotiation, and the page plays starting from that moment. That's a different shape of result! It's a completely different shape of result... And it's only possible because the transcript is structured. The wall-of-text version of the same content can't do any of this.
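A sketch of how a structured transcript could give each section its own address, assuming the transcript has already been split into sections with a heading and a start time. The URL scheme (a `?t=` start time plus a `#fragment` pointing at the section) is a hypothetical choice; what a given player actually honors depends on the platform, and how a given search engine treats such URLs (as separate results or as links into one page) is up to that engine. All names and URLs are illustrative.

```python
import re
from dataclasses import dataclass
from html import escape


@dataclass
class Moment:
    heading: str
    start_seconds: int
    url: str


def slugify(heading: str) -> str:
    """'Rate Negotiation' -> 'rate-negotiation', usable as an HTML id and URL fragment."""
    return re.sub(r"[^a-z0-9]+", "-", heading.lower()).strip("-")


def clock(seconds: int) -> str:
    """Format a start time as MM:SS, or HH:MM:SS past the hour."""
    minutes, secs = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}" if hours else f"{minutes:02d}:{secs:02d}"


def addressable_moments(episode_url: str, sections: list[tuple[str, int]]) -> list[Moment]:
    """Give each (heading, start_seconds) section its own URL (hypothetical scheme)."""
    # ?t= tells the player where to start; #fragment scrolls to the section.
    return [Moment(h, start, f"{episode_url}?t={start}#{slugify(h)}")
            for h, start in sections]


def render_section(moment: Moment, text: str) -> str:
    """One transcript section: an id to link to, a heading, a timestamp link, the text."""
    return (f'<section id="{slugify(moment.heading)}">\n'
            f"  <h2>{escape(moment.heading)}</h2>\n"
            f'  <a href="{escape(moment.url)}">{clock(moment.start_seconds)}</a>\n'
            f"  <p>{escape(text)}</p>\n"
            "</section>")


moments = addressable_moments("https://example.com/episodes/42",
                              [("Rate Negotiation", 425), ("Chasing Invoices", 1310)])
for m in moments:
    print(clock(m.start_seconds), m.url)
```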

CHAPTER 05 / 10 Discussion

Search Engine Indexing Mechanics and Ranking

For a transcript page to appear in search results, Google must know the page exists, crawl and index it, and then rank it as a good answer to a search query. Submitting a sitemap helps Google discover pages. Crawling depends on the domain's historical usefulness and authority. The page's structure, metadata, transcript quality, speed, and inbound links all contribute to its ranking, though listeners remain unaware of this complex process.

search engine indexing· crawling· ranking· sitemap· domain authority· Google

09:04 This is the bit where I think I actually want to ask the question I came in to ask: how does the indexing actually happen, mechanically? The transcript exists, the structure exists, the timestamps exist. What's actually happening between that page existing on the open web and that page showing up in a search result? Right, so Google and the others crawl the web. They have programs that visit pages, read them, and feed them back into an index. The index is basically a giant database that knows which pages contain which words and which topics. When somebody searches, Google looks in the index and ranks the results.
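The "giant database that knows which pages contain which words" is, in its simplest textbook form, an inverted index: a map from each word to the pages containing it. The toy below is an illustration of that idea only, not how Google's index or ranking works; the URLs and text are made up, and the "ranking" is just a word count. Because each transcript section is indexed under its own URL, a query about rates points at the rate-negotiation section; if the same text were one wall-of-text page, every query would return that single page.

```python
import re
from collections import defaultdict

# Toy corpus: two sections of one structured transcript, each with its own
# (hypothetical) URL.
SECTIONS = {
    "https://example.com/episodes/42#rate-negotiation":
        "How do I negotiate a freelance rate? Rate negotiation starts with knowing your floor.",
    "https://example.com/episodes/42#chasing-invoices":
        "Chasing invoices: when a freelance client pays late, follow up in writing.",
}


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that isn't a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, dict[str, int]]:
    """Inverted index: word -> {url: number of times the word appears at that url}."""
    index: dict[str, dict[str, int]] = defaultdict(dict)
    for url, text in docs.items():
        for word in tokenize(text):
            index[word][url] = index[word].get(url, 0) + 1
    return dict(index)


def search(index: dict[str, dict[str, int]], query: str) -> list[str]:
    """URLs containing every query word, most occurrences first (a toy ranking)."""
    words = tokenize(query)
    if not words:
        return []
    candidates = set(index.get(words[0], {}))
    for word in words[1:]:
        candidates &= set(index.get(word, {}))
    scores = {url: sum(index[w][url] for w in words) for url in candidates}
    return sorted(scores, key=lambda url: (-scores[url], url))


index = build_index(SECTIONS)
print(search(index, "freelance rate"))
```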

09:49 For a transcript page to show up in a search result, three things have to happen. One, Google has to know the page exists. Two, Google has to have crawled it and indexed it. Three, when somebody searches, Google has to decide that this page is one of the better answers to that search. And each of those steps is its own problem. Each of those steps is its own problem. Step one is easier than it used to be: you submit a sitemap, which is a list of all your pages, to Google directly. Step two depends on Google deciding the page is worth crawling, which depends on whether the domain has been useful in the past, which goes back to the domain authority stuff.
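Step one relies on a sitemap, which is an XML file in the format defined by the sitemaps.org protocol: a `<urlset>` containing one `<url><loc>` entry per page, optionally with a `<lastmod>` date. A minimal sketch using Python's standard library, assuming you already have the list of episode page URLs (the URLs and function name are hypothetical). The finished file can be submitted through Search Console's Sitemaps report or advertised with a `Sitemap:` line in `robots.txt`.

```python
import xml.etree.ElementTree as ET
from typing import Optional

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: list[str], lastmod: Optional[str] = None) -> str:
    """Return a sitemaps.org XML document with one <url><loc> entry per page.

    lastmod, if given, should be a W3C Datetime such as "2024-05-01".
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # ElementTree escapes & and <
        if lastmod:
            ET.SubElement(entry, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


print(build_sitemap(
    ["https://example.com/episodes/41", "https://example.com/episodes/42"],
    lastmod="2024-05-01",
))
```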

10:34 Step three is where everything we've talked about all season actually matters. The structure, the metadata, the transcript quality, the page speed, the links pointing at the page: all of it! And the listener doesn't see any of this? The listener doesn't see any of this. The listener types a question, gets a result, taps the result, lands on a podcast... They have no idea any of this happened. Okay, now I want to bring it back to the domain conversation, because we did the philosophical version of this in episode 2 (whose house are you building?) and now I want to do the technical version. Embraced. Why does the same transcript on the same machine, hosted on two different domains, perform differently in search? Because the domains have different histories.

CHAPTER 06 / 10 Discussion

Domain Authority and Podcast Transcript Hosting

The domain where a podcast transcript is hosted significantly impacts its search performance. Domains with a long history, reputable links, and strong engagement signals accumulate "authority" over time. New pages on high-authority domains rank faster and higher. When transcripts are hosted on a podcast platform's default subdomain, the show benefits less from this authority, as it's spread across all podcasts on that platform. Hosting on a custom domain builds authority for the podcaster's own property.

domain authority· search performance· podcast transcripts· hosting platforms· subdomains

11:25 different histories, different what we called "authority" in episode 2. And the way that translates technically is that Google has been crawling some domains for 15 years. It has data on every page they've ever published. Some of those pages have been linked to from other reputable places. Some of them have been read by humans for a long time. Engagement signals, all of that, builds up. And when a new page goes up on that domain, it inherits the credibility. So a new page on a high-history domain ranks faster than the same page on a new domain? Faster and higher! This is the bit that's relevant for the podcaster: when your transcripts are hosted on a hosting platform's default subdomain, you don't get any of that. The hosting platform has a domain. The platform's domain has authority.

12:18 But that authority is spread across all the podcasts they host. Your specific show on its specific subdomain is a tenant, not a property owner. Right! Whereas if the transcripts live on your own domain (yourshowname.com/episode/whatever), every page that goes up there is building authority for your domain, not somebody else's. Years from now, your domain has its own history. Its own credibility. Its own search performance. This is the CNAME conversation again... This is the CNAME conversation again, from episode 2. But I want to do the technical version now, because the question I left hanging in episode 2 was: how is this page actually served?

13:04 Mechanically, how does the listener type archive.myshowname.com/episode/whatever and end up on a page hosted by somebody else? Walk me through it. When somebody types a URL into their browser, the first thing that happens is a DNS lookup. DNS, the Domain Name System, is roughly the phone book of the internet. It translates archive.myshowname.com into the address of a server. A CNAME record in your DNS is a kind of redirect. It says: when somebody looks up archive.myshowname.com, send them to a different address, the address of the service hosting your transcripts. So the URL the listener sees is yours. The actual server they hit is somebody else's. Exactly. And here's the important bit: the search engine doesn't really care about the server, it cares about the URL.
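Strictly speaking, a CNAME record is an alias rather than an HTTP redirect: the browser never changes what's in the address bar, it only learns which server to connect to. That is why the URL stays yours. The toy model below follows CNAME aliases the way a resolver does until it reaches an address record; the hostnames are hypothetical and the dictionary stands in for real DNS, so this is not a DNS client. In practice the hosting service also has to be configured to answer for your hostname (and to serve an HTTPS certificate for it), because the browser still asks that server for archive.myshowname.com.

```python
from urllib.parse import urlsplit

# Hypothetical DNS data. The CNAME row corresponds to a zone-file record like:
#   archive.myshowname.com.  3600  IN  CNAME  shows.hostingplatform.example.
# 203.0.113.10 is from an address range reserved for documentation examples.
RECORDS = {
    ("archive.myshowname.com", "CNAME"): "shows.hostingplatform.example",
    ("shows.hostingplatform.example", "A"): "203.0.113.10",
}


def resolve(name: str, records=RECORDS, max_hops: int = 8):
    """Follow CNAME aliases until an A (IPv4 address) record is found.

    Returns (address, chain_of_names_visited). A toy model of a resolver.
    """
    chain = [name]
    for _ in range(max_hops):
        if (name, "A") in records:
            return records[(name, "A")], chain
        if (name, "CNAME") not in records:
            raise LookupError(f"no A or CNAME record for {name}")
        name = records[(name, "CNAME")]
        chain.append(name)
    raise LookupError("too many CNAME hops")


url = "https://archive.myshowname.com/episode/whatever"
host = urlsplit(url).hostname
address, chain = resolve(host)
print("Address bar still shows:", url)
print("Browser connects to:", address, "via", " -> ".join(chain))
print("Host header it sends:", host)
```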

CHAPTER 07 / 10 Discussion

CNAME Records and Domain Ownership for SEO

A CNAME record in a domain's DNS acts as an alias (the hosts call it "a kind of redirect"), allowing a custom URL (e.g., `archive.myshowname.com`) to point to a server hosted by a third-party service. While the actual server is external, the URL seen by the listener and, crucially, by search engines, remains the podcaster's own domain. This ensures that search authority and compound interest over time accrue to the podcaster's domain, rather than the hosting platform's.

CNAME record· DNS lookup· domain ownership· URL· search engine indexing· authority accrual

14:01 The URL is yours, so the page indexes against your domain. The authority accrues to your domain; the compound interest over years is yours. And without that? You're just renting! You're just renting... and there's a related thing. Once the transcripts are on your domain, you can connect the domain to Google Search Console. We talked about this in episode 2, but I want to do the technical version now too. Go. Search Console is Google's free tool. It tells you, for any domain you control, which queries are bringing people to your pages, which pages are being clicked, which pages are appearing in search results but not being clicked,

CHAPTER 08 / 10 Discussion

Google Search Console for Podcast Discoverability

Google Search Console is a free tool that provides critical data for any domain owner, including which search queries lead to their pages, page click-through rates, average search positions, and pages appearing in results but not clicked. This data is invaluable for understanding discoverability and optimizing content. However, this tool is only accessible for domains a user can prove they control, meaning platform-hosted transcripts cannot leverage this data directly.

Google Search Console· domain control· search queries· impressions· click-through rate· podcast metrics

14:44 and what your average position is for any given query. It is, without exaggeration, the most useful piece of free software a podcaster who cares about discoverability can use, and you can only use it for domains you can prove you control. Which means platform-hosted transcripts... You can't connect them to Search Console. The data exists. Google is generating it, but you just can't see it. That's the bit that bothered me most in episode 2. I remember. It bothered you because it was the part where there was actual money on the table. Or actual data, anyway. Right. PodHerd, on the higher tier, has the Search Console integration set up, so you can plug your domain in and they show you the metrics directly, which is, I'll be honest, the thing that made the biggest difference for me.
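Once that data is visible (for example, exported from Search Console's Performance report), one simple check is the one described above: pages that appear in search results but aren't being clicked. A sketch, assuming rows already exported with page, query, clicks, impressions and average position; the `Row` class, the thresholds and the sample rows are all hypothetical, and no Search Console API calls are shown.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """One page/query row, with the metrics Search Console's Performance report shows."""
    page: str
    query: str
    clicks: int
    impressions: int
    position: float  # average position; 1.0 is the top of the results

    @property
    def ctr(self) -> float:
        """Click-through rate: clicks / impressions (0.0 if never shown)."""
        return self.clicks / self.impressions if self.impressions else 0.0


def seen_but_not_clicked(rows, min_impressions=100, max_ctr=0.01, max_position=10.0):
    """Rows that rank reasonably high and are shown often, yet are rarely clicked.

    The thresholds are arbitrary starting points, not recommendations.
    """
    flagged = [r for r in rows
               if r.impressions >= min_impressions
               and r.ctr <= max_ctr
               and r.position <= max_position]
    return sorted(flagged, key=lambda r: r.impressions, reverse=True)


# Made-up rows for illustration only.
rows = [
    Row("/episodes/42", "negotiate freelance rate", clicks=2, impressions=400, position=6.3),
    Row("/episodes/41", "chasing invoices", clicks=30, impressions=350, position=3.1),
    Row("/episodes/40", "podcast sitemap", clicks=0, impressions=20, position=8.0),
]
for r in seen_but_not_clicked(rows):
    print(f"{r.page} for '{r.query}': {r.impressions} impressions, "
          f"CTR {r.ctr:.1%}, avg position {r.position}")
```

A flagged row points at an episode whose title or description may not be doing its job for that query; the fix is editorial, the script only narrows down where to look.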

CHAPTER 09 / 10 Discussion

PodHerd Integration and Personal Experimentation

One of the hosts admits that he set up his own podcast feed on PodHerd about three weeks ago. He is on the starter tier, using PodHerd's own domain rather than the CNAME setup, because he is not yet convinced enough to go further. He frames this as an experiment rather than a conversion, and plans to give it three months, after which he will have data to test whether the discoverability claims made throughout the season hold up.

podherd· search console integration· podcast metrics· personal experiment· begrudging conversion

15:36 Not because the metrics changed anything about what I made, but because for the first time I could see what was actually happening. Which episodes were earning impressions? Which queries were bringing people in? Which episodes were ranking but not being clicked, which probably meant the title or the description wasn't doing its job? Okay. This is the admission. Oh! I told you I'd have one. Tell me. I set up my own feed on PodHerd about three weeks ago. Tom... I did the thing. You did the...? I did the thing. On the starter tier, because I'm not yet ready to admit I'm convinced enough to use the CNAME version, I'm using their domain, podherd.com/myshow, which I'm fine with for now. Tom...

16:30 I'm not converted. I'm experimenting. You are converted. You're experimenting because you're convinced! I am experimenting because I want to see if the things you've been saying for seven episodes are actually true, and the cheapest way to find out is... to try it. That is the most begrudging way anybody has ever told me they did a thing. I'm a begrudging man. How long are you gonna give it? Three months. Three months, and then I'll have data. And then either I'll have to admit you were right, or I'll have ammunition for the rest of my life. Either way, you'll have data. Either way, I'll have data! This is genuinely a big moment for the show... Don't make a big deal of it! I'm making a big deal of it. The whole season has been pointing at this. I'm sorry I'm peaking in episode 8... It's the right episode for it. Next week. Next week is the data one. Compounding,

CHAPTER 10 / 10 Discussion

Episode Conclusion and Next Week's Data-Focused Episode

The hosts conclude the episode, highlighting the significance of the personal experiment. The next episode will focus on "compounding" and "long tail traffic," exploring how a podcast's back catalog performs over years, a topic now directly relevant to the host's ongoing data collection.

compounding· long tail traffic· back catalog· podcast experiment· data analysis

17:34 the long-tail traffic story. What happens to a back catalog over years? I want to see that one, because I'm now, mechanically, the subject of that question for myself. You're the subject of the experiment! I'm the subject of my own experiment, which I am not pleased about. You're delighted. I'm not delighted. Open. Open is a big move from where you started! Don't get used to it. Thanks for listening to How to Get Discovered. We'll see ya next week! See ya next week!
