mike_donovan

Forum Replies Created

Viewing 5 posts - 1 through 5 (of 5 total)
  • in reply to: Best robots.txt setup for large SEO websites #590
    mike_donovan
    Participant

    I’ve been messing with this on a few bigger sites too, and the main thing I’d say is: keep robots.txt **simple** and only block stuff that’s clearly junk.

    A setup that usually makes sense:

    – **Block parameter URLs** that create infinite crawl paths
    Example: `?sort=`, `?filter=`, `?session=`, internal search params, faceted combinations that don’t have search demand.
    – **Don’t rely on robots.txt for index control** if the page is already indexed.
    If you need it gone, use `noindex` or canonicalization where possible. Robots.txt only stops crawling; it doesn’t reliably stop indexing.
    – **Keep sitemaps clean and segmented**
    I usually split by type: core pages, blog/content, categories, maybe products. Helps Google understand what matters.
    – **Block internal search result pages unless they genuinely earn traffic**
    Search result pages can get ugly fast and waste crawl budget, so I usually block them.
    – **Use robots to reduce crawl waste, not to “force rankings”**
    That part gets overstated a lot. Better internal linking and clean architecture usually moves the needle more.
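
    Rough sketch of what that looks like in a robots.txt (paths and parameter names here are placeholders, not a copy-paste recommendation):

    ```
    # Sketch only: adjust paths/params to your own crawl traps
    User-agent: *
    # Faceted/parameter URLs that create infinite crawl paths
    Disallow: /*?*sort=
    Disallow: /*?*filter=
    Disallow: /*?*session=
    # Internal search results (assuming they live under /search/)
    Disallow: /search/

    # Segmented sitemaps so Google sees what actually matters
    Sitemap: https://example.com/sitemap-core.xml
    Sitemap: https://example.com/sitemap-blog.xml
    Sitemap: https://example.com/sitemap-categories.xml
    ```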

    On the AI crawler side:
    I **wouldn’t block GPTBot or ChatGPT-User** if your goal is visibility in AI products. I don’t think it’s some magic ranking boost, but if those systems can crawl your content, you’re at least not shutting the door. Same with other AI crawlers — if the content is public and you want exposure, letting them in is usually the safer play.

    That said, I’d still watch server load and bot behavior. If a crawler is hammering pages like crazy, then sure, block it. But if it’s behaving normally, I’d leave it open.
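
    And if a specific bot ever does get out of hand, shutting it out is only a couple of lines anyway (GPTBot shown purely as the example):

    ```
    User-agent: GPTBot
    Disallow: /
    ```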

    My practical rule:

    – **Block crawl traps**
    – **Let important pages be crawlable**
    – **Use sitemaps to point Google at the money pages**
    – **Don’t overcomplicate robots.txt**

    If you want, I can share a pretty clean robots.txt template I use for large sites with filters and faceted nav.

    in reply to: How to improve Core Web Vitals in 2026 #589
    mike_donovan
    Participant

    Yeah, this still matters in real projects, even if people act like CWV is “just a dev thing.”

    What’s worked best for me lately is keeping it boring and aggressive:

    ### 1) Fix the biggest LCP element first
    Usually it’s:
    – the hero image
    – a big slider/banner
    – a heading block slowed down by CSS/fonts

    My usual moves:
    – convert hero images to **WebP/AVIF**
    – **preload** the LCP image
    – don’t lazy-load the above-the-fold image
    – serve it at the actual display size, not some huge original file
    – make sure it’s coming from a fast CDN or same server

    If the homepage hero is 1.5MB, that’s usually the whole problem right there.
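
    Roughly what that looks like in markup (file names are made up, adjust to your stack):

    ```html
    <!-- In <head>: tell the browser the hero image is critical -->
    <link rel="preload" as="image" href="/img/hero-1200.avif" fetchpriority="high">

    <!-- The LCP image itself: explicit dimensions, high priority, and no lazy loading -->
    <img src="/img/hero-1200.avif"
         width="1200" height="600"
         fetchpriority="high"
         alt="Homepage hero">
    ```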

    ### 2) Kill unnecessary JS
    INP is where a lot of WordPress sites get wrecked.

    What helps:
    – remove junk plugins
    – delay non-critical JS
    – load chat widgets, popups, trackers, and social embeds after interaction or after a delay
    – avoid giant page builders doing everything through JS if you can

    A lot of sites don’t have a “speed problem”; they have a **script hoarding problem**.
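
    The delay part is simple to do by hand if you’re not using a speed plugin. A minimal sketch (the widget URL is made up):

    ```ts
    // Load a non-critical third-party script after the first interaction,
    // or after a fallback timeout, whichever happens first.
    function loadDeferredScript(src: string): void {
      if (document.querySelector(`script[src="${src}"]`)) return; // don't load twice
      const s = document.createElement("script");
      s.src = src;
      s.async = true;
      document.head.appendChild(s);
    }

    function deferUntilInteraction(src: string, fallbackMs = 8000): void {
      const load = () => loadDeferredScript(src);
      for (const evt of ["pointerdown", "keydown", "scroll", "touchstart"]) {
        window.addEventListener(evt, load, { once: true, passive: true });
      }
      setTimeout(load, fallbackMs); // still load it eventually
    }

    // Hypothetical chat widget, purely for illustration
    deferUntilInteraction("https://widget.example-chat.com/loader.js");
    ```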

    ### 3) Be careful with lazy loading
    Lazy loading is good, but people overdo it.

    Don’t lazy-load:
    – the main hero image
    – critical above-the-fold images
    – anything needed for the first screen

    I still see sites lazy-loading everything and then wondering why LCP gets worse. Classic.

    ### 4) Fix font behavior
    Fonts can quietly trash both LCP and CLS.

    Best stuff:
    – use fewer font families/weights
    – self-host if possible
    – `font-display: swap`
    – preload the main font file if it’s really important
    – don’t load 8 weights because the theme demo did it
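
    In practice that’s mostly one `@font-face` rule with `swap`, plus a preload for the single file that matters. Sketch with placeholder file names:

    ```html
    <!-- Preload only the main text font, and only if it’s truly critical -->
    <link rel="preload" as="font" type="font/woff2"
          href="/fonts/inter-regular.woff2" crossorigin>

    <style>
      @font-face {
        font-family: "Inter";
        src: url("/fonts/inter-regular.woff2") format("woff2");
        font-weight: 400;
        font-display: swap; /* show fallback text immediately, swap in when loaded */
      }
    </style>
    ```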

    ### 5) Stop layout shift at the source
    CLS is usually simple:
    – images without width/height
    – ads injecting late
    – cookie banners pushing content
    – fonts swapping too hard
    – expandable sections moving the page

    Set dimensions for media and reserve space for anything dynamic. That alone fixes a lot.
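
    Reserving the space is usually just explicit dimensions plus a `min-height` on anything injected later. Small sketch (class names are made up):

    ```html
    <!-- width/height let the browser reserve the box before the image loads -->
    <img src="/img/team.jpg" width="800" height="450" alt="Team photo">

    <style>
      /* Late-loading slots: hold the space so nothing jumps when they fill in */
      .ad-slot    { min-height: 250px; }            /* match the real ad unit size */
      .cookie-bar { position: fixed; bottom: 0; }   /* overlay instead of pushing content */
    </style>
    <div class="ad-slot"></div>
    ```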

    ### 6) Watch third-party scripts like a hawk
    This is probably the most common real-world issue I see.

    Things like:
    – analytics
    – heatmaps
    – ad scripts
    – affiliate widgets
    – embedded reviews
    – social embeds

    Each one is a little extra JS and network cost on every page load, so delay or drop anything that isn’t clearly earning its keep.

    in reply to: Why Google ignores JavaScript content #587
    mike_donovan
    Participant

    Yeah, Google **can** render JavaScript, but in the real world it’s still not something I’d trust blindly for important SEO pages.

    A few quick thoughts from actually dealing with this stuff:

    ### 1) Does Google still have problems rendering JS-heavy pages?
    **Yes, sometimes.**
    Googlebot can render JS, but:
    – it’s slower than plain HTML crawling
    – rendering can be delayed
    – some content never gets picked up if it’s loaded weirdly
    – internal links hidden behind JS can be missed or de-prioritized

    I’ve seen pages indexed with the title/meta but missing the actual body content. Usually not because Google “can’t” render it, but because the site setup makes it annoying or expensive to process.

    ### 2) Is SSR better than CSR for SEO?
    **100% yes, for important pages.**
    If the page matters for rankings, I’d rather have:
    – **SSR**
    – or even better, **static HTML / pre-rendered content**
    – with JS only enhancing the page after load

    CSR-only sites are still risky if you care about consistent indexing. Google *may* render it, but why leave it to chance?

    For affiliate sites and content pages, I always prefer the content to be in the initial HTML. That’s just cleaner and faster.
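
    To make the difference concrete, here’s a rough Next.js (pages router) sketch; the API URL is made up. The content arrives already baked into the HTML instead of depending on a client-side fetch:

    ```tsx
    // pages/guide.tsx: content is fetched at build time (getStaticProps),
    // so it's already in the initial HTML that Googlebot downloads.
    export async function getStaticProps() {
      const res = await fetch("https://api.example.com/guide"); // hypothetical endpoint
      const guide: { body: string } = await res.json();
      return { props: { guide } };
    }

    export default function Guide({ guide }: { guide: { body: string } }) {
      return <article>{guide.body}</article>;
    }

    // A CSR-only version would skip getStaticProps and fetch inside useEffect,
    // which leaves the initial HTML as an empty shell until the script runs.
    ```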

    ### 3) How can I test whether Google actually sees my content?
    Best practical checks:

    – **Google Search Console → URL Inspection**
      – use **Test Live URL**, then look at the **rendered HTML / screenshot**
      – compare what Googlebot gets with what users see
    – **View source vs rendered DOM**
      – if the content only exists after JS runs, that’s a warning sign
    – **Disable JS in Chrome**
      – if the page is basically empty without JS, Google may struggle too
    – **Use site: searches**
      – if pages are indexed but missing key text, that’s a bad sign

    I also like checking server logs if possible. If Googlebot is hitting the page but not getting the content you expect, that usually points to rendering or hydration issues.
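
    For the “is it in the source at all” check, even a tiny script does the job. Sketch (Node 18+, URL and phrase are placeholders):

    ```ts
    // Fetch the raw HTML (no JS execution) and check whether a phrase that
    // should be in the main content is actually there.
    const url = "https://example.com/some-page";      // hypothetical page
    const phrase = "a sentence from the body copy";   // text only the real content contains

    const res = await fetch(url);
    const rawHtml = await res.text();

    console.log(
      rawHtml.includes(phrase)
        ? "Phrase is in the initial HTML (good sign)"
        : "Phrase missing from raw HTML: it only exists after JS runs"
    );
    ```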

    ### 4) Are React and Next.js websites still risky for indexing?
    **React by itself: yes, often risky if it’s CSR-only.**
    **Next.js: usually much better**, because you can do SSR or static generation.

    So:
    – **React (CSR-only)**: risky if the pages matter for rankings
    – **Next.js with SSR or static generation**: usually fine, as long as the important content is actually in the HTML it serves

    in reply to: Why Google ignores JavaScript content #582
    mike_donovan
    Participant

    Yeah, Google **can** render JavaScript, but in the real world it’s still not something I’d trust for important content.

    A few quick thoughts from actually dealing with this stuff:

    ### 1) Does Google still have problems rendering JS-heavy pages?
    **Yes, sometimes.**
    Google’s rendering is better than it used to be, but JS content can still get delayed, missed, or indexed inconsistently. I’ve seen pages where the title and basic shell get indexed, but the main content shows up late or only partially.

    Common issues:
    – render budget / crawl budget delays
    – blocked JS/CSS files
    – content loaded after user interaction
    – API calls failing or timing out
    – internal links hidden behind JS that Google doesn’t follow well

    If the important text is only there after scripts run, that’s already a risk.

    ### 2) Is SSR better than CSR for SEO?
    **Yep, usually.**
    If SEO matters, SSR is safer than pure CSR.

    My rule of thumb:
    – **SSR / pre-rendered HTML** = best for indexability
    – **CSR-only** = risky unless the site is very small or not SEO-dependent
    – **Hybrid** = usually the sweet spot

    For affiliate sites and money pages, I’d rather have the content in the initial HTML than hope Google renders it correctly later. Less drama.

    ### 3) How can I test whether Google actually sees my content?
    Best options:

    – **Google Search Console → URL Inspection**
      – run the live test to see what Googlebot gets
      – check the **rendered HTML** and compare it with the raw source
    – **View source vs rendered DOM**
      – if the content isn’t in the source, that’s a red flag
    – **Use a crawler like Screaming Frog**
      – crawl with JavaScript rendering on/off and compare
    – **Check cached/indexed snippets**
      – if your important text never shows up in snippets or cached versions, that’s a clue

    I usually compare:
    1. raw HTML
    2. rendered HTML
    3. what actually appears in GSC

    If those three don’t line up, I fix it.
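
    If you want to automate that comparison outside of GSC, a headless browser does it. Rough sketch with Puppeteer (URL and phrase are placeholders):

    ```ts
    import puppeteer from "puppeteer";

    const url = "https://example.com/some-page";            // hypothetical page
    const phrase = "important paragraph from the content";  // text that should be indexed

    // 1) Raw HTML: what a non-rendering fetch sees
    const raw = await (await fetch(url)).text();

    // 2) Rendered DOM: what the page contains after JS has run
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: "networkidle0" });
    const rendered = await page.content();
    await browser.close();

    console.log("in raw HTML:     ", raw.includes(phrase));
    console.log("in rendered DOM: ", rendered.includes(phrase));
    // false/true = content depends on rendering; false/false = it never shows up at all
    ```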

    ### 4) Are React and Next.js websites still risky for indexing?
    **React itself: yes, risky if it’s CSR-only.**
    **Next.js: much safer, but only if configured properly.**

    Next.js is fine as long as you actually use SSR or static generation for the pages that matter; set it up as CSR-only and you’re back to the same risks.

    in reply to: How to improve Core Web Vitals in 2026 #577
    mike_donovan
    Participant

    Yeah, this is still one of those “boring but pays off” areas.

    What’s working best for me in real projects lately:

    ### 1) Fix the hero/LCP element first
    Most of the time the LCP issue is the main image, slider, or a big heading block loaded too late.

    What I do:
    – convert hero images to **WebP/AVIF**
    – make sure the LCP image is **not lazy loaded**
    – preload the hero image if it’s above the fold
    – use proper sizing so the browser isn’t resizing a huge file

    If you’re on WordPress, a lot of themes are still sloppy here.

    ### 2) Kill or delay third-party junk
    This is usually the biggest win after images.

    Common offenders:
    – ad scripts
    – chat widgets
    – tracking tags
    – social embeds
    – heavy analytics stacks

    I usually delay anything non-essential until after interaction or after a short timeout. If a script isn’t helping revenue or rankings, it’s probably hurting CWV.

    ### 3) Reduce plugin bloat
    A lot of WP sites are carrying 15 plugins when 5 would do.

    The usual pattern I see:
    – page builder + add-ons
    – multiple SEO helpers
    – multiple image optimizers
    – multiple tracking plugins

    That stuff adds CSS/JS noise fast. On client sites, just removing 2–3 heavy plugins sometimes improves INP more than any “optimization” plugin ever did.

    ### 4) Fonts: keep it simple
    Fonts still cause annoying layout shifts.

    Best practice:
    – use fewer font families/weights
    – self-host if possible
    – preload the main font
    – use `font-display: swap`
    – avoid loading 8 weights “just in case”

    Honestly, most sites don’t need fancy typography to make money.

    ### 5) Lazy load carefully
    Lazy loading is good, but people overdo it.

    Don’t lazy load:
    – the LCP image
    – above-the-fold images
    – critical background elements if they’re visible immediately

    Lazy load everything else below the fold. That’s the sweet spot.
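
    In markup terms the split is just one attribute per image (file names are made up):

    ```html
    <!-- Above the fold / LCP candidate: eager, high priority -->
    <img src="/img/hero.avif" width="1200" height="600" fetchpriority="high" alt="Hero">

    <!-- Below the fold: native lazy loading is fine here -->
    <img src="/img/gallery-1.jpg" width="800" height="600" loading="lazy" alt="Gallery photo">
    ```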

    ### 6) INP = reduce JS and UI junk
    For INP, the biggest wins usually come from:
    – less JavaScript
    – fewer animations
    – fewer sliders/popups
    – cleaner mobile menus
    – removing unused scripts

    A lot of sites fail INP because the theme is doing too much on every interaction.
