TL;DR
I just released generate-sitemap 1.10.0, a GitHub Action for generating XML sitemaps for static websites. The generate-sitemap GitHub Action is implemented in Python, and generates an XML sitemap by crawling the GitHub repository containing the html of the site, using commit dates to generate <lastmod> tags in the sitemap.
This release, generate-sitemap 1.10.0, introduces an option to specify a list of directories and/or individual files to exclude from the sitemap. The Action already automatically excluded individual html files from the sitemap if a noindex directive to robots was specified in the head of the page, and also applied exclusions based on the contents of the site's robots.txt. This release adds the ability to specify additional paths to exclude from the sitemap. The motivating case came from a feature request from a user who wanted to exclude a directory of content shared across multiple pages from the sitemap (e.g., the pages that depend on that content should be in the sitemap, but not necessarily the shared html).
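To make the exclusion behavior concrete, here is a minimal Python sketch of how a list of excluded directories and individual files could be applied to a set of candidate pages. This is not the Action's actual source code: the function names `is_excluded` and `filter_paths`, and the convention that paths are written relative to the site root with a leading slash, are assumptions made for illustration.

```python
from pathlib import PurePosixPath


def is_excluded(path, exclude_paths):
    """Return True if path matches an excluded file or lies inside an excluded directory.

    Hypothetical helper: both arguments use paths relative to the site root.
    """
    candidate = PurePosixPath("/" + path.lstrip("/"))
    for entry in exclude_paths:
        excluded = PurePosixPath("/" + entry.strip("/"))
        # Exact match covers individual files; the parents check covers directories.
        if candidate == excluded or excluded in candidate.parents:
            return True
    return False


def filter_paths(paths, exclude_paths):
    """Keep only the paths that are not covered by any exclusion entry."""
    return [p for p in paths if not is_excluded(p, exclude_paths)]


pages = ["/index.html", "/shared/header.html", "/shared-stuff/a.html", "/about.html"]
print(filter_paths(pages, ["/shared", "/about.html"]))
# prints ['/index.html', '/shared-stuff/a.html']
```

Comparing whole path components rather than string prefixes means that excluding /shared does not accidentally exclude /shared-stuff.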
Changelog 1.10.0 – 2023-11-15
Added
- Ability to specify a list of paths to exclude from the sitemap, via the new input exclude-paths.
Dependencies
More Information
Please consider starring generate-sitemap's GitHub repository:
Generate an XML sitemap for a GitHub Pages site using GitHub Actions
Check out all of our GitHub Actions: https://actions.cicirello.org/
The generate-sitemap GitHub action generates a sitemap for a web site hosted on GitHub Pages, and has the following features:
- Support for both xml and txt sitemaps (you choose using one of the action's inputs).
- When generating an xml sitemap, it uses the last commit date of each file to generate the <lastmod> tag in the sitemap entry. If the file was created during that workflow run, but not yet committed, then it instead uses the current date (however, we recommend if possible committing newly created files first).
- Supports URLs for html and pdf files in the sitemap, and has inputs to control the included file types (defaults include both html and pdf files in the sitemap).
- Now also supports including URLs for a user specified list of additional file extensions in the sitemap.
- …
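The lastmod rule described in the list above (use the last commit date, and fall back to the current date for files not yet committed) can be sketched in Python as follows. This is an illustrative sketch rather than the Action's own implementation: the function names are hypothetical, and it assumes git is on the PATH and the script runs from inside the repository.

```python
import subprocess
from datetime import datetime, timezone
from xml.sax.saxutils import escape


def last_commit_date(path):
    """Return the ISO 8601 committer date of the last commit touching path, or None."""
    try:
        result = subprocess.run(
            ["git", "log", "-1", "--format=%cI", "--", path],
            capture_output=True, text=True, check=False,
        )
    except OSError:
        return None  # git is not available
    date = result.stdout.strip()
    return date if result.returncode == 0 and date else None


def lastmod_for(path, commit_date_lookup=last_commit_date, now=None):
    """Use the last commit date if there is one, otherwise the current date."""
    date = commit_date_lookup(path)
    if date is None:
        now = now or datetime.now(timezone.utc)
        date = now.replace(microsecond=0).isoformat()
    return date


def sitemap_entry(url, lastmod):
    """Format a single sitemap <url> element, escaping XML special characters."""
    return f"<url>\n<loc>{escape(url)}</loc>\n<lastmod>{lastmod}</lastmod>\n</url>"
```

The lookup is passed in as a parameter so that the fallback logic can be exercised without a real repository, for example `lastmod_for("new.html", commit_date_lookup=lambda p: None)`.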
For more information, see my earlier post about generate-sitemap here on DEV, as well as its webpage.
The generate-sitemap GitHub action generates a sitemap for a web site hosted on GitHub Pages. Supports both xml and txt sitemaps. Uses the last commit date of each file to generate the lastmod tags in XML sitemaps. Parses robots.txt and scans html files for noindex directives, excluding URLs if noindex directives or disallows are found.
actions.cicirello.org
Where You Can Find Me
Follow me here on DEV and on GitHub:

Or visit my website:
Vincent A. Cicirello – Professor of Computer Science at Stockton University – is a
researcher in artificial intelligence, evolutionary computation, swarm intelligence,
and computational intelligence, with a Ph.D. in Robotics from Carnegie Mellon
University. He is an ACM Senior Member, IEEE Senior Member, AAAI Life Member,
EAI Distinguished Member, and SIAM Member.
cicirello.org