There are certain features you always have to implement when a project nears its end and going live comes into view.
robots.txt is one of them.
In general, robots.txt tells search engines which parts of the website they may crawl and which they may not. That does not mean every search engine respects it. If you want to read more: https://developers.google.com/search/docs/crawling-indexing/robots/intro
Luckily, Headless SXA provides this feature out of the box, so you don't have to implement anything yourself.
Within the Settings item of your site you will find the Robots content field in the Robots section. You can enter basically anything here.
When the field is blank, calling your site's /robots.txt returns the following:
User-agent: *
Disallow: /
Sitemap: http://xmcloudcm.localhost/sitemap.xml
When you add a string to the field, such as "This is my robots content", the response looks like this after saving the item:
This is my robots content
Sitemap: http://xmcloudcm.localhost/sitemap.xml
As you can see, the default is overwritten by the value I provided. Only the reference to the sitemap.xml is kept.
Note: This is just an example and not useful robots content.
When you are not running locally but in a cloud setup, don't forget to publish the item so that the change becomes effective on your rendering host.
The API route that takes care of returning the content of the field can be found at this path:
\src\Project\Sugcon\SugconAnzSxa\src\pages\api\robots.ts
(Code Example taken from https://github.com/Sitecore/XM-Cloud-Introduction)
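The route fetches the robots.txt content for the site via GraphQL using the GraphQLRobotsService from the JSS Next.js SDK. A minimal sketch of such an API route, assuming the standard JSS Next.js configuration values (temp/config with graphQLEndpoint, sitecoreApiKey and jssAppName), could look like this; it is not a copy of the repository's exact code:

import type { NextApiRequest, NextApiResponse } from 'next';
import { GraphQLRobotsService } from '@sitecore-jss/sitecore-jss-nextjs';
import config from 'temp/config';

const robotsApi = async (_req: NextApiRequest, res: NextApiResponse): Promise<void> => {
  // Serve the result as plain text so crawlers can parse it
  res.setHeader('Content-Type', 'text/plain');

  // The service queries the GraphQL endpoint for the site's robots.txt content,
  // which combines the Robots content field with the generated sitemap reference
  const robotsService = new GraphQLRobotsService({
    endpoint: config.graphQLEndpoint,
    apiKey: config.sitecoreApiKey,
    siteName: config.jssAppName,
  });

  const robotsResult = await robotsService.fetchRobots();

  res.status(200).send(robotsResult);
};

export default robotsApi;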
This is configured here:
\src\Project\Sugcon\SugconAnzSxa\src\lib\next-config\plugins\robots.js
(Code Example taken from https://github.com/Sitecore/XM-Cloud-Introduction)
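The plugin's job is to make /robots.txt resolve to the API route above. A minimal sketch of such a next-config plugin, assuming the usual JSS plugin pattern of wrapping the Next.js config and adding a rewrite (again not the repository's exact code), might look like this:

const robotsPlugin = (nextConfig = {}) => {
  return Object.assign({}, nextConfig, {
    async rewrites() {
      // Keep any rewrites defined by other plugins and add the robots.txt rewrite
      const existingRewrites = nextConfig.rewrites ? await nextConfig.rewrites() : [];
      return [
        ...existingRewrites,
        {
          source: '/robots.txt',
          destination: '/api/robots',
        },
      ];
    },
  });
};

module.exports = robotsPlugin;

With this in place, a request to /robots.txt is transparently served by the /api/robots route, so there is no static robots.txt file in the project.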
Created: 4.10.2022
XM Cloud NextJs JSS SXA Headless SXA