Created: 1 Apr 2025, last update: 1 Apr 2025
Content Migration Tip 3 - Handling Illegal Characters in Sitecore Serialization
This is Part 3 of my series on Sitecore CLI serialization. While the previous posts covered duplicate item names and clone issues, this time we focus on illegal characters in item names, a problem that can disrupt the serialization process.
Not all characters are allowed in a Sitecore item name. Although this is configurable through the InvalidItemNameChars and ItemNameValidation settings, certain characters still cause unexpected issues during serialization. The slash (/) is a well-known problem, but surprisingly even the tab character (\t), which Sitecore does not consider illegal by default, can break file serialization started from the Developer tab in Sitecore. CLI serialization, by contrast, handles the tab character without problems.
Even if illegal characters don’t immediately break content migration, replacing them proactively prevents future issues.
The Slash (/) Character Issue
When an item name contains a /, Sitecore should replace it with # when serializing, resulting in a filename like Sample#Item.yml. Instead, the slash is treated as a path separator: Sitecore creates a directory named Sample and writes the serialized item below it (the output below shows it ending up at ~\Home\Sample\Item.yml), which leads to filesystem integrity issues. The first serialization pull does not detect the problem; it only surfaces on a subsequent pull, when filesystem integrity validation fails.
Example of the issue:
PS C:\projects\itemnametest> dotnet sitecore ser pull
[master] [A] /sitecore/content/Home/Sample/Item (6ee553d2-f96e-40ba-b130-a0d6b2872a3a)
[master] Discovered 1 changes after evaluating 194 total items.
[master] Applying changes...
[master] Changes have been applied to serialized items (1 subtrees).
[roles] Discovered 0 changes after evaluating role list.
[users] Discovered 0 changes after evaluating user list.
PS C:\projects\itemnametest> dotnet sitecore ser pull
[master] [/sitecore/content/home] INVALID PATH: Item ~/Sample/Item did not match parent item path,
[master] [/sitecore/content/home] ~/
[master] [/sitecore/content/home] > MANUAL FIX: Item path in ~\Home\Sample\Item.yml should be updated to
[master] [/sitecore/content/home] /sitecore/content/Home/Item
Discovered changes will not be applied.
Filesystem integrity validation found errors in serialized files! Use the validate command to fix these.
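To see why a slash inside an item name is ambiguous, it helps to look at how the resulting path splits into segments. The following is a minimal sketch in plain PowerShell. It does not show how the Sitecore CLI works internally; the variable names are only for illustration and the values mirror the example above.

```powershell
# Illustration only: values mirror the example output above.
$parentPath = "/sitecore/content/Home"
$itemName   = "Sample/Item"   # ONE item whose name contains a slash

# Joining parent path and name yields a path with one extra segment.
$fullPath = "$parentPath/$itemName"
$segments = $fullPath.Trim("/") -split "/"

# Anything that only sees the path assumes the last segment is the item
# and everything before it is the parent.
$apparentParent = "/" + ($segments[0..($segments.Count - 2)] -join "/")
$apparentName   = $segments[-1]

Write-Host "Full path:       $fullPath"
Write-Host "Real parent:     $parentPath"
Write-Host "Apparent parent: $apparentParent"
Write-Host "Apparent name:   $apparentName"
```

The real parent and the apparent parent differ, which is the same kind of mismatch the INVALID PATH message reports.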
The Tab (\t) Character Issue
When using built-in serialization via the Developer tab in the Sitecore Content Editor, the tab character can cause a crash. Below is an example of the error recorded in the logs:
ManagedPoolThread #1 21:06:48 INFO Serializing master/sitecore/content/Home/Sample Item tab
ManagedPoolThread #1 21:06:48 ERROR Exception
Exception: System.Reflection.TargetInvocationException
Message: The target of an invocation has caused an exception.
Source: mscorlib
at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor)
at System.Reflection.RuntimeMethodInfo.UnsafeInvokeInternal(Object obj, Object[] parameters, Object[] arguments)
at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
at Sitecore.Reflection.ReflectionUtil.InvokeMethod(MethodInfo method, Object[] parameters, Object obj)
at Sitecore.Jobs.JobRunner.RunMethod(JobArgs args)
at (Object , Object )
at Sitecore.Pipelines.CorePipeline.Run(PipelineArgs args)
at Sitecore.Pipelines.DefaultCorePipelineManager.Run(String pipelineName, PipelineArgs args, String pipelineDomain, Boolean failIfNotExists)
at Sitecore.Pipelines.DefaultCorePipelineManager.Run(String pipelineName, PipelineArgs args, String pipelineDomain)
at Sitecore.Jobs.DefaultJob.DoExecute()
at Sitecore.Abstractions.BaseJob.ThreadEntry(Object state)
Nested Exception
Exception: System.ArgumentException
Message: Invalid characters in path.
Source: mscorlib
at System.IO.Path.CheckInvalidPathChars(String path, Boolean checkAdditional)
at System.IO.Path.IsPathRooted(String path)
at Sitecore.Data.Serialization.PathResolver.MapItemPath(String itemPath, String root)
at Sitecore.Data.Serialization.Default.DefaultItemStorageProvider.GetFileIdentifier(Item objectToStore)
at Sitecore.Data.Serialization.Default.DefaultItemSerializationManager.DumpItem(Item item)
at Sitecore.Shell.Framework.Commands.Serialization.DumpItemCommand.Dump(Item item)
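Because a tab is invisible in the Content Editor and in most log viewers, it can help to make it visible before hunting for it. Below is a minimal sketch in plain PowerShell. Get-ControlCharacter is a hypothetical helper (not part of Sitecore or SPE), and the sample name is an assumption based on the log line above.

```powershell
# Hypothetical helper: lists the control characters found in a name.
function Get-ControlCharacter {
    param([Parameter(Mandatory)][string]$Name)

    foreach ($ch in $Name.ToCharArray()) {
        if ([char]::IsControl($ch)) {
            # Emit the code point so the invisible character becomes visible.
            "U+{0:X4}" -f [int]$ch
        }
    }
}

# Assumed sample name; `t is PowerShell's escape sequence for a tab.
$name  = "Sample Item`ttab"
$found = @(Get-ControlCharacter -Name $name)

Write-Host "Control characters found: $($found -join ', ')"
```

A tab is reported as U+0009. The same check also catches other control characters, such as line breaks, that may have been pasted into a name.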
A High-Performance Approach to Detecting and Fixing Illegal Characters
To handle this efficiently, I developed a PowerShell script that scans for illegal characters and replaces them in item names. Like my script for detecting duplicate items, it queries the database directly for maximum speed, which also ensures that no items are missed due to search index filtering. Because the Items table does not store item paths, the query covers the entire database and the script filters by path only after retrieving each item.
This method works even for large databases: handling over a million items is not a problem, assuming that only a few thousand of them contain illegal characters.
The query checks for invalid characters like / \ : ? " < > | [ ] and for hidden issues like tab (\t), which can break Sitecore’s built-in serialization.
The script calls [Sitecore.Data.Items.ItemUtil]::ProposeValidItemName(), a Sitecore API that suggests a valid item name. ProposeValidItemName respects the Sitecore configuration for invalid characters, specifically the InvalidItemNameChars setting in Sitecore.config. Make sure that setting includes every character you want to remove; otherwise, update the script to replace the remaining characters manually (as it does for tab).
The path to scan is configurable (default: /sitecore/content), so you can adjust it if you need to include additional sections, such as the Media Library.
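Import-Function -Name Invoke-SqlCommand

# Configurable path: only items below this path are renamed
$targetPath = "/sitecore/content/"
$masterconnection = [Sitecore.Configuration.Settings]::GetConnectionString("master")

# Find every item whose name contains a tab or one of: / \ : ? " < > | [ ]
# T-SQL LIKE has no default escape character, so a single backslash is matched as-is.
# [ opens a character class in LIKE, so a literal [ has to be written as [[]
$query = @"
SELECT ID, Name, ParentID, TemplateID
FROM Items
WHERE Name LIKE '%'+CHAR(9)+'%'
OR Name LIKE '%/%'
OR Name LIKE '%\%'
OR Name LIKE '%:%'
OR Name LIKE '%?%'
OR Name LIKE '%"%'
OR Name LIKE '%<%'
OR Name LIKE '%>%'
OR Name LIKE '%|%'
OR Name LIKE '%[[]%'
OR Name LIKE '%]%'
"@

$results = Invoke-SqlCommand -Connection $masterconnection -Query $query

$results | ForEach-Object {
    $itemtocorrect = Get-Item -Path "master:" -ID "{$($_.ID)}"

    # The Items table has no path column, so filter by path after loading the item
    if ($itemtocorrect -and $itemtocorrect.Paths.FullPath.StartsWith($targetPath, [System.StringComparison]::OrdinalIgnoreCase)) {
        Write-Host $itemtocorrect.Paths.FullPath

        # Let Sitecore propose a name based on the InvalidItemNameChars setting
        $itemName = [Sitecore.Data.Items.ItemUtil]::ProposeValidItemName($itemtocorrect.Name)

        # Tab is not illegal by default, so replace it explicitly
        $itemName = $itemName -replace "\t", " "
        Write-Host $itemName

        # Rename the item
        $itemtocorrect.Editing.BeginEdit()
        $itemtocorrect.Name = $itemName
        $itemtocorrect.Editing.EndEdit()
    }
}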
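If InvalidItemNameChars does not cover every character you want to remove, the manual replacement mentioned earlier could be sketched as follows. Get-SafeItemName is a hypothetical helper, not a Sitecore or SPE function, and both the default character list and the choice of a space as replacement are assumptions based on the characters named in this post. It is plain PowerShell, so you can try it outside Sitecore first.

```powershell
# Hypothetical helper: replaces each listed character and tidies up whitespace.
function Get-SafeItemName {
    param(
        [Parameter(Mandatory)][string]$Name,
        # Assumed list: the characters named in this post, plus tab.
        [char[]]$IllegalChars = @('/', '\', ':', '?', '"', '<', '>', '|', '[', ']', "`t"),
        [string]$Replacement = " "
    )

    # Escape each character and combine them into a regex alternation.
    $escaped = $IllegalChars | ForEach-Object { [regex]::Escape([string]$_) }
    $pattern = $escaped -join "|"

    $clean = [regex]::Replace($Name, $pattern, $Replacement)

    # Collapse runs of whitespace and trim, so "a / b" does not become "a   b".
    ($clean -replace "\s+", " ").Trim()
}

$safeName = Get-SafeItemName -Name "Sample/Item`ttab"
Write-Host $safeName
```

In the script above, a call like this could stand in for the tab replacement line. Note that a name consisting only of illegal characters ends up empty, so a real migration needs a fallback name for that case.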
This might be a simpler issue compared to duplicates or clones, but it occurs frequently enough to earn its place in this series. With this script, you have a fast and efficient solution that can handle large databases. I hope it saves you time in your migration projects!
Links
Content Migration Tip 1 - Handling Clones in Sitecore Serialization
Content Migration Tip 2 - Handling Duplicates in Sitecore Serialization
Sitecore content migration - Part 1: Media analysis
Sitecore content migration - Part 2: Media migration
Sitecore content migration - Part 3: Converting content