AI recommended character set for API use
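const ACCOUNT_REF_PATTERN = /^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$/
// Up to 59 chars of user input + a hyphen + 3 random chars = 63 chars max
const accountRef = `${userInput.slice(0, 59)}-${nanoid(3)}`
// A 3-char suffix over a 62-symbol alphanumeric alphabet gives 238,328 combinations (62^3).
// Note: nanoid's default alphabet has 64 symbols (A-Za-z0-9_-), i.e. 262,144 combinations (64^3),
// and its uppercase letters and underscore would not match the lowercase-only pattern above.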
Claude
For account references that need to work across URLs, cookies, and headers, I'd recommend sticking to the unreserved characters from RFC 3986:
Recommended character set for account references:
- A-Z (uppercase letters)
- a-z (lowercase letters)
- 0-9 (digits)
- - (hyphen)
- _ (underscore)
- . (period)
- ~ (tilde)
This gives you 66 characters total and avoids encoding issues across all your use cases.
Characters to avoid by context:
Context | Avoid | Reason |
---|---|---|
URL paths | / ? # [ ] @ | Path separators, query delimiters |
Query params | & = ? # + | Query syntax, encoding issues |
Cookies | ; , = \ space | Cookie syntax delimiters |
Headers | Control chars (0-31, 127), high-bit chars (128-255) | HTTP header restrictions |
General | % + <space> | Percent-encoding conflicts, ambiguity |
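One way to sanity-check the unreserved set is to confirm that it survives percent-encoding unchanged. The sketch below relies only on the standard `encodeURIComponent`; the helper name `needsEncoding` is hypothetical and not part of the original discussion.

```typescript
// The 66 unreserved characters from RFC 3986.
const UNRESERVED =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZ' +
  'abcdefghijklmnopqrstuvwxyz' +
  '0123456789' +
  '-_.~'

// Hypothetical helper: true when the value changes under percent-encoding,
// i.e. it contains a character that encodeURIComponent does not leave alone.
function needsEncoding(value: string): boolean {
  return encodeURIComponent(value) !== value
}

console.log(UNRESERVED.length)             // 66
console.log(needsEncoding(UNRESERVED))     // false
console.log(needsEncoding('acme/team a'))  // true: "/" and space are encoded
```

Note that `encodeURIComponent` also leaves `! * ' ( )` untouched, so this is a quick check rather than a strict test for the unreserved set.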
For account labels, I'd recommend:
- All printable ASCII characters (32-126)
- Exclude: ASCII control characters (0-31, 127)
- Include: Letters, digits, punctuation, symbols, and spaces
- This gives you readable text while avoiding terminal control sequences and non-printable characters
Regex patterns:
// Account reference
const ACCOUNT_REF_PATTERN = /^[A-Za-z0-9._~-]+$/
// Account label
const ACCOUNT_LABEL_PATTERN = /^[\x20-\x7E]+$/
The reference pattern is very conservative but guarantees no encoding headaches. The label pattern allows readable text including spaces while blocking problematic control characters.
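As a minimal sketch, the two patterns could be wrapped in small validators like this. The function names and the 64-character cap on references are assumptions for illustration, not something the recommendation above specifies.

```typescript
// Patterns copied from the section above.
const ACCOUNT_REF_PATTERN = /^[A-Za-z0-9._~-]+$/
const ACCOUNT_LABEL_PATTERN = /^[\x20-\x7E]+$/

// Assumed limit for illustration only.
const MAX_REF_LENGTH = 64

// Hypothetical helper: checks both the character set and the assumed length cap.
function isValidAccountRef(ref: string): boolean {
  return ref.length <= MAX_REF_LENGTH && ACCOUNT_REF_PATTERN.test(ref)
}

// Hypothetical helper: accepts any non-empty run of printable ASCII.
function isValidAccountLabel(label: string): boolean {
  return ACCOUNT_LABEL_PATTERN.test(label)
}

console.log(isValidAccountRef('acme-corp_01'))      // true
console.log(isValidAccountRef('acme corp'))         // false: space is not allowed
console.log(isValidAccountLabel("John's Account"))  // true
console.log(isValidAccountLabel('bad\u0007label'))  // false: BEL control character
```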
Account labels with Emoji
To allow emoji like smiley faces while maintaining security, I'd recommend a whitelist approach with specific Unicode blocks:
Recommended Unicode ranges for account labels:
- Basic Latin (U+0020-U+007E) - Standard printable ASCII
- Emoji blocks:
  - Emoticons (U+1F600-U+1F64F) - 😀 😂 😊 😍
  - Miscellaneous Symbols (U+2600-U+26FF) - ☀ ☁ ⚡ ⚽
  - Dingbats (U+2700-U+27BF) - ✅ ❌ ✨ ❤
  - Transport/Map Symbols (U+1F680-U+1F6FF) - 🚀 🚗 🚲
  - Miscellaneous Symbols and Pictographs (U+1F300-U+1F5FF) - 🌟 🎉 🔥
Security considerations to avoid:
- Text direction controls (U+202A-U+202E, U+2066-U+2069) - Can cause spoofing
- Variation selectors (U+FE00-U+FE0F) - Rendering inconsistencies
- Zero-width characters (U+200B-U+200D) - Invisible characters
- Private use areas (U+E000-U+F8FF) - Undefined behavior
- Surrogates (U+D800-U+DFFF) - Can break string handling
TypeScript implementation:
// Conservative emoji-friendly pattern
const ACCOUNT_LABEL_PATTERN = /^[\x20-\x7E\u{1F600}-\u{1F64F}\u{1F300}-\u{1F5FF}\u{1F680}-\u{1F6FF}\u{2600}-\u{26FF}\u{2700}-\u{27BF}]+$/u
// Or more restrictive - just emoticons
const SIMPLE_EMOJI_PATTERN = /^[\x20-\x7E\u{1F600}-\u{1F64F}]+$/u
Additional validation:
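- Length limits - Emoji in these blocks take 3-4 bytes each in UTF-8 (and 1-2 UTF-16 code units in JavaScript strings), so decide whether limits count characters or bytes
- Normalization - Use String.prototype.normalize('NFC') for consistent representation
- Homograph detection - Consider visually similar characters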
Recommendation: Start with just the Emoticons block (U+1F600-U+1F64F) which covers most common smiley faces, then expand based on user needs. This minimizes attack surface while providing good UX.
The key is being explicit about what you allow rather than trying to block everything dangerous - Unicode has many edge cases that can cause security issues.
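The normalization and length checks above could be combined with the emoticons-only pattern roughly as follows. This is a sketch under assumptions: `checkAccountLabel`, the `LabelCheck` shape, and both limits (100 code points, 256 bytes) are hypothetical values chosen for illustration.

```typescript
// Emoticons-only pattern from the section above.
const SIMPLE_EMOJI_PATTERN = /^[\x20-\x7E\u{1F600}-\u{1F64F}]+$/u

// Assumed limits; pick values that match your storage layer.
const MAX_LABEL_CODE_POINTS = 100
const MAX_LABEL_BYTES = 256

interface LabelCheck {
  ok: boolean
  normalized: string
  reason?: string
}

function checkAccountLabel(input: string): LabelCheck {
  // Normalize first so equivalent sequences are stored in one form.
  const normalized = input.normalize('NFC')
  if (!SIMPLE_EMOJI_PATTERN.test(normalized)) {
    return { ok: false, normalized, reason: 'disallowed character' }
  }
  // Array.from iterates by code point, so an emoji counts once
  // even though it occupies two UTF-16 code units.
  const codePoints = Array.from(normalized).length
  if (codePoints > MAX_LABEL_CODE_POINTS) {
    return { ok: false, normalized, reason: 'too many characters' }
  }
  const bytes = new TextEncoder().encode(normalized).length
  if (bytes > MAX_LABEL_BYTES) {
    return { ok: false, normalized, reason: 'too many bytes' }
  }
  return { ok: true, normalized }
}

console.log(checkAccountLabel('Team A \u{1F600}').ok)      // true
console.log(checkAccountLabel('Team\u200BA').ok)           // false: zero-width space
console.log('\u{1F600}'.length)                            // 2 UTF-16 code units
console.log(new TextEncoder().encode('\u{1F600}').length)  // 4 UTF-8 bytes
```

Because the pattern is an allowlist, the zero-width space is rejected without being named explicitly.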
AWS Service naming considerations
Yes, AWS services have additional naming restrictions beyond general URL safety. Here are the key constraints:
AWS Resource Naming Restrictions:
S3 Buckets (most restrictive):
- Only a-z, 0-9, -, .
- No uppercase letters
- No consecutive periods
- Can't start/end with - or .
- 3-63 characters
IAM Resources (roles, policies, users):
- A-Z, a-z, 0-9, +, =, ,, ., @, -, _
- No spaces
- Path components can use /
Lambda Functions:
- A-Z, a-z, 0-9, -, _
- No periods, no spaces
CloudFormation Stack Names:
- A-Z, a-z, 0-9, -
- No underscores, no periods
- Must start with letter
DynamoDB Tables:
- A-Z, a-z, 0-9, -, _, .
- 3-255 characters
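The per-service rules above can be transcribed into a lookup of patterns, which makes it easy to see which services would reject a candidate name. This is a sketch only: the names `AWS_NAME_RULES` and `rejectedBy` are hypothetical, and the length bounds for IAM, Lambda, and CloudFormation (64, 64, and 128) are assumptions that should be checked against current AWS documentation.

```typescript
// Per-service name patterns transcribed from the lists above.
// Length bounds for IAM, Lambda, and CloudFormation are assumptions.
const AWS_NAME_RULES: Record<string, RegExp> = {
  s3Bucket: /^(?!.*\.\.)[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$/,
  iamRole: /^[A-Za-z0-9+=,.@_-]{1,64}$/,
  lambdaFunction: /^[A-Za-z0-9_-]{1,64}$/,
  cloudFormationStack: /^[A-Za-z][A-Za-z0-9-]{0,127}$/,
  dynamoDbTable: /^[A-Za-z0-9_.-]{3,255}$/,
}

// Returns the services whose rules reject the given name.
function rejectedBy(name: string): string[] {
  return Object.entries(AWS_NAME_RULES)
    .filter(([, pattern]) => !pattern.test(name))
    .map(([service]) => service)
}

console.log(rejectedBy('acme-prod'))  // []
console.log(rejectedBy('Acme_Prod'))  // ['s3Bucket', 'cloudFormationStack']
console.log(rejectedBy('acme.prod'))  // ['lambdaFunction', 'cloudFormationStack']
```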
CDK-Specific Considerations:
// CDK often generates resource names by concatenating
// account-ref + stack-name + resource-type + random-suffix
const stackName = `MyApp-${accountRef}-Stack`
const bucketName = `myapp-${accountRef.toLowerCase()}-bucket-${randomSuffix}`
Safest character set for AWS compatibility:
// Safe for most AWS services (S3 bucket names additionally require lowercase)
const AWS_SAFE_PATTERN = /^[A-Za-z0-9-]+$/
// Slightly more permissive (works with most services)
const AWS_COMPATIBLE_PATTERN = /^[A-Za-z0-9._-]+$/
Specific gotchas:
- Case sensitivity: S3 bucket names must be lowercase
- Hyphens: Can't start/end resource names in many services
- Periods: Problematic in S3 bucket names (SSL cert issues)
- Underscores: Not allowed in CloudFormation stack names
- Length limits: S3 buckets (63), most others (64-255)
Recommended approach:
// For account references used in AWS resource names
const AWS_ACCOUNT_REF_PATTERN = /^[A-Za-z][A-Za-z0-9-]*[A-Za-z0-9]$/
// Ensures:
// - Starts with letter
// - Ends with letter or digit
// - Only contains letters, digits, hyphens
// - At least 2 characters
// Note: consecutive hyphens are still accepted; add a lookahead such as (?!.*--) to reject them.
This pattern works across most AWS services and CDK constructs without encoding. S3 bucket names still need the reference lowercased, as in the CDK example above.
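A small helper can apply this pattern before a reference is embedded in a bucket name. The sketch below is illustrative: `toBucketName`, the `myapp` prefix, and the alphanumeric suffix rule are assumptions, and only the 63-character S3 limit comes from the lists above.

```typescript
// Pattern from the section above.
const AWS_ACCOUNT_REF_PATTERN = /^[A-Za-z][A-Za-z0-9-]*[A-Za-z0-9]$/

// Hypothetical helper: builds an S3 bucket name from an account reference.
// The "myapp" prefix and the caller-supplied suffix are illustrative only.
function toBucketName(accountRef: string, suffix: string): string {
  if (!AWS_ACCOUNT_REF_PATTERN.test(accountRef)) {
    throw new Error(`Invalid account reference: ${accountRef}`)
  }
  if (!/^[A-Za-z0-9]+$/.test(suffix)) {
    throw new Error(`Invalid suffix: ${suffix}`)
  }
  // S3 requires lowercase, so both parts are lowercased here.
  const name = `myapp-${accountRef.toLowerCase()}-${suffix.toLowerCase()}`
  // S3 bucket names are limited to 3-63 characters.
  if (name.length > 63) {
    throw new Error(`Bucket name too long (${name.length} > 63): ${name}`)
  }
  return name
}

console.log(toBucketName('Acme-Prod', 'x7k2'))  // myapp-acme-prod-x7k2
```

Lowercasing means that references differing only in case map to the same bucket name, so references may need to be treated as case-insensitive if they feed into S3.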
Grok
- https://grok.com/share/bGVnYWN5_c38fd7d4-13d6-4884-93ea-b4152d6b37dd
- and https://grok.com/chat/d8694aa5-4edb-4210-b723-a198bcf0896b
To determine which additional characters might complicate handling with AWS services or AWS Cloud Development Kit (CDK) names, particularly in the context of your account references and labels (including emoji), let's explore the constraints imposed by AWS services and the CDK. The goal is to identify characters that could cause issues in contexts like resource naming, CloudFormation templates, or CDK constructs, and to ensure compatibility with your requirements for account references (used in URLs, cookies, headers) and account labels (including smiley faces/emoji). I'll guide you through the considerations, referencing AWS-specific restrictions and CDK behaviors, and provide a comprehensive view of problematic characters.
1. Context Recap: Account References and Labels
- Account References: Used in URLs (path/query), cookies (keys/values), and headers. Recommended to use [A-Za-z0-9\-_] (alphanumeric, dash, underscore) for maximum compatibility.
- Account Labels: More permissive, allowing printable ASCII (U+0020-U+007E) and emoji (e.g., U+1F600-U+1F64F for smiley faces, plus other emoji ranges like U+1F300-U+1F5FF, U+1F900-U+1F9FF, U+2700-U+27BF, U+2600-U+26FF). These are primarily for display but may appear in databases, JSON, or headers.
- New Consideration: You want to allow emoji in labels and now need to know which characters (beyond those already discussed) could cause issues with AWS services or CDK naming conventions.
2. AWS Services: Character Restrictions
AWS services impose varying restrictions on resource names, which can affect account references and labels. These restrictions depend on the service (e.g., S3, Lambda, IAM, DynamoDB) and context (e.g., resource names, tags, ARNs). Below, I'll outline characters that complicate handling across AWS services, focusing on those relevant to your use case.
General AWS Resource Naming Constraints
AWS resource names often share common restrictions but vary by service. The most restrictive services provide a baseline for safe character sets. Based on AWS documentation and community insights, here are key considerations:
- Common Allowed Characters: Most AWS services allow:
  - Alphanumeric: A-Z, a-z, 0-9
  - Safe symbols: - (hyphen), _ (underscore), sometimes . (dot)
- Common Restrictions:
  - Case Sensitivity: Most services are case-sensitive, but some (e.g., RDS DB subnet groups) convert names to lowercase, which can cause issues if uppercase is used.
  - Length Limits: Vary by service (e.g., S3 bucket names: 3-63 characters; ElastiCache cluster names: 1-20 characters). [What are the most restrictive aws resource name limitations](https://stackoverflow.com/questions/46052869/what-are-the-most-restrictive-aws-resource-name-limitations-e-g-characters-and)
  - Special Characters: Many services prohibit or restrict special characters like /, :, ?, #, &, =, %, @, spaces, and control characters.
  - Unicode/Emoji: Most AWS resource names are limited to ASCII. Unicode (including emoji) is generally supported only in specific contexts like tags or metadata fields (e.g., AWS Elemental MediaConnect tags allow UTF-8 up to 128 characters for keys and 256 for values). [tagging-restrictions](https://docs.aws.amazon.com/mediaconnect/latest/ug/tagging-restrictions.html)
Problematic Characters in AWS Services
The following characters are likely to cause issues in AWS resource names or related contexts:
Character | Unicode/ASCII | Issue in AWS Services | Relevant Services |
---|---|---|---|
/ | U+002F | Reserved in ARNs, URLs, and IAM role names; causes deployment errors in CDK. | IAM, S3, CloudFormation, CDK |
:, ?, #, &, = | Various | URL reserved characters; invalid in most resource names; require encoding. | S3, Lambda, URLs |
Space ( ) | U+0020 | Invalid in most resource names; requires encoding in URLs (%20) and cookies. | S3, Lambda, IAM, Cookies |
Uppercase Letters | A-Z (U+0041-U+005A) | Converted to lowercase in some services (e.g., RDS DB subnet groups); can cause mismatches. | RDS, DynamoDB |
_ (underscore) | U+005F | Prohibited in S3 bucket names; allowed elsewhere but may cause UI rendering issues. | S3 |
Consecutive Hyphens (--) | U+002D | Appear in reserved S3 bucket name prefixes and suffixes (e.g., xn--, --ol-s3); may cause parsing issues in other services. | S3 |
. (dot) | U+002E | Allowed in some services (e.g., S3 bucket names, IAM roles) but not others (e.g., Lambda function names, CloudFormation stack names). | Lambda, CloudFormation |
Control Chars | U+0000-U+001F, U+007F | Invalid in all resource names; cause parsing or security issues (e.g., log poisoning). | All services |
Non-ASCII (e.g., emoji) | >U+007F | Invalid in most resource names; supported in tags/metadata but requires UTF-8 handling. | S3, Lambda, IAM (names); Tags |
Bidirectional Controls | U+202A-U+202E | Can cause text rendering issues or spoofing; not allowed in most contexts. | All services |
Service-Specific Examples
- S3 Buckets:
  - Restrictions: 3-63 characters; lowercase letters, digits, hyphens, and periods only (no underscores, no consecutive periods, must start and end with a letter or digit). [what-are-the-most-restrictive-aws-resource-name-limitations-e-g-characters-and](https://stackoverflow.com/questions/46052869/what-are-the-most-restrictive-aws-resource-name-limitations-e-g-characters-and)
  - Example Issue: A bucket name like My-Bucket (uppercase) or my_bucket (underscore) will fail.
  - Implication for Labels: If labels with emoji are used to generate S3 bucket names, they'll fail unless transformed to lowercase ASCII (e.g., johns-account-😀 → johns-account-smiley).
- IAM Roles:
  - Restrictions: 1-64 characters; alphanumeric plus +, =, ,, ., @, -, _. No / or other special characters. [cdk issues 28502](https://github.com/aws/aws-cdk/issues/28502)
  - Example Issue: Using / in an IAM role name causes CDK deployment failures.
  - Implication for References: Stick to [A-Za-z0-9\-_] to avoid issues.
- RDS DB Subnet Groups:
  - Restrictions: Lowercase only; CDK silently converts uppercase to lowercase, causing mismatches if not anticipated. [aws-cdk-things-you-should-know-d620a2de9669](https://medium.com/nerd-for-tech/aws-cdk-things-you-should-know-d620a2de9669)
  - Example Issue: A label like MyDB 😀 used in a DB subnet group name will fail due to uppercase and emoji.
- ElastiCache:
  - Restrictions: 1-20 characters, alphanumeric, hyphens. [what-are-the-most-restrictive-aws-resource-name-limitations-e-g-characters-and](https://stackoverflow.com/questions/46052869/what-are-the-most-restrictive-aws-resource-name-limitations-e-g-characters-and)
  - Example Issue: Short length limit makes it one of the most restrictive services.
- Tags:
  - Restrictions: Keys (128 chars), values (256 chars), UTF-8, allowed characters: letters, numbers, spaces, ., :, +, =, @, _, /, -. [tagging-restrictions](https://docs.aws.amazon.com/mediaconnect/latest/ug/tagging-restrictions.html)
  - Implication for Labels: Emoji are supported in tags, but you must ensure UTF-8 encoding and test rendering.
3. AWS CDK: Naming Conventions and Issues
The AWS CDK generates CloudFormation templates, which impose additional constraints on resource names and logical IDs. CDK's naming behavior can introduce complexities, especially when custom names (like account references or labels) are used.
CDK Naming Behavior
- Logical IDs: CDK generates unique logical IDs for CloudFormation resources by concatenating construct paths (e.g., Stack1/MyBucket) and adding an 8-character hash to avoid collisions. These IDs must be alphanumeric and cannot contain slashes or other separators. [identifiers](https://docs.aws.amazon.com/cdk/v2/guide/identifiers.html)
- Physical Names: You can specify physical names (e.g., bucketName for an S3 bucket), but these must adhere to service-specific restrictions. If omitted, CDK generates names to avoid conflicts. [best-practices](https://docs.aws.amazon.com/cdk/v2/guide/best-practices.html)
- Issues with Custom Names:
  - Long Names: CDK-generated logical IDs can become long and unreadable, breaking CloudFormation UI or exceeding length limits. [cdk issues 10796](https://github.com/aws/aws-cdk/issues/10796) [aws-cdk-generated-resource-identifiers-are-horrible-and-not-readable-any-way-to]
  - Invalid Characters: Using forbidden characters (e.g., / in IAM role names) causes failures at deployment time rather than at synthesis. [aws-cdk issues 28502](https://github.com/aws/aws-cdk/issues/28502)
  - Case Sensitivity: Some services (e.g., RDS) convert names to lowercase, causing mismatches if uppercase is used in CDK code. [aws-cdk-things-you-should-know-d620a2de9669](https://medium.com/nerd-for-tech/aws-cdk-things-you-should-know-d620a2de9669)
  - Emoji/Unicode: CDK does not natively validate Unicode in physical names, so emoji in resource names will fail unless transformed to ASCII-compliant names.
Problematic Characters in CDK
- Slash (/): Causes deployment errors in IAM roles and other resources; reserved for ARNs and paths.
- Colon (:): Used in ARNs; invalid in most resource names and logical IDs.
- Spaces and Special Characters: Invalid in logical IDs and most physical names; require encoding in other contexts.
- Uppercase: Converted to lowercase by some services, causing inconsistencies. [aws-cdk-things-you-should-know-d620a2de9669](https://medium.com/nerd-for-tech/aws-cdk-things-you-should-know-d620a2de9669)
- Unicode/Emoji: Not supported in logical IDs or most physical names; may work in tags but requires UTF-8 handling.
CDK-Specific Recommendations
- Avoid Hardcoding Names: Let CDK generate physical names to avoid conflicts and service-specific restrictions. [best-practices](https://docs.aws.amazon.com/cdk/v2/guide/best-practices.html)
- Override Logical IDs Sparingly: Use overrideLogicalId only when necessary, ensuring alphanumeric characters.
- Validate Names Early: Implement custom validation in CDK code to catch invalid characters before deployment.
- Use Tags for Labels: Store account labels (with emoji) as tags rather than physical names to leverage UTF-8 support. [tagging-restrictions](https://docs.aws.amazon.com/mediaconnect/latest/ug/tagging-restrictions.html)
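If an account reference does end up inside a construct ID, one option is to reduce it to an alphanumeric fragment first. The helper below is a hypothetical sketch in plain TypeScript (it does not call any CDK API) and assumes that PascalCase fragments are acceptable.

```typescript
// Hypothetical helper: turns an account reference into an alphanumeric,
// PascalCase fragment suitable for embedding in a construct ID.
function toConstructIdFragment(accountRef: string): string {
  const fragment = accountRef
    .split(/[^A-Za-z0-9]+/)               // break on hyphens, underscores, dots, etc.
    .filter((part) => part.length > 0)    // ignore empty pieces from leading/trailing separators
    .map((part) => part.charAt(0).toUpperCase() + part.slice(1))
    .join('')
  if (fragment.length === 0) {
    throw new Error(`No alphanumeric characters in: ${accountRef}`)
  }
  return fragment
}

console.log(toConstructIdFragment('acme-prod_01'))  // AcmeProd01
```

The mapping is lossy: `acme-prod` and `acme_prod` both become `AcmeProd`, so uniqueness has to be guaranteed elsewhere.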
4. Impact on Account References and Labels
Account References
- Stick to Safe Set: The recommended [A-Za-z0-9\-_] is safe across most AWS services and CDK:
  - Compatible with IAM, Lambda, and most other services (S3 bucket names additionally exclude uppercase and underscores).
  - Avoids issues with URLs, cookies, and headers.
  - Prevents CDK deployment failures (e.g., no / or :).
- Additional Restrictions:
  - Avoid consecutive hyphens (--) to stay clear of reserved S3 prefixes and suffixes such as xn-- and --ol-s3.
  - Keep length under 20 characters for ElastiCache compatibility, or 64 for general use.
  - Avoid uppercase if used in RDS or similar services that enforce lowercase. [aws-cdk-things-you-should-know-d620a2de9669](https://medium.com/nerd-for-tech/aws-cdk-things-you-should-know-d620a2de9669)
Account Labels with Emoji
- Emoji Support: Emoji are supported in tags and metadata fields (e.g., AWS Elemental MediaConnect tags). However, they are invalid in most resource names (e.g., S3 buckets, IAM roles, Lambda functions). [tagging-restrictions](https://docs.aws.amazon.com/mediaconnect/latest/ug/tagging-restrictions.html)
- Complications:
  - Resource Names: If labels are used to generate resource names (e.g., S3 buckets, DynamoDB tables), emoji will cause failures. Transform labels to ASCII (e.g., John 😀 → john-smiley) before using in resource names.
  - CDK Logical IDs: Logical IDs in CloudFormation templates must be alphanumeric. Emoji in labels used as construct IDs will cause synthesis errors.
  - Length Limits: Emoji are multi-byte (e.g., 😀 is 4 bytes in UTF-8). A 256-character label with emoji could exceed byte-length limits in tags (256 bytes) or other fields. [tagging-restrictions](https://docs.aws.amazon.com/mediaconnect/latest/ug/tagging-restrictions.html)
  - Rendering Issues: Some AWS consoles or tools may not render emoji correctly, especially older systems.
- Mitigation:
  - Store labels as tags or metadata, not resource names.
  - Use UTF-8 encoding consistently across database, API, and UI.
  - Validate byte length (e.g., 256 bytes max) to avoid exceeding limits.
  - Escape special characters (<, >, &, ", ') in HTML/JSON contexts to prevent XSS or parsing issues.
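Two of these mitigations, reducing a label to ASCII for resource names and escaping it for HTML output, can be sketched as follows. Both helper names are hypothetical, and this version simply drops emoji rather than mapping them to words such as "smiley".

```typescript
// Hypothetical helper: reduces a display label to a lowercase ASCII slug
// that is safe to embed in resource names. Emoji and punctuation are dropped.
function labelToSlug(label: string): string {
  return label
    .normalize('NFKD')             // split accented letters into base letter + combining mark
    .replace(/[^\x20-\x7E]/g, '')  // drop everything outside printable ASCII
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')   // collapse runs of other characters into one hyphen
    .replace(/^-+|-+$/g, '')       // trim leading and trailing hyphens
}

// Escapes the five characters that are significant in HTML text and attributes.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;')
}

console.log(labelToSlug("John's Account \u{1F600}"))  // john-s-account
console.log(escapeHtml('<b>Tom & "Jerry"</b>'))       // &lt;b&gt;Tom &amp; &quot;Jerry&quot;&lt;/b&gt;
```

A label made up entirely of emoji produces an empty slug, so callers would need a fallback such as the account reference.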
5. Additional Problematic Characters
Beyond the characters already discussed (control characters, URL reserved characters, bidirectional controls), consider these additional cases that could complicate AWS or CDK handling:
- Consecutive Dots (..): May be interpreted as path traversal in some contexts; prohibited in some services (e.g., S3 bucket names).
- Leading/Trailing Hyphens: Prohibited in S3 bucket names; may cause issues in other services.
- Percent (%): Requires encoding in URLs (%25); invalid in most resource names.
- Plus (+): Allowed in some contexts (e.g., tags) but requires encoding in URLs (%2B). [tagging-restrictions](https://docs.aws.amazon.com/mediaconnect/latest/ug/tagging-restrictions.html)
- Combining Characters: Unicode combining marks (e.g., U+0300-U+036F) can alter emoji or text appearance, causing rendering or normalization issues.
- Zero-Width Characters: Characters like zero-width space (U+200B) or joiner (U+200D) can disrupt parsing or rendering; avoid in labels.
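Several of these cases can be detected with Unicode-aware regular expressions. The sketch below is illustrative: `findLabelIssues` and its issue strings are hypothetical, and it assumes a runtime with Unicode property escapes (ES2018 or later).

```typescript
// Unicode property escapes require the `u` flag (ES2018+).
const COMBINING_MARK = /\p{M}/u
const ZERO_WIDTH = /[\u200B-\u200D]/u
const BIDI_CONTROL = /[\u202A-\u202E\u2066-\u2069]/u

// Hypothetical helper: lists the problem categories found in a label.
function findLabelIssues(label: string): string[] {
  const issues: string[] = []
  // NFC composes sequences such as "e" + U+0301 into a single code point,
  // so only marks that cannot be composed remain afterwards.
  const normalized = label.normalize('NFC')
  if (COMBINING_MARK.test(normalized)) issues.push('combining mark')
  if (ZERO_WIDTH.test(normalized)) issues.push('zero-width character')
  if (BIDI_CONTROL.test(normalized)) issues.push('bidirectional control')
  if (normalized.includes('..')) issues.push('consecutive dots')
  return issues
}

console.log(findLabelIssues('Team A'))      // []
console.log(findLabelIssues('Te\u200Bam'))  // ['zero-width character']
console.log(findLabelIssues('x\u0301'))     // ['combining mark']
```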
6. Recommendations for Safe Implementation
Account References
- Character Set: Stick to [A-Za-z0-9\-_] (max 64 characters).
- Validation: Use regex ^[a-zA-Z0-9\-_]{1,64}$ in CDK or API code.
- CDK Usage: Avoid using references as construct IDs unless transformed to alphanumeric; let CDK generate physical names.
- Example: user123, account-abc_456
Account Labels
- Character Set: Allow printable ASCII (U+0020-U+007E) and emoji (U+1F600-U+1F64F, U+1F300-U+1F5FF, U+1F900-U+1F9FF, U+2700-U+27BF, U+2600-U+26FF).
- Validation: Use regex like ^[\x20-\x7E\U0001F600-\U0001F64F\U0001F300-\U0001F5FF\U0001F900-\U0001F9FF\U00002700-\U000027BF\U00002600-\U000026FF]{0,256}$ (adjust for language-specific Unicode support).
- Byte Length: Limit to 256 bytes to accommodate multi-byte emoji.
- CDK Usage: Store labels as tags (e.g., Tags.of(resource).add('Label', label)) rather than physical names. Ensure UTF-8 encoding in CloudFormation templates.
- Example: John's Account 😀, Team A 🚀
AWS and CDK Best Practices
- Use Tags for Labels: Store emoji-containing labels in tags, which support UTF-8 and have generous length limits (128 chars for keys, 256 for values). [tagging-restrictions](https://docs.aws.amazon.com/mediaconnect/latest/ug/tagging-restrictions.html)
- Transform for Resource Names: If labels are used to derive resource names, strip or replace emoji and special characters (e.g., John 😀 → john-smiley). [best-practices-for-naming-conventions-for-aws-resources](https://medium.com/%40aetekpo/best-practices-for-naming-conventions-for-aws-resources-ed0c04e554fc)
- Automate Validation: Use tools like CloudFormation Guard or custom CDK assertions to enforce naming rules.
- Test Across Services: Test labels in S3, IAM,
Grok part 2
To design an API where account references are used in URLs, cookies, headers, and potentially as cookie keys, while also defining a more permissive character set for account labels, we need to carefully consider which ASCII characters are safe and practical for each use case. Let's explore this step-by-step through a Socratic approach to guide your decision-making.
Account References in URLs, Cookies, and Headers
First, let's think about the account reference, which will appear in sensitive contexts like URL paths, query parameters, cookies, and headers. What properties do you think are essential for an account reference to ensure it's safe and interoperable across these contexts? For instance, should it be short, human-readable, or restricted to certain characters to avoid encoding issues?
Consider the constraints of URLs. URLs have specific rules defined by standards like RFC 3986. Which characters do you think might cause issues in a URL path or query parameter if not encoded? For example, would including a space or a question mark in an account reference create problems? Why might certain characters, like slashes or ampersands, complicate parsing in URLs?
Now, think about cookies and headers. Cookies (per RFC 6265) and HTTP headers have their own restrictions. What happens if you include a comma or semicolon in a cookie value? Could including non-printable ASCII control characters, like a null byte or line feed, break cookie parsing or pose security risks? Similarly, in headers, which characters might be problematic due to how servers or clients interpret whitespace or special characters?
Let's narrow it down. A common approach for identifiers like account references is to use a limited, safe subset of ASCII characters to minimize encoding and parsing issues. For example, alphanumeric characters (A-Z, a-z, 0-9) are generally safe. Would you consider adding other characters, like hyphens or underscores, to make the reference more readable or flexible? Why or why not? What about avoiding characters like percent signs (%) or equals signs (=), which have special meanings in URLs or cookies?
To help you reason through this, let's consider a potential character set for account references. A safe choice might be:
- Alphanumeric: A-Z, a-z, 0-9
- Optional safe symbols: - (hyphen), _ (underscore), . (dot)
Why might these be good choices? Are there any contexts where even these characters could cause issues? For instance, could a dot in a cookie key be misinterpreted? Would you want to enforce case sensitivity, or make the reference case-insensitive to simplify handling?
Account Labels
Now, let's explore the account labels, which you've indicated should be more permissive, allowing spaces but excluding ASCII control characters like the delete character. What do you think the purpose of the label is? Is it for display to users, storage in a database, or something else? Why would allowing spaces be useful, and what risks might come with including them in certain contexts, like URLs or JSON?
Since you want to exclude ASCII control characters (e.g., delete, null, carriage return), let's define what "control characters" means. In ASCII, control characters are typically those with codes 0-31 and 127 (delete). Do you think all of these should be excluded, or are there specific ones you're most concerned about? For example, why might a delete character (ASCII 127) or a null byte (ASCII 0) be particularly problematic?
For labels, a more permissive set might include:
- Alphanumeric: A-Z, a-z, 0-9
- Common punctuation: space, hyphen (-), underscore (_), dot (.), comma (,), apostrophe ('), etc.
- Excluding control characters (ASCII 0-31, 127)
Does this seem like a reasonable starting point? What other characters might you want to allow for user-friendly labels? For example, would you include symbols like ampersands (&) or hashtags (#)? Why might these be problematic in some systems, like databases or web forms?
Reference Table for Characters to Avoid
You asked for a reference table listing characters to avoid in different contexts. Let's reason through what such a table might include. For each context (URL path, query parameters, cookies, headers, labels), what characters do you think should be avoided, and why? Here's a starting point to guide your thinking:
- URL Path (RFC 3986):
  - Reserved characters: :, /, ?, #, [, ], @, !, $, &, ', (, ), *, +, ,, ;, =
  - Why avoid? These have special meanings (e.g., / separates path segments, ? starts query parameters).
  - Non-printable characters (ASCII 0-31, 127): Can break parsing or pose security risks (e.g., injection attacks).
  - Space: Requires encoding (%20), which can complicate usage.
- URL Query Parameters:
  - Reserved: &, =, #, +, space
  - Why avoid? & separates key-value pairs, = separates keys from values, and spaces need encoding.
  - Control characters: Same risks as in paths.
- Cookies (RFC 6265):
  - Forbidden in values: ,, ;, ", \, space, control characters (ASCII 0-31, 127)
  - Forbidden in keys: Same as values, plus =, as it separates key from value.
  - Why avoid? These can break cookie parsing or lead to malformed requests.
- HTTP Headers:
  - Forbidden: Control characters (ASCII 0-31, 127), non-ASCII characters.
  - Restricted: Whitespace (space, tab) at the start or end, as it may be trimmed or cause parsing errors.
  - Why avoid? Headers are sensitive to formatting, and control characters can cause security issues.
- Account Labels:
  - Avoid: Control characters (ASCII 0-31, 127) due to risks in storage, display, or processing.
  - Caution: Characters like <, >, & (potential XSS risks in web contexts), or quotes (", ') that might break JSON or SQL if not escaped.
Would you like to refine this table further? For example, should we specify which characters are safe rather than focusing on those to avoid? Would you want to test specific characters in each context to see how they behave in real systems?
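As a rough sketch, the per-context rules listed above can be expressed as predicates and combined. The cookie and token character classes follow the grammars in RFC 6265 and RFC 7230; the names `COOKIE_VALUE`, `TOKEN`, `HEADER_VALUE`, and `passesAllChecks` are hypothetical, and the header check is deliberately conservative (visible ASCII with interior spaces or tabs only).

```typescript
// cookie-octet from RFC 6265: printable ASCII except space, double quote,
// comma, semicolon, and backslash.
const COOKIE_VALUE = /^[\x21\x23-\x2B\x2D-\x3A\x3C-\x5B\x5D-\x7E]*$/
// token from RFC 7230, used for cookie names and header field names.
const TOKEN = /^[!#$%&'*+\-.^_`|~0-9A-Za-z]+$/
// Conservative header value check: visible ASCII, with spaces or tabs only in the interior.
const HEADER_VALUE = /^[\x21-\x7E]([\x20\x09\x21-\x7E]*[\x21-\x7E])?$/

// Hypothetical helper: true when a value can be used as a cookie name,
// cookie value, header value, and URL component without any encoding.
function passesAllChecks(value: string): boolean {
  return (
    TOKEN.test(value) &&
    COOKIE_VALUE.test(value) &&
    HEADER_VALUE.test(value) &&
    encodeURIComponent(value) === value
  )
}

console.log(passesAllChecks('acme-prod_01.eu'))  // true
console.log(passesAllChecks('a=b'))              // false: "=" is not a token character
console.log(passesAllChecks('a;b'))              // false: ";" ends a cookie pair
console.log(COOKIE_VALUE.test('a=b'))            // true: "=" is legal in a cookie value
```

Testing real values this way shows where the contexts differ: "=" is fine inside a cookie value but not in a cookie key.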
Recommended Character Sets
For account references, a conservative approach is to use:
- A-Z, a-z, 0-9, -, _, .
- Why? These are safe across URLs, cookies, and headers, requiring minimal encoding. They're also human-readable and unlikely to cause parsing issues.
- Questions: Should the length be limited (e.g., 1-64 characters)? Should it be case-sensitive?
For account labels, a more permissive set could be:
- A-Z, a-z, 0-9, space, -, _, ., ,, ', !, :
- Exclude: ASCII 0-31, 127, and potentially risky characters like <, >, &, ", \ unless properly escaped in context.
- Questions: Should labels have a maximum length? Are there specific display or storage constraints to consider?
Next Steps
To solidify your design, what specific use cases or systems (e.g., web frameworks, databases) are you targeting? Are there particular edge cases, like internationalization or legacy system compatibility, you want to explore? Would it help to create a sample account reference and label, then test how they behave in a URL, cookie, or header?
If you'd like a visual representation of safe vs. unsafe characters for each context, would you want me to generate a chart comparing the allowed characters across URLs, cookies, headers, and labels? This could help clarify the differences visually.
What do you think is the next question we should tackle to refine your API's character set choices?