Let's say that Lyberta uses it, expecting that everything will be amazing in a couple of months.
Now try that with emoji that carry a skin-tone modifier and a gender modifier, or with the rainbow flag emoji. I can't even paste it here because it breaks the forum. I assume most of your text contains grapheme clusters consisting of a single scalar value. That's too easy. Also, a proper operator[] would be incredibly slow...
Casing: what about German "SS / ß"?
>>> s = "ß"
>>> s.upper()
'SS'
>>> s = "HEISSE"
>>> print(s.lower().replace("ss", "ß"))
heiße
For sorting you need a collation table, because the same characters sort differently in different languages. So .lower() and .sort() are obviously wrong, because they don't take enough parameters: neither accepts a locale.
There's nothing you're trying to express that cannot be expressed in ASCII using words
Actually, I don't know how the memory model is supposed to work with C strings. string.h provides the string functions and stdlib.h provides malloc/calloc, so I assume strings use the same heap allocation machinery. What you could do is malloc your 32-bit Unicode scalar arrays (AKA UTF-32) and "null"-terminate them with a value of -1. I forget malloc's exact semantics, but you would be addressing the memory as int32_t* in this case. (Right? Unicode doesn't need uint32_t, does it? The largest scalar value is 0x10FFFF, which fits comfortably in a signed 32-bit integer.)
char *foo;
foo = malloc(sizeof(foo) * 200); /* 200-character string */
int32_t* my_unicode_string;
my_unicode_string = malloc( sizeof(my_unicode_string) * 200 ); // 200 code point Unicode string
// fu™ -- the fluffrabbit Unicode library
struct fu_utf32string {
size_t length;
uint32_t* data;
};
// Functions go here.
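To make the struct concrete, here is one guess at what a constructor/destructor pair could look like (fu_new and fu_free are hypothetical names; the struct is repeated so the sketch compiles on its own):

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct fu_utf32string {
    size_t length;
    uint32_t *data;
};

/* Copy n code points into a freshly allocated fu_utf32string. */
struct fu_utf32string *fu_new(const uint32_t *src, size_t n) {
    struct fu_utf32string *s = malloc(sizeof *s);
    if (!s)
        return NULL;
    s->data = malloc(n * sizeof *s->data); /* sizeof the element, not the pointer */
    if (!s->data) {
        free(s);
        return NULL;
    }
    memcpy(s->data, src, n * sizeof *s->data);
    s->length = n;
    return s;
}

void fu_free(struct fu_utf32string *s) {
    if (s) {
        free(s->data);
        free(s);
    }
}
```

Note the `sizeof *p` idiom in the malloc calls: it always names the pointed-to element, so the size can never silently be that of a pointer.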
C has functions to make strings bigger or smaller (realloc), therefore strings aren't just arrays; they're heap buffers.
/* constant */
const char *foo = "foostr";
/* In stack */
char bar[200];
strcpy(bar, "barstr");
/* In heap */
char *baz;
baz = malloc(sizeof(baz) * 200);
strcpy(baz, "bazbaz");
free(baz);
onpon4 wrote:
char *foo;
foo = malloc(sizeof(foo) * 200); /* 200-character string */
char *foo;
foo = malloc(sizeof(char) * 200); /* 199-character string */
fluffrabbit wrote: 2-byte Unicode is called UTF-16 and it sucks.
It's a 199-character string (the last char has to be the null terminator).
const char *foo = "foostr";
char *bar, *baz;
bar = malloc(sizeof(bar) * (strlen(foo) + 1));
strcpy(bar, foo);
bar[3] = '2';
bar[4] = '\0';
baz = malloc(sizeof(baz) * (strlen(foo) + strlen(bar) + 1));
sprintf(baz, "%s%s", foo, bar);
printf("%s and %s\n", bar, baz); /* Result: "foo2 and foostrfoo2" */
free(bar);
free(baz);
In malloc you must do sizeof(char) (which is always 1), not sizeof(foo): foo is a pointer to char, so sizeof(foo) is the size of a pointer, not of a character.