Let's say that Lyberta uses it, expecting that everything will be amazing in a couple of months.
Now try that with emoji that carry a skin-tone modifier and a gender modifier, or with the rainbow flag emoji. I can't even paste it here because it breaks the forum. I assume most of your text contains grapheme clusters consisting of a single scalar value. That's too easy. Also, a proper operator[] would be incredibly slow...
Casing: what about German "SS / ß"?
>>> s = "ß"
>>> s.upper()
'SS'
>>> s = "HEISSE"
>>> print(s.lower().replace("ss", "ß"))
heiße
For sorting you need a collation table, because the same characters sort differently in different languages. So .lower() and .sort() are obviously wrong, because they don't take enough parameters: neither accepts a locale.
There's nothing you're trying to express that cannot be expressed in ASCII using words
Actually, I don't know how the memory model is supposed to work with C strings. string.h provides the string functions and stdlib.h provides malloc/calloc, so I assume strings use the same heap allocation machinery. What you could do is malloc your 32-bit Unicode scalar arrays (AKA UTF-32) and "null"-terminate them with a value of -1. I forget malloc's exact semantics, but you would be addressing the memory as int32_t* in this case. (Right? Unicode doesn't need uint32_t, does it? The largest scalar value is 0x10FFFF, which fits comfortably in a signed 32-bit integer.)
char *foo;
foo = malloc(sizeof(foo) * 200); /* 200-character string */
int32_t* my_unicode_string;
my_unicode_string = malloc( sizeof(my_unicode_string) * 200 ); // 200 code point Unicode string
// fu™ -- the fluffrabbit Unicode library
struct fu_utf32string {
size_t length;
uint32_t* data;
};
// Functions go here.
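To make the struct concrete, here is one guess at what a constructor/destructor pair could look like (fu_new and fu_free are hypothetical names; the struct is repeated so the sketch compiles on its own):

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct fu_utf32string {
    size_t length;
    uint32_t *data;
};

/* Copy n code points into a freshly allocated fu_utf32string. */
struct fu_utf32string *fu_new(const uint32_t *src, size_t n) {
    struct fu_utf32string *s = malloc(sizeof *s);
    if (!s)
        return NULL;
    s->data = malloc(n * sizeof *s->data); /* sizeof the element, not the pointer */
    if (!s->data) {
        free(s);
        return NULL;
    }
    memcpy(s->data, src, n * sizeof *s->data);
    s->length = n;
    return s;
}

void fu_free(struct fu_utf32string *s) {
    if (s) {
        free(s->data);
        free(s);
    }
}
```

Note the `sizeof *p` idiom in the malloc calls: it always names the pointed-to element, so the size can never silently be that of a pointer.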
C has functions to make strings bigger or smaller (realloc), therefore strings aren't just arrays; they're heap buffers.
/* constant */
const char *foo = "foostr";
/* In stack */
char bar[200];
strcpy(bar, "barstr");
/* In heap */
char *baz;
baz = malloc(sizeof(baz) * 200);
strcpy(baz, "bazbaz");
free(baz);
onpon4 wrote:
char *foo;
foo = malloc(sizeof(foo) * 200); /* 200-character string */
char *foo;
foo = malloc(sizeof(char) * 200); /* 199-character string */
fluffrabbit wrote: 2-byte Unicode is called UTF-16 and it sucks.
It's a 199-character string (the last char has to be the null terminator).
const char *foo = "foostr";
char *bar, *baz;
bar = malloc(sizeof(bar) * (strlen(foo) + 1));
strcpy(bar, foo);
bar[3] = '2';
bar[4] = '\0';
baz = malloc(sizeof(baz) * (strlen(foo) + strlen(bar) + 1));
sprintf(baz, "%s%s", foo, bar);
printf("%s and %s\n", bar, baz); /* Result: "foo2 and foostrfoo2" */
free(bar);
free(baz);
In malloc you must do sizeof(char) (which is always 1), not sizeof(foo): foo is a pointer to char, so sizeof(foo) is the size of a pointer, not of a character.