Nobody Knows What a String Is

The title might seem a bit nonsensical… and awkwardly long too. You think you know what a string is, right? You’ve used them more times than you can count.
Perhaps you’re most familiar with JavaScript, where you can define a string, check what’s in it and even pass it into a function or get one out. Perhaps you’ve used a string in Python and had a similar experience, or maybe in C#.

But let me ask you, do you know your string’s encoding? Maybe you do. Plenty of languages use UTF-8 (JavaScript and C#, for the record, use UTF-16). In UTF-8, every ‘character’ takes up 1-4 bytes. If you’re wondering why I’ve put character in quotes, you should look up the UTF-8 specification. It encodes Unicode code points, and it does not attempt to define a character in any everyday sense.
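
To make the 1-4 bytes claim concrete, here is a minimal Rust sketch. The function names `utf8_widths` and `byte_and_char_counts` are made up for this illustration, and note that Rust’s `char` is a Unicode scalar value, which is still not quite what a reader would call a character.

```rust
/// Number of bytes each Unicode scalar value in `s` occupies in UTF-8.
/// (Hypothetical helper, named for this example only.)
fn utf8_widths(s: &str) -> Vec<usize> {
    s.chars().map(|c| c.len_utf8()).collect()
}

/// Counts a string two ways: UTF-8 bytes, and Unicode scalar values
/// (what Rust calls `char`). Neither count is "characters as a human
/// sees them".
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}
```

Calling `utf8_widths("a\u{e9}\u{20ac}\u{1f980}")` gives one width per scalar value, from 1 byte for `a` up to 4 bytes for the crab emoji. An `é` spelled as `e` plus a combining accent (`"e\u{301}"`) counts as 3 bytes and 2 `char`s, even though a reader sees one character.
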
That aside, we can say we know what a string is pretty well, right? Well, not really. Do you know where your string is stored? I’m not talking about the memory address or which virtual machine or garbage collector keeps track of it, I’m asking you if your string is heap or stack allocated. I’m asking, is your string backed by a ‘string pool’ data structure? Does this matter? It can, if your strings come out of a pool and you compare them with reference-based equality, but let’s sweep that under the rug for now…

How Long is a Piece of String?

Do you know what metadata is stored about your string? In Rust, a String stores 3 pieces of data: a length, a capacity and a pointer to the first byte of the string. I am not familiar with the intricacies of how JavaScript and Python store their strings, but I can imagine it would be similar. Well that’s simple at least, we can be confident of what a string would look like in memory, right? Erm, no, not really.
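
You can poke at those three pieces from safe Rust. This is only a sketch: `string_parts` and `demo_string` are names invented here, and the exact capacity you get back is up to the allocator, so the only promise is "at least what was asked for".

```rust
/// Returns the three things a `String` keeps track of: a pointer to its
/// first byte, its length in bytes, and its capacity in bytes.
/// (Hypothetical helper for illustration.)
fn string_parts(s: &String) -> (*const u8, usize, usize) {
    (s.as_ptr(), s.len(), s.capacity())
}

/// Builds a string with room for at least 16 bytes, then uses 5 of them.
fn demo_string() -> String {
    let mut s = String::with_capacity(16);
    s.push_str("hello");
    s
}
```

For the string built by `demo_string`, the length is 5 while the capacity is 16 or more: the capacity describes the heap allocation, the length describes how much of it is in use.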

The C language famously has a very problematic way of representing strings. A string in C is just a pointer to char and that’s it. So how do we know how long it is? We don’t really, but we can calculate it. It has a 0 byte, aka “null terminator”, at the end. This byte is very easy to lose, because the C standard library does not always care if it’s there and sometimes does not place a new one when it should (strncpy, for example, leaves it off when the source fills the whole destination). Nonetheless, if we need the length we can get it by walking the entire string and counting every single step until we find a null byte. This has more problems, like the fact that you cannot have a zeroed byte in the string except at the very end.
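
Here is roughly what that walk looks like, sketched in Rust over a byte slice rather than a raw pointer so that running off the end is a `None` instead of undefined behavior. `c_strlen` and `is_valid_c_string` are illustrative names, not real library functions; `CStr::from_bytes_with_nul` is the standard library’s own check.

```rust
use std::ffi::CStr;

/// Walks `buf` from the start and counts bytes until the first zero byte,
/// the way C's `strlen` does. Returns `None` if there is no terminator,
/// which is the case where real C code would keep reading past the buffer.
fn c_strlen(buf: &[u8]) -> Option<usize> {
    buf.iter().position(|&b| b == 0)
}

/// True only if `buf` has exactly one zero byte and it is the last byte,
/// which is what the standard library requires of a C string.
fn is_valid_c_string(buf: &[u8]) -> bool {
    CStr::from_bytes_with_nul(buf).is_ok()
}
```

With `b"hi\0there\0"` the walk stops at 2: everything after the first zero byte is invisible to anything that treats the buffer as a C string.
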
It’s so problematic, in fact, that the string has been reinvented in C countless times. My favorite approach I’ve seen is “Simple Dynamic Strings” https://github.com/antirez/sds where the ‘start’ of the string isn’t actually the start of the allocation, but a region just after the metadata. This obviously has its own downsides, but I think it deserves an honourable mention.
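
To show the idea of metadata sitting just before the text, here is a simplified sketch. It is not the real SDS header layout: the 4-byte little-endian length header, `HEADER_LEN`, `make_prefixed` and `prefixed_len` are all assumptions made up for this example.

```rust
/// Size of the hypothetical header: one 4-byte little-endian length.
const HEADER_LEN: usize = 4;

/// Lays out `text` as [length header][text bytes][0] in one allocation and
/// returns the buffer plus the offset where the text itself starts.
fn make_prefixed(text: &[u8]) -> (Vec<u8>, usize) {
    let mut buf = Vec::with_capacity(HEADER_LEN + text.len() + 1);
    // Sketch only: lengths above u32::MAX would be truncated here.
    buf.extend_from_slice(&(text.len() as u32).to_le_bytes());
    buf.extend_from_slice(text);
    buf.push(0); // the terminator keeps the text usable as a C string
    (buf, HEADER_LEN)
}

/// Reads the length by stepping back from the text start into the header,
/// instead of scanning forward for the terminator.
fn prefixed_len(buf: &[u8], text_start: usize) -> usize {
    let mut header = [0u8; HEADER_LEN];
    header.copy_from_slice(&buf[text_start - HEADER_LEN..text_start]);
    u32::from_le_bytes(header) as usize
}
```

Code that only understands C strings can be handed the text start and will find the terminator as usual, while code that knows about the header gets the length without walking the string.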

The Rust community has taken the approach of adding a new string type for each kind of string you may want to interact with. This can smooth over the edges by adding some type safety so you don’t put the wrong thing in the wrong place, but it’s also got its downsides. The most obvious downside is how beginner hostile this is. String, &str and &[u8] are a few different ways of encapsulating a string, and the difference between them all takes a lot of knowledge to understand and work with. This is bad because beginners who just want to print “hello world” are slammed with a huge amount of confusing information. It’s clearly a problem to me because the question “What’s the difference between a &str and a String” is very common.
I don’t think most people even know the full answer. A String is a struct with a length, a capacity and a pointer. It owns bytes allocated on the heap and comes with associated functions to modify and read them. A &str is a borrowed view of UTF-8 bytes that live somewhere else: in a String, or baked into the binary in the case of a string literal. &str is not a struct, it’s a primitive type built into the language, a ‘fat pointer’ that carries a pointer and a length around together at runtime, with no capacity because it cannot grow. Confused yet? That is the correct response.
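
A small sketch can make the difference between the two handles visible. The helper names `handle_sizes_in_words` and `first_word` are invented for this example, and the exact in-memory layout of `String` is an implementation detail of the standard library rather than a documented guarantee.

```rust
use std::mem::size_of;

/// Sizes, in machine words, of a `String` handle and a `&str` handle.
/// On current compilers a `String` is three words (pointer, capacity,
/// length) and a `&str` is two (pointer, length).
fn handle_sizes_in_words() -> (usize, usize) {
    let word = size_of::<usize>();
    (size_of::<String>() / word, size_of::<&str>() / word)
}

/// Borrows the first whitespace-separated word of `text` as a `&str`.
/// No bytes are copied: the result points into the same buffer as `text`.
fn first_word(text: &str) -> &str {
    text.split_whitespace().next().unwrap_or("")
}
```

If you build `String::from("hello world")` and pass a reference to `first_word`, the `&str` you get back has the same starting address as the `String`’s heap buffer and a length of 5. It is a view into existing bytes, not a new string.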

What Even Is a Character?

Well, let’s put that nonsense aside. At least we can assume that a string is something that boils down to UTF-8, right? No… no, not really. Allow me to introduce you to Windows’ WCHAR, a unit of 2 bytes used to encode UTF-16. So, if you want to talk to Windows directly, you either convert your text to UTF-16 and call the functions that end in W, or you use one of their wrapper functions that end in A, which take strings in the process’s ANSI code page (and that is only UTF-8 if the program opts in on a recent version of Windows).
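
Here is a sketch of what converting to and from that 2-byte-unit form looks like, using only the Rust standard library. The names `to_wide_null` and `from_wide_null` are made up for this example, and nothing here calls into Windows itself.

```rust
/// Encodes `s` as UTF-16 code units followed by a terminating zero, the
/// shape a null-terminated wide string has in memory.
fn to_wide_null(s: &str) -> Vec<u16> {
    s.encode_utf16().chain(std::iter::once(0)).collect()
}

/// Decodes UTF-16 code units up to (not including) the first zero unit.
/// Returns `None` if there is no terminator or the units are not valid
/// UTF-16.
fn from_wide_null(units: &[u16]) -> Option<String> {
    let end = units.iter().position(|&u| u == 0)?;
    String::from_utf16(&units[..end]).ok()
}
```

Note that one unit is not one character here either: anything outside the Basic Multilingual Plane, such as an emoji, takes two 16-bit units (a surrogate pair), so `to_wide_null("\u{1f980}")` has three units including the terminator.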

So let’s recap. We’ve talked about how a string can be:

A pointer to null-terminated UTF-8
A struct with length, a pointer to UTF-8 and (not all the time, but often) capacity
A pointer to null-terminated UTF-16
A pointer to UTF-8 paired with a length (a ‘fat pointer’)
A pointer to a region of memory with metadata preceding it and null-terminated UTF-8 following it

Let’s put aside the fact that there are countless encodings with varied support, that text can run left-to-right or right-to-left, that Haskell strings are different again (a linked list of characters), and that there are other niche approaches too! Check out my hybrid heap/stack string implementation if you like: https://github.com/largenumberhere/short_string

Strings Aren’t Real

Now let’s quickly dip our toes in some assembly. Don’t be scared, it’ll be quick!
What’s a string in assembly? There is no such thing.
In assembly there are no types, just bits and bytes, and their meaning differs depending on what function (or system call) you’re calling. The most common way to store a string in assembly is to reserve some bytes in a row, which you could say is a crude array. The Linux system calls sometimes expect a null-terminated string (open takes its path that way) and other times expect a pointer and a length (write does), so you’re best off using a C string and storing its length too. Keep in mind, this string is a fixed size. Any attempt to write past the last position will cause strange behavior or a fault.
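
The same "fixed block of bytes, null terminated, with the length stored on the side" idea can be sketched in Rust instead of assembly. `FixedBuf`, its 16-byte size and its method names are all assumptions for this example, and unlike assembly it refuses to write past the end rather than misbehaving.

```rust
/// A fixed block of 16 bytes plus a separately stored length.
struct FixedBuf {
    bytes: [u8; 16],
    len: usize,
}

impl FixedBuf {
    fn new() -> Self {
        FixedBuf { bytes: [0; 16], len: 0 }
    }

    /// Appends `text`, always keeping room for one terminating zero byte.
    /// Returns `Err` instead of writing past the end of the block.
    /// Sketch only: it does not reject zero bytes inside `text`.
    fn push(&mut self, text: &[u8]) -> Result<(), &'static str> {
        let new_len = self.len + text.len();
        if new_len + 1 > self.bytes.len() {
            return Err("not enough room in the fixed buffer");
        }
        self.bytes[self.len..new_len].copy_from_slice(text);
        self.bytes[new_len] = 0;
        self.len = new_len;
        Ok(())
    }

    /// Pointer-and-length view, for calls that take an explicit length.
    fn as_bytes(&self) -> &[u8] {
        &self.bytes[..self.len]
    }

    /// Null-terminated view, for calls that scan for the terminator.
    fn as_bytes_with_nul(&self) -> &[u8] {
        &self.bytes[..=self.len]
    }
}
```

Keeping both views around means the one buffer can serve either calling convention without recounting or copying.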

Somebody Knows What a String Is

Okay, I think that’s enough torture.
Now, take a step back and breathe.
Most of these quirks about “Strings” are not important in higher level languages. There, strings are somewhat of a solved problem that you rarely have to think about. Just use whatever your standard library hands you and, if you have any issues, learn about its quirks.
In C, just use a malloc’d char array like god intended or pick a nice library that’s compatible with C’s builtin functions. Trust me, don’t try to fiddle with char arrays on the stack, it’ll bite you.
In Rust, use a String and clone it around as necessary. Once you learn the ins and outs of lifetimes and ownership, you will understand the complexity that &str introduces, when it may be problematic and when it may be a worthwhile tradeoff.
It’s not as bad as it sounds, but it’s good to be aware that strings are complicated and there are many ways to approach them.

January 7, 2025

rose

CatCompiler.dev