The problem around bidi text/characters is still bigger than this. (I haven't checked whether, or how well, any of these Unicode documents cover it.) We allow identifiers with RTL characters, too!
E.g. pay attention to the outputs of this program:
fn f(ע: i32, ע_: i32) {
println!("addition, we like it: {}", ע + ע_);
println!("here's a subtraction: {}", 1 - ע_);
println!("some comparisons? {}", ע > ע_);
println!("here's a subtraction: {}", ע - 1_);
}
fn main() {
f(42, 1337);
}
addition, we like it: 1379
here's a subtraction: -1336
some comparisons? false
here's a subtraction: 41
(the Monaco editor on the playground - if you copy the code there - has some mechanisms in place to display the program code unmangled)
As mentioned, I haven't found out yet whether this is covered by documents (such as the one you linked), but as far as I can tell it could possibly be solved by adding some strategically placed U+200E characters.
I think minimally the following 3:
fn f(ע: i32, ע_: i32) {
println!("addition, we like it: {}", ע + ע_);
println!("here's a subtraction: {}", 1 - ע_);
println!("some comparisons? {}", ע > ע_);
println!("here's a subtraction: {}", ע - 1_);
}
fn main() {
f(42, 1337);
}
..eh.. that is:
"
fn f(ע: i32, ע_: i32) {
println!(\"addition, we like it: {}\", ע\u{200e} + ע_);
println!(\"here's a subtraction: {}\", 1 - ע_);
println!(\"some comparisons? {}\", ע\u{200e} > ע_);
println!(\"here's a subtraction: {}\", ע\u{200e} - 1_);
}
fn main() {
f(42, 1337);
}
"
Another example could be bidi chars in strings:
fn main() {
let tuple = ("foo ע" ,"א bar");
dbg!(tuple);
}
[src/main.rs:3:5] tuple = (
"foo ע",
"א bar",
)
Note that here, too, the broken code doesn't contain any added invisible characters at all; but adding some invisible characters could pretty much fix the situation for "simple" text and code editing/display tools, e.g. with one U+200E added:
fn main() {
let tuple = ("foo ע" ,"א bar");
dbg!(tuple);
}
"
fn main() {
let tuple = (\"foo ע\"\u{200e} ,\"א bar\");
dbg!(tuple);
}
"
But that doesn't necessarily solve the issue completely and nicely either: the code is still annoying to edit (though possibly rustfmt could insert these as needed?), and it now looks more broken in the smarter editors that do highlight the control character.
One MVP approach could be to at least lint (by default) against all cases where U+200E, U+200F, or RTL characters would lead to broken syntax when laid out. "Broken syntax" here would include both syntactic elements that have been reordered and syntactic elements that are made to stand next to each other without any visible separation.
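To make the lint idea a bit more concrete, here is a rough sketch in plain Rust (standard library only). It is not how rustc or any existing lint works: `is_rtl` and `suspicious_lines` are hypothetical names, and checking only the Hebrew and Arabic blocks is a crude stand-in for the real Bidi_Class property. The heuristic flags a line when an RTL character is followed, across nothing but neutral characters, by another RTL character or an ASCII digit, since that seems to be the pattern where the identifier examples above get visually reordered. A real check would have to run the actual Unicode bidirectional algorithm.

```rust
const LRM: char = '\u{200E}'; // LEFT-TO-RIGHT MARK

/// Hypothetical helper: rough approximation of "strong RTL character".
/// Assumption: only the Hebrew and Arabic blocks are considered.
fn is_rtl(c: char) -> bool {
    matches!(c, '\u{0590}'..='\u{05FF}' | '\u{0600}'..='\u{06FF}')
}

/// Returns the 1-based numbers of lines where an RTL character is followed,
/// across only neutral characters, by another RTL character or a digit.
fn suspicious_lines(src: &str) -> Vec<usize> {
    let mut hits = Vec::new();
    for (idx, line) in src.lines().enumerate() {
        let mut after_rtl = false; // the last strong character seen was RTL
        let mut gap = false; // a neutral character was seen since then
        for c in line.chars() {
            if is_rtl(c) || c.is_ascii_digit() {
                if after_rtl && gap {
                    hits.push(idx + 1);
                    break;
                }
                if is_rtl(c) {
                    after_rtl = true;
                    gap = false;
                }
            } else if c == LRM || c.is_alphabetic() {
                // A strong LTR character (or an explicit LRM) ends the RTL run.
                after_rtl = false;
                gap = false;
            } else if after_rtl {
                gap = true;
            }
        }
    }
    hits
}

fn main() {
    // U+05E2 is written as an escape so this listing is not itself reordered.
    let broken = "let a = \u{05E2} + \u{05E2}_;\nlet b = 1 - \u{05E2}_;\nlet c = \u{05E2} - 1_;";
    let fixed = "let a = \u{05E2}\u{200E} + \u{05E2}_;";
    println!("{:?}", suspicious_lines(broken));
    println!("{:?}", suspicious_lines(fixed));
}
```

With the input in `main`, this flags lines 1 and 3 of the first snippet and nothing in the second, where a U+200E follows the identifier.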
I'm not even getting started on all the issues and logical errors that come up when you think about string interpolation and bidi... Of course, Rust's format_args formatting string can also become visually mixed up, and bidi-unaware people may simply be very confused by the result of string interpolation at run time.
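As a small illustration of the run-time side, the sketch below formats an RTL name and a number, then prints the code points in logical order. The helper `logical_order` and the values are made up for this example. In memory the name comes first, but a bidi-aware terminal or editor may well show the number before the name, which is the kind of confusion meant here.

```rust
/// Hypothetical helper: lists the characters of `s` in logical (in-memory)
/// order as Unicode escapes, so the result cannot be visually reordered.
fn logical_order(s: &str) -> String {
    s.chars()
        .map(|c| c.escape_unicode().to_string())
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    // U+05E2 is written as an escape so this listing is not itself reordered.
    let name = "\u{05E2}";
    let count = 3;
    let msg = format!("{}: {}", name, count);

    // How this line is displayed depends on the terminal's bidi handling.
    println!("{}", msg);
    // Logical order: the name, then ':', then ' ', then the digit.
    println!("{}", logical_order(&msg));
}
```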