Add TryFormat span-based methods to most primitive types #15069
public StringBuilder Append(bool value) => Append(value.ToString());
public StringBuilder Append(bool value)
{
    if (value.TryFormat(RemainingCurrentChunk, out int charsWritten))
bool.ToString doesn't allocate. Is there a benefit to using TryFormat here vs. the previous simpler implementation?
Nope, I'll revert that. I was in copy-and-paste mode :)
@@ -196,6 +196,11 @@ public String ToString(String format, IFormatProvider provider)
    return Number.FormatInt32(m_value, format, NumberFormatInfo.GetInstance(provider));
}

public bool TryFormat(Span<char> destination, out int charsWritten, string format = null, IFormatProvider provider = null)
In UTF8 APIs, we use StandardFormat instead of format string to avoid having to parse the string. Would we want to use StandardFormat here too?
How would that support custom formats? I'd have to look at the profiling data, but for these standard formats, IIRC passing the format string was maybe only 2% of the overall cost of the operation.
Yeah, you are right. Custom formats are a problem. I think string is ok for now. We can always add optimized overloads if we need them.
Though (FYI), it's probably 2% for the average scenario. But there are scenarios, e.g. formatting sections of a GUID, where you repeatedly have to format with D2 and D4. And the formatted numbers are quite small. When I implemented it some time ago using strings, the string parsing was significantly more than 2%.
Should string format = null be ReadOnlySpan<char> format = default?
I've been looking at making use of these new TryFormat methods to avoid allocations inside the implementation of StringBuilder.AppendFormat (used by string.Format/interpolated strings).
Right now, if any of the format items has a specified format, it has to allocate substrings.
itemFormat = format.Substring(startPos, pos - startPos);
It'd be nice if those could be non-allocating slices.
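To make the request above concrete, here is a minimal sketch of passing a format as a slice rather than a substring. It assumes a TryFormat overload whose format parameter is ReadOnlySpan<char> (the shape .NET later shipped for Int32.TryFormat, not the string-based signature in this diff). FormatSliceSketch and FormatItem are hypothetical names used only for illustration.

```csharp
using System;
using System.Globalization;

static class FormatSliceSketch
{
    // Hypothetical helper: formats `value` using the format text found between
    // ':' and '}' in a single composite format item such as "{0:X8}".
    public static string FormatItem(int value, string item)
    {
        int colon = item.IndexOf(':');
        int close = item.LastIndexOf('}');

        // Slice instead of Substring: no intermediate format string is allocated.
        ReadOnlySpan<char> itemFormat = (colon >= 0 && close > colon)
            ? item.AsSpan(colon + 1, close - colon - 1)
            : default;

        Span<char> buffer = stackalloc char[64];
        if (value.TryFormat(buffer, out int charsWritten, itemFormat, CultureInfo.InvariantCulture))
        {
            return buffer.Slice(0, charsWritten).ToString();
        }

        // Fallback for results longer than the stack buffer; this path allocates the format string.
        return value.ToString(itemFormat.ToString(), CultureInfo.InvariantCulture);
    }
}
```

The sketch still allocates the final string so the result can be returned; inside AppendFormat the destination would presumably be the builder's own buffer instead.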
But there are scenarios, e.g. formatting sections of a GUID, where you repeatedly have to format with D2 and D4. And the formatted numbers are quite small. When I implemented it some time ago using strings, the string parsing was significantly more than 2%.
It is more, though also far from dominating. I took your GUID formatting example:
using System;
using System.Diagnostics;
using System.Globalization;

class Program
{
    static void Main()
    {
        uint a = 0x159491E6;
        ushort b = 0x1BC1;
        ushort c = 0xB0D9;
        ulong d = 0x1D477C38F186;
        var sw = new Stopwatch();
        var arr = new char[100];
        int charsWritten;
        var provider = NumberFormatInfo.GetInstance(CultureInfo.CurrentCulture);
        for (int i = 0; i < 50_000_000; i++)
        {
            Span<char> dst = arr;
            a.TryFormat(dst, out charsWritten, "X8", provider);
            dst = dst.Slice(charsWritten);
            b.TryFormat(dst, out charsWritten, "X4", provider);
            dst = dst.Slice(charsWritten);
            c.TryFormat(dst, out charsWritten, "X4", provider);
            dst = dst.Slice(charsWritten);
            d.TryFormat(dst, out charsWritten, "X12", provider);
        }
    }
}
At least in that example:
- ParseFormatSpecifier is ~5.3%. That's not nothing, but it's certainly not the majority contributor. We might also be able to tweak it a bit to eke out some more throughput.
- ReadOnlySpan.TryCopyTo is ~9.9%, but the Buffer.Memmove it's using for the actual copy is ~3.9%, so there's 6% in there that's going to something other than the actual copy (the trace shows ~2.6% in the ReadOnlySpan.TryCopyTo body and ~3.4% in the Span.CopyTo body that TryCopyTo calls). If we're looking to optimize for such percentages, I'd prefer to see us start by looking at making TryCopyTo faster, as that'll accrue to many other scenarios. If we then also want to look at overloads of TryFormat that take a StandardFormat, we certainly can. As you say, we can always add them later.
To be clear, I'm not against adding StandardFormat to the core and adding overloads that use it; I just don't think that takes the place of the string-based format, and I'm not sure it's currently the most pressing matter.
Of course, it's also possible/likely that there are ways to further optimize methods like TryInt32ToHexStr, at which point, with that getting faster, the costs associated with the parsing will increase relative to it.
when would I want to pass span of chars as a format?
See @justinvp's scenario above. #15069 (comment)
Would StandardFormat overloads address this as well?
Would StandardFormat overloads address this as well?
For standard formats 😄 AppendFormat supports custom formats, too, though, e.g.
var sb = new StringBuilder();
sb.AppendFormat("{0:[##-##-##]}", 123456);
Console.WriteLine(sb.ToString());
outputs:
[12-34-56]
We could say we don't care about optimizing for custom formats. In that case, if we had a StandardFormat.TryParse method, and overloads that took StandardFormat, callers could try parsing as a standard format, using that overload if it was standard, and otherwise fall back to using Substring and the string-based overload.
@KrzysztofCwalina, here's a concrete example:
DateTime now = DateTime.UtcNow;
Guid id = Guid.NewGuid();
decimal value = 3.50m;
string log = string.Format("{0:s}: Event {1:B} occurred: The value is: {2:C2}", now, id, value);
string.Format's implementation (StringBuilder.AppendFormat) has to extract the formats for each format item, e.g. "s", "B", and "C2". It'd be nice if we didn't have to allocate strings for these.
Of course, if this is the only place where this comes up, we could get away with internal support for ReadOnlySpan<char>, as @stephentoub mentions.
I've opened https://github.com/dotnet/corefx/issues/25337 to track this. We don't need to hold up this PR for it.
internal static partial class Number
{
    // Constants used by number parsing
    private const Int32 NumberMaxDigits = 32; // needs to == NUMBER_MAXDIGITS in coreclr's src/classlibnative/bcltype/number.h
The "NumberMaxDigits" in CoreRT was actually 31 (with the buffer size being 32 to allow for the NUL terminator.)
Hmm, it seems that it's actually 32 in CoreRT, but CoreRT is missing the extra space for the NUL terminator.
Right now I have it as 32 digits + space for the null terminator. Should I make it 31+1 instead? Or fine as is?
Might have to make it 50+1 after all if we want to be fully compatible: the 50 seems to have come from the fact that NumberToDecimal looks ahead 20 digits to decide whether to round up or down when you have a string ending with an even number followed by a '5':
int count = 20; // Look at the next 20 digits to check to round
So 29 digits for Decimal max precision + 20 digits lookahead + 1 to sleep better at night = 50.
Consider this example:
StringBuilder sb = new StringBuilder();
sb.Append('.');
sb.Append('2', 28);
sb.Append('5');
sb.Append('0', 10);
sb.Append('1');
String s = sb.ToString();
Decimal d = Decimal.Parse(s);
Console.WriteLine(d);
prints:
0.2222222222222222222222222223
on desktop and
0.2222222222222222222222222222
on ProjectN
Personally, I'd be fine with either 50+1 or 32+1 - the "20 digits lookahead" is a pretty arbitrary policy. But we should make a decision and make it match (Utf8Parser as well.)
Ok, to minimize impact on coreclr here I'll stick with 50+1. We can subsequently change to 32+1 if desired.
private const Int32 NumberMaxDigits = 32; // needs to == NUMBER_MAXDIGITS in coreclr's src/classlibnative/bcltype/number.h

[StructLayout(LayoutKind.Sequential)]
internal unsafe struct NumberBuffer // needs to match layout of NUMBER in coreclr's src/classlibnative/bcltype/number.h
Declare as "ref struct" so C# can now actually enforce the "stack only" nature of this struct?
I hadn't because then I couldn't do Unsafe.AsPointer on ref this (since ref structs can't be used as generic arguments), but I can expand the fixed buffer into individual fields, similar to how you did in corefx.
The real problem is that C# has unnecessary limitations on what you can do with fixed buffers. You would want to write this:
public char* digits => (char*)Unsafe.AsPointer(ref _digits[0]);
but it won't compile for no good reason.
If you want to save yourself from having 50 padding fields, you can work around it by doing this:
ref struct NumberBuffer
{
    [StructLayout(LayoutKind.Sequential, Size = 50*sizeof(char))]
    struct Digits { }
    private Digits _digits;
    public char* digits => (char*)Unsafe.AsPointer(ref _digits);
}
It will generate pretty much the same IL and metadata as what you get for the fixed buffer, except that it will not trigger the unnecessary limitations.
cc @VSadov What would it take to fix the unnecessary limitations on what you can do with fixed buffers?
btw, I think I tried the nested ExplicitLayout trick in corefx too but it also had the same bad debugging experience - VS somehow wants to see actual named fields or it won't give you an accurate view of the data.
If you want to save yourself from having 50 padding fields, you can workaround it by doing this
I'll do that, thanks.
Hmm. The current code in coreclr does this:
coreclr/src/mscorlib/src/System/Number.cs
Line 335 in fc8bd03
public static readonly Int32 NumberBufferBytes = 12 + ((NumberMaxDigits + 1) * 2) + IntPtr.Size;
That's 118 on 32-bit and 122 on 64-bit. But that doesn't seem to account for any padding. The coreclr code has:
coreclr/src/classlibnative/bcltype/number.h
Lines 22 to 29 in fc8bd03
struct NUMBER {
    int precision;
    int scale;
    int sign;
    wchar_t digits[NUMBER_MAXDIGITS + 1];
    wchar_t* allDigits;
    NUMBER() : precision(0), scale(0), sign(0), allDigits(NULL) {}
};
Are we using pack==1 in native? Otherwise, it would seem the native size would actually be 120 on 32-bit and 128 on 64-bit.
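The size arithmetic in this comment can be checked with a small sketch. It is illustrative only: NumberLayoutSketch and SizeOfNumber are hypothetical names, wchar_t is assumed to be 2 bytes (as on Windows), NUMBER_MAXDIGITS is taken as 50 (the coreclr value discussed above), and natural alignment is assumed to equal the pointer size.

```csharp
using System;

static class NumberLayoutSketch
{
    // Rounds offset up to the next multiple of alignment.
    static int AlignUp(int offset, int alignment) =>
        (offset + alignment - 1) / alignment * alignment;

    // Computes the size of a NUMBER-like struct: three ints, a digit array, one pointer.
    public static int SizeOfNumber(int maxDigits, int pointerSize, bool packed)
    {
        int offset = 3 * sizeof(int);      // precision, scale, sign
        offset += (maxDigits + 1) * 2;     // digits[maxDigits + 1], assuming 2-byte wchar_t
        if (!packed)
        {
            offset = AlignUp(offset, pointerSize); // padding before allDigits
        }
        offset += pointerSize;             // allDigits
        // Without packing, the struct is also padded at the end to its alignment.
        return packed ? offset : AlignUp(offset, pointerSize);
    }
}
```

With maxDigits = 50, the packed sizes come out to 118 (32-bit) and 122 (64-bit), matching NumberBufferBytes, while the unpacked sizes are 120 and 128, matching the figures in the question.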
Are we using pack==1 in native?
I think that's what the "include <pshpack1.h>" is supposed to do.
I think that's what the "include <pshpack1.h>" is supposed to do.
Ok, good, I'll do the same for the managed struct.
private const Int32 NumberMaxDigits = 32; // needs to == NUMBER_MAXDIGITS in coreclr's src/classlibnative/bcltype/number.h

[StructLayout(LayoutKind.Sequential)]
internal unsafe struct NumberBuffer // needs to match layout of NUMBER in coreclr's src/classlibnative/bcltype/number.h
needs to match layout of NUMBER
It's not quite matching right now - the C++ version has an "alldigits" field - only used for BigInt but if you're passing these down via FCalls, it's playing with fire not to have the space here.
Yup, thanks for the catch. Will fix.
Force-pushed from f59f900 to f1fd4d5.
RE: #15069 (comment) BTW, the inability to do Currently a fixed buffer is treated as an opaque blob of data that can be pointed at in unsafe context. Your concerns will be addressed when https://github.com/dotnet/csharplang/blob/master/proposals/fixed-sized-buffers.md gets more traction.

namespace System.Text
{
    internal ref struct ValueStringBuilder
Mutable structs ftw 😉
@dotnet-bot test Windows_NT x64 corefx_baseline
@jkotas, thanks for getting CI straightened out. Any feedback or concerns with the PR?
LGTM
Would you prefer to merge this as
There's a lot of commits here, but I'd like to keep the original code separate from the changes. Let me squash some of the commits locally and I'll push up a smaller set that can then be merged.
Moves existing managed parsing code to shared, to be shared with corert. Takes the managed formatting port from corert and moves that to shared as well.
Force-pushed from f1fd4d5 to d6c7726.
- Optimize NumberBuffer passing by reference instead of value. It's a large struct of ~50 bytes; copying it around has non-trivial cost.
- Replace formatting StringBuilder with ref struct and stack allocation. Avoids lots of allocation and associated throughput costs.
- Improve perf of 'D' formatting of 32-bit and 64-bit integers.
- Remove array allocations accessing NumberFormatInfo props.
- Accessing array properties like PercentGroupSizes clones the corresponding field. That's unnecessary here, as we don't mutate the array.
- Remove int[] allocation from NumberToStringFormat. Span makes it easy to start with stack space and grow to an allocated array as needed.
- Improve perf of hex formatting of integers. Including removing some sizable allocations.
- Manually inline several hot functions called in only one place.
- Tweak some range comparisons in ParseFormatSpecifier.
- Avoid large stackallocs in NumberToString{Fixed}. It's incurring non-trivial overheads.
- Tweak perf of ValueStringBuilder. In particular, make Append(string) faster for the single-char case, which is extremely common in integer formatting due to its prevalence in strings in NumberFormatInfo.
- Remove dead "bigNumber" code.
- Remove custom wcslen function. Use String's.
- Delete dead fcalls from runtime. FormatInt32, FormatUInt32, FormatInt64, and FormatUInt64 are no longer needed. Delete them and many of the helpers used only by them.
Match the original native code's use of pointers.
Avoid string allocations when appending primitives to StringBuilder. We try to format into the existing array space, and only fall back to using ToString when there isn't enough room and we're going to grow the builder anyway.
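The "format in place, fall back only when growing" pattern described in that commit message might look roughly like the following. This is a sketch, not the actual StringBuilder code: TinyBuilder is a hypothetical single-array builder, whereas the real StringBuilder is chunk-based and formats into its remaining chunk space.

```csharp
using System;

// Hypothetical minimal builder used only to illustrate the pattern.
sealed class TinyBuilder
{
    private char[] _chars = new char[4];
    private int _length;

    public TinyBuilder Append(int value)
    {
        // Fast path: format directly into the unused tail of the buffer, no string allocated.
        if (value.TryFormat(_chars.AsSpan(_length), out int charsWritten))
        {
            _length += charsWritten;
            return this;
        }

        // Slow path: not enough room, so the buffer must grow anyway;
        // the extra ToString allocation is a small part of that cost.
        string s = value.ToString();
        char[] bigger = new char[Math.Max(_chars.Length * 2, _length + s.Length)];
        _chars.AsSpan(0, _length).CopyTo(bigger);
        s.AsSpan().CopyTo(bigger.AsSpan(_length));
        _chars = bigger;
        _length += s.Length;
        return this;
    }

    public override string ToString() => new string(_chars, 0, _length);
}
```

For example, new TinyBuilder().Append(12).Append(345).Append(6789) takes the fast path once and the growing path twice with the 4-char starting capacity chosen here.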
Force-pushed from d6c7726 to 81b2ef4.
…rshared Add TryFormat span-based methods to most primitive types Signed-off-by: dotnet-bot <[email protected]>
…rshared Add TryFormat span-based methods to most primitive types Signed-off-by: dotnet-bot-corefx-mirror <[email protected]>
The end result of this PR is a TryFormat method being added to:
However, most of the commits lead up to that by porting code from corert to the shared partition. corert's formatting implementation is a managed port of the native formatting implementation in coreclr. This then moves coreclr back to using the same managed implementation. I've split this into a bunch of commits to help make it easier to review. The primary goals here are:
The first commit brings the core NumberBuffer type to the shared partition. Until we've entirely removed formatting from native code, we still need NumberBuffer to play well with the NUMBER type that's used for the remaining decimal, double, and float formatting in the runtime.
The next commit moves the integer parsing logic into shared. This parsing logic was already in managed code in coreclr, with almost identical code in corert. This rationalizes the two and moves that to shared.
The next commit rationalizes a difference between NUMBER in coreclr and NumberBuffer in corert: coreclr had a buffer for 50 chars whereas corert for 32. There doesn't appear to be any necessary reason to have the larger size, so this consolidates to the 32.
The next commit separates out the decimal and floating-point parsing logic and moves it to shared; the formatting logic for decimal, double, single remain in native for now.
Then a bunch of commits add in optimizations that were causing the managed implementation to be significantly slower than the native. Many of these involve reviewing the native code and making the managed implementation look more like it, e.g. using pointers more to avoid bounds checks and the like.
Additionally, several commits delete now dead code, such as the majority of the formatting implementation from the runtime.
Finally, two commits add additional features. One adds the new TryFormat methods that build on this managed formatting, and another that uses TryFormat in StringBuilder to avoid string allocations in methods like StringBuilder.Append(int).
I ran a bunch of Benchmark.NET tests locally. Though there's a lot of fluctuation on my machine, my takeaways are:
There's likely room for further improvement, and it should hopefully be easier now that it's managed and shared with corert. We can also look at reducing the amount of unsafe code involved, switching over to using spans as we determine it doesn't regress perf impactfully.
I also did not do a lot to clean up the formatting of the code from corert. That can be done subsequently.
Note, too, that this port enables TryFormat, which is allocation-free, compared to ToString, which of course needs to allocate the resulting string.
cc: @jkotas, @vancem, @atsushikan, @KrzysztofCwalina, @danmosemsft
Contributes to https://github.com/dotnet/corefx/issues/22403
Contributes to https://github.com/dotnet/coreclr/issues/13544
Contributes to https://github.com/dotnet/corefx/issues/22616