This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Add TryFormat span-based methods to most primitive types #15069

Merged
merged 6 commits into from
Nov 19, 2017

stephentoub
Member

The end result of this PR is a TryFormat method being added to:

  • Boolean
  • SByte
  • Byte
  • Int16
  • Int32
  • Int64
  • UInt16
  • UInt32
  • UInt64

However, most of the commits lead up to that by porting code from corert to the shared partition. corert's formatting implementation is a managed port of the native formatting implementation in coreclr. This then moves coreclr back to using the same managed implementation. I've split this into a bunch of commits to help make it easier to review. The primary goals here are:

  • Exposing TryFormat on our primitive types
  • Consolidating on a single managed parsing/formatting implementation shared across coreclr/corert
  • Not regressing performance

The first commit brings the core NumberBuffer type to the shared partition. Until we've entirely removed formatting from native code, we still need NumberBuffer to play well with the NUMBER type that's used for the remaining decimal, double, and float formatting in the runtime.

The next commit moves the integer parsing logic into shared. This parsing logic was already in managed code in coreclr, with almost identical code in corert. This rationalizes the two and moves that to shared.

The next commit rationalizes a difference between NUMBER in coreclr and NumberBuffer in corert: coreclr had a buffer for 50 chars whereas corert for 32. There doesn't appear to be any necessary reason to have the larger size, so this consolidates to the 32.

The next commit separates out the decimal and floating-point parsing logic and moves it to shared; the formatting logic for decimal, double, and single remains in native for now.

Then a bunch of commits add optimizations to address issues that were causing the managed implementation to be significantly slower than the native one. Many of these involve reviewing the native code and making the managed implementation look more like it, e.g. using pointers more to avoid bounds checks and the like.

Additionally, several commits delete now dead code, such as the majority of the formatting implementation from the runtime.

Finally, two commits add additional features. One adds the new TryFormat methods that build on this managed formatting, and another uses TryFormat in StringBuilder to avoid string allocations in methods like StringBuilder.Append(int).

I ran a bunch of Benchmark.NET tests locally. Though there's a lot of fluctuation on my machine, my takeaways are:

  • There are significant improvements in the new implementation for "G" (the default) and "D", upwards of 10-20%.
  • "X" appears to be 5-10% faster in the managed implementation.
  • "E" and "F" are 5-10% slower in the managed implementation.
  • The rest appear to be within noise in one direction or the other.

There's likely room for further improvement, and it should hopefully be easier now that it's managed and shared with corert. We can also look at reducing the amount of unsafe code involved, switching over to using spans as we determine it doesn't regress perf impactfully.

I also did not do a lot to clean up the formatting of the code from corert. That can be done subsequently.

Note, too, that this port enables TryFormat, which is allocation-free, compared to ToString, which needs to of course allocate the resulting string.
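To make the allocation difference concrete, here is a minimal sketch, assuming a runtime where Int32.TryFormat is available (.NET Core 2.1 or later). The class and method names (TryFormatDemo, FormatInto) are made up for illustration and are not part of this PR.

```csharp
using System;

static class TryFormatDemo
{
    // Formats 'value' into caller-provided storage; no string is allocated.
    // Returns the number of chars written, or -1 if the destination is too small.
    public static int FormatInto(int value, Span<char> destination)
    {
        return value.TryFormat(destination, out int charsWritten) ? charsWritten : -1;
    }

    static void Main()
    {
        Span<char> buffer = stackalloc char[16];

        int written = FormatInto(12345, buffer);
        Console.WriteLine(written);                               // 5
        Console.WriteLine(buffer.Slice(0, written).ToString());   // string created only for display

        // A destination that is too small makes TryFormat return false.
        Console.WriteLine(FormatInto(12345, buffer.Slice(0, 3))); // -1
    }
}
```

The caller owns the buffer, so the same storage can be reused across many values; ToString would produce a new string each time.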

cc: @jkotas, @vancem, @atsushikan, @KrzysztofCwalina, @danmosemsft
Contributes to https://github.com/dotnet/corefx/issues/22403
Contributes to https://github.com/dotnet/coreclr/issues/13544
Contributes to https://github.com/dotnet/corefx/issues/22616

- public StringBuilder Append(bool value) => Append(value.ToString());
+ public StringBuilder Append(bool value)
+ {
+     if (value.TryFormat(RemainingCurrentChunk, out int charsWritten))

bool.ToString doesn't allocate. Is there a benefit to using TryFormat here vs. the previous simpler implementation?

Member Author

Nope, I'll revert that. I was in copy-and-paste mode :)

@@ -196,6 +196,11 @@ public String ToString(String format, IFormatProvider provider)
return Number.FormatInt32(m_value, format, NumberFormatInfo.GetInstance(provider));
}

public bool TryFormat(Span<char> destination, out int charsWritten, string format = null, IFormatProvider provider = null)

Member

In UTF8 APIs, we use StandardFormat instead of format string to avoid having to parse the string. Would we want to use StandardFormat here too?
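For readers unfamiliar with the UTF8 APIs mentioned here, the following is a small sketch of how a pre-parsed StandardFormat is passed to Utf8Formatter, assuming the System.Buffers and System.Buffers.Text types as they ship in current .NET. The wrapper name FormatUtf8 is hypothetical.

```csharp
using System;
using System.Buffers;
using System.Buffers.Text;
using System.Text;

static class StandardFormatDemo
{
    // The format arrives as a (symbol, precision) pair, so the formatter
    // has no format string to parse on each call.
    public static string FormatUtf8(int value, char symbol, byte precision)
    {
        Span<byte> utf8 = stackalloc byte[32];
        var format = new StandardFormat(symbol, precision);

        return Utf8Formatter.TryFormat(value, utf8, out int bytesWritten, format)
            ? Encoding.UTF8.GetString(utf8.Slice(0, bytesWritten))
            : string.Empty;
    }

    static void Main() => Console.WriteLine(FormatUtf8(42, 'D', 4)); // 0042
}
```

A StandardFormat can only describe a single symbol plus a precision, which is why it cannot express custom format strings.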

Member Author

How would that support custom formats? I'd have to look at the profiling data, but for these standard formats, IIRC parsing the format string was maybe only 2% of the overall cost of the operation.

Member

Yeah, you are right. Custom formats are a problem. I think string is ok for now. We can always add optimized overloads if we need them.

Though (FYI), it's probably 2% for the average scenario. But there are scenarios, e.g. formatting sections of a GUID, where you repeatedly have to format with D2 and D4. And the formatted numbers are quite small. When I implemented it some time ago using strings, the string parsing was significantly more than 2%.

Should string format = null be ReadOnlySpan<char> format = default?

I've been looking at making use of these new TryFormat methods to avoid allocations inside the implementation of StringBuilder.AppendFormat (used by string.Format/interpolated strings).

Right now, if any of the format items has a specified format, it has to allocate substrings.

itemFormat = format.Substring(startPos, pos - startPos);

It'd be nice if those could be non-allocating slices.
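A rough sketch of what non-allocating slices could look like, assuming a TryFormat overload whose format parameter is ReadOnlySpan<char> (the shape requested in this comment, not the string-based signature in this PR's diff). SliceItemFormat and FormatWith are hypothetical helpers; real composite-format parsing also handles escapes, alignment, and multiple items.

```csharp
using System;
using System.Globalization;

static class FormatSliceDemo
{
    // Returns the FORMAT part of a single "{0:FORMAT}" item as a slice of the
    // original text, so no substring is allocated.
    public static ReadOnlySpan<char> SliceItemFormat(ReadOnlySpan<char> compositeItem)
    {
        int colon = compositeItem.IndexOf(':');
        int close = compositeItem.LastIndexOf('}');
        if (colon < 0 || close < colon) return default;
        return compositeItem.Slice(colon + 1, close - colon - 1);
    }

    public static string FormatWith(int value, string compositeItem)
    {
        Span<char> destination = stackalloc char[32];
        ReadOnlySpan<char> format = SliceItemFormat(compositeItem.AsSpan());

        // Fall back to the string-based path only if the buffer is too small.
        return value.TryFormat(destination, out int written, format, CultureInfo.InvariantCulture)
            ? destination.Slice(0, written).ToString()
            : value.ToString(format.ToString(), CultureInfo.InvariantCulture);
    }

    static void Main() => Console.WriteLine(FormatWith(255, "{0:X8}")); // 000000FF
}
```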

Member Author

But there are scenarios, e.g. formatting sections of a GUID, where you repeatedly have to format with D2 and D4. And the formatted numbers are quite small. When I implemented it some time ago using strings, the string parsing was significantly more than 2%.

It is more, though also far from dominating. I took your GUID formatting example:

using System;
using System.Diagnostics;
using System.Globalization;

class Program
{
    static void Main()
    {
        uint a = 0x159491E6;
        ushort b = 0x1BC1;
        ushort c = 0xB0D9;
        ulong d = 0x1D477C38F186;

        var sw = new Stopwatch();
        var arr = new char[100];
        int charsWritten;

        var provider = NumberFormatInfo.GetInstance(CultureInfo.CurrentCulture);
        for (int i = 0; i < 50_000_000; i++)
        {
            Span<char> dst = arr;

            a.TryFormat(dst, out charsWritten, "X8", provider);
            dst = dst.Slice(charsWritten);

            b.TryFormat(dst, out charsWritten, "X4", provider);
            dst = dst.Slice(charsWritten);

            c.TryFormat(dst, out charsWritten, "X4", provider);
            dst = dst.Slice(charsWritten);

            d.TryFormat(dst, out charsWritten, "X12", provider);
        }
    }
}

Here's a profile:
[image: profiler trace of the loop above]

At least in that example:

  • ParseFormatSpecifier is ~5.3%. That's not nothing, but it's certainly not the majority contributor. We might also be able to tweak it a bit to eke out some more throughput.
  • ReadOnlySpan.TryCopyTo is ~9.9%, but the Buffer.Memmove it's using for the actual copy is ~3.9%, so there's 6% in there that's going to something other than the actual copy (the trace shows ~2.6% in the ReadOnlySpan.TryCopyTo body and ~3.4% in the Span.CopyTo body that TryCopyTo calls). If we're looking to optimize for such percentages, I'd prefer to see us start by looking at making TryCopyTo faster, as that'll accrue to many other scenarios. If we then also want to look at overloads of TryFormat that take a StandardFormat, we certainly can. As you say, we can always add them later.

To be clear, I'm not against adding StandardFormat to the core and adding overloads that use it; I just don't think that takes the place of the string-based format, and I'm not sure it's currently the most pressing matter.

Of course, it's also possible/likely that there are ways to further optimize methods like TryInt32ToHexStr, at which point with that getting faster, the costs associated with the parsing will increase relative to it.

Member Author

@stephentoub stephentoub Nov 17, 2017

when would I want to pass span of chars as a format?

See @justinvp's scenario above. #15069 (comment)

Member

Would StandardFormat overloads address this as well?

Member Author

@stephentoub stephentoub Nov 17, 2017

Would StandardFormat overloads address this as well?

For standard formats 😄 AppendFormat supports custom formats, too, though, e.g.

var sb = new StringBuilder();
sb.AppendFormat("{0:[##-##-##]}", 123456);
Console.WriteLine(sb.ToString());

outputs:

[12-34-56]

We could say we don't care about optimizing for custom formats. In that case, if we had a StandardFormat.TryParse method, and overloads that took StandardFormat, callers could try parsing as a standard format, using that overload if it was standard, and otherwise fall back to using Substring and the string-based overload.
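The dispatch described above could be sketched roughly as follows. TryParseStandard is a hypothetical hand-rolled stand-in for the StandardFormat.TryParse idea: it assumes a format counts as "standard" when it is one ASCII letter followed by at most two digits, and treats everything else as custom.

```csharp
using System;

static class FormatKindDemo
{
    // Hypothetical classifier: one ASCII letter plus up to two precision digits.
    public static bool TryParseStandard(ReadOnlySpan<char> format, out char symbol, out int precision)
    {
        symbol = default;
        precision = -1; // -1 means "no precision given"

        if (format.Length == 0 || format.Length > 3) return false;

        char c = format[0];
        if (!((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))) return false;

        int p = 0;
        for (int i = 1; i < format.Length; i++)
        {
            if (format[i] < '0' || format[i] > '9') return false;
            p = p * 10 + (format[i] - '0');
        }

        symbol = c;
        if (format.Length > 1) precision = p;
        return true;
    }

    // Chooses between the (assumed) StandardFormat fast path and the string fallback.
    public static string Describe(string format) =>
        TryParseStandard(format.AsSpan(), out char s, out int p)
            ? $"standard: {s}, precision {p}"
            : "custom: fall back to the string-based overload";

    static void Main()
    {
        Console.WriteLine(Describe("X8"));         // standard: X, precision 8
        Console.WriteLine(Describe("[##-##-##]")); // custom: fall back to the string-based overload
    }
}
```

Only the custom branch would need Substring, so the common standard-format case stays allocation-free.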

@justinvp justinvp Nov 17, 2017

@KrzysztofCwalina, here's a concrete example:

DateTime now = DateTime.UtcNow;
Guid id = Guid.NewGuid();
decimal value = 3.50m;

string log = string.Format("{0:s}: Event {1:B} occurred: The value is: {2:C2}", now, id, value);

string.Format's implementation (StringBuilder.AppendFormat) has to extract the formats for each format item, e.g. "s", "B", and "C2". It'd be nice if we didn't have to allocate strings for these.

Of course, if this is the only place where this comes up, we could get away with internal support for ReadOnlySpan<char>, as @stephentoub mentions.

Member Author

I've opened https://github.com/dotnet/corefx/issues/25337 to track this. We don't need to hold up this PR for it.

internal static partial class Number
{
// Constants used by number parsing
private const Int32 NumberMaxDigits = 32; // needs to == NUMBER_MAXDIGITS in coreclr's src/classlibnative/bcltype/number.h

The "NumberMaxDigits" in CoreRT was actually 31 (with the buffer size being 32 to allow for the NUL terminator.)

Hmm, it seems that it's actually 32 in CoreRT but CoreRT is missing the extra space for the NUL terminator.

Member Author

Right now I have it as 32 digits + space for the null terminator. Should I make it 31+1 instead? Or fine as is?

@ghost ghost Nov 17, 2017

Might have to make it 50+1 after all if we want to be fully compatible: the 50 seems to have come from the fact that the NumberToDecimal looks ahead 20 digits to decide whether to round up or down when you have a string ending with an even number followed by a '5':

int count = 20; // Look at the next 20 digits to check to round

So 29 digits for Decimal max precision + 20 digits lookahead + 1 to sleep better at night = 50.

Consider this example:

StringBuilder sb = new StringBuilder();
sb.Append('.');
sb.Append('2', 28);
sb.Append('5');
sb.Append('0', 10);
sb.Append('1');
String s = sb.ToString();

Decimal d = Decimal.Parse(s);
Console.WriteLine(d);

prints:

0.2222222222222222222222222223

on desktop and

0.2222222222222222222222222222

on ProjectN

Personally, I'd be fine with either 50+1 or 32+1 - the "20 digits lookahead" is a pretty arbitrary policy. But we should make a decision and make it match (Utf8Parser as well.)

Member Author

Ok, to minimize impact on coreclr here I'll stick with 50+1. We can subsequently change to 32+1 if desired.

private const Int32 NumberMaxDigits = 32; // needs to == NUMBER_MAXDIGITS in coreclr's src/classlibnative/bcltype/number.h

[StructLayout(LayoutKind.Sequential)]
internal unsafe struct NumberBuffer // needs to match layout of NUMBER in coreclr's src/classlibnative/bcltype/number.h

Declare as "ref struct" so C# can now actually enforce the "stack only" nature of this struct?

Member Author

@stephentoub stephentoub Nov 17, 2017

I hadn't because then I couldn't do Unsafe.AsPointer on ref this (since ref structs can't be used as generic arguments), but I can expand the fixed buffer into individual fields, similar to how you did in corefx.

Member

The real problem is that C# has unnecessary limitations on what you can do with fixed buffers. You would want to write this:

public char* digits => (char*)Unsafe.AsPointer(ref _digits[0]);

but it won't compile for no good reason.

If you want to save yourself from having 50 padding fields, you can workaround it by doing this:

ref struct NumberBuffer
{
   [StructLayout(LayoutKind.Sequential, Size = 50*sizeof(char))]
   struct Digits { }

   private Digits _digits;

   public char* digits => (char*)Unsafe.AsPointer(ref _digits);
}

It will generate pretty much the same IL and metadata as what you get for the fixed buffer, except that it will not trigger the unnecessary limitations.

cc @VSadov What would it take to fix the unnecessary limitations on what you can do with fixed buffers?

btw, I think I tried the nested ExplicitLayout trick in corefx too but it also had the same bad debugging experience - VS somehow wants to see actual named fields or it won't give you an accurate view of the data.

Member Author

If you want to save yourself from having 50 padding fields, you can workaround it by doing this

I'll do that, thanks.

Member Author

@stephentoub stephentoub Nov 17, 2017

Hmm. The current code in coreclr does this:

public static readonly Int32 NumberBufferBytes = 12 + ((NumberMaxDigits + 1) * 2) + IntPtr.Size;

That's 118 on 32-bit and 122 on 64-bit. But that doesn't seem to account for any padding. The coreclr code has:
struct NUMBER {
    int precision;
    int scale;
    int sign;
    wchar_t digits[NUMBER_MAXDIGITS + 1];
    wchar_t* allDigits;
    NUMBER() : precision(0), scale(0), sign(0), allDigits(NULL) {}
};

Are we using pack==1 in native? Otherwise, it would seem the native size would actually be 120 on 32-bit and 128 on 64-bit.
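The arithmetic behind those four numbers can be checked with a short sketch. It assumes NumberMaxDigits is 50 at this point in the review, that wchar_t is 2 bytes, and that natural alignment means aligning the pointer to its own size and rounding the total up to the largest field alignment. NumberLayoutDemo and its methods are illustrative names only.

```csharp
using System;

static class NumberLayoutDemo
{
    const int NumberMaxDigits = 50;

    // pack == 1: fields are laid out back to back with no padding.
    public static int PackedSize(int pointerSize) =>
        3 * sizeof(int) + (NumberMaxDigits + 1) * sizeof(char) + pointerSize;

    // Natural alignment: pad before the pointer, then round up the total size.
    public static int NaturallyAlignedSize(int pointerSize)
    {
        int offset = 3 * sizeof(int) + (NumberMaxDigits + 1) * sizeof(char); // 114 bytes before the pointer
        offset = AlignUp(offset, pointerSize) + pointerSize;
        return AlignUp(offset, Math.Max(sizeof(int), pointerSize));
    }

    static int AlignUp(int value, int alignment) =>
        (value + alignment - 1) / alignment * alignment;

    static void Main()
    {
        Console.WriteLine($"{PackedSize(4)} {PackedSize(8)}");                     // 118 122
        Console.WriteLine($"{NaturallyAlignedSize(4)} {NaturallyAlignedSize(8)}"); // 120 128
    }
}
```

The managed NumberBufferBytes formula matches the packed figures, so it is only correct if the native struct is also packed.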

Are we using pack==1 in native?

I think that's what the "include <pshpack1.h>" is supposed to do.

Member Author

I think that's what the "include <pshpack1.h>" is supposed to do.

Ok, good, I'll do the same for the managed struct.

private const Int32 NumberMaxDigits = 32; // needs to == NUMBER_MAXDIGITS in coreclr's src/classlibnative/bcltype/number.h

[StructLayout(LayoutKind.Sequential)]
internal unsafe struct NumberBuffer // needs to match layout of NUMBER in coreclr's src/classlibnative/bcltype/number.h

needs to match layout of NUMBER

It's not quite matching right now - the C++ version has an "allDigits" field - only used for BigInt but if you're passing these down via FCalls, it's playing with fire not to have the space here.

Member Author

Yup, thanks for the catch. Will fix.

@VSadov
Member

VSadov commented Nov 17, 2017

RE: #15069 (comment)

BTW, the inability to do public char* digits => (char*)Unsafe.AsPointer(ref _digits[0]); is a known problem.

Currently a fixed buffer is treated as an opaque blob of data that can be pointed at in unsafe context.
There is indeed no indexer to be used in safe code and that would be nice to have.

Your concerns will be addressed when https://github.com/dotnet/csharplang/blob/master/proposals/fixed-sized-buffers.md gets more traction.
You will be able to do what you want and possibly will not be forced to use pointers/unsafe at all.
Hopefully soon - in 7.3 or so.

CC: @jaredpar @tannergooding


namespace System.Text
{
internal ref struct ValueStringBuilder

Member

Mutable structs ftw 😉

@jkotas
Member

jkotas commented Nov 18, 2017

@dotnet-bot test Windows_NT x64 corefx_baseline
@dotnet-bot test Ubuntu x64 corefx_baseline
@dotnet-bot test OSX10.12 x64 Checked Innerloop Build and Test
@dotnet-bot test Ubuntu x64 Checked Innerloop Build and Test

@stephentoub
Member Author

@jkotas, thanks for getting CI straightened out. Any feedback or concerns with the PR?

@jkotas
Member

jkotas commented Nov 18, 2017

LGTM

@jkotas
Member

jkotas commented Nov 18, 2017

Would you prefer to merge this as merge or as squash and merge?

@stephentoub
Member Author

Would you prefer to merge this as merge or as squash and merge?

There's a lot of commits here, but I'd like to keep the original code separate from the changes. Let me squash some of the commits locally and I'll push up a smaller set that can then be merged.

Moves existing managed parsing code to shared, to be shared with corert.
Takes the managed formatting port from corert and moves that to shared
as well.
- Optimize NumberBuffer passing by reference instead of value. It's a large struct of ~50 bytes; copying it around has non-trivial cost.
- Replace formatting StringBuilder with ref struct and stack allocation. Avoids lots of allocation and associated throughput costs.
- Improve perf of 'D' formatting of 32-bit and 64-bit integers.
- Remove array allocations accessing NumberFormatInfo props.
- Accessing array properties like PercentGroupSizes clones the corresponding field.  That's unnecessary here, as we don't mutate the array.
- Remove int[] allocation from NumberToStringFormat. Span makes it easy to start with stack space and grow to an allocated array as needed.
- Improve perf of hex formatting of integers. Including removing some sizable allocations.
- Manually inline several hot functions called in only one place.
- Tweak some range comparisons in ParseFormatSpecifier.
- Avoid large stackallocs in NumberToString{Fixed}. It's incurring non-trivial overheads.
- Tweak perf of ValueStringBuilder. In particular, make Append(string) faster for the single-char case, which is extremely common in integer formatting due to its prevalence in strings in NumberFormatInfo.
- Remove dead "bigNumber" code.
- Remove custom wcslen function. Use String's.
- Delete dead fcalls from runtime. FormatInt32, FormatUInt32, FormatInt64, and FormatUInt64 are no longer needed.  Delete them and many of the helpers used only by them.
Match the original native code's use of pointers.
Avoid string allocations when appending primitives to StringBuilder.  We try to format into the existing array space, and only fall back to using ToString when there isn't enough room and we're going to grow the builder anyway.
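The "format into existing space, fall back to ToString only when growing" idea from the last commit note can be sketched with a toy builder. TinyBuilder is a hypothetical single-buffer class written for illustration; the real StringBuilder is chunk-based and its internals differ.

```csharp
using System;

sealed class TinyBuilder
{
    private char[] _chars = new char[4];
    private int _length;

    public TinyBuilder Append(int value)
    {
        // Fast path: format straight into the unused tail of the buffer.
        if (value.TryFormat(_chars.AsSpan(_length), out int written))
        {
            _length += written;
            return this;
        }

        // Slow path: there is not enough room, so the buffer has to grow anyway
        // and the temporary string from ToString is an acceptable cost.
        string s = value.ToString();
        Array.Resize(ref _chars, Math.Max(_chars.Length * 2, _length + s.Length));
        s.CopyTo(0, _chars, _length, s.Length);
        _length += s.Length;
        return this;
    }

    public override string ToString() => new string(_chars, 0, _length);

    static void Main() =>
        Console.WriteLine(new TinyBuilder().Append(12).Append(34567).Append(8)); // 12345678
}
```

In this run only the second Append takes the slow path; the first and third write directly into the buffer without creating a string.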
@jkotas jkotas merged commit 7e4c7de into dotnet:master Nov 19, 2017
dotnet-bot pushed a commit to dotnet/corert that referenced this pull request Nov 19, 2017
…rshared

Add TryFormat span-based methods to most primitive types

Signed-off-by: dotnet-bot <[email protected]>
jkotas added a commit to dotnet/corert that referenced this pull request Nov 19, 2017
…rshared

Add TryFormat span-based methods to most primitive types

Signed-off-by: dotnet-bot <[email protected]>
@stephentoub stephentoub deleted the coreclrnumbershared branch December 5, 2017 04:00
dotnet-bot pushed a commit to dotnet/corefx that referenced this pull request Jan 13, 2018
…rshared

Add TryFormat span-based methods to most primitive types

Signed-off-by: dotnet-bot-corefx-mirror <[email protected]>
safern pushed a commit to dotnet/corefx that referenced this pull request Jan 16, 2018
…rshared

Add TryFormat span-based methods to most primitive types

Signed-off-by: dotnet-bot-corefx-mirror <[email protected]>