This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Special-case T==char in string.Concat(IEnumerable<T>) #14298

Merged
merged 1 commit into from
Oct 4, 2017

Conversation

stephentoub
Member

This allows string.Concat to be used as an efficient mechanism for creating a string from an IEnumerable<char>. The JIT specializes the implementation for char vs. non-char, so there's minimal impact on the non-char case. For the char case, we a) avoid creating a string for each individual char, and b) use StringBuilder's fast path for appending individual chars. This can result in massive allocation savings for long enumerations (for <= 1 char there's no difference, but each character after that is an allocation saved), and for more than a few characters it can yield up to a 2x increase in throughput.
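The shape of the specialization described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: `ConcatSketch` is a hypothetical name, the empty and single-element fast paths are omitted, and a plain cast is used instead of `Unsafe.As` so the sample has no extra dependencies.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration only; not the CoreLib implementation.
static class ConcatSketch
{
    public static string Concat<T>(IEnumerable<T> values)
    {
        if (values == null) throw new ArgumentNullException(nameof(values));

        var sb = new StringBuilder();

        // For a value-type T such as char, the JIT compiles a dedicated copy of
        // this method, so this check is resolved at JIT time for that copy.
        if (typeof(T) == typeof(char))
        {
            // Plain cast for portability (assumption: acceptable for a sketch;
            // the PR uses Unsafe.As here to avoid the cast helper call).
            foreach (char c in (IEnumerable<char>)values)
            {
                sb.Append(c); // Append(char): no intermediate string per element
            }
        }
        else
        {
            foreach (T value in values)
            {
                // General path: each non-null element is converted via ToString().
                if (value != null) sb.Append(value.ToString());
            }
        }

        return sb.ToString();
    }
}
```

With this sketch, `ConcatSketch.Concat(new[] { 'a', 'b', 'c' })` returns "abc" through the char path, while `ConcatSketch.Concat(new[] { 1, 2, 3 })` returns "123" through the general path.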

Closes https://github.com/dotnet/corefx/issues/24395
cc: @joperezr, @AlexGhiondea, @jkotas

// Special-case T==char, as we can handle that case much more efficiently,
// and string.Concat(IEnumerable<char>) can be used as an efficient
// enumerable-based equivalent of new string(char[]).
using (IEnumerator<char> en = Unsafe.As<IEnumerable<char>>(values).GetEnumerator())
Member Author

@AndyAyersMS, I previously had this as:

IEnumerable<char> charsValues = (IEnumerable<char>)values;

but that was causing the JIT to emit CORINFO_HELP_CHKCASTANY, which added a measurable overhead for small enumerables; switching to use Unsafe.As fixed the regression. Is it expected that the JIT wasn't able to remove what was effectively an IEnumerable<char> to IEnumerable<char> cast?
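For readers comparing the two forms side by side, here is a small sketch. The type and method names (`CastSketch`, `CountWithCast`, `CountWithUnsafeAs`) are hypothetical, and the sample assumes a runtime where `System.Runtime.CompilerServices.Unsafe` is available (in-box or via the NuGet package of the same name). Both methods produce the same result; the difference discussed in this thread is only in the code the JIT generated at the time.

```csharp
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical names, for illustration only.
static class CastSketch
{
    // Checked cast: at the time of this PR, the JIT emitted a
    // CORINFO_HELP_CHKCASTANY helper call for this form.
    public static int CountWithCast<T>(IEnumerable<T> values)
    {
        int count = 0;
        if (typeof(T) == typeof(char))
        {
            foreach (char c in (IEnumerable<char>)values) count++;
        }
        return count;
    }

    // Unsafe.As performs no runtime type check. It is only sound here because
    // the typeof(T) == typeof(char) guard guarantees the types already match.
    public static int CountWithUnsafeAs<T>(IEnumerable<T> values)
    {
        int count = 0;
        if (typeof(T) == typeof(char))
        {
            foreach (char c in Unsafe.As<IEnumerable<char>>(values)) count++;
        }
        return count;
    }
}
```

Because `Unsafe.As` skips verification entirely, using it without such a guard can corrupt type safety, so the checked cast remains the safer default outside of measured hot paths.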

Member

There is code in the jit's assertion prop for cast optimization. Seems like it ought to be firing in a case like this. I'll take a closer look.

My intention is to also add support for optimizing casts into the jit's front-end (similar to what I have been doing recently for type equality checks). The earlier we can prune away code in these type-specializing kinds of methods, the better.

Member Author

Thanks, @AndyAyersMS. Here's a standalone repro:

using System;
using System.Collections.Generic;

class Program
{
    static void Main() => MyMethod(new char[0]);

    static void MyMethod<T>(IEnumerable<T> chars)
    {
        if (typeof(T) == typeof(char))
        {
            foreach (char c in (IEnumerable<char>)chars) { }
        }
    }
}

The beginning of the Jit_Disasm for MyMethod is:

G_M9149_IG01:
       55                   push     rbp
       4883EC30             sub      rsp, 48
       488D6C2430           lea      rbp, [rsp+30H]
       488965F0             mov      qword ptr [rbp-10H], rsp

G_M9149_IG02:
       488BD1               mov      rdx, rcx
       48B938CED75BF97F0000 mov      rcx, 0x7FF95BD7CE38
       E890EF395F           call     CORINFO_HELP_CHKCASTANY
       488BC8               mov      rcx, rax
       49BB20009909F97F0000 mov      r11, 0x7FF909990020
       3909                 cmp      dword ptr [rcx], ecx
       41FF13               call     gword ptr [r11]System.Collections.Generic.IEnumerable`1[Char][System.Char]:GetEnumerator():ref:this

Member

Assertion prop doesn't fire because we don't seed the assertion table with known initial facts. In particular we arguably should assert that the types of arguments and locals are at least their declared types.

However, since the assertion table space is a scarce resource, we probably can't afford to start adding these initially known facts by default. In particular there can be a lot of locals, and at times the information is not useful -- for instance knowing that a ref type is at least object doesn't add any value.

Opened #14308 to record these shortcomings of assertion prop.

Member Author

Thanks.

@stephentoub
Member Author

@dotnet-bot test OSX10.12 x64 Checked Build and Test please ("Java / Jenkins hit a remoting error that caused the build to fail.")

Member

@joperezr joperezr left a comment

Improvement looks good! Thanks, Steve.

@stephentoub stephentoub merged commit e67311d into dotnet:master Oct 4, 2017
@stephentoub stephentoub deleted the stringconcat_char branch October 4, 2017 02:14
@weitzhandler

Thank you!
