Cache FS in LS #996
Pull Request Overview
This PR introduces a caching layer for file system accesses in ProjectHost
to improve module resolution performance and clears the cache whenever the project graph is updated.
- Wraps the host's FS() in a cachedvfs.FS via a new projectHostWithCachedFS
- Updates NewProject to use the cached host and clears the cache in updateGraph
- Leaves NewInferredProject unchanged (uses uncached host)
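The wrap-and-clear idea in the overview can be sketched as follows. This is a minimal illustration, not the PR's code: the FS interface, countingFS, cachedFS, and newCachedFS are hypothetical stand-ins, and the real cachedvfs.FS covers more operations than the single FileExists call shown here.

```go
package main

import (
	"fmt"
	"sync"
)

// FS is a hypothetical, minimal stand-in for the real file system interface.
type FS interface {
	FileExists(path string) bool
}

// countingFS is a fake FS that records how often it is consulted.
type countingFS struct {
	files map[string]bool
	calls int
}

func (f *countingFS) FileExists(path string) bool {
	f.calls++
	return f.files[path]
}

// cachedFS memoizes FileExists results until ClearCache is called.
type cachedFS struct {
	inner  FS
	mu     sync.Mutex
	exists map[string]bool
}

func newCachedFS(inner FS) *cachedFS {
	return &cachedFS{inner: inner, exists: map[string]bool{}}
}

func (c *cachedFS) FileExists(path string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if v, ok := c.exists[path]; ok {
		return v // served from the cache
	}
	v := c.inner.FileExists(path)
	c.exists[path] = v
	return v
}

// ClearCache drops every memoized result.
func (c *cachedFS) ClearCache() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.exists = map[string]bool{}
}

func main() {
	disk := &countingFS{files: map[string]bool{"/a.ts": true}}
	cached := newCachedFS(disk)

	cached.FileExists("/a.ts")
	cached.FileExists("/a.ts")
	fmt.Println(disk.calls) // 1: the second lookup never reached the underlying FS

	cached.ClearCache()
	cached.FileExists("/a.ts")
	fmt.Println(disk.calls) // 2: after ClearCache the underlying FS is consulted again
}
```

Anything that reads through the wrapper between two ClearCache calls sees the memoized answers, which is why the placement of the clear matters so much in the discussion that follows.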
Comments suppressed due to low confidence (3)
internal/project/project.go:142
NewInferredProject still uses the original host without wrapping in cachedvfs. This leads to inconsistent caching behavior between inferred and regular projects. Consider wrapping the host here as well, e.g., host = newProjectHostWithCachedFS(host).
func NewInferredProject(compilerOptions *core.CompilerOptions, currentDirectory
internal/project/project.go:169
- [nitpick] The type name projectHostWithCachedFS is lengthy and mixed-case. Consider renaming to cachedProjectHost for improved readability.
type projectHostWithCachedFS struct {
internal/project/project.go:174
- There's no existing test coverage for the caching wrapper. Add unit tests that verify: 1) FS methods are delegated correctly, and 2) ClearCache() actually resets the cache.
func newProjectHostWithCachedFS(host ProjectHost) *projectHostWithCachedFS {
We did this in Strada too (though we did it before creating the program in the service); this is known to help perf.
Thanks, this was on my to-do list.
Unfortunately this method doesn't really work: the project test system replaces the FS out from under the cache, but the cache FS grabs the FS on creation. I've been able to change the tests to not do that, but I'm not clearing the cache in places where we seemingly need to. So this will need more work.
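The "grabs the FS on creation" problem can be illustrated with a small sketch. Everything here (mapFS, host, snapshotCache) is a hypothetical stand-in rather than the actual test system or cachedvfs code; the point is only that a wrapper which captures the FS value once does not see a later replacement.

```go
package main

import "fmt"

// FS is a hypothetical, minimal stand-in for the real file system interface.
type FS interface {
	FileExists(path string) bool
}

// mapFS is a fake in-memory FS.
type mapFS map[string]bool

func (m mapFS) FileExists(path string) bool { return m[path] }

// host is a hypothetical project host whose FS can be swapped, as a test harness might do.
type host struct {
	fs FS
}

func (h *host) FS() FS { return h.fs }

// snapshotCache captures the host's FS once, at construction time.
type snapshotCache struct {
	inner FS
}

func newSnapshotCache(h *host) *snapshotCache {
	return &snapshotCache{inner: h.FS()}
}

func (c *snapshotCache) FileExists(path string) bool {
	return c.inner.FileExists(path)
}

func main() {
	h := &host{fs: mapFS{"/old.ts": true}}
	c := newSnapshotCache(h)

	// The harness replaces the FS after the wrapper was built.
	h.fs = mapFS{"/new.ts": true}

	fmt.Println(h.FS().FileExists("/new.ts")) // true: the host sees the new FS
	fmt.Println(c.FileExists("/new.ts"))      // false: the wrapper still holds the old FS
}
```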
Oh oops, sorry, this was supposed to be a draft when I sent it 😄
CI fails because the new ATA code runs before updateGraph and therefore reads from the cached FS before it's cleared.
internal/project/project.go
Outdated
@@ -462,6 +486,7 @@ func (p *Project) updateGraph() bool {

	start := time.Now()
	p.Log("Starting updateGraph: Project: " + p.name)
	p.host.fs.ClearCache()
I think we should keep the cache only for the duration of updateGraph, as otherwise we don't know whether it's correct or not.
You probably saw this with the typings installer tests, but in practice this could be watch events as well.
I'm not sure where you're telling me to put the call; I don't know where this would go otherwise.
Are you saying to clear the cache at the end of updateGraph?
I can do it that way, but I don't think that would affect ATA, since that code can run at any time concurrently (part of the reason why I am not giving it a cached FS, and plan to decouple it from the Project in a follow-up to fix the races).
I meant to keep the cache only for the duration of updateGraph, so it is uncached before and after that call. This ensures correctness if for some reason we need to check with the FS for file existence, etc. For an example of how we did this in Strada, see https://github.com/microsoft/TypeScript/blob/main/src/services/services.ts#L1782, where the cache lives only for program creation and then goes away.
So, defer this call so it runs when updateGraph returns?
Start caching at the start of updateGraph and stop caching when returning from it.
Updated, PTAL
internal/project/project.go
Outdated
@@ -460,6 +484,8 @@ func (p *Project) updateGraph() bool {
		return false
	}

	defer p.host.fs.ClearCache()
But this still keeps the cache outside of updateGraph, right? You probably want the cache to be present only during updateGraph and to use the uncached version outside of it; that is, it should neither cache nor consult the cache when outside the updateGraph call.
I see, you want me to pass a custom host into the Program?
The trouble is that the entire project is "a host", so it's pretty tough to do that, unless I make the cached FS able to be disabled entirely on some sort of toggle, which seems odd...
(This is why I hate every object turning into a host, but here we are...)
Maybe what I should do is create the cache in Program, which is how it's done for the other code (NewCachedFSCompilerHost).
I think if you create the cache in Program you won't be able to use it for your globs, etc., for watching. You would want the cache host to be "enabled" only for the duration of updateGraph. We did this, e.g., in tsbuild in Strada, where we would cache only while a build was in progress.
I'll give it a shot.
Well, I did it, but it's not pretty 😄
I think you want to set and unset the cache host itself, so that it's not checked all over the place?
Also, I think ATA needs to use the uncached version all the time, since it could be doing things in the background while the project is updating?
I really do not want to swap the hosts out from under anything. That was done once before in testing and it was a pain because many things call FS(), etc., storing off the value, and so are not updated, which then has to be chased down. Ensuring everything is always static is very simplifying.
I'm less concerned about the extra check just because it's only a single atomic access most of the time.
> Also, I think ATA needs to use the uncached version all the time, since it could be doing things in the background while the project is updating?
I thought about this, but if both are happening concurrently, there's no harm in both using the same cached info, since updateGraph will clear it on exit anyhow.
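A minimal sketch of the approach discussed in this thread: the FS object itself stays static, and a single atomic flag decides whether lookups go through the cache, with the flag set on entry to updateGraph and the cache disabled and cleared on return. The names toggleCachedFS, Enable, DisableAndClear, and this updateGraph are hypothetical and are not taken from the PR's final code.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// FS is a hypothetical, minimal stand-in for the real file system interface.
type FS interface {
	FileExists(path string) bool
}

// countingFS is a fake FS that records how often it is consulted.
type countingFS struct {
	files map[string]bool
	calls int
}

func (f *countingFS) FileExists(path string) bool {
	f.calls++
	return f.files[path]
}

// toggleCachedFS caches only while enabled; callers can hold on to it forever.
type toggleCachedFS struct {
	inner   FS
	enabled atomic.Bool
	cache   sync.Map // path (string) -> exists (bool)
}

func (c *toggleCachedFS) FileExists(path string) bool {
	if !c.enabled.Load() { // the single atomic check on the uncached path
		return c.inner.FileExists(path)
	}
	if v, ok := c.cache.Load(path); ok {
		return v.(bool)
	}
	v := c.inner.FileExists(path)
	c.cache.Store(path, v)
	return v
}

func (c *toggleCachedFS) Enable() { c.enabled.Store(true) }

// DisableAndClear turns caching off and drops every cached entry.
func (c *toggleCachedFS) DisableAndClear() {
	c.enabled.Store(false)
	c.cache.Range(func(k, _ any) bool {
		c.cache.Delete(k)
		return true
	})
}

// updateGraph stands in for the real method: caching is active only inside it.
func updateGraph(fs *toggleCachedFS) {
	fs.Enable()
	defer fs.DisableAndClear()
	fs.FileExists("/a.ts") // reaches the underlying FS
	fs.FileExists("/a.ts") // served from the cache
}

func main() {
	disk := &countingFS{files: map[string]bool{"/a.ts": true}}
	fs := &toggleCachedFS{inner: disk}

	fs.FileExists("/a.ts") // before updateGraph: uncached
	updateGraph(fs)
	fs.FileExists("/a.ts") // after updateGraph: uncached again

	fmt.Println(disk.calls) // 3
}
```

Because nothing is swapped, code that stored the result of FS() earlier keeps working; the cost is the extra flag check on every call.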
Another ATA test race 😦
I think this is right; within a Project, cache the FS accesses and clear them out on graph update for safety. Then, things get faster in load time in module resolution and so on.
Not a mega fan of the whole host wrapping but I don't see a better way.