Cache FS in LS #996

Merged
merged 14 commits into from
Jun 5, 2025

Conversation

jakebailey
Member

I think this is right: within a Project, cache the FS accesses and clear them out on graph update for safety. That speeds up load time in module resolution and so on.

Not a mega fan of the whole host wrapping but I don't see a better way.

Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR introduces a caching layer for file system accesses in ProjectHost to improve module resolution performance and clears the cache whenever the project graph is updated.

  • Wraps the host’s FS() in a cachedvfs.FS via a new projectHostWithCachedFS
  • Updates NewProject to use the cached host and clears the cache in updateGraph
  • Leaves NewInferredProject unchanged (uses uncached host)
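
As a rough illustration of the wrapping described in the overview, the sketch below memoizes file-existence checks and drops them on ClearCache. The FS interface, countingFS, cachedFS, and newCachedFS are hypothetical names made up for this example; they are not the PR's cachedvfs package or the real ProjectHost API.

```go
package main

import (
	"fmt"
	"sync"
)

// FS is a hypothetical, minimal file system interface for this sketch.
type FS interface {
	FileExists(path string) bool
}

// countingFS stands in for the real file system and counts lookups.
type countingFS struct {
	files map[string]bool
	calls int
}

func (c *countingFS) FileExists(path string) bool {
	c.calls++
	return c.files[path]
}

// cachedFS memoizes FileExists results until ClearCache is called.
type cachedFS struct {
	inner FS
	mu    sync.Mutex
	exist map[string]bool
}

func newCachedFS(inner FS) *cachedFS {
	return &cachedFS{inner: inner, exist: map[string]bool{}}
}

func (c *cachedFS) FileExists(path string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if v, ok := c.exist[path]; ok {
		return v // cache hit: the underlying FS is not consulted
	}
	v := c.inner.FileExists(path)
	c.exist[path] = v
	return v
}

// ClearCache drops all memoized results.
func (c *cachedFS) ClearCache() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.exist = map[string]bool{}
}

func main() {
	disk := &countingFS{files: map[string]bool{"/a.ts": true}}
	fs := newCachedFS(disk)
	fs.FileExists("/a.ts")
	fs.FileExists("/a.ts")  // served from the cache
	fmt.Println(disk.calls) // 1
	fs.ClearCache()
	fs.FileExists("/a.ts")  // asks the underlying FS again
	fmt.Println(disk.calls) // 2
}
```

In this sketch a graph update would call ClearCache so that later lookups see fresh file system state.
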
Comments suppressed due to low confidence (3)

internal/project/project.go:142

  • NewInferredProject still uses the original host without wrapping in cachedvfs. This leads to inconsistent caching behavior between inferred and regular projects. Consider wrapping the host here as well, e.g., host = newProjectHostWithCachedFS(host).
func NewInferredProject(compilerOptions *core.CompilerOptions, currentDirectory

internal/project/project.go:169

  • [nitpick] The type name projectHostWithCachedFS is lengthy and mixed-case. Consider renaming to cachedProjectHost for improved readability.
type projectHostWithCachedFS struct {

internal/project/project.go:174

  • There's no existing test coverage for the caching wrapper. Add unit tests that verify: 1) FS methods are delegated correctly, and 2) ClearCache() actually resets the cache.
func newProjectHostWithCachedFS(host ProjectHost) *projectHostWithCachedFS {

@sheetalkamat
Member

We did this in Strada too (though we did it before creating the program in the service); this is known to help perf.

Member

@andrewbranch andrewbranch left a comment

Thanks, this was on my to-do list.

@jakebailey
Member Author

Unfortunately this method doesn't really work: the project test system replaces the FS out from under the cache, but the cached FS grabs the underlying FS on creation.

I've been able to change the tests to not do that, but I'm not clearing the cache in places where we seemingly need to. So this will need more work.
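
The problem described here can be reproduced in miniature. The sketch below assumes a hypothetical host whose FS() accessor is read once by a component at construction time. None of these types (mapFS, host, resolver) come from the PR; they only show why replacing the FS afterwards is not seen by anything that stored the earlier value.

```go
package main

import "fmt"

// FS is a hypothetical, minimal file system interface for this sketch.
type FS interface {
	FileExists(path string) bool
}

// mapFS is an in-memory stand-in for a file system.
type mapFS map[string]bool

func (m mapFS) FileExists(path string) bool {
	return m[path]
}

// host exposes its current FS through an accessor.
type host struct {
	fs FS
}

func (h *host) FS() FS {
	return h.fs
}

// resolver grabs the FS once, at construction time, and keeps it.
type resolver struct {
	fs FS
}

func newResolver(h *host) *resolver {
	return &resolver{fs: h.FS()}
}

func (r *resolver) has(path string) bool {
	return r.fs.FileExists(path)
}

func main() {
	h := &host{fs: mapFS{"/old.ts": true}}
	r := newResolver(h)

	// Swap the FS out from under the resolver, as a test harness might.
	h.fs = mapFS{"/new.ts": true}

	fmt.Println(r.has("/new.ts")) // false: the resolver still holds the old FS
	fmt.Println(r.has("/old.ts")) // true
}
```

A cache built on top of the FS at creation time behaves like the resolver here: it keeps answering from the FS it captured.
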

@jakebailey jakebailey marked this pull request as draft May 30, 2025 21:03
@jakebailey
Member Author

Oh oops, sorry, this was supposed to be a draft when I sent it 😄

@jakebailey
Member Author

CI fails because the new ATA code runs before updateGraph and therefore reads from the cached FS before it's cleared.

@jakebailey jakebailey marked this pull request as ready for review June 2, 2025 21:30
@@ -462,6 +486,7 @@ func (p *Project) updateGraph() bool {

start := time.Now()
p.Log("Starting updateGraph: Project: " + p.name)
p.host.fs.ClearCache()
Member

I think we should keep the cache only for the duration of updateGraph, as otherwise we don't know whether it's correct.

Member

You probably saw this with the typings installer tests, but in practice this could be watch events as well.

Member Author

I'm not sure where you're telling me to put the call; I don't know where this would go otherwise.

Are you saying to clear the cache at the end of updateGraph?

I can do it that way, but I don't think that would affect ATA, since that code can run concurrently at any time (part of the reason why I am not giving it a cached FS, and plan to decouple it from the Project in a followup to fix the races).

Member

I meant to keep the cache only for the duration of updateGraph, so uncached before and after that call. (For an example of how we did this in Strada, see https://github.com/microsoft/TypeScript/blob/main/src/services/services.ts#L1782, where the cache lives only for program creation and then goes away. This ensures correctness if for some reason we need to check with the FS on file existence etc.)

Member Author

So, defer this call so it runs when updateGraph returns?

Member

@sheetalkamat sheetalkamat Jun 3, 2025

Start caching at the start of updateGraph and end caching when returning from it.

Member Author

Updated, PTAL

@@ -460,6 +484,8 @@ func (p *Project) updateGraph() bool {
return false
}

defer p.host.fs.ClearCache()
Member

But this still keeps the cache outside of updateGraph, right? You probably want the cache to be present only during updateGraph and to use the uncached version outside of it; that is, it should neither cache nor consult the cache outside the updateGraph call.

Member Author

I see, you want me to pass a custom host into the Program?

Member Author

The trouble is that the entire project is "a host", so it's pretty tough to do that, unless I make the cached FS able to be disabled entirely on some sort of toggle, which seems odd...

(This is why I hate every object turning into a host, but here we are...)

Member Author

Maybe what I should do is create the cache in Program, which is how it's done for the other code (NewCachedFSCompilerHost).

Member

I think if you create the cache in the program, you won't be able to use it for your globs etc. for watching. You would want the cached host to be "enabled" only for the duration of updateGraph. We did this, e.g., in tsbuild in Strada, where we would cache only while a build was in progress.
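
One way to picture a cache that is "enabled" only for the duration of updateGraph is sketched below. It is an illustration under assumptions, not the code that was merged: toggledCache, newToggledCache, Enable, Disable, and the free-standing updateGraph function are all hypothetical names. The enabled flag is a single atomic load on the fast path; outside the enabled window the cache is neither consulted nor filled.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// toggledCache is a hypothetical cache that is only used while enabled.
type toggledCache struct {
	enabled atomic.Bool
	mu      sync.Mutex
	entries map[string]bool
	lookup  func(path string) bool // uncached lookup against the real FS
}

func newToggledCache(lookup func(path string) bool) *toggledCache {
	return &toggledCache{entries: map[string]bool{}, lookup: lookup}
}

func (c *toggledCache) FileExists(path string) bool {
	if !c.enabled.Load() {
		return c.lookup(path) // disabled: always ask the real FS
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	if v, ok := c.entries[path]; ok {
		return v
	}
	v := c.lookup(path)
	if c.enabled.Load() { // do not store if the cache was disabled meanwhile
		c.entries[path] = v
	}
	return v
}

// Enable starts caching.
func (c *toggledCache) Enable() {
	c.enabled.Store(true)
}

// Disable stops caching and drops everything cached so far.
func (c *toggledCache) Disable() {
	c.enabled.Store(false)
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries = map[string]bool{}
}

// updateGraph caches only while it runs; it returns how many paths exist.
func updateGraph(c *toggledCache, paths []string) int {
	c.Enable()
	defer c.Disable()
	found := 0
	for _, p := range paths {
		if c.FileExists(p) {
			found++
		}
	}
	return found
}

func main() {
	calls := 0
	c := newToggledCache(func(path string) bool {
		calls++
		return path == "/a.ts"
	})
	updateGraph(c, []string{"/a.ts", "/a.ts", "/b.ts"})
	fmt.Println(calls) // 2: the repeated path was served from the cache
	c.FileExists("/a.ts")
	c.FileExists("/a.ts")
	fmt.Println(calls) // 4: outside updateGraph nothing is cached
}
```

The toggle keeps the same object in place for its whole lifetime, so callers that stored a reference to it earlier still observe the enabled or disabled state.
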

Member Author

I'll give it a shot.

Member Author

Well, I did it, but it's not pretty 😄

Member

I think you want to set and unset cacheHost itself, so that it's not checked all over the place?

Also, I think ATA needs to use the uncached version all the time, since it could be doing things in the background while the project is updating?

Member Author

I really do not want to swap the hosts out from under anything. That was done once before in testing and it was a pain because many things call FS(), etc, storing off the value, and so are not updated, which then has to be chased down. Ensuring everything is always static is very simplifying.

I'm less concerned about the extra check just because it's only a single atomic access most of the time.

"Also, I think ATA needs to use the uncached version all the time, since it could be doing things in the background while the project is updating?"

I thought about this, but if both are happening concurrently, there's no harm in both using the same cached info, since updateGraph will clear it on exit anyhow.

@jakebailey
Member Author

Another ATA test race 😦

@jakebailey jakebailey requested a review from sheetalkamat June 5, 2025 01:01
@jakebailey jakebailey enabled auto-merge June 5, 2025 19:29
@jakebailey jakebailey added this pull request to the merge queue Jun 5, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jun 5, 2025
@jakebailey jakebailey added this pull request to the merge queue Jun 5, 2025
Merged via the queue into main with commit 9bc5e3f Jun 5, 2025
23 checks passed
@jakebailey jakebailey deleted the jabaile/lsp-fs-cache branch June 5, 2025 20:39
4 participants