Future of Work
Report 2023
A summary of recent research from
Microsoft and around the world that
can help us create a new and better
future of work with AI.
Editors and Authors
• Editors: Jenna Butler (Principal Applied Research Scientist), Sonia Jaffe (Principal Researcher), Nancy Baym (Senior Principal
Research Manager), Mary Czerwinski (Partner Research Manager), Shamsi Iqbal (Principal Applied & Data Scientist), Kate
Nowak (Principal Applied Scientist), Sean Rintel (Senior Principal Researcher), Abigail Sellen (VP Distinguished Scientist), Mihaela
Vorvoreanu (Director Aether UX Research & EDU), Brent Hecht (Partner Director of Applied Science), and Jaime Teevan (Chief
Scientist and Technical Fellow)
• Authors: Najeeb Abdulhamid, Judith Amores, Reid Andersen, Kagonya Awori, Maxamed Axmed, danah boyd, James Brand, Georg
Buscher, Dean Carignan, Martin Chan, Adam Coleman, Scott Counts, Madeleine Daepp, Adam Fourney, Dan Goldstein, Andy
Gordon, Aaron Halfaker, Javier Hernandez, Jake Hofman, Jenny Lay-Flurrie, Vera Liao, Siân Lindley, Sathish Manivannan, Charlton
Mcilwain, Subigya Nepal, Jennifer Neville, Stephanie Nyairo, Jacki O'Neill, Victor Poznanski, Gonzalo Ramos, Nagu Rangan, Lacey
Rosedale, David Rothschild, Tara Safavi, Advait Sarkar, Ava Scott, Chirag Shah, Neha Shah, Teny Shapiro, Ryland Shaw, Auste
Simkute, Jina Suh, Siddharth Suri, Ioana Tanase, Lev Tankelevitch, Mengting Wan, Ryen White, Longqi Yang
This report emerges from Microsoft’s New Future of Work initiative
Microsoft has helped shape information work since its founding.
However, a confluence of recent circumstances – remote work,
hybrid work, LLMs – has created an unprecedented opportunity
for the company to reimagine how AI and other digital
technologies can make work better for everyone.
Since its inception, the New Future of Work (NFW) initiative has
brought together researchers from a broad range of
organizations and disciplines across Microsoft to focus on the
most important technologies shaping how people work. The
initiative is working to create the new future of work – one that is
equitable, inclusive, meaningful, and productive – instead of
predicting or waiting for it. It does this by conducting primary
research and synthesizing existing research to share with the
research community. This report is one of the many public
resources it has produced.
The reader can find the New Future of Work initiative’s many
other research papers, practical guides, reports and whitepapers
at the initiative’s website: https://aka.ms/nfw.
Report overview
This report provides insight into AI and work practices. In it you will find content related to:
• LLMs for Information Work: How do LLMs affect the speed and quality of common information work tasks? LLMs can boost
productivity for information workers, but they also require careful evaluation and adaptation.
• LLMs for Critical Thinking: How can LLMs help us break down and build up complex tasks? LLMs can help us tackle complex
tasks by provoking critical thinking, enabling microproductivity, and shifting the balance of skills.
• Human-AI Collaboration: How can we collaborate effectively with LLMs? Effective collaboration with LLMs depends on how
we prompt, complement, rely on, and audit them.
• LLMs for Complex and Creative Tasks: How can LLMs tackle tasks that go beyond simple information retrieval or
generation? LLMs can support complex and creative tasks by, for instance, enhancing metacognition.
• Domain-Specific Applications of LLMs: How are LLMs being used and affecting different domains of work? We focus specifically on software engineering, medicine, social science, and education.
• LLMs for Team Collaboration and Communication: How can LLMs help teams work and communicate better? LLMs can
help teams improve interaction, coordination, and workflows by providing real-time and retrospective feedback and leveraging
holistic frameworks.
• Knowledge Management and Organizational Changes: How is AI changing the nature and distribution of knowledge in
organizations? LLMs might, for instance, finally eliminate knowledge silos in large companies.
• Implications for Future Work and Society: What implications will AI have for the future of work and society? We can shape
AI’s impact by addressing adoption disparities, fostering innovation, leading like scientists, and remembering that the future of
work is in our control.
Dell’Acqua, F., et al. (2023). Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality. SSRN Working Paper 4573321.
Noy, S., & Zhang, W. (2023). Experimental Evidence on the Productivity Effects of Generative Artificial Intelligence. SSRN preprint.
Microsoft Study: Spatharioti, S. E., et al. (2023). Comparing Traditional and LLM-based Search for Consumer Choice: A Randomized Experiment. arXiv preprint.
Copilot for M365 saves time for a variety of tasks in lab studies and surveys
Users also report Copilot reduces the effort required. Effects on quality are mostly neutral
Microsoft’s AI and Productivity Report synthesizes results from 8 early studies, most focused on the use of M365 Copilot
for information worker tasks for which LLMs are most likely to provide significant value (Cambon et al., 2023).
• Tasks included meeting summarization, information retrieval, and content creation
• Study participants with Copilot completed experimenter-designed tasks in 26-73% as much time as those without
Copilot
• A survey of enterprise users with access to Copilot also showed substantial perceived time savings
• 73% agreed that Copilot helped them complete tasks faster, and 85% said it would help them get to a good first
draft faster.
• Many studies found no statistically significant or meaningful effect on quality
• However, in the meeting summarization study where Copilot users took much less time, their summaries included 11.1 out of 15 specific pieces of information in the assessment rubric, versus 12.4 of 15 for users who did not have access to Copilot.
• In the other direction, the study of M365 Defender Security Copilot found security novices with Copilot were
44% more accurate in answering questions about the security incidents they examined.
• A study of the Outlook “Sound like me” feature found that Copilot users liked many aspects of the emails it generated more than human-written ones, but they could sometimes tell the difference between Copilot-written and human-written text.
• Of enterprise Copilot users, 68% of respondents agreed that Copilot actually improved the quality of their work.
• Users also reported tasks required less effort with Copilot
• In the Teams Meeting Study, participants with access to Copilot found the task to be 58% less draining than participants without access
• Among enterprise Copilot users, 72% agreed that Copilot helped them spend less mental effort on mundane or repetitive tasks
Task completion times for lab studies of Copilot for M365 (Cambon et al., 2023)
Microsoft Study: Cambon et al (2023), Early LLM-based Tools for Enterprise Information Workers Likely Provide Meaningful Boosts to Productivity. MSFT Technical Report.
The evidence points to LLMs helping the least experienced the most
Most early studies have found that new or low-skilled workers benefit the most from LLMs.
• In studying the staggered rollout of a generative AI-based conversational
assistant, Brynjolfsson et al. (2023) found that the tool helped novice and low-
skilled workers the most.
• They found suggestive evidence that the tool helped disseminate tacit
knowledge that the experienced and high-skilled workers already had.
• In a lab experiment, participants who scored poorly on their first writing task
improved more when given access to ChatGPT than those with high scores on the
initial task (see graph, Noy and Zhang 2023).
• Peng et al. (2023) also found suggestive evidence that GitHub Copilot was more
helpful to developers with less experience.
• In an experiment with BCG employees completing a consulting task, the bottom half of subjects in terms of skills benefited the most, showing a 43% improvement in performance, compared to the top half, whose performance increased by 17% (Dell’Acqua et al., 2023).
• Recent work by Haslberger et al. (2023) highlights some complexities and nuance.
Green triangles represent those who got access to ChatGPT for the second task. Their scores across the two tasks are less correlated. (Noy & Zhang 2023)
Brynjolfsson, E., et al. (2023). Generative AI at Work. NBER Working Paper 31161.
Haslberger, M. et al. (2023). No Great Equalizer: Experimental Evidence on AI in the UK Labor Market. SSRN Working Paper 4594466.
Dell’Acqua, F., et al. (2023). Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality. SSRN Working Paper 4573321.
Noy, S., & Zhang, W. (2023). Experimental Evidence on the Productivity Effects of Generative Artificial Intelligence. SSRN Working Paper 4375283.
Microsoft Study: Peng, S., et al. (2023). The Impact of AI on Developer Productivity: Evidence from GitHub Copilot. arXiv preprint 2302.06590.
Microsoft Study: Sarkar, A. (2023). Exploring Perspectives on the Impact of Artificial Intelligence on the Creativity of Knowledge Work: Beyond Mechanised Plagiarism and Stochastic Parrots. Proceedings of the ACM Symposium on Human-Computer Interaction for Work (CHIWORK 2023).
Kneupper, C. W. (1978). Teaching argument: An introduction to the Toulmin model. College Composition and Communication 29, 3.
Sun, N., et al. (2017). Critical thinking in collaboration: Talk less, perceive more. Proceedings of the 2017 CHI Conference Extended Abstracts on Human Factors in Computing Systems.
Lee, S., et al. (2023). Fostering Youth’s Critical Thinking Competency About AI through Exhibition. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems.
Bezjak, S., et al. (2018). Open Science Training Handbook.
AI can enhance microproductivity practices
AI can be harnessed to augment human capabilities through novel task management strategies
• The concept of “microproductivity”, in which complex tasks are decomposed into smaller subtasks and
performed in “micromoments” by the person most skilled to do so, can be enhanced through automation
(Teevan 2016).
• For example, Kokkalis et al. (2013) demonstrated that high level tasks broken into multistep action
plans through crowdsourcing result in people completing significantly more tasks (47.1% task
completion) compared to the control condition of no plans (37.8%). These benefits were scaled by
applying NLP algorithms to automatically create action plans for a larger variety of tasks based on a
training set of similar tasks, and the plans were further refined through human intervention.
• Kaur et al. (2018) showed that using a fixed vocabulary to break down comments in a document into a series of subtasks resulted in a 28% increase in subtasks that could be handed off to crowdsourcing or automation, leaving a smaller percentage of subtasks for the document author.
• AI can help with automatic identification of micromoments and microtasks, improving overall quality and efficiency.
• Contextual identification of micromoments based on preceding activities and location can yield up to 80.7% precision (Kang et al. 2017); such micromoments can be used for learning (Cai et al. 2017), creation of audiobooks (Kang et al. 2017), editing documents (August et al. 2020), and coding (Williams et al. 2019).
• White et al. (2021) demonstrated how machine learning can be leveraged to automatically detect microtasks from user-generated task lists with a precision of 75% for the positive class, and to forecast their duration, with the best classifier performance for tasks around 5 minutes long.
Decomposing high-level tasks into concrete steps (plans) makes them more actionable, resulting in higher task completion rates. Online crowds do the decomposition; algorithms identify and reuse existing plans. (Kokkalis 2013)
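To make the decomposition idea above concrete, here is a minimal Python sketch of asking a model to turn one high-level task into an action plan of five-minute microtasks. It is an illustration in the spirit of the systems cited above rather than any of them: the call_llm helper, the prompt wording, and the JSON schema are assumptions that would need to be adapted to a real model client.

```python
import json

def call_llm(prompt: str) -> str:
    """Placeholder for a chat-completion call to an LLM service; wire up any real client here."""
    raise NotImplementedError

def decompose_task(high_level_task: str, max_steps: int = 6) -> list[dict]:
    """Ask the model for an action plan of short, concrete subtasks with rough durations,
    so each subtask can plausibly fit into a 'micromoment'."""
    prompt = (
        f"Break the following task into at most {max_steps} concrete subtasks "
        "that each take 5 minutes or less.\n"
        "Return JSON: a list of objects with keys 'step', 'description', 'minutes'.\n\n"
        f"Task: {high_level_task}"
    )
    raw = call_llm(prompt)
    plan = json.loads(raw)  # in practice, validate and repair the model output
    return [step for step in plan if step.get("minutes", 99) <= 5]  # keep only true microtasks

# Example usage, given a real call_llm implementation:
# for step in decompose_task("Prepare the quarterly project status report"):
#     print(step["step"], step["description"], f"~{step['minutes']} min")
```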
Microsoft Study: Teevan, J. (2016). The future of microwork. XRDS 23, 2.
Kokkalis, N., et al. 2013. TaskGenies: Automatically Providing Action Plans Helps People Complete Tasks. ACM Transactions on Computer-Human Interaction 20, 5.
Kaur, H. et al. 2018. Creating Better Action Plans for Writing Tasks via Vocabulary-Based Planning. Proceedings of the ACM on Human-Computer Interaction. 2, CSCW.
Kang, B. et al. (2017). Zaturi: We Put Together the 25th Hour for You. Create a Book for Your Baby. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing (CSCW ‘17).
Cai, C. J., Ren, A., & Miller, R. C. (2017). WaitSuite: Productive Use of Diverse Waiting Moments. ACM Transactions on Computer Human Interaction 24, 1.
Microsoft Study: August, T., et al. (2020). Characterizing the Mobile Microtask Writing Process. 22nd International Conference on Human-Computer Interaction with Mobile Devices and Services (MobileHCI ‘20).
Microsoft Study: Williams, A., et al. (2019). Mercury: Empowering Programmers' Mobile Work Practices with Microproductivity. Proceedings of the 32nd Annual ACM Symposium on User Interface Software and Technology.
Microsoft Study: White, R. W., et al. (2021). Microtask Detection. ACM Trans. Inf. Syst. 39, 2.
Analyzing and integrating may become more important skills than
searching and creating
With content being generated by AI, knowledge work may shift towards more analysis and critical
integration
• Information search as well as content production (manually typing, writing
code, designing images) is greatly enhanced by AI, so general information
work may shift to integrating and critically analyzing retrieved information
• Writing with AI is shown to increase the amount of text produced as well
as to increase writing efficiency (Biermann et al. 2022, Lee et al 2022)
• With more generated text available, the skills of research,
conceptualization, planning, prompting and editing may take on more
importance as LLMs do the first round of production (e.g., Mollick 2023).
• Skills not directly related to content production, such as leading, dealing with …
The critical integration “sandwich”: when AI handles production, human critical thinking is applied at either end of the process to complete knowledge workflows (Sarkar, 2023).
Biermann, O. C., et al. (2022). From Tool to Companion: Storywriters Want AI Writers to Respect Their Personal Values and Writing Strategies. Proceedings of the 2022 ACM Designing Interactive Systems Conference (DIS '22).
Lee, M., et al. (2022). CoAuthor: Designing a Human-AI Collaborative Writing Dataset for Exploring Language Model Capabilities. Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (CHI '22).
Mollick, E. (2023). My class required AI. Here's what I've learned so far. One Useful Thing
LinkedIn (2023). Future of Work Report: AI at Work.
Ouyang, L., et al. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35.
Chung, H. W., et al. (2023) Scaling instruction-finetuned language models. arXiv preprint.
Pryzant, R., et al. (2023). Automatic Prompt Optimization with Gradient Descent and Beam Search. arXiv preprint.
Yang, C., et al. (2023). Large language models as optimizers. arXiv preprint.
Nori, H., et al. (2023). Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine. arXiv preprint.
Fernando, C., et al. (2023). Promptbreeder: Self-referential self-improvement via prompt evolution. arXiv preprint.
Brade, S. et al. (2023). Promptify: Text-to-Image Generation through Interactive Prompt Exploration with Large Language Models. arXiv preprint.
Clark, H. H. (1996). Using Language. Cambridge University Press.
Chung, J.J.Y., and Adar, E. (2023) PromptPaint: Steering Text-to-Image Generation Through Paint Medium-like Interactions. Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology.
Dell’Acqua, F. et al. (2023). Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality. SSRN working paper.
Microsoft Study: Microsoft WorkLab (2023). The art and science of working with AI.
Microsoft Study: Teevan, J. (2023). To work well with GenAI, you need to learn how to talk to it. Harvard Business Review.
OpenAI (2023) Prompt Engineering.
Yao, Z., et al. (2023). Do Physicians Know How to Prompt? The Need for Automatic Prompt Optimization Help in Clinical Note Generation. arXiv preprint.
Complementarity is a human-centered approach to AI collaboration
Humans and AI can “collaborate” in many ways: from each party acting as a collaborative team member,
to a person overseeing an AI automation loop, to AI simulating a human.
• Sheridan and Verplank (1978) introduced the Levels of Automation (LOA) framework to classify how
responsibility can be divided between human and automation (see figure). It has been widely applied,
e.g., in self-driving vehicles and process control.
• Computers share load with humans by extending human capabilities or relieving the human to
make their job easier, or
• Computers trade load with humans by being a back-up in case the human falters, or by completely replacing the human.
• Based on the idea of LOAs, Parasuraman and Wickens (2000) outlined a model to determine what
should be automated and to what extent. It has been applied in the analysis of contemporary systems
(Mackeprang et al. 2019).
• A human-centered approach takes a complementary perspective, in which human and AI are partners that balance out each other’s weaknesses (Lubars and Tan, 2019). Examples include mixed-initiative interaction (Horvitz 1999), collaborative control where human and machines are involved in the same activity (Fong et al. 2001), and coactive design that focuses on supporting interdependency between the human and AI (Johnson et al. 2011).
Distribution of task-load between humans and computers/automation. (Sheridan and Verplank 1978)
Sheridan, T.B. and W.L. Verplank (1978). Human and Computer Control of Undersea Teleoperators. Technical Report.
Parasuraman, R., and C.D. Wickens (2008). Humans: Still Vital After All These Years of Automation. Human Factors, 50(3).
Mackeprang, M. et al. (2019). Discovering the Sweet Spot of Human-Computer Configurations: A Case Study in Information Extraction. Proceedings of the ACM Human-Computer Interaction. 3, CSCW.
Lubars, B. and C. Tan. (2019). Ask not what AI can do, but what AI should do: towards a framework of task delegability. Proceedings of the 33rd International Conference on Neural Information Processing Systems.
Microsoft Study: Horvitz, E. (1999). Uncertainty, Action, and Interaction: In Pursuit of Mixed-Initiative Computing. Intelligent Systems, 6.
Fong, T. et al. (2001). Collaborative control: A robot-centric model for vehicle teleoperation. The Robotics Institute
Johnson, M. et al. (2011). Beyond Cooperative Robotics: The Central Role of Interdependence in Coactive Design. IEEE Intelligent Systems 26, 3.
Passi, S. and Vorvoreanu, M. (2022). Overreliance on AI Literature Review. Microsoft Research preprint.
Agarwal, N., et al. (2023). Combining Human Expertise with Artificial Intelligence: Experimental Evidence from Radiology. NBER Working Paper 31422.
Danry, V., et al. (2023). Don’t Just Tell Me, Ask Me: AI Systems that Intelligently Frame Explanations as Questions Improve Human Logical Discernment Accuracy over Causal AI explanations. Proceedings of the 2023 CHI
Conference on Human Factors in Computing Systems (CHI '23).
Spatharioti, S. et al. (2023). Comparing Traditional and LLM-based Search for Consumer Choice: A Randomized Experiment. arXiv preprint.
Vasconcelos, H. et al (2023). Generation Probabilities Are Not Enough: Exploring the Effectiveness of Uncertainty Highlighting in AI-Powered Code Completions. arXiv preprint.
Microsoft Study: Gordon, A. et al. (2023). Co-audit: tools to help humans double-check AI-generated content. Microsoft Research preprint.
Ferdowsi, K. et al. (2023). ColDeco: An End User Spreadsheet Inspection Tool for AI-Generated Code. Proceedings of the IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC).
Liu, M.X., Sarkar, A. et al. (2023). “What It Wants Me To Say”: Bridging the Abstraction Gap Between End-User Programmers and Code-Generating Large Language Models. Proceedings of the 2023 CHI Conference on Human
Factors in Computing Systems.
Mündler, N. et al. (2023). Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation. arXiv preprint.
LLMs have made giant steps forward in multilingual performance, but there is
still much to be done
• Multilingual LLMs will reduce the barriers to information access (Nicholas et al. 2023) and help realize transformative applications
at scale (Nori et al. 2023)
• The impact of this can be much higher in low and middle socioeconomic regions where resources are scarce
• However, many problems still remain. For instance, GPT-4 performance is still best in English, and performance drops substantially as we move to mid- and low-resource languages (Ahuja et al. 2023)
• Many language families don’t have enough data for adequate training (Patra et al. 2023)
• Non-Latin scripts are under-represented on the web, so LLMs perform worse on non-Latin text even in high resource languages,
such as Japanese (Ahuja et al. 2023)
• Lack of relevant linguistic and societal context in languages and cultures will impact task level performance for LLMs, for example
in handling dialects within the same language family (Hada et al. 2023)
• There is still little investigation into the multilingual performance of applications built on LLM-derived artifacts; for example, knowledge bases built on low-quality embeddings will not perform as well
Nicholas, G., et al. (2023). Lost in Translation: Large Language Models in Non-English Content Analysis. arXiv preprint.
Nori, H., et al. (2023). Capabilities of GPT-4 on Medical Challenge Problems. arXiv preprint.
Ahuja, K., et al. (2023). MEGA: Multilingual Evaluation of Generative AI. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing.
Patra et al. (2023). Everything you need to know about Multilingual LLMs. ACL 2023 Tutorial.
Hada, R., et al. (2023). Are Large Language Model-based Evaluators the Solution to Scaling Up Multilingual Evaluation? arXiv preprint.
OpenAI (2023). GPT-4 Technical Report. arXiv preprint.
Microsoft Study: Hwang, A. et al (2023). Seeking authenticity in creative writing with LLMs. In preparation
Microsoft Study: Palani, S. et al. (2023). Amethyst: A Creative Process-Focused Notebook That Leverages Large Language Models. (under review)
Microsoft Study: Brockett, C., Dolan, B., et al. (2023). Project Emergence.
Microsoft Study: Yeh, C. et al (2023). GhostWriter: Augmenting Human-AI Writing Experiences Through Personalization and Agency. (under review)
Bing Chat is frequently used for professional and more complex tasks
Compared to traditional search, consumers use (LLM-based) Bing Chat for more topics in professional
domains and for more complex tasks
• Counts et al. (2023) analyze a sample of fully-anonymized,
consumer-facing Bing Chat conversations and Bing searches
from May-June 2023
• Using GPT-4 to group these conversations and searches by
topics, they find (see graph):
• 69% of Bing Chat conversations are in domains oriented
toward professional tasks.
• 39% of Bing Search sessions are in professional task domains
• Counts et al. also categorize the complexity of the chat and search sessions according to Anderson and Krathwohl’s (2001) taxonomy of “Remember”, “Understand”, “Apply”, “Analyze”, and “Create”.
• In Bing Chat 36% of conversations are high complexity
(Apply, Analyze, or Create)
• But in Bing Search, only 13% are high complexity
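As a rough illustration of this kind of LLM-based categorization, the sketch below assigns a single level of the taxonomy to a conversation and flags whether it counts as high complexity. The prompt wording and the call_llm placeholder are assumptions for illustration; they are not the pipeline Counts et al. actually used.

```python
TAXONOMY = ["Remember", "Understand", "Apply", "Analyze", "Create"]
HIGH_COMPLEXITY = {"Apply", "Analyze", "Create"}

def call_llm(prompt: str) -> str:
    """Placeholder for a call to a model such as GPT-4; returns the model's text reply."""
    raise NotImplementedError

def classify_complexity(conversation: str) -> tuple[str, bool]:
    """Return (taxonomy level, is_high_complexity) for one conversation or search session."""
    prompt = (
        "Classify the user's goal in this conversation into exactly one of: "
        + ", ".join(TAXONOMY)
        + ". Answer with the single label only.\n\n"
        + conversation
    )
    label = call_llm(prompt).strip()
    if label not in TAXONOMY:
        label = "Understand"  # conservative fallback for malformed model output
    return label, label in HIGH_COMPLEXITY

# Averaging the boolean over a sample of conversations would yield aggregate shares
# of high-complexity sessions like the 36% and 13% figures reported above.
```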
“Fast AI” and “Slow AI”: Different LLM experiences require different latencies
Many interactions with LLMs require rapid iteration. However, some don’t, and the “slow search”
literature points to ways systems can use that extra time to deliver better results to end users.
• One well-known challenge with LLM systems is latency between issuing a prompt and receiving a response (e.g., Lee et al.
2023) and a great deal of research is happening to reduce this latency (e.g., Kaddour et al. 2023).
• For many use cases, low latency is essential: we know from traditional search that even small increases in latency can
substantially affect the user experience (e.g., Shurman and Brutlag 2009).
• However, the literature on “slow search” (Teevan et al. 2014) highlights how some use cases do not need fast responses, and
this additional time can open up a whole new design space for AI applications.
• People are willing to wait hours and days for responses to many types of high-importance questions, such as in forums like
StackOverflow (Bhat et al. 2014) and in social media (Hecht et al. 2012).
• With more time to return a response, LLMs can issue multiple prompts, search over more documents using retrieval-
augmented generation approaches, do additional refining of answers, and much more that probably has not been
considered yet. Researchers might want to ask, “If I had minutes and not milliseconds, what new types of experiences could I
create?”
• The “Slow AI” user experience needs to be different from the “fast AI” experience, clearly communicating the system’s status, helping people understand the benefits of delayed response, and providing ways to interrupt or redirect if it appears things are off-track (Teevan et al. 2014).
• Bing’s Deep Search experience provides a real-world example of how a “fast AI” experience (standard Bing Chat) can be complemented by a “slow AI” one (Microsoft 2023).
The observed relationship in one study between willingness-to-wait and wait time for different levels of search result quality in traditional search (Teevan et al. 2013)
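To make the “minutes, not milliseconds” question concrete, here is a minimal sketch of a “slow AI” answer loop that spends a larger latency budget on retrieval followed by several critique-and-revise passes. The call_llm and retrieve_documents helpers are placeholders, and the loop itself is illustrative rather than a description of any shipped system such as Deep Search.

```python
import time

def call_llm(prompt: str) -> str:
    """Placeholder for an LLM call; substitute any real model client."""
    raise NotImplementedError

def retrieve_documents(query: str, k: int = 20) -> list[str]:
    """Placeholder retrieval step (search index, vector store, ...)."""
    raise NotImplementedError

def slow_answer(question: str, budget_seconds: float = 120.0, rounds: int = 3) -> str:
    """Use the extra time for retrieval-augmented drafting plus self-refinement,
    instead of returning the first completion."""
    deadline = time.monotonic() + budget_seconds
    context = "\n".join(retrieve_documents(question))
    draft = call_llm(f"Answer using the context below.\n{context}\n\nQuestion: {question}")
    for _ in range(rounds):
        if time.monotonic() >= deadline:
            break  # respect the overall latency budget
        critique = call_llm(f"List weaknesses or missing evidence in this answer:\n{draft}")
        draft = call_llm(f"Rewrite the answer to address these issues:\n{critique}\n\nAnswer:\n{draft}")
    return draft
```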
Bhat, V. et al. (2014). Min(e)d your tags: Analysis of question response time in stackoverflow. IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
Microsoft study: Hecht, B. et al. (2012). SearchBuddies: Bringing Search Engines into the Conversation. Proceedings of the International AAAI Conference on Web and Social Media, 6, 1.
Lee, M., et al. (2023) Evaluating Human-Language Model Interaction. arXiv preprint.
Kaddour, J., et al. (2023). Challenges and Applications of Large Language Models. arXiv preprint.
Microsoft Bing Blog (2023). Introducing Deep Search.
Shurman, E. and Brutlag, J. (2009). Performance related changes and their searcher impact. Velocity.
Microsoft study: Teevan, J. et al. (2014) Slow Search. Communications of the ACM 57, 8.
Microsoft study: Teevan, J., et al. (2013). Slow Search: Information Retrieval without Time Constraints. HCIR ’13.
Microsoft Study: Kumar, H. et al (2023). Math Education with LLMs: Peril or Promise? (Work in progress.)
Microsoft Study: Hofman, J.M., et al. (2023). A Sports Analogy for Understanding Different Ways to Use AI. Harvard Business Review.
DiMicco, J.M. et al. (2007) The Impact of Increased Awareness While Face-to-Face, Human–Computer Interaction, 22:1-2.
Samrose, S. et al. (2020). Immediate or Reflective?: Effects of Real-time Feedback on Group Discussions over Videochat. arXiv preprint.
Leshed, G. et al. (2009). Visualizing real-time language-based feedback on teamwork behavior in computer-mediated groups. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '09).
Tausczik, Y.R. and J.W. Pennebaker (2013). Improving teamwork using real-time language feedback. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '13).
Kim, J., and J.A. Shah (2016). Improving Team’s Consistency of Understanding in Meetings. IEEE Transactions on Human-Machine Systems 46.5.
Samrose, S. et al. (2021). MeetingCoach: An Intelligent Dashboard for Supporting Effective & Inclusive Meetings. Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (CHI '21).
Ellwart, T. et al (2015). Managing information overload in virtual teams: Effects of a structured online team adaptation on cognition and performance. European Journal of Work and Organizational Psychology, 24:5.
Webber, S. et al. (2019). Team challenges: Is artificial intelligence the solution? Business Horizons, 62(6).
AI may help leaders and teams plan and iterate on workflows
Workflow planning will benefit from AI’s ability to track task interdependence.
• AI can help to allocate team member roles based on their present work schedules and their skill sets, attitudes, and actions (Sowa et al. 2021).
• AI can track how well task interdependence is synchronized, measuring workload and redistributing the workload of individual team members to ensure that a team acts in a coherent manner (Khakurel & Blomqvist 2022).
• Case 1: Train traffic control. An AI assistant could effectively measure and inform team members about their own and other team members’ workload, and automate task delegation (Harbers & Neerincx 2017).
• Case 2: Construction. ChatGPT generated a logical sequence of tasks, breaking down steps needed and handling
dependencies among the proposed tasks (Prieto et al. 2023). Results suggested that AI-enabled tools could generate
or enhance agendas based on project details, such as the scope of work a user provides. Not all the proposed tasks
agreed with the scope of work, but ChatGPT showed promising performance and received positive user feedback
(Prieto et al. 2023).
• Case 3: Urban planning. With enough information about the project scope and the team, AI could effectively plan the
workflow. However, collaborative planning platforms should integrate human feedback in the loop to refine workflow
suggestions, offer alternatives, and balance multiple perspectives and considerations (Wang et al. 2023).
• AI assistance in delegating management responsibilities can be an effective form of human-AI collaboration (Hemmer et al. 2023), freeing management to focus on team vision.
Workflow planning can benefit from AI’s ability to track task interdependence. Image Credit: Bing Image Creator
• As AI becomes more prominent in workflow planning, it is critical to consider the possible externalities and
challenges raised in the “algorithmic management” literature (e.g., Lee 2018).
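The dependency handling in the construction case can be illustrated with a short sketch: an AI planner might propose subtasks and their prerequisites, and a topological sort then yields a schedule that respects every dependency. The task names and dependencies below are hypothetical and are not taken from Prieto et al. (2023).

```python
from graphlib import TopologicalSorter  # Python 3.9+ standard library

# Hypothetical construction subtasks mapped to their prerequisites; in practice an
# AI planner could propose this graph from a project's scope of work.
dependencies = {
    "excavate site": set(),
    "pour foundation": {"excavate site"},
    "frame walls": {"pour foundation"},
    "install roofing": {"frame walls"},
    "rough-in electrical": {"frame walls"},
    "final inspection": {"install roofing", "rough-in electrical"},
}

# A valid work sequence respects every dependency; this deterministic check keeps an
# AI-proposed schedule logically consistent before a human reviews it.
schedule = list(TopologicalSorter(dependencies).static_order())
print(schedule)
# e.g. ['excavate site', 'pour foundation', 'frame walls',
#       'install roofing', 'rough-in electrical', 'final inspection']
```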
Khakurel, J. and K. Blomqvist (2022). Artificial Intelligence Augmenting Human Teams. A Systematic Literature Review on the Opportunities and Concerns. International Conference on Human-Computer Interaction.
Lee, M.K. (2018). Understanding perception of algorithmic decisions: Fairness, trust, and emotion in response to algorithmic management. Big Data & Society. 5, 1.
Harbers, M. and M.A. Neerincx (2017). Value sensitive design of a virtual assistant for workload harmonization in teams. Cognition, Technology & Work 19.
Hemmer, P. et al (2023). Human-AI Collaboration: The Effect of AI Delegation on Human Task Performance and Task Satisfaction. IUI 2023.
Prieto, S. et al (2023). Investigating the use of ChatGPT for the scheduling of construction projects. Buildings 13, 4.
Sowa, K. (2021). Cobots in knowledge work: Human–AI collaboration in managerial professions. Journal of Business Research, 125.
Wang, D. (2023). Towards automated urban planning: When generative and ChatGPT-like AI meets urban planning. arXiv preprint.
LLMs may help address one of the greatest problems facing organizations:
knowledge fragmentation
Organizational knowledge is fragmented across documents, conversations, apps and devices, but LLMs
hold the potential to gather and synthesize this information in ways that were previously impossible.
• Knowledge fragmentation is a key issue for organizations. Organizational knowledge is distributed across
files, notes, emails (Whittaker & Sidner, 1996), chat messages, and more. Actions taken to generate, verify, and deliver knowledge often take place outside of knowledge ‘deliverables’, such as reports, occurring instead in
team spaces and inboxes (Lindley & Wilkins, 2023).
• LLMs can draw on knowledge generated through, and stored within, different tools and formats, as and when
the user needs it. Such interactions may tackle key challenges associated with fragmentation, by enabling
users to focus on their activity rather than having to navigate tools and file stores, a behavior that can easily
introduce distractions (see e.g., Bardram et al. 2019).
• However, extracting knowledge from communications raises implications for how organization members are
made aware of what is being accessed, how it is being surfaced, and to whom. Additionally, people will need
support in understanding how insights that are not explicitly shared with others could be inferred by ML
systems (Lindley & Wilkins, 2023). For instance, inferences about social networks or the workflow associated
with a process could be made. People will need to learn how to interpret and evaluate such inferences.
Fragmented knowledge could be pulled together with AI. Image Credit: Bing Image Creator
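As a minimal sketch of how an LLM might synthesize across fragmented stores, the example below gathers snippets tagged with their source (email, chat, document) and asks the model for an answer that cites those sources, which also speaks to the awareness concerns noted above. The Snippet type, the prompt, and the call_llm helper are illustrative assumptions, not a description of any existing product.

```python
from dataclasses import dataclass

@dataclass
class Snippet:
    source: str  # e.g. "email", "chat", "document"
    text: str

def call_llm(prompt: str) -> str:
    """Placeholder for an LLM call; substitute any real model client."""
    raise NotImplementedError

def synthesize(question: str, snippets: list[Snippet]) -> str:
    """Pull fragments from different tools into one prompt and ask for a cited synthesis,
    so the user can see which store each piece of knowledge came from."""
    context = "\n".join(f"[{s.source}] {s.text}" for s in snippets)
    prompt = (
        "Using only the fragments below, answer the question and cite the "
        "bracketed source of each claim.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)
```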
Bardram, J. et al. (2019). Activity-centric computing systems. Communications of the ACM, 62, 8.
Lindley, S. and D.J. Wilkins (2023). Building Knowledge through Action: Considerations for Machine Learning in the Workplace. ACM Transactions on Computer-Human Interaction 30, 5.
Whittaker, S. and C. Sidner (1996). Email overload: exploring personal information management of email. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '96).
• David Nye’s (1997) classic study of how Americans responded to the invention of electricity argues that
interpretations fell on a spectrum from utopian hopes (ranging from world peace to modest life improvements) to
dystopian fears (ranging from global destruction to more daily inconveniences). Contemporary discourses of AI
dramatically increasing productivity or leading to human extinction can reflect the same sociotechnical
interpretive dynamics.
• People in organizations do not always accept technologies that on the face of it seem to be improvements. Action
research in British coal mines in the 1950s (Trist & Bamforth, 1951) showed that understanding this resistance
required understanding people, organizations, and technologies as part of a single sociotechnical system: “a web-
like arrangement of the technological artefacts, people, and the social norms, practices, and rules” (Sawyer &
Tyworth, 2006, p. 51).
• An important implication is that new technologies, such as applications powered by LLMs, should be developed
through participation with people in the contexts in which they will be deployed. “The rationale for adopting
socio-technical approaches to systems design is that failure to do so can increase the risks that systems will not
make their expected contribution to the goals of the organization” (Baxter & Sommerville, 2011, p. 4).
Nye’s classic text on technology and American culture
Nye, D. E. (1997) Narratives and Spaces: Technology and the Development of American Culture, New York: Columbia University Press
Baym, N. and N. B. Ellison (2023). Toward work's new futures: Editor's Introduction to Technology and Future of Work special issue. Journal of Computer-Mediated Communication 28(4).
Trist, E. L, and K. W. Bamforth (1951). Some social and psychological consequences of the longwall method of coal-getting: An examination of the psychological situation and defences of a work group in relation to the social
structure and technological content of the work system. Human relations 4.1.
Sawyer, S., and M. Tyworth (2006) Social informatics: Principles, theory, and practice. Social Informatics: An Information Society for all? In Remembrance of Rob Kling: Proceedings of the Seventh International Conference on
Human Choice and Computers (HCC7), IFIP TC 9.
Baxter, G. and I. Sommerville (2011). Socio-technical systems: From design methods to systems engineering. Interacting with computers 23.1.
How AI tools are perceived by knowledge workers and whether they fit their
work context can determine if they will be effectively adopted
• Perceptions of new technologies and knowledge workers’ willingness to adopt them can be
influenced by how they are used and discussed in workplaces. For example, early work in the Social
Influence Model of Technology Use found that initially, perceptions of email’s usefulness were
influenced by how co-workers used and talked about the technology (e.g., Schmitz & Fulk 1991).
• Knowledge workers’ ability to effectively adopt new technologies can also be influenced by how well
the tools fit their workflows. Poor contextual fit means they might feel limited and lack the means or
time to make an informed decision (Yang et al. 2019; Khairat et al. 2018). Human Factors research
shows that disrupting domain experts’ workflows can also limit their ability to apply their
expertise (Elwyn et al. 2013; Klein, 2006) and decision-making strategies learned with experience
(Sterman & Sweeney 2004).
• Knowledge workers form perceptions of AI systems and anticipate related workflow changes before using them. For example, Rezazade Mehrizi’s (2023) ethnographic study of how radiologists interpret AI shows that even though most had not worked with the technology, they co-constructed frames for understanding how it would shape their work, ranging from expectations that it would automate them away, to envisioning AI as likely to enhance or rearrange their work, to expecting that their work would become increasingly about communicating to the AI to make it work more effectively.
Image credit: Microsoft stock image
Schmitz J. & Fulk J. (1991). Organizational colleagues, media richness, and electronic mail: A test of the social influence model of technology use. Communication Research, 18(4).
Rezazade Mehrizi, M. H. (2023). Pre-framing an emerging technology before it is deployed at work: the case of artificial intelligence and radiology, Journal of Computer-Mediated Communication, 28, 4.
Yang, Q., et al. (2019). Unremarkable ai: Fitting intelligent decision support into critical, clinical decision-making processes. Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems
Khairat, S., et al. (2018). Reasons for physicians not adopting clinical decision support systems: Critical analysis. JMIR medical informatics, 6, 2.
Klein, G. et al. (2006). Making sense of sensemaking 1: Alternative perspectives. IEEE intelligent systems, 21(4).
Elwyn, G. et al. (2013). “Many miles to go...”: A systematic review of the implementation of patient decision support interventions into routine clinical practice. BMC medical informatics and decision making, 13(2).
Sterman, J. D., and Sweeney, L. B. (2004). Managing complex dynamic systems: challenge and opportunity for. In Henry Montgomery, Raanan Lipshitz, & Berndt Brehmer (Eds.), How professionals make decisions. CRC Press.
Elish (2019) examined the history of autopilot in aviation. Some of her key observations
were:
• AI-supported autopilot systems were deemed "safer" than pilot-flown airplanes but
policymakers mandated pilots/copilots to be available "just in case" the
machine failed.
• Pilots were not trained for this new role and sometimes were ill-equipped to handle
sudden hand-off when things went wrong.
• Pilots became a “moral crumple zone”: Since pilots had to take over at the worst
possible moments and struggled, they were often blamed for crashes.
Elish’s work, and that of others, highlights the importance of building technologies that deeply engage with actual human capacity and of ensuring that an entire sociotechnical system works well in the context in which it is operated.
As Elish writes, these findings highlight the importance of focusing on the true “value
and potential of humans…in the context of human-machine teams”.
Elish, M. (2019). Moral Crumple Zones: Cautionary Tales in Human-Robot Interaction. Engaging Science, Technology, and Society 5.
Early evidence shows disparities in adoption follow the traditional digital divide
Looking at searches in traditional Bing for "ChatGPT" or "Chat GPT" can show which counties have higher rates of interest
Association between rates of search for Chat GPT and a one standard deviation difference in county-level variables (Daepp, 2023)
Microsoft study: Daepp, M. (2023). The Emerging AI divide in the United States. In progress.
Most jobs will likely have at least some of their tasks affected by LLMs
Many studies have used AI’s current capabilities to try to measure where AI will have the most
impact – either by making some people more productive or by replacing some roles
• A study by OpenAI found that approximately 80% of the U.S. workforce could have at least 10% of their
work tasks affected by the introduction of GPTs (Eloundou et al. 2023)
• Around 19% of workers may see at least 50% of their tasks impacted.
• A study by LinkedIn researchers categorized each job category by whether few of its associated skills
will be impacted by AI (Insulated) or, if many of its skills will be impacted, whether it also has many skills
that are complementary (Augmented) or does not have complementary skills (Disrupted). (LinkedIn
2023, see graph).
• Augmented jobs are particularly likely to see a shift in the composition of tasks workers do and
the skills they rely on most
• Research by Goldman Sachs suggests that organizations in Developed Markets may have more tasks
exposed to AI than in Emerging Markets.
• However, the ultimate effects of new technologies on jobs are very hard to predict because they depend
on how the technology is adopted. Historical examples show a wide range of possible effects:
• Direct Distance Dialing technology almost entirely replaced the profession of switchboard
operation in the 1930s. (Carmi, 2015)
• ATMs did not replace bank tellers, despite fears that they would. Instead, the jobs evolved: less time spent on basic tasks like counting bills, and more on complex customer issues. (Bessen, 2015)
• Similarly, the introduction of basic chatbots in the early 2010s generated changes to jobs in the customer service industry, but did not eliminate them. (CFPB, 2022)
Share of LinkedIn members in occupations likely to be augmented, disrupted or insulated, by industry, as calculated by the LinkedIn Economic Graph Research Institute (Kimbrough and Carpanelli, 2023)
Eloundou et al. (2023) GPTs are GPTs: An early look at the labor market impact potential of large language models. arXiv preprint.
Kimbrough, K. and Carpanelli, M. (2023). Preparing the Workforce for Generative AI: Insights and Implications. LinkedIn Economic Graph Research Institute.
Goldman Sachs (2023) The Potentially Large Effects of Artificial Intelligence on Economic Growth (Briggs/Kodnani)
Bessen, J. (2015). Learning by Doing: The Real Connection Between Innovation, Wages, and Wealth. Yale University Press.
Carmi, E. (2015). Taming Noisy Women: Bell Telephone’s female switchboard operators as a noise source. Media History, 21(3).
Consumer Financial Protection Bureau (2022). Chatbots in consumer finance.
Acemoglu, D. and Johnson, S. (2023) Power and Progress: Our Thousand-year Struggle Over Technology and Prosperity. PublicAffairs
Koyama, M., and J. Rubin. (2022) How the World Became Rich: The Historical Origins of Economic Growth. John Wiley & Sons.
Brynjolfsson, E. (2022). The Turing Trap: The Promise & Peril of Human-Like Artificial Intelligence. Daedalus.
The future of work is a choice, not a predetermined destiny
Instead of “How will AI affect work?”, the question should be “How do we want AI to affect work?”
• Despite the way people sometimes talk about innovation, it is not a natural force; it is largely
the product of societal factors, all of which are within human control (e.g., Bijker et al. 2012).
• As was the case for hybrid work, it is often important to reframe predictive questions about
AI’s relationship to work into questions about values and strategic goals (e.g., Weyl 2022).
Rather than “What will the future of work look like?”, we should ask “What do we want it to
look like?”
• Several major actors in AI have stated what they think the future of work should look like,
including in OpenAI’s charter and Microsoft’s Copilot vision.
• The scientific literature suggests that achieving many goals regarding the future of work and
AI will require joint action across and within model builders, people who use models, and
people who create content that is used by models (e.g., Vincent and Hecht 2023).
• If we anticipate problems emerging at the intersection of technology, work, and who they benefit, it is almost always within the ability of humans, working together, to fix those problems (Hecht et al. 2018).
• Some examples of coalitions in which Microsoft is involved that are tackling key problems include the Coalition for Content Provenance and Authenticity, the Biden-Harris administration’s voluntary AI commitments, and Microsoft’s partnership with the AFL-CIO.
The C2PA is one coalition Microsoft is involved in to help address key challenges raised by LLMs.
Hecht, B., et al. (2018). It’s Time to Do Something: Mitigating the Negative Impacts of Computing Through a Change to the Peer Review Process. ACM Future of Computing Blog.
Weyl, E.G. (2022). Sovereign Nonsense. RadicalxChange.
Vincent, N. and Hecht, B. (2023). Sharing the Winnings of AI with Data Dividends: Challenges with “Meritocratic” Data Valuation. EAAMO ’23 (2023).
Bijker, W.E. et al. (2012). The Social Construction of Technological Systems, anniversary edition: New Directions in the Sociology and History of Technology. MIT Press.
Teevan, J. (2023). From Documents to Dialogues. Generative AI: Hackathon Closing Ceremony, Carnegie Mellon University.