Edit 11/26/2024 7:00am PT: Microsoft, via Twitter (below), has now stated that the company does not use the data to train its large language models (AI models).
In the M365 apps, we do not use customer data to train LLMs. This setting only enables features requiring internet access like co-authoring a document. https://t.co/o9DGn9QnHb (November 25, 2024)
It is no secret that Microsoft Office has Connected Experiences that analyze content created by users. However, according to @nixCraft, an author at Cyberciti.biz, Microsoft's Connected Experiences feature automatically gathers data from Word and Excel files to train the company's AI models. The feature is turned on by default, meaning user-generated content is included in AI training unless the user manually deactivates it, and deactivating it is a convoluted process. Microsoft has yet to comment on the information, so take it with a grain of salt [EDIT: as stated above, Microsoft has now said that it does not use customer data from this feature to train its AI models].
If the claim is accurate, this default setting would allow Microsoft to use documents such as articles, novels, or other works intended for copyright protection or commercial use without explicit consent. The implications are significant for creators and businesses that rely on Microsoft Office for proprietary work, as their data could become part of the company's AI development. For this reason, anyone concerned about protecting their intellectual property or sensitive information should take action immediately.
To do so, users must actively opt out by finding and disabling the feature in settings. The process requires unchecking the 'Turn on optional connected experiences' box, which is enabled by default.
On a Windows PC, the steps are to go to File > Options > Trust Center > Trust Center Settings > Privacy Options > Privacy Settings > Optional Connected Experiences and uncheck the box. Requiring seven steps to disable a feature that is turned on automatically seems very convoluted.
Microsoft's approach mirrors a broad trend in the tech industry, where other companies have introduced similar features to train their AI models. While all AI models are trained on something generated by humans, doing so without their consent is unethical, to put it mildly.
At the time of original publication, Microsoft had not publicly confirmed or denied that it uses content from Excel and Word documents created by Microsoft Office users to train its AI models. Nonetheless, there is a clause in Microsoft's Services Agreement that grants the company 'a worldwide and royalty-free intellectual property license to use Your Content.'
"To the extent necessary to provide the Services to you and others, to protect you and the Services, and to improve Microsoft products and services, you grant to Microsoft a worldwide and royalty-free intellectual property license to use Your Content, for example, to make copies of, retain, transmit, reformat, display, and distribute via communication tools Your Content on the Services," the clause reads.