Data is king: Why content creators must rethink their role in the AI era - FT中文网

Data is king: Why content creators must rethink their role in the AI era

Content creators may feel the most profound shift and play a more important role as data becomes a strategic asset in the AI era

This article represents only the author's own views.

As the global AI race heats up, it is becoming clear that data does not lose its value once large models reach the reasoning stage. On the contrary, it becomes even more critical because of the need for dynamic knowledge. The so-called “last mile” of high-quality datasets often determines a model’s ultimate performance.

That is likely why Facebook parent Meta Platforms (META.US) made a $14.3 billion strategic investment in Scale AI, a company focused on data labeling and cleaning for AI training.

Scale AI provides structured, high-quality datasets to OpenAI, Meta, Google and other tech giants by combining large-scale human labor with automated pipelines. Data labeling means tagging images, text or audio with meaningful metadata, such as identifying pedestrians in a photo or labeling the main point of an article. Data cleaning removes errors, duplicates and irrelevant material to ensure consistency and accuracy.

Another example of the growing value of quality data is a recent licensing deal between The New York Times and Amazon (AMZN.US), which allows fact-checked editorial content to be used for training AI models. The Associated Press and OpenAI have signed a similar agreement.

Though these arrangements are described as content licensing, they reflect a deeper shift: content has become data, and data has become a service. The deals show how media organizations are reassessing the value of their content, while AI developers pursue high-quality material with growing urgency.

In contrast, the Chinese-language AI ecosystem faces challenges of its own: a shortage of publicly available data, a lack of large-scale professional annotation and the difficulty of digitizing classical and cultural texts at scale. These obstacles complicate the development of localized large AI models.

Chinese-language materials are relatively scarce

A white paper published by Alibaba Research Institute notes that English accounts for 59.8% of all crawlable web text, while Chinese represents just 1.3%. Wikipedia, a commonly used open resource, has over 7 million articles in English but only 1.5 million in Chinese, less than a quarter of the volume.

This imbalance creates a major disadvantage. Without sufficient publicly available Chinese material, local Chinese large language models may fall far behind their English-language counterparts in natural language understanding and text generation, potentially leading to culturally mismatched outputs and a sense that these models have “consumed too much foreign ink.”

Chinese authorities have long recognized this gap and have taken steps to address it. Platforms such as People’s Daily and Xinhua are actively compiling curated, high-quality materials consisting of vetted news, commentary and policy interpretation, designed to ensure alignment with official values and to support AI safety from a moral and ideological standpoint.

Initiatives like the "Cyber Research Large Language Model" further concentrate on integrating data from legal and policy documents, state media and other publications, reinforcing alignment with Chinese values.

In China, such value alignment has become a basic requirement for any domestic AI system. While China has yet to produce a company of Scale AI’s size, several local firms, including Aishu Technology, Testin, iFlytek (002230.SZ) and Haitian Ruisheng (688787.SH), are building up their capabilities in large-scale data annotation and cleaning. The Shanghai AI Lab is also developing a platform-based material processing system that draws on policy and academic resources, laying the foundation for a “Chinese version of Scale AI.”

According to market research firm IDC, China’s AI training data market was valued at an estimated $260 million in 2023 and is expected to grow to approximately $2.32 billion by 2032, a compound annual growth rate of 27.4%.

Ultimately, the performance of any AI model depends on the content it consumes. In the AI era, content creators, especially those in journalism, must recognize that they are no longer merely material providers. They are now an integral part of the data services supply chain.

When news stories, commentary, academic papers and cultural archives are structured, semantically labeled and integrated into AI training pipelines, their value shifts from real-time information to durable data assets. Content creators who proactively organize and annotate their materials, and who pursue licensing partnerships with AI developers, may unlock new revenue opportunities.

It’s time for content to be seen not just as narrative, but also as infrastructure.

Audio: https://audio.ftmailbox.cn/album/a_1750297349_2997.mp3
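
As a quick sanity check on the IDC market figures cited in the article, the compound annual growth rate can be recomputed from the two endpoints. This is only a sketch: the function name `cagr` is illustrative, and it assumes the growth period is the nine years from 2023 to 2032 and that the dollar figures are the rounded values quoted in the text.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the article, in billions of US dollars (rounded).
market_2023 = 0.26
market_2032 = 2.32
rate = cagr(market_2023, market_2032, 2032 - 2023)

print(f"Implied CAGR: {rate:.1%}")
```

Under these assumptions the implied rate comes out at roughly 27.5%, which is consistent with the reported 27.4% once rounding of the endpoint values is taken into account.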

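Copyright notice: The copyright of this article belongs to FT中文网. No organization or individual may reproduce, copy or otherwise use all or part of this article without permission. Infringement will be pursued.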
