MLCommons Announces Expansion of Industry-Leading AILuminate Benchmark
World leader in AI benchmarking announces new partnership with India's NASSCOM; updated reliability grades for leading LLMs
SAN FRANCISCO, May 29, 2025 (GLOBE NEWSWIRE) -- MLCommons® today announced that it is expanding its first-of-its-kind AILuminate benchmark to measure AI reliability across new models, languages, and tools. As part of this expansion, MLCommons is partnering with NASSCOM, India's premier technology trade association, to bring AILuminate's globally recognized AI reliability benchmarks to South Asia. MLCommons is also unveiling new proof of concept testing for AILuminate's Chinese-language capabilities and new AILuminate reliability grades for an expanded suite of large language models (LLMs).
“We're looking forward to working with NASSCOM to develop India-specific, Hindi-language benchmarks and ensure companies in India and around the world can better measure the reliability and risk of their AI products,” said Peter Mattson, President of MLCommons. “This partnership, along with new AILuminate grades and proof of concept for Chinese language capabilities, represents a major step towards the development of globally inclusive industry standards for AI reliability.”
“The rapid development of AI is reshaping India's technology sector and, in order to manage risk and foster innovation, rigorous global standards can help align the growth of the industry with emerging best practices,” said Ankit Bose, Head of NASSCOM AI. “We plan to work alongside MLCommons to develop these standards and ensure that the growth and societal integration of AI technology continues responsibly.”
The NASSCOM collaboration builds on MLCommons' intentionally global approach to AI benchmarking. Modeled after MLCommons' ongoing partnership with Singapore's AI Verify Foundation, the NASSCOM partnership will help to meet South Asia's urgent need for standardized AI benchmarks that are collaboratively designed and trusted by the region's industry experts, policymakers, civil society members, and academic researchers. MLCommons' partnership with the AI Verify Foundation – in close collaboration with the National University of Singapore – has already resulted in significant progress towards globally-inclusive AI benchmarking across East Asia, including just-released proof of concept scores for Chinese-language LLMs.
AILuminate is also unveiling new reliability grades for an updated and expanded suite of LLMs, to help companies around the world better measure product risk. Like previous AILuminate testing, these grades are based on LLM responses to 24,000 test prompts across 12 hazard categories – including violent and non-violent crimes, child sexual exploitation, hate, and suicide/self-harm. None of the LLMs evaluated were given any advance knowledge of the evaluation prompts (a common problem in non-rigorous benchmarking), nor access to the evaluator model used to assess responses. This independence provides a methodological rigor uncommon in standard academic research or private benchmarking.
“Companies are rapidly incorporating chatbots into their products, and these updated grades will help them better understand and compare risk across new and constantly-updated models,” said Rebecca Weiss, Executive Director of MLCommons. “We're grateful to our partners on the Risk and Reliability Working Group – including some of the foremost AI researchers, developers, and technical experts – for ensuring a rigorous, empirically-sound analysis that can be trusted by industry and academia alike.”
Having successfully expanded the AILuminate benchmark to multiple languages, the AI Risk & Reliability Working Group is beginning the process of evaluating reliability across increasingly sophisticated AI tools, including multi-modal LLMs and agentic AI. We hope to announce proof-of-concept benchmarks in these spaces later this year.
About MLCommons
MLCommons is the world leader in building benchmarks for AI. It is an open engineering consortium with a mission to make AI better for everyone through benchmarks and data. The foundation for MLCommons began with the MLPerf® benchmarks in 2018, which rapidly scaled as a set of industry metrics to measure machine learning performance and promote transparency of machine learning techniques. In collaboration with its 125+ members, global technology providers, academics, and researchers, MLCommons is focused on collaborative engineering work that builds tools for the entire AI industry through benchmarks and metrics, public datasets, and measurements for AI risk and reliability.
Press Inquiries:
press@mlcommons.org