By D&I Chairs: Mona Diab, Martha Yifru
Inspiration
Happy 60th anniversary, ACL! How better to celebrate than by embracing the CL community’s diversity and showcasing how we can be genuinely inclusive? In accordance with the ACL 2022 theme on linguistic diversity, we announce the inauguration of the year-long challenge 60-60: Globalization via localization. Imagine an Uzbek high schooler reading CL papers in Uzbek, or a Cambodian college undergrad writing a senior thesis on language models in Cambodian, as the new modus operandi in CL. Language barriers no longer stand in the way of scientific access or innovation.
ACL will make history by trailblazing multilingual scientific communication!
ACL at its inception in 1962 was The Association for Machine Translation and Computational Linguistics (AMTCL), in recognition of the significance of MT in human communication. The first meeting for AMTCL, later known as ACL, was in 1963. Hence, we would like this challenge to last for a year.
The 60-60 Challenge
60-60 is an initiative towards multilingual CL scientific communication in all modalities: text, speech, and sign language. It is an effort to remove the ingrained linguistic bias in the scientific landscape in general and in CL science in particular. To date, multilingual scientific communication has been an aspiration; we believe CL has the power and the know-how to make it a reality at scale.
60-60 should be a catalyst in the democratization of computational linguistics sciences, maximizing CL’s global reach. The scientific community will be able to correspond and communicate seamlessly using any modality (text, speech, video, sign language, etc.), empowering scientists to think scientifically in their native tongues, inspiring the upcoming cohorts of STEM scientists, and giving everyone a voice to unlock their creativity and innovation without a language barrier. Who dreams in a second or third language?
Drinking our own “Irish coffee”
One aspect of 60-60 is turning our technologies inward. How far have our technologies come in facilitating human communication? Hey Siri! What do you think, Alexa? It is about time we level the playing field, empowering global CL science. Language is not only our trade but also our vehicle. Our technologies have reached a level of maturity that allows us to think big.
The seed 60-60 initiative
We announce the inauguration of a new year-long Diversity & Inclusion Special Initiative, 60-60: 60 languages for the 60th ACL anniversary. 60-60 aims to spur research on the processing of CL scientific content, leveraging our current and emerging NLP technologies. The D&I team created the following workstreams and resulting resources:
1. Speech translation
   a. English native and Spanish cross-lingual closed captioning of all oral content at the conference.
   b. (24hr delayed) cross-lingual closed captioning/subtitling for all the plenary events/videos (keynotes, opening, closing, panels, etc.) into 10 languages: Arabic, Chinese, French, Hindi, Irish, Japanese, Portuguese, Russian, and Ukrainian.
   c. (24hr delayed) cross-lingual voice-over/dubbing for all the plenary events/videos (keynotes, opening, closing, panels, etc.) into 10 languages (same as 1b).
2. Text Translation (available now)
   a. Machine translation of 10K paper titles and abstracts randomly selected from the ACL Anthology from 2017-2021, plus all the titles and abstracts from ACL 2022 (~1.3K), into 60 languages: Afrikaans, Albanian, Amharic, Armenian, Arabic, Azerbaijani, Bengali, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, Estonian, Filipino, Finnish, French, Georgian, German, Greek, Hausa, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Irish, Italian, Japanese, Javanese, Kazakh, Korean, Lithuanian, Macedonian, Malay, Malayalam, Maltese, Mongolian, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Sinhala, Slovak, Somali, Swahili, Swedish, Tamil, Tibetan, Turkish, Ukrainian, Urdu, Uzbek, and Vietnamese.
   b. Reference human translation of a randomly selected subset of 100 titles and abstracts into 20 languages: Arabic, Bulgarian, Chinese, Croatian, Danish, Dutch, French, German, Hindi, Indonesian, Irish, Japanese, Korean, Persian, Portuguese, Russian, Swahili, Turkish, and Ukrainian.
   c. Evaluation of MT quality for those 20 languages.
3. Scientific expressions and CL Terminology Curation (available now)
   a. Curation of the 4,234 most frequent terms and expressions used over the years (across 71K papers), each with 5 sentences reflecting contextual uses of the term.
   b. Machine translation of the source context sentences for the 1,000 most frequent terms into the 60 languages in 2a.
   c. Human reference term generation for the first 1,000 terms into the 20 languages in 2b.
4. Sign language interpretation for the SLPAT (Speech and Language Processing for Assistive Technologies) ACL 2022 Workshop.
5. A website showcasing all of the above (available now).
How 60-60?
The core team: It took a village to build this seed initiative while balancing confidentiality against diversity and inclusion. We wanted not only to produce a diverse and inclusive outcome but also to reflect D&I in how we went about the initiative. Thus, the makeup of the core team operationalized diversity and inclusion. We engaged with academic teams across the globe (National University of Singapore, Yale University, University of Illinois Urbana-Champaign, NYUAD, and King Saud University), big tech (Baidu, Meta) and medium tech (AppTek) companies, nonprofit organizations (AI2), and, last but not least, startups (aiXplain) as well as annotation companies (YaiGlobal). aiXplain, through its robust AI/NLP marketplace platform, coordinated the various MT/ASR/TTS needs for realizing this initiative, putting us on solid ground to launch it. The aiXplain team leveraged technologies from AWS, Azure, AppTek, ModernMT, and Google for MT/ASR/TTS. The lion's share of the translation was done by Baidu. Meta and YaiGlobal commissioned the human reference translations for both the titles and abstracts and the terminology. The website was built by volunteers from Meta and National University of Singapore.
Aspirations for 60-60?
By ACL 2023,
-
we will have a complete translation of the entire ACL Anthology into 60 languages;
-
we will have a comprehensive standardized scientific and CL terminology list with contextual examples in 60 languages;
-
we will have the capability for live cross-lingual closed captioning and dubbing into 60 languages (or is that too much to ask? we can negotiate :));
-
we will have a comprehensive repository of all the talks and videos from the CL community, curated and translated, both subtitled and dubbed, into 60 languages;
-
we will spur research into sign language (we committed resources to sign language interpretation for the SLPAT workshop, providing material for research);
-
we plan initiatives for democratization beyond translation, e.g., demystification of CL/AI in general via simplification.
Call to Action
-
We would like to leverage our internal scientific community for crowdsourcing translations for all modalities — so please volunteer!
-
We have many tasks that need all hands as well as leaders, so please step up and volunteer.
-
Shared tasks anyone? Interspeech challenges, WMT challenges?
-
Other ideas?
Let the conversation begin: dei.ACL.6060@gmail.com
Acknowledgements & Notable Mentions
We would like to highlight and acknowledge the work and dedication of:
-
Mohamed Elbadrashiny (aiXplain), who led a team (Thiago Castro Ferreira, Lucas Pavanelli, Salaheddin Alzubi) that generated all the machine translations and their evaluations, as well as the cross-lingual closed captioning and voice-over, leveraging MT/ASR/TTS technologies;
-
Evgeny Matusov (AppTek) for English ASR and Spanish voice-over;
-
Georgiana Dinu (Amazon AWS) for language identification on the ACL Anthology papers;
-
Kyle Lo & Lucy Wang (AI2) for leading and coordinating the paper processing and terminology curation effort;
-
Irene Li & Dragomir Radev (Yale University), Qingyun Wang & Heng Ji (UIUC), and Chen Zhang, Grandee Lee & Haizhou Li (National University of Singapore) for English scientific terminology extraction and curation;
-
Hend Alkhalifa (King Saud University, Saudi Arabia) & Nizar Habash (NYU-AD) for Arabic CL terminology;
-
Qingyun Wang & Heng Ji (UIUC) for cross-lingual terminology wikification;
-
Brian Bui (Meta) and Zeid Kasri (YaiGlobal) for providing/commissioning/monitoring the titles/abstracts/terminology reference translations;
-
Badr Alkhamissi (Meta) and Danqing Luo, Siqi Cai, Yiming Chen, Chen Zhang & Grandee Lee (National University of Singapore) for designing and hosting the ACL 2022 D&I 60-60 website.
We would like to acknowledge the very generous contributions from the following:
aiXplain (Hassan Sawaf, Rama Chakaki); YaiGlobal (Mokhtar Sadok); Baidu (Xiaoyun Bao, Zhongjun He); AppTek (Mudar Yaghi); Meta (Sergey Edunov, Necip Fazil Ayan); ACL 2022 (Bernardo Magnini & Priscilla Rasmussen)
Thanks to all technology creators, providers, and researchers for mainstreaming their products that directly contributed to this seed initiative: Amazon AWS, Azure, Google, ModernMT, AppTek, and Underline.io.
Thanks to the following for valuable conversations and feedback:
-
ACL 2022 GC & PC: Bernardo Magnini, Smaranda Muresan, Aline Villavicencio, Preslav Nakov; Laird Smith (ACL 2022 website) & Joel Tetreault (ACL 2022 publicity chair)
-
Community members: Alon Lavie (Unbabel); Katrin Kirchhoff, Georgiana Dinu, Marcello Federico (Amazon AWS); Imed Zitouni (Google); Asli Celikyilmaz, Nisha Deo (Meta); Tom Hope (AI2); Pascale Fung (HKUST); Rada Mihalcea (UMich); Isabelle Augenstein (University of Copenhagen); Thamar Solorio (Univ of Houston); Owen Rambow (Stony Brook University).
Finally, thank you to the ACL community for 60 years of amazing accomplishments!
Appendix: Company Profiles
aiXplain is the place where nothing stands between you and the power of AI. Build, diagnose, and improve your AI systems and datasets continuously, efficiently and effortlessly. We provide unique and insightful tools, developed by highly experienced leaders in ML science, to serve suppliers, researchers, and customers in the human language technology fields. Our mission is to truly democratize AI by making it accessible to members at every stage of their business development. To learn more visit https://aixplain.com/
AppTek is a global leader in artificial intelligence (AI) and machine learning (ML) technologies for automatic speech recognition (ASR), neural machine translation (NMT), natural language processing/understanding (NLP/U) and text-to-speech (TTS) technologies. The AppTek platform delivers industry-leading, real-time streaming and batch technology solutions in the cloud or on-premises for organizations across a breadth of global markets such as media and entertainment, call centers, government, enterprise business, and more. Built by scientists and research engineers who are recognized among the best in the world, AppTek’s multidimensional 4D for HLT (human language technology) solutions with slice and dice methodology covering hundreds of languages/dialects, domains, channels and demographics drive high impact results with speed and precision. For more information, please visit http://www.apptek.com.
Since the day it was founded in 2010, Baidu Translate has been committed to helping users overcome language barriers and find easy, quick access to information and services, building on Baidu's edge in NLP and the Internet. Its NMT system, released in 2015, now supports more than 200 languages and over 40,000 translation directions. Handling content in various forms such as text, speech, image, and video, it serves hundreds of millions of users and processes hundreds of billions of characters of translation requests every day.
YaiGlobal is your one-stop shop for facilitating your NLP/ML projects. We have years of industry experience in managing Big Data using state-of-the-art AI and cloud computing. We have transcribed and translated thousands of files in tens of languages for various customers and our own projects. Whether your project requires generating and/or annotating training data, adopting ML solutions, or applying cloud computing, YaiGlobal will be your dependable partner to deliver solid results.