Bridging the Digital Divide with: Strategies for Inclusive Digital Transformation
Lidong Bing
Director of the Language Technology Lab, DAMO Academy of Alibaba Group
July 31st, 2024. Mekong Forum

Which have you used?
https:/

How widely GenAI is adopted
Companies using generative AI (in percent)
Japan: 46.8
Germany: 72.7
China: 84.4
US: 84.7

Generative AI will be everywhere on our planet
https:/

How inclusive are LLMs?
LLMs (ChatGPT, Claude, LLaMA, Mistral, etc.) are widely used globally, but how multilingual are they?
Most (famous) models generally exhibit strong performance in English.
High-resource languages (e.g., Chinese) also receive relatively good support.
What about other languages?
Stats source: https:/

State of LLMs for Southeast Asia
Linguistic studies have revealed that there are more than 6,500 human languages in the world.
Southeast Asia is a linguistically diverse region of the world, e.g., 300 dialects in Indonesia.
Global models lack support for SEA languages.
Performance contrast between Latin and non-Latin scripts.
Some SEA languages severely lack data.
Multilingual instruction data is lacking.
Not all languages are created equal!
https://commons.wikimedia.org/wiki/File:Flag_map_of_South_East_Asia.png

State of LLMs for Southeast Asia
Latin script vs. non-Latin script
Not all languages are created equal: there is a big performance gap between high-resource and low-resource languages.
M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models. NeurIPS Datasets and Benchmarks 2023.

SeaLLMs for Southeast Asian Languages
Built to serve Southeast Asia with support for English, Chinese, Indonesian, Vietnamese, Malay, Thai, Lao, Khmer, Burmese & Tagalog.
Aim to achieve greater reception from research communities and industries in Southeast Asian countries.
Adapted to local culture and regulations.

Goal of SeaLL