Arabic-Centric Large Language Models
www.fanar.qa
Hamad Bin Khalifa University, Qatar Computing Research Institute (QCRI)

Fanar: A National Project
- Project Sponsor
- Project inception and development (30+ scientists and engineers)
- Technology partners & infrastructure providers
- Cross-Partners Project Management
- Partners

Motivation Behind Building Fanar
- Preserve Arabic language and its dialects in the era of AI and LLMs
- Cultural alignment & awareness relevant to 0.5B Arabs & 2.0B Muslims
- Technology ownership & digital sovereignty
- Building in-house capacity
[Chart: Content % on the Web]

Fanar Models
- A family of highly capable Arabic-centric LLMs
- Trained on the largest amount of Arabic content (over 1.3T cleaned tokens)
- 40% Arabic, 50% English, 10% code
- Dual models:
  - Fanar Star: developed full-stack from scratch in-house, 7B parameters
  - Fanar Prime: trained on Gemma 2 9B model, 8.7B parameters
- Modern Standard Arabic (MSA) + multi-dialectal support
- Highly tuned and custom components
- Multi-modality support (e.g., audio, images, etc.)

Fanar Agentic Framework
[Diagram components: Fanar Star (7B), Fanar Prime (8.7B), Islamic RAG, Attribution RAG, Recency RAG, Multi-Modal Generation, Input Safety Filters, Output Safety Filters, Authentication/Admission Gate, Classification & Orchestration Backbone]
- Connecting to Fanar through UI or APIs
- Different services connected through orchestration framework
- Services are activated based on prompt classification
- Fanar models are the brain of the system

Fanar Distinguishing Features (I)
Fact-