Infinigence AI is committed to providing excellent compute solutions for the AGI era. With its large-model energy-efficiency optimization toolkit at the core, it links downward with a number of domestic chip companies and, upward, serves large-model algorithm companies through smart-computing cloud services and smart-computing all-in-one machines in a variety of ways, coordinating compute, algorithms, and ecosystem to promote the efficient industrial deployment of large models and to build the infrastructure of the AGI era.
Built on its large-model energy-efficiency optimization toolkit, the company integrates the compute power of domestic chips into a unified compute base and provides a variety of accelerated cloud compute services spanning NVIDIA, AMD, Haikou, and Tencent hardware, shielding users from hardware differences and enabling out-of-the-box use.
For private large-model deployment scenarios, it integrates compute-acceleration cards, self-developed IP, optimization toolkits, and industry-specific large models into large-model all-in-one machines, maximizing the ROI of large-model deployment.
The founding team members come from the Department of Electronic Engineering at Tsinghua University and leading Internet/AI companies, bringing deep industry experience, successful entrepreneurial track records, and substantial technical and academic accumulation, including more than 200 high-level academic papers in the field of AI system optimization.
Relevant achievements include: (1) a high-efficiency GPU compute library that surpasses NVIDIA's commercial library, delivering better performance on mid-range-process GPUs than the commercial software achieves on high-end-process GPUs; (2) a high-efficiency sparse inference acceleration architecture for large models, improving the computation speed of sparse neural networks, graph neural networks, and similar workloads by one to three orders of magnitude. The team has already achieved a 50% latency reduction for large language models on NVIDIA GPUs, and in the future will further combine low-level compute optimization, sparse acceleration, hardware-feature awareness, and efficient interconnects to raise the overall speedup to as much as 10x.