现阶段应对 AI 诈骗的研究进展主要包括以下方面:
Require that developers of the most powerful AI systems share their safety test results and other critical information with the U.S.government.In accordance with the Defense Production Act,the Order will require that companies developing any foundation model that poses a serious risk to national security,national economic security,or national public health and safety must notify the federal government when training the model,and must share the results of all red-team safety tests.These measures will ensure AI systems are safe,secure,and trustworthy before companies make them public.Develop standards,tools,and tests to help ensure that AI systems are safe,secure,and trustworthy.The National Institute of Standards and Technology will set the rigorous standards for extensive red-team testing to ensure safety before public release.The Department of Homeland Security will apply those standards to critical infrastructure sectors and establish the AI Safety and Security Board.The Departments of Energy and Homeland Security will also address AI systems’ threats to critical infrastructure,as well as chemical,biological,radiological,nuclear,and cybersecurity risks.Together,these are the most significant actions ever taken by any government to advance the field of AI safety.Protect against the risks of using AI to engineer dangerous biological materials by developing strong new standards for biological synthesis screening.Agencies that fund life-science projects will establish these standards as a condition of federal funding,creating powerful incentives to ensure appropriate screening and manage risks potentially made worse by AI.Protect Americans from AI-enabled fraud and deception by establishing standards and best practices for detecting AI-generated content and authenticating official content.The Department of Commerce will develop guidance for content authentication and watermarking to clearly label AI-generated content.Federal agencies will use these tools to make it easy for Americans to know that the communications they receive from their government are authentic—and set an example for the private sector and governments around the world.
transformative developments yet tocome.27LLMs provide substantial opportunities to transformthe economy and society.For example,LLMs can automate the process of writing code andTransport apps like Google Maps,and CityMapper,use AI.Artificial Intelligence in Banking Industry:A Review on Fraud Detection,Credit Management,and Document Processing,ResearchBerg Review of Science and Technology,2018.Accelerating fusion science through learned plasma control,Deepmind,2022; Magnetic control of tokamak plasmasthrough deep reinforcement learning,Degrave et al.,2022.Why Artificial Intelligence Could Speed Drug Discovery,Morgan Stanley,2022.AI Is Essential for Solving the Climate Crisis,BCG,2022.General Purpose Technologies – Handbook of Economic Growth,National Bureau of Economic Research,2005.The UK Science and Technology Framework,Department for Science,Innovation and Technology,2023.In 2022 annual revenues generated by UK AI companies totalled an estimated £10.6 billion.AI Sector Study 2022,DSIT,2023.DSIT analysis estimates over 50,000 full time workers are employed in AI roles in AI companies.AI Sector Study 2022,DSIT,2023.For example,AI can potentially improve health and safety in mining while also improving efficiency.See AI on-side:howartificial intelligence is being used to improve health and safety in mining,Axora,2023.Box 1.1 gives further examples of AIdriving efficiency improvements.Large Language Models Will Define Artificial Intelligence,Forbes,2023; Scaling Language Models:Methods,Analysis &Insights from Training Gopher,Borgeaud et al.,2022.A pro-innovation approach to AI regulationfixing programming bugs.The technology can support genetic medicine by identifying linksbetween genetic sequences and medical conditions.It can support people to review and
随着AI不断发展,AI应用中的新功能带来新漏洞,现有企业,研究学者已加强对“越狱”的研究。OpenAI提出了通过“指令层次结构”来修复“忽略所有先前指令“攻击的方法。这确保LLM不会为用户和开发人员的指令分配同等优先级。这已在GPT-40 Mini中得到部署。Anthropic在多重越狱方面的工作表明了“警告防御”的潜力,它在前面和后面添加警告文本,以警示模型不要被越狱。与此同时,Gray Swan AI的安全专家已试用“断路器”。它不是试图检测攻击,而是专注于重新映射有害表示,这样模型要么拒绝遵守,要么产生不连贯的输出。他们发现这比标准拒绝训练效果更好。LLM测试初创公司Haize Labs与Hugging Face合作创建了首个红队抵抗组织基准。它汇编了常用的红队数据集并根据模型评估它们的成功率。同时,Scale根据私人评估推出了自己的稳健性排行榜。除了越狱之外,还可能存在更隐蔽的攻击虽然越狱通常是安全挑战中早已公开的事实,但潜在的攻击面要广泛得多,涵盖从训练到偏好数据和微调的所有内容。例如伯克利和麻省理工学院的研究人员创建了一个看似无害的数据集,但它会训练模型响应编码请求产生有害输出。当应用于GPT-4时,该模型始终按照有害指令行事,同时避开常见的保护措施。安全研究LLM能否提高自身可靠性?