You rely on software. For instance, you might use the Mozilla Firefox browser or depend on OpenSSL, the cryptographic toolkit that has become a pillar of the modern web. Perhaps the web pages you visit regularly--or the ones your business depends on--are built with components like the web application framework Flask or the database technology MySQL.
Easily overlooked is that the security of the ecosystem, especially the package managers, is handled by, well, hardly anyone.
That's right: the programmer equivalents of Apple's App Store, like the Python Package Index, have become critical to modern digital society, and yet few people, let alone organizations, have the funding, incentives, and tools to secure them. (OpenSSF, a nascent industry collaboration to secure software, is a bright spot, though! Similarly, Benjamin Balder Bach and Hanno Boeck have done admirable work hunting for vulnerabilities and bringing attention to Python typosquatting.)
Earlier this year, IQT Labs started a collaboration with Martin Čarnoguský of sourcecode.ai to build a tool we call AuraBorealis, a web application that makes searching for vulnerable, anomalous, and malicious Python packages easy and, dare we say it, fun. Based on Aura, a static analysis tool Martin is developing to scan source code, AuraBorealis can help anyone concerned with the safety of the entire Python Package Index or the integrity of a subset of Python packages. The GitHub repository can be found here. If you are interested in this app, particularly in discussing a beta test, contact us at jmeyers@iqt.org. You can also create GitHub issues and send pull requests to Aura or AuraBorealis.
The remainder of this blog post explains:
- the static analysis tool Aura, which powers AuraBorealis by scanning Python packages for indicators of potential maliciousness,
- AuraBorealis, the user-facing web app that organizes your search for vulnerable and malicious Python packages, and
- the more than 20 vulnerable, anomalous, or malicious Python packages found with Aura and AuraBorealis.
Aura: A Python Static Analysis Tool Designed for Large-Scale Package Scanning
Aura is a static analysis tool, which means it can search for indicators of suspicious, anomalous, or malicious code within a Python package without executing the code. Aura can scan hundreds of thousands of Python packages.
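To make "without executing the code" concrete, here is a minimal sketch of the general idea using Python's standard `ast` module. It is not Aura's implementation: the function name `find_suspicious_nodes` and the two watch lists are assumptions made up for this illustration, and a real scanner would use far richer rules.

```python
import ast

# Hypothetical watch lists for illustration only; not Aura's actual rule set.
SUSPICIOUS_CALLS = {"eval", "exec", "compile", "__import__"}
SUSPICIOUS_MODULES = {"subprocess", "socket", "urllib", "requests"}


def find_suspicious_nodes(source):
    """Return sorted (line, description) pairs found by walking the syntax tree."""
    findings = []
    tree = ast.parse(source)  # parses the text only; the code is never run
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            # Direct calls such as exec(...) or eval(...)
            if node.func.id in SUSPICIOUS_CALLS:
                findings.append((node.lineno, f"call to {node.func.id}()"))
        elif isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in SUSPICIOUS_MODULES:
                    findings.append((node.lineno, f"import of {alias.name}"))
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] in SUSPICIOUS_MODULES:
                findings.append((node.lineno, f"import from {node.module}"))
    return sorted(findings)
```

Because the source is only parsed, the scan is safe to run on untrusted packages. A finding here is an indicator worth a human look, not proof of malice: plenty of legitimate packages import `subprocess`.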
A common use of Aura involves looking for code that accidentally contains hardcoded passwords. Security teams using this feature could then notify the software developer responsible for that Python package and ask the developer to change any leaked passwords, thereby protecting themselves and other organizations that use that package. Aura can also scan a particular type of installation script (a "setup.py" script) for anomalies; longtime Python community leaders have acknowledged the dangers of this type of installation script and its susceptibility to abuse. Aura also checks for obfuscated code, performs taint analysis, and can be configured to search for custom patterns. The Aura documentation includes a full list of detections.
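As a rough illustration of the hardcoded-password check described above, the sketch below flags assignments where a credential-like variable name receives a non-empty string literal. This is an assumption-laden toy, not Aura's detection logic: the `SECRET_NAME` pattern and the `find_hardcoded_secrets` function are hypothetical names invented for this example.

```python
import ast
import re

# Hypothetical name pattern, chosen for illustration; not Aura's actual rule.
SECRET_NAME = re.compile(r"(passw(or)?d|secret|token|api_?key)", re.IGNORECASE)


def find_hardcoded_secrets(source):
    """Return sorted (line, variable_name) pairs for credential-like string assignments."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assign):
            continue
        value = node.value
        # Only non-empty string literals count; function calls and lookups are ignored.
        if not (isinstance(value, ast.Constant)
                and isinstance(value.value, str)
                and value.value):
            continue
        for target in node.targets:
            if isinstance(target, ast.Name) and SECRET_NAME.search(target.id):
                findings.append((node.lineno, target.id))
    return sorted(findings)
```

The sketch reports only the line and variable name, never the literal itself, so the scan output does not become a second copy of the leaked secret.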
While the scan results for a single package can be read directly in a command-line terminal, a scan of several hundred thousand Python packages can produce approximately 50 GB of audit data. We created AuraBorealis, described below, to help the security-conscious deal with this volume of records.
AuraBorealis: A Web App for Handling Large-Scale Python Security Data
AuraBorealis is the front-end web interface that an IT security team or software developer can use to assess the security of Python packages. Public and private sector organizations can use this tool to vet the Python packages underpinning their operations.
AuraBorealis is an app that presents the user with a series of pre-built tables designed to make it easier to search for potentially anomalous and malicious code. A screenshot of the main user interface is below.
Figure 1. Screenshot of AuraBorealis Homepage
AuraBorealis is a Flask-based Python web app that uses an Elastic database to store roughly 800 million objects produced by Aura's comprehensive scan of the Python Package Index.
Email jmeyers@iqt.org if you are interested in further improvements to AuraBorealis, would like to discuss the project, or would be interested in beta-testing the app. Alternatively, consider submitting issues and pull requests on the AuraBorealis GitHub page.
How We Found Twenty Vulnerable or Malicious Python Packages
Aura's audit data--whether summarized in AuraBorealis or accessed in another format of your choosing--contains a wealth of security-related information that your organization (or the administrators of the Python Package Index) can use to secure your Python supply chain.
In fact, recent analysis using Aura data identified 20 distinct packages with vulnerabilities. (See table 1.)
| Vulnerability Type | Package Count |
| --- | --- |
| Leaked PyPI Credentials | 11 |
| Using Code Downloaded from Pastebin or Other External Site | 6 |
| Leaked Other Credentials | 5 |
| Obfuscated Source Code | 2 |
Table 1. Count of Packages by Vulnerability Type. Some packages contain multiple vulnerabilities and so are double-counted.
Eleven packages were leaking Python Package Index credentials, meaning the software developers who created them left their username and password exposed. Malicious actors could abuse these credentials by adding malicious code or taking other harmful actions. Six packages downloaded code from external websites such as Pastebin. In other words, the software developers who published these packages built in the capability to--at any time--change the code that users of these packages execute. Yes, that's dangerous for anyone using those packages and, unless you like the idea of strangers changing your code, should be avoided. Five packages leaked other credentials, such as an Amazon Web Services S3 username and password. Two packages had highly obfuscated code that is, at the least, very suspicious. And one package was confirmed to be malware and removed from the Python Package Index.
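The "code downloaded from Pastebin" pattern can also be spotted statically. The sketch below is a simplified illustration and not how Aura does it: it collects string literals that point at paste-style hosts and notes whether the same file calls `exec` or `eval`. The host list, the function name `find_remote_code_hints`, and the shape of the result are all assumptions made for this example.

```python
import ast
from urllib.parse import urlparse

# Hypothetical host list for illustration; a real scanner would need a broader policy.
PASTE_HOSTS = {"pastebin.com", "gist.githubusercontent.com"}
DYNAMIC_EXEC = {"exec", "eval"}


def find_remote_code_hints(source):
    """Report paste-site URLs in string literals and any exec()/eval() calls."""
    urls = []
    executes_dynamic_code = False
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            try:
                host = urlparse(node.value).hostname
            except ValueError:
                host = None  # malformed URL-like string; ignore it
            if host and host.removeprefix("www.") in PASTE_HOSTS:
                urls.append((node.lineno, host.removeprefix("www.")))
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in DYNAMIC_EXEC:
                executes_dynamic_code = True
    return {"urls": sorted(urls), "executes_dynamic_code": executes_dynamic_code}
```

A file that both references a paste site and executes dynamic code deserves priority review, since the remote content can change after the package is published. Note that this sketch does not prove the downloaded text is what gets executed; connecting the two is the job of taint analysis.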
You Can't Run, You Can't Hide, But You Can Use Aura and AuraBorealis
You rely on software. Society's dependence on software has become immense and irreversible. Aura and AuraBorealis offer one approach to understanding Python packages. We've used them to identify 20 vulnerable or malicious Python packages. If you or your organization depends on Python, we encourage you to use these tools and help us improve them. And if you are interested in piloting these tools or discussing this topic, please contact us at jmeyers@iqt.org.
Thank you to Bentz Tozer, Luke Berndt, Mike Chadwick, and Adam Van Etten for helpful review. Thank you also to George Lewis.