Many organizations using the popular open source Apache Airflow platform to schedule and manage workflows may be exposing credentials and other sensitive data to the Internet because of how they use the technology, researchers have found.
Security vendor Intezer this week said it recently discovered multiple misconfigured Airflow instances exposing sensitive information belonging to organizations across several industries, including manufacturing, media, financial services, information technology, biotech, and health.
The exposed data included user credentials for cloud hosting services, payment processors, and social media platforms, including Slack, AWS, and PayPal. Intezer found that at least some of the data exposed via misconfigured Airflow instances could allow threat actors to gain access to enterprise networks or to execute malicious code and malware in production environments and on Apache Airflow itself.
“It’s fairly simple to find exposed instances,” says Ryan Robinson, security researcher at Intezer. To find one, all a threat actor has to do is scan IP addresses and check them for the expected HTML file. “It’s trivial to find sensitive information on exposed instances, but exploiting it to run code is much harder and requires a solid understanding of each platform,” Robinson adds.
Organizations use Apache Airflow to create and schedule automated workflows, including those involving external services such as AWS, Google Cloud Platform, Microsoft Azure, Hadoop, Spark, and other Apache software. A 2020 survey of its usage showed that most of its users are data engineers, data scientists, or data analysts at midsize to large companies. More than three-quarters of organizations do little to no customization of the technology before using it.
Airflow allows users to orchestrate jobs that involve multiple tasks, Robinson says. For example, he says, one job might involve generating reports and then emailing them to customers; another might involve collecting, processing, and uploading data to AWS buckets.
While Airflow gives users several options for using it securely, organizations can still put data at risk through the way they use the platform.
Intezer, for instance, found insecure coding practices to be the most common cause of credential leaks in Airflow. Its research uncovered multiple Airflow instances in which passwords were hardcoded either into the Python code that orchestrates tasks or into Variables, a feature that lets a user define a variable value. In other instances, Intezer found users misusing an Airflow feature called Connections and storing passwords in plaintext instead of encrypting them.
“Airflow offers good options for storing sensitive information securely through its Connections feature,” Robinson says. The feature lets organizations ensure that the passwords used to push and pull data from other systems are stored in encrypted form. “For example, a job will download data from one platform using an API key, then process this data in another job and store it in a database using a password to connect. One workflow may need to interact with multiple remote systems,” Robinson says. Users often misuse the Connections feature or hardcode the credentials directly into the Python scripts, bypassing the feature altogether, he notes.
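As a rough illustration of the difference Robinson describes, the Python sketch below contrasts a hardcoded password with credentials looked up at run time. Inside Airflow itself, task code would normally call BaseHook.get_connection(conn_id) and let Airflow resolve the stored Connection; to stay free of an Airflow dependency, the sketch instead mimics one of the sources Airflow can read Connections from, an environment variable named AIRFLOW_CONN_<CONN_ID> holding a connection URI. The helper get_connection_from_env and the reports_db connection ID are hypothetical, and the URI parsing is simplified compared with Airflow's own. Note that an environment variable only keeps the secret out of the code; encryption at rest is what Airflow's metadata database (with a Fernet key) or a secrets backend provides.

```python
import os
from urllib.parse import unquote, urlsplit

# Insecure pattern of the kind Intezer describes (do not do this):
#   DB_PASSWORD = "s3cret"  # anyone who can read the DAG file can read the password


def get_connection_from_env(conn_id: str) -> dict:
    """Look up credentials for conn_id at run time instead of hardcoding them.

    Hypothetical helper: reads an AIRFLOW_CONN_<CONN_ID> environment variable
    holding a connection URI, loosely mirroring one of the places Airflow
    itself looks for Connections. The parsing here is simplified.
    """
    var_name = f"AIRFLOW_CONN_{conn_id.upper()}"
    uri = os.environ.get(var_name)
    if uri is None:
        raise KeyError(f"connection {conn_id!r} is not configured ({var_name} is unset)")

    parts = urlsplit(uri)
    return {
        "conn_type": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        # Credentials in a URI are percent-encoded, e.g. '@' becomes '%40'.
        "login": unquote(parts.username) if parts.username else None,
        "password": unquote(parts.password) if parts.password else None,
        "schema": parts.path.lstrip("/") or None,
    }


if __name__ == "__main__":
    # Demo only: in practice the deployment, not the code, sets this variable.
    os.environ.setdefault(
        "AIRFLOW_CONN_REPORTS_DB",
        "postgresql://etl_user:p%40ss@db.internal:5432/reports",
    )
    conn = get_connection_from_env("reports_db")
    print(conn["login"], conn["host"], conn["port"])  # the password is never printed
```

The design point is simply that the workflow code names a connection ID and never contains the secret, so reading the DAG file or its logs does not reveal the password.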
Intezer found other ways in which users can put enterprise data at risk through insecure use of Airflow. One example involves a setting (expose_config) in airflow.cfg, Airflow's configuration file, which often contains sensitive information such as passwords and keys. If the setting isn't secure, anyone can access the configuration file from the web server user interface, Intezer said in its report. Similarly, a feature in older versions of Airflow that lets users run ad hoc database queries is dangerous because it requires no authentication and allows anyone with access to the server to retrieve information from the database.
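The configuration-file risk lends itself to a simple automated check. The Python sketch below assumes the setting in question is the expose_config option in the [webserver] section of airflow.cfg, which defaults to False. The function name config_exposure_findings is hypothetical, and a real audit would also have to account for values overridden through environment variables such as AIRFLOW__WEBSERVER__EXPOSE_CONFIG, which this sketch ignores.

```python
import configparser

# Values treated as "disabled"; anything else is flagged for review.
_DISABLED = {"false", "0", "no", "off"}


def config_exposure_findings(cfg_text: str) -> list[str]:
    """Return warnings for airflow.cfg settings that let the web UI show the config.

    Sketch only: checks a single option, [webserver] expose_config, and
    ignores environment-variable overrides such as
    AIRFLOW__WEBSERVER__EXPOSE_CONFIG.
    """
    # interpolation=None reads values verbatim; no '%' expansion is needed here.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)

    findings = []
    # A missing section or option falls back to Airflow's default of False.
    value = parser.get("webserver", "expose_config", fallback="False").strip().lower()
    if value not in _DISABLED:
        findings.append(
            f"[webserver] expose_config = {value!r}: the web UI may display "
            "configuration values, which can include passwords and keys"
        )
    return findings


if __name__ == "__main__":
    sample = "[webserver]\nexpose_config = True\n"
    for finding in config_exposure_findings(sample):
        print(finding)
```

Flagging every value that is not explicitly disabled errs on the side of reporting; an operator can then decide whether a given setting is acceptable.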
Intezer recommends that all organizations using Apache Airflow upgrade to version 2.0.0, the latest release of the platform, and make sure that only authorized users can connect to it.
“Version 2.0.0 has made great improvements in security,” Robinson says. The new version has a fully supported API, unlike the experimental API in earlier versions. Other major improvements include enforcing authentication and removing sensitive information from logs, as well as changes to the structure of the main configuration file, he says. Some older and dangerous features, such as Ad Hoc Query, have been removed in the new version of Airflow.
Robinson says it is hard to know for sure whether attackers are targeting insecurely configured Airflow platforms; however, he says it would be reasonable to assume that Airflow instances have been targeted.