Avoid manual, painful data work. Unlock automation with our AWS SFTP service
You have data stuck within internal ERP, POS, and CRM systems. You want to bulk export data from Salesforce Marketing Cloud, Adobe Analytics, and other cloud applications.
The Openbridge Batch Data Pipeline Service (DPS) is a fully managed, S3 SFTP service. You can push data to our SFTP service, which allows you to transfer recurring batch feeds quickly.
What are some use cases for Openbridge SFTP S3 Transfer Gateway?
Our AWS SFTP service is designed with openness, flexibility, and simplicity in mind. An ecosystem built on this foundation ensures that customers can get value from their data more quickly.
- Leverage existing IT investments: Perfect for automated SFTP exports from internal systems like an ERP, ETL, MySQL, SQL Server, Oracle or other enterprise systems
- Third-party support: Process exports from third-party systems like Salesforce Marketing Cloud (see How to export data from ExactTarget) and Adobe Analytics (see How to export data from Adobe Analytics)
- Ad hoc file processing: Process ad hoc CSV files (e.g., sales reports, media plans, lookup files or any other CSV file) that you want to get loaded into your data lake or warehouse
- Cloud or on-premise: Use our fully-managed cloud service or deploy the service on-premise.
- AWS S3 or Google Cloud Storage: Our service supports both AWS S3 and Google Cloud Storage file systems.
"Thanks to Openbridge, we can now communicate and act on the marketing performance data. Analysts, execs, and team members from multiple departments can compare, filter and organize the exact data they need on the fly, in one report. No more waiting for several, static reports to fill their inbox on Monday morning"
Benefits of AWS SFTP batch processing pipelines
Our cloud-based, AWS SFTP data pipeline increases the velocity of getting CSV files batched from upstream systems to a target data lake or cloud warehouse.
- Schemas and Tables: Automated and dynamic schema creation, versioning and history
- Versioning: Automated table and view creation, management and versioning
- Scale: Whether your files range from 1 KB to 1 TB in size, or you send 10 files or 100,000, we scale with you
- Destinations: We handle the processing, routing, and loading to your target data lake or warehouse destination
- On-premise: Run our Docker API or your choice of batch clients securely on-premise or in the cloud
- Deduplication: Avoid duplicates with our automated deduplication of previously loaded records
- Testing: Use our web interface or API to validate CSV files for compliance with established norms such as RFC 4180
AWS SFTP file transfer testing and validation
Comma-separated values (CSV) files are commonly used for exchanging data between systems, with SFTP as a typical transport method. While the format is standard, it can present difficulties for data processing. Why? Different tools and export processes often generate output that is not CSV at all, or that has variations not considered "valid" according to RFC 4180.
Pair the Openbridge AWS SFTP service with our file validation API. Test CSV files for compliance with established norms such as RFC 4180 before sending them to the SFTP service. Our open API helps users determine the quality of CSV data before it is delivered to the SFTP data pipelines. The API also generates a schema for the tested file, which can further aid validation workflows.
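As a rough illustration of the kind of checks described above, here is a minimal local pre-check written with Python's standard `csv` module. It is a sketch, not the Openbridge validation API: the function names `check_csv` and `_guess_type`, the result layout, and the three type labels are all assumptions made for this example, and it covers only a few of the RFC 4180 conventions (well-formed quoting and a consistent field count per record).

```python
import csv
import io


def _guess_type(values: list[str]) -> str:
    """Return 'integer', 'number', or 'string' for a column's non-empty values."""
    values = [v for v in values if v != ""]
    if not values:
        return "string"
    for label, cast in (("integer", int), ("number", float)):
        try:
            for v in values:
                cast(v)
            return label
        except ValueError:
            continue
    return "string"


def check_csv(text: str) -> dict:
    """Run a few RFC 4180-style checks on CSV text and guess column types.

    Hypothetical helper for illustration only.
    """
    problems = []
    try:
        # strict=True makes the csv module raise csv.Error on malformed input.
        rows = list(csv.reader(io.StringIO(text, newline=""), strict=True))
    except csv.Error as exc:
        return {"valid": False, "problems": [f"parse error: {exc}"], "schema": {}}

    if not rows:
        return {"valid": False, "problems": ["file is empty"], "schema": {}}

    header, body = rows[0], rows[1:]
    if len(set(header)) != len(header):
        problems.append("duplicate column names in header")

    # RFC 4180: each record should contain the same number of fields.
    # Records are counted from 1, with the header as record 1.
    for record_no, row in enumerate(body, start=2):
        if len(row) != len(header):
            problems.append(
                f"record {record_no}: expected {len(header)} fields, found {len(row)}"
            )

    # Derive a simple column-name -> type mapping from the data rows.
    schema = {
        name: _guess_type([r[i] for r in body if i < len(r)])
        for i, name in enumerate(header)
    }
    return {"valid": not problems, "problems": problems, "schema": schema}
```

Calling `check_csv` on a file's contents before upload returns a list of problems to fix and a guessed schema that can be compared against what the destination table expects.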
AWS transfer for SFTP vs. Openbridge SFTP S3 Gateway
Both the Amazon and Openbridge SFTP S3 services let you set up Secure Shell File Transfer Protocol (SFTP), FTPES, or FTP transfers into and out of Amazon Simple Storage Service (Amazon S3) buckets. However, this is where the similarities end.
While AWS Transfer supports essential file transfers, the Openbridge SFTP S3 Transfer Gateway offers the most feature-rich, enterprise-grade file transfer and data processing solution available on the market.
Openbridge supports file sharing as well as data pipelines that use SFTP to automate, process, and load data to target data lakes or warehouses like Azure Data Lake, BigQuery, AWS Athena, AWS Redshift, or Redshift Spectrum.
Why create your own SFTP server? Use our managed SFTP service for consolidating data to a data lake or cloud warehouse so you can easily use your favorite analytic tools like Grow, Tableau, Microsoft Power BI, or Looker.
Set up an SFTP server on AWS, Google Cloud, Azure, or on-premise
If you would like a self-service, self-hosted option, we offer a license for the Openbridge SFTP S3 Transfer Gateway. Packaged as a portable Docker service, our on-premise, self-hosted option is perfect for customers who need more control over the service. Deploy it on AWS, Google Cloud, or Azure as an SFTP, FTPES, or FTP service. The self-hosted version allows you to add live virus and malware scanning, quotas, traffic shaping, geolocation restrictions, and other processing rules.
We make setting up an on-premise, AWS, Google Cloud, or Azure SFTP server a breeze. Contact us for more details on how you can take advantage of this solution.
Keeping track with an automated data catalog
If upstream data changes, we automatically version tables and views with a data catalog. Data is analyzed, and the system is trained. Data governance rules trigger the automated creation of databases, views, and tables in a destination warehouse or data lake for transformed data.
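To make the idea of versioning on upstream change concrete, the sketch below registers each incoming schema in a tiny in-memory catalog and hands back a versioned table name. It is an illustration under stated assumptions, not a description of how the Openbridge catalog is implemented: the `Catalog` class, the `schema_fingerprint` helper, and the `_v1`, `_v2` naming scheme are all hypothetical.

```python
import hashlib


def schema_fingerprint(columns: dict[str, str]) -> str:
    """Hash column names and types; column order does not affect the result."""
    canonical = ",".join(f"{name}:{columns[name]}" for name in sorted(columns))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


class Catalog:
    """Tracks the schema versions seen for each table (hypothetical, in-memory)."""

    def __init__(self) -> None:
        # table name -> list of schema fingerprints, in the order first seen
        self.versions: dict[str, list[str]] = {}

    def register(self, table: str, columns: dict[str, str]) -> str:
        """Return a versioned table name, adding a version when the schema is new."""
        fingerprints = self.versions.setdefault(table, [])
        fp = schema_fingerprint(columns)
        if fp not in fingerprints:
            fingerprints.append(fp)
        return f"{table}_v{fingerprints.index(fp) + 1}"
```

In this sketch, a file with a known schema maps back to its existing version, while a file with an added or retyped column gets the next version number, so older data stays queryable under its original name.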
Data integrity, consistency, and accuracy
We de-duplicate data assets from real-time or batch source systems to improve data accuracy. We use machine learning algorithms behind the scenes to learn how to identify duplicate records prior to loading data into a target system.
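The text above describes a machine learning approach to spotting duplicates. The sketch below is a deliberately simpler stand-in, exact-match deduplication by content hash, shown only to illustrate the general idea of skipping records that were already loaded. The function names and the in-memory `seen` set are assumptions for this example; a real pipeline would persist the fingerprints between loads.

```python
import hashlib


def row_fingerprint(row: dict) -> str:
    """Hash a record's fields in a stable key order."""
    # The unit-separator character keeps field boundaries unambiguous.
    canonical = "\x1f".join(f"{key}={row[key]}" for key in sorted(row))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def drop_loaded(rows: list[dict], seen: set[str]) -> list[dict]:
    """Return rows whose fingerprint is not in `seen`, and record them as seen."""
    fresh = []
    for row in rows:
        fp = row_fingerprint(row)
        if fp not in seen:
            seen.add(fp)
            fresh.append(row)
    return fresh
```

Passing the same `seen` set to every batch means a record resent in a later file is filtered out before loading, even if its columns arrive in a different order.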
Automatic, efficient data partitioning
Data partitioning is part of our data lake and pipeline processing. This optimization minimizes the data scanned for a query, which improves performance and reduces the cost of querying data stored in your lake or cloud warehouse.
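One common way to partition a data lake is by date, using Hive-style `key=value` path prefixes so that a query filtered on a date only reads the matching folders. The sketch below shows that layout. It is an assumed example, not the partitioning scheme Openbridge applies: the `year=/month=/day=` keys, the `date` field name, and both function names are hypothetical.

```python
from collections import defaultdict
from datetime import date


def partition_path(table: str, day: date) -> str:
    """Build a Hive-style partition prefix such as orders/year=2024/month=03/day=09/."""
    return f"{table}/year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"


def group_by_partition(table: str, rows: list[dict]) -> dict[str, list[dict]]:
    """Group rows by the partition that their ISO-formatted 'date' field maps to."""
    groups = defaultdict(list)
    for row in rows:
        prefix = partition_path(table, date.fromisoformat(row["date"]))
        groups[prefix].append(row)
    return dict(groups)
```

With rows grouped this way, each group can be written under its own prefix, and a query for a single day has one folder to scan rather than the whole table.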
Open-source, optimized Apache Parquet
We convert data into Apache Parquet, an efficient, optimized, open-source columnar format. Using Parquet lowers the cost of executing queries because the files' columnar format is optimized for interactive query services like Amazon Athena and Redshift Spectrum.
Data routing to preferred destinations
Data routing allows you to easily map a data source to a target data destination. You can route data to different regions, or route some data sources to a data lake and others to a cloud warehouse. This allows you to partition data according to your preferred data governance strategy.
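A source-to-destination mapping like the one described above can be pictured as a small routing table. The sketch below is illustrative only and does not reflect Openbridge's configuration format: the path patterns, the destination labels, the region names, and the `route_for` function are all hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical routing table: the first matching pattern wins.
ROUTES = [
    ("pos/*", {"destination": "data_lake", "region": "us-east-1"}),
    ("crm/eu/*", {"destination": "warehouse", "region": "eu-west-1"}),
]

# Used when no pattern matches the source path.
DEFAULT_ROUTE = {"destination": "data_lake", "region": "us-east-1"}


def route_for(source_path: str) -> dict:
    """Return the target for a source path, falling back to the default route."""
    for pattern, target in ROUTES:
        # fnmatchcase does a case-sensitive match on every platform.
        if fnmatchcase(source_path, pattern):
            return target
    return DEFAULT_ROUTE
```

Keeping the rules in one ordered list makes the governance policy easy to review: a reader can see at a glance which sources land in which region and which store.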
Improved data literacy with metadata generation
We append metadata that is specific to the information in each record. Your tables and views will include a series of system-generated fields that give users vital information about the meaning of the data we collected on your behalf.
Faster innovation, flexibility, and freedom from vendor lock-in. We help customers stay nimble so you can meet whatever priorities the business demands, now or in the future. Learn more about our platform
Free your team from painful data wrangling and silos. Automation unlocks the hidden potential for machine learning, business intelligence, and data modeling.
80% of an analyst's time is wasted wrangling data. Our platform accelerates productivity with your favorite data tools to save you time and money.