Imagine you’re a data analyst at a financial services firm, responsible for processing files from dozens of vendors. Most files arrive in simple flat formats like CSV or fixed-width text, but now and then, you need to consume vendor data that comes in a hierarchical format like XML or JSON.

JSON (JavaScript Object Notation) is a lightweight, text-based data interchange format. Its flexibility makes it ideal for exchanging complex, nested data across platforms. Here’s a breakdown of what that means:

  • Lightweight: It’s designed to be easy for humans to read and machines to parse, making it efficient for data transfer.
  • Text-based: It uses plain text, which makes it universally compatible across different programming languages and platforms.
  • Data interchange format: Its primary purpose is facilitating data exchange between different systems.

JSON’s simplicity and versatility have made it the preferred format for data exchange in modern software development, and it is particularly well suited to representing deeply nested, hierarchical data.
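
For example, a single vendor record with nested account data might look something like this (the field names and structure here are purely illustrative):

```json
{
  "vendorId": "V-1001",
  "name": "Acme Capital",
  "accounts": [
    { "accountNumber": "A-1", "balance": 2500.00 },
    { "accountNumber": "A-2", "balance": 912.75 }
  ]
}
```

Notice that one vendor owns a list of accounts. That one-to-many nesting is exactly what a row-and-column tool has to flatten before it can do anything useful with the data.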

An ETL (Extract, Transform, Load) workflow is a crucial process in data warehousing and data integration. It involves moving data from various sources into a single, consistent destination, typically a data warehouse, where it can be analyzed and used for business intelligence. Here’s a breakdown of each component:

  • Extract:
    • This stage involves retrieving raw data from various sources. These sources can include databases, applications, files, and more.
    • The goal is to gather all the necessary data for analysis.
  • Transform:
    • This is where the data is cleaned, standardized, and transformed into a consistent format.
    • Transformations can include:
      • Cleaning: Removing errors, inconsistencies, and duplicates.
      • Standardizing: Converting data to a uniform format.
      • Filtering: Selecting relevant data.
      • Aggregating: Summarizing data.
      • Joining: Combining data from different sources.
  • Load:
    • In the final stage, the transformed data is loaded into the target data warehouse or other destination.
    • This data is now ready for analysis and reporting.
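
To make the three stages concrete, here is a minimal sketch of an ETL pass over a flat file in C#. The file name, column layout, and the decision to print instead of bulk-insert are all assumptions made for illustration:

```csharp
// A minimal, hypothetical sketch of the three ETL stages over a flat file.
// The file name and column layout (Id,Amount,Date) are illustrative assumptions.
using System;
using System.IO;
using System.Linq;

class EtlSketch
{
    static void Main()
    {
        // Extract: read raw rows from a vendor CSV, skipping the header line.
        var rawLines = File.ReadLines("vendor_feed.csv").Skip(1);

        // Transform: clean, standardize, and filter.
        var rows = rawLines
            .Select(line => line.Split(','))
            .Where(f => f.Length == 3 && decimal.TryParse(f[1], out _)) // cleaning
            .Select(f => new
            {
                Id = f[0].Trim(),                                       // standardizing
                Amount = decimal.Parse(f[1]),
                Date = DateTime.Parse(f[2])
            })
            .Where(r => r.Amount > 0);                                  // filtering

        // Load: in a real workflow this would be a bulk insert into the
        // warehouse; here we simply write the transformed rows out.
        foreach (var r in rows)
            Console.WriteLine($"{r.Id}\t{r.Amount}\t{r.Date:yyyy-MM-dd}");
    }
}
```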

This workflow is straightforward when your source is a flat file, but what happens when it’s JSON?

Why JSON Can Be Tricky in SSIS
Microsoft SQL Server Integration Services (SSIS) is a powerful ETL tool, but JSON isn’t its strong suit. Here’s why:

  • Complexity of JSON Structures:
    • JSON’s nested and hierarchical nature can be difficult to handle within SSIS’s traditional tabular data flow paradigm. SSIS is inherently designed to work with structured, row-and-column data.
    • Parsing complex JSON structures requires significant custom scripting, often using the Script Component, which adds complexity and maintenance overhead.
  • Lack of Native JSON Parsing:
    • SSIS doesn’t have robust, built-in JSON parsing capabilities. You often rely on external components or custom code to extract the desired data.
    • This reliance can introduce dependencies and increase the potential for errors.
  • Performance Considerations:
    • Parsing large JSON files within SSIS can be resource-intensive and slow, especially when dealing with complex structures.
    • The overhead of custom scripting and data transformation can impact overall ETL performance.
  • Schema Flexibility:
    • JSON’s schema-less nature can clash with SSIS’s strict schema requirements. Changes in the JSON structure can break SSIS packages, requiring manual adjustments.
  • Development Overhead:
    • The development effort required to handle JSON in SSIS can be substantial, particularly for complex scenarios. Maintaining and troubleshooting these packages can also be challenging.

How We Made It Work
After some trial and error at NRC, we found a practical workaround that made this process work efficiently in our SSIS packages. The standard Script Component came to the rescue: configured as a source, it let us read the JSON file and deserialize its contents with the JavaScriptSerializer class from the System.Web.Extensions .NET assembly. Once we had deserialized the data into objects, we could walk the hierarchical structure efficiently and create rows in our output buffers as needed. From there, the standard SSIS data flow took over, validating data formats and saving the data into databases, tasks that SSIS does well and that did not need to be reinvented.
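
Here is a simplified sketch of that Script Component logic, flattening the hypothetical vendor feed shown earlier. It is not a standalone program: ScriptMain, UserComponent, and the Output0Buffer column properties are scaffolding SSIS generates from the outputs and variables you define in the component editor, so the names below (JsonFilePath, VendorId, AccountNumber, Balance) are assumptions that must match your own configuration. It also requires adding a reference to System.Web.Extensions:

```csharp
// Sketch of a Script Component (source) that deserializes a JSON file and
// emits one tabular row per nested account. Class, buffer, and variable
// names are hypothetical and must match the component's configured outputs.
using System.Collections.Generic;
using System.IO;
using System.Web.Script.Serialization; // from the System.Web.Extensions assembly

// Plain classes mirroring the (assumed) JSON hierarchy: vendor -> accounts.
public class Vendor
{
    public string VendorId { get; set; }
    public List<Account> Accounts { get; set; }
}

public class Account
{
    public string AccountNumber { get; set; }
    public decimal Balance { get; set; }
}

public class ScriptMain : UserComponent
{
    public override void CreateNewOutputRows()
    {
        // Deserialize the whole file into .NET objects.
        // Assumes a package variable holding the file path was added
        // to the component's ReadOnlyVariables.
        var serializer = new JavaScriptSerializer();
        var vendors = serializer.Deserialize<List<Vendor>>(
            File.ReadAllText(Variables.JsonFilePath));

        // Walk the hierarchy and flatten it: one output row per account,
        // with the parent vendor's id repeated on each child row.
        foreach (var vendor in vendors)
        {
            foreach (var account in vendor.Accounts ?? new List<Account>())
            {
                Output0Buffer.AddRow();
                Output0Buffer.VendorId = vendor.VendorId;
                Output0Buffer.AccountNumber = account.AccountNumber;
                Output0Buffer.Balance = account.Balance;
            }
        }
    }
}
```

Repeating the parent identifier on every child row is the key design choice: it turns the nested structure into ordinary rows and columns, so every downstream SSIS transformation and destination can treat the output like any other tabular source.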

At New Resources Consulting, we thrive on solving complex, real-world data challenges like this one. Whether you’re dealing with intricate data formats, integration hurdles, or performance issues, our data experts bring a blend of technical creativity and practical experience to every engagement.

Need help designing or optimizing your data workflows? Let’s talk. We’re here to help you turn even the most difficult data problems into streamlined, scalable solutions.

Sergiy Vlasik | New Resources Consulting