Large Payloads

Three NwHIN transactions allow for payloads: the Administrative Distribution, Document Submission, and Retrieve Documents specifications. All three transactions support the SOAP Message Transmission Optimization Mechanism (MTOM) specification. CONNECT must support the transfer of attachments up to 1 GB, either as a single 1 GB attachment or as multiple messages with an aggregate total of 1 GB. The 1 GB payload size is measured before encryption and transmission.

When implementing support for large files, we considered the following:

  • Any implementation must not violate or circumvent NwHIN specifications
  • The solution design and implementation must be generic
  • Implementation-specific constructs must not propagate as a part of SOAP messages
  • A path to implementation must be available for web service stacks that do not already interoperate with the solution
  • Any implementation must be accompanied by a detailed analysis to discover the following:
    • What thresholds are added, removed, or changed compared to the standard implementation?
    • What impact does a given implementation have on changing underlying web service stack versions?

Design Goals

The aim of this design is to allow CONNECT to send and receive files of up to 1 GB without a negative impact on performance for either party to a message exchange, while remaining compliant with the NwHIN specifications. With streaming, a large payload message will not tie up threads and block further messages from flowing, and it will not require the server JVM to allocate an excessive amount of memory. Improvements in auditing and orchestration will ensure that large files do not degrade the performance of the underlying CONNECT components. Allowing large files, or packets of files, of up to 1 GB to flow through CONNECT lets CONNECT adopters send and receive more comprehensive files and different types of content as desired.

Design Considerations and Design Decisions

In the targeted implementation, support for large file transfers will be included for the Gateway-to-Gateway Document Submission, Retrieve Documents, and Administrative Distribution transactions.
HTTP logging is of limited use when large files are received as payloads, and it will need to be turned off to accommodate such transactions.

Design Approach

The approach for supporting the transfer of large files over web services is to use a concept called "streaming web services." The basic premise of streaming web services is that a web service message can be received, and processing on that message can start, before the entire message has been received. This feature is generally closely linked with MTOM and multipart MIME messages: the MIME part containing the SOAP Envelope is received first, and processing of that web service message begins before the other MIME parts have completely transferred. In other words, streaming web services allow the web service stack and the application server to begin processing a message before the entire attachment stream has been read. Streaming web services using the HTTP chunking feature is the industry standard for transferring large binaries over web service calls.

HTTP Chunking

Streaming web services are supported by the underlying HTTP specification through the "chunking" feature. HTTP chunking is optional for the sending side but required for the receiving side. Therefore, in order to accept streamed large files, receiving NwHIN web service endpoints (SOAP over HTTP) must be capable of receiving chunked content. Chunked encoding is a feature of HTTP/1.1, which is part of the WS-I Basic Profile, but it can be explicitly disabled for specific implementations. If it is disabled, the document will be buffered before it is sent or received. Using chunking enables the sending side to stream the message, while the receiving side can either stream the incoming message or buffer it, depending on its own preferences.
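To make the chunking feature concrete, the following sketch frames a payload in the HTTP/1.1 chunked transfer coding: each chunk is its size in hexadecimal, a CRLF, the data, and a CRLF, and the body ends with a zero-length chunk. It is an illustration only and not CONNECT code; the class name ChunkedEncodingSketch and the chunk size are assumptions, and in practice the HTTP client or web service stack produces this framing.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Illustrative encoder for the HTTP/1.1 chunked transfer coding (hypothetical helper, not CONNECT code). */
final class ChunkedEncodingSketch {

    private static final byte[] CRLF = {'\r', '\n'};

    /** Frames the payload as chunks of at most chunkSize bytes, followed by the terminating zero-length chunk. */
    static byte[] encode(byte[] payload, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int offset = 0; offset < payload.length; offset += chunkSize) {
            int length = Math.min(chunkSize, payload.length - offset);
            // Chunk header: the chunk length in hexadecimal, then CRLF.
            byte[] header = Integer.toHexString(length).getBytes(StandardCharsets.US_ASCII);
            out.write(header, 0, header.length);
            out.write(CRLF, 0, CRLF.length);
            // Chunk data, then CRLF.
            out.write(payload, offset, length);
            out.write(CRLF, 0, CRLF.length);
        }
        // Last chunk ("0" CRLF) followed by an empty trailer section (CRLF).
        out.write('0');
        out.write(CRLF, 0, CRLF.length);
        out.write(CRLF, 0, CRLF.length);
        return out.toByteArray();
    }
}
```

Because every chunk carries its own length, the sender does not need to know the total message size up front, which is what allows a large attachment to be streamed rather than buffered. For a plain Java HTTP client, HttpURLConnection.setChunkedStreamingMode(int) requests this behavior.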

JAXB

CONNECT uses JAXB objects as data objects to process and manipulate a sent or received message. JAXB is a binding framework that represents the SOAP XML messages as Java objects. Natively, these objects would contain a large payload as a byte array, meaning that the entire payload would be passed through every Java class and method call in CONNECT. To resolve this, if the payload size is above a configurable threshold, the payload will be streamed to a file on the local file system and a reference to that file will be used during gateway processing. If it is below the threshold, the payload will be kept in memory. This approach gives the best performance, because the extra overhead of saving the document to the file system is incurred only when necessary.
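The threshold behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the CONNECT implementation: the class PayloadSpoolSketch, the Payload holder, and the use of a temporary file are all hypothetical names and choices made for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch: keep small payloads in memory, spill large ones to a file. */
final class PayloadSpoolSketch {

    /** Holds either in-memory bytes or a file reference, never both. */
    static final class Payload {
        final byte[] bytes; // non-null when the payload stayed in memory
        final Path file;    // non-null when the payload was written to disk

        Payload(byte[] bytes, Path file) {
            this.bytes = bytes;
            this.file = file;
        }

        boolean isOnDisk() {
            return file != null;
        }
    }

    /** Reads the stream; once more than thresholdBytes have arrived, the rest is streamed to a temp file. */
    static Payload spool(InputStream in, int thresholdBytes) {
        try {
            ByteArrayOutputStream memory = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                if (memory.size() + read > thresholdBytes) {
                    // Threshold exceeded: flush what is buffered, then stream the remainder to disk.
                    Path file = Files.createTempFile("payload-", ".bin");
                    try (OutputStream out = Files.newOutputStream(file)) {
                        memory.writeTo(out);
                        do {
                            out.write(buffer, 0, read);
                        } while ((read = in.read(buffer)) != -1);
                    }
                    return new Payload(null, file);
                }
                memory.write(buffer, 0, read);
            }
            return new Payload(memory.toByteArray(), null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch at most the threshold plus one read buffer is held in memory at any time, regardless of the payload size. The caller is responsible for deleting the file once processing is complete.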

Audit Implications

The current implementation of the Audit Repository reference adapter in CONNECT logs the entire message (including the payload) that is sent or received by the gateway. This reference adapter implementation causes problems with extremely large files, and supporting payloads of up to 1 GB will compound the problem. We have received guidance from the NwHIN specification factory that gives us the flexibility to evolve the CONNECT reference adapter to accept a much smaller and more specification-compliant message.

Orchestration Implications

Orchestration in CONNECT is the process of sending a message to the auditing, policy, and adapter services, together with the gateway logic specific to processing a given message. Audit implications have already been discussed. There is no precedent or clear use case for the CONNECT gateway to require the payload of a message for policy services or gateway logic; however, the adapter will require a handle to the large payload. In short, any adapters that wish to take advantage of the large payload capability will need to be modified to process such payloads properly. Namely, they will need to be able to process the document as a stream and not as an in-memory JAXB object. See the Application Programming Interfaces section for more details.

Data Conversion

To minimize the number of times a file is streamed over the network, the CONNECT Gateway can be configured to parse document payloads as a file URI. This configuration is set in the gateway.properties file with the ParsePayloadAsFileURIOutbound property. When it is set to true, any outbound message with attached documents (Document Submission request, Admin Distribution request, and Retrieve Documents response) must carry a base64-encoded file URI as its payload, and the referenced file must be accessible to the gateway. The gateway will then access and stream that file when the message is sent out to the NwHIN. Note that this is an either/or configuration, as CONNECT does not have the logic to dynamically determine whether the payload is carrying a file URI or the actual document itself.
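As an illustration of the payload value described above, the following sketch encodes a file location as a base64 file URI and decodes it again. The class and method names are hypothetical, and whether the base64 step is applied by the adapter or by the XML binding layer depends on how the message is built; the sketch only shows the round trip.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Base64;

/** Hypothetical sketch of a file URI carried as a base64 payload value. */
final class FileUriPayloadSketch {

    /** Sending side: the value placed in the payload instead of the document bytes. */
    static String encodeFileUri(Path document) {
        String uri = document.toUri().toString(); // e.g. a file:/// URI for the document
        return Base64.getEncoder().encodeToString(uri.getBytes(StandardCharsets.UTF_8));
    }

    /** Receiving side: recover the file location so the file can be opened and streamed. */
    static Path decodeFileUri(String base64Payload) {
        String uri = new String(Base64.getDecoder().decode(base64Payload), StandardCharsets.UTF_8);
        return Paths.get(URI.create(uri));
    }
}
```

The payload in this mode is only a few dozen bytes regardless of the document size, which is why the document itself crosses the adapter-to-gateway hop only once, when the gateway streams it out.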

Similarly, the SavePayloadToFileInbound property in the gateway.properties file configures CONNECT to automatically save the payload of an inbound message to the file system. The location where the file is saved is defined by the PayloadSaveDirectory property. When this is enabled, the gateway will save the document attachment from the inbound message to the file system and replace the document in the message with the URI location of the file. The modified message is then forwarded to the adapter. It is the adapter's responsibility to delete any files saved in this manner.
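The inbound behavior can be sketched in the same spirit: the attachment stream is written to a save directory, its URI is handed on, and the consumer deletes the file when done. This is a sketch under stated assumptions, not the gateway's code; the class name, the method names, and the UUID-based file naming are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.UUID;

/** Hypothetical sketch of saving an inbound attachment and cleaning it up afterwards. */
final class InboundPayloadSaveSketch {

    /** Gateway side: stream the attachment into the save directory and return the file's URI. */
    static URI saveToDirectory(InputStream attachment, Path saveDirectory) {
        try {
            Files.createDirectories(saveDirectory);
            // File naming is an assumption made for this example.
            Path target = saveDirectory.resolve(UUID.randomUUID() + ".payload");
            Files.copy(attachment, target); // copies the stream without loading it all into memory
            return target.toUri();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Adapter side: delete the saved file once it has been processed. */
    static boolean deleteSavedPayload(URI location) {
        try {
            return Files.deleteIfExists(Paths.get(location));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

An adapter following this pattern would call the delete step only after it has finished reading the file, since nothing else removes it.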

CONNECT relies entirely on the default capabilities of CXF for streaming. The following CXF Java options can be added to provide finer-grained tuning:
  • org.apache.cxf.io.CachedOutputStream.Threshold - The file size threshold above which CXF will save the attachment to disk in the tmp directory. Defaults to 64k.
  • org.apache.cxf.io.CachedOutputStream.OutputDirectory - The tmp directory into which CXF streams a file to avoid loading the entire document in memory. Defaults to the Java tmp directory if not specifically set.

Application Programming Interfaces

A schema change was required to allow for streaming. In ihe/XDS.b_DocumentRepository.xsd and ebRS/edxl-de.xsd, the Document (Retrieve Documents and Document Submission) and contentData (Administrative Distribution) elements are given the attribute xmime:expectedContentTypes="application/octet-stream". This change directs the JAXB compiler to bind the element to a DataHandler object rather than to the default byte array.

Before
<xs:element name="DocumentResponse" maxOccurs="unbounded">
  <xs:complexType>
    ...
    <xs:element name="Document" type="xs:base64Binary"/>
    ...
  </xs:complexType>
</xs:element>

After
<xs:element name="DocumentResponse" maxOccurs="unbounded">
  <xs:complexType>
    ...
    <xs:element name="Document" xmime:expectedContentTypes="application/octet-stream" type="xs:base64Binary"/>
    ...
  </xs:complexType>
</xs:element>

This means that any adapters supporting the three large payload services will need to be modified to handle the payload as a DataHandler object.
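As a rough illustration of what handling the payload as a stream means for an adapter, the sketch below computes a SHA-256 digest of a document while reading it in fixed-size buffers. The class and method names are hypothetical. In an adapter, the InputStream would typically be obtained from the bound object via DataHandler.getInputStream(); a plain InputStream is used here so that the example needs nothing beyond the JDK.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch of stream-based document processing in an adapter. */
final class StreamingAdapterSketch {

    /** Reads the document in 8 KB buffers and returns its SHA-256 digest as lowercase hex. */
    static String sha256Hex(InputStream document) {
        try (InputStream in = document) {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                // Only the current buffer is held in memory, never the whole document.
                digest.update(buffer, 0, read);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }
}
```

The same read loop applies to any per-buffer processing, such as writing the document to a repository. What an adapter should avoid is copying the stream into a byte array first.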
In addition, configuration parameters related to payload handling in the gateway.properties file were added as part of this effort. They are given in the Data Conversion section.
Optional configurations related to Timestamp validation were also added. Although modifying them should not be necessary in normal use, they are included because they may be useful. The following have been added:

  • TimeStampStrict - Boolean to determine whether the validator should check whether the timestamp has expired. The default value is true.
  • TimeStampTimeToLive (analogous to ws-security.timestamp.timeToLive) - The time in seconds to append to the Creation value of an incoming Timestamp to determine whether to accept the Timestamp as valid or not. The default value is 300 seconds (5 minutes).
  • FutureTimeToLive (analogous to ws-security.timestamp.futureTimeToLive) - The time in seconds in the future within which the Created time of an incoming Timestamp is valid. The default value is 60 seconds.

These configurations affect only Administrative Distribution and Document Submission. Again, an adopter will not generally need to modify these values. The only time an adopter will need them is if it has another interceptor that streams the file before validation occurs. One such interceptor is the LoggingInInterceptor, which is always the first interceptor to run if HTTP debug dump is enabled in the JVM options. However, as stated in the Design Considerations and Design Decisions section, HTTP logging should be turned off when streaming large files.
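The acceptance window implied by these settings can be modeled as below. This is a simplified sketch under assumptions: it assumes the settings behave like the WS-Security options they are described as analogous to, it treats the strict setting as a check of an optional Expires value, and the class and parameter names are hypothetical. The actual validation is performed by the underlying web service security stack.

```java
import java.time.Instant;

/** Hypothetical, simplified model of the Timestamp acceptance window. */
final class TimestampWindowSketch {

    /**
     * @param created                 Created value of the incoming Timestamp
     * @param expires                 Expires value, or null if the Timestamp has none
     * @param now                     the time at which validation runs
     * @param strict                  models TimeStampStrict
     * @param timeToLiveSeconds       models TimeStampTimeToLive
     * @param futureTimeToLiveSeconds models FutureTimeToLive
     */
    static boolean isAcceptable(Instant created, Instant expires, Instant now,
                                boolean strict, long timeToLiveSeconds, long futureTimeToLiveSeconds) {
        // Reject a Created time that lies too far in the future.
        if (created.isAfter(now.plusSeconds(futureTimeToLiveSeconds))) {
            return false;
        }
        // Reject a Timestamp whose Created time plus the time to live has already passed.
        if (created.plusSeconds(timeToLiveSeconds).isBefore(now)) {
            return false;
        }
        // In strict mode, also reject a Timestamp whose Expires time has passed.
        if (strict && expires != null && expires.isBefore(now)) {
            return false;
        }
        return true;
    }
}
```

In this model, a message created 360 seconds before validation runs is rejected with the default 300-second time to live and accepted with 600. That is the situation an adopter could hit when another interceptor streams a large file before validation occurs.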

Performance

Sending large files over the network has performance implications. Although the implementation does not load the entire document into memory, sending very large files is still processor- and I/O-intensive. Gateways that support large payloads should therefore be provisioned with sufficient resources to use this feature. In addition, downloading or uploading large files over a slow internet connection could cause sockets or the HTTP session thread to time out because of the length of the transfer. CONNECT allows the user to configure the timeout length with the webserviceproxy.timeout property in gateway.properties. The HTTP session thread timeout can also be configured, but how it is set is specific to the application server.
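When choosing timeout values, a back-of-the-envelope estimate of transfer time can help. The sketch below divides the payload size in bits by the link speed and rounds up. The class name and the example link speed are assumptions, and the estimate ignores protocol overhead, encryption, and contention, so real transfers will take longer.

```java
/** Hypothetical helper for estimating how long a payload takes to transfer. */
final class TransferTimeSketch {

    /** Returns the transfer time in whole seconds, rounded up, for an idealized link. */
    static long estimateSeconds(long payloadBytes, long linkBitsPerSecond) {
        if (payloadBytes < 0 || linkBitsPerSecond <= 0) {
            throw new IllegalArgumentException("payload must be non-negative and link speed positive");
        }
        long bits = Math.multiplyExact(payloadBytes, 8L);
        // Ceiling division so that a partial second counts as a full one.
        return (bits + linkBitsPerSecond - 1) / linkBitsPerSecond;
    }
}
```

For example, taking 1 GB as 2^30 bytes and assuming a 10 Mbit/s link, estimateSeconds(1L << 30, 10_000_000L) gives 859 seconds, or a little over 14 minutes. Any socket or session timeout shorter than that would interrupt the transfer.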