The Path to a Successful Data Integration Project: Part 1

by Fiona Hamilton, Vice President, EMEA Operations at Volante Technologies Ltd

The Financial Services world is littered with horror stories of late or failed data integration projects. Even those that ultimately succeed are often implemented only after much stress, both emotional for those involved and fiscal, as budgets are frequently exceeded. How can we learn from the past to mitigate the mistakes that are made over and over again in successive projects?

The subject of data integration is in itself complex, and therefore this article will concentrate on the drivers and business case for embarking on a common integration architecture in the first place. A second article will then address the pre-implementation phase, where decisions regarding architecture and potential software selection take place, and a third and final article will address the actual implementation phase itself.


Why are data integration projects so difficult?

I have spent the past 29 years working in Financial Services technology and for 27 of them almost exclusively in the field of data integration, specifically around financial standards-based message integration such as the adoption of SWIFT, ISO 20022, FpML, FIX, CREST and Omgeo, to name but a few. I have been involved in considerably more than one hundred such projects in approximately 30 countries, spanning most asset classes and from the front to the back office. Over that period I have probably encountered almost every mistake that can possibly be made in an integration project and in some cases perhaps made them myself, though I hope only the once. So whilst it would be impossible to list every one of those pitfalls, I would assert that, apart from the usual issues such as poor internal communication and governance that are generic across any project, data integration suffers from an almost universal and incorrect assumption: “How hard can it be?” Those five little words are the gateway to hell. Whilst most data integration projects are undoubtedly not on the scale of complexity of writing, for example, a Settlement or Trading system from scratch, they can rarely be said to be “easy”.


Why embark on a data integration project in the first place?

There are many drivers that cause financial organizations to embark on a data integration project and broadly speaking these can be grouped as suggested on the diagram below.

These drivers can be broken down into fundamental issues of responding to external regulatory or technology change, or aspirations related to improving the bottom line by either increasing productivity or releasing revenue streams more quickly. These imperatives apply to all sorts of financial institutions and their customers. However, when we strip away the legal jargon associated with regulation, the “business speak” about providing increased shareholder value, and the technological gobbledegook relating to the latest and greatest method of deployment, it often comes down to one thing: improving the flow of information in a timely and accurate fashion.

The flow of information can either be between processing systems or between those internal systems and external parties such as customers, settlement & clearing utilities, brokers, custodians, repositories and regulators.

All but the smallest of companies will typically have multiple implemented software applications that enable their business activities. In the case of global organizations, this may amount to many hundreds. In an investment bank, for example, these disparate systems will be handling the trading, confirmation, settlement and reconciliation of various financial instruments such as equities, fixed income, FX, OTC and exchange traded derivatives, commodities and payments. In addition, there will be shared functions such as accounting, risk, CRM and compliance. Whatever the requirement for a company to operate efficiently, the applications have to share information as part of the overall business process. To complicate matters further, no organization works in isolation, so a portion of that application data must be exchanged with external parties.

Many companies therefore, understandably, organize themselves according to those functional areas, with their accompanying systems acting as islands or silos of operation. Generally, these do a sufficient job, albeit often with very high costs for implementing and maintaining each connection. However, the challenges facing board or C-level executives are of an entirely different nature. The business needs to be considered holistically, which is often near impossible because sharing information at either the detail or the consolidated level is simply too hard. Implementing changes can also prove difficult because of the serious hurdles their implications throw up for the IT infrastructure.

For example, a Group CFO requires consolidated accounting information which may come from separate geographical operating units, each operating their own general ledgers. A group compliance officer may need to report trading activity in particular instruments such as Over-the-Counter (OTC) Derivatives, which are traded in many countries, to domestic or other regulatory bodies. A CMO needs to consider trends in trading or sales activity at varying levels of detail. CIOs and CTOs then have the challenge of fulfilling these requests with an ocean of differing systems representing information in different ways. Information by definition is concerned with conveying meaning, but it is always underpinned by data.

When systems need to share information or store it in a database, it is often referred to as data. However, when a system needs to communicate that data/information with external parties, it will usually be structured into a particular format that the sender and receiver agree upon, which could be by mutual agreement (proprietary) or according to an international standard such as one defined by ISO, FIX Trading Community, ISDA, ANSI or UN/CEFACT. These high level standards are then often specialised by domestic, regional or commercial entities for their own purposes. For example, ISO 20022 is used to underpin SEPA or DTCC Corporate Actions, and FpML is the basis for various trade repository services. This externally-communicated data is normally referred to as a message. The reality is, however, that a message is just data structured in such a way that an external party can understand it. It is still just data. So when we talk about data integration it also encompasses messaging, and perhaps a better term would be “information integration”. Unfortunately, data integration is the term we are stuck with, and it inevitably makes the subject seem like a purely technical issue, whereas information is generally perceived as having business value and therefore, ultimately, dollars and cents value.


So what are the underlying barriers to providing C-level and other senior executives with the information agility they require to not only operate their business efficiently, but also upon which to make decisions to evolve and grow the business?

The heart of the problem is that no two systems will represent data, or more specifically the individual data elements, in the same way. For example, one may represent a currency as the ISO 4217 three character alphabetic code “GBP”, another as the three digit ISO 4217 numeric code “826” and yet another, perhaps a legacy mainframe system written many years ago, as “STG” meaning Sterling. Considering that even relatively simple transactions such as payments have tens, if not hundreds, of data elements all called different names, with different values and sometimes different lengths, then the challenge of supporting even a simple request such as “show me all Customer X’s transactions across all transaction types in U.S. Dollars either in the U.S. or U.K.” becomes much harder than it may at first seem. Complex instruments such as OTC Derivatives are at the other end of the complexity spectrum and thus these challenges are magnified.
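The currency example can be made concrete with a minimal sketch. The lookup table, the function name and the three sample transactions below are hypothetical and exist only to illustrate the idea; a real estate of systems would need far more aliases and far more fields than the one shown here.

```python
from decimal import Decimal

# Hypothetical alias table: each representation a source system might use,
# mapped to the ISO 4217 alphabetic code.
CURRENCY_ALIASES = {
    "GBP": "GBP",  # ISO 4217 alphabetic code
    "826": "GBP",  # ISO 4217 numeric code
    "STG": "GBP",  # legacy in-house code meaning Sterling
    "USD": "USD",  # ISO 4217 alphabetic code
    "840": "USD",  # ISO 4217 numeric code
}


def normalise_currency(raw: str) -> str:
    """Return the ISO 4217 alphabetic code for a source-system value."""
    key = raw.strip().upper()
    try:
        return CURRENCY_ALIASES[key]
    except KeyError:
        raise ValueError(f"unrecognised currency code: {raw!r}") from None


# Illustrative data only: the same customer seen through three systems.
transactions = [
    {"system": "payments", "customer": "X", "ccy": "USD", "amount": Decimal("100.00")},
    {"system": "legacy", "customer": "X", "ccy": "STG", "amount": Decimal("250.00")},
    {"system": "settlement", "customer": "X", "ccy": "840", "amount": Decimal("75.00")},
]

# "Show me all Customer X's transactions in U.S. Dollars" only works
# once every currency value has been normalised to one representation.
usd_total = sum(
    t["amount"] for t in transactions if normalise_currency(t["ccy"]) == "USD"
)
```

Even in this toy case, a query that compared the raw `ccy` values against “USD” would silently miss the settlement system's transaction.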

To compound matters further, the processing of a single transaction such as an equity trade will often involve more than one system. Even if the currency is agreed and the three character ISO 4217 code is always used, it is unlikely that any two systems will have the same names, content and length constraints for every single constituent field, not to mention that the order of the fields is almost certainly going to be different. In its lifecycle, the trading and settlement information will have to be represented or presented in many different formats: the trade initiation from a customer in a web portal format, the capture in the order routing and trading systems, the ETC process in the middle office, the interface to the risk and compliance applications, the communication of the settlement instruction to a custodian or domestic utility such as CREST, and finally the passing of the same information to a reconciliation system. Even this simple example requires the same information to be represented in at least seven different formats, and this is by no means a complicated scenario.
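A single point-to-point interface between two of those systems might look something like the sketch below. Everything in it is an assumption made for illustration: the field names, the widths, the field order and the sample trade do not come from any real trading or settlement system.

```python
# Hypothetical target layout for a fixed-width settlement record:
# (target field name, source field name, fixed width), in target order.
SETTLEMENT_LAYOUT = [
    ("CCY", "currency", 3),
    ("REF", "trade_reference", 10),
    ("QTY", "quantity", 12),
    ("ISIN", "instrument", 12),
]


def to_settlement_record(trade: dict) -> str:
    """Render a trading-system dict as a fixed-width settlement record."""
    parts = []
    for target, source, width in SETTLEMENT_LAYOUT:
        value = str(trade[source])
        if len(value) > width:
            # Differing length constraints surface here rather than
            # being silently truncated.
            raise ValueError(f"{source}={value!r} exceeds {target} width {width}")
        parts.append(value.ljust(width))
    return "".join(parts)


# Illustrative trade as the (hypothetical) trading system holds it:
# different names and a different field order from the target record.
trade = {
    "trade_reference": "T-000123",
    "instrument": "GB00EXAMPLE1",  # placeholder, not a real ISIN
    "quantity": 5000,
    "currency": "GBP",
}
record = to_settlement_record(trade)
```

This is one direction between one pair of systems. The lifecycle described above needs an equivalent piece of logic for every other pair that exchanges the trade.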

As stated above, many companies organize themselves into silos, sometimes with their own IT development (engineering) team or in some cases with a shared in-house or out-sourced one. Often each one of these interfaces between systems is then coded in the programming language of choice, such as Java, C++ or C#. Within each silo this generally achieves the basic minimum requirement that point-to-point integration is implemented. For small companies with limited numbers of systems this can sometimes suffice. However, for larger organizations, and indeed even for smaller ones, it comes with downsides that grow with the number of systems involved.

Firstly, no two programmers will ever write code in the same way, and this is especially true with modern object oriented languages such as Java. If you have seven interfaces to write across multiple departments, the chances are that all seven will look different even though they all inherently represent the transformation of the same transaction from one format to another. Additionally, even though the programmer originally started from a specification written by a business analyst, the resultant code will inevitably deviate from it over time, as changes made during bug fixing and enhancements are often not fed back into the documentation. Personnel also change or move on to other projects, meaning ongoing support and change become increasingly difficult and expensive as their successors struggle to understand why code was written in a particular way. Also, with point solutions the same functions are very often written many times over. For example, if three interfaces each have to understand SWIFT format messages, the organization inherits three sets of code written to read, create and validate that particular format. These structures also change on a yearly basis, so even if the original programmers remain, the same changes need to be applied and tested three times.

In all these existing scenarios, at no point is there a consistent centralised representation of what the transaction is; so the point-to-point approach does nothing to facilitate an overall view of the transaction lifecycle that can underpin the C-level type business information requirements.


So what is the answer? A companywide shared data integration strategy that implements the following:

A normalised canonical model which is the result of a business analysis of all the data sets to create a common understanding of each field with information then either mapped into this model or out of it. This approach provides a number of benefits:
The basis for a data store combining all systems' information in a common format, enabling consolidated and consistent business information once persisted into a data warehouse and then accessed by either simple queries or via “big data” technology such as Apache Hadoop®.
It enables the disassociation of information sources from their destinations, allowing all systems' data structures to map to a single format. For example, in a point-to-point implementation a change to an external message format has to be made in every system that communicates in that format. In a canonical model approach it only has to be implemented once, and only that change need be tested, rather than all three systems being regression tested.
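The canonical model idea can be sketched in a few lines. The `CanonicalPayment` class, the source and destination formats and their field names are all hypothetical; a real canonical model is the product of business analysis across every data set and would be far richer.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class CanonicalPayment:
    """Hypothetical canonical model: one agreed meaning per field."""
    reference: str
    currency: str  # ISO 4217 alphabetic code
    amount: Decimal


# Inbound mappings: one per source format, each targeting the canonical model.
def from_legacy(rec: dict) -> CanonicalPayment:
    currency = {"STG": "GBP"}.get(rec["CCY"], rec["CCY"])
    return CanonicalPayment(rec["REF"], currency, Decimal(rec["AMT"]))


def from_portal(rec: dict) -> CanonicalPayment:
    return CanonicalPayment(rec["id"], rec["currency"], Decimal(rec["value"]))


# Outbound mappings: one per destination format, each reading the canonical model.
def to_ledger(p: CanonicalPayment) -> dict:
    return {"entry_ref": p.reference, "ccy": p.currency, "amount": str(p.amount)}


def to_report_line(p: CanonicalPayment) -> str:
    return f"{p.reference}|{p.currency}|{p.amount}"


def mappings_needed(formats: int) -> tuple[int, int]:
    """Directed mappings needed: (point-to-point worst case, via canonical model)."""
    return formats * (formats - 1), 2 * formats


legacy = from_legacy({"REF": "P1", "CCY": "STG", "AMT": "250.00"})
portal = from_portal({"id": "P2", "currency": "GBP", "value": "100.00"})
```

Neither source knows anything about either destination. Under the worst-case assumption that every format must be convertible to every other, `mappings_needed(7)` gives 42 point-to-point mappings against 14 through the canonical model, and a change to one external format touches only that format's own mappings.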

A Service Oriented Architecture (SOA) typically centred around an Application Server or Service Bus technology. This provides a number of benefits:
Common functions such as validation of particular data structures or lookups of reference data or compliance checks can be shared by all systems that require them and only implemented and supported once.
It provides the physical means of communicating the data, either as files or as messages, over a myriad of communications protocols, and enables centralised management of the transport infrastructure.
SOA generally comes with all the required security authentication and encryption facilities built in.
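The first of these benefits, common functions implemented once and shared, can be illustrated with a small in-process stand-in. This is not how an Application Server or Service Bus is actually programmed; the registry, the decorator and the two checks below are hypothetical and only show the principle that each interface asks for a named shared function instead of carrying its own copy.

```python
from decimal import Decimal, InvalidOperation
from typing import Callable

Validator = Callable[[dict], list[str]]

# Hypothetical registry standing in for services published on a bus.
SHARED_VALIDATORS: dict[str, Validator] = {}


def shared_service(name: str) -> Callable[[Validator], Validator]:
    """Register a validation function under a name, once, for all callers."""
    def register(func: Validator) -> Validator:
        SHARED_VALIDATORS[name] = func
        return func
    return register


@shared_service("currency")
def validate_currency(message: dict) -> list[str]:
    known = {"GBP", "USD", "EUR"}  # stand-in for a reference-data lookup
    ccy = message.get("currency")
    return [] if ccy in known else [f"unknown currency: {ccy!r}"]


@shared_service("amount")
def validate_amount(message: dict) -> list[str]:
    try:
        positive = Decimal(str(message.get("amount"))) > 0
    except InvalidOperation:
        return ["amount is missing or not numeric"]
    return [] if positive else ["amount must be positive"]


def validate(message: dict, checks: list[str]) -> list[str]:
    """Run the named shared checks and collect every error they report."""
    errors: list[str] = []
    for name in checks:
        errors.extend(SHARED_VALIDATORS[name](message))
    return errors
```

A settlement interface and a reporting interface could both call `validate(message, ["currency", "amount"])`, so a change to the currency rule is made and tested in one place.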

A model driven code generation tool, preferably with built-in data models for external standards. This provides a number of benefits:
Rapid development environment.
Consistent generation of code, unlike manual coding.
Generation of support documentation that exactly mirrors the function of the code.
Out-of-the-box support for external data models that are maintained and delivered pre-tested.
The basis for all transformation, enrichment, validation and routing of data/messages dynamically carried out on an end-point specific basis.
The ability to maintain end-point specific data formats and rules either based on existing standards or proprietary ones.
Automatic application of upgrades without having to re-write code.
Removes issues involving changes in programming staff as the development environment shows the graphical data structures, the validation, transformation and enrichment logic without having to look at a single line of code.
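The principle behind the model driven approach is that the mapping is held as data, and both the executable transformation and its documentation are derived from that one model. The sketch below is a deliberately tiny stand-in: a real tool would be graphical and would generate code, whereas this interprets the model directly, and the model layout, rule names and field names are all hypothetical.

```python
# Hypothetical declarative mapping model:
# (target field, source field, rule name, description)
MAPPING_MODEL = [
    ("currency", "CCY", "upper", "Currency code, upper-cased"),
    ("reference", "REF", "strip", "Transaction reference, whitespace trimmed"),
    ("amount", "AMT", "copy", "Amount, copied unchanged"),
]

# The small, fixed set of rules the model is allowed to refer to.
RULES = {
    "copy": lambda value: value,
    "strip": str.strip,
    "upper": str.upper,
}


def transform(source: dict) -> dict:
    """Apply the mapping model to one source record."""
    return {
        target: RULES[rule](source[src])
        for target, src, rule, _description in MAPPING_MODEL
    }


def document() -> str:
    """Produce the mapping documentation from the same model."""
    return "\n".join(
        f"{src} -> {target} ({rule}): {description}"
        for target, src, rule, description in MAPPING_MODEL
    )


result = transform({"CCY": "gbp", "REF": " P1 ", "AMT": "250.00"})
```

Because `transform` and `document` read the same `MAPPING_MODEL`, the documentation cannot drift away from the behaviour in the way hand-written specifications do, and changing a mapping means editing one row of the model.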

Whilst the devil is always in the detail, only by looking to implement these three constituent components of a careful and rigorous data integration strategy can a company set in place a responsive and cost efficient architecture. This will then not only enable straight-through-processing but also provide the basis for accessing information often lost to the management team. It is by no means a trivial task, but the downstream benefits are enormous to both the business and IT operations.
