Newsletter for e-Business consultants and practitioners | July-August 1999
Many companies struggle to integrate existing disparate applications that were never designed to work together. As usual, the problem is not to find some solution, since there are already too many, but to find the right one. Of course, there is no single ideal solution; every company is special, and no common solution can fit all. However, it is possible to outline an approach which may fit many and satisfy some basic requirements. Such requirements should include:
This is probably not a complete list, but it covers the most important features required from such an integration architecture. Below we propose one which fits these requirements and may be used to build enterprise-wide systems.
Different applications use different data formats; in order to work together, the data must be converted and appropriately passed from one application to another. History knows attempts to convert data directly between applications. That required n*(n-1) connectors for n applications, and there is no need to explain how expensive that was. So now it is considered reasonable to have an intermediate application which passes data from one application to another. This way, for any new application you need only connectors to and from this intermediate application. But what if this application is discontinued, or support is dropped, or the supported format changes and new versions don't support old connectors any more? Many companies have found answers to these questions at their own expense. So another important thing is that this intermediate format must be standard and supported by a number of replaceable, preferably off-the-shelf, products. The same must be true for the protocol, or whatever is used for data exchange. Apparently, in the modern IT world this data exchange mechanism must be network-enabled.
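The connector arithmetic above can be sketched in a few lines: point-to-point integration needs a converter for each ordered pair of applications (n*(n-1) of them), while a hub built around one intermediate format needs only one connector in each direction per application (2n). The class and method names below are, of course, purely illustrative.

```java
public class ConnectorCount {
    // n applications, each converting directly to each of the other n-1
    static int pointToPoint(int n) { return n * (n - 1); }

    // with an intermediate format: one "to hub" and one "from hub" connector per app
    static int viaHub(int n) { return 2 * n; }

    public static void main(String[] args) {
        for (int n : new int[] {3, 5, 10}) {
            System.out.println(n + " apps: " + pointToPoint(n)
                    + " direct connectors vs " + viaHub(n) + " via a hub");
        }
    }
}
```

Already at five applications the hub wins (10 connectors against 20 direct ones), and the gap grows quadratically from there.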
So what are this standard simple data format and standard widely available data exchange mechanism? By the book it looks very much like CORBA and IIOP, and in some cases that may be the way to go. But how about using it both as a programmatic and a human interface at the same time? Besides, as many companies have found, this protocol and format are neither simple nor cheap.
The proposed solution is to use XML as the data format and HTTP as the protocol. Though it sounds all too fashionable at the moment, this combination answers all the requirements listed above. Let's just look at them:
If you can get data from an application through any connector, you can return it via HTTP. On the input side, a simple executable can be called through CGI, and there are many more options. With all the current e-Business development, there are not many ways to call an application simpler than the HTTP protocol.
XML is certainly a very strong standard way to represent data structures. Not that this is the only way to use XML, but it's certainly one of them. It's very well defined, and at the same time flexible enough to cover virtually any need for data representation.
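As a small illustration, here is a hypothetical price list expressed as XML and parsed with the standard Java XML libraries (the DOM API); the element and attribute names are invented for the example. Once parsed, the document is just a tree-shaped data structure like any other.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PriceListDemo {
    // A hypothetical price-list record expressed as XML
    static final String XML =
        "<priceList>" +
        "<item sku=\"A-100\"><name>Widget</name><price>9.95</price></item>" +
        "<item sku=\"B-200\"><name>Gadget</name><price>24.50</price></item>" +
        "</priceList>";

    static Document parse(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        NodeList items = doc.getElementsByTagName("item");
        System.out.println("items: " + items.getLength());
        Element first = (Element) items.item(0);
        System.out.println("first sku: " + first.getAttribute("sku"));
    }
}
```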
Add an XSL stylesheet to XML and give HTTP access to it, and you have HTML. IBM WebSphere can convert an XML file plus an XSL stylesheet into HTML on the fly, Microsoft IIS is supposed to achieve that shortly, and there is development around the Apache web server in the same direction. And HTML is readable with any browser; what could be more standard and usual? For programmatic access, XML alone is quite enough; it is well supported in Java (in at least two ways), on the Windows platform in any language capable of using ActiveX controls (including Visual Basic, C++, and many more), and it is gaining more and more popularity each day.
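The XML + XSL = HTML step can be sketched with the standard Java XSLT API (javax.xml.transform); the stylesheet and document below are invented toy examples, and a server product like WebSphere performs the same transformation on the fly rather than in application code.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XmlToHtml {
    static final String XML = "<greeting><to>World</to></greeting>";

    // A minimal XSL stylesheet turning the XML above into an HTML page
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" " +
        "xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">" +
        "<xsl:output method=\"html\"/>" +
        "<xsl:template match=\"/greeting\">" +
        "<html><body><h1>Hello, <xsl:value-of select=\"to\"/>!</h1></body></html>" +
        "</xsl:template>" +
        "</xsl:stylesheet>";

    static String transform(String xml, String xsl) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(XML, XSL));
    }
}
```

The same XML document, paired with a different stylesheet, yields a different presentation; the data and its rendering stay cleanly separated.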
You may rely on the browser's ability to merge an XML file and an XSL stylesheet into HTML. In this case your users are limited in their choice of browser, but you don't care what your web and application server is. On the other hand, you can perform the XML/XSL merge on the server side. In this case you have to use a web/application server capable of doing that on the fly, like IBM WebSphere; on the plus side, users can run any browser, because they receive plain HTML. To also provide a pure XML format in this case, there are several approaches. The first and simplest one is to have different URLs for XML and HTML data, like http://srv.com/text.xml and http://srv.com/text.html. The disadvantage is evident: you will have trouble managing any real-size system created this way. Another way is to have a domain name alias for accessing pure XML data. This way any request sent to http://srv.com is treated as an HTML request, while requests to http://xml.srv.com return XML data. This is a good approach, but it requires some domain name administration. The third approach is to assign a different port number for XML data. In this case http://srv.com goes by default to port 80 (or 443 for https://) and returns an HTML response, while http://srv.com:90 returns XML data. This way exactly the same interface (format + protocol) serves both the human and the programmatic interface.
Well, what can be more standard than HTTP, used on the Internet by millions of people every day, and XML, which is gaining more and more popularity? Maybe HTTP+HTML?
HTTP is supported by a huge number of commercial and free products on both the client and the server side. To run an HTTP server you need no more than a 486 (or even less) with 4-8 MB of RAM (that's enough to run Linux with its out-of-the-box Apache web server). XML is well supported by a number of products, not to mention that both IBM and Microsoft have announced wide support for the language, including in their server and Internet products.
The simplicity of HTTP access is astonishing. On the Windows platform you have an ActiveX control which takes your URL and returns the data. That's it. In Java you can read HTTP resources practically just like you read a simple file. With XML there are fewer experienced people right now; however, considering the XML fashion, this should change soon, because handling XML data, either with the Windows ActiveX control or through Java libraries, is very simple.
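Reading an HTTP resource from Java really does look just like reading a file. The sketch below stands up a tiny local HTTP server (a stand-in for a remote component, so the example is self-contained) and then reads from it through a plain URL stream; the /data path and the response body are invented for the demo.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class HttpReadDemo {
    static String demo() throws Exception {
        // Stand-in for a remote component: a tiny local HTTP server
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        byte[] body = "<status>ok</status>".getBytes(StandardCharsets.UTF_8);
        server.createContext("/data", exchange -> {
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
        try {
            // Reading the http resource is just like reading a file
            URL url = new URL("http://localhost:"
                    + server.getAddress().getPort() + "/data");
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(url.openStream()))) {
                return in.readLine();
            }
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```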
That's what we said just a paragraph above. It's simple.
With the proposed approach you don't need a special driver to run a component test. Your entire test driver is just an HTML page with a form: hit the "Submit" button and you see what your component returns. And this is true both for final components, which will be regularly accessed by users, and for intermediate components, which will mostly be called programmatically. So if something goes wrong you can quickly localize the malfunctioning component, and all you need is a simple browser. Not bad.
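Such a test driver could be as small as this (the component URL and parameter name are, of course, made up for the illustration):

```html
<!-- Hypothetical test driver for a price-list component -->
<html>
  <body>
    <form action="http://srv.com/priceList" method="get">
      Customer ID: <input type="text" name="custId"/>
      <input type="submit" value="Submit"/>
    </form>
  </body>
</html>
```

Open the page in any browser, fill in a value, hit "Submit", and the component's raw response appears on screen.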
That's just a freebie when you use the HTTP protocol. Just purchase a certificate from VeriSign and use the HTTPS protocol instead. If you need authentication and login, any reasonable application server (including the already mentioned IBM WebSphere and Microsoft IIS) will provide it for you.
Most of the whole scheme has already been explained above; however, it's worth looking at the whole picture. The whole thing is simple: either some component or a user with a browser sends an HTTP request, which for the user is converted from CGI-encoded format into XML. This XML is passed to a component through a connector, which changes the XML into whatever that application understands. After that, the returned data is converted back to XML and passed to the server, which either returns it as is or converts it into human-readable HTML using the applicable XSL stylesheet.
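The first conversion step, from CGI-encoded name/value pairs into an XML request, can be sketched as below. The <request> element and the parameter names are hypothetical; a real connector would follow whatever schema the components agree on (and would also escape special characters properly).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CgiToXml {
    // Convert CGI-style name/value pairs into a simple XML request document
    static String toXml(Map<String, String> params) {
        StringBuilder sb = new StringBuilder("<request>");
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append('<').append(e.getKey()).append('>')
              .append(e.getValue())
              .append("</").append(e.getKey()).append('>');
        }
        return sb.append("</request>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("custId", "42");
        params.put("action", "getPrices");
        System.out.println(toXml(params));
        // → <request><custId>42</custId><action>getPrices</action></request>
    }
}
```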
If a need arises to use more than a single system, it is possible to connect components between different servers, thereby providing a nice distributed environment, as shown in the picture below.
Currently there are two most evident options for the application server product in the middle of the whole thing: IBM WebSphere with servlets, or Microsoft IIS with ASP. In the first case XML is processed through Java libraries; in the second, a special XML control is available to handle the translation. In both cases it's really simple. For the XML+XSL → HTML merge, WebSphere has embedded capabilities, and IIS can achieve the same purpose with a simple plug-in. Of course, there is a number of other products available, but these two provide the most readily available features necessary to perform the task.
The proposed architecture answers all the requirements listed at the beginning of this article. It supports component-oriented (and object-oriented) design and provides a very impressive way to show work progress, because each component may be tested separately with a simple HTML form. It uses standard, relatively inexpensive off-the-shelf products and requires minimal development. All the basic technological elements are widespread and easily available. If it is based on IBM WebSphere or the Apache web server with servlets, it is multi-platform and can run on practically any computer in the world, including mainframes, midrange systems, and obsolete PCs. Even better, you don't have to limit yourself to a single particular product unless you want to limit the required base skill set. Half of your components may run under Microsoft IIS on Windows NT, while the other half run under WebSphere on a mainframe, AS/400, or Linux, and they will work together just fine. This is an essential advantage of this architecture even compared to CORBA, where it's a bit tricky to make CORBA-based components built on different vendors' CORBA implementations work together.
Compared to CORBA, this architecture lacks a name resolution service, which in CORBA is basically the ORB (Object Request Broker). As a result, URLs become the equivalent of UUIDs, identifying components in a unique way. Of course, it's not a big deal to implement a special name resolution component and make all other components refer to it with some kind of UUID to be resolved into a particular URL. In a servlet model this will be performed only once between system reboots, so it will not affect performance at all. The downside is that such a component implements proprietary name resolution. On the other hand, these name resolution issues will be located in the communications/middleware part of the code, which eventually will be generated automatically, so replacement of the name resolution will not create a problem.
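A minimal sketch of such a resolver, assuming a made-up component identifier and URL; in the servlet model the registry would be populated once at startup and then cached in memory for the life of the server process.

```java
import java.util.HashMap;
import java.util.Map;

public class NameResolver {
    // Hypothetical identifier-to-URL registry, loaded once at startup
    private final Map<String, String> registry = new HashMap<>();

    void register(String uuid, String url) { registry.put(uuid, url); }

    String resolve(String uuid) {
        String url = registry.get(uuid);
        if (url == null) {
            throw new IllegalArgumentException("unknown component: " + uuid);
        }
        return url;
    }

    public static void main(String[] args) {
        NameResolver r = new NameResolver();
        r.register("price-list-service", "http://xml.srv.com/priceList");
        System.out.println(r.resolve("price-list-service"));
    }
}
```

Swapping this proprietary lookup for something else later only touches the middleware layer, exactly as the paragraph above argues.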
That brings up another matter, which is not yet handled in the best way. Though manipulating XML is already pretty simple, it will be done only in the communications/middleware part, and therefore could be done even more simply through generation from IDL descriptions of the interfaces. Currently there are no generators for this kind of middleware; however, creating one looks like an effort of no more than two man-months, with the second month used solely for improvements and custom features.
Does this architecture really have a future? It depends. It certainly can be implemented successfully, bringing a strong advantage to the companies which dare to use it. It's relatively inexpensive to support (especially compared to the alternatives) and it certainly has the potential to "live long and prosper". Technically it is very good. However, it is well known that any technology needs to be sold to succeed, and this architecture is technology-based, not product-based. And don't forget, products have sales departments; technologies usually don't. Besides, corporate IT management is not known for its ability to select the right technologies, solutions, and approaches en masse. So... it remains to be seen. However, this technology has another (maybe crucial) advantage: you don't care what your future partners will use. Everybody can use HTTP and XML to connect to your system. If you get a new partner who needs updated price lists from your database, it does not matter whether they use DCOM, CORBA, DCE, EJB, or even Visual Basic in their systems; your data is readily available to them with the proper security level.
IBM developerWorks: http://www.ibm.com/developer/
Microsoft MSDN Online: http://msdn.microsoft.com
OMG Group: http://www.omg.com/ and http://www.corba.org/