I've listed a few interesting projects here. They are not necessarily the biggest projects, but each was interesting and had its own particular challenges.
TrackerNet
2004 - 2008, London Underground
A system to provide a web-based display of the trains on the LU network. It was classified as an operationally critical application, so 100% uptime during operational hours was expected.
TrackerNet consisted of a C# service containing an in-memory model of the network (trains, signals and tracks), with an SVG-based web graphics front end. It also archived event deltas in a SQL Server instance to support a replay mode. An interesting and challenging part of this project was the variety of data sources: there was no such thing as a clean RESTful API. The sources ranged from 1960s signalling systems and physical valves located in rooms on the network to the personal radios drivers were required to carry.
Challenges
- A wide range of data sources of varying quality.
- Difficulty in scheduling production changes.
- Supporting the application with a small team.
TrackerNet was an interesting project which quickly became mission-critical to the organisation. The initial parts of the network covered were those with the newest signalling system and the best-quality data. This was relatively easy, as we simply passed on the data from the signalling systems. The business complexity grew when older parts of the network were incorporated using any data we could obtain; the system was then required to model the relationships between tracks, signals and moving trains. TrackerNet effectively became the official record of what was happening on those parts of the network.
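The replay mode rested on a simple idea: archive every event delta, then rebuild the model state by reapplying deltas in order. A minimal sketch of that idea in Python (the real system was a C# service; all names and the delta shape here are illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class NetworkState:
    # Illustrative in-memory model: current track segment per train.
    positions: dict = field(default_factory=dict)

    def apply(self, delta):
        # Each archived delta records a train moving onto a segment.
        train, segment = delta
        self.positions[train] = segment

def replay(deltas, upto):
    """Rebuild the network state by reapplying the first `upto`
    archived deltas in order."""
    state = NetworkState()
    for delta in deltas[:upto]:
        state.apply(delta)
    return state

archive = [("T1", "SEG-A"), ("T2", "SEG-C"), ("T1", "SEG-B")]
snapshot = replay(archive, upto=2)
print(snapshot.positions)  # state as of the second archived event
```

Storing deltas rather than snapshots keeps the archive small and lets the replay stop at any point in time.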
Business complexity risk was reduced by following a test-driven development process inspired by Kent Beck's book Extreme Programming Explained. This also helped to maintain quality and continuity as the team evolved over time.
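In that spirit, the model's behaviour was pinned down by tests written before the code. A hypothetical example of the style (the function and test names are invented for illustration; the real tests were in C#):

```python
def infer_train_position(berth_events):
    """Illustrative model rule: the most recently occupied berth
    is reported as the train's position."""
    position = None
    for berth, occupied in berth_events:
        if occupied:
            position = berth
    return position

def test_latest_occupied_berth_is_reported():
    # Written first, red-green-refactor style, as described in
    # Extreme Programming Explained.
    events = [("B1", True), ("B1", False), ("B2", True)]
    assert infer_train_position(events) == "B2"

test_latest_occupied_berth_is_reported()
print("ok")
```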
Sport Match & Market State Service
2011 - 2015, Sporting Index
A collection of systems to collect, normalise, match and publish a large volume of data for nearly all professional sports matches around the world. The system processed everything to do with sports gambling, including market odds from most major bookmakers, match state and upcoming fixtures. The data was used to provide the markets and odds offered by Sporting Index, as well as to power B2B products.
The system consisted of consuming C# services publishing data to a central C# service for matching and publishing. A variety of technologies were used; however, the settled system implemented MongoDB for caching and state storage, RabbitMQ for publishing, and an Elasticsearch instance for matching and internal metadata. All published events were also stored in a Hadoop HDFS instance.
Challenges
- High Volume of Data.
- Initial Technology Constraints.
- Monitoring Performance & Tracing Data Issues.
- Supporting Historical Data.
The sheer volume of data consumed and processed was one of the biggest challenges and required a highly optimised system. At the core of the system there was a constant balance between parallelisation of tasks and the cost of context switching and thread starvation. As the system evolved and matured, we also faced the challenge of ensuring that, where possible, historical data was upgraded to be compliant and/or accurately versioned for downstream applications.
To handle the load and allow for parallelisation, an SOA approach was followed, with the system broken down into services as small as practical. This also made it possible to dynamically transfer processing to different instances based on demand. Monitoring was done using an internally developed interface providing real-time metrics on the services, with a WPF front end, and Splunk for log management and error tracing.
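The parallelisation-versus-context-switching trade-off described above usually comes down to bounding concurrency rather than spawning a thread per event. A rough Python sketch of the idea (the production services were C#; the worker count, event shape and function names are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def normalise(event):
    # Stand-in for the real per-event normalisation/matching work.
    return {"source": event["source"], "odds": round(event["odds"], 2)}

def process_batch(events, workers=4):
    """Process events with a bounded worker pool, so throughput is
    not lost to context switching and thread starvation."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(normalise, events))

feed = [
    {"source": "bookmakerA", "odds": 2.349},
    {"source": "bookmakerB", "odds": 1.901},
]
print(process_batch(feed))
```

In practice the pool size is tuned per service against measured throughput, which is one reason the real-time metrics interface mattered.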
Excel Data Plugin
2019, Société Générale
An Excel plugin for querying and importing data from a suite of data APIs. The plugin gave super-users controlled access to data while still tracing and monitoring what data was being used.
The plugin utilised an "API of APIs" I had previously written for querying multiple data APIs backed by databases (relational databases or snowflake-schema cubes). The plugin provided a query builder and, where supported by the source API, allowed an asynchronous download for large data sets.
Challenges
- Streaming Large Volumes.
- Creating a Generic Interface.
This tool catered to power users with a demand for substantial data volumes, making a seamless download experience crucial. Some users located outside France had unreliable network connections, and some APIs lacked streaming support. To address these situations, an offline creation option was provided, enabling the generation and storage of files on the server. Users would then receive a download link via email for accessing the files.
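The delivery decision above amounts to a small piece of fallback logic: stream when the source API and connection allow it, otherwise build the file server-side and email a link. A hypothetical sketch (the real plugin was a C# Excel add-in; names are illustrative):

```python
def choose_delivery(api_supports_streaming, connection_reliable):
    """Pick a delivery strategy for a large result set."""
    if api_supports_streaming and connection_reliable:
        # Stream results directly into the workbook.
        return "stream"
    # Otherwise generate the file on the server and email the
    # user a download link once it is ready.
    return "offline-link"

print(choose_delivery(api_supports_streaming=False, connection_reliable=True))
# → "offline-link"
```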
By placing the mapping complexity in the API of APIs, it became feasible to offer a straightforward and versatile query interface. Because the plugin client was an installed application, shifting complexity to an API that could be updated centrally avoided having to constantly redistribute the client.
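Keeping the mapping server-side means the client only ever has to express one small, generic query shape, whatever the backing store. An illustrative sketch of such a payload (the field names and dataset are invented for this example):

```python
def build_query(dataset, fields, filters=None):
    """Illustrative generic query payload: the API of APIs maps this
    one shape onto whichever backing store (relational database or
    snowflake-schema cube) actually serves the dataset."""
    return {
        "dataset": dataset,
        "fields": list(fields),
        "filters": filters or {},
    }

query = build_query("trades", ["date", "notional"], {"desk": "FX"})
print(query)
```

Because the client never sees SQL or cube-specific syntax, new data sources can be added behind the API without touching the installed plugin.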