Wednesday, April 8, 2015

Presentation Layers for Data Entry



M 917 536 3378
maksim_kozyarchuk@yahoo.com




Overview

   Data entry applications come in a number of styles, including manual data entry via form-based user interfaces, API-based data entry and file-based data uploads.   Effective data entry tools often involve a significant number of custom business rules describing data validation, automatic defaulting and calculations.  Furthermore, any complete data entry system needs to support two or more independent entry channels. These typically include manual data entry for voice orders or exception-based workflows, and API or file-based entry for handling high volumes of data.   A key success factor for an effective data entry system is applying a consistent set of business rules regardless of the entry channel.   In this post, I will show how to leverage the ReactiveFramework to build a functional web-based user interface and a file-based data entry application that share the same set of calculation and validation rules.


Web Based Data Entry

  Developers of web-based data entry applications are faced with a few natural choices.

  1. Leverage one of the complete web frameworks, such as Django or Rails, which enable rapid development of CRUD entry screens

  2. Defer business rules execution until the submit button is pressed

  3. Build dynamic features of the page using client-side JavaScript

  While these choices work well in many domains, they pose a couple of challenges when applied to complex data entry screens tuned to minimize entry errors and reduce the time it takes to perform data entry.  First, they lead to client-side JavaScript that duplicates business rules maintained on the server. Second, full-stack web frameworks are structured in such a way that it’s difficult to keep business logic separated from presentation logic, leading to further duplication of business rules for API and file-based data entry channels.

  The following example demonstrates how to build a functional web-based data entry user interface on top of the ReactiveFramework while keeping business rules separate from the web framework layer. The example builds on the FXTrade use case and the ReactiveFramework introduced in an earlier post.   I will use Flask for this example because I find that it requires the least amount of setup.

The example is implemented using the following routes:
  • /fxtrade, which renders the FX Trade Entry form and performs validation of the form
  • /set_field, which is called on a field change event from the browser and returns the list of updated fields

The ReactiveFramework needs to know whether a field value was set explicitly by the user or calculated automatically.  To support this, I’ve introduced a server-side cache of active ReactiveFramework instances via the set_rf()/get_rf() methods. To carry state between calls, I added a framework_id attribute to the HTML form and to calls to the set_field route.   Caching of ReactiveFramework instances can be implemented using any number of common server-side caching tools (e.g. Memcached, Redis). It’s also possible to convert the actual ReactiveFramework engine to JavaScript and run the state machine in the browser, keeping only the business logic on the server.

from flask import Flask, render_template, jsonify, request
from reactive import ReactiveFramework, FXTransaction
from fxform import set_rf, get_rf, FXTradeForm

app = Flask(__name__)

@app.route('/fxtrade', methods=['GET', 'POST'])
def fxtrade():
   if request.method == 'GET':
       rf = set_rf(ReactiveFramework(FXTransaction()))
   else:
       rf = get_rf(request.form['framework_id'])
       
   form = FXTradeForm.create(rf , request.form)
   if request.method == 'POST' and form.validate():
       return "Trade Validated!!!"
   else:
       return render_template('form.html', form=form)
       
@app.route('/set_field')
def set_field():
   rf = get_rf(request.args.get('framework_id'))
   modified = rf.set_value(field_name = request.args.get('name','', type=str),
                           value = request.args.get('value'))
   
   result = dict((mfield,str(rf.get_value(mfield)))
                       for mfield in modified)
   return jsonify(result)

app.secret_key = 'Secret'
if __name__ == '__main__':
   app.debug = True
   app.run()


The implementation above works with Flask’s Jinja2 templating engine and the WTForms framework to handle the actual rendering of the HTML form.  Next is a minimalistic Jinja2 template that renders the FX Trade Entry form and sets up jQuery-based bindings that call the /set_field route on field value changes and update recalculated fields with new values.

<!doctype html>
<title>FX Trade</title>
<script src="//ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"></script>

<script type=text/javascript>
 $(function() {
   $('input').change(function(event) {
     $.getJSON('/set_field', {
       name: event.target.name,
       value: event.target.value,
       framework_id: $("#framework_id").val()
     }, function(data) {
       for (var key in data) {
         if (data.hasOwnProperty(key)) {
           $("#" + key).val(data[key]);
         }
       }
     });
     return false;
   });
 });
</script>

{% macro render_field(field) %}
 <dt>{{ field.label }}
 <dd>{{ field }}
 {% if field.errors %}
   <ul class=errors>
   {% for error in field.errors %}
     <li>{{ error }}</li>
   {% endfor %}
   </ul>
 {% endif %}
 </dd>
{% endmacro %}

<h1>FXTrade</h1>
<form action="" method="post" name="validate">
   {{ form.hidden_tag() }}
   {{ form.framework_id }}
   {{ render_field(form.action) }}
   {{ render_field(form.primary_amount) }}
   {{ render_field(form.secondary_amount) }}
   {{ render_field(form.deal_fx_rate) }}
   {{ render_field(form.commission) }}
   <p><input type="submit" value="Validate"></p>
</form>
The last component is the FXTradeForm class, which extends the Form class from Flask’s WTForms extension package.  This class defines the actual fields presented to the user and how those fields map to ReactiveFramework fields.  It’s important to note that the FXTradeForm class provides only the presentation layer and does not assume any responsibility for business logic.  Validation methods invoked on the form class map to validators defined in the ReactiveFramework.

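# flask.ext.* imports were removed in Flask 1.0; Flask-WTF 0.13+ names the class FlaskForm
from flask_wtf import FlaskForm as Form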
import wtforms
from wtforms.validators import ValidationError

class FXTradeForm(Form):
   UI_FIELDS = [ ("Action",'action',wtforms.StringField),
                ('Primary Amount','primary_amount',wtforms.DecimalField),
                ('Secondary Amount','secondary_amount',wtforms.DecimalField),
                ('FX Rate', "deal_fx_rate",wtforms.DecimalField),
                ('Commission','commission',wtforms.DecimalField) ]

   @classmethod
   def create(cls, rf, form = None):
       def get_validators(rf_field_name):
           def validator(form, field):
               msg = rf.get_field( rf_field_name ).validate()
               if msg:
                   raise ValidationError(msg)
           return [validator]
       
       def build_ui_field( ui_field, rf_field_name, ui_class):
           return ui_class(ui_field,get_validators(rf_field_name))
       
       setattr(cls, 'framework_id',wtforms.HiddenField('framework_id', default = rf.id))
       for ui_field, rf_field_name, ui_class in cls.UI_FIELDS:
           setattr(cls,rf_field_name, build_ui_field(ui_field, rf_field_name, ui_class))
       return cls(form)


The Flask example above achieves an interactive data entry experience for the web.  It can be easily adapted to other Python web frameworks such as Django.  One criticism of this design may be that all calculations require a server round trip.  However, it’s possible to leverage Python ecosystem tools such as Pyjs to move some of the Ajax calls into native JavaScript that is executed in the browser.


File and API Based Data Entry
   UI-based data entry screens are an important component of an effective data entry system; however, manual entry is expensive, error-prone and often unnecessary.  With modern front, middle and back office systems, the majority of data entry is performed through electronic means such as near real-time FIX messages or periodic file-based uploads.

   Trade files could be sent daily from trade execution platforms, prime brokers and other data vendors.   When implementing file-based data entry, the first challenge is the diversity of formats the data will come in.   To support scalable file-based data entry, it’s best to set up the architecture so that core file processing logic works on a well understood and tested internal data format, and to introduce a DSL-based ETL layer responsible for converting external formats into the internal format before validating and loading the data. (An earlier post provides a structure for setting up a Python-based DSL.)
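As a rough illustration of such an ETL layer, the sketch below runs a per-source rule, stored as plain text, that maps one external row into the internal CSV format. It reuses the data/result convention from the Python as DSL post. The external column names ('Side', 'Qty', 'Px'), the 'BUY'/'SELL' values, and the names `BROKER_A_RULE` and `to_internal` are all invented for illustration.

```python
# Hypothetical ETL rule for one external format, stored as plain text.
# It reads the external row from `data` and leaves the internal row in `result`.
BROKER_A_RULE = """
result = {
    'Action': 'BUY' if data['Side'] == 'B' else 'SELL',
    'Primary Amount': data['Qty'],
    'FX Rate': data['Px'],
}
"""


def to_internal(rule, external_row):
    """Execute one ETL rule with no builtins and return its result."""
    dsl_locals = {'data': external_row}
    exec(rule, {'__builtins__': {}}, dsl_locals)
    return dsl_locals['result']


row = to_internal(BROKER_A_RULE, {'Side': 'B', 'Qty': '1000000', 'Px': '1.0850'})
```

The converted rows can then be written out in the internal format and handed to the standard file loader, so validation is never duplicated per source.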

    To support operational handling of rejections, which are an inevitable part of the trade file upload process, we should create an error file that contains only the rejections and has the same format as the original file.  This allows operations users to fix errors and resubmit only the errored trades using the standard file processing interface.

  The example below demonstrates a simple Python program for uploading trade files using the ReactiveFramework. You will notice that the CSVTradeLoader defines a public API that translates external fields to internal fields.  Creating this layer of indirection allows us to rename ReactiveFramework fields without needing to update the multiple ETL DSLs that generate trade files in the internal format.

from reactive import ReactiveFramework, FXTransaction
import csv

class CSVTradeLoader(object):
   FIELD_MAP = [ ("Action",'action'),
                ('Primary Amount','primary_amount'),
                ('Secondary Amount','secondary_amount'),
                ('FX Rate', "deal_fx_rate"),
                ('Commission','commission')
                ]
   REV_MAP = dict( (domain_field,csv_field)
                  for (csv_field,domain_field) in FIELD_MAP )

   STATUS_FIELD = "Status"
   
   def __init__(self, file_root):
       self.file_root = file_root
   
   @property
   def trade_file(self):
       return self.file_root + ".csv"
   
   @property
   def error_file(self):
       return self.file_root + "_error.csv"
   
   def load_trade_file(self):
       trades = []
       fields = None
       with open(self.trade_file, 'r') as f:
           reader = csv.reader(f)
           for row in reader:
               if not fields:
                   fields = row
               else:
                   trades.append(dict((f,v) for f,v in zip(fields,row)))
       return fields, trades

   def write_csv(self,file_name, header, records):
       with open(file_name, 'w') as f:
           writer = csv.DictWriter(f, header)
           writer.writeheader()
           writer.writerows(records)  
       print("Wrote {} records to {}".format(len(records), file_name))
   
   def validate_trade(self, trade):
       rf =  ReactiveFramework(FXTransaction())
       
       for csv_field, rf_field in self.FIELD_MAP:
           value = trade.get(csv_field)
           if value:
               rf.set_value(rf_field, value)
       
       csv_errors = {}
       for field,message in rf.validate().items():
           csv_errors[self.REV_MAP.get(field,field)] = message
       return csv_errors
   
   def run(self):
       fields, trades = self.load_trade_file()
       if self.STATUS_FIELD not in fields:
           fields.append(self.STATUS_FIELD)
       errors = []
       for trade in trades:
           error = self.validate_trade(trade)
           if error:
               trade[self.STATUS_FIELD] = str(error)
               errors.append(trade)
       self.write_csv(self.error_file, fields, errors)
       
if __name__ == "__main__":
   loader = CSVTradeLoader('fxtrades')
   loader.run()

   I will not include the API example here since the set_value()/get_value() methods of the ReactiveFramework are sufficient for most API needs. However, I would recommend that, as with the file-based API, internal ReactiveFramework field names not be used as the API fields.  It would be quite acceptable to reuse the file-based API fields for external processing.
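To make the recommendation concrete, here is a hedged sketch of a thin API wrapper that accepts the file-based field names and translates them to internal names before calling set_value(). It assumes, as in the /set_field route, that set_value() returns the names of the fields it modified. `API_FIELDS`, `apply_api_request` and `StubFramework` are hypothetical; the stub exists only so the sketch runs without the real ReactiveFramework.

```python
# Public (file-based) field name -> internal ReactiveFramework field name
API_FIELDS = {
    'Action': 'action',
    'Primary Amount': 'primary_amount',
    'Secondary Amount': 'secondary_amount',
    'FX Rate': 'deal_fx_rate',
    'Commission': 'commission',
}
PUBLIC_NAMES = {internal: public for public, internal in API_FIELDS.items()}


def apply_api_request(rf, payload):
    """Apply a payload keyed by public names; report changes by public name."""
    changed = set()
    for public_name, value in payload.items():
        internal = API_FIELDS[public_name]  # KeyError for unknown fields
        changed.update(rf.set_value(internal, value))
    return {PUBLIC_NAMES.get(f, f): rf.get_value(f) for f in changed}


class StubFramework(object):
    """Stand-in that stores values and performs no calculations."""

    def __init__(self):
        self.values = {}

    def set_value(self, field_name, value):
        self.values[field_name] = value
        return [field_name]

    def get_value(self, field_name):
        return self.values[field_name]


stub = StubFramework()
response = apply_api_request(stub, {'FX Rate': '1.25', 'Action': 'BUY'})
```

With this layer in place, callers never see names such as deal_fx_rate, so internal fields can be renamed without breaking API clients.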


Other components of the data entry
   I did not provide a GUI (WinForms/WPF) example because it is conceptually very similar to the web interface, except that different tools and slightly different techniques are used for laying out data entry controls and displaying error messages.   (I may provide that example for completeness at some point in the future.) This post also did not cover domain model binding and the process of saving the trade; this will be covered in another post.  In yet another post, I will extend this framework to support data entry for multiple asset classes.


Wednesday, April 1, 2015

Python as DSL



by  Maksim Kozyarchuk




Overview

   Any moderately complex system can benefit from a Domain Specific Language (DSL) because it allows client, business and support teams to configure and extend your system rapidly and in a safe-to-fail way, while keeping your trunk code base generic and agnostic of client- and data-specific business rules.

   There are many different flavors of DSLs, and there isn’t a one-size-fits-all solution for the various problem spaces that DSLs cover.  This post presents a Python DSL implementation focused on the problem domains of data transformation, calculation or a combination of the two. For example, suppose your application collects data from multiple sources and converts it into a standard/domain format that the rest of your application will use.  Your business users/data analysts will be monitoring available sources of data and determining which data points to use.   Having your application support a Python-based DSL will allow your users to create rich and flexible rules and mappings.


Python as a DSL
Leveraging Python as a DSL gives you a lot of power; however, it’s important to keep the DSL focused on a very specific problem domain.  This can be done by creating a simple yet rigid interface that DSL scripts need to adhere to.   For example, make the raw data available in the data variable and expect the final output to be populated in the result variable.   With this constraint, a Python DSL rule that scales the data by 100 would look as follows.

result = data*100

The machinery that executes the rule can look as follows.

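rule = "result = data*100"  # DSL text, normally loaded from a rule repository
data = 10
result = None
exec(rule)  # run at module level so the rule can rebind result
assert result is not None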

Combining the two code segments above with a repository of DSLs stored as plain text objects and a rules engine for determining which DSL to use results in a very simple yet flexible DSL engine that allows for runtime configurability and extensibility of your application.
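A minimal sketch of that combination might look like the following. The repository and the rule selection are plain dictionaries here, and the names `RULE_REPOSITORY`, `RULE_FOR_SOURCE` and `apply_rule`, as well as the source name 'vendor_a', are hypothetical. A real system would likely load rule text from a database and use richer selection logic.

```python
# Hypothetical rule repository: rule name -> DSL text
RULE_REPOSITORY = {
    'scale_by_100': 'result = data*100',
    'pass_through': 'result = data',
}

# Hypothetical rules engine: data source -> name of the rule to apply
RULE_FOR_SOURCE = {'vendor_a': 'scale_by_100'}


def apply_rule(source, data):
    """Pick the rule for a source, run it, and return its result."""
    name = RULE_FOR_SOURCE.get(source, 'pass_through')
    scope = {'data': data, 'result': None}
    exec(RULE_REPOSITORY[name], scope)
    assert scope['result'] is not None, "rule did not populate result"
    return scope['result']


scaled = apply_rule('vendor_a', 10)
untouched = apply_rule('unknown_vendor', 7)
```

Passing an explicit dictionary to exec keeps each rule's variables out of the caller's namespace, which also makes the function safe to call from inside other functions.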


Creating a safe execution environment

Python’s exec function is a powerful construct that allows you to execute a string as a Python code block within the same process space.  While it’s powerful, it carries certain risks with untrusted DSL (Ned Batchelder’s post discusses those risks in some detail).    These risks cannot be fully mitigated; however, within a trusted environment, it’s possible to override the values of globals and locals made available to the DSL and avoid unwanted side effects.   Modifying the DSL execution code as follows accomplishes that.

dsl_globals = {'__builtins__':{}}
dsl_locals = {'data': 5  }
exec(rule, dsl_globals, dsl_locals)
assert 'result' in dsl_locals

By overriding globals and setting __builtins__ to an empty dictionary, we are restricting the capabilities of the DSL and removing access to importing new libraries, opening files and other Python built-in functions.  In practice, you will include a number of trusted methods and libraries in the global scope to make them available for use in the DSL.  By overriding locals, we are simplifying the interface for transferring data in and out of the DSL execution environment.
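As a sketch of what adding trusted names can look like, the snippet below exposes only Decimal and abs to the DSL and shows that a rule reaching for open fails with a NameError. The choice of whitelisted names and the helper `run` are illustrative assumptions, and this restriction reduces accidents in a trusted environment rather than providing a security boundary.

```python
from decimal import Decimal

# Only the names listed here are visible to DSL rules
dsl_globals = {'__builtins__': {}, 'Decimal': Decimal, 'abs': abs}


def run(rule, data):
    """Execute a rule against the restricted globals and return result."""
    dsl_locals = {'data': data}
    exec(rule, dsl_globals, dsl_locals)
    return dsl_locals['result']


# A rule that uses the whitelisted helpers
scaled = run("result = abs(Decimal(data)) * 100", "-1.5")

# A rule that tries to use a builtin that was not whitelisted
try:
    run("result = open('secrets.txt')", None)
    blocked = False
except NameError:
    blocked = True
```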



Maintaining DSL
Python has a fairly forgiving syntax; it does not require semicolons or opening/closing curly braces.  This makes it fairly easy for a novice to get going; however, there are still plenty of opportunities for syntax errors. IPython Notebook can be used effectively to replicate the DSL runtime environment and assist with development and testing of the DSL.  The screenshot below demonstrates a simple IPython Notebook setup for DSL development.  The first section sets up the data and environment, replicating the runtime environment; the last section prints the result; and the middle section becomes the development environment for the DSL.

Once the DSL has been developed and tested, it should be given a name and uploaded to a repository within your application.  Before accepting the DSL, you should do a few checks to validate it.  Make sure that the DSL compiles by invoking:

compile(dsl_text,"DSLName","exec")
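One way to wrap this check so that an upload is rejected with a readable message is sketched below. The function name `check_dsl` and the message format are assumptions for illustration.

```python
def check_dsl(dsl_text, dsl_name):
    """Return None if the DSL compiles, otherwise a short error message."""
    try:
        compile(dsl_text, dsl_name, "exec")
    except SyntaxError as exc:
        return "{}: line {}: {}".format(dsl_name, exc.lineno, exc.msg)
    return None


ok = check_dsl("result = data*100", "ScaleBy100")
bad = check_dsl("result = data*", "Broken")
```

Compiling only catches syntax errors; a rule that references an undefined name or never sets result will still pass this check and fail at run time.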

Run lint to help ensure that there are no major errors in the code.  To run lint, save the script along with its initializations into a file and run:
from pylint.epylint import lint
assert lint('test.py') == 0

The lint function returns pylint's exit status. A number above 0 indicates that errors or warnings were reported for your script.  Please see the pylint tutorial for configuration and tuning options.


Python DSL from other platforms

   The examples above demonstrate how to use Python as a DSL from within a Python environment.  But it’s also possible to leverage Python as a DSL on other platforms.   For a Java environment this can be done using Jython, for a CLR environment using IronPython, and for C++ apps using the CPython interface.  Please refer to the relevant documentation of these technologies for integration details.  Regardless of the technology, it’s still important to create a safe sandbox when running the DSL.