Wednesday, April 1, 2015

Python as DSL



by  Maksim Kozyarchuk




Overview

   Any moderately complex system can benefit from a Domain Specific Language (DSL), because a DSL lets client, business and support teams configure and extend the system rapidly and in a safe-to-fail way, while keeping the trunk code base generic and agnostic of client- and data-specific business rules.

   There are many flavors of DSL, and there is no one-size-fits-all solution for the problem spaces that DSLs cover.  This post presents a Python DSL implementation focused on the problem domains of data transformation, calculation, or a combination of the two.  For example, suppose your application collects data from multiple sources and converts it into a standard domain format that the rest of the application uses.  Your business users and data analysts will monitor the available data sources and decide which data points to use.  Having your application support a Python-based DSL allows those users to create rich and flexible rules and mappings.


Python as a DSL
Using Python as a DSL gives you a lot of power; however, it's important to keep the DSL focused on a very specific problem domain.  This can be done by creating a simple yet rigid interface that DSL scripts must adhere to.  For example, make the raw data available in the data variable and expect the final output to be populated in the result variable.  With that constraint, a Python DSL rule that scales the data by 100 looks as follows.

result = data*100
The machinery that executes the rule can look as follows.

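rule = "result = data*100"  # DSL text, normally loaded from a repository
data = 10
result = None
exec(rule)  # at module level, the rule reads and writes these globals
assert result is not None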

Combining the two code segments above with a repository of DSL scripts stored as plain text and a rules engine that determines which script to use results in a very simple yet flexible DSL engine, one that makes your application configurable and extensible at runtime.
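As a rough illustration of that combination, here is a minimal sketch.  The RULES dictionary, the select_rule lookup and the vendor names are hypothetical stand-ins for a real repository and rules engine; only the data/result contract comes from the text above.

```python
# Hypothetical in-memory repository: rule name -> DSL source text.
RULES = {
    "scale_by_100": "result = data * 100",
    "add_margin": "result = data + 5",
}


def select_rule(source):
    """Toy rules engine: map a data source to a rule name (assumed mapping)."""
    return {"vendor_a": "scale_by_100", "vendor_b": "add_margin"}[source]


def run_rule(rule_text, data):
    """Execute one DSL script that follows the data -> result contract."""
    namespace = {"data": data, "result": None}
    exec(rule_text, namespace)
    if namespace["result"] is None:
        raise ValueError("rule did not populate 'result'")
    return namespace["result"]


scaled = run_rule(RULES[select_rule("vendor_a")], 10)
print(scaled)  # 1000
```

In a real application the repository would more likely be a database table or a directory of text files, and the selection logic would depend on your own business rules.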


Creating a safe execution environment

Python’s exec function is a powerful construct that executes a string as a Python code block within the same process space.  That power carries certain risks when the DSL is untrusted (Ned Batchelder’s post discusses those risks in some detail).  These risks cannot be fully mitigated; however, within a trusted environment it is possible to override the globals and locals made available to the DSL and so avoid unwanted side effects.  Modifying the DSL execution code as follows accomplishes that.

dsl_globals = {'__builtins__':{}}
dsl_locals = {'data': 5  }
exec(rule, dsl_globals, dsl_locals)
assert 'result' in dsl_locals

By overriding globals and setting __builtins__ to an empty dictionary, we restrict the capabilities of the DSL and remove its access to importing new libraries, opening files and other Python built-in functions.  In practice, you will include a number of trusted functions and libraries in the global scope to make them available to the DSL.  By overriding locals, we simplify the interface for transferring data into and out of the DSL execution environment.
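A minimal sketch of exposing trusted names might look like the following.  The SAFE_BUILTINS whitelist and the run_dsl helper are assumptions made for illustration; the set of names to expose is a choice for your own application, and this limits accidents rather than providing a security boundary.

```python
import math

# Assumed whitelist; expose only what your DSL authors actually need.
SAFE_BUILTINS = {"abs": abs, "min": min, "max": max,
                 "round": round, "sum": sum, "len": len}


def run_dsl(rule, data):
    """Run a DSL script with restricted globals and a small locals interface."""
    dsl_globals = {"__builtins__": SAFE_BUILTINS, "math": math}
    dsl_locals = {"data": data}
    exec(rule, dsl_globals, dsl_locals)
    if "result" not in dsl_locals:
        raise ValueError("rule did not populate 'result'")
    return dsl_locals["result"]


value = run_dsl("result = round(math.sqrt(data) * 100, 2)", 2)
print(value)  # 141.42

# Names outside the whitelist are simply undefined inside the DSL.
try:
    run_dsl("result = open('secrets.txt').read()", None)
except NameError as err:
    print("blocked:", err)
```

One thing to keep in mind: when exec receives separate globals and locals, the code runs as if it were inside a class body, so functions or comprehensions defined inside a DSL script will not see names such as data that live only in the locals dictionary.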



Maintaining DSL
Python has a fairly forgiving syntax: it does not require semicolons or opening and closing curly braces.  This makes it fairly easy for a novice to get going; however, there are still plenty of opportunities for syntax errors.  IPython Notebook can be used effectively to replicate the DSL runtime environment and assist with development and testing of the DSL.  The screenshot below demonstrates a simple IPython Notebook setup for DSL development.  The first section sets up the data and environment, replicating the runtime environment; the last section prints the result; and the middle section becomes the development environment for the DSL.
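A notebook along those lines might be organized as three cells, sketched here as a single script.  The sample data values are made up for illustration; only the three-part layout comes from the description above.

```python
# Cell 1: replicate the runtime environment with sample data (assumed values).
data = {"price": 12.5, "quantity": 4}
result = None

# Cell 2: the DSL under development; this is the text that gets uploaded.
result = data["price"] * data["quantity"]

# Cell 3: inspect the output.
print(result)  # 50.0
```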

Once the DSL has been developed and tested, it should be given a name and uploaded to a repository within your application.  Before accepting the DSL, you should run a few checks to validate it.  First, make sure that the DSL compiles by invoking:

compile(dsl_text,"DSLName","exec")
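To turn that check into something an upload step could call, one option is to wrap compile and report the syntax error back to the author.  The check_dsl helper below is a hypothetical name, shown only as a sketch.

```python
def check_dsl(dsl_text, name):
    """Return None if the script compiles, else a short error description."""
    try:
        compile(dsl_text, name, "exec")
    except SyntaxError as err:
        return f"{name}, line {err.lineno}: {err.msg}"
    return None


ok = check_dsl("result = data * 100", "ScaleRule")
bad = check_dsl("result = data *", "BrokenRule")
print(ok)   # None
print(bad)  # description of the syntax error in BrokenRule
```

Note that compile only parses the script; it does not execute it, so errors such as an undefined name will still surface only at run time.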

Next, run lint to help ensure that there are no major problems with the code.  To run lint, save the script along with its initializations into a file and run:
from pylint.epylint import lint
assert lint('test.py') == 0

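The lint function returns pylint's exit status, which encodes the categories of messages found in your script (fatal, error, warning and so on) rather than a count.  A value other than 0 indicates that there was a problem.  Note that the pylint.epylint module was removed in pylint 3.0, so this snippet requires an older pylint release.  Please see the pylint tutorial for configuration and tuning options.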


Python DSL from other platforms

   The examples above demonstrate how to use Python as a DSL from within a Python environment, but it's also possible to use Python as a DSL on other platforms.  In a Java environment this can be done with Jython, in a CLR environment with IronPython, and in C++ applications by embedding CPython through its C API.  Please refer to the documentation of those technologies for integration details.  Regardless of technology, it's still important to create a safe sandbox when running the DSL.







