Jawk 4.1.00
Jawk - AWK for Java
Introduction
Jawk is a pure Java implementation of AWK[1]. It executes AWK scripts to parse and process text input and generate text output. Jawk can be used as a CLI, but more importantly, it can be invoked from within your Java project.
This project is forked from the excellent Jawk project[2] that was maintained by hoijui on GitHub[3].
Run Jawk CLI
It's very simple:
- Download jawk-4.1.00-standalone.jar[4] from the latest release[5]
- Make sure to have Java installed on your system (download[6])
- Execute jawk-4.1.00-standalone.jar just like the “traditional” AWK
See usage and examples of Jawk CLI[7].
Run AWK inside Java
The Awk class exposes several APIs to evaluate expressions and execute scripts.
Evaluate an expression
Object value = new Awk().eval("2 + 3");
Run a script directly
String output = new Awk().run("{ print $1 }", "foo bar");
Compile and invoke a script
Awk awk = new Awk();
AwkTuples tuples = awk.compile("{ print $0 }");
AwkSettings settings = new AwkSettings();
// configure input/output streams here
awk.invoke(tuples, settings);
compileForEval(...) and eval(AwkTuples, ...) provide the same workflow for expressions.
See AWK in Java documentation[8] for advanced examples.
Features
As stated earlier, Jawk interprets AWK scripts in Java. This is a full implementation of AWK, which includes:
- An intuitive text processing paradigm, tightly integrated with regular expressions.
- Functions with local, static scoping.
- Scalar and associative array (map) variables.
- Weakly typed variables for greatest flexibility with automatic string/number conversion.
- Powerful IPC constructs similar to those used by most UNIX shells (pipes and IO redirect).
- Highly intuitive error diagnostics.
Jawk also offers the following features which the original AWK does not provide:
- Output to a post-compiled, pre-interpreted format for both elimination of the compilation step and obfuscation of Jawk scripts.
- Text dumps of abstract syntax tree and intermediate code representation (tuples).
- Maintenance of associative arrays in key-sorted order.
- Error detection for printf/sprintf format parameters (via the -r argument).
- An opt-in, flexible extension facility with event blocking capabilities.
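To make the "key-sorted order" point concrete, here is a minimal sketch in plain Java of what iterating an associative array in key order looks like. It uses java.util.TreeMap purely as an analogy: the class name SortedArrayDemo and its method are hypothetical, and nothing here claims to reflect how Jawk stores arrays internally or how it compares numeric keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class, not part of Jawk.
final class SortedArrayDemo {

    // Counts occurrences of each key, as an AWK script would with count[$1]++,
    // and returns the keys in the order a key-sorted map iterates them.
    static List<String> keysInIterationOrder(String... keys) {
        // TreeMap keeps String keys in lexicographic order (assumption: this is
        // only an analogy for a key-sorted associative array).
        Map<String, Integer> counts = new TreeMap<>();
        for (String key : keys) {
            counts.merge(key, 1, Integer::sum);
        }
        return new ArrayList<>(counts.keySet());
    }

    public static void main(String[] args) {
        // Keys are inserted out of order; iteration is still sorted by key.
        System.out.println(keysInIterationOrder("pear", "apple", "fig", "apple"));
    }
}
```

With a key-sorted array, a `for (key in array)` loop visits keys in a predictable order, which many other AWK implementations leave unspecified.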
Because we're using Java, the following differences exist in order to blend easily within the Java environment:
- Jawk regular expressions are implemented with Java regular expressions. Therefore, they differ from AWK's regular expression semantics (mostly by adding functionality over AWK's regular expressions).
- printf/sprintf formatting is done by java.util.Formatter, which behaves markedly differently from C's printf() and therefore from AWK's. Java's Formatter class does not implicitly convert its arguments. If an argument's type does not match what the format specifier expects, an IllegalFormatException is thrown. Therefore, the script developer must keep track of implicit type conversions in Jawk.
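The Formatter behavior described above can be observed in plain Java, without Jawk on the classpath. The sketch below only demonstrates java.util.Formatter itself (through String.format); the class FormatterStrictnessDemo and its tryFormat helper are hypothetical names introduced for illustration, and how Jawk surfaces such an exception to a script is not shown here.

```java
import java.util.IllegalFormatException;
import java.util.Locale;

// Hypothetical illustration class, not part of Jawk.
final class FormatterStrictnessDemo {

    // Formats with java.util.Formatter (via String.format) and reports a
    // type mismatch as a readable string instead of propagating the exception.
    static String tryFormat(String format, Object... args) {
        try {
            return String.format(Locale.ROOT, format, args);
        } catch (IllegalFormatException e) {
            return "error: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryFormat("%d", 42));   // Integer matches %d
        System.out.println(tryFormat("%d", "42")); // String is rejected by %d
        System.out.println(tryFormat("%d", 42.0)); // Double is rejected by %d
        System.out.println(tryFormat("%s", 42.0)); // %s accepts any argument
    }
}
```

In C, and in most AWKs, `%d` applied to a string or a floating-point value is silently converted. With java.util.Formatter, the same mismatch raises an exception, which is why the type of each printf argument matters in a Jawk script.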
Differences with the original Jawk
There's a growing list of things that make our version diverge from the original Jawk written by Danny Daglas, and maintained by Robin Vobruba:
- Removed all logging framework dependencies; Jawk now reports errors solely through Java exceptions
- Removed the AWK-to-JVM bytecode compiler
- Removed the Socket extension (to get a smaller jar)
- Improved performance in parsing inputs and printed output
- Support for long integers
- Support for octal and hexadecimal notation in strings (allowing ESC characters to do fancy terminal effects)
- Artifact groupId and package are org.metricshub
- Added gawk's suite of unit tests
- Added bwk's suite of unit tests
- License is LGPL for the Maven artifact
Differences with other AWKs
Other versions of AWK will run through a script and issue a “runtime error” if a user-defined function is not found. Jawk does not. It attempts to resolve all function calls to defined functions at compile-time (after parsing the script and prior to assembling the intermediate code from the abstract syntax tree). This is necessary in order to produce intermediate code with branch statements fully resolved.
Other versions of AWK provide command-line parameters to choose compile-time or run-time checks for function name resolution. Jawk does not, mainly to ensure semantic analysis is done for the reasons stated above. Also, disabling these semantic checks would leave unresolved references, most likely resulting in NullPointerExceptions.
Other semantic checks include formal/actual parameter analysis and array/scalar operation verification. Again, these are necessary to produce coherent intermediate code.
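The idea of resolving function calls before any code runs can be sketched in a few lines of plain Java. This is a simplified illustration under stated assumptions, not Jawk's actual implementation: the class FunctionResolutionSketch, its methods, and the representation of a script as a set of defined names plus a list of called names are all hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical illustration class, not part of Jawk.
final class FunctionResolutionSketch {

    // Returns the called function names that have no definition, sorted by name.
    static SortedSet<String> unresolvedCalls(Set<String> defined, List<String> calls) {
        SortedSet<String> missing = new TreeSet<>(calls);
        missing.removeAll(defined);
        return missing;
    }

    // Fails before execution would start if any call cannot be resolved,
    // mimicking a compile-time check rather than a runtime error.
    static void check(Set<String> defined, List<String> calls) {
        SortedSet<String> missing = unresolvedCalls(defined, calls);
        if (!missing.isEmpty()) {
            throw new IllegalStateException("undefined function(s): " + missing);
        }
    }
}
```

Because the check runs over the whole script up front, a call to an undefined function is reported even when it sits in a branch that would never execute for a given input.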
