ANTLR-based parser passes work phase 1.

This one took a while. Almost too long; I nearly dropped the whole thing at least twice. But fortunately I didn’t. So I proudly announce the ANTLR parser, which in its current state is able to satisfy the “DDT Tests – Core” suite, and I’m proud of it. The first difficulty at hand was that I had no previous experience with ANTLR; indeed, I had no previous experience with compiler/parser generators at all. Then there was the problem of understanding how ambiguous parsing rules can be resolved in ANTLR.

Then there was the problem of getting out of sync with Bruno’s work. And there was an enormous amount of work on the AST classes to get the proper constructors in place, so that the AST can be created without the descent.compiler format converter. And then I had to chew myself through 300 test cases to make them work. All done.

This work started with a few attempts before I settled on the basic framework of how I would proceed with it. At this point in time I can see major drawbacks in how it works, but this is because I didn’t want to invent a completely different AST management, so that all the rest of DDT could go on as if no parser replacement had occurred. This is also the reason why the grammar file is so bloated with action code.

Let’s see what is yet to be done:

  • Error handling. There’s only minimal error reporting/recovery support in the parser, just enough to get it running, but it is not at all sufficient from the user’s point of view.
  • Review of all the latest and supported language features, and correcting the functionality accordingly.
  • Incremental parsing. This is something I won’t be able to address in the near future because it seems a quite tedious piece of work. However, it seems quite reasonable to implement, as it promises a huge performance improvement. (Say the user has a line like this: “class Foo {}” and types into the block: “class Foo { int a; }”. An incremental parser would cut right to the declDefs of the DefinitionClass instead of rebuilding the whole AST from scratch; see the sketch after this list.)
  • Performance: At the moment the grammar file results in huge parser code, and there are plenty of forward checks in the grammar. Perhaps some of them are avoidable with some cleverness, and parsing is where performance improvement is needed most.
  • ANTLR-compatible AST hierarchy. This would bring some linear performance gains and some code clarity (removing the action code blocks). The problem here, when I looked into this solution, is that ANTLR doesn’t seem to have decent heterogeneous AST support: all we have is a token-based constructor. It could still be useful for operators or single-token rules, like some attributes.
  • Building the parser will need the ANTLR binaries and additional build rules. For releases, the lexer/parser sources should be attached to the code tree as they are.
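
To make the incremental idea a bit more concrete, here is a minimal, purely hypothetical Java sketch of it; none of these types or method names (DirtyRegion, ClassNode, reparseDeclDefs, ...) exist in DDT, they only illustrate the fast path described in the list above.

    // Hypothetical sketch only: reuse the existing class node when an edit stays
    // inside its body, otherwise fall back to a full reparse of the module.
    public final class IncrementalReparseSketch {

        /** The source region touched by the user's edit. */
        static final class DirtyRegion {
            final int offset, length;
            DirtyRegion(int offset, int length) { this.offset = offset; this.length = length; }
        }

        interface ClassNode {
            boolean bodyContains(DirtyRegion region);  // did the edit stay inside this class body?
            void replaceDeclDefs(Object newDeclDefs);  // splice in freshly parsed members
        }

        interface Parser {
            Object parseModule(String source);                        // full reparse from scratch
            Object reparseDeclDefs(String source, DirtyRegion body);  // parse only the class body
        }

        static Object reparse(Parser parser, String source, ClassNode cls,
                              DirtyRegion dirty, Object oldModule) {
            if (cls.bodyContains(dirty)) {
                // Fast path: e.g. "class Foo {}" edited to "class Foo { int a; }".
                cls.replaceDeclDefs(parser.reparseDeclDefs(source, dirty));
                return oldModule;
            }
            return parser.parseModule(source); // slow path: rebuild the whole AST
        }
    }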

All these things would be nice to have, but only the first is needed for the complete replacement of descent.compiler.

Corresponding code here: http://code.google.com/a/eclipselabs.org/r/gyulagubacsi-ddt-contrib/source/list?name=feature-antlr-parser


LabelProvider improvements

Greetings,

Since I first tried to get involved in the DDT project I have always wanted to improve the label provider. Though Bruno did not merge much of my changes into the main line, they seem to have inspired some changes since then. One of the painfully missing things was a visual aid for the elements’ protection level. The reason why the merge did not occur before is that it was supposed to be shown as an overlay.

I opened a feature branch to improve the label provider, and my first step was to show the protection level icons. Now, as I don’t think it is wise to use a completely new set of icons, and I’m not a talented graphic designer either, I took the JDT’s protection level icons and applied them as an overlay on all elements except functions and variables. The variables/fields and methods/functions are treated similarly to JDT: they get a completely different icon. If this is still unacceptable, I’m willing to add an option to choose between the original method/field icons with overlays in normal mode, and a “JDT-style” mode which would work as I described above.
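
To make the rule a bit more concrete, here is a small hypothetical sketch of that icon policy; the enums and the method are purely illustrative, the real logic lives in DeeModelElementLabelProvider.

    // Hypothetical sketch of the icon policy; not the actual DDT implementation.
    final class IconPolicySketch {

        enum Kind { FIELD, METHOD, CLASS, STRUCT, INTERFACE, ENUM }
        enum Protection { PRIVATE, PROTECTED, PACKAGE, PUBLIC }

        /** Describe which image to use for an element of the given kind and protection. */
        static String describeIcon(Kind kind, Protection protection) {
            if (kind == Kind.FIELD || kind == Kind.METHOD) {
                // JDT style: one dedicated icon per protection level, no overlay.
                return kind.name().toLowerCase() + "_" + protection.name().toLowerCase() + ".gif";
            }
            // Everything else: base element icon plus a protection-level overlay.
            return kind.name().toLowerCase() + ".gif with overlay " + protection.name().toLowerCase();
        }
    }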

The only issue with the solution is that I had to modify the ScriptElementImageDescriptor_Fix class in order to make it more flexible. I don’t think it is a bad choice, but I recognize that it could pose a problem with the integration with DLTK 4.0. But if it comes to that, I will fix it anyway.

It would also be nice to have the return type description in a different color, but that’s going to be another story.

You can find the branch here. Also, here’s a screenshot of how it looks:

Time for tidying up

Now that Bruno has posted the contribution guidelines, it is time to clean up the mess I made and separate the different features from the master branch and from each other. I’m afraid it won’t be easy, given that my public master branch was already littered with these feature fragments.

First I have to identify the unrelated commits in the history in order to determine the new feature branches. I was working on several problems and most of them aren’t ready to be merged into the main development line.

  • ANTLR parser. This feature involves the AST classes, removing the old ANTLR thing, and adding the new ANTLR grammar. A potential issue could be the constructor code that was added in the type-inference branch, which I integrated into master in order to access those modifications.
  • The D element label provider. This feature aimed to bring a more JDT-like icon set to the necessary places, such as the Outline view, the Script view and the completion proposal list. I found that basically this is not much more than one commit. There was also a hack to get the module name into a module definition label, but I’m quite uncertain about that change. It is also a complication, as I submitted that change at a random point in time, so I can’t use a range of commits to separate it out. Perhaps, if it is easy to do, I should simply get rid of it. This feature would add new icon files and change the DeeModelElementLabelProvider class.
  • Static library support. Now this is a worthy feature and the one closest to being completed. However, the issue mentioned above makes it complicated to separate into its own branch. After the separation I should ask Bruno to have a look at this feature. The affected code is the builder and the project preferences, and it spreads to places like DLTKModuleResolver in the core.parser package. The change sets are scattered all over the history.
  • Bracket inserting. Should be quite straightforward, as it is only one commit.
  • Type inference. The code that was submitted for this feature does nothing really interesting at the moment. However, it contains some key refactorings that should probably be used later for several reasons. One of them is to have a visitor structure that is sufficient to work with all AST classes, and the other is to replace the getMemberScope() method (and later probably other AST member methods) with visitor-based processing code. The actual type inference code is only exploratory, trying to integrate DLTK’s type-inference basics as an entry point.
  • Renewing the completion proposal collecting code. The visitor refactoring above could potentially be useful for collecting completion proposals, and would be better than the existing code, which is quite obscure. This completion proposal system should be aware of priority listing, keywords, templates, and all resolvable nodes, not just references. The latter means completing members of expressions (such as a cast object and so on). It should also contain the changes to function definition proposals, but that isn’t working at the moment (the feature where the argument list is pre-filled with the parameter names and works like the template suggestions).

At the end of the day, I had to explore the weird world of rebasing, as I never really bothered to learn it before. That’s because every time I encountered it, there was always a note that it could mess things up pretty badly, and some even say that it is like lying.

Rebasing in git means taking a range of commits and “replaying” them on top of a branch or a specific revision. This is the perfect tool for the job I am about to perform. The workflow is like this: take the deviation point in my master branch, pick those commits that are relevant to the functionality, and apply them onto the deviation point itself. Sounds almost too simple.

However, there are problems I need to be aware of. One is the problem of rewriting history: I can’t rebase a tracking branch and expect that I can push it easily to its remote. Once I push a branch somewhere, rebasing it is not an option anymore. So if that happens and I screw up (as happened with my first attempt on the feature-static-library branch, where I included commits that render the branch virtually impossible to merge into Bruno’s master), there’s no way back. The only thing I can do, if nobody depends on my repository, is to wipe out the whole thing and push the necessary branches from scratch. At the end of this exercise, this is likely to happen anyway. The only person I share my repository with is Bruno, and he doesn’t depend on any of my current branches.

That pesky file name capitalization issue is really annoying. If I want to switch to a branch that has the previous file name, I have to delete the file to get rid of the problem. Not only that, I ran into this while performing the rebase, which is even more annoying, as I have to fix the name with an additional commit.

Finally I mastered rebasing, so now there’s a new repository I’m working with. There are three feature branches so far: feature-static-libraries, feature-labelprovider-improvement, and feature-antlr-parser. These branches were grown from Bruno’s latest master, so there should be no problem merging them into the main line.

LL(k) -> ASTNeoNode

I have been working recently on an ANTLR-based parser for the DDT project. As phase 1, I’m trying to get the current AST hierarchy working under this new parser without having to rely on the Descent parser. It isn’t finished yet, but it has reached the level of progress where it is perhaps worth it for others to look at. To see what is missing, here’s my sketchy list:

  • Missing import expression node in the AST
    importExpression : 'import' '(' assignExpression ')' ;
  • Clarify the function literals
  • Missing static if expression.
  • Missing static assert expression.
  • How to deal with conditional statements (version, debug, and such)?
  • Struct initializer must be implemented.
  • What is ExpIftype for? Is it something to do with the template stuff? Or static if?
  • Is the IsExpression in the AST somewhere? (It’s a pretty tough expression, btw.)
    isExpression
    : is ( Type )
    | is ( Type : TypeSpecialization )
    | is ( Type == TypeSpecialization )
    | is ( Type Identifier )
    | is ( Type Identifier : TypeSpecialization )
    | is ( Type Identifier == TypeSpecialization )
    | is ( Type Identifier : TypeSpecialization , TemplateParameterList )
    | is ( Type Identifier == TypeSpecialization , TemplateParameterList )
  • Template declarations are missing in the parser rules.
  • Template instances are missing in the parser rules.
  • Proper attribute specifier implementation. (That is, accumulate all the attribute specifiers onto the corresponding definition.)
  • Error handling and error recovery resembling the Descent parser’s.

The actual state is on my clone’s master branch here:
http://code.google.com/a/eclipselabs.org/r/gyulagubacsi-ddt-root/sour…

As I’m trying to replace the current parser without changing much of the current state of the code (I made only a few of the most obvious modifications, such as constructors for creating AST nodes without the conversion process, and in very few places I added extra fields to currently existing nodes), I wouldn’t add any new features at this point, or mess with the AST hierarchy.
This is my first attempt at creating a parser with a parser generator, and I admit that in many places the current state needs improvement. Later on, we should change the ASTNeoNode hierarchy to work on top of ANTLR’s CommonTree object, which would eliminate the need for creating individual objects in action code. (I’m not completely sure how, but I think it is possible to create heterogeneous trees with ANTLR using a factory pattern, which in turn would require the ASTNeoNode classes and interfaces to be more consistent than they are today. As an example of inconsistency, in the current state some nodes use ArrayView, others use simple arrays of objects, and so on.)
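
For reference, this is roughly what the heterogeneous-tree direction could look like with ANTLR 3’s TreeAdaptor mechanism. The class and node names below are hypothetical placeholders, not DDT code; a real version would produce ASTNeoNode instances.

    // Hypothetical sketch: a TreeAdaptor that builds custom node classes instead of
    // plain CommonTree, so action code in the grammar would no longer be needed.
    import org.antlr.runtime.Token;
    import org.antlr.runtime.tree.CommonTree;
    import org.antlr.runtime.tree.CommonTreeAdaptor;

    class DeeTreeAdaptorSketch extends CommonTreeAdaptor {

        /** Hypothetical heterogeneous node; a real version would wrap an ASTNeoNode. */
        static class OperatorNode extends CommonTree {
            OperatorNode(Token token) { super(token); }
        }

        @Override
        public Object create(Token token) {
            if (token == null) {
                return super.create(token); // nil/imaginary nodes stay plain CommonTree
            }
            // Single-token rules (operators, some attributes, ...) could map to dedicated classes here.
            return new OperatorNode(token);
        }
    }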

Static Library Support: Project dependency to static libraries

Last time I was dealing with the project build types that allow DDT to produce static libraries out of a project. That’s a vital element for a larger project infrastructure, so its implementation should be a priority. This time there was another problem related to larger projects: it’s all nice and good to have lib files produced, but we couldn’t really do anything with them at all.

Actually, as with most of my development, I heavily relied on what Bruno did before. As far as I understood the already existing code, the heavy lifting was already done. That is, the project dependencies on libraries were implemented as a skeleton, only it wasn’t useful… yet. The only thing left to implement was to use the project dependencies’ output (the static libraries themselves) as input modules, and their source directories as input directories on the referring project. But nothing is as easy as it looks. I’ve just encountered a quite disturbing fact: the DLTK documentation is virtually non-existent. There are some vague articles in their wiki, but the API documentation is basically useless. That makes it quite hard to develop anything on top of DLTK, and frankly I’m worried about the scenario where DDT hits huge problems with DLTK.

The actual issue I encountered looked quite easy: how can I get libraries from a different project and resolve their paths relative to the workspace? At this point in time I didn’t find anything useful for this. So for the time being I had to hack the references to other projects with a simple “../[projectname]/src” as an import directory and “../[projectname]/lib/[projectname].lib” as an input file. But the real solution would be to get access to the other project’s DeeBuildOptions object, which could provide the necessary settings for the output file and its build type, and, from its IScriptProject info, the source folders, as there could be more than one.
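
To illustrate the hack (and assuming only the default project layout), here is a tiny sketch of the path construction; the helper names are made up, and the real fix would read these values from the referenced project’s DeeBuildOptions instead.

    // Hypothetical sketch of the current hard-coded workaround; not DDT's actual API.
    final class LibraryDependencySketch {

        /** Workspace-relative import directory for a referenced project's sources. */
        static String importDirFor(String projectName) {
            return "../" + projectName + "/src";
        }

        /** Workspace-relative path of the static library produced by a referenced project. */
        static String libraryFileFor(String projectName) {
            return "../" + projectName + "/lib/" + projectName + ".lib";
        }
    }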

In other words, to use this new feature the user has to make sure that the directory structure is the default one and that there are no additional source folders. Also, it probably won’t work with anything but DMD, and possibly only on Windows. The next target is to eliminate these problems.

The corresponding change set in the source code: r341098a21488


Static library support

After some messing around with EGit, I finally came to understand how to push my changes to the Google Code clone. It came as a surprise because I was sure that I understood the git architecture. The problem was that I didn’t have proper refspecs set up for the Google Code clone’s repository.

Today’s exercise was to add some support for static libraries. I found that the DeeBuildOptions class had some traces of supporting different build types, but it was set to EXECUTABLE without the option to change it. In the case of static libraries, the DMD compiler offers a “-lib” option to build them. In addition, I found that there was already a combo box to set the build type in the DeeProjectOptionsBlock, which is responsible for handling the compiler options in the UI.

So this part of the job was pretty easy: uncomment the relevant part in the DeeProjectOptionsBlock.createControl method that shows the combo box for the build type setting. To handle the default cases better than we do currently, I hid all the properties of the DeeBuildOptions class and added some handling of the default values. If there’s no output directory set yet, it will depend on the build type: if the build type is executable or dynamic library, the output directory is ‘bin’; if the build type is static library, the output directory will be ‘lib’. However, I noticed a tiny little problem here. The D programming language is a native language, which means the compiled modules are subject to linking. That is, the real output of the build process is the executable/library/dynamic library, and the object files themselves are a kind of by-product of this process. In most of the build systems I have seen there was a separation between the intermediate files (object files) and the final output. As a next step I would like to add an intermediate directory for the object files (of course, if the user wishes, he can keep the output files mixed with the object files).
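
As a rough illustration, the default output directory rule boils down to something like the following sketch; the enum and method names are illustrative only, not the actual DeeBuildOptions API.

    // Hypothetical sketch of the default-output-directory rule described above.
    final class DefaultOutputDirSketch {

        enum BuildType { EXECUTABLE, DYNAMIC_LIBRARY, STATIC_LIBRARY }

        /** If the user hasn't set an output directory, derive it from the build type. */
        static String defaultOutputDir(BuildType buildType) {
            return buildType == BuildType.STATIC_LIBRARY ? "lib" : "bin";
        }
    }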

On the UI side, with a small hack, I only update those DeeBuildOptions fields that were actually edited, so that my changes to the default values can take effect. But I need to update the artifact name field too, which I just realized I didn’t. Next commit. If you switch to the LIB_STATIC build type, the output folder and the output extension will change to lib.

Also, before I committed these changes I merged Bruno’s recent changes (mostly GDC-related stuff).

The corresponding change set in the source code: r080d4e50bc25

IModelElement Icons

My previous experiments with DDT were about changing how the Outline view looks in DDT. It was a good exercise to understand the basics of Eclipse plug-in development and to learn Bruno’s code. However, it didn’t work out as I expected, as it took quite a while to understand how DLTK assists the plug-in in building a model from the source code. This IModelElement hierarchy represents the basic model of the project’s tree, and is therefore what any graphical representation, such as the Script view or the Outline view, should use.

Later Bruno unified the whole thing in the DeeModelElementLabelProvider, which works in three different contexts: the Script view (navigation), the Outline view (view), and the completion proposals (code assist). As I can’t settle for the current icons, I decided to change this bit of the code. Bruno made it clear that he wants to show the private/public/protected modifiers as an overlay, but I would rather not do that in the case of fields and methods. The reason is simple: the overlaid icons have limited space, and a field (variable) or a method (function) could possibly have quite a few modifiers, such as “private static immutable”, so having three overlays would look quite crowded.

As a result, I brought my previous patch into play, adding the JDT’s icons for methods and fields. At the moment I didn’t add other things, but it only needs a little painting to get a private/public/protected overlay for the other elements, such as classes, structs, enums, interfaces, etc.
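
Once those small overlay images exist, composing them onto the base element icon can be done with JFace’s DecorationOverlayIcon; the snippet below is only a sketch of that idea, not DDT’s actual code.

    // Minimal sketch using JFace's DecorationOverlayIcon; the descriptors passed in
    // would come from the label provider, this helper is only illustrative.
    import org.eclipse.jface.resource.ImageDescriptor;
    import org.eclipse.jface.viewers.DecorationOverlayIcon;
    import org.eclipse.jface.viewers.IDecoration;
    import org.eclipse.swt.graphics.Image;

    final class ProtectionOverlaySketch {

        /** Compose a protection-level overlay onto a base element icon. */
        static ImageDescriptor withProtectionOverlay(Image baseIcon, ImageDescriptor protectionOverlay) {
            return new DecorationOverlayIcon(baseIcon, protectionOverlay, IDecoration.BOTTOM_RIGHT);
        }
    }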

I couldn’t find my complete previous work, however: previously I added the types/return types to all these fields, similarly to JDT. Anyway, the stuff looks like this right now:

The corresponding change set in the source code: r427c36a3c2e4