Catch up with the latest developments

I haven’t posted here for a while, but that doesn’t mean there have been no developments. Hence a quick round-up of what’s going on.

Bruno rolled out DDT “John Mirra” v0.6.0. My searches haven’t turned up anything relevant about what the name of the new release means, so don’t ask. A tiny bit of my code made it into this release, and as far as I know, apart from one earlier snippet, this is the first contribution ever incorporated from my branch. Wonderful.

I went on with the development of my pet features, this time around mostly the expression evaluator. The primary goal of evaluating expressions in the IDE is to deduce the type of an expression for further actions, such as determining the members of the result of a particular expression. The first and most obvious use of this system is type inference for auto variables: given “auto foo = new Foo();”, the evaluator must deduce that foo is a Foo. At the moment this is the only point where my expression evaluator is called, since the search system is a bit messy in terms of code/data model, but I will soon implement the other use cases, for example using qualified expressions instead of qualified references. Something like “(new Foo()).” should drop you a list of the methods of Foo.
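To give an idea of the shape such an evaluator takes, here is a deliberately tiny sketch with made-up classes (not my actual code, nor DDT’s AST): each expression kind maps to the type it evaluates to.

    // Made-up expression nodes; only the shape of the evaluator matters here.
    interface Expression {}

    final class ExpNew implements Expression {        // e.g. "new Foo()"
        final String typeName;
        ExpNew(String typeName) { this.typeName = typeName; }
    }

    final class ExpReference implements Expression {  // e.g. a plain "foo"
        final String declaredTypeName;
        ExpReference(String declaredTypeName) { this.declaredTypeName = declaredTypeName; }
    }

    final class ExpressionTypeEvaluator {
        /** Returns the deduced type name, or null when nothing can be inferred. */
        String evaluateType(Expression exp) {
            if (exp instanceof ExpNew) {
                return ((ExpNew) exp).typeName;       // "new Foo()" has type Foo
            }
            if (exp instanceof ExpReference) {
                return ((ExpReference) exp).declaredTypeName;
            }
            return null;                              // unknown node kind
        }
    }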

Bruno and I had a serious exchange of emails about how contributions are accepted and how they are incorporated into the main DDT line. I made some decisions about how I will contribute to DDT development, since I had some issues with the current practice. It’s not that I don’t appreciate Bruno’s methodology, but I think there is something fundamentally wrong when I have been actively around the project for a year (and it has been more than a year since I started browsing the code and familiarizing myself with Eclipse development), yet the only contribution accepted during this time was my icon overlay work, which is only a minor feature. This triggered a bit of reflection, and I decided to approach the whole development in a completely different manner.

I add features to DDT because I miss them. I love the language, and I need modern tools so I can use it as comfortably as the other languages I use professionally, like C++. DDT is a project with only a single main developer, plus me as a satellite contributor, and at the current rate of releases the project has a serious lack of momentum. So I decided to build up my own version of it, which means I will publish my own releases in much the same way Bruno does. I like the “release early, release often” philosophy, and I firmly believe that under the current model we won’t be able to attract more attention; we should be happy if we can keep even the already low frequency of community responses, let alone attract new developers. The latter looks even more desperate because those who are into native development rarely consider getting involved with Java, I guess. Also, the Java crowd has a strong puritan tendency, which leaves only a tiny subset of hackers who could ever get around to D, and to DDT.

This is not a split, this is not a fork!

A screen shot of the type inference:

related code


ANTLR-based parser passes work phase 1

This one took a while. Almost too long: I nearly dropped the whole thing at least twice, but fortunately I didn’t. So I can proudly announce the ANTLR parser, which in its current state is able to satisfy the “DDT Tests – Core” suite, and I’m proud of it. The first difficulty at hand was that I had no previous experience with ANTLR; in fact, I had no previous experience with compiler/parser generators at all. Then there were the problems of understanding how ambiguous parsing rules can be resolved in ANTLR.

Then there was the problem of getting out of sync with Bruno’s work. And there was an enormous amount of work on the AST classes to get the proper constructors in place, so that the AST can be created without the descent.compiler format converter. And then I had to chew myself through 300 test cases to make them work. All done.

This work started with a few attempts before I settled on the basic framework of how I would proceed. At this point I can see major drawbacks in how it works, but this is because I didn’t want to invent a completely different AST management scheme, so that all the rest of DDT could go on as if no parser replacement had occurred. This is the reason why the grammar file is so bloated with action code.

Let’s see what is yet to be done:

  • Error handling. There is only minimal error reporting/recovery support in the parser, just enough to get it running, and it is not at all sufficient from the user’s point of view.
  • Review of all the latest supported language features, correcting the functionality accordingly.
  • Incremental parsing. This is something I won’t be able to address in the near future, because it seems a quite tedious piece of work. However, it seems quite reasonable to implement, as it promises a huge performance improvement. (Say the user has a line like “class Foo {}” and types into the block: “class Foo { int a; }”. An incremental parser would cut right to the declDefs of the DefinitionClass instead of rebuilding the whole AST from scratch.)
  • Performance. At the moment the grammar file results in a huge parser, and there are plenty of forward checks in the grammar. Perhaps some of them are avoidable with some cleverness, and parsing is where performance improvements are most needed.
  • ANTLR-compatible AST hierarchy. This would yield some linear performance gains and some code clarity (removing the action code blocks). The problem here, when I visited this solution, is that ANTLR doesn’t seem to have decent heterogeneous AST support; all we get is a token-based constructor (see the sketch after this list). It could still be useful in the case of operators or single-token rules, like some attributes.
  • Building the parser will need the ANTLR binaries and additional build rules. For releases, the generated lexer/parser sources should be committed to the code tree as they are.
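Since that token-based constructor keeps coming up, here is what ANTLR 3’s heterogeneous tree construction looks like in practice (a minimal sketch; the node name is illustrative, not one of DDT’s real AST classes):

    // A node class usable with ANTLR 3 heterogeneous trees: the generated
    // parser instantiates it through this Token-taking constructor.
    import org.antlr.runtime.Token;
    import org.antlr.runtime.tree.CommonTree;

    public class BinaryExpNode extends CommonTree {
        public BinaryExpNode(Token token) {
            super(token); // the operator token becomes the node's payload
        }
    }

In the grammar, a token reference can then be annotated with the node type, e.g. ‘+’<BinaryExpNode>, which is workable for operators and single-token rules but awkward for anything richer.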

All these things would be nice to have, but only the first is needed for the complete replacement of the descent.compiler.

Corresponding code here: http://code.google.com/a/eclipselabs.org/r/gyulagubacsi-ddt-contrib/source/list?name=feature-antlr-parser

LabelProvider improvements

Greetings,

Ever since I started trying to get involved in the DDT project, I have wanted to improve the label provider section. Though Bruno did not merge much of my changes into the main line, it seems they inspired some changes since then. One of the painfully missing things was a visual aid for the elements’ protection level. The reason the merge did not occur before is that it was supposed to be shown as an overlay.

I opened a feature branch to improve the label provider, and my first step was to show the protection level icons. Now, as I don’t think it is wise to introduce a completely new set of icons, nor am I a talented graphic designer, I took JDT’s protection level icons and applied them as an overlay on all elements except functions and variables. Variables/fields and methods/functions are treated similarly to JDT: they get a completely different icon. If this is still unacceptable, I’m willing to add an option so that it could be set to use the original method/field icons with an overlay in normal mode, and a “JDT-style” mode which would work as described above.
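For the record, the generic overlay technique in Eclipse looks roughly like the following (a minimal JFace sketch of the approach, not the actual DDT code, which goes through ScriptElementImageDescriptor_Fix):

    // Decorating a base element icon with a protection-level overlay in
    // the bottom-right corner, using plain JFace.
    import org.eclipse.jface.resource.ImageDescriptor;
    import org.eclipse.jface.viewers.DecorationOverlayIcon;
    import org.eclipse.jface.viewers.IDecoration;
    import org.eclipse.swt.graphics.Image;

    public class ProtectionOverlays {
        public static Image decorate(Image baseIcon, ImageDescriptor protectionOverlay) {
            return new DecorationOverlayIcon(baseIcon, protectionOverlay,
                    IDecoration.BOTTOM_RIGHT).createImage();
        }
    }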

The only issue with the solution is that I had to modify the ScriptElementImageDescriptor_Fix class in order to make it more flexible. I don’t think it is a bad choice, but I recognize that it could pose a problem for the integration with DLTK 4.0. But if it comes to that, I will fix it anyway.

It would also be nice to have the return type description in a different color, but that’s going to be another story.

You can find the branch here. Also, here’s a screen shot of how it looks:

Time for tidying up

Now that Bruno has posted the contribution guidelines, it is time to clean up the mess I made and separate the different features from the master branch and from each other. I’m afraid it won’t be easy, given that my public master branch is already littered with these feature fragments.

First I have to identify the unrelated commits in the history in order to determine the new feature branches. I have been working on several problems, and most of them aren’t ready to be merged into the main development line.

  • ANTLR parser. This feature involves the AST classes, removing the old ANTLR remnants, and adding the new ANTLR grammar. A potential issue is the constructor code that was added in the type-inference branch, which I integrated into master in order to access those modifications.
  • The D Element label provider. This feature aimed to bring a more JDT-like icon set to the necessary places, such as the Outline view, the Script View and the completion proposal list. I found that this is basically not much more than one commit. There was also a hack to get the module name into a module definition label, but I’m quite uncertain about that change. It is a complication, however, as I submitted it at a random time, so I can’t use a range of commits to separate it out. Perhaps, if it is easy to do, I should simply get rid of it. This feature adds new icon files and changes the DeeModelElementLabelProvider class.
  • Static library support. Now this is a worthy feature, and the one closest to being completed. However, the issue mentioned above makes it complicated to separate into its own branch. After the separation I should ask Bruno to have a look at this feature. The affected code is the builder and the project preferences, and it spreads to places like DLTKModuleResolver in the core.parser package. The change sets range from here to there.
  • Bracket inserting. This should be quite straightforward, as it is only one commit.
  • Type inference. The code submitted for this feature does nothing really interesting at the moment. However, it contains some key refactorings that will probably be needed later, for several reasons. One is to have a visitor structure that works with all AST classes, and the other is to replace the getMemberScope() method (and later probably other AST member methods) with visitor-based processing code; see the sketch after this list. The actual type inference code is only exploratory, trying to integrate DLTK’s type inference basics as entry-point code.
  • Renewing the completion proposal collecting code. The visitor refactoring above could be useful for collecting completion proposals, and would be better than the existing code, which is quite obscure. This completion proposal system should be aware of priority listing, keywords, templates, and all resolvable nodes, not just references; the latter means completing members of expressions (such as a cast of an object). It should also contain the changes to function definition proposals, but that is not working at the moment (the feature where the argument list is pre-filled with the parameter names, working like the template suggestions).
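To make the visitor idea above a bit more concrete, here is a generic sketch of the pattern (illustrative classes only, not DDT’s real AST, which is built around ASTNeoNode):

    // A result-returning visitor: queries like "collect the member scope"
    // live in one visitor implementation instead of in every AST class.
    interface Node {
        <R> R accept(NodeVisitor<R> visitor);
    }

    interface NodeVisitor<R> {
        R visitClassDef(ClassDef node);
        R visitVarDef(VarDef node);
    }

    final class ClassDef implements Node {
        public <R> R accept(NodeVisitor<R> visitor) { return visitor.visitClassDef(this); }
    }

    final class VarDef implements Node {
        public <R> R accept(NodeVisitor<R> visitor) { return visitor.visitVarDef(this); }
    }

A member-scope query and a completion-proposal collector would then each be one NodeVisitor implementation, rather than methods spread over the whole hierarchy.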

At the end of the day, I had to explore the weird world of rebasing, as I never really bothered to learn it before. That’s because every time I encountered it, there was always a note that it could mess things up pretty badly, and some even say that it is like lying.

Rebasing in git means taking a range of commits and “replaying” them on top of a branch or a specific revision. This is the perfect tool for the job I am about to perform. The workflow is like this: take the deviation point in my master branch, pick the commits that are relevant to one piece of functionality, and apply them onto the deviation point itself. Sounds almost too simple.

However, there are problems I need to be aware of. One is the problem of rewriting history: I can’t rebase a tracking branch and expect to be able to push it easily to its remote. Once I push a branch somewhere, rebasing it is not an option anymore. So if that happens and I screw up (as happened with my first attempt at this on the feature-static-library branch, where I included commits that render the branch virtually impossible to merge into Bruno’s master), there’s no way back. The only thing I can do, provided nobody depends on my repository, is wipe out the whole thing and push the necessary branches from scratch. At the end of this exercise, that is likely to happen anyway. The only person I share my repository with is Bruno, and he doesn’t depend on any of my current branches.

That pesky file name capitalization issue is really annoying. If I want to switch to a branch that has the previous file name, I have to delete the file to get rid of the problem. Not only that, I ran into this while performing the rebase itself, which is even more annoying, as I have to fix the name with an additional commit.

Finally I mastered rebasing, so now there’s a new repository I’m working with. There are three feature branches so far: feature-static-libraries, feature-labelprovider-improvement and feature-antlr-parser. These branches were grown from Bruno’s latest master, so there should be no problem merging them into the main line.

LL(k) -> ASTNeoNode

I have been working recently on an ANTLR-based parser for the DDT project. As phase 1, I’m trying to get the current AST hierarchy working under this new parser without having to rely on the Descent parser. It isn’t finished yet, but it has reached the level of progress where it is perhaps worth a look by others. To see what is missing, here’s my sketchy list:

  • Missing import expression node in the AST
    importExpression : 'import' '(' assignExpression ')' ;
  • Clarify the function literals
  • Missing static if expression.
  • Missing static assert expression.
  • How to deal with conditional statements (version, debug, and such)?
  • Struct initializer must be implemented.
  • What is ExpIftype for? Is that something to do with the template stuff? Or static if?
  • Is IsExpression in the AST somewhere? (It’s a pretty tough expression, by the way.)
    isExpression
    : is ( Type )
    | is ( Type : TypeSpecialization )
    | is ( Type == TypeSpecialization )
    | is ( Type Identifier )
    | is ( Type Identifier : TypeSpecialization )
    | is ( Type Identifier == TypeSpecialization )
    | is ( Type Identifier : TypeSpecialization , TemplateParameterList )
    | is ( Type Identifier == TypeSpecialization , TemplateParameterList )
  • Template declarations are missing in the parser rules.
  • Template instances are missing in the parser rules.
  • Proper attribute specifier implementation. (That is, accumulate all the attribute specifiers onto the corresponding definition.)
  • Error handling and error recovery resembling the Descent parser’s; a possible first step is sketched below.
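On that last point, a common first step with ANTLR 3 is to override the generated parser’s error reporting so that errors are collected rather than printed to stderr (a sketch of the general technique; “DeeParser” stands for whatever the generated parser class ends up being called):

    // Collecting syntax errors from an ANTLR 3 generated parser.
    import java.util.ArrayList;
    import java.util.List;
    import org.antlr.runtime.RecognitionException;
    import org.antlr.runtime.TokenStream;

    public class CollectingDeeParser extends DeeParser {
        private final List<String> errors = new ArrayList<String>();

        public CollectingDeeParser(TokenStream input) {
            super(input);
        }

        @Override
        public void displayRecognitionError(String[] tokenNames, RecognitionException e) {
            // getErrorHeader/getErrorMessage come from ANTLR's BaseRecognizer.
            errors.add(getErrorHeader(e) + " " + getErrorMessage(e, tokenNames));
        }

        public List<String> getErrors() {
            return errors;
        }
    }

Real recovery (resynchronizing on statement boundaries and producing partial ASTs) is the genuinely hard part, and remains to be designed.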

The actual state is on my clone’s master branch here:
http://code.google.com/a/eclipselabs.org/r/gyulagubacsi-ddt-root/sour…

As I am trying to replace the current parser without changing much of the current state of the code (I made only the most obvious modifications, such as constructors for creating AST nodes without the conversion process, and in very few places I added extra fields to existing nodes), I wouldn’t add any new features at this point, or mess with the AST hierarchy.

This is my first try at creating a parser with a parser generator, and I admit that in many places the current state needs improvement. Later on, we should change the ASTNeoNode hierarchy to work on top of ANTLR’s CommonTree object, which would eliminate the need for individual object creation in action code. (I’m not completely sure how, but I think it is possible to create heterogeneous trees with ANTLR using a factory pattern, which in turn would need the ASTNeoNode classes and interfaces to be more consistent than they are today. As an example of the inconsistencies: in the current state some nodes use ArrayView, others use simple arrays of objects, and so on.)
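As far as I can tell, the factory pattern mentioned above is what ANTLR 3’s TreeAdaptor hook is for; a sketch of the direction (the node wiring is illustrative, not the real ASTNeoNode classes):

    // A TreeAdaptor deciding which node class to instantiate per token type,
    // ANTLR 3's hook for heterogeneous tree creation.
    import org.antlr.runtime.Token;
    import org.antlr.runtime.tree.CommonTree;
    import org.antlr.runtime.tree.CommonTreeAdaptor;

    public class NeoNodeAdaptor extends CommonTreeAdaptor {
        @Override
        public Object create(Token payload) {
            if (payload == null) {
                return super.create(null); // nil nodes, used for list building
            }
            switch (payload.getType()) {
                // e.g. case DeeLexer.CLASS: return new DefinitionClassNode(payload);
                default:
                    return new CommonTree(payload);
            }
        }
    }

The generated parser would then be configured with parser.setTreeAdaptor(new NeoNodeAdaptor()) before parsing.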

Under The Bonnet: Parser, Lexer… ANTLR

Sometimes it goes like this: you start to work on your well-defined contribution (type inference), and as you assess the job to be done, you realize that some parts are bigger than you thought. So you start looking into them from time to time, and only after a while do you realize that you are a step further down, in the heart of the project. This time I fell into the parser, but for a reason: at work, I was building a toy parser for a server in the good old hand-crafted way, and I found it fun. Originally I had the idea to use bison, ANTLR or something similar to generate a parser, but I rejected it because I was afraid I couldn’t learn enough about compiler generators to finish my task in reasonable time.

In recent weeks, I tried to get my head around the core of DDT. I found the current visitor accessibility unfortunately narrow for semantic analysis, type deduction or type inference, and I ran into some artefacts in the AST design. Well, this is always good news, as someone has to deal with these issues. Soon enough I found that the lack of a parser targeting our AST directly is a source of unnecessary complexity, not to mention the possible overheads. I say possible because I couldn’t make myself carry out detailed measurements. OK, OK, I’m lazy! Anyway, the obvious choice was to look into the question of an ANTLR parser.

Some initial experiments taught me that I shouldn’t follow Walter’s BNF-like description, because 1. it is heavy with left-recursive rule definitions, which need serious refactoring before an LL parser generator can handle them (for example, AddExpression: AddExpression '+' MulExpression has to be rewritten along the lines of addExpression : mulExpression ('+' mulExpression)*), and 2. there are too many differences between our current AST hierarchy and the D documentation’s way of explaining the language.

Static Library Support: Project dependency to static libraries

Last time I was dealing with the project build types that allow DDT to produce static libraries from a project. That’s a vital element of a larger project infrastructure, so its implementation should be a priority. This time there was another problem related to larger projects: it’s all nice and good to have lib files produced, but we couldn’t really do anything with them at all.

Actually, as with most of my development, I relied heavily on what Bruno did before. As far as I understood the existing code, the heavy lifting was already done. That is, project dependencies on libraries were implemented as a skeleton, only it wasn’t useful… yet. The only thing left to implement was to use the dependency projects’ output (the static libraries themselves) as input modules, and their source directories as import directories, in the referring project. But nothing is as easy as it looks. I encountered a quite disturbing fact: the DLTK documentation is virtually non-existent. There are some vague articles in their wiki, but the API documentation is basically useless. That makes it quite hard to develop anything on a DLTK basis, and frankly, I’m worried about the scenario where DDT hits a huge problem with DLTK. The actual issue I encountered looked quite easy: how can I get the libraries from a different project and resolve their paths relative to the workspace? So far I haven’t found anything useful for this. So for the time being I had to hack the references to other projects with a simple “../[projectname]/src” as an import directory and “../[projectname]/lib/[projectname].lib” as an input file. The real solution would be to get access to the other project’s DeeBuildOptions object, which could provide the necessary settings for the output file and its build type, and to its IScriptProject info for the source folders, as there can be more than one.
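For the path-resolution half, I would expect the plain Eclipse resources API to be the way forward, along these lines (a sketch of the direction only; the lib/[projectname].lib layout is the assumed default, not something queried from the dependency’s DeeBuildOptions yet):

    // Resolving a dependency project's default library output through the
    // workspace, instead of a hard-coded "../[projectname]/lib" relative path.
    import org.eclipse.core.resources.IProject;
    import org.eclipse.core.resources.ResourcesPlugin;
    import org.eclipse.core.runtime.IPath;

    public class DependencyPaths {
        public static IPath libraryFileOf(String projectName) {
            IProject project = ResourcesPlugin.getWorkspace().getRoot().getProject(projectName);
            // getLocation() is the project's absolute filesystem location.
            return project.getLocation().append("lib").append(projectName + ".lib");
        }
    }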

In other words, to use this new feature the user has to make sure that the directory structure is the default one and that there are no additional source folders. It also probably won’t work with anything but DMD, and possibly only on Windows. The next target is to eliminate these limitations.

The corresponding change set in the source code: r341098a21488