Catch up with the latest developments

I haven’t posted here for a while, but that doesn’t mean there were no developments. Hence a quick round-up of what’s going on.

Bruno rolled out DDT “John Mirra” v0.6.0. My searches haven’t turned up anything relevant about what the name of the new release means, so don’t ask. A tiny bit of my code made it into this release, and as far as I know, apart from an earlier small snippet, this is the first contribution ever incorporated from my branch. Wonderful.

I went on with the development of my pet features, this time around mostly the expression evaluator. The primary goal of evaluating expressions in the IDE is to deduce the type of an expression for further actions, such as determining the members of the result of a particular expression. The first and most obvious use of this system is type inference for auto variables. At the moment this is the only place where my expression evaluator is called, since the search system is a bit messy in terms of its code/data model, but I will soon implement the other use cases, for example using qualified expressions instead of qualified references. Something like “(new Foo()).” should drop you a list of the methods of Foo.
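
To make the goal concrete, here is a tiny D sketch of the kind of deduction the evaluator is expected to perform (Foo and bar are made-up names, not anything in DDT):

    class Foo {
        void bar() {}
    }

    void test() {
        auto f = new Foo();  // the evaluator should deduce the type of f as Foo,
        f.bar();             // so that completion after "f." can list Foo's members
        (new Foo()).bar();   // a qualified expression should offer the same list
    }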

Bruno and I had a serious exchange of emails about how contributions are accepted and how they are incorporated into the main DDT line. I made some decisions regarding how I contribute to DDT development, since I had some issues with the current practice. It’s not that I don’t appreciate Bruno’s methodology, but I think there is something fundamentally wrong when I have been actively around the project for a year, and have spent more than a year browsing the code and familiarizing myself with Eclipse development, and the only contribution accepted during this time was my icon overlay work, which is only a minor feature. This is what triggered a bit of reflection, and I decided to approach the whole development in a completely different manner.

I add features to DDT because I miss them. I love the language and I need modern tools so that I can use it as comfortably as I am used to with the other languages I use professionally, like C++. DDT is a project with only a single main developer, with me as a satellite contributor, and given the rate of releases, the project has a serious lack of momentum. So I decided to build up my own version of it, which means that I will publish my own releases in much the same way Bruno does. I like the “release early, release often” philosophy, and I firmly believe that with the current model we won’t be able to attract more attention; we should be happy if we can keep even the already low frequency of community responses, let alone gain new developers. The latter looks even more hopeless because those who are into native development rarely consider getting involved with Java, I guess. Also, the Java crowd has a strong puritan tendency within it, which leaves only a tiny subset of hackers who could ever possibly come around to D, and to DDT.

This is not a split, this is not a fork!

A screen shot of the type inference:

related code

Free Software, Free Data

Many articles are around on these issues, but it seems that simple campaigning just isn’t enough. A new article would not hurt the cause either, though it probably won’t achieve much. Perhaps the problem is that FS activists are confined to verbal propaganda. Also, as the “alternative” movements appeared, and appeared in a form that many people would call more practical compared to the “radical” Free Software movement, our camp was decimated. And then there’s rms and the FSF’s troubled relationship with the wider movement of free software activists.

Let’s deal with the actual issue at hand. A fair part of our society today consists of users of computers and computer-like devices. According to the Internet World Statistics there are already 2.267 billion users of the internet. That’s a pretty huge number. They don’t necessarily have a computer or similar device of their own, but they can use computers and the internet in internet cafés. But we need to take this further. Today every government agency, NGO and larger business organisation relies heavily on computers. I would risk saying that most people on our planet are either computer users themselves, or the organisations responsible for some of their dealings use computers and internet-connected applications. I have no data on this, so don’t take it as a fact, but I think it is an educated guess given the number of internet users above and the fact that computers were invented for statistical, bureaucratic tasks in the first place. Also consider the widespread web-cam sex business, where a considerable number of the women are from countries like Thailand or Nigeria. And, as a quick search will show, there are more than 90 million GSM subscribers in Nigeria alone, and according to the Wikipedia page about the telecommunication sector of Nigeria, internet access points are as widespread in the cities as in any European country. In short, there are many hundreds of millions of computer users and many more who are indirect users of telecommunication and are affected by these technologies, willingly or unwillingly. Indeed, it may well be most of the human race. (Not to mention cats. Just googling the term ‘cat video’ gives 1.75 billion hits, which is quite impressive in itself.)

With this staggering number of users, one can imagine what a huge market has opened up in the last 44 years. For a great part of the people today, life is inconceivable without the constant communication lines between themselves and the world. For many, losing the connection to Google Maps combined with GPS would be equal to the physiological loss of navigation, and losing access to their Facebook account or BlackBerry Messenger would mean catastrophic isolation. It is no exaggeration to say that if, in this heavily computerised world, the software isn’t free (free as in free speech) and the so-called public institutions do not make use of Free Software, our lives are exposed to all sorts of threats, ranging from severe disruption of our social life to giving away control over our lives completely (I know this is largely the case already, due to wage labour, patriarchal relationships and central democracy, but at least in consumption we have so far been treated as if we were free and sovereign).

But this has come to an end, because economic life has a natural drive to invade whatever territory it can find in order to proliferate. It started with undisclosed source code, and now we have hardware-supported DRM, walled gardens, software patents, the cloud and the copyright mafia. Of course, the attention of the internet is mostly on internet-related policies, but there is something profound in the sheer magnitude of effort that individual companies and entire business cartels are willing to put into lobbying for their favoured internet legislation, or into “protecting” their products from their own users(!). The number of different types of businesses related to the internet reveals the proportion of investment and capital that flows into the computing and telecommunication sector. That’s because these are strategic industries from a business and political point of view.

For some reason, most users don’t care about this whole thing. Even numerous software developers ignore the free software movement at best, while others are openly hostile. So what are the reasons? One of the most fundamental questions the free software movement should answer at this point in time is: why are we largely ignored? Of course, there is hype around Open Source, and at times it looks (or rather looked) trendy even among politically motivated users. Activists of all sorts have discovered that if their efforts go against some powerful interest, they have to make damn sure that the tools they use are clear of any hijacking, and that their channels of communication are as secure as the level of threat they face demands. As communication, and especially the internet, is a big copy machine, it is most reasonable to be cautious and suspicious of electronic communication. This need for electronic privacy brings the question of Free Software to a practical level. I emphasize Free Software over Open Source because the latter is only designed to be open for production, while Free Software also emphasizes (among other things) the freedom of users at every level. Any tool is only as reliable as the user can verify it to be. Black box products are extremely dangerous in any area of life, and so they are in the software/hardware world of computation and communication.

Indeed, in some areas people tend to be more conscious of the products they consume, as is the case with food: there is already an ongoing movement to raise general consciousness about what one eats and what materials are used in the process, and it advocates non-processed foods, since processed ones can and do contain untold, unknown components and processes that pose a significant risk to our health and general well-being. It is bewildering, however, to see that the same people who argue zealously for consuming unprocessed, organic food do not pay the same attention to their computerised or other devices, and are happy to jump on the current trendy technological bandwagon at any given turn. Although I can only speak from personal experience, I can see some correlation between Apple product consumers and the movement for healthy, unprocessed food.

So why aren’t people conscious of their gadgets? Why aren’t people conscious of their personal creations and data in general? The previous metaphor works even further. Although the healthy food movement is far more successful than the Free Software movement, it is light years away from complete success. The ready-made food business, along with fast-food chains and restaurants in general, is flourishing despite the now widespread organic food stores and the constant networked propaganda for healthy food and a healthy lifestyle. Although it is a complicated issue, I believe most of it can be traced to a single cause that is common to healthy food, hardware and software. That is capital.

Many Free Software advocates would explicitly deny that Free Software is a socialist, communist or any other such movement. And they are right, of course, because Free Software in itself does not have any inherently socialist feature. One can demonstrate an ideal production process based on FS well within the capitalist, market-driven framework. Let’s take a possible scenario and say that Intel creates a new CPU with a new instruction set. We also have a free compiler collection, called the GNU Compiler Collection. Intel’s management is well aware that the success of its new product depends on how quickly the new features can be adopted. They can either hire a company to create an exclusive compiler suite, which is an enormous task to begin with and takes years of development to reach reasonable stability; and while they were doing so, they would lose a lot of time in introducing their new product to the world of software developers. Or they can hire a company to create a back end for GCC. The advantages are already there: given the compatibility between the different generations of Intel CPUs, they have a good chance of finding a back end that lacks only the very newest features Intel wants to introduce to the world. The other thing is that the GNU Compiler Collection is a Free Software project, meaning that you can access and read the source code, you can modify the source code, and you can distribute the changes you have made. So the development company, bearing of course the obligation to publish the source code of its modifications, can implement the new features in the compiler collection based on the technological support it gets from Intel. Also, after they release their new back end for the new generation of CPUs, the GCC implementation can serve as a reference project for many others, where developers can take advantage of already working, running and useful software to implement whatever software they like. Financially, Intel is better off because the new CPU will be supported in many software environments, which in turn will sell the new CPU better. The company that was hired gets paid for the job and can pay the developers who participated in the project. Nothing has been given away for free in this process: a contract was made, and the contract was satisfied. Nothing socialist, nothing communist, no intervention in the market process. And not least, the people who did the work got paid.

The world is, however, not an ideal place. Software companies, like any other companies, are seeking new ways to make bigger profits. And in doing so, they realised that if they withhold the source code of their products, giving away only the binaries, no other company can be hired for the next generation of changes to the software. In the previous description, the source code of GCC represented the public domain, and Intel and the software company represented the private sector. In the example above I described a free-market ideal, where private economic entities, like Intel and the software development company, act in their own interest and make a profit, while the general public gains more wealth in the process. But if a business entity withholds the source code, it effectively destroys the public domain; it no longer acts in the interest of society as such, and its profit motives do not improve the life of the rest of society. In other words, these companies are expropriating the public domain in order to become monopolies.

This blog entry was only the beginning. I will try to dig deeper into the topic, as I have many ideas and have spent a good deal of time thinking about the future of our computing, and indeed about the methods of organisation of society and of its producers.

ANTLR-based parser passes work phase 1

This one took a while. Almost too long; I nearly dropped the whole thing at least twice. But fortunately I didn’t. So I can proudly announce the ANTLR parser, which in its current state is able to satisfy the “DDT Tests – Core” suite, and I’m proud of it. The first difficulty at hand was that I had no previous experience with ANTLR; indeed, I had no previous experience with compiler/parser generators at all. Then there was the problem of understanding how ambiguous parsing rules can be resolved in ANTLR.

Then there was the problem of getting out of sync with Bruno’s work. And there was an enormous amount of work on the AST classes to get the proper constructors in place, so that the AST can be created without the descent.compiler format converter. And then I had to chew myself through 300 test cases to make them work. All done.

This work started with a few attempts before I settled on the basic framework of how I would proceed. At this point I can see major drawbacks in how it works, but this is because I didn’t want to invent a completely different AST management scheme, so that all the rest of DDT could go on as if no parser replacement had occurred. This is the reason why the grammar file is so bloated with action code.
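
To give an idea of what that action code looks like, here is a hypothetical rule in ANTLR 3 syntax, written in the style I described; the rule name, the token names, the declDefs return value and the DefinitionClass constructor are all illustrative, not copied from the actual grammar file:

    // Illustrative only: the rule builds an existing DDT AST node via
    // embedded Java action code instead of ANTLR's own tree construction.
    classDeclaration returns [DefinitionClass node]
        : 'class' name=Identifier '{' members=declDefs '}'
          {
            // create the existing AST node directly, so the rest of DDT keeps
            // working as if no parser replacement had happened
            $node = new DefinitionClass($name.text, $members.nodes);
          }
        ;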

Let’s see what is yet to be done:

  • Error handling. There is only minimal error reporting/recovery support in the parser, just enough to get it running, but it is not at all sufficient from the user’s point of view.
  • Review all the latest and supported language features and correct the functionality accordingly.
  • Incremental parsing. This is something I will not be able to address in the near future, because it seems quite a tedious piece of work. However, it seems quite reasonable to implement, as it promises a huge performance improvement. (Say the user has a line like “class Foo {}” and types inside the block, making it “class Foo { int a; }”. An incremental parser would cut right to the declDefs of the DefinitionClass instead of going another round and building the whole AST from scratch.)
  • Performance. At the moment the grammar file results in huge parser code, and there are plenty of lookahead checks in the grammar (see the sketch after this list). Perhaps some of them are avoidable with some cleverness, and parsing is where performance improvements are most needed.
  • An ANTLR-compatible AST hierarchy. This would bring some linear performance gains and some code clarity (removing the action code blocks). The problem, when I looked into this solution, is that ANTLR doesn’t seem to have decent heterogeneous AST support; all we have is a token-based constructor. It could still be useful in the case of operators or single-token rules, like some attributes.
  • Building the parser will require the ANTLR binaries and additional build rules. For releases, the generated lexer/parser sources should be attached to the code tree as they are.
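
As an illustration of those lookahead checks, here is a hypothetical ANTLR 3 fragment (the rule names are made up, not from the actual grammar): a syntactic predicate speculatively parses an entire alternative before committing to it, which is exactly the kind of work that adds up during parsing.

    declarationOrStatement
        : (declaration) => declaration   // try the whole declaration first
        | statement                      // otherwise fall back to a statement
        ;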

All these things would be nice to have, but only the first is needed for the complete replacement of the descent.compiler.

Corresponding code here: http://code.google.com/a/eclipselabs.org/r/gyulagubacsi-ddt-contrib/source/list?name=feature-antlr-parser

LabelProvider improvements

Greetings,

Ever since I tried to get involved in the DDT project, I have wanted to improve the label provider. Although Bruno did not merge much of my changes into the main line, it seems they have inspired some changes since then. One of the painfully missing things was a visual aid for the elements’ protection level. The reason the merge did not happen before is that it was supposed to be shown as an overlay.

I opened a feature branch to improve the label provider, and my first step was to show the protection level icons. Now, since I don’t think it is wise to use a completely new set of icons, and I’m not a talented graphic designer either, I took JDT’s protection level icons and applied them as an overlay on all elements except functions and variables. Variables/fields and methods/functions are treated similarly to JDT: they get a completely different icon. If this is still unacceptable, I’m willing to add an option where it could be set to use the original method/field icons with an overlay in a normal mode, and a “JDT-style” mode which would work as described above.
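
In plain JFace terms the idea looks roughly like the sketch below. The actual branch goes through DLTK’s ScriptElementImageDescriptor_Fix rather than a helper like this, so treat the class and method names here as illustrative:

    import org.eclipse.jface.resource.ImageDescriptor;
    import org.eclipse.jface.viewers.DecorationOverlayIcon;
    import org.eclipse.jface.viewers.IDecoration;
    import org.eclipse.swt.graphics.Image;

    // Illustrative helper, not the code in the branch: compose a protection
    // level overlay (borrowed from JDT's icon set) onto a base element icon.
    public class ProtectionOverlayHelper {

        public static Image decorateWithProtection(Image baseIcon,
                ImageDescriptor protectionOverlay) {
            // The quadrant is a matter of taste; bottom-right is used here
            // only as an example.
            DecorationOverlayIcon decorated = new DecorationOverlayIcon(
                    baseIcon, protectionOverlay, IDecoration.BOTTOM_RIGHT);
            return decorated.createImage();
        }
    }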

The only issue with the solution is that I had to modify the ScriptElementImageDescriptor_Fix class in order to make it more flexible. I don’t think it is a bad choice, but I recognize that it could pose a problem with the integration of DLTK 4.0. If it comes to that, I will fix it anyway.

It would also be nice to have the return type description in a different colour, but that’s gonna be another story.

You can find the branch here. Also, here’s a screenshot of how it looks:

Time for tidying up

Now that Bruno has posted the contribution guidelines, it is time to clean up the mess I made and separate the different features from the master branch and from each other. I’m afraid it won’t be easy, given that my public master branch is already littered with these feature fragments.

First I have to identify the unrelated commits in the history in order to determine the new feature branches. I was working on several problems, and most of them aren’t ready to be merged into the main development line.

  • ANTLR parser. This feature involves the AST classes, removing the old ANTLR bits, and adding the new ANTLR grammar. A potential issue could be the constructor code that was added in the type-inference branch, which I integrated into master in order to access those modifications.
  • The D element label provider. This feature aims to bring a more JDT-like icon set to the necessary places, such as the Outline view, the Script View and the completion proposal list. I found that basically this is not much more than one commit. There was also a hack to get the module name into a module definition label, but I’m quite uncertain about that change. It is a complication, though, as I submitted that change at a random time, so I can’t use a range of commits to separate it out. Perhaps, if it is easy to do, I should simply get rid of it. This feature would add new icon files and change the DeeModelElementLabelProvider class.
  • Static library support. Now, this is a worthy feature and the one closest to being completed. However, the issue mentioned above makes it complicated to separate into its own branch. After the separation I should ask Bruno to have a look at this feature. The affected code is the builder and the project preferences, and it spreads to places like DLTKModuleResolver in the core.parser package. The change sets are scattered here and there.
  • Bracket inserting. This should be quite straightforward, as it is only one commit.
  • Type inference. The code submitted for this feature does nothing really interesting at the moment. It contains some key refactorings, however, that should probably be used later for several reasons. One is to have a visitor structure that is sufficient to work with all AST classes; the other is to replace the getMemberScope() method (and later probably other AST member methods) with visitor-based processing code. The actual type inference code is only exploratory, trying to integrate the DLTK’s type inference basics as entry-point code.
  • Renewing the completion proposal collecting code. The visitor refactoring above could potentially be useful for collecting completion proposals and would be better than the existing code, which is quite obscure. This completion proposal system should be aware of the priority listing, keywords, templates, and all resolvable nodes, not just references; the latter means completing members for expressions (such as a cast of an object). It should also contain the changes to function definition proposals, but that is not working at the moment (the feature where the argument list is pre-filled with the parameter names and works like the template suggestions).

At the end of the day, I had to explore the weird world of rebasing, as I had never really bothered to learn it before. That’s because every time I encountered it, there was always a note saying that it could mess things up pretty badly, and some even say that it is like lying.

Rebasing in git means taking a range of commits and “replaying” them on top of a branch or a specific revision. This is the perfect tool for the job I am about to perform. The workflow goes like this: take the deviation point in my master branch, pick the commits that are relevant to a given piece of functionality, and apply them onto the deviation point itself. Sounds almost too simple.
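
As a concrete sketch of that workflow (the branch name is one of my real feature branches, but the angle-bracket placeholders stand for actual commit SHAs and are purely illustrative):

    # Start a new feature branch at the point where my master diverged
    # from Bruno's master:
    git checkout -b feature-labelprovider-improvement <deviation-commit>

    # Replay only the commits that belong to this feature:
    git cherry-pick <commit-1> <commit-2>

    # For a contiguous range, rebase --onto does the same in one step:
    # it replays the commits after <base> up to <topic> onto <deviation-commit>.
    git rebase --onto <deviation-commit> <base> <topic>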

However, there are problems I need to be aware of. One is the problem of rewriting history: I can’t rebase a tracking branch and expect to push it easily to its remote. Once I have pushed a branch somewhere, rebasing it is not really an option anymore. So if that happens and I screw up (as happened with my first attempt at this, on the feature-static-library branch, where I included commits that render the branch virtually impossible to merge into Bruno’s master), there’s no way back. The only thing I can do, provided nobody depends on my repository, is wipe out the whole thing and push the necessary branches from scratch. At the end of this exercise, that is likely to happen anyway. The only person I share my repository with is Bruno, and he doesn’t depend on any of my current branches.

That pesky file name capitalization issue is really annoying. If I want to switch to a branch that has the previous file name, I have to delete the file to get rid of the problem. Not only that, I ran into this while performing the rebase itself, which is even more annoying, as I have to fix the name with an additional commit.

Finally, I mastered rebasing, so now there’s a new repository I’m working with. There are three feature branches so far: feature-static-libraries, feature-labelprovider-improvement and feature-antlr-parser. These branches grew from Bruno’s latest master, so there should be no problem merging them into the main line.

LL(k) -> ASTNeoNode

I have recently been working on an ANTLR-based parser for the DDT project. As phase 1, I’m trying to get the current AST hierarchy working under this new parser without having to rely on the Descent parser. It isn’t finished yet, but it has reached a level of progress where it is perhaps worth a look by others. To show what is missing, here’s my sketchy list:

  • Missing import expression node in the AST
    import expression : 'import' '(' assignExpression ')' ;
  • Clarify the function literals
  • Missing static if expression.
  • Missing static assert expression.
  • How to deal with conditional statements (version, debug, and such)?
  • Struct initializer must be implemented.
  • What is ExpIftype for? Is it something to do with the template stuff? Or with static if?
  • Is IsExpression in the AST somewhere? (It’s a pretty tough expression, btw.)
    isExpression
    : is ( Type )
    | is ( Type : TypeSpecialization )
    | is ( Type == TypeSpecialization )
    | is ( Type Identifier )
    | is ( Type Identifier : TypeSpecialization )
    | is ( Type Identifier == TypeSpecialization )
    | is ( Type Identifier : TypeSpecialization , TemplateParameterList )
    | is ( Type Identifier == TypeSpecialization , TemplateParameterList )
  • Template declarations are missing in the parser rules.
  • Template instances are missing in the parser rules.
  • Proper attribute specifier implementation. (That is, accumulate all the attribute specifiers onto the corresponding definition.)
  • Error handling and error recovery resembling the Descent parser’s.

The actual state is on my clone’s master branch here:
http://code.google.com/a/eclipselabs.org/r/gyulagubacsi-ddt-root/sour…

As I am trying to replace the current parser without changing much of the current state of the code (I made only a few of the most obvious modifications, such as constructors for creating AST nodes without the conversion process, and in a very few places I added extra fields to existing nodes), I wouldn’t add any new features at this point, or mess with the AST hierarchy.

This is my first try at creating a parser with a parser generator, and I admit that in many places the current state needs improvement. Later on, we should change the ASTNeoNode hierarchy to work with ANTLR’s CommonTree object, which would eliminate the need for creating the individual objects in action code. (I’m not completely sure how, but I think it is possible to create heterogeneous trees with ANTLR using a factory pattern, which in turn would require the ASTNeoNode classes and interfaces to be more consistent than they are today. As an example of the inconsistencies, in the current state some nodes use ArrayView, others use simple arrays of objects, and so on.)
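
A rough sketch of that factory idea, assuming ANTLR 3’s TreeAdaptor mechanism; the node classes mentioned in the comments are placeholders, not existing DDT classes:

    import org.antlr.runtime.Token;
    import org.antlr.runtime.tree.CommonTree;
    import org.antlr.runtime.tree.CommonTreeAdaptor;

    // Sketch only: a custom adaptor decides, per token type, which
    // (future, CommonTree-based) node class to instantiate, removing the
    // need for explicit object creation in the grammar's action code.
    public class DeeTreeAdaptor extends CommonTreeAdaptor {
        @Override
        public Object create(Token token) {
            if (token == null) {
                return new CommonTree(token); // nil/imaginary root nodes
            }
            switch (token.getType()) {
                // Real cases would dispatch on the generated lexer's token
                // types, e.g.:
                // case DeeLexer.KW_CLASS: return new ClassNode(token);
                default:
                    return new CommonTree(token);
            }
        }
    }

The generated parser would then be pointed at such an adaptor via setTreeAdaptor(), assuming the grammar is switched to ANTLR’s AST output mode.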

Under The Bonnet: Parser, Lexer… ANTLR

Sometimes it goes like this: you start to make your well-defined contribution (type inference), and as you assess the job to be done, you realize that some parts are bigger than you thought. So you start looking into them from time to time, and only after a while do you realize that you are a step further down, in the heart of the project. This time I fell into the parser, but for a reason: at work I had been writing a toy parser for a server in the good ol’ hand-crafted way, and I found it fun. Originally I had the idea of using bison, ANTLR or something similar to generate the parser, but I rejected it because I was afraid I couldn’t learn enough about compiler generators to finish my task in a reasonable time.

In recent weeks I have tried to get my head around the core of DDT. I found the current visitor accessibility unfortunately narrow for semantic analysis, type deduction or type inference, and ran into some artefacts in the AST design. Well, this is always good news, as someone has to deal with these issues. Soon enough I found that the lack of a parser targeting our AST directly is a source of unnecessary complexity, not to mention possible overhead. I say possible, because I couldn’t make myself carry out detailed measurements. OK, OK, I’m lazy! Anyway, the obvious choice was to look into the question of an ANTLR parser.

Some initial experiments taught me that I shouldn’t follow Walter’s BNF-style description, because 1. it is heavy with left-recursive rule definitions, which need serious refactoring, and 2. there are too many differences between our current AST hierarchy and the one the D documentation uses to explain the language.
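
To show what that refactoring means in practice, here is a hedged example: the spec-style additive rule is directly left-recursive, which ANTLR (v3) cannot handle, so it has to be rewritten as iteration. The rule names follow the spec loosely and are not taken from my grammar file:

    // Spec style (left-recursive):
    //   AddExpression:
    //       MulExpression
    //       AddExpression + MulExpression
    //       AddExpression - MulExpression
    //
    // Rewritten as iteration for ANTLR:
    addExpression
        : mulExpression ( ('+' | '-') mulExpression )*
        ;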