TikiFestStrasbourg
At this time, I think there is nothing more important in an Open Source project than to have face to face meetings. I spent the last week in Strasbourg for the TikiFest. I don’t think any of us really knew what was going to happen. I had this idea that it was a perfect moment to release the next major version of Tikiwiki. Some thought it was about security or other technical aspects. We were all wrong. During those days, we got to know each other a little more. For the last few years, we have been talking over IRC, exchanging an occasional email, but we really didn’t know each other. I think that this alone will change the project’s dynamics in the future.
Sitting together in the same room working on the project made it clear that we were all facing similar issues. We could discuss them and see how to improve the situation in the future. No mailing list or chat session could have done as much. We would get up early in the morning and not really see time fly by, alternating between discussions and code. Only our stomachs could bring us back to reality, and that was usually very late.
One of the most important changes that we made was to change the release schedule, or to create one for that matter. It has been over 3 years since the last major release. Trunk changed so much, we barely know what has been added. The changelog is terrifying and certainly not useful to anyone because of its sheer length. We decided to release twice a year. The primary reason why we could never really release is that new stuff kept being added in. The reason it came to be that way was that everyone needed that other feature to be added before the release. Otherwise, it would be years before it would be part of a stable release. Well, it has been years since we had a major stable release.
Getting more frequent releases was an easy decision. The discussion was, in fact, very short. I think we had everything settled on the morning of the very first day. Later on, this allowed us to assign dates for some major changes, like fully dropping PHP 4 support. Time-based releases have huge advantages when it comes to open source projects. They allow us to observe how things are evolving, make decisions and still provide users with a decent amount of information. It’s not about saying that somewhere in the future we will drop PHP 4. We can provide a month and base it on the growth of PHP 5.
I really have the feeling I am now working on a new project, even if very little of the code changed during those last few days. Just like many other events where developers gather and discuss critical issues, I have some doubts about whether everything discussed will end up being done. However, I have a good feeling. The team is committed and deeply involved in the project. It’s a great thing to know we all share a common vision.
During the entire week, we didn’t have many heated debates (except maybe one where I burst out more than I should have). Some things are just plain good for a project, and when a project has run without any major guidance for years, some basic, typical structures are welcomed by everyone. One of the only issues we couldn’t quite agree on was what to accept in stable branches. Tikiwiki has a long history of being free for all, which was convenient for every major contributor. With the new model, stable branches are required. Not accepting new features is obvious. Some minor enhancements are acceptable, some not. UI enhancements are even harder to decide on. What is a bug anyway? Is it a major one? What is major? What to accept is an entire gray zone. It’s extremely subjective. I had to roll back commits. I hate doing that. Without people to discuss the issue in front of me, I don’t think I could have done it. In total, we reviewed around 8 commits made after RC1. It took us at least 2 hours to decide which ones we couldn’t accept.
Rejecting someone’s contribution is hard. At least I could motivate myself by the fact that this was only the stable branch. They could still apply their patch on trunk. However, with faster release cycles, we will have to keep trunk as somewhat stable. We already have this idea of experimental branches (or feature branches) for new things to mature before being merged, but it really adds overhead. Subversion is quite hard to handle on the branching and merging issues. If you are familiar with version control, it really all makes sense, but it’s a barrier to entry. I started writing scripts to do the hard work, but they still require people using a shell to use them.
Establishing a software process to ensure quality without breaking the team dynamics is hard. Doing it with a community of volunteers is even harder. Some well known best practices just don’t apply. Unit testing? Forget it. Iterative development? We don’t even know what people will be working on or when they will have time to do it. With at least 6 years of development, people got used to their way and, let’s face it, it worked so far. The only part that didn’t work is that the software could never quite release. The decisions we took in Strasbourg were required to improve the releasability of Tikiwiki. I hope it will work out with the community.
For every decision other than the release cycle, we basically had to take accountability for it. We built some sort of road map, but it can’t really apply to anyone else as we can’t assign tasks to anyone. People will keep volunteering on what they want and we have no control over that. This leads to a strange phenomenon in the roadmap. The roadmap is more about dropping things than about adding things. Tikiwiki is huge. It has features for everything. Letting people add whatever they want has always been very important. It led to features no one could have imagined, through incremental changes and clever combinations of features. However, for all the great things that happened, some things were added and forgotten by their authors, and remained obscure, unmaintained and unused. For all we know, they might not work at all anymore, having rotted and broken. These things are a burden for the project administrators, so they will gradually be removed. Ideally, things like removing unmaintained features and dropping support for older technologies will reduce the weight we have to carry and allow us to build greater things.
He’s right, again
In some theoretical sense, developing the complete infrastructure before implementing any visible functionality might be efficient, but in a practical sense, managers, customers, and developers begin to get nervous when too much time goes by before they can actually see the software work. Infrastructure development has the potential to become a research project in creating a perfect theoretical framework[…]
Sounds like a familiar problem? Taken from Software Project Survival Guide. I can’t say it’s my favorite from McConnell, but it’s still right on.
Motivation Driven Development
Today is just one of those days when I can look back and laugh at my own behavior. I have been working on a personal project for a while now (should go public soon). Of course, it started off with many great ideas and I could have fun just thinking about it. When the time came to actually code it, motivation dropped. The problem was really that, while I had a great goal, before getting close, I had to get all the ground work done.
What happened? Well, it froze. I stopped working on it for months, until recently. When I got back to this project, I came back because there was something specific I wanted to implement in it. When I took a good look at what I had in progress, I realized that the things I was focusing on were not getting me any closer to my goals. I just left everything as is. Tests were running. Some code was not used yet. No problem.
Instead of starting over from where I was, I mapped out the high level features I wanted it to achieve and wrote down a road map. It was not based on building good foundations, not based on a good architecture. It was based on what’s needed for the software to be useful at all and what I felt like working on. What did it change?
- Changes were visible on the final product
- At every step, I would get closer to being able to use it, and find out different ways to use it
- I got motivation to work on the project
- The project evolved more in the last two weeks than ever before
So, why today? Well, that feature I had half started months back, I finished it today in just 30 minutes. Months of no progress to avoid 30 minutes of work. It’s not that it was long, certainly not hard. It was a boring task. It was necessary, but alone it did not do any good. Today, writing it enabled a very powerful feature. Even if it was boring, I was happy to do it because I would then see the whole thing in action. It’s not quite complete yet as it still misses a few critical features required for normal use, but I can already use it for my own needs, which is great.
Now, if I had done it a few months ago, I’m pretty sure it would have been more than 30 minutes of work. It just takes me more time to do work when I’m not motivated. It’s also likely that the feature would have been more complete. Rather than doing what it has to do in order to be useful, it would have done whatever made it less boring to write. Gold plating? Scope creep?
The scary part is that I’m pretty certain it’s not the first time I’ve dropped a project just because I didn’t feel like doing a tiny little part.
Who reads code samples?
Recently I have been reading Programming Collective Intelligence by Toby Segaran. I love the subject. It’s all about handling large data sets and finding useful information in them. Finally an algorithm book that covers useful algorithms. I don’t read code-centric books very often because I think they are boring, but this one has a great variety of examples that keep it interesting as the chapters advance. There are also real world examples using web services to fetch realistic data.
My only problem with the book is that there are way too many code samples. It may just be my training, but there are some situations where just writing the formula would have been a lot better. Code is good, but when there is a strong mathematical foundation to it, the formula should be provided. Unlike computer languages, mathematics as a language has been developed for hundreds of years and it provides a concise, unambiguous syntax. I like the author’s effort to write the code as a proof of concept, but I think it belongs in an appendix or on the web rather than between paragraphs.
Which one do you prefer?
def rbf(v1,v2,gamma=20):
    dv=[v1[i]-v2[i] for i in range(len(v1))]
    l=veclength(dv)
    return math.e**(-gamma*l)
or
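A reconstruction of the second option, derived from the code above. It assumes that `veclength` returns the Euclidean norm of its argument; if it returns the squared length instead, the norm in the exponent would be squared.

```latex
\mathrm{rbf}(v_1, v_2) = e^{-\gamma \, \lVert v_1 - v_2 \rVert}, \qquad \gamma = 20 \text{ by default}
```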
For that kind of code, I vote #2 any time. I’m not a Python programmer. I can read it without any problem, but that vector subtraction did seem a little arcane at first and it took me a few seconds to figure out, and I’m about certain that even a seasoned Python programmer would have stopped on that one. It’s not that it takes really long to figure it out, but it really keeps you away from what is really important about the function. What was important was that you want to give points that are far away from each other a lower score than those that are close by. Anyone who has done math could figure it out from the formula because it’s a common pattern. From the code, would you even bother to read it?
This is a very short code sample. In fact, it’s small enough that every single detail of it can fit into your short term memory. Here is an example that probably does not. In fact, I made your life a lot easier here because this code was scattered across 4 different pages in the book.
import math

def euclidean(p,q):
    sumSq=0.0
    for i in range(len(p)):
        sumSq+=(p[i]-q[i])**2
    return (sumSq**0.5)

def getdistances(data,vec1):
    distancelist=[]
    for i in range(len(data)):
        vec2=data[i]['input']
        distancelist.append((euclidean(vec1,vec2),i))
    distancelist.sort()
    return distancelist

def gaussian(dist,sigma=10.0):
    exp=math.e**(-dist**2/(2*sigma**2))
    return (1/(sigma*(2*math.pi)**0.5))*exp

def weightedknn(data,vec1,k=5,weightf=gaussian):
    dlist=getdistances(data,vec1)
    avg=0.0
    totalweight=0.0
    for i in range(k):
        dist=dlist[i][0]
        idx=dlist[i][1]
        weight=weightf(dist)
        avg+=weight*data[idx]['result']
        totalweight+=weight
    avg=avg/totalweight
    return avg
or
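A reconstruction of the formula version, derived from the code above. The symbols are my own choice of notation: each data point is a pair $(x_j, y_j)$ of input vector and result, $v$ is the query vector, and $N_k(v)$ is the set of indices of the $k$ points closest to $v$.

```latex
d_j = \lVert x_j - v \rVert
\qquad
g(d) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{d^2}{2\sigma^2}}
\qquad
\hat{y}(v) = \frac{\sum_{j \in N_k(v)} g(d_j) \, y_j}{\sum_{j \in N_k(v)} g(d_j)}
```

The defaults in the code correspond to $k = 5$ and $\sigma = 10$.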
The formula is insanely shorter, and the notation could certainly be improved. What’s the trick? It relies on well documented language features like vector operations and trims out all the Python-specific code. I actually wrote more than I had to because gaussian itself is well defined in math. Because all operations used are well defined, whichever language you use will probably support them and you can use the best possible tool for your platform. The odds that I use Python when I get to use those algorithms are low, so why should I have to bother with the language specifics?
The author actually included the formulas for some functions in the appendix. I just think it should be the other way around.
The end
After the courses, the exams, the rituals and the various parties, I came to realize it was all finally over. I mostly completed my studies by now. I still have one last course remaining this summer, but most of the people I spent all these years with are moving on now. It really is the end, and I already feel nostalgic for those years.
This last semester was a blast. I don’t think there was a single moment I could relax. All I could do was look forward to a date just to realize I had another blitz coming up.
- January
  - Cross Lingual Wiki Engine project launch
  - CUSEC 2008
  - CLWE core
- February
  - Architecture for Translation Synchronization document
  - Writing the version control presentation for PHP Quebec
  - Mid-term report for school
- March
  - Blitzweekend
  - CS Games
  - PHP Quebec 2008
- April
  - Final report for CLWE
  - Writing the paper for WikiSym08
  - End of semester
This list isn’t even exhaustive because it ignores most normal school assignments, my regular work and various local PHP Quebec activities. No wonder I have not seen time passing by. Next week I will be on vacation under the sun and I am really looking forward to it.
What is ahead? Mostly fuzzy. I refused all offers to work for large companies. I will simply continue my consulting work for a while with a few ventures along the way and I will try to keep doing some work on CLWE. Back when I started college, I had no idea where I would be 3 years later. When I started university, I had no plans for the post-study period 4 years in the future. Nothing has changed in my vision, but it has been a long road already.
Kubuntu 8.04, X.org and xrandr
A few months ago, I made an update on my system running 7.10 and it turned bad. The package actually got corrected very fast, but I had no time to waste so I re-installed my system. I had been upgrading my Kubuntu distribution since 6.06 and didn’t have any CD for 7.10. To get multiple screens working and the non-conventional screen resolution of my laptop, I had been carrying a heavily customized version of my xorg.conf file and a few scripts to handle the different resolutions I could need (dual head at home, single head when on the road, and a cloned 1024×768 for those times I had to present). Upgrading is always a mess, but I didn’t feel like downloading all updates, so I got an alpha release for 8.04. I was surprised to see that it correctly configured my resolution to 1280×800 right away and didn’t need that 915resolution fix.
OK. Maybe that wasn’t the smartest move. To keep the story short, I couldn’t get the extra monitor on my desk working ever since that upgrade. My xorg.conf file would not work anymore and the configuration tools were broken. Since I had too much to do, I couldn’t get the time to investigate, but today I did. It seems like the way X is handled is getting a huge facelift. There was something comforting about the xorg.conf file. If something messed up, you could always fix it by knowing some arcane lines and searching for solutions from lynx. However, it seems like that file is about to go extinct. This is what my file looks like at this time:
Section "InputDevice"
    Identifier "Generic Keyboard"
    Driver "kbd"
    Option "XkbRules" "xorg"
    Option "XkbModel" "pc105"
    Option "XkbLayout" "ca"
EndSection

Section "InputDevice"
    Identifier "Configured Mouse"
    Driver "mouse"
EndSection

Section "Device"
    Identifier "Configured Video Device"
EndSection

Section "Monitor"
    Identifier "Configured Monitor"
EndSection

Section "Screen"
    Identifier "Default Screen"
    Monitor "Configured Monitor"
    SubSection "Display"
        Depth 24
        Virtual 2560 1024
    EndSubSection
EndSection
It really is different from the 222-line file I used to carry over from installation to installation. The only thing specific to my system in it is this section:
    SubSection "Display"
        Depth 24
        Virtual 2560 1024
    EndSubSection
The only purpose of it is to tell the Intel driver to allocate a larger framebuffer at X load time in case I want to extend the Desktop. It made sense before to allocate only what was needed at load, but since you can now change the resolution dynamically, reloading X is not an option.
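As a rough illustration of where a value like 2560 1024 can come from, here is a minimal Python sketch. It assumes my reading of the setup: a 1280×800 laptop panel with a 1280×1024 external monitor placed to its right. The function name `virtual_size` is made up for the example.

```python
def virtual_size(screens):
    """Bounding box of screens laid out left to right.

    Widths add up, and the tallest screen sets the height.
    """
    width = sum(w for w, _ in screens)
    height = max(h for _, h in screens)
    return width, height


# Assumed layout: laptop panel on the left, external monitor on the right.
laptop_panel = (1280, 800)
external_monitor = (1280, 1024)

print("Virtual %d %d" % virtual_size([laptop_panel, external_monitor]))
```

Any layout that fits inside that box can then be selected without restarting X.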
With fairly simple commands, the resolution can now be changed and it’s mostly safe. Never prevented me from using my computer. The only problem is that the GUIs are lacking at this time, so those commands may not be good for everyone and there are some parameters I wish I didn’t have to understand.
To get my second monitor working, I had to use these commands:
xrandr --output VGA --right-of LVDS
xrandr --newmode 1280x1024 108.88 1280 1360 1496 1712 1024 1025 1028 1060 -HSync +Vsync
xrandr --addmode VGA 1280x1024
xrandr --output VGA --mode 1280x1024
Other than the second line, this is fairly simple. Tell it where the second monitor should go, register a resolution (Google told me the line, I didn’t figure it out), tell that the resolution applies to the VGA monitor and activate it. In the blink of an eye, the desktop is extended. No need to restart X anymore.
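Since I will probably want to replay this every time I get back to my desk, the four steps can be wrapped in a small script. This is only a sketch in Python: the helper names are made up, the output names (VGA, LVDS) and the modeline are the ones from my machine and will differ elsewhere, and by default it only prints the commands instead of running them.

```python
import subprocess


def extend_desktop_commands(output, primary, mode_name, modeline):
    """Return the four xrandr invocations, in order, as argument lists."""
    return [
        ["xrandr", "--output", output, "--right-of", primary],
        ["xrandr", "--newmode", mode_name] + modeline.split(),
        ["xrandr", "--addmode", output, mode_name],
        ["xrandr", "--output", output, "--mode", mode_name],
    ]


def run(commands, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.check_call(argv)


# Modeline copied from the commands above; specific to my monitor.
MODELINE = "108.88 1280 1360 1496 1712 1024 1025 1028 1060 -HSync +Vsync"

commands = extend_desktop_commands("VGA", "LVDS", "1280x1024", MODELINE)
run(commands)  # dry run: prints the commands without touching the display
```

Calling `run(commands, dry_run=False)` would actually apply the configuration, and it stops at the first command that fails.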
All this messing around with resolutions was always annoying for laptops running Linux. On desktop, all you have to care about is your own setup. However, with a laptop, the setup can really be anything. Now if I could just have a GUI to select the monitor resolution, I would be fully happy. I guess this is all going to arrive before the final release of 8.04. Still, the situation is a lot better than it used to be.
xrandr could use better documentation, but I doubt it was meant to be used by end users. Just calling the command lists the available screens and the known resolutions for them, so that is easy enough. However, the --help option does not help much with what is expected as an argument. There is some help available on forums and such if you know what to search for.
How can you target the audience?
Saturday morning. PHP Quebec Conference 2008 came to an end. This year, I was making a presentation on version control techniques, branching methods and general quality assurance elements. This is not quite the typical subject you can expect at a PHP conference, but I still think bringing more general software development topics to web developers is important. It was not the first time and I guess it won’t be the last.
However, reading the feedback forms, I found out that there was great variety within the audience and I don’t know if there is a way I can address everyone. How is it possible to give enough information to teach something to those who actually know quite a bit about software development and at the same time not lose those who are self-taught and basically have no structured foundation? I guess this is something I will have to try and solve for the next time I make a presentation. That and my habit of speaking too fast.
Overall, the session went fine and I am happy with it. Other than the fact that I needed 3 laptops to get the projector working, everything went fine. I noticed that I had forgotten my notes after the presentation, so that’s a good sign I didn’t really need them. The negative comments I had in the feedback forms last year didn’t show up again this time, so that’s also a good thing.
Feedback forms are great. Especially when the organization team has someone as dedicated as Anna Filina to type back all comments and hand them out to the speakers within hours of the session. Fill in those forms, write down what you think should be improved. I think most speakers gladly accept criticism when they receive it.
Anyway, if you attended my presentation and have comments about it you didn’t write on the form, feel free to email me. In the same way, if you have questions you couldn’t ask, I would be glad to help out. Who knows, if I ever give this presentation again, I might be able to incorporate a few more details.
Blitzweekend
The Blitzweekend was quite a special event. When I first saw the announcements, I thought it was like our CodeFests. In fact, the concept is very similar, but the community is completely different. It’s simply not the same people. In CodeFests, most of us are from the Open Source community. Our goal is to play with technology, collaborate on projects and work together to learn. At the Blitzweekend, this kind of collaboration did not exist so much. Outside of the Tikiwiki room, all projects were isolated. It wasn’t as much about technology as it was about business.
I still had a great time and met with very interesting people, but it wasn’t quite what I expected. Our goal for the week-end was to make a first release of Tikiwiki 1.10, which we decided to label as beta. Otherwise, it was just a great opportunity to get together and meet other contributors. The rest of Blitzweekend was targeted towards creating a product, starting a company and making a pitch about it. The whole competition thing wasn’t for us, but I thought it was quite nice to see all those ideas.
However, it was hard to evaluate the merit of the projects. Some projects had excellent ideas and made a great pitch, but I couldn’t be convinced that they actually got anything working during the week-end. At best a nice user interface with functional links. No proof it actually works beyond navigation. In other cases, the projects did realize something over the week-end, but it seemed like a lot was already prepared before the first day. I was a little disappointed by the lack of attention the projects made out of self-interest and passion received. Seemed like the fact that the project was useful to you and was a nice accomplishment in a week-end was of no importance. Unless you were hunting for millions, making a presentation was a waste of time.
Don’t get me wrong, I loved the event and have nothing against entrepreneurship, but I do wish the next editions will leave a little more space for technology and collaboration. On Friday night, there was a moment where people had to raise their hands depending on if they were designers, entrepreneurs or developers. The developers and designers in the room were literally a minority. How do you actually expect to get a product out the door with such a proportion? Too much focus on business will drive developers away. Sure, having a contest for business plans and prototypes is nice, but I don’t think that closing the door to the technical community is such a good idea. I think the first step would be to accept that not everyone takes on the challenge of developing something over the weekend with the sole idea of making millions out of it.
The second step would be to make sure that collaboration can actually happen. If it’s not to be with other people and share something, the challenge can be taken from home and does not require anything else. I wouldn’t have had a problem with answering a few questions and helping out other teams during the week-end.
Anyway, the whole thing was very well organized. Location was suitable, although a little far from downtown, and the shared room was a nice place to meet with the other participants (at least, those that ended up taking breaks).
Finally done
After nearly 4 years, I am finally done with the translation synchronization problem. The Cross Lingual Wiki Engine project is far from completed, but at least the change tracking mechanism is. Looking in my own archives, I first wrote about this issue in August 2004. The solution I ended up implementing is in fact very close to what I wrote the second time around in 2006, except that I dropped most of the input required by the user.
The implementation is now part of the TikiWiki CVS and will be included in the 1.10 release (or however it’s going to be called) in a few weeks from now. As it’s licenced under LGPL, feel free to rip the implementation apart and include it in your own projects. If you do so, please tell us on Wiki-Translation. I will be glad to help out.
The architecture actually uses a single table and large SQL queries to do the work. In fact, the implementation is only around 125 lines of SQL and maybe 300 lines of simple PHP to work around the TikiWiki way of handling things. The core implementation took around 6 hours including a proof of concept, then I probably spent another 10 hours working out the queries and extracting meaningful information.
The work accomplished by the SQL is fairly dense. You can also read the article detailing the theory and implementation.
I definitely spent more time writing the documentation than working on the implementation. In fact, writing is short enough. Reviewing is long. Finding problems is hard. Luckily, I had Alain Désilets and Sébastien Paquet to help me in that task. There may still be room for improvements and comments are welcome.
Why didn’t I do it back in 2006? Well, I thought it would have been more complicated, and the problem was mostly theory as I didn’t have an actual test site that would need to handle more than two languages. In fact, the current implementation is the child of many discussions I had along the way. If I had done it before, it wouldn’t have been as good. Couldn’t have been without all those discussions at RoCoCo and WikiSym2007. They brought special use cases and influenced the results to cover much broader situations and do it with simplicity.
Formal proofs may be good after all
In the past few days, I spent a lot of time working on the Cross Lingual Wiki Engine Project. I finally found the solution to change tracking across multiple languages without restricting contributions. In fact, the core of it only took a few hours to write. Then I had to spend a few more to mine information out of the table. Most of the time I spent on the project recently was to write an article on how the thing works.
My primary purpose in writing the article in the first place was to explain the solution to other people working with me on the project. I made a few attempts before writing it down and it didn’t work out so well. I figured I was better off trying to structure my mind before trying to communicate, and I know of no better way to do so than by writing. Since the article is part of an academic project, I figured I should give it an academic feel (and I don’t mean to make it boring). I started writing it in LaTeX using LyX. The tools are just amazing. Using them puts me in flow in a matter of minutes. During my writing session at the Pub, I never noticed that 5 hours had passed, the sun had gone down and the room had become crowded.
Anyway, in giving it some academic feel, I went down to basic math concepts like sets and graphs. It’s quite nice because the theory maps really well to the problem. In fact, it was probably a strong influence. In all those years thinking about the problem, I followed multiple math courses and one of them was discrete maths, so I guess it oriented me towards the solution. It turned out just mapping the architecture to those concepts allowed me to explain even more details. Mathematics is a very expressive language and it has a set of solutions to many problems if you can express them correctly.
So for pages and pages, I go on explaining how it all works in math terms. Then comes the implementation section where I actually explain how it works using real technologies. Today, after the review of a first draft, I decided to add an explanation of an additional query, which happens to be quite central. I didn’t know that before. I went on to describe it and explain in which ways it is correct. That was until I realized it was wrong. Simply not accurate. False in all possible ways. I had the query rewritten twice already because I forgot some corner conditions the first two times. I was pretty certain it was good this time. Mapping it back to theory, I realized how wrong I was.
Then I went back to my code and made an attempt to catch that newly discovered corner condition. After expanding the query significantly (it had 4-5 levels of subqueries by that time), I realized that last level of nesting was really close to the real purpose. I went on and removed some code, and then more. The final expression is so simple. All I did before was run in the wrong direction.
I can’t be certain it’s right now, but at least it fits the model. My advice of the day: when working on a tough data modeling problem, try to explain it in terms an academic would understand. Write the formulas and prove them. Draw diagrams using GraphViz or restrict your diagrams to 2 primitives. Write it in LaTeX to give it all an old-school academic look. The final document will be good, but nothing of value compared to what you will learn on the way.
I still need to make a few changes based on the last review, and will probably go for a second round of review, but that article will eventually be published somewhere. Stay tuned.



