PhpStorm Regular Expression search and replace & Macro function

November 10th, 2014 2 comments

As a long-time Vim user I’ve learned to rely on a couple of time savers that take some getting used to.

Mainly these are regex search and replace and the use of macros. Both have a bit of a learning curve and fairly niche applications. However, when such a situation presents itself they are invaluable time savers.

First is the one that I use almost daily: regular expression search and replace. Whenever you have to do a batch of change operations on lines that are almost but not quite the same, regex search and replace can save you a lot of time.

Today for instance I decided to convert the config system of an application I was working on from a native PHP array format to a YAML file. I don’t hate using arrays as config files for simple things, but putting them in a data storage format allows for some more flexibility. In this case I needed to have a main config file, and a “local” config file with mutations to the main config file. That way I can save the main config file in the project repository and people can keep a local config file that can overrule certain settings, like database settings/credentials and such.

Anyway, that’s beside the point. What this file had was a lot of repetitive lines that were in the same format, but obviously did not contain the same values. Regex to the rescue.


On the left you can see the original PHP array file, at the top (somewhat enlarged) the search & replace screen, and on the right the result after hitting the “Replace all” button.

The regex itself is not a thing of beauty, and it doesn’t have to be. It’s for a one-time operation; it just has to do the job. I won’t go over it in detail, but as a short summary: I want to match the entire line, so I start with “^” and end with “$”. First I match the leading whitespace and put it in a capture group so I can copy the indentation. Then I match whatever is between the initial single quotes; they are all single quotes, so there is no need for complicated quote matching. I put whatever is between them in another capture group, and I make the “zero or more” quantifier non-greedy so I get just the contents I want. Then I have a throw-away “match all” up to the “=>” arrow. Finally, for capturing the value of the array notation I had to modify the catch-all to exclude “(”, because I did not want to match the “‘something’ => array(” lines.

Then I wanted to replace the matched lines with the contents of capture groups 1 and 2 (the indentation and the key), followed by a colon, a space and the captured value, and done.
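As a minimal sketch of the same idea outside the IDE, this is roughly what such a search and replace could look like with PHP's preg_replace(). The pattern and the sample config lines are assumptions made up for illustration; they are not the exact expression I typed into PhpStorm.

```php
<?php
// Sample input: hypothetical lines from a PHP array config file.
$source = <<<'EOT'
    'host' => 'localhost',
    'port' => 3306,
    'database' => array(
EOT;

// ^([ \t]*)     capture the indentation (group 1)
// '(.*?)'       capture the key between the single quotes, non-greedy (group 2)
// .*?=>[ \t]*   throw-away match up to and including the "=>" arrow
// ([^(\n]*?)    capture the value; "(" is excluded so "=> array(" lines don't match (group 3)
// ,?[ \t]*$     optional trailing comma and whitespace up to the end of the line
$pattern = '/^([ \t]*)\'(.*?)\'.*?=>[ \t]*([^(\n]*?),?[ \t]*$/m';

// Replacement: indentation, key, colon, space, value.
$yaml = preg_replace($pattern, '${1}${2}: ${3}', $source);

echo $yaml, PHP_EOL;
```

The first two lines come out as "host: 'localhost'" and "port: 3306" with their indentation intact, while the "=> array(" line is left untouched for manual cleanup.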

Once you’ve used these types of operations a few times, coming up with the initial expression becomes really quick, making it a very fast way to reformat 10 lines or 100. PhpStorm visually shows you what you are matching and how it will be replaced, which makes it very easy to write the expression with immediate feedback on what is going to happen.


The second feature: macros.

By default the macro functions aren’t bound to hotkeys. To fix this I bound them to ctrl+shift+q to start/stop recording, and shift+q to play the last recorded macro. As one would imagine, macros allow you to record keystrokes and then replay them.

Now to really make use of macros you will have to learn efficient ways to navigate the code. I have to admit that in Vim this was a bit easier, since all the ways of moving through your code were more direct, while in PhpStorm I don’t use them that often, and when I looked in the keymap most of them aren’t even bound to keys by default.

Either way, two useful ones are the Home and End keys to go to the beginning of the text and the end of the line, and CTRL + left/right to skip to the previous or next word. I must say I really miss % from Vim, which lets you jump to the matching bracket/parenthesis and allowed for some really powerful stuff. The only equivalent I could find in the keymap switches between the beginning and end of brackets, which is nice, but only about half as awesome as having the same hotkey work for parentheses too.

Anyway, just hit the record key, do some semi-intelligent code modifications, making good use of relative movement. Then save by hitting the record key again and replay on the spots you need it.

It’s admittedly not as powerful as regex search and replace, but in some cases search and replace just can’t do exactly what you need, and then macros can be a real time saver.


Caching, virtual bits don’t rot

October 28th, 2014 5 comments

This is very common so I just wanted to address it because it annoys me.

Time based caching is a last resort, not a default go-to.  Virtual pages don’t turn yellow over time, data in a cache doesn’t start to slowly rot away. Caching isn’t a stop-gap solution against bad performance, it’s a layer or multiple layers in your application that you have to think about.

Proper caching strategies can improve the performance of your applications by a metric ton. The reason is obvious, instead of doing something every single time, you only do it when it’s needed. No more, no less.

But more often than I’d like I see caching thrown in as a stop-gap solution, where some part of the application couldn’t scale well enough, so some caching is thrown in around it and set to refresh every 5 minutes or every 10 hours or every 24 hours or what have you. It’s ugly and it’s setting you up for technical debt.

Caching should be a holistic solution. Applications have (spaghetti legacy code notwithstanding) natural separators between certain parts: the database model, some remote API, your controllers, etc. These are natural places to add a caching layer. More importantly, by adding caching in these places you can ensure that neither side of the code overly depends on the caching, as opposed to slapping a few lines of caching code around some bits of code but not others. That just adds complexity, potentially creates unexpected behaviour, and probably makes it impossible to do proper cache warming.

Now that we have this thin caching layer, instead of setting a time to live and calling it a day, actually take a step back and try to get it to cache for as long as possible. Data doesn’t rot, and cached HTML output doesn’t turn yellow. What you want is independent invalidation.

For the sake of having an example let’s say we have added some caching to our database model and when getting a User object from our repository we actually return a cached version instead of doing a database query. And we won’t invalidate that cache until the User object actually changes. We can detect when it changes by simply triggering the cache invalidation when the User object gets saved with changes.

You want to have the caching as a separate service, not tightly integrated with your object model, though. If at some point you want to do a bulk change on the Users in your database, you want to be able to invalidate them all again and, perhaps more importantly, apply cache warming so that the new users get put back into the cache even before the application actually needs them. Because nothing is worse than taking the “the first user to visit the page will trigger it” approach to things.
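To make that a bit more concrete, here is a rough sketch of such a thin caching layer with invalidation on save and a warming step. Every class and method name in it (InMemoryUserStore, UserCache, CachingUserRepository, warm) is hypothetical, and the in-memory store merely stands in for a real database.

```php
<?php
// All class and method names here are hypothetical, for illustration only.
class User
{
    public $id;
    public $name;

    public function __construct($id, $name)
    {
        $this->id = $id;
        $this->name = $name;
    }
}

// Stand-in for the real storage; counts lookups so cache hits are visible.
class InMemoryUserStore
{
    public $queries = 0;
    private $rows = [];

    public function find($id)
    {
        $this->queries++;
        return isset($this->rows[$id]) ? clone $this->rows[$id] : null;
    }

    public function save(User $user)
    {
        $this->rows[$user->id] = clone $user;
    }
}

// Separate cache service: no time to live, entries stay until invalidated.
class UserCache
{
    private $items = [];

    public function get($id) { return isset($this->items[$id]) ? $this->items[$id] : null; }
    public function put(User $user) { $this->items[$user->id] = $user; }
    public function invalidate($id) { unset($this->items[$id]); }
}

// Thin caching layer in front of the store.
class CachingUserRepository
{
    private $store;
    private $cache;

    public function __construct(InMemoryUserStore $store, UserCache $cache)
    {
        $this->store = $store;
        $this->cache = $cache;
    }

    public function find($id)
    {
        $user = $this->cache->get($id);
        if ($user === null) {
            $user = $this->store->find($id);
            if ($user !== null) {
                $this->cache->put($user);
            }
        }
        return $user;
    }

    public function save(User $user)
    {
        $this->store->save($user);
        $this->cache->invalidate($user->id); // invalidate on change, not on a timer
    }

    // Cache warming, e.g. after a bulk change made directly in the database.
    public function warm(array $ids)
    {
        foreach ($ids as $id) {
            $this->cache->invalidate($id);
            $this->find($id);
        }
    }
}

$store = new InMemoryUserStore();
$repo = new CachingUserRepository($store, new UserCache());
$repo->save(new User(1, 'alice'));
$repo->find(1); // miss: goes to the store
$repo->find(1); // hit: served from the cache
echo $store->queries, PHP_EOL;
```

Because the cache is its own service, a bulk import script could call warm() with the affected ids without going through the object model at all.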

Another caching optimization step you can take is looking at the data and extracting data that isn’t dependent on each other into separate entities. The point here isn’t normalization, or necessarily looking at cohesion. It’s about cache strategy. So say a User entity has a counter that keeps track of how often the user has logged in. In short this means you’d have to invalidate the cache each time the user logs in, not exactly a perfect world.

So what you can do is extract that counter into its own entity, link it to the owning user and make it a property. Now don’t get me wrong, I’m not necessarily talking about moving tables around in your database, just the internal object representation of the data. So before, the User model had perhaps an integer loginCounter property, and now it has a LoginCounter loginCounter property, where the LoginCounter can be retrieved and saved by itself without disturbing the User entity, even though they might live in the same table in the database. Objects aren’t tables and all that jazz.
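A small sketch of what that split could look like. The class layout, the recordLogin() helper and the cache key names are assumptions for illustration, and a plain array plays the role of the cache.

```php
<?php
// Hypothetical classes: the point is only that the counter is cached and
// saved on its own, separate from the rarely changing User data.
class LoginCounter
{
    private $count;

    public function __construct($count = 0)
    {
        $this->count = $count;
    }

    public function increment() { $this->count++; }
    public function getCount() { return $this->count; }
}

class User
{
    private $id;
    private $name;
    private $loginCounter; // used to be a plain integer property

    public function __construct($id, $name, LoginCounter $loginCounter)
    {
        $this->id = $id;
        $this->name = $name;
        $this->loginCounter = $loginCounter;
    }

    public function getId() { return $this->id; }
    public function getName() { return $this->name; }
    public function getLoginCounter() { return $this->loginCounter; }
}

// A login only touches the counter's cache entry; the user entry stays valid.
function recordLogin(User $user, array &$cache)
{
    $counter = $user->getLoginCounter();
    $counter->increment();
    $cache['login_counter:' . $user->getId()] = $counter->getCount();
}

$user = new User(42, 'alice', new LoginCounter(3));
$cache = [
    'user:42' => ['id' => 42, 'name' => 'alice'],
    'login_counter:42' => 3,
];

recordLogin($user, $cache);
echo $cache['login_counter:42'], PHP_EOL;
```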

Now there are unfortunately valid places where you might want time based caching: situations where, no matter how you slice it, it’s just a very expensive operation. In those situations it’s perfectly valid to just have a cron job or job queue or whatever solution and defer the entire thing to manage performance.

Anyway, if some part of your application is underperforming, take a step back instead of slapping some caching around it and calling it a day.



Categories: software development

some thoughts on proof of concepts

May 28th, 2014 No comments

This post is way too long, so here’s the TL;DR
“Don’t be afraid to write concept code while designing your project. Make sure the overall architecture is sound when you do. When starting to implement, revisit the concept code, refactor the shit out of it. Don’t be afraid to throw away large chunks of it, code is cheap, it’s the underlying ideas that you want.”

If you prefer rambling, then by all means read on.

For the last couple of weeks I’ve been working on a rather large new project with a bunch of specific non-standard needs. The part I’ve mostly been working on is only a small part of a much larger whole that my team is working on.

For the most part I’ve been writing documentation, defining how things should work, and collaborating with my team to make sure all the pieces still work together. This also entailed a LOT of R&D. Simply from experience I generally have a good idea how to solve a given problem, but I feel it’s often worthwhile just to quickly implement it to make sure it actually works.

This in turn means identifying both critical functionality (functionality without which an entire facet of the project wouldn’t work anymore) and high-risk functionality: basically, solutions I thought up for problems where I’m not sure they would actually work in practice.

Creating proof of concept code is invaluable, not only for testing theories and assumptions, but also for recognizing problems and getting better insight into how to solve them. Code fast and dirty if you have to; the exercise is to get a feel for the solution, not to win an award for the most elegant code ever written. Do take a moment to make sure your inputs and outputs are well done though. If it needs dependency injection, for instance, add dependency injection, or at least make sure it *could* work with DI. This will save work later, and makes you consider the overall architecture of your project. For instance, if some functionality needs a session but your architecture was supposed to be stateless, solve that. If you can’t make it work, then the solution doesn’t work, even if it would work if you simply hardcoded a few bits in a quick & dirty way. The code can be dirty, but the architecture should be sound.

As a concrete example of discovering hidden problems, one of the wishes of the project was to implement HTML5 pushState technology in combination with client side template rendering. The benefits are obvious: a more responsive experience for the user and less data transferred for the server. Win/win.

I had a little proof of concept working in my sandbox branch, and a few days later, while tackling one of the other features, ESI (Edge Side Includes) support, things broke. As a requirement of the pushState stuff we wanted to maintain only one set of templates for both the back-end and the client side. Not a big problem. But when you introduce ESI to cut out parts of your template so they essentially become their own actions, you inherently break client side rendering of templates.

Of course there are various solutions to this problem. But I dare say I wouldn’t have discovered the problem had I not spent some time making quick & dirty implementations.

Now, after about 2 months of pouring out design documents, diagrams, and a fairly complete technical design, we come to the part where we actually have to start building the damn thing.

A key rule that I’m sure everyone will know is to throw away your proof of concept code. And I fully agree with that. But with an asterisk attached to it. I think it’s sort of generally understood but perhaps interesting to point out, that you shouldn’t actually throw away your concept code. You simply shouldn’t USE it. Don’t copy paste, hit F5 and if it doesn’t segfault call it a day.

What you should do is revisit it. The code served a purpose, it solved a problem, and the ideas it represents are probably still correct, especially if you took the time to make sure it made sense within the larger architecture. Write unit tests to test the functionality it adds, and define all the edge cases you can think of. It might be that everything is green across the board when you are done, but more likely than not you will have some corner cases, or functionality that you didn’t end up adding to the concept code, that fail.

Now simply start fixing the code. Be as destructive as you feel you need to be. Perhaps there’s some fancy design pattern in there that looked brilliant at the time and looks like the worst thing ever now; just yank it out and give it a think to implement things better. Add all those input validations and the missing functionality, clean up the code, rethink method and variable names, remove code smells, take out hard coded things, etc.

You have your unit tests to tell you everything is still working as it should, and if you feel you refactored yourself into a big scary pit, a simple revert will give you another shot.

Chances are, at the end a fair portion of your initial code got changed, maybe even everything, and maybe you had some pretty good ideas first time around and you only needed some tweaks here and there.

But the important thing is that you didn’t start from scratch. You didn’t need to spend time thinking about how to solve the problem; you could immediately spend time considering if your solution was correct, without necessarily still being in love with your solution (a dangerous thing), and spend time polishing and making the code better. This especially works wonders when there is some sizable chunk of time between when you wrote the concept code and when you revisit it, because you can immediately identify those “WTF” parts of your code.

Currently I’m doing the exciting job of writing task/feature tickets, and from the half a dozen concepts I’ve made I’ve already identified 2 that will more than likely end up in the project with only some light refactoring. Then there are another 2 concepts with which I’m just really not happy; in the back of my mind I’m already thinking of how to re-implement them, and I wouldn’t be surprised if I end up rewriting most of them.

And that’s Ok too. Proof of concepts allow you to make mistakes and learn from them. You’ve already tackled a problem once, and now you are allowed to do it again. Meanwhile if the back of your brain is anything like mine you’ve already been thinking about the not-quite-elegant solutions you’ve made and have been thinking of better ways to solve those problems.

Also don’t be afraid to revisit a concept again during documentation; sometimes inspiration just strikes. I’ve had a bunch of code that added functionality to Twig, and it was just bugging me to no end. It wasn’t nice, it wasn’t elegant, it wasn’t correct. Then one day while writing about something else entirely the back of my brain dumped the solution for my problem, and I was able to throw away the entire mess and quite literally replace it with 15 lines of code, of which only 3 actually interacted with Twig.

So to end this rant, don’t be afraid to write concept code while designing your project or when adding an extensive feature. Just make sure the overall architecture is sound when you are done. Then don’t be afraid to revisit that concept code, and make use of the lessons and ideas it represents. Also don’t be afraid to throw away large chunks of it, code is cheap, it’s the underlying ideas that take time to build.

Categories: PHP, software development

Custom Symfony2 CLI output

October 13th, 2013 3 comments

Just a quick little post for my future self.

I’ve recently been working on a little hobby project involving some intensive CLI stuff with Symfony2. I felt the output handler was lacking though. It was a small thing, but I really wanted a prefix for each output line with the time and the time difference since the last message. Simple stuff to see how long certain steps took in the process I was working with.

I did some googling and it was actually rather easy to add with Symfony2. You simply have to add a custom ConsoleOutput which extends the normal one. This is what the one I made looks like.


namespace testPrj\ProcessingBundle\Component;

use Symfony\Component\Console\Formatter\OutputFormatterInterface;
use Symfony\Component\Console\Output\ConsoleOutput;
use Symfony\Component\Console\Output\ConsoleOutputInterface;

class ConsoleTimeStampOutput extends ConsoleOutput implements ConsoleOutputInterface
{
    // Timestamp of the previous message, used to compute the time difference.
    protected $lastTime = 0;

    public function __construct($verbosity = self::VERBOSITY_NORMAL, $decorated = null, OutputFormatterInterface $formatter = null)
    {
        $this->lastTime = microtime(true);
        // Pass the received arguments through instead of resetting them to their defaults.
        parent::__construct($verbosity, $decorated, $formatter);
    }

    protected function doWrite($message, $newline)
    {
        $message = $this->addTimeStamp($message);
        parent::doWrite($message, $newline);
    }

    // Prefix the message with the current time and the seconds since the last message.
    protected function addTimeStamp($message)
    {
        $now = microtime(true);
        $diff = number_format($now - $this->lastTime, 5);
        $message = "[" . date('H:i:s') . "][{$diff}] " . $message;

        $this->lastTime = $now;
        return $message;
    }
}

Then in the CLI file app/console I added.

// Custom Output handler
use testPrj\ProcessingBundle\Component\ConsoleTimeStampOutput;

$output = new ConsoleTimeStampOutput();
$application->run($input, $output);

And that was that.

Now when I print output via $output->writeln() it comes out as

[20:09:30][0.00299] Pulling till 33778503  - 2013-10-12
[20:13:34][244.00970] [20:13:34] :: Cache:False
[20:13:34][0.38880] 33649011 - 2013-10-05
[20:13:34][0.00005] Memory: 19.962341308594MB
[20:13:34][0.00003] fnd records:158
[20:13:34][0.00003] new records:42
[20:13:34][0.00003] sent records: 200
[20:13:35][0.28439] Done
Categories: PHP, software development

the incredibly lazy guide to installing mod_pagespeed

November 12th, 2010 6 comments

You hate reading? You want to try out mod_pagespeed? You run an Ubuntu or other Debian based server? Well then, just follow these steps.

  1. get the binary package based on your architecture. (to check which one, run “uname -m”. If it says x86_64, then you have a 64bit server)
    • 64 bit.
    • 32 bit.
  2. install the package (substitute amd64.deb with i386.deb if you don’t have a 64bit version)
    sudo dpkg -i mod-pagespeed-beta_current_amd64.deb
  3. open up the following file with your favorite editor
  4. add all the cool features you want, I currently run this. (line 47 in the file, but it doesn’t really matter)
    ModPagespeedEnableFilters collapse_whitespace,elide_attributes
    ModPagespeedEnableFilters combine_css,rewrite_css,move_css_to_head,inline_css
    ModPagespeedEnableFilters rewrite_javascript,inline_javascript
    ModPagespeedEnableFilters rewrite_images,insert_img_dimensions
    ModPagespeedEnableFilters extend_cache
    ModPagespeedEnableFilters remove_quotes,remove_comments
  5. restart apache
    sudo service apache2 restart
  6. done.

I haven’t fully looked into mod_pagespeed and all its filters and the implications thereof myself, but I always like following these kinds of lazy quick guides to start poking around instead of actually reading something for a change. So I figured I should just make one as well.

phing + dbdeploy website deployment

November 8th, 2010 1 comment

I recently had a project with xs2theworld to help create the mobile websites for Intel Asia. Because this project was quite important and I wanted to step up my game, I created a proper deployment strategy. No more sweaty palms while running custom scripts, pressing svn up or switching symlinks. I wanted a fully automated deployment: a deployment I could test, run and always get the same result.

Because I’ve been hearing about phing and dbdeploy from dragonbe and harrieverveer I looked into them. They ended up being excellent tools to reach my goal.

Phing is a deployment tool in which you can create a deployment “script” using an Ant-like XML syntax.

<copy todir="${buildDir}" >
  <fileset dir="${projRoot}">
    <include name="**" />
  </fileset>
</copy>

DBdeploy is a tool that compares your patches to your database and creates a forward SQL patch and a backwards SQL patch, aggregating your SQL patches in the forward file and the undo statements in the backwards file.

In this blog post I will highlight some of the things I did.

I created an actual ‘build’ stage, where all the website elements were processed and copied into a separate build directory. The reason for this was two-fold. Firstly I wanted to be able to check the result of a ‘build’ without it being deployed, especially the SQL patches that dbdeploy generated. Secondly, I wanted to only copy those files that were needed for the site to run. So no .git directory, no sql patches directory, etc.
This has really been a great choice. Because of the separate build stage I’ve had at least two instances in which I caught a problem before deployment, saving me from the embarrassment of having to make quick fixes while the site was in offline mode.

I created separate phing property files for different environments (production, staging, development). This, combined with a simple wrapper script that called phing, resulted in a very pleasant way of deploying by just issuing a command like “./deploy build development” or “./deploy rollout production” and the inevitable “./deploy rollback production”. Much better than “phing -Denvironment=staging build”. Property files are basically ini files that contain key/value pairs that can be referenced from within your phing build file.
Then in phing you can say “<property file="deploy/${environment}.properties" />” and it will read the property file. Please note that “${environment}” refers to a variable, which in my case was set when calling phing (the -Denvironment= option).
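The wrapper itself is nothing special. As a rough idea of what it does, here is a sketch in PHP; the function name and the validation are made up for this post, and only the target names, the environment names and the phing command format come from the examples above.

```php
<?php
// Hypothetical wrapper logic: map "./deploy <target> <environment>" to a phing call.
function buildPhingCommand($target, $environment)
{
    $targets = ['build', 'rollout', 'rollback'];
    $environments = ['production', 'staging', 'development'];

    if (!in_array($target, $targets, true)) {
        throw new InvalidArgumentException("Unknown target: {$target}");
    }
    if (!in_array($environment, $environments, true)) {
        throw new InvalidArgumentException("Unknown environment: {$environment}");
    }

    // Both values were checked against a fixed list, so they are safe to interpolate.
    return sprintf('phing -Denvironment=%s %s', $environment, $target);
}

echo buildPhingCommand('build', 'development'), PHP_EOL;
```

Rejecting unknown targets and environments up front means a typo fails loudly before phing ever runs.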

Dbdeploy is a piece of software that can read your SQL patch files, compare them to the database and create a single SQL file you can run to update your database. Unfortunately dbdeploy is very fussy about the separator you use between your patch and your undo patch. Yes, undo patch. At some point you want to rollback a deployment and at that time you really do not want to find out that you can’t because the new table structure breaks the old code.
It only takes very little time to create undo statements and you will avoid excruciating minutes of frantically applying changes manually when things break.
Also when creating undo statements be sure to set them in the reverse sequence of your normal sql patch statements.

ALTER TABLE `myrecords`  ADD `rank` int NOT NULL;
RENAME TABLE `myrecords`  TO `myrecord`;

-- //@UNDO

RENAME TABLE `myrecord` TO `myrecords`;
ALTER TABLE `myrecords` DROP `rank`;

Also, that is how you should write the undo separator. Exactly like that. If you don’t, dbdeploy will simply add the undo section to your deployment SQL file as well, which is very much unwanted.
Also be sure to always, ALWAYS, ALWAYS! run both your forward SQL patch and your backwards SQL patch to make sure they work. Preferably not on production.
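To illustrate why the exact separator matters, here is a toy splitter in PHP. This is not dbdeploy's own code, just a sketch of the behaviour described above: anything that does not match the marker exactly ends up in the forward part.

```php
<?php
// Illustration only, not dbdeploy's implementation: split a patch on the undo marker.
function splitPatch($sql, $marker = '-- //@UNDO')
{
    $parts = explode($marker, $sql, 2);

    return [
        'forward' => trim($parts[0]),
        'undo' => isset($parts[1]) ? trim($parts[1]) : '',
    ];
}

$good = "ALTER TABLE `myrecords` ADD `rank` int NOT NULL;\n\n-- //@UNDO\n\nALTER TABLE `myrecords` DROP `rank`;";
$bad = "ALTER TABLE `myrecords` ADD `rank` int NOT NULL;\n\n--//@undo\n\nALTER TABLE `myrecords` DROP `rank`;";

$goodParts = splitPatch($good);
$badParts = splitPatch($bad);

echo $goodParts['undo'], PHP_EOL;
// With the misspelled marker the DROP statement stays in the forward part.
echo $badParts['forward'], PHP_EOL;
```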

That’s about it I guess. There are many wonderful guides that will explain how to use phing and dbdeploy in detail, which is the reason I didn’t. I just wanted to pass along some things I used and thought worked nicely.
I would like to point people who want to read more about phing to the following blog posts that helped me heaps:
– Diving into Phing I
– Diving into Phing II
– Phing Build File
– How To: Simple database migrations with Phing and DbDeploy (*this was a bit outdated*)

most important though.

traveling elephpant

June 30th, 2010 5 comments

So a little while back ibuildings had this fun contest to build a path finding program that would solve a traveling salesman like problem. The constraints were pretty simple: you got a CSV file with latitude/longitude locations, you started at a certain location and you had to end at a certain location. The application should then find the shortest route that touched all locations.
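For readers who want a feel for the problem, here is a minimal sketch using the nearest-neighbour heuristic with haversine distances. To be clear, this is not my submitted solution, and nearest-neighbour does not guarantee the shortest route. The function names are made up, and the sketch assumes at least two points, with the first entry as the fixed start and the last as the fixed end.

```php
<?php
// Great-circle distance in kilometres between two [lat, lon] points (haversine formula).
function haversine(array $a, array $b)
{
    $radius = 6371.0; // mean Earth radius in km
    $dLat = deg2rad($b[0] - $a[0]);
    $dLon = deg2rad($b[1] - $a[1]);
    $h = sin($dLat / 2) ** 2
        + cos(deg2rad($a[0])) * cos(deg2rad($b[0])) * sin($dLon / 2) ** 2;

    return 2 * $radius * asin(min(1.0, sqrt($h)));
}

// Nearest-neighbour heuristic: start at the first point, end at the last,
// and always walk to the closest unvisited point in between.
function nearestNeighbourRoute(array $points)
{
    $names = array_keys($points);
    $start = array_shift($names);
    $end = array_pop($names);

    $route = [$start];
    $current = $start;
    while ($names) {
        $bestIndex = null;
        $bestDistance = INF;
        foreach ($names as $index => $name) {
            $distance = haversine($points[$current], $points[$name]);
            if ($distance < $bestDistance) {
                $bestDistance = $distance;
                $bestIndex = $index;
            }
        }
        $current = $names[$bestIndex];
        $route[] = $current;
        unset($names[$bestIndex]);
    }
    $route[] = $end;

    return $route;
}

// Hypothetical landmarks along the equator, deliberately out of order.
$landmarks = [
    'start' => [0.0, 0.0],
    'b' => [0.0, 2.0],
    'a' => [0.0, 1.0],
    'end' => [0.0, 3.0],
];

$route = nearestNeighbourRoute($landmarks);
echo implode(' -> ', $route), PHP_EOL;
```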

Now I’ll be the first to say that PHP is really not the language for that. A few years back I wrote a pathfinding tool for use with a game called EVE Online, in which I calculated certain trade routes based on data you could export from the game. After seeing PHP’s performance I switched to Python and more or less sliced the processing time in half, if not more. Mainly because Python has specific array-like types and PHP just has generic arrays, which matters a lot with large data sets. Perhaps also because my pathfinding-fu was still rather weak 🙂

However, back to now and the ibuildings challenge. The challenge would be rated on 3 criteria: speed, lines of code and code complexity. Personally I couldn’t care less about the latter two and focused purely on speed. In the weeks that followed I had a great time comparing execution times with Remi and Andries. I think this was also key to keep diving into it and tweaking it until it was as fast as I could get it.

Sadly, my submission actually had an off-by-one bug in it which more or less instantly disqualified my entry. Yes, bit of a bummer, but such is life.

Now I had already decided to publish my code after the contest, however sadly I never really found the time to type this up. So a bit late but here is the code for my solution for the ibuildings elephpant challenge.

Below is the submitted code, and a link to download the code and original test data file so you can try it out for yourself.

download the code – Just unpack and run via php contest.php elephpant_landmarks.csv


Ok scratch the code, I’m having some trouble with getting it to play nice. Just download the tar.gz and view the code in your favorite editor.

running ubuntu on a vaio BZ series laptop

January 18th, 2010 6 comments

I recently purchased a Sony Vaio VGN-BZ31VT. In short, everything works as far as I know and care.


CPU: Intel® Core™2 Duo processor P8700 @ 2.53 GHz
Memory: DDR2 SDRAM (2 x 2 GB)
Graphics: Mobile Intel® Graphics Media Accelerator 4500MHD

Wifi: Intel WiFi Link 5100
Audio: Intel HD Audio
Ethernet: Intel 82567 Gigabit
Bluetooth: ?  2.0 + EDR

what works

Well all the basics seem to work, the special function keys on the keyboard, the mouse pad, the screen, wifi and ethernet port.
But most importantly, suspend and hibernate also work. All of this out of the box, just install and go.

Incidentally, this CPU also supports Intel VT. It is off by default, but you can easily enable it in the BIOS. If, like me, you use virtual machines a lot, it is rather nice to have. I haven’t done any real tests to see if it is faster, but at least it’s there.

Not tested

I haven’t tested bluetooth, don’t need it.

What doesn’t work

The laptop also has a fingerprint scanner, which with some tinkering can be used. It’s not so much a problem of hardware support it seems, but more that there isn’t a mainstream way of integrating fingerprint scanners with the security system in Linux. The solution I read about needs you to install some fingerprint scanning software and load that as a module in PAM. Too much work for too little gain for my taste, but if you really want it, then you can get it to work (probably).


I wanted a no-nonsense development laptop with lots of memory and preferably virtualization support in the CPU. It should also work under Linux with minimal fuss, and working suspend was a must-have.
Mission successful it seems.

Seeing as there is very little recent user experience info about this laptop out there at the moment, I figured I should write this little blog post, if only to give people the peace of mind that you can safely buy this laptop for running Linux.

Categories: hardware

user settings cookie

January 17th, 2010 No comments

Sometimes in applications you will have certain user settings that you want to apply, even when the user is not logged in. Take for instance these examples:

  • “welcome back <name>” msg on return.
  • You have a portal type page where the user can control what content is shown where
  • You want to track where the user was when he last visited the site, perhaps to offer him the option to return to there.

I recently needed some functionality like that, so I’ve created an object that can help me with that.

I thought about it for a moment and created a singleton settings object for me to call upon to set and retrieve certain settings. Now I have to warn you that there is a small problem with singletons, if you use unit testing it can be difficult to control the behaviour of singletons over multiple tests. So be wary of this when you are running unit tests.

I also wrap all data in a separate array. This isn’t really necessary, but it makes handling the data a lot easier. If you wanted you could also add some sort of encryption to the cookie data so that users couldn’t easily tamper with it.
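A stripped-down sketch of what such a settings object could look like. The class name, cookie name, JSON encoding and one-year lifetime are all assumptions for illustration rather than the exact code I used, and there is no encryption or signing in it.

```php
<?php
// Hypothetical sketch; class name, cookie name and lifetime are assumptions.
class UserSettings
{
    const COOKIE_NAME = 'user_settings';

    private static $instance = null;
    private $data = [];

    private function __construct()
    {
        if (isset($_COOKIE[self::COOKIE_NAME])) {
            $this->data = self::decode($_COOKIE[self::COOKIE_NAME]);
        }
    }

    public static function getInstance()
    {
        if (self::$instance === null) {
            self::$instance = new self();
        }
        return self::$instance;
    }

    public function get($key, $default = null)
    {
        return array_key_exists($key, $this->data) ? $this->data[$key] : $default;
    }

    public function set($key, $value)
    {
        $this->data[$key] = $value;
    }

    // All settings are wrapped in one array and stored as a single cookie value.
    public function encode()
    {
        return json_encode(['data' => $this->data]);
    }

    // Anything that is not the expected wrapper structure is treated as "no settings".
    public static function decode($raw)
    {
        $decoded = json_decode($raw, true);
        if (is_array($decoded) && isset($decoded['data']) && is_array($decoded['data'])) {
            return $decoded['data'];
        }
        return [];
    }

    // Like every setcookie() call, this must happen before any output is sent.
    public function save()
    {
        return setcookie(self::COOKIE_NAME, $this->encode(), time() + 60 * 60 * 24 * 365, '/');
    }
}

$settings = UserSettings::getInstance();
$settings->set('name', 'Alice');
echo 'welcome back ', $settings->get('name'), PHP_EOL;
```

Since the cookie lives on the user's machine, treat everything that comes out of decode() as untrusted input.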

easy and simple transparency effect using GIF

October 3rd, 2009 6 comments

Transparency in HTML/CSS is largely a solved problem. Recent browsers all seem to handle PNG transparency pretty well, and there are scripts that will make sure older browsers handle it as well.

However, I wanted to make a post about a little technique I rarely see used, which I think is quite genius in its simplicity. Whenever you want to create a semi-transparent surface you create a gif file that contains a simple pattern of transparent and opaque pixels, as in the example on the right. The white you see in the chequered image is of course transparent.

So let’s demonstrate how this effect actually looks.

[Image: transparency examples]

Now as you can see the effect itself is very specific, and different backgrounds give different outcomes, which might not fit every design. Another disadvantage is that it can only be used to show a 50% transparency effect. There might be pixel patterns that will give you a different distribution but I’ve never seen them.

The biggest advantage however is that you don’t need any fancy CSS or javascript or PNG, which in certain specific cases can be a big plus. It’s more of a hack on your eyes/brain than on the browser 🙂
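If you would rather generate such a pattern than draw it by hand, a sketch along these lines could do it. The function names are made up for this post, and the GIF writing part assumes the GD extension is installed; the pattern function itself has no dependencies.

```php
<?php
// Checkerboard mask: true = opaque pixel, false = transparent pixel.
function checkerPattern($width, $height)
{
    $rows = [];
    for ($y = 0; $y < $height; $y++) {
        for ($x = 0; $x < $width; $x++) {
            $rows[$y][$x] = (($x + $y) % 2 === 0);
        }
    }
    return $rows;
}

// Write the pattern as a GIF using the GD extension (assumed to be available).
function writeCheckerGif($path, $width, $height)
{
    $img = imagecreate($width, $height);
    // In a palette image the first allocated colour becomes the background.
    $transparent = imagecolorallocate($img, 255, 255, 255);
    $opaque = imagecolorallocate($img, 0, 0, 0);
    imagecolortransparent($img, $transparent);

    foreach (checkerPattern($width, $height) as $y => $row) {
        foreach ($row as $x => $isOpaque) {
            if ($isOpaque) {
                imagesetpixel($img, $x, $y, $opaque);
            }
        }
    }

    return imagegif($img, $path);
}

$pattern = checkerPattern(2, 2);

if (function_exists('imagecreate')) {
    writeCheckerGif(sys_get_temp_dir() . '/checker.gif', 2, 2);
}
```

A 2x2 tile is enough, since the image can simply be repeated as a background.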

Categories: design