svn:propset [PreshWiki]

SVN Properties and keywords

To set properties use the svn propset command.

For example,

svn propset Author 'David Precious' file.pl
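
To verify the property afterwards, you can list properties or read one back (a quick sketch; file.pl is just the placeholder filename used above):

svn proplist file.pl
svn propget Author file.pl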

SVN Keywords

There are some useful keywords which can be set, the value of which will be substituted for ‘tags’ in the file itself.

The keywords are updated when the file is committed to the repository.

Some of the useful ones are:

  • $Revision (or $LastChangedRevision or $Rev) which will expand to the revision number the file was last updated in, like: $Revision: 23 $
  • $Id which will expand to $Id: file.pl 23 2006-09-01 13:05:51Z dave $
  • $LastChangedDate, or $Date - the date this revision was committed to the repo.
  • $LastChangedBy, or $Author - the name of the person who committed this version to the repo.

To set the keywords which should be replaced, use:

svn propset svn:keywords 'Id Revision' file.pl

Unless you use the svn propset command above, the anchors (e.g. $Id$) will *not* be replaced with their values… SVN will not mess with your file unless you’ve explicitly told it to.
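
As a small illustration, a file might carry anchors like these in a comment header (the expanded values below simply reuse the revision 23 / dave example quoted earlier; they are not from a real repository):

# $Id$
# $Revision$

After setting svn:keywords as shown above and committing, the working copy ends up with something like:

# $Id: file.pl 23 2006-09-01 13:05:51Z dave $
# $Revision: 23 $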

This is very useful for printing revisions and dates in logs.

21.07.11

Setup Guides: Installing Elegant Gnome on Ubuntu 10.10

One of my favourite things about Ubuntu is the ability to customize the UI, and installing Elegant Gnome is one of the first things I do after a fresh installation.  Elegant Gnome is a theme pack that gives your desktop a very slick look (compared to the default theme), including matching Firefox and Chrome themes and an awesome icon set.

My netbook with Elegant Gnome showing the nautilus theme, also using Docky

12.07.11

High Scalability - 35+ Use Cases for Choosing Your Next NoSQL Database

We’ve asked What The Heck Are You Actually Using NoSQL For?. We’ve asked 101 Questions To Ask When Considering A NoSQL Database. We’ve even had a webinar What Should I Do? Choosing SQL, NoSQL or Both for Scalable Web Applications.

Now we get to the point of considering use cases and which systems might be appropriate for those use cases.

What are your options?

First, let’s cover the various data models. These have been adapted from Emil Eifrem and NoSQL databases.

Document Databases

  • Lineage: Inspired by Lotus Notes.
  • Data model: Collections of documents, which contain key-value collections.
  • Example: CouchDB, MongoDB 
  • Good at: Natural data modeling. Programmer friendly. Rapid development. Web friendly, CRUD.

Graph Databases
  • Lineage: Euler and graph theory.
  • Data model: Nodes & relationships, both of which can hold key-value pairs
  • Example: AllegroGraph, InfoGrid, Neo4j
  • Good at:  Rock complicated graph problems. Fast.

Relational Databases
  • Lineage: E. F. Codd in A Relational Model of Data for Large Shared Data Banks
  • Data Model: a set of relations
  • Example: VoltDB,  Clustrix, MySQL
  • Good at: High performing, scalable OLTP. SQL access. Materialized views. Transactions matter. Programmer friendly transactions.

Object Oriented Databases

  • Lineage: Graph Database Research
  • Data Model: Objects
  • Example: Objectivity, Gemstone

Key-Value Stores

  • Lineage: Amazon’s Dynamo paper and Distributed HashTables.
  • Data model: A global collection of KV pairs.
  • Example: Membase, Riak
  • Good at: Handles size well. Processing a constant stream of small reads and writes. Fast. Programmer friendly.

BigTable Clones 
  • Lineage: Google’s BigTable paper.
  • Data model: Column family, i.e. a tabular model where each row at least in theory can have an individual configuration of columns.
  • Example: HBase, Hypertable, Cassandra
  • Good at: Handles size well. Stream massive write loads. High availability. Multiple data centers. MapReduce.

Data Structure Servers
  • Lineage: ?
  • Example: Redis
  • Data model: Operations over dictionaries, lists, sets and string values.
  • Good at: Quirky stuff you never thought of using a database for before.

Grid Databases
  • Lineage: Data Grid and Tuple Space research.
  • Data Model: Space Based Architecture
  • Example: GigaSpaces, Coherence
  • Good at: High performance and scalable transaction processing.

What should your application use?

  • Key point is to rethink how your application could work differently in terms of the different data models and the different products. Right data model for the right problem. Right product for the right problem.
  • To see what models might help your application take a look at What The Heck Are You Actually Using NoSQL For? In this article I tried to pull together a lot of unconventional use cases of the different qualities and features developers have used in building systems. 
  • Match what you need to do with these use cases. From there you can backtrack to the products you may want to include in your architecture. NoSQL, SQL, it doesn’t matter.
  • Look at the data model, the product features, and your own situation together. Products have such different feature sets it’s almost impossible to recommend by pure data model alone.
  • Which option is best is determined by your priorities.

 If your application needs…

  • complex transactions because you can’t afford to lose data or if you would like a simple transaction programming model then look at a Relational or Grid database.
    • Example: an inventory system that might want full ACID. I was very unhappy when I bought a product and they said later they were out of stock. I did not want a compensated transaction. I wanted my item!
  • to scale then NoSQL or SQL can work. Look for systems that support scale-out, partitioning, live addition and removal of machines, load balancing, automatic sharding and rebalancing, and fault tolerance.
  • to always be able to write to a database because you need high availability then look at Bigtable Clones which feature eventual consistency.
  • to handle lots of small continuous reads and writes, that may be volatile, then look at Document or Key-value databases, or databases offering fast in-memory access. Also consider SSD.
  • to implement social network operations then you may first want a Graph database or, second, a database like Riak that supports relationships. An in-memory relational database with simple SQL joins might suffice for small data sets. Redis’ set and list operations could work too.

If your application needs…

  • to operate over a wide variety of access patterns and data types then look at a Document database, they generally are flexible and perform well.
  • powerful offline reporting with large datasets then look first at Hadoop and second at products that support MapReduce. Supporting MapReduce isn’t the same as being good at it.
  • to span multiple data-centers then look at Bigtable Clones and other products that offer a distributed option that can handle the long latencies and are partition tolerant.
  • to build CRUD apps then look at a Document database, they make it easy to access complex data without joins. 
  • built-in search then look at Riak.
  • to operate on data structures like lists, sets, queues, publish-subscribe then look at Redis (a brief redis-cli sketch follows this list). Useful for distributed locking, capped logs, and a lot more.
  • programmer friendliness in the form of programmer friendly data types like JSON, HTTP, REST, Javascript then first look at Document databases and then Key-value Databases.
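
To make the Redis bullet above concrete, here is a tiny redis-cli sketch of list, set, and publish-subscribe operations (the key and channel names are made up purely for illustration):

redis-cli LPUSH jobs "resize:photo-42"     # push a job onto a list used as a queue
redis-cli RPOP jobs                        # pop it off the other end
redis-cli SADD online_users "alice"        # add a member to a set
redis-cli SMEMBERS online_users            # list the set's members
redis-cli SUBSCRIBE news                   # in one terminal: listen on a channel
redis-cli PUBLISH news "hello"             # in another terminal: publish to it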

If your application needs…

  • transactions combined with materialized views for real-time data feeds then look at VoltDB. Great for data-rollups and time windowing.
  • enterprise level support and SLAs then look for a product that makes a point of catering to that market. Membase is an example.
  • to log continuous streams of data that may have no consistency guarantees necessary at all then look at Bigtable Clones because they generally work on distributed file systems that can handle a lot of writes.
  • to be as simple as possible to operate then look for a hosted or PaaS solution because they will do all the work for you.
  • to be sold to enterprise customers then consider a Relational Database because they are used to relational technology.
  • to dynamically build relationships between objects that have dynamic properties then consider a Graph Database because often they will not require a schema and models can be built incrementally through programming.
  • to support large media then look at storage services like S3. NoSQL systems tend not to handle large BLOBs, though MongoDB has a file service.

If your application needs…

  • to bulk upload lots of data quickly and efficiently then look for a product that supports that scenario. Most will not because they don’t support bulk operations.
  • an easier upgrade path then use a fluid schema system like a Document Database or a Key-value Database because it supports optional fields, adding fields, and field deletions without the need to build an entire schema migration framework.
  • to implement integrity constraints then pick a database that supports SQL DDL, implement them in stored procedures, or implement them in application code.
  • a very deep join depth then use a Graph Database because they support blisteringly fast navigation between entities.
  • to move behavior close to the data so the data doesn’t have to be moved over the network then look at stored procedures of one kind or another. These can be found in Relational, Grid, Document, and even Key-value databases.

If your application needs…

  • to cache or store BLOB data then look at a Key-value store. Caching can be used for bits of web pages, or to save complex objects that were expensive to join in a relational database, to reduce latency, and so on.
  • a proven track record, like not corrupting data and just generally working, then pick an established product and when you hit scaling (or other issues) use one of the common workarounds (scale-up, tuning, memcached, sharding, denormalization, etc).
  • fluid data types because your data isn’t tabular in nature, or requires a flexible number of columns, or has a complex structure, or varies by user (or whatever), then look at Document, Key-value, and Bigtable Clone databases. Each has a lot of flexibility in their data types.
  • other business units to run quick relational queries so you don’t have to reimplement everything then use a database that supports SQL.
  • to operate in the cloud and automatically take full advantage of cloud features then we may not be there yet.   

If your application needs…

  • support for secondary indexes so you can look up data by different keys then look at relational databases and Cassandra’s new secondary index support.
  • to store an ever-growing set of data (really BigData) that rarely gets accessed then look at a Bigtable Clone, which will spread the data over a distributed file system.
  • to integrate with other services then check if the database provides some sort of write-behind syncing feature so you can capture database changes and feed them into other systems to ensure consistency.
  • fault tolerance then check how durable writes are in the face of power failures, partitions, and other failure scenarios.
  • to push the technological envelope in a direction nobody seems to be going then build it yourself because that’s what it takes to be great sometimes.
  • to work on a mobile platform then look at CouchDB / Mobile Couchbase.

Which is Better?

  • Moving for a 25% improvement is probably not a reason to go NoSQL.
  • Benchmark relevancy depends on the use case. Does it match your situation(s)?
  • Are you a startup that needs to release a product as soon as possible and you are playing around with ideas? Both SQL and NoSQL can make an argument.
  • Performance may be equal on one box, but what happens when you need N?
  • Everything has problems: if you look at the Amazon forums it’s “EBS is slow” or “my instances won’t reply”; for GAE it’s “the datastore is slow” or something else. Every product people actually use will have problems. Are you OK with the problems of the system you’ve selected?

    20.06.11

    Microjs: Fantastic Micro-Frameworks and Micro-Libraries for Fun and Profit!

    How much library code do you really need — 50K? 100K? 150K? More? How much of that do you really use?

    Sure, we all love our favorite monolithic frameworks, and sometimes we even use them fully. But how often do we reach for the ride-on John Deere tractor with air conditioning and six-speaker sound system, when a judiciously applied pocketknife would do the trick better, faster, slicker?

    Micro-frameworks are definitely the pocketknives of the JavaScript library world: short, sweet, to the point. And at 5k and under, micro-frameworks are very very portable. A micro-framework does one thing and one thing only — and does it well. No cruft, no featuritis, no feature creep, no excess anywhere.

    Microjs.com helps you discover the most compact-but-powerful microframeworks, and makes it easy for you to pick one that’ll work for you.

    Want to add your own? Fork this site on GitHub, add your framework to data.js and submit a pull request.

    Can’t get enough?

    140byt.es provides tweet-sized JavaScript goodness!

    12.06.11

    Music, please!

    This is what I always remember when I think about music: Bingo, the clown! That aside, I’ll just post a piece of music I made some time ago, when I was “musically active and productive”.

    Unfortunately the project called “Int21” does not exist anymore… sad…

    The song is called Ciphian simply because it was composed for a project that used the Cipher Game Engine. That project was an electronic business card for the company I was working for at the time: Eleven Cells.

    The music used in the business card went through lots of modifications until it became the version I’m posting right now.

    int21_-_ciphian_v1.ogg
    09.06.11

    IPv4 variable reference

    Chapter 3. IPv4 variable reference

    This chapter will go through each and every one of the IPv4 variables that can be set via sysctl or the proc filesystem. You will be given a basic explanation of what behaviour the variable changes and how, the default behaviour where possible, and what values the variable may be set to. We will not go into any deeper discussion about why each variable should be changed unless there are common reasons to change the values. The structure of this reference chapter follows the layout of the ipsysctl tree itself, with the default ipv4 directory further subdivided due to its large size and mix of many different variables.

    3.1. IP Variables

    This list contains all of the variables available in a standard 2.4.x kernel that pertain to the IP settings. As you will see, there is a huge set of them. Most should be set sensibly for you from the beginning, but some do require extra configuration depending on your needs.

    3.1.2. ip_default_ttl

    The ip_default_ttl variable tells the kernel what Time To Live to set as default on packets that leave this host. This tells how long the packets may live on the internet before they are dropped. Each time the packet passes a router, firewall, computer, etcetera, the TTL is decremented by one.

    The default value for ip_default_ttl is 64, which is a fairly good TTL that will not cause too much trouble. It is very unlikely to time out in transit to the host in question. This variable takes an unsigned integer, but the actual TTL field is only 8 bits long. In other words, the value may be as high as 255 and as low as 0; however, 255 could be considered rude and 0 wouldn’t leave your computer at all. 64 is a good value, unless you are trying to connect to computers extremely far away counted in hops, which would then time out. As it looks today, I have pretty much never seen a host that lives more than 30 hops away on the internet, so I don’t think there is any need to make this value higher than the default for now.

    Setting the TTL to 255 would be considered rude since this would make a packet live for an extremely long time on the internet. If there were a glitch between 2 routers, such a packet could bounce back and forth for a huge amount of time, eating away at the bandwidth for no reason at all. Normally, don’t set this value higher than 100 or thereabouts.
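
    For reference, these variables are read and written through the proc filesystem or the sysctl command; a minimal sketch, run as root (the value 64 is simply the default mentioned above, not a recommendation):

    cat /proc/sys/net/ipv4/ip_default_ttl          # read the current default TTL
    echo 64 > /proc/sys/net/ipv4/ip_default_ttl    # set it via the proc filesystem
    sysctl -w net.ipv4.ip_default_ttl=64           # the same thing via sysctl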

    3.1.3. ip_dynaddr

    The ip_dynaddr variable is used to fix a few problems with dynamic addressing. It allows diald one-shot connections to get established by dynamically changing the packet source address (and the sockets of local processes). This option was implemented for TCP diald-box connections and Masquerading connections. Masquerading will in other words work 100% with this option, letting Masquerading switch the source address of packets if the box’s own address changes.

    This option takes an integer, but only makes use of 3 possible states, 0, 1 or 2.

    • 0 means that this option is turned off, which is also the default behaviour.

    • 1 means that the option is enabled and running.

    • Any value other than 0 or 1 means that we have turned on verbose mode, which in turn will add extra debugging messages that you may use to get things to work properly.

    If this variable is turned on and the forwarding interface changes, this is what may happen:

    • Socket and packet source addresses are rewritten on retransmissions while in the SYN_SENT state (this covers the diald-box processes).

    • Outbound masqueraded source address changes on output, when internal host does retransmission, until a packet from the outside is received by the tunnel.

    This is especially helpful for auto-dialup links (diald), where the actual outgoing address is unknown at the moment the link is going up. This enables the same local and masqueraded connection requests that brought the link up to actually establish their connections. This means that we will not have to first issue a connection request just to bring the link up, and then issue the “real” connection request once the connection has actually been established.
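
    A minimal sketch of setting this variable (run as root; the same pattern works via the proc filesystem, as shown for ip_default_ttl):

    sysctl -w net.ipv4.ip_dynaddr=1    # enable the option
    sysctl -w net.ipv4.ip_dynaddr=2    # enable it with the extra verbose debugging output
    sysctl -w net.ipv4.ip_dynaddr=0    # turn it off again (the default)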

    3.1.4. ip_forward

    The ip_forward variable is used to turn IP forwarding on or off. In other words, we can turn the forwarding of packets between interfaces on or off, which is what lets the computer act as a firewall or router. Note that this is an extremely important variable for Network Address Translation, firewalling, routing, masquerading, and everything else where we actually let packets through the box to another network.

    This is a boolean variable. In other words, it will take a 1 or a 0. The default value for this variable is 0, or disabled, where 0 means disabled and 1 means enabled.

    Note that this is a very special variable, since changing it will reset all configuration parameters to their default states. For a complete list of the exact states, look closer at RFC1122 for hosts and RFC1812 for routers.
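
    For example, to turn forwarding on for the running system (run as root; remember the warning above that changing it resets other parameters to their defaults):

    echo 1 > /proc/sys/net/ipv4/ip_forward    # enable forwarding via the proc filesystem
    sysctl -w net.ipv4.ip_forward=1           # the same thing via sysctl
    sysctl net.ipv4.ip_forward                # read the current value back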

    3.1.5. ip_local_port_range

    The ip_local_port_range variable consists of two integers which tells the kernel which ports to use for client connections. This means, all connections going from our box to some other box and where we are the client. The first port is the lower bound and the second one is the upper bound.

    The default value in this variable depends on how much memory you have. If you have more than 128 megabytes of physical memory, the lower bound will be 32768 and the upper bound will be 61000. If the computer has less than 128 megabytes of physical memory, the lower bound will be 1024 and the upper bound will be 4999, or even less.

    This range defines the possible active connections which this system can issue simultaneously (ie, at the same time) to other systems that do not support the TCP timestamps extension.

    If you have tcp_tw_recycle enabled (the default behaviour), the range 1024-4999 is enough to issue up to 2000 connections per second to systems supporting timestamps. In other words, this should be more than enough for most of us.
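
    The two bounds are read and written as a single space-separated pair; a small sketch, using the defaults quoted above purely as example values:

    cat /proc/sys/net/ipv4/ip_local_port_range                   # e.g. prints "32768  61000"
    echo "32768 61000" > /proc/sys/net/ipv4/ip_local_port_range  # set both bounds at once
    sysctl -w net.ipv4.ip_local_port_range="32768 61000"         # the same thing via sysctl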

    3.1.6. ip_no_pmtu_disc

    The ip_no_pmtu_disc variable disables PMTU (Path Maximum Transfer Unit) discovery if enabled. In most cases PMTU discovery is a good thing, so the variable is per default set to FALSE (ie, Path Maximum Transfer Unit discovery is used). However, in some cases PMTU discovery is bad and may lead to broken connectivity. If you are experiencing problems like this, you should turn PMTU discovery off (set this variable to TRUE) and set your MTU to a reasonable value yourself.

    Do note that MTU and PMTU are two different things. MTU tells the kernel the maximum transfer unit for our connection, but not over the whole connection to the other end. PMTU discovery tries to discover the maximum transfer unit to specific hosts, including all the intermediate hops on the way there.

    The default value of ip_no_pmtu_disc is FALSE, as already stated. If it is set to TRUE, PMTU discovery is turned off. The variable takes a boolean value, in other words either a 1 or a 0, where 1 is on and 0 is off.
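
    For example, to switch PMTU discovery off and pin the MTU by hand (the interface name eth0 and the MTU of 1400 are only placeholders; pick values that suit your link):

    sysctl -w net.ipv4.ip_no_pmtu_disc=1    # TRUE: PMTU discovery is turned off
    ifconfig eth0 mtu 1400                  # then set a reasonable MTU yourself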

    3.1.7. ip_nonlocal_bind

    The ip_nonlocal_bind variable lets us set whether local processes should be able to bind to non-local IP addresses. This can be quite useful in cases where we want specific programs or applications to be able to listen on non-local IP addresses, such as when sniffing for traffic destined for a specific host that may be up to no good, etcetera. The variable may, however, break some applications, which will then no longer work.

    The ip_nonlocal_bind variable takes a boolean value which can be set to 1 or 0. If the variable is set to 0, this option is turned off and if it is set to 1 it is turned on. The default value is to turn this option off, or 0 in other words.
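
    A minimal sketch (run as root):

    sysctl -w net.ipv4.ip_nonlocal_bind=1    # allow binding to non-local addresses
    sysctl net.ipv4.ip_nonlocal_bind         # read the current setting back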

    3.1.8. ipfrag_high_thresh

    The ipfrag_high_thresh tells the kernel the maximum amount of memory to use to reassemble IP fragments. When and if the high threshold is reached, the fragment handler will toss all packets until the memory usage reaches ipfrag_low_thresh instead. This means that all fragments that reached us during this time will have to be retransmitted.

    Packets are fragmented if they are too large to pass through a certain pipe. If they are too large, the box that is trying to transmit them breaks them down into smaller pieces and sends each piece one by one. When these fragments reach their destination, they need to be defragmented (ie, put together again) to be read properly. Note that IP fragmentation is in general a good thing, but a lot of people do bad things with fragments, since they are inherently a security problem.

    The ipfrag_high_thresh variable takes an integer value, which would mean 0 through 2147483647 bytes can be assigned to be the upper limit of this function. The default value is 262144 bytes, or 256 kilobytes, which should work well in even the most extreme cases.

    3.1.9. ipfrag_low_thresh

    This option has a lot to do with the ipfrag_high_thresh option. The ipfrag_low_thresh is the lower limit at which fragments will start being accepted again. What this means, all in all, is that the fragmentation handler has a queue that grows larger the more fragments are waiting to be defragmented; when this queue reaches ipfrag_high_thresh bytes in size, the handler will stop queueing any further fragments until memory usage drops back to ipfrag_low_thresh. This stops our system from being overloaded with fragmented packets and may stop certain Denial of Service attacks.

    This variable takes an integer value between 0 and 2147483647, and refers to the number of bytes in use at which the fragmentation handler should resume receiving IP fragments. Per default it is set to 196608 bytes, or 192 kilobytes, which should be a reasonable amount of memory to set aside for this task even under the hardest of attacks. This value should be lower than ipfrag_high_thresh, or else it will be invalid.
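
    As an illustration, reading both thresholds and raising them together (the figures here are simply double the defaults given above, not a recommendation; keep the low value below the high one):

    sysctl net.ipv4.ipfrag_high_thresh net.ipv4.ipfrag_low_thresh
    sysctl -w net.ipv4.ipfrag_high_thresh=524288    # 512 kilobytes
    sysctl -w net.ipv4.ipfrag_low_thresh=393216     # 384 kilobytes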

    3.1.10. ipfrag_time

    The ipfrag_time variable tells the IP fragmentation handler how long to keep an IP fragment in memory, counted in seconds. This only refers to fragments that it has been impossible to reassemble, since fragments that have been reassembled have most probably already been passed on to either the next layer or the next host.

    The ipfrag_time variable takes an integer as its input and the value is counted as seconds. In other words, if you input 5 to this variable, it counts as 5 seconds.
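
    For example (the value 20 below is only an illustration, not a recommendation):

    cat /proc/sys/net/ipv4/ipfrag_time     # read the current timeout in seconds
    sysctl -w net.ipv4.ipfrag_time=20      # keep unassembled fragments for 20 seconds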

    30.05.11

    VirtualBox Hell

    I’ve been using VirtualBox over the last few weeks and it is really interesting. It just works for testing and some development, but when it was installed on Ubuntu a little ugly message appeared:

    Failed to access the USB subsystem

    Googling around, I found a very wise post that makes everything work perfectly with just one line of shell:

    if [ "`grep vboxusers /etc/group|grep $USER`" == "" ] ; then sudo usermod -G vboxusers -a $USER ; fi

    After typing (or pasting) this line, just press Enter and type in your sudo password. It just works =).
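
    One caveat: the new group membership only applies to sessions started after the change, so log out and back in (or reboot), then confirm it took effect:

    groups $USER    # should now list vboxusers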

    06.05.11


    Ubuntu, Wireless and Flash 10 on 64bit Dell Notebooks

    Okie dokie. I “have” a Dell Vostro 1000 that I use for work, and I decided to install Xubuntu on it with Wubi. The main installation was very easy, and Xubuntu is very lightweight, running very well on a Sempron with 1GB of RAM.

    But as expected, nothing is perfect. First problem: the WiFi card didn’t work. It was a pain in the ass to make this shit work. I needed to follow a little HowTo, but using a different driver. It was very difficult to find this HowTo and the right driver for this damn notebook.

    The second problem was getting the Flash plugin to work, because the processor has a kind of 64-bit emulation. I did some research on that and finally ended up with this script, which I modified a little to get the right packages.

    I’m just posting this because it can help a noob like me.

    23.01.11

    Tyrannosaurus Rex

    Another papercraft assembled. This one will be used as decoration at my daughter’s birthday. It’s the T-Rex from Canon Creative Park, printed on A3 couché 300 g/m² paper.

    23.01.11