7
Aug

Computational Finance – Portfolio Optimization

I’ve been studying a Computational Finance module for the past few months now, and it’s absolutely fascinating. This blog post focuses on Markowitz Portfolio Optimization (AKA Modern portfolio theory), and will be part of a number of Computational Finance posts I’ll be posting over the coming months. The following post includes MATLAB examples, but these could be adapted to other languages such as Python.

So first of all, some basic definitions.

Definitions

Stock – In this context, a “stock” is a holding in a company that might be traded through the stock market

Portfolio – A collection of financial assets. In this context, I’ll be talking in terms of holdings of financial stocks. So for example, £30 in company A, £70 in company B. For the purposes of this blog post, it is better to think in terms of percentages. In the previous example, that would therefore be 30% in company A and 70% in company B if I had a total portfolio of £100.

Markowitz Portfolio Optimization

Markowitz Portfolio Optimization (modern portfolio theory, MPT for short), is a theory used in the real world to decide on how best to invest your money in a given set of stocks (Markowitz, 1952). It works with two underlying variables: risk and return. It attempts to maximize returns of a portfolio for a given allowable risk, or minimize risk of a portfolio for a selected return. For example, you may say you want a return of 10%, and you want to invest in a selection of, say, 20 stocks. Using some relatively straightforward calculations that we’ll cover, we can find the optimum portfolio (I.E. the proportions for which we invest our money in each stock) in which the risk of losing money is minimized as low as it can be while still reaching our target return.

Sounds useful! We tell it the stocks we’re interested in, and it tells us how much to invest in each one according to our desires risk or our desired return. But how do we define risk and return? In order to do so, we need to define some mathematical variables. Throughout, we’ll assume we’re interested in only three stocks, A B and C.

Variables

w – A vector representing our portfolio weights. I use the term weights here because each element of the vector represents the weighting of investment (out of our total starting money) in a particular stock. For example, if we have stocks A B and C, we might have a w = [0.10 0.50 0.40], which represents an investment of 10% of our money in stock A, 50% in stock B and 40% in stock C.

– A matrix containing the historical return data of each stock, where each column represents an individual stock, and each row of the column represents the return of that stock at a point in time. Rit is therefore the return on stock i at time t. Conventions differ for how the return is defined, but I will use the proportional change from open price to close price of a particular day for a particular stock. So for example, if Rit = 0.02, then stock i rose by 0.02% on day t.

m – A vector representing the average returns of the individual stocks on historical data over a period of time. This is calculated over a time period, and it is calculated by averaging the proportional change over a time period. For example, we could take the value of stock i at time t = 1, and its value at time t = T, and the percentage change between the two time points would be represented by mi, the mean return of stock i. The m vector is laid out in a similar fashion as w: m = [mA mB mC]. Mathematically, the mean return for stock i over a period from t=1 to t=T is: $\displaystyle m_i = \frac{1}{T}\sum\limits_{t=1}^T \mathbf{R}_{it}$

We can perform this calculation in MATLAB like so:

% Ntraining = number of samples to use in training w
% R = historic stock data samples
mean( R(1:Ntraining, :) );

C – A matrix representing the covariances (AKA covariance matrix) between all of the stocks, calculated from historical data over a period of time. If you’re unfamiliar with covariance matrices or you need a refresher, here is a good online resource. While the formulation of such a matrix is not important to remember since it can be easily calculated using MATLAB or other tools, it is important to understand the significance of the covariance matrix in this context. Put simply, the covariance is a measure of the correlation1 between two variables. If two variables go up over time, they have a positive covariance. If one variable goes up and the other goes down, they have a negative covariance.  In this context, the covariance matrix represents the correlations between stocks A B and C. One could say that element Cij is a measure of how stock j correlates with stock iC is symmetrical, since the correlation of stock j with stock i is the same as the correlation of stock i with stock j, I.E.  Cij = Cji. The diagonal of the matrix, I.E. when i = j, is the variance of each stock. The variance of a stock can be thought of as how “risky” that stock is. If it fluctuates a lot, we might consider it risky to invest in it. We’ll come back to this assumption later. If this all seems a bit confusing, that’s okay. Just remember that C is a variable that stores the correlation between each stock (A and B, A and C, B and C) and the variances of each stock (A B and C).

We can perform the calculation of C in MATLAB like so:

% Ntraining = number of samples to use in training w
% R = historic stock data samples
cov( R(1:Ntraining, :) );

Defining Risk and Return

Now that we have some underlying definitions and variables to use, we can define what exactly risk and return are.

Firstly, it is helpful to point out a relationship between the returns of a stock, R, and the returns of a portfolio. The returns of a portfolio is defined as ρ, and it is mathematically defined as: $\displaystyle \boldsymbol{\rho} = \mathbf{R}\mathbf{w}^\intercal$ (1)

The resulting ρ is a vector in which each row represents the returns of that particular day, given the weighting of investment of each stock. For the purposes of MPT, all we need to know is that ρ is a linear transform of R.

Secondly, we need to make an assumption about how we model the returns R when estimating future expected values of returns. It’s all well and good having historic data, but how do we extrapolate to estimate what R might do in the future? The returns R are modelled as multivariate Gaussian distributions. We write the Gaussian distribution of R as : $\displaystyle \mathbf{R} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$

Where: $\displaystyle \mu$ = mean (m) $\displaystyle \Sigma$ = variance (C)

We also use the following relationship: $\displaystyle \mathbf{AR} \sim \mathcal{N}(\mathbf{A}\boldsymbol{\mu}^\intercal, \mathbf{A}^\intercal\boldsymbol{\Sigma}\mathbf{A})$

Using this relationship, and the relationship in (1), we have: $\displaystyle \boldsymbol{\rho} \sim \mathcal{N}(\mathbf{w}\boldsymbol{\mu}^\intercal, \mathbf{w}^\intercal\boldsymbol{\Sigma}\mathbf{w})$

Substituting mean and variance: $\displaystyle \boldsymbol{\rho} \sim \mathcal{N}(\mathbf{w}\mathbf{m}^\intercal, \mathbf{w}^\intercal\mathbf{C}\mathbf{w})$

Still with me? If so, the rest will be a breeze. If you didn’t quite understand the derivation above, don’t worry too much. Just make sure that you understand the two meaningful terms we obtained, return and risk, from the above equation: $\displaystyle \mathbf{w}\mathbf{m}^\intercal$ represents the mean return of the portfolio. This is what we refer to when we use the term “return”. We’re multiplying the mean return of each stock, mi,by its corresponding weighting, wi,to obtain the mean return that would have been achieved had we invested this portfolio over the time period m was calculated on. This makes intuitive sense, since if the mean return of stock i is 2%, say, and our portfolio weight wi is 0.5, for example, then the return we actually obtain on such a portfolio would be 1%. If the mean return was instead -5%, but we only had a weight of 0.01 on that stock, then we’d only lose 0.05% (or in other words, we’d gain -0.05% if that makes more sense). Not the end of the world! $\displaystyle \mathbf{w}^\intercal\mathbf{C}\mathbf{w}$ represents the variance on the mean return of the portfolio. This is what we refer to when we use the term “risk”. Put simply, it’s a measure of how much stocks in the portfolio fluctuate over time. Stocks that fluctuate a lot have a higher variance and therefore are a higher risk. This is the fundamental assumption of MPT. This assumption has been criticised, but let’s assume that it’s true for now. It’s nice to work with, since it’s intuitive that a stock that jumps around a lot is more risky than one that steadily moves in a particular direction. There isn’t a nice scale to work with here, like there is with the returns that are based on percentages. However, we can just compare risks relatively to decide which stock is less risky than other.

Representing the Problem Mathematically

So, now we finally have a way to calculate return and risk of a portfolio. We base these calculations on historic data (represented in m and C). But, historic data is not immediately useful. What good is it that we can plug in different numbers into our portfolio w and have it tell us what we would have got had we invested this portfolio? Rather than saying “Here’s a portfolio, go and figure out what I get in terms of return and risk“, we really want to be able to say “Give me a portfolio that will give me x return, but has the least possible risk“. That sounds like it might be complicated, but this sort of problem is actually just a straightforward optimization problem! We can write this mathematically, using our little formulas for risk and return above: \begin{aligned} \displaystyle & \underset{\mathbf{w}}{\text{minimize}} & & \mathbf{w}^\intercal\mathbf{C}\mathbf{w} \\ & \text{subject to} & & \mathbf{w}\mathbf{m}^\intercal = r_0 \end{aligned}

For those unfamiliar with optimization problems, this simply says “Find the portfolio weights, w, with the minimum possible risk that gives me return r0“. The good thing about this minimization equation is that we’ve now got a succinct mathematical way of representing exactly what we said in words before. There are many mathematical tools out there that can compute such problems, and below is an example using the CVX toolbox for MATLAB which performs the minimization problem. CVX is nice because it has an intuitive interface which is quite similar to the mathematical representation.

cvx_begin
variable w(N) % N = number of assets
minimize(w'*C*w)
subject to
w'*ones(N, 1) == 1,
w'*m == r0,
w &gt; 0;
cvx_end

The first constraint in the above code ensures that the elements in w add up to 1 (100%). Of course, we cannot have 110% of our money invested in a portfolio, and we also would not want only 90% of it invested, for example. The second constraint is the constraint we’re already familiar with, the part that states that we must obtain the return we’re seeking, r0. The final constraint ensures that all of the elements of w are positive. This constraint imposes a limit of no “short-selling”.

So, what is the result? Running the above code gives us some weights in w, representing the portfolio which would give you the lowest possible risk while still giving you your desired return r0. That’s pretty neat! Of course, C and m are calculated on historic stock data, and since we’re actually interested in applying the resulting portfolio to futuristic data (I.E. we want to invest our portfolio now and see its performance after some time has passed), we’re working under the assumption that the stock’s risks and mean returns (C and m) are representative of how the stocks will behave in future, based on their past. So the resulting weights are obviously only really “estimations” of the optimum portfolio when applied to future investments. We call the historic stock data “training data”, because we’re using it to “train” w. We can actually also use some more historic data that hasn’t been used in the calculation of C and m in order to see how the portfolio performs in “real-world” scenarios. This data we use to test the performance is called “test data”, because we’re using it to verify the returns of on data that it hasn’t been trained on. It is still historic data, but it allows us to see how our portfolio would have performed on some “unseen” data. In other words, it allows us to see how the portfolio would have performed if we had actually invested in the proportions given by w. Of course, we could indeed invest in these proportions, and see how the portfolio performs, but it is clearly more efficient to simply emulate this by pretending some historic data is what we’re investing on.

There’s another way we could approach optimizing our portfolio which I mentioned at the start. Rather than seeking the portfolio that gives us the lowest risk for a particular desired return, we can instead seek the portfolio which gives us the highest return for a maximum risk limit. Mathematically, we can write this as such: \begin{aligned} \displaystyle & \underset{\mathbf{w}}{\text{maximize}} & & \mathbf{w}\mathbf{m}^\intercal \\ & \text{subject to} & & \mathbf{w}^\intercal\mathbf{C}\mathbf{w} = \sigma_0 \end{aligned}

Similarly to before, this formula simply states “Find the portfolio weights, w, that give me the maximum possible return that has a risk of σ0“. Again, numerous mathematical tools exist to solve this kind of optimization problem. The code for CVX is as follows:

cvx_begin
variable w(N) % N = number of assets
maximize(w'*m)
subject to
w'*ones(N, 1) == 1,
w'*C*w == s0,
w &gt; 0;
cvx_end

This code is very similar to the previous approach, only the risk and return terms have been swapped, and the risk constraint is s0 (σ0). Think about what might happen if we removed the risk constraint. If you want to know the answer, hover over this footnote2.

Efficient Frontier

Hopefully you’re still with me by this point. So far, we’ve calculated two different portfolios: one in which we want to minimize risk for a given return, and one where we want to maximize return for a given risk. This is all well and good, but clearly there are some limits to what we can choose for our risk or return constraints. For example, we can’t choose to have a 200% return and expect a portfolio to be able to achieve that, no matter what risk we allow. Likewise, how do we choose a sensible limit for risk, especially since it’s dimensionless? This is where something called the Efficient Frontier comes in. The Efficient Frontier is a characteristic we can calculate and plot which demonstrates, for a particular selection of stocks, what the best possible trade off between risk and return is. An example of an Efficient Frontier is shown in Figure 1. Figure 1 – Efficient Frontier example

This demonstrates the maximum realizable returns for a range of risks. We can see that, as we increase our risk, we can obtain better returns – which intuitively makes sense. The Efficient Frontier, represented by the blue line, represents the most efficient trade-off between risk and return possible for a particular selection of stocks. All of the points on the line represent the maximum possible realizable returns for their corresponding risks. Of course, we cannot select a point above the line (I.E. we cannot seek a return greater than the point on the Efficient Frontier for a given risk). We could indeed select a point below the line (I.E. accept a lower return for the same risk), but this is inefficient. So, in order to find some sensible values to plug into our calculations in the MATLAB code above, we can plot the Efficient Frontier, choose a good trade-off between risk and return by selecting a point on the line, and plug the value of risk in to our code above by setting s0 to it.

But how do we calculate the Efficient Frontier? The steps are outlined below.

1. Find the portfolio with the maximum possible return, unconstrained by risk
• Calculate the risk of this portfolio, this will be the maximum possible risk
2. Find the portfolio with the minimum possible risk, unconstrained by return
• The risk of this portfolio will be the minimum possible risk
3. Make a list of risks by stepping from the least possible risk calculated in step 2, up to the risk calculated in 1 and incrementing in small steps, forming our x axis
4. Using each risk in the list as a constraint, calculate the portfolio with the maximum possible return for each risk

The MATLAB code to perform this is shown below. The code is adapted from (Brandimarte, 2006), but I have replaced equivalent parts with CVX code since that’s what we’re now familiar with.

function [PRisk, PRoR, PWts] = NaiveMVCVX(ERet, ECov, NPts)
ERet = ERet(:); % makes sure it is a column vector
NAssets = length(ERet); % get number of assets
% vector of lower bounds on weights
V0 = zeros(NAssets, 1);
% row vector of ones
V1 = ones(1, NAssets);
% Find the maximum expected return
cvx_begin
variable w(NAssets)
maximize(w'*ERet)
subject to
w'*ones(NAssets, 1) == 1;
w &gt; 0;
cvx_end
MaxReturnWeights = w
MaxReturn = MaxReturnWeights' * ERet;
% Find the minimum variance return
cvx_begin
variable w(NAssets)
minimize(w'*ECov*w)
subject to
w'*ones(NAssets, 1) == 1,
w &gt; 0;
cvx_end
MinVarWeights = w
MinVarReturn = MinVarWeights' * ERet;
MinVarStd = sqrt(MinVarWeights' * ECov * MinVarWeights);
% check if there is only one efficient portfolio
if MaxReturn &gt; MinVarReturn
RTarget = linspace(MinVarReturn, MaxReturn, NPts);
NumFrontPoints = NPts;
else
RTarget = MaxReturn;
NumFrontPoints = 1;
end
% Store first portfolio
PRoR = zeros(NumFrontPoints, 1);
PRisk = zeros(NumFrontPoints, 1);
PWts = zeros(NumFrontPoints, NAssets);
PRoR(1) = MinVarReturn;
PRisk(1) = MinVarStd;
PWts(1,:) = MinVarWeights(:)';
% trace frontier by changing target return
VConstr = ERet';
A = [V1 ; VConstr ];
B = [1 ; 0];
for point = 2:NumFrontPoints
B(2) = RTarget(point);
cvx_begin quiet
variable w(NAssets)
minimize(w'*ECov*w)
subject to
w'*ones(NAssets, 1) == 1,
w'*ERet == RTarget(point), %this time we're targeting RTarget
w &gt;= 0;
cvx_end
Weights = w;
PRoR(point) = dot(Weights, ERet);
PRisk(point) = sqrt(Weights'*ECov*Weights);
PWts(point, :) = Weights(:)';
end
end

Alternatively, if you have access to the financial toolbox, you can use the MATLAB function frontcon to calculate the efficient frontier for you. The two approaches give identical results, but frontcon is faster.

You could, as mentioned previously, simply pick a desirable risk from the Efficient Frontier and use the weights associated with it. Alternatively, you could use the Sharpe ratio to determine the optimum trade off between risk and return by calculating the Sharpe ratio for each portfolio that exists on the efficient frontier. The Sharpe ratio (Sharpe, 1994) is defined as: $\displaystyle \frac{r_p - r_f}{\sigma_p}$

rpportfolio return.

rf – risk-free return. This is the return we’d get if we instead just dumped all the money in the portfolio into a bank where the interest rate is constant and you can never really lose. Set this to 0 if you’re not interested in investing in such a manner. Otherwise, you can include the return you’d get in the original matrix R.

σpportfolio risk.

The optimum portfolio out of all the portfolios calculated on the Efficient Frontier is the one which gives the largest Sharpe ratio. Simply iterate over the elements in the efficient frontier, calculate the return for each, and pick accordingly, like so:

%calculate the efficient frontier
[risk, returns, w] = frontcon(m, C, 100);

%calculate Sharpe ratio on the training data, and extract the best
%performing portfolio
for i=1:length(returns),
sharpeTrainingRatios(i) = returns(i)/risk(i);
end

%extract the most efficient portfolio
[maxSharpeTrainingRatio, maxSharpeTrainingRatioIndex] = max(sharpeTrainingRatios);
efficientPortfolio = w(maxSharpeTrainingRatioIndex, :);

Recap

We saw the foundations and assumptions that MPT is based upon, the goals for optimizing a portfolio, and the formulas for calculating risk and return. We saw how to use these formulas to derive optimization problems that we can express programmatically to obtain an optimized portfolio. Finally, we saw how to find the best trade-off for a set of portfolios using the Efficient Frontier and the Sharpe ratio. Now it’s up to you to go and implement these tools, and see for yourself if you can make a good investment!

References

Markowitz, H. (1952). Portfolio Selection. The Journal of Finance, 7(1), p.77.

Brandimarte, P. (2006). Numerical methods in finance and economics. Hoboken, N.J.: Wiley Interscience.

Sharpe, W. (1994). The Sharpe Ratio. The Journal of Portfolio Management, 21(1), pp.49-58.

1+
15
Jan

Drive My Internet-Controlled Car! Radio controlled toy cars don’t normally have a Raspberry Pi, 10 batteries and a stripboard loosely hanging off them, but this isn’t your standard RC car.

The primary goal of ICC was to enable a toy car (which has now become a robot, as seen in the image above) to be controlled in real-time over the internet. I did just that, and you can try it out right now!

Come and drive my Internet-Controlled Car: http://projects.bitnode.co.uk/ICC/

Read on for a technical write-up of how I completed the project.

2+
23
Nov

Astrovoids – Singleplayer/Multiplayer Asteroids Clone This is a project I worked on around June 2014. It’s written in JavaScript, and uses the IvanK graphics library. The nice artwork isn’t my own! I used the Space Shooter Redux sprites, available here.

Mainly just a project I used to practice JavaScript, specifically OO JavaScript using the base.js library. Coming from nice OO languages like C#, I felt a bit hindered by the prototype based inheritance JavaScript offers. base.js sets out to help you there, well worth looking into!

The singleplayer code is available on GitHub: https://github.com/OliverF/astrovoids

The singleplayer version is available here: http://projects.bitnode.co.uk/Astrovoids
(Use WASD to move, space to shoot)

The multiplayer version is where it gets interesting. Unfortunately, I’m not happy with the performance of the multiplayer version right now, so I’ll post an update about that later.

1+
10
Feb

PiDoor – Security Door Lock v2

Some readers may recall my old security door lock, which I made about a year ago. It was controlled with just a single AVR, which proved very inflexible. Though in theory it was capable, it was very limited unless I spent a lot of time and effort interfacing with an internet connectivity module. While that may have been an interesting challenge, the inflexibility and impracticality of the old system was simply too much for me, and I didn’t have time to maintain it. It was generally too time consuming to dismantle the module and reprogram it as required, and so when it broke (the strings simply snapped!) I decided it wasn’t worth the hassle, and went back to a good old-fashioned regular lock.

But, where’s the fun in that? Fast-forward a year to 2014, and I finally convinced myself to buy a Raspberry Pi to mess around with. I already have quite a bit of experience with Linux, so I wasn’t planning to use it to learn Linux. In fact, I didn’t know what I’d use it for until the night before it arrived when it hit me: I could use it to make a much better door lock! And so when it arrived, along with a servo I happened to order with it, I got to work. Here’s the end result:

At the start of the video, you see the Raspberry Pi on the left, the servo/circuit housing in the middle, and the lock itself on the right. In the second half, you can see the keypad connected via a ribbon cable to the circuit housing.

Features:

• Keypad code locking/unlocking
• Web interface to control and view status from any internet connected device • Easy to update (thanks to the Raspberry Pi itself being a Linux device)
• Mains powered with a single 5v power source
• GPIO ribbon cable/socket for easy removal of the Raspberry Pi

Read on to find out how it works.

0+
9
Feb

K-Nearest Neighbours

I spent a few hours over the past few days working on an interactive K-NN (K-Nearest Neighbours) classification map.

Try it out here: http://projects.bitnode.co.uk/KNN/

If you’re unfamiliar with it, K-NN is a part of machine learning. Specifically, it can be used for classification (I.E does this belong to x or does it belong to y). The points you see on the map are training data. They essentially define the boundaries for classification, so that if we were to bring an unclassified point into the data set, we could decide which class it belongs to based upon the training data. The map is showing the classification of each individual pixel in the feature space.

So the algorithm for K-NN is really simple. You just look at K training points nearest to the particular input point (in this case, the location of a pixel) and average out the class. So for example if K=3, at pixel (10,10) we simply find the distance from the pixel to all the points of the training set, then look at the nearest 3 points, and average out their class. In this case, class is a colour (red or blue) and so we can just sum up the RGB components of the point individually and divide them by 3, giving us our averaged colour and therefore class which we can assign to the pixel in question.

Here are some interesting outputs:    Also an interesting bug I encountered while making this: Try it for yourself here: http://projects.bitnode.co.uk/KNN/

0+
17
Aug

DriveEditor – Vehicle config editor for BeamNG Drive

Another desktop application! This one was actually challenging, (which makes a change from other desktop applications I’ve made) and I learned a lot while creating a useful tool at the same time!

Well, useful if you like smashing cars in BeamNG’s Drive: a softbody car physics simulator which is much more fun than it sounds! Here’s the kind of stuff you can do after editing your vehicles:

So how does it work?

First, you might need to tell it where your BeamNG Drive vehicle directory is located (just go into Options->Settings), and then it will scan the chosen directory for vehicle config files. The config files are stored as .jbeam, which is some variant of JSON that I had lots of fun regex-ing to convert to valid JSON. After that, you get a screen with a huge tree on the left, and an editor on the right. You can edit values to your liking: It saves automatically as you type, so you can hop back into BeamNG Drive and hit “CRTL+R”, and the config will be loaded!

Features

• Automated scanning for .jbeam files
• Full .jbeam parsing, allowing for total flexibility of input
• Smart lookahead (so common config arrays will be presented in a multi-column format for easy editing)
• Type detection for boolean values (so that editing is a checkbox) and decimal/integers (so that saving them does not save them as a string, and maintains the original type)
• Tab index to allow you to quickly move between edit boxes by pressing tab
• Auto update checker (can be disabled)
• Component creation by duplicating existing components
• Node deletion
• Custom node adding directly from the editor

Planned features

• (done) Type detection (so numbers will be a number box, booleans will be a check box, etc.)
• (done) Custom node adding (So you can create vehicles from within the editor)
• (done) Tab index (So you can tab your way through values to edit them easily)

v0.5

• Recoded a major portion of the code, improving editing speeds DriveEditor
DriveEditor-v0.5.zip
Version: 0.5
181.6 KiB
Details

Older releases:

v0.4:

• Fixed bug where fullsize.jbeam would be incorrectly parsed
• Added “Create component from…” function, so that parts can be duplicated and modified without editing the original
• Patch v0.4.1: Made the “add new component” create JSON friendly validated names
• Patch v0.4.2: Added automatic update checking (can be disabled) and “Delete node” option (right click on a node to open the menu) DriveEditor (179.0 KiB)

(August 19th, 21:00)

v0.3:

• Improved (but not perfected) jbeam parsing. At least now all default files should be able to be parsed
• Added tab indexes to text boxes, so you can easily move between edit boxes by pressing tab
• Fixed bug where text would be truncated on long labels (oops)
• Added type detection, so values will be loaded, presented, and saved as their original types (e.g booleans will be checkboxes, integers will be saved without quotes, etc.)
• Allowed for window resizing DriveEditor (176.0 KiB)

v0.2:

• Added loading screen, made it easier to browse arrays (useful for BeamNG’s use of arrays) DriveEditor (176.3 KiB)

v0.1 – Initial release

1+
27
Jul

BitBak – Offsite Encrypted Dropbox Backup

What is the aim of this?

BitBak is an automated offsite encrypted Dropbox backup which utilizes TrueCrypt volumes to keep your data secured while storing it on the cloud. The goal is to protect your work from prying eyes and data loss at the same time by automatically encrypting then backing up your files to existing free cloud services. Sold already? Go to download!

Why take offsite backups in the first place?

If you’re a student like me, you don’t have a great deal of money to throw around. You probably develop your small or individual work either using a local repo, a cheap repo, or no repo at all. A local repo means that all your files are at the mercy of your (or your local server’s) hard drive. Same with no repo at all. A remote, free/cheap repo may one day disappear off the face of the earth, or you may consider the fact that they are cheap is a means for them to steal people’s projects illegally (in which case you should opt for a local repo if you’re the only one working on it). So essentially the risks amount to hard drive damage or theft, or remote service termination (along with all your files if you’re unlucky).

How does it work?

Very simply, it mounts a TrueCrypt volume, looks through all the folders you specify to backup, zips them up, copies them to the mounted volume, and unmounts again. In between this process, excess old backups are removed as per your specification. You can specify the interval at which to run, or you can silently run the program from the Windows Task Manager (just supply the argument “runsilent”) to allow total flexibility of when to run.

At this point, I should make it clear why I use Dropbox this. Dropbox only syncs changed blocks, rather than the entire volume, so adding small files to your volume won’t cost you a few hours of uploading. I should also point out that dynamic sized volumes will prevent a huge upload when you first create it. Of course, both of these choices are entirely up to you.

Features

• Automated interval-timed or manual backups to an encrypted TrueCrypt volume on Dropbox
• Multi-directory backups
• Encrypted key storage for TrueCrypt volume key (so that the volume can be securely mounted and unmounted automatically)
• Old backup archives automatically deleted as per your settings
• Can be run silently via Windows Task Scheduler or batch script by adding argument “runsilent”

Instructions

1. Create a new TrueCrypt volume. Follow the steps in TrueCrypt to create an encrypted file container (‘standard TrueCrypt volume’) inside your Dropbox folder (or anywhere, but Dropbox works best as syncing doesn’t require the entire volume to be uploaded)  2.   Open Options->Settings to let BitBak know where your TrueCrypt volume resides, what its password is, to add directories to the backup process, and to configure a few preferences. 3. Run a backup from the main screen. That’s all! You should be able to see all the backups in your TrueCrypt volume when you next mount it through TrueCrypt. I’m sold, let’s try it

TrueCrypt and Dropbox are individually associated with thier own license agreements to which you must agree before utilising and installing those services. You must also agree to the license provided in ‘BitBak license.txt’ before using the program.

Versions:

v0.1.1

• Fixed “runsilent” bug (thanks John N. for pointing this out)
• Fixed bug where backups would ignore specified path within archive and appear in the root backup directory
BitBak
BitBak-v0.1.1.zip
Version: 0.1.1
5.2 MiB
Details

Source code: BitBakSrc (5.2 MiB)

Older versions:

v0.1 – Initial release BitBak (5.2 MiB)

Source code: BitBakSrc (29.5 KiB)

If you get a malware warning, that’s because I’ve requested administrator privileges. If you don’t trust me, read through the source code – it’s only a few hundred lines!

1+
9
Jun

Radial Spatial Hashing for 2D Lighting This builds on my previous system, which you can see in my previous post. If you want to learn how to make simple 2D lighting, read that first. If you want to know how to make it 10x+ faster, read this!

So, this post deals with the issues faced with the inefficient method described originally – that was, checking every single ray against every single 2D surface. Imagine doing this manually: draw some rectangles on a page, and for each degree from 0 – 360, you want to draw a ray extending from the center, limiting the length of the ray to the first thing the ray meets. Now assuming you’re a smart individual, you would only consider the rectangles on your piece of paper that are roughly in the direction of the ray you’re drawing. That makes sense, when you think about it: why would you even bother to look at rectangles that are in totally the wrong direction of the ray? Well, that’s what our previous algorithm was doing! Fortunately, there’s a solution. It’s called ‘spatial hashing’. It’s great! Here’s a video of it in action:

We’re getting 60+FPS with 100 rectangles and multiple lights, now, which is a vast improvement. There’s some extra optimizations we can make, in addition to spatial hashing, but we’ll stick with this for now for the sake of simplicity.
Just for a rough idea of just how good this can be:
The old algorithm of checking EVERY surface with EVERY ray regardless of the direction took about ~117ms per light for 400 surfaces
With this spatial hashing algorithm, that time is reduced to ~4ms
A huge improvement, and we still have more improvements we can make! Read more to find out how, and to see the source code.

1+
7
Jun

Basic 2D Shadows and Lighting System (OpenGL, C#) I’ve decided to tackle the challenge of 2D shadows and lighting first in my to-be 2D game. It’s something I’ve wanted to try for a long time, and never really had the guts to try. It seems like a daunting task, but after a few days of work, I’ve produced a basic lighting system which can be extended to any polygon. It doesn’t use the best algorithm in the world, however it’s a start, and it’s something I can build on and improve later. Performance-wise, it’s not terrible. It can handle 400 2D lines (100 rectangles) at 20FPS and one light source. Here’s a video of it in action:

The above video demonstrates a slightly more advanced multi-light system. Click “read more” to read the details of how it works, how to make it, and the source code.

2+
7
Apr

Keypad Security Door Lock I just finished my Keypad Security Lock project a week or so ago, which I’ve been working on and off with a break in the middle for the past few weeks now. Held together with sticky pads and hookup wire, I present my latest creation:

(excuse the shoddy camera angles, it’s tricky to hold the camera and type when the keypad is on the other side of the door!)

Components:
ATmega 328P
Strip board
ABS housing
Full rotation servo (why did I get a full rotation one…)
Reed switch and magnet pair (to accommodate for the decision above) 