Thursday, July 24, 2014

Accessing individual files from a "dd backup" of an encrypted hard disk/partition

Whenever I do something risky with my laptop like resizing partitions, I always do a full backup of the hard disk by running:
dd if=/dev/sda of=/path/to/backup/directory/image.img bs=4M
from a live Arch Linux USB pen drive.

Restoring from a backup like that is super easy.
dd if=/path/to/backup/directory/image.img of=/dev/sda bs=4M
Works perfectly. Every single time. Except for licensing issues with the Windows installation if you dual boot, but WHO CARES!!!

In my case, one of my partitions, /dev/sda5, is LUKS encrypted, and if I want to access files from that partition using the .img file, things are not very straightforward, but still easy. For reference, this is what you'll typically do.
➜ 0 /home/shadyabhi [ 9:40PM] % sudo modprobe loop
➜ 0 /home/shadyabhi [ 9:40PM] % sudo losetup -v /dev/loop0 ./official_laptop_backup.img
➜ 0 /home/shadyabhi [ 9:40PM] % ls /dev/loop0*
➜ 0 /home/shadyabhi [ 9:40PM] % sudo partprobe /dev/loop0
➜ 0 /home/shadyabhi [ 9:40PM] % ls /dev/loop0*
/dev/loop0  /dev/loop0p1  /dev/loop0p2  /dev/loop0p5  /dev/loop0p6  /dev/loop0p7
➜ 0 /home/shadyabhi [ 9:40PM] % sudo mkdir /mnt/my_encrypted_partition
➜ 0 /home/shadyabhi [ 9:40PM] % sudo cryptsetup luksOpen /dev/loop0p5 encrypted_partition
Enter passphrase for /dev/loop0p5: 
➜ 0 /home/shadyabhi [ 9:40PM] % sudo mount /dev/mapper/encrypted_partition /mnt/my_encrypted_partition 
➜ 0 /home/shadyabhi [ 9:40PM] % sudo ls /mnt/my_encrypted_partition | wc -l
➜ 0 /home/shadyabhi [ 9:41PM] % 
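
If you do this often, the sequence above can be scripted. Here is a minimal Python 3 sketch that only builds the command lines (it does not run anything). The function names `mount_commands` and `unmount_commands` and their default arguments are hypothetical, chosen for illustration. The teardown part is an assumption too: it is simply the usual reverse of the setup (umount, luksClose, losetup -d) and is not part of the session above.

```python
import shlex


def mount_commands(image, loop_dev="/dev/loop0", part=5,
                   name="encrypted_partition",
                   mount_point="/mnt/my_encrypted_partition"):
    """Return the setup steps from the session above as argv lists."""
    part_dev = f"{loop_dev}p{part}"  # e.g. /dev/loop0p5
    return [
        ["modprobe", "loop"],
        ["losetup", loop_dev, image],
        ["partprobe", loop_dev],
        # -p (not in the original session) avoids an error if the directory exists
        ["mkdir", "-p", mount_point],
        ["cryptsetup", "luksOpen", part_dev, name],
        ["mount", f"/dev/mapper/{name}", mount_point],
    ]


def unmount_commands(loop_dev="/dev/loop0", name="encrypted_partition",
                     mount_point="/mnt/my_encrypted_partition"):
    """Return the teardown steps, in reverse order of the setup (assumed)."""
    return [
        ["umount", mount_point],
        ["cryptsetup", "luksClose", name],
        ["losetup", "-d", loop_dev],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    steps = mount_commands("./official_laptop_backup.img") + unmount_commands()
    for argv in steps:
        print("sudo " + shlex.join(argv))
```

Printing the commands rather than executing them keeps the sketch harmless; feeding each list to subprocess.run would be the next step once the device names are confirmed.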

Saturday, June 28, 2014

veth pair: How to know what interfaces are connected by a veth pair?

Here is a quick one. Couldn't find a direct solution on Google.

You can find the peer of an interface by issuing the command:
[root@compute-1 ~]# sudo ethtool -S int-br-ex
NIC statistics:
     peer_ifindex: 55
[root@compute-1 ~]# ip link  | grep 55:
    link/ether c6:5f:55:82:20:72 brd ff:ff:ff:ff:ff:ff
55: phy-br-ex:  mtu 1500 qdisc pfifo_fast state UP qlen 1000
[root@compute-1 ~]#
So, you can easily see that int-br-ex and phy-br-ex are peers of each other, which means they are connected via a veth pair. To find out which veth pairs are present on your system:
[root@compute-1 ~]# ip -d link show
.... output omitted ...
55: phy-br-ex:  mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:92:8e:3f:6e:f8 brd ff:ff:ff:ff:ff:ff
56: int-br-ex:  mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether d6:3b:77:c0:70:4f brd ff:ff:ff:ff:ff:ff
[root@compute-1 ~]#
With the "-d" option, the ip command gives you a few more details about each interface. In this case, it tells you whether an interface is part of a veth pair.
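
The lookup above (read peer_ifindex, then find the interface with that index) can be sketched in Python 3. This is only an illustration working on captured text: the helper names `peer_ifindex` and `ifname_by_index` are hypothetical, and the sample strings are copied from the output above. Anchoring the match at the start of the line avoids the false hit on a MAC address that happens to contain "55:", which is what the plain grep picked up.

```python
import re


def peer_ifindex(ethtool_output):
    """Extract the peer_ifindex value from `ethtool -S <iface>` output."""
    match = re.search(r"^\s*peer_ifindex:\s*(\d+)\s*$", ethtool_output, re.MULTILINE)
    return int(match.group(1)) if match else None


def ifname_by_index(ip_link_output, index):
    """Find the interface name for a given index in `ip link` output."""
    for line in ip_link_output.splitlines():
        # Interface lines start with "<index>: <name>"; MAC lines are indented.
        match = re.match(r"(\d+):\s+([^:@\s]+)", line)
        if match and int(match.group(1)) == index:
            return match.group(2)
    return None


ETHTOOL_OUT = """NIC statistics:
     peer_ifindex: 55
"""

IP_LINK_OUT = """    link/ether c6:5f:55:82:20:72 brd ff:ff:ff:ff:ff:ff
55: phy-br-ex:  mtu 1500 qdisc pfifo_fast state UP qlen 1000
"""

index = peer_ifindex(ETHTOOL_OUT)
print(index, ifname_by_index(IP_LINK_OUT, index))
```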

Saturday, June 14, 2014

Use j,k keys to navigate clipboard history in parcellite or all gtk applications

I'm a parcellite clipboard manager fan. It's clean, it's simple and it's lightweight. I use it all the time to search my clipboard history and select past clipboard entries.

But I always used to get annoyed by having to use the "Up" and "Down" arrow keys to navigate the GtkMenuShell. A simple "j"/"k" would be so much better. To do that, simply add the snippet below to ~/.gtkrc-2.0:
binding "gtk-binding-menu" {
    bind "j" { "move-current" (next) }
    bind "k" { "move-current" (prev) }
    bind "h" { "move-current" (parent) }
    bind "l" { "move-current" (child) }
}
class "GtkMenuShell" binding "gtk-binding-menu"
This lets you use "vim-like" keys in all gtk programs (for menus) including parcellite :)

Saturday, October 19, 2013

Encoding and Python: The UnicodeDecodeError exception

UnicodeDecodeError: 'ascii' codec can't decode something in position somewhere: ordinal not in range(128)

It all started with "ASCII" (it's an encoding; things will get clearer later), which was first published as a standard in 1963. The idea was to represent English text by mapping each character to a number (read bytes and ultimately bits).

So, "1000001" (a binary number, or 65 in decimal) in the ASCII encoding corresponds to "A". This "A" is just a "glyph" (a visual mark that stands for the letter A).
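
As a quick check of the mapping above, here is a tiny Python 3 sketch (Python 3 is my assumption here; the rest of this post talks about Python 2). The variable names are only for illustration.

```python
code = ord("A")                 # the number behind the character
bits = format(code, "07b")      # the same number as 7 binary digits

print(code)                     # 65
print(bits)                     # 1000001
print(chr(0b1000001))           # A
print("A".encode("ascii"))      # b'A', a single byte with value 65
```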

Sadly, this representation was not sufficient for all the characters/symbols in the world. In the good old days, when people couldn't find the characters they wanted, they started creating their own encodings. Hence, encodings like Latin-1, Big5 and Shift JIS came in. This was fine as long as, say, a Chinese writer just wanted to write Chinese (read any Chinese dialect) and not combine Chinese and Latin text. But there was a problem when you wanted to represent all possible characters in one string (as not all characters might exist in one encoding).

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 13144: ordinal not in range(128)
Now, let's understand what this error actually means.
  1. It's an uncaught UnicodeDecodeError exception.
  2. It says that while using the "ascii" codec (read encoding), it couldn't decode the byte "0xe2", which is at position 13144.
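
To see those two points in action, here is a small Python 3 sketch that reproduces the same kind of failure on made-up data. The byte string is an assumption for illustration (13144 ASCII bytes followed by the UTF-8 bytes of a curly apostrophe, which start with 0xe2); it is not the page that originally triggered the error.

```python
# 13144 plain ASCII bytes, then the UTF-8 encoding of U+2019 (e2 80 99).
data = b"a" * 13144 + b"\xe2\x80\x99"

error = None
try:
    data.decode("ascii")
except UnicodeDecodeError as exc:
    error = exc  # keep a reference; `exc` is unbound after the except block
    # The exception carries the codec name, the offending position and the reason.
    print(exc.encoding, hex(data[exc.start]), exc.start, exc.reason)
```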
Let's start by understanding what Unicode is. Unicode is a way to represent different glyphs using numbers called code points. It tries to include all characters possible. For example, the "halfwidth katakana middle dot", which has the glyph (・), is code point U+FF65 and can be written in a string as \uff65. This way, Unicode tries to represent all the characters and symbols possible in all languages.
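
A short Python 3 sketch (again, Python 3 is an assumption on my part) shows the difference between the code point itself and the bytes an encoding produces for it:

```python
import unicodedata

ch = "\uff65"                            # the halfwidth katakana middle dot

print(hex(ord(ch)))                      # 0xff65, the code point
print(unicodedata.name(ch))              # HALFWIDTH KATAKANA MIDDLE DOT
print(ch.encode("utf-8").hex())          # efbda5, three bytes in UTF-8
print(ch.encode("utf-16-be").hex())      # ff65, two bytes in UTF-16 (big endian)
```

Same character, same code point, different bytes depending on the encoding.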

So, 'ascii' is an encoding. Old-style str instances use a single 8-bit byte to represent each character of the string using its ASCII code, and ASCII only defines the values 0 to 127. Python tried to decode the byte 0xe2 (226) with the 'ascii' encoding, but it failed as no such character exists in ASCII. But why the hell ascii? Isn't it old? That's because Python 2's default encoding is "ascii".
➜ 0 /home/shadyabhi [ 8:19PM] % locale
➜ 0 /home/shadyabhi [ 8:19PM] % python2 -c 'exec("import sys; print sys.getdefaultencoding()")'
➜ 0 /home/shadyabhi [ 8:19PM] %
If you want to change default encoding to utf-8 in python, you can do a hack:
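import sys
# Set default encoding to 'UTF-8' instead of 'ascii'
# Bad things might happen though
reload(sys)  # Python 2 only: restores sys.setdefaultencoding, which site.py removes
sys.setdefaultencoding('UTF-8')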

This part is fixed in Python 3 by making "str" a Unicode type: a "str" object manages a sequence of Unicode code points, and raw bytes live in a separate "bytes" type.
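
Here is a minimal Python 3 sketch of that split between text and bytes. The variable names are just for illustration.

```python
import sys

print(sys.getdefaultencoding())          # utf-8 on Python 3

text = "\uff65"                          # str: a sequence of code points
raw = text.encode("utf-8")               # bytes: the encoded form

print(type(text).__name__, len(text))    # str 1
print(type(raw).__name__, len(raw))      # bytes 3

# Python 3 refuses to mix the two instead of silently decoding with 'ascii'.
try:
    text + raw
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)
```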

Now that we understand the exception: to fix it, you need to "decode" the byte string using the encoding it was actually written in. Decoding with the right encoding makes sure that the bytes which caused the exception earlier map to a known character. To encode/decode strings, Python 2 has two methods:
  1.  s.decode("ascii"): converts a str object (bytes) to a unicode object
  2.  u.encode("ascii"): converts a unicode object to a str object (bytes)
>>> u'・'
u'\uff65'
>>> u'・'.encode('utf-8')
'\xef\xbd\xa5'
>>> '\xef\xbd\xa5'.decode('utf-8')
u'\uff65'
>>> print '\xef\xbd\xa5'.decode('utf-8')
・
>>> '\xef\xbd\xa5'.decode('ascii')
Traceback (most recent call last):
  File "<input>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 0: ordinal not in range(128)
I faced the above error when I was trying to parse webpages and get the text out of them using the "html2text" module. Python 2's default encoding is "ascii", and it's stupid to assume that all websites can be represented in the "ascii" encoding.

How do we guess the encoding of text then? We can't, at least not reliably. Some encodings start with a BOM (byte order mark), which can be used to detect the encoding, while for others there is simply no way. Well, there is a module named chardet that you can use to guess the encoding, but I repeat: there is no reliable way to guess it. When parsing web pages, there is usually an HTTP header like:
Content-Type: text/html; charset=utf-8
or the webpage may start with:
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

which can be used to get the encoding information. Wait, but how do we read the encoded text without knowing the encoding? Luckily, the names of all encodings can be written in basic "ascii", and most encodings you'll meet on the web keep the ASCII range unchanged, so that's not a problem.
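
Putting the hints together (BOM, HTTP header, meta tag), here is a rough Python 3 sketch. The function names `charset_from_header` and `declared_encoding`, the 1024-byte scan window and the order of the checks are all assumptions for illustration; a real HTML parser does considerably more. It only reports what the page declares, which may still be wrong.

```python
import codecs
import re
from email.message import Message

# UTF-32 is checked before UTF-16 because the UTF-32-LE BOM
# begins with the same two bytes as the UTF-16-LE BOM.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

# Matches both <meta charset="..."> and the http-equiv form shown above.
META_CHARSET = re.compile(rb"<meta[^>]+charset=[\"']?([\w-]+)", re.IGNORECASE)


def charset_from_header(content_type):
    """Parse the charset parameter out of a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lowercased name, or None


def declared_encoding(raw, content_type=None):
    """Return the encoding the page claims to use, or None if it says nothing."""
    for bom, name in BOMS:
        if raw.startswith(bom):
            return name
    if content_type:
        charset = charset_from_header(content_type)
        if charset:
            return charset
    match = META_CHARSET.search(raw[:1024])  # only scan the start of the page
    if match:
        return match.group(1).decode("ascii").lower()
    return None


page = b'<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>'
print(declared_encoding(page))
print(declared_encoding(b"<html>", "text/html; charset=ISO-8859-1"))
```

When this returns None, a guesser like chardet is the fallback, with the caveat above that it is only a guess.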

This information can then be used to convert the page to Unicode in Python.
Once that is done, you can do whatever you want with it. If you need to save it to disk, you need to encode it back though.

If the webpage misbehaves and sends a wrong header, you get a UnicodeDecodeError exception if you're using the "strict" error handler, which is the default. If you still want to decode anyway, use "ignore" or "replace".
page_content.decode(encoding_in_header, 'ignore')
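
To compare the three error handlers side by side, here is a small Python 3 sketch. The sample bytes are made up: UTF-8 data that we pretend was announced as "ascii" by a wrong header.

```python
raw = "caf\u00e9 \uff65".encode("utf-8")    # really UTF-8, wrongly labelled ascii

try:
    raw.decode("ascii")                      # "strict" is the default handler
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

ignored = raw.decode("ascii", "ignore")      # undecodable bytes are dropped
replaced = raw.decode("ascii", "replace")    # each bad byte becomes U+FFFD

print(strict_failed)
print(repr(ignored))
print(repr(replaced))
```

Both "ignore" and "replace" lose information, so they are a last resort, not a fix for the wrong header.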
It's good practice to decode strings to Unicode as soon as we receive them from an external source, and to operate on the Unicode objects. When we're done and want to hand the data back or store it somewhere, we encode it again. Then why isn't this done automatically in Python 2? Because not all core parts of Python 2 operate on Unicode. This is fixed in Python 3.
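
Here is a minimal Python 3 sketch of that "decode early, encode late" pattern. The function `reverse_text` is a hypothetical example chosen because doing the same work directly on the bytes breaks multi-byte characters.

```python
def reverse_text(raw, encoding="utf-8"):
    """Decode at the boundary, work on text, encode again on the way out."""
    text = raw.decode(encoding)          # bytes -> str as soon as data arrives
    reversed_text = text[::-1]           # operate on code points, not bytes
    return reversed_text.encode(encoding)  # str -> bytes just before output


raw = "ab\uff65".encode("utf-8")
result = reverse_text(raw)
print(result.decode("utf-8"))

# Reversing the raw bytes instead scrambles the three-byte character.
try:
    raw[::-1].decode("utf-8")
    naive_ok = True
except UnicodeDecodeError:
    naive_ok = False
print(naive_ok)
```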

I hope this gives you a little idea of what Unicode is and how to handle different encodings in your code.

Google Bookmarks shortcut in pentadactyl

Till now I used the Shareaholic extension to add Google bookmarks for websites.

This worked great, but I had to install an addon just for this functionality. I like a minimalistic design, so I don't have a menu bar, bookmarks bar, location bar etc.; it's just the webpage. That's the very reason I use pentadactyl on my Firefox. So, the very thought of adding an addon bugs me.

Just today, I figured that I can map javascript functions with shortcuts. So, here is a little thing that you can add in your `.pentadactylrc` and add bookmarks just by pressing a shortcut.
map -modes=n z -javascript (function(){var a=window,b=content.document,c=encodeURIComponent,""+c(b.location)+"&title="+c(b.title),"bkmk_popup","left="+((a.screenX||a.screenLeft)+10)+",top="+((a.screenY||a.screenTop)+10)+",height=510px,width=550px,resizable=1,alwaysRaised=1");a.setTimeout(function(){d.focus()},300)})();
Notice the "content" in variable "b": that's because if I just use "document.location", I'll get the value "chrome://browser/content/browser.xul".
Now, you can press "z" and add current location to Google Bookmarks.