how to split one text file into multiple *.txt files

September 18, 2018 - Reading time: 4 minutes

You can use the split program from GNU coreutils:

split -b 1M -d  file.txt file

Note that both M and MB are accepted, but the sizes differ: MB means 1000 * 1000 bytes, while M means 1024 * 1024.

If you want to split by line count instead, use the -l option.
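A quick sketch (the file and prefix names here are made up): splitting ten lines into chunks of four gives three pieces, the last one short.

```shell
# Make a 10-line sample file, then split it 4 lines at a time.
seq 10 > lines10.txt
split -l 4 -d lines10.txt part.   # -d gives numeric suffixes: part.00, part.01, ...
wc -l part.*                      # 4, 4 and 2 lines
```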

Update:

lines=$(( ($(wc -l < file.txt) + 11) / 12 )) ; split -l "$lines" -d file.txt file

Another solution, suggested by Kirill, is something like the following:

split -n l/12 file.txt

Note that this is a lowercase letter l, not the digit one. split -n accepts several forms: N, K/N, l/N, l/K/N, r/N, r/K/N.
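For example (file names invented, GNU split assumed), splitting ten lines into three files without breaking any line:

```shell
# l/3: three output files, and no line is ever split across two of them.
seq 10 > nums.txt
split -n l/3 nums.txt chunk_             # creates chunk_aa, chunk_ab, chunk_ac
cat chunk_a* | cmp - nums.txt && echo "reassembles cleanly"
```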

Using Bash:

# Read all lines, then spread them across 12 files as evenly as possible.
readarray -t lines < file.txt
count=${#lines[@]}

for i in "${!lines[@]}"; do
    # map the 0-based line index i to an output file index from 1 to 12
    index=$(( (i * 12 - 1) / count + 1 ))
    echo "${lines[i]}" >> "file${index}.txt"
done

Using AWK:

awk '{
    a[NR] = $0
}
END {
    for (i = 1; i in a; ++i) {
        x = int((i * 12 - 1) / NR + 1)
        print a[i] > "file" x ".txt"
    }
}' file.txt

Unlike split, this approach makes the line counts of the output files as even as possible.
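To check the evenness claim, feed it 25 lines: each of the 12 output files gets two or three lines, never a bigger spread (the file names match the script above; the redirection target is parenthesised for portability across awk implementations):

```shell
# 25 lines into 12 files: sizes can differ by at most one line.
seq 25 > file.txt
awk '{ a[NR] = $0 }
END {
    for (i = 1; i in a; ++i) {
        x = int((i * 12 - 1) / NR + 1)
        print a[i] > ("file" x ".txt")
    }
}' file.txt
wc -l file[0-9]*.txt
```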


how to repack an epub file from command line

September 9, 2018 - Reading time: 5 minutes
To unzip the epub, move the ePub to a folder, cd to it then simply:
unzip MyEbook.epub
To zip up an epub:
1. zip -X MyNewEbook.epub mimetype
2. zip -rg MyNewEbook.epub META-INF -x \*.DS_Store
3. zip -rg MyNewEbook.epub OEBPS -x \*.DS_Store
Some explanation is necessary here. We start each line with two flags:
-r (recursive)
Move down through any directories/folders recursively, ensuring that everything in the specified folders gets included.
-g (grow file)
Append to the existing archive instead of creating a new one, so the mimetype entry added in step 1 stays first.

Even more concise:
# to zip up the epub, cd into its folder, then simply:
zip -rX ../my.epub mimetype META-INF/ OEBPS/

Without -X you could get the following when validating it with EpubCheck:

ERROR: my.epub: Mimetype entry must not have an extra field in its ZIP header

If mimetype is not the first in the epub file EpubCheck prints the following:

ERROR: my.epub: Mimetype entry missing or not the first in archive

To use the name of the current folder as the epub filename:
zip -rX "../$(basename "$(realpath .)").epub" mimetype $(ls|xargs echo|sed 's/mimetype//g')

This version also worked better for epubs that don't use an OEBPS folder. I don't know whether omitting that folder is valid per the standard, but I found examples of it missing in the wild.



concatenate multiple files but include filename as section headers

April 2, 2017 - Reading time: ~1 minute
tail -n +1 file1.txt file2.txt file3.txt

When given more than one file, tail prints a ==> filename <== header before each one.

grep "" *.txt

The empty pattern matches every line, and grep prefixes each line with its filename when searching multiple files.

This should do the trick as well:

find . -type f -print -exec cat {} \;

Means:

find    = linux `find` command finds filenames, see `man find` for more info
.       = in current directory
-type f = only files, not directories
-print  = show found file
-exec   = additionally execute another linux command
cat     = linux `cat` command, see `man cat`, displays file contents
{}      = placeholder for the currently found filename
\;      = tells `find` that the command given to -exec ends here

You can further combine searches with Boolean operators such as -and or -or; see also find -ls.
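For instance (directory and file names invented), matching .txt or .md files while still restricting to regular files:

```shell
# -type f restricts to files; the parenthesised group ORs the two name tests.
mkdir -p demo/sub
touch demo/a.txt demo/sub/b.md demo/c.log
find demo -type f \( -name '*.txt' -or -name '*.md' \)
```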


batch-convert document formats

February 8, 2017 - Reading time: 4 minutes

Install the latest LibreOffice and its libraries
On most Ubuntu versions LibreOffice is installed by default; just upgrade it to the latest version.

  • apt-get install libreoffice

Install unoconv
unoconv is a command line utility that can convert any document format (doc, docx, odt, ods, xls, xlsx) that LibreOffice can import to any format (xml, pdf, doc, docx, odt, ods, xls, xlsx) that LibreOffice can export.

  • apt-get install unoconv
Installing LibreOffice alone is enough for converting documents, but for some reason unoconv does a better job (despite using the same libraries) when the output is consumed by other applications; its results are more widely accepted.

Batch Convert Document to csv, pdf, jpg, docx, xlsx, odt, or ods

To batch convert documents, work in a terminal: put the documents to be converted in a folder (it can contain several sub-folders), then do the following:

Synopsis:

  • unoconv [options] to-file from-file

Convert single xls format to pdf

  • unoconv -f pdf some-document.xls

Convert single png format to jpg

  • unoconv -f jpg some-document.png

Batch Convert docx format to pdf

  • unoconv -f pdf *.docx

Batch Convert xlsx format to ods

  • unoconv -f ods *.xlsx

Batch Convert csv format to xlsx

  • unoconv -f xlsx *.csv

Batch Convert csv format to ods

  • unoconv -f ods *.csv

You can change the options, source, and destination format according to your needs. Some of the document formats supported by this application are pdf, odf, odt, ods, xls, xlsx, doc, docx, rtf, ppt, pptx, csv, png, jpg, bmp, and svg.
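unoconv itself does not descend into sub-folders, so one way to handle a nested tree is to feed it files from find. This is only a sketch: the .docx-to-pdf pairing is an example, and the guard lets the loop run even on machines where unoconv is missing.

```shell
# Recursively convert every .docx under the current tree to PDF.
find . -type f -name '*.docx' -print0 |
while IFS= read -r -d '' doc; do
    if command -v unoconv >/dev/null 2>&1; then
        unoconv -f pdf "$doc"
    else
        echo "would convert: $doc"   # unoconv not installed here
    fi
done
```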

download all links from a site and save to a text file

June 11, 2016 - Reading time: ~1 minute

wget is not designed for this. You can however parse its output to get what you want:

$ wget http://aligajani.com -O - 2>/dev/null | grep -oP 'href="\Khttp:.+?"' | sed 's/"//' | grep -v facebook > file.txt
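The same pipeline can be checked offline on a hand-made page (the URLs below are placeholders):

```shell
# A tiny page with one keeper link and one facebook link to filter out.
cat > page.html <<'EOF'
<a href="http://example.com/one">one</a>
<a href="http://facebook.com/x">fb</a>
EOF
# \K drops the href=" prefix; sed strips the trailing quote.
grep -oP 'href="\Khttp:.+?"' page.html | sed 's/"//' | grep -v facebook
```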

You could use lynx for this:

lynx -dump -listonly http://aligajani.com | grep -v facebook.com > file.txt

Both commands above list the links of a single page only. To download a site recursively:

wget -r -p -k http://website

or

wget -r -p -k --wait=#SECONDS http://website

The second form is for websites that may block you for downloading too quickly; hammering a site too fast can also degrade its service, so use the --wait version in most circumstances to be courteous. Everything will be placed in a folder named after the website, created in whatever directory your terminal is in when you run the command.