Introducing the Offen Protocol

We are building Offen, a fair and lightweight web analytics software that treats operators and users as equal parties. Along the way, we discovered many subtleties and details to consider, and distilled them into the Offen Protocol for any software out there that aims to handle usage data in a transparent way. Read the full version of this article on the Offen blog.

The underlying concept is the definition of five actions that clients can take when they interact with a server that processes their data. These actions correspond to the rights of the data subject as defined by the GDPR.

  • Probe is used to request additional information about the service.
  • Register is used when a client wants to make itself known to the server.
  • Submit is used when a client transfers data to the server.
  • Query is used when a client wants to query the server for its data.
  • Purge is used by a client that wants to initiate removal of its data.
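
To make the five actions concrete, here is a minimal client sketch in Python. The endpoint paths, methods, and payloads below are illustrative assumptions, not the actual mapping the Offen Protocol defines – consult the specification document for the real one.

import requests  # hypothetical client sketch; paths and payloads are assumptions

BASE = "https://analytics.example.com"

def probe():
    # Probe: request additional information about the service
    return requests.get(BASE + "/").json()

def register():
    # Register: make this client known to the server, receiving an identifier
    return requests.post(BASE + "/register").json()

def submit(client_id, payload):
    # Submit: transfer usage data to the server
    return requests.post(BASE + "/submit", json={"id": client_id, "data": payload})

def query(client_id):
    # Query: ask the server for the data it holds about this client
    return requests.get(BASE + "/query", params={"id": client_id}).json()

def purge(client_id):
    # Purge: initiate removal of this client's data
    return requests.delete(BASE + "/purge", params={"id": client_id})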

The full specification document can be found on the website. The protocol is not overly complicated and is perhaps even more of a convention than a specification. However, we have extracted what we use in Offen and added these implementations to the GitHub repository, which also contains the specification itself.

Please let us know what you think. We’re happy to open the discussion: tweet at us, send us an email, or open an issue on the GitHub repository.






Bring Your Home Network Anywhere For Free – Home VPN with WireGuard on Raspberry Pi + Pi-hole (Ubuntu Server 20.04 LTS)

In the previous blog post, I talked about setting up Ubuntu Server 20.04 LTS and Pi-hole DNS on Raspberry Pi. You can go through the process step by step following Block Ads, Tracking, and Telemetry With Pi-hole on Raspberry Pi (Ubuntu Server 20.04 LTS).

With Pi-hole set up on our home network, we get a much better internet browsing experience without ads and better control over the resources available on it. Maybe you also have a network-attached storage (NAS) device on your network that you would like to reach from anywhere, or you simply want a safe browsing experience when connected to public Wi-Fi.

The setup above is limited to your home network, and after a couple of days of browsing you will wonder: why can’t I bring this network setup wherever I go!? Well, YOU CAN. The logical next step is to find a way to connect to our home network from anywhere and browse through it, even from the other side of the world.



Virtual Private Network

A Virtual Private Network (VPN) allows us to connect our devices to another network over the internet in a secure manner, and to browse the internet using that other computer’s (the server’s) internet connection.

I am sure you have come across internet ads for paid services like ExpressVPN, NordVPN, Surfshark, etc. They are awesome without a doubt: you can mask your device’s IP location and use geographically restricted services like Netflix. But they won’t get you into your home network, and you have to pay for them. All of these services rely on VPN protocols to create and secure your connection, so why shouldn’t you do the same for your own needs?



WireGuard or OpenVPN

The two most popular VPN protocols in use today are WireGuard and OpenVPN. There is no overriding reason to choose one over the other, but it is said that WireGuard is much faster than OpenVPN, consumes around 15% less data, handles network changes better, and appears to be just as secure (I don’t know who said it).



WireGuard (or OpenVPN) on Raspberry Pi

We could go through the manual installation instructions for WireGuard, but there is a great tool, PiVPN, which allows us to install the desired VPN very easily.

Log in to your Raspberry Pi directly or via Secure Shell (SSH), and run:

curl -L https://install.pivpn.io | bash

The process will use sudo and install the necessary dependencies; just wait for it to do its job. After installing the necessary packages, you will be prompted with a series of graphical options.
We previously talked about setting up a static IP address on Ubuntu Server 20.04. PiVPN won’t configure a static IP for us, because we are not running Raspberry Pi OS (Raspbian) on our Raspberry Pi.
Just accept the default options, and be sure to select the WireGuard option when prompted.
You can change the default WireGuard port if necessary, but keep in mind that you will need it later, so make sure you remember it (I will use the default option, port 51820).
If you have a Pi-hole installation, PiVPN will detect it and ask if you want to use it as your DNS server.
In the next steps, you will be asked whether to use a public IP or a DNS name. Choose your public IP address.

If your ISP provides you with a dynamic IP address, there is a solution in the next post. For now, continue with this article.

Continue with the process and enable unattended upgrades for the server.
Just follow the process and accept the reboot of the Raspberry Pi after the installation, so everything is set up.

If you use Pi-hole as a DHCP server, you won’t have an internet connection while the Raspberry Pi is rebooting.



Port Forwarding

To be able to connect to your Raspberry Pi VPN server from outside, you need to set up port forwarding on your router. I have a Technicolor CGA2121, but you can find this option on every router under settings (or advanced settings, usually under the Applications & Gaming option). Forward the WireGuard port you chose earlier (UDP 51820 by default) to your Raspberry Pi’s static IP address.



Adding VPN Client

To add a new VPN client user, use the integrated PiVPN command:

pivpn add

Choose your client name and hit ENTER.

You may see a warning asking you to run ‘systemctl daemon-reload’ to reload units, so just do that.

Now your client is ready to connect. You can find installation files for different operating systems here.

For Android and iOS devices, there is a WireGuard application on the Play Store/App Store, so download it. To quickly set up WireGuard VPN on your phone, run this from your Raspberry Pi:

pivpn -qr

A QR code will appear on the screen, which you can scan with your mobile phone to complete the setup.

Now when you leave your home network, you are always a flip of the switch away from it.



Pi-hole DNS Troubleshooting

If you installed PiVPN before Pi-hole, edit the PiVPN configuration with:

$ sudo nano /etc/pivpn/wireguard/setupVars.conf
  • Remove the pivpnDNS1=[...] and pivpnDNS2=[...] lines
  • Add this line pivpnDNS1=192.168.0.50 (your Pi-hole IP might be different) to point clients to the Pi-hole IP
  • Save the file with Ctrl+X, Y and exit
  • Run pihole -a -i local to tell Pi-hole to listen on all interfaces
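
After the edit, the relevant line of setupVars.conf should look something like this (a sketch – your Pi-hole IP and the rest of the file will differ):

pivpnDNS1=192.168.0.50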



Dynamic IP Address

If you are lucky enough to have a static IP address, or you don’t mind paying for one, you can skip this part. Otherwise, here is how to set up dynamic DNS for dynamic IP addresses at home:

https://amelspahic.com/set-up-dynamic-dns-for-dynamic-ip-addresses-at-home



Final Words

I hope this tutorial will help you set up your VPN communication and bring even more privacy, security, and comfort while browsing the internet.



Private AI: Machine Learning on Encrypted Data

Kristin Lauter is currently the head of research science at Facebook AI Research (Meta), formerly the head of Cryptography and Privacy Research Group at Microsoft Research. This post is based on her talk at the OpenMined Privacy Conference.



What is the privacy problem with AI?

Let us begin by looking at a generic Machine Learning (ML) algorithm that takes in our data as input and outputs some kind of decision – a classification label, a numerical value, or a recommendation.

The privacy problem stems from the fact that we have to input our data to get those nice and valuable predictions.

Many AI applications powered by smart agents are hosted on the cloud, and protecting privacy of user data is pivotal to building secure applications.

The privacy of data can be protected through encryption. However, standard encryption methods do not allow computation on encrypted data. Here’s where Homomorphic Encryption (HE) helps.



What is Homomorphic Encryption?

In simple terms, Homomorphic Encryption is a mathematical tool:

  • that allows data to be encrypted,
  • ensuring privacy while, at the same time, allowing computations to be performed on the encrypted data.
  • The result of the computation can then be decrypted to reveal the actual result.

With Homomorphic Encryption (HE), the order of encryption and computation can be interchanged.

Suppose you have data a and b. You can perform a computation on the data and then encrypt the result, denoted by E(a*b).

Alternatively, you could encrypt the data first and then perform the computation, denoted by E(a) * E(b). If the encryption is homomorphic, then these two values, E(a*b) and E(a) * E(b), decrypt to the same value.

Therefore, we can choose to encrypt private data a and b, outsource the computation, say, to the cloud, and decrypt the returned result to view the actual, meaningful results.
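
The schemes discussed in this talk are lattice-based, but the interchange property itself can be seen in miniature with textbook RSA, which happens to be multiplicatively homomorphic. A toy Python check (an illustration only – unpadded RSA with tiny parameters is not secure, and it is not the scheme described here):

# Textbook RSA satisfies E(a) * E(b) mod n == E(a * b mod n)
n, e = 3233, 17            # tiny public key (p = 61, q = 53)

def E(m):
    return pow(m, e, n)    # encrypt: m^e mod n

a, b = 7, 11
assert E(a) * E(b) % n == E(a * b % n)   # compute-then-encrypt == encrypt-then-compute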



Understanding Homomorphic Encryption intuitively through Homer-morphic Encryption

Let’s try to think of a fictional character and draw a relatable analogy.

Remember Homer Simpson from ‘The Simpsons’? 🙂

The following illustration is aimed at giving an intuitive explanation to what Homomorphic Encryption does.

Let us say you need to get a jewel made and you have your gold ready! You’d now like to call your jeweler (Homer Simpson) and get your jewel made; however, you’re not very sure if your jeweler is trustworthy. Here’s a suggestion.

You may place your gold in a glove box, lock it, and keep the key with yourself. You may now invite your jeweler over and ask him to work on the gold nuggets through the gloves of the box. Once the jewel is done, you may unlock your box and retrieve your jewel. Isn’t that cool?

Let us try to parse the analogy a bit.

  • Your private data is analogous to gold,
  • outsourcing computations on encrypted data in a public environment is similar to getting your jeweler to work on the gold through glove box, and
  • decrypting the results of computation to view the meaningful results is analogous to opening the box to get your jewel after the jeweler has left. 🙂

Without delving deep into the math involved, the high-level idea behind homomorphic encryption is as follows.

  • Homomorphic Encryption uses lattice-based encoding.
  • Encryption adds noise to a secret inner product.
  • Decryption subtracts the secret inner product and the noise becomes easy to cancel.
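
To make those bullets slightly more tangible, here is a toy LWE-style cipher in Python: the ciphertext hides the message behind a secret inner product plus small noise, and decryption subtracts the inner product and rounds the noise away. This is a pedagogical sketch with made-up parameters, not a real scheme:

import random

q, delta, n = 2**15, 2**7, 8                     # toy modulus, scale, dimension
s = [random.randrange(q) for _ in range(n)]      # secret key

def encrypt(m):                                  # m is a small integer
    a = [random.randrange(q) for _ in range(n)]
    e = random.randrange(-4, 5)                  # small noise
    b = (sum(x * y for x, y in zip(a, s)) + delta * m + e) % q
    return a, b

def decrypt(ct):
    a, b = ct
    noisy = (b - sum(x * y for x, y in zip(a, s))) % q   # leaves delta*m + e
    if noisy > q // 2:
        noisy -= q
    return round(noisy / delta)                  # rounding cancels the noise

def add(ct1, ct2):                               # homomorphic addition
    a1, b1 = ct1
    a2, b2 = ct2
    return [(x + y) % q for x, y in zip(a1, a2)], (b1 + b2) % q

assert decrypt(encrypt(3)) == 3
assert decrypt(add(encrypt(3), encrypt(4))) == 7  # noise adds up but stays small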

In the next section, let’s see the capabilities of Microsoft SEAL, a library for Homomorphic Encryption.



Microsoft SEAL (Simple Encrypted Arithmetic Library)

SEAL (Simple Encrypted Arithmetic Library) is Microsoft’s library for Homomorphic Encryption (HE), widely adopted by research teams worldwide. It was first released publicly in 2015, followed by the standardization of HE in November 2018. Microsoft SEAL is available for download and use here.
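
As a taste of what this looks like in practice, here is a minimal sketch using TenSEAL, a community-maintained Python wrapper around Microsoft SEAL (the parameters below follow the library’s tutorial defaults and are an illustration, not a recommendation):

import tenseal as ts   # pip install tenseal

# CKKS context for approximate arithmetic on real numbers
context = ts.context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=8192,
    coeff_mod_bit_sizes=[60, 40, 40, 60],
)
context.global_scale = 2 ** 40

enc_a = ts.ckks_vector(context, [1.0, 2.0, 3.0])  # encrypt
enc_b = ts.ckks_vector(context, [4.0, 5.0, 6.0])

enc_sum = enc_a + enc_b          # computation happens on ciphertexts
print(enc_sum.decrypt())         # approximately [5.0, 7.0, 9.0]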

In recent years, availability of hardware accelerators has also enabled several orders of magnitude speedup. Here’s a timeline of how Homomorphic Encryption has been adopted due to advances in research and easier access to compute.

  • Idea: Computation on encrypted data without decrypting it
  • 2009: Considered impractical due to substantial overhead involved
  • 2011: Breakthrough at Microsoft Research
  • Subsequent years of research: Practical encoding techniques that achieved several orders of speed-up
  • 2016: CryptoNets at ICML 2016 – neural net predictions on encrypted data

Now that we’re familiar with how HE has been adopted over the years, the subsequent sections will focus on the possible applications of HE.



Cloud Compute Scenarios benefitting from HE

The following are some of the cloud computing scenarios that could potentially benefit from Homomorphic Encryption (HE):

  • Private Storage and Computation
  • Private Prediction Services
  • Hosted Private Training
  • Private Set Intersection
  • Secure Collaborative Computation

The markets benefitting from such services include healthcare, pharmaceuticals, finance, government, insurance, manufacturing, and oil and gas, to name a few. A few applications of Private AI across industries are listed below.

  • Finance: Fraud detection, automated claims processing, threat intelligence, and data analytics are some applications.
  • Healthcare: The scope of Private AI in healthcare includes medical diagnosis, medical support systems such as healthcare bots, preventive care and data analytics.
  • Manufacturing: Predictive maintenance and data analytics on potentially sensitive data.



Azure ML Private Prediction

An image classification model for encrypted inferencing in Azure Container Instance (ACI), built on Microsoft SEAL, was announced at Microsoft Build Conference 2020. The tutorial can be accessed here.



References

[1] Recording of the PriCon talk



Internet cookie warnings: An ineffective and inefficient way to enforce laws.



Written by Leonardo Zamudio López



Introduction

One of the most well-known aspects of the web is the famous “cookie”, that old friend responsible for tracking our information on the Internet for advertising purposes.

It is well known, and has even become a meme, that every time we enter a website, a familiar alert appears saying “We use cookies for a better user experience and recommendations”.

But before we talk about the whole moral and ethical side of cookies, we must first know what they are.



What is a cookie?

A cookie is a packet of data that a web browser automatically stores on a user’s computer when the user visits a website. The cookie is sent from the server to the visitor of the web page. Subsequently, each time the user visits the same web page, or another page of the same domain, the cookie is read by the web browser and sent back to the web server, unmodified.
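
Concretely, the exchange looks like this at the HTTP level (an illustrative example; the cookie name and value are made up):

HTTP/1.1 200 OK
Set-Cookie: session_id=abc123; Path=/; HttpOnly

…and on every subsequent request to the same domain, the browser sends it back:

GET /cart HTTP/1.1
Host: shop.example.com
Cookie: session_id=abc123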

Therefore, a cookie is just data that is stored on the user’s computer. But since the storage is done at the behest of the web server, there has always been a fear that something malicious could be done with it. However, cookies are not software, nor are they code snippets; they are simply data. Therefore, in principle, cookies cannot transmit and execute viruses or install malware such as Trojans or spyware.

However, cookies can be used to track user activity on the Web.

Cookies were developed in 1994 by engineers at Netscape, whose browser, now defunct, was the first to accept them. Since then, cookies have been an essential element for the Web to function as we know it today.

As a curiosity, the original Netscape cookie specification can still be found on the Web today.

Cookies are necessary because HTTP, the protocol used on the Web to transmit web pages, is a stateless protocol: it does not provide a mechanism for maintaining state, i.e., the history of requests and actions performed by a user across different requests.

Cookies were originally developed by Netscape to provide a reliable means of implementing a virtual shopping cart.

A virtual shopping cart, also called a virtual shopping basket, acts as a virtual container in which the user places the items he or she wishes to purchase, so that users can browse the site displaying the items for sale and add or remove them from the basket at any time. Since then, cookies have been put to different purposes. The main one is to differentiate users from each other in order to act differently depending on the user visiting a web page.

For example, cookies are used to store user preferences such as the preferred language for viewing a website.

Another example: most search engines, such as Bing, have a preferences option. Bing displays 10 results by default when performing a search. However, from the preferences page you can change this value to 50, for example. From then on, the results will always appear paged 50 by 50, even if the browser is closed, since the cookie is maintained from one day to the next.

But the main use of cookies is to store the session. The session is a basic concept in web applications that makes it possible to control user access to certain parts of a website and to show information particular to that user. Finally, there are also some problematic uses of cookies, such as tracking cookies, which allow a user to be tracked across different websites.

Tracking makes it possible to know which websites a user has visited and how long he or she has spent on each of them, and is usually used to create anonymous user profiles that can later be used for different purposes, such as building advertising campaigns based on those profiles. This use of cookies is employed by companies that manage ads on the Internet, such as DoubleClick, one of the most important in the sector.



What do I think about cookies?

I certainly don’t think it’s a bad thing for a website to use cookies to store session information or any other type of data that helps the site’s performance. However, there has been a lot of controversy surrounding cookies, mainly around their use for advertising purposes.

We can take some responsibility away from social networks or forums, since they warn you explicitly about the use of cookies in the Terms and Conditions that absolutely no one reads when creating an account on these sites. However, what can we say about other sites?

I remember being on the computer searching for information for my school research assignments. Many of the pages I consulted were of a commercial nature. Every time I entered one of these sites, I would see the typical banner at the bottom: “We use cookies to improve the user experience”. And I remember thinking to myself, “Ok, at least I, a programming student, know what cookies do. But what about a person who doesn’t belong to an IT branch?”

Let’s pose a hypothetical situation: let’s go out on the street and ask random people “What is a cookie?”. Most (if not all) people will answer the following: “A cookie is a small-sized, sweet or savory, baked dessert usually made from wheat flour, eggs, sugar and butter.”

And this is where the problem lies. User data is being used for advertising purposes without the user’s explicit knowledge. And worse still, many sites do not explicitly tell you what data they are using and for what purposes.

How can a user be 100% sure that their data is being used properly? Many sites still lack data use transparency protocols.

I believe that the best way to make a user aware of the data being used is to state it explicitly. For example, instead of saying for the umpteenth time “We use cookies to improve the user experience”, a site could say: “We use data such as pages visited, location, and device type for performance, session, and advertising purposes”.

If we do not explicitly state the data used, how can we ensure that our data are used within our preferences and even within the legal framework?

And it is up to us, users, to inform ourselves about technological issues involving our personal data and our digital footprint. We must ensure that our data is being used appropriately. And if not, we can rise up in virtual arms and sing that glorious war cry, “Feed me your kings and queens, I will spit their bloody crowns to the ground.”

Anyways, that’s it for this post. Let me know what you think in the comments. Remember to always give your opinion based on logical reasoning and respect. Take care. See you next time.



Customer Data Pipeline And Data Processing: Types, Importance, And Benefits

A data pipeline consists of actions that ingest raw data from various sources and move the data to a storage and analysis destination. This post will look at what a customer data pipeline is and a few of its key elements. We will further talk about the different types of data processing taking place within the data pipeline.

Business analysis depends deeply on data pipelines, which are essential for any online venture. Given the voluminous amount of data today, the need for managing and storing that data has drastically heightened.



Why Use An Automated Pipeline?

A data pipeline is an automated process, but why do we need it in the first place?

Data pipelines take on the role of arranging all data volumes in the same format and in one place, thus making analyses reproducible.



Key Elements Of A Data Pipeline

A data pipeline represents a data preparation process based on a few key elements – source, destination, dataflow, processing, workflow, and monitoring.

Data pipelines enable information to travel from an application to a data warehouse, or from a data lake to an analytics database.



Source

A data source is a place from which the pipeline retrieves information. Sources can include cloud-based customer tools (Salesforce, Facebook Ads, Google Ads, etc.), relational database management systems, social media management tools, and even sensor devices.



Destination

The destination is the data pipeline’s endpoint, where it deposits all the data pulled; you can even feed the data directly into visual data analysis tools.



Dataflow

The raw data can be changed while traveling from its source to its destination – a movement referred to as dataflow.



Data Processing

Data processing is a method that enables you to evaluate how data is gathered, transformed, and stored.

As an important step, data processing determines how the dataflow should be implemented within the pipeline. The processing of data includes the extraction of business data from all available sources. Once extracted, this data undergoes an inspection and is adjusted for the business user before it is loaded into the data store.



Types Of Data Processing

So, is there a difference between a data processing system and data processing? Yes, and here’s how it works:

Data processing refers to data being transformed into beneficial information, whereas a data processing system is a tool optimized for proper data management.

Some of the basic types of data processing include:



Transaction Processing

Transaction processing is deployed in mission-critical situations, where violating the required conditions would negatively affect your business.



Hardware

A system for processing transactions should have redundant hardware. Hardware redundancy tolerates partial failures, because the redundant components can automatically take over and keep the system operational.



Software

The software of a transaction processing system should be designed to recover quickly from a failure. Usually, systems for processing transactions use transaction abstraction to achieve this: if there’s a failure, uncommitted transactions are aborted, allowing the system to restart quickly.



Distributed Processing

Distributed data processing breaks big data stores down into data sets spread across many machines or servers. A distributed processing system has a high tolerance for failure: when one or two network servers crash, the system redistributes their data processing to the remaining servers.



Real-Time Processing

Real-time processing takes the same approach seen in transaction processing, but handles data as it arrives.

If an error is detected in the input, the system ignores it and moves on to the next piece of incoming data. The most popular application of real-time data processing is GPS tracking apps.



Batch Processing

As the name suggests, batch processing happens when pieces of data stored over a period of time are analyzed together, in batches. Batch processing is required when you need to analyze huge volumes of data for a detailed inspection. As the system handles mass volumes of data, the processing period might take longer to complete. Batch processing is favored over real-time processing when the accuracy of information is more significant than the processing speed.



Multiprocessing

Multiprocessing is a method in which two or more processors work on the same dataset. The most obvious drawback of this data processing type is the cost: keep in mind that building and maintaining in-house servers is very expensive. A minimal illustration of the idea follows below.
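
Here is a sketch using Python’s standard library, with a squaring function standing in for real per-record work (the workload and worker count are made up for illustration):

from multiprocessing import Pool

def process(record):
    # stand-in for real per-record work
    return record * record

if __name__ == "__main__":
    with Pool(processes=4) as pool:          # four workers share the job
        results = pool.map(process, range(10))
    print(results)                           # [0, 1, 4, 9, ...]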



Workflow

Workflow usually refers to the sequencing of jobs within the data pipeline, as well as their interdependence. Dependencies and sequencing decide when a data pipeline runs.



Monitoring

The final element is monitoring: the pipeline is continually monitored for efficiency and speed, and to check for data accuracy and data loss.



Data Pipeline And ETL

ETL stands for Extract, Transform, and Load, and is a method typically applied to batch loads of data within a pipeline.

Essentially, this process moves data from a source, like an app, to a target, such as a data warehouse.

  • Data Extraction uses the various source systems to acquire the relevant data from a specific source.
  • Data Transformation applies the processes of filtering, aggregating, and preparing the data for further analysis.
  • Data Loading represents the loading of the data into its final destination.
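
A minimal sketch of the three steps in Python (the source URL, field names, and destination are illustrative assumptions):

import csv, sqlite3, urllib.request

# Extract: pull raw data from a source system
raw = urllib.request.urlopen("https://example.com/orders.csv").read().decode()
rows = list(csv.DictReader(raw.splitlines()))

# Transform: filter and aggregate the data for analysis
totals = {}
for row in rows:
    totals[row["country"]] = totals.get(row["country"], 0.0) + float(row["amount"])

# Load: write the prepared data into the destination store
db = sqlite3.connect("warehouse.db")
db.execute("CREATE TABLE IF NOT EXISTS revenue (country TEXT, total REAL)")
db.executemany("INSERT INTO revenue VALUES (?, ?)", totals.items())
db.commit()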






Meeting our standards

We are building Offen, a fair and lightweight web analytics software that treats operators and users as equal parties. Here is what we have achieved in the past weeks.

Statistics about the location of visitors had been on our to-do list for a fairly long time. Yet implementing them in a way that met our privacy standards proved to be a veritable challenge. After careful consideration and intense research, we finally decided on an approach based on time zones.

To derive the geographical location, this method does not rely on an IP database; instead, it asks the browser for its configured time zone and tries to assign that to a country. This fully protects the privacy of users and still provides sufficiently accurate results for analysis.
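
A sketch of the idea in Python: the browser reports its IANA time zone (in JavaScript via Intl.DateTimeFormat().resolvedOptions().timeZone), and the zone name is mapped to a country. The two-entry table below is an illustrative stand-in for a full zone-to-country mapping – this is not Offen’s actual implementation:

ZONE_TO_COUNTRY = {
    "Europe/Berlin": "DE",
    "America/New_York": "US",
    # ... a real table covers every IANA zone
}

def country_from_zone(tz_name):
    # No IP address involved: only the self-reported time zone
    return ZONE_TO_COUNTRY.get(tz_name, "unknown")

print(country_from_zone("Europe/Berlin"))   # DE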

Furthermore, we have addressed the issue of user awareness in Offen. Since the only direct link to the User Auditorium is in the consent banner, it was important for us to provide additional features that draw users’ attention to it. Widgets now give operators the opportunity to easily integrate a reference to the User Auditorium, preferably with a link on every page.

While there are still a few improvements on the agenda, with the implementation of location statistics we are a major step closer to v1.0. Be sure to stay tuned and follow us here or on Twitter and Mastodon for the next release updates.





Encryption is important

Encryption is crucial for everyone, whether it’s the average user or an enterprise deployment. Everyone has information they’d rather hide. It may not be much; you may be the kind of person who doesn’t save embarrassing photos or cringy search histories. But what about private conversations between you and your friends?

You don’t have to be a criminal to enjoy encryption; simply stopping random people from reading texts you sent to your friend, and texts they sent back, is most likely something you’d be interested in. After all, you don’t send them publicly – it’s a private conversation!

This is where encryption comes into play. It doesn’t have to be much – it can easily be automatic encryption like what’s found in Signal or iMessage, but it’s important nonetheless. Unless you trust iMessage, in which case you don’t care about privacy and should really take my advice and use Signal so Jared can quit reading your fantasy texts.

I have seen encryption that can be broken in seconds – a good example of this is poorly implemented AES-128 where the key can be extracted. For good encryption, you should use more layers, more random passwords, and smarter schemes.

Let’s assume you have a laptop with private communications between you and a friend. You don’t need to be privacy-paranoid to not want some random stranger reading these texts.

The first step is adding a password to your account. This way, someone can’t just click “Sign in” and log into your computer. But if they have long-term access to your device, they can boot into a different OS or unplug your hard drive and read the texts through a file browser. This is fairly basic, and if you aren’t doing it, Joe is going to love reading your search history.

The next layer is enabling user account encryption, like NTFS’s “Encrypt contents to secure data” option on Windows or eCryptfs on Linux. Now, if you have a weak password like your birthdate or your crush’s name – because we all know you have one, Jared – then you should reconsider what kinds of passwords you use. Try to avoid using words or w0rd$ in your password, because that can be just as bad as setting your password to “password”.

The next layer is whole-disk encryption, which can be done with LUKS on Linux (macOS has its own equivalent, FileVault). Feel free to set this to an actual phrase with numbers, letters, and special characters in it, and make it as long as possible. Doing this can prevent most attacks by someone who doesn’t have a lot of time, like a random stranger in the library. Joe is watching.

Your final step should be encrypting sensitive files and folders themselves with whatever your system can use. Avoid using programs like 7Zip; instead, use things like OpenSSL or dedicated tools that are well vetted in the encryption community. Use PGP encryption to communicate via email and use Signal to prevent someone from reading your message logs.
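
As one concrete option for encrypting individual files, here is a minimal sketch using the Fernet recipe from the well-vetted Python cryptography library (an illustration of the idea, not the only good choice):

from cryptography.fernet import Fernet   # pip install cryptography

key = Fernet.generate_key()              # keep this key safe, away from the data
f = Fernet(key)

with open("texts.txt", "rb") as fh:
    token = f.encrypt(fh.read())         # authenticated encryption

with open("texts.enc", "wb") as fh:
    fh.write(token)

# later, with the key in hand:
print(f.decrypt(token)[:20])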

My next chunk of advice is to encrypt a flash drive with your sensitive data and carry it on you at all times – maybe put it on your keychain. This makes it much more difficult for someone to decrypt your data, because they likely won’t have physical access to it. The creepy stalker behind you that Chrome tried to warn you about won’t be able to see your private messages.

While you’re at it, create a profile in your web browser and store it on that flash drive. Save your passwords in your browser with that profile, and it will help protect your browser sessions from attack.

If anyone has any tips they’d like to contribute, feel free to share them below in the comments. I’ll be replying to any advice, and I may add it to this article if it’s really good.

