Install

In this chapter, we will install Kubeflow on an Amazon EKS cluster. If you don’t have an EKS cluster, please follow the instructions in the getting started guide and then launch your EKS cluster using the eksctl chapter.

Increase cluster size

We need more resources to complete the Kubeflow chapter of the EKS Workshop. First, we’ll increase the size of our cluster to 6 nodes:

export NODEGROUP_NAME=$(eksctl get nodegroups --cluster eksworkshop-eksctl -o json | jq -r '.[0].Name')
eksctl scale nodegroup --cluster eksworkshop-eksctl --name $NODEGROUP_NAME --nodes 6 --nodes-max 6

Scaling the nodegroup will take 2 - 3 minutes.
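
To confirm that the scale-up has finished, you can count the nodes that report Ready. The following is a minimal sketch, not part of the original workshop: the helper name count_ready_nodes is hypothetical, and it assumes the default `kubectl get nodes --no-headers` column layout, where STATUS is the second column.

```shell
# Count nodes whose STATUS column is exactly "Ready".
# Reads `kubectl get nodes --no-headers` output on stdin.
# (count_ready_nodes is a hypothetical helper name.)
count_ready_nodes() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Only query the cluster when kubectl is available.
if command -v kubectl >/dev/null 2>&1; then
  ready=$(kubectl get nodes --no-headers 2>/dev/null | count_ready_nodes)
  echo "ready nodes: $ready of 6"
fi
```

Re-run the check until it reports 6 of 6 before continuing.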

Install Kubeflow on Amazon EKS

curl --silent --location "https://github.com/kubeflow/kfctl/releases/download/v1.0.1/kfctl_v1.0.1-0-gf3edb9b_linux.tar.gz" | tar xz -C /tmp
sudo mv -v /tmp/kfctl /usr/local/bin
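
Before going further, it can help to confirm that the command-line tools used in this chapter are on your PATH. This is a small sketch under that assumption. The helper name require_cmd is hypothetical, and the tool list is simply the set of commands that appear in this chapter.

```shell
# Succeed if the named command is available on PATH.
# (require_cmd is a hypothetical helper name.)
require_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Tools used in this chapter; report any that are missing.
for tool in eksctl kubectl jq wget kfctl; do
  if require_cmd "$tool"; then
    echo "found:   $tool"
  else
    echo "missing: $tool"
  fi
done
```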

Setup your configuration

The next step is to export the environment variables needed for the Kubeflow install.

We chose the default kfctl configuration file to keep the workshop experience simple. For production, however, we recommend using the Cognito configuration and adding authentication and SSL (via ACM). For the additional steps needed to enable Cognito, please follow the Kubeflow documentation.

cat << EoF > kf-install.sh
export AWS_CLUSTER_NAME=eksworkshop-eksctl
export KF_NAME=\${AWS_CLUSTER_NAME}

export BASE_DIR=${HOME}/environment
export KF_DIR=\${BASE_DIR}/\${KF_NAME}

# export CONFIG_URI="https://raw.githubusercontent.com/kubeflow/manifests/v1.0-branch/kfdef/kfctl_aws_cognito.v1.0.1.yaml"
export CONFIG_URI="https://raw.githubusercontent.com/kubeflow/manifests/v1.0-branch/kfdef/kfctl_aws.v1.0.1.yaml"

export CONFIG_FILE=\${KF_DIR}/kfctl_aws.yaml
EoF

source kf-install.sh
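
The later steps depend on these variables, so it may be worth checking that they are all set in the current shell. The following is a minimal sketch. The function name check_kf_env is hypothetical, and the variable list is taken from kf-install.sh above.

```shell
# Return non-zero if any variable exported by kf-install.sh is empty or unset.
# (check_kf_env is a hypothetical helper name.)
check_kf_env() {
  missing=0
  for name in AWS_CLUSTER_NAME KF_NAME BASE_DIR KF_DIR CONFIG_URI CONFIG_FILE; do
    # Look up the variable by name; default to empty when it is unset.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "unset: $name"
      missing=1
    else
      echo "$name=$value"
    fi
  done
  return "$missing"
}

check_kf_env || echo "Some variables are missing; re-run: source kf-install.sh"
```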

Create Kubeflow setup directory

mkdir -p ${KF_DIR}
cd ${KF_DIR}

Download configuration file

wget -O kfctl_aws.yaml $CONFIG_URI

We will use IAM Roles for Service Accounts in our configuration. IAM Roles for Service Accounts offers fine-grained access control, so that when Kubeflow interacts with AWS resources (such as ALB creation), it uses roles that are pre-defined by kfctl. kfctl will set up an OIDC Identity Provider for your EKS cluster and create two IAM roles (kf-admin-${AWS_CLUSTER_NAME} and kf-user-${AWS_CLUSTER_NAME}) in your account. kfctl will then build a trust relationship between the OIDC endpoint and the Kubernetes Service Accounts (SA), so that only those service accounts can perform the actions defined in the IAM roles.

Because we are using this feature, we will disable the use of IAM roles defined at the worker nodes. In addition, we will replace the EKS cluster name and AWS Region in your ${CONFIG_FILE}.

sed -i '/region: us-west-2/ a \      enablePodIamPolicy: true' ${CONFIG_FILE}

sed -i -e 's/kubeflow-aws/'"$AWS_CLUSTER_NAME"'/' ${CONFIG_FILE}
sed -i "s@us-west-2@$AWS_REGION@" ${CONFIG_FILE}

sed -i "s@roles:@#roles:@" ${CONFIG_FILE}
sed -i "s@- eksctl-eksworkshop-eksctl-nodegroup-ng-a2-NodeInstanceRole-xxxxxxx@#- eksctl-eksworkshop-eksctl-nodegroup-ng-a2-NodeInstanceRole-xxxxxxx@" ${CONFIG_FILE}
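
After the sed edits above, a quick check of the file can catch a substitution that did not match. The sketch below is an illustration only. The function name verify_kf_config is hypothetical, and it checks just the three things the sed commands are meant to change: the pod IAM policy flag, the region, and the commented-out roles key.

```shell
# Check that the sed edits landed in the kfctl config file.
# $1: path to the config file, $2: expected AWS region.
# (verify_kf_config is a hypothetical helper name.)
verify_kf_config() {
  file=$1
  region=$2
  bad=0
  grep -q "enablePodIamPolicy: true" "$file" || { echo "enablePodIamPolicy is not enabled"; bad=1; }
  grep -q "region: $region" "$file" || { echo "region is not $region"; bad=1; }
  # An uncommented 'roles:' key means the worker-node roles are still in use.
  if grep -q "^[[:space:]]*roles:" "$file"; then
    echo "roles: is still active"
    bad=1
  fi
  return "$bad"
}

# Only run the check when the file and region are known in this shell.
if [ -f "${CONFIG_FILE:-}" ] && [ -n "${AWS_REGION:-}" ]; then
  verify_kf_config "${CONFIG_FILE}" "${AWS_REGION}" && echo "config looks consistent"
fi
```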

Until https://github.com/kubeflow/kubeflow/issues/3827 is fixed, you also need to install aws-iam-authenticator:

curl -o aws-iam-authenticator https://amazon-eks.s3.us-west-2.amazonaws.com/1.15.10/2020-02-22/bin/linux/amd64/aws-iam-authenticator
chmod +x aws-iam-authenticator
sudo mv aws-iam-authenticator /usr/local/bin

Deploy Kubeflow

Apply configuration and deploy Kubeflow on your cluster:

cd ${KF_DIR}
kfctl apply -V -f ${CONFIG_FILE}

Run the command below to check the status:

kubectl -n kubeflow get all

Installing Kubeflow and its toolset may take 2 - 3 minutes. A few pods may initially show Error or CrashLoopBackOff status. Give them some time; they will recover on their own and reach the Running state.
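
Rather than re-running the status command by hand, you could poll until the pods settle. The following is a rough sketch under stated assumptions. The function names, the default of 60 attempts, and the 10-second interval are hypothetical, and it inspects only the STATUS column of `kubectl get pods --no-headers`, so a pod that is Running but not yet ready still counts as settled.

```shell
# Count pods whose STATUS column is neither Running nor Completed.
# Reads `kubectl get pods --no-headers` output on stdin.
count_unsettled_pods() {
  awk '$3 != "Running" && $3 != "Completed" { n++ } END { print n + 0 }'
}

# Poll the kubeflow namespace until every pod has settled or attempts run out.
# $1: number of attempts (assumed default 60), $2: seconds between polls (assumed default 10).
wait_for_kubeflow() {
  attempts=${1:-60}
  interval=${2:-10}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Treat a failed or empty kubectl call as "not settled yet".
    if pods=$(kubectl -n kubeflow get pods --no-headers 2>/dev/null) && [ -n "$pods" ]; then
      pending=$(printf '%s\n' "$pods" | count_unsettled_pods)
      if [ "$pending" -eq 0 ]; then
        return 0
      fi
      echo "waiting: $pending pod(s) not yet Running"
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  return 1
}
```

Calling `wait_for_kubeflow` after `kfctl apply` returns once all pods report Running or Completed, and returns a non-zero status if they have not settled within the allotted attempts.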
